
Molecular Hypergraph Grammar with Its Application to Molecular Optimization Hiroshi Kajino MIT-IBM Watson AI Lab IBM Research - Tokyo © 2019 IBM Corporation Background We wish to learn a generative model of a molecule. §Generative model of a molecule ! " #) [Gómez-Bombarelli+, 16] ' –Input: Latent vector # ∈ ℝ ∼ )(0, -') –Output: Molecular graph " (graph w/ node labels) Continuous optimization problem ⇔ Molecular optimization problem Discrete Continunous molecular graph sp. latent sp. ! " #) © 2019 IBM Corporation 2 Image from https://openi.nlm.nih.gov/detailedresult.php?img=PMC3403880_1758-2946-4-12-7&req=4 Technical Challenge Molecular graph generation is a non-trivial task §Two technical challenges 1. No consensus on a generative model of a graph • LSTM, GRU for path and tree • Ring is non-trivial 2. Hard constraints such as valence conditions • Degree of each node (=atom) is specified by its label • e.g., carbon has degree 4, oxygen has degree 2. 3 © 2019 IBM Corporation Existing work 1/3 Most of the existing work employs a text representation of a molecule, called SMILES. §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES Latent Molecular graph sp. continuous sp. SMILES sp. ! / #) " = 1(/) c1c(C=O)cccc1 Seq. model SMILES’ grammar 4 © 2019 IBM Corporation Existing work 1/3 Statistical model has to learn a rule-based grammar §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES –Cons: • NN has to learn SMILES’ grammar • No guarantee on valence conditions CC(C)(O)C#Cc1ccc(C[NH2+][C@H]2CCCN(c3nc4ccccc4s3)C2)s1 ! 5 © 2019 IBM Corporation Existing work 1/3 SMILES’ grammar does not prescribe valence conditions §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES –Cons: • NN has to learn SMILES’ grammar • No guarantee on valence conditions Grammatically correct c1c(C=O)(C)cccc1 ! This atom has a valence of 5. That of carbon must be 4. 6 © 2019 IBM Corporation Existing work 2/3 JT-VAE achieves 100% validity for the first time, but requires multiple NNs. §Generate a molecule by assembling subgraphs [Jin+, 18] Enumerate all possible combinations, and predict the best using NN ! 7 Image from [Jin+, 18] © 2019 IBM Corporation Existing work 3/3 RL-based method discovers better molecules than VAE-based methods, but with enormous cost §Molecular optimization using RL [You+, 18] –Idea: Molecular generation as MDP current action next state • State: Molecular graph state “add O” • Action: Modify the graph • Reward: Target property –Pros: Better optimization capability than VAE-based methods –Cons: Requires a number of target property evaluations • # of evals > 104 ! • Infeasible to work with the first principle calculation / wet-lab experiments 8 © 2019 IBM Corporation Existing work No existing work satisfies all of the three properties Approach VAE-based RL-based Representation SMILES Graph Graph Validity ✔ ✔ Easy-to- ✔ generate Sample ✔ ✔ complexity 9 © 2019 IBM Corporation Existing work Our contribution is to facilitate graph-based generation with help of “graph grammar”. VAE-based RL-based Representation SMILES Graph Graph Validity ✔ ✔ Easy-to- ✔ ✔ ✔? generate Sample ✔ ✔ complexity 10 © 2019 IBM Corporation Our work We develop a graph grammar tailored for molecular generation §Idea Use a context-free graph grammar for graph generation Molecular MolecularMolecular ParseMolecular Tree Parse Tree Latent vector Latent vector grap–hGraph generation boilshypergraphgrap downh to tree generationaccordinghypergraphto MHG according to MHG Easy to generate# Enc Enc Enc Enc Enc Enc $ $ H G H G = N ! ∈ ℝ N ! ∈ ℝ –Requirements: • Always satisfy valence conditions Our main contribution • Context-freeness Hyperedge replacement • Inference algorithm from data, not hand-written rules grammar 11 © 2019 IBM Corporation Our work A hyperedge replacement grammar can be constructed from data via tree decomposition Existing work on graph grammar [Aguiñaga+, 16] Set of general hypergraphs Tree decomposition Hyperedge Replacement Grammar * HRG is a context-free grammar generating hypergraphs 12 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 13 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 14 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Hypergraph is a generalization of a graph §Hypergraph ℋ = (6, 7) consists of… –Node 8 ∈ 6 –Hyperedge 9 ∈ 7 ⊆ 2 < : Connect an arbitrary number of nodes cf, An edge in a graph connects exactly two nodes Node Hyperedge 15 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Our method We represent a molecule using a hypergraph, not a graph. This helps to satisfy the valence conditions. H H H H §Molecular hypergraph models… H C C H – Atom = hyperedgeH C C H – bond = node H C C H H H H H §Molecular graph models…H H – Atom = nodeH C C H H H – Bond = Edge H H H C C H H C C H H H 16 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Our method These requirements guarantee transformation between graph & hypergraph §Molecular hypergraph 1. Each node has degree 2 (=2-regular) 2. Label on a hyperedge determines # of nodes it has (= valence) H H H H H H H C C H H H H C C H H H H C C H H C C ↔H H C C H H C C H H H H H H H 17 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 18 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar HRG generates a hypergraph by repeatedly replacing non-terminal hyperedges with hypergraphs §Hyperedge replacement grammar (HRG) = = (>, ?, /, @) –>: set of non-terminals N Labels on hyperedges –?: set of terminals C –/: starting symbol S –@: set of production rules – A rule replaces a non-terminal hyperedge with a hypergraph H N S N N 1 H C 1 N 2 2 Replace l.h.s. with r.h.s. © 2019 IBM Corporation 19 Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H Valence $ H C C H H H H H H 2-regularity $ H C C H 20 H H H © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Start from starting symbol S S Production rules @ H N S N N 1 H C 1 N 2 2 21 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar The left rule is applicable S Production rules @ H N S N N 1 H C 1 N 2 2 22 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar We obtain a hypergraph with three non-terminals N N N Production rules @ H N S N N 1 H C 1 N 2 2 23 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Apply the right rule to one of the non-terminals N N N Production rules @ H N S N N 1 H C 1 N 2 2 24 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Two non-terminals remain H C H N N Production rules @ H N S N N 1 H C 1 N 2 2 25 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N N Production rules @ H N S N N 1 H C 1 N 2 2 26 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N C H Production rules @ H H N S N N 1 H C 1 N 2 2 27 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N C H Production rules @ H H N S N N 1 H C 1 N 2 2 28 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Graph generation halts when there is no non-terminal, H H C H C C H H Production rules @ H H N S N N 1 H C 1 N 2 2 29 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H This can be Valence avoided by $ H C C H learning HRG from H H H data [Aguiñaga+, 16] H H 2-regularity $ H C C H 30 H H H © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H This can be Valence avoided by $ H C C H learning HRG from H H H data [Aguiñaga+, 16] H H Use an irredundant
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages52 Page
-
File Size-