Molecular Hypergraph Grammar with Its Application to Molecular Optimization

Molecular Hypergraph Grammar with Its Application to Molecular Optimization

Molecular Hypergraph Grammar with Its Application to Molecular Optimization Hiroshi Kajino MIT-IBM Watson AI Lab IBM Research - Tokyo © 2019 IBM Corporation Background We wish to learn a generative model of a molecule. §Generative model of a molecule ! " #) [Gómez-Bombarelli+, 16] ' –Input: Latent vector # ∈ ℝ ∼ )(0, -') –Output: Molecular graph " (graph w/ node labels) Continuous optimization problem ⇔ Molecular optimization problem Discrete Continunous molecular graph sp. latent sp. ! " #) © 2019 IBM Corporation 2 Image from https://openi.nlm.nih.gov/detailedresult.php?img=PMC3403880_1758-2946-4-12-7&req=4 Technical Challenge Molecular graph generation is a non-trivial task §Two technical challenges 1. No consensus on a generative model of a graph • LSTM, GRU for path and tree • Ring is non-trivial 2. Hard constraints such as valence conditions • Degree of each node (=atom) is specified by its label • e.g., carbon has degree 4, oxygen has degree 2. 3 © 2019 IBM Corporation Existing work 1/3 Most of the existing work employs a text representation of a molecule, called SMILES. §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES Latent Molecular graph sp. continuous sp. SMILES sp. ! / #) " = 1(/) c1c(C=O)cccc1 Seq. model SMILES’ grammar 4 © 2019 IBM Corporation Existing work 1/3 Statistical model has to learn a rule-based grammar §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES –Cons: • NN has to learn SMILES’ grammar • No guarantee on valence conditions CC(C)(O)C#Cc1ccc(C[NH2+][C@H]2CCCN(c3nc4ccccc4s3)C2)s1 ! 5 © 2019 IBM Corporation Existing work 1/3 SMILES’ grammar does not prescribe valence conditions §Use text representation ”SMILES” [Gómez-Bombarelli+, 16] –Pros: a standard sequential model can generate SMILES –Cons: • NN has to learn SMILES’ grammar • No guarantee on valence conditions Grammatically correct c1c(C=O)(C)cccc1 ! This atom has a valence of 5. That of carbon must be 4. 6 © 2019 IBM Corporation Existing work 2/3 JT-VAE achieves 100% validity for the first time, but requires multiple NNs. §Generate a molecule by assembling subgraphs [Jin+, 18] Enumerate all possible combinations, and predict the best using NN ! 7 Image from [Jin+, 18] © 2019 IBM Corporation Existing work 3/3 RL-based method discovers better molecules than VAE-based methods, but with enormous cost §Molecular optimization using RL [You+, 18] –Idea: Molecular generation as MDP current action next state • State: Molecular graph state “add O” • Action: Modify the graph • Reward: Target property –Pros: Better optimization capability than VAE-based methods –Cons: Requires a number of target property evaluations • # of evals > 104 ! • Infeasible to work with the first principle calculation / wet-lab experiments 8 © 2019 IBM Corporation Existing work No existing work satisfies all of the three properties Approach VAE-based RL-based Representation SMILES Graph Graph Validity ✔ ✔ Easy-to- ✔ generate Sample ✔ ✔ complexity 9 © 2019 IBM Corporation Existing work Our contribution is to facilitate graph-based generation with help of “graph grammar”. VAE-based RL-based Representation SMILES Graph Graph Validity ✔ ✔ Easy-to- ✔ ✔ ✔? generate Sample ✔ ✔ complexity 10 © 2019 IBM Corporation Our work We develop a graph grammar tailored for molecular generation §Idea Use a context-free graph grammar for graph generation Molecular MolecularMolecular ParseMolecular Tree Parse Tree Latent vector Latent vector grap–hGraph generation boilshypergraphgrap downh to tree generationaccordinghypergraphto MHG according to MHG Easy to generate# Enc Enc Enc Enc Enc Enc $ $ H G H G = N ! ∈ ℝ N ! ∈ ℝ –Requirements: • Always satisfy valence conditions Our main contribution • Context-freeness Hyperedge replacement • Inference algorithm from data, not hand-written rules grammar 11 © 2019 IBM Corporation Our work A hyperedge replacement grammar can be constructed from data via tree decomposition Existing work on graph grammar [Aguiñaga+, 16] Set of general hypergraphs Tree decomposition Hyperedge Replacement Grammar * HRG is a context-free grammar generating hypergraphs 12 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 13 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 14 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Hypergraph is a generalization of a graph §Hypergraph ℋ = (6, 7) consists of… –Node 8 ∈ 6 –Hyperedge 9 ∈ 7 ⊆ 2 < : Connect an arbitrary number of nodes cf, An edge in a graph connects exactly two nodes Node Hyperedge 15 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Our method We represent a molecule using a hypergraph, not a graph. This helps to satisfy the valence conditions. H H H H §Molecular hypergraph models… H C C H – Atom = hyperedgeH C C H – bond = node H C C H H H H H §Molecular graph models…H H – Atom = nodeH C C H H H – Bond = Edge H H H C C H H C C H H H 16 © 2019 IBM Corporation Hypergraph & Molecular Hypergraph Our method These requirements guarantee transformation between graph & hypergraph §Molecular hypergraph 1. Each node has degree 2 (=2-regular) 2. Label on a hyperedge determines # of nodes it has (= valence) H H H H H H H C C H H H H C C H H H H C C H H C C ↔H H C C H H C C H H H H H H H 17 © 2019 IBM Corporation Our work Each restriction is our contribution in the literature of graph grammar Existing work on graph grammar Our method [Aguiñaga+, 16] Set of general Set of molecular Restrict hypergraphs hypergraphs Irredundant Tree decomposition Restrict tree decomposition Hyperedge Molecular Hypergraph Restrict Replacement Grammar Grammar * HRG is a context-free grammar generating hypergraphs 18 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar HRG generates a hypergraph by repeatedly replacing non-terminal hyperedges with hypergraphs §Hyperedge replacement grammar (HRG) = = (>, ?, /, @) –>: set of non-terminals N Labels on hyperedges –?: set of terminals C –/: starting symbol S –@: set of production rules – A rule replaces a non-terminal hyperedge with a hypergraph H N S N N 1 H C 1 N 2 2 Replace l.h.s. with r.h.s. © 2019 IBM Corporation 19 Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H Valence $ H C C H H H H H H 2-regularity $ H C C H 20 H H H © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Start from starting symbol S S Production rules @ H N S N N 1 H C 1 N 2 2 21 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar The left rule is applicable S Production rules @ H N S N N 1 H C 1 N 2 2 22 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar We obtain a hypergraph with three non-terminals N N N Production rules @ H N S N N 1 H C 1 N 2 2 23 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Apply the right rule to one of the non-terminals N N N Production rules @ H N S N N 1 H C 1 N 2 2 24 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Two non-terminals remain H C H N N Production rules @ H N S N N 1 H C 1 N 2 2 25 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N N Production rules @ H N S N N 1 H C 1 N 2 2 26 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N C H Production rules @ H H N S N N 1 H C 1 N 2 2 27 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Repeat the procedure until there is no non-terminal H C H N C H Production rules @ H H N S N N 1 H C 1 N 2 2 28 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Graph generation halts when there is no non-terminal, H H C H C C H H Production rules @ H H N S N N 1 H C 1 N 2 2 29 © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H This can be Valence avoided by $ H C C H learning HRG from H H H data [Aguiñaga+, 16] H H 2-regularity $ H C C H 30 H H H © 2019 IBM Corporation Hyperedge replacement grammar & Molecular hypergraph grammar Our method MHG is defined as a subclass of HRG HRG MHG §Molecular Hypergraph Grammar (MHG) –Definition: HRG that generates molecular hypergraphs only –Counterexamples: H H This can be Valence avoided by $ H C C H learning HRG from H H H data [Aguiñaga+, 16] H H Use an irredundant

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    52 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us