
Molecular Hypergraph Grammar with Its Application to Molecular Optimization

Hiroshi Kajino

MIT-IBM Watson AI Lab, IBM Research - Tokyo

Background: We wish to learn a generative model of a molecule.

§Generative model of a molecule p(G | z) [Gómez-Bombarelli+, 16]
–Input: latent vector z ∈ ℝ^D, z ∼ N(0, I)
–Output: molecular graph G (graph w/ node labels)

Molecular optimization problem (discrete: molecular graph space) ⇔ Continuous optimization problem (continuous: latent space), connected by the decoder p(G | z)

Image from https://openi.nlm.nih.gov/detailedresult.php?img=PMC3403880_1758-2946-4-12-7&req=4

Technical Challenge: Molecular graph generation is a non-trivial task

§Two technical challenges
1. No consensus on a generative model of a graph
 • LSTM and GRU work for path-like (sequential) structures
 • Ring generation is non-trivial
2. Hard constraints such as valence conditions
 • The degree of each node (= atom) is specified by its label
 • e.g., carbon has degree 4, oxygen has degree 2 (see the sketch below)
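For illustration only (not from the slides), a minimal valence check on a labeled molecular graph; the valence table is simplified and bond multiplicities are ignored:

```python
# Illustrative valence check (simplified table; bond orders ignored).
VALENCE = {"H": 1, "O": 2, "N": 3, "C": 4}

def satisfies_valence(node_labels, edges):
    """node_labels: {node_id: atom symbol}; edges: list of (u, v) bonds."""
    degree = {v: 0 for v in node_labels}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(degree[v] == VALENCE[label] for v, label in node_labels.items())

# Example: methane CH4 passes; a carbon with five neighbours would fail.
print(satisfies_valence({0: "C", 1: "H", 2: "H", 3: "H", 4: "H"},
                        [(0, 1), (0, 2), (0, 3), (0, 4)]))   # True
```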

Existing work 1/3: Most of the existing work employs a text representation of a molecule, called SMILES.

§Use the text representation "SMILES" [Gómez-Bombarelli+, 16]
–Pros: a standard sequential model can generate SMILES

(Pipeline: latent continuous space → sequential model p(s | z) → SMILES space, e.g., c1c(C=O)cccc1 → SMILES' grammar, G = f(s) → molecular graph space)

Existing work 1/3: The statistical model has to learn a rule-based grammar

–Cons:
 • The NN has to learn SMILES' grammar
 • No guarantee on valence conditions

Example SMILES string: CC(C)(O)C#Cc1ccc(C[NH2+][C@H]2CCCN(c3nc4ccccc4s3)C2)s1

Existing work 1/3: SMILES' grammar does not prescribe valence conditions

For example, the following string is grammatically correct SMILES:

c1c(C=O)(C)cccc1

…but one of its carbon atoms would have a valence of 5, whereas the valence of carbon must be 4 (a quick check is sketched below).
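As a quick illustration of this point (our addition, assuming RDKit is available): RDKit refuses to sanitize the second string because of the valence violation, so `MolFromSmiles` returns None.

```python
# Small illustration (assumes RDKit is installed); not part of the original work.
from rdkit import Chem

valid = "c1c(C=O)cccc1"       # benzaldehyde, valences OK
invalid = "c1c(C=O)(C)cccc1"  # one aromatic carbon would need 5 bonds

print(Chem.MolFromSmiles(valid) is not None)    # True: parses and sanitizes
print(Chem.MolFromSmiles(invalid) is not None)  # False: sanitization rejects the valence
```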

Existing work 2/3: JT-VAE achieves 100% validity for the first time, but requires multiple NNs.

§Generate a molecule by assembling subgraphs [Jin+, 18]

–Enumerate all possible combinations of subgraphs, and predict the best one using a NN

Image from [Jin+, 18]

Existing work 3/3: The RL-based method discovers better molecules than VAE-based methods, but at enormous cost

§Molecular optimization using RL [You+, 18]

–Idea: molecular generation as an MDP (current state → action, e.g., "add O" → next state)
 • State: molecular graph
 • Action: modify the graph
 • Reward: target property

–Pros: better optimization capability than VAE-based methods
–Cons: requires a large number of target-property evaluations
 • # of evals > 10^4
 • Infeasible to work with first-principles calculations / wet-lab experiments

Existing work: No existing work satisfies all three properties

Approach            VAE-based   VAE-based   RL-based
Representation      SMILES      Graph       Graph
Validity            –           ✔           ✔
Easy to generate    ✔           –           –
Sample complexity   ✔           ✔           –

Existing work: Our contribution is to facilitate graph-based generation with the help of a "graph grammar".

Approach            VAE-based   VAE-based (ours)        RL-based
Representation      SMILES      Graph                   Graph
Validity            –           ✔                       ✔
Easy to generate    ✔           ✔ (via graph grammar)   ✔?
Sample complexity   ✔           ✔                       –

Our work: We develop a graph grammar tailored for molecular generation

§Idea: Use a context-free graph grammar for graph generation

–Pipeline: molecular graph → molecular hypergraph → parse tree according to MHG → latent vector
–Graph generation boils down to tree generation

(Each arrow in the figure is an encoder; the final latent vector z ∈ ℝ^D is easy to generate.)
–Requirements:
 • Always satisfy valence conditions (our main contribution)
 • Context-freeness (met by using a hyperedge replacement grammar)
 • Inference algorithm from data, not hand-written rules

Our work: A hyperedge replacement grammar can be constructed from data via tree decomposition

Existing work on graph grammar [Aguiñaga+, 16]: set of general hypergraphs → tree decomposition → Hyperedge Replacement Grammar*

* HRG is a context-free grammar generating hypergraphs

Our work: Each restriction is our contribution in the literature of graph grammar

Existing work [Aguiñaga+, 16] → (restrict) → our method:
–Set of general hypergraphs → Set of molecular hypergraphs
–Tree decomposition → Irredundant tree decomposition
–Hyperedge Replacement Grammar → Molecular Hypergraph Grammar

Hypergraph & Molecular Hypergraph: A hypergraph is a generalization of a graph

§A hypergraph ℋ = (V, E) consists of…
–Nodes v ∈ V
–Hyperedges e ∈ E ⊆ 2^V: a hyperedge connects an arbitrary number of nodes
(cf. an edge in an ordinary graph connects exactly two nodes)

(Figure: an example hypergraph, with a node and a hyperedge labeled; a minimal data-structure sketch follows.)
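For concreteness, a minimal hypergraph container in Python (our sketch; the names are not from the paper):

```python
# Minimal hypergraph container (illustrative; not the paper's code).
from dataclasses import dataclass, field

@dataclass
class Hypergraph:
    nodes: set = field(default_factory=set)          # hypergraph nodes
    hyperedges: list = field(default_factory=list)   # list of (label, frozenset of nodes)

    def add_hyperedge(self, label, members):
        members = frozenset(members)
        self.nodes |= members
        self.hyperedges.append((label, members))

    def degree(self, v):
        return sum(1 for _, members in self.hyperedges if v in members)
```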

Hypergraph & Molecular Hypergraph (our method): We represent a molecule using a hypergraph, not a graph. This helps to satisfy the valence conditions.

§A molecular hypergraph models…
–Atom = hyperedge
–Bond = node
§A molecular graph models…
–Atom = node
–Bond = edge
(Figure: ethane drawn both as a molecular graph and as a molecular hypergraph; a conversion sketch follows below.)
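A rough sketch (ours, reusing the `Hypergraph` container above and assuming RDKit) of the graph-to-hypergraph conversion: every bond becomes a hypergraph node and every atom becomes a hyperedge over its incident bonds.

```python
# Bonds -> hypergraph nodes, atoms -> hyperedges over their incident bonds (illustrative).
from rdkit import Chem

def mol_to_hypergraph(mol):
    hg = Hypergraph()                                 # container from the sketch above
    for atom in mol.GetAtoms():
        incident_bonds = frozenset(b.GetIdx() for b in atom.GetBonds())
        hg.add_hyperedge(atom.GetSymbol(), incident_bonds)
    return hg

mol = Chem.AddHs(Chem.MolFromSmiles("CC"))            # ethane with explicit hydrogens
hg = mol_to_hypergraph(mol)
print(len(hg.nodes), len(hg.hyperedges))              # 7 bonds, 8 atoms
```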

Hypergraph & Molecular Hypergraph (our method): These requirements guarantee the transformation between graph & hypergraph

§Molecular hypergraph
1. Each node has degree 2 (= 2-regular)
2. The label on a hyperedge determines the # of nodes it has (= valence)
(A check of both conditions is sketched after the figure below.)

(Figure: ethane as a molecular graph ↔ as a molecular hypergraph; every hypergraph node, i.e., bond, touches exactly two hyperedges, i.e., atoms.)
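A minimal check of the two conditions on the container above (ours; the valence table is again simplified):

```python
# Check the two molecular-hypergraph conditions (illustrative; simplified valence table).
VALENCE = {"H": 1, "O": 2, "N": 3, "C": 4}

def is_molecular_hypergraph(hg):
    two_regular = all(hg.degree(v) == 2 for v in hg.nodes)            # condition 1
    valences_ok = all(len(members) == VALENCE.get(label, -1)          # condition 2
                      for label, members in hg.hyperedges)
    return two_regular and valences_ok
```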

Our work (recap): Each restriction is our contribution in the literature of graph grammar (see the diagram above).


Hyperedge replacement grammar & Molecular hypergraph grammar: An HRG generates a hypergraph by repeatedly replacing non-terminal hyperedges with hypergraphs

§Hyperedge replacement grammar (HRG): a tuple (N, T, S, P)
–N: set of non-terminal labels on hyperedges (e.g., N)
–T: set of terminal labels (e.g., C, H)
–S: starting symbol
–P: set of production rules

– A rule replaces a non-terminal hyperedge with a hypergraph

(Figure: two example production rules. Left: S → a hypergraph consisting of three non-terminal N hyperedges. Right: an N hyperedge with external nodes 1 and 2 → a fragment with one terminal C hyperedge and two terminal H hyperedges, attached at external nodes 1 and 2.)

Applying a rule replaces its l.h.s. with its r.h.s. (a data-structure sketch for rules follows below).
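One way to store a production rule (purely illustrative; the field names are ours): a non-terminal label plus a replacement hypergraph whose attachment nodes are listed in a fixed order.

```python
# Illustrative encoding of an HRG production rule (field names are ours).
from dataclasses import dataclass

@dataclass
class ProductionRule:
    lhs: str                  # non-terminal label, e.g. "N" ("S" for the start symbol)
    rhs: Hypergraph           # replacement hypergraph mixing terminal and non-terminal hyperedges
    external_nodes: tuple     # rhs nodes glued, in order, to the nodes of the replaced hyperedge
```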

Hyperedge replacement grammar & Molecular hypergraph grammar (our method): MHG is defined as a subclass of HRG
§Molecular Hypergraph Grammar (MHG)
–Definition: an HRG that generates molecular hypergraphs only
–Counterexamples (hypergraphs an MHG must never generate):
 • a hypergraph violating the valence condition
 • a hypergraph violating 2-regularity
(Figure: Venn diagram, MHG ⊂ HRG, and the two counterexample hypergraphs.)

Hyperedge replacement grammar & Molecular hypergraph grammar: an example derivation using the production rules above

1. Start from the starting symbol S.
2. The left rule (whose l.h.s. is S) is applicable; applying it yields a hypergraph with three non-terminal N hyperedges.
3. Apply the right rule to one of the non-terminals; two non-terminals remain.
4. Repeat the procedure until there is no non-terminal left.
5. Graph generation halts when there is no non-terminal; the result is a terminal-only molecular hypergraph (here, a small hydrocarbon assembled from C and H hyperedges). A generation-loop sketch is given below.
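To make the derivation procedure concrete, a rough sketch of the generation loop (ours, built on the `Hypergraph` and `ProductionRule` sketches above; `apply_rule` is a hypothetical helper that performs the actual hyperedge replacement):

```python
# Illustrative generation loop: replace non-terminals until only terminals remain.
import random

def generate(rules, start_symbol="S", max_steps=100):
    """rules: dict mapping a non-terminal label to a list of ProductionRule objects."""
    hg = Hypergraph()
    hg.add_hyperedge(start_symbol, [])                 # derivation starts from a single S hyperedge
    for _ in range(max_steps):
        nonterminals = [(i, label) for i, (label, _) in enumerate(hg.hyperedges)
                        if label in rules]
        if not nonterminals:
            return hg                                  # terminal-only: a molecular hypergraph
        idx, label = random.choice(nonterminals)
        rule = random.choice(rules[label])             # MHG-VAE lets the decoder RNN pick the rule
        hg = apply_rule(hg, idx, rule)                 # hypothetical helper doing the replacement
    raise RuntimeError("derivation did not terminate")
```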

Hyperedge replacement grammar & Molecular hypergraph grammar (our method): MHG is defined as a subclass of HRG

§Molecular Hypergraph Grammar (MHG)
–Definition: an HRG that generates molecular hypergraphs only
–How the two counterexamples are ruled out:
 • Valence violations: avoided by learning the HRG from data [Aguiñaga+, 16]
 • 2-regularity violations: avoided by using an irredundant tree decomposition (our contribution)

Our work (recap): Each restriction is our contribution in the literature of graph grammar (see the diagram above).


Tree decomposition & Irredundant tree decomposition: A tree decomposition discovers a tree-like structure in a graph

§Tree decomposition
–Every node and every hyperedge must be included in some tree node (bag)
–For each node, the tree nodes (bags) that contain it must form a connected subtree
(An off-the-shelf computation is sketched after the figure below.)

(Figure: a tree decomposition of the ethane hypergraph; the digits indicate the correspondence between hypergraph nodes and the bags of the tree.)
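For intuition, an off-the-shelf tree decomposition can be computed with networkx's approximation module; this is our illustration on the plain molecular graph, not the paper's algorithm on the hypergraph:

```python
# Illustrative tree decomposition of a small molecular graph with networkx.
import networkx as nx
from networkx.algorithms.approximation import treewidth_min_degree

g = nx.Graph()
g.add_edges_from([("C1", "C2"), ("C1", "H1"), ("C1", "H2"), ("C1", "H3"),
                  ("C2", "H4"), ("C2", "H5"), ("C2", "H6")])      # ethane

width, decomposition = treewidth_min_degree(g)    # tree whose nodes are bags (frozensets)
for bag in decomposition.nodes:
    print(sorted(bag))
```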

Tree decomposition & Irredundant tree decomposition: A tree decomposition and (a syntax tree of) an HRG are equivalent

§Relationship between a tree decomposition and an HRG
1. Connecting the hypergraphs stored in the tree recovers the original hypergraph
2. Connection ⇔ hyperedge replacement: each connection step corresponds to applying a production rule

(Figure: attaching a bag's hypergraph to its neighbor in the tree is exactly a hyperedge replacement; the bag's hypergraph, with the shared nodes as external nodes, plays the role of a production rule's r.h.s., and the not-yet-attached neighbors appear as non-terminal N hyperedges to be attached next.)

Tree decomposition & Irredundant tree decomposition: Extract production rules from a tree decomposition; then an HRG with these rules can reconstruct the original hypergraph

§HRG inference algorithm [Aguiñaga+, 16]
–Input: a set of hypergraphs
–Output: an HRG with the following properties:
 • All of the input hypergraphs are in the language ✔
 • Guarantees the valence conditions ✔
 • No guarantee on 2-regularity ✘

1. Compute tree decompositions of the input hypergraphs
2. Extract production rules from each tree decomposition
3. Compose the HRG by taking the union of the rules

(Counterexample figure: an inferred HRG without the 2-regularity guarantee can generate a hypergraph that cannot be transformed back into a molecular graph.)

Tree decomposition & Irredundant tree decomposition (our method): An irredundant tree decomposition is the key to guaranteeing 2-regularity

§Irredundant tree decomposition
–For every hypergraph node, the connected subtree of bags containing it must be a path
–Any tree decomposition can be made irredundant in polynomial time
(A check of the path condition is sketched after the figure below.)

(Figure: an irredundant vs. a redundant tree decomposition of the same hypergraph; in the redundant one, the bags containing a particular node branch instead of forming a path.)
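A small check of the path condition (ours, following the definition above) for a networkx-style tree decomposition whose tree nodes are bags of hypergraph nodes:

```python
# Irredundancy check: for each hypergraph node, the bags containing it must induce a path.
import networkx as nx

def is_irredundant(decomposition):
    """decomposition: nx.Graph whose tree nodes are frozensets (bags) of hypergraph nodes."""
    for v in set().union(*decomposition.nodes):
        sub = decomposition.subgraph(bag for bag in decomposition.nodes if v in bag)
        if not nx.is_connected(sub) or any(d > 2 for _, d in sub.degree()):
            return False                              # branching (or disconnected) occurrence
    return True
```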

Our work (recap): Each restriction is our contribution in the literature of graph grammar (see the diagram above).


Our method in one page: The molecular hypergraph is used to satisfy the valence conditions, and the irredundant tree decomposition guarantees 2-regularity.

§MHG inference algorithm
–Input: a set of molecular graphs
–Output: an MHG with the following properties:
 • All of the input hypergraphs are in the language ✔ (thanks to HRG)
 • Guarantees the valence conditions ✔ (thanks to HRG)
 • Guarantees 2-regularity ✔ (our contribution)

1. Convert the molecular graphs into molecular hypergraphs
2. Compute tree decompositions of the molecular hypergraphs
3. Convert each tree decomposition into an irredundant one
4. Extract production rules
5. Compose the MHG by taking the union of the rules
(A pipeline sketch follows below.)
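Putting this section together, a high-level sketch of the inference pipeline (ours; `tree_decompose`, `make_irredundant`, and `extract_rules` are hypothetical helpers standing in for the paper's steps):

```python
# High-level MHG inference pipeline (illustrative; the helpers are hypothetical).
def infer_mhg(molecular_graphs):
    rules = {}                                        # non-terminal label -> list of ProductionRule
    for g in molecular_graphs:
        hg = mol_to_hypergraph(g)                     # step 1: graph -> molecular hypergraph
        td = tree_decompose(hg)                       # step 2: tree decomposition of the hypergraph
        td = make_irredundant(td)                     # step 3: enforce the path condition
        for rule in extract_rules(td):                # step 4: production rules from the tree
            rules.setdefault(rule.lhs, []).append(rule)
    return rules                                      # step 5: the MHG is the union of all rules
```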

Application to Molecular Optimization

We obtain (Enc, Dec) between a molecule and a latent vector by combining MHG with an RNN-VAE

§MHG-VAE: (Enc, Dec) between molecule & latent vector

(Pipeline: molecular graph → [MHG] → molecular hypergraph → parse tree according to MHG → [Enc of the RNN-VAE] → latent vector z ∈ ℝ^D; this composition is the MHG-VAE encoder. A sketch follows below.)
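A minimal sketch of how the encoder composes (ours; `parse_with_mhg` and `rnn_vae_encoder` are hypothetical stand-ins for the paper's components):

```python
# Composition of the MHG-VAE encoder (illustrative; both helpers are hypothetical).
def encode(molecular_graph, grammar, rnn_vae_encoder):
    hg = mol_to_hypergraph(molecular_graph)       # molecular graph -> molecular hypergraph
    parse_tree = parse_with_mhg(hg, grammar)      # hypergraph -> parse tree (sequence of rules)
    return rnn_vae_encoder(parse_tree)            # parse tree -> latent vector z in R^D
```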

Application to Molecular Optimization: First, we learn (Enc, Dec) between a molecule and its vector representation using MHG-VAE

§Global molecular optimization
–Find: the molecule that maximizes the target property
–Method: VAE + BO
1. Obtain the MHG from the input molecules
2. Train an RNN-VAE on the syntax trees
3. Obtain vector representations {z_i ∈ ℝ^D}, some of which have target values {y_i ∈ ℝ}
4. BO proposes candidates {z_j ∈ ℝ^D} that may maximize the target
5. Decode them to obtain molecules {G_j}
(A sketch of the BO step follows below.)
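Purely as an illustration of step 4 (the paper's exact BO setup may differ), one round of Bayesian optimization over the latent space with a Gaussian-process surrogate and expected improvement, assuming numpy, scipy, and scikit-learn:

```python
# One illustrative BO round over latent vectors (not the paper's implementation).
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def propose_candidates(Z, y, n_pool=500, n_best=10, seed=0):
    """Z: (n, D) latent vectors with known target values y; returns vectors to decode next."""
    gp = GaussianProcessRegressor(normalize_y=True).fit(Z, y)
    pool = np.random.default_rng(seed).normal(size=(n_pool, Z.shape[1]))   # sample the prior N(0, I)
    mu, sigma = gp.predict(pool, return_std=True)
    best = y.max()
    gamma = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(gamma) + sigma * norm.pdf(gamma)           # expected improvement
    return pool[np.argsort(-ei)[:n_best]]          # decode these with the MHG-VAE decoder
```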

Image from [Gómez-Bombarelli+, 16]

Application to Molecular Optimization: Given the vector representations and their target values, we use BO to obtain a vector that is expected to optimize the target (steps 4-5 above).

Empirical studies: We wish to understand how the performance depends on the # of evaluations of the target function (molecule ↦ property)

§Purpose of our empirical studies: validate the following hypotheses
1. MHG improves the performance compared to JT-VAE
2. If the # of evaluations is not limited, RL will be better than VAE
3. If the # of evaluations is limited, VAE will be better than RL

(Target property: combines water solubility, the synthetic accessibility score, and a penalty for rings larger than six.)

Empirical studies: MHG achieves higher performance than the other VAE-based methods

(Baselines in the result plots: JT-VAE by Jin et al., ICML '18 (VAE+BO), and GCPN by You et al., NeurIPS '18 (RL).)

Empirical studies: However, the RL-based method is better than ours when the # of evaluations is not limited

Empirical studies: Ours is better than the RL-based method when the # of evaluations is limited to 500

Conclusion: We develop a graph-grammar-based molecular representation, MHG, as well as its combination with a VAE.

§We develop a molecular hypergraph grammar (MHG)
–Molecules generated by an MHG always satisfy the valence conditions
 • Hypergraph representation
 • Irredundant tree decomposition
–Inference algorithm from data
 • No need to write rules by hand
 • The inferred MHG can describe all of the input data

§We apply MHG to molecular optimization
–MHG-VAE > JT-VAE
–MHG-VAE > GCPN when the # of evaluations is limited

References

[Gómez-Bombarelli+, 16] Gómez-Bombarelli, R., Wei, J. N., Duvenaud, D., Hernández-Lobato, J. M., Sánchez-Lengeling, B., Sheberla, D., Aguilera-Iparraguirre, J., Hirzel, T. D., Adams, R. P., and Aspuru-Guzik, A. Automatic chemical design using a data-driven continuous representation of molecules. ACS Central Science, 2018. (arXiv version appeared in 2016)

[Jin+, 18] Jin, W., Barzilay, R., and Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proceedings of the Thirty-fifth International Conference on Machine Learning, 2018.

[You+, 18] You, J., Liu, B., Ying, Z., Pande, V., and Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems 31, pp. 6412–6422, 2018.

[Aguiñaga+, 16] Aguiñaga, S., Palacios, R., Chiang, D., and Weninger, T. Growing graphs from hyperedge replacement graph grammars. In Proceedings of the 25th ACM International Conference on Information and Knowledge Management, pp. 469-478, 2016.
