DOCTORAL THESIS OF THE UNIVERSITIES BORDEAUX I AND BORDEAUX SEGALEN
Doctoral School of Life and Health Sciences (Ecole Doctorale Sciences de la Vie et de la Santé)
Specialty: Biochemistry

Presented by

Daimel CASTILLO GONZÁLEZ

To obtain the degree of Doctor of the University of Bordeaux

New quadruplex ligands. In silico and in vitro approaches.

Thesis supervised by Dr. Jean-Louis MERGNY

Defended on 14 November 2013 before a jury composed of:

Pr. Mathy Froeyen, KU Leuven, Belgium, President
Dr. Jacques Chomilier, UPMC, Paris, France, Reviewer
Dr. Genevieve Pratviel, University of Toulouse, France, Reviewer
Pr. Sandro Cosconati, University of Naples, Italy, Examiner
Pr. Maria Natália Dias Soeiro, University of Porto, Portugal, Examiner
Dr. Jean-Louis Mergny, University of Bordeaux, France, Thesis supervisor

I DEDICATE THIS WORK

To Ernesto, for his light, for the strength, for life. To Gladys, for the support, the inspiration, the constancy, the trust, the unconditional devotion. To Luis and Morbila, for the love, the tenderness, the care, the early mornings, the upbringing and all the milk with sugar, coffee and salt. To Alain, for his humanity, for the friendly hands that always arrived in time to save me.

ACKNOWLEDGEMENTS

“To honor brings honor,” wrote José Martí, the most universal of Cubans, more than a century ago. I would therefore like to thank those who have in some way contributed to the development of this research.

To the Flemish Interuniversity Council (Belgium), for funding the project “Strengthening postgraduate education and research in Pharmaceutical Sciences”, of which this research is part. To the AECID (Spain) project “Montaje de un laboratorio de química computacional, con fines académicos y científicos, para el diseño racional de nuevos candidatos a fármacos en enfermedades de alto impacto social (D/024153/09)” and to INSERM U869. To Dr. Jean-Louis Mergny, not only for receiving me in his laboratory and giving me the use of its facilities, but also for letting me work and innovate while always feeling his support and guidance.

To Professors Dr. Miguel Ángel Cabrera and Dr. Gisselle Pérez Machado, who advised the first steps of this scientific work at the Universidad de Las Villas (Cuba). At the same institution, to Dr. Luis Bravo, Dr. Leisy Nieto, Dr. Mirta Mayra González, Dr. Julio Omar Prieto, MSc. Luis Torres and MSc. Enoel Hernández. And to my Cuban colleagues and friends of the Drug Design Group: Reynaldo, Aliuska, Maikel X, Chiqui and el Guille. To Guy, Josefa and María, for their logistical help, always on time.

To those who later, in the European laboratories, continued to offer me the same selfless and decisive support: Professors Dr. Arthur Van Aerschot, Dr. Teresa María Garrigues Pelufo and Dr. María Teresa Varea.

To the specialists and technicians of the INSERM U869 laboratory in Bordeaux, for being colleagues and friends: Aurore, Amandine, Anne, Samir, Gilmar, Katy, Thao, Amina, Rui, Abdelazziz and Souheila.

To Professor Dr. Mathy Froeyen, for his timely reviews, valuable criticism and wise advice.

To Prof. Sandro Cosconati, Prof. Maria Natália Dias Soeiro, Prof. Mathy Froeyen, Dr. Jacques Chomilier and Dr. Genevieve Pratviel, who agreed to evaluate my work and to sit on the jury for its defense.

To Ronal Ramos, for being at the genesis of this work. To Maikel Pérez: wherever you are, I hope you can see these results.

To the Jesuits of the Campanar community in Valencia: Jesús, Mariano, Ramiro, Tony, Pedro and Joaquín. To Encarna.

To Liane, for the kinship, the complicity, the joy, the support.

To my lifelong friends, the new and the old, for putting up with me, helping me and, above all, making me laugh: Aramís, Durán, Maylén, el Tyler, las Roxsis, Yaíma, Lesbia, Pilar, Misael, Alain, Rafa, Gelsy... Thank you for being there and for being so special.

To my family, who have always accompanied me in light and in shadow: mamá, mami, papi, tigre, Andrés, pompón, Laura Pi, Luis Enrique, tío Carlos, Brian and Misdelkis.

To all, my sincere thanks.

ACKNOWLEDGEMENTS

“To honor brings honor,” wrote José Martí, considered the most universal of Cubans, more than a century ago. Following his lead, I would like to thank all the people who in many ways contributed to this research.

I am thankful to the Flemish Interuniversity Council (VLIR/UOS) for funding the project “Strengthening postgraduate education and research in Pharmaceutical Sciences”, of which this research is part. Likewise, I am grateful to the AECID project “Montaje de un laboratorio de química computacional, con fines académicos y científicos, para el diseño racional de nuevos candidatos a fármacos en enfermedades de alto impacto social (D/024153/09)” and to INSERM U869 for the financial support provided to this research.

I would also like to express my deep gratitude to Dr. Jean-Louis Mergny, not only for the warm welcome he gave me in his laboratory and for providing all the equipment I needed for my research, but also for allowing me to innovate and work independently, knowing that I could always count on his support and on the guidance of his scientific experience.

It is also a pleasure to recognize the contribution of Prof. Dr. Miguel Ángel Cabrera and Dr. Gisselle Pérez Machado from the Universidad Central de Las Villas (UCLV), who advised me in the first steps of this research. I would also like to thank Dr. Luis Bravo, Dr. Leisy Nieto, Dr. Mirta Mayra González, Dr. Julio Omar Prieto, MSc. Luis Torres and MSc. Enoel Hernández from the same institution for their support.

I am deeply grateful to my colleagues from the Molecular Simulation and Drug Design (MSDD) group of UCLV in Cuba, Aliuska, Maikel X, Reynaldo, Chiqui and Guille, for sharing their friendship, ideas and knowledge with me. I offer my sincerest gratitude to Guy, Josefa and Maria for their guidance in the laboratory. It is also very important for me to thank the people who later, in the European laboratories, were very supportive: Prof. Dr. Arthur Van Aerschot, Prof. Dr. Teresa María Garrigues Pelufo and Prof. Dr. María Teresa Varea, as well as a whole team of friendly and capable specialists and technicians: Aurore, Amandine, Anne, Samir, Gilmar, Katy, Thao, Amina, Rui, Abdelazziz and Souheila.

I would like to thank Prof. Dr. Mathy Froeyen for remarks that stimulated me to improve the discussion of the results, and all the thesis committee members for agreeing to assess the results of the work I carried out over the last four years to complete this thesis.

I offer my utmost appreciation to Ronal Ramos for being at the genesis of this work, and to Maykel Perez, who encouraged me at the beginning of this investigation; I wish you could have seen the results.

I have many friends I would like to thank for their unselfish support in my most difficult times, such as the Jesuits of the Campanar community in Valencia: Jesús, Mariano, Ramiro, Tony, Pedro and Joaquín. Thank you, Encarna.

Thank you, Liane, for your friendship, joy, complicity and unselfish support.

I am thankful to my friends, both the ones who have been there all my life and the new ones, Aramís, Durán, Maylén, el Tyler, the Roxsis, Yaíma, Lesbia, Pilar, Misael, Alain, Rafa and Gelsy, for helping me get through my moments of frustration and for making my PhD years smoother and more enjoyable.

Last but not least, I would like to express special gratitude to my family, who have always been there for me: mamá, mami, papi, tigre, Andrés, pompón, Laura Pi, Luis Enrique, tío Carlos, Brian and Misdelkis.

To all of you, my most sincere thanks.

Contents

Abbreviations
Key words/Mots-clés
Abstract
Résumé étendu en français
Introduction

Chapter 1. Telomeres, telomerase, and guanine quadruplexes. Main concepts
  1.1 Formation of G-quadruplexes by DNA
  1.2 Telomeres structure and function
  1.3 Telomerase: Role in the cellular senescence and cancer
  1.4 Cellular senescence and telomerase
  1.5 Role of telomerase in cellular immortalization
  1.6 Genetic disorders, telomeres, and telomerase
  1.7 Elongation of telomeres independently of telomerase
  1.8 Prevalence of guanine-rich sequences in genomic regions
  1.9 G-quadruplex structures as therapeutic targets in telomerase-positive cell lines
  1.10 Comparison of different G-quadruplex targeted anticancer strategies

Chapter 2. Concepts and useful parameters for in silico studies
  2.1 Molecular Descriptors
  2.2 Chemometrics
    2.2.1 Cluster analysis
    2.2.2 Classification
    2.2.3 Parameters for measuring the quality of discriminant analysis
    2.2.4 Regression analysis
    2.2.5 Regression parameters
    2.2.6 Goodness of prediction
  2.3 Genetic algorithm (GA) for the selection of variables
  2.4 Validation techniques
  2.5 Enrichment Analysis

Chapter 3. Search for new compounds with G-quadruplex binding activity using computational methods
  Introduction
  3.1 Article 1. Computational tools in the discovery of new G-quadruplex ligands with potential anticancer activity. Current Topics in Medicinal Chemistry, 2012, 12, 2843-2856
  3.2 Systematization of some characteristics of the G4 ligands proposed by SAR studies
    3.2.1 Groove binders
    3.2.2 Loop binders

Chapter 4. Linear multivariate techniques for evaluation of a congeneric set of compounds acting on G-quadruplexes
  Introduction
  4.1 Article 2. Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Comb. Sci. 28, 2009, No. 5, 526-536
  4.2 Article 3. Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure. European Journal of Medicinal Chemistry 44 (2009) 4826-4840
  4.3 Discussion of Articles 2 and 3

Chapter 5. Database collection, linear models of non-congeneric sets of compounds, virtual screening of FDA-approved compounds, and experimental evidence for the G-quadruplex stabilizing activity of neuroleptic drugs
  Introduction
  5.1 Xenograft models demonstrate the anticancer activity of G-quadruplex ligands
  5.2 Article 4. “FDA-approved Drugs Selected Using Virtual Screening Bind Specifically to G-quadruplex DNA”. Current Pharmaceutical Design, 2013, 19(12):2164-73
  5.3 Overview of evidence that identified compounds have potential as anticancer agents
  5.4 FDA-approved compounds with positive predictions in the virtual screening and negative activities

Chapter 6. Curation of databases of ligands and use of non-linear techniques to develop reliable models. Virtual screening of commercial databases of compounds and biophysical evaluation
  Introduction
  6.1 Materials and methods
    6.1.1 Dataset
    6.1.2 Dataset Curation
    6.1.3 Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity
    6.1.4 Classes Balancing
    6.1.5 Training / Evaluation Data Splitting
    6.1.6 Structure Codification
    6.1.7 Molecular docking
    6.1.8 Oligonucleotides
    6.1.9 FRET melting studies
    6.1.10 UV-Vis spectroscopy studies
    6.1.11 Fluorescence direct detection
  6.2 Results and discussion
    6.2.1 Dataset Curation
    6.2.2 Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity
    6.2.3 Classes Balancing
    6.2.4 Models Building and Consensus Classifier
    6.2.5 Ligand-Based Virtual Screening Performance
    6.2.6 Virtual screening of the Asinex and Life Chemicals databases
    6.2.7 Docking modeling campaign and selection of the most promising candidates
    6.2.8 Harmonizing Ligand- and Structure-Based Information for Virtual Screening
    6.2.9 Selection and Purchase of the Most Promising and Structurally Diverse Candidates
    6.2.10 Study of the stabilizing capacity of the ligands in the human telomeric sequence
    6.2.11 Stabilization of G-quadruplex-prone sequences by ligands
    6.2.12 UV-Vis spectroscopy studies
    6.2.13 Fluorescence properties of L13
  6.3 Antiproliferative properties of other compounds similar to M1 and L13
  6.4 Brief commentaries concerning the Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity
  Conclusions

Chapter 7. Effects of L13 and M1 on G-quadruplex structures that are hallmarks of cancer cell
  7.1 Article 5. “Hallmarks of cancer, quadruplexes and antitumor properties. Speculation, circumstantial evidence or possibility?”
    Introduction
    Materials and methods
      Compounds
      Oligonucleotides
      UV spectroscopy studies
      Fluorescence enhancement for L13
      Fluorescence measurements
      Fluorescent intercalator displacement assay (“G4-FID”)
    Results and discussion
      Compounds
      UV spectroscopy studies
      Enhancement of fluorescence for L13 at 466 nm of excitation
      Activity of drugs in the fluorescent intercalator displacement (FID) assay
    Conclusions

Chapter 8. General discussion of this research. Conclusions and perspectives
  Introduction
  8.1 Predictivity of anti-telomerase activity of a congeneric set
  8.2 Novel structures identified using QSAR methodologies
  General conclusions

Recommendations, perspectives, and future directions
List of publications related to the thesis
Poster in international conferences
References

Abbreviations

λ: Wavelength (often expressed in nm)
ΔTm: Variation in melting temperature
ρ: Cases per variable ratio (number of adjustable parameters in the model)
1D: One-dimensional
2D: Two-dimensional
3D: Three-dimensional
A: Adenine
AIC: Akaike Information Criterion
ALT: Alternative lengthening of telomeres
AS-ODNs: Antisense oligonucleotides
C: Cytosine
CD: Circular dichroism
Cm: Matthews coefficient
CV: Cross validation
D2: Mahalanobis distance
DK: QUIK rule
DNA: Deoxyribonucleic acid
(DN)-hTERT: Dominant negative reverse transcriptase of the human telomerase
DQ: Q asymptotic rule
F: Fisher ratio test
FDA: Food and Drug Administration
FIT: Kubinyi function
FRET: Fluorescence resonance energy transfer
FS: Forward stepwise
G: Guanine
G4: Guanine quadruplex
G4 ligands: Ligands that stabilize the G-quadruplex
G4-FID: Fluorescent intercalator displacement assay
GA: Genetic algorithm
GMP: Guanosine monophosphate
G-quadruplex: Guanine quadruplex
hTERC or hTR: RNA component of the human telomerase
hTERT: Catalytic subunit (reverse transcriptase) of the human telomerase
telIC50 or IC50: Half inhibitory concentration
kbp: Kilo base pairs
Kxx: Total correlation between the predictor variables
Kxy: Total correlation between predictor and response variables
LDA: Linear discriminant analysis
LFER: Linear free energy relationships
LMO: Leave more than one
LMR: Linear multiple regression
LOF: Friedman modification
LOO: Leave one out
LSER: Linear solvation energy relationships
LTL: Leukocyte telomere length
M1: Replicative senescence, or mortality stage 1
M2: Crisis, or mortality stage 2
MD: Molecular descriptors
MHC: Major histocompatibility complex
mRNA: Messenger ribonucleic acid
NCI: National Cancer Institute
NHE: Nuclease hypersensitive element
NMR: Nuclear magnetic resonance
OECD: The Organisation for Economic Co-operation and Development
p53: Gene that encodes a tumor suppressor protein
PC: Principal components analysis
PCR: Polymerase chain reaction
Q²: R² of the validation
Q²BOOT: R² of the bootstrap validation
Q²EXT: R² of the external validation
QSAR: Quantitative structure-activity relationship
QSPR: Quantitative structure-property relationship
R: Multiple correlation coefficient
R²: Determination coefficient
rDNA: Ribosomal deoxyribonucleic acid
RNA: Ribonucleic acid
ROS: Reactive oxygen species
RT: Reverse transcriptase
s: Standard deviation
S: The immunoglobulin heavy chain switch
SAR: Structure-activity relationship
SEC: Standard deviation error in calculation
SPR: Surface plasmon resonance
SSR: Structure-selectivity relationship
T: Thymine
TAA: Tumour associated antigen
TO: Thiazole orange
TOPS-MODE: Topological Sub-structural Molecular Design
TRAP: Telomeric repeat amplification protocol
TRF: Telomeric repeat binding factor
U: Uracil
UV: Ultraviolet spectroscopy
VS: Virtual screening


Key words

G-quadruplexes or G4, telomeres, telomerase, cancer, G4 ligands, virtual screening, QSAR, oncogenes sequences

Mots-clés G-quadruplexes ou G4, télomères, télomérase, cancer, ligands G4, criblage, QSAR, séquences oncogènes


Abstract

DNA and RNA G-rich sequences can adopt unusual arrangements known as G-quadruplexes (G4). The topologies and forms of these fascinating structures are very diverse. G4 are stabilized by monovalent cations and Hoogsteen hydrogen bonds. Small molecules also contribute to the formation of stable forms, mainly via π-π stacking interactions. Although G4s have been known for decades, interest in this field grew with their potential effect on the inhibition of telomerase, a reverse transcriptase involved in the malignant transformation of most cancer cells. Given the link between telomerase, cancer and G4, several groups have worked on the discovery of new G4 stabilizers that would indirectly inhibit the enzyme. Most G4 ligands were identified following this paradigm. Hundreds of ligands have been identified during the past decade, and this is still a very active field.

Taking into account the advantages and ease of identifying new structures with computational techniques, we built simple and reproducible mathematical models with high screening capacity and low computational cost in order to use them for the identification of G4 ligands. With QSAR modelling we can predict the telIC50 (the IC50 on telomerase) of a congeneric set of compounds. We have also been able to relate the molecular descriptors that appear in our models to structural features that the scientific literature and SAR studies have previously reported as appropriate for describing this activity, also for congeneric sets of ligands. Moreover, we built different models using non-congeneric sets of compounds, applied a consensus strategy, and identified six FDA-approved ligands that stabilize G4 structures.

Subsequently, by applying nonlinear techniques and a database curation process proposed by us in previous publications, we performed a virtual screening of more than 500 000 ligands from a commercial database of compounds, followed by a structure-based model in order to reduce the number of candidates. We were able to identify new ligands more potent than the previous ones, which can also stabilize other G4 structures involved in cancer-related processes. These observations open a wide range of possibilities to be explored.

Despite the limitations of the QSAR modelling techniques explored in this work, we consider that they can be combined and used carefully to address the search for new G4 stabilizers.
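As an illustration of the consensus strategy mentioned above, the following minimal Python sketch combines several classification models by majority vote. It is only a sketch under stated assumptions: the voting rule, the `min_agreement` threshold, the toy linear discriminants and all coefficients are hypothetical and are not taken from the models developed in this thesis.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical convention: a model maps a vector of molecular descriptors
# to a class label (1 = predicted G4 ligand, 0 = predicted inactive).
Model = Callable[[Sequence[float]], int]


def linear_discriminant(weights: Sequence[float], bias: float) -> Model:
    """Build a toy linear discriminant: class 1 if w.x + b > 0, else class 0."""
    def predict(descriptors: Sequence[float]) -> int:
        score = sum(w * x for w, x in zip(weights, descriptors)) + bias
        return 1 if score > 0 else 0
    return predict


def consensus_predict(models: Sequence[Model],
                      descriptors: Sequence[float],
                      min_agreement: float = 0.5) -> Optional[int]:
    """Return the majority label, or None if agreement is not above the threshold."""
    votes = [model(descriptors) for model in models]
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_agreement else None


# Three hypothetical models with made-up coefficients (illustration only).
models = [
    linear_discriminant([1.0, -0.5], -0.2),
    linear_discriminant([0.8, 0.1], -0.5),
    linear_discriminant([-0.6, 1.0], 0.1),
]

compound = [1.0, 0.4]  # made-up descriptor values for one compound
print(consensus_predict(models, compound))
```

In this toy setting, a compound is retained for experimental evaluation only when more than half of the models agree; raising `min_agreement` makes the selection stricter, and a result of `None` marks a compound on which the models do not agree strongly enough.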


Extended summary (Résumé étendu en français)

G-rich DNA and RNA sequences can adopt unusual conformations quite different from the classic double helix. These structures are known as G-quadruplexes or G4. The topologies and folding of these fascinating structures are very diverse. G4 are stabilized by the presence of monovalent cations and by the formation of 8 Hoogsteen-type hydrogen bonds per quartet. These structures can be recognized by small molecules, which stabilize them, most often through π-π stacking interactions. Although G4 have been known for decades, interest in this field took off in the late 1990s, when an inhibitory effect was demonstrated on the activity of telomerase, a reverse transcriptase involved in the immortalization of most cancer cells. With regard to telomerase, cancer and G4, several groups have been involved in the discovery of new G4 ligands that would indirectly inhibit the enzyme. Most quadruplex ligands were identified in relation to this target. Hundreds of compounds have been characterized over the last decade. Taking into account the advantages and ease that computational techniques offer for the identification of new structures, we built simple and reproducible mathematical models with a high screening capacity at a low computational cost in order to identify new G4 ligands.

Using QSAR modelling, we can predict the IC50 on telomerase of a set of congeneric compounds. We were also able to relate the molecular descriptors that appear in our models to certain previously described structural features. In addition, we built different models using non-congeneric sets of compounds, applied a consensus strategy, and identified six FDA-approved ligands that stabilize these G4. Subsequently, by applying nonlinear techniques and a process that we proposed for the treatment of databases, we performed a virtual screening of more than 500 000 ligands from a commercial database of compounds. We then applied a structure-based model to reduce the number of candidates. We were able to identify new ligands with a greater inhibitory potential than the previous ones, which can also stabilize other G4 structures involved in other cancer-related processes. These observations open a broad spectrum of possibilities to explore.

In this work we set the following specific objectives:
1. Develop and experimentally validate new computational models that make possible the in silico screening of a large number of compounds.
2. Identify new G-quadruplex ligands.
3. Experimentally validate these predictions.

We briefly describe the structure of this thesis. The first chapter is an introduction that presents telomeres, telomerase and quadruplexes. Its main objective is to provide background information on these structures. The structure and functions of telomeres and telomerase, as well as the conditions for the formation of guanine quadruplexes, are discussed. We also present the relationships between telomeres and cancer, and how the first ligands developed were designed to obtain anticancer properties. Other, non-telomeric regions of the genome that are able to form G4 are presented.

The second chapter presents concepts and parameters related to QSAR methods, which are crucial for a better understanding of the thesis. It details the technique used to transform a molecule into a “useful number”, and how this number can be used to build mathematical models that predict an activity. We add explanations of some statistical tools used to check the quality of the models developed.

Since one of the objectives of this research is the identification and prediction of new G4 ligands with our computational methods, Chapter 3 presents a review that summarizes the computational strategies that have been used for this purpose. A brief description of the computational methods and of the ligands identified in this review is given. Finally, some important characteristics of the ligands derived from SAR studies are discussed.

Chapter 4 consists of two articles dealing with predictions of the activity of the telomerase enzyme through the G4 stabilization mechanism. The first trials were carried out to check whether reliable predictions could be obtained with a QSAR methodology based on both regression and classification methods. Here we designed models applied to a single family of compounds (a congeneric database of compounds) using simple but robust techniques (LDA and MLR). This approach reproduces the values found in experimental tests. At the same time, it is possible to relate our molecular descriptors to certain characteristics of activity. In other words, using very simple computational techniques, we can provide predictive tools. They can be used to obtain new information about the SAR of this family of compounds (interpretation), and we can reproduce, using molecular descriptors, previously reported experimental observations.

In Chapter 5, we extended the analysis to a structurally diverse set. We also use LDA, but on a non-congeneric dataset of compounds. We set up a database of telomerase inhibitors in which the ligands have been described in the scientific literature as G4 stabilizers. Several LDA models and a consensus strategy for the prediction of future candidates are proposed. We perform a virtual screening of a library comprising all FDA-approved compounds. This screening identifies new G4 ligands with acceptable efficiency. Of all the ligands selected for experimental evaluation, we confirm affinity for G4 for six of them.

In Chapter 6, we decided to apply best practices for the development of QSAR models together with structure-based methods (docking) in an integrated nonlinear virtual screening strategy. We use a nonlinear strategy for data modelling. First, we “correct” the database, with the aim of eliminating compounds that are harmful to the modelling procedure, following the recommendations of Tropsha. Subsequently, numerous QSAR models are developed. We perform a virtual screening of two commercial databases of compounds, followed by a docking campaign to reduce the number of candidates. The ligands are tested experimentally as stabilizers of the human telomeric sequence as well as of other oncogene sequences, and two of them show interesting stabilization values. The final result is the identification of new families with higher stabilization values than the molecules described in the previous chapter.

Chapter 7 is the beginning of unfinished work and opens perspectives for future studies. It attempts to present the stabilization properties of the compounds identified in Chapter 6 on sequences considered “hallmarks of cancer”. Having obtained positive results on all the sequences tested, the next steps will be to apply these compounds to cultured cells, with the ultimate goal of discovering new antiproliferative agents.

Chapter 8 concludes this thesis with a general discussion and includes the perspectives.

The work carried out during this doctorate led to 4 publications:

Castillo-Gonzalez, D., Perez-Machado, G., Guedin, A., Mergny, J. L., & Cabrera-Perez, M. A. (2013). FDA-approved drugs selected using virtual screening bind specifically to G-quadruplex DNA. Curr Pharm Des, 19(12), 2164-2173.

Castillo-Gonzalez, D., Perez-Machado, G., Pallardo, F., Garrigues-Pelufo, T. M., & Cabrera-Perez, M. A. (2012). Computational tools in the discovery of new G-quadruplex ligands with potential anticancer activity. Curr Top Med Chem, 12(24), 2843-2856.

Castillo-Gonzalez, D., Cabrera-Perez, M. A., Perez-Gonzalez, M., Morales Helguera, A., & Duran-Martinez, A. (2009). Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure. Eur J Med Chem, 44(12), 4826-4840.

Cabrera-Pérez, M. A., Castillo-González, D., Pérez-González, M., & Durán-Martínez, A. (2009). Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Combinatorial Science, 28(5), 526-536.

A fifth article, related to Chapters 6 and 7, is being written.


Introduction

Most genomic DNA adopts the classic double-helix structure; however, certain sequences likely exist in other conformations. The formation of these diverse and dynamic structures depends on sequence, topology, binding of proteins, and DNA modifications. Evidence is accumulating that these non-canonical conformations have biological roles. The very fact that a large part of the genome is non-coding makes it plausible that evolution has preserved and reserved certain functions for these regions. The presence of certain conformations in genomic DNA can cause errors in some biological processes, whereas other conformations are likely required for proper functioning.

The guanine quadruplex structure is the best characterized non-canonical DNA conformation. It has been studied since its discovery at telomeric ends in the 1980s (1-3). The core of the guanine quadruplex motif was proposed by Gellert and Davies in 1962, and consists of vertically stacked guanine tetrads, with a cation located in the center (4, 5). In each tetrad four guanines occupy the corners and form a plane that is stabilized by Hoogsteen hydrogen bonds (Figure 1.1 A). The stabilities of these alternative secondary structures depend on the primary sequence, stoichiometry, polarity, geometry, conformation of sugars, temperature, crowding conditions, and the concentration and nature of the cations (6). This creates a great versatility of structures that can interact differently with proteins that act on DNA or RNA, such as helicases and transcription factors.

The incidence of cancer in the world population has increased as life expectancies have improved. According to the World Health Organization, approximately nine million new cases of cancer are detected per year. The annual death toll due to cancer is predicted to increase from 7.6 million in 2008 to 13 million in 2030 (7).
More than two thirds of all cancer deaths occur in low- and middle-income countries, with lung, breast, colorectal, stomach, and liver cancers causing the majority of such deaths. In high-income countries, the leading causes of cancer deaths are lung cancer among men and breast cancer among women (7). Currently only cardiovascular diseases cause more deaths than cancer. Cancer is generally the result of many successive genetic alterations that confer on the transformed cells a number of characteristics that distinguish them from normal cells. These characteristics include capacity for autonomous growth, evasion of apoptosis, insensitivity to anti-proliferative signals, tissue invasion capacity, metastasis formation, and induction and maintenance of angiogenesis. In addition, malignant cells must acquire unlimited replicative potential (immortality) in order to form tumors in humans (8).

Diverse methods have been developed to eradicate the disease. One of the most widely applied therapies for neoplastic processes is chemotherapy (9), which relies on cytostatic and cytotoxic agents. However, no effective drug kills cancer cells without secondary effects on normal cells. For this reason, new compounds are needed that attack tumor cells directly, effectively and specifically, to minimize collateral effects.

Telomerase, a reverse transcriptase responsible for adding a repetitive DNA sequence to the ends of linear eukaryotic chromosomes, is involved in cellular immortalization (10, 11). This enzyme is active in more than 85% of tumor cells but not in the majority of normal cells, with notable exceptions such as stem cells and germline cells. Telomerase is therefore a promising target for the treatment of malignancies (12). Various strategies have been proposed for its inhibition (13-16). Scheme I.1 shows a schematic representation of the different approaches to telomerase inhibition. In addition, we provide information on patents and clinical trials from recent years that are based on telomerase inhibition; some of them are directly related to the formation of guanine quadruplexes.


[Scheme I.1: diagram centered on "Anti-telomerase cancer therapy". Labeled approaches: G-quadruplex stabilisation; RNA interference; small molecule inhibitors; gene therapy; hammerhead ribozymes; antisense oligonucleotides; immunotherapy; dominant-negative hTERT; reverse transcriptase inhibitors. Labeled targets: human telomerase RNA template; telomerase reverse transcriptase.]

Scheme I.1. Schematic representation of diverse anti-telomerase approaches. Image taken from (15).

Several patents relate to the inhibition of telomerase; Table I.1 summarizes some of them (15). Antisense oligonucleotides (AS-ODNs) are short oligonucleotides (often DNA) that bind to complementary target RNA through Watson-Crick interactions. Inhibition of target RNA function can be achieved by two mechanisms: a passive mechanism, in which the AS-ODN hybridizes competitively to the target human telomerase RNA (hTR), and an active mechanism, in which the target hTR is cleaved by RNase H after hybridization. Physical (passive) blocking is generally considered insufficient when targeting an mRNA. The situation is different here, as hTR is not translated but reverse transcribed. These oligonucleotides are highly specific, form a stable duplex with the target hTR, and are resistant to nuclease digestion. AS-ODN binding thus leads to progressive telomere shortening and reduced cell growth.

Hammerhead ribozymes are small catalytic RNA molecules that possess specific endonuclease activity and hence catalyse a self-cleavage reaction. They are composed of a catalytic core of conserved nucleotides flanked by three helices, two of which form tertiary interactions essential for fast self-scission under physiological conditions. A ribozyme specific for a target RNA can be obtained by designing the stem I and III sequences to be complementary to the target RNA, so that binding of the ribozyme to the target RNA results in its cleavage. After cleavage, the substrate is also accessible to ribonucleases, which leads to its inactivation (1, 15-17).

Other approaches are directly related to the inhibition or recognition of the telomerase reverse transcriptase (hTERT) component of telomerase. These approaches include reverse transcriptase (RT) inhibitors, immunotherapy and dominant-negative hTERT. Reverse transcriptase inhibitors are a class of ligands that interrupt the proper functioning of the reverse transcriptase enzyme. They usually act by competing with the endogenous nucleotides that are incorporated into DNA and hence interfere with the chain elongation function of reverse transcriptase. Another interesting RT inhibitor is BIBR1532, a ligand that acts on the catalytic subunit of telomerase, but not as a classical nucleoside analog (18-21).

Immunotherapy is based on the observation that telomerase is present in more than 85% of all tumours but nearly absent in normal cells. hTERT can therefore be targeted as a tumour-associated antigen (TAA). It is a source of peptides which, in association with class I MHC molecules, can be used to stimulate cytotoxic T lymphocytes to target and kill cancerous cells.

A different strategy involves dominant-negative (DN) hTERT mutants, which are catalytically inactive and can potentially inhibit telomerase activity by binding and sequestering hTR. If expressed at high enough levels, such a mutant depletes a necessary telomerase component (hTR) and thereby functions as an inhibitor of wild-type telomerase activity (1, 15-17).

Table I.1. List of patents related to anti-telomerase approaches. Adapted from (15).

Anti-telomerase approach | Patent No. / Patent title / Patent assignee | Year
Antisense oligonucleotides | US7157436 / Therapeutically useful synthetic oligonucleotides / Bioniche Life Sciences, Inc. (Pointe-Claire, CA) | 2007
Antisense oligonucleotides | US6468983 / RNase L activators and antisense oligonucleotides effective to treat telomerase expressing malignancies / The Cleveland Clinic Foundation (Cleveland, OH); The United States of America as represented by the Department of Health and Human Services (Washington, DC) | 2002
Antisense oligonucleotides | US6194206 / Use of oligonucleotide telomerase inhibitors to reduce telomere length / University of Texas System Board of Regents (Austin, TX) | 2001
Antisense oligonucleotides | US8377644 B2 / 2′-arabino-fluorooligonucleotide N3′→P5′ phosphoramidates: their synthesis and use / Geron Corporation (Menlo Park, CA) (22) | 2013
Antisense oligonucleotides | US8183222 B2 / Method to inhibit cell growth using oligonucleotides / Trustees of Boston University (Boston, MA) (23) | 2012
Antisense oligonucleotides | US8426379 B2 / POT1 alternative splicing variants / The United States of America, as represented by the Secretary, Department of Health & Human Services (Washington, DC) (24) | 2013
Antisense oligonucleotides | US8097595 B2 / Modulation of telomere length in telomerase positive cells and cancer therapy / ALT Solutions, Inc. (Wilmington, DE) (25) | 2012
Hammerhead ribozymes | US7732402 / Mammalian telomerase W. / Geron Corporation (Menlo Park, CA) | 2010
Dominant-negative hTERT | US7517971 / Muteins of human telomerase reverse transcriptase lacking telomerase catalytic activity H. / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO) | 2009
Reverse transcriptase inhibitors | US6593306 / Methods for modulation and inhibition of telomerase / Board of Regents, University of Texas System (Austin); CTRC Research Foundation (San Antonio, TX) | 2003
Reverse transcriptase inhibitors | US6372742 / Substituted indole compounds and methods of their use / Geron Corporation (Menlo Park, CA) | 2002
Reverse transcriptase inhibitors | US6054442 / Methods and compositions for modulation and inhibition of telomerase in vitro / Board of Regents, University of Texas System (Austin); CTRC Research Foundation (San Antonio, TX) | 2000
Reverse transcriptase inhibitors | US6004939 / Methods for modulation and inhibition of telomerase / CTRC Research Foundation Board of Regents (San Antonio, TX); The University of Texas System (Austin, TX) | 1999
Reverse transcriptase inhibitors | US7662361 / Methods and compositions for modulating drug activity through telomere damage | 2010
Immunotherapy | US7622549 / Human telomerase reverse transcriptase polypeptides / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO); Geron Corporation (Menlo Park, CA) | 2009
Immunotherapy | US7560437 / Nucleic acid compositions for eliciting an immune response against telomerase reverse transcriptase / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO); Geron Corporation (Menlo Park, CA) | 2009
Immunotherapy | US7413864 / Treating cancer using a telomerase vaccine / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO); Geron Corporation (Menlo Park, CA) | 2008
Immunotherapy | US7056513 / Telomerase / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO); Geron Corporation (Menlo Park, CA) | 2006
Immunotherapy | US7824849 / Cellular telomerase vaccine and its use for treating cancer / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO); Geron Corporation (Menlo Park, CA) | 2010
Immunotherapy | US7851591 / Cancer immunotherapy and diagnosis using universal tumor associated antigens, including hTERT / Dana Farber Cancer Institute, Inc. (Boston, MA) | 2010
Immunotherapy | US7794723 / Antigenic peptides derived from telomerase / GemVax AS (Oslo, NO) | 2010
Immunotherapy | US7750121 / Antibody to telomerase reverse transcriptase / Geron Corporation (Menlo Park, CA); The Regents of the University of Colorado (Boulder, CO) | 2010
G-quadruplex stabilisation | US7045523 / Combination comprising N-{5-[4-(4-methyl-piperazinomethyl)-benzoylamido]-2-methylphenyl}-4-(3-pyridyl)-2-pyrimidine-amine and telomerase inhibitor / (Basel, CH) | 2006
G-quadruplex stabilisation | US7205311 / Therapeutic acridone and compounds / Cancer Research Technology Limited (London) | 2007
G-quadruplex stabilisation | US8101357 B2 / Method for inhibiting telomerase reaction using an anionic phthalocyanine compound / Panasonic Corporation (Osaka, JP) (26) | 2012
Synergistic action with anti-telomerase drugs | US7429660 / ATM inhibitors / Kudos Pharmaceuticals Limited (Cambridge, GB) | 2008
Synergistic action with anti-telomerase drugs | US7642254 / ATM inhibitors / Kudos Pharmaceuticals Limited (Cambridge, GB) | 2010
Synergistic action with anti-telomerase drugs | US7049313 / ATM inhibitors / Kudos Pharmaceuticals Limited (Cambridge, GB) | 2006
Synergistic action with anti-telomerase drugs | US8389526 B2 / 3-heteroarylmethyl-imidazo[1,2-b]pyridazin-6-yl derivatives / Novartis AG (Basel, CH) (27) | 2013
Telomerase repressor proteins | US7795416 / Telomerase expression repressor proteins and methods of using the same / Sierra Sciences Inc. (Reno, NV) | 2010
RNA interference | US7897752 / RNA interference mediated inhibition of telomerase gene expression using short interfering nucleic acid (siNA) / Sirna Therapeutics, Inc. (San Francisco, CA) | 2011
RNA interference | US7893248 / RNA interference mediated inhibition of Myc and/or Myb gene expression using short interfering nucleic acid (siNA) / Sirna Therapeutics, Inc. (San Francisco, CA) | 2011
RNA interference | US7659389 / RNA interference mediated inhibition of MYC and/or MYB gene expression using short interfering nucleic acid (siNA) / Sirna Therapeutics, Inc. (San Francisco, CA) | 2010
RNA interference | US8148513 B2 / hTERT gene expression regulatory gene / National University Corporation Tottori University (Tottori, JP) (28) | 2012
T-pocket or Fingers-Palm pocket interacting compounds | US8377992 B2 / TRBD-binding effectors and methods for using the same to modulate telomerase activity / The Wistar Institute (Philadelphia, PA) (29) | 2013
T-pocket or Fingers-Palm pocket interacting compounds | US8234080 B2 / Method for identifying a compound that modulates telomerase activity / The Wistar Institute (Philadelphia, PA) (30) | 2012

Gene therapy is based on the principle of replacing a defective gene with a normal, functional gene that can be expressed in the cell, or of interfering with the protein synthesis of a particular gene, so as to provide a cure for a disease. This therapy aims to suppress cancer growth, to induce apoptosis in cancer cells and to inhibit the spread of malignancy to other tissues, with the essential requirement of being highly specific to tumour cells (1, 15-17). Small molecule inhibitors are another class of telomerase inhibitors; they act by disrupting the anchoring of telomerase to the telomere, by disrupting telomerase holoenzyme assembly, or by stabilizing G-quadruplexes (1, 15-17). Table I.2 lists clinical trials based on targeting telomerase (31).


Table I.2. List of some ongoing human clinical trials using a variety of approaches to target telomerase. Data adapted from (31) and www.ClinicalTrials.gov (updated February 2013).

Trial identifier / Intervention | Condition | Sponsor / Purpose / Phase (Status)
NCT00124189 / Imetelstat (GRN163L) | Chronic lymphoproliferative diseases | Geron Corporation / To determine the safety and maximum tolerated dose of GRN163L in treating patients with refractory or relapsed chronic lymphoproliferative disease. / Phase I (ongoing)
NCT01242930 / Imetelstat (GRN163L) | Multiple myeloma | Geron Corporation / To determine the rate of improvement in response of patients with previously treated multiple myeloma to imetelstat alone or in combination with lenalidomide maintenance therapy. / Phase II (ongoing)
NCT01256762 / Imetelstat (GRN163L) + bevacizumab + paclitaxel | Locally recurrent or metastatic breast cancer | Geron Corporation / To evaluate the efficacy and safety of treatment with imetelstat + paclitaxel (with or without bevacizumab) versus paclitaxel (with or without bevacizumab) alone for patients with locally recurrent or metastatic breast cancer who have not received chemotherapy or have received one non-taxane based chemotherapy for metastatic breast cancer. / Phase II (completed)
NCT01137968 / Imetelstat (GRN163L) + bevacizumab | Non-small cell lung cancer | Geron Corporation / To evaluate the efficacy and safety of imetelstat (GRN163L) as maintenance therapy for patients with advanced stage NSCLC who have not progressed after 4 cycles of platinum based therapy. / Phase II (ongoing)
NCT01265927 / Imetelstat (GRN163L) + trastuzumab | Breast neoplasms | Indiana University / To evaluate safety and biologic effects of giving GRN163L in combination with trastuzumab in patients diagnosed with HER2+ metastatic breast cancer that is resistant to therapy with trastuzumab. / Phase I (ongoing)
NCT01243073 / Imetelstat (GRN163L) | Essential thrombocythemia | Geron Corporation / Study of single agent imetelstat in patients with essential thrombocythemia or with polycythemia vera who have failed or are intolerant to at least one prior therapy, or who refuse standard therapy. / Phase II (ongoing)
NCT00021164 / Aldesleukin + incomplete Freund’s adjuvant + telomerase 540–548 peptide vaccine | Melanoma; adult solid tumor | National Cancer Institute (NCI) / To study the effectiveness of vaccine therapy in treating patients who have metastatic cancer. / Phase II (completed)
NCT00069940 / Telomerase 540–548 peptide vaccine + sargramostim | Brain and central nervous system tumors; gastrointestinal stromal tumor; sarcoma | Dana-Farber Cancer Institute / To study the side effects of vaccine therapy when given together with sargramostim in treating patients with advanced sarcoma or brain tumor. / Phase I (completed)
NCT00509457 / GV1001 telomerase peptide | Carcinoma, non-small-cell lung | Oslo University Hospital / To examine the safety and efficacy of telomerase peptide vaccination (stimulation of the immune system) in patients with NSCLC after having been treated with conventional therapy with radiotherapy and docetaxel as a radiosensitizer. / Study completed
NCT01247623 / GV1001 telomerase peptide + temozolomide | Malignant melanoma | Oslo University Hospital / Determination of safety and tolerability of GV1001 administration combined with temozolomide. / Phase I, II (completed)
NCT00061035 / Anti-telomerase transgenic lymphocyte immunization vaccine (TLI) | Prostatic neoplasms | Cosmo Bioscience / To test an experimental investigational gene therapy vaccine designed to make the patient's immune system react against telomerase, an enzyme expressed in prostate cancer cells. / Phase I (completed)
NCT00197912 / Tumor antigen loaded autologous dendritic cells | Advanced melanoma | Herlev Hospital / To show if vaccination with autologous dendritic cells pulsed with peptides or tumor lysate in combination with adjuvant cytokines and cyclophosphamide can induce a measurable immune response in patients with metastatic malignant melanoma, and to evaluate the clinical effect of the vaccination regime. / Phase I, II (completed)
NCT00925314 / Anti-telomerase transgenic lymphocyte immunization | Stage III melanoma | Cosmo Bioscience / To assess the safety, efficacy, and immunological response to the study product, TLI, as an adjuvant therapy in subjects with Stage III melanoma. / Phase II (ongoing)
NCT00079157 / Incomplete Freund’s adjuvant + telomerase 540–548 peptide vaccine + sargramostim | Breast cancer | University of Pennsylvania / To study the side effects and best dose of vaccine therapy when given together with Montanide ISA-51 and sargramostim in treating patients with stage IV breast cancer. / Phase I (the information has not been verified recently)
NCT00425360 / Sargramostim + telomerase peptide vaccine GV1001 + capecitabine + gemcitabine | Pancreatic cancer | Royal Liverpool University Hospital / To study gemcitabine, capecitabine, and vaccine therapy to see how well they work compared with gemcitabine and capecitabine alone in treating patients with locally advanced or metastatic pancreatic cancer. / Phase III (the information has not been verified recently)
NCT00573495 / hTERT/survivin multi-peptide vaccine | Breast cancer | University of Pennsylvania / To test the safety of the combination of agents and to find out what effects the treatment has on advanced breast cancer. / Phase I (this study is currently recruiting participants)
NCT00197860 / Tumor antigen loaded autologous dendritic cells | Advanced renal cell carcinoma | Inge Marie Svane / To show if vaccination with autologous dendritic cells pulsed with peptides or tumor lysate in combination with adjuvant cytokines can induce a measurable immune response in patients with metastatic renal cell carcinoma, and to evaluate the clinical effect of the vaccination regime. / Phase I, II (completed)
NCT00510133 / GRNVAC1 | Acute myelogenous leukemia | Geron Corporation / To evaluate the safety, feasibility and efficacy of immunotherapy with GRNVAC1 in patients with AML. / Phase II (ongoing)
NCT01153113 / GRNVAC1 | Metastatic prostate cancer | University of Florida / To develop a new and powerful type of immune therapy for prostate cancer patients. / Phase I, II (withdrawn prior to enrollment)

One strategy to inhibit telomerase indirectly is the stabilization of telomeric G-quadruplex structures. This is the strategy chosen in this research because of its mechanistic simplicity and ease of modeling. The first G-quadruplex-stabilizing ligands were developed to inhibit telomerase activity, and most of them have been discovered based on this approach. Many compounds have been found that stabilize G-quadruplexes and inhibit the activity of the telomerase enzyme (Figure I.1) (32). As G-quadruplex structures are associated with different regions of the genome, such as oncogene promoters, a multi-target approach may be an interesting strategy in the search for new compounds with potential antitumor activity.

Figure I.1. (A) Mechanism of telomerase inhibition by G-quadruplex formation. (B) Structure of the G4-tetrad in a guanine quadruplex. Image taken from (11).

Telomeres have been described as a “mitotic clock”, as they shorten with each cell division in normal cells. Increased longevity involves an increase in the incidence of malignancies. Therefore, the study of telomere physiology continues to be of medical interest and is relevant to carcinogenesis and antitumor therapy. Many tumor cells avoid replicative senescence via the activation of telomerase, but an alternative telomere maintenance mechanism, called ALT, involves recombination. In either mechanism, the participation of guanine quadruplex structures is potentially important. Differences in telomere length or accessibility between normal and malignant cells may provide specificity and a high therapeutic index for G-quadruplex ligands. In contrast to inhibitors of the catalytic activity of telomerase, ligands that bind to telomeric G-quadruplexes offer the advantage that they may also act in tumors that rely on alternative mechanisms (3, 33).

Although therapeutic interventions targeted at telomeric guanine quadruplexes have been the most explored (34-36), G-quadruplex-forming sequences are found in other contexts. More than 40% of human gene promoters contain one or more G4 motifs, and these motifs are more likely to be located in promoters of proto-oncogenes than in those of tumor suppressor genes. Consequently, G-quadruplex elements could act as molecular switches, in conjunction with DNA-binding proteins, to control transcription and respond to changes in chromatin structure or signaling mechanisms. The presence of G-tetrads in regulatory regions and promoters of oncogenes suggests that these structures will provide targets for antitumor interventions (37-41).

A paradigm for G-quadruplex stabilization is the promoter of the oncogene c-myc. Members of the c-myc gene family are frequently overexpressed in cancer; the c-myc protein is involved in cell cycle regulation, angiogenesis, and cell adhesion during neoplasia. The product of this oncogene activates transcription of hTERT and controls a variety of genes that collectively increase the proliferative capacity of cells. Interestingly, hTERT and c-myc are frequently deregulated and overexpressed during malignant transformation (42). Studies have demonstrated that a guanine quadruplex in the NHE III1 region of the c-myc promoter can act as a suppressor of transcription. The hTERT promoter region also contains sequences with the potential to form G-quadruplexes; these sites are hypersensitive to DNase I in chromatin and do not form nucleosomes (40, 42, 43). Therefore, developing ligands that stabilize G-quadruplex sequences in the c-myc and/or hTERT promoters has novel and promising therapeutic value.
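To make the notion of a "G4 motif" concrete, the following minimal sketch scans a DNA sequence for putative quadruplex-forming stretches. It assumes a commonly used consensus definition (four runs of at least three guanines separated by loops of one to seven nucleotides); this pattern, the function name `find_g4_motifs` and the example sequences are illustrative assumptions and are not taken from the analyses reported in this thesis. Only the strand given as input is scanned.

```python
import re

# Assumed motif definition (not taken from this thesis): four runs of at
# least three guanines separated by loops of one to seven nucleotides.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for each non-overlapping match on one strand."""
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_MOTIF.finditer(seq)]


# Hypothetical inputs: a human telomeric repeat stretch and a control
# sequence that contains no run of three guanines.
telomeric = "AGGGTTAGGGTTAGGGTTAGGG"
control = "ATGCATGCATGCATGCATGC"

print(find_g4_motifs(telomeric))
print(find_g4_motifs(control))
```

A sequence match of this kind only flags a candidate; whether a quadruplex actually forms must still be established experimentally.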
The search for molecules that bind G-quadruplex structures can be performed using techniques based on circular dichroism (CD), UV absorbance spectroscopy, nuclear magnetic resonance (NMR), surface plasmon resonance (SPR), mass spectrometry and fluorescence (FRET) (44, 45). Once a ligand is identified, it is important to optimize its design based on an understanding of the correlation between structure and mechanism of action, and to develop screening methods relevant to the mechanism that are reproducible, inexpensive, and highly efficient. The process of obtaining a drug is extremely long and expensive: it has been estimated that, with conventional "trial and error" methods, fewer than one in 10 000 substances that were active in initial testing succeeds (46).


At present, the pharmaceutical industry is under increasing pressure to find new effective drugs more quickly and efficiently than in the past. Computational drug design has become a powerful alternative for speeding up the development of new drug candidates (47). Quantitative structure-activity relationship (QSAR) studies model correlations between molecular structure and biological activity, a paradigm of medicinal chemistry. This approach has been extensively used in the design of drugs with different activities (48-51). These studies may be considered a more rational and economical alternative to one-by-one screening of compound libraries, as they try to model or predict a specific biological activity based on parameters that characterize the molecular structure of the target and the potential ligand (52). When using linear discriminant analysis or multiple linear regression (for detailed information and definitions see chapter 2), a relationship is established between the dependent variable (activity) and molecular descriptors (independent variables) (52). It is then possible to analyze how the information encoded in the molecular descriptors can enhance or decrease the desired activity. Approaches such as Hansch and Free-Wilson analysis, linear free energy relationships (LFER), and linear solvation energy relationships (LSER) were developed between 1960 and 1980 and may be considered the beginning of modern QSAR studies (53-56). Definitions and important terms in QSAR modeling are given in chapter 2 of this thesis.
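As an illustration of the relationship between a dependent variable (activity) and molecular descriptors, the sketch below fits a multiple linear regression by ordinary least squares. It is a toy example under stated assumptions: the two descriptors, the five compounds and the noise-free activity values are invented for illustration, and the helper names `fit_mlr`, `predict_mlr` and `r_squared` are hypothetical rather than part of the software used in this work.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bk*xk."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predicted activity for each row of descriptors in X."""
    return coef[0] + np.asarray(X) @ coef[1:]


def r_squared(y, y_pred):
    """Coefficient of determination of the fit."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: five compounds, two made-up descriptors each.
X = np.array([[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3], [5.0, 0.5]])
# Synthetic, noise-free activity so that the fitted coefficients can be checked.
y = 1.0 + 0.5 * X[:, 0] - 2.0 * X[:, 1]

coef = fit_mlr(X, y)
print(np.round(coef, 3))  # intercept followed by one coefficient per descriptor
print(round(r_squared(y, predict_mlr(coef, X)), 3))
```

The sign and magnitude of each fitted coefficient indicate whether the corresponding descriptor enhances or decreases the modeled activity, which is the kind of interpretation referred to above. Real QSAR models additionally require external validation, which this sketch omits.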
The combined application of rational methodologies (computational and experimental) will reduce the use of laboratory animals and the cost of developing new antitumor candidates, and will increase our knowledge of this disease, allowing progress in drug discovery and development for the treatment of diseases like cancer.

Given these elements, we are in a position to formulate the scientific aims of this work. Cancer continues to be one of the leading health problems worldwide. The focus of the present work was the search for new compounds targeting guanine quadruplexes in different regions of the genome. Classical "trial and error" methods are expensive and almost impossible to implement in a massive search for new antitumor candidates that target multiple sites in the genome. To address this scientific problem, the following hypothesis is formulated: following QSAR best practices and integrating them properly with structure-based approaches provides a virtual screening strategy that leads to the discovery of novel G-quadruplex ligands. These compounds are likely to result both in telomeric effects and in inhibition of transcription from oncogenic promoters.

The general scientific objective of the present work is to contribute to the discovery and development of new compounds that stabilize G-quadruplexes, with possible antitumor action, by developing validated computational and experimental methodologies. The assays should be simple and have a high screening capacity, leading to more effective and safer potential candidates. Specifically, I aimed to develop a battery of computational models for the prediction and design of potential telomerase inhibitors that act by stabilizing the G-quadruplex structure, and to validate the theoretical results with simple, inexpensive experimental methods with high screening capacity. This research generated a system of mathematical models that can predict optimal structures of guanine quadruplex stabilizers; this work should have a significant impact on the experimental study of compounds that stabilize G-quadruplex structures and on the selection of candidates for further pre-clinical development. For this purpose, I propose the following specific objectives:

1. To develop and experimentally validate new computational models that make it possible to predict ligand structures that stabilize G-quadruplex structures and thus interfere with telomere maintenance.

2. To identify new ligands with the potential to stabilize the guanine quadruplex.

3. To corroborate the computational predictions using experimental techniques that demonstrate the stabilizing capacity of the ligands on the telomeric G-quadruplex structure and on other genome sequences with the potential to form this conformation.

The structure of this thesis is briefly described below. Scheme I.2 gives a graphical overview of the structure, progression and relationships between the chapters.


The first chapter presents the main characteristics of telomeres, telomerase and guanine quadruplexes. Its main objective is to provide background information on these structures. Topics such as telomere and telomerase structure and function and guanine quadruplex formation are discussed. We also present the relationships between telomeres and cancer, and how the first ligands were designed to have anticancer properties. Other (non-telomeric) genome regions with the potential to form G4 are presented.

Chapter 2 sets out some concepts and parameters related to QSAR methodologies. These topics are crucial for a better understanding of the thesis. Details are provided on how a molecule can be turned into a “useful number” and how this number can be used to build mathematical models that predict a given activity. We also explain some statistical tools for checking the quality of the developed models.

As one of the objectives of this research is the identification and prediction of new G4 ligands with computational methodologies, chapter 3 is based on a review paper that summarizes how several computational strategies have been used for this purpose. A brief description of the computational methods and of the ligands identified following this paradigm is given. Finally, some important characteristics of the ligands derived from SAR studies are discussed.

Chapter 4 is composed of two articles that address predictions of activity against the telomerase enzyme through the G4 stabilization mechanism. We first check the possibility of obtaining reliable predictions using QSAR methodology based on both regression and classification approaches. There, we built models applied to acridines only (a congeneric database of compounds) that use linear discriminant analysis (LDA) and multiple linear regression (MLR), simple but robust techniques, to place the activity within defined value ranges. These approaches allowed us to reproduce the values found in the experimental tests. At the same time, it is possible to relate molecular descriptors to features that experimental evidence has identified as favorable for activity. In other words, in chapter 4 we demonstrate that very simple computational techniques may provide appropriate predictive tools. They can be used to obtain new SAR information on this family of compounds (interpretation); using our molecular descriptors, we could reproduce experimental observations that had been previously reported.


Scheme I.2. Schematic representation of thesis structure.

Introduction
Chapter 1. Telomeres, telomerase, and guanine quadruplexes. Main concepts.
Chapter 2. Concepts and useful parameters for in silico studies.
Chapter 3. Search for new compounds with G-quadruplex binding activity using computational methods. (Review work: discussion, conclusions and perspectives.)
Chapter 4. Linear multivariate techniques for evaluation of a congeneric set of compounds. G-quadruplex stabilization as a mechanism to inhibit telomerase. (Description, reproduction and prediction of the activity of known ligands; one linear model; no experimental assays.)
Chapter 5. Database collection, linear models of non-congeneric sets of compounds, virtual screening of FDA-approved compounds, and experimental evidence for the G-quadruplex stabilizing activity of neuroleptic antipsychotic drugs. (Prediction of the activity of unknown ligands; many linear models; virtual screening of >2400 ligands; experimental assays.)
Chapter 6. Curation of databases of ligands and use of non-linear techniques to develop reliable models. Virtual screening of commercial databases of compounds and biophysical evaluation. (Prediction of the activity of unknown ligands; many non-linear models; virtual screening of >500 000 ligands; experimental assays.)
Chapter 7. Effects of L13 and M1 on G-quadruplex structures that are hallmarks of cancer cells. (Relation of G4 ligands with hallmarks of cancer; experimental assays; open possibilities for future work.)
Chapter 8. General discussion. Conclusions and perspectives.
An arrow running from − to + indicates increasing complexity, number of models, number of screened ligands and G4 stabilization values.

In chapter 5, we extend the analysis to a structurally diverse set. We again use LDA, but on a non-congeneric dataset of compounds. We set up a database of telomerase inhibitors in which the reported ligands have been described in the scientific literature as G4 stabilizers. Several LDA models and a consensus strategy for the prediction of future candidates are proposed. We perform a virtual screening of a database of FDA-approved compounds. This screening allows the identification of new G4 ligands with acceptable potency. Of the ligands chosen for experimental evaluation, we confirm G4 binding for six. Finally, evidence of the potential antitumor applications of the tested ligands with stabilizing activity is provided. The approach in chapter 6 is different: we decided to combine best practices for QSAR model development (57) with structure-based methods (docking) in an integrated, non-linear, classifying consensus virtual screening strategy. First, we curate the database in order to remove compounds that are harmful to the modeling procedure, following the recommendations of Tropsha et al.: “Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research” (58). Subsequently, a large number of QSAR models are developed, and we propose another consensus strategy for the prediction of future candidates. We perform a virtual screening of the Asinex and Life Chemicals databases, two commercial databases of compounds, followed by a docking campaign to reduce the number of candidates. The ligands are tested as G4 stabilizers on the human telomeric sequence and on other oncogene sequences, and two of them exhibit interesting stabilization values. The final result is the identification of new scaffolds with higher potency values than those found in the previous chapter. Chapter 7 is the beginning of an unfinished work and opens perspectives for future studies. We determine the stabilizing properties of the compounds identified in chapter 6 on sequences considered “hallmarks of cancer”. With positive results for all the tested sequences, future steps are directed toward performing cellular assays with these ligands and attempting to establish generalizations relating G4 stabilization in cancer hallmark sequences to antitumor activity. It is an incipient but enthralling work. Chapter 8 concludes this thesis with a general discussion including perspectives.
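The consensus strategies mentioned for chapters 5 and 6 are not detailed in this overview. As a purely illustrative sketch, the snippet below assumes a simple majority vote over the active/inactive calls of several classification models; the function name `consensus_predict`, the threshold `min_agreement`, and the example predictions are hypothetical and may differ from the schemes actually used in those chapters.

```python
def consensus_predict(model_predictions, min_agreement=0.5):
    """Combine per-model class labels (1 = active, 0 = inactive) by voting.

    model_predictions: list of lists, one list of labels per model,
    all of the same length (one label per compound).
    A compound is called active when the fraction of models voting
    'active' is strictly greater than min_agreement (assumed rule).
    """
    n_models = len(model_predictions)
    consensus = []
    for votes in zip(*model_predictions):  # votes for one compound
        fraction_active = sum(votes) / n_models
        consensus.append(1 if fraction_active > min_agreement else 0)
    return consensus


# Hypothetical predictions of three models for four compounds
predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
]
print(consensus_predict(predictions))
```

Raising `min_agreement` makes the vote more conservative, which trades fewer false positives for fewer candidates sent to experimental evaluation.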


Chapter 1. Telomeres, telomerase, and guanine quadruplexes. Main concepts.

1.1 Formation of G-quadruplexes by DNA

The DNA structure proposed by James Watson and Francis Crick in 1953 (59, 60) is a right-handed helix formed by two antiparallel strands with hydrogen bonding between adenine (A) and thymine (T) and between guanine (G) and cytosine (C). This helical structure is stabilized by π stacking interactions between the bases. This DNA is called B form or B-DNA and is the canonical form mainly found in the cell. This form is also of high biological interest because it is the form presented by DNA when it interacts with most nuclear proteins. The B form is not the only conformation that DNA strands can adopt (61). Only four years after the discovery of the double-helix structure of DNA, G. Felsenfeld, D. Davies, and A. Rich described the structure of a three-stranded complex that contained one poly(rA) chain and two poly(rU) chains (4). The formation of a non-canonical pair between 1-methylthymine and 9-methyladenine, which was demonstrated in 1959 by Karst Hoogsteen using X-ray diffraction (62, 63), explains the formation of this triplex. In honor of his discovery, hydrogen bonds that are not formed on the Watson-Crick face of the bases are called Hoogsteen bonds. Another non-canonical pairing is observed in G-quartets formed by the association of four guanines. A network of eight hydrogen bonds connects the guanines in a quartet (Figure 1.1 A). It has been found experimentally that the formation of these aggregates is favored by alkali cations such as Na+, K+, and Rb+. These ions are chelated by the oxygen atoms of the O6 carbonyl groups of the four guanines in a G-quartet (64) (Figure 1.1 A). This coordination reduces repulsion among the carbonyls and facilitates the stacking of the guanine bases. In contrast, cations such as Li+ and Cs+ have a limited ability to induce the formation of these aggregates, suggesting that the formation of G-quadruplexes depends on the size and the desolvation energy cost of the central ion. Wong and Wu ranked monovalent cations according to their ability to induce and stabilize the G4 structures of 5'-GMP (6):

K+ > NH4+ > Rb+ > Na+ > Cs+ > Li+


Even if they are not connected by a phosphate backbone, guanosines are capable of forming helical structures up to 30 nm in length (65). This highlights the strong cooperation between hydrogen bonding, dipole interactions, and π-π stacking interactions inherent to these molecular assemblies in the G-quadruplex. G-quadruplexes are composed of two or more tetrads, assembled in intermolecular or intramolecular arrangements (Figure 1.1 B).

Figure 1.1. G-tetrads and G-quadruplexes. (A) Four guanine bases form a planar G-tetrad through Hoogsteen hydrogen bonding interactions. Dimensions of the quartet (66). (B) Polymorphic G-quadruplex structures (67).

1.2 Telomere structure and function

Telomeres are essential nucleoprotein structures located at the ends of linear chromosomes. In humans, telomeres are composed of several hundred repeats of the sequence 5'-TTAGGG-3' (5-15 kilobase pairs [kbp]) that are associated with a protein complex termed shelterin (68-74). The shelterin complex protects chromosome ends from end-to-end fusions and degradation through stabilization of special T-loop-like structures that prevent the linear ends of chromosomes from being recognized as single- or double-stranded DNA breaks (13). These complexes are also involved in organizing the chromosomes in the nucleus (75). The ends of linear chromosomes cannot be entirely replicated by the conventional DNA polymerase complex. Telomere replication requires a labile RNA primer to initiate DNA synthesis in the 5' to 3' direction. The number of TTAGGG telomeric repeats is reduced with each cell division due to the end replication problem, oxidative damage, and other end processing events (13). In the absence of a mechanism to compensate for this end replication problem, 50 to 100 base pairs of telomeric DNA are lost at each cell division. This successive erosion eventually leads to the loss of the telomeres' ability to protect the ends of chromosomes.
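To give an order of magnitude for the figures quoted above, the following sketch computes how many divisions would be needed to erode a telomere completely. It assumes a constant loss per division and erosion down to zero length, which is a deliberate simplification: in practice, cells stop dividing once a critical telomere length is reached, well before complete loss. The function name is hypothetical.

```python
def divisions_until_eroded(telomere_bp, loss_per_division_bp):
    """Whole number of divisions needed to erode a telomere of the given
    length, assuming a constant loss per division (simplifying assumption)."""
    return telomere_bp // loss_per_division_bp


# Bounds taken from the text: 5-15 kbp of repeats, 50-100 bp lost per division
fewest = divisions_until_eroded(5_000, 100)   # shortest telomere, fastest loss
most = divisions_until_eroded(15_000, 50)     # longest telomere, slowest loss
print(f"between {fewest} and {most} divisions")
```

Under these assumptions the range spans a few tens to a few hundreds of divisions, which is only an upper bound on the number of divisions a cell lineage could undergo without telomere maintenance.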

1.3 Telomerase: Role in cellular senescence and cancer

Most eukaryotic species use a specialized reverse transcriptase, telomerase, to replicate and regenerate telomeric DNA (14, 16, 76, 77). This enzyme is a DNA polymerase that carries its own RNA template for DNA synthesis. Human telomerase, also known as telomere terminal transferase, uses its RNA component as a template to add GGTTAG repeats to telomeres to compensate for the loss of genomic information that occurs after successive rounds of DNA replication (78). The two essential components of telomerase are an RNA molecule named the telomerase RNA component (hTERC or hTR in humans) and a catalytic subunit called telomerase reverse transcriptase (hTERT in humans) (Figure 1.2). The RNA subunit TERC is used as a template by TERT to synthesize a G-rich complementary DNA strand (10, 74, 76, 77, 79-81). Telomerase predominantly exists as a double dimer composed of two RNA subunits and two catalytic subunits (82-86).


Figure 1.2. Components of telomerase and other proteins associated with the complex. Image taken from (87).

The human telomerase RNA component (hTERC or hTR) is 451 nucleotides long and contains approximately 1.5 copies of the sequence complementary to the telomeric repeat. This region of the RNA acts as a template for the synthesis of telomere repeats (GGTTAG). Thus, telomerase acts as a reverse transcriptase that provides an active site for RNA-dependent DNA synthesis. In contrast to retroviral reverse transcriptases, the catalytic subunit of telomerase copies only a small segment of its RNA partner (10, 88, 89).
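The repeat addition described here can be illustrated with a toy model. The sketch below is a string-level simplification, not a model of the enzyme: it only checks that the 3' end of a G-strand overhang is in register with the TTAGGG repeat and then appends whole six-base units that continue that register. The function name and the example sequences are hypothetical.

```python
REPEAT = "TTAGGG"  # human telomeric repeat, G-strand written 5' -> 3'


def extend_overhang(overhang, n_repeats=1):
    """Append n_repeats six-base units to the 3' end of a telomeric overhang.

    The last six bases must be a cyclic permutation of TTAGGG (for example
    GGTTAG); the added unit continues the same register.
    """
    tail = overhang[-6:]
    if len(tail) < 6 or tail not in REPEAT * 2:
        raise ValueError("3' end is not in register with the TTAGGG repeat")
    # Because the sequence is periodic with period 6, the next six bases
    # are identical to the last six bases.
    return overhang + tail * n_repeats


print(extend_overhang("TTAGGGTTAGGG", 2))
print(extend_overhang("AGGGTTAG", 1))  # adds one GGTTAG unit
```

The second call shows why the added repeat can be written either as TTAGGG or as GGTTAG: both are the same periodic sequence read in a different register.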


The mechanism of telomeric synthesis involves the recognition by telomerase of the 3' overhanging telomeric sequence that exists at the end of the chromosome. Elongation of the 3' DNA end then begins, using the RNA carried by telomerase as the template. The RNA template has only 11 bases that are complementary to the TTAGGG tandem repeats; for this reason, a single repeat is added at each elongation step. Telomerase can continue telomeric synthesis on the same DNA strand by unwinding the DNA from the DNA-RNA hybrid and holding the DNA end while the RNA slides six bases to allow proper alignment and base pairing. Coordination between C-strand and G-strand synthesis is necessary for correct preservation of telomere structure (81). The single-stranded 3' overhang is protected by specific proteins. This overhang may be involved in T-loop formation, in which the end of the single strand is folded back and paired with its complementary portion in the double-stranded telomeric region. T-loop formation involves invasion of the single-stranded telomeric DNA into duplex DNA by a mechanism similar to the initiation of homologous genetic recombination. This loop may protect the chromosome 3' ends, making them inaccessible to nucleases and to double-strand break repair enzymes. The ability of telomerase to elongate telomeres is regulated by other factors. In mammals, the telomere-binding proteins (87) TRF1/Pin2 (Telomeric Repeat Binding Factor-1), TRF2, TANK (TRF1-interacting, Ankyrin-related ADP-ribose polymerase, also known as Tankyrase), and TIN2 (TRF1-Interacting Nuclear Factor-2), as well as heterogeneous nuclear ribonucleoproteins such as hnRNP A1 (Heterogeneous Nuclear Ribonucleoprotein-A1) and hnRNP A2/B1 (Heterogeneous Nuclear Ribonucleoprotein-A2/B1), affect telomere maintenance (Figure 1.2) (90). TRF1 is a negative regulator of telomere length, whereas TRF2 plays an essential role in protecting telomeric integrity.
The TRF1 complex interacts with POT1 (Protection Of Telomeres-1; a single-stranded telomeric DNA-binding protein) and controls telomerase-mediated telomere elongation. TRF2 assists in the formation of the T-loop and helps to maintain the secondary structure of the telomere. TRF2 also interacts with several proteins, including the human RAP-1 (Repressor Activator Protein-1)/TERF2IP (Telomeric Repeat Binding Factor 2 Interacting Protein) and the MRE11 (Meiotic Recombination-11) complex, composed of MRE11, Rad50, and the NBS1 (Nijmegen Breakage Syndrome-1) protein, which is implicated in the cellular response to agents that damage DNA. In addition to the MRE11 complex, the Ku complex, involved in certain types of DNA double-stranded break repair, localizes to the telomere. Consequently, the physiologic maintenance of the telomere requires complex interactions among these proteins, telomeric DNA, and other cellular factors (79, 91, 92).

A number of in vitro and in vivo experimental observations link telomerase with cancer, and these led in the mid-1990s to the conclusion that telomerase was involved in the acquisition of malignant phenotypes and was expressed in most tumors (93). The fact that telomerase activity was present in malignant tumors but not in the majority of normal cells (Figure 1.3) made this enzyme an attractive therapeutic target. The destruction of malignant tumors by telomerase inhibition should have two important characteristics: a broad antitumor spectrum and high specificity for cancer cells (14, 94-96).

Figure 1.3. Expression of telomerase activity in human cancers. The percentages in brackets refer to the percentages of tumors that overexpress telomerase activity relative to levels in normal tissue. Image taken from (97).


1.4 Cellular senescence and telomerase

Proliferative cellular senescence is defined as the limitation of proliferative capacity in normal cells that allows homeostatic maintenance in the human body (98-100). Essentially, this mechanism constitutes a "mitotic clock" that counts the number of divisions that a cell may undergo. Cells lose their proliferative capacity after a critical number of divisions. The maximum division capacity in a culture of human somatic cells from young donors is generally between 50 and 100 doublings. This limit (known as the Hayflick limit) decreases with donor age. When a normal cell exhausts its replication capacity, it enters a senescent state associated with an irreversible block to further cell division (101). In this replicative senescence state, resulting from telomere shortening, the p53 gene is activated (102-104). Senescence is controlled in normal cells during the cell cycle by p16/Rb and p53/p21. The cell first enters a pre-senescence state, also called M1. If the function of the p53 gene is altered or blocked (as occurs when the SV40 T antigen or the E6/E7 proteins of papillomavirus are expressed), the cells continue to divide with a further gradual shortening of the telomeres until a state known as M2, or mortality stage 2, is reached. Telomeric shortening thus regulates cell entry into the M1 and M2 phases (97).

1.5 Role of telomerase in cellular immortalization

In vivo evidence confirms the association between telomerase, telomeres, and replicative potential. The cells in normal somatic tissues do not show significant telomerase activity and experience progressive telomere shortening; their replicative history is reflected by the lengths of their telomeres (12, 101, 105). In contrast, cells in germinal tissues and most immortalized human cell lines show telomerase activity and stable telomere lengths (68, 93, 106, 107). The lengths of telomeres vary across different cancer cells, and the telomeric control mechanism is not yet completely defined in these cells. In many cells, the inhibition of telomerase does not have an immediate effect on cell proliferation; instead, telomere shortening occurs with each cell division (92, 108, 109). Telomerase expression alone does not stimulate the neoplastic transformation of a cell. Telomerase is present in many cancer cell lines, but its introduction into cells does not induce a change in cell phenotype. It has been suggested that at least six essential alterations are necessary for malignancy in all types of cancer: capacity for autonomous growth, evasion of apoptosis, insensitivity to anti-proliferative signals, tissue invasion capacity, metastasis formation, and induction and maintenance of angiogenesis (8). This list was later increased to eight hallmarks to include reprogramming of energy metabolism and evading immune destruction (110). The relative rarity of human cancer is explained by the multitude of mechanisms that regulate proliferation and apoptosis (8). Based on statistics from the National Cancer Institute (NCI) (111), Shay and coworkers state in the article “Telomerase in cancer and aging” that “The cancer rate is estimated to be of 400 cases per 100 000 individuals for all types, age, sex, and sites (average rate of new cases for year)”. However, when adjusted for age, the rate rises sharply: for individuals over age 65, the estimated incidence is 2151 cases per 100 000 (97). More recent statistics can be found on the official web site of the NCI (112).

1.6 Genetic disorders, telomeres, and telomerase

A number of diseases of genetic origin are associated with abnormalities in telomeres. Abnormally short telomeres have been found in patients with Hutchinson-Gilford progeria (97). Telomere shortening in the lymphocytes of patients with Down’s syndrome is three times faster than in normal subjects (97, 113). Skin fibroblasts of patients with Werner’s syndrome show similar characteristics. Patients with Werner’s syndrome exhibit premature aging, vascular diseases, diabetes mellitus, cataracts, skin degeneration, graying hair, testicular atrophy, and a high risk of cancer, with an average life expectancy of 47 years (97, 114-117). The clinical features of these diseases are usually heterogeneous, but symptoms often include abnormalities in blood cell production, leading to aplastic anemia. This demonstrates the importance of cell proliferation to the development of blood cells. Other affected tissues are hair, nails, and skin. Telomerase deficiency may result from mutations in hTERC, hTERT, or dyskerin. These mutations show ‘‘anticipation’’ effects, as symptoms become increasingly severe with each successive generation while telomeres inherited from a normal ancestor gradually erode. Moreover, levels of telomerase are limiting, so disease can develop in heterozygotes despite the presence of some functional telomerase (118). Mutations in telomerase are sometimes observed in patients with pulmonary fibrosis (119), and telomerase dysfunction and/or telomere shortening can play a role in the development of bone marrow disorders, such as dyskeratosis congenita, aplastic anemia, and α-thalassemia, and of the lung disease idiopathic pulmonary fibrosis. Other studies have identified an association of mutations in hTERC and hTERT with familial myelodysplastic syndrome (17, 120), acute myeloid leukemia, hepatic nodular regenerative hyperplasia, and cirrhosis. Accelerated telomere attrition is a likely pathophysiology of cancer arising from chronic inflammation (121). In addition to TERC and TERT, other genes that are critically involved in genomic stability and cellular functions have recently been identified: Oligonucleotide/oligosaccharide-binding fold containing 1 (OBFC1), Nuclear assembly factor 1 (NAF1), Regulator of telomere elongation helicase 1 (RTEL1), Acylphosphatase 2, muscle type (ACYP2), and Zinc finger protein 208 (ZNF208) (122). The conclusions of the aforementioned article suggest that the proteins encoded by these genes are related to important known proteins in telomere biology and contribute to leukocyte telomere length (LTL) associations (LTL is associated with cancer and several age-associated diseases). The identification of these genes could be crucial for a better understanding of the role of telomere length in aging-related diseases.

1.7 Elongation of telomeres independently of telomerase

A minority of immortalized cell lines maintain their telomeres by a telomerase-independent mechanism. These cells lack detectable levels of telomerase activity and have abnormally long telomeres (123-127). This mechanism is called alternative lengthening of telomeres (ALT) and may be caused by genetic recombination events. In fact, this mechanism is active in about 77% of mesenchymal tumors, in which telomere length is maintained through inter- and intra-telomeric recombination. The signaling machinery required for the expression of hTERT is active in ALT cells, which may be a mechanism of resistance to therapies aimed at telomerase. This means that it must be determined whether compounds identified as hTERT repressors also suppress ALT (126, 128-130).


1.8 Prevalence of guanine-rich sequences in genomic regions

There are a number of non-telomeric sequences that have the ability to form G-quadruplex structures. Of particular interest are the rDNA and the immunoglobulin heavy chain switch regions of higher vertebrates. It is also possible to find G-rich sequences within single-copy genes and in regulatory and promoter regions of important genes, specifically oncogenes such as c-myc, c-myb, c-Fos, c-Kit, KRAS, VEGF, PDGF-A, Rb, RET, Hif-1α, c-ABL, and hTERT. Repetitive sequences make up over half the human genome, and many minisatellite and microsatellite repeats are rich in guanine bases and can potentially form G-quadruplexes (118, 131-134). Eukaryotic ribosomal DNA is rich in guanine bases. The runs of G are concentrated on the non-template strand and are abundant in both coding and spacer regions. The ribosomal DNA is heavily transcribed, which increases the likelihood of G-quadruplex formation as DNA is denatured during transcription (118). In mammals, the immunoglobulin heavy chain switch (S) regions are guanine rich. The S regions are repeats, from 2 to 10 kb in length, that contain reiterations of highly degenerate consensus sequences. These regions are critical for class switch recombination, a region-specific recombination process that joins an expressed variable region to a new constant region. During this recombination process, several kb of DNA are deleted (118, 135). The human genome is full of repetitive regions, including both short microsatellite repeats (repeat units smaller than 14 bp) and larger minisatellite repeats. Many of these repetitive sequences display pronounced instability, and sequences rich in guanine are among the most unstable. Regions containing these repeats include the MS1 repeat (D1S7), D4S43, the insulin-linked hypervariable repeat, MS32, CEB1, CEB25, D1Z2, and MS205 (see Table 1.1). In contrast to repeats that are unstable as a result of replication slippage, these sequences cannot form stable hairpin structures; however, they do have the potential to form G-quadruplex structures. The formation of stable G-quadruplexes has been confirmed in several guanine-rich repeat regions, including D4S43, CEB1, CEB25, and the insulin-linked hypervariable repeat (118).


Table 1.1. Some G-rich sequence motifs present in the genome.

Name                                  Composition in bases                                    Ref
MS1 repeat (D1S7)                     AGGGTGGAG                                               (118)
D4S43                                 GGGGAGGGGGAAGA                                          (118)
MS32 repeat (D1S8)                    GACTCAGAATGGAGCAGGCGGCCAGGGGT                           (136)
CEB1                                  GGGGGGAGGGAGGGTGGCCTGCGGAGGTCCCTGGGCTGA                 (137)
CEB25                                 AAGGGTGGGTGTAAGTGTGGGTGGGTGTGAGTGTGGGTGTGGAGGTAGATGT    (137)
D1Z2                                  CCTGGGGGTGCAGAGTGCTGTTCCAGGCTGTCAGAGGCTC                (138)
MS205                                 GGGACCCGGGCCGGGCCCCCGACGGGGTAAGCTAGGGGCGT               (139)
Insulin-linked hypervariable repeat   ACAGGGGTGTGGGG                                          (118)
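As an illustration of how G-rich sequences with quadruplex-forming potential can be located computationally, the sketch below scans a DNA string for four runs of at least three guanines separated by loops of one to seven bases. This pattern is a commonly used heuristic and is an assumption here, not a rule taken from this chapter; quadruplexes with longer loops or shorter G-runs would be missed. The function name and the test sequences are hypothetical.

```python
import re

# Assumed heuristic: four G-runs (>= 3 G) separated by loops of 1-7 bases
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_putative_g4(sequence):
    """Return the non-overlapping substrings that match the G4 heuristic."""
    return [m.group() for m in G4_PATTERN.finditer(sequence.upper())]


# Four human telomeric repeats contain one putative quadruplex motif
print(find_putative_g4("TTAGGGTTAGGGTTAGGGTTAGGG"))
# A sequence without G-runs yields no match
print(find_putative_g4("ATATATATAT"))
```

A single copy of a short motif from Table 1.1 generally contains fewer than four G-runs, so such a scan is meaningful on tandem arrays of the repeat rather than on one repeat unit.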

1.9 G-quadruplex structures as therapeutic targets in telomerase-positive cell lines

The indirect inhibition of telomerase through the formation of G-quadruplex structures is mechanistically different from the action of inhibitors of the enzyme's activity. When the telomere is stabilized as a quadruplex, the substrate of the enzyme's elongation reaction is effectively blocked. Initial studies demonstrated that cations such as potassium facilitate the folding of the G-rich telomeric strand into stable structures (140-142). In the absence of telomere length maintenance activity, telomeres become shorter at each cell division. It was suggested that when the telomeres reach a minimum length, a series of events associated with the activation of a replicative senescence program is initiated (105). Thus, if telomeric shortening can be induced in telomerase-positive cancer cells by stabilization of G-quadruplex structures, the cancer cells may undergo senescence and possibly apoptosis. If we consider this strategy as an anticancer therapy, however, we must ask a critical question: as germ and stem cells express significant levels of telomerase, will telomere-targeting agents have unacceptable toxicity to non-cancer cells? A possible answer is that the initial length of telomeres in germ and stem cells is greater than in normal cells, so an acceptable therapeutic window may be found. Furthermore, it is assumed that G-quadruplex stabilizing ligands will cause minimal damage to most normal cells because their telomerase levels are generally very low. In any case, an experimental comparison of toxicity against cancer cells or cell lines versus normal cells may confirm these assumptions. The initial objective was to limit the growth of cancer cells by the progressive erosion of telomeres during cell division. This strategy was initially considered very attractive, but did not fulfill the hopes placed in it. The anti-proliferative effects of G-quadruplex stabilizing agents would depend on the initial length of telomeres, and the therapeutic benefits would be delayed relative to the start of treatment (14, 143-146). The first compounds discovered to interact with G-quadruplex structures were not high-affinity binders and were weakly selective relative to other structured forms that DNA can adopt (147). More selective and potent molecules have since been obtained (11, 148-161). The 9-anilino proflavine derivatives were designed to optimize interactions with the G-quadruplex structure of human telomeres and to minimize interactions with duplex DNA. These compounds inhibit telomerase activity with IC50 values between 60 and 100 nM and are not toxic to cultured cells (151). Triazines have been shown to induce telomere shortening, and treatment of cells with these compounds results in growth retardation and cellular senescence (11). The fluoroquinophenoxazines are redesigned topoisomerase II poisons; the new derivatives have more specific interactions with G-quadruplexes and an activity that is associated with the production of anaphase bridges (148). These properties are also exhibited by the cationic porphyrin TMPyP4 and by triazines (11, 162). Telomestatin is a natural product and is one of the most potent characterized telomerase inhibitors (163). This compound produces the arrest and/or apoptosis of several different cell types and displays an interesting selectivity towards cancer cells when compared to their normal progenitor cells. Symmetrical 2,6-disubstituted anthraquinones have been the subject of extensive investigations. They inhibit telomerase activity in the range of 10 μM (164).
The activity of these compounds results from a planar area that facilitates hydrophobic interactions with the terminal G-tetrad of the G-quadruplex and from flexible side chains with positively charged atoms that interact electrostatically with phosphates in the DNA grooves or in the loops of the G-quadruplex structure. For optimal binding, positively charged groups on the side chains should be between 7.5 and 12.0 Å from the center plane, and two methylene groups between the amido group and the charged nitrogen provide the best activity (164). Other factors that affect inhibitory action include the nature of the group at the end of the chain (piperidine or pyrrolidine groups are best), the size of this terminal group, and the presence of positively charged species at the end of the chain. The perylenes are also able to interact with G-quadruplexes and are considered potent telomerase inhibitors. The distance between the charged nitrogens is very similar to that of the anthraquinones (165).
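The distance criterion quoted above (7.5 to 12.0 Å) can be checked on a three-dimensional model of a ligand. The sketch below is only illustrative: it interprets "center plane" as the centroid of the planar aromatic core, which is an assumption, and the coordinates and function names are hypothetical.

```python
import math

MIN_DIST, MAX_DIST = 7.5, 12.0  # window in angstroms quoted in the text (164)


def centroid(points):
    """Geometric center of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def charge_in_window(core_atoms, charged_atom):
    """Return (is_within_window, distance) for a charged atom relative to
    the centroid of the planar core (assumed reading of 'center plane')."""
    distance = math.dist(centroid(core_atoms), charged_atom)
    return MIN_DIST <= distance <= MAX_DIST, distance


# Hypothetical planar core centered at the origin, coordinates in angstroms
core = [(1.0, 1.0, 0.0), (-1.0, 1.0, 0.0), (-1.0, -1.0, 0.0), (1.0, -1.0, 0.0)]
print(charge_in_window(core, (9.0, 0.0, 0.0)))  # inside the window
print(charge_in_window(core, (3.0, 4.0, 0.0)))  # too close to the core
```

In a real application the coordinates would come from a modeled or crystallographic ligand conformation, and flexible side chains would require sampling several conformers.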

Figure 1.4. Molecular structures of some G-quadruplex stabilizing agents.

The first assay to determine the activity of the telomerase enzyme, and its possible inhibition resulting from the formation of G4, was developed in the 1990s (166, 167). This assay, known as TRAP (Telomeric Repeat Amplification Protocol), consists basically of two steps:


- a telomerase elongation step, during which telomerase is allowed to elongate an oligonucleotide substrate;
- an amplification step, in which PCR is used to amplify the elongation product.

Bona fide telomerase inhibitors should act on the first step. The use of IC50 values obtained through TRAP can be problematic, especially for non-congeneric series, for which the evaluation conditions differ considerably between laboratories. Only five years after the first report of the technique, a large number of compounds had been identified and evaluated with this method. A study reported in 2001 suggests that the variability of the TRAP technique can be as low as 10%, depending on the conditions under which the assay is performed (168). Other modifications of the technique were proposed over this period with the objective of improving it, as in the case of the G4 TRAP (169). In 2007, Mergny and coworkers published a paper in which they demonstrated that the inhibition determined by TRAP in the presence of G-quadruplex ligands was not only due to telomerase inhibition, and concluded that TRAP is an inappropriate technique for the determination of telomerase inhibition by quadruplex ligands, even when PCR controls are included. TRAP values actually reflect the inhibition of PCR amplification of quadruplex-prone repeats. This artefact was not identified initially, as classical PCR controls were unaffected. As a result, the inhibitory potency of most G4 ligands has often been overestimated. Fortunately for us, these IC50 values are still meaningful, as they reflect G4 binding potency. To circumvent this problem, Neidle and coworkers proposed in 2008 an alternative version of TRAP, the "TRAP-Lig" assay, which adds an intermediate step in which the ligand is removed after telomerase extension and before the final PCR step, using a commercially available oligonucleotide purification kit (170). A more appropriate parameter to measure the stabilization of a quadruplex by a G4 ligand is the variation of the G4 melting temperature (ΔTm). This parameter is obtained as the difference between the melting temperatures of a G4-forming sequence with and without a stabilizer.
When a G4 ligand is present, it displaces the equilibrium towards the folded form, and higher temperatures are required to unfold the structure. Nevertheless, for QSAR modeling, this quantity is somewhat impractical due to the lack of data. FRET is among the most popular techniques to determine ΔTm. It generally offers accurate values. However, the assay is highly dependent on experimental conditions such as ligand concentration, ionic strength, and the nature of the cation (generally sodium or potassium) (171). The conditions under which the technique is employed generally vary from one laboratory to another, which complicates the gathering of data evaluated under the same experimental conditions. For this reason, regardless of the drawbacks of TRAP, we decided to use the TRAP IC50 values to build the models, as a large amount of data has been measured under these conditions. Furthermore, for the acridines (35, 172), perylenes (173), indoloquinolines (174), benzoindoloquinolines (175), dibenzophenanthrolines (176), and ethidium derivatives (152), one can establish a direct correlation between G4 stabilization, as measured in the FRET assay, and TRAP inhibition.
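The ΔTm parameter discussed above can be estimated from two melting curves. The following minimal sketch assumes that each curve has already been normalized to a folded fraction between 1 (fully folded) and 0 (fully unfolded) and takes Tm as the temperature at which this fraction crosses 0.5, found by linear interpolation. The function name and the data points are synthetic and purely illustrative; they are not measurements from this work.

```python
def melting_temperature(temps, folded_fraction):
    """Temperature at which the folded fraction crosses 0.5,
    obtained by linear interpolation between neighboring points."""
    points = list(zip(temps, folded_fraction))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= 0.5 >= f1 and f0 != f1:
            return t0 + (f0 - 0.5) * (t1 - t0) / (f0 - f1)
    raise ValueError("the curve does not cross a folded fraction of 0.5")


# Synthetic, illustrative curves (temperature in degrees Celsius)
temps = [40, 50, 60, 70, 80]
without_ligand = [1.0, 0.8, 0.4, 0.1, 0.0]
with_ligand = [1.0, 0.95, 0.8, 0.4, 0.05]

tm_free = melting_temperature(temps, without_ligand)
tm_bound = melting_temperature(temps, with_ligand)
delta_tm = tm_bound - tm_free  # positive when the ligand stabilizes the G4
print(f"Tm without ligand: {tm_free:.1f}, with ligand: {tm_bound:.1f}, "
      f"delta Tm: {delta_tm:.1f}")
```

Because Tm depends on ligand concentration, ionic strength, and cation, ΔTm values are only comparable when both curves are recorded under the same conditions.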

1.10 Comparison of different G-quadruplex targeted anticancer strategies

In this section I summarize the potential advantages and challenges of targeting gene promoter G-quadruplex structures rather than attempting to develop drugs that target telomeric G-quadruplex formation or the enzymes associated with the formation of these structures (177).

Advantages of targeting gene promoter G-quadruplex structures:

- Can target genes regardless of the druggability of the gene product.
- Lower likelihood of point mutations that impart resistance.
- Fewer copies of target, hence low concentration of inhibitor needed; this contrasts with the larger numbers (that is, higher concentrations) of an overexpressed oncogenic protein or enzyme.
- Potential of unique sequence and three-dimensional structure for a given G-quadruplex, meaning that selectivity may be achievable by design.
- A number of relevant oncogenes and kinases are established as clinically validated targets in cancer: for example, KIT and BRAF.
- Downregulation of expression of a target oncogene may be more important for tumor progression in a particular tumor type than telomerase activity is (for example, BRAF in some melanomas); therefore, inhibition of oncogene expression may allow selective cell killing.
- High-throughput functional assays are available for many oncogenes and kinase targets and are readily applicable to screen effects of promoter-targeting agents.


Challenges of targeting gene promoter G-quadruplex structures:

- High affinity and selectivity needed, yet potencies of state-of-the-art G-quadruplex ligands (generally micromolar) are considerably lower than those of typical enzyme inhibitors (often in the nanomolar range).
- Folding and structure can be altered by ligand binding, complicating target definition.
- Diversity in available small-molecule inhibitors is limited.
- Three-dimensional structures of very few promoter G-quadruplexes have been determined to date.

Advantages of targeting proteins:

- Straightforward if target has a well-defined active site.
- Large specialized compound libraries are available (for example, for kinases).
- Structural data are available on many existing targets.

Challenges of targeting proteins:

- Protein-protein recognition is difficult to inhibit.
- Active sites may change following ligand binding.
- Undruggable if target is unstable or unfolded.

Advantages of targeting telomerase or telomeric G-quadruplex structures:

- Telomerase is expressed in most human cancers and not in normal somatic cells, so there is a possibility of broad clinical activity and limited cytotoxicity.

Challenges of targeting telomerase or telomeric G-quadruplex structures:

- Telomerase is still not a clinically validated target.
- High-throughput assays for telomerase inhibition by small molecules are not available.
- Telomere attrition resulting from telomerase inhibition is a slow process.
- Stem cells and germ cells express telomerase.


Chapter 2. Concepts and useful parameters for in silico studies

In this Chapter, we present the various notions and parameters used for in silico studies.

2.1 Molecular Descriptors

A molecular descriptor (MD) is the final result of a logical and mathematical procedure that transforms chemical information into a useful number that is used to obtain structure-activity/property correlations (52). Molecular descriptors can be classified into two major groups:

1. Experimentally measured parameters: These include the partition coefficient (log P), molar refractivity, dipole moment, polarizability, etc. (178).

2. Theoretical molecular descriptors: These parameters are derived from symbolic representations of the molecules and can be further classified according to the type of representation.

0D Descriptors: These are derived from the chemical formula of the molecule and are independent of the molecular structure (179, 180). Examples are: the number of atoms (nAT), molecular weight (MW), number of bonds (nBT), number of carbon atoms (nC), percentage of C atoms (C%), number of sp3-hybridized carbon atoms (nCsp3), and number of rotatable bonds (RBN).

Table 2.1. Examples of 0D molecular descriptors.

| Chemical formula / Name | nAT | MW | nBT | nC | C% | nCsp3 | RBN |
|---|---|---|---|---|---|---|---|
| C4H3Cl3 / 4,4,4-trichlorobut-1-yne | 10 | 157.4 | 9 | 4 | 40.0 | 2 | |
| C5H11NS / N-(propan-2-yl)ethanethioamide | 18 | 117.2 | 17 | 5 | 27.8 | 4 | |
| C13H10O / diphenylmethanone | 24 | 182.2 | 25 | 13 | 54.2 | 0 | |
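As an illustration of how 0D descriptors follow from the chemical formula alone, the short sketch below recomputes nAT, MW, nC and C% for the three formulas of Table 2.1. It is not the Dragon implementation: the function names, the small table of rounded standard atomic weights and the simple formula parser (no brackets, no charges) are assumptions made for this example.

```python
import re

# Rounded standard atomic weights; only the elements needed here (assumption).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007,
               "O": 15.999, "S": 32.06, "Cl": 35.45}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C4H3Cl3'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def zero_d_descriptors(formula):
    """Compute nAT, MW, nC and C% from the chemical formula only."""
    counts = parse_formula(formula)
    n_atoms = sum(counts.values())
    mw = sum(ATOMIC_MASS[element] * n for element, n in counts.items())
    n_carbon = counts.get("C", 0)
    return {"nAT": n_atoms,
            "MW": round(mw, 1),
            "nC": n_carbon,
            "C%": round(100.0 * n_carbon / n_atoms, 1)}


descriptors = {f: zero_d_descriptors(f) for f in ("C4H3Cl3", "C5H11NS", "C13H10O")}
for formula, values in descriptors.items():
    print(formula, values)
```

With these atomic weights the four descriptors agree with the corresponding columns of Table 2.1. Descriptors such as nBT, nCsp3 and RBN are not covered by the sketch because they also require the bond list of the molecule.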

1D Descriptors: These are based on a sub-structural list representation of the molecule (181). The list can be a partial list of the fragments, functional groups, or substituents of interest present in the molecule, and thus does not require complete knowledge of the molecular structure. Examples are: number of unsubstituted benzene C(sp2) (nCbH), number of terminal primary C(sp3) (nCp), hydrogen attached to C1(sp3)/C0(sp2) (H-047), number of acceptor atoms for H-bonds (N, O, F) (nHAcc), and number of total secondary C(sp3) (nCs).

Table 2.2. Examples of one-dimensional molecular descriptors.

| Name | nCbH | nCp | H-047 | nHAcc | nCs |
|---|---|---|---|---|---|
| 4,4,4-trichlorobut-1-yne | 0 | 1 | 0 | 0 | 1 |
| N-(propan-2-yl)ethanethioamide | 0 | 3 | 1 | 1 | 1 |
| Diphenylmethanone | 10 | 0 | 10 | 1 | 0 |

2D Descriptors: These are based on the two-dimensional representation of the molecule and include information about the connectivity between its atoms (i.e., the presence or absence of chemical bonds between them) (182-184). As an example, we consider the "adjacency" matrix. Figure 2.1 shows a schematic representation of the 2,2,3-trimethylbutane molecule and its associated adjacency matrix. The adjacency matrix is one possible representation of the molecule and encodes the whole set of connections between adjacent pairs of atoms (52). In the figure, δi is the i-th row sum of the vertex adjacency matrix.


Figure 2.1. Schematic representation of the 2,2,3-trimethylbutane molecule and its associated adjacency matrix.

Based on the adjacency matrix representation, different kinds of molecular descriptors can be calculated. For example, the Zagreb index M1 is a molecular descriptor proposed by Gutman and co-workers (185). It is a topological index defined as:

$$M_1 = \sum_{a=1}^{A} \delta_a^{2}$$ (Eq 2.1)

where $\delta_a$ is the row sum of the vertex adjacency matrix for atom a (denoted δi in Figure 2.1) and A is the number of atoms (vertices) in the molecular graph.

Table 2.3. Examples of two-dimensional molecular descriptors. M1 and M2 are the first and second Zagreb indices, respectively. Molecular descriptors were calculated using Dragon Software (186).

| Name | M1 | M2 |
|---|---|---|
| 4,4,4-trichlorobut-1-yne | 28 | 26 |
| N-(propan-2-yl)ethanethioamide | 26 | 24 |
| Diphenylmethanone | 68 | 77 |
| 2,2,3-trimethylbutane | 30 | 30 |
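The Zagreb indices of 2,2,3-trimethylbutane can be recomputed directly from its adjacency matrix, as in the minimal sketch below. The atom numbering and the function name are assumptions made for the example. The second Zagreb index is not defined in the text above; the sketch uses its usual definition, the sum of the products δi·δj over all bonded pairs of atoms of the hydrogen-depleted graph.

```python
def zagreb_indices(adjacency):
    """Return (M1, M2) from a symmetric 0/1 vertex adjacency matrix."""
    degrees = [sum(row) for row in adjacency]          # row sums = vertex degrees
    m1 = sum(d * d for d in degrees)                    # Eq 2.1
    n = len(adjacency)
    m2 = sum(degrees[i] * degrees[j]                    # sum over bonded pairs
             for i in range(n) for j in range(i + 1, n)
             if adjacency[i][j])
    return m1, m2


# Hydrogen-depleted graph of 2,2,3-trimethylbutane.
# Numbering (assumption): 0-1-2-3 is the butane chain, 4 and 5 are the
# methyl groups on atom 1, and 6 is the methyl group on atom 2.
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (1, 5), (2, 6)]
n_atoms = 7
adjacency = [[0] * n_atoms for _ in range(n_atoms)]
for i, j in edges:
    adjacency[i][j] = adjacency[j][i] = 1

m1, m2 = zagreb_indices(adjacency)
print(m1, m2)  # 30 30
```

Both values agree with the last row of Table 2.3.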

3D Descriptors: These are based on the three-dimensional representation of the molecule, which is viewed as a rigid geometrical object in space. This allows a representation not only of the nature and connectivity of the atoms, but also of the overall spatial configuration of the molecule. An example of 3D molecular descriptors is the RDF (Radial Distribution Function) descriptors, which are based on a radial distribution function that can be interpreted as the probability distribution of finding an atom in a spherical volume of radius R. The general form of the radial distribution function is:

$$g(R) = f \cdot \sum_{i=1}^{A-1} \sum_{j=i+1}^{A} w_i \cdot w_j \cdot e^{-\beta \left(R - r_{ij}\right)^{2}}$$ (Eq 2.2)

where f is a scaling factor (assumed equal to one in the calculations), $w_i$ and $w_j$ are characteristic properties of atoms i and j, $r_{ij}$ is the interatomic distance, and A is the number of atoms in the molecule. The exponential term contains the interatomic distances $r_{ij}$ and the smoothing parameter β (Å⁻²), which defines the probability distribution of the individual interatomic distances; β can be interpreted as a temperature factor that defines the movement of atoms (187). Figure 2.2 provides an example of the three-dimensional representation of a molecule, and Table 2.4 lists some molecular descriptors calculated using Dragon Software (186).
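A minimal sketch of Eq 2.2 is given below for a hypothetical three-atom linear geometry with unit weights (the "unweighted" case). The coordinates, the function name and the value β = 100 Å⁻² are assumptions chosen for illustration; they are not taken from the calculations reported in this thesis.

```python
import math


def rdf(coords, weights, radius, beta=100.0, f=1.0):
    """Radial distribution function g(R) of Eq 2.2 for one radius R (in Å)."""
    total = 0.0
    n_atoms = len(coords)
    for i in range(n_atoms - 1):
        for j in range(i + 1, n_atoms):
            r_ij = math.dist(coords[i], coords[j])   # interatomic distance
            total += weights[i] * weights[j] * math.exp(-beta * (radius - r_ij) ** 2)
    return f * total


# Hypothetical geometry: three atoms on a line, 1.5 Å apart.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
weights = [1.0, 1.0, 1.0]

g_15 = rdf(coords, weights, 1.5)   # two atom pairs lie exactly at 1.5 Å
g_30 = rdf(coords, weights, 3.0)   # one atom pair lies exactly at 3.0 Å
g_22 = rdf(coords, weights, 2.25)  # no pair near 2.25 Å
print(g_15, g_30, g_22)
```

With a large β the function is sharply peaked, so g(R) essentially counts the weighted atom pairs found at distance R; smaller β values broaden each peak.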

Figure 2.2. Representation of 4,4,4-trichlorobut-1-yne for the calculation of three-dimensional molecular descriptors. (A) Representation of a .mol2 file with the spatial coordinates (x, y, z) and the connectivity of the molecule. (B) Spatial representation of 4,4,4-trichlorobut-1-yne.


Table 2.4. Examples of three-dimensional molecular descriptors (RDF descriptors).

| Name | RDF020u | RDF025u | RDF025m | RDF030m | RDF035m | RDF080m | RDF010v | RDF015v | RDF015e | RDF020e | RDF025e |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 4,4,4-trichlorobut-1-yne | 2.33 | 1.51 | 1.53 | 3.07 | 1.08 | 0.00 | 0.39 | 0.93 | 0.93 | 2.20 | 1.52 |
| N-(propan-2-yl)ethanethioamide | 1.69 | 4.58 | 3.92 | 0.04 | 1.69 | 0.00 | 0.19 | 2.93 | 3.23 | 1.59 | 4.79 |
| Diphenylmethanone | 0.00 | 11.36 | 11.43 | 0.69 | 0.89 | 2.01 | 0.00 | 7.32 | 7.32 | 0.00 | 11.43 |
| 2,2,3-trimethylbutane | 0.00 | 4.57 | 4.57 | 0.18 | 0.00 | 0.00 | 0.00 | 3.63 | 3.63 | 0.00 | 4.57 |

Suffixes: "u", unweighted RDF descriptors; "m", RDF descriptors weighted by mass; "v", RDF descriptors weighted by van der Waals volume; "e", RDF descriptors weighted by Sanderson electronegativity.

4D Descriptors: These parameters are derived from the stereo-electronic representation of the molecule, are related to molecular properties, and depend on the interaction of the electron distribution with a probe that characterizes the environment (molecular interaction fields) (188, 189). Examples of 4D molecular descriptors are those derived from the GRID, CoMFA, and VolSurf methods. This kind of molecular descriptor is not used in this thesis.

Figure 2.3. Representation of 4D molecular descriptors. The picture shows the interaction of the molecule with probes characterizing the surrounding space. Image taken from (190).

Molecular descriptors based on stereodynamic representations: This is a time-dependent representation that adds structural properties to the 3D representations, such as flexibility, conformational behavior, and transport properties. These representations give rise to so-called dynamic QSAR (191). This kind of molecular descriptor is not used in this thesis.


2.2 Chemometrics

Chemometrics is a discipline that includes mathematical and statistical tools to deal with complex data in the field of chemistry (52). Strategies include a multivariate approach to the problem, the search for relevant information, the generation of models capable of prediction, the comparison of the results obtained by different methods, and the definition and use of indices capable of measuring the quality of the extracted information. Chemometrics is the most widely used tool in QSAR and QSPR studies. It provides a solid basis for the analysis and modeling of data and offers a battery of different methods for this purpose. A key aspect of this discipline is its focus on the predictive power of the model, its complexity, and its quality. The following sections describe some important notions of chemometrics.

2.2.1 Cluster analysis

Cluster analysis is a special type of exploratory data analysis aimed at grouping similar objects. It is based on similarity/diversity assessments between all pairs of objects in a database (192).

2.2.2 Classification

Classification is the assignment of objects to one or several classes according to grouping rules. These classes are defined a priori based on the objects in the training set. The objectives are to calculate the classification rule, to define frontiers between the classes based on the objects of the training set, and to apply this rule to classify new objects of unknown class (193). Classification methods are widely used for the modeling of categorical responses such as active vs. inactive; low, medium, or high toxicity; and mutagenic vs. non-mutagenic. Among the most popular classification methods is linear discriminant analysis (LDA) (194), which is widely used in QSAR studies. In this technique an equation of the following type is obtained:

Group = a + b1 × x1 + b2 × x2 +... + bm × xm (Eq 2.3)

where a is a constant, x1-xm are the molecular descriptors, and b1-bm are the coefficients of the function. Variables with the largest coefficients have more influence on the analyzed property. When there are more than two groups, more than one discriminant function may be determined.
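To make the form of Eq 2.3 concrete, the sketch below fits a two-group linear discriminant function for two descriptors using Fisher's classical construction (pooled covariance matrix, decision threshold midway between the group means). The data are invented for illustration, the function names are hypothetical, and the midpoint threshold assumes equal prior probabilities for both groups; statistical packages may scale the coefficients differently.

```python
def mean_vector(rows):
    """Column means of a list of (x1, x2) pairs."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(group0, group1):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (group0, group1):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    dof = len(group0) + len(group1) - 2
    return [[v / dof for v in row] for row in s]


def fit_lda(group0, group1):
    """Return (a, [b1, b2]) so that a + b1*x1 + b2*x2 > 0 predicts group 1."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s = pooled_covariance(group0, group1)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    b = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    a = -0.5 * (b[0] * (m0[0] + m1[0]) + b[1] * (m0[1] + m1[1]))
    return a, b


def classify(x, a, b):
    """Apply the discriminant function of Eq 2.3 to one object."""
    return 1 if a + b[0] * x[0] + b[1] * x[1] > 0 else 0


# Invented descriptor values (x1, x2) for inactive (0) and active (1) compounds.
inactive = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
active = [(4.0, 5.0), (4.5, 4.8), (5.0, 5.5), (4.2, 5.1)]

a, b = fit_lda(inactive, active)
predicted_inactive = [classify(x, a, b) for x in inactive]
predicted_active = [classify(x, a, b) for x in active]
print(a, b, predicted_inactive, predicted_active)
```

In this toy data set the two groups are well separated, so every training object falls on the correct side of the discriminant function.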

2.2.3 Parameters for measuring the quality of discriminant analysis

Different methods can be used to evaluate both goodness of fit and goodness of prediction (195). Most of the information is extracted from the confusion matrix, where the rows represent the known true classes and the columns the classes assigned by the classification method. It is, in general, a non-symmetric matrix of size G×G, where G is the number of classes. Scheme 2.1 shows the example of a three-class problem (G = 3), where A, B, and C are the labels of the true classes and A', B', and C' the labels of the assigned classes. Ng represents the total number of objects belonging to the g-th class and Ng' the total number of objects assigned to the g-th class. The diagonal elements Cgg represent the correctly classified objects, while the off-diagonal elements Cgg' represent the objects of class g erroneously assigned to class g'. Usually, two confusion matrices are obtained, one in fitting and one after a validation procedure.

Scheme 2.1. Representation of the confusion matrix.

| Class | A' | B' | C' | Ng |
|---|---|---|---|---|
| A | C11 | C12 | C13 | Na |
| B | C21 | C22 | C23 | Nb |
| C | C31 | C32 | C33 | Nc |
| Ng' | Na' | Nb' | Nc' | N |

From this matrix several parameters are defined:

Percentage of good classification (non-error rate, NER%): The sum of the main diagonal of the confusion matrix, divided by the total number of objects N, gives the percentage of correctly classified objects; the percentage of error is then 100 − NER%.

$$NER\% = \frac{\sum_{g} C_{gg}}{N} \times 100$$ (Eq 2.4)

Sensitivity: Sensitivity is a parameter that characterizes the ability of a classifier to correctly identify objects of the g-th class:

$$Sn_g = \frac{C_{gg}}{N_g} \times 100$$ (Eq 2.5)


Specificity: Specificity is a parameter that characterizes the ability of a classifier to identify only objects of the correct g-th class within the calculated class (this quantity is often called precision in other fields). Specificity is defined as:

$$Sp_g = \frac{C_{gg}}{N_{g'}} \times 100$$ (Eq 2.6)

Other important parameters are not extracted from this matrix, such as the Mahalanobis distance. For each group within the sample, a point called the centroid is defined that represents the mean of all variables in the multidimensional space defined by the model. For each case we calculate the Mahalanobis distance to the centroid of each group, and the case is classified according to its proximity to these central values. It is also possible to calculate the distances between the centroids of the groups to obtain a measure of how well the model separates them. The value of the Wilks' lambda statistic (λ) informs us about the variance explained by the model. In theory, a value of zero indicates perfect discrimination between groups, because all the variance is explained by the model. In contrast, a value of one indicates the opposite: the model does not explain any of the variance of the data.
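The parameters of Eqs 2.4 to 2.6 can be obtained from a confusion matrix in a few lines, as sketched below. The three-class matrix is invented for illustration and the function name is hypothetical; rows are true classes and columns are assigned classes, as in Scheme 2.1. The sketch assumes that every class has at least one true object and at least one assigned object, otherwise a division by zero occurs.

```python
def classification_quality(confusion):
    """Return (NER%, [Sn_g], [Sp_g]) from a G x G confusion matrix."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)                     # N
    diagonal = [confusion[g][g] for g in range(n_classes)]         # C_gg
    row_totals = [sum(row) for row in confusion]                   # N_g
    col_totals = [sum(row[g] for row in confusion)                 # N_g'
                  for g in range(n_classes)]
    ner = 100.0 * sum(diagonal) / total                            # Eq 2.4
    sn = [100.0 * diagonal[g] / row_totals[g] for g in range(n_classes)]  # Eq 2.5
    sp = [100.0 * diagonal[g] / col_totals[g] for g in range(n_classes)]  # Eq 2.6
    return ner, sn, sp


# Invented example: 10 objects per true class.
confusion = [[8, 1, 1],
             [2, 6, 2],
             [0, 2, 8]]

ner, sn, sp = classification_quality(confusion)
print(round(ner, 1), [round(v, 1) for v in sn], [round(v, 1) for v in sp])
```

Here 22 of the 30 objects lie on the diagonal, so NER% is about 73.3 and the error rate about 26.7.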

2.2.4 Regression analysis

A number of statistical methods use a mathematical equation to model the relationship between a response variable and a set of predictor variables, usually using least squares. This approach has two objectives: to model and to predict. The relationship is described in algebraic form as:

y = f (x) + e (Eq 2.7)

where x denotes the predictor variable(s), y the response variable(s), f(x) is the systematic part of the model, and e is the random error, also called model error or residual. The mathematical equation used to describe the relationship between the response and predictor variables is called the regression model (196).


2.2.5 Regression parameters

Regression parameters can be divided into two groups: those that measure the goodness of fit and those that measure the goodness of prediction (52). The first group measures how well a model fits the data of the training set (e.g., how well a regression model or a classification model accounts for the variance of the response variable). These include:

Residual Sum of Squares (RSS, or error sum of squares): This quantity is minimized by the least-squares estimator and measures the differences between observed and predicted values:

$$RSS = \sum_{i=1}^{n} \left( y_i^{obs} - y_i^{pred} \right)^{2}$$ (Eq 2.8)

Model Sum of Squares (MSS): MSS is the part of the total variance explained by the regression model and is defined as the sum of the squared differences between the estimated responses and the average response $\bar{y}$:

$$MSS = \sum_{i=1}^{n} \left( y_i^{pred} - \bar{y} \right)^{2}$$ (Eq 2.9)

Total Sum of Squares (TSS): This parameter is defined as the sum of the squared differences between the experimental responses and the average response:

$$TSS = \sum_{i=1}^{n} \left( y_i^{obs} - \bar{y} \right)^{2}$$ (Eq 2.10)

Coefficient of determination (R²): R² is the fraction of the total variance of the response explained by the regression model. A value of one indicates a perfect fit (i.e., a model with a zero error term). It is defined as:

$$R^{2} = \frac{MSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\sum_{i=1}^{n} \left( y_i^{obs} - y_i^{pred} \right)^{2}}{\sum_{i=1}^{n} \left( y_i^{obs} - \bar{y} \right)^{2}}$$ (Eq 2.11)

A related quantity is the multiple correlation coefficient R, defined as the square root of the coefficient of determination. It is a measure of the linear association between the observed response and the estimated response.


Residual Mean Square (RMS or s², also called mean square error or expected squared error): The estimate s² of the error variance σ² is defined as:

$$s^{2} = \frac{RSS}{n - p}$$ (Eq 2.12)

where n is the number of compounds in the model and p is the number of parameters.

Standard Deviation Error in Calculation (SEC): SEC is a function of the residual sum of squares and is defined as:

$$SEC = \sqrt{\frac{RSS}{n}}$$ (Eq 2.13)

The Fisher ratio test (F-ratio test): This test is one of the best-known statistical tests. It is defined as the ratio between the model sum of squares MSS and the residual sum of squares RSS, each divided by its degrees of freedom:

$$F = \frac{MSS / df_m}{RSS / df_e}$$ (Eq 2.14)

The value obtained is compared with the critical value (F_crit) for the corresponding degrees of freedom of the model (df_m) and of the error (df_e). This is a comparison between the variance explained by the model and the residual variance. Higher values of F indicate more reliable models. There are also several statistics for comparing models with different numbers of variables (p) and compounds (n), among them:

Adjusted R²:

$$R^{2}_{adj} = 1 - \left( 1 - R^{2} \right) \cdot \left( \frac{n - 1}{n - p} \right)$$ (Eq 2.15)

Fitness function (FIT):

$$FIT = \frac{R^{2} \cdot (n - p - 1)}{\left( n + p^{2} \right) \cdot \left( 1 - R^{2} \right)}$$ (Eq 2.16)

Exner's statistic:

$$\Psi^{2} = \frac{RSS}{TSS} \cdot \frac{n}{n - p}$$ (Eq 2.17)
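The goodness-of-fit parameters of Eqs 2.8 to 2.15 are illustrated below for a one-descriptor least-squares line fitted to invented data. The function names are hypothetical. The degrees of freedom are an assumption of this sketch: p counts the intercept and the slope (p = 2), df_m = p − 1 and df_e = n − p.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def goodness_of_fit(y_obs, y_pred, p):
    """Fit statistics of Eqs 2.8-2.15; p includes the intercept (assumption)."""
    n = len(y_obs)
    y_mean = sum(y_obs) / n
    rss = sum((o - c) ** 2 for o, c in zip(y_obs, y_pred))      # Eq 2.8
    mss = sum((c - y_mean) ** 2 for c in y_pred)                 # Eq 2.9
    tss = sum((o - y_mean) ** 2 for o in y_obs)                  # Eq 2.10
    r2 = 1.0 - rss / tss                                         # Eq 2.11
    return {
        "RSS": rss, "MSS": mss, "TSS": tss, "R2": r2,
        "s2": rss / (n - p),                                     # Eq 2.12
        "SEC": math.sqrt(rss / n),                               # Eq 2.13
        "F": (mss / (p - 1)) / (rss / (n - p)),                  # Eq 2.14
        "R2_adj": 1.0 - (1.0 - r2) * (n - 1) / (n - p),          # Eq 2.15
    }


# Invented data: one descriptor x and one response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

intercept, slope = fit_line(x, y)
y_pred = [intercept + slope * v for v in x]
stats = goodness_of_fit(y, y_pred, p=2)
for name, value in stats.items():
    print(name, round(value, 4))
```

For a least-squares model that contains an intercept, MSS + RSS = TSS, which is why the two expressions for R² in Eq 2.11 coincide.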


2.2.6 Goodness of prediction

Prediction quality measures the ability of the model to predict future data, i.e., how well the regression (or classification) model estimates the response variables from a number of predictor variables. These parameters are obtained by validation techniques and are widely used as selection criteria during model development. Among all the statistical parameters defined in the literature for measuring the goodness of prediction, we highlight two of the most widely used: the Predictive Residual Sum of Squares (PRESS) and the cross-validated R². PRESS is the sum of the squared differences between the observed responses and the responses estimated by validation techniques:

$$PRESS = \sum_{i=1}^{n} \left( y_i^{obs} - y_{i/i}^{pred} \right)^{2}$$ (Eq 2.18)

where $y_{i/i}$ denotes the response of the i-th object estimated by a model obtained without using the i-th object. Use of validation techniques minimizes this quantity.

The cross-validated R² (R²cv or Q²) is the explained variance in prediction:

$$Q^{2} = 1 - \frac{PRESS}{TSS}$$ (Eq 2.19)

where PRESS is calculated as above and TSS is the total sum of squares.
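A minimal leave-one-out sketch of Eqs 2.18 and 2.19 is shown below for a one-descriptor least-squares line. The data and function names are invented for illustration; each object is removed in turn, the line is refitted on the remaining objects, and the removed object is predicted.

```python
def fit_line(x, y):
    """Ordinary least-squares line; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def loo_press(x, y):
    """PRESS (Eq 2.18) by leave-one-out cross validation."""
    press = 0.0
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]          # model built without object i
        y_train = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_train, y_train)
        press += (y[i] - (intercept + slope * x[i])) ** 2
    return press


def q2_loo(x, y):
    """Cross-validated R2 (Eq 2.19)."""
    y_mean = sum(y) / len(y)
    tss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - loo_press(x, y) / tss


# Invented data: one descriptor x and one response y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

press = loo_press(x, y)
q2 = q2_loo(x, y)
print(round(press, 4), round(q2, 4))
```

For least-squares models PRESS is never smaller than RSS, so Q² cannot exceed the fitted R²; a large gap between the two suggests that the fit depends strongly on individual objects.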

2.3 Genetic algorithm (GA) for the selection of variables

This variable selection method is based on the evolution of a population of models. In this system, the binary vector I is called the "chromosome" (52, 197, 198). It is a p-dimensional vector in which a position (gene) is 0 if the corresponding variable is not included in the model and 1 if it is included. Each chromosome represents a model with a subset of variables. After the statistical parameter to be optimized is defined, the size of the population (P) and the maximum number of variables allowed in a model (L) are set; the minimum number of variables is 1. A recombination probability (pC), which is usually high (> 0.9), and a mutation probability (pM), which is usually small (< 0.1), are then defined. Once these parameters are defined, the evolution of the genetic algorithm occurs in several steps:


1) Random initialization of the population: The population is built from random models containing between 1 and L variables, and these models are ranked according to the pre-defined statistic to be maximized.

2) Crossover: Pairs of models are selected randomly, with probability proportional to their quality. Common features are preserved, the variables of one model are included in the other following random criteria defined by the recombination probability, and the new models obtained are evaluated. The worst model is removed from the population and the new model is included. This process is repeated several hundred times.

3) Mutation: For each model in the population (i.e., each chromosome), p random numbers are drawn and compared, one at a time, with the defined mutation probability pM: each gene remains unchanged if the corresponding random number exceeds the mutation probability; otherwise, it is changed from zero to one or vice versa. Low values of pM allow only a few mutations, so the new chromosomes are not too different from the generating chromosome. Once the mutated model is obtained, its statistical parameter is calculated: if the value is better than the worst value found in the population, the model is included in the population at the place corresponding to its rank; otherwise, it is discarded. This process is repeated for all chromosomes.

Steps 2 and 3 are repeated until the stop condition is satisfied or the process is stopped arbitrarily. Through this procedure acceptable models are obtained, and it is possible to assess the relationships with the response variable from different points of view. The main disadvantage is that the optimal model is sometimes not found. This technique has been widely studied and compared with regression and least-squares fitting approaches (199-201).
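The three steps above can be sketched as follows. This is a simplified illustration, not the implementation used in this work: the function names are hypothetical, and `toy_fitness` is an invented stand-in for the statistic to be maximized (for example Q²) that simply rewards the presence of variables 0 and 3 and penalizes model size.

```python
import random


def toy_fitness(chrom):
    """Invented quality function: rewards variables 0 and 3, penalizes size."""
    return 1.0 * chrom[0] + 1.0 * chrom[3] - 0.1 * sum(chrom)


def is_valid(chrom, max_vars):
    """A model must contain between 1 and L variables."""
    return 1 <= sum(chrom) <= max_vars


def random_chromosome(p, max_vars, rng):
    chosen = rng.sample(range(p), rng.randint(1, max_vars))
    return [1 if i in chosen else 0 for i in range(p)]


def insert_if_better(population, chrom, fitness, max_vars):
    """Replace the worst model if the candidate is valid, new and better."""
    if (is_valid(chrom, max_vars) and chrom not in population
            and fitness(chrom) > fitness(population[-1])):
        population[-1] = chrom
        population.sort(key=fitness, reverse=True)   # keep models ranked


def genetic_selection(p, fitness, pop_size=10, max_vars=3, p_cross=0.9,
                      p_mut=0.05, generations=50, seed=0):
    rng = random.Random(seed)
    # Step 1: random initialization, ranked by quality (best first).
    population = [random_chromosome(p, max_vars, rng) for _ in range(pop_size)]
    population.sort(key=fitness, reverse=True)
    for _ in range(generations):
        # Step 2: crossover; parents drawn with probability increasing with quality.
        worst = fitness(population[-1])
        weights = [fitness(c) - worst + 1e-6 for c in population]
        mother, father = rng.choices(population, weights=weights, k=2)
        if rng.random() < p_cross:
            # Common genes are kept; differing genes come from either parent.
            child = [m if m == f or rng.random() < 0.5 else f
                     for m, f in zip(mother, father)]
            insert_if_better(population, child, fitness, max_vars)
        # Step 3: mutation; a gene flips when its random number is below p_mut.
        for chrom in list(population):
            mutant = [1 - g if rng.random() < p_mut else g for g in chrom]
            insert_if_better(population, mutant, fitness, max_vars)
    return population[0]


best = genetic_selection(p=8, fitness=toy_fitness)
print(best, toy_fitness(best))
```

Because a candidate only enters the population when it beats the current worst model, the quality of the best model never decreases from one generation to the next; as noted above, this does not guarantee that the optimal subset is found.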

2.4 Validation techniques

Validation is fundamental to assessing the validity of the model obtained (202). A necessary condition for the validity of a regression model is that the squared multiple correlation coefficient R² is as close as possible to 1 and the standard error of the estimate s is small. Frequently, these parameters are correlated with the predictive power of the models (203). Other validation methods are available:

Cross validation: Cross validation is the most commonly used technique. Multiple modified databases are created by removing one or more groups of objects, so that each object is removed at least once (204). For these new databases, models are computed, and the responses of the deleted objects are predicted from the models obtained. The previously defined parameters are then calculated to determine the quality of prediction. Two techniques are used: Leave-One-Out (LOO), in which only one object at a time is deleted, and Leave-More-Out (LMO), in which the user defines a number of cancellation groups (from 2 to n) and each block is removed at least once. The LOO technique is only effective when the number of objects is small, because when databases are large, the small disturbances introduced by the removal of only one compound cannot be detected.

Training/prediction sets division: This validation technique is based on the division of the database into two sets, one for training and the other for prediction. The model is calculated on the training set and its predictive power is evaluated on the prediction set. The division is performed by randomly assigning objects to each set. As the results can depend on the selection process, the process should be repeated several hundred times and the predictive capabilities averaged (205). It can be performed only once if the separation is established by a well-defined criterion such as cluster analysis.

Bootstrap: The original size of the data set (n) is preserved for the training set by selecting n objects with repetition; in this way the training set usually contains repeated objects, and the evaluation set consists of the objects left out. The model is calculated on the training set and responses are predicted for the evaluation set. All the squared differences between the true and predicted responses of the objects of the evaluation set are collected in PRESS. This procedure of building training and evaluation sets is repeated thousands of times, the PRESS values are summed, and the average predictive power is calculated (206).
External validation: A test is performed with an external set to check the predictive power of a model that has already been tested with the prediction set.

Y-scrambling: This validation technique is used to detect models affected by chance correlation, i.e., models in which the independent variables are correlated with the response variable only by chance. The test is performed by recalculating the quality of the model (usually R² or, better, Q²) after randomly permuting the response vector y, i.e., by assigning to each object a response randomly selected from the true responses. If the original model has no chance correlation, there is a significant difference between the quality of the original model and that of the models obtained with random responses. The procedure is repeated several hundred times (207).

Lateral validation: This technique validates a new model, i.e., one obtained from a new data set, by comparing it with other models previously obtained for the same response. The similarity of the regression coefficients and the equality of their signs support the reliability of the models. The new model can also be based on different descriptors, provided they have the same physical meaning (208).

QUICK rule: A rule based on the multivariate K correlation index, which compares the multivariate correlation index KXX of the X-block of predictor variables with the multivariate correlation index KXY obtained from the X-block matrix augmented with the column of the response variable. Only regression models whose multivariate correlation KXY is greater than KXX fulfill the QUICK rule, a necessary condition for model validity, i.e., KXY > KXX. This constraint is included in the maximization (or minimization) of some goodness-of-prediction statistic and prevents models with collinearity but without predictive power, i.e., chance correlation, from being taken into account (209).

All these techniques are widely used in modern chemometrics (210-212). The leave-one-out cross-validated R² (LOO q²) is among the most popular criteria for evaluating the predictive capacity of a QSAR model. This statistic has been widely employed as a criterion of both the robustness and the predictive ability of models. Nevertheless, a high LOO q² value is a necessary but not a sufficient condition for a model to have high predictive power. In order to guarantee reliable and appropriate predictions from a QSAR model, it is necessary to evaluate its predictive power on an external set. Compounds belonging to the external set are those that were not used for the development of the model. An extensive analysis of this topic is presented by Tropsha and co-workers in the articles "Beware of q2!" (213, 214). Following this analysis and its recommendations, we consider external validation to be an absolute requirement for the development of a truly predictive QSAR model, and therefore the most useful of all validation techniques in QSAR modeling.
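The Y-scrambling test described in this section can be sketched as follows for a one-descriptor least-squares model. The data, the number of permutations and the function names are assumptions made for illustration; in practice the same permutation loop would be wrapped around the full model-building procedure, and Q² would usually be preferred to R² as the quality measure.

```python
import random


def r_squared(x, y):
    """R2 of a one-descriptor least-squares line (squared Pearson r)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_scrambling(x, y, n_runs=500, seed=0):
    """Return (R2 of the true model, mean R2 over scrambled responses)."""
    rng = random.Random(seed)
    scrambled_scores = []
    for _ in range(n_runs):
        shuffled = y[:]            # copy, so the true responses are untouched
        rng.shuffle(shuffled)      # random reassignment of the responses
        scrambled_scores.append(r_squared(x, shuffled))
    return r_squared(x, y), sum(scrambled_scores) / n_runs


# Invented data with a genuine linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.3]

true_r2, mean_scrambled_r2 = y_scrambling(x, y)
print(round(true_r2, 3), round(mean_scrambled_r2, 3))
```

A model free of chance correlation shows a true R² far above the average obtained with scrambled responses; if the two values are close, the apparent fit is probably an artifact.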


2.5 Enrichment Analysis

The main goal of a virtual screening (VS) effort is to select a subset from a large pool of compounds (typically a compound database or a virtual library) that contains as many known actives as possible, i.e., to select the most "enriched" subset possible. Several enrichment metrics have been proposed in the literature to measure the enrichment ability of a VS protocol (215). In Chapter 6, we use some of the most widely used metrics. From the accumulation curve, enrichment can be deduced from the area under this curve (AUAC), which is defined as:

$$AUAC = 1 - \frac{1}{n} \sum_{i=1}^{n} x_i$$ (Eq 2.20)

where n is the total number of active cases in the dataset and $x_i$ is the relative rank of the i-th active in the ordered list, obtained by scaling its rank $r_i$ to the total number of cases (N) in the dataset ($x_i = r_i/N$). AUAC can therefore be interpreted as the probability that a positive case, selected from the empirical cumulative distribution function defined by the rank-ordered list, will be ranked before a case randomly selected from a uniform distribution (215).

The Receiver Operating Characteristic (ROC) curve describes the sensitivity, or TP rate, for any possible change in the number of selected cases as a function of (1 − specificity), or FP rate (216). The area under the ROC curve (ROC) can be interpreted as the probability that a positive case will be ranked earlier than a negative one within a rank-ordered list (215). The ROC metric is defined as:

$$ROC = \frac{AUAC}{R_i} - \frac{R_a}{2 R_i}$$ (Eq 2.21)

where $R_a = n/N$ is the ratio of active cases in the dataset and $R_i = (N - n)/N$ is the ratio of inactive cases in the dataset.

On the other hand, the enrichment factor (EF) takes into account the improvement of the hit rate by a VS protocol compared to a random selection. This metric has the advantage of answering the question: how enriched in active cases will the set of k cases selected for screening be, compared to the situation where the k cases are picked randomly?

$$EF = \frac{k^{+} / k}{n / N}$$ (Eq 2.22)

where k is the number of cases in the filtered fraction (χ) and $k^{+}$ is the number of active cases retrieved at this fraction, χ being the quotient between k and N (χ = k/N). The maximum value that EF can take is 1/χ if χ ≥ n/N and N/n if χ < n/N; the minimum value is zero (215).

However, the "early recognition" ability of a VS tool is encoded by just a few enrichment metrics, such as the robust initial enhancement (RIE) and the Boltzmann-enhanced discrimination of ROC (BEDROC) metrics (215). The RIE metric describes how many times the distribution of the ranks of the active cases produced by a VS protocol is better than a random rank distribution and is defined as:

$$RIE = \frac{\frac{1}{n} \sum_{i=1}^{n} e^{-\alpha x_i}}{\frac{1}{N} \left( \frac{1 - e^{-\alpha}}{e^{\alpha / N} - 1} \right)}$$ (Eq 2.23)

The parameter α is used to assign a higher weight (and so a higher contribution to the RIE metric) to actives ranked at the beginning of the ordered list than to those at the end; its inverse, 1/α, can be interpreted as the fraction of the list where the weight is important. Specifically, in this work the RIE, EF, and BEDROC metrics were evaluated at χ = 1% / 5% / 10% / 20%, which corresponds to values of α = 160.9 / 32.2 / 16.1 / 8, respectively.

However, like EF, RIE depends on N, $R_a$, and α, which hampers its use in datasets of different size and composition. The other limitation is that, unlike ROC, RIE provides neither a probabilistic interpretation nor a measurement of the enrichment performance above all thresholds (216). In order to derive a new metric overcoming these limitations, Truchon and Bayly proposed the BEDROC metric (215):

$$BEDROC = \frac{RIE - RIE_{min}}{RIE_{max} - RIE_{min}}$$ (Eq 2.24)

$RIE_{min}$ and $RIE_{max}$ are obtained when all the active cases are at the end and at the beginning of the ordered list, respectively:

$$RIE_{min} = \frac{1 - e^{\alpha R_a}}{R_a \left( 1 - e^{\alpha} \right)}$$ (Eq 2.25)

$$RIE_{max} = \frac{1 - e^{-\alpha R_a}}{R_a \left( 1 - e^{-\alpha} \right)}$$ (Eq 2.26)

The BEDROC metric is a generalization of the ROC metric that includes a decreasing exponential weighting function that adapts it for use in early recognition problems. This metric can be interpreted as the probability that an active ranked by a VS protocol will be found before a case that would come from a hypothetical exponential probability distribution function with parameter α. Thus, BEDROC should be understood as a "virtual screening usefulness scale" (215).
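The enrichment metrics of Eqs 2.20 to 2.26 can be computed from the ranks of the actives alone, as in the sketch below. The function name, the example ranking (5 actives among 100 cases) and the choice of passing the number of selected cases k directly are assumptions made for illustration.

```python
import math


def enrichment_metrics(active_ranks, n_total, k, alpha):
    """AUAC, ROC, EF, RIE and BEDROC from the 1-based ranks of the actives."""
    n = len(active_ranks)
    ra = n / n_total                                    # ratio of actives
    ri = (n_total - n) / n_total                        # ratio of inactives
    x = [r / n_total for r in active_ranks]             # relative ranks
    auac = 1.0 - sum(x) / n                             # Eq 2.20
    roc = auac / ri - ra / (2.0 * ri)                   # Eq 2.21
    k_plus = sum(1 for r in active_ranks if r <= k)     # actives in the top k
    ef = (k_plus / k) / (n / n_total)                   # Eq 2.22
    rie = ((sum(math.exp(-alpha * xi) for xi in x) / n)
           / ((1.0 / n_total) * (1.0 - math.exp(-alpha))
              / (math.exp(alpha / n_total) - 1.0)))     # Eq 2.23
    rie_min = (1.0 - math.exp(alpha * ra)) / (ra * (1.0 - math.exp(alpha)))    # Eq 2.25
    rie_max = (1.0 - math.exp(-alpha * ra)) / (ra * (1.0 - math.exp(-alpha)))  # Eq 2.26
    bedroc = (rie - rie_min) / (rie_max - rie_min)      # Eq 2.24
    return {"AUAC": auac, "ROC": roc, "EF": ef, "RIE": rie, "BEDROC": bedroc}


# Ideal ranking: the 5 actives occupy the first 5 positions of 100 cases.
ideal = enrichment_metrics([1, 2, 3, 4, 5], n_total=100, k=10, alpha=16.1)
# Worst ranking: the 5 actives occupy the last 5 positions.
worst = enrichment_metrics([96, 97, 98, 99, 100], n_total=100, k=10, alpha=16.1)
print({name: round(value, 4) for name, value in ideal.items()})
print({name: round(value, 4) for name, value in worst.items()})
```

For the ideal ranking EF reaches its maximum value 1/χ = 10 (here χ = 0.10 ≥ n/N = 0.05) and BEDROC equals 1, while for the worst ranking EF and BEDROC are both zero, in line with the interpretation of RIE_max and RIE_min given above.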


Chapter 3. Search for new compounds with G-quadruplex binding activity using computational methods

Introduction

Cancer is one of the most feared ills of modern society, and much pharmaceutical and medical research is focused on the development of therapies to treat it. Safe and efficient candidates with high specificity and broad-spectrum antitumor activity are needed. Current therapies work through various mechanisms: enhancing anti-tumor immune responses by cytotoxic effectors (217), hormone therapy (218), protein-based therapeutics (219), mitochondria-based approaches (220), inhibition of protein kinases (221) such as receptor tyrosine kinases (222), and inhibition of histone deacetylases (223). Given the growing use of computational methodologies for the discovery of new compounds with a specific activity, we took on the task of reviewing the use of these methods in the G-quadruplex field. The following paper is a review article summarizing the tools used for the study and prediction of activity and discussing some compounds identified with these methodologies. It is clear from this review that, compared with other antitumor approaches, the computational study and prediction of guanine quadruplex ligands has not been sufficiently exploited.

3.1 Article 1. Computational tools in the discovery of new G-quadruplex ligands with potential anticancer activity. Current Topics in Medicinal Chemistry, 2012, 12, 2843-2856. Daimel Castillo-González, Gisselle Pérez-Machado, Federico Pallardó, Teresa María Garrigues-Pelufo and Miguel-Angel Cabrera-Pérez.

Summary

This work seeks to highlight how computational methods have been used in the identification of G4 ligands with potential use as anticancer agents. It starts with an exposition of the fundamental aspects required for the formation of G4 structures, describing how they form and the influence of monovalent ions on their formation. The stability requirements of these structures under physiological conditions are then highlighted. A discussion of the prevalence of these structures in the genome and their possible biological roles follows, covering both the telomeric structure and oncogene promoter regions with the capacity to form G4 structures. Another part of the work details the characteristics of the G4 structures in the c-myc and c-kit sequences and the relation that can be established between these G4-forming regions and cancer. After this, a multi-step strategy for the identification of new G4 compounds is briefly described. The core of the review is then presented: the computational strategies used in the discovery of new G4 compounds. The general techniques for this purpose are described, and the approaches based on ligand structure and those based on target structure are defined. Finally, some procedures employing specific techniques that have allowed the identification of different G4-stabilizing compounds are described.


Current Topics in Medicinal Chemistry, 2012, Vol. 12, No. 24, 2843-2856

Computational Tools in the Discovery of New G-Quadruplex Ligands with Potential Anticancer Activity

Daimel Castillo-González a,b,c,d,*, Gisselle Pérez-Machado a,b,e, Federico Pallardó e, Teresa-María Garrigues-Pelufo f and Miguel-Angel Cabrera-Pérez b,f

a Department of Pharmacy, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; b Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; c Univ. Bordeaux, ARNA laboratory, IECB, F-33600 Pessac, France; d INSERM, U869, ARNA laboratory, F-33000 Bordeaux, France; e Department of Physiology, Faculty of Medicine, University of Valencia, Spain; f Department of Pharmacy and Pharmaceutical Technology, Faculty of Pharmacy, University of Valencia, Av VA Estellés, sn 46100, Burjassot, Spain

Abstract: Guanine-rich sequences found at telomeres and oncogenes have the capacity to form G-quadruplex (G4) structures. A relationship has been found between the ability to stabilize G4 structures and anticancer activity. The stabilization of guanine quadruplexes and its implication in cancer is a relatively recent therapeutic target. Computer-aided drug design has been a very useful tool in the search for new candidates. In recent years, methodologies have improved with the development of the computational sciences: hardware has been enhanced, new techniques are being explored, and NMR and X-ray information about different targets is continually reported. The continuous growth of powerful and comprehensive software for this purpose is another significant factor contributing to the discovery of new compounds. Nevertheless, computer-aided drug design has not been widely employed in the design of new compounds with G4-stabilizing activity. This review therefore focuses on the influence of computational techniques on speeding up the discovery of new G4 ligands.

Keywords: Cancer, drug design, G-quadruplex, oncogenes.

*Address correspondence to this author at the Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; Tel: +53 42 281 473; Fax: +53 42 281130; E-mail: [email protected]

1873-5294/12 $58.00+.00 © 2012 Bentham Science Publishers

INTRODUCTION

There has been much effort towards the development of new drugs and novel anticancer approaches; however, cancer is still a significant biological and social problem. The complexity of its etiology and therapeutic management supports the continuous search for new chemotherapeutic agents [1-4]. DNA was the first defined target for anticancer drugs, but interest in DNA-targeted drugs declined because of their high toxicity, the generation of resistance and the emergence of new molecular targets [5]. Even so, they remained included in the majority of anti-cancer regimens. The first glimpse of a new era for DNA-targeted therapeutics came through the realization of the role of telomeres and telomerase in cellular aging and cancer [6, 7]. Since then, research on telomeric guanine quadruplexes and the cellular consequences of their stabilization has continued to be an active field for drug discovery. Furthermore, interest in the more general therapeutic significance of G4 has expanded during the past decade to include G4 in gene promoters as targets [5, 8-10].

Interest in the structural arrangements of G4 was ignited in the early 1990s by the identification of G-rich repetitive sequences, located at the end of chromosomes, and a protein with reverse transcriptase activity involved in their maintenance. Several studies showed that the biological function of those particular DNA associations may depend on their capacity to form such structures. Blackburn spearheaded the groundbreaking investigations in this field [11-14]. There has been a growing interest in these peculiar structures due to recent demonstrations of their existence in vivo and their implication in several relevant biological processes [15]. The growing amount of literature about G4 DNA clearly demonstrates that such a structure is no longer viewed as just a biophysical curiosity but as an important target for the treatment of various human disorders [16].

The role of telomeric or oncogenic G4 as a target for pharmaceutical drug design has been reviewed comprehensively by the groups of Hurley, Neidle and Balasubramanian [5, 13, 17-19]. This target can be considered attractive due to its position at the start of the amplification cascade and its low abundance in cells, stimulating the discovery and development of G4-binding ligands that could modulate cellular processes and halt cancer growth and progression [14, 20, 21].

While the folding of G-quadruplexes by guanine-rich sequences has been extensively characterized, the functions and potential applications of G-quadruplexes and the design of small molecules targeting G4 DNA are still an active area of research. The strategy for successful drug discovery is rapidly changing and quadruplex therapeutics is an ideal exemplar for the future, hopefully with the successful translation of academic discoveries into industry and the clinic. Thus, structure-based drug design, the establishment of structure-activity relationships, and computational and chemoinformatic techniques are now routinely used to accelerate or streamline the early phases of pharmaceutical research [14, 20]. While a bioinformatics search of the human genome for the simplest G4-forming motif exposed a vast number of sequences that can potentially fold into G4 structures, no similar genome-wide search has yet been conducted for G4-forming motifs located in gene promoter, enhancer, and silencer regions [22].

In view of the continued rapid growth of the subject and the large number of existing publications in the quadruplex field, we consider it opportune to review several of these topics and to gain insight into where quadruplex-targeted drug design is heading.

FUNDAMENTALS OF QUADRUPLEX STRUCTURES

The tendency of guanosine monophosphate (GMP) or guanine-rich poly- and oligonucleotides to self-assemble has been recognized for several decades [15]. The resulting unusual physical properties have been observed for more than 100 years [23]: in 1910, Bang noted that these gel-like substances were readily formed in aqueous solution. The quadruplex field received a further impetus later, when a biological role for these structures in eukaryotic telomeres was proposed in 1988–89 [7, 24] and NMR/crystal structures started emerging for these novel quadruplex arrangements [25].

Guanine quadruplex-forming sequences, known as quadruplexes or G-quartets (the name tetraplex is occasionally also used), have been defined as those containing a minimum of four short G-tracts, arranged in the following general sequence: G_o X_p G_o X_p G_o X_p G_o, where G_o is a tract of o guanines involved in tetrad formation and X_p is a stretch of p nucleotides of any type involved in loop formation [26-29].

The four guanine bases form a square co-planar array (the core of a quadruplex) in which each guanine pairs with two neighbors by Hoogsteen hydrogen bonding and each base is both a hydrogen bond donor and a hydrogen bond acceptor. This arrangement has the loops positioned on the exterior, helping to hold the overall structure intact [14, 19, 30]. Loops can be diagonal, lateral or chain-reversal (also termed propeller), and the presence of a particular loop type depends on the number of G-quartets comprising a quadruplex, on loop length and on sequence [19]. The backbone strands that support a tetrad also differ in orientation. Furthermore, G4 structures are polymorphic regarding the G-tetrad core and the loops. This structural polymorphism depends strongly on sequence and on experimental conditions such as the nature of the cations [12]. Consequently, G4 stability depends on the hydrogen bonding between guanines and the stacking of the hydrophobic G-quartets, as well as on the presence of a monovalent cation within the channel formed by the stacked G-quartets.

Owing to their physiological importance, K+ and Na+ ions are the most extensively characterized cations with respect to their ability to stabilize G4 structures. However, a number of other cations are also known to promote G4 formation [22, 31]. The capability of metal ions to induce a folded G-quartet structure has been proposed to follow the order K+ > Na+ > Rb+ > Li+ or Cs+ [32].

STABILITY UNDER PHYSIOLOGICAL CONDITIONS

Quadruplexes may find applications in diverse biological fields; in this sense, it is essential that such structures are stable inside a cellular environment. However, their formation often requires non-physiological concentrations of divalent cations or relatively acidic conditions. Therefore, in the absence of stabilizing factors, their formation in the intracellular medium is probably much more complicated. First, most G-rich regions of the genome form a stable Watson–Crick duplex throughout most of the cell cycle. Second, genomic DNA is bound to both nonspecific (e.g. structural histone H1) and specific (e.g. transcription factors and other enzymatic or regulatory proteins) proteins. Finally, although the in vitro formation of intermolecular G-quadruplexes requires high DNA concentrations, this might not be the case in vivo [13].

In contrast, most quadruplexes are stable under physiological conditions. The intracellular ionic environment could explain this behavior to some extent: the high concentration of K+ inside cells (140 mM) significantly promotes the formation of folded G-quartet structures within the nucleus [32, 33]. Moreover, various G4-interactive proteins have been found that stabilize, modify or resolve these nucleic acid structures, providing circumstantial evidence for their physiological relevance. These findings suggest that cells might have mechanisms for both purposes and, most importantly, that G-quadruplexes might be normal structures that are routinely assembled and disassembled within cells [13].

G-QUARTET STRUCTURE PREVAILS IN THE GENOME. POSSIBLE BIOLOGICAL ROLES

Telomeric G4 and Telomerase: Targets in Anticancer Drug Design

As most guanine-rich sequences in DNA are at the ends of chromosomes, their physiological role in telomeric regions has received particular attention. The human telomeric G4 entered the spotlight in 1991, when a seminal report showed that formation of the quadruplex structure inhibited the activity of telomerase [34].

Telomeres protect chromosomes from degradation and loss of essential genes and allow the cell to distinguish between double-strand breaks and natural chromosome ends. Due to these functions, they are essential for chromosomal stability and genomic integrity. Moreover, they provide sites for recombination events and transcriptional silencing and appear to play a critical role in cellular aging and cancer [35-37].

Telomeres, nucleoprotein complexes located at the ends of eukaryotic chromosomes, are repetitive non-coding DNA sequences together with an array of telomeric proteins. The telomeric DNA typically consists of a double-stranded sequence with a single-stranded 3´-end overhang. This overhang is typically 100–200 bases long and is extensively associated with various proteins such as telomeric-repeat-binding factor 1 (TRF1) and TRF2, as well as with the telomere-capping protein called protection of telomeres 1 (Pot1) [25, 38]. Vertebrate telomeres have been conserved over evolutionary time and are typified by the hexanucleotide repeat 5´-TTAGGG [39-41].

In normal cells, telomeres progressively decrease in length with each successive round of cell division because the DNA replication machinery is unable to completely replicate the extreme end of linear DNA molecules, owing to the so-called ‘end-replication effect’. As a consequence, telomere length in almost all middle-aged human tissues is approximately half that of the newborn length [42]. Once a critical level of telomere shortening is reached (the Hayflick limit), cells enter an irreversible phase of growth arrest, termed replicative senescence, in which they cease to divide and might then be directed to apoptotic cell death [43].

In striking contrast, telomeres of cancer cells show very different behavior, and their length is highly regulated. They do not shorten on replication, but remain constant in length in succeeding generations [37, 42]. The telomere erosion problem has been bypassed through the activation of telomere maintenance systems, either by upregulating the expression of the telomerase enzyme (in at least 80–85% of tumor cells) or through a telomerase-independent alternative lengthening of telomeres (ALT) mechanism in 10–15% of the remaining tumor cell lines [44-46].

Telomerase is a unique reverse transcriptase whose main function is elongating and maintaining telomeres. The ribonucleoprotein consists of two main components: the RNA component (TR), containing the antisense template for telomere synthesis, and the catalytic protein, hTERT. The role of the major protein subunit, hTERT, is to catalyze the polymerization of nucleotides. Telomerase is expressed at varying levels in human cells and tissues. A high expression level is found within the germ line, embryonic stem cells and cancer cells, but expression is greatly down-regulated in most adult tissues due to down-regulation of the catalytic subunit of telomerase. In cells that must proliferate, such as B and T lymphocytes, reactivation of telomerase accompanies cell activation [9, 33, 37].

Thus, telomeres represent a mechanism of control of the lifespan of normal cells, and their lengthening is associated with the immortalization required for the proliferation of cancer cells. This highly selective activity suggests the application of telomerase as a molecular marker in cancer diagnostics, as there is no measurable telomerase activity in normal somatic tissue [47]. Telomeres and telomerase also represent promising targets for the therapeutic treatment of neoplasia. Inhibition of telomerase can be achieved by stabilization of G4 DNA structures [45, 48].

The immediate outcome of telomerase inhibition is to halt further telomere extension, leading to progressive shortening until a critically short length is eventually reached. So, for tumor cells that have long telomeres, the effect will be observed only after an extensive lag time. This implies that antitelomerase drugs acting via direct catalytic inhibition might be of benefit only in tumor cells that have short telomeres, and even then only as second-line therapy after tumor debulking, either by surgery or radiotherapy or after the use of conventional cytotoxic agents.

However, inhibition of telomerase can be achieved by stabilization of G4 DNA structures with a range of ligands [9, 48]. Furthermore, these ligands have the ability to produce short- and long-term effects on cell viability. Replicative senescence is commonly observed after short time-scales following exposure to ligands, whereas telomere shortening is a long-term effect, dependent on telomere length and doubling times [49, 50]. The concept of ligand stabilization and binding to quadruplexes has developed into a more generalized approach, which is applicable in principle to both telomerase-positive tumor cells and those showing the ALT phenotype. Thus, a strategy that involves uncapping of telomere ends might well provide significant further therapeutic advantages [41].

A detailed knowledge of the intramolecular human telomeric G4 structure and its interactions with small molecules is critical for the rational design of new structure-specific DNA-binding ligands [18, 36].

G4 ligands were first evaluated as telomerase inhibitors based on the initial paradigm of lifespan control by telomerase activity and telomere length, but it is now known that the antiproliferative effect of G4 ligands is also independent of the presence of telomerase activity. Such evidence comes from several studies with triazine derivatives (12459, 115405), telomestatin and pyridine dicarboxamide derivatives, which even suggest that the direct target of these ligands is the telomere rather than telomerase activity [51].

Other G-Rich Genomic Regions

G-rich regions capable of forming G4 DNA have been identified in a number of other genetic contexts, at both the DNA and RNA level, and their physiological roles as a natural part of the genome have attracted interest [52]. The rapid growth in the study of physiologically relevant quadruplexes has been particularly fuelled by the availability of relatively complete genomic datasets and bioinformatic tools. These tools allow rapid discovery of sequences capable of forming quadruplexes (putative quadruplex sequences, or PQS) in interesting genomic locations [53, 54].

Putative G4 motifs have been located in human promoter regions and in the 5’ untranslated regions of human mRNAs, suggesting that these formations may play a role in transcriptional regulation and thereby in gene expression [5, 22, 35, 37, 55]. Because G-quadruplexes control gene expression, these structures are potential targets in cancer therapy [21]. These structures have also been demonstrated in vitro in fragile X syndrome nucleotide repeats [56], in HIV-1 RNA sequences [57, 58], in the immunoglobulin switch region [6] and in several mammalian gene-promoter regions, such as c-myc, c-kit, K-ras, hTERT, etc. [20, 22, 32, 59]. The Greglist database, for example, lists G4-regulated genes such as ATM, BAD, AKT1, LEPR, UCP1, APOE, DKK1, WT1, WEE1, WNT1 and CLOCK. Furthermore, it also contains promoter PQS in human microRNAs [53, 60]. Although RNA G-quadruplexes are likely to be formed in vivo and are more stable than DNA G-quadruplexes [61], their structure is less well understood.
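The general quadruplex motif G_o X_p G_o X_p G_o X_p G_o given earlier in this review lends itself to a simple pattern search of the kind used to locate putative quadruplex sequences. The sketch below is a minimal illustration and not the algorithm of any specific tool cited here; the function name `find_pqs` and the default limits (G-tracts of at least 3 guanines, loops of 1 to 7 nucleotides) are assumptions chosen for the example.

```python
import re

def find_pqs(sequence, min_g=3, min_loop=1, max_loop=7):
    """Return (start, end, motif) for each non-overlapping match of the
    pattern G_o X_p G_o X_p G_o X_p G_o in a DNA or RNA sequence.
    The default tract and loop lengths are illustrative assumptions."""
    g_tract = "G{%d,}" % min_g
    loop = "[ACGTU]{%d,%d}" % (min_loop, max_loop)
    # Three (G-tract + loop) units followed by a fourth, closing G-tract
    pattern = re.compile("(?:%s%s){3}%s" % (g_tract, loop, g_tract))
    return [(m.start(), m.end(), m.group())
            for m in pattern.finditer(sequence.upper())]

if __name__ == "__main__":
    # Four human telomeric repeats (5'-TTAGGG motif) preceded by one base
    print(find_pqs("AGGGTTAGGGTTAGGGTTAGGG"))
    # → [(1, 22, 'GGGTTAGGGTTAGGGTTAGGG')]
```

A search of this kind only flags candidate sequences; whether a quadruplex actually forms, and with which topology, depends on the cations and conditions discussed above and has to be established experimentally.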

Quadruplex Formation by the c-myc Promoter

At present, one fascinating gene, c-myc, provides tantalizing insights into the potential impact of G4 DNA formation on genetic stability and gene function [35, 62]. The c-myc gene is a master controller oncogene that is nearly ubiquitous in human cancers; it is over-expressed in over 80% of them [63] and encodes a transcription factor that is a key regulator of cell proliferation. Deregulated c-myc expression therefore contributes to tumorigenesis in many different tissues. A variety of mechanisms contribute to deregulation of c-myc, including translocation, mutation, and amplification. c-myc acts as both a transcriptional activator and a repressor, inducing genes involved in proliferation, such as CAD, CDC25A and ODC, and repressing genes involved in growth arrest, such as GADD45. c-myc increases the expression level of hTERT, and hence telomerase activity, suggesting another possible mechanism that would contribute to cell proliferation and tumorigenesis [64]. It has also been found that c-myc plays a role in the apoptotic response [65].

The c-myc gene contains three exons and three introns, and c-myc expression can be driven by several upstream promoters [33]. The c-myc oncogene promoter sequence found in the nuclease hypersensitivity element III1 (NHE III1), of sequence d(TG4AG3TG4AG3TG4A2G2) and containing six G runs, is one such example that has been shown to form a G4 structure. This element controls ~90% of the transcription of this gene [60, 66]. The c-myc NHE III1 element can form two different intramolecular G4 structures (basket and chair), but only the latter seems to be biologically relevant to transcription inhibition.

Due to the important functions of c-myc, a strategy of finding new chemical entities able to selectively interfere with c-myc expression has emerged [19, 32, 67, 68]. In addition, the inherently different molecular recognition properties of quadruplexes compared with those of duplex DNA make them attractive molecular targets for the design of small molecules that selectively interfere with oncogene expression [69].

Recently, a connection has been reported between the expression of the Werner syndrome gene (WRN), whose loss of function has been implicated in a human progeroid syndrome (WS), and the Myc oncoprotein. Myc overexpression directly elevates transcription of the WRN gene, whose presence is required to avoid senescence during Myc proliferative stimuli. This new link between Myc and WRN may also identify new opportunities for the treatment of myc-overexpressing human tumors [70, 71].

The RecQ family of DNA helicases, which includes the Bloom's (BLM), Rothmund–Thomson (RTS or RECQ4) and Werner's (WRN) syndrome gene products, is apparently unique among cellular helicases in its ability to efficiently disrupt G4 DNA [72, 73]. For this reason, trisubstituted acridines, which inhibit these helicases, have been proposed as antitumor agents for the disruption of both telomerase-dependent and telomerase-independent telomere maintenance [72].

Quadruplexes in the c-kit Promoter

Two distinct G4-forming sequence motifs were identified in the core promoter of the human c-kit oncogene [74, 75]. Both sequences occur only once in the human genome [74, 76].

By contrast with c-myc, the c-kit proto-oncogene is of more restricted proven clinical relevance. However, most human gastrointestinal stromal tumors (GIST) are driven by activating mutations in the proto-oncogene kit [77]. Fortunately, the kit protein also differs from c-myc in that it can be successfully targeted by a number of selective small-molecule compounds (notably Gleevec, which was rapidly approved for human clinical use in GIST patients following extraordinary responses in clinical trials) [78]. The benzo[a]phenoxazines are G4-binding molecules that modulate c-kit expression [79]. Constitutive activation of c-kit expression is probably the primary causative event in this traditionally hard-to-treat cancer [33]. In common with the c-myc ligands, the c-kit ligands described to date are structurally diverse, although almost all contain a polycyclic aromatic core platform. A recent study suggests that ligands that target and stabilize more than one type of quadruplex topology (telomeric and c-kit) could still display high anticancer activity [78], which is a very interesting postulate for future research.

Other Genomic Quadruplexes

Studies have also confirmed the presence of a G4-forming sequence within the promoter of k-ras, which is also sensitive to a reduction in transcriptional activity induced by the G4-interactive ligand TMPyP4. Promoter G-quadruplexes have now been reported for a number of other cancer-related genes for which there are in vitro biophysical data, such as c-myb, c-Fos, Rb, RET, VEGF, PDGF, Hif-1α and bcl-2, in some cases with preliminary chemical biology evidence in support of the ‘promoter-G4’ hypothesis [19]. Targeting promoter G-quadruplexes with small molecules can aim for selectivity at the individual gene level, in which case the ligand must be able to discriminate between different G4 structures [80].

Strategies Proposed to Find Out New G4 Agents

The main objective in the search for new G4-binding ligands is to develop appropriate compounds for clinical evaluation and eventual licensing for human use. With this purpose, Neidle proposed a scheme for the design and evaluation of G4 ligands [81]. The first step is molecular modeling based on crystallographic structures of quadruplexes, followed by medicinal and synthetic chemistry. The next phases are cell-free high-throughput FRET assays and cell-based functional assays. The final stage comprises experiments on long-term cell exposure, senescence, DNA damage and apoptosis assays, together with pharmacokinetics, tumor xenografts and in vivo functional assays. All the steps operate as a dynamic process.

However, the structural information from X-ray crystallography or NMR considered in the first step has only rarely been used for the rational design of improved quadruplex-binding ligands. This is fundamentally due to the lack of appropriate structural starting points. Most structure-based design studies have employed native G4 structures as starting points [81]. Only a small number of G4–ligand 3-D structures have been reported to date [82-88].

Random screening of large chemical libraries has very rarely been used until now in the search for new G4-binding ligands, in striking contrast with its extensive use in other therapeutic areas. The main advantages of in silico virtual screening include its speed, enrichment rates and affordability, and the avoidance of the laborious, slow and costly process of experimental screening and library synthesis/collection [89]. Another advantage of virtual screening of drug-like compounds is that chemical diversity is generated without the need for chemical synthesis. The confirmed hits identified in a screening can be used to guide further synthesis and quantitative structure–activity relationship analysis. This methodology uses software to screen a huge number of compounds and to determine how the compounds can fit into an active site or inhibit a target of interest [90].

Computational Tools Most Used for Discovering/Developing and Virtual Screening of New Compounds

There are two computational approaches to drug design: those based on ligand structure and those based on target structure. Ligand-based methods require knowledge of the ligand structure and its biological activity. The structure of the active compounds (ligands) is compared with others to discover chemical and morphological similarities, based on the paradigm that similar compounds show similar biological activities [91]. Among the ligand-based practices, classic Structure-Activity Relationships (SAR), Quantitative Structure-Activity Relationships (QSAR) and pharmacophore models are the most relevant to this field. SAR is an approach designed to find relationships between the chemical structure (or structure-related properties) and the biological activity (or target property) of the studied compounds. Several machine learning methods have supported SAR studies in unraveling structure-activity relationships [92-137]. QSAR models consist of a function established between physical, biological, biophysical or chemical properties and molecular descriptors [138]. The resulting function can be fitted to a linear or nonlinear equation and is expressed as a mathematical combination of molecular descriptors. Models of this kind usually require for their construction a training set of compounds with activities determined against the specific target under study [14]. In the case of pharmacophore-based models, the approach rests on the accumulation of specific structural and conformational features of the ligands and on how these characteristics can be used to explain the interaction with a specific target receptor [139]. In addition, pharmacophore modeling can be sub-divided into two categories, structure-based and ligand-based approaches [140]. Structure-based pharmacophore modeling demands the initial building of a three-dimensional molecular model from available structural data, and this class can in turn be subdivided into macromolecule-ligand-complex-based and macromolecule (without ligand)-based approaches [141].

Within the structure-based methods, molecular docking is one of the most powerful approaches to discover new small-molecule inhibitors and to explain their binding mode with the receptor [142-147]. For this type of modeling it is mandatory to know the target structure, whether a protein or a DNA sequence. The docking process usually involves two major stages: the examination of the binding interaction of the compounds with the target, followed by a score assignment reflecting the binding energy of the ligand-target complex [14, 89, 91]. However, the scoring and ranking of the top poses of every single ligand in the binding pocket of the target is one of the most controversial aspects of docking [148].

If we compare pharmacophore modeling and three-dimensional molecular docking, the latter is considered more complex and computationally expensive. However, the popularity of molecular docking methods for the identification of new active ligands has increased [140], with the continuous accessibility of high-resolution crystal structures and the rapidly decreasing cost of computational power in recent years. Compared to pharmacophore modeling, molecular docking offers several distinct advantages. Firstly, the binding mode of the compound with the target can be predicted, allowing the important features of the ligand-target interaction to be identified. Secondly, molecular docking can uncover bioactive compounds with entirely different chemical scaffolds from reported ligands, and can discover ligands for novel biomolecular targets [14].

Computational Tools in G-Quadruplex Drug Discovery

Computational tools have frequently been applied to the field of G-quartets for predicting intramolecular G-quadruplexes in nucleotide sequences [28, 61], and to describe possible interactions of small ligands with the different G4 sequences [149-151]. For example, Wang et al. [152] employed qualitative molecular modeling using AutoDock (v.4.0) software and G4 structures available from the Protein Data Bank to explain the interaction mode between the azacalixarene methylazacalix[6]pyridine (MACP6) and G4 structures. Dhamodharan et al. designed a class of 1,8-naphthyridine-based ligands to determine their interaction with human telomeric DNA and promoter G-quadruplex-forming DNAs (c-kit1, c-kit2, and c-myc). The results of molecular dynamics simulations indicate that the naphthyridine-based ligands with quinolinium and pyridinium side chains form a promising class of quadruplex DNA stabilizing agents with high selectivity for quadruplex DNA structures over duplex DNA structures [153].

There are some reports covering the general scheme for G4 drug discovery (from virtual screening to experimental evaluation) [32, 154-157]. In particular, the telomeric G-quadruplex has attracted attention as a target for the inhibition of telomerase through its stabilization. One of the first attempts to discover new candidate telomerase inhibitors employing computational tools was reported in 1999 [158]. The authors identified berberine-like compounds (the rhodacyanine derivatives MKT077 and FJ5002) as potent telomerase inhibitors, but they had no effect on the G-quartet structure [159-162].

Structure-based methods have been effective in the discovery of new G4 ligands; docking studies in particular have played an important role as a virtual screening method.

The first compound identified in a virtual screening employing a structure-based method was (E)-N'-((1H-indol-3-yl)methylene)-4-methyl-3-phenyl-1H-pyrazole-5-carbohydrazide (1, Fig 1) [149]. The X-ray crystal structure of the intramolecular human telomeric G4 DNA (PDB code: 1KF1) [163] allowed the virtual screening of a library of over 100,000 compounds through molecular docking using the ICM method (Molsoft) [164]. The authors calculated the binding energy of the complex and obtained a value of -38.46 kcal/mol. The result suggests that the ligand binds strongly to the parallel intramolecular G-quadruplex. They predicted that the ligand stacks on the ends of the G-quadruplex, close to the 3´-terminal face of the structure. The ability of the compound to stabilize the G-quadruplex structure was confirmed using a FRET assay.

In addition, millions of compounds belonging to the ZINC database [165] were screened in silico using Surflex-Dock software, selecting for the antiparallel Hybrid-1 quadruplex target (PDB ID: 2HY9) [166], which is one of the proposed quadruplex structures existing in the human telomere. The relative binding of the small ligands to the site of interest compared to the binding to other sites was used to determine the best hits [167]. Finally, 160 compounds were selected for additional experimental evaluation, but only one went through to the next screening phase. Nevertheless, the authors concluded that the in silico and in vitro platform is proficient at selecting a compound with G4 binding activity and possible anti-cancer activity.

As seen above, the discovery and development of a new drug is a difficult and expensive process; because of this, commercial drugs are considered privileged scaffolds for the development of new therapeutics [168-170]. Ma et al. [171] reported an FDA-approved drug as a c-myc G-quadruplex DNA stabilizer. Around 3000 compounds belonging to a database of FDA-approved drugs were docked using the ICM method (ICM-Pro 3.6-1d molecular docking software (Molsoft)) [149, 164]. The target under study was the c-myc NHE III1 element. Three out of 50 methylene blue derivatives were selected on the basis of their c-myc G4 binding ability (Fig 1).

Fig. (1). Molecular structures of compounds identified as G4 stabilizers employing structure-based methods.

ligand-based studies to provide useful insights in the design and discovery of new G4 ligands [17, 80, 159, 176-183].

The anthraquinones were the first group of compounds to be shown to have G4 affinity coupled with telomerase inhibitory activity [17, 184, 185].

Fonsecin B1 is a naphthopyrone pigment obtained from a fungal source (3, Fig 1). It was discovered as stabilizer of c-
Later on acridines were de- myc G4 DNA [157] by the same molecular docking screen- veloped [186] as approximate structural mimetics of an- ing method from a database made up of 20 000 natural prod- thraquinones. The majority of quadruplex-binding ligands ucts. Similarly, the carbamide 1 was found ( 4, Fig 1 ) a prom- reported to date have been discovered by screening small ising c-myc G4 ligand [172]. focused compound libraries that at the outset have at least a superficial similarity to existing ligands of this class [184]. Randazzo et al. [154] reported other virtual screening study employing G4 structure [d(TGGGGT)] 4 Numerous SAR studies have been reported for several (PDBcode1S45) [173] and the docking software Autodock4 classes of small molecules that bind to telomeric G4 DNA (AD4) [174, 175] with the purposes to find new DNA quad- and inhibit telomerase activity such as porphyrins [187, 188], ruplex groove binders compounds. Thirty compounds were perylenes [189], amidoanthracene-9,10-diones[176], 2,7- considered for further NMR evaluation due to their ability to disubstituted amidofluorenones [190], acridines [179, 191], form H-bonds with any of the guanine bases and/or to estab- ethidium derivatives [192, 193], disubtituted triazines [64], lish an electrostatic interaction with the backbone phosphate fluoroquinoanthroxazines [156], indoloquinolines [194], groups were. Finally, six compounds out of them showed dibenzophenanthrolines [195], bisquinacridines [196], penta- evidence that suggest a groove binding interaction. The best cyclic acridinium [197], telomestatin [198, 199] 2,6-pyridin- compound appeared to span the grooves and produced large dicarboxamide derivatives [200, 201] and more recently a changes in the NMR spectrum of the G4, indicating high series of tri- and tetra-substituted naphthalene diimides [202] affinity. 
and heterocyclic dications have been showed exceptional Structural-based approaches have not exclusively applied affinity for telomeric G4 DNA [203]. to the discovery of new G4 ligands; several ligand-based On the other hand triazine derivatives (12459, 115405), methods represented by SAR and QSAR studies been ap- telomestatin and the pyridine dicarboxamide derivatives rep- plied to this aim. SAR approaches were pioneers among the resent an excellent starting point for further SAR analysis for Computational Tools in the Discovery Current Topics in Medicinal Chemistry , 2012, Vol . 12 , No . 24 2849 diverse modes of G4 recognition and subsequent structure BYL [224]. In the work hydrophobic, H-bond acceptor and optimization for drug development. H-bond donor maps were built where favorable and disfa- vored regions for locating different functional groups and Recently a QSAR model to predict the IC value of te- 50 interactions were identified. Finally, the authors concluded lomerase inhibitors by G4 stabilization mechanism was pub- lished for Cabrera-Pérez et al. [204]. The model employed that the CoMSIA model generated by a combination of hy- drophobic, hydrogen bond acceptor and hydrogen bond do- acridines developed by Neidle et al . [186, 191, 195, 205- nor fields (HAD) presented good correlative and predictive 211]. This model is based on multiple linear regression properties. The presence of hydrophobic fields in CoMSIA (MLR), considering all the acridines acting by a G4 stabili- models suggested that an optimum value of lipophilicity zation mechanism. The variable selection was carried out could influence the telomerase inhibitory activity of an- using Genetic Algorithm-Variable Subset Selection (GA- VSS) method implemented in MobyDigs [212]. We used thraquinone and acridone derivatives. internal validation, specifically leave-one-out cross- Huang et al. 
[225] reported the development of a 3D- validation, bootstrapping and Y-scrambling methodologies QSAR ligand-based pharmacophore model based on the to demonstrate the robustness and predictive ability of the acridine derivatives. Pharmacophore Generation module/ models. Finally we reported a model, employing GETA- Discovery Studio (version 2.5, Accelrys Inc., San Diego, WAY molecular descriptors, capable of predicting if a new CA) was used for building the pharmacophore model. The acridine would have or not have activity against telomerase first step was to create different hypotheses, based on the enzyme by the G4 stabilization mechanism. activity values of the training set molecules. For this aim HypoGen was used and 10 top-scored hypotheses were kept. We also proposed another QSAR strategy, where Linear To denote the reliability of the model three statistical pa- Discriminant Analysis (LDA) was used [213]. The starting rameters were used: null cost, fixed cost and total cost val- point was a dataset of acridines and acridones [186, 191, 205-207, 209], the corroboration of the predictive capacity of ues. The first hypothesis showed that, for obtaining an active molecule, the presence of one hydrogen bond donor, two the model developed was done using an external set of positive ionizable sites and one hydrophobic group is neces- molecules [214]. This paper concluded that the activity is sary. All training set molecules used in this study were stimulated if the number of secondary aromatic amines acridine-based structures; nevertheless, the activities of these (nArNHr) and number of aromatic carbons (nCar) increase. compounds were significantly different. For this reason the On the other hand, the increment of unsubstituted benzene atoms (C(sp2)) is a factor against the potency. 
In both pa- structures of the different acridine were scanned again and contributions of diverse substituent groups and the aromatic pers, molecular descriptors implemented in DRAGON soft- ring were not considered important features in the ware were used [215]. It is important to mention that the process of building the hypothesis. The validation of the Hy- analysis of how these descriptors (geometrical or one- pothesis was done by means of Fischer’s randomization dimensional) could be connected to the activity of this kind method, Receiver Operating Characteristic (ROC) curve of compounds in agreement with other reported SAR studies about acridines [216]. analysis and enrichment factor measurements [226]. Then, the hypothesis was used in a virtual screening of an in-house Acridines have been the subject of many studies; Yadav database of more than 5000 natural products. Filters were et al. [217] reported a 3D-QSAR study applied to trisubsti- applied to the molecules identified initially in the pharma- tuides acridines using Comparative Molecular Field Analysis cophore model, according to molecular weight and syntheti- (CoMFA) [218] and Comparative Molecular Similarity In- cally viability, to reduce the number of candidates. Four tri- dices Analysis (CoMSIA) [219]. The intention of this study aryl-substituted imidazole derivatives were identified and a was to expose the structural requirements for acridine deriva- fast experimental screening with FRET assay was carried tives, which would eventually contribute and accompani- out. Only molecule 1-(4-(2,5-bis[4-(4-methylpiperazin-1- ment the rational drug-design efforts. The molecular descrip- yl)phenyl]-1H-imidazol-4-yl)phenyl)-4-methylpiperazine tors available in the SYBYL package were used. CoMSIA TSIZ01 (Fig. 2 ), was found to bind and stabilize telomeric models generated by a combination of steric, electrostatic, G4 DNA significantly. 
Other studies were done to prove the hydrophobic, hydrogen bond donor fields exhibited good interaction between TSIZ01 and telomeric G4 by DNA, CD, correlative and predictive properties. Finally, after a detailed FRET and SPR experiments [225] FRET experiments with study of CoMFA and CoMSIA for acridines, the authors the use of human telomeric sequence F21T, exhibited an systematized a number of characteristics related directly with increment in melting temperature of the 23.5 °C at 1 M in the increase of the potency of this kind of compounds. The K+ solution. importance of a steric bulk attached to the ninth position of the acridine ring, the possibility to form hydrogen bonds, the Recently we described describes a non-congeneric data- base of 783 ligand inhibitors of the telomerase enzyme by number of protonated nitrogen atoms and the maintenance of stabilization of the G4 structure [227]. To our knowledge the bulkiness of the molecule are the most important factors. this is the first database that reports a summary of com- The use of anthraquinones as G4 stabilizers and inhibi- pounds with this activity. In this work different families of tel tors of the telomerase enzyme have been widely reported compounds appear, for which telomerase enzyme IC 50 val- [176, 178, 179, 194, 220-222]. Acridones [150, 206, 223] ues are available. In all cases there is evidence that the com- represent another class of G4 stabilizers similar to acridines pounds act by the G4 stabilization mechanism. A significant in structure and activity. A three dimensional QSAR CoM- number of models is provided by us, divided in different SIA model based in this kind of chemicals was developed families of molecular descriptors implemented in Dragon with the use of molecular descriptors implemented in SY- [215], we also employ different cut off values for making the 2850 Current Topics in Medicinal Chemistry , 2012, Vol . 12 , No . 24 Castillo-González et al.

LDA models. A consensus of the best models for each cut-off is presented. Using the consensus models, a prediction was made on the DrugBank database [228-230]. The positively predicted compounds include prochlorperazine, among others. Their capacity as weak G4 stabilizers was checked, but their inhibitory effect on telomerase was not tested. In addition, a dibenzocycloheptene nucleus present in several of these compounds was identified, and it was suggested that some modifications of this nucleus could increase the stabilizing effect on the G4.

Fig. (2). Structure of the TSIZ01 molecule, 1-(4-{2,5-bis[4-(4-methylpiperazin-1-yl)phenyl]-1H-imidazol-4-yl}phenyl)-4-methylpiperazine.

The ingenious computational campaign for the discovery of new ligands followed by Randazzo et al. [155] is worth recommending. Starting from the structure of a well-known coumarin and using the Tanimoto similarity index, a virtual screening was performed on a set of 7 million ligands belonging to the ZINC database. This is a very simple methodology for finding new compounds in huge databases. Five of the predicted coumarin derivatives were evaluated as G4 stabilizers, and three of them showed activity, corroborated by NMR. The authors also checked the activity of another 20 chromone derivatives. Finally, they showed evidence of the ability of the new ligands to induce DNA damage and cell-cycle arrest.

An example of conservative molecular design based on relevant crystal structure information is presented by Neidle et al. [151]. The main idea of the work was to maintain the principle of the three arms of this trisubstituted acridine binding in three grooves of a G4, as in the BRACO-19 telomeric quadruplex crystal structure. Following this principle, the structure-based design, synthesis and preliminary assessment of bis-triazole quadruplex-binding ligands containing two alkylamine side-chain arms connected to a benzene ring through 1,4-triazoles (a structural mimetic of BRACO-19) was carried out. The affinity and selectivity of these compounds for telomeric G4 DNA were evaluated using electrospray mass spectrometry (ESI-MS). The tris-triazole ligands had higher selectivity for quadruplex over duplex DNA than BRACO-19, with a > 1000-fold difference in Kd values. Finally, three compounds were selected for future testing in biological studies.

CONCLUSIONS

Despite the role of guanine-rich sequences in cancer biology, computational techniques have been little used for this target compared with other targets usually explored in cancer therapy. Although several computational methods have dealt with screening for the discovery/design of new G4 stabilizers, ligand-based methods, especially QSAR approaches, are at an early stage of application. In this sense, we have collected a comprehensive database of G4 ligands from the literature and developed some QSAR models. Both the database and the QSAR models could be incorporated into the mainstream of the rational search for G4 ligands as anticancer agents.

CONFLICT OF INTEREST

The author(s) confirm that this article content has no conflict of interest.

ACKNOWLEDGEMENTS

We are grateful to the Flemish Interuniversity Council (Belgium) for financial support to the project “Strengthening postgraduate education and research in Pharmaceutical Sciences”, of which this research is part. The authors acknowledge the financial support of the Agencia Española de Cooperación Iberoamericana para el Desarrollo (AECID) to the projects: 1- A1/036687/11: Montaje de un laboratorio de Química Computacional, con fines académicos y científicos, para el diseño de potenciales candidatos a fármacos, en enfermedades de alto impacto social and 2- DCI-ALA/19.09.01/10/21526/245-297/ALFA 111(2010)29: Red-Biofarma. Red para el desarrollo de metodologías biofarmacéuticas racionales que incrementen la competencia y el impacto social de las Industrias Farmacéuticas Locales. D.C. is grateful to all the members of the INSERM U869 laboratory in Pessac, France, for helpful discussions. D.C. is also grateful to Dr. Liane Saíz-Urra (Rega Institute, KU Leuven), Guillermin Agüero-Chapin (CBQ, UCLV) and Professor Dr. Mathews Froeyen (Rega Institute, KU Leuven) for their helpful comments regarding the results in this work.

REFERENCES

[1] Trichopoulos, D.; Lagiou, P.; Adami, H.O. Towards an integrated model for breast cancer etiology: the crucial role of the number of mammary tissue-specific stem cells. Breast Cancer Res., 2005, 7, 13-17.
[2] Cairney, C.J.; Bilsland, A.E.; Evans, T.R.; Roffey, J.; Bennett, D.C.; Narita, M.; Torrance, C.J.; Keith, W.N. Cancer cell senescence: a new frontier in drug development. Drug Discov. Today, 2012, 17, 269-276.
[3] Dalla Via, L.; Nardon, C.; Fregona, D. Targeting the ubiquitin-proteasome pathway with inorganic compounds to fight cancer: a challenge for the future. Future Med. Chem., 2012, 4, 525-543.
[4] Muller-Schiffmann, A.; Sticht, H.; Korth, C. Hybrid compounds: from simple combinations to nanomachines. BioDrugs, 2012, 26, 21-31.
[5] Balasubramanian, S.; Hurley, L.; Neidle, S. Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nat. Rev. Drug Discov., 2011, 10, 261-275.
[6] Sen, D.; Gilbert, W. Formation of parallel four-stranded complexes by guanine-rich motif for meiosis. Nature, 1998, 334, 364-366.

[7] Sundquist, W.I.; Klug, A. Telomeric DNA dimerizes by formation of guanine tetrads between hairpin loops. Nature, 1989, 342, 825-829.
[8] Membrino, A.; Cogoi, S.; Pedersen, E.B.; Xodo, L.E. G4-DNA Formation in the HRAS Promoter and Rational Design of Decoy Oligonucleotides for Cancer Therapy. PLoS One, 2011, 6, e24421.
[9] Miller, K.M.; Rodriguez, R. G-quadruplexes: selective DNA targeting for cancer therapeutics? Expert Rev. Clin. Pharmacol., 2011, 4, 139-142.
[10] Amato, J.; Pagano, B.; Borbone, N.; Oliviero, G.; Gabelica, V.; Pauw, E.D.; D'Errico, S.; Piccialli, V.; Varra, M.; Giancola, C.; Piccialli, G.; Mayol, L. Targeting G-quadruplex structure in the human c-Kit promoter with short PNA sequences. Bioconjug. Chem., 2011, 22, 654-663.
[11] Parkinson, G.N. Fundamentals of Quadruplex Structures. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 1-27.
[12] Phan, A.; Kuryavyi, V.; Ngoc, K.; Patel, D. Structural Diversity of G-Quadruplex Scaffolds. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 81-99.
[13] Han, H.; Hurley, L.H. G-quadruplex DNA: a potential target for anti-cancer drug design. Trends Pharmacol. Sci., 2000, 21, 136-142.
[14] Ma, D.; Pui-Yan Ma, V.; Shiu-Hin Chan, D.; Leung, K.; Zhong, H.; Leung, C. In silico screening of quadruplex-binding ligands. Methods, 2012, 57, 106-114.
[15] Mergny, J.L.; Gros, J.; De Cian, A.; Bourdoncle, A.; Rosu, F.; Saccà, B.; Guittat, L.; Amrane, S.; Mills, M.; Alberti, P.; M., T.; Lacroix, L. Energetics, Kinetics and Dynamics of Quadruplex Folding. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 31-72.
[16] Trotta, R.; De Tito, S.; Lauri, I.; La Pietra, V.; Marinelli, L.; Cosconati, S.; Martino, L.; Conte, M.; Mayol, L.; Novellino, E.; Randazzo, A. A more detailed picture of the interactions between virtual screening-derived hits and the DNA G-quadruplex: NMR, molecular modelling and ITC studies. Biochimie, 2011, 93, 1280-1287.
[17] Hurley, L.H.; Wheelhouse, R.T.; Sun, D.; Kerwin, S.M.; Salazar, M.; Fedoroff, O.Y.; Han, F.X.; Han, H.; Izbicka, E.; Von Hoff, D.D. G-quadruplexes as targets for drug design. Pharmacol. Ther., 2000, 85, 141-158.
[18] Monchaud, D.; Teulade-Fichou, M.P. A hitchhiker's guide to G-quadruplex ligands. Org. Biomol. Chem., 2008, 6, 627-636.
[19] Balasubramanian, S.; Neidle, S. G-quadruplex nucleic acids as therapeutic targets. Curr. Opin. Chem. Biol., 2009, 13, 345-353.
[20] Neidle, S. Genomic Quadruplexes as Therapeutic Targets. In Therapeutic Applications of Quadruplex Nucleic Acids, Academic Press: Boston, 2012; pp 119-139.
[21] Nagatoishi, S.; Isono, N.; Tsumoto, K.; Sugimoto, N. Loop residues of thrombin-binding DNA aptamer impact G-quadruplex stability and thrombin binding. Biochimie, 2011, 93, 1231-1238.
[22] Dexheimer, T.; Fry, M.; Hurley, L. DNA Quadruplexes and Gene Regulation. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 180-207.
[23] Neidle, S. Introduction: Quadruplexes and their Biology. In Therapeutic Applications of Quadruplex Nucleic Acids, Academic Press: Boston, 2012; pp 1-20.
[24] Henderson, E.; Hardin, C.C.; Walk, S.K.; Jr.I, T.; Blackburn, E.H. Telomeric DNA oligonucleotides form novel intramolecular structures containing guanine-guanine base pairs. Biochemistry, 1987, 51, 899-908.
[25] Neidle, S. Introduction: Quadruplexes and their Biology. In Therapeutic Applications of Quadruplex Nucleic Acids, Academic Press: Boston, 2012; pp 1-20.
[26] Todd, A.K.; Johnston, M.; Neidle, S. Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res., 2005, 33, 2901-2907.
[27] Huppert, J.L.; Balasubramanian, S. Prevalence of quadruplexes in the human genome. Nucleic Acids Res., 2005, 33(9), 2908-2916.
[28] D'Antonio, L.; Bagga, P. Computational methods for predicting intramolecular G-quadruplexes in nucleotide sequences. Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference (CSB 2004), 2004.
[29] Parkinson, G.N. Fundamentals of Quadruplex Structures. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 1-27.
[30] Neidle, S.; Parkinson, G.N. Quadruplex DNA crystal structures and drug design. Biochimie, 2008, 90, 1184-1196.
[31] Hud, N.; Plavec, J. The Role of Cations in Determining Quadruplex Structure and Stability. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 100-130.
[32] Jing, N.; Sha, W.; Li, Y.; Xiong, W.; Tweardy, D. Rational drug design of G quartet DNA as anticancer agents. Curr. Pharm. Des., 2005, 11, 2841-2854.
[33] Maizels, N. Quadruplexes and the Biology of G-Rich Genomic Regions. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 228-252.
[34] Zahler, A.M.; Williamson, J.R.; Cech, T.R.; Prescott, D.M. Inhibition of telomerase by G-quartet DNA structures. Nature, 1991, 350, 718-720.
[35] Kaushik, M.; Kaushik, S.; Bansal, A.; Saxena, S.; Kukreti, S. Structural diversity and specific recognition of four stranded G-quadruplex DNA. Curr. Mol. Med., 2011, 11, 744-769.
[36] Cummaro, A.; Fotticchia, I.; Franceschin, M.; Giancola, C.; Petraccone, L. Binding properties of human telomeric quadruplex multimers: a new route for drug design. Biochimie, 2011, 93, 1392-1400.
[37] Patel, D.J.; Phan, A.T.; Kuryavyi, V. Human telomere, oncogenic promoter and 5'-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic Acids Res., 2007, 35, 7429-7455.
[38] Phan, A.T. Human telomeric G-quadruplex: structures of DNA and RNA sequences. FEBS J., 2010, 277, 1107-1117.
[39] Moyzis, R.K.; Buckingham, J.M.; Cram, I.S.; Dani, M.; Deaven, L.L.; Jones, M.D. A highly conserved repetitive DNA sequence (TTAGGG)n present at the telomeres of human chromosomes. Proc. Natl. Acad. Sci. USA, 1988, 85, 6622-6626.
[40] Meyne, J.; Ratliff, R.L.; Moyzis, R.K. Conservation of the human telomere sequence (TTAGGG)n among vertebrates. Proc. Natl. Acad. Sci. USA, 1989, 86, 7049-7053.
[41] Neidle, S.; Parkinson, G. Telomere maintenance as a target for anticancer drug discovery. Nat. Rev. Drug Discov., 2002, 1, 383-393.
[42] Sampedro Camarena, F.; Cano Serral, G.; Sampedro Santalo, F. Telomerase and telomere dynamics in ageing and cancer: current status and future directions. Clin. Transl. Oncol., 2007, 9, 145-154.
[43] Riou, J.F.; Morjani, H.; Trentesaux, C. [Telomeres and telomerase, new targets for anticancer chemotherapy]. Ann. Pharm. Fr., 2006, 64, 97-105.
[44] Olaussen, K.A.; Dubrana, K.; Domont, J.; Spano, J.P.; Sabatier, L.; Soria, J.C. Telomeres and telomerase as targets for anticancer drug development. Crit. Rev. Oncol. Hematol., 2006, 57, 191-214.
[45] Folini, M.; Venturini, L.; Cimino-Reale, G.; Zaffaroni, N. Telomeres as targets for anticancer therapies. Expert Opin. Ther. Targets, 2011, 15, 579-593.
[46] Wu, X.M.; Tang, W.R.; Luo, Y. [ALT--alternative lengthening of telomere]. Yi Chuan, 2009, 31, 1185-1191.
[47] Terrin, L.; Rampazzo, E.; Pucciarelli, S.; Agostini, M.; Bertorelle, R.; Esposito, G.; DelBianco, P.; Nitti, D.; De Rossi, A. Relationship between tumor and plasma levels of hTERT mRNA in patients with colorectal cancer: implications for monitoring of neoplastic disease. Clin. Cancer Res., 2008, 14, 7444-7451.
[48] Huang, F.C.; Chang, C.C.; Wang, J.M.; Chang, T.C.; Lin, J.J. Induction of senescence in cancer cells by a G-quadruplex stabilizer BMVC4 is independent of its telomerase inhibitory activity. Br. J. Pharmacol., 2012.
[49] Neidle, S. The Biology and Pharmacology of Telomeric Quadruplex Ligands. In Therapeutic Applications of Quadruplex Nucleic Acids, Academic Press: Boston, 2012; pp 109-117.
[50] Bhattacharya, S.; Chaudhuri, P.; Jain, A.; Paul, A. Symmetrical Bisbenzimidazoles with Benzenediyl Spacer: The Role of the Shape of the Ligand on the Stabilization and Structural Alterations in Telomeric G-Quadruplex DNA and Telomerase Inhibition. Bioconjugate Chem., 2010, 21, 1148-1159.
[51] Riou, J.; Gomez, D.; Morjani, H.; Trentesaux, C. Quadruplex Ligand Recognition: Biological Aspects. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 154-175.
[52] Davis, J.T. G-quartets 40 years later: from 5'-GMP to molecular biology and supramolecular chemistry. Angew. Chem. Int. Ed. Engl., 2004, 43, 668-698.
[53] Zhang, R.; Lin, Y.; Zhang, C.T. Greglist: a database listing potential G-quadruplex regulated genes. Nucleic Acids Res., 2008, 36, 372-376.
[54] Huppert, J.L. Quadruplexes in the Genome. In Quadruplex Nucleic Acids, Neidle, S.; Balasubramanian, S., Eds. Royal Society of Chemistry: Cambridge, 2006; pp 208-223.
[55] Rahman, K.M.; Tizkova, K.; Reszka, A.P.; Neidle, S.; Thurston, D.E. Identification of novel telomeric G-quadruplex-targeting chemical scaffolds through screening of three NCI libraries. Bioorg. Med. Chem., 2012, 22, 3006-3010.
[56] Fry, M.; Loeb, L.A. The fragile X syndrome d(CGG)n nucleotide repeats form a stable tetrahelical structure. Proc. Natl. Acad. Sci. USA, 1994, 91, 4950-4954.
[57] Sundquist, W.I.; Heaphy, S. Evidence for interstrand quadruplex formation in the dimerization of human immunodeficiency virus 1 genomic RNA. Proc. Natl. Acad. Sci. USA, 1993, 90, 3393-3397.
[58] Awang, G.; Sen, D. Mode of dimerization of HIV-1 genomic RNA. Biochemistry, 1993, 32, 11453-11457.
[59] Lemarteleur, T.; Gomez, D.; Paterski, R.; Mandine, E.; Mailliet, P.; Riou, J.F. Stabilization of the c-myc gene promoter quadruplex by specific ligands' inhibitors of telomerase. Biochem. Biophys. Res. Commun., 2004, 323, 802-808.
[60] Brooks, T.; Kendrick, S.; Hurley, L. Making sense of G-quadruplex and i-motif functions in oncogene promoters. FEBS Journal, 2010, 277, 3459-3469.
[61] Menendez, C.; Frees, S.; Bagga, P.S. QGRS-H Predictor: a web server for predicting homologous quadruplex forming G-rich sequence motifs in nucleotide sequences. Nucleic Acids Res., 2012.
[62] Yang, D.; Okamoto, K. Structural insights into G-quadruplexes: towards new anticancer drugs. Future Med. Chem., 2010, 2, 619-646.
[63] Flores, I.; Evan, G.; Blasco, M.A. Genetic analysis of myc and telomerase interactions in vivo. Mol. Cell Biol., 2006, 26, 6130-6138.
[64] Mergny, J.L.; Riou, J.F.; Mailliet, P.; Teulade-Fichou, M.P.; Gilson, E. Natural and pharmacological regulation of telomerase. Nucleic Acids Res., 2002, 30, 839-865.
[65] Siddiqui-Jain, A.; Grand, C.L.; Bearss, D.J.; Hurley, L.H. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc. Natl. Acad. Sci. USA, 2002, 99, 11593-11598.
[66] Brown, R.V.; Danford, F.L.; Gokhale, V.; Hurley, L.H.; Brooks, T.A. Demonstration that drug-targeted down-regulation of MYC in non-Hodgkins lymphoma is directly mediated through the promoter G-quadruplex. J. Biol. Chem., 2011, 286, 41018-41027.
[67] Biroccio, A.; Amodei, S.; Antonelli, A.; Benassi, B.; Zupi, G. Inhibition of c-Myc oncoprotein limits the growth of human melanoma cells by inducing cellular crisis. J. Biol. Chem., 2003, 278, 35693-35701.
[68] Peng, D.; Tan, J.-H.; Chen, S.-B.; Ou, T.-M.; Gu, L.-Q.; Huang, Z.-S. Bisaryldiketene derivatives: new class of selective ligands for c-myc G-quadruplex DNA. Bioorg. Med. Chem., 2010, 18, 8235-8242.
[69] Hurley, L.H.; Von Hoff, D.D.; Siddiqui-Jain, A.; Yang, D. Drug targeting of the c-MYC promoter to repress gene expression via a G-quadruplex silencer element. Semin. Oncol., 2006, 33, 498-512.
[70] Grandori, C.; Robinson, K.L.; Galloway, D.A.; Swisshelm, K. Functional link between Myc and the Werner gene in tumorigenesis. Cell Cycle, 2004, 3, 22-25.
[71] Grandori, C.; Wu, K.J.; Fernandez, P.; Ngouenet, C.; Grim, J.; Clurman, B.E.; Moser, M.J.; Oshima, J.; Russell, D.W.; Swisshelm, K.; Frank, S.; Amati, B.; Dalla-Favera, R.; Monnat, R.J., Jr. Werner syndrome protein limits MYC-induced cellular senescence. Genes Dev., 2003, 17, 1569-1574.
[72] Li, J.L.; Harrison, R.J.; Reszka, A.P.; Brosh, R.M., Jr.; Bohr, V.A.; Neidle, S.; Hickson, I.D. Inhibition of the Bloom's and Werner's syndrome helicases by G-quadruplex interacting ligands. Biochemistry, 2001, 40, 15194-15202.
[73] Han, H.; Bennett, R.J.; Hurley, L.H. Inhibition of unwinding of G-quadruplex structures by Sgs1 helicase in the presence of N,N'-bis[2-(1-piperidino)ethyl]-3,4,9,10-perylenetetracarboxylic diimide, a G-quadruplex-interactive ligand. Biochemistry, 2000, 39, 9311-9316.
[74] Rankin, S.; Reszka, A.P.; Huppert, J.; Zloh, M.; Parkinson, G.N.; Todd, A.K.; Ladame, S.; Balasubramanian, S.; Neidle, S. Putative DNA quadruplex formation within the human c-kit oncogene. J. Am. Chem. Soc., 2005, 127, 10584-10589.
[75] Fernando, H.; Reszka, A.P.; Huppert, J.; Ladame, S.; Rankin, S.; Venkitaraman, A.R.; Neidle, S.; Balasubramanian, S. A conserved quadruplex motif located in a transcription activation site of the human c-kit oncogene. Biochemistry, 2006, 45, 7854-7860.
[76] Todd, A.K.; Haider, S.M.; Parkinson, G.N.; Neidle, S. Sequence occurrence and structural uniqueness of a G-quadruplex in the human c-kit promoter. Nucleic Acids Res., 2007, 35, 5799-5808.
[77] Fletcher, J.A.; Rubin, B.P. KIT mutations in GIST. Curr. Opin. Genet. Dev., 2007, 7, 3-7.
[78] Gunaratnam, M.; Swank, S.; Haider, S.M.; Galesa, K.; Reszka, A.P.; Beltran, M.; Cuenca, F.; Fletcher, J.A.; Neidle, S. Targeting human gastrointestinal stromal tumor cells with a quadruplex-binding small molecule. J. Med. Chem., 2009, 52, 3774-3783.
[79] McLuckie, K.I.; Waller, Z.A.; Sanders, D.A.; Alves, D.; Rodriguez, R.; Dash, J.; McKenzie, G.J.; Venkitaraman, A.R.; Balasubramanian, S. G-quadruplex-binding benzo[a]phenoxazines down-regulate c-KIT expression in human gastric carcinoma cells. J. Am. Chem. Soc., 2011, 133, 2658-2663.
[80] Balasubramanian, S.; Hurley, L.H.; Neidle, S. Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nat. Rev. Drug Discov., 2011, 10, 261-275.
[81] Neidle, S. 9 - Design Principles for Quadruplex-binding Small Molecules. In Therapeutic Applications of Quadruplex Nucleic Acids, Academic Press: Boston, 2012; pp 151-174.
[82] Gavathiotis, E.; Heald, R.A.; Stevens, M.F.; Searle, M.S. Drug recognition and stabilisation of the parallel-stranded DNA quadruplex d(TTAGGGT)4 containing the human telomeric repeat. J. Mol. Biol., 2003, 334, 25-36.
[83] Haider, S.M.; Parkinson, G.N.; Neidle, S. Structure of a G-quadruplex-ligand complex. J. Mol. Biol., 2003, 326, 117-125.
[84] Campbell, N.H.; Parkinson, G.N.; Reszka, A.P.; Neidle, S. Structural basis of DNA quadruplex recognition by an acridine drug. J. Am. Chem. Soc., 2008, 130, 6722-6724.
[85] Parkinson, G.N.; Cuenca, F.; Neidle, S. Topology conservation and loop flexibility in quadruplex-drug recognition: crystal structures of inter- and intramolecular telomeric DNA quadruplex-drug complexes. J. Mol. Biol., 2008, 381, 1145-1156.
[86] Phan, A.T.; Kuryavyi, V.; Gaw, H.Y.; Patel, D.J. Small-molecule interaction with a five-guanine-tract G-quadruplex structure from the human MYC promoter. Nat. Chem. Biol., 2005, 1, 167-173.
[87] Parkinson, G.N.; Ghosh, R.; Neidle, S. Structural basis for binding of porphyrin to human telomeres. Biochemistry, 2007, 46, 2390-2397.
[88] Collie, G.W.; Sparapani, S.; Parkinson, G.N.; Neidle, S. Structural basis of telomeric RNA quadruplex--acridine ligand recognition. J. Am. Chem. Soc., 2011, 133, 2721-2728.
[89] Holt, P.A.; Buscaglia, R.; Trent, J.O.; Chaires, J.B. A Discovery Funnel for Nucleic Acid Binding Drug Candidates. Drug Dev. Res., 2011, 72, 178-186.
[90] Zoete, V.; Grosdidier, A.; Michielin, O. Docking, virtual high throughput screening and in silico fragment-based drug design. J. Cell Mol. Med., 2009, 13, 238-248.
[91] Rester, U. From virtuality to reality - Virtual screening in lead discovery and lead optimization: a medicinal chemistry perspective. Curr. Opin. Drug Discov. Devel., 2008, 11, 559-568.
[92] Sternberg, M.J.E.; Muggleton, S.H. Structure Activity Relationships (SAR) and Pharmacophore Discovery Using Inductive Logic Programming (ILP). QSAR Comb. Sci., 2003, 22, 1-6.
[93] Concu, R.; Podda, G.; Ubeira, F.M.; Gonzalez-Diaz, H. Review of QSAR models for enzyme classes of drug targets: Theoretical background and applications in parasites, hosts, and other organisms. Curr. Pharm. Des., 2010, 16, 2710-2723.
[94] Darnag, R.; Mostapha Mazouz, E.L.; Schmitzer, A.; Villemin, D.; Jarid, A.; Cherqaoui, D. Support vector machines: development of QSAR models for predicting anti-HIV-1 activity of TIBO derivatives. Eur. J. Med. Chem., 2010, 45, 1590-1597.
[95] Hristozov, D.; Gasteiger, J.; Da Costa, F.B. Multilabeled classification approach to find a plant source for terpenoids. J. Chem. Inf. Model., 2008, 48, 56-67.

[96] Liu, H.; Papa, E.; Walker, J.D.; Gramatica, P. In silico screening of [114] Speck-Planche, A.; Cordeiro, M.N. Computer-aided drug design estrogen-like chemicals based on different nonlinear classification methodologies toward the design of anti-hepatitis C agents. Curr. models. J. Mol. Graph. Model., 2007 , 26 , 135-144. Top. Med. Chem., 2012 , 12 , 802-813. [97] Vega, M.C.; Rolon, M.; Montero-Torres, A.; Fonseca-Berzal, C.; [115] Molina, E.; Sobarzo-Sanchez, E.; Speck-Planche, A.; Matos, M.J.; Escario, J.A.; Gomez-Barrio, A.; Galvez, J.; Marrero-Ponce, Y.; Uriarte, E.; Santana, L.; Yanez, M.; Orallo, F. Monoamino oxidase Aran, V.J. Synthesis, biological evaluation and chemometric a: an interesting pharmacological target for the development of analysis of indazole derivatives. 1,2-Disubstituted 5- multi-target QSAR. Mini Rev. Med. Chem., 2012 , 12 , 947-958. nitroindazolinones, new prototypes of antichagasic drug. Eur. J. [116] Speck-Planche, A.; Kleandrova, V.V.; Rojas-Vargas, J.A. QSAR Med. Chem., 2012 , 58C , 214-227. model toward the rational design of new agrochemical fungicides [98] Castillo-Garit, J.A.; Del Toro-Cortes, O.; Kouznetsov, V.V.; with a defined resistance risk using substructural descriptors. Mol. Puentes, C.O.; Romero Bohorquez, A.R.; Vega, M.C.; Rolon, M.; Divers., 2011 , 15 , 901-909. Escario, J.A.; Gomez-Barrio, A.; Marrero-Ponce, Y.; Torrens, F.; [117] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. Abad, C. Identification in silico and in vitro of novel Multi-target drug discovery in anti-cancer therapy: fragment-based trypanosomicidal drug-like compounds. Chem. Biol. Drug Des., approach toward the design of potent and versatile anti-prostate 2012 , 80 , 38-45. cancer agents. Bioorg. Med. Chem., 2011 , 19 , 6239-6244. [99] Castillo-Garit, J.A.; Marrero-Ponce, Y.; Torrens, F.; Rotondo, R. 
[118] Speck-Planche, A.; Guilarte-Montero, L.; Yera-Bueno, R.; Rojas- Atom-based stochastic and non-stochastic 3D-chiral bilinear Vargas, J.A.; Garcia-Lopez, A.; Uriarte, E.; Molina-Perez, E. indices and their applications to central chirality codification. J. Rational design of new agrochemical fungicides using substructural Mol. Graph. Model., 2007 , 26 , 32-47. descriptors. Pest Manag. Sci., 2011 , 67 , 438-445. [100] Le-Thi-Thu, H.; Casanola-Martin, G.M.; Marrero-Ponce, Y.; [119] Speck-Planche, A.; Cordeiro, M.N.D.S. Application of Rescigno, A.; Saso, L.; Parmar, V.S.; Torrens, F.; Abad, C. Novel Bioinformatics for the search of novel anti-viral therapies: Rational coumarin-based tyrosinase inhibitors discovered by OECD design of anti-herpes agents. Curr. Bioinform., 2011 , 6, 81-93. principles-validated QSAR approach from an enlarged, balanced [120] Speck-Planche, A.; Cordeiro, M.N.; Guilarte-Montero, L.; Yera- database. Mol. Divers., 2011 , 15 , 507-520. Bueno, R. Current computational approaches towards the rational [101] Casañola-Martin, G.M.; Marrero-Ponce, Y.; Khan, M.T.; Khan, design of new insecticidal agents. Curr. Comput. Aided Drug Des., S.B.; Torrens, F.; Perez-Jimenez, F.; Rescigno, A.; Abad, C. Bond- 2011 , 7, 304-314. based 2D quadratic fingerprints in QSAR studies: virtual and in [121] Speck-Planche, A.; Cordeiro, M.N. Current drug design of anti- vitro tyrosinase inhibitory activity elucidation. Chem. Biol. Drug HIV agents through the inhibition of C-C chemokine receptor type Des., 2010 , 76 , 538-545. 5. Curr. Comput. Aided Drug Des., 2011 , 7, 238-248. [102] Castillo-Garit, J.A.; Marrero-Ponce, Y.; Torrens, F.; Garcia- [122] Speck-Planche, A.; Scotti, M.T.; de Paulo-Emerenciano, V. Current Domenech, R.; Romero-Zaldivar, V. Bond-based 3D-chiral linear pharmaceutical design of antituberculosis drugs: future indices: theory and QSAR applications to central chirality perspectives. Curr. Pharm. Des., 2010 , 16 , 2656-2665. codification. J. Comput. 
Chem., 2008 , 29 , 2500-2512. [123] Tenorio-Borroto, E.; Penuelas Rivas, C.G.; Vasquez Chagoyan, [103] Speck-Planche, A.; Luan, F.; Cordeiro, M.N. Role of ligand-based J.C.; Castanedo, N.; Prado-Prado, F.J.; Garcia-Mera, X.; Gonzalez- drug design methodologies toward the discovery of new anti- Diaz, H. ANN multiplexing model of drugs effect on macrophages; Alzheimer agents: futures perspectives in Fragment-Based Ligand theoretical and flow cytometry study on the cytotoxicity of the anti- Design. Curr. Med. Chem., 2012 , 19 , 1635-1645. microbial drug G1 in spleen. Bioorg. Med. Chem., 2012 , 20 , 6181- [104] Speck-Planche, A.; Luan, F.; Cordeiro, M.N. Discovery of anti- 6194. Alzheimer agents: current ligand-based approaches toward the [124] Riera-Fernandez, P.; Martin-Romalde, R.; Prado-Prado, F.J.; design of acetylcholinesterase inhibitors. Mini Rev. Med. Chem., Escobar, M.; Munteanu, C.R.; Concu, R.; Duardo-Sanchez, A.; 2012 , 12 , 583-591. Gonzalez-Diaz, H. From QSAR models of drugs to complex [105] Speck-Planche, A.; Kleandrova, V.V.; Scotti, M.T. Fragment-based networks: state-of-art review and introduction of new Markov- approach for the in silico discovery of multi-target insecticides. spectral moments indices. Curr. Top. Med. Chem., 2012 , 12 , 927- Chemometr. Intell. Lab. Syst., 2012 , 111 , 39–45. 960. [106] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. [125] Gonzalez-Diaz, H.; Riera-Fernandez, P. New Markov- Chemoinformatics in multi-target drug discovery for anti-cancer Autocorrelation Indices for Re-evaluation of Links in Chemical and therapy: in silico design of potent and versatile anti-brain tumor Biological Complex Networks used in Metabolomics, Parasitology, agents. Anticancer Agents Med. Chem., 2012 , 12 , 678-685. Neurosciences, and Epidemiology. J. Chem. Inf. Model., 2012 , 52, [107] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. In 331-40. 
silico discovery and virtual screening of multi-target inhibitors for [126] Gonzalez-Diaz, H.; Munteanu, C.R.; Postelnicu, L.; Prado-Prado, proteins in Mycobacterium tuberculosis. Comb. Chem. High F.; Gestal, M.; Pazos, A. LIBP-Pred: web server for lipid binding Throughput Screen., 2012 , 15 , 666-673. proteins using structural network parameters; PDB mining of [108] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. human cancer biomarkers and drug targets in parasites and bacteria. Predicting multiple ecotoxicological profiles in agrochemical Mol. Biosyst., 2012 , 8, 851-862. fungicides: a multi-species chemoinformatic approach. Ecotoxicol. [127] Aguiar-Pulido, V.; Munteanu, C.R.; Seoane, J.A.; Fernandez- Environ. Saf., 2012 , 80 , 308-313. Blanco, E.; Perez-Montoto, L.G.; Gonzalez-Diaz, H.; Dorado, J. [109] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. Naive Bayes QSDR classification based on spiral-graph Shannon Chemoinformatics in anti-cancer chemotherapy: multi-target ent ropi es for protein biomarkers in human colon cancer. Mol. QSAR model for the in silico discovery of anti-breast cancer Biosyst., 2012 , 8, 1716-1722. agents. Eur. J. Pharm. Sci., 2012 , 47 , 273-279. [128] Prado-Prado, F.; Garcia-Mera, X.; Abeijon, P.; Alonso, N.; [110] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. A Caamano, O.; Yanez, M.; Garate, T.; Mezo, M.; Gonzalez-Warleta, ligand-based approach for the in silico discovery of multi-target M.; Muino, L.; Ubeira, F.M.; Gonzalez-Diaz, H. Using entropy of inhibitors for proteins associated with HIV infection. Mol. Biosyst., drug and protein graphs to predict FDA drug-target network: 2012 , 8, 2188-2196. theoretic-experimental study of MAO inhibitors and hemoglobin [111] Speck-Planche, A.; Kleandrova, V.V.; Luan, F.; Cordeiro, M.N. peptides from Fasciola hepatica. Eur. J. Med. Chem., 2011 , 46 , Rational drug design for anti-cancer chemotherapy: multi-target 1074-1094. 
QSAR models for the in silico discovery of anti-colorectal cancer [129] Rodriguez-Soca, Y.; Munteanu, C.R.; Dorado, J.; Pazos, A.; Prado- agents. Bioorg. Med. Chem., 2012 , 20 , 4848-4855. Prado, F.J.; Gonzalez-Diaz, H. Trypano-PPI: a web server for [112] Speck-Planche, A.; Kleandrova, V.V. QSAR and molecular prediction of unique targets in trypanosome proteome by using docking techniques for the discovery of potent monoamine oxidase electrostatic parameters of protein-protein interactions. J. Proteome B inhibitors: Computer-aided generation of new rasagiline Res., 2010 , 9, 1182-1190. bioisosteres. Curr. Top. Med. Chem., 2012 , 12 , 1734-1747. [130] Prado-Prado, F.J.; Garcia-Mera, X.; Gonzalez-Diaz, H. Multi-target [113] Speck-Planche, A.; Kleandrova, V.V. In silico design of multi- spectral moment QSAR versus ANN for antiparasitic drugs against target inhibitors for C-C chemokine receptors using substructural different parasite species. Bioorg. Med. Chem., 2010 , 18 , 2225- descriptors. Mol. Divers., 2012 , 16 , 183-191. 2231. 2854 Current Topics in Medicinal Chemistry , 2012, Vol . 12 , No . 24 Castillo-González et al.

[131] Gonzalez-Diaz, H.; Dea-Ayuela, M.A.; Perez-Montoto, L.G.; [151] Lombardo, C.M.; Martinez, I.S.; Haider, S.; Gabelica, V.; De Prado-Prado, F.J.; Aguero-Chapin, G.; Bolas-Fernandez, F.; Pauw, E.; Moses, J.E.; Neidle, S. Structure-based design of Vazquez-Padron, R.I.; Ubeira, F.M. QSAR for RNases and selective high-affinity telomeric quadruplex-binding ligands. theoretic-experimental study of molecular diversity on peptide Chem. Commun. (Camb), 2010 , 46 , 9116-9118. mass fingerprints of a new Leishmania infantum protein. Mol. [152] Guan, A.J.; Zhang, E.X.; Xiang, J.F.; Li, Q.; Yang, Q.F.; Li, L.; Divers., 2010 , 14 , 349-369. Tang, Y.L.; Wang, M.X. Effects of Loops and Nucleotides in G- [132] Gonzalez-Diaz, H. Network topological indices, drug metabolism, Quadruplexes on Their Interaction with an Azacalixarene, and distribution. Curr. Drug Metab., 2010 , 11 , 283-284. Methylazacalix[6]pyridine. J. Phys. Chem. B, 2011 , 43, 12584-90. [133] Vina, D.; Uriarte, E.; Orallo, F.; Gonzalez-Diaz, H. Alignment-free [153] Dhamodharan, V.; Harikrishna, S.; Jagadeeswaran, C.; Halder, K.; prediction of a drug-target complex network based on parameters Pradeepkumar, P.I. Selective G-quadruplex DNA stabilizing agents of drug connectivity and protein sequence of receptors. Mol. based on bisquinolinium and bispyridinium derivatives of 1,8- Pharm., 2009 , 6, 825-835. naphthyridine. J. Org. Chem., 2012 , 77 , 229-242. [134] Vilar, S.; Gonzalez-Diaz, H.; Santana, L.; Uriarte, E. A network- [154] Cosconati, S.; Marinelli, L.; Trotta, R.; Virno, A.; Mayol, L.; QSAR model for prediction of genetic-component biomarkers in Novellino, E.; Olson, A.J.; Randazzo, A. Tandem application of human colorectal cancer. J. Theor. Biol., 2009 , 261 , 449-458. virtual screening and NMR experiments in the discovery of brand [135] Prado-Prado, F.J.; Uriarte, E.; Borges, F.; Gonzalez-Diaz, H. Multi- new DNA quadruplex groove binders. J. Am. Chem. 
Soc., 2009 , target spectral moments for QSAR and Complex Networks study of 131 , 16336-16337. antibacterial drugs. J. Med. Chem.Eur. J. Med. Chem., 2009 , 44 , [155] Cosconati, S.; Rizzo, A.; Trotta, R.; Pagano, B.; Iachettini, S.; De 4516-4521. Tito, S.; Lauri, I.; Fotticchia, I.; Giustiniano, M.; Marinelli, L.; [136] Munteanu, C.R.; Vazquez, J.M.; Dorado, J.; Sierra, A.P.; Sanchez- Giancola, C.; Novellino, E.; Biroccio, A.; Randazzo, A. Shooting Gonzalez, A.; Prado-Prado, F.J.; Gonzalez-Diaz, H. Complex for Selective Druglike G-Quadruplex Binders: Evidence for network spectral moments for ATCUN motif DNA cleavage: first Telomeric DNA Damage and Tumor Cell Death. J. Med. Chem., predictive study on proteins of human pathogen parasites. J. 2012 , 55 , 9785–9792. Proteome Res., 2009 , 8, 5219-5228. [156] Kim, M.Y.; Duan, W.; Gleason-Guzman, M.; Hurley, L.H. Design, [137] Munteanu, C.R.; Magalhaes, A.L.; Uriarte, E.; Gonzalez-Diaz, H. synthesis, and biological evaluation of a series of Multi-target QPDR classification model for human breast and fluoroquinoanthroxazines with contrasting dual mechanisms of colon cancer-related proteins using star graph topological indices. action against topoisomerase II and G-quadruplexes. J. Med. J. Theor. Biol., 2009 , 257 , 303-311. Chem., 2003 , 46 , 571-583. [138] Todechini, R.; Consonni, V. Handbook of Molecular Descriptors , [157] Lee, H.M.; Chan, D.S.; Yang, F.; Lam, H.Y.; Yan, S.C.; Che, Mannhold, R., Kubinyi, H., Timmerman, H., Eds ed.; Wiley-ECH, C.M.; Ma, D.L.; Leung, C.H. Identification of natural product 2000 . fonsecin B as a stabilizing ligand of c-myc G-quadruplex DNA by [139] Dror, O.; Schneidman-Duhovny, D.; Inbar, Y.; Nussinov, R.; high-throughput virtual screening. Chem. Commun. (Camb), 2010 , Wolfson, H.J. Novel approach for efficient pharmacophore-based 46 , 4680-4682. virtual screening: method and applications. J. Chem. Inf. Model., [158] Naasani, I.; Seimiya, H.; Yamori, T.; Tsuruo, T. 
FJ5002: a potent 2009 , 49 , 2333-2343. telomerase inhibitor identified by exploiting the disease-oriented [140] Ma, D.-L.; Ma, V.P.-Y.; Chan, D.S.-H.; Leung, K.-H.; Zhong, H.- screening program with COMPARE analysis. Cancer Res., 1999 , J.; Leung, C.-H. In silico screening of quadruplex-binding ligands. 59 , 4004-4011. Methods . , 2012 , 57, 106- 14. [159] Franceschin, M.; Rossetti, L.; D'Ambrosio, A.; Schirripa, S.; [141] Yang, S.Y. Pharmacophore modeling and applications in drug Bianco, A.; Ortaggi, G.; Savino, M.; Schultes, C.; Neidle, S. discovery: challenges and recent advances. Drug Discov. Today, Natural and synthetic G-quadruplex interactive berberine 2010 , 15 , 444-450. derivatives. Bioorg. Med. Chem. Lett., 2006 , 16 , 1707-1711. [142] Youngman, M.A.; McNally, J.J.; Lovenberg, T.W.; Reitz, A.B.; [160] Zhang, W.J.; Ou, T.M.; Lu, Y.J.; Huang, Y.Y.; Wu, W.B.; Huang, Willard, N.M.; Nepomuceno, D.H.; Wilson, S.J.; Crooke, J.J.; Z.S.; Zhou, J.L.; Wong, K.Y.; Gu, L.Q. 9-Substituted berberine Rosenthal, D.; Vaidya, A.H.; Dax, S.L. alpha-Substituted N- derivatives as G-quadruplex stabilizing ligands in telomeric DNA. (sulfonamido)alkyl-beta-aminotetralins: potent and selective Bioorg. Med. Chem., 2007 , 15 , 5493-5501. neuropeptide Y Y5 receptor antagonists. J. Med. Chem., 2000 , 43 , [161] Ma, Y.; Ou, T.M.; Hou, J.Q.; Lu, Y.J.; Tan, J.H.; Gu, L.Q.; Huang, 346-350. Z.S. 9-N-Substituted berberine derivatives: stabilization of G- [143] Gilligan, P.J.; Robertson, D.W.; Zaczek, R. Corticotropin releasing quadruplex DNA and down-regulation of oncogene c-myc. Bioorg. factor (CRF) receptor modulators: progress and opportunities for Med. Chem., 2008 , 16 , 7582-7591. new therapeutic agents. J. Med. Chem., 2000 , 43 , 1641-1660. [162] Ma, Y.; Ou, T.M.; Tan, J.H.; Hou, J.Q.; Huang, S.L.; Gu, L.Q.; [144] Chen, Z.; Li, H.L.; Zhang, Q.J.; Bao, X.G.; Yu, K.Q.; Luo, X.M.; Huang, Z.S. Synthesis and evaluation of 9-O-substituted berberine Zhu, W.L.; Jiang, H.L. 
Pharmacophore-based virtual screening derivatives containing aza-aromatic terminal group as highly versus docking-based virtual screening: a benchmark comparison selective telomeric G-quadruplex stabilizing ligands. Bioorg. Med. against eight targets. Acta Pharmacol. Sin., 2009 , 30 , 1694-1708. Chem. Lett., 2009 , 19 , 3414-3417. [145] Hirayama, N. [Docking method for drug discovery]. Yakugaku [163] Parkinson, G.N.; Lee, M.P.; Neidle, S. Crystal structure of parallel Zasshi, 2007 , 127 , 113-122. quadruplexes from human telomeric DNA. Nature, 2002 , 417 , 876- [146] Minovski, N.; Perdih, A.; Solmajer, T. Combinatorially-generated 880. library of 6-fluoroquinolone analogs as potential novel [164] Totrov, M.; Abagyan, R. Flexible protein-ligand docking by global antitubercular agents: a chemometric and molecular modeling energy optimization in internal coordinates. Proteins, 1997 , Suppl assessment. J. Mol. Model., 2012 , 18 , 1735-1753. 1, 215-220. [147] Yang, J.M.; Chen, Y.F.; Shen, T.W.; Kristal, B.S.; Hsu, D.F. [165] Irwin, J.J.; Shoichet, B.K. ZINC--a free database of commercially Consensus scoring criteria for improving enrichment in virtual available compounds for virtual screening. J. Chem. Inf. Model., screening. J. Chem. Inf. Model., 2005 , 45 , 1134-1146. 2005 , 45 , 177-182. [148] Kinnings, S.L.; Jackson, R.M. LigMatch: a multiple structure- [166] Dai, J.; Punchihewa, C.; Ambrus, A.; Chen, D.; Jones, R.A.; Yang, based ligand matching method for 3D virtual screening. J. Chem. D. Structure of the intramolecular human telomeric G-quadruplex Inf. Model., 2009 , 49 , 2056-2066. in potassium solution: a novel adenine triple formation. Nucleic [149] Ma, D.L.; Lai, T.S.; Chan, F.Y.; Chung, W.H.; Abagyan, R.; Acids Res., 2007 , 35 , 2440-2450. Leung, Y.C.; Wong, K.Y. Discovery of a drug-like G-quadruplex [167] Holt, P.A.; Chaires, J.B.; Trent, J.O. Molecular docking of binding ligand by high-throughput docking. 
ChemMedChem, 2008 , intercalators and groove-binders to nucleic acids using Autodock 3, 881-884. and Surflex. J. Chem. Inf. Model., 2008 , 48 , 1602-1615. [150] Cuenca, F.; Moore, M.J.; Johnson, K.; Guyen, B.; De Cian, A.; [168] Carley, D.W. Drug repurposing: identify, develop and Neidle, S. Design, synthesis and evaluation of 4,5-di-substituted commercialize new uses for existing or abandoned drugs. Part II. acridone ligands with high G-quadruplex affinity and selectivity, IDrugs, 2005 , 8, 310-313. together with low toxicity to normal cells. Bioorg. Med. [169] Chong, C.R.; Sullivan, D.J., Jr. New uses for old drugs. Nature, Chem.Bioorg. Med. Chem. Lett., 2009 , 19 , 5109-5113. 2007 , 448 , 645-646. Computational Tools in the Discovery Current Topics in Medicinal Chemistry , 2012, Vol . 12 , No . 24 2855

[170] Schneider, P.; Tanrikulu, Y.; Schneider, G. Self-organizing maps in [191] Harrison, R.J.; Cuesta, J.; Chessari, G.; Read, M.A.; Basra, S.K.; drug discovery: compound library design, scaffold-hopping, Reszka, A.P.; Morrell, J.; Gowan, S.M.; Incles, C.M.; Tanious, repurposing. Curr. Med. Chem., 2009 , 16 , 258-266. F.A.; Wilson, W.D.; Kelland, L.R.; Neidle, S. Trisubstituted [171] Chan, D.S.; Yang, H.; Kwan, M.H.; Cheng, Z.; Lee, P.; Bai, L.P.; acridine derivatives as potent and selective telomerase inhibitors. J. Jiang, Z.H.; Wong, C.Y.; Fong, W.F.; Leung, C.H.; Ma, D.L. Med. Chem., 2003 , 46 , 4463-4476. Structure-based optimization of FDA-approved drug methylene [192] Koeppel, F.; Riou, J.F.; Laoui, A.; Mailliet, P.; Arimondo, P.B.; blue as a c-myc G-quadruplex DNA stabilizer. Biochimie, 2011 , Labit, D.; Petitgenet, O.; Helene, C.; Mergny, J.L. Ethidium 93 , 1055-1064. derivatives bind to G-quartets, inhibit telomerase and act as [172] Ma, D.L.; Chan, D.S.; Fu, W.C.; He, H.Z.; Yang, H.; Yan, S.C.; fluorescent probes for quadruplexes. Nucleic Acids Res., 2001 , 29 , Leung, C.H. Discovery of a natural product-like c-myc G- 1087-1096. quadruplex DNA groove-binder by molecular docking. PLoS One, [193] Rosu, F.; De Pauw, E.; Guittat, L.; Alberti, P.; Lacroix, L.; Mailliet, 2012 , 7, e43278. P.; Riou, J.F.; Mergny, J.L. Selective interaction of ethidium [173] Caceres, C.; Wright, G.; Gouyette, C.; Parkinson, G.; Subirana, derivatives with quadruplexes: an equilibrium dialysis and J.A. A thymine tetrad in d(TGGGGT) quadruplexes stabilized with electrospray ionization mass spectrometry analysis. Biochemistry, Tl+/Na+ ions. Nucleic Acids Res., 2004 , 32 , 1097-1102. 2003 , 42 , 10361-10371. [174] Huey, R.; Morris, G.M.; Olson, A.J.; Goodsell, D.S. A [194] Caprio, V.; Guyen, B.; Opoku-Boahen, Y.; Mann, J.; Gowan, S.M.; semiempirical free energy force field with charge-based Kelland, L.M.; Read, M.A.; Neidle, S. A novel inhibitor of human desolvation. J. Comput. 
Chem., 2007 , 28 , 1145-1152. telomerase derived from 10H-indolo[3,2-b]quinoline. Bioorg. Med. [175] Trott, O.; Olson, A.J. AutoDock Vina: improving the speed and Chem. Lett., 2000 , 10 , 2063-2066. accuracy of docking with a new scoring function, efficient [195] Mergny, J.L.; Lacroix, L.; Teulade-Fichou, M.P.; Hounsou, C.; optimization, and multithreading. J. Comput. Chem., 2010 , 31 , 455- Guittat, L.; Hoarau, M.; Arimondo, P.B.; Vigneron, J.P.; Lehn, 461. J.M.; Riou, J.F.; Garestier, T.; Helene, C. Telomerase inhibitors [176] Perry, P.J.; Gowan, S.M.; Reszka, A.P.; Polucci, P.; Jenkins, T.C.; based on quadruplex ligands selected by a fluorescence assay. Kelland, L.R.; Neidle, S. 1,4- and 2,6-disubstituted Proc. Natl. Acad. Sci. USA, 2001 , 98 , 3062-3067. amidoanthracene-9,10-dione derivatives as inhibitors of human [196] Teulade-Fichou, M.P.; Carrasco, C.; Guittat, L.; Bailly, C.; Alberti, telomerase. J. Med. Chem., 1998 , 41 , 3253-3260. P.; Mergny, J.L.; David, A.; Lehn, J.M.; Wilson, W.D. Selective [177] Neidle, S.; Harrison, R.J.; Reszka, A.P.; Read, M.A. Structure- recognition of G-qQuadruplex telomeric DNA by a activity relationships among guanine-quadruplex telomerase bis(quinacridine) macrocycle. J. Am. Chem. Soc., 2003 , 125 , 4732- inhibitors. Pharmacol. Ther., 2000 , 85 , 133-139. 4740. [178] Perry, P.J.; Reszka, A.P.; Wood, A.A.; Read, M.A.; Gowan, S.M.; [197] Gowan, S.M.; Harrison, J.R.; Patterson, L.; Valenti, M.; Read, Dosanjh, H.S.; Trent, J.O.; Jenkins, T.C.; Kelland, L.R.; Neidle, S. M.A.; Neidle, S.; Kelland, L.R. A G-quadruplex-interactive potent Human telomerase inhibition by regioisomeric disubstituted small-molecule inhibitor of telomerase exhibiting in vitro and in amidoanthracene-9,10-diones. J. Med. Chem., 1998 , 41 , 4873- vivo antitumor activity. Mol. Pharmacol., 2002 , 61 , 1154-1162. 4884. 
[198] Shin-ya, K.; Wierzba, K.; Matsuo, K.; Ohtani, T.; Yamada, Y.; [179] Read, M.A.; Wood, A.A.; Harrison, J.R.; Gowan, S.M.; Kelland, Furihata, K.; Hayakawa, Y.; Seto, H. Telomestatin, a novel L.R.; Dosanjh, H.S.; Neidle, S. Molecular modeling studies on G- telomerase inhibitor from Streptomyces anulatus. J. Am. Chem. quadruplex complexes of telomerase inhibitors: structure-activity Soc., 2001 , 123 , 1262-1263. relationships. J. Med. Chem., 1999 , 42 , 4538-4546. [199] Kim, M.Y.; Vankayalapati, H.; Shin-Ya, K.; Wierzba, K.; Hurley, [180] Izbicka, E.; Wheelhouse, R.T.; Raymond, E.; Davidson, K.K.; L.H. Telomestatin, a potent telomerase inhibitor that interacts quite Lawrence, R.A.; Sun, D.; Windle, B.E.; Hurley, L.H.; Von Hoff, specifically with the human telomeric intramolecular g-quadruplex. D.D. Effects of cationic porphyrins as G-quadruplex interactive J. Am. Chem. Soc., 2002 , 124 , 2098-2099. agents in human tumor cells. Cancer Res., 1999 , 59 , 639-644. [200] Mailliet, P.; Riou, J.F.; Mergny, J.L.; Laoui, A.; Lavelle, F.; [181] Neidle, S. Human telomeric G-quadruplex: the current status of Petitgenet, O. Chemical derivatives and their appliction as telomeric G-quadruplexes as therapeutic targets in human cancer. antitelomerase agent. US 6642964 B1, 2003 . FEBS J., 2010 , 277 , 1118-1125. [201] Riou, J.F. G-quadruplex interacting agents targeting the telomeric [182] Balasubramanian, S.; Neidle, S. G-quadruplex nucleic acids as G-overhang are more than simple telomerase inhibitors. Curr. Med. therapeutic targets. Curr. Opin. Chem. Biol., 2009 , 13 , 345-353. Chem. Anticancer Agents, 2004 , 4, 439-443. [183] Shi, D.F.; Wheelhouse, R.T.; Sun, D.; Hurley, L.H. Quadruplex- [202] Cuenca, F.; Greciano, O.; Gunaratnam, M.; Haider, S.; Munnur, D.; interactive agents as telomerase inhibitors: synthesis of porphyrins Nanjunda, R.; Wils on, W.D.; Neidle, S. Tri- and tetra-substituted and structure-activity relationship for the inhibition of telomerase. 
naphthalene diimides as potent G-quadruplex ligands. Bioorg. Med. J. Med. Chem., 2001 , 44 , 4509-4523. Chem. Lett., 2008 , 18 , 1668-1673. [184] Neidle, S. Design Principles for Quadruplex-binding Small [203] Nanjunda, R.; Musetti, C.; Kumar, A.; Ismail, M.A.; Farahat, A.A.; Molecules. In Therapeutic Applications of Quadruplex Nucleic Wang, S.; Sissi, C.; Palumbo, M.; Boykin, D.W.; Wilson, W.D. Acids , Academic Press: Boston, 2012 ; pp 151-174. Heterocyclic dications as a new class of telomeric G-quadruplex [185] Sun, D.; Thompson, B.; Cathers, B.E.; Salazar, M.; Kerwin, S.M.; targeting agents. Curr. Pharm. Des., 2012 , 18 , 1934-1947. Trent, J.O.; Jenkins, T.C.; Neidle, S.; Hurley, L.H. Inhibition of [204] Cabrera-Pérez, M.A.; Castillo-González, D.; Pérez-González, M.; human telomerase by a G-quadruplex-interactive compound. J. Durán-Martínez, A. Telomerase Inhibitory Activity of Acridinic Med. Chem., 1997 , 40 , 2113-2116. Derivatives: A 3D-QSAR Approach. QSAR Comb. Sci. , 2009 , 28 , [186] Harrison, R.J.; Gowan, S.M.; Kelland, L.R.; Neidle, S. Human 526-536. telomerase inhibition by substituted acridine derivatives. Bioorg. [205] Incles, C.M.; Schultes, C.M.; Kempski, H.; Koehler, H.; Kelland, Med. Chem. Lett., 1999 , 9, 2463-2468. L.R.; Neidle, S. A G-quadruplex telomere targeting agent produces [187] Han, H.; Langley, D.R.; Rangan, A.; Hurley, L.H. Selective p16-associated senescence and chromosomal fusions in human interactions of cationic porphyrins with G-quadruplex structures. J. prostate cancer cells. Mol. Cancer Ther., 2004 , 3, 1201-1206. Am. Chem. Soc., 2001 , 123 , 8902-8913. [206] Harrison, R.J.; Reszka, A.P.; Haider, S.M.; Romagnoli, B.; Morrell, [188] Han, H.; Cliff, C.L.; Hurley, L.H. Accelerated assembly of G- J.; Read, M.A.; Gowan, S.M.; Incles, C.M.; Kelland, L.R.; Neidle, quadruplex structures by a small molecule. Biochemistry, 1999 , 38 , S. Evaluation of by disubstituted acridone derivatives as telomerase 6981-6986. 
inhibitors: the importance of G-quadruplex binding. Bioorg. Med. [189] Fedoroff, O.Y.; Salazar, M.; Han, H.; Chemeris, V.V.; Kerwin, Chem. Lett., 2004 , 14 , 5845-5849. S.M.; Hurley, L.H. NMR-Based model of a telomerase-inhibiting [207] Moore, M.J.; Schultes, C.M.; Cuesta, J.; Cuenca, F.; Gunaratnam, compound bound to G-quadruplex DNA. Biochemistry, 1998 , 37 , M.; Tanious, F.A.; Wilson, W.D.; Neidle, S. Trisubstituted 12367-12374. acridines as G-quadruplex telomere targeting agents. Effects of [190] Perry, P.J.; Read, M.A.; Davies, R.T.; Gowan, S.M.; Reszka, A.P.; extensions of the 3,6- and 9-side chains on quadruplex binding, Wood, A.A.; Kelland, L.R.; Neidle, S. 2,7-Disubstituted telomerase activity, and cell proliferation. J. Med. Chem., 2006 , 49 , amidofluorenone derivatives as inhibitors of human telomerase. J. 582-599. Med. Chem., 1999 , 42 , 2679-2684. [208] Hutchinson, I.; McCarroll, A.J.; Heald, R.A.; Stevens, M.F. Synthesis and properties of bioactive 2- and 3-amino-8-methyl-8H- 2856 Current Topics in Medicinal Chemistry , 2012, Vol . 12 , No . 24 Castillo-González et al.


Received: September 15, 2012; Revised: November 23, 2012; Accepted: December 09, 2012

The identification of compounds that target G4 structures formed in other regions of the genome has not been explored much: beyond the human telomere and sequences such as c-myc, reports of ligands with such activity are scarce. This limits multi-objective modeling studies aimed at predicting compounds that can act as stabilizing and stacking agents on the different sequences able to form G4 structures. Given the abundance of regions with G4 propensity in the genome and the definition of the so-called cancer hallmarks, models and techniques that find structure stabilizers and that can at the same time be extrapolated to all those regions could become a valuable tool. The number of G4 compounds found through traditional trial-and-error methods is quite high. Nevertheless, those compounds are seldom fully characterized, and computational methodologies have not been rationally exploited to facilitate the discovery of new G4 ligands. The discovery, design and use of small molecules as G4 agents is a rapidly growing field, not only for telomeres but also for other regions of the genome. We think it is important to stress at this point that conclusions drawn from computational tools have mostly been used not to predict new compounds but to establish the structural characteristics that G4 agents should have in order to achieve good stacking activity. We present below a short systematization of some features that have been identified through SAR studies.

3.2 Systematization of some characteristics of G4 ligands proposed by SAR studies

The design of new G4 compounds is a challenge, and many features must be taken into account. Aromatic groups alone are insufficient to confer high binding affinity, and their hydrophobicity reduces water solubility. These groups have therefore frequently been functionalized with one or more cationic side chains to improve solubility and enhance interactions with the phosphates, loops or grooves found in G-quadruplex structures (224). Some of the main characteristics and the functional groups most used in G4 compounds are listed in this section. For a detailed view of the molecular structures referred to in this section, see the supplementary material of the article “FDA-approved Drugs Selected Using Virtual Screening Bind Specifically to G-quadruplex DNA” (§5.2).


3.2.1 Groove binders

Among the three major modes of interaction between small molecules and DNA (electrostatic, intercalation and groove binding), the latter potentially affords the highest sequence specificity (225). Classic groove binders prefer the narrower, negatively polarized minor grooves, which permit favorable interactions with their cationic moieties. The shapes of these molecules typically match the curvature of the groove, allowing a snug fit through van der Waals contacts and hydrogen bonds.

Perylenes have been studied as groove binders. Studies have examined the importance and the optimum number of nitrogen atoms in the molecular structure needed to achieve good stabilization. The conclusion is that, in the presence of appropriate stacking interactions, the fraction of charged molecules (specifically, charged nitrogens) is related to the strength of the electrostatic interaction with the DNA groove (165). Other studies have focused on the effects of tri-, tetra- and heptacyclic perylene analogues on stabilization. They show that, in perylene analogues, a conjugated system of more than four rings is required to target G-quartets efficiently. They also describe the number and location of the side chains that are crucial for selective targeting of DNA structures (173).

For 3,6-bis-amidoacridines with extended 9-anilino substituents, one of the most important factors maximizing G4 stabilization is the flexibility of the 9-substituted aliphatic amine chains, because it would allow the chains to probe the quadruplex surface in order to optimize charge–charge contacts with the phosphate backbone (172). For this kind of acridine, the rigidity and bulkiness of the substituent in the 9-position are also very important, because they are associated with lower stabilization values than those of the aliphatic analogues. Steric bulk at the 3- and 6-positions is both unfavorable for quadruplex binding and detrimental to duplex affinity (see figure 3.1 and table 3.1) (35). A behavior similar to that of the perylenes is observed concerning the influence of the positive charge (172). The optimum linker between the amido and amino groups in the 3,6 side chains is –(CH2)2–, and elongation of the 3- and 6-side chains significantly decreases quadruplex affinity. A propionamide-based linker in the 9-position is optimal in this series (35).


Figure 3.1. Steric bulk effects at the 3- and 6-positions in acridines.

Table 3.1. Steric bulk effects at the 3- and 6-positions in acridines. The data refer to the compounds in figure 3.1. Values taken from (35, 151).

Acridine      ΔTm (°C)    KG4DNA (×10^6 M^-1)
1a (n=2)      27.5        31.0
1b (n=3)      16.5        4.8
1c (n=4)      6.8         1.8
1d (n=5)      4.7         0.6
2             nd          1.3
3             nd          0.83

ΔTm (1 μM), change in melting temperature at 1 μM drug concentration; KG4DNA, binding constant of the compound to quadruplex DNA; nd, not determined.
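The trend in Table 3.1 can be made explicit with a few lines of arithmetic. The following is only a minimal sketch in Python: the variable names are hypothetical and the numbers are transcribed from the table (compounds 1a to 1d), taking the n=2 linker as the reference.

```python
# Values transcribed from Table 3.1 (compounds 1a-1d), keyed by linker length n:
# (delta Tm at 1 uM in deg C, quadruplex binding constant in units of 10^6 M^-1).
# The dictionary and variable names are illustrative, not from the source.
table_3_1 = {2: (27.5, 31.0), 3: (16.5, 4.8), 4: (6.8, 1.8), 5: (4.7, 0.6)}

ref_dtm, ref_k = table_3_1[2]  # n = 2 is the optimum linker reported in the text

# Fold loss in binding constant and drop in delta Tm relative to n = 2.
fold_loss = {n: ref_k / k for n, (dtm, k) in table_3_1.items()}
dtm_drop = {n: ref_dtm - dtm for n, (dtm, k) in table_3_1.items()}

for n in sorted(table_3_1):
    print(f"n={n}: K lower by {fold_loss[n]:.1f}-fold, dTm lower by {dtm_drop[n]:.1f} C")
```

On these values, lengthening the linker from two to five methylene units lowers the binding constant by roughly fifty-fold, which is the quantitative content of the statement that elongation of the side chains decreases quadruplex affinity.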

The berberine derivatives are isolated from Chinese herbs (226). For these compounds, it has been demonstrated that the N+-containing aromatic moiety is suitable for π–π stacking interactions with a G-quartet. It has also been described that introducing a side chain with a terminal amino group at the 9-position of berberine significantly increases the stabilization of the telomeric G-quadruplex: a substituent at this position strengthens the electrostatic interactions with the phosphate backbone of G-quadruplex DNA.

For the cationic porphyrins, several characteristics contribute to good stabilization. The most important features are: one face of the porphyrin must be available for stacking; positively charged substituents are important but can be interchanged with hydrogen-bonding groups; substitution is tolerated only at the meso positions of the porphyrin; and the size of the substituents should match the width of the grooves in which they lie (168, 227).

The quinazoline derivatives were developed as an aromatic system mimicking the indoloquinolines; they have G4-stabilizing activity but do not exhibit very high selectivity with respect to duplex DNA. For quinazoline derivatives, morpholino substituents lead to a decrease in stabilizing activity (see figure 3.2 and table 3.2), whereas compounds with a more basic terminal amino group, such as the N-methylpiperazino analogues, or with the longer 3-dimethylaminopropylamino terminal group are more effective. The presence of chlorine substituents in this kind of structure makes a strong positive contribution to the stabilizing properties (see figure 3.2 and table 3.2). Introducing a phenyl group through an amide bond is an effective approach to obtaining more biologically active telomeric G-quadruplex ligands (228).

For the quindoline derivatives, an amino group in the side chain interacts with the G-quadruplex more effectively than a hydroxyl group. Introducing a positive charge by methylating one of the nitrogen atoms of the quindoline scaffold improves the binding affinity to G-quadruplexes and the biological effect on telomeres (229, 230).

Figure 3.2. Example of the chlorine and morpholine contributions to the activity in quinazoline derivatives.


Table 3.2. G4 stabilization by quinazoline derivatives. ΔTm values were obtained from CD experiments, with measurements at 265 nm in KCl buffer (10 mM Tris-HCl, 60 mM KCl, pH 7.4); ligands are numbered as in figure 3.2 (228).

          ΔTm (°C)
Ligand    a       b       c       d       e
11        2.1     7.3     0.5     3.1     NR
12        8.2     8.7     4.3     8       12.4
13        21.6    21.4    11.9    21.8    NR
14        19.6    18.7    15.3    21.7    >28
15        21.9    >28     21.6    >28     >28

NR, not reported. The effects of chlorine and 3-dimethylaminopropylamino substitution are shown in orange; morpholine effects are shown in blue.

3.2.2 Loop binders

Contacts between the ligand and the bases in the loops alter the loop conformations (225, 231, 232). The sequence and length of the loops are important features in controlling quadruplex topology and stability. Clear examples of such alterations can be found in references (231, 232), where the authors studied the effects of disubstituted acridines on the G4 loops. These studies indicate that loop sequence and flexibility are likely to modulate the binding affinity and specificity of aromatic ligands that interact with G-quadruplexes. It is clear that loop geometry can substantially affect binding to G-quadruplexes and that both G-tetrad surfaces at the ends of quadruplexes are the preferred sites of interaction for planar aromatic ligands. Conversely, intercalation between G-tetrad layers does not seem to occur; perhaps stacked G-tetrads do not allow sufficient breathing for an aromatic compound, especially one with bulky substituents, to insert itself between layers of guanines and chase away a tightly coordinated monocation (225).

The flavones have been designed to interact with the loops of the c-myc oncogene quadruplex. These structures can induce the formation of G4, even in the absence of monovalent cations. The pyridinium-substituted flavones that contain 5- or 6-carbon aliphatic linkers show better potency than other analogues (233).

Amido-anthraquinone derivatives (figure 3.3) have also been studied. Different studies have concluded that, for disubstituted anthraquinones with progressively bulkier substituents, telomerase inhibition is closely related to the binding energy for the G-quadruplex structure formed by human telomeric DNA sequences (234, 235). The activity of 1,5-bisthioanthraquinones and 1,5-bisacyloxyanthraquinones shows that the chemical and biological activities of these compounds are greatly affected by the substituents on the planar ring system. The optimal substituents among the homologues of 1,5-bisacyloxyanthraquinone are propionyloxy, pivaloyloxy and 2,4-dichlorobenzoyl, while the 1,5-bisthioanthraquinones did not exhibit any significant effect. The 1,5-disubstituted derivative also shows some interaction with duplex DNA (236). For the 1,4-diamidoanthraquinone derivatives, it has been concluded that the exact mode of binding may be dictated by the position of the substituent side chains. The amido substitution may lead to a different mechanism of cytotoxicity. Compounds that have –(CH2)n– side chains terminating in basic groups, such as aminoalkyl substituents, showed cytotoxic effects (237). Aminoacyl residues are responsible for sequence-selective recognition through groove interactions. Factors such as slope, charge density and the pattern of potential hydrogen bonding must be taken into account in the design of new compounds (238). In the case of the 2,7 derivatives, the optimum number of methylene groups between the amide and the amine is two, and the authors propose different functional substituents at the end of the chain to increase stabilizing and antiproliferative activities (239).

Figure 3.3. Chemical structure and substitution pattern of anthraquinone derivatives. (A) amidoanthraquinone (1,4). (B) aminoanthraquinone (1,4).


Chapter 4. Linear multivariate techniques for evaluation of a congeneric set of compounds acting on G-quadruplexes.

Introduction

Diverse strategies have been tested for their ability to inhibit the telomerase enzyme by stabilizing G-quadruplex structures. Among the first families of ligands identified were the anthraquinones (234, 240). The acridines, structurally similar to the anthraquinones, were characterized next (153, 241, 242). Both have a planar tricyclic core, but the carbonyl groups of the anthraquinones are replaced in the acridines by one nitrogen atom in the anthracenic ring. For the acridines, it has been possible to establish a direct correlation between G4 stabilization, as measured in the FRET assay, and telomerase inhibition (35, 172). In a search for applications of QSAR studies to antineoplastic activity, we found a wide range of studies covering different mechanisms of action and families of compounds, but few references on the application of QSAR to the design and prediction of new antitumor compounds that inhibit telomerase by stabilizing the G-quadruplex structure (243-245). A more detailed analysis of the computational methods that have been employed to predict and search for new compounds with antitumor activity through G4 formation can be found in chapter 3. Two of the works referred to in that review article are presented below. Both employ QSAR to identify ligands that stabilize the G-quadruplex structure. The use of multivariate linear relationships for modeling activity in congeneric series is described, with a focus on the acridines.


4.1 Article 2. Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Comb. Sci. 28, 2009, No. 5, 526 – 536. Miguel Ángel Cabrera-Pérez, Daimel Castillo-González, Maykel Pérez-González and Alexander Durán-Martínez.

Summary

This work is the first quantitative approach to predicting the average telomerase inhibitory concentration for a congeneric series of acridines. The acridines have been broadly studied as telomerase inhibitors acting through the stabilization of G4 structures, and most of them have been developed and evaluated by the same team, which makes this study especially relevant for this specific family of compounds. In this work, beyond interpreting how the molecular descriptors relate to the activity, we aim to find models with high predictive capacity and adequate statistical fit. The main contribution of this work in terms of modeling is the use of genetic algorithms for variable selection and model building. The MOBYDIGS software allows several models to be built simultaneously, each with its own internal validation techniques, and allows parameters to be set that eliminate or minimize the collinearity between predictor variables and the response variable. Predictive power is in turn confirmed by evaluating an external prediction set. The ideal number of variables and the kind of descriptors to be used are analyzed and discussed in depth. Moreover, the applicability domain of the model is presented. We also examine how the molecular activity relates to the information on the geometry of the molecules, encoded as three-dimensional molecular descriptors.
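To illustrate the internal validation mentioned in this summary, the leave-one-out statistic for a multiple linear regression can be sketched in Python. This is a minimal sketch and not the MOBYDIGS implementation: the function name `q2_loo`, the synthetic data, and the use of the full-sample mean in the total sum of squares are assumptions made for the example.

```python
import numpy as np

def q2_loo(X, y):
    """Leave-one-out cross-validated Q^2 for an OLS model with intercept.

    Each compound is left out in turn, the model is refitted on the rest,
    and the squared prediction errors are accumulated in PRESS.
    Assumption: Q^2 = 1 - PRESS / TSS, with TSS taken around the full-sample mean.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # mask that drops compound i
        A = np.column_stack([np.ones(n - 1), X[keep]])
        coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        y_hat = np.concatenate(([1.0], X[i])) @ coef  # predict the left-out compound
        press += (y[i] - y_hat) ** 2
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / tss

# Synthetic example (NOT the acridine dataset): 30 "compounds", 3 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
y_demo = X_demo @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.1, size=30)
q2_demo = q2_loo(X_demo, y_demo)
```

The article reports this statistic as a percentage, so the value returned here would be multiplied by 100 to be compared with figures of that kind.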


Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach

Miguel Ángel Cabrera-Pérez a*, Daimel Castillo-González a, b, Maykel Pérez-González a and Alexander Durán-Martínez c
a Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba, Fax: (53)-42-281130; E-mail: [email protected]
b Department of Pharmacy, Faculty of Chemistry and Pharmacy, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba
c Department of Chemistry, Faculty of Chemistry and Pharmacy, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba

Keywords: Acridines, Acridones, Drug design, G-quartet, QSAR

Received: April 21, 2008; Revised: December 04, 2008; Accepted: December 23, 2008

DOI: 10.1002/qsar.200860042

Abstract

Telomerase is a reverse transcriptase enzyme that is activated in more than 85% of cancer cells and is associated with the acquisition of a malignant phenotype. Some experimental strategies have been suggested to prevent the enzyme from driving unchecked telomere elongation. One of them, the stabilization of the G-quartet structure, has been widely studied. Nevertheless, no QSAR studies to predict this activity have been developed. In the present study, several regression models were developed to identify, through 3-D molecular descriptors, those acridinic derivatives with better inhibitory activity on the telomerase enzyme (log $^{tel}EC_{50}$). Linear regression models were developed from a dataset of 85 acridinic derivatives, and the best results were achieved using GETAWAY and WHIM molecular descriptors. The final model explained 80% of the variance, and its predictive ability was assessed by a leave-one-out cross-validation ($Q^2_{LOO} = 74.3$), a prediction set (21 compounds of the 85; $R^2_{pred} = 71.50$ and $SDEP_{pred} = 0.366$), and the prediction of inhibitory activity on the telomerase enzyme for an external set of ten novel acridines. The results of this study suggest that the established model has a strong predictive ability and can be used prospectively in the molecular design and action mechanism analysis of this kind of compound with anticancer activity.
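The two prediction-set statistics quoted in the abstract follow the definitions given in Section 2.3 of the article: the predictive coefficient is 1 - PRESS/SD, with SD taken around the training-set mean, and the standard deviation error in prediction is the square root of PRESS/n. A minimal Python sketch, assuming a hypothetical helper name `external_stats` and made-up illustrative values (not data from the article), could be:

```python
import math

def external_stats(y_obs, y_pred, train_mean):
    """Return (R2_pred, SDEP_pred) for a prediction set.

    PRESS: sum of squared differences between observed and predicted values.
    SD: sum of squared deviations of the observed prediction-set values
        from the mean observed value of the TRAINING set.
    """
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))
    sd = sum((o - train_mean) ** 2 for o in y_obs)
    r2_pred = 1.0 - press / sd
    sdep_pred = math.sqrt(press / len(y_obs))
    return r2_pred, sdep_pred

# Hypothetical numbers chosen only to exercise the formulas.
r2_demo, sdep_demo = external_stats(
    y_obs=[1.0, 2.0, 3.0, 4.0],
    y_pred=[1.1, 1.9, 3.2, 3.8],
    train_mean=2.0,
)
```

The article expresses the predictive coefficient on a percentage scale, so a value returned by this sketch would be multiplied by 100 before comparison.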

1 Introduction

Telomerase is a complex ribonucleoprotein reverse transcriptase composed of an RNA template (hTR) and a catalytic protein domain (hTERT) [1]. Telomerase activity has been detected in 85 – 90% of all human cancers and may be essential for cell immortality [1 – 3]. Most immortalized cell lines and malignant human tumors appear to maintain constant telomere length via telomerase activity [4]. Since telomerase represents a suitable target for specific anticancer therapies, several strategies for the inhibition of this enzyme have been designed and evaluated. One of them relies on direct action on the functional RNA (hTR) of the enzyme, and compounds such as antisense oligonucleotides and reverse transcriptase inhibitors have been tested [5 – 7]. Another strategy is based on inhibiting telomerase activity by interacting with G-quadruplex DNA and, in this sense, some small-molecule inhibitors have been reported [8 – 11].

DNA sequences that are rich in guanines can adopt unusual secondary DNA structural forms. In particular, four guanine bases can associate in a planar, hydrogen-bonded assembly called a G-tetrad (or quartet), where each guanine simultaneously accepts and donates two hydrogen bonds in a reverse Hoogsteen arrangement. Successive layers of G-tetrads allow such single-stranded DNA to adopt a high-order fold-back structure in solution termed a G-tetraplex (or “quadruplex”) [12, 13].

Many compounds, such as porphyrins, disubstituted anthraquinones, acridines, dibenzophenanthroline, and perylene derivatives, that induce and/or stabilize G-quadruplexes and inhibit telomerase in “cell-free” systems have been studied [14 – 21]. All these molecules have an aromatic core favoring stacking interactions with the G-quartets of the G-quadruplex and also a positively charged side chain that facilitates a specific interaction with the DNA grooves.

Quantitative Structure – Activity Relationship (QSAR) studies have proved to be a very useful tool in the design and development of novel compounds with different biological activities. One of the main fields where this kind of study has been carried out is the prediction of antitumoral activity [22, 23]. Nevertheless, the telomerase inhibitory activity of potential candidates has not been studied using the QSAR methodology. For this reason, in the present paper we intend to develop a QSAR study of a family of acridinic compounds with demonstrated telomerase inhibitory activity based on stabilization of the G-quadruplex. The main goal of the present research is to develop regression models in order to identify the more potent acridinic derivatives, permitting the design of novel candidates taking into consideration the structural information contained in 3-D molecular descriptors.

Supporting information for this article is available on the WWW under www.qcs.wiley-vch.de

© 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. QSAR Comb. Sci. 28, 2009, No. 5, 526 – 536

Figure 1. A dendrogram illustrating the results of the hierarchical cluster analysis of the acridinic derivatives used in the training set of the present work.

2 Materials and Methods

2.1 Datasets and Molecular Descriptors

A dataset of 90 acridinic derivatives was carefully assembled from the literature [16 – 20, 24 – 28]. For the complete dataset, the molecular structure and the inhibitory activity against the telomerase enzyme, by stabilization of the G-quartet structure, were reported (see Table S1 in the Supporting Information). Five compounds of the dataset (10, 11, 12, 16, and 54) were excluded from the analysis because their telomerase inhibitory activities were not clearly defined (they were given as ranges). Finally, 85 compounds were used to obtain the regression models. Telomerase activity was detected using the modified cell-free Telomeric Repeat Amplification Protocol (TRAP) assay with protein extract from the A2780 ovarian carcinoma cell line. This experimental method has a certain variability, and the quantitative value of telomerase inhibition fluctuates from one experiment to the next; however, the fluctuation within the same experiment is ±10% [29].

In order to obtain a validated and predictive QSAR model, the available dataset was divided into training and prediction sets. Ideally, this division must satisfy some conditions: (i) the representative compound-points of the prediction set in the multidimensional descriptor space must be close to those of the training set, (ii) all the representative points of the training set must be close to those of the prediction set, and (iii) the representative points of the training set must be distributed within the whole area occupied by the entire dataset. Taking the above into consideration, the division into training and prediction sets was carried out using a k-Means Cluster Analysis (k-MCA).

The k-MCA was performed with the STATISTICA software 6.0 [30]. To ensure acceptable statistical quality of the data clusters, we took into account the number of members in each cluster and the standard deviation of the variables in the cluster (as low as possible). We also inspected the standard deviation between and within clusters, the respective Fisher ratio and their p-level of significance, required to be lower than 0.05. Finally, the complete dataset was split into five clusters. Compounds for the training and prediction sets were randomly collected from each cluster. This procedure permitted us to select compounds for the training and prediction sets in a representative way at all levels of the linking distance (Y-axis). Two dendrograms illustrating the results of the cluster analysis were produced (see Figures 1 and 2). As can be seen, there is a great number of different subsets, which demonstrates the molecular variability of the selected compounds.

Finally, the training and prediction sets were composed of 64 and 21 compounds (≈25% of the whole data), respectively. The compounds belonging to the prediction set were never used in the development of the regression models; they were reserved to estimate the predictive power of the regression models derived from the training set.

The 3-D molecular descriptors used to search for the best regression models were calculated with the DRAGON software [31]. Beforehand, we carried out a preliminary MM+ geometry optimization for each compound of this study and then, using the quantum chemical semi-empirical method AM1 [32] included in the MOPAC 6.0 computer software [33], determined the (x, y, z) atomic coordinates of the minimal-energy conformation of each one. Six families of descriptors were used: Randic, geometrical, Radial Distribution Function (RDF), 3-D Molecule Representation of Structures Based on Electron Diffraction (3D-MoRSE), Weighted Holistic Invariant Molecular Descriptors (WHIM), and GEometry, Topology, and Atom-Weights AssemblY (GETAWAY). The input of the MOPAC 6.0 software consists of SMILES (simplified molecular input line entry specification) codes for each compound [34].

Figure 2. A dendrogram illustrating the results of the hierarchical cluster analysis of the acridinic derivatives used in the prediction set of the present work.

2.2 Chemometric Methods

The Multiple Linear Regression (MLR) analysis and the variable selection procedure were performed using the MOBYDIGS software [35]. The Ordinary Least Squares (OLS) and Genetic Algorithm-Variable Subset Selection (GA-VSS) methods were employed for these two tasks, respectively [36]. GA is a method based on biological evolution rules that has been applied in descriptor selection to build a variety of models. This stochastic search method mimics biological evolution by undertaking a survival-of-the-fittest approach. It involves a population of solutions, consisting of individuals known as chromosomes. Once a chromosome has been created, its fitness is assessed using a defined scoring function. This process is repeated for every individual in the population, and the chromosomes are then ranked. The best solutions are chosen to survive unchanged, while the remainder are subjected to crossover and mutation, the outcome of which mimics biological evolution to produce a new population. This cycle is then repeated for a specified number of iterations. This leads to a final population (of descriptors) that is better suited to the modeled endpoint than the individuals it was created from, just as in natural selection.

In this study the GA simulation conditions were 10000 generations, number of crossovers 5000, smoothness factor 1, mutation probability for adding a new term 50%, and a population of 300 models. The scoring function defined to assess the fitness of the chromosome was the determination coefficient of the leave-one-out cross-validation ($Q^2_{LOO}$). The models were linear combinations of five descriptors, and this procedure was repeated several times for each family of descriptors to confirm that the selected ones are the optimal set of descriptors for describing the modeled property.

2.3 Validation of the Models

In recent years, exhaustive validation of mathematical models has become a key element of QSAR theory. In this sense, we carried out an “internal” validation that included leave-one-out cross-validation, bootstrapping and Y-scrambling methodologies to demonstrate the robustness and predictive ability of the model.

All the calculations were performed maximizing the cross-validated $R^2$ ($Q^2$, leave-one-out) and applying the QUIK rule: only models with a global correlation of the [X + Y] block ($K_{XY}$) greater than the global correlation of the X block ($K_{XX}$) are accepted, X being the molecular descriptors and Y the response variable [37]. The collinearity in the original set of molecular descriptors leads to many similar models yielding more or less the same predictive power (in the MOBYDIGS software, 100 models of different dimensionality). Therefore, among models of similar performance, those with higher $\Delta K$ ($K_{XY} - K_{XX}$) were selected. Later, for these selected models, stronger internal validation procedures were carried out using leave-one-out validation ($Q^2_{LOO}$), drawing multiple random samples with replacement (bootstrapping, $Q^2_{BOOT}$) and response permutation testing (Y-scrambling) [38, 39]. Other useful statistical parameters considered were the Standard Deviation Error in Calculation (SDEC) for the training set and the Standard Deviation Error in Prediction (SDEP), calculated for the cross-validation. Both parameters provide a more reliable indication of the fitness of the model.

The main target of any QSAR modeling is that the developed model should be robust enough to make accurate and reliable predictions of the biological activities of new compounds [40, 41]. So, QSAR models that are developed from a training set should be validated using new chemical entities to check their predictive capacity. The validation strategies check the reliability of the developed models for their possible application to a new set of data from the same data domain.

In this sense, the predictive capability of the models was evaluated through the calculation of the predictive coefficient ($R^2_{pred} = 1 - PRESS/SD$), where PRESS is the sum of squared differences between the measured response and the predicted value for each molecule in the prediction set and SD is the sum of squared deviations between the measured response for each molecule in the prediction set and the mean observed value of the training set. The standard deviation error in prediction [$SDEP_{pred} = (PRESS/n)^{1/2}$] was also calculated for the prediction set.

In this study, a more demanding evaluation is provided by an external validation, where the predictive power of the model was assessed with an external dataset of ten trisubstituted acridines reported in the literature [42].

Finally, the statistical goodness of fit and the number of parameters that have to be estimated to achieve that degree of fit were considered together through the Akaike Information Criterion (AIC).

3 Results and Discussion

3.1 Development of the Regression Model

The regression equations to predict the telomerase inhibitory activity by stabilization of the G-quadruplex, in the training set, were obtained using the complete family of 3-D molecular descriptors. The meaning of the variables included in each model appears in Table 1. The best fitted and most predictive models for each family of descriptors, together with the statistical parameters of the regression, are shown in Table 2.

As can be appreciated in this table, the best-fitted models were achieved with GETAWAY descriptors ($R^2 = 75.81$) and using all 3-D descriptors ($R^2 = 74.92$). Both models also had the best statistical parameters for robustness and internal predictive ability ($Q^2_{LOO}$ and $Q^2_{BOOT}$) compared with the rest of the models. Although a cluster analysis was used to select the training and prediction sets, the reliability of both models was also tested by a ten-fold random selection of the training and prediction sets in order to reduce the dependency of the results on the data splitting. The mean statistical results are depicted in Table 2 (bold numbers). Finally, the better mean statistical values were achieved with the model obtained with all 3-D molecular descriptors.

As can also be seen in the models described in Table 2, the $Q^2_{LOO}$ values were lower than 0.7, which indicates, in our opinion, models with low robustness and low internal predictive ability. At this point we investigated whether $R^2$ and $Q^2$ are suitable criteria to determine the accuracy and predictivity of the models, and we also analyzed the influence of the number of additional descriptors in the models with regard to goodness of fit. The genetic algorithm was used to build models containing an increasing number of descriptors (starting with one and with a maximum of ten).

Taking into consideration that the main goal of any QSAR study is its potential use in external prediction, some estimation of the quality of the model is required. However, the natural complexity of any biological endpoint means that it is difficult to develop a mathematical model that includes all of the intrinsic mechanisms of the relevant processes. Therefore, a QSAR model will always contain simplifications. As a result, predictions derived from these models can never be entirely accurate. Specifically with regard to QSARs, models validated internally (e.g., the previous models with high $R^2$ and $Q^2$) do not necessarily generate accurate predictions for new data. For this reason, the model statistics ($R^2$, $Q^2_{LOO}$, $Q^2_{BOOT}$, $R^2_{pred}$, $SDEP_{pred}$, and F) were compared in order to select the model with the best predictive ability.

Table 3 summarizes all MLR models with their different statistical parameters of fitness and predictive ability. As can be seen, the results from MLR confirmed the well-known phenomenon that increasing the number of descriptors in an MLR increases $R^2$ and $Q^2_{LOO}$ (see Table 3).

Based on the internal validation characteristics, it is important to note that the best model is the equation with ten variables, which has the highest $R^2$ for the training set ($R^2 = 85.64$) and also a high $Q^2_{LOO}$ value ($Q^2_{LOO} = 79.39$); however, the predictive power of this model is really poor ($R^2_{pred}$ and $SDEP_{pred}$ values of 37.63 and 0.541, respectively). Nevertheless, the model with seven variables shows $R^2_{pred}$ and $SDEP_{pred}$ values of 71.50 and 0.366, respectively, which are the best values compared with the rest of the models. If those parameters are analyzed, there is an inflexion point in this model, after which a local maximum for $R^2_{pred}$ and mini-

QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 www.qcs.wiley-vch.de  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 529 Full Papers Miguel Angel Cabrera-Pe´rez et al.

Table 1. The 3-D molecular descriptors of the QSAR regression models reported in this study. 3 D descriptor Meaning Randic DP01 Molecular profile no.01 DP16 Molecular profile no.16 SP14 Shape profile no.14 SP17 Shape profile no.17 SP18 Shape profile no.18 Geometrical ASP Asphericity SPAN Span R QYYe Qyy COMMA 2 value/weighted by atomic Sanderson electronegativities QXXp Qxx COMMA 2 value/weighted by atomic polarizabilities QZZp Qzz COMMA 2 value/weighted by atomic polarizabilities G(N..N) Sum of geometrical distance between N..N RDF RDF095u RDF-9.5/unweighted RDF045m RDF-4.5/weighted by atomic masses RDF080m RDF-8.0/weighted by atomic masses RDF115m RDF-11.5/weighted by atomic masses RDF040v RDF-4.0/weighted by atomic van der Waals volumes RDF155v RDF-15.5/weighted by atomic van der Waals volumes 3D-MoRSE Mor22u 3D-MoRSE signal 22/unweighted Mor22m 3D-MoRSE signal 22/weighted by atomic masses Mor07e 3D-MoRSE signal 07/weighted by atomic Sanderson electronegativities Mor22e 3D-MoRSE signal 22/weighted by atomic Sanderson electronegativities Mor10p 3D-MoRSE signal 10/weighted by atomic polarizabilities Mor16e 3D-MoRSE signal 16/weighted by atomic Sanderson electronegativities WHIM G1v First component symmetry directional WHIM index/weighted by atomic van der Waals volumes G1e First component symmetry directional WHIM index/weighted by atomic Sanderson electronegativities G3u Third component symmetry directional WHIM index/unweighted E2u Second component accessibility directional WHIM index/unweighted E3u Third component accessibility directional WHIM index/unweighted E1v First component accessibility directional WHIM index/weighted by atomic van der Waals volumes E1e First component accessibility directional WHIM index/weighted by atomic Sanderson electronegativities E2s Second component accessibility directional WHIM index/weighted by atomic electrotopological states Du D total accessibility index/unweighted De D total accessibility index/weighted by atomic 
Sanderson electronegativities GETAWAY HATS7p Leverage weighted autocorrelation of lag 7/weighted by atomic polarizabilities HATS8p Leverage weighted autocorrelation of lag 8/weighted by atomic polarizabilities HATS7v Leverage weighted autocorrelation of lag 7/weighted by atomic van der Waals volumes REIG First eigenvalue of R matrix R2eþ R maximal autocorrelation of lag 2/weighted by atomic Sanderson electronegativities R2uþ R maximal autocorrelation of lag 2/unweighted R6m R autocorrelation of lag 6/weighted by atomic masses R6mþ R maximal autocorrelation of lag 6/weighted by atomic masses

tel mum for SDEPpred are achieved, suggesting some over-fit- log EC 50 ¼1.28þ6.98G3uþ ting problems for the equations with greater number of 4.99E3uÀ6.84DeÀ53.19HATS7v þ9.63·REIG þ variables. The internal validation parameters of this equa- 12.23R6mÀ71.83R6m tion are also very good. ¼ 2 ¼ 2 ¼ 2 ¼ 2 ¼ Taking into consideration the previous results, the best N 64, R 80.04, QLOO 74.31, QBOOT 72.42, Rpred predictive model obtained to determine the telomerase in- 71.50, F(7,56)¼32.09, p<0.0001, SDEPpred ¼0.366 (Eq. 1) hibitory activity, by stabilization of G-quadruplex, is given below together with the statistical parameters of the re- Often in QSAR study outliers are assumed to be errone- gression: ous data, or observations that cannot be explained al-

530  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.qcs.wiley-vch.de QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach

Table 2. Regression models obtained for each family of 3-D molecular descriptors.

2 2 2 3-D molecular descriptor Equation variables R QLOO QBOOT F SDEC SDEP AIC Randic DP01, DP16, SP14, SP17, SP18 50.78 43.15 40.89 11.97 0.551 0.592 0.404 Geometrical ASP, QYYe, QXXp, QZZp, G(N..N) 60.82 53.89 51.44 18.01 0.492 0.533 0.322 RDF RDF095u, RDF045m, RDF080m, RDF115m, RDF040v 65.47 56.00 54.13 21.99 0.461 0.521 0.284 3D-MoRSE Mor22u, Mor22m, Mor07e, Mor22e, M0r10p 62.97 55.06 53.04 19.73 0.478 0.526 0.304 WHIM G3u, E2u, E1v, E1e, Du 63.98 58.23 56.17 20.6 0.471 0.508 0.296 GETAWAY HATS7p, REIG, R2u þ, R6m, R6m þ 75.81 70.88 69.32 36.36 0.386 0.424 0.199 – – 72.74 67.11 65.37 31.28 – – – All 3-D descriptors De, HATS7v, REIG, R6m, R6m þ 74.92 69.49 67.13 34.65 0.393 0.434 0.206 – – 73.56 67.70 65.98 32.78 – – –

Bold numbers indicate the mean statistical parameters for the best two models.

Table 3. Regression models obtained with all 3-D molecular descriptors and different variable numbers.

2 2 2 2 Variable Model equation R QLOO QBOOT F Rprep SDEPpred SDEP ext numbers 1 E1e 49.60 46.47 47.25 61.01 55.09 0.459 – 2 E3u, E1e 55.25 51.06 48.34 37.66 58.19 0.443 – 3 SPAN, E1e, HATS8p 63.06 57.95 55.99 34.15 53.61 0.467 – 4 RDF155v, E1e, HATS7v, R6m 68.28 63.11 61.49 31.75 61.33 0.426 – 5 De, HATS7v, REIG, R6m, R6m þ 74.92 69.49 67.13 34.65 64.85 0.406 – 6 E3u, De, HATS7v, REIG, R6m, R6m þ 77.76 72.18 69.85 33.21 70.29 0.374 – 7 G3u, E3u, De, HATS7v, REIG, R6m, R6m þ 80.04 74.31 72.42 32.09 71.50 0.366 0.462 8 G3u, E3u, E1e, HATS7v, REIG, R6m, R6m þ, R2e þ 80.90 74.96 71.19 29.12 63.40 0.415 0.632 9 Mor16e, E3u, G1v, G1e, E1e, HATS7v, R6m, R6m þ, R2e þ 83.78 77.36 74.14 31.00 53.01 0.470 0.761 10 Mor16e, E3u, G1v, G1e, E1e, E2s, HATS7p, R2u þ, R6m, R6m þ 85.64 79.39 74.87 31.60 37.63 0.541 0.564

Bold model indicate the best model obtained.

though they are commonly valid observations and one of All compounds together with the descriptor values ap- the most interesting parts of the dataset. In this study al- pear in Table S2 of the Supporting Information. though the compound 39 was considered an outlier (resid- Four variables of the equation (HATS7v, REIG, R6m, ual value of À1.10) it was not removed because the sub- and R6mþ) are GETAWAY descriptors and they encode stituent in 3,6 position of the acridinic moiety is different both the geometrical information given by the Molecular from the rest of compounds. Similar structures were not in- Influence Matrix (MIM) and the topological information cluded in the dataset because their values of telomerase in- given by the molecular graph, weighted by chemical infor- hibitory activities were not exact (compound 16). mation encoded in selected atomic weightings. These type The variables in the Eq. 1 are essentially independent, of descriptors usually lacking of physicochemical interpre- being their inter-correlation very low (see Table 4). The tation but some relevant meaning of its will be explained greater value (0.68) is between REIG and HATS7v de- using three structural features of the acridinic derivatives scriptors and it is lower than the value of regression coeffi- studied here (see Table 5). These structural features are cient. substitutions in regioisomers scaffolds (3,6,9; 2,6,9; 2,7,9 see compounds 1, 74, and 86), variation in the side chain structures (compounds 1, 44, 45, and 46) and the introduc- Table 4. Correlation matrix for the variables of the model with tion of substituents in the ortho, meta, and para position of seven variables (Eq. 1). the anilino ring (compounds 55, 58, and 59). The HATS indices are based only on the diagonal ele- þ G3u E3u De HATS7v REIG R6m R6m ments of the MIM, which account for the relative position G3u 1.00 À0.37 À0.15 À0.30 À0.21 À0.35 À0.28 of each atom in the 3-D molecular space. 
In our model the E3u À0.37 1.00 0.54 0.25 0.39 0.23 0.17 HATS7v descriptor (weighted by atomic van der Waals De À0.15 0.54 1.00 0.37 0.35 0.16 0.44 volumes) has a negative contribution to telomerase inhibi- HATS7v À0.30 0.25 0.37 1.00 0.68 0.55 0.30 tory potency. For example, in compounds 55, 58, and 59, REIG À0.21 0.39 0.35 0.68 1.00 0.15 0.33 which are mainly the same acridinic structures with the R6m À0.35 0.23 0.16 0.55 0.15 1.00 0.12 R6mþ À 0.28 0.17 0.44 0.30 0.33 0.12 1.00 only difference of the NH 2 group in para, meta, and ortho position of the aniline ring, the descriptor values decrease

QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 www.qcs.wiley-vch.de  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 531 Full Papers Miguel Angel Cabrera-Pe´rez et al.

Table 5. Compounds used to explain the meaning of the molecular descriptors used.

tel Compounds EC 50 (mM) G3u E3u De HATS7v REIG R6m R6m þ 55 0.074 0.184 0.428 0.398 0.056 0.515 0.318 0.011 58 0.060 0.209 0.409 0.403 0.055 0.520 0.320 0.011 59 0.020 0.199 0.423 0.401 0.054 0.514 0.317 0.011 1 0.100 0.217 0.402 0.383 0.045 0.494 0.284 0.011 74 0.170 0.197 0.372 0.395 0.040 0.494 0.238 0.009 86 0.500 0.172 0.361 0.391 0.067 0.509 0.352 0.013 44 0.100 0.192 0.423 0.412 0.055 0.483 0.309 0.008 45 1.930 0.159 0.388 0.394 0.061 0.469 0.308 0.009 46 6.910 0.199 0.387 0.470 0.062 0.459 0.294 0.007 in the same direction para-ortho. This result suggests that mation regarding the molecular size, shape, symmetry, and a decreasing of the topological distance might possibility atom distribution with respect to some invariant reference the hydrogen bond interaction with the receptor, enhanc- frame. The G3u is related with molecular symmetry and ing the affinity of these compounds toward the G-quadru- has a positive contribution to the telomerase inhibitory ac- plex structure [26]. tivity. The E3u is a directional descriptor related to the ac- For compounds 1, 44, 45, and 46 this variable is also in- cessibility, atom distribution, and density around the origin creased when is increased the elongation of the side chains by one methylene with respect to compound 1. This results suggests when is greater the number of topological distan- ces of length 7 is lower the inhibitory telomerase potency due to greater variation in the ligand motion on the quad- ruplex surface by the greater flexibility of the three- and six-chains, altering the binding mode of these compounds. This results is agree with mentioned by Neidle and Read where is explained that the optimal side-chain linker be- tween the amide group and the charged nitrogen atom is À À (CH 2)2 [43]. For the same set of compounds (1, 44, 45, and 46) there are decreased values for REIG descriptor. 
This variable has some structural interpretation and its values decreases as the molecular size increases and seems to be more sen- sitive to molecular branching than to cyclicity and confor- mational changes.

On the other hand, R and Rkþindices have a depend- ence on conformational changes as an encoding informa- tion on pairs of atoms near each other. The R6m descrip- tor has a positive contribution to the telomerase inhibitory activity. This descriptor is defined based on the influence/ distance matrix and weighted by atomic masses, and ex- plains the presence of significant substituents or fragments in the molecule at a topological distance of 6. For example, for compounds 1, 74, and 86, the larger values correspond to compound 86 where the substituents in positions 2 and 7 of the acridine framework are located very next to each other in the molecular space. In the case of compounds 1, 44, 45, and 46 when the side chain is increased by addition of methylene groups, the R6m descriptor decreasing when the terminal atoms of the substituents are located to a topological distance greater than 6 ( e.g. , compounds 45 and 46). The rest of variables in Eq. 1 are WHIM descriptors (G3u, E3u, and De). This family of descriptors is built in Figure 3. Scatterplot of WHIM descriptors for the molecules such a way as to capture the relevant molecular 3-D infor- of the training set.

532  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.qcs.wiley-vch.de QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach and along the principal axes and has a positive contribu- The G3u descriptor is able to differentiate regioisomers tion to the property. Finally, the De descriptor is calculated inside the same family of compounds (acridines or acri- directly as a combination of the directional WHIM de- dones), considering the symmetry of compounds. As is de- scriptors (weighted by atomic Sanderson electronegativi- picted in the Figure 3, the 2,7,9 acridines have lower values ties) and give information about total density. of G3u than 2,6,9 acridines (see bottom of Figure 3). For To understand and explain the chemical meaning of example, for compounds 1, 74, and 86 the G3u values de- WHIM descriptors, the training dataset was used. In Fig- creases as well as E3u, being in agree with the telomerase ure 3 the WHIM descriptors in the Eq. 1 were plotted inhibitory potency. each other (De vs. E3u and De vs. G3u) to give a deeper The results of the linear regression model (Eq. 1) for the insight into the descriptor chemical meaning. training and prediction sets are depicted in Tables 6 and 7. As can be appreciated the greater values of De are for When we use a model for predicting the values for acridones and acridines 9-substituted with hydrogen some unknown compounds, the assessment of model ap- atoms. When these compounds are projected to the axis plicability, in context of the compounds under study, is there are great unfilled spaces per projected atoms. It is necessary. This can be assessed with different approaches, explained by the worse interaction with the surface of the all of which in some sense try to assess whether the struc- DNA groove for these compounds. 
Once the 9 position of ture-, chemical-, or descriptor-based properties of the acridines are functionalized with different aliphatic and ar- “unknown” compound lie in similar “space” as those for omatic side chains, the particular conformation adopted the compounds that were a part of the training set used by its in the 3,6,9 scaffold brings the 9 position substituents for building the model. Considering that in QSAR mod- significantly deeper into the cavity of this groove when eling similar compounds have similar activity/property compared to the other scaffold families. The 3,6,9; 2,7,9 and hence, given an unknown compound, we shall be and 2,6,9 regiosomers of acridines have the lower values of able to predict its activity/property with confidence if it is De (See top of Figure 3). “similar” to the compounds that were used for building

Table 6. Results of the regression equation for the training set.

Compound log EC 50 exp log EC 50 pred Res Compound log EC 50 exp log EC 50 pred Res 2 5.57 5.69 À0.12 47 6.78 7.33 À0.55 4 5.87 5.09 0.78 49 6.93 7.61 À0.68 5 5.36 5.82 À0.46 50 6.49 7.09 À0.60 6 5.27 5.71 À0.44 51 6.59 6.52 0.07 7 5.39 5.43 À0.04 52 6.84 6.51 0.33 8 5.10 5.30 À0.20 53 5.28 5.65 À0.37 9 5.51 5.66 À0.16 56 7.22 7.09 0.13 13 5.24 5.59 À0.36 57 7.30 7.22 0.08 14 5.09 5.21 À0.13 58 7.22 7.23 À0.01 17 5.28 5.13 0.15 59 7.70 7.20 0.50 18 6.50 6.19 0.31 60 7.00 7.01 À0.01 20 6.78 6.78 0.00 61 7.05 6.64 0.41 21 7.01 6.81 0.20 62 6.85 6.42 0.43 22 7.10 6.90 0.20 64 7.40 7.09 0.30 23 5.09 5.58 À0.49 66 7.74 7.33 0.42 26 5.72 5.50 0.23 67 7.18 7.16 0.02 27 6.22 5.81 0.41 69 7.30 6.83 0.47 29 5.82 5.72 0.10 71 6.82 6.64 0.19 30 5.64 5.64 0.00 72 7.00 6.97 0.03 31 6.70 6.34 0.36 73 7.10 7.06 0.03 33 6.40 5.70 0.70 74 6.77 6.70 0.07 34 5.23 5.53 À0.30 75 6.57 7.08 À0.51 35 6.70 6.87 À0.18 77 6.96 6.81 0.15 36 6.70 6.70 À0.01 78 5.88 6.22 À0.34 37 5.37 4.73 0.64 79 7.10 7.14 À0.04 38 5.57 5.59 À0.02 80 6.68 6.67 0.01 39 4.31 5.41 À1.10 81 6.34 6.45 À0.11 40 5.77 5.64 0.13 83 5.96 5.79 0.17 42 5.64 5.60 0.04 84 6.22 6.49 À0.27 43 5.64 5.58 0.06 85 6.70 6.29 0.41 45 5.71 6.02 À0.31 86 6.30 6.32 À0.02 46 5.16 5.60 À0.44 89 5.99 6.27 À0.28

QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 www.qcs.wiley-vch.de  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 533 Full Papers Miguel Angel Cabrera-Pe´rez et al.

Table 7. Results of the regression equation for the prediction set.

Compound log EC 50 exp log EC 50 pred Res Compound log EC 50 exp log EC 50 pred Res 1 7.00 7.23 À0.23 55 7.13 7.06 0.07 3 5.59 5.75 À0.16 63 6.68 7.15 À0.47 15 5.55 5.04 0.51 65 7.74 6.98 0.77 19 6.57 6.92 À0.34 68 7.00 7.08 À0.08 24 5.72 6.14 À0.42 70 7.15 6.76 0.40 25 5.24 5.71 À0.47 76 6.68 6.56 0.12 28 5.72 5.74 À0.02 82 6.77 6.31 0.46 32 6.15 6.09 0.06 87 5.89 5.96 À0.07 41 5.77 5.67 0.10 88 5.56 6.46 À0.90 44 7.00 6.84 0.16 90 6.24 6.34 À0.10 48 7.17 7.36 À0.19

the model. In this sense the “chemical space” of the vali- lower values were achieved for the model with seven var- dation dataset was evaluated by a Principal Components iables. Analysis (PCA) on the descriptors used in the model, for Such results further suggest that the established model both the training set and the unknown compounds, and has a strong predictive ability and can be prospectively then launches a plot on the first two components. In Fig- used in the molecular design and action mechanism analy- ure 4, the training set compounds are shown in full circles sis of this kind of compound with anticancer activity by while the unknown compounds, for which the predictions stabilization of the G-quartet. need to be made, are depicted in full triangles. As can be appreciated, only one of ten acridine derivatives not be- long to the same distribution or space as the rest of com- Conclusions pounds used for deriving the model. This fact explains the poor prediction for this compound. The results for The inhibition of the enzyme telomerase is a promissory this validation process are summarized in Table 8. The target for the treatment of cancer. The results of this re- standard deviation error in prediction for the external search are considered by these authors as the first QSAR data (SDEP ext ) was also calculated and the values for the study that allows a structural interpretation of the interac- best fitted models appear in Table 3. As can be seen the tions of a novel family of compounds (acridines and acri-

Figure 4. Determination of the model applicability for the external set of ten acridine derivatives.

534  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.qcs.wiley-vch.de QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach

Table 8. Results of the classification of an external set of ten ac- cross-validation of the final model and their potentiality to ridines. get good results in the prediction of external set support this claim. This suggests that the present method should be regarded as one of the choice for lead optimization pro- grams in the drug discovery process. The present paper is a first step in the prediction of this important biological activity, where a good computational Mol 1 2 log EC exp log EC pred Res 50 50 tool is necessary in order to increase the successful in the T1 7.52 6.88 0.64 obtaining of novel and potent antitumoral compounds be- longs to this heterocyclic family.

T2 6.46 6.97 À0.52 Acknowledgements

The authors would like to dedicate the present paper to the memory of Maykel Pe´rez Gonza´lez, one of the authors T3 6.00 6.83 À0.83 who died before finish this work. This research was spon- sored by the Cuban Higher Education Ministry (R&D project number 6.181-2006). Anonymous reviewers are T4 6.62 7.05 À0.43 gratefully acknowledged for their helpful suggestions that have led to improving the paper.

T5 6.07 8.33 À2.27 References

[1] S. W. Chan, E. H. Blackburn, Oncogene 2002, 21 , 553. [2] J. W. Shay, S. Bacchetti, Eur. J. Cancer 1997, 33 , 787 – 791. [3] C. M. Counter, H. W. Hirte, S. Bacchetti, C. B. Harley, Proc. T6 6.06 6.36 À0.31 Natl. Acad. Sci. USA 1994, 91 , 2900 – 2904. [4] M. Meyerson, Toxicol. Lett. 1998, 102, 41 – 45. [5] A. E. Pitts, D. R. Corey, Proc. Natl. Acad. Sci. USA 1998, 95 , 11549 – 11554. T7 6.36 6.02 0.34 [6] Y. Kondo, S. Koga, T. Komata, S. Kondo, Oncogene 2000, 19 , 2205 – 2211. [7] C. Strahl, E. H. Blackburn, Mol. Cell. Biol. 1996, 16 , 53 – 65. [8] P. J. Perry, M. A. Read, R. T. Davies, S. M. Gowan, A. P. Reszka, A. A. Wood, L. R. Kelland, S. Neidle, J. Med. T8 6.64 6.31 0.33 Chem. 1999, 42 , 2679 – 2684. [9] S. M. Gowan, R. Heald, M. F. Stevens, L. R. Kelland, Mol. Pharmacol. 2001, 60 , 981 – 988. [10] S. M. Gowan, J. R. Harrison, L. Patterson, M. Valenti, T9 6.41 6.36 0.05 M. A. Read, S. Neidle, L. R. Kelland, Mol. Pharmacol. 2002, 61 , 1154 – 1162. [11] E. Izbicka, R. T. Wheelhouse, E. Raymond, K. K. Davidson, R. A. Lawrence, D. Sun, B. E. Windle, L. H. Hurley, D. D. T10 6.44 6.20 0.24 Von Hoff, Cancer Res. 1999, 59 , 639 – 644. [12] J. R. Williamson, Annu. Rev. Biophys. Biomol. Struct. 1994, 23 , 703 – 730. [13] D. Rhodes, R. Giraldo, Curr. Opin. Struct. Biol. 1995, 5, 311 – 322. [14] M. Y. Kim, M. Gleason-Guzman, E. Izbicka, D. Nishioka, L. H. Hurley, Cancer Res. 2003, 63 , 3247 – 3256. dones derivatives), with probed properties as telomerase [15] D. Cairns, E. Michalitsi, T. C. Jenkins, S. P. Mackay, Bioorg. inhibitors by stabilization of the quartet-G structure, with Med. Chem. 2002, 10 , 803 – 807. the telomeric structure. The results shown like GET- [16] C. M. Incles, C. M. Schultes, H. Kempski, H. Koehler, L. R. Kelland, S. Neidle, Mol. Cancer Ther. 2004, 3, 1201 – 1206. AWAY and WHIM molecular descriptors are able to pre- [17] R. J. Harrison, A. P. Reszka, S. M. Haider, B. Romagnoli, J. 
dict the telomerase inhibitory activity for acridinic deriva- Morrell, M. A. Read, S. M. Gowan, C. M. Incles, L. R. Kel- tives. The linear models developed are easily calculated land, S. Neidle, Bioorg. Med. Chem. Lett. 2004, 14 , 5845 – and suitable for the rapid prediction of this property. The 5849.

QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536 www.qcs.wiley-vch.de  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 535 Full Papers Miguel Angel Cabrera-Pe´rez et al.

[18] R. J. Harrison, S. M. Gowan, L. R. Kelland, S. Neidle, Bio- [30] STATISTICA, Version 6.0 , StatSoft Inc., Tulsa, USA, 2001. org. Med. Chem. Lett. 1999, 9, 2463 – 2468. [31] R. Todeschini, V. Consonni, A. Mauri, M. Pavan, DRAGON, [19] M. J. Moore, C. M. Schultes, J. Cuesta, F. Cuenca, M. Gu- Version 5.3 , Talete srl, Milano, Italy, 2005. naratnam, F. A. Tanious, W. D. Wilson, S. Neidle, J. Med. [32] M. J. S. Dewar, E. G. Zoebisch, E. F. Healy, J. J. P. Stewart, Chem. 2006, 49 , 582 – 599. J. Am. Chem. Soc. 1985, 107, 3902 – 3909. [20] J. L. Mergny, L. Lacroix, M. P. Teulade-Fichou, C. Hounsou, [33] J. J. P. Stewart, MOPAC Manual, MOPAC 6.0; Frank J. Seil- L. Guittat, M. Hoarau, P. B. Arimondo, J. P. Vigneron, J. M. er Research Laboratory, United States Air Force Academy, Lehn, J. F. Riou, T. Garestier, C. Helene, Proc. Natl. Acad. Colorado Springs, USA, 1990. Sci. USA 2001, 98 , 3062 – 3067. [34] D. Weininger, J. Chem. Inf. Comput. Sci. 1988, 28 , 31 – 36. [21] O. Y. Fedoroff, M. Salazar, H. Han, V. V. Chemeris, S. M. [35] R. Todeschini, D. Ballabio, V. Consonni, A. Mauri, M. Pa- Kerwin, L. H. Hurley, Biochemistry 1998, 37 , 12367 – 12374. van, MOBYDIGS, Version 1.0 , Talete srl, Milano, Italy, [22] L. G. Valerio, Jr., K. B. Arvidson, R. F. Chanderbhan, J. F. 2004. Contrera, Toxicol. Appl. Pharmacol. 2007, 222, 1 – 16. [36] R. Leardi, R. Boggia, M. Terrile, J. Chemom. 1992, 6, 267 – [23] B. Li, M. P. Lyle, G. Chen, J. Li, K. Hu, L. Tang, M. A. 281. Alaoui-Jamali, J. Webster, Bioorg. Med. Chem. 2007, 15 , [37] R. Todeschini, V. Consonni, A. Mauri, M. Pavan, Anal. 4601 – 4608. Chim. Acta 2004, 515, 199 – 208. [24] I. Hutchinson, A. J. McCarroll, R. A. Heald, M. F. Stevens, [38] B. Efron, The Jackknife, the Bootstrap and other Resampling Org. Biomol. Chem. 2004, 2, 220 – 228. Planes, Society of Industrial and Applied Mathematics, Phil- [25] C. M. Schultes, B. Guyen, J. Cuesta, S. Neidle, Bioorg. Med. adelphia (PA) 1982. Chem. Lett. 2004, 14 , 4347 – 4351. [39] F. 
Lindgren, B. Hansen, W. Karcher, M. Sjostrom, L. Eriks- [26] R. J. Harrison, J. Cuesta, G. Chessari, M. A. Read, S. K. son, J. Chemom. 1996, 10 , 521 – 532. Basra, A. P. Reszka, J. Morrell, S. M. Gowan, C. M. Incles, [40] A. Golbraikh, A. Tropsha, J. Mol. Graph. Model. 2002, 20 , F. A. Tanious, W. D. Wilson, L. R. Kelland, S. Neidle, J. 269 – 276. Med. Chem. 2003, 46 , 4463 – 4476. [41] K. Rose, L. H. Hall, L. B. Kier, J. Chem. Inf. Comput. Sci. [27] J. C. Cookson, R. A. Heald, M. F. Stevens, J. Med. Chem. 2002, 42 , 651 – 666. 2005, 48 , 7198 – 7207. [42] C. Martins, M. Gunaratnam, J. Stuart, V. Makwana, O. Gre- [28] R. A. Heald, C. Modi, J. C. Cookson, I. Hutchinson, C. A. ciano, A. P. Reszka, L. R. Kelland, S. Neidle, Bioorg. Med. Laughton, S. M. Gowan, L. R. Kelland, M. F. Stevens, J. Chem. Lett. 2007, 17 , 2293 – 2298. Med. Chem. 2002, 45 , 590 – 597. [43] S. Neidle, M. A. Read, Biopolymers 2000, 56 , 195 – 208. [29] D. Shi, R. T. Wheelhouse, D. Sun, L. Hurley, J. Med. Chem. 2001, 44 , 4509 – 4523.

536  2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.qcs.wiley-vch.de QSAR Comb. Sci. 28 , 2009, No. 5, 526 – 536

Table S2: Descriptor values for all acridinic derivatives in the final regression model (Eq. 1).

No.   G3u     E3u     De      HATS7v   REIG    R6m     R6m+       No.    G3u     E3u     De      HATS7v   REIG    R6m     R6m+
1     0.217   0.402   0.383   0.045    0.494   0.284   0.011      51     0.186   0.401   0.465   0.060    0.466   0.360   0.008
2     0.213   0.439   0.502   0.055    0.498   0.276   0.015      52     0.198   0.406   0.413   0.051    0.440   0.308   0.009
3     0.199   0.383   0.506   0.067    0.532   0.339   0.015      53     0.205   0.449   0.493   0.068    0.565   0.237   0.009
4     0.180   0.366   0.440   0.058    0.494   0.251   0.014      54a    0.195   0.440   0.455   0.119    0.569   0.311   0.012
5     0.163   0.464   0.548   0.058    0.554   0.282   0.012      55     0.184   0.428   0.398   0.056    0.515   0.318   0.011
6     0.172   0.450   0.530   0.058    0.518   0.321   0.017      56     0.215   0.398   0.386   0.042    0.493   0.259   0.010
7     0.213   0.418   0.501   0.056    0.478   0.259   0.011      57     0.196   0.455   0.434   0.049    0.514   0.298   0.010
8     0.205   0.407   0.448   0.062    0.494   0.271   0.016      58     0.209   0.409   0.403   0.055    0.520   0.320   0.011
9     0.228   0.359   0.486   0.051    0.519   0.255   0.015      59     0.199   0.423   0.401   0.054    0.514   0.317   0.011
10a   0.206   0.511   0.574   0.057    0.507   0.294   0.012      60     0.177   0.410   0.414   0.058    0.524   0.336   0.011
11a   0.207   0.480   0.522   0.046    0.483   0.254   0.008      61     0.188   0.445   0.443   0.057    0.512   0.330   0.015
12a   0.207   0.442   0.476   0.054    0.431   0.284   0.010      62     0.183   0.432   0.439   0.054    0.520   0.293   0.014
13    0.204   0.464   0.511   0.054    0.523   0.291   0.023      63     0.233   0.439   0.431   0.045    0.504   0.278   0.012
14    0.200   0.368   0.445   0.073    0.552   0.242   0.009      64     0.175   0.409   0.421   0.063    0.533   0.363   0.011
15    0.162   0.391   0.473   0.066    0.529   0.273   0.014      65     0.203   0.439   0.420   0.045    0.508   0.272   0.012
16a   0.203   0.395   0.487   0.070    0.537   0.304   0.016      66     0.218   0.430   0.401   0.054    0.511   0.310   0.010
17    0.181   0.447   0.513   0.069    0.542   0.307   0.020      67     0.193   0.456   0.406   0.062    0.516   0.340   0.011
18    0.248   0.309   0.453   0.045    0.426   0.306   0.010      68     0.215   0.409   0.419   0.070    0.522   0.383   0.012
19    0.176   0.411   0.450   0.044    0.448   0.330   0.008      69     0.203   0.424   0.412   0.053    0.512   0.293   0.012
20    0.209   0.354   0.451   0.042    0.439   0.328   0.009      70     0.174   0.440   0.409   0.066    0.517   0.360   0.014
21    0.205   0.428   0.476   0.042    0.421   0.325   0.008      71     0.179   0.391   0.382   0.050    0.497   0.310   0.016
22    0.203   0.419   0.389   0.029    0.403   0.258   0.010      72     0.213   0.380   0.416   0.061    0.525   0.344   0.012
23    0.201   0.377   0.431   0.067    0.528   0.294   0.016      73     0.250   0.382   0.377   0.058    0.505   0.281   0.007
24    0.202   0.454   0.490   0.062    0.499   0.330   0.014      74     0.197   0.372   0.395   0.040    0.494   0.238   0.009
25    0.222   0.475   0.529   0.067    0.535   0.319   0.019      75     0.203   0.434   0.417   0.038    0.501   0.256   0.012
26    0.199   0.450   0.498   0.057    0.498   0.270   0.015      76     0.209   0.320   0.351   0.059    0.505   0.284   0.008
27    0.238   0.460   0.487   0.070    0.549   0.250   0.010      77     0.241   0.288   0.351   0.053    0.515   0.265   0.008
28    0.191   0.379   0.481   0.054    0.523   0.293   0.017      78     0.220   0.311   0.367   0.063    0.511   0.281   0.009
29    0.218   0.372   0.495   0.069    0.518   0.356   0.017      79     0.222   0.436   0.392   0.039    0.486   0.222   0.007
30    0.193   0.404   0.477   0.062    0.518   0.298   0.015      80     0.210   0.387   0.401   0.056    0.517   0.265   0.007
31    0.194   0.475   0.541   0.067    0.505   0.382   0.013      81     0.170   0.446   0.444   0.076    0.533   0.379   0.013
32    0.169   0.519   0.547   0.054    0.530   0.303   0.016      82     0.196   0.438   0.459   0.084    0.531   0.413   0.015
33    0.179   0.463   0.522   0.059    0.504   0.304   0.014      83     0.167   0.417   0.423   0.088    0.523   0.393   0.014
34    0.222   0.431   0.488   0.062    0.507   0.282   0.016      84     0.191   0.378   0.382   0.067    0.508   0.344   0.013
35    0.162   0.611   0.541   0.076    0.571   0.358   0.010      85     0.190   0.444   0.450   0.084    0.538   0.390   0.013
36    0.211   0.500   0.579   0.084    0.603   0.404   0.012      86     0.172   0.361   0.391   0.067    0.509   0.352   0.013
37    0.167   0.397   0.454   0.071    0.548   0.256   0.017      87     0.199   0.437   0.458   0.089    0.533   0.414   0.017
38    0.192   0.405   0.478   0.053    0.513   0.283   0.019      88     0.181   0.381   0.385   0.063    0.490   0.327   0.010
39    0.191   0.421   0.454   0.064    0.508   0.295   0.018      89     0.197   0.432   0.453   0.087    0.527   0.424   0.015
40    0.201   0.405   0.540   0.067    0.528   0.325   0.012      90     0.213   0.356   0.388   0.067    0.519   0.323   0.013
41    0.231   0.386   0.470   0.063    0.509   0.294   0.015      T1b    0.171   0.434   0.444   0.068    0.537   0.393   0.015
42    0.198   0.401   0.501   0.062    0.507   0.321   0.016      T2b    0.182   0.504   0.462   0.068    0.501   0.392   0.013
43    0.191   0.383   0.493   0.070    0.516   0.360   0.017      T3b    0.185   0.453   0.449   0.066    0.529   0.367   0.014
44    0.192   0.423   0.412   0.055    0.483   0.309   0.008      T4b    0.174   0.488   0.455   0.063    0.498   0.410   0.017
45    0.159   0.388   0.394   0.061    0.469   0.308   0.009      T5b    0.199   0.441   0.441   0.057    0.496   0.593   0.035
46    0.199   0.387   0.470   0.062    0.459   0.294   0.007      T6b    0.187   0.412   0.427   0.062    0.496   0.341   0.014
47    0.177   0.385   0.377   0.053    0.487   0.347   0.009      T7b    0.172   0.395   0.400   0.050    0.466   0.273   0.012
48    0.170   0.375   0.390   0.046    0.479   0.347   0.010      T8b    0.201   0.398   0.392   0.049    0.482   0.263   0.013
49    0.195   0.436   0.402   0.046    0.469   0.337   0.009      T9b    0.216   0.390   0.386   0.042    0.455   0.238   0.011
50    0.225   0.382   0.419   0.051    0.457   0.328   0.007      T10b   0.194   0.371   0.386   0.042    0.464   0.250   0.013

a Compounds excluded from the equation.
b Independent external dataset of 10 tri-substituted acridines reported in the literature.
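Eq. 1 can be evaluated directly from the descriptor values in Table S2. The following minimal Python sketch does so for the external set T1 to T10 and compares the result with the predicted values listed in Table 8. The coefficients, descriptor values and experimental values are taken from Eq. 1, Table S2 and Table 8; everything else (the names `predict`, `sdep`, `TRAIN_MAX`, the 0.02 tolerance, and the simple upper-bound check on R6m and R6m+) is an illustrative assumption. In particular, the range check is a much cruder stand-in for the PCA-based applicability analysis used in the article, not a reproduction of it.

```python
import math

# Eq. 1 as printed in the article: intercept plus seven descriptor terms.
INTERCEPT = 1.28
ORDER = ["G3u", "E3u", "De", "HATS7v", "REIG", "R6m", "R6m+"]
COEFFS = {"G3u": 6.98, "E3u": 4.99, "De": -6.84, "HATS7v": -53.19,
          "REIG": 9.63, "R6m": 12.23, "R6m+": -71.83}

# External set: descriptors from Table S2, then (exp, pred) from Table 8.
EXTERNAL = {
    "T1":  ((0.171, 0.434, 0.444, 0.068, 0.537, 0.393, 0.015), 7.52, 6.88),
    "T2":  ((0.182, 0.504, 0.462, 0.068, 0.501, 0.392, 0.013), 6.46, 6.97),
    "T3":  ((0.185, 0.453, 0.449, 0.066, 0.529, 0.367, 0.014), 6.00, 6.83),
    "T4":  ((0.174, 0.488, 0.455, 0.063, 0.498, 0.410, 0.017), 6.62, 7.05),
    "T5":  ((0.199, 0.441, 0.441, 0.057, 0.496, 0.593, 0.035), 6.07, 8.33),
    "T6":  ((0.187, 0.412, 0.427, 0.062, 0.496, 0.341, 0.014), 6.06, 6.36),
    "T7":  ((0.172, 0.395, 0.400, 0.050, 0.466, 0.273, 0.012), 6.36, 6.02),
    "T8":  ((0.201, 0.398, 0.392, 0.049, 0.482, 0.263, 0.013), 6.64, 6.31),
    "T9":  ((0.216, 0.390, 0.386, 0.042, 0.455, 0.238, 0.011), 6.41, 6.36),
    "T10": ((0.194, 0.371, 0.386, 0.042, 0.464, 0.250, 0.013), 6.44, 6.20),
}

# Largest R6m and R6m+ values among compounds 1-90 in Table S2
# (used only for the illustrative range check below).
TRAIN_MAX = {"R6m": 0.424, "R6m+": 0.023}


def predict(descriptors):
    """Evaluate Eq. 1 for one compound (descriptors given in ORDER)."""
    return INTERCEPT + sum(COEFFS[n] * v for n, v in zip(ORDER, descriptors))


def sdep(residuals):
    """Standard deviation error in prediction, (PRESS / n) ** 0.5."""
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


predictions = {name: predict(d) for name, (d, _, _) in EXTERNAL.items()}
residuals = {name: exp - predictions[name]
             for name, (_, exp, _) in EXTERNAL.items()}

# Compounds whose R6m or R6m+ exceeds every value seen for compounds 1-90.
outside_range = [
    name for name, (d, _, _) in EXTERNAL.items()
    if any(v > TRAIN_MAX[n] for n, v in zip(ORDER, d) if n in TRAIN_MAX)
]

sdep_all = sdep(list(residuals.values()))
sdep_without_t5 = sdep([r for n, r in residuals.items() if n != "T5"])

for name in EXTERNAL:
    print(f"{name}: predicted {predictions[name]:.2f}, "
          f"residual {residuals[name]:+.2f}")
print("outside descriptor range:", outside_range)
print(f"SDEP (all ten): {sdep_all:.3f}; without T5: {sdep_without_t5:.3f}")
```

Under these assumptions the sketch reproduces the Table 8 predictions to within rounding and flags only T5, the compound with the large residual. The SDEP over all ten compounds comes out near 0.84, and near 0.46 when T5 is left out; the latter is close to the $SDEP_{ext}$ of 0.462 listed for the seven-variable model in Table 3, although the article does not state how that value was obtained.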

4.2 Article 3. Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure. European Journal of Medicinal Chemistry 44 (2009) 4826–4840. Daimel Castillo-González, Miguel Angel Cabrera-Pérez, Maykel Pérez-González, Aliuska Morales-Helguera and Alexander Durán-Martínez.

Summary

In this work we use linear discriminant analysis to predict whether a new acridine is likely to inhibit the telomerase enzyme, assuming a mechanism based on the stabilization of guanine quartets. A group of 90 mono-, di- and trisubstituted acridines and acridones was gathered, all evaluated under similar TRAP assay conditions. This implies that, regardless of the shortcomings associated with this kind of assay, the relative order of activity among the different compounds is comparable. The molecular descriptors employed in this work are one-dimensional. Descriptors of this kind are mathematically very simple, since they are based on counting functional groups or other features of the chemical structure, and they have the advantage of a simple interpretation and a direct relation to the activity. A cluster analysis is used to divide the database into training and prediction sets, and a linear discriminant analysis model is found that relates the activity under study to the number of aromatic carbons (nCar), the number of secondary aromatic amines (nArNHR), and the number of unsubstituted benzene C(sp2) atoms (nCbH). The nCar and nCbH descriptors are highly collinear, and this collinearity is removed by orthogonalizing the model. After orthogonalization, the correlation between variables is eliminated while the classification parameters and the signs of the molecular descriptors are maintained; the signs indicate whether each factor contributes positively or negatively to the activity. Finally, an external set of ten acridines evaluated under the same conditions is assessed. External evaluation has been defined as a necessary and sufficient condition for confirming the validity and predictive power of QSAR models. The external predictive capacity, with 90% correct classification for this series of compounds, illustrates the quality of the model and its applicability to ligands with these characteristics (acridinic derivatives).


Author's personal copy

European Journal of Medicinal Chemistry 44 (2009) 4826–4840


journal homepage: http://www.elsevier.com/locate/ejmech

Original article Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure

Daimel Castillo-González a,b, Miguel Ángel Cabrera-Pérez b,*, Maykel Pérez-González b, Aliuska Morales Helguera b,c, Alexander Durán-Martínez c

a Department of Pharmacy, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba
b Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba
c Department of Chemistry, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba

Article history: Received 26 January 2009; Received in revised form 2 June 2009; Accepted 23 July 2009; Available online 6 August 2009

Keywords: Telomerase; G-quartet; QSAR; Drug design; Acridines; Acridones

Abstract

Telomerase is a reverse transcriptase enzyme that activates in more than 85% of cancer cells and it is associated with the acquisition of a malignant phenotype. Some experimental strategies have been suggested in order to avoid the enzyme effect on unstopped telomere elongation. One of them, the stabilization of the G-quartet structure, has been widely studied. Nevertheless, no QSAR studies to predict this activity have been developed. In the present study a classification model was carried out to identify, through molecular descriptors with structural fragments and groups information, those acridinic derivatives with better inhibitory concentration on telomerase enzyme. A linear discriminant model was developed to classify a data set of 90 acridinic derivatives (48 more potent derivatives with IC50 < 1 μM and 42 less potent with IC50 ≥ 1 μM). The final model fit the data with sensitivity of 87.50% and specificity of 82.85%, for a final accuracy of 85.33%. The predictive ability of the model was assessed by a prediction set (15 compounds of 90% and 82.29% of prediction accuracy); a tenfold full cross-validation procedure (removing 15 compounds in each cycle, 84.80% of good prediction) and the prediction of inhibitory concentration on telomerase enzyme for external data of 10 novel acridines (90% of good prediction). The results of this study suggest that the established model has a strong predictive ability and can be prospectively used in the molecular design and action mechanism analysis of this kind of compounds with anticancer activity. © 2009 Elsevier Masson SAS. All rights reserved.

* Corresponding author. Tel.: +53 42 281 473/192; fax: +53 42 281130. E-mail addresses: [email protected], [email protected], macabreraster@gmail.com (M.Á. Cabrera-Pérez).

doi:10.1016/j.ejmech.2009.07.029

1. Introduction

The telomere, a complex of guanine-rich repeat sequences and associated proteins, caps and protects every eukaryotic chromosome end against chromosomal fusion, recombination, and terminal DNA degradation [1]. Telomeric DNA consists of short guanine-rich repeat sequences in all eukaryotes with linear chromosomes, and its length in human somatic cells is remarkably heterogeneous among individuals, ranging from 5 to 20 kb according to age, organ, and the proliferative history of each cell [2]. During the process of DNA synthesis and cell division, telomeres shorten as a result of the incomplete replication of linear chromosomes, the so-called 'end-replication problem'. This progressive telomere shortening is one of the molecular mechanisms underlying ageing, as critically short telomeres trigger chromosome senescence and loss of cell viability [2–4]. To prevent degradation by exonucleases or processing as damaged DNA, the telomere 3′ single-strand overhang folds back into the D-loop of duplex telomeric DNA to form a protective 'T-loop', which is reinforced with TRF2 and other telomeric DNA-binding proteins named Shelterin [5].

Telomerase is a ribonucleoprotein composed of human telomerase reverse transcriptase (hTERT), the catalytic subunit, and its template RNA (hTERC) [6–9]. As direct evidence that telomere erosion plays a major role in cellular senescence, ectopic expression of hTERT in normal human cells with endogenous hTERC resulted in activation of telomerase, stabilization of telomere lengths and extension of cellular life span. The catalytic subunit (hTERT) is the rate-limiting factor for telomerase activity both biologically and biochemically. In 85–90% of cancer cells and in parasites, telomerase is activated [10,11]. This enzyme is inactive in most normal somatic cells, providing a potentially specific therapeutic target. In order to extend telomere DNA, the telomerase enzyme requires the telomere primer to be single-stranded. The formation of higher-ordered structures such as G-quadruplexes prevents hybridization of the telomerase RNA template onto the primer and thus inhibits telomerase activity through an indirect topological mechanism [12].

Stabilization of the quadruplex conformation of telomeres, such as by binding small molecules, has been shown to be an effective method to inhibit telomerase activity [13–16]. The development of small molecules that can selectively bind to and stabilize the G-quadruplex conformation of the telomere is therefore a current area of interest in anticancer as well as antiparasitic drug design. Compounds that have been shown to bind to quadruplex DNA have traditionally been planar, aromatic compounds that bind via external end-stacking to the G-quartet on either one end or both ends of the quadruplex [17–30]. These compounds, which include anthraquinones, cationic porphyrins, acridines, macrocyclic compounds and analogs, have planar aromatic surface areas that mimic the large planar surface of the G-tetrads in quadruplex DNA [18,19,31]. Since essentially all known quadruplex DNA binders are based on, or derived from, duplex intercalators, many of them exhibit little selectivity for quadruplex over duplex structures, and this can result in nonspecific cytotoxicity. Increasing the selectivity of telomerase inhibitors for their quadruplex targets is an important focus of research.

QSAR (Quantitative Structure–Activity Relationship) studies have proved to be a very useful tool in the design and development of novel compounds with different biological activities. One of the main fields where this kind of study has been carried out is the prediction of antitumoral activity [32,33]. Nevertheless, the telomerase inhibitory concentration of potential candidates has not been modeled using the QSAR methodology. For this reason, in the present paper we intend to develop a QSAR study of a family of acridinic derivatives with demonstrated telomerase inhibitory activity based on stabilization of G-quadruplex. The main goal of the present research is to develop a classification model in order to identify the more potent acridinic derivatives that permit the design of novel candidates, taking into consideration the structural information contained in simple molecular descriptors.

Fig. 1. General chemical structures for the acridinic derivatives used in the QSAR study.

2. Methods

2.1. Data set

A data set of 90 acridinic derivatives was carefully assembled from the literature [30,34–39]. The group was composed of mono-, di-, and tri-substituted acridines and acridones. The general chemical structures of these derivatives are described in Fig. 1.

All these compounds have a proven inhibitory potency against the telomerase enzyme [40]. This activity has been mainly attributed to the planarity of these aromatic structures, which could intercalate within the double-stranded DNA structure, thus interfering with the cellular machinery. The inhibition of telomerase, by stabilization of G-quartet structure, was determined using the cell-free telomeric repeat amplification protocol (TRAP) assay [41]. Prior to the evaluation in the TRAP telomerase assays, the compounds were tested for their ability to inhibit Taq polymerase, in order to ensure that no false positives were obtained in the subsequent TRAP assays. TRAP assays were then performed at concentrations lower than that at which Taq polymerase inhibition was observed, using extracts from the A2780 ovarian carcinoma cell line as a source of telomerase.

This experimental method has a certain variability, and the quantitative value of telomerase inhibition fluctuates from one experiment to the next; however, the fluctuation within the same experiment is ±10% [42]. For the complete data set, the molecular structure and the telomerase inhibitory concentration, by stabilization of G-quartet structure, were reported (see Table 4 for details). The compounds were first clustered in two groups according to their mean telomerase inhibitory concentration values (IC50). The first group, the more potent compounds, includes all chemicals with IC50 < 1 μM, while the second one includes the less potent derivatives with IC50 ≥ 1 μM [43,44]. This classification criterion was adopted not only because this value has been reported as a cut-off value but also to get a reasonable ratio between both chemical groups in the data set.

In order to obtain a validated and predictive QSAR model, the available data set was divided into training and prediction sets, using a k-means cluster analysis (k-MCA). Firstly, we carried out a k-MCA1 with the more potent compounds and later another, k-MCA2, using the less potent compounds. The k-MCA was carried out with the STATISTICA software 6.0 [45]. For acceptable statistical quality of data clusters, we took into account the number of members in each cluster and the standard deviation of the variables in the cluster (as low as possible). We also inspected the standard deviation between and within clusters, the respective Fisher ratio and their p-level of significance, considered to be lower than 0.05. Compounds for the training and test sets were randomly collected from the previous clusters. This procedure permitted us to select, in a representative way and at all levels of the linking distance (Y-axis), compounds for the training and prediction sets. The results of the cluster analysis are shown in Figs. 2 and 3.

Fig. 2. A dendrogram illustrating the results of the hierarchical cluster analysis for the less potent acridinic derivatives.

Fig. 3. A dendrogram illustrating the results of the hierarchical cluster analysis for the more potent acridinic derivatives.

Fig. 4. Distribution of the standardized residuals for all acridinic derivatives used in the study.

Table 1
Inter-correlation among the three descriptors selected as statistically significant by GLDA analysis.

            nCar    nCbH    nArNHR
nCar        1.00    0.97    0.12
nCbH                1.00    0.18
nArNHR                      1.00

Table 2
Box's test of equality of covariance matrices.

Log determinants
Disc                    Rank    Log determinant
−1                      3       0.101
1                       3       −39.065
Pooled within-groups    3       −0.491

Test results
Box's M                 1484.262
F    Approx.            236.254
     df1                6
     df2                36748.004
     Sig.               0.000
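The strong nCar/nCbH correlation reported in Table 1 is what the orthogonalization step of Section 2.3 is meant to remove. The sketch below illustrates the general idea of an orthogonal complement (the residual of one descriptor after least-squares regression on another) in plain Python. It is an illustration under stated assumptions, not the authors' STATISTICA workflow: the function names are hypothetical, and the three data points are simply the nCar and nCbH values listed in Table 3, used here only as sample input.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def orthogonal_complement(x, y):
    """Residual of y after least-squares regression on x (with intercept).

    The residual carries only the part of y that x does not explain,
    so it is uncorrelated with x by construction.
    """
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [(b - my) - slope * (a - mx) for a, b in zip(x, y)]


# Sample input only: nCar and nCbH for the three compounds of Table 3.
n_car = [13, 19, 25]
n_cbh = [6, 12, 16]

r_before = pearson(n_car, n_cbh)
r_after = pearson(n_car, orthogonal_complement(n_car, n_cbh))
```

With these inputs `r_before` is close to 1, while `r_after` vanishes up to floating-point error, which is the property that makes the coefficients of an orthogonalized model easier to interpret.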


Finally, the training and prediction sets were composed of 75 compounds and 15 compounds, respectively. The compounds belonging to the prediction set were never used in the development of the discriminant function and they were reserved to assess the discriminant model obtained.
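The selection of training and prediction compounds from previously formed clusters can be sketched as a stratified random draw. The snippet below is a minimal illustration and not the procedure run in STATISTICA: the cluster labels are invented (five hypothetical clusters of 18 compounds), and `split_by_cluster`, `test_fraction` and `seed` are names chosen for this example. With one sixth of each cluster held out, it reproduces the 75/15 split sizes quoted above.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, test_fraction=1 / 6, seed=0):
    """Draw a fixed fraction of every cluster at random for the prediction set."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for compound, cluster in cluster_of.items():
        members[cluster].append(compound)

    train, test = [], []
    for cluster in sorted(members):
        ids = sorted(members[cluster])
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return sorted(train), sorted(test)


# Hypothetical cluster labels: compounds 1..90 spread over 5 clusters of 18.
cluster_of = {i: (i - 1) % 5 for i in range(1, 91)}
train, test = split_by_cluster(cluster_of)
```

Because every cluster contributes to both sets, the prediction set spans the same regions of descriptor space as the training set, which is the stated purpose of clustering before the split.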

2.2. Molecular descriptors

Fig. 6. Receiver Operating Characteristic (ROC) curve for the classification model (Eq. (3)).

Our study is based on the different sets of 1D-descriptors available in the DRAGON software, which in turn have a long history of structure–activity and structure–property correlations [46]. Two main families of descriptors were used, the atom-centred fragments and the functional group counts. The input of the software consists of SMILES (simplified molecular input line entry specification) codes for each compound [47].

Count descriptors directly encode particular features of molecular structure and are simply obtained from the chemical structure of molecules by counting defined elements such as atoms (nAT), bonds (nBT), rings (nCIC), H-bond acceptor (nHA) and H-bond donor (nHD) atoms, path counts, walk counts and so on. If atom-types are considered, atom-type counts are obtained, such as number of carbon atoms (nC), number of halogens (nX), number of oxygen atoms (nO), etc. Analogously, functional group counts and fragment counts are calculated, such as number of hydroxyl-groups (nOH), number of nitro-groups (nNO), number of amino-groups (nNH2), etc.

Atom-centred fragments are also very simple molecular descriptors that describe each atom by its own atom-type and the bond-types and atom-types of its first neighbors. Functionalities in a molecule can be represented by two to five atoms, which consist of a central atom and its neighboring bonded atoms. Each fragment is represented by a single descriptor. Their use greatly increases the specific chemical information regarding different functional groups.
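To make the notion of a count descriptor concrete, the following sketch derives a few atom-type counts directly from a SMILES string. It is a deliberately simplified stand-in and does not reproduce DRAGON's definitions: the function `count_descriptors` is hypothetical, bracket atoms such as `[NH+]` are skipped, and an aromatic carbon is recognised only by the lowercase `c` of the SMILES organic subset. Descriptors that need neighbor information, such as nCbH or nArNHR, would require a real molecular graph and are out of scope here.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
_ATOM = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def count_descriptors(smiles):
    """Toy atom-type counts (nC, nN, nO, nX, nCar) from a SMILES string."""
    counts = {"nC": 0, "nN": 0, "nO": 0, "nX": 0, "nCar": 0}
    for token in _ATOM.findall(smiles):
        if token.startswith("["):
            continue  # bracket atoms are ignored in this simplified sketch
        if token in ("C", "c"):
            counts["nC"] += 1
            if token == "c":  # lowercase marks an aromatic atom in SMILES
                counts["nCar"] += 1
        elif token in ("N", "n"):
            counts["nN"] += 1
        elif token in ("O", "o"):
            counts["nO"] += 1
        elif token in ("F", "Cl", "Br", "I"):
            counts["nX"] += 1
    return counts


acridine = count_descriptors("c1ccc2nc3ccccc3cc2c1")
chlorobenzene = count_descriptors("Clc1ccccc1")
```

For the unsubstituted acridine ring system this gives 13 aromatic carbons and one nitrogen, in line with its formula C13H9N.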

2.3. Model search

The discriminant function (Eq. (1)), which best describes telomerase inhibitory concentration values as a linear combination of the predictor X-variables (1D-descriptors) and weighted by the a_n

Fig. 5. Histogram for the frequency distribution of residuals within the classification model Eq. (3).

Table 3
Results of the inter-relation of molecular descriptors in the prediction of different range of telomerase inhibitory activity. (The molecular structure column consisted of drawings that are not recoverable from the extraction.)

No.    IC50 (μM)    nArNHR    nCar    nCbH
65     0.018        1         13      6
54     >20          1         19      12
12     >50          0         25      16
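The descriptor values in Table 3 can be plugged into the non-orthogonalized discriminant function reported in Section 3 as Eq. (2), IC50 = −10.40 + 4.11 nArNHR + 1.42 nCar − 1.90 nCbH. The sketch below does this arithmetic. It assumes, as a simplification, that the sign of the score decides the class (positive for the "+1", more potent group); the article's classification is based on group membership percentages, so this is only an approximation of that rule. The function names are hypothetical.

```python
# Coefficients of Eq. (2) as reported in Section 3 of the article.
INTERCEPT = -10.40
COEFFICIENTS = {"nArNHR": 4.11, "nCar": 1.42, "nCbH": -1.90}


def discriminant_score(descriptors):
    """Linear combination of the three count descriptors plus the intercept."""
    return INTERCEPT + sum(
        COEFFICIENTS[name] * descriptors[name] for name in COEFFICIENTS
    )


def predicted_class(descriptors):
    # Assumption: a positive score maps to the more potent ("+") group.
    return "+" if discriminant_score(descriptors) > 0 else "-"


# Descriptor values copied from Table 3 (keys are the compound numbers given there).
table3 = {
    65: {"nArNHR": 1, "nCar": 13, "nCbH": 6},
    54: {"nArNHR": 1, "nCar": 19, "nCbH": 12},
    12: {"nArNHR": 0, "nCar": 25, "nCbH": 16},
}

scores = {no: round(discriminant_score(d), 2) for no, d in table3.items()}
classes = {no: predicted_class(d) for no, d in table3.items()}
```

The scores come out at about 0.77, −2.11 and −5.30, so only the first compound (IC50 = 0.018 μM) falls on the more potent side, consistent with the IC50 values listed in the table. The example also shows how a larger nCar can be outweighed by the accompanying increase in nCbH.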


Table 4
Telomerase inhibition data set for acridinic derivatives. Results of the classification for the training and test sets compounds. (Scaffold structures and drawn R groups were images and are not recoverable from the extraction; such R groups are marked "drawn".)

Scaffold 1 [structure drawing]

No.          R        n    IC50 (μM)    Experim. class    Pred. class    Reference
1 (53)       drawn    1    5.2          −                 −              [39]
2 (2)        drawn    2    2.7          −                 −              [35]
3 (3)        drawn    2    2.6          −                 −              [35]
4 (4)        drawn    2    1.35         −                 −              [35]
5 (5)        drawn    2    4.4          −                 −              [35]
6 (6)        drawn    2    5.4          −                 −              [35]
7 (7) a      drawn    2    4.1          −                 −              [35]
8 (8)        drawn    2    8.0          −                 −              [35]
9 (9)        drawn    2    3.1          −                 −              [35]
10 (10)      drawn    2    >50          −                 −              [35]
11 (11) a    drawn    2    >50          −                 −              [35]
12 (12)      drawn    2    >50          −                 −              [35]
13 (13)      drawn    2    5.8          −                 −              [35]
14 (14)      drawn    2    8.2          −                 −              [35]
15 (15) a    drawn    2    2.8          −                 −              [35]
16 (16)      drawn    2    >50          −                 −              [35]
17 (17)      drawn    2    5.2          −                 −              [35]

Scaffold 2 [structure drawing]

No.          R        IC50 (μM)    Experim. class    Pred. class    Reference
18 (18)      drawn    0.318        +                 +              [36]
19 (19)      drawn    0.267        +                 +              [36]
20 (20)      drawn    0.165        +                 +              [36]
21 (21)      drawn    0.098        +                 +              [36]
22 (22)      drawn    0.08         +                 +              [36]

Scaffold 3 [structure drawing]

No.          R        IC50 (μM)    Experim. class    Pred. class    Reference
23 (23)      drawn    8.1          −                 −              [30]
24 (24)      drawn    1.9          −                 −              [30]
25 (34)      drawn    5.9          −                 −              [30]
26 (37)      drawn    4.3          −                 −              [30]
27 (38)      drawn    2.7          −                 −              [30]
28 (39)      drawn    49           −                 −              [30]
29 (40)      drawn    1.7          −                 −              [30]
30 (41)      drawn    1.7          −                 −              [30]
31 (42)      drawn    2.3          −                 −              [30]
32 (43) a    drawn    2.3          −                 −              [30]

Scaffold 4 [structure drawing]

No.          R        IC50 (μM)    Experim. class    Pred. class    Reference
33 (25)      drawn    5.8          −                 −              [30]
34 (26) a    drawn    1.9          −                 −              [30]
35 (27)      drawn    0.6          +                 −              [30]
36 (28)      drawn    1.9          −                 −              [30]
37 (29)      drawn    1.5          −                 −              [30]
38 (30) a    drawn    2.3          −                 −              [30]

Scaffold 5 [structure drawing]

No.          R        IC50 (μM)    Experim. class    Pred. class    Reference
39 (31)      drawn    0.2          +                 −              [30]
40 (32)      drawn    0.7          +                 −              [30]
41 (33)      drawn    0.4          +                 −              [30]
42 (35)      drawn    0.2          +                 −              [30]
43 (36) a    drawn    0.2          +                 −              [30]

Scaffold 6 [structure drawing]

No.          R                        n    IC50 (μM)    Experim. class    Pred. class    Reference
44 (1)       –C6H4N(CH3)2 (p)         2    0.1          +                 +              [36]
45 (44)      –C6H4N(CH3)2 (p)         3    0.099        +                 +              [38]
46 (45)      –C6H4N(CH3)2 (p)         4    1.93         −                 +              [38]
47 (46)      –C6H4N(CH3)2 (p)         5    6.91         −                 +              [38]
48 (47)      drawn                    2    0.167        +                 +              [38]
49 (48) a    drawn                    2    0.067        +                 +              [38]
50 (49)      drawn                    2    0.117        +                 +              [38]
51 (50)      drawn                    3    3.26         −                 +              [38]
52 (51)      drawn                    4    0.255        +                 +              [38]
53 (52)      drawn                    5    0.146        +                 +              [38]
54 (55) a    –C6H4NH2 (p)             2    0.074        +                 +              [42]
55 (56)      –CH2CH2CH2N(CH3)2        2    0.060        +                 +              [42]
56 (57)      –CH2CH2C5H10N            2    0.05         +                 +              [42]
57 (58)      –C6H4NH2 (m)             2    0.06         +                 +              [42]
58 (59)      –C6H4NH2 (o)             2    0.02         +                 +              [42]
59 (60)      –C6H4N(CH3)2 (m)         2    0.1          +                 +              [42]
60 (61)      –C6H11 (c)               2    0.09         +                 +              [42]
61 (62) a    –CH2CH2OCH3              2    0.14         +                 +              [42]
62 (63)      –C7H13 (c)               2    0.21         +                 +              [42]
63 (64)      –C6H4COCH3 (p)           2    0.04         +                 +              [42]
64 (65)      –CH2CH2N(CH3)2           2    0.018        +                 +              [42]
65 (66) a    drawn                    2    0.018        +                 +              [42]
66 (67)      –CH2C5H4N (m) (c)        2    0.066        +                 +              [42]
67 (68)      –C6H4NHCOCH3 (m)         2    0.1          +                 +              [42]
68 (69) a    –C3H5 (c)                2    0.05         +                 +              [42]
69 (70)      –C6H4F (p)               2    0.07         +                 +              [42]
70 (71)      –C6H4SCH3 (o)            2    0.15         +                 +              [42]
71 (72) a    –C6H4SCH3 (m)            2    0.1          +                 +              [42]

Scaffold 7 [structure drawing]

No.          R                        IC50 (μM)    Experim. class    Pred. class    Reference
72 (73)      –C6H4NH2 (p)             0.08         +                 +              [42]
73 (74)      –C6H4N(CH3)2 (p)         0.17         +                 +              [42]
74 (75)      –CH2CH2N(CH3)2           0.27         +                 +              [42]
75 (76)      –C6H4NH2 (m)             0.21         +                 +              [42]
76 (77) a    –C6H4NH2 (o)             0.11         +                 +              [42]
77 (78)      –C6H5                    1.33         −                 −              [42]
78 (79)      –CH2CH2CH2N(CH3)2        0.08         +                 +              [42]
79 (80)      –C6H11 (c)               0.21         +                 +              [42]

Scaffold 8 [structure drawing]

No.          R                        IC50 (μM)    Experim. class    Pred. class    Reference
80 (81)      –C6H4OCH3 (p)            0.46         +                 +              [42]
81 (82)      –C6H4NH2 (o)             0.17         +                 +              [42]
82 (83)      –C6H4NH2 (m)             1.09         −                 +              [42]
83 (84)      –C6H4N(CH3)2 (m)         0.6          +                 +              [42]
84 (85)      –C6H4NH2 (p)             0.2          +                 +              [42]
85 (86)      –C6H4N(CH3)2 (p)         0.5          +                 +              [42]
86 (87)      –C6H5                    1.29         −                 −              [42]
87 (88)      –C6H4OCH3 (m)            2.73         −                 +              [42]
88 (89) a    –C6H4OH (o)              1.03         −                 +              [42]
89 (90)      –CH2CH2N(CH3)2           0.57         +                 +              [42]

Scaffold 9 [structure drawing]

No.          R        IC50 (μM)    Experim. class    Pred. class    Reference
90 (54)      drawn    >20          −                 −              [38]

a Compounds used as a prediction set; c: cyclic, m: meta, p: para, o: ortho.

coefficients, was obtained with the General Linear Discriminant Analysis Module (GLDA) implemented in STATISTICA 6.0 [45].

$$\mathrm{IC}_{50} = a_1A_1 + a_2A_2 + a_3A_3 + \dots + a_nA_n + a_0 \qquad (1)$$

In developing these classification functions, IC50 values of +1 and −1 were assigned to more potent and less potent compounds, respectively. In addition, the compounds were considered unclassified (U) by the model when the percentages of classification between the two groups did not differ by more than 5%.

The "best subset" technique, using the Wilks Lambda value as the criterion for choosing the best subset of predictor effects, was applied to select the molecular descriptors (X-variables) with the highest influence on the telomerase inhibition.

In any multiple linear-based QSAR it is desirable that the variables included in the model are not interrelated. Highly correlated variables clearly contain redundant information that might be more effectively encoded by a single variable. Further, and more importantly from the point of view of a QSAR model, correlated independent variables lead to multicollinearity, which can cause problems in interpreting the individual estimated coefficients [48,49]. One very useful and informative approach to avoiding multicollinearity is the orthogonal-descriptor technique suggested by Randić some years ago [48,49]. In the Randić approach, after choosing a starting descriptor, subsequent descriptors are added only as their orthogonal complements to the descriptors already present. This approach has the advantages that the equation coefficients are stable, and that the new information supplied by each additional descriptor is clearly distinguished in the final equation statistics. In this paper, to tackle the multicollinearity problem, we have applied the Randić approach. The resulting orthogonal-descriptor model was standardized afterward.

Several diagnostic statistical tools were used for evaluating the final model equation, in terms of goodness-of-fit and goodness-of-prediction. The goodness-of-fit was estimated by standard statistics such as the λ value, the square of the Mahalanobis distance (D²), the Fisher ratio (F), the corresponding p-level (p), as well as the proportion between the cases and variables in the equation.

The validity of the preadopted assumptions such as normality, homocedasticity, noncollinearity and linearity of the model was also checked. The goodness-of-prediction of the final model was assessed by an internal cross-validation (CV), specifically by the leave-group-out (LGO) technique [50,51]. Basically, CV-LGO consists of forming several subsets from the complete data set, each missing a small group of k cases (k = 15% of the data set). These k cases are used to validate a new model that is trained with the corresponding subset. The quality (goodness-of-fit) of the new models then gives a measure of the predictive ability of the full model (10-fold LGO-CV was carried out).

A more rigorous procedure was used to assess the "realistic" predictive power of the model and the generalizability of the QSAR model for new chemical compounds, using an external data set. This type of model validation is important if we take into consideration that the predictive ability of a QSAR model can be estimated using only a set of compounds that was not used for building the model [50,51]. Therefore, it is important to ensure that the prediction algorithms are able to perform well on novel data from the same data domain. In this study a more demanding evaluation is provided by an external validation, where the predictive power of the model was assessed with an external data set of 10 tri-substituted acridines reported in the literature [38].

3. Results and discussion

The best discriminant function to classify the more potent compounds with telomerase inhibitory activity, by stabilization of G-quadruplex, is given below together with the statistical parameters of the GLDA:

$$\mathrm{IC}_{50} = -10.40 + 4.11\,\mathrm{nArNHR} + 1.42\,\mathrm{nCar} - 1.90\,\mathrm{nCbH} \qquad (2)$$

$$N = 75;\quad \lambda = 0.58;\quad D^2 = 2.75;\quad F(3,71) = 16.62;\quad p < 0.001$$

As can be appreciated, the large F index and the small p value are indicative of the statistical significance of the model. In addition, the values of the Wilks λ statistic (λ can take values from zero, perfect discrimination, to one, no discrimination) and the Mahalanobis distance (a measure of the separation between the more potent and less potent groups) show that the model displays an adequate discriminatory capacity.

The parametrical assumptions (normality, homocedasticity and noncollinearity) are very important factors in the application of linear multivariate statistical techniques to carry out any QSAR model. Nevertheless, the validity and statistical significance of any QSAR model is strongly conditioned by the above mentioned factors [52].

Further analysis of the previous classification model (Eq. (2)) should only be carried out after checking the reliability of the parametrical assumptions. In this study the model was obtained by GLDA, being the simplest mathematical form that might be envisaged for the model in the absence of any a priori information. However, on simple inspection of the distribution of standardized residuals (observed minus predicted, divided by the square root of the residual mean square) for all compounds studied (see Fig. 4), no systematic pattern was observed. This result suggests that the model does not exhibit a nonlinear dependence.

Another aspect deserving special attention is the degree of collinearity among the variables of the model. As explained previously, multicollinearity does not affect the predictability of the model, but the physical meaning of the resultant model, as well as the influence of the predictive variables, may be misinterpreted, and consequently erroneous inferences about the activity under study could be obtained. In order to study the collinearity of the variables in our model, the cross-correlation matrix was analyzed. As can be seen in Table 1, the descriptors nCar and nCbH are strongly correlated, but we must be careful about discarding either of them because they carry different chemical information.

Following the Randić approach, we determined orthogonal complements for all variables in Eq. (2), which in turn were further standardized, to then find the best equation (see Eq. (3)):

$$\mathrm{IC}_{50} = 0.091 + 1.71\,{}^{1}\Omega(\mathrm{nArNHR}) + 0.66\,{}^{2}\Omega(\mathrm{nCar}) - 1.11\,{}^{3}\Omega(\mathrm{nCbH}) \qquad (3)$$

$$N = 75;\quad \lambda = 0.58;\quad D^2 = 2.75;\quad F(3,71) = 16.62;\quad p < 0.001$$

The relative contributions of the variables in the orthogonalized model are quite similar to those in the non-orthogonalized model. For example, the variables nCbH and nArNHR have similar absolute contributions in Eq. (3), while in Eq. (2) the contribution of nArNHR is twofold the contribution of nCbH. These variations are due to the fact that the coefficients of the orthogonal model are more stable than those of the model without orthogonalization. The variables nCar and nCbH have different contributions to the prediction of telomerase inhibition, the absolute contribution of nCbH being almost twofold the contribution of nCar.

We tested the normal distribution of the residuals by plotting the frequency histogram (see Fig. 5) and by using different distribution fitting tests such as Kolmogorov–Smirnov and Shapiro–Wilk.

At first sight, the results of the Kolmogorov–Smirnov (D = 0.251 with p < 0.01) and Shapiro–Wilk (W = 0.88 with p < 0.001) tests indicate that the hypothesis of normality should be rejected. However, the deviations from normality do not seem to be severe, as can be seen in the frequency histogram shown in Fig. 5.

After orthogonalization of the descriptors, homocedasticity was verified by considering the homogeneity of the (co)variances as suggested by the Box M statistical test (p < 0.01). The results of this test appear in Table 2.

As can be appreciated, the statistically significant F ratio (F = 236.25; p < 0.000) suggested that the population covariance matrices were not equal, which rejects the null hypothesis of equal population covariance matrices. Nevertheless, this violation is not very problematic for discriminant analysis, and we should be able to evaluate this likelihood of errors by examining the determinants of the pooled within-groups covariance.

A better threshold for a priori classification probability can be estimated by means of the Receiver Operating Characteristics (ROC) curve [53]. This is a useful technique not only for obtaining the best thresholds but also for organizing classifiers [54,55]. As Fig. 6 shows, the optimal threshold for predicting the more potent chemicals with the present QSAR model is 0.87. Further, one can see that the model is not a random classifier, since the area under the ROC curve is significantly higher than the area under the random classifier curve (diagonal line), with a value of 0.87 u².

As can be seen in Eq. (3) only three variables are present in the discriminant model. The positive contribution to the telomerase

Table 5
General performance of the final model (Eq. (3)) in terms of its Cooper statistic.

                                 Predicted class
Experimental class    More potent (T/P)a    Less potent (T/P)a    Total
More potent           35/7 (a)              5/1 (b)               40/8 (a + b)
Less potent           6/1 (c)               29/6 (d)              35/7 (c + d)
Total                 41/8 (a + c)          34/7 (b + d)          75/15 (a + b + c + d)

Statistic                               Formula                    Results (T/P)a
Sensitivity (true positive rate)        a/(a + b)                  87.50/87.50
Specificity (true negative rate)        d/(c + d)                  82.85/85.71
Accuracy                                (a + d)/(a + b + c + d)    85.33/86.67
Positive predictivity                   a/(a + c)                  85.36/87.50
Negative predictivity                   d/(b + d)                  85.29/85.71
False positive                          c/(c + d)                  17.14/14.28
False negative                          b/(a + b)                  12.50/12.50

a (T/P): for training and prediction sets.
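The Cooper statistics in Table 5 follow directly from the four cells of the confusion matrix. As a quick check, the sketch below recomputes them from the counts a, b, c and d given in the table, using the formulas listed there. The function name `cooper_statistics` and the dictionary keys are choices made for this example.

```python
def cooper_statistics(a, b, c, d):
    """Percent statistics from a 2x2 confusion matrix.

    a: more potent predicted more potent   b: more potent predicted less potent
    c: less potent predicted more potent   d: less potent predicted less potent
    """
    total = a + b + c + d
    return {
        "sensitivity": 100 * a / (a + b),
        "specificity": 100 * d / (c + d),
        "accuracy": 100 * (a + d) / total,
        "positive_predictivity": 100 * a / (a + c),
        "negative_predictivity": 100 * d / (b + d),
        "false_positive": 100 * c / (c + d),
        "false_negative": 100 * b / (a + b),
    }


# Counts copied from Table 5 (training set / prediction set).
training = cooper_statistics(a=35, b=5, c=6, d=29)
prediction = cooper_statistics(a=7, b=1, c=1, d=6)
```

The recomputed values agree with the table to within the last digit; for instance, the training-set specificity is 82.857...%, which the table reports truncated as 82.85.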


Table 6 Results of the 10-fold full cross-validation procedure for the acridinic data set.

Fold   % Classif. (training set)   % Classif. (test set)   λ       F        D²
1      82.67                       100.00                  0.565   20.818   3.108
2      85.33                       86.67                   0.620   14.492   2.388
3      85.33                       86.67                   0.577   17.318   2.874
4      85.33                       86.67                   0.586   16.743   2.767
5      84.00                       93.33                   0.573   17.640   2.907
6      84.00                       93.33                   0.613   14.960   2.465
7      85.33                       86.67                   0.595   16.089   2.659
8      86.67                       80.00                   0.521   21.801   3.587
9      85.33                       86.67                   0.588   16.567   2.738
10     84.00                       93.33                   0.602   15.660   2.599
Mean   84.80                       89.33                   0.584   17.209   2.809

Table 7 Results of the classification of an external test set of 10 acridines.

[Chemical structure drawings of compounds 1 to 10 not reproduced.]

Mol   IC50 (μM)   Real   Class
1     0.03        +      +
2     0.35        +      +
3     1           −      +
4     0.24        +      +
5     0.86        +      +
6     0.88        +      +
7     0.44        +      +
8     0.23        +      +
9     0.39        +      +
10    0.36        +      +

inhibitory activity for variables such as nArNHR (number of secondary aromatic amines) is in agreement with what has been reported in other studies, where the secondary amine (–NH– group) plays an important role in the ligand binding to quadruplex substituents and with the water molecules present in the binding site [56]. As can be seen in the acridinic framework (see Fig. 1 and Table 4), all substituents in positions 3 and 6 are secondary amines, while in position 9 the substituents are secondary amines and carbonyl groups. For this reason, this molecular descriptor also explains the lower telomerase inhibitory potency of acridones with regard to acridine derivatives.

The nCbH descriptor (the number of unsubstituted benzene C(sp2)) has a negative contribution to the telomerase inhibitory potency. The substitution on the acridinic skeleton is very important due to the electrostatic interactions between the lateral chains (which generally contain N atoms) and the G-tetrad. The tri-substituted acridinic derivatives could have a better stacking with the groups of the wide grooves located on opposite sides of the G-tetrad face, increasing the stability of the complex [56]. Compounds with high values of this descriptor would have a weak interaction with the G-quartet. For instance, compound 90 is a mono-substituted acridine with a large number of unsubstituted C(sp2), and its IC50 value is greater than 20 μM. If this compound is compared with the bi-substituted and tri-substituted acridines (compounds 17 and 49), the IC50 values decrease in the same order as the number of unsubstituted benzene C(sp2). This descriptor can also explain the variability of the telomerase inhibitory concentration caused by substitutions on the phenyl ring of the side chain at the ninth position of the acridinic framework. For example, compounds 72, 73, 75 and 76 have lower IC50 values than the unsubstituted derivative (77), and the same occurs for compounds 80–85 compared with compound 86. Compounds 1 to 17 have lower telomerase inhibitory potency compared with similar derivatives with a side chain at the ninth position of the acridine ring.

The positive sign of nCar indicates that acridinic derivatives with a large number of aromatic carbons could bind strongly to the G-quartet. A good example of this general contribution can be seen in compounds 23–43 (acridones), where the addition of a carbonyl group causes the loss of the aromaticity of the ring, affecting the hydrophobic interactions. Under these conditions the nitrogen atom is also unable to support a positive charge, so the electrostatic interactions cannot play their role. This also explains why acridones are less potent than their respective acridines [30]. The combination of both variables (nCbH and nCar) explains the differences in the telomerase inhibitory concentration of compounds with aromatic rings or cycloalkane substituents at the terminal position of the C9 side chain. Compound 77 has six more aromatic carbons than compound 79, but five of them are unsubstituted benzene carbons with a negative contribution to telomerase inhibitory activity.

As was previously explained, each of the analyzed fragments has a direct effect on the prediction of telomerase inhibitory activity, but only their inter-relation gives a global interpretation of the role of the molecule in the interaction with the G-quartet structure. Table 3 shows a clear example of this descriptor inter-relation in three acridinic derivatives with different IC50 values.

These results show that the present model is able to identify those acridinic derivatives with good telomerase inhibition and could be used in the design of potential selective telomerase inhibitors with high affinity for the G-quartet, bringing also a good


“window of opportunity” for the design of libraries of novel acridinic derivatives.

The classification results for the training set are shown in Table 4. The general performance of the model (goodness of fit) was assessed in terms of its Cooper statistics (see Table 5). The model correctly classified 87.50% of the more potent compounds (sensitivity) and 82.85% of the less potent compounds (specificity), for a global accuracy of 85.33%. No compounds were left unclassified.

The most important criterion for the quality of the discriminant model is based on the statistics for the prediction set. The classification results for the prediction set are also shown in Table 4. The model achieved 87.50% of good classification for the more potent compounds and 85.71% for compounds with IC50 values greater than 1 μM. The global classification was 86.66%.

The statistical results of the 10-fold cross-validation procedure are given in Table 6. The overall classification rates for the training and test sets were 84.80% and 89.33%, respectively. This model showed a high Matthews correlation coefficient (MCC) of 70.51. This statistical parameter quantifies the strength of the linear relation between the molecular descriptors and the classifications. The prediction results for the 15% full cross-validation test evidenced the quality (robustness, stability, and predictive power) of the obtained models.

In this study, an external data set of 10 novel acridine derivatives was also evaluated [38]. The results of this validation process are summarized in Table 7. The equation gave 90% of good classification.

The applicability domain of a (Q)SAR model is the response and chemical structure space in which the model makes predictions with a given reliability [57]. In this paper, we developed a reliable QSAR model for predicting the antitelomerase activity, by G-quadruplex stabilization, of acridinic compounds evaluated with the TRAP assay. In order to describe the applicability domain of this QSAR model we used the leverage approach [59,60]. This approach provides a measure of the distance between the descriptor values for a chemical and the mean of the descriptor values for all chemicals. We can therefore plot leverage values versus standardized residuals for each compound of the training set (see Fig. 7). From this plot (called a Williams plot), the applicability domain is established inside a squared area within ±3 standard deviations and a leverage threshold h* of 0.226 (h* = 3p′/N, where p′ is the number of model parameters and N the number of chemicals). As can be seen in Fig. 7, the majority of the compounds of the training set are inside this area. However, two acridinic compounds have a leverage greater than h* but show standard deviation values within the limit, which implies that they are not to be considered outliers but influential chemicals. These chemicals greatly influence the linear discriminant analysis; in fact, the GLDA model is forced near the observed value and their residuals (observed minus predicted value) are small; that is, they are well predicted. For future predictions, if a chemical belonging to the test set has a leverage value greater than h*, the prediction is the result of substantial extrapolation and therefore may not be reliable [58]. As can be noted in Fig. 7, all compounds used as the external data set lie within this area, evidencing no extrapolation for their predicted values of telomerase inhibitory concentration.

4. Conclusions

The inhibition of the enzyme telomerase is a promising target for the treatment of cancer. The results of this research are considered by the authors to be the first QSAR study that allows a structural interpretation of the interactions with the telomeric structure of a novel family of compounds (acridine and acridone derivatives) with proven properties as telomerase inhibitors through stabilization of the G-quartet structure. The results show how simple molecular descriptors, carrying substructural information, are able to differentiate the more potent acridines from the rest of the compounds of this family with greater values of telomerase inhibitory activity. This kind of computational model makes it possible to identify functional fragments and groups that can be used in the design of libraries of acridinic derivatives with better predicted properties for interacting with and stabilizing the G-quartet. The present paper is a first step in the prediction of this important biological activity, for which a good computational tool is necessary in order to increase the success rate in obtaining novel and potent antitumoral compounds belonging to this heterocyclic family.

Acknowledgment

The authors would like to dedicate the present paper to the memory of Maykel Pérez González, one of the authors, who died before finishing this work. This research was sponsored by the Cuban Higher Education Ministry (R&D project number 6.181-2006).

Fig. 7. Williams plot based on classification model (Eq. (3)).

References

[1] E.H. Blackburn, Cell 106 (2001) 661–673.
[2] W.E. Wright, J.W. Shay, J. Am. Geriatr. Soc. 53 (2005) S292–S294.
[3] K. Collins, J.R. Mitchell, Oncogene 21 (2002) 564–579.
[4] M.A. Blasco, Embo. J. 24 (2005) 1095–1103.
[5] G.H. Lang, Y. Wang, N. Nomura, M. Matsumura, Mar. Biotechnol. (NY) 6 (2004) 347–354.
[6] L.A. Harrington, C.W. Greider, Nature 353 (1991) 451–454.
[7] A.A. Avilion, L.A. Harrington, C.W. Greider, Dev. Genet. 13 (1992) 80–86.
[8] K. Collins, C.W. Greider, Genes Dev. 7 (1993) 1364–1376.
[9] L. Maes, E. Lippens, J.P. Kalala, L. de Ridder, Cell Prolif. 38 (2005) 3–12.
[10] J.W. Shay, J. Cell Physiol. 173 (1997) 266–270.
[11] J.W. Shay, W.E. Wright, Carcinogenesis 26 (2005) 867–874.
[12] D.M. Shcherbakova, M.E. Zvereva, O.V. Shpanchenko, O.A. Dontsova, Mol. Biol. (Mosk) 40 (2006) 580–594.
[13] T. Mashimo, H. Sugiyama, Nucleic Acids Symp Ser (Oxf) (2007) 239–240.
[14] G. Lixia, Y. Fei, J. Jiajia, L. Jianhui, Biotechnol. Lett. 30 (2008) 47–53.
[15] L.H. Hurley, R.T. Wheelhouse, D. Sun, S.M. Kerwin, M. Salazar, O.Y. Fedoroff, F.X. Han, H. Han, E. Izbicka, D.D. Von Hoff, Pharmacol. Ther. 85 (2000) 141–158.
[16] F. Koeppel, J.F. Riou, A. Laoui, P. Mailliet, P.B. Arimondo, D. Labit, O. Petitgenet, C. Helene, J.L. Mergny, Nucleic Acids Res. 29 (2001) 1087–1096.


[17] L. Oganesian, M.E. Graham, P.J. Robinson, T.M. Bryan, Biochemistry 46 (2007) 11279–11290.
[18] W.J. Zhang, T.M. Ou, Y.J. Lu, Y.Y. Huang, W.B. Wu, Z.S. Huang, J.L. Zhou, K.Y. Wong, L.Q. Gu, Bioorg. Med. Chem. 15 (2007) 5493–5501.
[19] P. Phatak, J.C. Cookson, F. Dai, V. Smith, R.B. Gartenhaus, M.F. Stevens, A.M. Burger, Br. J. Cancer 96 (2007) 1223–1233.
[20] S. Neidle, M.A. Read, Biopolymers 56 (2000) 195–208.
[21] S.M. Gowan, R. Heald, M.F. Stevens, L.R. Kelland, Mol. Pharmacol. 60 (2001) 981–988.
[22] L. Rossetti, M. Franceschin, A. Bianco, G. Ortaggi, M. Savino, Bioorg. Med. Chem. Lett. 12 (2002) 2527–2533.
[23] D. Gomez, J.L. Mergny, J.F. Riou, Cancer Res. 62 (2002) 3365–3368.
[24] S.M. Gowan, J.R. Harrison, L. Patterson, M. Valenti, M.A. Read, S. Neidle, L.R. Kelland, Mol. Pharmacol. 61 (2002) 1154–1162.
[25] Y. Ishikawa, T. Yamashita, Y. Tomisugi, T. Uno, Nucleic Acids Res. Suppl. (2001) 107–108.
[26] M.Y. Kim, M. Gleason-Guzman, E. Izbicka, D. Nishioka, L.H. Hurley, Cancer Res. 63 (2003) 3247–3256.
[27] C.P. Li, J.H. Huang, A.C. Chang, Y.M. Hung, C.H. Lin, Y. Chao, S.D. Lee, J. Whang-Peng, T.S. Huang, Pharm. Res. 21 (2004) 93–100.
[28] T. Tauchi, K. Shin-Ya, G. Sashida, M. Sumi, A. Nakajima, T. Shimamoto, J.H. Ohyashiki, K. Ohyashiki, Oncogene 22 (2003) 5338–5347.
[29] B. Guyen, C.M. Schultes, P. Hazel, J. Mann, S. Neidle, Org. Biomol. Chem. 2 (2004) 981–988.
[30] R.J. Harrison, A.P. Reszka, S.M. Haider, B. Romagnoli, J. Morrell, M.A. Read, S.M. Gowan, C.M. Incles, L.R. Kelland, S. Neidle, Bioorg. Med. Chem. Lett. 14 (2004) 5845–5849.
[31] R. Kieltyka, J. Fakhoury, N. Moitessier, H.F. Sleiman, Chemistry (2007).
[32] L.G. Valerio Jr., K.B. Arvidson, R.F. Chanderbhan, J.F. Contrera, Toxicol. Appl. Pharmacol. (2007).
[33] B. Li, M.P. Lyle, G. Chen, J. Li, K. Hu, L. Tang, M.A. Alaoui-Jamali, J. Webster, Bioorg. Med. Chem. (2007).
[34] C.M. Incles, C.M. Schultes, H. Kempski, H. Koehler, L.R. Kelland, S. Neidle, Mol. Cancer Ther. 3 (2004) 1201–1206.
[35] R.J. Harrison, S.M. Gowan, L.R. Kelland, S. Neidle, Bioorg. Med. Chem. Lett. 9 (1999) 2463–2468.
[36] C.M. Schultes, B. Guyen, J. Cuesta, S. Neidle, Bioorg. Med. Chem. Lett. 14 (2004) 4347–4351.
[37] R.J. Harrison, J. Cuesta, G. Chessari, M.A. Read, S.K. Basra, A.P. Reszka, J. Morrell, S.M. Gowan, C.M. Incles, F.A. Tanious, W.D. Wilson, L.R. Kelland, S. Neidle, J. Med. Chem. 46 (2003) 4463–4476.
[38] C. Martins, M. Gunaratnam, J. Stuart, V. Makwana, O. Greciano, A.P. Reszka, L.R. Kelland, S. Neidle, Bioorg. Med. Chem. Lett. 17 (2007) 2293–2298.
[39] M.J. Moore, C.M. Schultes, J. Cuesta, F. Cuenca, M. Gunaratnam, F.A. Tanious, W.D. Wilson, S. Neidle, J. Med. Chem. 49 (2006) 582–599.
[40] P. Belmont, J. Bosson, T. Godet, M. Tiano, Anticancer Agents Med. Chem. 7 (2007) 139–169.
[41] N.W. Kim, M.A. Piatyszek, K.R. Prowse, C.B. Harley, M.D. West, P.L. Ho, G.M. Coviello, W.E. Wright, S.L. Weinrich, J.W. Shay, Science 266 (1994) 2011–2015.
[42] D. Shi, R.T. Wheelhouse, D. Sun, L. Hurley, J. Med. Chem. 44 (2001) 4509–4523.
[43] P. Mailliet, A. Laoui, J.F. Riou, G. Doerflinger, J.L. Mergny, F. Hamy, T. Caulfield, Triazine Derivatives and Their Applications as Antitelomerase Agents, vol. US6887873 B2, Aventis Pharma S.A., France, 2005.
[44] P. Mailliet, J.F. Riou, M. Alasia, G. Doerflinger, J.L. Mergny, A. Laoui, O. Petitgenet, E. Renou, Chemical Derivatives and Their Applications as Antitelomerase agents, vol. US 6858608 B2, Aventis Pharma S.A., France, 2005, 26.
[45] I. StatSoft, STATISTICA, 2001.
[46] R. Todeschini, V. Consonni, A. Maui, M. Pavan, Dragon (2005).
[47] D. Weininger, J. Chem. Inf. Comput. Sci. 28 (1988) 31–36.
[48] D.J. Klein, M. Randić, D. Babić, B. Lučić, S. Nikolić, N. Trinajstić, Int. J. Quant. Chem. 63 (1991) 215–222.
[49] M. Randić, New J. Chem. 15 (1991) 517–525.
[50] A. Golbraikh, A. Tropsha, J. Mol. Graph. Model. 20 (2002) 269–276.
[51] K. Rose, L.H. Hall, L.B. Kier, J. Chem. Inf. Comput. Sci. 42 (2002) 651–666.
[52] J. Stewart, L. Gill, Econometrics, second ed. Prentice Hall, London, 1998.
[53] F. Provost, T. Fawcett, Analysis and visualization of classifier performance comparison under class and cost distributions, in: A.A.f.A. Intelligence (Ed.), Third International Conference on Knowledge Discovery and Data Mining (KDD-97), American Association for Artificial Intelligence, 1997.
[54] H. Toivonen, A. Srinivasan, R.D. King, S. Kramer, C. Helma, Bioinformatics 19 (2003) 1183–1193.
[55] R. Benigni, A. Giuliani, Bioinformatics 19 (2003) 1194–1200.
[56] N.H. Campbell, G.N. Parkinson, A.P. Reszka, S. Neidle, J. Am. Chem. Soc. 130 (2008) 6722–6724.
[57] S. Raic-Malic, D. Svedruzic, T. Gazivoda, A. Marunovic, A. Hergold-Brundic, A. Nagl, J. Balzarini, E. De Clercq, M. Mintas, J. Med. Chem. 43 (2000) 4806–4811.
[58] T.I. Netzeva, A.P. Worth, T. Aldenberg, R. Benigni, M.T.D. Cronin, P. Gramatica, J.S. Jaworska, S. Kahn, G. Klopman, C.A. Marchant, G. Myatt, N. Nikolova-Jeliazkova, G.Y. Patlewicz, R. Perkins, D.W. Roberts, T.W. Schultz, D.T. Stanton, J.J.M. van de Sandt, W. Tong, G. Veith, C. Yang, ATLA 33 (2005) 155–173.
[59] L. Eriksson, J. Jaworska, A.P. Worth, M.T.D. Cronin, R.M. McDowell, P. Gramatica, Environ. Health Perspect. 111 (2003) 1361–1375.
[60] P. Gramatica, P. Pilutti, E. Papa, Atmos. Environ. 37 (2003) 3115–3124.

4.3 Discussion of Articles 2 and 3

The paper “Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach” was the first paper using QSAR techniques for the prediction of this important biological activity. Here we could identify and use appropriate computational tools in order to increase the success rate in obtaining novel and potent compounds belonging to this heterocyclic family.

For the division into training and validation sets we used hierarchical cluster analysis. The aim of hierarchical cluster methods is to merge clusters to form a new one (agglomerative) or to separate an existing cluster into two new ones (divisive), trying to minimize a distance or to maximize a similarity measure (246). The splitting of the data into training (T) and validation (V) subsets was conducted by the sequential application of Principal Component (PC) Analysis and Joining Tree/K-means Cluster Analysis (52), as implemented in the respective modules (Principal Components & Classification Analysis and Cluster Analysis) of the STATISTICA software (246). Our reference space consisted of all the 3D molecular descriptors implemented in the DRAGON software (247). The molecular structure of each acridine was represented as a vector of 621 variables (32 geometrical descriptors, 150 RDF descriptors, 160 3D-MoRSE descriptors, 82 WHIM descriptors and 197 GETAWAY descriptors). A PC analysis was applied to all compounds of the dataset. Next, a joining tree cluster analysis using complete linkage as the amalgamation rule was applied to the dataset. In this way it is possible to obtain a graphical picture of the distribution and clustering of the data members according to the PC-based structural representation. Next, a K-means cluster analysis using the Euclidean distance as the structural proximity (actually, dissimilarity) measure was applied.
The number of clusters was set in such a way that the members of each cluster agree as much as possible with the distribution obtained in the previous joining tree cluster analysis. Finally, approximately 25% of the data members were reserved for validation in such a way that each cluster is represented in both the training and validation subsets. The distribution of training and validation cases can be found in Tables 6 and 7 of the paper. This procedure ensures that both training and validation subsets are uniformly populated from the molecular structure point of view, and that each structural pattern in the validation subset is represented in the training subset. The goal here is to guarantee that predictions for new compounds based on models derived from such a training subset will be based on interpolation, avoiding the lack of reliability associated with extrapolation (194, 246). Models were built for each molecular descriptor family separately and also for all of them together.

We try to provide an explanation of the relationships between some compounds and their molecular descriptors. To make this explanation easier to follow, Figure 4.1 shows the molecular structures and the activity values of the ligands used in the discussion.
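The last step of the splitting procedure described above (reserving about 25% of each cluster for validation) can be sketched as follows. This is a minimal illustration and not the STATISTICA workflow used in the articles: it assumes that a cluster label has already been assigned to every compound, and the function name, the random seed and the example labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(cluster_labels, validation_fraction=0.25, seed=0):
    """Return (training_indices, validation_indices).

    cluster_labels[i] is the cluster of compound i (for example from the
    K-means step). Every cluster with at least two members contributes at
    least one compound to each subset.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)

    training, validation = [], []
    for label in sorted(members):
        indices = members[label][:]
        rng.shuffle(indices)
        if len(indices) < 2:
            # A singleton cannot appear in both subsets; keep it for training.
            training.extend(indices)
            continue
        n_validation = round(validation_fraction * len(indices))
        # Force at least one compound on each side of the split.
        n_validation = max(1, min(n_validation, len(indices) - 1))
        validation.extend(indices[:n_validation])
        training.extend(indices[n_validation:])
    return sorted(training), sorted(validation)


# Hypothetical example: 12 compounds grouped in 3 clusters of 4.
labels = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
train_idx, val_idx = stratified_split(labels)
print(train_idx, val_idx)
```

Sampling inside each cluster, rather than over the whole data set, is what keeps every structural pattern of the validation subset represented in the training subset.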

Figure 4.1. Chemical structures of the acridines described in the discussion of article 2.

It is important to underline that the models presented in these two articles are ligand-based; for this reason they can only indicate whether a given functional group or substituent is better or worse for the activity. It is not possible to explain the mode of interaction of the acridines with guanine quadruplexes using these methods. Nevertheless, some molecular descriptors can be related to experimental evidence previously described in the scientific literature. It is this relationship that we want to highlight in the interpretation of the results. For example, the HATS7v molecular descriptor (leverage-weighted autocorrelation of lag 7, weighted by atomic van der Waals volumes) accounts for the relative position of each atom in the 3D molecular space and could be related to the optimal distance for the interaction with the side chains (242): higher topological distances (7 or 8) have negative contributions to the activity. Another example is provided in article 3, in which the number of secondary amines in the molecular structure could be related to experimental observations (231, 248). In the same article the number of unsubstituted benzene carbons could be related to the importance of lateral side chains, also described in previous SAR studies (231). These observations reinforce the utility of simple QSAR models for the identification and prediction of new compounds and show how these parameters can be used for molecular design with the objective of improving the biological activity.

To check the reliability of the developed models we used an external set of compounds. Although internal validation techniques have been widely described as useful and necessary in a model validation process, only the use of an external prediction set is a necessary and sufficient condition to guarantee the quality and the predictive power of a model (213, 249). The external prediction set employed for this purpose in this work is a real set (250): it was collected after the models had been built and validated using the internal techniques mentioned in the paper. It is remarkable that, using molecular descriptors and “low cost” computational techniques (compared with structure-based techniques such as docking or molecular dynamics), our models are able to capture the favorable or unfavorable effect of the substituents and structural features on the activity.
These features are in agreement with previous conclusions obtained from experimental procedures and from experiments based on the description of molecular interactions. Our models therefore provide good predictions at minimal computational cost and in a very short time. Although relationships between molecular descriptors and experimental evidence have been found, the main disadvantage of the QSAR technique is that the biological or biophysical interpretation of the results is not conclusive for explaining the interactions that are established. In chapter 6 we present a study that combines ligand-based and structure-based techniques for the prediction of the activity.


The optimal number of variables is another important point in the building of QSAR models. For linear models, a maximum number of variables has been defined: the ratio between the number of compounds (N) in the training set and the number of adjustable parameters (p′) in the equation should be at least 4 in order to avoid overfitting (251). For the case presented in article 2, with 64 compounds one can design models with 16 adjustable parameters or fewer, in other words 15 variables and one intercept. An increase in the number of variables usually leads to overfitting of the model, but it is also interesting to see how the fit statistics vary for models that contain between 11 and 15 variables. Table 4.1 shows the regression models for 3D descriptors that contain between 11 and 15 variables. As we explained before (in the article), a considerable decrease in the R² of prediction is observed. This is consistent with models that describe the training set appropriately but not the test set. The highest value of the R² of prediction, a local maximum, is obtained for the model with 7 variables. On the other hand, the standard error of prediction for the test set (SDEPpred) is higher than for models with fewer variables; the minimal prediction error is obtained with 7 variables. Another interesting statistic is the Akaike Information Criterion (AIC), which is mainly used for comparing models with different dimensions and is expected to decrease with an increase in the number of variables. In contrast, the Mutation and SElection Uncover Models (MUSEUM) approach, also known as FIT, evaluates the quality of the models by maximizing the fitness function. The FIT parameter must increase with the number of variables or present a local maximum in the presence of a good model, and its value can then be used to select the best model. We did not observe a marked decrease in the AIC value. The maximal value of FIT was obtained for the model with 11 variables, but in both cases (models with 11 and 14 variables) the value of R²pred is too low to prefer these models over the model with 7 variables.
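The limit on the number of variables quoted for article 2 can be written out as a short worked inequality. This is a sketch that assumes the ratio criterion is read as N/p′ ≥ 4, which is the reading consistent with the 16-parameter limit given in the text:

```latex
\frac{N}{p'} \geq 4
\quad\Longrightarrow\quad
p' \leq \frac{N}{4} = \frac{64}{4} = 16 ,
\qquad
p'_{\max} = \underbrace{15}_{\text{descriptors}} + \underbrace{1}_{\text{intercept}} .
```

Under this reading, all the models with 11 to 15 variables stay within the limit, so the low R² of prediction reported for them suggests that respecting the ratio does not by itself prevent overfitting.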


Table 4.1. Regression models obtained with all 3D molecular descriptors and different numbers of variables (from 11 to 15).

Model equation (descriptors included):

No. of variables   Descriptors
11                 RDF105u, RDF030m, RDF015p, RDF080i, Mor13m, E2m, H7s, R2e, R7e, SP07, SP20
12                 QZZi, RDF020s, RDF030s, Mor09p, Mor26p, P1p, H6u, HTm, HATS4i, H6s, H7s, R1u
13                 RDF125u, RDF030m, RDF155m, Mor08u, Mor19m, Mor04e, Mor13e, Mor21e, H4v, HATS3p, R1i, DP11, DP12
14                 CMBL, RDF080u, RDF105u, RDF095e, RDF100e, RDF075p, RDF030s, E2v, G2p, P2i, HATS4u, H7s, HATS7s, SP17
15                 RDF020m, RDF075m, RDF085m, RDF135v, RDF155e, RDF030s, RDF125s, Mor08u, Mor13u, Mor22u, P1s, H5v, HATS4e, H7s, R1i

Statistics:

No. of variables   R²      Q²LOO   R²adj   R²pred   F       SDEP    SDEPpred   SDEC    s       FIT     AIC     LOF
11                 87.69   82.08   85.09   24.47    33.67   0.342   0.779      0.284   0.315   1.986   0.145   0.187
12                 84.50   76.68   80.85   36.66    23.17   0.243   0.556      0.318   0.357   1.328   0.192   0.260
13                 85.62   77.71   81.88   3.66     22.90   0.382   0.712      0.307   0.347   1.269   0.183   0.267
14                 90.27   84.63   87.50   19.29    32.49   0.317   0.628      0.252   0.288   1.731   0.134   0.201
15                 87.83   79.03   84.03   2.40     23.10   0.370   0.690      0.282   0.326   1.189   0.177   0.282


In article 2, compound 39 is described as an outlier. An outlier is an observation that is numerically distant from the rest of the data (252). In our case, an outlier is a compound with an error in the activity determination or a miscalculation of the molecular descriptors, which introduces a deviation that disturbs the modeling process (253). The usual recommendation is to remove these compounds from the database to improve the models. Nevertheless, the presence of morpholine groups at the ends of the lateral chains in positions 2 and 6 of the acridine (compound 39) is very interesting, because this compound has a high IC50 value, comparable only with compounds that were not included in the regression because an exact telIC50 value could not be measured (for example, compound 16 has a telIC50 > 50 μM, which cannot be used in the regression) (Figure 4.2). Moreover, the other ligand that bears the same substituent group is one of the acridines with telIC50 > 50 μM and is not included in the building of the model. These facts suggest that a morpholine substituent at the end of the two side chains of acridines has a detrimental effect on activity. We therefore decided to keep compound 39 in the model, because it may be an activity cliff whose structural characteristics carry important information about this functional group.

Figure 4.2. Molecular representations of compound 39 from article 2.

Compound 39 is not considered an outlier when other techniques for determining the applicability domain are used. For example, the graph shown in Figure 4.3 represents the applicability domain for the same data using leverage and standardized residual values. The ligands considered outliers by this procedure are compounds 49 and 50. Clearly, which compounds fall inside or outside the applicability domain depends strongly on the technique used. For this reason it is necessary to be very careful before excluding compounds from the database in the modeling process.

[Plot: standardized residuals (vertical axis) versus leverage (horizontal axis).]

Figure 4.3. Applicability domain for the compounds of article 2, built following the procedure described in article 3.
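The reading of such a leverage versus standardized residual plot can be sketched in code. The snippet below is a minimal illustration, not the procedure used in the articles: it assumes that the leverages (the diagonal elements of the hat matrix) and the standardized residuals have already been computed, and it applies the limits quoted in article 3, a warning leverage h* = 3p′/N and residual limits of ±3 standard deviations. The function names and the example numbers are hypothetical.

```python
def warning_leverage(n_parameters, n_compounds):
    """Leverage threshold h* = 3p'/N quoted in article 3."""
    return 3.0 * n_parameters / n_compounds


def classify_compound(leverage, std_residual, h_star, residual_limit=3.0):
    """Place one compound on the leverage / standardized-residual plot."""
    if abs(std_residual) > residual_limit:
        # Poorly predicted response, whatever the leverage.
        return "outlier"
    if leverage > h_star:
        # Structurally distant but well predicted: influential, not an outlier.
        return "influential"
    return "inside domain"


# Hypothetical model with 5 adjustable parameters fitted on 60 compounds.
h_star = warning_leverage(5, 60)

# Hypothetical (leverage, standardized residual) pairs.
examples = [(0.10, 1.2), (0.40, 0.5), (0.10, -3.5)]
labels = [classify_compound(h, r, h_star) for h, r in examples]
print(h_star, labels)
```

Keeping the two limits separate mirrors the distinction made in article 3 between influential chemicals (high leverage, small residual) and true outliers.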

Nevertheless, in the face of this dilemma it is necessary to analyze what can be defined as an activity cliff and what is an outlier. When and how should we remove a compound from the database because it has an atypical activity value? The answers to these questions are related to the activity landscape, which is a function of the chemical-space representation employed for modeling the groups of compounds under study. Activity landscapes depend on many factors, such as the nature of the molecular representation and of the assay (enzyme-based, cell-based, etc.), the regions of chemical space from which the compounds are drawn and the density distribution of the compounds in these regions (253). It is generally considered that similar compounds possess similar activity values. Exceptions are defined as activity cliffs. Indeed, an activity cliff is defined by the ratio between the difference in activity of two compounds and their “distance” of separation in a given chemical space (253).

Why, then, may a linear QSAR fail to model an activity cliff? There are several possible reasons. One of them is that purely linear models (in which none of the parameters or molecular descriptors is nonlinear) are not well suited to describing the activity landscape around a cliff. Another reason is that the compounds flagged as outliers in the data may not result from inaccurate measurements of their properties or from statistical fluctuations, but may instead signal the presence of an activity cliff. Finally, to define a compound as a cliff, one must perform additional assays in the neighborhood of this compound to ensure that the local activity is appropriate for QSAR or SAR modeling (253). Rigorous detection and selection are needed to remove the true outliers from the databases, because they must not be confused with activity cliffs. In chapter 5 of this thesis we propose a comprehensive procedure for the curation of databases and the treatment of compounds that could constitute activity cliffs. We include some points that must be taken into account when deciding whether a ligand is a cliff and whether it should be removed or kept.

The article “Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure” uses, like the previous one, mono-, bi- and tri-substituted acridines. In this case we focus on building a classification model that discriminates these acridines according to their action on the telomerase enzyme. Therefore, the model that we propose here will only be applicable, in future predictions, to other acridines acting on telomerase through the formation of G4 structures.
At the time this paper was published, no other strategies using linear discriminant analysis to predict a possible telIC50 range for acridines had been published. This classification strategy tries to identify the substructures that have a positive influence on activity. This information can be used for hit-to-lead optimization and to guide the choice of modifications that may improve inhibitory activity. The molecular structures of the acridines employed in building the models, and their activity values, were collected from articles published by Pr. S. Neidle and coworkers (151, 164, 172, 231, 241, 242, 254-260). These acridines were evaluated under the same experimental conditions.

In the paper we divided the data into training and test sets. The splitting of the data into training (T) and validation (V) subsets was conducted by the sequential application of Principal Component (PC) Analysis and Joining Tree/K-means Cluster Analysis, as in previous papers (52, 246). In this case the reference space consisted of all the one-dimensional molecular descriptors implemented in the DRAGON software (247). The molecular structure of each acridine was represented as a vector of 73 variables (34 functional group counts and 39 atom-centered fragments; molecular descriptors with constant values were discarded).

One-dimensional molecular descriptors (1D MD) implemented in the DRAGON software are very useful because, despite their simple definition, they reflect the molecular composition and the atom connectivity of the molecule (see chapter 2 for details) (52, 190, 261). Because 1D MD are defined by the presence or absence of specific atom groups or functional groups in the molecular structure, their physico-chemical interpretation is easy. They can be related to the electronic environment or the steric characteristics of a certain functional group (261), which are essential for understanding the nature of the interactions established. In addition, 1D MD are of special interest because it is easier to check that the correlation between the activity and the independent variables present in the model is not a chance correlation (262). Possible chance correlations can be identified by taking into account experimental facts described in the literature. Other authors divide QSAR models into two kinds: models useful for predicting and models useful for understanding (262).
Of course, models that meet both conditions are preferred over models that meet only one of them. On the other hand, the model must be interpreted in line with proven facts, because of the risk of basing the interpretation on chance correlations (261, 262). The interpretation must rest on the structural information encoded in the selected molecular descriptors, which are the most informative ones that correlate significantly with the activity of the compounds. However, a straightforward analysis of this correlation is not always possible, since descriptors can be based on very abstract and complex mathematical definitions that make it difficult to interpret the resulting numbers as chemical information. Whenever chemical information can be extracted from the molecular descriptors, it is important to look for correlations between structural moieties previously described in the SAR literature for these compounds and the biological activity under study, as a validation of our models. If a given parameter (for example, the planarity of the ligand) can be correlated with activity, and if a relationship can be found between some molecular descriptor (for example, the number of aromatic carbons, nCar) and this parameter, such an MD can be a useful tool for the design of new molecules. Of course, the more diverse the database used to build the models, the better and more general the model will be in terms of coverage of the chemical space defined by its applicability domain. This kind of model can be used to predict the activity of different families of compounds (261). In the case of this article (number 3), only a congeneric set of acridine ligands was used to build the QSAR models. Therefore these models are valid only for acridine derivatives that fall within their applicability domain. The training and validation sets were composed of 75 and 15 compounds, respectively. The validation set represents about 17% of the data (15 of 90 compounds); each cluster is represented in both the training and validation subsets. The distribution of training and validation cases is given in Table 4 of the paper. This procedure ensures that both subsets are uniformly populated from the molecular structure point of view, and that each structural pattern in the validation subset is represented in the training subset. The goal is to guarantee that predictions for new compounds made with models derived from such a training subset will be based on interpolation, avoiding the lack of reliability associated with extrapolation (194, 246). It has been described that ligands with telIC50 values lower than 5 μM can be considered potent telomerase inhibitors (263, 264).
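The cluster-based division into training and validation subsets described above can be illustrated with a minimal sketch. It assumes that a cluster label is already available for each compound (for example from the PC analysis followed by k-means clustering); the compound names, the five equally sized clusters and the function `split_by_cluster` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def split_by_cluster(cluster_of, validation_fraction=1 / 6, seed=0):
    """Draw validation compounds from every cluster; the rest form the training set."""
    rng = random.Random(seed)
    members = defaultdict(list)
    for compound, cluster in cluster_of.items():
        members[cluster].append(compound)

    training, validation = [], []
    for cluster in sorted(members):
        compounds = sorted(members[cluster])
        rng.shuffle(compounds)
        # Aim for one validation compound per cluster at least, but never move
        # a whole cluster out of the training set (singletons stay in training).
        n_val = max(1, round(len(compounds) * validation_fraction))
        n_val = min(n_val, len(compounds) - 1)
        validation.extend(compounds[:n_val])
        training.extend(compounds[n_val:])
    return training, validation


# Hypothetical data: 90 compounds spread evenly over 5 clusters.
cluster_of = {f"cpd{i:02d}": i % 5 for i in range(90)}
training, validation = split_by_cluster(cluster_of)
print(len(training), len(validation))  # prints: 75 15
```

Because every validation compound shares a cluster with training compounds, predictions on the validation subset are interpolations within the structural patterns seen during training.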
Our purpose is to find a model that describes the activity appropriately and to use it to predict new and more powerful ligands. To obtain a model that allows the identification of better ligands, we lowered the telIC50 cutoff value to 1 μM. Usually the test set is composed of 15-25% of all data. The training set is used to build the models and the test set is used to assess the predictive ability of the resulting discriminant function. In terms of interpretability, one point that is not fully explained in the paper needs to be specified. In article 3, when describing the effect of a given variable on the activity, we said that the variable(s) “contribute” to the prediction or “contribute” to the activity. The influence of a variable on the activity is analyzed in terms of the value and sign of its coefficient in the mathematical equation that defines the predictive model. This influence or “contribution” is normally described as positive when increasing the value of the variable correlates positively with the activity, and negative otherwise. This is the basis for interpreting the model when it is applied to describe a given biological activity. The relation between each variable and the activity is a useful guide for the design of new molecules with the desired biological response. We would like to highlight some points related to the positive contribution of the nCar variable to the discrimination between groups. The model predicts that compounds with high electronic delocalization, such as acridines and anthraquinones, are better for activity than compounds that lack a fully delocalised system, such as the acridones. The conjugation of the acridine ring favours stacking interactions between the compound and a guanine quartet. Based on our model, we can therefore infer that a decrease in aromaticity has an adverse effect on these interactions. The loss of aromaticity in acridones (with respect to acridines), together with the loss of the charged nitrogen atom in the central ring of the acridine (which contributes to electronic delocalization), leads to a decrease in effective π-π stacking interactions with adjacent base pairs in the G4 structure. The linear QSAR models presented in this chapter are thus able to predict, describe and identify factors associated with the stabilization of guanine quadruplexes and the inhibition of telomerase. They illustrate the possibilities of using simple techniques to model a complex biological phenomenon. The models we propose may be used to predict the activity of new acridines.
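The reading of a "contribution" from the sign of a coefficient can be made concrete with a small sketch of a linear discriminant score. The coefficients, the intercept and the second descriptor below are invented for illustration only; they are not the values of the published model, and only the descriptor name nCar comes from the text.

```python
def discriminant_score(descriptors, coefficients, intercept):
    """Linear discriminant score: intercept plus the weighted sum of descriptors."""
    return intercept + sum(coefficients[name] * descriptors[name] for name in coefficients)


# Hypothetical coefficients: a positive weight on nCar, a negative one elsewhere.
coefficients = {"nCar": 0.25, "descriptor_2": -0.5}
intercept = -3.0

# Two hypothetical ligands that differ only in their number of aromatic carbons.
less_aromatic = {"nCar": 12, "descriptor_2": 1}
more_aromatic = {"nCar": 16, "descriptor_2": 1}

score_less = discriminant_score(less_aromatic, coefficients, intercept)
score_more = discriminant_score(more_aromatic, coefficients, intercept)

# With a positive coefficient, four extra aromatic carbons raise the score by 4 * 0.25.
print(score_less, score_more)  # prints: -0.5 0.5
```

In this sketch a higher score is assumed to correspond to the more active group, so a positive coefficient on nCar means that increasing aromaticity moves a compound towards that group, which is the sense in which the variable "contributes" positively.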


Chapter 5. Database collection, linear models of non-congeneric sets of compounds, virtual screening of FDA-approved compounds, and experimental evidence for the G-quadruplex stabilizing activity of neuroleptic antipsychotic drugs.

Introduction

A number of groups have sought new compounds that inhibit telomerase by stabilizing G-quadruplex structures. In this chapter we discuss how telomere stabilization by formation of G-quadruplex structures induces eradication of cancer cells. We will describe the G-quadruplex-stabilizing agents that have activity in in vivo tumor models. Finally, we describe a database of inhibitors of telomerase that act through stabilizing G-quadruplex structures. Different LDA models were constructed based on these compounds and their activity, and a virtual screening of a database of FDA-approved drugs was performed (265-267). The ligands predicted to bind G-quadruplex structures were evaluated experimentally (Figure 5.1).

Figure 5.1. Schematic of ligand identification procedure used in this chapter.

5.1 Xenograft models demonstrate the anticancer activity of G-quadruplex ligands

For most G-quadruplex ligands no evidence of anticancer activity is reported and for other compounds only in vitro information is available (268, 269). A recent cell-based study suggests that ligands that target and stabilize more than one type of G-quadruplex topology (i.e., telomeric and c-kit) display high anticancer activity (270).


Unfortunately, cellular and in vivo data are rarely available for G-quadruplex ligands (271). Here, we summarize the evidence for the antitumor activity of G-quadruplex ligands in various xenograft models (Figures 1.4 and 5.2). BRACO-19 inhibits growth of UXF1138L human uterine carcinoma (255) and A431 human epithelial carcinoma tumors in mice (272). The BRACO-19 molecule (Figure 1.4) was designed by qualitative molecular modeling, using the crystal structure of the native parallel human telomeric quadruplex as a template. It was rationalized that each of the three substituents emanating from the acridine core of BRACO-19 should interact with a quadruplex groove (151). Quarfloxin (Figure 5.2) showed activity in MDA-MB-231 human breast cancer and MIA PaCa-2 human pancreatic cancer xenograft models (273), and this compound has entered clinical trials. The activity of telomestatin (Figure 1.4) has been demonstrated against U937 human lymphoma (274). TMPyP4 (Figure 1.4) halts growth of PC-3 human prostate carcinoma and MX-1 mammary tumors (275). Finally, RHPS4 (Figure 5.2) is active against UXF1138L human uterine carcinoma (276), M14, LP and LM melanomas (277) and CG5 breast carcinoma (278). Table 5.1 summarizes some in vivo data on quadruplex-binding ligands, which suggest that quadruplex ligands may be useful for the treatment of solid tumours.

Figure 5.2. Some of the most representative compounds with antitumor activity in xenograft models.


Table 5.1. In vivo data on quadruplex-binding ligands. Data taken from (271).

G4 ligand | Xenograft model | Dosage (mg/kg) | Tumour response | Ref
TMPyP4 | MX-1 mammary tumor | 10, 20; i.p. | Survival increase from 45% to 75% | (275)
TMPyP4 | PC-3 human prostate carcinoma | 40; i.p. | 60% tumour shrinkage | (275)
Telomestatin | U937 human lymphoma | 15 | 80% tumour shrinkage | (274)
BRACO-19 | UXF1138L human uterine carcinoma | 2; i.p. | 96% tumour shrinkage + some complete remissions | (255)
BRACO-19 | A431 human epithelial carcinoma | 2; i.p. | Not significant | (272)
Quarfloxin | MDA-MB-231 human breast cancer | 6.25, 15.5; i.v. | 50% tumour shrinkage | (273)
Quarfloxin | MIA PaCa-2 human pancreatic cancer | 5; i.v. | 59% tumour shrinkage | (273)
RHPS4 | UXF1138L human uterine carcinoma | 5; oral | 30% tumour shrinkage | (276)
RHPS4 | M14, LP, LM melanoma | 10; i.p. | 40–51% tumour weight reduction | (277)
RHPS4 | CG5 breast carcinoma | 15; i.v. | 75% tumour shrinkage | (278)

i.p., intraperitoneal; i.v., intravenous

Our identification of G-quadruplex ligands using LDA models from a database of FDA-approved drugs is described in the attached Article 4.

5.2 Article 4. “FDA-approved Drugs Selected Using Virtual Screening Bind Specifically to G-quadruplex DNA”. Current Pharmaceutical Design, 2013, 19 (12):2164-73

Daimel Castillo-González, Gisselle Pérez-Machado, Aurore Guédin, Jean-Louis Mergny and Miguel-Angel Cabrera-Pérez.

Summary

In this paper, FDA-approved compounds were tested as G4-forming and stabilizing agents. Initially, an in silico database of compounds reported to act as G4-based telomerase inhibitors was built. For the modeling, the database was divided into two parts (training and test sets) by cluster analysis; three cutoff values (1, 5 and 10 µM) were selected. For each cutoff, the best models according to their statistical parameters were selected and used to carry out a consensus virtual screening of the database of FDA-approved compounds. The more affordable predicted compounds were bought and tested as G4-stabilizing agents. Six of them showed stabilizing properties, while triamterene, the best-scoring molecule in the computational predictions, did not show any stabilizing activity (data not shown). With these models, developed using TRAP data (whose disadvantages have been discussed before), the prediction of compounds with G4-stabilizing activity is possible. The ΔTm values are relatively low, but all activities are significant. Interestingly, other evidence found in the literature (included in § 4.3) suggests that some of the compounds identified in this work have anticancer properties. The link between the anticancer properties of these compounds and their G4 stabilisation potency remains to be proved. The proposed database and the main statistical parameters of the QSAR models are shown in an electronic file (ESM1).
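The consensus step of the virtual screening can be sketched as a simple vote over the selected models. The threshold of at least 4 positive predictions out of 7 models is the one listed in Table 2 of the article for the 1 and 5 µM cutoffs; the drug names, the votes and the function `consensus_hits` are hypothetical and serve only to illustrate the rule.

```python
def consensus_hits(predictions, min_votes):
    """Return the compounds predicted active by at least `min_votes` models."""
    hits = []
    for compound, votes in predictions.items():
        positive = sum(1 for vote in votes if vote)
        if positive >= min_votes:
            hits.append(compound)
    return hits


# Hypothetical predictions of 7 models (True = predicted active) for 3 drugs.
predictions = {
    "drug_A": [True, True, True, True, False, True, False],     # 5 positive votes
    "drug_B": [True, False, False, True, False, True, False],   # 3 positive votes
    "drug_C": [True, True, False, True, True, False, False],    # 4 positive votes
}

print(consensus_hits(predictions, min_votes=4))  # prints: ['drug_A', 'drug_C']
```

A full implementation would also have to check that each candidate lies within the applicability domain of every set of models, which this sketch leaves out.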


Current Pharmaceutical Design, 2013, 19, 000-000

FDA-approved Drugs Selected Using Virtual Screening Bind Specifically to G-quadruplex DNA

Daimel Castillo-González (A, B, C, D)*, Gisselle Pérez-Machado (A, B), Aurore Guédin (C, D), Jean-Louis Mergny (C, D) and Miguel-Angel Cabrera-Pérez (A, B)

(A) Department of Pharmacy, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; (B) Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; (C) Univ. Bordeaux, ARNA laboratory, IECB, F-33600 Pessac, France; (D) INSERM, U869, ARNA laboratory, F-33000 Bordeaux, France

Abstract: Guanine-rich sequences found in telomeres and oncogene promoters have the ability to form G-quadruplex structures. In this paper we describe the use of a virtual screening assay to search a database of FDA-approved compounds for compounds with the potential to bind G-quadruplex DNA. More than 750 telomerase inhibitors were identified in a literature search as acting through G-quadruplex stabilization, and from evaluation of these compounds, theoretical models capable of discriminating new compounds that bind G-quadruplex DNA were developed. Six compounds predicted to bind to the G-quadruplex structure were tested for their ability to bind to the human telomeric DNA sequence. Prochlorperazine, promazine, and chlorpromazine stabilized the G-quadruplex structure as determined by fluorescence resonance energy transfer techniques. These compounds also bound to promoter sequences of oncogenes such as c-myc and K-ras. Amitriptyline, imipramine, and were less stabilizing but did bind to the G-quadruplex. The ability of prochlorperazine, promazine, and chlorpromazine to recognize G-quadruplex structures was confirmed using a fluorescent intercalator displacement assay, in which displacement of thiazole orange from G-quadruplex structures was demonstrated. Interestingly, these compounds exhibited selectivity for the G-quadruplex structure as all had poor affinity for the duplex sequence.

Keywords: FRET melting, G-quadruplex, QSAR, telomeres, virtual screening.

1. INTRODUCTION

Guanine quadruplexes (G-quadruplexes) formed by guanine-rich DNA or RNA sequences are four-stranded structures characterized by planar arrangements of four guanine bases, called tetrads or G-quartets. These structures are stabilized in the presence of sodium and potassium ions [1] and may adopt different topologies [2-6]. G-quadruplexes may include varying numbers of G-quartets, most commonly three, but sometimes two or four. They also vary in their loop topology and strand directionality, which may be parallel, antiparallel, or mixed parallel/antiparallel. The formation of G-quadruplexes has been extensively studied as telomere sequences [7-11], oncogenic promoters [12-15], and minisatellite repeats [16] have the potential to adopt these structures.

Telomeres are located at the end of chromosomes, and the formation of G-quadruplexes may inhibit elongation by the telomerase enzyme or perturb telomere capping. Stabilization of the G-quadruplex structure in oncogenic promoter sequences leads to transcriptional inhibition [17, 18]. Promoters in many of these oncogenes are able to form G-quadruplexes with vast diversity in their folding patterns and loop lengths, putatively making them amenable to targeting with drugs that bind specifically to certain G-quadruplex conformations [19]. For these reasons ligands that stabilize G-quadruplex formation have potential in anticancer therapies [20]. The promoter region of the c-myc gene is G-rich; c-myc controls expression of a variety of genes responsible for enhancing the proliferative capacity of cells [21]. One of these genes is hTERT, which encodes the catalytic subunit of telomerase [22] and consequently influences the elongation of telomeres [23].

The scientific literature describes a large number of compounds that stabilize G-quadruplex structures [24-30]. As previously described [31], these compounds generally have a planar aromatic center allowing hydrophobic interactions and π-π stacking with a terminal G-quartet. The presence of electronegative elements such as nitrogen in the center of the ligand often contributes to electrostatic interactions with the nucleic acid. Many of these compounds have flexible side chains with electronegative atoms towards the end of the chain that interact with the phosphates located in the grooves or loops of the DNA structure. One such compound is currently in phase II clinical trials [32]; however, anticancer drugs that bind specifically to G-quadruplex but not duplex DNA are not available yet.

The discovery and development of a new drug is an expensive and difficult process [33]. It is therefore very attractive to find new therapeutic uses for drugs that are already on the market. With the knowledge of pharmacokinetic and toxicological profiles, these drugs can quickly enter phase II clinical trials with a cost reduction of about 40% relative to a drug that has not previously been evaluated clinically [34]. In addition, commercialized drugs tend to have acceptable profiles of absorption, distribution, metabolism, and excretion (ADME). Because of this, approved drugs are a good starting point when a novel therapeutic target is identified.

Virtual screening of chemical databases using quantitative structure–activity relationship (QSAR) models or other methodologies is a widely used approach for discovery of small molecule inhibitors and has been successfully applied to analysis of compounds that bind G-quadruplexes [35-41]. The major advantage of virtual screening is that chemical diversity is generated without the need for chemical synthesis. By identifying potent small molecule binders in silico, the number of compounds to be tested in vitro can be vastly reduced. Experimentally confirmed hits may then be used to guide further synthesis and the design of more powerful candidates. Different techniques for virtual screening of compounds to identify quadruplex binding ligands based on the use of pharmacophore [42] and virtual screening using molecular docking tools have been reported [38]. A detailed and comprehensive review of recent advances in computational processing power and molecular docking algorithms for the rapid and efficient discovery of G-quadruplex-interacting molecules and the perspectives on the potential application of in silico techniques for the discovery of RNA G-quadruplex-binding ligands has been published recently by Ma and collaborators [43]. In this paper we describe the use of the QSAR approach to evaluate a library of FDA-approved drugs for their ability to bind to a G-quadruplex structure. We began by evaluation of molecules known to be active in the telomere repeat amplification protocol (TRAP) assay. This is the first work that applies a QSAR methodology to a non-congeneric set of compounds and includes the report of a database of ligands with this activity. Data from this evaluation allowed us to define predictive rules that were used to select potential G-quadruplex ligands from an independent library of FDA-approved drugs. Among those drugs predicted to be active, six were experimentally tested and confirmed to bind to the human telomeric quadruplex and other oncogenic G-quadruplex targets.

*Address correspondence to this author at the Molecular Simulation and Drug Design Group, Centre of Chemical Bioactive, Central University of Las Villas, Santa Clara 54830, Villa Clara, Cuba; Tel: +53 42 281 473; Fax: +53 42 281130; E-mails: [email protected]; [email protected]

1381-6128/13 $58.00+.00 © 2013 Bentham Science Publishers

2. MATERIALS AND METHODS

2.1. QSAR Models

We first prepared a non-congeneric database of more than 750 compounds described to inhibit telomerase activity via G-quadruplex stabilization (Supplementary Material S1). The compounds were taken from refereed scientific publications in recent years as well as from patents. The main classes of compounds are pentacyclic acridines, berberine derivatives, fluoroquinophenoxazines, ethidium derivatives, mono-, bi- and tri-substituted acridines, acridones, pyridinium derivatives, quindoline derivatives, triazines, fluorenones, neomycine derivatives, amidoanthracene-9,10-diones, catecholic flavonoids, benzoindoloquinolines, and porphyrins [44-46]. In all cases we used the reported activity of the compounds in the TRAP assay, expressed as the concentration reported to cause 50% inhibition of the enzyme (IC50). It should be noted that some of these values do not reflect telomerase inhibition but rather quadruplex-related PCR amplification artifacts [47]. Thus, these values are sometimes different from the IC50 value in a direct telomerase inhibition assay; however, these values do reflect the particular compound’s affinity for the G-quadruplex structure.

We used Linear Discriminant Analysis (LDA) techniques to derive theoretical models capable of discriminating new TRAP-active compounds from inactive molecules. For the LDA analysis, the compounds were first clustered into two groups according to their published mean TRAP inhibitory concentration values (telIC50). We used three cut-off values: 1, 5 and 10 µM. The first group (the most active compounds) included all chemicals with telIC50 less than or equal to the chosen cut-off value; the second group included those with telIC50 greater than the cut-off value [48, 49]. This classification criterion was adopted to yield a reasonable ratio of active to less active chemicals in the dataset.

In order to obtain a validated and predictive QSAR model, the available dataset was divided into training and prediction sets. Ideally, the division into these sets should satisfy the following conditions: (i) the compounds of the prediction set must closely resemble those of the training set in the multidimensional descriptor space; (ii) all compounds in the training set must be similar to those of the prediction set in the multidimensional descriptor space, and (iii) the training set compounds must be distributed throughout the area occupied by the entire dataset. The division into training and prediction sets was carried out using a k-means cluster analysis (k-MCA) [50], first with the more active compounds and then with the less active compounds.

The k-MCA was performed using the k-MCA module of STATISTICA software v. 8.0 [50]. To obtain acceptable statistical quality of data clusters, we took into account the number of members in each cluster and minimized the standard deviation of the variables in the cluster. We also inspected the standard deviation between and within clusters, the respective Fisher ratios, and their p-level of significance (considered to be significant if lower than 0.05). Compounds for the training set were randomly collected from the previous clusters. This procedure allowed us to select compounds for the training and prediction sets in a representative way from all levels of the linking distance (Y-axis). The training and prediction sets are described in Table 1. The compounds in the prediction set were not used in the development of the discriminant function but were reserved to assess the discriminant model obtained. The difference among the total number of compounds for each cut-off is due to the inexactly reported activity for some. For example, compound 108 [51] (Supplementary Material S1) has a telIC50 of > 1 µM, considered as inactive for the cut-off of 1 µM, but it cannot be appropriately used when the cut-off values were 5 or 10 µM. A similar issue was encountered with compounds 429 and 430 [52], which have telIC50s > 7 µM.

Table 1. Numbers of Compounds for Each Cut-off in Training and Prediction Sets

Cut-off value | More active (≤ cut-off) | Less active (> cut-off) | Set | Total
1 µM | 238 | 400 | training | 783
1 µM | 54 | 91 | test |
5 µM | 388 | 244 | training | 782
5 µM | 93 | 57 | test |
10 µM | 437 | 202 | training | 780
10 µM | 94 | 47 | test |

We computed the parameters corresponding to 0D-1D, 2D, and 3D molecular descriptors employing DRAGON software [53]. The 3D descriptors were obtained after an MM+ geometry optimization of each compound using the semi-empirical AM1 method [54] included in the Mopac 6.0 computer software [55]. For the other descriptors, SMILES (simplified molecular input line entry specification) was used as input. More than 1400 descriptors were calculated.

The discriminant function (Eq. (1)) that best described the TRAP inhibitory concentration values as a linear combination of the predictor X variables (molecular descriptors) and weighted by the $a_n$ coefficients was obtained with the General Linear Discriminant Analysis Module (GLDA) implemented in STATISTICA software 8.0 [50]:

$\mathrm{CLASS} = a_1 A_1 + a_2 A_2 + a_3 A_3 + \dots + a_n A_n + a_0$ (Eq. 1)

In developing these classification functions, CLASS values of +1 and -1 were assigned to compounds with higher and lower values of IC50 with respect to each cut-off, respectively. The “best subset” technique, using the Wilks Lambda (λ) value [50] as the criterion for choosing the best subset of predictor effects, was applied to select the molecular descriptors (X variables) with the most influence on the dependent variable. The principle of maximal parsimony (Occam’s razor) was used as a strategy for model selection: We selected the models with highest statistical significance and the fewest parameters ($a_k$).

The general performance of the best classification model was assessed in terms of Cooper statistics [56]. Several diagnostic statistical tools were used in evaluation of the final model equation in terms of goodness-of-fit and goodness-of-prediction. The goodness-of-fit was estimated by standard statistics such as the λ value, the square of Mahalanobis distance (D²), the Fisher ratio (F), the corresponding p-level (p), and the proportion between the cases and variables in the equation.
The validity of the pre-adopted assumptions such as normality, homoscedasticity, noncollinearity, and linearity of the model were also determined. The goodness-of-prediction of the final model was assessed by an internal cross-validation (CV), specifically by the leave-group-out (LGO) technique [57]. Basically, CV-LGO consists of forming several subsets from the complete data set, each missing a small group of k cases (k = 20% of the data set). These k cases are used to validate a new model that is trained with the corresponding subset. Quality (goodness-of-fit) of the new models gives a measure of the predictive ability of the full model (10-fold LGO-CV was required).

2.2. Oligonucleotides

Oligodeoxynucleotide probes were synthesized by Eurogentec (Belgium). All concentrations are expressed in strand molarity and were determined using a nearest-neighbour approximation for the absorption coefficients of the unfolded species [58]. F21T is a doubly labeled 21-nucleotide oligodeoxynucleotide that mimics 3.5 copies of the human telomeric guanine-rich strand; it was modified with 6-carboxyfluorescein (FAM) at the 5´ end and tetramethylrhodamine (TAMRA) at the 3´ end. The sequence of F21T is FAM-G3(T2AG3)3-Tamra.

2.3. Fluorescence Enhancement Measurements

Experiments were performed in 96-well microplates from Stratagene. Each condition was tested at least in triplicate. Measurements were performed at 25 °C in a volume of 25 µL. All sequences adopt the G-quadruplex structure in KCl at this temperature. Two equivalents of 1 µM thiazole orange (TO) were added to a solution of 0.5 µM oligonucleotide. Samples were incubated in 10 mM lithium cacodylate buffer (pH 7.2) supplemented with 100 mM KCl. The final concentration of ligand under study was 50 µM. Fluorescence emission was collected at 516 nm with 8-fold gain after excitation at 492 nm in a Stratagene Mx3005P instrument. The wavelengths do not correspond to the maximum excitation and emission wavelengths of TO (501/534 nm), but rather to commonly available fluorescein filters (492/516 nm). This was not detrimental to the measurements, however, given that the bandwidths (± 5 nm) allowed a good excitation and recovery of the emission.

2.4. Fluorescent Intercalator Displacement Assay

All experiments were performed in 96-well microplates. Each condition was tested in duplicate in at least three independent experiments. The total volume of each test sample was of 25 µL. Samples were incubated in 10 mM lithium cacodylate buffer (pH 7.2) supplemented with 100 mM KCl. The pre-folded DNA target (0.5 µM strand concentration for intramolecular G-quadruplex oligonucleotides and 1 µM strand concentration for duplexes) was mixed with 1 µM TO. Ligands were tested at 10, 25, and 50 µM. The fluorescence of samples was measured at 25 °C in a Stratagene Mx3005P instrument. The temperature was kept constant with a thermostat cell holder (Peltier). The TO was excited at 492 nm (± 5 nm) and the emission was collected at 516 nm (± 5 nm) with 8-fold gain. In each assay plate, a number of wells were dedicated to controls and calibration including:

1. Buffer only (background level = Fb)
2. Buffer + TO (should exhibit very low fluorescence, close to Fb, as no DNA is present and emission of free TO is very weak)
3. Buffer + ligand (should also exhibit very low fluorescence, close to Fb, unless ligand is naturally fluorescent)
4. Buffer + TO + ligand (should also be close to Fb unless ligand interacts with TO)

The truly informative wells corresponded to samples containing nucleic acids. Pre-folded DNA target (0.5 µM strand concentration for intramolecular G-quadruplexes and 1 µM strand concentration for duplexes) was mixed with 1 µM TO. The dx12 duplex was formed from oligonucleotides d-GCGTGAGTTCGG and d-CCGAACTCACGC; the ds26 duplex had the self-complementary d-CAATCGGATCGAATTCGATCCGATTG sequence. The fluorescence of TO bound to DNA is referred to as FI0. The fluorescence of TO in nucleic acid wells upon addition of ligand is referred to as FI. The background fluorescence was subtracted from the measured fluorescence in DNA or DNA + ligand wells (FA = FI - Fb or FA0 = FI0 - Fb). We then calculated the percentage of TO displacement by the formula previously described [59]:

$\text{TO Displacement (\%)} = 100 - \left( \frac{FA}{FA_0} \times 100 \right)$ (Eq. 2)

2.5. Fluorescence Melting Studies

Fluorescence can be used to probe the secondary structure of oligodeoxynucleotides mimicking repeats in the guanine-rich strand of vertebrate telomeres when a FAM (fluorescent tag) and a TAMRA (quencher) are attached to the 5´ and 3´ ends of the oligonucleotide, respectively [60]. In the experiments presented here, a real-time PCR apparatus (MX3005P, Stratagene) was used to simultaneously record fluorescence of 96 samples. Initial experiments were performed at concentrations of 1, 5, and 10 µM ligand in the presence of the telomere mimic F21T. The “K+” conditions corresponded to 10 mM lithium cacodylate (pH 7.2), 10 mM potassium chloride, 90 mM lithium chloride; “Na+” conditions were 10 mM lithium cacodylate (pH 7.2), 100 mM sodium chloride. LiCl was added in order to approach physiological ionic strength without stabilizing the quadruplex (lithium does not interact with G-quartets).

Experiments were also performed to test the effect of drugs on other oligonucleotides that form G-quadruplex structures:

FKit1T: FAM-GGGAGGGCGCTGGGAGGAGGG-Tamra
FKit2T: FAM-GGGCGGGCGCGAGGGAGGGG-Tamra
FmycT: FAM-TTGAGGGTGGGTAGGGTGGGTAA-Tamra
FKras35B1T: FAM-AGGGCGGTGTGGGAAGAGGGAAGAGGGGGAGGCAG-Tamra
FdxT: FAM-TATAGCTATA-hexaethylene glycol-TATAGCTATA-Tamra.

The melting of G-quadruplex was monitored at concentrations of 1 and 20 µM in a volume of 25 µL in the presence and in the absence of the drug by measuring the fluorescence of fluorescein. The oligonucleotides with the highest T1/2 values were tested alone and in the presence of a DNA double-stranded competitor (the self-complementary ds26) at concentrations of 3 and 10 µM. In the absence of a quadruplex-interacting drug, this competitor had no effect on the melting temperature of the quadruplex (data not shown). All experiments were performed at least in duplicate. Emission of fluorescein was normalized between 0 and 1, and the T1/2 was defined as the temperature for which the normalized emission is 0.5.

3. RESULTS AND DISCUSSION

3.1. QSAR Models and Virtual Screening

We first prepared a database of more than 750 compounds described as active in the TRAP assay (Supplementary Material S1). This dataset was divided by degree of activity (see Experimental section for details) and models were created using different families of molecular descriptors implemented in DRAGON. A summary of the classification parameters and Cooper statistics of the models are shown in Supplementary Material S2. We selected a group of the best performing models for each activity cut-off according to their statistical parameters. The criteria are shown in Table 2. A prediction was considered positive if the level of activity of a compound was predicted correctly by multiple models. Among the selection criteria for a positive evaluation was that the candidate must belong to the application domain for each set of models. In

Table 2. Parameters for the Selection of the Best Models in the Virtual Screening

Cut-off value          1 µM                       5 µM                       10 µM

λ ≤                    0.75                       0.62                       0.62
MCC ≥                  0.55                       0.55                       0.60
Sensitivity (%) >      78                         75                         75
Specificity (%) >      72                         80                         80
Accuracy (%) >         75                         78                         79
D² >                   1.40                       2.50                       2.50

Better models made     2D autocorrelations        Constitutional             Constitutional
up of descriptors:     Edge adjacency indices     Topological                Topological
                       Burden eigenvalues         2D autocorrelations        2D autocorrelations
                       3D-MoRSE                   Edge adjacency indices     Atom-centred fragments
                       GETAWAY                    3D-MoRSE                   Descriptors' mix
                       Functional group counts    Functional group counts
                       Atom-centred fragments     Atom-centred fragments

Selection criteria: 1 µM, at least 4 positive predictions of a total of 7 models; 5 µM, at least 4 positive predictions of a total of 7 models; 10 µM, at least 3 positive predictions of a total of 5 models.

this case, it was determined by the value of leverage [61]. A summary of the molecular descriptors, their coefficients, and their significance is shown in Supplementary Material S3. The best models were those that employed 0D to 2D molecular descriptors. The three-dimensional descriptor families with good statistical parameters were GETAWAY and 3D-MoRSE.

To set the criteria for selecting the best models at each cut-off, we sought the highest possible enrichment of the selected compound set when the models are used together as a consensus virtual screening tool. The models were required to have a high predictivity (accuracy, sensitivity, and specificity), a balance of sensitivity and specificity (MCC), and statistically significant values of D² and λ. As expected, some models had good performance in terms of predictivity but not in terms of balance and/or statistical significance. The cut-off value also influenced the results of the predictive models. Here, we used LDA, and it is known that its results are heavily influenced by the distribution of the classes, which should ideally be 1:1.

Once the models and criteria were selected, we proceeded to evaluate molecules from the DrugBank database [62, 63]; this database contains information on FDA-approved drugs. The compounds should be commercially available, have minimal toxicity, and have good bioavailability. The DrugBank dataset used for selection contained 1411 compounds. The prediction results for the different cut-offs are shown in Table 3.

A single compound, triamterene, a drug used as a diuretic [62], was predicted to be active at the cut-off value of 1 µM. However, this compound did not show any stabilizing activity in fluorescence resonance energy transfer (FRET) analysis (data not shown). The compounds predicted to have IC50 ≤ 5 µM were also predicted to be active at 10 µM. This supports the theoretical validity of the predictions made by our models. Most compounds predicted to bind to G-quadruplex at the two higher cut-off values have a planar aromatic center that should favour hydrophobic interactions with guanines. Many also have flexible side chains. Side chains present on known G-quadruplex binders contribute to electrostatic interactions with the phosphates found in the grooves of the structure of DNA [31]. Six of the predicted active compounds were obtained for evaluation (Fig. 1). Selection of these six compounds was driven by commercial availability and price.

3.2. Compounds

The six selected compounds were obtained from Sigma Aldrich, and stock solutions (10 mM) were prepared in DMSO and kept at -20 °C. The chemical structures of these compounds are shown in Fig. 1. All six compounds are used in the management of psychiatric disorders and have acceptable safety profiles. Prochlorperazine is an antipsychotic employed in the treatment of nausea, vomiting, and vertigo; it is also indicated for the symptomatic management of psychotic disorders and short-term management of non-psychotic anxiety in patients with generalized anxiety disorder [62-64]. Chlorpromazine is a prototypical phenothiazine antipsychotic drug used as an antiemetic and in the treatment of intractable hiccups, for the treatment of schizophrenia, for control of nausea and vomiting, for relief of agitation and nervousness before surgery, and for control of the manic type of manic-depressive illness [62-64]. Promazine is another phenothiazine with actions similar to chlorpromazine but with less antipsychotic action. It is principally used for the short-term treatment of disturbed behavior, as an antiemetic, and for short-term treatment of moderate and severe psychomotor agitation [62-64]. Loxapine is a dibenzoxazepine, a subclass of tricyclic antipsychotic agents. It is a sedative for which the mode of action is not well known; it is thought to antagonize dopamine and serotonin receptors. The principal use of this compound is for the treatment of psychotic disorders such as schizophrenia [62-64]. Amitriptyline, a tertiary amine tricyclic antidepressant, is extremely sedating. Amitriptyline is used for the treatment of depression, chronic pain, irritable bowel syndrome, sleep disorders, diabetic neuropathy, agitation, and insomnia and for migraine prophylaxis [62-64]. Imipramine is structurally similar to […] and contains a tricyclic ring system with an alkyl amine substituent on the central ring. Imipramine is employed for the relief of symptoms of depression, as a temporary adjunctive therapy in reducing enuresis in children, for management of panic disorders, for short-term control of acute depressive episodes in bipolar disorder and schizophrenia, and for symptomatic treatment of postherpetic neuralgia [62-64].

FDA-approved Drugs Selected Using Virtual Screening Bind Current Pharmaceutical Design, 2013, Vol. 19, No. 00 5

Table 3. Prediction Results for Compounds from DrugBank at Different Cut-off Values. (1)

Drug                   1 µM        5 µM    10 µM

Triamterene            +           +       +
[…]                    inactive    +       +
Almitrine              inactive    +       +
[…]                    inactive    +       +
Promazine              inactive    +       +
Chlorpromazine         inactive    +       +
Chloroquine            inactive    +       +
Chlorhexidine          inactive    +       +
[…]                    inactive    +       +
Methotrimeprazine      inactive    +       +
Phenazopyridine        inactive    +       +
Hydroxychloroquine     inactive    +       +
[…]                    inactive    +       +
[…]                    inactive    +       +
Loxapine               inactive    +       +
[…]                    inactive    +       +
Amodiaquine            inactive    +       +
Imiquimod              inactive    +       +
[…]                    inactive    +       +
Abacavir               inactive    +       +
Primaquine             inactive    +       +
Clenbuterol            inactive    +       +
Propericiazine         inactive    +       +
Thioproperazine        inactive    +       +
Amitriptyline          inactive    +       +
Imipramine             inactive    +       +
Folic acid             inactive    +       +

(1) The "+" indicates an active compound.

3.3. Study of the Stabilizing Capacity of the Ligands in the Human Telomeric Sequence

Our models predict that the compounds selected should have an indirect activity on telomerase via quadruplex stabilization. We evaluated each compound for the ability to selectively stabilize G-quadruplex structures relative to duplex DNA. No experiments were performed directly on telomerase. Future work will address this point. FRET melting analysis can be used to determine the ligand-induced stabilization of a G-quadruplex structure by measurement of the ligand-induced shift in the apparent melting temperature (ΔT1/2) [65, 66]. We evaluated melting of the oligonucleotide F21T, which consists of 3.5 copies of the human telomeric guanine-rich strand, in the presence of various concentrations (1-20 µM) of the ligands under study. Stabilizing effects were more apparent in buffer that did not contain potassium ions (Fig. 2A). The largest shifts in T1/2 were observed with prochlorperazine (+5 °C at a concentration of 10 µM drug) and chlorpromazine (+3.9 °C at a concentration of 10 µM drug). Amitriptyline and promazine had ΔT1/2 values of +2.4 °C and +2.6 °C, respectively, at 10 µM concentrations. The ΔT1/2 values of 10 µM loxapine and imipramine were lower at +1.8 and +1.7 °C, respectively.

In buffer containing potassium, we observed a slight increase in melting temperature values (Fig. 2B). Under these conditions prochlorperazine had a ΔT1/2 of +7.5 °C at a concentration of 20 µM (a unique concentration where, for this compound, the stabilization in presence of sodium was higher than that in presence of potassium). Chlorpromazine had a ΔT1/2 value of +6.1 °C. Amitriptyline, imipramine, and promazine increased quadruplex stability by about 4.0 °C. The two most stabilizing ligands, prochlorperazine and chlorpromazine, are similar to the acridines, compounds previously shown to stabilize G-quadruplex structures and inhibit telomerase enzyme activity [67-70]. However, the changes in melting temperature induced by prochlorperazine and chlorpromazine are well below those reported for the acridines [69, 70]. This is likely due to several factors: first, the presence of a sulfur atom in the ring of prochlorperazine and chlorpromazine may prevent efficient stacking; second, these compounds each have only a single side chain. The most active previously reported acridines and acridone have two or three flexible side chains [24, 31, 71, 72]. Higher ΔT1/2 values are generally obtained with compounds with longer chains [31]. The only monosubstituted acridine characterized, N-(4-(acridin-9-ylamino)phenyl)-3-(pyrrolidin-1-yl)propanamide [27], has a ΔT1/2 of +5.5 °C at 1 µM, significantly lower than trisubstituted analogues.

A chlorine substitution may limit stacking on guanine; however, this cannot be a large factor as chlorpromazine has a higher ΔT1/2 than promazine. The chlorine atom causes the electron cloud on the bond between chlorine and carbon to be more displaced towards the chlorine atom due to the greater electronegativity of this element. The positive charge is then distributed to other atoms in the chain (Supplementary Material S4). This effect may enhance the interaction of chlorinated compounds (prochlorperazine and chlorpromazine) with the G-quadruplex as compared to a non-chlorinated compound (promazine). Although stabilization by amitriptyline and imipramine was modest, it is interesting that the central rings in these compounds are not planar, because the two benzene rings are joined by a 7-membered aliphatic ring. The fact that both compounds are monosubstituted may explain the low values of stabilization. Derivatives of these compounds with well-chosen substituents might provide better stabilization of G-quadruplex structures.

3.4. Activity of Drugs in the Fluorescent Intercalator Displacement Assay

Fluorescence enhancement of thiazole orange in presence of each of the six ligands was monitored in order to determine which could be used in the G-quadruplex Fluorescent Intercalator Displacement (G4-FID) assay. The G4-FID assay is based on the displacement of the "on/off" fluorescence probe, TO, from quadruplex or duplex DNA matrices by a ligand [59, 73-75]. TO is virtually non-fluorescent when free in solution but strongly fluorescent when bound to DNA; thus the ligand-induced displacement leads to a decrease of the fluorescence that is monitored as a function of the ligand concentration. Therefore, the quadruplex affinity of a candidate compound can be evaluated through its ability to displace TO from quadruplex DNA. Moreover, selectivity measurements can be achieved by comparing the ability of the ligand to displace TO from various quadruplex and duplex structures. The repertoire of the oligonucleotide sequences available for these experiments was extended in a recent work [73].
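The displacement read-out of the G4-FID assay can be illustrated numerically. The sketch below assumes the convention commonly used for this assay, in which the percentage of TO displacement is 100 × (1 − F/F0), with F the fluorescence in the presence of ligand, F0 the fluorescence of the DNA + TO sample without ligand, and both corrected for the buffer background Fb. This formula is not stated in the present section, so it should be read as an assumption; the function name and the example intensities are hypothetical.

```python
def to_displacement_percent(f_ligand, f_no_ligand, f_background=0.0):
    """Percentage of TO displaced from the DNA target (assumed convention).

    f_ligand     -- fluorescence of DNA + TO + ligand
    f_no_ligand  -- fluorescence of DNA + TO without ligand (F0)
    f_background -- fluorescence of buffer only (Fb)
    """
    signal_without_ligand = f_no_ligand - f_background
    if signal_without_ligand <= 0:
        raise ValueError("DNA + TO signal must exceed the buffer background")
    signal_with_ligand = f_ligand - f_background
    return 100.0 * (1.0 - signal_with_ligand / signal_without_ligand)


# Hypothetical plate readings (arbitrary units) for one quadruplex and one duplex.
quadruplex = to_displacement_percent(f_ligand=650.0, f_no_ligand=1050.0, f_background=50.0)
duplex = to_displacement_percent(f_ligand=1040.0, f_no_ligand=1050.0, f_background=50.0)
print(f"quadruplex: {quadruplex:.1f} %, duplex: {duplex:.1f} %")
```

Comparing the value obtained on a quadruplex target with the value obtained on a duplex target at the same ligand concentration gives the kind of selectivity comparison described above.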

Fig. (1). Chemical structures of the compounds employed in this study.

Fig. (2). Stabilizing effect of prochlorperazine, chlorpromazine, amitriptyline, loxapine, promazine, and imipramine on the human telomeric quadruplex structure. For these experiments a 10 mM lithium cacodylate buffer (pH 7.2) and 0.2 µM F21T were used in (A) Na+ conditions and (B) K+ conditions. Each ligand was tested at four different concentrations (1, 5, 10, and 20 µM).
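The ΔT1/2 values plotted in Fig. (2) rest on the T1/2 definition given in the experimental section: fluorescein emission is normalized between 0 and 1, and T1/2 is the temperature at which the normalized emission equals 0.5. The sketch below shows one way to compute it. Locating the crossing by linear interpolation between neighbouring temperature points is an assumption (the text does not say how the crossing is found), and the function names and the two melting curves are made up for illustration.

```python
def half_melting_temperature(temps, emission):
    """Return T1/2: the temperature at which the min-max normalized
    emission first crosses 0.5 (linear interpolation between points)."""
    if len(temps) != len(emission) or len(temps) < 2:
        raise ValueError("need two equally long series with at least two points")
    lowest, highest = min(emission), max(emission)
    if highest == lowest:
        raise ValueError("flat curve: emission cannot be normalized")
    normalized = [(e - lowest) / (highest - lowest) for e in emission]
    for i in range(1, len(temps)):
        y0, y1 = normalized[i - 1], normalized[i]
        # A crossing lies in this interval if 0.5 is between y0 and y1.
        if (y0 - 0.5) * (y1 - 0.5) <= 0 and y0 != y1:
            fraction = (0.5 - y0) / (y1 - y0)
            return temps[i - 1] + fraction * (temps[i] - temps[i - 1])
    raise ValueError("normalized emission never crosses 0.5")


def delta_t_half(temps, emission_alone, emission_with_ligand):
    """Ligand-induced shift: T1/2 with ligand minus T1/2 of the oligonucleotide alone."""
    return (half_melting_temperature(temps, emission_with_ligand)
            - half_melting_temperature(temps, emission_alone))


# Made-up curves (temperature in degrees C, emission in arbitrary units).
temps = [40.0, 50.0, 60.0, 70.0, 80.0]
alone = [100.0, 100.0, 300.0, 500.0, 500.0]
with_ligand = [100.0, 100.0, 200.0, 400.0, 500.0]
print(delta_t_half(temps, alone, with_ligand))
```

With real data the temperature step is much finer than in this toy example, so the choice of interpolation scheme matters less than it does here.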

As shown in Fig. 3, prochlorperazine, chlorpromazine, and promazine displaced TO from structures formed by sequences found in the c-kit2, c-myc, and K-ras promoter regions. At 100 equivalents of prochlorperazine, displacement was higher than 40%. Interestingly, none of the tested drugs bound to duplex-forming sequences, as the TO displacement from duplexes was close to 0%. Chlorpromazine, promazine, and amitriptyline bound only to certain sequences at higher concentrations. Loxapine displaced TO only from the c-kit2 sequence at 50 µM concentration. Imipramine did not show significant TO displacement.

3.5. Stabilization of G-quadruplex-prone Sequences by Ligands

In order to study quantitatively the binding capacity of these ligands on different G-quadruplexes and to confirm the results obtained by FID, we performed FRET experiments. Only a single duplex sequence was used due to the lack of affinity for duplex by these drugs in the FID assay. The stabilization of quadruplex structures adopted by different G-quadruplex sequences (taken from four promoter regions and the human telomeric motif) is shown in Fig. 4. At a ligand concentration of 20 µM, prochlorperazine, chlorpromazine, and promazine increased the melting temperature of the c-myc-based sequence by 9.0, 6.8, and 5.9 °C, respectively. Tm increases of more than 5 °C were observed for imipramine. Prochlorperazine, chlorpromazine, and promazine also stabilized the K-ras quadruplex with ΔTm increases of 10.3, 7.0, and 7.2 °C, respectively. Imipramine stabilized the K-ras structure less than it did the c-myc mimic, with a ΔT1/2 of 4.4 °C for the K-ras G-quadruplex. None of the drugs stabilized the c-kit2 or c-kit1 G-quadruplex structures. We also tested the effect of prochlorperazine and chlorpromazine on these quadruplexes in the presence of 3 and 10 µM of a competitor duplex, ds26. Results are summarized in Table 4. There were no significant changes in the value of ΔT1/2 in the presence of duplex. These results, coupled with the low affinities of these drugs for ds26 and dx12 duplexes in FID experiments, suggest that these compounds are quadruplex-selective.

Table 4. Effect of ds26 Duplex Competitor on Prochlorperazine and Chlorpromazine Stabilization of c-myc and K-ras G-quadruplexes

                        ΔTm (°C) prochlorperazine     ΔTm (°C) chlorpromazine
                        c-myc       K-ras             c-myc       K-ras

Na+ conditions (1)
No competitor           7.0         9.8               6.7         7.4
3 µM ds26               5.7         7.3               5.8         5.4
10 µM ds26              5.6         9.0               7.0         6.4

K+ conditions (2)
No competitor           4.4         9.1               5.6         7.1
3 µM ds26               6.0         9.1               4.5         3.6
10 µM ds26              5.1         6.4               4.8         4.7

(1) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 100 mM NaCl, 0.2 µM F21T, 20 µM ligand.
(2) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 1 mM KCl, 99 mM LiCl, 0.2 µM F21T, 20 µM ligand.

4. CONCLUSIONS

Commercial drugs are potential sources of compounds for new therapeutic applications as data on bioavailability and toxicity are available. Virtual screening of large databases of these compounds is an inexpensive and fast methodology that can provide interesting results. In this paper we employed a QSAR study to identify compounds from a database of FDA-approved drugs with the potential to bind G-quadruplex DNA structures. A related approach was recently proposed by Chan et al., who used entirely different calculation procedures for virtual screening of FDA-approved compounds [39]; it led to the identification of a different candidate, methylene blue. Six of the drugs identified in our screen were tested in a variety of experiments designed to evaluate their stabilizing effect and selectivity for G-quadruplex structures. Each of the six bound to G-quadruplexes, as demonstrated by FID and FRET melting studies, with various levels of affinity; none showed affinity for duplex DNA. Efforts will now be made to obtain higher

Fig. (3). FID assays using oligonucleotides indicated on the left. The first five sequences form quadruplexes and the last two form duplexes. The colour chart representation allows the visual comparison of the percentage of TO displacement obtained with different test molecules at three concentrations.

Fig. (4). Stabilizing effect of tested drugs (20 µM) on quadruplex- and duplex-forming sequences in (A) Na+ conditions and (B) K+ conditions.

affinity ligands by derivatizing the scaffolds identified in this study. Three of the six compounds identified were phenothiazine derivatives; it should be noted that Chan et al. [39] indirectly found a similar scaffold as an interesting starting point for G-quadruplex recognition. They designed derivatives with two side chains that interact strongly with G-quadruplexes. The other three molecules we identified have not, to the best of our knowledge, been investigated as G-quadruplex ligands before. An intriguing possibility is that some of the biological effects induced by the identified compounds could be related to their ability to bind G-quadruplex-forming regions of the genome.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflicts of interest.

ACKNOWLEDGEMENTS

We are grateful to the Flemish Interuniversity Council (Belgium) for financial support to the project "Strengthening postgraduate education and research in Pharmaceutical Sciences" where this research is included. We are also grateful to the project "Montaje de un laboratorio de química computacional, con fines académicos y científicos, para el diseño racional de nuevos candidatos a fármacos en enfermedades de alto impacto social (D/024153/09)". INSERM U869 acknowledges support from INSERM, CNRS-PIR, INCa, ANR "G4-Toolbox", Conseil régional d'Aquitaine, Association pour la Recherche sur le Cancer (ARC), and Fondation pour la recherche Médicale (F.R.M.). D. Castillo is grateful to Professor Dr. Mathews Froeyen (Rega Institute, KU Leuven) for his helpful discussions regarding the results in this work. Special thanks to Prof. Dr. Federico Pallardo from Department of Physiology, University of Valencia, for allowing G.
Perez-Machado to work in his research group.

REFERENCES

[1] Hardin CC, Henderson E, Watson T, Prosser JK. Monovalent cation induced structural transitions in telomeric DNAs: G-DNA folding intermediates. Biochemistry. 1991;30(18):4460-72.
[2] Williamson JR. G-quartet structures in telomeric DNA. Annu Rev Biophys Biomol Struct. 1994;23:703-30.
[3] Schultze P, Hud NV, Smith FW, Feigon J. The effect of sodium, potassium and ammonium ions on the conformation of the dimeric quadruplex formed by the Oxytricha nova telomere repeat oligonucleotide d(G(4)T(4)G(4)). Nucleic Acids Res. 1999;27(15):3018-28.
[4] Vorlickova M, Bednarova K, Kejnovska I, Kypr J. Intramolecular and intermolecular guanine quadruplexes of DNA in aqueous salt and ethanol solutions. Biopolymers. 2007;86(1):1-10.
[5] Xu Y, Sato H, Shinohara K, Komiyama M, Sugiyama H. T-loop formation by human telomeric G-quadruplex. Nucleic Acids Symp Ser (Oxf). 2007(51):243-4.
[6] Amrane S, Ang RW, Tan ZM, Li C, Lim JK, Lim JM, et al. A novel chair-type G-quadruplex formed by a Bombyx mori telomeric sequence. Nucleic Acids Res. 2009;37(3):931-8.
[7] Tran PLT, Mergny JL, Alberti P. Stability of telomeric G-quadruplexes. Nucleic Acids Res. 2011;39(8):3282-94.
[8] Phan AT. Human telomeric G-quadruplex: structures of DNA and RNA sequences. FEBS J. 2010;277(5):1107-17.
[9] Phan AT, Kuryavyi V, Darnell JC, Serganov A, Majumdar A, Ilin S, et al. Structure-function studies of FMRP RGG peptide recognition of an RNA duplex-quadruplex junction. Nat Struct Mol Biol. 2011;18(7):796-804.
[10] Patel PK, Hosur RV. NMR observation of T-tetrads in a parallel stranded DNA quadruplex formed by Saccharomyces cerevisiae telomere repeats. Nucleic Acids Res. 1999;27(12):2457-64.
[11] Yang D, Hurley LH. Structure of the biologically relevant G-quadruplex in the c-MYC promoter. Nucleosides Nucleotides Nucleic Acids. 2006;25(8):951-68.
[12] Zhang Z, He X, Yuan G. Regulation of the equilibrium between G-quadruplex and duplex DNA in promoter of human c-myc oncogene by a pyrene derivative. Int J Biol Macromol. 2011.
[13] Patel DJ, Phan AT, Kuryavyi V. Human telomere, oncogenic promoter and 5'-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics. Nucleic Acids Res. 2007.
[14] Cogoi S, Xodo LE. G-quadruplex formation within the promoter of the KRAS proto-oncogene and its effect on transcription. Nucleic Acids Res. 2006;34(9):2536-49.
[15] Rankin S, Reszka AP, Huppert J, Zloh M, Parkinson GN, Todd AK, et al. Putative DNA quadruplex formation within the human c-kit oncogene. J Am Chem Soc. 2005;127(30):10584-9.
[16] Ribeyre C, Lopes J, Boule JB, Piazza A, Guedin A, Zakian VA, et al. The yeast Pif1 helicase prevents genomic instability caused by G-quadruplex-forming CEB1 sequences in vivo. PLoS Genet. 2009;5(5):e1000475.
[17] Hurley LH, Von Hoff DD, Siddiqui-Jain A, Yang D. Drug targeting of the c-MYC promoter to repress gene expression via a G-quadruplex silencer element. Semin Oncol. 2006;33(4):498-512.
[18] Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc Natl Acad Sci U S A. 2002;99(18):11593-8.
[19] Brooks TA, Hurley LH. The role of supercoiling in transcriptional control of MYC and its importance in molecular therapeutics. Nat Rev Cancer. 2009;9(12):849-61.
[20] Blackburn EH. Telomeres and telomerase: their mechanisms of action and the effects of altering their functions. FEBS Lett. 2005;579(4):859-62.
[21] Dang CV. c-Myc target genes involved in cell growth, apoptosis, and metabolism. Mol Cell Biol. 1999;19(1):1-11.
[22] Flores I, Evan G, Blasco MA. Genetic analysis of myc and telomerase interactions in vivo. Mol Cell Biol. 2006;26(16):6130-8.
[23] Sampedro Camarena F, Cano Serral G, Sampedro Santalo F. Telomerase and telomere dynamics in ageing and cancer: current status and future directions. Clin Transl Oncol. 2007;9(3):145-54.
[24] Read M, Harrison RJ, Romagnoli B, Tanious FA, Gowan SH, Reszka AP, et al. Structure-based design of selective and potent G quadruplex-mediated telomerase inhibitors. Proc Natl Acad Sci U S A. 2001;98(9):4844-9.
[25] Douarre C, Gomez D, Morjani H, Zahm JM, O'Donohue MF, Eddabra L, et al. Overexpression of Bcl-2 is associated with apoptotic resistance to the G-quadruplex ligand 12459 but is not sufficient to confer resistance to long-term senescence. Nucleic Acids Res. 2005;33(7):2192-203.
[26] Kim MY, Vankayalapati H, Shin-Ya K, Wierzba K, Hurley LH. Telomestatin, a potent telomerase inhibitor that interacts quite specifically with the human telomeric intramolecular g-quadruplex. J Am Chem Soc. 2002;124(10):2098-9.
[27] Moore MJ, Schultes CM, Cuesta J, Cuenca F, Gunaratnam M, Tanious FA, et al. Trisubstituted acridines as G-quadruplex telomere targeting agents. Effects of extensions of the 3,6- and 9-side chains on quadruplex binding, telomerase activity, and cell proliferation. J Med Chem. 2006;49(2):582-99.
[28] De Cian A, Guittat L, Shin-Ya K, Riou JF, Mergny JL. Affinity and selectivity of G4 ligands measured by FRET. Nucleic Acids Symp Ser (Oxf). 2005(49):235-6.
[29] Waller ZA, Shirude PS, Rodriguez R, Balasubramanian S. Triarylpyridines: a versatile small molecule scaffold for G-quadruplex recognition. Chem Commun (Camb). 2008(12):1467-9.
[30] Kerwin SM, Sun D, Kern JT, Rangan A, Thomas PW. G-quadruplex DNA binding by a series of carbocyanine dyes. Bioorg Med Chem Lett. 2001;11(18):2411-4.
[31] Neidle S, Read MA. G-quadruplexes as therapeutic targets. Biopolymers. 2001;56(3):195-208.
[32] Drygin D, Siddiqui-Jain A, O'Brien S, Schwaebe M, Lin A, Bliesath J, et al. Anticancer activity of CX-3543: a direct inhibitor of rRNA biogenesis. Cancer Res. 2009;69(19):7653-61.
[33] Chong CR, Sullivan DJ, Jr. New uses for old drugs. Nature. 2007;448(7154):645-6.
[34] DiMasi JA, Hansen RW, Grabowski HG. The price of innovation: new estimates of drug development costs. J Health Econ. 2003;22(2):151-85.
[35] Cabrera-Pérez MA, Castillo-González D, Pérez-González M, Durán-Martínez A. Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Combinatorial Science. 2009;28(5):526-36.
[36] Trotta R, De Tito S, Lauri I, La Pietra V, Marinelli L, Cosconati S, et al. A more detailed picture of the interactions between virtual screening-derived hits and the DNA G-quadruplex: NMR, molecular modelling and ITC studies. Biochimie. 2011;93(8):1280-7.
[37] Chen SB, Tan JH, Ou TM, Huang SL, An LK, Luo HB, et al. Pharmacophore-based discovery of triaryl-substituted imidazole as new telomeric G-quadruplex ligand. Bioorg Med Chem Lett. 2011;21(3):1004-9.
[38] Lee HM, Chan DS, Yang F, Lam HY, Yan SC, Che CM, et al. Identification of natural product fonsecin B as a stabilizing ligand of c-myc G-quadruplex DNA by high-throughput virtual screening. Chem Commun (Camb). 2010;46(26):4680-2.
[39] Chan DS, Yang H, Kwan MH, Cheng Z, Lee P, Bai LP, et al. Structure-based optimization of FDA-approved drug methylene blue as a c-myc G-quadruplex DNA stabilizer. Biochimie. 2011;93(6):1055-64.
[40] Ma DL, Lai TS, Chan FY, Chung WH, Abagyan R, Leung YC, et al. Discovery of a drug-like G-quadruplex binding ligand by high-throughput docking. ChemMedChem. 2008;3(6):881-4.
[41] Ma DL, Chan DS, Lee P, Kwan MH, Leung CH. Molecular modeling of drug-DNA interactions: virtual screening to structure-based design. Biochimie. 2011;93(8):1252-66.
[42] Li Q, Xiang J, Li X, Chen L, Xu X, Tang Y, et al. Stabilizing parallel G-quadruplex DNA by a new class of ligands: two non-planar alkaloids through interaction in lateral grooves. Biochimie. 2009;91(7):811-9.
[43] Ma D-L, Ma VP-Y, Chan DS-H, Leung K-H, Zhong H-J, Leung C-H. In silico screening of quadruplex-binding ligands. Methods. 2012;57(1):106-14.
[44] Arola A, Vilar R. Stabilisation of G-quadruplex DNA by small molecules. Curr Top Med Chem. 2008;8(15):1405-15.
[45] Haider SM, Neidle S, Parkinson GN. A structural analysis of G-quadruplex/ligand interactions. Biochimie. 2011;93(8):1239-51.
[46] Jiang YL, Liu ZP. Metallo-organic G-quadruplex ligands in anticancer drug design. Mini Rev Med Chem. 2010;10(8):726-36.
[47] De Cian A, Cristofari G, Reichenbach P, De Lemos E, Monchaud D, Teulade-Fichou MP, et al. Reevaluation of telomerase inhibition by quadruplex ligands and their mechanisms of action. Proc Natl Acad Sci U S A. 2007;104(44):17347-52.
[48] Mailliet P, Laoui A, Riou JF, Doerflinger G, Mergny JL, Hamy F, et al., inventors; Aventis Pharma S.A., assignee. Triazine derivatives and their applications as antitelomerase agents. France patent US 6887873 B2. 2005 May 3.
[49] Mailliet P, Riou JF, Alasia M, T. C, Doerflinger G, Mergny JL, et al., inventors; Aventis Pharma S.A., assignee. Chemical derivatives and their applications as antitelomerase agents. France patent US 6858608 B2. 2005 Feb 22.
[50] StatSoft I. STATISTICA (data analysis software system). Version 8.0; 2007.
[51] Cookson JC, Heald RA, Stevens MF. Antitumor polycyclic acridines. 17. Synthesis and pharmaceutical profiles of pentacyclic acridinium salts designed to destabilize telomeric integrity. J Med Chem. 2005;48(23):7198-207.
[52] Franceschin M, Lombardo CM, Pascucci E, D'Ambrosio D, Micheli E, Bianco A, et al. The number and distances of positive charges of polyamine side chains in a series of perylene diimides significantly influence their ability to induce G-quadruplex structures and inhibit human telomerase. Bioorg Med Chem. 2008;16(5):2292-304.
[53] Todeschini R, Consonni V, Pavan M. Dragon for Windows (Software for molecular descriptors calculation). Version 5.4: www.talete.mi.it; 2006.
[54] Dewar MJS, Zoebisch EG, Healy EF, Stewart JJP. AM1: A new general purpose quantum mechanical molecular model. J Am Chem Soc. 1985;107:3902-9.
[55] Frank J. MOPAC 6.0: Seiler Research Laboratory, U.S. Air Force Academy; 1993.
[56] Cooper JA, Saracci R, Cole P. Describing the validity of carcinogen screening tests. Brit J Cancer. 1979;39:87-9.
[57] Golbraikh A, Tropsha A. Beware of q2! J Mol Graph Model. 2002;20(4):269-76.
[58] Cantor CR, Warshaw MM, Shapiro H. Oligonucleotide interactions. 3. Circular dichroism studies of the conformation of deoxyoligonucleotides. Biopolymers. 1970;9(9):1059-77.
[59] Monchaud D, Allain C, Bertrand H, Smargiasso N, Rosu F, Gabelica V, et al. Ligands playing musical chairs with G-quadruplex DNA: a rapid and simple displacement assay for identifying selective G-quadruplex binders. Biochimie. 2008;90(8):1207-23.
[60] Mergny JL, Maurizot JC. Fluorescence resonance energy transfer as a probe for G-quartet formation by a telomeric repeat. Chembiochem. 2001;2(2):124-32.
[61] Netzeva TI, Worth AP, Aldenberg T, Benigni R, Cronin MTD, Gramatica P, et al. Current status of methods for defining the applicability domain of (quantitative) structure-activity relationships. The report and recommendations of ECVAM Workshop 52. ATLA. 2005;33:155-73.
[62] Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72.
[63] Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035-41.
[64] Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(Database issue):D901-6.
[65] Darby RA, Sollogoub M, McKeen C, Brown L, Risitano A, Brown N, et al. High throughput measurement of duplex, triplex and quadruplex melting curves using molecular beacons and a LightCycler. Nucleic Acids Res. 2002;30(9):e39.
[66] De Cian A, Guittat L, Kaiser M, Sacca B, Amrane S, Bourdoncle A, et al. Fluorescence-based melting assays for studying quadruplex ligands. Methods. 2007;42(2):183-95.
[67] Debray J, Zeghida W, Jourdan M, Monchaud D, Dheu-Andries ML, Dumy P, et al. Synthesis and evaluation of fused bispyrimidinoacridines as novel pentacyclic analogues of quadruplex-binder BRACO-19. Org Biomol Chem. 2009;7(24):5219-28.
[68] Demeunynck M. Antitumour acridines. Expert Opin Ther Patents. 2004:55-70. Ashley Publications Ltd.
[69] Harrison RJ, Cuesta J, Chessari G, Read MA, Basra SK, Reszka AP, et al. Trisubstituted acridine derivatives as potent and selective telomerase inhibitors. J Med Chem. 2003;46(21):4463-76.
[70] Harrison RJ, Gowan SM, Kelland LR, Neidle S. Human telomerase inhibition by substituted acridine derivatives. Bioorg Med Chem Lett. 1999;9(17):2463-8.
[71] Martins C, Gunaratnam M, Stuart J, Makwana V, Greciano O, Reszka AP, et al. Structure-based design of benzylamino-acridine compounds as G-quadruplex DNA telomere targeting agents. Bioorg Med Chem Lett. 2007;17(8):2293-8.
[72] Schultes CM, Guyen B, Cuesta J, Neidle S. Synthesis, biophysical and biological evaluation of 3,6-bis-amidoacridines with extended 9-anilino substituents as potent G-quadruplex-binding telomerase inhibitors. Bioorg Med Chem Lett. 2004;14(16):4347-51.
[73] Tran PLT, Largy E, Hamon F, Teulade-Fichou MP, Mergny JL. Fluorescence intercalator displacement assay for screening G4 ligands towards a variety of G-quadruplex structures. Biochimie. 2011;93(8):1288-96.
[74] Monchaud D, Allain C, Teulade-Fichou MP. Development of a fluorescent intercalator displacement assay (G4-FID) for establishing quadruplex-DNA affinity and selectivity of putative ligands. Bioorg Med Chem Lett. 2006;16(18):4842-5.
[75] Monchaud D, Allain C, Teulade-Fichou MP. Thiazole orange: a useful probe for fluorescence sensing of G-quadruplex-ligand interactions. Nucleosides Nucleotides Nucleic Acids. 2007;26(10-12):1585-8.

Received: ?????? 2, 2012 Accepted: September 23, 2012 Chapter 5. .

5.3 Overview of evidence that identified compounds have potential as anticancer agents

None of the commercial drugs identified in this work currently has an application as an anticancer agent. All six compounds are used in the management of psychiatric disorders and have acceptable safety profiles. Certain preclinical studies suggested that the long-term use of such drugs may result in the initiation and/or promotion of tumors in the gastrointestinal tract and of other malignancies (279-281). Another study noted that the reduced cancer risk observed in patients suffering from schizophrenia could result from anti-neoplastic effects of antipsychotic medications (282). This supports the hypothesis that these compounds may be applicable in antitumor therapy. Here we present some of the results that support our initial approach to the discovery of compounds with anticancer activity through TRAP inhibition and stabilization of guanine quadruplexes.

Loxapine is a potent non-competitive inhibitor of P-glycoprotein (P-gp). It causes a 3.5-fold decrease in the concentration of doxorubicin necessary to inhibit 50% of the growth (GI50) of K562Dox cells. P-gp is one of the best-characterized transporters responsible for the multidrug resistance phenotype exhibited by cancer cells (283).

The antitumor effects of imipramine on glioma cells have been investigated. Treatment of U-87MG cells with imipramine resulted in the inhibition of PI3K/Akt/mTOR signaling, reduction of clonogenicity, and induction of cell death. These results suggest that imipramine exerts antitumor effects on PTEN-null U-87MG human glioma cells by inhibiting PI3K/Akt/mTOR signaling and by inducing autophagic cell death (284). The effects of imipramine and amitriptyline on the proliferation of colorectal tumor cells were examined using human HT29 colon carcinoma cells. Both drugs reduce cell viability in a manner dependent on the time of exposure. These compounds may be cytotoxic and likely induce non-oxidative apoptotic death of human HT29 colon carcinoma cells through a non-mitochondrial pathway associated with cell-cycle progression (279). Human IGR1 cells are a model for malignant melanoma. Voltage-gated K+ channels have been detected in cancer cell lines of diverse origin and shown to influence their rate of proliferation. Imipramine blocks these channels. Incubation of IGR1 cells for 48 h with 10-15 mM imipramine reduced DNA synthesis and metabolism without significant effects on apoptosis (285, 286). These channels are also present in the cytoplasm and nuclei of SK-OV-3 cells. Imipramine significantly inhibits the proliferation of SK-OV-3 cells; it does not affect the cell cycle but increases the proportion of SK-OV-3 cells undergoing early apoptosis (287).

Amitriptyline at concentrations of 0.14 mM to 0.5 mM is an inhibitor of cellular respiration. Some studies have indicated that inhibition of cellular respiration is an indicator of apoptosis (288). Oxidative therapy is a relatively new anticancer strategy based on the induction of high levels of oxidative stress in cancer cells, achieved by increasing intracellular reactive oxygen species (ROS) and/or by depleting the protective antioxidant machinery of tumor cells. The antitumor potential of amitriptyline was evaluated in three human tumor cell lines: H460 (lung cancer), HeLa (cervical cancer), and HepG2 (hepatoma). Amitriptyline induces high levels of ROS followed by irreversible, serious mitochondrial damage (289). Amitriptyline was also shown to inhibit cyclin D2 transactivation and to display potential anti-myeloma activity by inhibiting histone deacetylases (HDACs). It markedly decreased cyclin D2 promoter-driven luciferase activity, reduced cyclin D2 expression, and arrested cells at the G0/G1 phase of the cell cycle. Amitriptyline increases the acetylation of histone 3 and the expression of p27 and p21. Other studies indicate that amitriptyline interferes with HDAC function by down-regulating HDAC3, -6, -7, and -8, but not HDAC2, and by interacting with HDAC7 (290). The tricyclic antidepressants have previously been shown to inhibit the growth of glioma cells in vitro (284). Eight primary cell cultures from metastatic melanoma deposits were exposed to amitriptyline at concentrations ranging from 200 to 6.25 µM, corroborating the activity in primary cells (291).
Chlorpromazine has activity against a variety of cancer types (292-295). Its use in the prevention of brain cancers has been evaluated (296). At clinically relevant doses, chlorpromazine has antiproliferative activity and induces apoptosis in leukemic cells without any influence on the viability of normal lymphocytes (297). The genotoxic, cytotoxic, and cytostatic potential of chlorpromazine, alone or in combination with mitomycin C, makes it a promising antitumor agent (294). Combined treatment with pentamidine and chlorpromazine results in synergistic antitumor effects (298). Chlorpromazine also works synergistically with tamoxifen to reduce cell growth and metabolic activity, both in the estrogen-sensitive breast cancer cell line MCF-7 and in a tamoxifen-resistant cell line established from MCF-7 cells (299). Chlorpromazine affects the viability of cell lines derived from lymphoblastoma, neuroblastoma, non-small cell lung cancer, and breast adenocarcinoma. Selective inhibition of the viability of cancer cell lines compared with normal cells was found (282). Treatment with chlorpromazine results in cell-cycle arrest at the G2/M phase in rat C6 glioma cells and reduces the expression of cell cycle-related proteins (300). Hepatocellular carcinoma is an aggressive tumor with a poor prognosis. Chlorpromazine has also been shown to inhibit orthotopic liver tumor growth (301). In addition, chlorpromazine increases the levels of endogenous retinoic acid within tumor cells by blocking its metabolism. This process has been related to antitumor activity (302).

There are no reports of anticancer activity for promazine or prochlorperazine. The literature available on the other compounds identified as G-quadruplex ligands reinforces the need for experimental corroboration of the activity of these compounds. It is also necessary to consider carrying out pharmaco-epidemiological studies of the prevalence and incidence of tumors in patients who have been exposed to regular doses of these compounds over long periods of time.

5.4 FDA-approved compounds with positive predictions in the virtual screening and negative activities

From the initial list of positive predictions, three other compounds were evaluated as G4 stabilizers but were not included in the aforementioned paper. These ligands are triamterene (the compound with the best scores in the computational prediction), clozapine and folic acid. Figure 5.3 shows the molecular structures of these three ligands.


Figure 5.3. Molecular structures of triamterene, clozapine and folic acid.

The three compounds were evaluated under the same conditions as in the paper mentioned above. They did not exhibit any significant G4-stabilizing effect in the 1-50 µM concentration range. Clozapine has a molecular structure very similar to that of loxapine, the only difference being that loxapine has an oxygen where clozapine has an amine group. The stabilizing activity of loxapine is the lowest of the six compounds reported in the paper. It is therefore unsurprising that clozapine was found to be inactive in the FRET melting assay. Triamterene and folic acid can form hydrogen bonds, but this does not appear to be a sufficient condition to increase the ΔTm.


Chapter 6. .

Chapter 6. Curation of databases of ligands and use of non-linear techniques to develop reliable models. Virtual screening of commercial databases of compounds and biophysical evaluation.

Introduction

This section describes a new modeling strategy to improve our prediction of G-quadruplex ligands. First, we implemented a database curation process based on the recommendations made by Tropsha and co-workers in 2010 in the paper “Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research” (58). Database curation is not usually performed in QSAR modeling. However, details such as the inclusion of organometallic compounds in the database (most molecular descriptors do not correctly describe the coordinate bond) or the representation of the same functional group in different forms, as happens with the nitro group, can lead to undesirable results in the modeling process. In addition to database curation, we employed non-linear techniques not used in the other studies reported herein. Although non-linear techniques do not allow activity to be related directly to a molecular descriptor (because the relationship is not one of linear proportionality), they are powerful tools from the predictive standpoint. The non-linear techniques we used for modeling include classification rules and trees, meta-classifiers, and Bayesian functions. After defining a consensus variant, we proceeded to the virtual screening of commercially available compound databases. The best compounds were further screened in silico by docking, following the recommendations of Neidle and colleagues (303), to confirm the ligands predicted to bind and stabilize G-quadruplex structures. Finally, the best compounds according to QSAR modeling and molecular docking were purchased and evaluated as G-quadruplex stabilizers. As described in the present chapter, the use of curated ligand databases and non-linear techniques produced more robust and reliable models.

The central premise of medicinal chemistry and computer-assisted drug design states that similar compounds should present similar activities (304). This principle usually holds. However, in some cases similar compounds have very different activity values, differing by a factor of 10 or more. The literature defines such ligands as activity cliffs (58). Such ligands introduce maximal and minimal activity values into the data and thus contribute to its non-homogeneous distribution. Activity cliffs can be taken into account in the advanced steps of drug design and optimization, where they allow a detailed analysis of the influence of certain functional groups and/or other factors that could increase activity or avoid unwanted effects. At the beginning of the modeling process, however, compounds of this kind can be very risky for deriving new models. Extreme activity values contribute to the non-smoothness and non-homogeneity of the data; the most widespread recommendation is to remove these ligands (58). An activity cliff often arises because a functional group or a part of the molecule contributes enormously to the activity, either positively or negatively. Alternatively, an extreme activity value may be due to an erroneous measurement, an error introduced during experimentation or data processing, neither of which is free of mistakes; this is the case of an outlier, and the recommendation is again to remove it. Studies suggest that error rates in medicinal chemistry publications and in commercial compound databases range from 0.1 to 3.4% (305), depending on the database, and reach values above 8% in other publications (306). One of the most interesting works published on this topic in recent years is “Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research” (58), which analyzes step by step the factors that can undermine good computational modeling through the use of erroneous data. The topics addressed include the kind of molecular descriptors to use, optimization techniques, and ways of encoding structures.
The conclusion is that manual database curation in cheminformatics is absolutely necessary for obtaining better models and improving the quality of the predictions. In fact, the authors suggest that curation be included as a step preceding the five principles set out by the OECD for the proper development of a QSAR model. The five points defined by the OECD are: (i) a defined endpoint, (ii) an unambiguous algorithm, (iii) a defined domain of applicability, (iv) appropriate measures of goodness-of-fit, robustness and predictivity, and (v) a mechanistic interpretation, if possible (307, 308).


6.1 Materials and methods

6.1.1 Dataset

All the modelling efforts conducted in this work were based on an initial large and structurally diverse dataset of 783 compounds reported as telomerase inhibitors via G-quadruplex stabilization, compiled by Castillo and co-workers (32). The endpoint selected to encode the G-quadruplex-induced telomerase inhibition activity of the dataset compounds was the concentration causing 50% inhibition of the enzyme (IC50), as measured by the TRAP assay and reported in the respective primary sources. This dataset covers relatively large and evenly distributed structural and activity spaces (see Table 6.1 for details), and we therefore considered it an appropriate learning space from which to obtain predictive classifiers. In this work, compounds with IC50 values ≤ 1 µM were assigned to the “active” class (Class_1); all other compounds were labelled “inactive” (Class_0). The identification codes, SMILES and IC50 values for the full set of 783 telomerase inhibitors obtained from (32) are provided as supporting information (ESM 1). Although this dataset has been the subject of previous modeling efforts, we decided to curate it carefully before any modeling, as recommended by Tropsha (309). The data curation protocols proposed by Fourches et al. (310) were the main guide for the data curation process applied in this work.
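The class assignment rule stated above (IC50 ≤ 1 µM is active) can be sketched in a few lines of Python. This is only an illustration: the compound identifiers and IC50 values in the example are hypothetical and do not come from the dataset.

```python
from typing import Dict

# Threshold stated in the text: IC50 <= 1 µM means "active" (Class_1).
ACTIVE_CUTOFF_UM = 1.0


def assign_class(ic50_um: float) -> str:
    """Return 'Class_1' (active) if IC50 <= 1 µM, otherwise 'Class_0'."""
    if ic50_um <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return "Class_1" if ic50_um <= ACTIVE_CUTOFF_UM else "Class_0"


def label_dataset(ic50_by_id: Dict[str, float]) -> Dict[str, str]:
    """Label every compound of a {compound id: IC50 in µM} mapping."""
    return {cid: assign_class(value) for cid, value in ic50_by_id.items()}


# Hypothetical identifiers and values, for illustration only.
example = {"cmpd_A": 0.32, "cmpd_B": 1.0, "cmpd_C": 23.9}
labels = label_dataset(example)
```

Note that a compound with an IC50 of exactly 1 µM falls in Class_1, because the cutoff is inclusive.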

Table 6.1. Raw/curated data set distribution.

          COMPOSITION (Raw/Curated)      TRAP ASSAY - TELOMERASE IC50 (µM) (Raw/Curated)
CODE(a)   N         Class_1   Class_0    Min          Max            Mean         Median       Std. Dev.
ac        104/75    59/50     45/25      0.01/0.01    50.00/50.00    4.12/3.51    0.59/0.32    10.70/9.99
acp       37/29     24/23     13/6       0.03/0.03    80.00/80.00    3.61/4.02    0.74/0.38    13.12/14.84
aq        188/94    10/9      178/85     0.08/0.08    339.00/310.00  66.70/52.41  23.90/19.60  89.47/80.47
bar       30/21     0/0       30/21      2.00/2.00    130.00/75.00   20.31/17.32  15.00/14.00  26.75/20.23
dAU       5/4       0/0       5/4        50.00/50.00  50.00/50.00    50.00/50.00  50.00/50.00  0.00/0.00
fen       6/2       6/2       0/0        0.01/0.01    0.18/0.03      0.05/0.02    0.03/0.02    0.06/0.01
flo       2/2       0/0       2/2        28.00/28.00  58.00/58.00    43.00/43.00  43.00/43.00  21.21/21.21
flu       29/21     0/0       29/21      8.00/9.00    50.00/50.00    25.51/25.58  20.00/20.00  14.54/13.94
fur       2/2       0/0       2/2        25.00/25.00  25.00/25.00    25.00/25.00  25.00/25.00  0.00/0.00
isa       4/1       0/0       4/1        7.80/7.80    15.20/7.80     10.33/7.80   9.15/7.80    3.31/(---)
neo       4/4       1/1       3/3        0.20/0.20    7.50/7.50      3.73/3.73    3.60/3.60    3.44/3.44
pip       25/19     5/5       20/14      0.03/0.03    50.00/50.00    19.39/18.63  17.00/17.00  17.34/17.99
pir       6/6       4/4       2/2        0.22/0.22    21.00/21.00    7.05/7.05    0.38/0.38    10.43/10.43
por       17/7      1/0       16/7       0.03/6.00    39.00/30.00    18.20/20.00  17.00/25.00  11.48/8.60
pta       10/3      8/3       2/0        0.02/0.21    1.65/0.49      0.50/0.32    0.37/0.26    0.52/0.15
qui       25/19     12/12     13/7       0.16/0.16    138.00/138.00  8.69/9.17    1.10/0.55    27.29/31.33
quo       4/0       2/0       2/0        0.06/(---)   5.70/(---)     2.21/(---)   1.54/(---)   2.53/(---)
tet       30/21     6/6       24/15      0.13/0.13    50.00/40.00    18.32/13.25  9.40/4.00    18.03/16.02
tio       3/3       3/3       0/0        1.00/1.00    1.00/1.00      1.00/1.00    1.00/1.00    0.00/0.00
tri       219/174   138/133   81/41      0.04/0.05    10.00/10.00    1.21/0.90    0.62/0.48    1.43/1.18
tro       3/3       0/0       3/3        13.30/13.30  23.50/23.50    17.97/17.97  17.10/17.10  5.15/5.15
comp      5/0       5/0       0/0        0.12/(---)   0.76/(---)     0.36/(---)   0.20/(---)   0.29/(---)
mis       25/19     11/11     14/8       0.05/0.05    37.30/37.30    3.39/3.55    0.93/0.78    8.25/9.19
TOTAL     783/529   295/262   488/267    0.01/0.01    339.00/310.00  21.87/14.99  2.30/1.10    52.27/40.28

a Code used to represent the different chemical classes that make up the dataset (ac: Acridines; acp: Pentacyclic acridines; aq: Anthraquinones; bar: Berberine derivatives; dAU: Diarylureas; fen: Ethidium derivatives; flo: Fluoroquinophenoxazines; flu: Amidofluorenone derivatives; fur: Furan-based cyclic oligopeptides; isa: Isaindigotone derivatives; neo: Neomycin-capped aromatic platforms; pip: Perylene derivatives; pir: Pyridine derivatives; por: Porphyrins; pta: Phthalocyanines; qui: Quindoline derivatives; quo: Benzoindoloquinolines; tet: Catecholic flavonoids; tio: Thiophene derivatives; tri: Triazine derivatives; tro: Triazoles; comp: Ni(II) and Pt(II) complexes; mis: Miscellaneous).

6.1.2 Dataset Curation

Before applying any established data curation procedure, we carried out a preliminary search for duplicate structures based on the original SMILES codes provided in the raw dataset. The original SMILES were converted to their canonical form with OpenBabel (311). Next, the molecular formulas were obtained with the JCMolFormula function implemented in JChem for Excel (312) and used to analyze the distribution of the elements present in the dataset. This step was conducted to detect rare or underrepresented elements, organometallics, mixtures and inorganics. The chemical structures encoded in the SMILES codes were then subjected to a standardization process. First, the corresponding SDF file was generated with JChem for Excel (312). The molecular structure representation was then standardized with ChemAxon's Standardizer (313). The standardization parameters were set to obtain clean 3D molecular structure representations in SDF format with benzene rings in the aromatic form, explicit hydrogens, and a common representation (normalization) of specific chemotypes such as nitro or sulfoxide groups. The SDF file containing the standardized molecular representations generated by ChemAxon's Standardizer was used as input for ChemAxon's Calculator (cxcalc). With the majormicrospecies option, cxcalc produced an SDF file with the structures protonated at pH = 7.2, the pH value of the subsequent experimental assays. Since converting the molecular structures to the corresponding protonation state altered the previous standardization (mainly aromaticity and explicit hydrogens), the standardization process was repeated on the protonated structures.
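The element-distribution check described in this section can be illustrated with a small standard-library sketch. It does not reproduce JChem for Excel or its JCMolFormula function; it assumes the molecular formulas are already available as plain strings, and the set of "common organic" elements and the example formulas are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption for illustration: elements regarded as "common organic".
COMMON_ORGANIC = {"C", "H", "N", "O", "S", "P", "F", "Cl", "Br", "I"}

# One element symbol (capital letter plus optional lowercase) and its count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def elements_in(formula: str) -> set:
    """Return the set of element symbols found in a molecular formula."""
    return {symbol for symbol, _count in _TOKEN.findall(formula)}


def element_distribution(formulas) -> Counter:
    """Count in how many formulas each element occurs."""
    counts = Counter()
    for formula in formulas:
        counts.update(elements_in(formula))
    return counts


def flag_unusual(formulas_by_id: dict) -> dict:
    """Map compound id -> sorted list of elements outside COMMON_ORGANIC."""
    flagged = {}
    for cid, formula in formulas_by_id.items():
        unusual = elements_in(formula) - COMMON_ORGANIC
        if unusual:
            flagged[cid] = sorted(unusual)
    return flagged


# Hypothetical identifiers and formulas, for illustration only.
formulas = {
    "cmpd_A": "C21H25N3O2",
    "cmpd_B": "C32H16N8Ni",
    "cmpd_C": "C12H10ClN3",
}
distribution = element_distribution(formulas.values())
flagged = flag_unusual(formulas)
```

Compounds flagged this way (here the nickel-containing entry) would then be inspected by hand, since most molecular descriptors do not describe coordinate bonds correctly.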

6.1.3 Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity

In addition to the standard data curation procedures applied so far, we propose to include the detection of pairs of compounds that form activity cliffs and the removal of those compounds with a significant influence on SAR discontinuity. The procedure applied to identify activity cliff pairs and to eliminate the most influential compounds is described below. A vector of 306 DRAGON molecular descriptors (including 96 Burden Eigenvalues, 22 Constitutional Indices, 32 Ring Descriptors, 71 Functional Group Counts, 73 Atom-centred Fragments, and 12 Molecular Properties) was used as the reference space for assessing the structural proximity of the curated set of 761 compounds. The Euclidean distance (ED) was used as the structural proximity measure. The Euclidean distance matrix for the curated set of 761 compounds was obtained with the Generalized k-means Cluster Analysis implemented in STATISTICA 8.0 (314). To quantify the degree of structural proximity more intuitively, the ED value between each pair of compounds ij (ED_ij) was normalized (range scaled) to [0, 1] by dividing it by the maximum ED_ij value (ED_ij^max), yielding a structural proximity matrix (S) based on the normalized values (ED_ij^Norm):

$$ED_{ij}^{Norm} = \frac{ED_{ij}}{ED_{ij}^{max}} \qquad \text{(Eq 6.1)}$$

At the same time, the absolute difference in potency between each pair of compounds ij (ΔPot_ij) was computed to obtain the corresponding activity matrix (A):

$$\Delta Pot_{ij} = \left| Pot_{i} - Pot_{j} \right| \qquad \text{(Eq 6.2)}$$

where Pot_i and Pot_j are the potencies of the ith and the jth compounds, expressed as the decadic logarithm of the IC50 values in nM units. Finally, each element in A is divided by the corresponding element in S to obtain the SALI matrix, whose elements are the Structure-Activity Landscape Index (SALI) (315) for each pair of compounds ij (SALI_ij).

$$SALI_{ij} = \frac{\Delta Pot_{ij}}{ED_{ij}^{Norm}} \qquad \text{(Eq 6.3)}$$

SALI assigns each pair of compounds in a set a score that combines their pairwise structural proximity and the difference between their potencies. The elements of the SALI matrix can therefore be used to identify and quantify pairs of compounds forming potential activity cliffs. For this, an arbitrary SALI_ij value must be applied as a cutoff. The goal of the cutoff is to focus on increasingly significant activity cliffs: as the cutoff is increased, fewer pairs of compounds remain, and as the cutoff tends toward the maximum SALI value, only the most significant activity cliffs remain (315). The cutoff was selected based on the analysis of the pairs of compounds corresponding to each possible SALI_ij value (see Figure 6.1). The analysis was directed at identifying the SALI_ij cutoff value at which the variation in the number of pairs of compounds with SALI_ij values equal to or above the applied cutoff stops being significant. From Figure 6.1 it is apparent that SALI_ij = 100 is an appropriate choice.
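Equations 6.1 to 6.3 can be sketched in plain Python as follows. This is a minimal illustration rather than the STATISTICA workflow used in this work: the descriptor vectors and IC50 values are hypothetical, and the handling of pairs with zero distance (infinite SALI if the potencies differ, zero otherwise) is an assumption, since the text does not address that case.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sali_pairs(descriptors, ic50_nm):
    """Return {(i, j): (ED_norm, delta_pot, SALI)} for every compound pair.

    descriptors: compound id -> descriptor vector
    ic50_nm:     compound id -> IC50 in nM (potency = log10 of this value)
    """
    ids = sorted(descriptors)
    ed = {(i, j): euclidean(descriptors[i], descriptors[j])
          for i, j in combinations(ids, 2)}
    ed_max = max(ed.values())
    if ed_max == 0:
        raise ValueError("all descriptor vectors are identical")
    result = {}
    for (i, j), distance in ed.items():
        ed_norm = distance / ed_max                       # Eq 6.1
        delta_pot = abs(math.log10(ic50_nm[i])
                        - math.log10(ic50_nm[j]))         # Eq 6.2
        if ed_norm > 0:
            sali = delta_pot / ed_norm                    # Eq 6.3
        else:
            # Assumption: identical structures with different potency
            # are treated as an infinitely steep cliff.
            sali = math.inf if delta_pot > 0 else 0.0
        result[(i, j)] = (ed_norm, delta_pot, sali)
    return result


# Hypothetical two-descriptor vectors and IC50 values (nM).
descriptors = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (0.3, 0.4)}
ic50_nm = {"a": 10.0, "b": 1000.0, "c": 1000.0}
pairs = sali_pairs(descriptors, ic50_nm)
```

In this toy example the pair (a, c) is structurally close but differs by two log units in potency, so it receives a much higher SALI value than the distant pair (a, b) with the same potency difference.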

Figure 6.1 Analysis of the pairs of compounds corresponding to each possible SALI ij value.

Pairs of potential activity cliffs identified by the SALI approach were screened and refined by applying a discrete activity cliff definition adapted to a classification problem. We considered as activity cliffs only those pairs of compounds with ED_ij^Norm ≤ 0.10 (structural similarity ≥ 90%) and ΔPot_ij ≥ 1 (potency difference of at least one order of magnitude), except for pairs involving compounds of different classes, which were considered activity cliffs regardless of their ΔPot_ij values. This exception was made to reduce the noise induced by structurally similar compounds assigned to different classes (class overlap). Finally, once the pairs of compounds forming activity cliffs had been identified, we removed those compounds with the greatest influence on the SAR discontinuity of the curated dataset. We tried to minimize the number of compounds removed, especially from Class_1. To this end, the compounds involved in the identified cliff pairs (according to the discrete definition) were sorted in decreasing order of the number of cliff pairs in which each compound was involved. The compounds were then removed sequentially from the list of cliff pairs according to this ranking until only compounds involved in a single cliff pair remained, in which case the Class_0 partner or the less potent partner was removed.
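The discrete cliff definition and the sequential removal procedure can be sketched as below. Several details are assumptions made for illustration only: the distance condition is taken to apply to mixed-class pairs as well, the involvement counts are recomputed after every removal, and ties are broken in favour of removing the Class_0 compound, then the less potent one. The compound identifiers and values in the example are hypothetical.

```python
from collections import Counter


def is_cliff(ed_norm, delta_pot, class_i, class_j, ed_max=0.10, pot_min=1.0):
    """Discrete activity cliff definition adapted to classification."""
    if ed_norm > ed_max:          # not structurally similar enough
        return False
    # Mixed-class pairs count as cliffs regardless of the potency gap.
    return class_i != class_j or delta_pot >= pot_min


def remove_cliff_compounds(cliff_pairs, cls, pot):
    """Greedy removal of the compounds involved in most cliff pairs.

    cliff_pairs: iterable of (id_i, id_j)
    cls:         id -> 0 or 1
    pot:         id -> log10(IC50 in nM); a higher value is less potent
    Returns the ids in the order in which they were removed.
    """
    pairs = {frozenset(p) for p in cliff_pairs}
    removed = []

    def removal_key(c):
        # Prefer Class_0, then the less potent compound, then the id.
        return (cls[c], -pot[c], c)

    while pairs:
        degree = Counter(c for p in pairs for c in p)
        top = max(degree.values())
        if top > 1:
            candidates = [c for c in degree if degree[c] == top]
        else:
            # Only isolated pairs remain: handle them one at a time.
            candidates = min(pairs, key=sorted)
        victim = min(candidates, key=removal_key)
        removed.append(victim)
        pairs = {p for p in pairs if victim not in p}
    return removed


# Hypothetical example: A takes part in three cliff pairs, E-F in one.
cls = {"A": 0, "B": 1, "C": 1, "D": 1, "E": 1, "F": 1}
pot = {"A": 4.0, "B": 2.0, "C": 2.1, "D": 1.9, "E": 1.0, "F": 2.5}
removed = remove_cliff_compounds(
    [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F")], cls, pot)
```

Removing the most connected compound first resolves several cliff pairs at once, which keeps the total number of discarded compounds low.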

6.1.4 Classes Balancing

Although the imbalance towards inactive cases in our dataset is not extreme, the size and composition of the dataset allow under-sampling of the inactive cases. Rather than simply applying a random procedure, the under-sampling approach followed in this work was designed to keep as much as possible of the structural and activity information encoded in the initial full set of 422 Class_0 compounds. To preserve compounds with unique SAR profiles, a set of nineteen compounds from underrepresented chemical families or with extreme IC50 values within their chemical family was excluded from the analysis and assigned directly to the balanced subset of Class_0 compounds finally used for machine learning modeling. We then split the remaining 403 Class_0 compounds into four subsets of similar size according to their IC50 values: 99 compounds with IC50 < 3 µM; 103 compounds with 3 µM ≤ IC50 < 10 µM; 107 compounds with 10 µM ≤ IC50 < 40 µM; and 94 compounds with IC50 ≥ 40 µM. Each activity subset was subjected to a Generalized k-means Cluster Analysis (316), as implemented in the Data Mining module of the software package STATISTICA 8.0 (314). The vector of 306 DRAGON molecular descriptors used in the activity cliff identification stage was used as the structural reference space. The Euclidean distance (ED) was used as the structural proximity measure, and the optimal number of clusters was determined through the 5-fold cross-validation procedure implemented in the module.
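The four activity subsets defined above can be reproduced with a simple binning function. The sketch below uses the boundaries given in the text (3, 10 and 40 µM); the compound identifiers and IC50 values are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

# Bin boundaries in µM, as given in the text.
BOUNDARIES_UM = [3.0, 10.0, 40.0]
BIN_LABELS = ["IC50 < 3", "3 <= IC50 < 10", "10 <= IC50 < 40", "IC50 >= 40"]


def activity_bin(ic50_um: float) -> str:
    """Return the label of the activity subset for one Class_0 compound."""
    # bisect_right places a value equal to a boundary in the upper bin,
    # which matches the "lower bound inclusive" intervals of the text.
    return BIN_LABELS[bisect_right(BOUNDARIES_UM, ic50_um)]


def split_by_activity(ic50_by_id: dict) -> dict:
    """Group compound ids by activity subset."""
    subsets = defaultdict(list)
    for cid, value in sorted(ic50_by_id.items()):
        subsets[activity_bin(value)].append(cid)
    return dict(subsets)


# Hypothetical Class_0 compounds (IC50 in µM).
example = {"c1": 1.5, "c2": 3.0, "c3": 9.99, "c4": 10.0, "c5": 40.0, "c6": 339.0}
subsets = split_by_activity(example)
```

Each subset would then be clustered separately, so that the under-sampling removes compounds evenly across the activity range.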

The compounds in each cluster of each activity subset were sorted in increasing order of their computed ED to the cluster's centroid. The compounds with the minimum and maximum ED values in each cluster were assigned directly to the final balanced subset of Class_0 compounds. The remaining compounds in the cluster were analyzed in terms of their ED values to the cluster's centroid in order to remove the excess of Class_0 compounds in proportion to the size of the cluster. In doing so, we evenly removed compounds that had a structurally similar cluster member (the member with the most similar ED value to the cluster's centroid), ensuring that whenever a compound was removed from the cluster, a structurally similar partner remained. In this way, the loss of SAR information caused by the under-sampling procedure is minimal. The final under-sampled set of 267 Class_0 compounds provides the required class balance with respect to the set of 262 Class_1 compounds. A total of 155 Class_0 compounds were removed to balance the classes. Instead of being excluded altogether, these compounds were reserved as an additional external evaluation set.
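One possible reading of the within-cluster removal rule is sketched below. It is not the exact procedure used in this work, whose details are not fully specified in the text: here, the two compounds with the extreme ED values are always kept, and at each step the interior compound whose ED to the centroid is closest to that of a retained neighbour is removed. The number of compounds to remove per cluster (`n_remove`) is assumed to have been fixed beforehand in proportion to the cluster size, and the identifiers and distances are hypothetical.

```python
def undersample_cluster(ed_to_centroid: dict, n_remove: int):
    """Remove up to n_remove compounds from one cluster.

    ed_to_centroid: compound id -> Euclidean distance to the centroid
    Returns (kept ids sorted by distance, removed ids in removal order).
    """
    ed = ed_to_centroid
    kept = sorted(ed, key=lambda c: (ed[c], c))
    removed = []
    for _ in range(n_remove):
        if len(kept) <= 2:        # only the two extremes are left
            break

        def nearest_gap(i):
            # Distance gap to the closest retained neighbour in the ranking.
            return min(ed[kept[i]] - ed[kept[i - 1]],
                       ed[kept[i + 1]] - ed[kept[i]])

        # Interior positions only: the min and max ED compounds are kept.
        victim = min(range(1, len(kept) - 1), key=nearest_gap)
        removed.append(kept.pop(victim))
    return kept, removed


# Hypothetical cluster of five compounds.
cluster = {"a": 0.10, "b": 0.50, "c": 0.52, "d": 0.90, "e": 1.50}
kept, removed = undersample_cluster(cluster, n_remove=1)
```

In the example, compound b is removed because compound c, with an almost identical distance to the centroid, remains in the cluster. Equal distance to the centroid is only a proxy for structural similarity, so this rule should be read as a simplification.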

6.1.5 Training / Evaluation Data Splitting

The curated, standardized and balanced dataset of 529 telomerase inhibitors was split into three subsets (training, test and external evaluation sets) as part of the model validation scheme (309). First, 105 compounds (≈20%) were selected randomly as an “External Evaluation Set” using the Create Subset / Random Sampling option implemented in the software package STATISTICA 8.0 (314). This procedure was applied to each class separately, so that our external evaluation set includes 52/53 Class_1/Class_0 compounds. The goal of this external evaluation set is to reproduce as closely as possible a real-life situation in which any subset of compounds can be submitted for evaluation with the derived predictive model. Thus, the performance of the prediction model on this subset is the most important indicator of its predictive or generalization ability. The remaining set of 424 telomerase inhibitors was divided into training and test sets by a Generalized k-means Cluster Analysis (316), as implemented in the Data Mining module of STATISTICA 8.0 (314). The vector of 306 DRAGON molecular descriptors used in the activity cliff identification and class balancing stages was used as the structural reference space. The generalized k-means cluster analysis was applied independently to the members of each class (Class_1: 210; Class_0: 214). The Euclidean distance was used as the structural proximity measure, and the optimal number of clusters was determined through the 5-fold cross-validation procedure implemented in the module. Approximately 20% of the compounds of each class were reserved for the test set in such a way that each cluster is represented in both the training and test subsets. Therefore, 339 (168/171 Class_1/Class_0) out of 424 compounds were used for training, while the remaining 85 (42/43 Class_1/Class_0) were reserved for the test set and never used for training. This procedure ensures that both the training and test subsets are uniformly populated from the molecular structure point of view, and that each structural pattern in the test subset is represented in the training subset. The goal is to guarantee that predictions for new compounds made with models derived from such a training subset are based on interpolation, avoiding the lack of reliability associated with extrapolation (309, 317). The distribution of training, test and external evaluation cases is available in the Supplementary Material (ESM 2).

A small set of nineteen G-quadruplex-based telomerase inhibitors (3 Class_1 and 16 Class_0 compounds), collected from studies published after the classifiers were obtained and reporting the respective IC50 values as measured by the TRAP assay, was also used as a “Real External Evaluation Set”. It is important to recall that the set of 77 compounds (19 + 58 Class_1/Class_0 compounds) removed due to their cliff nature (“External Cliff Set”) and the 155 Class_0 compounds removed in the class balancing process (“External Negative Set”) were also used to evaluate the generalization ability of the machine learning classifiers generated.
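The class-wise random selection of the external evaluation set can be illustrated with the standard library. This sketch does not reproduce the STATISTICA random sampling option; the seed, the rounding rule and the synthetic identifiers are assumptions made for illustration. With 262 Class_1 and 267 Class_0 compounds, a 20% fraction rounded to the nearest integer gives the 52/53 split reported above.

```python
import random


def stratified_holdout(ids_by_class: dict, fraction: float = 0.20, seed: int = 0):
    """Randomly reserve `fraction` of each class as a hold-out set.

    Returns (holdout, remainder), both as class -> list of ids.
    """
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    holdout, remainder = {}, {}
    for cls, ids in ids_by_class.items():
        ids = sorted(ids)
        k = round(fraction * len(ids))
        chosen = set(rng.sample(ids, k))
        holdout[cls] = [i for i in ids if i in chosen]
        remainder[cls] = [i for i in ids if i not in chosen]
    return holdout, remainder


# Synthetic identifiers matching the class sizes of the balanced dataset.
dataset = {
    "Class_1": [f"act_{n:03d}" for n in range(262)],
    "Class_0": [f"ina_{n:03d}" for n in range(267)],
}
external, remaining = stratified_holdout(dataset)
```

Sampling each class separately keeps the class ratio of the hold-out set close to that of the full dataset, which a single random draw over all 529 compounds would not guarantee.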

6.1.6 Structure Codification

Four blocks of DRAGON molecular descriptors were computed for each molecule in the dataset: a first block comprising constitutional, physicochemical and topological information (denoted 0-2D); a second block encoding conformational or three-dimensional information (3D); a third block including the 45 P_VSA-like descriptors (P_VSA); and a fourth block comprising all the structural information available in DRAGON 6.0 (FULL) except the charge descriptors. These four blocks were saved without considering pair correlations between descriptors. Another four blocks were therefore saved excluding descriptors with a pair correlation above 0.7; these were identified as 0-2D_70, 3D_70, P_VSA_70, and FULL_70, respectively. In this way we favor the development of diverse classifiers based on different structural information and influenced, or not, by the degree of multicollinearity present in the molecular descriptor matrix. See Table 6.2 for details.
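The effect of the 0.7 pair-correlation filter can be illustrated with a simple greedy procedure. This is an assumption about how such a filter may work, not a description of DRAGON's own algorithm: descriptors are visited in their given order, and a descriptor is kept only if its absolute Pearson correlation with every descriptor already kept does not exceed the threshold. The descriptor names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0                 # constant column: correlation undefined
    return sxy / math.sqrt(sxx * syy)


def filter_correlated(columns: dict, threshold: float = 0.7):
    """Greedily keep descriptors whose |r| with all kept ones <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor columns for five molecules.
columns = {
    "d1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "d2": [2.0, 4.0, 6.0, 8.0, 10.5],   # nearly proportional to d1
    "d3": [5.0, 1.0, 4.0, 2.0, 3.0],    # weakly correlated with d1
}
kept = filter_correlated(columns)
```

Because the procedure is order-dependent, a different descriptor ordering can retain a different member of each correlated group.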

Table 6.2. Molecular descriptor blocks used in this work and their composition.

MDs Block(a)  N      MDs Families(b)
0-2D          2076   Constitutional Indices; Ring Descriptors; Topological Indices; Walk and Path Counts; Connectivity Indices; Burden Eigenvalues; ETA Indices; Edge Adjacency Indices; Functional Group Counts; Atom-Centred Fragments; Atom-type E-state Indices; CATS 2D; 2D Atom Pairs; Molecular Properties.
0-2D_70       284    Same families as 0-2D.
3D            1069   Geometrical Descriptors; 3D Matrix-based Descriptors; 3D Autocorrelations; RDF Descriptors; 3D-MoRSE Descriptors; WHIM Descriptors; Randic Molecular Profiles; 3D Atom Pairs.
3D_70         93     Same families as 3D.
P_VSA         45     P_VSA-like Descriptors.
P_VSA_70      15     Same family as P_VSA.
FULL          3157   Molecular descriptors included in blocks 0-2D, 3D, and P_VSA.
FULL_70       367    Same families as FULL.

N is the number of molecular descriptors used for each block.
a The meaning of each MD block is detailed in the preceding paragraph.
b Comprehensive information about the molecular descriptor families, their calculation and definition can be consulted in the Dragon 6 user's manual (http://www.talete.mi.it/help/dragon_help/), the Handbook of Molecular Descriptors (52) or the help of DRAGON 6.0 (186).

6.1.7 Molecular docking

A computer model was built to study the stacking of G-quadruplex DNA. The X-ray crystal structure of a bimolecular G-quadruplex DNA was taken from the Protein Data Bank and used as the initial model. At this preliminary stage we chose the structure 3CE5, co-crystallized with BRACO-19 (231). The 3CE5 structure is solved at a resolution of 2.5 Å and shows a parallel-stranded topology. The topology of 3CE5 is also similar to that observed in the native uni- and bimolecular crystal structures of the human telomeric sequence (318) and to that of the complex with TMPyP4 (319). The DNA structure was prepared with the Protein Preparation Wizard tool in Maestro (Schrödinger Suite). Missing hydrogen atoms were added to the structure, followed by local minimization. 3D ligand structures were built with the LigPrep program, accessible from the Maestro GUI, retaining the chirality specified in the SMILES codes. The Epik program was used to generate the most probable ionization and tautomeric states within a pH range of 7.0 ± 1.0. Molecular docking simulations were carried out with GOLD 5.1, with GoldScore as the fitness function. GOLD was set to generate 10 docking poses per molecule within a sphere of 15 Å radius centered on the centroid of the co-crystallized ligand.


6.1.8 Oligonucleotides

Oligodeoxynucleotide probes were synthesized by Eurogentec (Belgium). The F21RT oligoribonucleotide probe was synthesized by IBA (Germany). All concentrations are expressed in strand molarity and were determined using a nearest-neighbour approximation for the absorption coefficients of the unfolded species (320). F21T is a doubly labeled 21-nucleotide oligodeoxynucleotide that mimics 3.5 copies of the human telomeric guanine-rich strand; it was modified with 6-carboxyfluorescein (FAM) at the 5´ end and tetramethylrhodamine (TAMRA) at the 3´ end. The sequence of F21T is FAM-G3(T2AG3)3-TAMRA.

6.1.9 FRET melting studies

Fluorescence can be used to probe the secondary structure of oligodeoxynucleotides mimicking repeats in the guanine-rich strand of vertebrate telomeres when a FAM (fluorescent tag) and a TAMRA (quencher) are attached to the 5´ and 3´ ends of the oligonucleotide, respectively (321). In the experiments presented here, a real-time PCR apparatus (MX3005P, Stratagene) was used to record the fluorescence of 96 samples simultaneously. Initial experiments were performed at ligand concentrations of 1, 5, 10 and 20 µM in the presence of the telomere mimic F21T. The "K+" conditions corresponded to 10 mM lithium cacodylate (pH 7.2), 10 mM potassium chloride, 90 mM lithium chloride; the "Na+" conditions were 10 mM lithium cacodylate (pH 7.2), 100 mM sodium chloride. LiCl was added in order to approach physiological ionic strength without stabilizing the quadruplex (lithium does not interact with G-quartets). Experiments were also performed to test the effect of ligands on other oligonucleotides that form G-quadruplex structures:

FKit1T: FAM-GGGAGGGCGCTGGGAGGAGGG-TAMRA
FKit2T: FAM-GGGCGGGCGCGAGGGAGGGG-TAMRA
FmycT: FAM-TTGAGGGTGGGTAGGGTGGGTAA-TAMRA
F21RT: FAM-r-GGGUUAGGGUUAGGGUUAGGG-TAMRA (RNA strand)
FdxT: FAM-TATAGCTATA-hexaethylene glycol-TATAGCTATA-TAMRA

For oligonucleotides other than F21T, the "K+" conditions corresponded to 10 mM lithium cacodylate (pH 7.2), 1 mM potassium chloride, 99 mM lithium chloride. The melting of the G-quadruplex was monitored at ligand concentrations of 1 and 20 µM (the final oligonucleotide strand concentration was 0.2 µM) in a volume of 25 µL, in the presence and in the absence of the ligand, by measuring the fluorescence of fluorescein. The oligonucleotides with the highest ∆T1/2 values were tested alone and in the presence of a double-stranded DNA competitor (the self-complementary ds26 sequence) at concentrations of 3 and 10 µM. In the absence of a quadruplex-interacting ligand, this competitor had no effect on the melting temperature of the quadruplex (data not shown). All experiments were performed at least in duplicate. The emission of fluorescein was normalized between 0 and 1, and T1/2 was defined as the temperature at which the normalized emission is 0.5.
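The definition of T1/2 given above can be illustrated with a minimal Python sketch. It assumes that the fluorescein emission rises as the quadruplex melts, normalizes the curve between 0 and 1, and linearly interpolates the temperature at which the normalized emission reaches 0.5. The function name `t_half`, the interpolation scheme and the toy melting curves are assumptions made for the example; they are not the data or the software used in this work.

```python
import numpy as np

def t_half(temps, emission):
    """Temperature at which the min-max normalized emission first reaches 0.5."""
    temps = np.asarray(temps, dtype=float)
    emission = np.asarray(emission, dtype=float)
    norm = (emission - emission.min()) / (emission.max() - emission.min())
    idx = int(np.argmax(norm >= 0.5))  # first point at or above 0.5
    if idx == 0:
        return float(temps[0])
    # Linear interpolation between the two points bracketing 0.5.
    t0, t1 = temps[idx - 1], temps[idx]
    y0, y1 = norm[idx - 1], norm[idx]
    return float(t0 + (0.5 - y0) * (t1 - t0) / (y1 - y0))

# Hypothetical melting curves (arbitrary fluorescence units).
temps = [25, 35, 45, 55, 65, 75]
alone = [100, 100, 200, 400, 500, 500]
with_ligand = [100, 100, 100, 200, 400, 500]

delta_t_half = t_half(temps, with_ligand) - t_half(temps, alone)
print(t_half(temps, alone), t_half(temps, with_ligand), delta_t_half)
```

In this toy case the ligand shifts the curve to higher temperature, so the stabilization ∆T1/2 is positive.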

6.1.10 UV-Vis spectroscopy studies

UV absorbance spectroscopy can be used to study the molecular structure of DNA and its complexes. UV techniques are very useful because they are quick and inexpensive (322). In our studies a Uvikon XS spectrophotometer was used. The experiments were performed at a concentration of 10 µM of M1 or L13, alone and in the presence of the unlabeled sequence 21G (GGGTTAGGGTTAGGGTTAGGG). The spectra were recorded in the presence of potassium. The stock solution of 21G was prepared at a concentration of 500 µM in 10 mM lithium cacodylate (pH 7.2), 100 mM potassium chloride. The sample was heated to 90 °C for 5 minutes and allowed to cool slowly over 2 hours. The "K+" conditions corresponded to 10 mM lithium cacodylate (pH 7.2), 100 mM potassium chloride. The spectra were recorded between 220 and 700 nm at a temperature of 25 °C. The spectra of the ligands were recorded first, and 21G was then added to reach a concentration of 10 µM. Subsequent additions of 21G were performed, and absorbance spectra were recorded after each addition.

6.1.11 Fluorescence direct detection

Fluorescence spectroscopy is very useful for studying the interactions between nucleic acids and ligands. Many DNA-binding ligands undergo a change in fluorescence emission quantum yield and/or a shift in emission maximum wavelength. In that case, fluorescence titration may be used to determine ligand binding (323). Experiments were performed in a FluoroMax-4 spectrofluorometer. Unlabelled oligonucleotides at 25 µM were prepared in 10 mM lithium cacodylate (pH 7.2), 100 mM potassium chloride. They were heated at 90 °C for 5 minutes and then cooled slowly and uniformly over 2 hours. The sequences used in this study are listed below.


Fluorescence measurements were made at an excitation wavelength of 466 nm, and the emission spectra were recorded between 560 and 800 nm at 20 °C. Final experiments were performed at a concentration of 1 µM oligonucleotide and 100 nM ligand. Before each measurement, the cuvette was left inside the spectrofluorometer for 5 minutes to reach temperature equilibrium and to allow for kinetic effects.

22Ag: AGGGTTAGGGTTAGGGTTAGGG
Kit1: GGGAGGGCGCTGGGAGGAGGG
Kit2: GGGCGGGCGCGAGGGAGGGG
Myc: TTGAGGGTGGGTAGGGTGGGTAA
22AgR: AGGGUUAGGGUUAGGGUUAGGG
dx12: d-GCGTGAGTTCGG and d-CCGAACTCACGC
ds26: d-CAATCGGATCGAATTCGATCCGATTG

6.2 Results and discussion

6.2.1 Dataset Curation

It is well known that QSAR models based on inaccurate structural or biological data produce statistically insignificant and unreliable predictions. In this context, Young and coworkers (324) and, more recently, Tropsha and coworkers (310) clearly pointed out the importance of chemical data curation. Following these recommendations, we explain here the curation procedure applied in this chapter. First, a preliminary inspection for duplicate structures revealed two pairs of duplicated compounds (acp-009 (0.38 µM) and acp-038 (0.41 µM); tri-067 (0.50 µM) and tri-147 (0.30 µM)); details are given in ESM2. From the first pair only acp-038 was removed, since the IC50 values were very similar (a difference of just 0.03 µM), while both compounds in the second pair were removed because of the larger discrepancy in IC50 (0.2 µM). Next, the distribution of the elements present in the remaining 780 compounds was analyzed. The bar graph in Figure 6.2 shows the percentage of molecules in the remaining dataset containing each element. As a result, four molecules containing rare elements (Se, In and Pt) were detected, in addition to fourteen organometallics. It was also possible to confirm that the dataset was free of mixtures and inorganics.


[Figure 6.2: bar chart. Y axis: % molecules with element X (0 to 100). X axis: C, H, N, O, S, Cl, F, Br, Ni, Zn, Cu, Mn, Pt, Se, In.]

Figure 6.2. Distribution in percentage of the element composition.

The removal of organometallics and of molecules containing rare elements is justified by the fact that not all molecular descriptor calculation software handles such compounds. We decided to remove these compounds mainly because their specific elements are significantly underrepresented in the dataset (structural outliers), which represents an important source of noise with a negative influence on the subsequent model learning process. Compound por-005, the only compound in the dataset containing a bromine atom, was removed for the same reason. Details on the compounds removed and the reasons for doing so are provided in the electronic supporting information (ESM2). The remaining subset of 761 compounds, made up of only C, H, N, O, S, Cl and F atoms, was subjected to standardization of the chemical structures provided as SMILES codes, as described in the Material and Methods section. As a result, an SDF file was obtained comprising a standardized 3D representation of the 761 compounds in the protonation state corresponding to pH = 7.2, which was used for further data analysis and modeling.
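The element analysis and filtering described above can be sketched as follows. This is only an illustrative Python example: it assumes that the set of elements present in each molecule has already been extracted (for instance with a cheminformatics toolkit), and the compound identifiers and compositions in `toy` are hypothetical, not taken from the dataset.

```python
from collections import Counter

# Elements retained in the curated dataset, as stated in the text.
ALLOWED = {"C", "H", "N", "O", "S", "Cl", "F"}

def element_percentages(molecules):
    """Percentage of molecules containing each element (as plotted in a bar chart)."""
    counts = Counter(el for elements in molecules.values() for el in set(elements))
    n = len(molecules)
    return {el: 100.0 * c / n for el, c in counts.items()}

def keep_allowed(molecules, allowed=ALLOWED):
    """Identifiers of the molecules made up only of allowed elements."""
    return [mol_id for mol_id, elements in molecules.items() if set(elements) <= allowed]

# Hypothetical compounds, described by the set of elements they contain.
toy = {
    "mol-a": {"C", "H", "N"},
    "mol-b": {"C", "H", "O", "Cl"},
    "mol-c": {"C", "H", "N", "Pt"},
    "mol-d": {"C", "H", "Br"},
}

percentages = element_percentages(toy)
curated_ids = keep_allowed(toy)
print(percentages["Pt"], curated_ids)
```

Elements with a very low percentage, such as Pt or Br in this toy set, are the ones that flag structural outliers for removal.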

6.2.2 Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity

We first identified 220 pairs of compounds (221 unique compounds) forming potential activity cliffs by applying the activity cliff detection procedure described previously.


It is important to keep in mind that SALI scoring does not actually determine the magnitude of cliffs; they are only compared on a relative scale. Consequently, a disadvantage of this approach is that cliffs detected at a certain cutoff might essentially be irrelevant (pseudo-cliffs) because they are only of small and chemically insignificant magnitude (potency difference) (325). Stumpfe and Bajorath therefore highlight the necessity of discrete definitions of activity cliffs, including the applied similarity criterion, the potency measure and potency difference between cliff partners, and the potency range (interval) that is considered relevant for cliff formation. These experts recommend considering a pair of compounds as an activity cliff only if one cliff partner has potency in the nanomolar range and if there is at least a 100-fold difference in potency between the two partners. Yet, they advise that, depending on the specific application, these criteria might of course be modified.

So, by applying the discrete definition of activity cliffs described in the Material and Methods section of this chapter, the 220 pairs of potential activity cliffs identified by the SALI approach were reduced to only 142 (164 unique compounds), with a maximum value of Norm ED_ij = 0.06 and a minimum value of ΔPot_ij = 0.21. Once the compound pairs forming activity cliffs had been identified, we removed those compounds with the greatest influence on the SAR discontinuity of the curated dataset of 761 telomerase inhibitors. This procedure produced a list of 77 unique compounds to be removed because of their significant influence on SAR discontinuity. This subset was used as an external evaluation set specifically intended to evaluate the ability of the derived models to correctly predict this challenging type of compound.

For example, the most influential compound identified in the dataset was ac-029, which forms cliff pairs with eight other compounds. As can be noted in Table 6.3, all cliff partners have a potency value at the class borderline and ac-029 is the only one belonging to Class_0. At the same time, the potency difference approaches or exceeds one order of magnitude for all pairs. This situation induces a significant discontinuity in an SAR that is assumed to be continuous. So, the removal of ac-029 clearly helps to alleviate this effect by eliminating, from a set of compounds with a common chemotype and class, the only representative whose potency value places it in a different class. On the other hand, the least significant activity cliff pairs covered by the discrete definition applied are those with the minimum difference in potency and the maximum difference in structure. These examples are shown in Table 6.4, where it can be noted that even in the worst case the structural similarity is apparent and that, in the case where the potency difference is minimal, the cliff partners belong to different classes.
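To make the quantities discussed above concrete, the following Python sketch computes a potency difference and a SALI-type score for one pair. The formal definitions are those given in the Material and Methods section and are not reproduced here; the sketch assumes that ΔPot_ij is the absolute difference of log10(IC50) values and that SALI_ij is ΔPot_ij divided by the normalized Euclidean distance, an assumption that agrees with the ac-029/ac-023 row of Table 6.3 to within rounding. The function names are hypothetical.

```python
import math

def delta_pot(ic50_i, ic50_j):
    """Absolute potency difference in log10 units (both IC50 values in the same unit)."""
    return abs(math.log10(ic50_i) - math.log10(ic50_j))

def sali(ic50_i, ic50_j, norm_ed):
    """SALI-type score: potency difference per unit of structural distance (assumed form)."""
    return delta_pot(ic50_i, ic50_j) / norm_ed

# ac-029 (1.09 µM) versus ac-023 (0.11 µM), Norm ED_ij = 0.0029 (values from Table 6.3).
dp = delta_pot(1.09, 0.11)
score = sali(1.09, 0.11, 0.0029)
print(round(dp, 2), round(score, 1))
```

The small discrepancy with the tabulated SALI value is expected, since the table reports rounded IC50 and distance values.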


Table 6.3. Structure and activity data of molecules forming activity cliff pairs with the most influential compound detected.

ID        IC50 (µM)   Class   SALI_ij   ΔPot_ij   Norm ED_ij
ac-029    1.09        0       REF       REF       REF
ac-023    0.11        1       344.8     1.00      0.0029
ac-005    0.02        1       2120.5    1.74      0.0008
ac-028    0.17        1       967.5     0.81      0.0008
ac-004    0.06        1       1180.3    1.26      0.0011
ac-031    0.20        1       760.5     0.74      0.0010
ac-001    0.07        1       647.1     1.19      0.0018
ac-022    0.21        1       234.3     0.72      0.0031
ac-019    0.08        1       326.3     1.13      0.0035


Table 6.4. Structure and activity data of the least significant activity cliff pairs identified.

Pair                                ID        IC50 (µM)   Class   SALI_ij   ΔPot_ij   Norm ED_ij
Cliff pair of minimum ΔPot_ij       tri-207   1.5         0       REF       REF       REF
                                    tri-216   0.93        1       237.4     0.21      0.0009
Cliff pair of maximum Norm ED_ij    aq-062    238         0       REF       REF       REF
                                    aq-035    4.4         0       101.3     1.15      0.0621

6.2.3 Classes Balancing

A significantly smaller number of active molecules than inactive ones is the common trend in chemoinformatics datasets, especially in drug discovery tasks. When machine learning classifiers trained under these conditions are applied to similarly distributed validation sets, the number of molecules correctly assigned to their class tends to be artificially high. This leads to an overestimation of the generalization ability of the classifier, which is probably biased toward the inactive molecules (309, 326). The situation stems from the well-known inability of most standard classifier learning algorithms to handle data with an imbalanced class distribution, since they assume a relatively balanced class distribution and equal misclassification costs (327). On the other hand, studies in the machine learning area show that the sensitivity of standard machine learning algorithms to class imbalance increases with the complexity of the classification problem (as the classes become less linearly separable) (328, 329); like class imbalance, such complexity is a common scenario in chemoinformatics applications. So, in order to favor the learning process and ensure a proper generalization ability of the machine learning classifiers, the dataset must be given the presumed class balance. After applying the class balancing procedure described in the Material and Methods section, 267 Class_0 compounds remained, which provides the required class balance with respect to the remaining set of 262 Class_1 compounds. Finally, after preprocessing the initial raw data, it is possible to assert that the final result is a curated, standardized and balanced dataset optimized to train predictive and reliable machine learning classifiers. Table 1 confirms that the structural and activity information encoded in the initial (raw) full set was kept as far as possible in the final (curated) dataset. Additionally, we confirmed that the curated dataset is free of duplicates by using the "Find duplicate structures" option of the EdiSDF program (330) included in the ISIDA project (331, 332). A schematic representation of the data curation process applied in this work is shown in Figure 6.3.


Figure 6.3. Schematic representation of the data curation process.

6.2.4 Models Building and Consensus Classifier

More than 300 classifiers based on DRAGON descriptors and WEKA machine learning classification algorithms were fitted and evaluated on the training (including 10-fold cross-validation), test, and all external evaluation sets, as described in the Material and Methods section. Only the 81 classifiers with accuracy, sensitivity and specificity values ≥ 0.6 on the training (including 10-fold cross-validation), test, and all external evaluation sets were considered predictive (57) and reliable, and so advanced to the stage of deriving a consensus classifier. The identification of each model, as well as its classification of each external set of compounds, is shown in the ESM3 file.

The selection of the classifiers finally employed for consensus prediction was based on a combined analysis of the diversity among the 81 high performing classifiers and their respective classification performance. First, we evaluated the pairwise normalized Euclidean distance between the 81 high performing classifiers, using as reference space the vector of outputs (predicted classes) for the compounds in the external evaluation set. The result is a classifier prediction dissimilarity matrix (ED) composed of elements ED_ij. The pairwise mean classification accuracy was also evaluated as:

$$\mathrm{Acc.}_{ij} = \frac{\mathrm{Acc.}_{i} + \mathrm{Acc.}_{j}}{2} \qquad \text{(Eq 6.4)}$$

where Acc._i and Acc._j are the accuracies of classifiers i and j on the external evaluation set, and Acc._ij are the elements of the mean classification accuracy matrix (Acc.) corresponding to the 81 high performing classifiers. Finally, the geometric mean of the respective elements in ED and Acc. is computed to form a resultant matrix (ED^Acc.) whose elements ED^Acc._ij act as a diversity metric weighted by classifier accuracy. This matrix is first used to identify and remove those classifiers with identical accuracy and prediction outputs in the external test set (ED^Acc._ij = 0). As a consequence, a refined matrix of 59 high performing and diverse classifiers was obtained. For each classifier we computed its average accuracy-weighted distance with respect to the other 58 classifiers. Only the 27 classifiers with average accuracy-weighted distance values (obtained from each row/column) higher than the overall average accuracy-weighted distance (obtained from the whole 59x59 matrix) were selected for inclusion in the final consensus classifier. Details on these 27 classifiers are provided in Table 6.5.
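The selection procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this work: the Euclidean distance is assumed to be normalized by the number of compounds, duplicated classifiers are resolved by keeping the first one encountered, and the overall average is taken over the off-diagonal elements of the refined matrix. The function name `select_diverse` and the toy predictions are hypothetical.

```python
import numpy as np

def select_diverse(preds, acc):
    """Select classifiers whose average accuracy-weighted distance exceeds the overall average.

    preds: (n_classifiers, n_compounds) array of predicted classes (0/1).
    acc:   (n_classifiers,) array of accuracies on the same compounds.
    """
    preds = np.asarray(preds, dtype=float)
    acc = np.asarray(acc, dtype=float)
    n_clf, n_cmp = preds.shape

    diff = preds[:, None, :] - preds[None, :, :]
    ed = np.sqrt((diff ** 2).sum(axis=2) / n_cmp)    # assumed normalization to [0, 1]
    mean_acc = (acc[:, None] + acc[None, :]) / 2.0   # pairwise mean accuracy (Eq 6.4)
    ed_acc = np.sqrt(ed * mean_acc)                  # geometric mean of both matrices

    # Drop classifiers that duplicate an earlier one (accuracy-weighted distance of 0).
    keep = []
    for i in range(n_clf):
        if all(ed_acc[i, k] > 0 for k in keep):
            keep.append(i)
    m = len(keep)
    if m < 2:
        return keep

    sub = ed_acc[np.ix_(keep, keep)]
    row_avg = sub.sum(axis=1) / (m - 1)   # average distance to the other classifiers
    overall = sub.sum() / (m * (m - 1))   # average over all off-diagonal pairs
    return [keep[i] for i in range(m) if row_avg[i] > overall]

# Toy example: classifier 1 duplicates classifier 0.
preds = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 1, 1]]
acc = [0.80, 0.80, 0.75, 0.85]
selected = select_diverse(preds, acc)
print(selected)
```

In the toy case only the classifier whose predictions differ most from the others is retained, which is the intended effect of an above-average diversity criterion.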


Table 6.5. Detailed results for each one of the best 27 models.

                                              Training Set                        Test Set                            Ext. Evaluation Set
CLASSIFIER                            p       Acc.   CCR    Se.    Sp.    Kappa   Acc.   CCR    Se.    Sp.    Kappa   Acc.   CCR    Se.    Sp.    Kappa
01_0-2D_70_bayes.BayesLogistReg       29      0.791  0.791  0.786  0.795  0.581   0.800  0.801  0.857  0.744  0.601   0.790  0.790  0.712  0.868  0.580
02_0-2D_70_functions.RBFNetwork       29      0.796  0.797  0.810  0.784  0.593   0.776  0.776  0.667  0.884  0.552   0.781  0.781  0.788  0.774  0.562
03_0-2D_70_trees.FT                   29      0.938  0.938  0.929  0.947  0.876   0.682  0.683  0.714  0.651  0.365   0.876  0.877  0.885  0.868  0.752
04_0-2D_trees.RandomTree              33(22)  1.000  1.000  1.000  1.000  1.000   0.765  0.764  0.714  0.814  0.529   0.790  0.790  0.788  0.792  0.581
05_0-2D_trees.LADTree                 33(32)  1.000  1.000  1.000  1.000  1.000   0.847  0.847  0.857  0.837  0.694   0.810  0.810  0.846  0.774  0.619
06_0-2D_meta.FiltClassif.BFTree       33(13)  0.891  0.891  0.952  0.830  0.782   0.835  0.837  0.929  0.744  0.671   0.819  0.820  0.942  0.698  0.639
07_0-2D_rules.Nnge                    33      1.000  1.000  1.000  1.000  1.000   0.788  0.789  0.786  0.791  0.576   0.752  0.753  0.731  0.774  0.505
08_3D_70_rules.JRip                   33(9)   0.817  0.817  0.768  0.865  0.634   0.741  0.740  0.643  0.837  0.481   0.724  0.723  0.615  0.830  0.447
09_3D_70_rules.PART                   33(10)  0.891  0.891  0.940  0.842  0.782   0.80   0.802  0.905  0.698  0.601   0.695  0.696  0.769  0.623  0.391
10_3D_70_lazy.KStar                   33      1.000  1.000  1.000  1.000  1.000   0.718  0.718  0.714  0.721  0.435   0.781  0.782  0.827  0.736  0.562
11_3D_RandonComitee.RepTree           32      1.000  1.000  1.000  1.000  1.000   0.788  0.788  0.762  0.814  0.576   0.819  0.819  0.788  0.849  0.638
12_3D_trees.REPTree                   32(11)  0.876  0.876  0.833  0.918  0.752   0.718  0.717  0.619  0.814  0.434   0.733  0.733  0.654  0.811  0.466
13_Full_rules.PART                    27(16)  0.929  0.930  0.964  0.895  0.859   0.847  0.847  0.857  0.837  0.694   0.819  0.820  0.865  0.774  0.638
14_Full_trees.RandomTree              27(26)  1.000  1.000  1.000  1.000  1.000   0.788  0.789  0.810  0.767  0.577   0.800  0.801  0.885  0.717  0.601
15_Full_functions.SMO                 27      0.929  0.930  0.958  0.901  0.859   0.847  0.848  0.905  0.791  0.695   0.829  0.830  0.904  0.755  0.658
16_Full_70_rules.DTNB                 30(9)   0.853  0.852  0.839  0.865  0.705   0.812  0.812  0.786  0.837  0.623   0.848  0.848  0.865  0.830  0.695
17_Full_70_rules.NNge                 30      1.000  1.000  1.000  1.000  1.000   0.765  0.765  0.786  0.744  0.530   0.762  0.762  0.788  0.736  0.524
18_Full_70_bayes.BayesLogistReg       30      0.788  0.788  0.780  0.795  0.575   0.812  0.813  0.881  0.744  0.624   0.819  0.819  0.769  0.868  0.638
19_PVSA_70_lazy.KStar                 13      0.997  0.997  0.994  1.000  0.994   0.788  0.788  0.738  0.837  0.576   0.848  0.848  0.865  0.830  0.695
20_PVSA_70_functions.MLP              13      0.894  0.894  0.881  0.906  0.788   0.838  0.838  0.788  0.887  0.646   0.824  0.823  0.738  0.907  0.676
21_PVSA_70_trees.SimpleCart           13(10)  0.935  0.936  0.970  0.901  0.870   0.800  0.801  0.881  0.721  0.601   0.829  0.830  0.885  0.774  0.658
22_PVSA_70_trees.BFTree               13(10)  0.917  0.918  0.923  0.912  0.835   0.765  0.766  0.810  0.721  0.530   0.838  0.838  0.865  0.811  0.676
23_PVSA_70_tress.FT                   13      0.87   0.870  0.851  0.889  0.740   0.800  0.801  0.857  0.744  0.601   0.771  0.771  0.692  0.849  0.542
24_PVSA_70_functions.VotedPerceptron  13      0.776  0.777  0.804  0.749  0.552   0.753  0.754  0.857  0.651  0.507   0.743  0.742  0.692  0.792  0.485
25_PVSA_rules.DTNB                    14      0.823  0.823  0.839  0.807  0.646   0.824  0.824  0.833  0.814  0.647   0.829  0.829  0.846  0.811  0.657
26_PVSA_trees.LadTree                 14      0.988  0.989  1.000  0.977  0.976   0.812  0.812  0.786  0.837  0.623   0.819  0.819  0.827  0.811  0.638
27_PVSA_lazy.KStar                    14      0.994  0.994  0.994  0.994  0.988   0.812  0.812  0.786  0.837  0.623   0.819  0.819  0.788  0.849  0.638
MAX                                   ---     1.000  1.000  1.000  1.000  1.000   0.847  0.848  0.929  0.887  0.695   0.876  0.877  0.942  0.907  0.752
MIN                                   ---     0.776  0.777  0.768  0.749  0.552   0.682  0.683  0.619  0.651  0.365   0.695  0.696  0.615  0.623  0.391
MEAN                                  ---     0.915  0.915  0.919  0.910  0.829   0.790  0.790  0.797  0.782  0.578   0.799  0.799  0.801  0.797  0.599
Consensus Model (uW, CCR, Acc.)       ---     0.971  0.971  0.994  0.947  0.941   0.859  0.859  0.881  0.837  0.718   0.886  0.886  0.904  0.868  0.771
Consensus Model (Kappa)               ---     0.971  0.971  0.994  0.947  0.941   0.847  0.847  0.881  0.814  0.694   0.895  0.896  0.923  0.868  0.791

                                              Ext. Negative Set   Ext. Cliff Set                      Real Ext. Evaluation Set
CLASSIFIER                            p       Acc.                Acc.   CCR    Se.    Sp.    Kappa   Acc.   CCR    Se.    Sp.    Kappa
01_0-2D_70_bayes.BayesLogistReg       29      0.819               0.779  0.73   0.632  0.828  0.436   0.789  0.740  0.667  0.813  0.377
02_0-2D_70_functions.RBFNetwork       29      0.794               0.636  0.653  0.684  0.621  0.238   0.632  0.646  0.667  0.625  0.174
03_0-2D_70_trees.FT                   29      0.723               0.714  0.775  0.895  0.655  0.417   0.895  0.803  0.667  0.938  0.604
04_0-2D_trees.RandomTree              33(22)  0.755               0.675  0.696  0.737  0.655  0.310   0.684  0.813  1.000  0.625  0.345
05_0-2D_trees.LADTree                 33(32)  0.761               0.675  0.678  0.684  0.672  0.290   0.684  0.678  0.667  0.688  0.230
06_0-2D_meta.FiltClassif.BFTree       33(13)  0.716               0.662  0.705  0.789  0.621  0.311   0.737  0.844  1.000  0.688  0.410
07_0-2D_rules.Nnge                    33      0.735               0.610  0.618  0.632  0.603  0.183   0.842  0.771  0.667  0.875  0.477
08_3D_70_rules.JRip                   33(9)   0.877               0.753  0.748  0.737  0.759  0.427   0.632  0.646  0.667  0.625  0.174
09_3D_70_rules.PART                   33(10)  0.645               0.636  0.670  0.737  0.603  0.258   0.684  0.678  0.667  0.688  0.230
10_3D_70_lazy.KStar                   33      0.710               0.623  0.644  0.684  0.603  0.221   0.684  0.678  0.667  0.688  0.230
11_3D_RandonComitee.RepTree           32      0.748               0.662  0.670  0.684  0.655  0.272   0.632  0.646  0.667  0.625  0.174
12_3D_trees.REPTree                   32(11)  0.742               0.610  0.618  0.632  0.603  0.183   0.737  0.709  0.667  0.750  0.296
13_Full_rules.PART                    27(16)  0.742               0.740  0.775  0.842  0.707  0.440   0.684  0.813  1.000  0.625  0.345
14_Full_trees.RandomTree              27(26)  0.787               0.740  0.757  0.789  0.724  0.424   0.632  0.646  0.667  0.625  0.174
15_Full_functions.SMO                 27      0.723               0.701  0.749  0.842  0.655  0.382   0.737  0.709  0.667  0.750  0.296
16_Full_70_rules.DTNB                 30(9)   0.774               0.740  0.722  0.684  0.759  0.388   0.789  0.740  0.667  0.813  0.377
17_Full_70_rules.NNge                 30      0.639               0.636  0.635  0.632  0.638  0.216   0.737  0.709  0.667  0.750  0.296
18_Full_70_bayes.BayesLogistReg       30      0.774               0.766  0.721  0.632  0.810  0.413   0.789  0.740  0.667  0.813  0.377
19_PVSA_70_lazy.KStar                 13      0.826               0.610  0.618  0.632  0.603  0.183   0.789  0.740  0.667  0.813  0.377
20_PVSA_70_functions.MLP              13      0.813               0.688  0.670  0.632  0.707  0.288   0.684  0.813  1.000  0.625  0.345
21_PVSA_70_trees.SimpleCart           13(10)  0.748               0.662  0.652  0.632  0.672  0.251   0.632  0.646  0.667  0.625  0.174
22_PVSA_70_trees.BFTree               13(10)  0.787               0.675  0.661  0.632  0.690  0.269   0.632  0.646  0.667  0.625  0.174
23_PVSA_70_tress.FT                   13      0.832               0.636  0.635  0.632  0.638  0.216   0.684  0.678  0.667  0.688  0.230
24_PVSA_70_functions.VotedPerceptron  13      0.703               0.701  0.678  0.632  0.724  0.307   0.632  0.646  0.667  0.625  0.174
25_PVSA_rules.DTNB                    14      0.723               0.610  0.618  0.632  0.603  0.183   0.789  0.740  0.667  0.813  0.377
26_PVSA_trees.LadTree                 14      0.794               0.623  0.627  0.632  0.621  0.199   0.684  0.813  1.000  0.625  0.345
27_PVSA_lazy.KStar                    14      0.781               0.610  0.618  0.632  0.603  0.183   0.842  0.771  0.667  0.875  0.477
MAX                                   ---     0.877               0.779  0.775  0.895  0.828  0.440   0.895  0.844  1.000  0.938  0.604
MIN                                   ---     0.639               0.610  0.618  0.632  0.603  0.183   0.632  0.646  0.667  0.625  0.174
MEAN                                  ---     0.758               0.673  0.679  0.690  0.668  0.292   0.717  0.722  0.729  0.715  0.306
Consensus Model (uW, CCR, Acc.)       ---     0.794               0.701  0.696  0.684  0.707  0.327   0.789  0.740  0.667  0.813  0.377
Consensus Model (Kappa)               ---     0.794               0.701  0.696  0.684  0.707  0.327   0.789  0.740  0.667  0.813  0.377

For the Ext. Negative Set only Acc. is given; CCR, Se., Sp. and Kappa are marked "---".
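For reference, the statistics reported in Table 6.5 can be recomputed from a confusion matrix with the short Python sketch below. It uses the standard definitions of accuracy, sensitivity, specificity and Cohen's Kappa, and takes CCR as the mean of sensitivity and specificity. The confusion-matrix counts in the example (TP = 37, FN = 15, TN = 46, FP = 7) are not given in the text; they are inferred here, as an assumption, from the ratios reported for classifier 01 on the external evaluation set.

```python
def classification_stats(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, CCR and Cohen's Kappa from a 2x2 confusion matrix."""
    n = tp + fn + tn + fp
    acc = (tp + tn) / n
    se = tp / (tp + fn)      # sensitivity (true positive rate)
    sp = tn / (tn + fp)      # specificity (true negative rate)
    ccr = (se + sp) / 2.0    # correct classification rate
    # Agreement expected by chance, from the marginal totals.
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (acc - p_exp) / (1.0 - p_exp)
    return {"Acc": acc, "Se": se, "Sp": sp, "CCR": ccr, "Kappa": kappa}

# Counts inferred (assumption) for classifier 01 on the external evaluation set.
stats = classification_stats(tp=37, fn=15, tn=46, fp=7)
print({name: round(value, 3) for name, value in stats.items()})
```

With these counts the rounded values match the corresponding row of the table (Acc. 0.790, CCR 0.790, Se. 0.712, Sp. 0.868, Kappa 0.580).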


Consensus predicted classes were obtained by fusing (summing) the predicted classes from the 27 diverse and high performing base classifiers via majority vote. Several weighting schemes were explored to select the scheme(s) giving the best classification performance. To this end, the predicted class of each compound according to each base classifier was weighted (multiplied) by the accuracy, Kappa, or CCR statistic of the respective classifier. For the un-weighted majority vote, compounds predicted as Class_1/Class_0 are labeled 1/0 and the corresponding un-weighted consensus score (SUM_U) is computed as:

$$SUM_{U} = \sum_{i=1}^{27} PredClass_{i} \qquad \text{(Eq 6.5)}$$

where PredClass_i accounts for the class assigned by the classifier i to each compound in the dataset. Then, if SUM_U ≥ 14 the compound is predicted as Class_1, otherwise as Class_0. For the weighted majority vote, compounds predicted as Class_1/Class_0 are labeled 1/-1 and the corresponding weighted consensus score (SUM_W) is computed as:

$$SUM_{W} = \sum_{i=1}^{27} PredClass_{i} \times W \qquad \text{(Eq 6.6)}$$

where W is the weighting feature selected, which can be the accuracy, Kappa, or the CCR statistic associated with the corresponding classifier. Then, if SUM_W ≥ 0 the compound is predicted as Class_1, otherwise as Class_0.
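The two voting rules of Eq 6.5 and Eq 6.6 can be illustrated with the following Python sketch. The function names are hypothetical, and the threshold of the un-weighted vote is written as "more than half of the classifiers", which gives 14 for the 27 base classifiers used here; the three-classifier example is a toy case, not data from this work.

```python
def unweighted_vote(pred_classes):
    """Un-weighted majority vote: classes are 1/0; returns (SUM_U, predicted class)."""
    sum_u = sum(pred_classes)
    threshold = len(pred_classes) // 2 + 1  # 14 when there are 27 classifiers
    return sum_u, (1 if sum_u >= threshold else 0)

def weighted_vote(pred_classes, weights):
    """Weighted majority vote: classes are recoded 1/-1 and multiplied by the
    weight (accuracy, Kappa or CCR) of the corresponding classifier;
    returns (SUM_W, predicted class)."""
    signed = [1 if p == 1 else -1 for p in pred_classes]
    sum_w = sum(s * w for s, w in zip(signed, weights))
    return sum_w, (1 if sum_w >= 0 else 0)

# 27 classifiers, 14 of which predict Class_1.
print(unweighted_vote([1] * 14 + [0] * 13))

# Toy case with three classifiers: one reliable classifier outvotes two weak ones.
preds = [1, 0, 0]
kappas = [0.9, 0.4, 0.4]
print(unweighted_vote(preds), weighted_vote(preds, kappas))
```

The toy case shows the only situation in which the two schemes can disagree: a minority of classifiers carrying a larger total weight than the majority.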

An initial comparison between the best performing base classifier, the consensus predictions derived from the 81 high performing classifiers, and those derived from the 27 diverse and high performing classifiers was conducted on the External Evaluation Set in order to test the suitability of the 27 base classifiers selected for consensus classification. As can be noted in Table 6.6, the consensus classifier based on the 27 diverse and high performing classifiers indeed outperforms both the best base classifier and the consensus predictions derived from the 81 high performing classifiers. Specifically, the majority vote based on the 27 base classifiers weighted with Kappa compares favorably with the best base classifier, as well as with the consensus classifier based on the 81 classifiers, on all the classification performance metrics except the specificity, which equals that of the best base classifier. A more detailed comparison on all the subsets (training, test, and external sets) is offered in Table 6.5. From this comparison it can be deduced that there are no significant differences between the classification performance of the weighted and un-weighted majority vote schemes. So, either or both schemes can be applied in a further virtual screening campaign.

Table 6.6. Summarized results of the consensus process for 81 and 27 models.

Model                        Majority vote scheme   Acc     Se      Sp      FP      FN      Kappa
Best Model                   ---                    0.876   0.885   0.868   0.132   0.115   0.752
Consensus, 81 classifiers    Unweighted             0.867   0.904   0.830   0.170   0.096   0.734
                             W(Acc)                 0.867   0.904   0.830   0.170   0.096   0.734
                             W(CCR)                 0.867   0.904   0.830   0.170   0.096   0.734
                             W(Kappa)               0.867   0.904   0.830   0.170   0.096   0.734
Consensus, 27 classifiers    Unweighted             0.886   0.904   0.868   0.132   0.096   0.772
                             W(Acc)                 0.886   0.904   0.868   0.132   0.096   0.772
                             W(CCR)                 0.886   0.904   0.868   0.132   0.096   0.772
                             W(Kappa)               0.895   0.923   0.868   0.132   0.077   0.791

6.2.5 Ligand-Based Virtual Screening Performance

So far, the 27 base classifiers derived, as well as the corresponding consensus classifiers, have proved to be predictive and reliable enough to support their use as a virtual screening tool, providing a practical solution for the automatic identification of telomerase inhibitors acting via G-quadruplex stabilization. However, a good classification performance does not ensure the usefulness of a classifier as a virtual screening tool. Therefore, the potential enrichment ability of the proposed VS strategy needs to be estimated as realistically as possible by means of established enrichment metrics (see the Material and Methods section). We therefore decided to apply a virtual screening strategy based on the sequential use of the scores derived from both the un-weighted (SUM_U) and weighted (SUM_W) majority vote consensus classifiers. Based on the comparison of the different weighting schemes, we decided to use, as the weighted score, only the one derived from the Kappa-weighted (SUM_K) consensus classifier.

A high-quality VS tool should produce an ordered list of candidates in which promising ones are placed at the top of the list while irrelevant or detrimental candidates are relegated to the bottom. For this, the prediction algorithm on which the VS tool is based must have a good predictive performance. Nevertheless, for VS the most important features are a particularly high TP rate and a low FP rate, so as to maximise the number of actual active cases, and minimise the number of actual inactive cases, regarded as active by the prediction algorithm. Here the class of interest is regarded as the active class. In fact, both the un-weighted and the Kappa-weighted consensus classifiers show good enough TP and FP rates (around 88% and 13% respectively, as deduced from the external evaluation set). So, from these results we can expect that a subset of candidate molecules classified as potential telomerase inhibitors (Class_1) by either of the two consensus classifiers (or both) will contain around 88% of the telomerase inhibitors screened, but also 13% of the non-inhibitors.

The other key feature of a high-quality VS tool is to provide a quantitative score for the target property such that, when it is used as ranking criterion, the resulting ordered list resembles the actual levels of the target property as closely as possible. In our case, both SUM_U and SUM_K exhibit the variability required for library ranking. So, by sequentially using SUM_U and SUM_K as ranking criteria (sorting the chemical library in decreasing order by SUM_U, then by SUM_K), we decided to test the suitability of this VS strategy for the automatic identification of potent (IC50 ≤ 1 µM) telomerase inhibitors acting via G-quadruplex stabilization dispersed in a data set of moderate, weak, or non-inhibitor compounds (IC50 > 1 µM). To do so, we initially decided to estimate the enrichment performance using the full dataset of 783 inhibitors (295 / 488 Class_1 / Class_0). However, the reliability of enrichment metrics estimated from such a sample is hampered by its small size as well as by the high ratio of active compounds (Class_1 inhibitors).
The problem with using a small dataset is that the enrichment metrics derived from it exhibit a higher variance than those derived from significantly larger datasets. Experiments conducted by Truchon and Bayly (215) show that the standard deviations associated with enrichment metrics such as ROC or AUAC are higher for small datasets and converge to a constant value as the size of the dataset increases. In any case, for the problem at hand it is not possible to set up a decoy set as large as recommended for the performance evaluation of virtual screening tools (215, 216). All we can do is consider the relative error associated with the use of our data set. For this, the relative error associated with the enrichment metrics computed for this dataset will be estimated as recommended in (215).

The other problem is the high ratio of actives (Class_1 inhibitors), which mainly hinders the assessment of early recognition through what is known as the “saturation effect”. That is, for datasets with a high ratio of actives, once active compounds “saturate” the early part of the ordered list the enrichment metric cannot get any higher, and this effect becomes more acute as the top fraction considered gets smaller (215). In order to alleviate the saturation effect as much as possible, and thus to estimate more realistically the utility of the proposed virtual screening strategy, we decided to simulate an experiment evaluating the ability of the approach to retrieve just 14 structurally diverse Class_1 inhibitors dispersed in a set of 325 Class_0 inhibitors never used in the training of the classifiers. The 14 Class_1 inhibitors were selected in such a way that every chemical family present in the test and external evaluation subsets was represented. For this, one Class_1 representative of each of the 11 chemical families (ac-022, acp-003, aq-152, fen-005, mis-009, pip-013, qui-015, quo-002, tet-026, tri-002, and pta-008) was randomly selected for inclusion in the VS experiment, together with the three Class_1 inhibitors included in the Real External Evaluation Set (Ant1,5, M1, and M2). The structural diversity of this subset of Class_1 inhibitors can be assessed by inspecting Figure 6.4. At the same time, all the Class_0 compounds included in the Test, External Evaluation, Real External Evaluation, External Cliff and External Negative sets (never used for training) were included as decoys (in this case actual negative cases) for the VS experiment. Finally, the resulting subset of 339 compounds was sorted in decreasing order by the computed values of SUM U, then by SUM K, and the enrichment ability of the proposed ligand-based VS strategy was assessed according to the enrichment metrics previously detailed.

Figure 6.4. Chemical structures and dissimilarity matrix (based on normalized Euclidean distances from ChemAxon's Chemical Fingerprints) for the set of 14 Class_1 inhibitors selected to evaluate the enrichment ability of the VS strategy proposed in this work.

We are aware that the ratio of actives of this dataset, Ra = 0.0413 (about 23 Class_0 compounds, or "decoys", for each Class_1 inhibitor, or "active"), is still insufficient to fulfil the minimum of 36 decoys per active proposed in (333). However, the enrichment metrics derived from such a dataset can be used as a proper estimate if we consider the relative error associated with each metric. The relative errors of the enrichment metrics derived from this dataset, as well as details on its size and composition, are provided in Table 6.7. As can be noted, the relative error is less than or about 1% for most of the enrichment metrics and never exceeds 5.5%. In fact, the enrichment metrics with the largest relative errors (about 2.3% to 5.4%) are just those corresponding to the top 1% of the dataset, where the saturation effect becomes more acute. These data therefore provide sufficient evidence that using this dataset does not significantly affect the inferences on enrichment performance deduced from the metrics computed from it.

Table 6.7. Relative error associated with the enrichment metrics, and dataset size and composition.

Dataset size and composition
N      n     Ra
339    14    0.0413

Relative error (%) associated with the enrichment metrics
                      EF      RIE     BEDROC
α = 160.9 (top 1%)    2.77    2.31    5.36
α = 32.2 (top 5%)     1.14    1.02    1.04
α = 16.1 (top 10%)    0.78    0.70    0.48
α = 8 (top 20%)       0.52    0.45    0.21
ROC                   0.16
AUAC                  0.15

N is the total number of compounds in the dataset. n is the number of active (Class_1) ligands used in the enrichment analysis, that is, the ligands shown in Figure 6.4. Ra is the ratio of active cases in the dataset (see eq. 2.21). ROC is the area under the ROC curve (see eq. 2.21). EF is the enrichment factor (see eq. 2.22). RIE is the Robust Initial Enhancement (see eq. 2.23). BEDROC is the Boltzmann-enhanced discrimination of ROC (see eq. 2.24). AUAC is the area under the accumulation curve (see eq. 2.20).
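As a quick arithmetic check on the dataset composition, the sketch below recomputes the ratio of actives and the number of decoys per active from N and n. It also shows that the α values listed in Table 6.7 are numerically consistent with α = ln(5)/χ for top fractions χ of 1%, 5%, 10% and 20%. That relation is our reading of the tabulated values, not a formula stated in this section, and the function name is hypothetical.

```python
import math

N_TOTAL = 339      # compounds in the VS experiment
N_ACTIVES = 14     # Class_1 inhibitors
N_DECOYS = N_TOTAL - N_ACTIVES

ratio_of_actives = N_ACTIVES / N_TOTAL
decoys_per_active = N_DECOYS / N_ACTIVES


def alpha_for_fraction(fraction):
    """Assumed relation between alpha and the top fraction: alpha * fraction = ln(5)."""
    return math.log(5) / fraction


print(f"Ra = {ratio_of_actives:.4f}")
print(f"decoys per active = {decoys_per_active:.1f}")
for fraction in (0.01, 0.05, 0.10, 0.20):
    print(f"top {fraction:.0%}: alpha = {alpha_for_fraction(fraction):.1f}")
```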

The AUAC and ROC values obtained suggest that, with the proposed VS strategy, a potent telomerase inhibitor is ranked earlier than a compound of moderate, weak or no inhibitory capacity with a probability > 0.80. The excellent EF values obtained suggest that the ranking provides a top 1% fraction about 8 times more enriched in potent inhibitors than a random 1% fraction of the dataset. As can be noted in Table 6.8, the EF values deteriorate at higher top fractions (top 5%, 10%, and 20%), but still outperform a random selection. These results point to a good overall (deduced from AUAC and ROC) and local (deduced from EF) enrichment ability of the VS strategy, especially at the top 1% fraction.
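For readers who want to reproduce this kind of analysis, the following minimal sketch computes EF, AUAC and the ROC area from the ranks of the actives in an ordered list. The actual ranks of the 14 actives are not given in this section, so the `ranks` list is hypothetical: it was chosen so that 1, 4, 5 and 7 actives fall in the top 1%, 5%, 10% and 20%, which yields EF values matching those of Table 6.8. Rounding the size of the top fraction to the nearest integer and approximating AUAC as one minus the mean relative rank are assumptions of this sketch.

```python
def enrichment_factor(active_ranks, n_total, fraction):
    """EF: hit rate in the top fraction divided by the overall hit rate."""
    top_k = max(1, round(fraction * n_total))  # assumption: round to nearest
    hits = sum(1 for r in active_ranks if r <= top_k)
    return (hits / top_k) / (len(active_ranks) / n_total)


def auac(active_ranks, n_total):
    """Area under the accumulation curve, approximated as 1 - mean relative rank."""
    return 1.0 - sum(active_ranks) / (len(active_ranks) * n_total)


def roc_auc(active_ranks, n_total):
    """ROC area: fraction of (active, inactive) pairs ranked in the right order."""
    n = len(active_ranks)
    misordered = sum(active_ranks) - n * (n + 1) / 2
    return 1.0 - misordered / (n * (n_total - n))


# Hypothetical 1-based ranks of 14 actives in a list of 339 compounds.
ranks = [2, 5, 9, 14, 30, 40, 60, 100, 120, 150, 180, 220, 260, 300]
for fraction in (0.01, 0.05, 0.10, 0.20):
    print(f"EF {fraction:.0%} = {enrichment_factor(ranks, 339, fraction):.2f}")
print(f"AUAC = {auac(ranks, 339):.2f}, ROC = {roc_auc(ranks, 339):.2f}")
```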

Table 6.8. Classic Enrichment and Early Recognition Metrics.

Classic Enrichment Metrics         Early Recognition Metrics
EF 1%     8.07 (±2.2·10⁻¹)         RIE 1%        9.37 (±2.1·10⁻¹)
EF 5%     5.69 (±6.5·10⁻²)         RIE 5%        4.58 (±4.6·10⁻²)
EF 10%    3.56 (±2.8·10⁻²)         RIE 10%       3.72 (±2.5·10⁻²)
EF 20%    2.49 (±1.3·10⁻²)         RIE 20%       2.98 (±1.3·10⁻²)
AUAC      0.80 (±1.3·10⁻³)         BEDROC 1%     0.38 (±2.0·10⁻²)
ROC       0.81 (±1.2·10⁻³)         BEDROC 5%     0.25 (±2.7·10⁻³)
                                   BEDROC 10%    0.31 (±1.5·10⁻³)
                                   BEDROC 20%    0.43 (±9.0·10⁻⁴)

This assertion can be confirmed by visual inspection of the corresponding accumulation, ROC and enrichment curves. Thus, when filtering the top 1% of a considerably large virtual library (thousands or millions of compounds), we can realistically expect the proposed VS protocol to deliver an encouraging enrichment performance.

Figure 6.5. ROC and accumulation curves. FP is the false positive rate (negative cases incorrectly classified as positive, or 1 − specificity); TP is the true positive rate (positive cases correctly classified, or sensitivity); X is the relative rank of the i-th active in the ordered list, obtained by scaling its rank ri to the total number of cases (N) in the dataset (see Eq. 2.20).

However, classic overall metrics such as ROC and AUAC cannot discriminate between a VS tool that ranks half of the actives at the beginning of the ordered list and the other half at the end and a VS tool that spreads the actives uniformly across the list, while EF ignores the order of the actives within the selected fraction. The ability to place actives at the very beginning of the list is the most important property of a VS tool and is known as “early recognition” ability. Therefore, the analysis of metrics such as RIE and BEDROC is essential to estimate this feature of a VS protocol effectively, especially when very large datasets are to be screened. From the analysis of RIE at the top 1%, 5%, 10% and 20% fractions we can deduce that the early recognition ability of the approach behaves similarly to the enrichment ability observed with the EF metric. That is, the early recognition ability of the approach is better at early fractions and deteriorates as the top fraction considered increases. This pattern is also observed for BEDROC, except for a recovery and even an improvement at the top 20% fraction. The probabilistic interpretation of this metric allows confirming that the ability of the VS protocol to rank most of the Class_1 inhibitors at the top of the fraction filtered is consistently superior at top 1% fractions.

In summary, the VS strategy showed not only good enrichment ability (retrieving active cases in the filtered fraction) but, more importantly, very good early recognition ability (placing active cases at the very beginning of the ordered list). From the data presented and the comparative analysis of the enrichment and early recognition ability of the LBVS strategy, we conclude that the approach is a valid VS tool able to support an efficient VS campaign.
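The difference between overall enrichment and early recognition can be made concrete with a short sketch of RIE and BEDROC, written from the definitions of Truchon and Bayly (215) as we understand them (eqs. 2.23 and 2.24 are not reproduced in this section). The three rankings are hypothetical, and the value α = 20 is an arbitrary choice for the demonstration. The functions assume that not every compound is active.

```python
import math


def rie(active_ranks, n_total, alpha):
    """Robust Initial Enhancement from 1-based ranks of the actives."""
    n = len(active_ranks)
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / n
    # Expected value of the same average for a uniformly random ranking.
    random_expectation = (1.0 / n_total) * (1.0 - math.exp(-alpha)) / (
        math.exp(alpha / n_total) - 1.0
    )
    return observed / random_expectation


def bedroc(active_ranks, n_total, alpha):
    """BEDROC: RIE rescaled between its minimum and maximum attainable values."""
    ratio = len(active_ranks) / n_total
    rie_max = (1.0 - math.exp(-alpha * ratio)) / (ratio * (1.0 - math.exp(-alpha)))
    rie_min = (1.0 - math.exp(alpha * ratio)) / (ratio * (1.0 - math.exp(alpha)))
    return (rie(active_ranks, n_total, alpha) - rie_min) / (rie_max - rie_min)


def roc_auc(active_ranks, n_total):
    """ROC area from the ranks of the actives."""
    n = len(active_ranks)
    return 1.0 - (sum(active_ranks) - n * (n + 1) / 2) / (n * (n_total - n))


N = 339
all_early = list(range(1, 15))                           # 14 actives on top
half_half = list(range(1, 8)) + list(range(333, 340))    # 7 on top, 7 at the end
spread = [12, 37, 62, 87, 112, 137, 162, 178, 203, 228, 253, 278, 303, 328]

for name, ranks in (("all early", all_early), ("half/half", half_half), ("spread", spread)):
    print(f"{name:10s} ROC = {roc_auc(ranks, N):.2f}  BEDROC = {bedroc(ranks, N, 20.0):.2f}")
```

In this toy case the half/half and spread rankings share the same ROC area, while BEDROC rewards the ranking that places part of the actives at the very top.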

6.2.6 Virtual screening of the Asinex and Life Chemicals databases.

Two commercial compound databases were screened following the ligand-based VS protocol described above (sort in decreasing order by SUM U, then by SUM K). A total of 669 417 compounds were scored and sorted (313 473 in ASINEX and 355 944 in LIFECHEMICALS) to produce a ranked list. We then decided to keep those compounds from the ASINEX and LIFECHEMICALS libraries with SUM U ≥ 20 (>75% concordance) and SUM K ≥ 7, except when the compound belongs to the anticancer subset of the LIFECHEMICALS library, in which case the acceptance criteria were SUM U ≥ 18 (>66% concordance) and SUM K ≥ 4. The final number of selected compounds was 101. More details on their structures are given in the supplementary material ESM4. Figure 6.6 shows the structural diversity of the 101 candidates selected by ligand-based VS, assessed through the corresponding heatmap based on normalized Euclidean distances and ChemAxon’s Chemical Fingerprints (334, 335).
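The ranking and acceptance rules described above can be sketched in a few lines. The thresholds are those given in the text; the record layout, the compound identifiers and the score values are hypothetical and serve only to illustrate the two-key sort and the relaxed criteria for the anticancer subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    compound_id: str
    sum_u: int                       # un-weighted consensus score (SUM U)
    sum_k: float                     # Kappa-weighted consensus score (SUM K)
    anticancer_subset: bool = False  # member of the LIFECHEMICALS anticancer subset


def rank_library(compounds):
    """Sort in decreasing order by SUM U, then by SUM K."""
    return sorted(compounds, key=lambda c: (c.sum_u, c.sum_k), reverse=True)


def passes_thresholds(c):
    """Acceptance criteria from the text, relaxed for the anticancer subset."""
    if c.anticancer_subset:
        return c.sum_u >= 18 and c.sum_k >= 4
    return c.sum_u >= 20 and c.sum_k >= 7


# Hypothetical scored compounds.
library = [
    Scored("cpd-1", 22, 8.1),
    Scored("cpd-2", 19, 5.0, anticancer_subset=True),
    Scored("cpd-3", 19, 9.0),
    Scored("cpd-4", 22, 9.3),
    Scored("cpd-5", 18, 3.5, anticancer_subset=True),
]
selected = [c.compound_id for c in rank_library(library) if passes_thresholds(c)]
print(selected)
```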

Figure 6.6. Heatmap based on normalized Euclidean distances showing the structural diversity of the 101 ligands selected by our Ligand Based Virtual Screening (LBVS) (Mean Norm ED = 0.77). Red points indicate dissimilarity between ligands and green points high similarity. Distances are computed from the fingerprints defined by ChemAxon (313).

6.2.7 Docking modeling campaign and selection of the most promising candidates.

Docking studies were performed on the 101 TI candidates identified via LBVS in order to determine those most likely to inhibit the enzyme through a G4 stabilization mechanism, that is, the best candidates for stacking on guanine quadruplexes, and to shed light on the binding mode of the shortlisted ligands and on the interactions driving molecular recognition. The docking studies followed the recommendations of Haider and Neidle (303). Table 6.9 shows the ∆Tm values employed for the calibration of the docking protocol. All the ∆Tm values were obtained under the same experimental conditions.

Table 6.9. ∆Tm values employed for the calibration of the docking protocol.

Name            ∆Tm (°C) A    ∆Tm (°C) B
360A            26.1          26.0
BRACO 19        20.2          13.5
Telomestatin    22.8          23.6
TMPyP4          27.4          21.0
Piper           23.0          15.0

A: 10 mM lithium cacodylate (pH 7.2), 10 mM potassium chloride, 90 mM lithium chloride.
B: 10 mM lithium cacodylate (pH 7.2), 100 mM sodium chloride.

Docking calibration carried out with GOLD 5.1 returned a reliable pose reproducing the binding conformation of the co-crystallized molecule. The RMSD value of 1.70 Å was measured by comparing the crystal structure (in green) with the top-scored docking pose (in yellow). The docking results were analysed by visual inspection of the docking solutions, followed by scoring function analysis, with the aim of relating activities to docking scores. All structures were protonated at pH 7.1.

Figure 6.7. Docking calibration for the ligands used in the calibration (panels: Braco19, Piper, 360A, TMPyP4 and Telomestatin), using as target the reference structure available from PDB with code 3CE5 (231).

Once the docking campaign was completed, docking scores (GoldScores) were obtained for the 101 preselected candidates. All the compounds were sorted by average docking fitness (ranging from 93.43 to 40.49 kJ/mol). We spiked the list with the five reference compounds (i.e., Braco19, etc.) to get an idea of the docking ranking (see enclosed ESM 5). Hashed linear binary fingerprints were computed for all compounds. A total of 81 of the 101 LBVS-preselected compounds, those with average fitness values higher than that of telomestatin (the worst performer among the five references, with an average fitness of 67.06 kJ/mol), were retained according to this structure-based criterion. Finally, 10 compounds were chosen by a diversity-based selection using the linear fingerprint as descriptor space, the Soergel distance as metric and the maximum sum of pairwise distances as selection criterion. Such an approach should ensure a better sampling of the chemical types and thus avoid picking molecules that are too similar. The 10 proposed molecules were visually inspected to better assess their binding modes. The ten selected compounds returned well-superimposed poses, apart from 'BAS00732386' and 'F9995-0028', which adopted two alternative, but still viable, binding conformations.
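The two-step selection described above (a fitness cut-off at the telomestatin value, then a diversity pick) can be sketched as follows. The software actually used for the diversity selection is not named in this section, so the greedy maximum-sum heuristic below is an assumption, as are the compound identifiers, fitness values and fingerprints (represented here as sets of on-bit indices). For binary fingerprints the Soergel distance equals one minus the Tanimoto coefficient.

```python
from itertools import combinations

REFERENCE_FITNESS = 67.06  # average fitness of telomestatin, the weakest reference


def soergel(fp_a, fp_b):
    """Soergel distance between two binary fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


def max_sum_selection(fingerprints, n_select):
    """Greedy pick: seed with the most distant pair, then repeatedly add the
    compound with the largest summed distance to those already selected."""
    ids = list(fingerprints)
    if n_select >= len(ids):
        return ids
    if n_select < 2:
        raise ValueError("n_select must be at least 2")
    seed = max(
        combinations(ids, 2),
        key=lambda pair: soergel(fingerprints[pair[0]], fingerprints[pair[1]]),
    )
    selected = list(seed)
    while len(selected) < n_select:
        remaining = [i for i in ids if i not in selected]
        best = max(
            remaining,
            key=lambda i: sum(soergel(fingerprints[i], fingerprints[s]) for s in selected),
        )
        selected.append(best)
    return selected


# Hypothetical docking fitness values and fingerprints.
fitness = {"cpd-1": 80.2, "cpd-2": 71.5, "cpd-3": 66.0, "cpd-4": 90.1, "cpd-5": 75.3}
fingerprints = {
    "cpd-1": {1, 2, 3, 4},
    "cpd-2": {1, 2, 3, 10},
    "cpd-3": {7, 8},
    "cpd-4": {10, 11, 12},
    "cpd-5": {3, 4, 10, 11},
}
kept = {cid: fingerprints[cid] for cid, score in fitness.items() if score > REFERENCE_FITNESS}
picked = max_sum_selection(kept, 3)
print(picked)
```

A greedy pick does not guarantee the global maximum of the summed pairwise distances, but it is cheap and usually adequate for shortlists of this size.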

6.2.8 Harmonizing Ligand- and Structure-Based Information for Virtual Screening.

The 101 candidates were re-sorted considering the results of the docking study. The 10 most probable and structurally diverse candidates for G4 stabilization proposed by the docking study were assigned to the first 10 places of the ranked list and were sorted within this range [1,10] in decreasing order by F Dock, then by SUM U, then by SUM K. The other 6 promising (but less probable) candidates for G4 stabilization proposed by the docking study were assigned to the next 6 places of the ranked list and sorted within this range [11,16] using the same sorting scheme. The remaining 85 candidates were assigned to the last 85 positions of the ranked list and sorted within this range [17,101] using the same sorting scheme.
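This tiered re-sorting amounts to a single sort with a composite key: the tier first, then the three scores in decreasing order. The sketch below illustrates the idea; the dictionary layout, the tier encoding (0 for the 10 diverse docking picks, 1 for the 6 further promising candidates, 2 for the rest) and all values are hypothetical.

```python
def harmonized_rank(candidates):
    """Order by tier, then by decreasing F Dock, SUM U and SUM K within each tier."""
    return sorted(
        candidates,
        key=lambda c: (c["tier"], -c["f_dock"], -c["sum_u"], -c["sum_k"]),
    )


# Hypothetical candidates.
candidates = [
    {"id": "cpd-A", "tier": 2, "f_dock": 88.0, "sum_u": 22, "sum_k": 8.0},
    {"id": "cpd-B", "tier": 0, "f_dock": 70.5, "sum_u": 21, "sum_k": 7.5},
    {"id": "cpd-C", "tier": 1, "f_dock": 79.0, "sum_u": 20, "sum_k": 7.2},
    {"id": "cpd-D", "tier": 0, "f_dock": 70.5, "sum_u": 23, "sum_k": 7.1},
    {"id": "cpd-E", "tier": 0, "f_dock": 91.2, "sum_u": 20, "sum_k": 9.0},
]
final_order = [c["id"] for c in harmonized_rank(candidates)]
print(final_order)
```

Note that a candidate with a high docking fitness but outside the docking shortlist (cpd-A here) still ranks below every shortlisted compound, which is the intended effect of the tiers.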

6.2.9 Selection and Purchase of the Most Promising and Structurally Diverse Candidates.

Selection was based on the final ligand- and structure-based ranking, on medicinal chemistry expert criteria and on commercial availability from vendors. Some of the top-ranked compounds were not available from vendors at the time of purchase. So, the 27 top-ranked and commercially available candidates were purchased (19 from ASINEX and 8 from LifeChemicals) and submitted to experimental assays. Their chemical structures and library identification codes are provided in Figure 6.8.

Figure 6.8. Chemical structures of the compounds employed in this study. (A) ASINEX compounds. (B) LIFECHEMICALS compounds.

Finally, stock solutions (10 mM) were prepared in DMSO and kept at -20 °C. Owing to the low solubility of some compounds, their stock solutions were prepared at 5 mM (compounds L16, M2 and M5) or at 2 mM (compounds L2, L5, L7, L10, L11 and L17). Most of the compounds show an extended aromatic core and the presence of electronegative atoms such as nitrogen, oxygen, fluorine and sulfur.

Figure 6.9. Structural diversity of the 27 candidates purchased, assessed through the corresponding heatmap based on normalized Euclidean distances and ChemAxon’s Chemical Fingerprints (Mean Norm ED = 0.81).

6.2.10 Study of the stabilizing capacity of the ligands in the human telomeric sequence.

The fluorescence resonance energy transfer (FRET) melting assay is used to measure the selectivity and stabilizing capacity of ligands on the intramolecular human telomeric G-quadruplex. It may also be transposed to other G4 structures or to control duplexes. This assay is highly sensitive and depends on the nature and constitution of the fluorescent probe, the ligand concentrations employed, the incubation buffer, the ion used to adjust the ionic strength of the medium, and the method used to determine the Tm value. Our models predict that the selected compounds should have an indirect activity on telomerase via quadruplex stabilization, and the docking campaign predicts the energies and the ability of these compounds to interact with guanine quadruplexes. We evaluated each compound for its ability to selectively stabilize G-quadruplex structures relative to duplex DNA. No experiments were performed directly on telomerase. FRET melting analysis can be used to determine the ligand-induced stabilization of a G-quadruplex structure by measuring the ligand-induced shift in the apparent melting temperature (ΔT1/2) (171, 336). We evaluated the melting of the oligonucleotide F21T, which consists of 3.5 copies of the human telomeric guanine-rich strand, in the presence of various concentrations (1-20 µM) of the ligands under study. Stabilizing effects were more apparent in buffer containing potassium ions (Figure 6.10). The largest shifts in T1/2 were observed with M1 (+12.5 °C at a concentration of 1 µM) and L13 (+10.0 °C at a concentration of 5 µM). Other compounds such as L2, L8 and L19 gave ΔT1/2 values of +3.8 °C, +9.4 °C and +7.6 °C, respectively, at 20 µM. For this reason we decided to focus our studies on compounds M1 and L13.
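To make the ΔT1/2 readout concrete, the sketch below extracts T1/2 from a normalized melting curve and computes the ligand-induced shift. It assumes that the emission has been normalized between 0 and 1 and increases with temperature, and that T1/2 is taken as the temperature at which the normalized signal reaches 0.5, located by linear interpolation. The two curves are hypothetical and do not correspond to any measurement reported here.

```python
def t_half(temperatures, normalized_signal):
    """Temperature at which the normalized signal first crosses 0.5."""
    points = list(zip(temperatures, normalized_signal))
    for (t0, s0), (t1, s1) in zip(points, points[1:]):
        if s0 < 0.5 <= s1:
            # Linear interpolation between the two bracketing points.
            return t0 + (0.5 - s0) * (t1 - t0) / (s1 - s0)
    raise ValueError("the curve does not cross 0.5")


# Hypothetical normalized melting curves (temperature in °C).
temperatures = [30, 40, 50, 60, 70, 80]
oligo_alone = [0.00, 0.10, 0.40, 0.80, 0.95, 1.00]
oligo_with_ligand = [0.00, 0.05, 0.20, 0.45, 0.85, 1.00]

delta_t_half = t_half(temperatures, oligo_with_ligand) - t_half(temperatures, oligo_alone)
print(f"Delta T1/2 = {delta_t_half:+.2f} °C")
```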

Figure 6.10. Stabilizing effect of L2, L8, L13, L19 and M1 on the human telomeric quadruplex structure. For these experiments a 10 mM lithium cacodylate buffer (pH 7.2) and 0.2 µM F21T were used in K+ conditions. Each ligand was tested at four different concentrations (1, 5, 10, and 20 µM).

In buffer containing sodium, we observed a slight decrease in melting temperature values, as observed for most G4 ligands. Under these conditions M1 had a ΔT1/2 of +7.1 °C at a concentration of 1 µM. On the other hand, L13 had a ΔT1/2 of +11.2 °C at a concentration of 10 µM (the only situation in which, for this compound, the stabilization in the presence of sodium was higher than in the presence of potassium). L8 and L19 had ΔT1/2 values of +7.4 °C and +4.3 °C, respectively, at a concentration of 20 µM. The high stabilization obtained with M1 could result from the extensive conjugation of its aromatic system and its potential to establish π-π stacking interactions. L13 probably also establishes hydrophobic interactions, but to a lesser extent than M1, because its conjugation is less extended and the presence of sulfur could cause steric hindrance.

Figure 6.11. Compounds with activity as G4 ligands identified by the VS campaign (panels: L2, L8, L13, L19 and M1).

Table 6.10. Structural diversity of the best candidates (overall Mean Norm ED = 0.8502).

                        M1            L2              L13             L8              L19
                        F9995-0028    BAS 00383983    BAS 01130922    BAS 01125526    BAS 00914927
F9995-0028      M1      0.000         0.990           1.000           0.930           0.969
BAS 00383983    L2      0.990         0.000           0.889           0.698           0.607
BAS 01130922    L13     1.000         0.889           0.000           0.847           0.884
BAS 01125526    L8      0.930         0.698           0.847           0.000           0.688
BAS 00914927    L19     0.969         0.607           0.884           0.688           0.000
MEAN                    0.9722        0.7960          0.9048          0.7908          0.7870

Five hits out of the 27 candidates were found using the VS strategy (Figure 6.11), a success rate of 18.5%. These ligands were dispersed in a pool of more than half a million molecules. The identified hits are structurally diverse (Table 6.10).
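The summary figures quoted for the hits can be recomputed directly from the dissimilarity matrix of Table 6.10. The sketch below copies the matrix from the table and derives the per-compound mean distance (diagonal excluded), the overall mean over the ten distinct pairs, and the hit rate of the campaign; only the variable names are our own.

```python
from itertools import combinations

labels = ["M1", "L2", "L13", "L8", "L19"]
# Normalized Euclidean distances copied from Table 6.10.
distances = [
    [0.000, 0.990, 1.000, 0.930, 0.969],
    [0.990, 0.000, 0.889, 0.698, 0.607],
    [1.000, 0.889, 0.000, 0.847, 0.884],
    [0.930, 0.698, 0.847, 0.000, 0.688],
    [0.969, 0.607, 0.884, 0.688, 0.000],
]

# Mean distance of each compound to the other four (diagonal excluded).
per_compound_mean = {
    label: sum(row[j] for j in range(len(row)) if j != i) / (len(row) - 1)
    for i, (label, row) in enumerate(zip(labels, distances))
}
# Mean over the distinct pairs of compounds.
pairs = list(combinations(range(len(labels)), 2))
overall_mean = sum(distances[i][j] for i, j in pairs) / len(pairs)

hit_rate = 5 / 27  # five hits among the 27 purchased candidates

print(per_compound_mean)
print(f"overall mean = {overall_mean:.4f}, hit rate = {hit_rate:.1%}")
```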

The scaffolds found confirm the ability of the VS strategy to identify new ligands and the utility of the procedure.

6.2.11 Stabilization of G-quadruplex-prone sequences by ligands

In order to study quantitatively the binding capacity of these ligands on different G-quadruplexes, we performed FRET experiments. A single duplex sequence (dx12) was used to study the affinity of these compounds for duplexes. The stabilization of the quadruplex structures adopted by different G-quadruplex sequences (taken from three promoter regions, RNA, and the human telomeric motif) is shown in Figure 6.12. At a ligand concentration of 1 µM and in the presence of potassium, M1 increased the melting temperature of the c-kit1, c-kit2 and c-myc sequences by 9.3, 9.0 and 7.7 °C, respectively. The largest Tm increase (+14.2 °C) was obtained for the RNA sequence F21RT under the same conditions. For the duplex sequence dx12, the ΔTm value is +2.4 °C and +0.9 °C in the presence of potassium and sodium, respectively. On the other hand, for c-kit2 the stabilization is higher in the presence of sodium than in potassium conditions, with a ΔTm of +14.4 °C. Stabilizations of 4.1, 9.1 and 11.2 °C are observed for the sequences c-kit1, c-myc and F21RT, respectively, in buffer containing 100 mM sodium chloride. Stabilization values for M1 and L13 at 1 µM in the presence of potassium and sodium are shown in Table 6.11.

Table 6.11. ΔTm stabilization values for M1 and L13 on five different G-quadruplex sequences tested at 1 µM in the presence of potassium and sodium ions.

              ΔTm (°C), M1 (1 µM)             ΔTm (°C), L13 (1 µM)
Sequence      Na+ (1)       K+ (2)            Na+ (1)       K+ (2)
F21T          5.2±1.4       8.9±0.0           0.1±0.7       1.2±0.7
FKit1T        4.1±1.2       9.3±0.2           1.8±1.3       1.4±0.6
FKit2T        14.4±1.0      9.0±0.1           2.7±0.2       1.5±0.2
FmycT         9.1±0.5       7.7±0.4           3.1±1.1       1.8±0.4
F21RT         11.2±1.2      14.2±1.4          -2.9±1.7      2.0±0.2
FdxT          0.9±0.6       2.4±0.1           0.0±0.1       -0.9±0.3

(1) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 100 mM NaCl, 0.2 µM oligonucleotide.
(2) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 1 mM KCl, 99 mM LiCl (except F21T: 10 mM KCl, 90 mM LiCl), 0.2 µM oligonucleotide.

Figure 6.12. Stabilizing effect of M1 and L13 at concentrations between 0.5 and 10 µM on quadruplex- and duplex-forming sequences in K+ conditions.

Stabilization by L13 is also lower than by M1. For the sequences other than the human telomeric repeat, the stabilization at 1 µM in potassium conditions is relatively low. At a concentration of 5 µM, stabilization values higher than 5.0 °C are observed for the sequences c-kit1, c-kit2 and F21RT, and a Tm increase close to 10 °C for the sequence c-myc. Similar stabilization values are found in sodium conditions. L13 does not stabilize the control duplex dx12 under these conditions. We also tested the effect of M1 and L13 on these quadruplexes in the presence of 3 and 10 µM of a competitor duplex, ds26, to confirm the specificity of these compounds for quadruplexes over duplexes. Results are summarized in Table 6.12. The addition of a large excess of ds26 has little effect on ∆Tm values. These results, together with the low stabilization found for dx12, demonstrate that M1 and L13 are selective for G-quadruplex structures.

Table 6.12. Effect of the ds26 duplex competitor on the stabilization of five different G-quadruplexes by M1 and L13.

Na+ conditions (1)
                 ΔTm (°C), M1 (1 µM)                                       ΔTm (°C), L13 (5 µM)
                 F21T       Kit-1      Kit-2      21RT       c-Myc         F21T       Kit-1      Kit-2      21RT       c-Myc
No competitor    5.3±2.2    3.4±0.3    9.0±0.5    9.5±0.1    4.2±2.6       5.5±2.3    3.5±2.3    5.6±0.0    10.8±0.1   14.3±1.5
3 µM ds26        3.6±2.4    4.9±3.0    8.9±0.5    8.3±0.8    3.0±0.5       8.0±0.7    9.3±0.3    8.3±0.8    11.3±1.3   14.1±3.0
10 µM ds26       3.7±1.0    3.4±2.5    8.1±1.1    8.1±0.7    5.1±1.0       6.4±2.3    9.0±0.7    8.0±0.8    11.3±1.0   11.2±0.1

K+ conditions (2)
                 ΔTm (°C), M1 (1 µM)                                       ΔTm (°C), L13 (5 µM)
                 F21T       Kit-1      Kit-2      21RT       c-Myc         F21T       Kit-1      Kit-2      21RT       c-Myc
No competitor    9.8±2.2    8.6±1.3    10.9±1.3   12.6±0.9   10.2±2.2      13.0±0.6   9.3±0.3    8.4±0.9    9.3±1.0    11.7±1.0
3 µM ds26        7.9±0.1    7.0±0.6    6.2±0.5    10.0±0.5   12.0±0.3      13.3±0.5   10.3±0.4   8.4±1.5    9.9±0.3    11.3±0.2
10 µM ds26       8.2±0.5    5.3±1.5    6.5±0.8    10.2±0.4   9.9±0.4       10.7±0.9   8.8±0.4    7.8±1.4    9.6±0.2    10.3±1.4

(1) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 100 mM NaCl, 0.2 µM oligonucleotide.
(2) Experiments were performed in 10 mM lithium cacodylate (pH 7.2), 1 mM KCl, 99 mM LiCl (except F21T: 10 mM KCl, 90 mM LiCl), 0.2 µM oligonucleotide.
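One simple way to summarise a competition experiment of this kind is the fraction of the stabilization retained in the presence of the competitor. The sketch below applies it to the mean ΔTm values of M1 (1 µM, K+ conditions) copied from Table 6.12. The retained-fraction index is our own illustrative choice rather than a quantity defined in this chapter, and it ignores the experimental uncertainties listed in the table.

```python
# Mean ΔTm values (°C) for M1 at 1 µM in K+ conditions, copied from Table 6.12.
NO_COMPETITOR = {"F21T": 9.8, "Kit-1": 8.6, "Kit-2": 10.9, "21RT": 12.6, "c-Myc": 10.2}
WITH_10UM_DS26 = {"F21T": 8.2, "Kit-1": 5.3, "Kit-2": 6.5, "21RT": 10.2, "c-Myc": 9.9}


def retained_fraction(with_competitor, without_competitor):
    """Fraction of the ligand-induced stabilization kept when the duplex is added."""
    return {
        seq: with_competitor[seq] / without_competitor[seq]
        for seq in without_competitor
    }


retained = retained_fraction(WITH_10UM_DS26, NO_COMPETITOR)
for seq, value in retained.items():
    print(f"{seq:6s} {value:.2f}")
```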

6.2.12 UV-Vis spectroscopy studies.

When a compound interacts with DNA, its spectroscopic properties may be altered. L13 exhibits an absorption maximum at 453 nm, with an intensity of log ε = 4.36 (Figure 6.13). The signal corresponds, in position and intensity, to the n→π* and π→π* transitions that are probable in the conjugated aromatic system. The n→π* transitions are low-energy transitions that appear as weak bands in the near ultraviolet region. In the same way, the π→π* transitions, which usually appear in the vacuum ultraviolet when they are isolated, can in our case be seen in this region because the conjugated unsaturations of the aromatic system shift them to these wavelengths. When the oligonucleotide is added, the interactions of the ligand with the DNA, through hydrogen bonding or hydrophobic forces, make the new system more stable, and transitions of lower energy are found: one observes a bathochromic shift of the absorption maximum from 453 nm to 472 nm and a significant hypochromicity.
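The numbers quoted for L13 can be turned into more familiar quantities with a short calculation: the molar absorption coefficient implied by log ε = 4.36, and the change in transition energy corresponding to the shift from 453 nm to 472 nm, using E = hc/λ. The sketch assumes ε is expressed in the usual units of M⁻¹ cm⁻¹, which this section does not state explicitly.

```python
HC_EV_NM = 1239.84198  # Planck constant times speed of light, in eV·nm

LOG_EPSILON = 4.36
LAMBDA_FREE_NM = 453.0    # absorption maximum of L13 alone
LAMBDA_BOUND_NM = 472.0   # absorption maximum in the presence of the oligonucleotide

epsilon = 10 ** LOG_EPSILON  # assumed units: M^-1 cm^-1
shift_nm = LAMBDA_BOUND_NM - LAMBDA_FREE_NM
energy_free_ev = HC_EV_NM / LAMBDA_FREE_NM
energy_bound_ev = HC_EV_NM / LAMBDA_BOUND_NM
delta_energy_ev = energy_bound_ev - energy_free_ev

print(f"epsilon = {epsilon:.0f}")
print(f"bathochromic shift = {shift_nm:.0f} nm")
print(f"transition energy change = {delta_energy_ev:.3f} eV")
```

A negative energy change is what a bathochromic (red) shift means: the transition of the bound ligand requires less energy than that of the free ligand.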

Figure 6.13. UV spectra of L13 alone and in the presence of a G-quadruplex-forming DNA oligonucleotide, 21Ag.

Ligand M1 was obtained as a dark solid powder. The spectrum of this compound was initially recorded between 200 and 700 nm. In this range we could not see a well-defined absorbance peak. We therefore decided to extend the wavelength range of the spectrum to between 300 and 1100 nm. In this new spectrum we found an absorbance maximum in the near infrared at 1001 nm, with log ε = 4.57. We relate the maximum at λ = 1001 nm to the very extended conjugation of the aromatic system and to transitions of very low energy. Preliminary experiments suggest that M1 also interacts with 21G, as shown by changes in the absorbance spectra (Figure 6.14). However, measuring absorbance in that wavelength range is very demanding for our experimental setup, and infrared spectroscopy will be considered to confirm these results.

Figure 6.14. UV spectra of M1 alone (in blue) and in the presence of a G-quadruplex-forming DNA oligonucleotide, 21G at two different concentrations (red and purple).

6.2.13 Fluorescence properties of L13.

Fluorescence spectroscopy may be used to confirm that a ligand binds selectively to a DNA or RNA. We only performed experiments with L13, as the absorbance properties of M1 are not compatible with the specifications of our spectrofluorimeter. Figure 6.15 shows the fluorescence spectra of the different oligonucleotide sequences, alone and in the presence of ligand L13. Nucleic acid sequences alone have low or negligible emission in that wavelength range. Ligand L13 alone is also weakly fluorescent (blue curves). However, when ligand and oligonucleotide are mixed, a large increase in fluorescence is observed (green curves) with respect to the ligand alone. The largest enhancement (1500-fold) is found for the c-myc sequence. The kit1 and kit2 sequences show a fluorescence enhancement close to 800-fold with respect to L13 alone. Emission maxima of the drug-DNA complexes are found at 587, 591 and 592 nm for c-myc, kit1 and kit2, respectively. A large fluorescence enhancement (1100-fold) was also found for the RNA quadruplex. In contrast, little enhancement is found for the two duplexes tested, ds26 and ds12, confirming a weak interaction between L13 and duplex DNA.

Figure 6.15. Fluorescence spectra of L13 in presence of the different oligonucleotides under study.

6.3 Antiproliferative properties of compounds similar to M1 and L13

To conclude this chapter, we provide evidence that compounds similar to M1 and L13 have antiproliferative properties. Similar compounds were found through a similarity search of a cancer resource database (337). Six compounds had similarity values higher than 60% with L13. Table 6.13 shows the chemical structures, similarity scores and activities of some of these compounds, evaluated in cell lines according to the procedures defined in (337). For M1, the highest similarity value was only 42.5%; therefore, no analogs were tested. Table 6.13 shows the antiproliferative effect of the different compounds on the sixty human cancer cell lines of the NCI (NCI-60 set), which represent leukemia, melanoma and cancers of the lung, colon, brain, ovary, breast, prostate and kidney, and which were selected for the availability of expression data as well as of data on changes in biological activity upon compound treatment. Only cell lines with positive results for growth inhibition are shown below. The human tumor cell lines are listed here:

Leukemia: CCRF-CEM, HL-60(TB), K-562, MOLT-4, RPMI-8226.
Non-Small Cell Lung: A549/ATCC, EKVX, HOP-62, HOP-92, NCI-H226, NCI-H23, NCI-H322M, NCI-H460, NCI-H522.
Colon: HCC-2998, HCT-116, HCT-15, HT29, KM12, SW-620.
Central Nervous System: SF-268, SF-295, SF-539, SNB-19, SNB-75, U251.
Melanoma: LOX IMVI, MALME-3M, M14, MDA-MB-435, MDA-N, SK-MEL-2, SK-MEL-28, SK-MEL-5, UACC-257, UACC-62.
Ovarian: IGROV1, OVCAR-3, OVCAR-4, OVCAR-5, OVCAR-8, NCI/ADR-RES, SK-OV-3.
Renal: 786-0, A498, ACHN, CAKI-1, RXF 393, SN12C, TK-10, UO-31.
Prostate: PC-3, DU-145.
Breast: MCF7, MDA-MB-231/ATCC, HS 578T, BT-549.

Table 6.13. Antiproliferative effect of the different compounds similar to L13 in human cancer cell lines. Columns: L13-like structure, NCI Cancer Screen (GI50 bar chart), Name/id, Similarity.

Name/id       Similarity
NSC 662250    73.6%
NSC 662251    73.1%
NSC 662424    71.5%
NSC 670685    63.8%
NSC 263625    63.2%
NSC 402471    62.6%

A: Not all GI50 data are shown. Bars represent the concentration required to inhibit 50% of the growth of the cancer cell lines. GI50 is expressed in mol/L.

6.4 Brief comments on the Detection of Activity Cliffs and Removal of Compounds Inducing SAR Discontinuity.

Despite the almost limitless availability of molecular descriptors and the increasing variety and efficiency of the machine learning techniques available for implementing QSAR models, their predictive capability is still limited. Significantly inaccurate predictions of activity still arise among similar molecules, even in cases where overall predictivity is high. Unfortunately, this observation, made by Gerald M. Maggiora in 2006 (338), still holds and probably will in the near future. As noted in his editorial (338), the reason why QSAR often disappoints is essentially related to the nature of the underlying structure-activity relationship (SAR). The main assumption of QSAR and similarity-based approaches is SAR continuity, where the structure-activity landscape looks like gently rolling hills, so that gradual changes in structure should lead to gradual changes in activity. However, systematic quantitative profiling of many different sets of active compounds has shown that the majority of global SARs are heterogeneous in nature (339); that is, their activity landscapes contain gently sloped regions but also sharp cliffs. The presence of SAR continuity provides a fundamental basis for QSAR analysis and the resulting compound activity predictions (316, 340, 341), while SAR discontinuity falls outside the applicability domain of the QSAR paradigm (341-343). Accepting and keeping in mind this fundamental limitation is usual among those who follow QSAR best practices (309). However, little work has been devoted to alleviating it and, to the best of our knowledge, none has aimed at reducing SAR discontinuity in a dataset and consequently restoring as much as possible the fundamental principle of QSAR and similarity-based methods (340, 344-346).

While statistical learning methods initially played the leading role in QSAR development, at present machine learning algorithms are the most widespread tools in chemoinformatics applications. The two general purposes for which machine learning is used in chemoinformatics are classification and generalization of data, where machine learning is used to extract regularity from data. Machine learning uses SAR knowledge to guide the process towards classifications and generalizations that are conceptually meaningful (347).


Accordingly, special cases such as structure-activity cliffs represent exceptions to the regular trend encoded in the data as a whole or, even worse, examples that contradict that trend. If the classification mechanism in machine learning is understood as a function that maps the description of an example (a chemical structure encoded by molecular descriptors) to the label of its class (active/inactive), the counterproductive influence of pairs of molecules that form activity cliffs is evident, which points to their removal as a remedial (or at least palliative) solution (348). The essence of the solution proposed here is to remove from the training process those problematic compounds responsible for the SAR discontinuity, and thereby restore the SAR continuity required for deriving reliable and predictive QSAR models. The main assumption is that a machine learning algorithm trained on a set free of the noise induced by these problematic examples should produce a model that identifies the structural and/or physicochemical patterns determining the desired activity more sharply than one trained on a set that includes them. However, the question remains of how far the learning process, and hence the generalization ability of the pattern found, is affected by the loss of the information encoded in the activity cliff pairs. In fact, according to Prof. Maggiora, “some of the outliers may, in fact, be activity cliffs. Thus, removing such points would severely prejudice model’s predictive capabilities” (349). As can be noted (and expected), the very nature of activity cliffs hinders any consensus on how to deal with their negative influence on QSAR modelling. Even more discouragingly, Prof. Maggiora concluded in his editorial (338), referring to activity cliffs and other problems inherent to the QSAR approach, that “addressing all of these problems is a daunting task at best, and it may not be possible to treat some of them in any substantive way”. All we can do, therefore, is to try a procedure that we hypothesize does more good than harm, while always checking the ability of a model trained under these conditions to correctly predict the members of activity cliffs.
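The removal procedure discussed above can be illustrated with a minimal sketch. It is not the implementation used in this work: the set-based fingerprints, the Tanimoto similarity cutoff (0.85) and the activity-difference cutoff (2 log units) are assumptions chosen only for illustration, and the function names are hypothetical. Compounds belonging to a cliff pair are set aside rather than discarded, so that the trained model can still be checked against them.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_cliff_pairs(fingerprints, activities, sim_cutoff=0.85, act_cutoff=2.0):
    """Return index pairs that are structurally similar but differ strongly in activity.

    Both cutoffs are illustrative assumptions, not values taken from the thesis.
    """
    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_cutoff
        activity_gap = abs(activities[i] - activities[j]) >= act_cutoff
        if similar and activity_gap:
            pairs.append((i, j))
    return pairs


def split_training_set(fingerprints, activities, **cutoffs):
    """Separate cliff members (held out) from the compounds kept for training."""
    cliff_pairs = find_cliff_pairs(fingerprints, activities, **cutoffs)
    cliff_members = {k for pair in cliff_pairs for k in pair}
    kept = [k for k in range(len(fingerprints)) if k not in cliff_members]
    return kept, sorted(cliff_members)


# Toy data (hypothetical): bit sets and activities on a logarithmic scale.
fps = [{1, 2, 3, 4, 5, 6, 7}, {1, 2, 3, 4, 5, 6, 7, 8}, {10, 11, 12}, {10, 11, 13}]
acts = [5.0, 8.0, 6.0, 6.2]
kept, held_out = split_training_set(fps, acts)
print("kept for training:", kept)
print("cliff members held out for checking:", held_out)
```

In this toy set only the first two compounds form a cliff (similarity 0.875, activity gap 3.0), so they are held out while the remaining two are kept for training.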

Conclusions

Database curation is a useful and necessary step before building QSAR models. Including this procedure makes it possible to obtain improved models that are robust and highly predictive. At the same time, the use of a consensus model for virtual screening increases the effectiveness of the process of identifying new candidates. The use of non-linear techniques, as implemented in Weka, is another aspect that helps to obtain better predictions. In addition, virtual screening can be faster than when linear techniques are used. The combined use of QSAR modeling and molecular docking helps to reduce the number of compounds to be studied and provides a greater chance of success in selecting candidates for experimental assessment. We identified 27 compounds using virtual screening. Two of these molecules showed interesting stabilization profiles. These two ligands are able to stabilize the human telomeric sequence and other G-quadruplex forming sequences, such as c-myc and c-kit. This interaction was confirmed by FRET, UV absorbance and fluorescence spectroscopy. Control experiments demonstrated that these ligands interact only weakly with DNA duplexes, providing evidence that our compounds are selective for quadruplex structures. The unusual absorbance properties of M1 prevented us from performing fluorescence measurements. Nevertheless, we will consider this compound for future work, as these properties may be a desirable trait for applications such as phototherapy.


Chapter 7. Effects of L13 and M1 on G-quadruplex structures that are hallmarks of cancer cells.

7.1 Article 5. “Hallmarks of cancer, quadruplexes and antitumor properties. Speculation, circumstantial evidence or possibility?” (In progress) Daimel Castillo-González, Gisselle Pérez-Machado, Maykel Cruz-Monteagudo, Jean-Louis Mergny.

Introduction

The G-quadruplex is the most extensively studied unusual nucleic acid conformation since its discovery at telomeric ends in the 1980s (1-3). The core of this motif was proposed by Gellert and Davies in 1962 (5). The structure consists of vertically stacked guanine tetrads (G4) with a cation located in the center. In each tetrad, four guanines occupy the corners of a plane held together by Hoogsteen hydrogen bonds. It is now well established that the different quadruplex topologies depend on the stoichiometry, strand polarity, geometry, sugar conformation and the nature of the cation (6). This versatility generates a wide range of therapeutic targets for proteins that bind to or interact with DNA, such as helicases and gyrases. Interest in guanine quadruplexes began when the formation of G4 structures at chromosome ends was related to the indirect inhibition of the telomerase enzyme (71, 76, 350-353). Telomerase is active in more than 85% of all cancer cell lines and is inactive or poorly expressed in other somatic cells. Inhibition of telomerase through the formation of G4 structures was therefore considered a specific target in anticancer therapy. Most of the G4 stabilizers known to date were developed following this paradigm (Figure I.2). While telomeric quadruplexes are the most explored targets (34-36), a number of other G4-prone sequences are present in the human genome and may regulate key biological processes. More than 40% of human gene promoters, especially those of oncogenes, contain one or more guanine-rich motifs. Consequently, G4 elements can act as molecular switches, together with DNA-binding proteins, to control transcription and the response to changes in chromatin structure or signalling mechanisms. The presence of guanine tetrads in regulatory regions and oncogene promoters such as c-myc, c-myb, c-fos, c-kit, KRAS, VEGF, PDGF-A, Rb, RET, Hif1-α, c-ABL and hTERT offers other targets for antitumor intervention that have not yet been widely exploited (37-41). For this reason, the development of ligands that stabilize G4 sequences in regulatory regions and oncogene promoters is of promising and novel therapeutic value. In this context, we consider it interesting to evaluate the stabilization of guanine quadruplexes as a multi-target approach for antitumor treatment. The formation of stable G-quadruplexes in different regions of several oncogene promoters has been demonstrated (354). More importantly, altered expression of these oncogenes has been recognized as a hallmark of cancer (355). Hanahan and Weinberg proposed six alterations and processes that are aberrantly regulated during malignant and oncogenic transformation (8): self-sufficiency in growth signals, insensitivity to antigrowth signals, evasion of apoptosis, sustained angiogenesis, limitless replicative potential, and tissue invasion and metastasis. A detailed review shows that some genes have a G-quadruplex in the core or proximal promoter (Figure 7.1) (355). The sequences of c-Myc, c-kit and K-ras can be associated with self-sufficiency; pRb with insensitivity; Bcl-2 with evasion of apoptosis; VEGF-A with angiogenesis; hTERT with limitless replication; and PDGF-A with metastasis (355). If we could prove that our ligands are able to interact with all or most of these guanine-rich “hallmark sequences” through the formation of stable guanine quadruplexes, this would support our hypothesis that these compounds may have interesting antitumor potential. To confirm the effect of our compounds on the formation of G-quadruplex structures in the oncogene promoter sequences described above, we will employ some of the techniques described in this work.


Figure 7.1. G-quadruplexes formed in different regions considered hallmarks of cancer. Figure taken from (355).

Materials and methods

Compounds M1 and L13 were purchased from ASINEX and LifeChemicals. Stock solutions of both were prepared at an initial concentration of 10 mM in DMSO and stored at -20 °C.

Oligonucleotides

Oligodeoxynucleotide probes were synthesized by Eurogentec (Belgium). All concentrations are expressed in strand molarity and were determined using a nearest-neighbour approximation for the absorption coefficients of the unfolded species (320). Solutions were stored at -20 °C. The base sequence of each oligonucleotide is given below (Table 7.1):

Table 7.1. Oligonucleotide sequences used in this study.

Name           Sequence (5' => 3')
C-Myc          TTGAGGGTGGGTAGGGTGGGTAA
C-kit1         GGGAGGGCGCTGGGAGGAGGG
C-kit2         GGGCGGGCGCGAGGGAGGGG
35B1K-ras      AGGGCGGTGTGGGAAGAGGGAAGAGGGGGAGGCAG
BCL-2          GGGCGCGGGAGGGAATTGGGCGGG
G4hTert        GGGGAGGGGCTGGGAGGGC
pRb(ODN1)      CGGGGGGTTTTGGGCGGC
PDGF-A Pu48    GGAGGCGGGGGGGGGGGGGCGGGGGCGGGGGCGGGGGAGGGGCGCGGC
VEGF           GGGAGGGTTGGGGTGGG

UV spectroscopy studies

UV spectroscopy is very useful for the study of nucleic acids and their interactions with ligands (356-359). Measurements were made with a Uvikon XS spectrophotometer. Experiments were performed at a concentration of 10 µM of M1 or L13, alone and in the presence of the oligonucleotide sequences listed above. Thermal denaturation experiments were performed in a solution containing 10 mM lithium cacodylate (pH 7.2) and 100 mM potassium chloride. Sequences were heated at 90 °C for 5 minutes and then slowly renatured by returning to room temperature over 2 h. First, the absorbance spectrum of each compound was recorded. Then the spectrum of each oligonucleotide sequence was recorded. Finally, ligand and oligonucleotide were mixed at a 1:1 ratio (one equivalent) and new spectra were recorded.

Fluorescence enhancement for L13

Fluorescence spectroscopy is very useful for studying the interactions between nucleic acids and ligands. Many DNA-binding ligands are poorly fluorescent or non-fluorescent in aqueous solution. Once tightly bound to a nucleic acid, however, the ligand is in a hydrophobic environment where aggregation or solvent quenching is no longer possible, so emission increases. Alternatively, some molecules emit strongly when free in solution and are quenched upon binding to DNA. In both cases, fluorescence emission is a direct probe of the concentration of bound ligand (323). Oligonucleotides at 25 µM were prepared in 10 mM lithium cacodylate (pH 7.2) and 100 mM potassium chloride. They were heated at 90 °C for 5 minutes, and the temperature was then decreased slowly and uniformly over 2 hours. The sequences used in this study are given in Table 7.1. Fluorescence was excited at 466 nm and emission spectra were recorded between 560 and 800 nm at 20 °C.

Fluorescence measurements

Experiments were performed in 96-well microplates from Stratagene. Each condition was tested at least in triplicate. Measurements were performed at 25 °C in a volume of 25 µL. All sequences adopt a G-quadruplex structure in KCl at this temperature. Two equivalents of thiazole orange (TO, 1 µM) were added to a solution of 0.5 µM oligonucleotide. Samples were incubated in 10 mM lithium cacodylate buffer (pH 7.2) supplemented with 100 mM KCl. The final concentration of the ligand under study was 50 µM. Fluorescence emission was collected at 516 nm with 8-fold gain after excitation at 492 nm in a Stratagene Mx3005P instrument. These wavelengths do not correspond to the excitation and emission maxima of TO (501/534 nm), but rather to commonly available fluorescein filters (492/516 nm). This was not detrimental to the measurements, however, given that the bandwidths (± 5 nm) allowed good excitation and recovery of the emission.

Fluorescent intercalator displacement assay (“G4-FID”)

All experiments were performed in 96-well microplates. Each condition was tested in duplicate in at least three independent experiments. The total volume of each test sample was 25 µL. Samples were incubated in 10 mM lithium cacodylate buffer (pH 7.2) supplemented with 100 mM KCl. The pre-folded DNA target (0.5 µM strand concentration for intramolecular G-quadruplex oligonucleotides and 1 µM strand concentration for duplexes) was mixed with 1 µM TO. Ligands were tested at 2.5, 12.5 and 25 µM. The fluorescence of the samples was measured at 25 °C in a Stratagene Mx3005P instrument. The temperature was kept constant with a thermostated cell holder (Peltier). TO was excited at 492 nm (± 5 nm) and the emission was collected at 516 nm (± 5 nm) with 8-fold gain. In each assay plate, a number of wells were dedicated to controls and calibration, including:

1. Buffer only (background fluorescence level = Fb)
2. Buffer + TO (should exhibit very low fluorescence, close to Fb, as no DNA is present and emission of free TO is very weak)
3. Buffer + ligand (should also exhibit very low fluorescence, close to Fb, unless the ligand is naturally fluorescent)
4. Buffer + TO + ligand (should also be close to Fb unless the ligand interacts with TO)

The truly informative wells corresponded to samples containing nucleic acids. Pre-folded DNA target (0.5 µM strand concentration for intramolecular G-quadruplexes) was mixed with 1 µM TO. The fluorescence of TO bound to DNA is referred to as FI0. The fluorescence of TO in nucleic acid wells upon addition of ligand is referred to as FI. The background fluorescence was subtracted from the fluorescence measured in DNA or DNA + ligand wells (FA0 = FI0 - Fb and FA = FI - Fb). We then calculated the percentage of TO displacement using the previously described formula (360):

TO displacement (%) = 100 - ((FA / FA0) × 100)    (Eq. 1)
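As an illustration of Eq. 1, the background subtraction and the displacement percentage can be written as a short function. This is only a sketch: the function name and the numerical readings are hypothetical and are not measurements from this study.

```python
def to_displacement(fi, fi0, fb):
    """Percentage of TO displaced, following Eq. 1.

    fi  : fluorescence of the DNA + TO well after ligand addition (FI)
    fi0 : fluorescence of the DNA + TO well without ligand (FI0)
    fb  : background fluorescence of the buffer-only well (Fb)
    """
    fa = fi - fb      # background-corrected signal with ligand (FA)
    fa0 = fi0 - fb    # background-corrected signal without ligand (FA0)
    if fa0 <= 0:
        raise ValueError("FI0 must be larger than the background Fb")
    return 100.0 - (fa / fa0) * 100.0


# Hypothetical readings in arbitrary fluorescence units.
displacement = to_displacement(fi=3000.0, fi0=10500.0, fb=500.0)
print(f"TO displacement: {displacement:.1f} %")  # prints "TO displacement: 75.0 %"
```

The result is deliberately not clamped to the 0-100 range: a value below 0 (fluorescence increases on ligand addition) or above 100 (signal falls below background) is itself informative about the behaviour of the ligand in the well.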


A diagram of the experiments is shown below:

      H        G        F        E        D        C        B        A
1     Tp       Tp       Tp+TO    Tp+TO    ligand   ligand   TO+lig   TO+lig
2     C-Myc    C-Myc    C-Myc    C-Myc    C-Myc    C-Myc    C-Myc    C-Myc
3     C-kit1   C-kit1   C-kit1   C-kit1   C-kit1   C-kit1   C-kit1   C-kit1
4     C-kit2   C-kit2   C-kit2   C-kit2   C-kit2   C-kit2   C-kit2   C-kit2
5     K-Ras    K-Ras    K-Ras    K-Ras    K-Ras    K-Ras    K-Ras    K-Ras
6     BCL-2    BCL-2    BCL-2    BCL-2    BCL-2    BCL-2    BCL-2    BCL-2
7     VEGF     VEGF     VEGF     VEGF     VEGF     VEGF     VEGF     VEGF
8     G4hTert  G4hTert  G4hTert  G4hTert  G4hTert  G4hTert  G4hTert  G4hTert
9     pRb      pRb      pRb      pRb      pRb      pRb      pRb      pRb
10    PDGF-A   PDGF-A   PDGF-A   PDGF-A   PDGF-A   PDGF-A   PDGF-A   PDGF-A

Ligand: 0 µM, 2.5 µM, 12.5 µM, 25 µM
H2O: 15 µL, 10 µL, 5 µL

Scheme 7.1. Plate diagram used for the G4-FID protocol. The diagram represents a 96-well plate. The first row contains the experimental controls. Tp denotes the buffer control and TO is thiazole orange. The ligand concentrations and water volumes used for the individual wells in rows 2-10 are shown below the scheme.


Results and discussion

Compounds

The M1 and L13 ligands (Figure 7.2) were previously identified by virtual screening of the Asinex and Life Chemicals databases. The selection was carried out using computational techniques (see previous chapter). The compounds are able to interact with and stabilize the human telomeric sequence and other sequences capable of forming guanine quadruplexes. M1 and L13 are also selective for quadruplexes over duplexes.

Figure 7.2. Chemical structures of ligands M1 and L13.

Compound L13 has an extended conjugated system, which favors stacking interactions. In addition, the oxygen atom attached to the 1,3-benzothiazol-3-ium group can form hydrogen bonds with the bases of the G-quadruplex. M1 has a very extensive conjugated system, which favors π-π stacking between the ligand and the guanine bases. The uncharged nitrogen of M1 can likewise form hydrogen bonds.

UV spectroscopy studies

In our previous work (chapter 6) we verified that compound L13 has an absorption maximum at 453 nm (see UV spectrum, Figure 6.13), with an intensity of log ε = 4.36. The signal corresponds to the n→π* and π→π* transitions of the conjugated aromatic system. A very simple spectroscopic method can be used to probe an interaction between the oligonucleotide sequence and the ligand: one follows the absorbance changes of the ligand when DNA is added. If an interaction occurs, the absorbance properties of the dye may change, often leading to a shift of the absorption maximum to longer wavelengths (red shift).
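To give a sense of the signal expected for the free ligand, the reported log ε can be converted into an absorbance with the Beer-Lambert law (A = ε·c·l). The sketch below assumes that ε is expressed in M^-1 cm^-1 and that a 1 cm path length cuvette is used; neither is stated in the text, and the function name is hypothetical.

```python
def absorbance(log_epsilon, concentration_molar, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l.

    Assumes epsilon in M^-1 cm^-1 and a 1 cm cuvette by default;
    both are assumptions made for this illustration.
    """
    epsilon = 10.0 ** log_epsilon
    return epsilon * concentration_molar * path_cm


# L13: log epsilon = 4.36 at 453 nm, measured at 10 µM (values from the text).
a_l13 = absorbance(4.36, 10e-6)
print(f"Expected absorbance of free L13 at 453 nm: {a_l13:.3f}")
```

Under these assumptions the free ligand at 10 µM gives an absorbance of about 0.23, a convenient starting point for following changes on addition of DNA.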


The largest shifts in the absorption maximum were observed for the c-myc and K-ras sequences, with new absorption peaks at 506 and 504 nm, respectively. The PDGF-A, G4hTERT, C-Kit1, C-kit2, BCL-2 and VEGF sequences led to new absorbance maxima at wavelengths between 475 and 484 nm. In contrast, the shift for the pRb sequence is smaller, just 7 nm. This smaller shift, combined with the evidence from the other experiments, may suggest that pRb is recognized differently (or less strongly) by our compounds.
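The size of each red shift follows directly from the maxima quoted above and the 453 nm maximum of free L13. The short sketch below makes the arithmetic explicit; the function and variable names are hypothetical, and only values given in the text are used.

```python
FREE_L13_MAX_NM = 453  # absorption maximum of free L13 reported in the text


def red_shift(bound_max_nm, free_max_nm=FREE_L13_MAX_NM):
    """Shift of the absorption maximum upon addition of the oligonucleotide (nm)."""
    return bound_max_nm - free_max_nm


# Maxima reported for the complexes with the largest shifts.
shifts = {name: red_shift(nm) for name, nm in {"c-myc": 506, "K-ras": 504}.items()}

# Bounds of the 475-484 nm interval reported for the other six sequences.
intermediate_range = (red_shift(475), red_shift(484))

print(shifts)               # {'c-myc': 53, 'K-ras': 51}
print(intermediate_range)   # (22, 31)
```

The shifts of 53 and 51 nm for c-myc and K-ras, and of 22 to 31 nm for the intermediate group, put the 7 nm shift observed for pRb in perspective.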


Figure 7.3. Absorbance spectra for ligand L13 in the presence of different G-quadruplex forming sequences.


Interestingly, ligand M1 has an absorption maximum at a very long wavelength (around 1001 nm), with an intensity of log ε = 4.57 (Figure 6.14). This absorbance peak therefore lies in the near-infrared region as a result of extensive conjugation; these wavelengths are not optimal for most spectrophotometers. For this reason, we consider changes in the shape, intensity and maximum of the band, relative to the compound alone and the oligonucleotide alone, as evidence of interaction. All experiments presented below will be reproduced before publication. Compounds with an absorption maximum at long wavelengths could be used in photodynamic therapy (PDT), provided that they can generate singlet oxygen. PDT is an emerging therapeutic modality that employs three basic components: the photosensitizer, light and tissue oxygen. After excitation at the appropriate wavelength, the photosensitizer molecules transfer the absorbed photon energy to oxygen molecules, generating cytotoxic reactive species that kill cancer cells. The optimal wavelength range for PDT is debated: one report states that wavelengths above 900 nm are not desirable (361), whereas other reports describe the use of PDT at longer wavelengths (700-1000 nm) (362-365). In this range, biological tissues have minimal light absorption, which makes such ligands well suited to optical imaging and phototherapy. Phototherapy is very useful in oncology and in the treatment of cardiovascular, dermatological, ophthalmic and infectious diseases (361). Compounds of this kind can also find applications in cancer diagnosis, intra-operative tumor detection, and investigations of drug delivery and tumor physiology (361).


Figure 7.4. Absorbance spectra for ligand M1 in the presence of different G-quadruplex forming sequences.


Enhancement of fluorescence for L13 at 466 nm of excitation

We used this technique to confirm that the ligand binds selectively to DNA, producing a change in fluorescence emission upon formation of a complex. Any change (increase, decrease or wavelength shift) may be used, but "light-up" probes, in which emission increases upon binding, are especially attractive. Such a change is qualitative evidence of binding between the two species when a numerical value for the Kd is too difficult to obtain. We performed experiments only for L13, because the absorption maximum of M1 lies at a wavelength that is incompatible with our experimental setup. Figure 7.5 shows the fluorescence spectra of L13 in the presence of different quadruplex sequences representative of various hallmarks of cancer. Excitation was set at 466 nm. In each case we provide the emission spectrum of the compound alone (in green), the oligonucleotide alone (in blue) and the mixture (in red). Our previous experiments showed an increase in fluorescence when oligonucleotide and ligand interact for the c-myc, c-kit1 and c-kit2 sequences (chapter 6). L13 has a very low fluorescence quantum yield in aqueous solution (see emission spectrum in Figure 7.5). Likewise, and rather unsurprisingly, DNA oligonucleotides do not emit light when excited at this wavelength. These background emission levels are in clear contrast with the large increase in emission found when DNA is added to the ligand. Significant differences in fluorescence enhancement are found depending on the DNA sequence. The fluorescence of 0.1 µM L13 increases 140-fold when 1 µM of the 35B1K-ras sequence is added. The VEGF-A and PDGF sequences give increases of around 280-fold and 390-fold, respectively. For the G4hTert sequence the increase is 440-fold, and the highest value is reached for BCL2, where the addition of the oligonucleotide produces a 690-fold increase in fluorescence with respect to the ligand alone. Rb is the G4 sequence that gives the lowest fluorescence enhancement, 21-fold. This increase is still highly significant and demonstrates that an interaction occurs. The increase in fluorescence intensity cannot be directly related to affinity, as the fluorescence quantum yield of the dye may be different in each complex. Titrations will be employed to better characterize these affinities and to determine whether the lower signal found for Rb reflects a lower affinity and/or a lower fluorescence quantum yield of the dye bound to this quadruplex. Another aspect to consider is the variation in the position of the emission maximum among the oligonucleotide-ligand complexes.
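The fold enhancement quoted above is simply the ratio between the emission of the complex and that of the ligand alone. The sketch below shows this ratio and ranks the sequences by the enhancement values reported in the text; the function name is hypothetical, and the ranking reflects signal only, not affinity, for the reason given above.

```python
def fold_enhancement(f_complex, f_ligand_alone):
    """Ratio of the emission of the ligand + DNA mixture to that of the ligand alone."""
    if f_ligand_alone <= 0:
        raise ValueError("emission of the free ligand must be positive")
    return f_complex / f_ligand_alone


# Fold enhancements reported in the text for L13 (approximate for VEGF-A and PDGF).
reported_fold = {
    "35B1K-ras": 140,
    "VEGF-A": 280,
    "PDGF": 390,
    "G4hTert": 440,
    "BCL2": 690,
    "Rb": 21,
}

# Rank sequences from the largest to the smallest light-up effect.
ranking = sorted(reported_fold, key=reported_fold.get, reverse=True)
for name in ranking:
    print(f"{name:10s} {reported_fold[name]:4d}-fold")
```

Ordered in this way, the sequences run from BCL2 (690-fold) down to Rb (21-fold), a spread of more than thirty-fold between the strongest and weakest light-up response.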


Figure 7.5. Fluorescence emission spectra for the different G4 sequences considered hallmarks of cancer, at an excitation wavelength of 466 nm (A-F). Enhancement of fluorescence for the different sequences after the addition of L13 (G).


Activity of drugs in the fluorescent intercalator displacement (FID) assay

The fluorescence enhancement of thiazole orange (TO) in the presence of M1 and L13 was monitored in order to determine which of them could be used in the G-quadruplex Fluorescent Intercalator Displacement (G4-FID) assay. The G4-FID assay is based on the displacement of the “on/off” fluorescence probe TO from quadruplex or duplex DNA matrices by a ligand (360, 366-368). TO is virtually non-fluorescent when free in solution but strongly fluorescent when bound to DNA; ligand-induced displacement therefore leads to a decrease in fluorescence, which is monitored as a function of ligand concentration. The quadruplex affinity of a candidate compound can thus be evaluated through its ability to displace TO from quadruplex DNA. As shown in Figure 7.6, L13 and M1 displaced TO from structures formed by sequences found in the c-myc, c-kit1, c-kit2, K-ras, BCL-2, G4hTert, PDGF-A, pRb and VEGF promoter regions at 25 equivalents of ligand (12.5 µM). Very low displacement (below 20%) is observed for the c-kit1 and BCL-2 sequences when 5 equivalents of M1 are added. No displacement of TO from the pRb sequence is observed with 2.5 µM M1. These observations suggest that the ligands bind to G4 and may also induce the formation of G-quadruplex structures. The higher displacement values obtained for M1 than for L13 are in line with a greater affinity of M1 for G4 structures. Quantitative methods (SPR, ITC, titrations and/or mass spectrometry) will be required to provide affinity constants for these interactions. The positive control used in the FID experiments was the 360A ligand. This pyridine-dicarboxamide derivative shows strong selectivity for G4 structures. The antiproliferative effects of 360A at very low concentrations in SAOS-2 and in different glioma cell lines have also been demonstrated, and studies on 360A conclude that it is a promising agent for the treatment of various tumors, including malignant gliomas (369-373). The fact that 360A strongly stabilizes all the sequences considered hallmarks of cancer, with formation of G-quadruplex structures, and has proven antitumor activity could, in an optimistic scenario, be considered circumstantial evidence for the possible antitumor properties of our compounds.
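The ligand amounts in the FID discussion are quoted both as concentrations and as equivalents relative to the quadruplex strand. The conversion is a simple ratio, sketched below for the 0.5 µM strand concentration used for intramolecular quadruplexes; the function name is hypothetical.

```python
def equivalents(ligand_um, oligo_um=0.5):
    """Molar equivalents of ligand (or dye) per oligonucleotide strand."""
    return ligand_um / oligo_um


# Ligand concentrations tested in the G4-FID assay (µM).
ligand_equivalents = {c: equivalents(c) for c in (2.5, 12.5, 25.0)}
print(ligand_equivalents)   # {2.5: 5.0, 12.5: 25.0, 25.0: 50.0}

# Thiazole orange at 1 µM corresponds to two equivalents per quadruplex strand.
print(equivalents(1.0))     # 2.0
```

With 0.5 µM quadruplex, the three ligand concentrations therefore correspond to 5, 25 and 50 equivalents. For duplex targets, used at 1 µM strand concentration, the same ligand concentrations correspond to half as many equivalents.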

[Figure 7.6: color chart of the percentage of TO displacement (scale: <0, 0, 20, 40, 60, 80, 100, >100) for M1 and L13 at 2.5, 12.5 and 25 µM and for 360A at 2.5 µM; rows: C-Myc, C-kit1, C-kit2, 35B1K-ras, BCL-2, VEGF, G4hTERT, pRb, PDGF-A.]

Figure 7.6. FID assays using the oligonucleotides indicated on the left. All sequences form quadruplexes. The color chart allows visual comparison of the percentage of TO displacement obtained with the different test molecules at three concentrations.

Conclusions

We demonstrated the interaction of the L13 and M1 ligands with a variety of quadruplex sequences considered "hallmarks of cancer". The interaction was corroborated using different, independent techniques. The experiments suggest that the pRb(ODN1) sequence, CGGGGGGTTTTGGGCGGC, is the least likely to interact with our compounds. These interactions encourage us to test the antiproliferative activity of the compounds, and the unusual properties of M1 encourage us to investigate its spectroscopic properties in more detail. A complete biological and antitumor evaluation is required to corroborate the activity.


Chapter 8. General discussion of this research. Conclusions and perspectives.

Introduction

Interest in G-quadruplex structures and the search for candidates that stabilize them started with research on the stabilization of telomeric DNA and the indirect inhibition of the telomerase enzyme (151-153, 168, 169, 240, 374-376). The possibility that these structures may exist in regulatory regions of the genome sparked further interest in their study. The first study that employed QSAR methodology to predict inhibition of telomerase by a ligand (telIC50) was reported by Cabrera et al. in 2009 (377). This was one of the first manuscripts to propose a mathematical model capable of predicting whether a particular molecule would have the desired activity. The telIC50 values for the compounds in the dataset described in this chapter were obtained using the same protocol (35, 36, 241, 254). The Organisation for Economic Co-operation and Development (OECD) suggests that data used to build QSAR models should be obtained under the same conditions, ideally in the same laboratory (378). Congeneric data sets are relatively easy to model, following the principle that similar compounds should have similar activities (one of the foundations on which SAR is based). Building mathematical models that include acridines was very interesting. Given the acridine core common to all compounds, the side chains play a fundamental role in specificity and affinity. Mono-substituted acridines are less active than di-substituted ones, which in turn are less active than tri-substituted ones. For this reason, in the cluster analysis, the distances that separate the compounds are largely a function of the side chains. There is no proportional relationship between the concentrations required to inhibit the TRAP assay and the concentrations required to promote G-quadruplex structures. In addition, there is no straightforward relationship between the ligand concentration that inhibits TRAP and the ligand binding constant for the telomeric DNA quadruplex. In the work presented in Chapter 4, we related the two processes by assuming that inhibition was a consequence of the formation of a stable G-quadruplex structure. In this way we were able to suggest how certain structural factors were related to the TRAP activity.


8.1 Predictivity of anti-telomerase activity of a congeneric set

For most congeneric sets of compounds, including the acridines (248, 377), berberine derivatives (379), catecholic flavonoids (380) and triazines (381), the predictive power of the models is generally high. When QSAR models are built on congeneric series, the resulting percentages of good classification and other statistical parameters are usually higher than those of models built on non-congeneric, and thus more structurally diverse, data sets. This may be due to the fact that similar structures have similar activity values. A more important factor may be that the external prediction sets were extracted from scientific papers in which only compounds with activity in the TRAP assay are described. It is common to find only positive reports of activity, because that is what the scientific community expects and wants to publish; however, for QSAR to be successful, negative results are also needed. If compounds with a wide range of activities, including inactive compounds, were available, information about functional groups or fragments that are detrimental to activity could be taken into account. The publication of such negative results would also make scientific publications more trustworthy. Journals usually publish only positive results, and researchers often present data in ways that cannot be confirmed when the experiments are repeated in other laboratories. Research undertaken by the pharmaceutical company Bayer HealthCare to validate academic studies that identified promising targets for new drugs showed that nearly two thirds of the results obtained at Bayer did not coincide with those reported in scientific articles (382). Despite these shortcomings, our strategy did identify some novel G-quadruplex ligands (32), although the changes in thermal stability of the G-quadruplex structures (ΔTm values) were not very high. This was also the case for the virtual screening carried out on a small database (fewer than 2500 commercial drugs). In our work, all predictions were made using QSAR models based on data collected from different publications stating that such ligands inhibit telomerase by stabilizing G-quadruplex formation. However, only experimental results on telomerase inhibition are provided in these papers. Often, the authors provide information on the stabilization of a G-quadruplex structure, or report the Kd value, for only one or a few compounds. Data for the rest of the ligands in the publication are assumed by extrapolation, without concrete experimental evidence. These generalizations can lead to disastrous results when identifying new compounds using computational methods. We therefore recommend the preparation of a database of ligands that encompasses only compounds with experimentally demonstrated activity as G-quadruplex agents. It is also necessary that the experimental conditions under which the data are collected be the same or similar.

8.2 Novel structures identified using QSAR methodologies It is often questioned whether QSAR methodologies result in identification of compounds that are novel. One of the assumptions underpinning medicinal chemistry is that similar compounds have similar activities. Some applications such as TOPSMODE consider the biological activity as a contribution to the system of all fragments (383, 384). Novel compounds may have a similar core or similar functional groups to those known to be effective. To what extent did the computer models developed by us recognize potentially useful structures that can be considered innovative? Firstly it is important to generalize the features that have been defined based on experimental considerations as important for stabilizing G-quadruplex structure. It has been established that the G4 compounds are often characterized by a large aromatic moiety that can stack onto a G-tetrad (although, the study and discovery of non-planar ligands is a very interesting area of research nowadays), with the presence of electronegative atoms near the center and the ends of the structure. In addition it is necessary that the compounds have donor and acceptor atoms for the formation of Hydrogen bonds (164). Besides, the ligands should not bind efficiently to other structures of DNA, in particular to the double-helical nucleic acid, to avoid toxicity effects (173). Other common characteristics relate to the lengths of the side chains, and preferred substituent groups have been also described (164). The central plane and the extended aromatic conjugated system are needed to potentiate π-π stacking interactions with G-tetrads. The presence of electronegative groups that at physiological pH can support a positive charge results in electrostatic interactions, either with the center of the G-quartet or the negatively charged phosphates in the DNA grooves. 
Now that the shared features of G-quadruplex ligands have been summarized, we can attempt to answer the question: what parameters can be used to assert that a compound is different from other reported ligands? Visual inspection is a first approximation. One acridine is similar to another acridine, and an anthraquinone can be similar to an acridone if the substituents are similar. Visual inspection of the compounds shown in Figure 8.1 gives an idea of how complex it can be to make assertions about the novelty of a compound. All these compounds were found by traditional synthesis and trial-and-error methods.

Figure 8.1. Chemical structure of some G-quadruplex stabilizers.

Novelty can be assessed with the help of the SARANEA software (385). SARANEA is an open-source Java application for interactive exploration of structure-activity relationship (SAR) and structure-selectivity relationship (SSR) information. It integrates various SAR and SSR analysis functions and uses a network-like similarity graph data structure for visualization (http://www.limes.uni-bonn.de/forschung/abteilungen/Bajorath/labwebsite/downloads/saranea/view). The program performs similarity analyses based on user-defined fingerprints. We analyzed similarity with fingerprints that contain data concerning the aromatic core (π-π interactions) and descriptors based on rings, the number of hydrogen atoms present in the molecules, and the number of atoms able to form hydrogen bonds. We also included heteroatoms and specific electronegative atoms (O, N).

The similarity map shown in Figure 8.2 does not contain all G4 ligands published in the literature, because the experimental conditions used to measure stabilization values differ between studies. To build the similarity maps, we used a group of ligands for which ΔTm was obtained under the same or similar conditions: comparable ionic strength, 10 mM KCl, a buffer with pH between 7.2 and 7.4, 1 µM ligand, and an oligonucleotide-to-ligand ratio of 1:5. Two molecules are connected if their molecular similarity, calculated as the Tanimoto coefficient (see below) of the two fingerprints, is above a chosen similarity threshold. Edge shading reflects the respective similarity value: dark edges connect highly similar compounds, whereas light edges connect pairs of molecules with similarity close to the threshold. The Tanimoto coefficient for two molecules A and B is calculated as (385):

$$Tc = \frac{c}{a + b - c} \qquad \text{(Eq 8.1)}$$

where a and b are the numbers of features present in molecules A and B, respectively, and c is the number of features present in both molecules. Nodes are colored according to compound potency or selectivity (when different targets are used). Green indicates weakly potent and red highly potent compounds.
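As a minimal illustration of Eq 8.1, the Tanimoto coefficient can be computed from two fingerprints represented as sets of feature indices. The sketch below is written in Python and is not the SARANEA implementation; the function name `tanimoto` and the two example fingerprints are hypothetical and were chosen only to make the arithmetic easy to follow.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient Tc = c / (a + b - c) for two feature sets."""
    set_a, set_b = set(features_a), set(features_b)
    a = len(set_a)          # number of features present in molecule A
    b = len(set_b)          # number of features present in molecule B
    c = len(set_a & set_b)  # number of features present in both molecules
    if a + b - c == 0:
        # Convention chosen for this sketch: two empty fingerprints give 0.
        return 0.0
    return c / (a + b - c)


# Hypothetical fingerprints (indices of the features that are present).
mol_a = {1, 2, 3, 5, 8}
mol_b = {2, 3, 5, 7}

tc = tanimoto(mol_a, mol_b)
print(tc)
```

For these two hypothetical fingerprints a = 5, b = 4 and c = 3, so Tc = 3 / (5 + 4 - 3) = 0.5.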

When an activity is reported as IC50, the more potent compounds have lower values. In our case we used ΔTm as the indicator of potency, and for this reason the color scale is reversed: the most active compounds are represented by green nodes and ligands with low stabilization values are red. The nodes are scaled according to their local discontinuity score: large nodes represent activity/selectivity cliff markers, whereas small nodes represent compounds within a smooth activity/selectivity landscape. The similarity threshold is the lower limit for the similarity of two data records that belong to the same cluster. For example, if the similarity threshold is 0.25, records that are at least 25% similar reside in the same cluster.
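The role of the similarity threshold in building the map can be sketched in the same way: an edge is drawn only between pairs whose Tanimoto similarity exceeds the threshold, so lowering the threshold adds connections. The following Python sketch assumes set-based fingerprints; the function names and the three fingerprints are hypothetical and do not correspond to real ligands or to SARANEA's internal code.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of features."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    return shared / union if union else 0.0


def similarity_edges(fingerprints, threshold):
    """Return (name_i, name_j, similarity) for every pair above the threshold."""
    edges = []
    names = sorted(fingerprints)
    for name_i, name_j in combinations(names, 2):
        sim = tanimoto(fingerprints[name_i], fingerprints[name_j])
        if sim > threshold:
            edges.append((name_i, name_j, sim))
    return edges


# Hypothetical fingerprints, not real ligand data.
fingerprints = {
    "ligand_1": {1, 2, 3, 4},
    "ligand_2": {1, 2, 3, 5},
    "ligand_3": {1, 6, 7, 8},
}

edges_65 = similarity_edges(fingerprints, 0.65)  # stricter threshold
edges_50 = similarity_edges(fingerprints, 0.50)  # more permissive threshold
print(edges_65)
print(edges_50)
```

With these illustrative fingerprints, ligand_1 and ligand_2 have a similarity of 0.6, so they are connected at a threshold of 0.50 but not at 0.65.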

Figure 8.2. Mapping of similarity at similarity threshold values 75, 85 and 95% respectively.

Figure 8.2 shows the similarity maps at predefined pairwise similarity thresholds of 95, 85, and 75%, respectively, for the G-quadruplex ligands for which data were available. Even at the 75% threshold, the ligands do not appear highly similar to one another. At a similarity threshold of 65% (Figure 8.3), clusters appear for certain groups of compounds. One of the ligands identified in this study, prochlorperazine, is similar to the group of indoloquinolines according to our fingerprint criteria. However, these compounds are connected by light edges, which means that their similarity is close to the 0.65 threshold; higher values would give darker connecting lines. Color is also important in this representation. The most powerful stabilizer identified here is M1; loxapine and promazine are the least powerful. The green color of the M1 node places it among compounds with an activity profile similar to that of other reported powerful ligands, which suggests that this compound deserves deeper study. The size of the nodes indicates possible cliffs in the activity/selectivity landscape. Most compounds show a smooth activity profile, with the exception of prochlorperazine, which has a low activity value but is connected to ligands with higher activity. This is a negative cliff. Activity cliffs are formed by structurally similar compounds with significant differences in potency (386). Other indices for the map at 65% similarity should also be noted. One is the discontinuity value of 0.026, which indicates a low number of cliffs in the modeled data. Another is the high continuity value of 0.990, which indicates that many compounds are structurally different but have comparable activity.

The continuity score is calculated as the potency-weighted mean of the reciprocal similarity values of a dataset (385):

$$\begin{aligned}
\mathrm{cont}_{\mathrm{raw}} &= \underset{\{(i,j)\,|\,i \neq j\}}{\text{weighted mean}} \left[ \frac{1}{1 + \mathrm{sim}(i,j)} \right] = \frac{\sum_{\{(i,j)\,|\,i \neq j\}} \mathrm{weight}(i,j) \cdot \dfrac{1}{1 + \mathrm{sim}(i,j)}}{\sum_{\{(i,j)\,|\,i \neq j\}} \mathrm{weight}(i,j)} \\
\mathrm{weight}(i,j) &= \frac{\mathrm{pot}_i \cdot \mathrm{pot}_j}{1 + |\mathrm{pot}_i - \mathrm{pot}_j|}
\end{aligned} \qquad \text{(Eq 8.2)}$$

Here, $\mathrm{sim}(i,j)$ is the pairwise chemical similarity between compounds i and j, and $\mathrm{pot}_i$ and $\mathrm{pot}_j$ are their respective log-scale potency values. Accordingly, dissimilar compounds with comparable and high potency contribute most to the score. In contrast, the discontinuity score captures large differences in potency between very similar compounds. It accounts for activity cliffs in the activity landscape and is calculated as the mean similarity-weighted potency difference of all compound pairs above predefined similarity and potency-difference thresholds:

$$\mathrm{disc}_{\mathrm{raw}} = \underset{\{(i,j)\,|\,\mathrm{sim}(i,j) > \mathrm{threshold},\ |\mathrm{pot}_i - \mathrm{pot}_j| > 1\}}{\text{mean}} \left( |\mathrm{pot}_i - \mathrm{pot}_j| \cdot \mathrm{sim}(i,j) \right) \qquad \text{(Eq 8.3)}$$
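To make the two raw scores concrete, the sketch below computes them for a small, invented dataset. It is a Python illustration of the definitions of the continuity and discontinuity scores and not SARANEA's own code: the function names, the potency values, and the similarity matrix are hypothetical, and any normalization that the program may apply to the raw scores is not reproduced, so the output should not be compared directly with the values reported for our dataset.

```python
from itertools import combinations


def pair_weight(pot_i, pot_j):
    """Potency weight of a pair: high for pairs that are both potent and alike."""
    return (pot_i * pot_j) / (1.0 + abs(pot_i - pot_j))


def continuity_raw(potency, similarity):
    """Potency-weighted mean of 1 / (1 + sim(i, j)) over all pairs i != j.

    sim and weight are symmetric, so summing over unordered pairs gives the
    same weighted mean as summing over ordered pairs.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(potency)), 2):
        w = pair_weight(potency[i], potency[j])
        numerator += w / (1.0 + similarity[i][j])
        denominator += w
    return numerator / denominator if denominator else 0.0


def discontinuity_raw(potency, similarity, sim_threshold, pot_threshold=1.0):
    """Mean of |pot_i - pot_j| * sim(i, j) over similar pairs with a potency gap."""
    terms = []
    for i, j in combinations(range(len(potency)), 2):
        gap = abs(potency[i] - potency[j])
        if similarity[i][j] > sim_threshold and gap > pot_threshold:
            terms.append(gap * similarity[i][j])
    return sum(terms) / len(terms) if terms else 0.0


# Hypothetical data: three compounds with invented potencies and similarities.
potency = [6.0, 6.0, 4.0]
similarity = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.3],
    [0.8, 0.3, 1.0],
]

cont = continuity_raw(potency, similarity)
disc = discontinuity_raw(potency, similarity, sim_threshold=0.65)
print(round(cont, 3), round(disc, 3))
```

In this invented example, compounds 0 and 1 are dissimilar but equally potent, so they dominate the continuity score, whereas compounds 0 and 2 are similar but differ by two potency units and form the only pair that contributes to the discontinuity score.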

Figure 8.3. Similarity map at a similarity threshold of 65%.

Table 8.1 shows the Tanimoto similarity between the compounds identified in this work and the compounds used to calibrate the docking protocol. The greatest similarity observed, 45%, is between prochlorperazine and BRACO-19. Prochlorperazine is an acridine analog with a chlorine substituent and a side chain of three methylene groups attached to a methylpiperazine group. The structural difference between any identified compound and any reference ligand is therefore at least 55%. If the analysis were performed with different descriptors, such as those employed in the work discussed in Chapter 6, the separation between the compounds might be even greater.

Table 8.1. The similarities of the compounds.

Tanimoto similarity

Compound           BRACO-19   360A   Telomestatin   Piper   TMPyP4
L13                0.36       0.36   0.35           0.33    0.25
M1                 0.43       0.26   0.38           0.36    0.25
Amitriptyline      0.35       0.30   0.30           0.37    0.28
Chlorpromazine     0.32       0.31   0.36           0.24    0.24
Imipramine         0.37       0.31   0.31           0.29    0.24
Loxapine           0.25       0.38   0.28           0.21    0.21
Prochlorperazine   0.45       0.35   0.34           0.42    0.37
Promazine          0.29       0.22   0.37           0.35    0.30
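The observation that the greatest similarity in Table 8.1 is 0.45 can be checked directly from the tabulated values. The short Python sketch below transcribes the table (with compound names in their standard spelling) and searches for the most similar pair; the variable and function names are hypothetical.

```python
# Reference ligands (columns of Table 8.1).
REFERENCES = ["BRACO-19", "360A", "Telomestatin", "Piper", "TMPyP4"]

# Tanimoto similarities transcribed from Table 8.1 (rows: identified compounds).
TABLE_8_1 = {
    "L13":              [0.36, 0.36, 0.35, 0.33, 0.25],
    "M1":               [0.43, 0.26, 0.38, 0.36, 0.25],
    "Amitriptyline":    [0.35, 0.30, 0.30, 0.37, 0.28],
    "Chlorpromazine":   [0.32, 0.31, 0.36, 0.24, 0.24],
    "Imipramine":       [0.37, 0.31, 0.31, 0.29, 0.24],
    "Loxapine":         [0.25, 0.38, 0.28, 0.21, 0.21],
    "Prochlorperazine": [0.45, 0.35, 0.34, 0.42, 0.37],
    "Promazine":        [0.29, 0.22, 0.37, 0.35, 0.30],
}


def most_similar_pair(table, references):
    """Return (compound, reference, similarity) for the largest table entry."""
    best = None
    for compound, values in table.items():
        for reference, value in zip(references, values):
            if best is None or value > best[2]:
                best = (compound, reference, value)
    return best


compound, reference, similarity = most_similar_pair(TABLE_8_1, REFERENCES)
min_difference = round(1.0 - similarity, 2)
print(compound, reference, similarity, min_difference)
```

The search returns the prochlorperazine/BRACO-19 pair with a similarity of 0.45, which corresponds to a structural difference of 0.55 under this fingerprint definition.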

We now analyze the similarities between the compounds from the point of view of their electronic environment. UV-Vis spectroscopy describes the electronic transitions from the ground state to an excited state of a compound (387). The interactions that a compound can establish depend on its electronic nature, so the signals recorded in the UV-Vis spectrum can be considered a description of the electronic structure of the ligand. Despite this, UV-Vis spectroscopy is rarely used nowadays for the elucidation of molecular structures, since it has been displaced by more efficient techniques such as NMR and mass spectrometry (388). Nevertheless, some parameters derived from UV-Vis spectroscopy, such as the maximum absorption intensity and the band shape, are characteristic of each ligand, provided that the spectrum is recorded under the same conditions of molecular charge, pH, solvent, concentration, and temperature. These parameters do not allow absolute identification of an unknown ligand, but they are frequently used to confirm the identity of a substance by comparing the measured spectrum with a reference spectrum (388). We can therefore try to use features of the UV-Vis bands (shape, position, and intensity) to identify compounds with similar electronic transition characteristics (the UV bands implicitly contain electronic, vibrational, and rotational transitions). In other words, we can estimate the similarity between two compounds as a function of the electronic transitions that occur in the molecules.

Compound M1 has an absorption maximum at a wavelength close to 1000 nm, which places it in the near-infrared region of the electromagnetic spectrum. This makes the compound entirely novel among those identified: we have not found any other report of G4 ligands with an absorption maximum at such long wavelengths. Porphyrins also have characteristic absorption bands. One of them appears around 400 nm and is called the Soret band. Other interesting bands in porphyrins are the satellite absorption bands (Q bands), which appear at wavelengths between 600 and 800 nm (361). In reports concerning porphyrins, however, we have not seen any bands at wavelengths close to 1000 nm.

This discussion of similarity indicates that QSAR methodologies allow identification of compounds that are novel from a structural viewpoint. When a procedure such as the one described in Chapter 6 is followed, in which QSAR and docking methodologies are combined to cover a significant amount of chemical space, the discovery of novel ligands can be relatively straightforward. Lastly, we note that certain very similar compounds, such as stereoisomers, may have completely different activity profiles.
An example is thalidomide (Figure 8.4): the R isomer produces a desirable sedative effect, whereas the S isomer produces teratogenic effects (389, 390). For this reason we consider that, regardless of structural novelty, the discovery of a compound with desirable biological properties has value in itself.

Figure 8.4. Chemical structure of R and S thalidomide.

General conclusions

Computational methodologies can be used to identify new compounds with a desired profile of activity. To our knowledge, ours is the first application of these methods to the discovery of G4-active compounds potentially active on telomerase. Computational methodologies have been widely employed to explain binding modes and in SAR studies, but have not been broadly employed in the discovery of new compounds.

QSAR modeling of congeneric datasets of telomerase inhibitors that stabilize G-quadruplex structures accurately predicted the activity of analogs of these compounds. QSAR allowed the identification of molecular descriptors directly related to the activity, which is helpful in the rational design of new ligands.

The LDA models and consensus strategies we developed allowed us to identify compounds with the desired activity profile in a database of FDA-approved drugs. The quadruplex stabilization values obtained were not particularly high because the modeled property was the inhibition of telomerase through G-quadruplex stabilization. These drugs could perhaps find a second therapeutic indication as anticancer agents.

The amitriptyline and imipramine cores are not strictly planar, because the central seven-membered rings are not aromatic. This central structure offers a new scaffold for the development of G-quadruplex stabilizers. Rational drug design based on the incorporation of side chains could lead to new, more powerful G-quadruplex ligands.

The curation of datasets, the building of computational models with non-linear techniques, and docking allowed the virtual screening of large compound databases and the identification of ligands with good G-quadruplex stabilization profiles. The complete process has a low cost and can be done in a short time. The ligands identified are structurally novel G-quadruplex binding agents.

The compounds identified in this study showed selectivity for quadruplex structures over duplex structures.

Two of the compounds identified in this study stabilize G-quadruplex structures present in oncogene promoter sequences that are likely involved in cancer progression. A multi-targeted approach using G-quadruplex stabilizers to target both telomeres and oncogene promoters should be considered for the treatment of cancer.

Although there is no direct evidence for the anticancer activity of our ligands, literature evidence suggests that the commercial drugs we identified have potential antiproliferative activity. This supports our hypothesis that it is possible to identify new compounds with this activity by applying computational methodologies.

Recommendations, perspectives, and future directions

The compounds identified in this work must be evaluated in relevant biological assays, including telomerase inhibition assays (taking care to avoid TRAP artefacts), cell-based cytotoxicity assays, and xenograft assays.

It is necessary to build a dedicated database of G-quadruplex stabilizing agents containing biological properties measured under similar experimental conditions. This will allow more reliable models with higher predictive ability to be built.

Evidence for a reduced cancer risk in people with schizophrenia, possibly related to anti-neoplastic effects of antipsychotic medications, suggests that pharmacoepidemiologic studies should be performed to determine the prevalence of cancer in patients with long-term exposure to these compounds.

More in-depth studies are needed to establish direct relationships between sequences with the potential to form G-quadruplex structures and cancer, the stabilization of these potential structures by specific compounds, and the properties of these compounds as anticancer agents.

The possible use of M1 in phototherapy against cancer and other pathologies should be explored.

List of publications related to the thesis.

Castillo-Gonzalez, D., Perez-Machado, G., Guedin, A., Mergny, J. L., & Cabrera-Perez, M. A. (2013). FDA-approved drugs selected using virtual screening bind specifically to G-quadruplex DNA. Curr Pharm Des, 19(12), 2164-2173.

Castillo-Gonzalez, D., Perez-Machado, G., Pallardo, F., Garrigues-Pelufo, T. M., & Cabrera-Perez, M. A. (2012). Computational tools in the discovery of new G-quadruplex ligands with potential anticancer activity. Curr Top Med Chem, 12(24), 2843-2856.

Castillo-Gonzalez, D., Cabrera-Perez, M. A., Perez-Gonzalez, M., Morales Helguera, A., & Duran-Martinez, A. (2009). Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure. Eur J Med Chem, 44(12), 4826-4840.

Cabrera-Pérez, M. A., Castillo-González, D., Pérez-González, M., & Durán-Martínez, A. (2009). Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Combinatorial Science, 28(5), 526-536.

Posters in international conferences.

8th Seminars of Advanced Studies on Molecular Design and Bioinformatics SEADIMB VIII. June 2011, Havana-Varadero, Cuba.
Presented: Neuroleptics as G-quadruplex agents? Employ of QSAR and experimental methodologies for the discovery of new compounds.

18th Euro QSAR, 19-24 September 2010, Rhodes, Greece.
Presented: A general QSAR methodology to predict telomerase inhibition by G-quadruplex stabilization.
Presented: Prediction of telomerase inhibitory activity for triazine derivatives based on chemical structure.

IV International Symposium on chemistry. June 2010, Santa Clara, Cuba.
Presented: Employ of cromones for the prediction of antitelomerase activity by G-quadruplex stabilization. A QSAR approach. (Oral contribution)
Presented: Berberine derivatives as G-quadruplex stabilizers agents. A QSAR perspective.

Biotecnology Habana 2009. Medical Applications of Biotechnology. November 2009, Havana, Cuba.
Presented: Telomerase inhibitors by G-quadruplex Stabilization: A QSAR approach.

7th Seminars of Advanced Studies on Molecular Design and Bioinformatics SEADIMB VII. August 2009, Havana-Varadero, Cuba.
Presented: Telomerase inhibitors by G-quadruplex Stabilization: A QSAR approach.

19 Conference of Chemistry. Universidad de Oriente. December 2008, Santiago de Cuba, Cuba.
Presented: Autocorrelations 2D descriptors of DRAGON, their employment in the prediction of the inhibitory activity over telomerase by G-quadruplex stabilization.

Second International Workshop on Bioinformatics Cuba-Flanders IWOBI’ 2008. February 2008, Santa Clara, Cuba.
Presented: Antitelomerase activity by stabilization of G-quartet: A QSAR approach employing 2D descriptors. (Oral contribution)

11th International Electronic Conference on Synthetic Organic Chemistry (ECSOC-11). 1-30 November 2007.
Presented: Telomerase inhibitory activity by stabilization of G-quartet: A QSAR approach using 2D autocorrelation descriptors.

QUITEL 2007. September 2007, Havana, Cuba.
Presented: Determination of antitelomerase activity by stabilization of G-quartet by the employ of RDF descriptors. (Oral contribution)

III International Symposium of Chemistry. June 2007, Santa Clara, Cuba.
Presented: Evaluation of antitelomerase activity by stabilization of G-quadruplex structure in acridines by employ of 1D descriptor of DRAGON.

II International conference Science technology by a sustainable development CYTDES2007. June 2007, Camaguey, Cuba.
Presented: Study of antitelomerase activity by stabilization of G-Quadruplex Structure across QSAR studies.

References.

1. Harley CB. Telomerase and cancer therapeutics. Nature Reviews Cancer. 2008;8(3):167-79. 2. Tan JH, Gu LQ, Wu JY. Design of selective G-quadruplex ligands as potential anticancer agents. Mini Rev Med Chem. 2008;8(11):1163-78. 3. Corey DR. Telomeres and telomerase: from discovery to clinical trials. Chem Biol. 2009;16(12):1219-23. Epub 2010/01/13. 4. Felsenfeld G, Rich A. Studies on the formation of two- and three-stranded polyribonucleotides. Biochim Biophys Acta. 1957;26(3):457-68. Epub 1957/12/01. 5. Gellert M, Lipsett MN, Davies DR. Helix formation by guanylic acid. Proc Natl Acad Sci U S A. 1962;48:2013-8. Epub 1962/12/15. 6. Wong A, Wu G. Selective binding of monovalent cations to the stacking G-quartet structure formed by guanosine 5'-monophosphate: a solid-state NMR study. J Am Chem Soc. 2003;125(45):13895- 905. Epub 2003/11/06. 7. Organization WH. World Health Statistics 2012. France: World Health Organization, 2012. 8. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell. 2000;100:57-70. 9. Balasubramanian BN, Kadow JF, Kramer RA, Vyas DM. Chapter 15. Recent Developments in Cancer Cytotoxics. In: James AB, editor. Annual Reports in Medicinal Chemistry: Academic Press; 1998. p. 151-62. 10. Feng J, Funk WD, Wang SS, Weinrich SL, Avilion AA, Chiu CP, et al. The RNA component of human telomerase. Science. 1995;269(5228):1236-41. 11. Riou JF, Guittat L, Mailliet P, Laoui A, Renou E, Petitgenet O, et al. Cell senescence and telomere shortening induced by a new series of specific G-quadruplex DNA ligands. Proc Natl Acad Sci U S A. 2002;99(5):2672-7. 12. Allsopp RC, Vaziri H, Patterson C, Goldstein S, Younglai EV, Futcher AB, et al. Telomere length predicts replicative capacity of human fibroblasts. Proc Natl Acad Sci U S A. 1992;89(21):10114- 8. 13. Shay JW, Wright WE. Role of telomeres and telomerase in cancer. Seminars in cancer biology. 2011;21(6):349-53. Epub 2011/10/22. 14. Xu Y, He K, Goldkorn A. 
Telomerase targeted therapy in cancer and cancer stem cells. Clin Adv Hematol Oncol. 2011;9(6):442-55. Epub 2011/08/16. 15. Agrawal A, Dang S, Gabrani R. Recent patents on anti-telomerase cancer therapy. Recent patents on anti-cancer drug discovery. 2012;7(1):102-17. Epub 2011/08/23. 16. Sprouse AA, Steding CE, Herbert BS. Pharmaceutical regulation of telomerase and its clinical potential. Journal of cellular and molecular medicine. 2012;16(1):1-7. Epub 2011/10/07. 17. Koziel JE, Fox MJ, Steding CE, Sprouse AA, Herbert BS. Medical genetics and epigenetics of telomerase. Journal of cellular and molecular medicine. 2011;15(3):457-67. Epub 2011/02/18. 18. Pascolo E, Wenz C, Lingner J, Hauel N, Priepke H, Kauffmann I, et al. Mechanism of human telomerase inhibition by BIBR1532, a synthetic, non-nucleosidic drug candidate. J Biol Chem. 2002;277(18):15566-72. 19. Barma DK, Elayadi A, Falck JR, Corey DR. Inhibition of telomerase by BIBR 1532 and related analogues. Bioorg Med Chem Lett. 2003;13(7):1333-6. 20. El-Daly H, Kull M, Zimmermann S, Pantic M, Waller CF, Martens UM. Selective cytotoxicity and telomere damage in leukemia cells using the telomerase inhibitor BIBR1532. Blood. 2005;105(4):1742-9. 21. Mueller S, Hartmann U, Mayer F, Balabanov S, Hartmann JT, Brummendorf TH, et al. Targeting telomerase activity by BIBR1532 as a therapeutic approach in germ cell tumors. Invest New Drugs. 2007;25(6):519-24. 22. GRYAZNOV S, SCHULTZ RG, inventors; Geron Corporation (Menlo Park, CA) assignee. 2'- arabino-fluorooligonucleotide N3'->P5' phosphoramidates: their synthesis and use 2013. 23. ELLER MS, YAAR M, GILCHRIST BA, inventors; Trustees Of Boston University, assignee. METHOD TO INHIBIT CELL GROWTH USING OLIGONUCLEOTIDES 2012. 24. YANG Q, HARRIS CC, inventors; The United States of America, as represented by the Secretary, Department of Health & Human Services (Washington, DC), assignee. POT1 ALTERNATIVE SPLICING VARIANTS 2013.

25. BONDAREV IE, inventor; ALT Solutions, Inc. (Wilmington, DE) assignee. Modulation of telomere length in telomerase positive cells and cancer therapy 2012. 26. YAKU H, MIYOSHI D, inventors; Panasonic Corporation (Osaka, JP) assignee. Method for inhibiting telomerase reaction using an anionic phthalocyanine compound 2012. 27. FURET P, MCCARTHY C, SCHOEPFER J, STUTZ S, inventors; Novartis AG (Basel, CH) assignee. 3-HETEROARYLMETHYL-IMIDAZO[1,2-B]PYRIDAZIN-6-YL DERIVATIVES 2013. 28. MIURA N, inventor; National University Corporation Tottori University (Tottori, JP) assignee. hTERT GENE EXPRESSION REGULATORY GENE 2012. 29. Skordalakes E, inventor; The Wistar Institute (Philadelphia, PA) assignee. TRBD-binding effectors and methods for using the same to modulate telomerase activity. US2013. 30. SKORDALAKES E, inventor; The Wistar Institute (Philadelphia, PA) assignee. Method for Identifying a Compound That Modulates Telomerase Activity 2012. 31. Buseman CM, Wright WE, Shay JW. Is telomerase a viable target in cancer? Mutation research. 2012;730(1-2):90-7. Epub 2011/08/02. 32. Gonzalez DC, Machado GP, Guedin A, Mergny JL, Cabrera-Perez MA. FDA-approved drugs selected using virtual screening bind specifically to G-quadruplex DNA. Curr Pharm Des. 2012. Epub 2012/09/29. 33. Laronze-Cochard M, Kim YM, Brassart B, Riou JF, Laronze JY, Sapi J. Synthesis and biological evaluation of novel 4,5-bis(dialkylaminoalkyl)-substituted acridines as potent telomeric G-quadruplex ligands. Eur J Med Chem. 2009;44(10):3880-8. Epub 2009/05/27. 34. Gunaratnam M, Neidle S. An evaluation cascade for G-quadruplex telomere targeting agents in human cancer cells. Methods Mol Biol. 2010;613:303-13. Epub 2009/12/10. 35. Moore MJ, Schultes CM, Cuesta J, Cuenca F, Gunaratnam M, Tanious FA, et al. Trisubstituted acridines as G-quadruplex telomere targeting agents. Effects of extensions of the 3,6- and 9-side chains on quadruplex binding, telomerase activity, and cell proliferation. J Med Chem. 
2006;49(2):582-99. 36. Incles CM, Schultes CM, Kempski H, Koehler H, Kelland LR, Neidle S. A G-quadruplex telomere targeting agent produces p16-associated senescence and chromosomal fusions in human prostate cancer cells. Mol Cancer Ther. 2004;3(10):1201-6. 37. Verma A, Halder K, Halder R, Yadav VK, Rawal P, Thakur RK, et al. Genome-wide computational and expression analyses reveal G-quadruplex DNA motifs as conserved cis-regulatory elements in human and related species. J Med Chem. 2008;51(18):5641-9. 38. Cogoi S, Paramasivam M, Filichev V, Geci I, Pedersen EB, Xodo LE. Identification of a new G- quadruplex motif in the KRAS promoter and design of pyrene-modified G4-decoys with antiproliferative activity in pancreatic cancer cells. J Med Chem. 2009;52(2):564-8. Epub 2008/12/23. 39. Lemarteleur T, Gomez D, Paterski R, Mandine E, Mailliet P, Riou JF. Stabilization of the c-myc gene promoter quadruplex by specific ligands' inhibitors of telomerase. Biochem Biophys Res Commun. 2004;323(3):802-8. 40. Lim KW, Lacroix L, Yue DJ, Lim JK, Lim JM, Phan AT. Coexistence of two distinct G- quadruplex conformations in the hTERT promoter. J Am Chem Soc. 2010;132(35):12331-42. Epub 2010/08/14. 41. Siddiqui-Jain A, Grand CL, Bearss DJ, Hurley LH. Direct evidence for a G-quadruplex in a promoter region and its targeting with a small molecule to repress c-MYC transcription. Proc Natl Acad Sci U S A. 2002;99(18):11593-8. 42. Lafferty-Whyte K, Bilsland A, Hoare SF, Burns S, Zaffaroni N, Cairney CJ, et al. TCEAL7 inhibition of c-Myc activity in alternative lengthening of telomeres regulates hTERT expression. Neoplasia. 2010;12(5):405-14. Epub 2010/05/11. 43. Micheli E, Martufi M, Cacchione S, De Santis P, Savino M. Self-organization of G-quadruplex structures in the hTERT core promoter stabilized by polyaminic side chain perylene derivatives. Biophys Chem. 2010;153(1):43-53. Epub 2010/11/03. 44. Zhou Q, Li L, Xiang J, Sun H, Tang Y. 
Fast screening and structural elucidation of G- quadruplex ligands from a mixture via G-quadruplex recognition and NMR methods. Biochimie. 2008. 45. Zhou Q, Li L, Xiang J, Tang Y, Zhang H, Yang S, et al. Screening potential antitumor agents from natural plant extracts by G-quadruplex recognition and NMR methods. Angew Chem Int Ed Engl. 2008;47(30):5590-2. 46. Sheridan RP, Venkataraghavan R. New methods in computer-aided drug design. Accounts of Chemical Research. 1987;20(9):322-9. 47. Gago FB. Métodos computacionales de modelado molecular y diseño de fármacos. Universidad de Alcalá de Henares ed. Madrid2002.

48. Crippen GM. Quantitative structure-activity relationships by distance geometry: systematic analysis of dihydrofolate reductase inhibitors. J Med Chem. 1980;23:599-606. 49. Good AC, So SS, Richards WG. Structure-activity relationships from molecular similarity matrices. J Med Chem. 1993;36(4):433-8. Epub 1993/02/19. 50. Khlebnikov A, Schepetkin I, Kwon BS. Modeling of the anticancer action for radical derivatives of nitroazoles: quantitative structure-activity relationship (QSAR) study. Cancer biotherapy & radiopharmaceuticals. 2002;17(2):193-203. Epub 2002/05/28. 51. Xiao Z Fau - Xiao Y-D, Xiao Yd Fau - Feng J, Feng J Fau - Golbraikh A, Golbraikh A Fau - Tropsha A, Tropsha A Fau - Lee K-H, Lee KH. Antitumor agents. 213. Modeling of epipodophyllotoxin derivatives using variable selection k nearest neighbor QSAR method. 2002(0022-2623 (Print)). 52. Todeschini R, Consonni V. Handbook of Molecular Descriptors. 1. Edition ed: Wiley-VCH, Mannheim 2000. 667 p. 53. Hansch C, Muir RM, Fujita T, Maloney PP, Geiger F, Streich M. The Correlation of Biological Activity of Plant Growth and Choromycetin Derivatives with Hammet Constants and Partition Coefficients. J Am Chem Soc. 1963;85:2817 - 24. 54. Free SM, Wilson JW. A mathematical contribution to structure-activity studies. J Med Chem. 1964( 7):395 - 9. 55. Wells PR. Linear Free Energy Relationships. New York: Academic Press; 1968. 1324 p. 56. Berson JA, Hamlet Z, Mueller WA. The correlation of solvent effects on the stereoselectivities of diels-alder reactions by means of linear free energy relationships. A new empirical measure of solvent polarity. . J Am Chem Soc. 1962;84(2):297 - 304. 57. Tropsha A. Best Practices for QSAR Model Development, Validation, and Exploitation. Molecular Informatics. 2010;29(6-7):476-88. 58. Fourches D, Muratov E, Tropsha A. Trust, But verify: On the Importance of Chemical Sctructure Curation in Cheminformatics and QSAR Modeling Research. J Chem Inf Model. 2010;50:1189-204. 59. 
Watson JD, Crick FH. Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature. 1953;171(4356):737-8. Epub 1953/04/25. 60. Franklin RE, Gosling RG. Molecular configuration in sodium thymonucleate. Nature. 1953;171(4356):740-1. Epub 1953/04/25. 61. Donohue J. HYDROGEN-BONDED HELICAL CONFIGURATIONS OF POLYNUCLEOTIDES. Proc Natl Acad Sci U S A. 1956;42(2):60-5. Epub 1956/02/01. 62. Hoogsteen K. The structure of crystals containing a hydrogen-bonded complex of 1- methylthymine and 9-methyladenine. Acta Crystallographica. 1959;12(10):822-3. 63. Hoogsteen K. The crystal and molecular structure of a hydrogen-bonded complex between 1- methylthymine and 9-methyladenine. Acta Crystallographica. 1963;16(9):907-16. 64. Pinnavaia TJ, Marshall CL, Mettler CM, Fisk CL, Miles HT, Becker ED. Alkali metal ion specificity in the solution ordering of a nucleotide, 5'-guanosine monophosphate. Journal of the American Chemical Society. 1978;100(11):3625-7. 65. Wong A, Ida R, Spindler L, Wu G. Disodium guanosine 5'-monophosphate self-associates into nanoscale cylinders at pH 8: a combined diffusion NMR spectroscopy and dynamic light scattering study. J Am Chem Soc. 2005;127(19):6990-8. Epub 2005/05/12. 66. Creze C, Rinaldi B, Haser R, Bouvet P, Gouet P. Structure of a d(TGGGGT) quadruplex crystallized in the presence of Li+ ions. Acta Crystallogr D Biol Crystallogr. 2007;63(Pt 6):682-8. Epub 2007/05/17. 67. Yaku H, Fujimoto T, Murashima T, Miyoshi D, Sugimoto N. Phthalocyanines: a new class of G- quadruplex-ligands with many potential applications. Chemical communications (Cambridge, England). 2012;48(50):6203-16. Epub 2012/05/17. 68. Harley CB, Kim NW, Prowse KR, Weinrich SL, Hirsch KS, West MD, et al. Telomerase, cell immortality, and cancer. Cold Spring Harb Symp Quant Biol. 1994;59:307-15. 69. Rhyu MS. Telomeres, telomerase, and immortality. J Natl Cancer Inst. 1995;87(12):884-94. 70. Blackburn EH. Structure and function of telomeres. Nature. 
1991;350(6319):569-73. 71. Harley CB. Telomere loss: mitotic clock or genetic time bomb? Mutation research. 1991;256(2- 6):271-82. 72. Broccoli D, Smogorzewska A, Chong L, de Lange T. Human telomeres contain two distinct Myb-related proteins, TRF1 and TRF2. Nat Genet. 1997;17(2):231-5. 73. van Steensel B, de Lange T. Control of telomere length by the human telomeric protein TRF1. Nature. 1997;385(6618):740-3.

74. Mason M, Schuller A, Skordalakes E. Telomerase structure function. Curr Opin Struct Biol. 2011;21(1):92-100. Epub 2010/12/21. 75. Blackburn EH. Telomeres. Trends Biochem Sci. 1991;16(10):378-81. 76. Morin GB. The human telomere terminal transferase enzyme is a ribonucleoprotein that synthesizes TTAGGG repeats. Cell. 1989;59(3):521-9. 77. Wojtyla A, Gladych M, Rubis B. Human telomerase activity regulation. Mol Biol Rep. 2011;38(5):3339-49. Epub 2010/11/19. 78. Tsai YL, Tseng SF, Chang SH, Lin CC, Teng SC. Involvement of replicative polymerases, Tel1p, Mec1p, Cdc13p, and the Ku complex in telomere-telomere recombination. Mol Cell Biol. 2002;22(16):5679-87. 79. Espejel S, Franco S, Sgura A, Gae D, Bailey SM, Taccioli GE, et al. Functional interaction between DNA-PKcs and telomerase in telomere length maintenance. Embo J. 2002;21(22):6275-87. 80. Harrington L, Zhou W, McPhail T, Oulton R, Yeung DS, Mar V, et al. Human telomerase contains evolutionarily conserved catalytic and structural subunits. Genes Dev. 1997;11(23):3109-15. 81. Evans SK, Lundblad V. Positive and negative regulation of telomerase access to the telomere. J Cell Sci. 2000;113 Pt 19:3357-64. 82. Beattie TL, Zhou W, Robinson MO, Harrington L. Functional multimerization of the human telomerase reverse transcriptase. Mol Cell Biol. 2001;21(18):6151-60. 83. Arai K, Masutomi K, Khurts S, Kaneko S, Kobayashi K, Murakami S. Two independent regions of human telomerase reverse transcriptase are important for its oligomerization and telomerase activity. J Biol Chem. 2002;277(10):8538-44. 84. Wenz C, Enenkel B, Amacker M, Kelleher C, Damm K, Lingner J. Human telomerase contains two cooperating telomerase RNA molecules. Embo J. 2001;20(13):3526-34. 85. Drosopoulos WC, Prasad VR. The telomerase-specific T motif is a restrictive determinant of repetitive reverse transcription by human telomerase. Mol Cell Biol. 2010;30(2):447-59. Epub 2009/11/18. 86. DeZwaan DC, Freeman BC. 
Is there a telomere-bound 'EST' telomerase holoenzyme? Cell Cycle. 2010;9(10):1913-7. Epub 2010/05/04. 87. QIAGEN. Sample & Assay Technologies. GeneGlobe Pathways. 2013 [cited 2013 January 14]; Available from: https://www.qiagen.com/geneglobe/pathwayview.aspx?pathwayID=430 . 88. Shippen-Lentz D, Blackburn EH. Functional evidence for an RNA template in telomerase. Science. 1990;247(4942):546-52. 89. Nilsson P, Mehle C, Remes K, Roos G. Telomerase activity in vivo in human malignant hematopoietic cells. Oncogene. 1994;9(10):3043-8. 90. Nagele RG, Velasco AQ, Anderson WJ, McMahon DJ, Thomson Z, Fazekas J, et al. Telomere associations in interphase nuclei: possible role in maintenance of interphase chromosome topology. J Cell Sci. 2001;114(Pt 2):377-88. 91. Chai W, Ford LP, Lenertz L, Wright WE, Shay JW. Human Ku70/80 associates physically with telomerase through interaction with hTERT. J Biol Chem. 2002;277(49):47242-7. 92. Hahn WC. Role of telomeres and telomerase in the pathogenesis of human cancer. J Clin Oncol. 2003;21(10):2034-43. 93. Kim NW, Piatyszek MA, Prowse KR, Harley CB, West MD, Ho PL, et al. Specific association of human telomerase activity with immortal cells and cancer. Science. 1994;266(5193):2011-5. 94. Shay JW. Toward identifying a cellular determinant of telomerase repression. J Natl Cancer Inst. 1999;91(1):4-6. 95. Scatena R, Bottoni P, Pontoglio A, Giardina B. The proteomics of cancer stem cells. Potential clinical applications for innovative research in oncology. Proteomics Clin Appl. 2011. Epub 2011/10/01. 96. Ouellette MM, Wright WE, Shay JW. Targeting telomerase-expressing cancer cells. Journal of cellular and molecular medicine. 2011;15(7):1433-42. Epub 2011/02/22. 97. Granger MP, Wright WE, Shay JW. Telomerase in cancer and aging. Crit Rev Oncol Hematol. 2002;41(1):29-40. 98. Shay JW. Aging and cancer: are telomeres and telomerase the connection? Mol Med Today. 1995;1(8):378-84. 99. Reddel RR.
Senescence: an antiviral defense that is tumor suppressive? Carcinogenesis. 2010;31(1):19-26. Epub 2009/11/06. 100. Ozturk M, Arslan-Ergul A, Bagislar S, Senturk S, Yuzugullu H. Senescence and immortality in hepatocellular carcinoma. Cancer Lett. 2009;286(1):103-13. Epub 2008/12/17.


101. Hastie ND, Dempster M, Dunlop MG, Thompson AM, Green DK, Allshire RC. Telomere reduction in human colorectal carcinoma and with ageing. Nature. 1990;346(6287):866-8. 102. Carlosn G. Gene VIII: Pearson Prentice Hall; 2004. 103. Brassat U, Balabanov S, Bali D, Dierlamm J, Braig M, Hartmann U, et al. Functional p53 is required for effective execution of telomerase inhibition in BCR-ABL-positive CML cells. Exp Hematol. 2011;39(1):66-76 e1-2. Epub 2010/10/14. 104. Marion RM, Strati K, Li H, Murga M, Blanco R, Ortega S, et al. A p53-mediated DNA damage response limits reprogramming to ensure iPS cell genomic integrity. Nature. 2009;460(7259):1149-53. Epub 2009/08/12. 105. Harley CB, Futcher AB, Greider CW. Telomeres shorten during ageing of human fibroblasts. Nature. 1990;345(6274):458-60. 106. Counter CM, Avilion AA, LeFeuvre CE, Stewart NG, Greider CW, Harley CB, et al. Telomere shortening associated with chromosome instability is arrested in immortal cells which express telomerase activity. Embo J. 1992;11(5):1921-9. 107. Mendez-Bermudez A, Hills M, Pickett HA, Phan AT, Mergny JL, Riou JF, et al. Human telomeres that contain (CTAGGG)n repeats show replication dependent instability in somatic cells and the male germline. Nucleic Acids Res. 2009;37(18):6225-38. Epub 2009/08/07. 108. Viidik A. [The long way towards selling telomere lengths?]. Lakartidningen. 2011;108(24-25):1288-9. Epub 2011/08/13. Den langa vagen fram till forsaljning av telomerlangder. 109. Mather KA, Jorm AF, Parslow RA, Christensen H. Is telomere length a biomarker of aging? A review. J Gerontol A Biol Sci Med Sci. 2011;66(2):202-13. Epub 2010/10/30. 110. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646-74. Epub 2011/03/08. 111. Ries L, Eisner M, Kosary C, Hankey BF, Miller BA, Clegg L, et al. SEER Cancer Statistics Review, 1973-1998, Table II-3. National Cancer Institute, 2001. 112.
SEER Cancer Statistics Review 1975-2009 National Cancer Institute; 2009 [cited 2013 April 11]; Available from: http://seer.cancer.gov/csr/1975_2009_pops09/browse_csr.php . 113. Vaziri H, Schachter F, Uchida I, Wei L, Zhu X, Effros R, et al. Loss of telomeric DNA during aging of normal and trisomy 21 human lymphocytes. American journal of human genetics. 1993;52(4):661-7. Epub 1993/04/01. 114. Vulliamy TJ, Kirwan MJ, Beswick R, Hossain U, Baqai C, Ratcliffe A, et al. Differences in disease severity but similar telomere lengths in genetic subgroups of patients with telomerase and shelterin mutations. PLoS One. 2011;6(9):e24383. Epub 2011/09/21. 115. Nan H, Qureshi AA, Prescott J, De Vivo I, Han J. Genetic variants in telomere-maintaining genes and skin cancer risk. Hum Genet. 2011;129(3):247-53. Epub 2010/12/01. 116. Brennan P, Hainaut P, Boffetta P. Genetics of lung-cancer susceptibility. Lancet Oncol. 2011;12(4):399-408. Epub 2010/10/19. 117. Alexiou GA, Markoula S, Gogou P, Kyritsis AP. Genetic and molecular alterations in meningiomas. Clin Neurol Neurosurg. 2011;113(4):261-7. Epub 2011/01/14. 118. Neidle S, Balasubramanian S, editors. Quadruplex Nucleic Acids. Cambridge CB4 0WF, UK: The Royal Society of Chemistry; 2006. 119. Alder JK, Cogan JD, Brown AF, Anderson CJ, Lawson WE, Lansdorp PM, et al. Ancestral mutation in telomerase causes defects in repeat addition processivity and manifests as familial pulmonary fibrosis. PLoS genetics. 2011;7(3):e1001352. Epub 2011/04/13. 120. Morin GB. The implications of telomerase biochemistry for human disease. Eur J Cancer. 1997;33(5):750-60. 121. Young NS. Telomere Biology and Telomere Diseases: Implications for Practice and Research. ASH Education Program Book. 2010;2010(1):30-5. 122. Codd V, Nelson CP, Albrecht E, Mangino M, Deelen J, Buxton JL, et al. Identification of seven loci affecting mean telomere length and their association with disease. Nat Genet. 2013;45(4):422-7. Epub 2013/03/29. 123. 
Bryan TM, Englezou A, Dalla-Pozza L, Dunham MA, Reddel RR. Evidence for an alternative mechanism for maintaining telomere length in human tumors and tumor-derived cell lines. Nat Med. 1997;3(11):1271-4. 124. Bryan TM, Englezou A, Gupta J, Bacchetti S, Reddel RR. Telomere elongation in immortal human cells without detectable telomerase activity. Embo J. 1995;14(17):4240-8.


125. Bryan TM, Marusic L, Bacchetti S, Namba M, Reddel RR. The telomere lengthening mechanism in telomerase-negative immortal human cells does not involve the telomerase RNA subunit. Hum Mol Genet. 1997;6(6):921-6. 126. O'Hare TH, Delany ME. Molecular and Cellular Evidence for the Alternative Lengthening of Telomeres (ALT) Mechanism in Chicken. Cytogenet Genome Res. 2011;135(1):65-78. Epub 2011/08/09. 127. Tilman G, Loriot A, Van Beneden A, Arnoult N, Londono-Vallejo JA, De Smet C, et al. Subtelomeric DNA hypomethylation is not required for telomeric sister chromatid exchanges in ALT cells. Oncogene. 2009;28(14):1682-93. Epub 2009/03/03. 128. Wang Y, Meeker AK, Kowalski J, Tsai HL, Somervell H, Heaphy C, et al. Telomere length is related to alternative splice patterns of telomerase in thyroid tumors. Am J Pathol. 2011;179(3):1415-24. Epub 2011/07/19. 129. Venturini L, Daidone MG, Motta R, Collini P, Spreafico F, Terenziani M, et al. Telomere maintenance in Wilms tumors: first evidence for the presence of alternative lengthening of telomeres mechanism. Genes Chromosomes Cancer. 2011;50(10):823-9. Epub 2011/07/20. 130. Nabetani A, Ishikawa F. Alternative lengthening of telomeres pathway: recombination-mediated telomere maintenance mechanism in human cells. J Biochem. 2011;149(1):5-14. Epub 2010/10/13. 131. Yoon AR, Gao R, Kaul Z, Choi IK, Ryu J, Noble JR, et al. MicroRNA-296 is enriched in cancer cells and downregulates p21WAF1 mRNA expression via interaction with its 3' untranslated region. Nucleic Acids Res. 2011;39(18):8078-91. Epub 2011/07/05. 132. Randall A, Griffith JD. Structure of long telomeric RNA transcripts: the G-rich RNA forms a compact repeating structure containing G-quartets. J Biol Chem. 2009;284(21):13980-6. Epub 2009/03/31. 133. Hsu ST, Varnai P, Bugaut A, Reszka AP, Neidle S, Balasubramanian S. A G-rich sequence within the c-kit oncogene promoter forms a parallel G-quadruplex having asymmetric G-tetrad dynamics. J Am Chem Soc. 
2009;131(37):13399-409. Epub 2009/08/27. 134. Del Toro M, Bucek P, Avino A, Jaumot J, Gonzalez C, Eritja R, et al. Targeting the G-quadruplex-forming region near the P1 promoter in the human BCL-2 gene with the cationic porphyrin TMPyP4 and with the complementary C-rich strand. Biochimie. 2009;91(7):894-902. Epub 2009/04/30. 135. Maizels N. Immunoglobulin gene diversification. Annual review of genetics. 2005;39:23-46. Epub 2005/11/16. 136. Murray J, Buard J, Neil DL, Yeramian E, Tamaki K, Hollies C, et al. Comparative sequence analysis of human minisatellites showing meiotic repeat instability. Genome research. 1999;9(2):130-6. Epub 1999/02/19. 137. Piazza A, Serero A, Boule JB, Legoix-Ne P, Lopes J, Nicolas A. Stimulation of gross chromosomal rearrangements by the human CEB1 and CEB25 minisatellites in Saccharomyces cerevisiae depends on G-quadruplexes or Cdc13. PLoS genetics. 2012;8(11):e1003033. Epub 2012/11/08. 138. Litt M, Buroker NE, inventors; DNA probe which reveals a hypervariable region on human chromosome 1. USA; 1992. 139. Alonso S, Armour JA. A highly variable segment of human subterminal 16p reveals a history of population growth for modern humans outside Africa. Proc Natl Acad Sci U S A. 2001;98(3):864-9. Epub 2001/02/07. 140. Balasubramanian S, Neidle S. G-quadruplex nucleic acids as therapeutic targets. Curr Opin Chem Biol. 2009;13(3):345-53. Epub 2009/06/12. 141. Chen Z, Xiu MH, Li SF, Xu M. [The biological functions of G-quadruplex]. Sheng Li Ke Xue Jin Zhan. 2010;41(5):329-34. Epub 2011/03/23. 142. Rezler EM, Bearss DJ, Hurley LH. Telomere inhibition and telomere disruption as processes for drug targeting. Annual review of pharmacology and toxicology. 2003;43:359-79. Epub 2003/01/24. 143. Yalvac ME, Yilmaz A, Mercan D, Aydin S, Dogan A, Arslan A, et al. Differentiation and Neuro-Protective Properties of Immortalized Human Tooth Germ Stem Cells. Neurochem Res. 2011. Epub 2011/07/26. 144.
Baird DM, Britt-Compton B, Rowson J, Amso NN, Gregory L, Kipling D. Telomere instability in the male germline. Hum Mol Genet. 2006;15(1):45-51. 145. Xie X, Hiona A, Lee AS, Cao F, Huang M, Li Z, et al. Effects of long-term culture on human embryonic stem cell aging. Stem Cells Dev. 2011;20(1):127-38. Epub 2010/07/16. 146. Shay JW, Wright WE. Telomeres and telomerase in normal and cancer stem cells. FEBS Lett. 2010;584(17):3819-25. Epub 2010/05/25. 147. Mergny JL, Mailliet P, Lavelle F, Riou JF, Laoui A, Helene C. The development of telomerase inhibitors: the G-quartet approach. Anticancer Drug Des. 1999;14(4):327-39.


148. Duan W, Rangan A, Vankayalapati H, Kim MY, Zeng Q, Sun D, et al. Design and synthesis of fluoroquinophenoxazines that interact with human telomeric G-quadruplexes and their biological effects. Mol Cancer Ther. 2001;1(2):103-20. 149. Kim JH, Lee GE, Kim JC, Lee JH, Chung IK. A novel telomere elongation in an adriamycin-resistant stomach cancer cell line with decreased telomerase activity. Mol Cells. 2002;13(2):228-36. 150. Kim MY, Vankayalapati H, Shin-Ya K, Wierzba K, Hurley LH. Telomestatin, a potent telomerase inhibitor that interacts quite specifically with the human telomeric intramolecular g-quadruplex. J Am Chem Soc. 2002;124(10):2098-9. 151. Read M, Harrison RJ, Romagnoli B, Tanious FA, Gowan SH, Reszka AP, et al. Structure-based design of selective and potent G quadruplex-mediated telomerase inhibitors. Proc Natl Acad Sci U S A. 2001;98(9):4844-9. 152. Koeppel F, Riou JF, Laoui A, Mailliet P, Arimondo PB, Labit D, et al. Ethidium derivatives bind to G-quartets, inhibit telomerase and act as fluorescent probes for quadruplexes. Nucleic Acids Res. 2001;29(5):1087-96. 153. Gowan SM, Heald R, Stevens MF, Kelland LR. Potent inhibition of telomerase by small-molecule pentacyclic acridines capable of interacting with G-quadruplexes. Mol Pharmacol. 2001;60(5):981-8. 154. Wu WB, Chen SH, Hou JQ, Tan JH, Ou TM, Huang SL, et al. Disubstituted 2-phenyl-benzopyranopyrimidine derivatives as a new type of highly selective ligands for telomeric G-quadruplex DNA. Org Biomol Chem. 2011;9(8):2975-86. Epub 2011/03/05. 155. Peduto A, Pagano B, Petronzi C, Massa A, Esposito V, Virgilio A, et al. Design, synthesis, biophysical and biological studies of trisubstituted naphthalimides as G-quadruplex ligands. Bioorg Med Chem. 2011. Epub 2011/09/29. 156. Manet I, Manoli F, Donzello MP, Ercolani C, Vittori D, Cellai L, et al. Tetra-2,3-pyrazinoporphyrazines with externally appended pyridine rings. 10.
A water-soluble bimetallic (Zn(II)/Pt(II)) porphyrazine hexacation as potential plurimodal agent for cancer therapy: exploring the behavior as ligand of telomeric DNA G-quadruplex structures. Inorg Chem. 2011;50(16):7403-11. Epub 2011/07/21. 157. Kaluzhny DN, Shchyolkina AK, Ilyinsky NS, Borisova OF, Shtil AA. Novel Indolocarbazole Derivative 12-(alpha-L-arabinopyranosyl)indolo[2,3-a]pyrrolo[3,4-c]carbazole-5,7-dione Is a Preferred c-Myc Guanine Quadruplex Ligand. J Nucleic Acids. 2011;2011:184735. Epub 2011/07/21. 158. Chen SB, Tan JH, Ou TM, Huang SL, An LK, Luo HB, et al. Pharmacophore-based discovery of triaryl-substituted imidazole as new telomeric G-quadruplex ligand. Bioorg Med Chem Lett. 2011;21(3):1004-9. Epub 2011/01/08. 159. Rzuczek SG, Pilch DS, Liu A, Liu L, LaVoie EJ, Rice JE. Macrocyclic pyridyl polyoxazoles: selective RNA and DNA G-quadruplex ligands as antitumor agents. J Med Chem. 2010;53(9):3632-44. Epub 2010/04/03. 160. Jiang YL, Liu ZP. Metallo-organic G-quadruplex ligands in anticancer drug design. Mini Rev Med Chem. 2010;10(8):726-36. Epub 2010/04/24. 161. Bhattacharya S, Chaudhuri P, Jain AK, Paul A. Symmetrical bisbenzimidazoles with benzenediyl spacer: the role of the shape of the ligand on the stabilization and structural alterations in telomeric G-quadruplex DNA and telomerase inhibition. Bioconjug Chem. 2010;21(7):1148-59. Epub 2010/06/12. 162. Izbicka E, Wheelhouse RT, Raymond E, Davidson KK, Lawrence RA, Sun D, et al. Effects of cationic porphyrins as G-quadruplex interactive agents in human tumor cells. Cancer Res. 1999;59(3):639-44. 163. Shin-ya K, Wierzba K, Matsuo K, Ohtani T, Yamada Y, Furihata K, et al. Telomestatin, a novel telomerase inhibitor from Streptomyces anulatus. J Am Chem Soc. 2001;123(6):1262-3. 164. Neidle S, Read MA. G-quadruplexes as therapeutic targets. Biopolymers. 2001;56(3):195-208. 165. Rossetti L, Franceschin M, Bianco A, Ortaggi G, Savino M.
Perylene diimides with different side chains are selective in inducing different G-quadruplex DNA structures and in inhibiting telomerase. Bioorg Med Chem Lett. 2002;12(18):2527-33. 166. Wright WE, Shay JW, Piatyszek MA. Modifications of a telomeric repeat amplification protocol (TRAP) result in increased reliability, linearity and sensitivity. Nucleic Acids Res. 1995;23(18):3794-5. 167. Savoysky E, Akamatsu K, Tsuchiya M, Yamazaki T. Detection of telomerase activity by combination of TRAP method and scintillation proximity assay (SPA). Nucleic Acids Res. 1996;24(6):1175-6.


168. Shi DF, Wheelhouse RT, Sun D, Hurley LH. Quadruplex-interactive agents as telomerase inhibitors: synthesis of porphyrins and structure-activity relationship for the inhibition of telomerase. J Med Chem. 2001;44(26):4509-23. 169. Gomez D, Mergny JL, Riou JF. Detection of telomerase inhibitors based on g-quadruplex ligands by a modified telomeric repeat amplification protocol assay. Cancer Res. 2002;62(12):3365-8. 170. Reed J, Gunaratnam M, Beltran M, Reszka AP, Vilar R, Neidle S. TRAP-LIG, a modified telomere repeat amplification protocol assay to quantitate telomerase inhibition by small molecules. Anal Biochem. 2008;380(1):99-105. 171. De Cian A, Guittat L, Kaiser M, Sacca B, Amrane S, Bourdoncle A, et al. Fluorescence-based melting assays for studying quadruplex ligands. Methods. 2007;42(2):183-95. 172. Schultes CM, Guyen B, Cuesta J, Neidle S. Synthesis, biophysical and biological evaluation of 3,6-bis-amidoacridines with extended 9-anilino substituents as potent G-quadruplex-binding telomerase inhibitors. Bioorg Med Chem Lett. 2004;14(16):4347-51. 173. Sissi C, Lucatello L, Paul Krapcho A, Maloney DJ, Boxer MB, Camarasa MV, et al. Tri-, tetra- and heptacyclic perylene analogues as new potential antineoplastic agents based on DNA telomerase inhibition. Bioorg Med Chem. 2007;15(1):555-62. 174. Guyen B, Schultes CM, Hazel P, Mann J, Neidle S. Synthesis and evaluation of analogues of 10H-indolo[3,2-b]quinoline as G-quadruplex stabilising ligands and potential inhibitors of the enzyme telomerase. Org Biomol Chem. 2004;2(7):981-8. 175. Alberti P, Schmitt P, Nguyen CH, Rivalle C, Hoarau M, Grierson DS, et al. Benzoindoloquinolines interact with DNA tetraplexes and inhibit telomerase. Bioorg Med Chem Lett. 2002;12(7):1071-4. 176. Mergny JL, Lacroix L, Teulade-Fichou MP, Hounsou C, Guittat L, Hoarau M, et al. Telomerase inhibitors based on quadruplex ligands selected by a fluorescence assay. Proc Natl Acad Sci U S A. 2001;98(6):3062-7. 177. 
Balasubramanian S, Hurley LH, Neidle S. Targeting G-quadruplexes in gene promoters: a novel anticancer strategy? Nat Rev Drug Discov. 2011;10(4):261-75. Epub 2011/04/02. 178. Dearden JC. Physico-Chemical Descriptors. In Practical Applications of QSAR in Environmental Chemistry and Toxicology. Karcher W, Devillers J, editors. Dordrecht (Netherlands): Kluwer; 1990. 520 p. 179. Newman MS. Some Observations Concerning Steric Factors. J Am Chem Soc. 1950;72:4783-6. 180. Idoux JP, Hwang PTR, Hancock CK. Study of Alkaline Hydrolysis and NMR Spectra of Some Thiol Esters. J Org Chem. 1973;38:4239-43. 181. Okey RW, Stensel HD. A QSAR-based Biodegradability Model. A QSBR. Water Res. 1996;30:2206-14. 182. Galvez J, Garcia-Domenech R, de Gregorio Alapont C, de Julian-Ortiz JV, Popa L. Pharmacological distribution diagrams: a tool for de novo drug design. J Mol Graph. 1996;14(5):272-6. 183. Rios-Santamarina I, Garcia-Domenech R, Galvez J, Cortijo J, Santamaria P, Morcillo E. New bronchodilators selected by molecular topology. Bioorg Med Chem Lett. 1998;8(5):477-82. 184. Bonchev D, Meyekan O. Comparability Graphs and Electronic Spectra of Condensed Benzenoid Hydrocarbon. Chem Phys Lett. 1983;98:134-8. 185. Gutman I, Ruščić B, Trinajstic N, Wilcox JCF. Graph theory and molecular orbitals. XII. Acyclic polyenes. The Journal of Chemical Physics. 1975;62(9):3399-405. 186. Todeschini R, Consonni V, Pavan M. DRAGON (Software for Molecular Descriptor Calculation). Version 6.0 ed. Italy: http://www.talete.mi.it/; 2010. 187. Hemmer MC, Steinhauer V, Gasteiger J. Deriving the 3D structure of organic molecules from their infrared spectra. Vibrational Spectroscopy. 1999;19(1):151-64. 188. Todeschini R, Gramatica P. 3D-Modelling and prediction by WHIM descriptors. Part 5. Theory development and chemical meaning of WHIM descriptors. Quant Struct-Act Relat. 1997;16:113-9. 189. Todeschini R, Gramatica P. 3D-Modelling and prediction by WHIM descriptors. Part 6.
Application of WHIM descriptors in QSAR studies. Quant Struct-Act Relat. 1997;16:120-5. 190. Todeschini R, Consonni V, Pavan M, Mauri A, Ballabio D, Manganaro A. An introduction to molecular descriptors and QSAR. Iran; February 2009. 191. Albuquerque MG, Hopfinger AJ, Barreiro EJ, de Alencastro RB. Four-dimensional quantitative structure-activity relationship analysis of a series of interphenylene 7-oxabicycloheptane oxazole thromboxane A2 receptor antagonists. J Chem Inf Comput Sci. 1998;38:925-38. 192. Everitt BS, Landau S, Leese M. Cluster analysis. 4th ed: A Hodder Arnold Publication; 2001.


193. Hansch C, Grieco C, Silipo C, Vittoria A. Quantitative structure-activity relationship of chymotrypsin-ligand interactions. J Med Chem. 1977;20(11):1420-35. 194. Fukunaga K. Introduction to Statistical Pattern Recognition. San Diego (CA): Academic Press; 1990. 195. Frank IE, Todeschini R. The Data Analysis Handbook. Amsterdam (The Netherlands): Elsevier; 1994. 366 p. 196. Draper N, Smith H. Applied Regression Analysis. New York (NY): Wiley; 1998. 197. Goldberg DE. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley; 1989. 198. Leardi R, Boggia R, Terrile M. Genetic algorithms as a strategy for feature selection. Journal of Chemometrics. 1992;6(5):267-81. 199. Saxena AK, Prathipati P. Comparison of MLR, PLS and GA-MLR in QSAR analysis. SAR QSAR Environ Res. 2003;14(5-6):433-45. 200. Lavine BK, Davidson CE, Breneman C, Katt W. Electronic van der Waals surface property descriptors and genetic algorithms for developing structure-activity correlations in olfactory databases. J Chem Inf Comput Sci. 2003;43(6):1890-905. 201. Agatonovic-Kustrin S, Evans A, Alany RG. Prediction of corneal permeability using artificial neural networks. Die Pharmazie. 2003;58(10):725-9. 202. Myers RH. Classical and Modern Regressions with Applications. Boston (MA): Duxbury Press; 1986. 203. Clark M, Cramer RD. The Probability of Chance Correlation Using Partial Least Squares (PLS). Quant Struct-Act Relat. 1993;12:137-45. 204. Osten DW. Selection of Optimal Regression Models via Cross Validation. J Chemom. 1988;2:39. 205. Todeschini R, Moro G, Boggia R, Bonati L, Cosentino U, Lasagni M, et al. Modeling and Prediction of Molecular Properties. Theory of Grid-Weighted Holistic Invariant Molecular (GM-WHIM) descriptors. Chemom Intell Lab Syst. 1997;36:65-73. 206. Efron B. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia (PA): Society for Industrial and Applied Mathematics; 1982. 92 p. 207.
Lindgren F, Hansen B, Karcher W, Sjostrom M, Eriksson L. Model Validation by Permutation Tests: Applications to Variable Selection. J Chemom. 1996;10:521-32. 208. Kim J, Wess J, van Rhee AM, Schoneberg T, Jacobson KA. Site-directed mutagenesis identifies residues involved in ligand recognition in the human A2a adenosine receptor. J Biol Chem. 1995;270(23):13987-97. 209. Winkler DA. Neural networks as robust tools in drug lead discovery and development. Mol Biotechnol. 2004;27(2):139-68. 210. McElroy NR, Jurs PC, Morisseau C, Hammock BD. QSAR and classification of murine and human soluble epoxide hydrolase inhibition by urea-like compounds. J Med Chem. 2003;46(6):1066-80. Epub 2003/03/07. 211. Bazoui H, Zahouily M, Boulajaaj S, Sebti S, Zakarya D. QSAR for anti-HIV activity of HEPT derivatives. SAR QSAR Environ Res. 2002;13:566-77. 212. Mager DE, Jusko WJ. Quantitative structure-pharmacokinetic/pharmacodynamic relationships of corticosteroids in man. J Pharm Sci. 2002;91:2441-51. 213. Golbraikh A, Tropsha A. Beware of q2! J Mol Graph Model. 2002;20(4):269-76. 214. Huang J, Fan X. Why QSAR fails: an empirical evaluation using conventional computational approach. Molecular pharmaceutics. 2011;8(2):600-8. Epub 2011/03/05. 215. Truchon JF, Bayly CI. Evaluating virtual screening methods: good and bad metrics for the "early recognition" problem. J Chem Inf Model. 2007;47(2):488-508. Epub 2007/02/10. 216. Kirchmair J, Markt P, Distinto S, Wolber G, Langer T. Evaluation of the performance of 3D virtual screening protocols: RMSD comparisons, enrichment assessments, and decoy selection--what can we learn from earlier mistakes? J Comput-Aided Mol Design. 2008;22(3-4):213-28. Epub 2008/01/16. 217. Jewett A, Tseng HC, Arasteh A, Saadat S, Christensen RE, Cacalano NA. Natural killer cells preferentially target cancer stem cells; role of monocytes in protection against NK cell mediated lysis of cancer stem cells. Current drug delivery. 2012;9(1):5-16. Epub 2011/10/26. 218.
de Cremoux P. [Hormone therapy and breast cancer]. Bulletin du cancer. 2011;98(11):1311-9. Epub 2011/10/25. Hormonotherapie des cancers du sein. 219. Dimitrov DS. Therapeutic proteins. Methods Mol Biol. 2012;899:1-26. Epub 2012/06/28.


220. Scatena R. Mitochondria and drugs. Advances in experimental medicine and biology. 2012;942:329-46. Epub 2012/03/09. 221. Chahrour O, Cairns D, Omran Z. Small molecule kinase inhibitors as anti-cancer therapeutics. Mini Rev Med Chem. 2012;12(5):399-411. Epub 2012/02/07. 222. Seshacharyulu P, Ponnusamy MP, Haridas D, Jain M, Ganti AK, Batra SK. Targeting the EGFR signaling pathway in cancer therapy. Expert opinion on therapeutic targets. 2012;16(1):15-31. Epub 2012/01/14. 223. Lombardi PM, Cole KE, Dowling DP, Christianson DW. Structure, mechanism, and inhibition of histone deacetylases and related metalloenzymes. Curr Opin Struct Biol. 2011;21(6):735-43. Epub 2011/08/30. 224. Rahman KM, Tizkova K, Reszka AP, Neidle S, Thurston DE. Identification of novel telomeric G-quadruplex-targeting chemical scaffolds through screening of three NCI libraries. Bioorg Med Chem Lett. 2012;22(8):3006-10. Epub 2012/03/17. 225. Egli M, Pallan PS. The many twists and turns of DNA: template, telomere, tool, and target. Curr Opin Struct Biol. 2010;20(3):262-75. Epub 2010/04/13. 226. Ma Y, Ou TM, Tan JH, Hou JQ, Huang SL, Gu LQ, et al. Synthesis and evaluation of 9-O-substituted berberine derivatives containing aza-aromatic terminal group as highly selective telomeric G-quadruplex stabilizing ligands. Bioorg Med Chem Lett. 2009;19(13):3414-7. Epub 2009/05/30. 227. Westwell AD. Novel telomerase inhibitors targeting quadruplex DNA; antitumour benzothiazoles; P-Glycoprotein efflux pump inhibitors; new topoisomerase inhibitors. Drug Discovery Today. 2002;7(9):528-31. 228. He J-H, Liu H-Y, Li Z, Tan J-H, Ou T-M, Huang S-L, et al. New quinazoline derivatives for telomeric G-quadruplex DNA: Effects of an added phenyl group on quadruplex binding ability. European Journal of Medicinal Chemistry. 2013;63(0):1-13. 229. Wang X-D, Ou T-M, Lu Y-J, Li Z, Xu Z, Xi C, et al. Turning off Transcription of the bcl-2 Gene by Stabilizing the bcl-2 Promoter Quadruplex with Quindoline Derivatives.
Journal of Medicinal Chemistry. 2010;53(11):4390-8. 230. Boddupally PVL, Hahn S, Beman C, De B, Brooks TA, Gokhale V, et al. Anticancer Activity and Cellular Repression of c-MYC by the G-Quadruplex-Stabilizing 11-Piperazinylquindoline Is Not Dependent on Direct Targeting of the G-Quadruplex in the c-MYC Promoter. Journal of Medicinal Chemistry. 2012;55(13):6076-86. 231. Campbell NH, Parkinson GN, Reszka AP, Neidle S. Structural basis of DNA quadruplex recognition by an acridine drug. J Am Chem Soc. 2008;130(21):6722-4. 232. Campbell NH, Patel M, Tofa AB, Ghosh R, Parkinson GN, Neidle S. Selectivity in ligand recognition of G-quadruplex loops. Biochemistry. 2009;48(8):1675-80. Epub 2009/01/29. 233. Yang H, Zhong HJ, Leung KH, Chan DS, Ma VP, Fu WC, et al. Structure-based design of flavone derivatives as c-myc oncogene down-regulators. European journal of pharmaceutical sciences : official journal of the European Federation for Pharmaceutical Sciences. 2013;48(1-2):130-41. Epub 2012/11/07. 234. Perry PJ, Gowan SM, Reszka AP, Polucci P, Jenkins TC, Kelland LR, et al. 1,4- and 2,6-disubstituted amidoanthracene-9,10-dione derivatives as inhibitors of human telomerase. J Med Chem. 1998;41(17):3253-60. 235. Perry PJ, Reszka AP, Wood AA, Read MA, Gowan SM, Dosanjh HS, et al. Human telomerase inhibition by regioisomeric disubstituted amidoanthracene-9,10-diones. J Med Chem. 1998;41(24):4873-84. 236. Huang HS, Chiou JF, Fong Y, Hou CC, Lu YC, Wang JY, et al. Activation of human telomerase reverse transcriptase expression by some new symmetrical bis-substituted derivatives of the anthraquinone. J Med Chem. 2003;46(15):3300-7. 237. Huang HS, Chiu HF, Lee AL, Guo CL, Yuan CL. Synthesis and structure-activity correlations of the cytotoxic bifunctional 1,4-diamidoanthraquinone derivatives. Bioorg Med Chem. 2004;12(23):6163-70. 238. Zagotto G, Sissi C, Lucatello L, Pivetta C, Cadamuro SA, Fox KR, et al.
Aminoacyl-Anthraquinone Conjugates as Telomerase Inhibitors: Synthesis, Biophysical and Biological Evaluation. Journal of Medicinal Chemistry. 2008;51(18):5566-74. 239. Huang HS, Huang KF, Li CL, Huang YY, Chiang YH, Huang FC, et al. Synthesis, human telomerase inhibition and anti-proliferative studies of a series of 2,7-bis-substituted amido-anthraquinone derivatives. Bioorg Med Chem. 2008;16(14):6976-86.


240. Sun D, Thompson B, Cathers BE, Salazar M, Kerwin SM, Trent JO, et al. Inhibition of human telomerase by a G-quadruplex-interactive compound. J Med Chem. 1997;40(14):2113-6. Epub 1997/07/04. 241. Harrison RJ, Gowan SM, Kelland LR, Neidle S. Human telomerase inhibition by substituted acridine derivatives. Bioorg Med Chem Lett. 1999;9(17):2463-8. 242. Harrison RJ, Cuesta J, Chessari G, Read MA, Basra SK, Reszka AP, et al. Trisubstituted acridine derivatives as potent and selective telomerase inhibitors. J Med Chem. 2003;46(21):4463-76. 243. Valerio LG, Jr., Arvidson KB, Chanderbhan RF, Contrera JF. Prediction of rodent carcinogenic potential of naturally occurring chemicals in the human diet using high-throughput QSAR predictive modeling. Toxicol Appl Pharmacol. 2007. 244. Morales AH, González MP, Cordeiro MNDS, Pérez MA. Quantitative structure carcinogenicity relationship for detecting structural alerts in nitroso-compounds. Toxicol Appl Pharmacol. 2007;221(2):189-202. 245. Yao SW, Lopes VH, Fernandez F, Garcia-Mera X, Morales M, Rodriguez-Borges JE, et al. Synthesis and QSAR study of the anticancer activity of some novel indane carbocyclic nucleosides. Bioorg Med Chem. 2003;11(23):4999-5006. 246. StatSoft Inc. Statistica 7.0 (data analysis software system). 2004. 247. Todeschini R, Consonni V, Pavan M. Dragon for windows (Software for molecular descriptors calculation). Version 5.4 ed: www.talete.mi.it; 2006. 248. Castillo-Gonzalez D, Cabrera-Perez MA, Perez-Gonzalez M, Morales Helguera A, Duran-Martinez A. Prediction of telomerase inhibitory activity for acridinic derivatives based on chemical structure. Eur J Med Chem. 2009;44(12):4826-40. 249. Wold S. Validation of QSAR's. Quantitative Structure-Activity Relationships. 1991;10(3):191-3. 250. Martins C, Gunaratnam M, Stuart J, Makwana V, Greciano O, Reszka AP, et al. Structure-based design of benzylamino-acridine compounds as G-quadruplex DNA telomere targeting agents. Bioorg Med Chem Lett. 2007;17(8):2293-8.
251. Garcia-Domenech R, de Julian-Ortiz JV. Antimicrobial activity characterization in a heterogeneous group of compounds. J Chem Inf Comput Sci. 1998;38(3):445-9. 252. Barnett V, Lewis T. Outliers in statistical data. 3rd ed: Wiley & Sons; 1994. 253. Maggiora GM. On Outliers and Activity Cliffs: Why QSAR Often Disappoints. Journal of Chemical Information and Modeling. 2006;46(4):1535-. 254. Harrison RJ, Reszka AP, Haider SM, Romagnoli B, Morrell J, Read MA, et al. Evaluation of by disubstituted acridone derivatives as telomerase inhibitors: the importance of G-quadruplex binding. Bioorg Med Chem Lett. 2004;14(23):5845-9. 255. Burger AM, Dai F, Schultes CM, Reszka AP, Moore MJ, Double JA, et al. The G-quadruplex-interactive molecule BRACO-19 inhibits tumor growth, consistent with telomere targeting and interference with telomerase function. Cancer Res. 2005;65(4):1489-96. 256. Moore MJB, Schultes CM, Cuesta J, Cuenca F, Gunaratnam M, Tanious FA, et al. Trisubstituted Acridines as G-quadruplex Telomere Targeting Agents. Effects of Extensions of the 3,6- and 9-Side Chains on Quadruplex Binding, Telomerase Activity, and Cell Proliferation. Journal of Medicinal Chemistry. 2005;49(2):582-99. 257. Gunaratnam M, Greciano O, Martins C, Reszka AP, Schultes CM, Morjani H, et al. Mechanism of acridine-based telomerase inhibition and telomere shortening. Biochem Pharmacol. 2007;74(5):679-89. 258. Cuenca F, Moore MJ, Johnson K, Guyen B, De Cian A, Neidle S. Design, synthesis and evaluation of 4,5-di-substituted acridone ligands with high G-quadruplex affinity and selectivity, together with low toxicity to normal cells. Bioorg Med Chem Lett. 2009;19(17):5109-13. Epub 2009/07/31. 259. Lombardo CM, Martinez IS, Haider S, Gabelica V, De Pauw E, Moses JE, et al. Structure-based design of selective high-affinity telomeric quadruplex-binding ligands. Chemical communications (Cambridge, England). 2010;46(48):9116-8. Epub 2010/11/03. 260.
Collie GW, Sparapani S, Parkinson GN, Neidle S. Structural basis of telomeric RNA quadruplex-acridine ligand recognition. J Am Chem Soc. 2011;133(8):2721-8. Epub 2011/02/05. 261. Guha R. On the interpretation and interpretability of quantitative structure-activity relationship models. Journal of computer-aided molecular design. 2008;22(12):857-71. Epub 2008/09/12. 262. Doweyko AM. QSAR: dead or alive? Journal of computer-aided molecular design. 2008;22(2):81-9. Epub 2008/01/15. 263. Mailliet P, Laoui A, Riou JF, Doerflinger G, Mergny JL, Hamy F, et al., inventors; Aventis Pharma S.A., assignee. Triazine derivatives and their applications as antitelomerase agents. France patent US 6887873 B2. 2005 May 3-2005.


References. .

264. Mailliet P, Riou JF, Alasia M, T. C, Doerflinger G, Mergny JL, et al., inventors; Aventis Pharma S.A., assignee. Chemical derivatives and their applications as antitelomerase agents. France patent US 6858608 B2. 2005 Feb 22-2005. 265. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, et al. DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Nucleic Acids Res. 2011;39(Database issue):D1035-41. Epub 2010/11/10. 266. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 2008;36(Database issue):D901-6. Epub 2007/12/01. 267. Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006;34(Database issue):D668-72. Epub 2005/12/31. 268. Monchaud D, Teulade-Fichou MP. A hitchhiker's guide to G-quadruplex ligands. Org Biomol Chem. 2008;6(4):627-36. 269. Yang D, Okamoto K. Structural insights into G-quadruplexes: towards new anticancer drugs. Future Med Chem. 2010;2(4):619-46. Epub 2010/06/22. 270. Gunaratnam M, Swank S, Haider SM, Galesa K, Reszka AP, Beltran M, et al. Targeting human gastrointestinal stromal tumor cells with a quadruplex-binding small molecule. J Med Chem. 2009;52(12):3774-83. Epub 2009/05/28. 271. Neidle S. Human telomeric G-quadruplex: the current status of telomeric G-quadruplexes as therapeutic targets in human cancer. FEBS J. 2010;277(5):1118-25. Epub 2009/12/03. 272. Gowan SM, Harrison JR, Patterson L, Valenti M, Read MA, Neidle S, et al. A G-quadruplex-interactive potent small-molecule inhibitor of telomerase exhibiting in vitro and in vivo antitumor activity. Mol Pharmacol. 2002;61(5):1154-62. 273. Drygin D, Siddiqui-Jain A, O'Brien S, Schwaebe M, Lin A, Bliesath J, et al. Anticancer activity of CX-3543: a direct inhibitor of rRNA biogenesis. Cancer Res. 2009;69(19):7653-61. Epub 2009/09/10. 274. 
Tauchi T, Shin-Ya K, Sashida G, Sumi M, Okabe S, Ohyashiki JH, et al. Telomerase inhibition with a novel G-quadruplex-interactive agent, telomestatin: in vitro and in vivo studies in acute leukemia. Oncogene. 2006;25(42):5719-25. 275. Grand CL, Han H, Munoz RM, Weitman S, Von Hoff DD, Hurley LH, et al. The cationic porphyrin TMPyP4 down-regulates c-MYC and human telomerase reverse transcriptase expression and inhibits tumor growth in vivo. Mol Cancer Ther. 2002;1(8):565-73. 276. Phatak P, Cookson JC, Dai F, Smith V, Gartenhaus RB, Stevens MF, et al. Telomere uncapping by the G-quadruplex ligand RHPS4 inhibits clonogenic tumour cell growth in vitro and in vivo consistent with a cancer stem cell targeting mechanism. Br J Cancer. 2007;96(8):1223-33. 277. Leonetti C, Scarsella M, Riggio G, Rizzo A, Salvati E, D'Incalci M, et al. G-quadruplex ligand RHPS4 potentiates the antitumor activity of camptothecins in preclinical models of solid tumors. Clin Cancer Res. 2008;14(22):7284-91. 278. Salvati E, Leonetti C, Rizzo A, Scarsella M, Mottolese M, Galati R, et al. Telomere damage induced by the G-quadruplex ligand RHPS4 has an antitumor effect. J Clin Invest. 2007;117(11):3236-47. Epub 2007/10/13. 279. Arimochi H, Morita K. Characterization of cytotoxic actions of tricyclic antidepressants on human HT29 colon carcinoma cells. European journal of pharmacology. 2006;541(1-2):17-23. Epub 2006/06/07. 280. Volpe DA, Ellison CD, Parchment RE, Grieshaber CK, Faustino PJ. Effects of amitriptyline and fluoxetine upon the in vitro proliferation of tumor cell lines. Journal of experimental therapeutics & oncology. 2003;3(4):169-84. Epub 2003/10/22. 281. Sternbach H. Are antidepressants carcinogenic? A review of preclinical and clinical studies. The Journal of clinical psychiatry. 2003;64(10):1153-62. Epub 2003/12/09. 282. Wiklund ED, Catts VS, Catts SV, Ng TF, Whitaker NJ, Brown AJ, et al. 
Cytotoxic effects of antipsychotic drugs implicate cholesterol homeostasis as a novel chemotherapeutic target. International journal of cancer Journal international du cancer. 2010;126(1):28-40. Epub 2009/08/08. 283. Palmeira A, Rodrigues F, Sousa E, Pinto M, Vasconcelos MH, Fernandes MX. New Uses for Old Drugs: Pharmacophore-Based Screening for the Discovery of P-Glycoprotein Inhibitors. Chemical Biology & Drug Design. 2011;78(1):57-72. 284. Jeon SH, Kim SH, Kim Y, Kim YS, Lim Y, Lee YH, et al. The tricyclic imipramine induces autophagic cell death in U-87MG glioma cells. Biochem Biophys Res Commun. 2011;413(2):311-7. Epub 2011/09/06.


285. Gavrilova-Ruch O, Schonherr K, Gessner G, Schonherr R, Klapperstuck T, Wohlrab W, et al. Effects of imipramine on ion channels and proliferation of IGR1 melanoma cells. The Journal of membrane biology. 2002;188(2):137-49. Epub 2002/08/13. 286. Garcia-Ferreiro RE, Kerschensteiner D, Major F, Monje F, Stuhmer W, Pardo LA. Mechanism of block of hEag1 K+ channels by imipramine and . The Journal of general physiology. 2004;124(4):301-17. Epub 2004/09/15. 287. Asher V, Warren A, Shaw R, Sowter H, Bali A, Khan R. The role of Eag and HERG channels in cell proliferation and apoptotic cell death in SK-OV-3 ovarian cancer cell line. Cancer cell international. 2011;11:6. Epub 2011/03/12. 288. Higgins SC, Pilkington GJ. The in vitro effects of tricyclic drugs and dexamethasone on cellular respiration of malignant glioma. Anticancer research. 2010;30(2):391-7. Epub 2010/03/25. 289. Cordero MD, Sanchez-Alcazar JA, Bautista-Ferrufino MR, Carmona-Lopez MI, Illanes M, Rios MJ, et al. Acute oxidant damage promoted on cancer cells by amitriptyline in comparison with some common chemotherapeutic drugs. Anti-cancer drugs. 2010;21(10):932-44. Epub 2010/09/18. 290. Mao X, Hou T, Cao B, Wang W, Li Z, Chen S, et al. The amitriptyline inhibits D-cyclin transactivation and induces myeloma cell apoptosis by inhibiting histone deacetylases: in vitro and in silico evidence. Mol Pharmacol. 2011;79(4):672-80. Epub 2011/01/12. 291. Parker KA, Glaysher S, Hurren J, Knight LA, McCormick D, Suovouri A, et al. The effect of tricyclic antidepressants on cutaneous melanoma cell lines and primary cell cultures. Anti-cancer drugs. 2012;23(1):65-9. Epub 2011/09/08. 292. Spengler G, Molnar J, Viveiros M, Amaral L. induces apoptosis of multidrug- resistant mouse lymphoma cells transfected with the human ABCB1 and inhibits the expression of P- glycoprotein. Anticancer research. 2011;31(12):4201-5. Epub 2011/12/27. 293. Williams JB, Mallorga PJ, Conn PJ, Pettibone DJ, Sur C. 
Effects of typical and atypical on human glycine transporters. Schizophrenia research. 2004;71(1):103-12. Epub 2004/09/18. 294. Lialiaris TS, Papachristou F, Mourelatos C, Simopoulou M. Antineoplastic and cytogenetic effects of chlorpromazine on human lymphocytes in vitro and on Ehrlich ascites tumor cells in vivo. Anti- cancer drugs. 2009;20(8):746-51. Epub 2009/07/09. 295. Riffell JL, Zimmerman C, Khong A, McHardy LM, Roberge M. Effects of chemical manipulation of mitotic arrest and slippage on cancer cell survival and proliferation. Cell Cycle. 2009;8(18):3025-38. Epub 2009/08/29. 296. Azuine MA, Tokuda H, Takayasu J, Enjyo F, Mukainaka T, Konoshima T, et al. Cancer chemopreventive effect of phenothiazines and related tri-heterocyclic analogues in the 12-O- tetradecanoylphorbol-13-acetate promoted Epstein-Barr virus early antigen activation and the mouse skin two-stage carcinogenesis models. Pharmacological research : the official journal of the Italian Pharmacological Society. 2004;49(2):161-9. Epub 2003/12/04. 297. Zhelev Z, Ohba H, Bakalova R, Hadjimitova V, Ishikawa M, Shinohara Y, et al. Phenothiazines suppress proliferation and induce apoptosis in cultured leukemic cells without any influence on the viability of normal lymphocytes. Phenothiazines and leukemia. Cancer chemotherapy and pharmacology. 2004;53(3):267-75. Epub 2003/12/10. 298. Lee MS, Johansen L, Zhang Y, Wilson A, Keegan M, Avery W, et al. The novel combination of chlorpromazine and pentamidine exerts synergistic antiproliferative effects through dual mitotic action. Cancer Res. 2007;67(23):11359-67. Epub 2007/12/07. 299. Yde CW, Clausen MP, Bennetzen MV, Lykkesfeldt AE, Mouritsen OG, Guerra B. The antipsychotic drug chlorpromazine enhances the cytotoxic effect of tamoxifen in tamoxifen-sensitive and tamoxifen-resistant human breast cancer cells. Anti-cancer drugs. 2009;20(8):723-35. Epub 2009/07/09. 300. Shin SY, Kim CG, Kim SH, Kim YS, Lim Y, Lee YH. 
Chlorpromazine activates p21Waf1/Cip1 gene transcription via early growth response-1 (Egr-1) in C6 glioma cells. Experimental & molecular medicine. 2010;42(5):395-405. Epub 2010/04/07. 301. Chen MH, Yang WL, Lin KT, Liu CH, Liu YW, Huang KW, et al. Gene expression-based chemical genomics identifies potential therapeutic drugs in hepatocellular carcinoma. PLoS One. 2011;6(11):e27186. Epub 2011/11/17. 302. Ahmad M. Study on cytochrome p-450 dependent retinoic Acid metabolism and its inhibitors as potential agents for cancer therapy. Scientia pharmaceutica. 2011;79(4):921-35. Epub 2011/12/07. 303. Haider S, Neidle S. Molecular modeling and simulation of G-quadruplexes and quadruplex- ligand complexes. Methods Mol Biol. 2010;608:17-37. Epub 2009/12/17.


304. Martin YC, Kofron JL, Traphagen LM. Do structurally similar molecules have similar biological activity? J Med Chem. 2002;45(19):4350-8. Epub 2002/09/06. 305. Young D, Martin T, Venkatapathy R, Harten P. Are the Chemical Structures in Your QSAR Correct? QSAR & Combinatorial Science. 2008;27(11-12):1337-45. 306. Olah M, Rad R, Ostopovici L, Bora A, Hadaruga N, Hadaruga D, et al. WOMBAT and WOMBAT-PK: Bioactivity Databases for Lead and Drug Discovery. Chemical Biology: Wiley-VCH Verlag GmbH; 2008. p. 760-86. 307. Dearden JC, Cronin MT, Kaiser KL. How not to develop a quantitative structure-activity or structure-property relationship (QSAR/QSPR). SAR QSAR Environ Res. 2009;20(3-4):241-66. Epub 2009/06/23. 308. Group QE. The Report from the Expert Group on (Quantitative) Structure-ActiVity Relationships [(Q)SARs] on the Principles for the Validation of (Q)SARs. Paris: Organisation for Economic Cooperation and Development:, 2004. 309. Tropsha A. Best Practices for QSAR Model Development, Validation, and Exploitation. Mol Inf. 2010;29:476-88. 310. Fourches D, Muratov E, Tropsha A. Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research. J Chem Inf Model. 2010;50:1189-204. 311. OpenBabel: the OpenSource Chemistry Toolbox. 2.2.99 ed: OpenEye Scientific Software, Inc.; 2006. 312. ChemAxon. JChem for Excel. 5.10.2.725 ed2012. 313. ChemAxon. Standardizer. 5.10.2 ed2012. 314. StatSoft. STATISTICA. 8.0 ed2007. p. (data analysis software system). 315. Guha R, Van Drie JH. Structure--activity landscape index: identifying and quantifying activity cliffs. J Chem Inf Model. 2008;48(3):646-58. Epub 2008/02/29. 316. Burden FR, Ford MG, Whitley DC, Winkler DA. Use of automatic relevance determination in QSAR studies using Bayesian neural networks. J Chem Inf Comput Sci. 2000;40(6):1423-30. 317. Kubinyi H, editor. Virtual Screening - The Road to Success. 
XIX International Symposium on Medicinal Chemistry; 2006; Istanbul, Turkey. 318. Parkinson GN, Lee MP, Neidle S. Crystal structure of parallel quadruplexes from human telomeric DNA. Nature. 2002;417(6891):876-80. 319. Haider SM, Neidle S, Parkinson GN. A structural analysis of G-quadruplex/ligand interactions. Biochimie. 2011;93(8):1239-51. Epub 2011/06/04. 320. Cantor CR, Warshaw MM, Shapiro H. Oligonucleotide interactions. 3. Circular dichroism studies of the conformation of deoxyoligonucleotides. Biopolymers. 1970;9(9):1059-77. Epub 1970/01/01. 321. Mergny JL, Maurizot JC. Fluorescence resonance energy transfer as a probe for G-quartet formation by a telomeric repeat. Chembiochem. 2001;2(2):124-32. Epub 2002/02/06. 322. Frank-Kamenetskii MD. Biophysics of the DNA molecule. Physics Reports. 1997;288(1-6):13-60. 323. Blackburn GM. Nucleic Acids in Chemistry and Biology. In: Blackburn GM, Gait MJ, Loakes D, Williams DM, editors. Nucleic Acids in Chemistry and Biology: Royal Society of Chemistry; 2006. 324. Young D, Martin T, Venkatapathy R, Harten P. Are the Chemical Structures in Your QSAR Correct? QSAR Comb Sci. 2008;27:1337-45. 325. Stumpfe D, Bajorath J. Exploring activity cliffs in medicinal chemistry. J Med Chem. 2012;55(7):2932-42. Epub 2012/01/13. 326. Scior T, Medina-Franco JL, Do Q-T, Martínez-Mayorga K, Yunes Rojas JA, Bernard P. How to Recognize and Workaround Pitfalls in QSAR Studies: A Critical Review. Curr Med Chem. 2009;16(32):4297-313. 327. Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Gray J, editor. San Francisco: Morgan Kaufmann; 2005. 328. Japkowicz N. The class imbalance problem: Significance and strategies. International Conference on Artificial Intelligence (ICAI'2000); 2000. 329. Japkowicz N. Learning from imbalanced data sets: A comparison of various solutions. AAAI'2000 Workshop on Learning from Imbalanced Data Sets; 2000. 330. Solov'ev VP, Varnek A. EdiSDF. 5.03 ed; 2010. p. 
(Editor of the Structure Data Files).


331. Varnek A, Fourches D, Hoonakker F, Solov'ev V. Substructural fragments: an universal language to encode reactions, molecular and supramolecular structures. J Comput-Aided Mol Design. 2005;19:693-703. 332. Varnek A, Fourches D, Horvath D, Klimchuk O, Gaudin C, Vayer P, et al. ISIDA - platform for virtual screening based on fragment and pharmacophoric descriptors. Curr Comput Aided Drug Des. 2008;4:191-8. 333. Huang N, Shoichet BK, Irwin JJ. Benchmarking sets for molecular docking. J Med Chem. 2006;49(23):6789-801. Epub 2006/12/13. 334. Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discovery Today. 2006;11(23-24):1046-53. 335. Gillet VJ, Leach AR. Chemoinformatics. In: Editors-in-Chief: John BT, David JT, editors. Comprehensive Medicinal Chemistry II. Oxford: Elsevier; 2007. p. 235-64. 336. Darby RA, Sollogoub M, McKeen C, Brown L, Risitano A, Brown N, et al. High throughput measurement of duplex, triplex and quadruplex melting curves using molecular beacons and a LightCycler. Nucleic Acids Res. 2002;30(9):e39. Epub 2002/04/25. 337. Ahmed J, Meinel T, Dunkel M, Murgueitio MS, Adams R, Blasse C, et al. CancerResource: a comprehensive database of cancer-relevant proteins and compound interactions supported by experimental knowledge. Nucleic Acids Research. 2011;39(suppl 1):D960-D7. 338. Maggiora GM. On Outliers and Activity Cliffs - Why QSAR Often Disappoints. J Chem Inf Model. 2006;46:1535. 339. Peltason L, Bajorath J. SAR index: quantifying the nature of structure-activity relationships. J Med Chem. 2007;50:5571-8. 340. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Monterey, CA: Wadsworth; 1984. 341. Eckert H, Bajorath J. Molecular similarity analysis in virtual screening: foundations, limitations and novel approaches. Drug Discov Today. 2007;12(5/6):225-33. 342. Jaworska J, Nikolova-Jeliazkova N, Aldenberg T. 
QSAR Applicability Domain Estimation by Projection of the Training Set in Descriptor Space: A Review. ATLA. 2005;33:445-59. 343. Sushko I, Novotarskyi S, Körner R, Pandey AK, Cherkasov A, Li J, et al. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set. J Chem Inf Model. 2010;50:2094-111. 344. Johnson MA, Maggiora GM, editors. Concepts and Applications of Molecular Similarity: John Wiley & Sons; 1990. 345. Peng H, Long F, Ding C. Feature Selection Based on Mutual Information: Criteria of Max- Dependency, Max-Relevance, and Min-Redundancy. IEEE TRANS PATTERN ANAL MACHINE INTEL. 2005;27(8):1226-38. 346. Willett P. Similarity-based virtual screening using 2D fingerprints. Drug Discov Today. 2006;11(23-24):1046-53. Epub 2006/11/30. 347. Rose J. Methods for Data Analysis. In: Gasteier J, editor. Handbook of Chemoinformatics. Weinheim: WILEY-VCH; 2003. p. 1081-97. 348. Smith MR, Martinez T. Improving classification accuracy by identifying and removing instances that should be misclassified. The 2011 International Joint Conference on Neural Networks (IJCNN); July 31 2011-Aug. 5 20112011. p. 2690 - 7. 349. Scior T, Medina-Franco JL, Do QT, Martinez-Mayorga K, Yunes Rojas JA, Bernard P. How to recognize and workaround pitfalls in QSAR studies: a critical review. Curr Med Chem. 2009;16(32):4297-313. Epub 2009/09/17. 350. Blackburn EH, Greider CW, Henderson E, Lee MS, Shampay J, Shippen-Lentz D. Recognition and elongation of telomeres by telomerase. Genome. 1989;31(2):553-60. Epub 1989/01/01. 351. Greider CW. Telomeres, telomerase and senescence. Bioessays. 1990;12(8):363-9. Epub 1990/08/01. 352. Henderson ER, Moore M, Malcolm BA. Telomere G-strand structure and function analyzed by chemical protection, base analogue substitution, and utilization by telomerase in vitro. Biochemistry. 1990;29(3):732-7. Epub 1990/01/23. 353. Zahler AM, Williamson JR, Cech TR, Prescott DM. 
Inhibition of telomerase by G-quartet DNA structures. Nature. 1991;350(6320):718-20. Epub 1991/04/25. 354. Eddy J, Maizels N. Gene function correlates with potential for G4 DNA formation in the human genome. Nucleic Acids Res. 2006;34(14):3887-96. Epub 2006/08/18.


355. Brooks TA, Kendrick S, Hurley L. Making sense of G-quadruplex and i-motif functions in oncogene promoters. FEBS J. 2010;277(17):3459-69. Epub 2010/07/31. 356. Dasgupta D, Rajagopalan M, Sasisekharan V. DNA-binding characteristics of a synthetic analogue of distamycin. Biochem Biophys Res Commun. 1986;140(2):626-31. Epub 1986/10/30. 357. Luck G, Reinert KE, Baguley B, Zimmer C. Interaction of the nonintercalative antitumour drugs SN-6999 and SN-18071 with DNA: influence of ligand structure on the binding specificity. J Biomol Struct Dyn. 1987;4(6):1079-94. Epub 1987/06/01. 358. Gibbs EJ, Maurer MC, Zhang JH, Reiff WM, Hill DT, Malicka-Blaszkiewicz M, et al. Interactions of porphyrins with purified DNA and more highly organized structures. J Inorg Biochem. 1988;32(1):39-65. Epub 1988/01/01. 359. Fedorov VF, Zozulia VN, Piatigorskaia TL, Chernov VA, Blagoi Iu P. [Thermodynamic and spectral study of DNA-prospidin complex]. Mol Biol (Mosk). 1989;23(5):1440-6. Epub 1989/09/01. Termodinamicheskoe i spektral'noe issledovanie kompleksa DNK s prospidinom. 360. Monchaud D, Allain C, Bertrand H, Smargiasso N, Rosu F, Gabelica V, et al. Ligands playing musical chairs with G-quadruplex DNA: a rapid and simple displacement assay for identifying selective G-quadruplex binders. Biochimie. 2008;90(8):1207-23. 361. Ethirajan M, Chen Y, Joshi P, Pandey RK. The role of porphyrin chemistry in tumor imaging and photodynamic therapy. Chem Soc Rev. 2011;40(1):340-62. Epub 2010/08/10. 362. Wang C, Tao H, Cheng L, Liu Z. Near-infrared light induced in vivo photodynamic therapy of cancer based on upconversion nanoparticles. Biomaterials. 2011;32(26):6145-54. Epub 2011/05/28. 363. Liu K, Liu X, Zeng Q, Zhang Y, Tu L, Liu T, et al. Covalently assembled NIR nanoplatform for simultaneous fluorescence imaging and photodynamic therapy of cancer cells. ACS Nano. 2012;6(5):4054-62. Epub 2012/04/03. 364. Zhang P, Steelant W, Kumar M, Scholfield M. 
Versatile photosensitizers for photodynamic therapy at infrared excitation. J Am Chem Soc. 2007;129(15):4526-7. Epub 2007/03/28. 365. Pham TH, Hornung R, Berns MW, Tadir Y, Tromberg BJ. Monitoring tumor response during photodynamic therapy using near-infrared photon-migration spectroscopy. Photochem Photobiol. 2001;73(6):669-77. Epub 2001/06/26. 366. Tran PL, Largy E, Hamon F, Teulade-Fichou MP, Mergny JL. Fluorescence intercalator displacement assay for screening G4 ligands towards a variety of G-quadruplex structures. Biochimie. 2011;93(8):1288-96. Epub 2011/06/07. 367. Monchaud D, Allain C, Teulade-Fichou MP. Development of a fluorescent intercalator displacement assay (G4-FID) for establishing quadruplex-DNA affinity and selectivity of putative ligands. Bioorg Med Chem Lett. 2006;16(18):4842-5. 368. Monchaud D, Allain C, Teulade-Fichou MP. Thiazole orange: a useful probe for fluorescence sensing of G-quadruplex-ligand interactions. Nucleosides Nucleotides Nucleic Acids. 2007;26(10- 12):1585-8. Epub 2007/12/11. 369. Pennarun G, Granotier C, Gauthier LR, Gomez D, Hoffschir F, Mandine E, et al. Apoptosis related to telomere instability and cell cycle alterations in human glioma cells treated by new highly selective G-quadruplex ligands. Oncogene. 2005;24(18):2917-28. 370. De Cian A, Mergny JL. Quadruplex ligands may act as molecular chaperones for tetramolecular quadruplex formation. Nucleic Acids Res. 2007;35(8):2483-93. Epub 2007/03/31. 371. Gauthier LR, Granotier C, Hoffschir F, Etienne O, Ayouaz A, Desmaze C, et al. Rad51 and DNA-PKcs are involved in the generation of specific telomere aberrations induced by the quadruplex ligand 360A that impair mitotic cell progression and lead to cell death. Cell Mol Life Sci. 2012;69(4):629-40. Epub 2011/07/21. 372. Granotier C, Pennarun G, Riou L, Hoffschir F, Gauthier LR, De Cian A, et al. Preferential binding of a G-quadruplex ligand to human chromosome ends. Nucleic Acids Res. 2005;33(13):4182-90. Epub 2005/07/30. 373. 
Marcel V, Tran PL, Sagne C, Martel-Planche G, Vaslin L, Teulade-Fichou MP, et al. G- quadruplex structures in TP53 intron 3: role in alternative splicing and in production of p53 mRNA isoforms. Carcinogenesis. 2011;32(3):271-8. Epub 2010/11/30. 374. Cathers BE, Sun D, Hurley LH. Accurate determination of quadruplex binding affinity and potency of G-quadruplex-interactive telomerase inhibitors by use of a telomerase extension assay requires varying the primer concentration. Anticancer Drug Des. 1999;14(4):367-72. 375. Neidle S, Harrison RJ, Reszka AP, Read MA. Structure-activity relationships among guanine- quadruplex telomerase inhibitors. Pharmacol Ther. 2000;85(3):133-9.


376. Gavathiotis E, Heald RA, Stevens MF, Searle MS. Recognition and Stabilization of Quadruplex DNA by a Potent New Telomerase Inhibitor: NMR Studies of the 2:1 Complex of a Pentacyclic Methylacridinium Cation with d(TTAGGGT)(4). Angew Chem Int Ed Engl. 2001;40(24):4749-51. 377. Cabrera-Pérez MA, Castillo-González D, Pérez-González M, Durán-Martínez A. Telomerase Inhibitory Activity of Acridinic Derivatives: A 3D-QSAR Approach. QSAR & Combinatorial Science [Internet]. 2009; 28(5):[526-36 pp.]. 378. Worth AP, Bassan A, Gallegos A, Netzeva TI, Patlewicz G, Pavan M, et al. The Characterisation of (Quantitative) Structure-Activity Relationships: Preliminary Guidance. EUROPEAN COMMISSION DIRECTORATE GENERAL JOINT RESEARCH CENTRE, 2005 Contract No.: January 02. 379. Castillo-González D, Cabrera-Pérez MÁ, Pérez-Machado G, Morales-Helguera A, Durán-Martínez A, editors. Derivados del berverine como agentes estabilizadores del Cuarteto-G. Una perspectiva QSAR. IV Simposio Internacional de Quimica 2010; 2010; Santa Clara, Cuba. 380. Castillo-González D, editor. Empleo de cromonas para la predicción de la actividad antitelomerasa por estabilización del Cuarteto-G. Una aproximación QSAR. IV Simposio Internacional de Quimica 2010; 2010; Santa Clara, Cuba. 381. Castillo-González D, Pérez-Machado G, Teijeira-Bautista M, Cabrera-Pérez MA, editors. Prediction of telomerase inhibitory activity for triazine derivatives based on chemical structure. The 18th European Symposium on Quantitative Structure-Activity Relationships; 2010 19-24 September 2010; Rhodes, Greece. 382. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10(9):712-. 383. Estrada E, Gutierrez Y. 
Modeling chromatographic parameters by a novel graph theoretical sub- structural approach. J Chromatogr A. 1999;858(2):187-99. 384. Estrada E. On the topological sub-structural molecular design (TOSS-MODE) in QSPR/QSAR and drug design research. SAR QSAR Environ Res. 2000;11(1):55-73. 385. Lounkine E, Wawer M, Wassermann AM, Bajorath J. SARANEA: a freely available program to mine structure-activity and structure-selectivity relationship information in compound data sets. J Chem Inf Model. 2010;50(1):68-78. Epub 2010/01/08. 386. Peltason L, Hu Y, Bajorath J. From structure-activity to structure-selectivity relationships: quantitative assessment, selectivity cliffs, and key compounds. ChemMedChem. 2009;4(11):1864-73. Epub 2009/09/15. 387. Gauglitz GE, Vo-Dinh TE. Handbook of Spectroscopy. WILEY-VCH Verlag GmbH & Co. KGaA W, editor: WILEY-VCH; 2003. 388. Clark BJE, Frost TE, Russell MAE. UV Spectroscopy: Techniques, Instrumentation and Data Handling. Hall C, editor: Springer; 1993. 389. Fabro S, Smith RL, Williams RT. Toxicity and teratogenicity of optical isomers of thalidomide. Nature. 1967;215(5098):296. Epub 1967/07/15. 390. Aiazzi Mancini M. [Thalidomide: a teratogenic drug?]. La Riforma medica. 1962;76:705-7. Epub 1962/06/30.
