Phenotypes and the Timing of Pubertal Milestones in a Longitudinal Cohort of Girls

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

In the Division of Epidemiology, Department of Environmental Health, University of Cincinnati College of Medicine

By Cecily Lyn Shimp Fassler
BA, Hanover College
MS, University of Cincinnati

Committee
Susan Pinney, PhD (Advisor and Committee Chair)
Frank Biro, MD
Iris Gutmark-Little, MD
Changchun Xie, PhD

ABSTRACT

Objectives. The primary objective of this research project was to use longitudinal cohort data and quantitative research methods to identify sex hormone phenotypes around the time of thelarche. A second objective of the research was to determine if the ages of pubertal milestones (thelarche, pubarche, and menarche) are associated with a hormone phenotype. Identifying these phenotypes will aid in understanding the biological mechanisms underlying differences in age at pubertal development.

Methods. We measured four sex hormones, DHEA-S, estrone, estradiol, and testosterone, at five time periods relative to onset of thelarche (time = -18, -12, -6, 0, +6 months) in 269 girls from the greater Cincinnati area. Principal components analysis was performed to select a subset of relevant hormone variables. Cluster analysis was then applied to identify phenotypes of girls based on the hormone variables. Differences in demographic characteristics, anthropometric measurements, and age at pubertal maturation events between the phenotypes were tested with Kruskal-Wallis and chi-square tests. Cox proportional hazards survival models were used to determine whether different phenotypes were associated with ages of thelarche, pubarche, and menarche while controlling for race, caregiver education, mother’s age of menarche, and body mass just prior to the age of the maturation event.

Results. Principal component analysis yielded three components accounting for 74% of the shared variability among our initial phenotypic variables. Cluster analysis then identified four distinct hormone phenotypes. Phenotype 1 (n=42) was defined as girls with high DHEA-S, and high testosterone and estrone; Phenotype 2 (n=37) was girls with very high estradiol, and high testosterone and estrone; Phenotype 3 (n=74) was girls who had lower hormone values for all hormones except DHEA-S; and Phenotype 4 (n=96) was defined by girls with very low hormones. Survival analysis showed different risks of earlier ages of thelarche, pubarche, and menarche between the phenotypes.

Conclusions. Principal component and cluster analysis identified four meaningful and distinct hormone profile phenotypes among young girls, indicating the heterogeneity of hormone profiles around pubertal timing. Furthermore, survival analysis confirmed the heterogeneity of the phenotypes and established their temporal stability by confirming differentiation of risk for ages of thelarche, pubarche, and menarche between the phenotypes. These analyses not only help us understand the rich interplay of hormones during pubertal maturation but also give us insight into the developmental onset of hormone-related adult diseases.

ACKNOWLEDGEMENTS

I would like to acknowledge and thank my PhD dissertation committee, Dr. Susan Pinney, Dr.

Frank Biro, Dr. Iris Gutmark-Little and Dr. Changchun Xie. Furthermore, I am thankful for Dr.

Pinney’s support and encouragement over the years as both my academic advisor and mentor.

Additionally, I would like to thank the late Dr. Succop for encouragement and guidance as my original advisor.

This study and dissertation would not have been possible without the support of grants U01-ES12770, U01-ES019453, U01-ES026119, R01-ES029133, and P30-ES006096.

Finally, I would like to thank my family and friends for their support throughout this process. They have been the best cheerleaders one could ask for. They encouraged me to continue when I took time off and when things seemed insurmountable. I thank my sweet husband, who put up with my moods throughout this process, and I would especially like to thank my father, Robert Shimp, PhD, for fostering my love of knowledge and education and for being my inspiration.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES AND FIGURES

CHAPTER 1: Introduction
    Introduction
    Previous Studies
    Objectives, Hypothesis, and Specific Aims
    Background
    Methodology
    Study Population
    Format of Dissertation
    IRB Approval
    References
    Figures

CHAPTER 2: Sex hormone phenotypes in young girls identified by principal component and cluster analysis in a longitudinal cohort
    Abstract
    Introduction
    Methods
    Results
    Discussion
    Conclusions
    References
    Tables
    Figures
    Supplemental Figures

CHAPTER 3: Sex hormone phenotypes in young girls and the age at pubertal milestones
    Abstract
    Introduction
    Materials and Methods
    Results
    Discussion
    References
    Tables
    Figures
    Supplemental Table
    Supplemental Figure

CHAPTER 4: Conclusions, Limitations, Future Research, and Final Thoughts
    Conclusions
    Limitations
    Future Research
    Final Thoughts
    References

APPENDIX A Contributor's Statement Page

APPENDIX B SAS Code

LIST OF TABLES AND FIGURES

Chapter 1
    Figure 1 – Thelarche
    Figure 2 – Pubarche

Chapter 2
    Table 1 – Description of sex hormone values for the study cohort across the 5 time periods (-18, -12, -6, 0 and +6)
    Table 2 – Baseline demographics of 269 young girls
    Table 3 – Description of hormone values for the study cohort, by time period relative to thelarche
    Table 4 – Pearson correlations of absolute hormone values across the time periods
    Table 5 – Pearson correlations of hormone differences between time periods
    Table 6 – Factor loading from principal component analysis of hormones (n=260)
    Table 7 – Factor loading from principal component analysis of hormones (n=67)
    Table 8 – Factor loading from principal component analysis restricted to girls in cluster 3 (n=172)
    Table 9 – Hormone phenotype objective predictors
    Figure 1 – Scatterplots for clusters using criteria of K-means in the population of 260 peripubertal girls
    Figure 2 – Scatterplots for clusters using criteria of K-means in the population of 172 peripubertal girls from cluster 3
    Supplemental Figure S1A – Scatterplot of 4 defined clusters
    Supplemental Figure S1B – Scatterplot of 5 defined clusters
    Supplemental Figure S1C – Scatterplot of 6 defined clusters

Chapter 3
    Table 1 – Description of sex hormones for the study cohort of 269 girls
    Table 2 – Hormone phenotype objective predictors
    Table 3 – Hormone phenotype baseline characteristics and changes in hormones
    Figure 1 – Mean hormone values by phenotypes
    Figure 2A-C – Kaplan-Meier plots of time until pubertal milestone by phenotype
    Figure 3A-C – Risk estimates of time to pubertal outcomes by phenotype from multiple variable Cox regression models
    Supplemental Table – Hazard ratio analysis for risk factors for age at pubertal milestones
    Supplemental Figure – Mean age of pubertal events by phenotype

CHAPTER 1 – Introduction

INTRODUCTION

Puberty is a time of change in a young girl’s life and can last as long as seven years. During puberty, a young girl’s sex and adrenal (SA) hormone levels and physique change together, transforming her body into one with full reproductive capability. Many pubertal milestones are reached before puberty ends. These milestones include, but are not limited to, the pubertal growth spurt, thelarche (breast development), pubarche (appearance of pubic hair) and menarche (start of menstrual flow). Menarche is often considered one of the later stages of puberty (1).

Secretion of gonadotropin-releasing hormone (GnRH) by the hypothalamus marks the beginning of puberty in girls. Subsequently, numerous hormones stimulate development and growth within the female body. These hormones include, but are not limited to, dehydroepiandrosterone sulfate (DHEA-S), testosterone (T), estrone (E1), and estradiol (E2). These hormones can be detected in blood serum during puberty and typically increase throughout puberty. The effect of hormone levels, and of their rate of change, on the timing of pubertal milestones is unclear because of gaps in the research: there have been few longitudinal studies and few sufficiently sensitive analytic approaches for quantifying hormone concentrations.

Data collected between 1940 and 1994 show that the ages of thelarche and menarche in the United States have been declining (2–7). The National Health and Nutrition Examination Survey (NHANES) reported an average age of menarche of 13.3 years for women born prior to 1920 versus an average age of 12.4 years for girls born between 1980 and 1984 (2,3,7). Similar studies showed that the average age of thelarche dropped from 10.4 years (2) to 9.6 years (5). The causes of earlier menarche and thelarche have not been fully explained, although several factors have been identified, including race/ethnicity (African American) (3,5,6) and higher body mass index (BMI) (2,7). Studies have shown that the dose, age at exposure and duration of exposure to endocrine disrupters are associated with alterations in the pubertal process in girls, including earlier ages of pubertal events (8–10). Other studies have shown that early pubertal timing leads to an increased risk of breast cancer (11–14) and other adverse health outcomes in adulthood. A pooled analysis reported that the risk of breast cancer decreases by nine percent for each year of delayed menarche in premenopausal women and by four percent for each delayed year in postmenopausal women (13).

PREVIOUS STUDIES

Most previous studies evaluating hormone values in young girls relative to puberty have been cross-sectional, examining hormones by pubertal status or by chronological age regardless of pubertal status (15–20). A Danish study reported that pubertal girls (breast stage 2 or greater) between the ages of 10 and 12 years from their 2006 cohort had significantly greater serum estradiol levels than pre-pubertal (breast stage 1) girls of the same ages (18). The same study found that the mean ages of thelarche (mean=9.86 years) and menarche (mean=13.13 years) were lower in their 2006 cohort than in the 1991 cohort (thelarche mean=10.88 years, menarche mean=13.42 years) and concluded that BMI did not contribute to this change in age (18). They also reported that the average serum estradiol level was significantly higher in the 1991 cohort of girls aged 8-10 years (mean=24 pmol/L; 2.3% of all girls in the cohort were in thelarche) than in the 2006 cohort (mean=18 pmol/L; 24.4% in thelarche), with no difference in values among 8-year-old girls or girls older than 10. Because the study was cross-sectional, the authors could not conclude that the earlier age of thelarche was due to higher levels of circulating estradiol. Another cross-sectional Danish study using the Copenhagen Puberty Study reported higher levels of estrogens in girls who had achieved thelarche (estrone mean=541.0 pmol/L, 17β-estradiol mean=70.3 pmol/L) versus prepubertal girls (estrone median=87.0 pmol/L, 17β-estradiol median=9.6 pmol/L) (20). Furthermore, Mouritsen reported a mean age of thelarche of 10.07 years, a mean age of pubarche of 10.85 years, and no association between age of pubertal onset and BMI using longitudinal data from the Copenhagen Puberty Study. The same analysis reported that 77.97% of girls entered thelarche prior to pubarche, 15.25% entered pubarche and thelarche synchronously and 6.9% entered pubarche before thelarche (21). These studies all used a Danish population, which is not ethnically representative of the population in the United States or the rest of the world.

Two previous studies (22,23) used data from the cohort used in this study. The participants included in each analysis varied slightly, as the first study used hormone values from early in the study and did not have as many hormone values as were available for this study (22). The mean age of thelarche in the two prior studies was 8.78 years (22,23), and the mean age of thelarche for the girls included in this study was 9.02 years. This study included hormone values of girls who had reached thelarche later, since serum samples were sent for analysis only after the girls had reached thelarche. Neither of the prior studies required two serum hormone measurements from each girl as specified in this study, and this study included more girls with hormone values assayed. The first study concluded that both DHEA-S and estrone increased prior to estradiol in girls. Heavier girls (BMIz greater than the median BMIz) had lower estradiol levels at the onset of thelarche and six months later when compared to leaner girls (22). The most recent publication concluded that hormone concentrations for testosterone, estradiol, and estrone in young girls correlated with chronologic age (R=0.362, 0.350, 0.444, respectively, with p-values<0.001) as well as time relative to breast maturation (R=0.259, 0.222, 0.323, respectively, with p-values<0.001), providing new information (23).

Each of the previous studies on hormone levels in young girls pointed to the need for this study. They all indicated that hormone levels rise in young girls but that these changes need to be described by time relative to puberty rather than by chronological age. None of the studies determined whether hormones change homogeneously in girls prior to puberty or whether changes in the hormones are associated with earlier ages of puberty.

OBJECTIVES, HYPOTHESIS, and SPECIFIC AIMS

Even though the sequence of the changes in sex hormones is not fully understood, we do know that much of puberty depends on the SA hormones and their effect on changes in a girl’s body. Understanding the hormonal changes and their effect on thelarche and menarche is important because there is growing evidence that the advancement of pubertal maturation can have long-term health effects.

Objectives – The primary objective of this research project is to use longitudinal cohort data and quantitative research methods to identify sex hormone phenotypes around the time of thelarche.

These analyses incorporate serum concentrations of up to four hormones (DHEA-S, estradiol, estrone and testosterone) at five different time periods measured in 6 month increments from 18 months prior to 6 months after the age of thelarche. The second objective is to determine if the ages of pubertal events (thelarche, pubarche, and menarche) are associated with a proposed hormone phenotype.

Hypothesis – This study hypothesizes that in young girls, relative levels or changes in DHEA-S, estrone, estradiol, and testosterone around the time of thelarche, when considered together (at least two hormones measured at two time points) as an individual hormone phenotype, are directly related to an earlier age of pubertal timing (thelarche, pubarche, and menarche).

Specific Aim 1 – Describe the serum DHEA-S, estradiol, estrone and testosterone levels in girls

measured in 6 month increments from 18 months prior to 6 months after the age of thelarche.

Specific Aim 2 – Identify phenotypes of sex hormone profiles using combinations of hormone

levels across the different time points relative to thelarche. Two or more hormone data points

used together as a phenotype in this analysis may refer to two or more measurements of a certain

hormone in an individual girl at different time points (such as the change in E2 from time=-18 to

-12 months relative to thelarche) or measurement of at least two hormones taken at the same or different time points (E2 and DHEA-S at time=-6) relative to the timing of thelarche.

Specific Aim 3 – Use multiple variable survival analysis to determine which among the sex

hormone profile phenotypes (consisting of DHEA-S, estradiol, estrone and testosterone) are

predictive of age of thelarche, age of pubarche, and age of menarche to allow researchers to further

understand why some girls experience pubertal milestones at an earlier age than others.

BACKGROUND

Windows of Susceptibility

“Windows of susceptibility” refers to time periods during life when a person is at greater risk of a health effect from a given environmental exposure. In utero, neonatal, pre-puberty, puberty, pregnancy, lactation, involution and menopause are all times when SA hormones fluctuate in a female’s body. They are also times when breast cells undergo changes within the mammary glands. The changing and developing breast cells make the breast more susceptible to environmental exposures during these transition periods than during other times in a female’s life. Younger age of menarche, older age of first full-term pregnancy and older age of menopause are known risk factors for breast cancer, suggesting that hormone-related events are part of the pathogenesis of breast cancer. These critical developmental periods, which affect women’s breast health later in life, are considered “windows of susceptibility” for breast cancer.

Puberty in Girls

Puberty, one of these “windows of susceptibility,” in young girls takes place over the span of several years. The changes occurring during puberty in a young girl’s body include the following milestones: pubertal growth spurt, thelarche, pubarche and menarche. The age of thelarche is the age at the onset of breast budding [sexual maturation stage 2 (Figure 1)] and the age of pubarche is the age when pubic hair first appears (Figure 2). The age of menarche is the age when a girl begins menstrual bleeding and is considered one of the later stages of puberty (1).

The ages at which girls experience these milestones differ depending on many unknown and known factors including race (3,5,6), higher body mass index (BMI) (2,7), and endocrine disrupters (8,24,25). Pubertal tempo, defined as the time between thelarche and menarche, differs for every girl. All these physical changes during puberty are driven by changes in sex and adrenal hormone levels. The changes occurring throughout puberty result in a fully sexually mature woman (14).

Role of Hormones in Puberty

Adrenal and sex hormones lead to the development of secondary sex characteristics. Secondary sex characteristics are not directly part of the reproductive system but rather are physical characteristics, e.g., pubic hair and enlarged breasts. The secondary sex characteristics of breast development and increased height velocity are the first outward signs of puberty in girls.

Hormones increase from low levels in pre-puberty to much higher levels by mid-puberty (26). However, the exact sequence of the changes in hormones during puberty is not fully understood. Almost all studies have been cross-sectional and have not ascertained the longitudinal changes within a girl.

In girls, puberty generally begins before the age of nine with the reactivation of gonadotropin-releasing hormone (GnRH) secretion in the hypothalamus. GnRH is active during prenatal life, becomes dormant in postnatal life, and is reactivated at the beginning of puberty. What triggers the reactivation is unclear. After reactivation, GnRH stimulates the pituitary gland to release luteinizing hormone (LH) and follicle-stimulating hormone (FSH) into the bloodstream. During early puberty, LH pulses increase in both magnitude and regularity during sleep, indicating a relationship between sleep and pubertal maturation. Later in puberty, the pulses also increase during the day, and ultimately the pulses stabilize to lower levels by the end of puberty. FSH works to regulate pubertal maturation and stimulates the growth of the follicles in the ovaries. Eventually, LH and FSH work together in the reproductive system to regulate the menstrual cycle and to mature and release eggs from the ovaries (27,28).

LH and FSH promote ovarian development, which then increases the production of estradiol and estrone. Estradiol and estrone (which is also produced by the extra-glandular conversion of adrenal androgens) stimulate the development of the breast, and this development can occur over a period of four years. Breast development encompasses five stages as defined by Marshall and Tanner (1). The increase in the adrenal hormones DHEA-S and testosterone stimulates pubic hair growth, contributing to pubarche (26,27), which was defined in five stages by Marshall and Tanner in 1969 (1). Menarche typically occurs around breast maturation stage IV and rarely occurs before stage III (29).

Declining Age of Pubertal Milestones in Girls

Studies conducted on data from 1940 to 1994 indicate that the ages of thelarche and menarche in the United States (US) have declined (2–7). The National Health and Nutrition Examination Survey (NHANES) of 1988-1994 reported that the average age of thelarche in the United States was 10.4 years (2). A later study with data collection in 2007 reported an average age of thelarche of 9.7 years (5). NHANES reported that women born prior to 1920 had a self-reported average age of menarche of 13.3 years. The same study found that the age for girls born between 1980 and 1984 had dropped to 12.4 years (2,3,7). Most experts agree that there has been a trend of pubertal milestones in girls being reached at earlier ages than in previous generations (2). Achieving puberty at an earlier age has been shown to lead to adverse health outcomes such as an increased risk of depression (15–20), earlier occurrence of sexual activity (31–34), eating disorders (35,36), and substance abuse in post-pubertal teens (15,19,20,22), along with elevated risks of cardiovascular disease (37–39) and breast cancer (11–14) later in life.

Health Risks due to Early Puberty in Girls

Girls entering puberty at a younger age are not emotionally ready for the physical changes in their body (30,33,34). Girls’ coping resources typically relate to their chronological age, not their stage of physical or sexual development. Girls who mature earlier therefore do not necessarily have coping or emotional skills sufficiently mature for the physical change into adulthood. As a result, girls who enter puberty at an earlier age often engage in earlier sexual activity and suffer more from depression, adverse eating patterns and substance abuse than their peers who reach puberty at a later age (30–36,40).

Engaging in earlier sexual activity can alter a girl’s life in many ways. Girls engaging in

earlier sexual activity have an increased number of sexual partners during their lives and higher

rates of sexually transmitted diseases. Earlier sexual activity also leads to higher rates of teenage

pregnancy (31–34). Teenage pregnancy limits educational opportunities, increases the likelihood

of single parenting, and limits income opportunities (36).

Earlier puberty changes a girl’s body in a way that makes her appear physically different from her peers. This physical change often leads girls to feel a heightened awareness of their body image, resulting in a feeling of emotional distance from their peers that can lead to depression (36). Girls who mature earlier than their peers are often more dissatisfied with their body image, leading to adverse eating patterns associated with anorexia and bulimia (26,36).

Some experimentation with alcohol during the teenage years is normative (41). However, girls experiencing puberty at earlier ages experiment with alcohol at an earlier age and with greater frequency than their peers. Earlier and more frequent use leads to alcohol problems and substance abuse (33,42,43).

Not only is there epidemiological evidence that earlier puberty in girls is a risk factor for adverse outcomes during the teenage years, but accumulating evidence also shows a link between early puberty in girls and adverse health outcomes in adulthood. Studies support early age of menarche as a risk factor for breast cancer (11–14) and heart disease (37–39) later in life, making early puberty a lifetime health burden.

Early puberty for girls is associated with a higher BMI and fasting insulin levels and

decreased HDL cholesterol (37–39). These metabolic traits increase the risk for cardiovascular

disease. Researchers using the European Prospective Investigation into Cancer (EPIC)-Norfolk cohort noted women in the earliest category of age for menarche (<12 years old) had a 17% higher risk of cardiovascular disease than those who started menarche after the age of 12 (39). A meta-analysis by Prentice and Viner reported early menarche was associated with a 15% increased risk of heart disease (37). Critics have challenged the concept that early puberty is

associated with adult BMI independent of childhood BMI. However, studies using the Northern

Finland Birth Cohort 1966 (38) and the British Birth Cohort study (44) adjusted for childhood

BMI and were able to support the idea that pubertal timing independent of childhood BMI is an

independent risk for adult BMI.

Early menarche is perhaps one of the most widely known and well-established risk factors for breast cancer (6,38,45). Throughout puberty, young girls’ estrogen and other hormones regulate the elongation and branching of the ductal tree in the breast, which opens a “window of susceptibility” for the breast. A pooled analysis of studies reported that the risk of breast cancer decreases by nine percent (95% CI 7-11%) for each year of delayed menarche in premenopausal women and by four percent (95% CI 2-5%) for each delayed year in postmenopausal women (13). The decrease in age of menarche and the risk for breast cancer due to the early age of menarche have been seen across the world in different populations.

METHODOLOGY

Defining Hormone Phenotypes as Predictors of Early Pubertal Milestones in Girls

As previously noted, many variables contribute to the age of pubertal milestones but little

is known about the contributions of hormones at time points prior to puberty. Is the age of a

pubertal milestone due to the relationship among the hormones and/or a change in one or more of

the hormone levels between times relative to puberty? Are some hormones more influential on

the age of puberty than others? In order to determine which of the hormones influence the age of pubertal milestones, phenotypes of girls with early and late puberty need to be defined.

Symptomatic phenotypes have been developed to better understand co-morbidity of many

diseases including chronic obstructive pulmonary disease (46–48), cardiovascular disease (49),

sleep apnea (50), Parkinson’s disease (51), and asthma (52). After confirming that the relevant independent variables are indeed correlated, defining a phenotype of a disease is a two-step process: principal component analysis (PCA) followed by cluster analysis (CA). The PCA-CA technique identifies a subset of independent components that account for much of the variance, then employs cluster analysis to group subjects into distinct phenotypes. In this study, after heterogeneous SA hormone phenotypes are developed, multiple variable survival analysis will determine whether different phenotypes are predictive of varying times of the pubertal milestones of thelarche, pubarche and menarche.

Correlations

Explanatory variables must be shown to be correlated, either positively or negatively, prior to performing PCA-CA. Correlations show how strongly the variables in a study are linearly related. If numerous variables are shown to be highly correlated, either directly or inversely, this correlation supports the use of the PCA-CA technique. PCA is not appropriate if the variables are found to be uncorrelated.
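As an illustration of this screening step, the following is a minimal sketch of a Pearson correlation computed with the Python standard library. It is not the SAS code used in this dissertation; the function name `pearson_r` and the example values are hypothetical and chosen only to show the calculation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for two hormone variables (illustration only)
    hormone_a = [10.0, 12.0, 15.0, 19.0, 24.0]
    hormone_b = [1.1, 1.3, 1.9, 2.2, 2.9]
    print(round(pearson_r(hormone_a, hormone_b), 3))
```

A coefficient near +1 or -1 for many pairs of variables would support moving on to PCA, whereas coefficients near zero would argue against it.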

Principal Component Analysis

Principal component analysis (PCA) is a method of variable reduction that is useful when one has a large number of variables and needs to reduce the number used in the analysis. It is an exploratory technique that groups information into orthogonal linear combinations (components) of variables while removing noisy variables and still representing the information from the original set of variables. This reduction of variables into components is only possible if there is redundancy among the variables because they all measure similar constructs or are highly correlated.

In PCA, each component is a combination of correlated variables that explains shared variance. While there are no assumptions regarding the underlying causal model in PCA, the variables used in PCA must be normally distributed and linearly related to one another.

Within PCA, eigenvalues are computed from the correlation matrix. Based on the Kaiser criterion, components with eigenvalues >1 are retained and interpreted. A component with an eigenvalue >1 accounts for a greater amount of variance than that contributed by a single variable. It follows that a component with an eigenvalue <1, although it combines multiple variables, contributes less variance than an individual variable and should not be retained, as the goal is variable reduction. If all the variables loaded onto only one component, that component would explain all the variance. The first component is a linear combination of the variables and represents the largest portion of the variance; the second component is a linear combination of the variables and represents the largest portion of the remaining variance not explained by the first component, and so forth. Ideally a component should consist of three or more variables. All principal components are uncorrelated with each other. PCA can be conducted using PROC PRINCOMP or PROC FACTOR in SAS.
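The Kaiser criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS procedure used in this dissertation. It assumes the eigenvalues of the correlation matrix have already been computed; the function name `kaiser_retained` and the eigenvalues shown are hypothetical. Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that sum gives the proportion of variance explained.

```python
def kaiser_retained(eigenvalues):
    """Apply the Kaiser criterion to eigenvalues of a correlation matrix.

    Returns the retained eigenvalues (those > 1, largest first) and the
    proportion of total variance they explain together.
    """
    total = sum(eigenvalues)  # equals the number of variables for a correlation matrix
    ordered = sorted(eigenvalues, reverse=True)
    retained = [ev for ev in ordered if ev > 1.0]
    explained = sum(retained) / total
    return retained, explained


if __name__ == "__main__":
    # Hypothetical eigenvalues for eight variables (illustration only)
    eigenvalues = [2.8, 1.5, 1.1, 0.9, 0.7, 0.5, 0.3, 0.2]
    retained, explained = kaiser_retained(eigenvalues)
    print(len(retained), "components retained")
    print(f"{explained:.1%} of total variance explained")
```

In this hypothetical case three components exceed an eigenvalue of 1, and the remaining five would be dropped.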

Cluster Analysis

The final step in defining a phenotype is cluster analysis. Cluster analysis assigns subjects in a cohort to clusters such that subjects in one cluster are more similar to each other than to subjects in another cluster. The components found in the PCA will be used to assign girls to hormone phenotypes, so that girls in the same cluster (or phenotype) exhibit similar values on the components found from PCA. The number of clusters is not defined a priori. In SAS, the PROC VARCLUS and PROC FASTCLUS procedures will find disjoint clusters of girls.
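To make the idea of disjoint clusters concrete, the following is a minimal sketch of a k-means style assignment (Lloyd's algorithm) in plain Python. It is an illustration under stated assumptions, not a reproduction of the SAS procedures named above: the function names, the starting centroids, and the two-dimensional component scores are all hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, initial_centroids, max_iter=100):
    """Assign each point to its nearest centroid, then recompute centroids,
    repeating until the assignments stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical scores on two principal components (illustration only)
    scores = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids = k_means(scores, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
    print(labels)
```

Each girl receives exactly one label, so the resulting clusters are disjoint; in practice the result depends on the starting centroids and on the number of clusters tried.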

Survival Analysis

Multiple variable survival analysis will be employed to determine whether different relationships between the phenotypes and pubertal milestones exist. Kaplan-Meier estimates of cumulative probability will be used to assess whether the phenotypes are associated with earlier onset of pubertal events. Factors affecting the age of the pubertal events will be assessed by multivariable Cox proportional hazards models. In addition to the phenotypes, the following variables will be included in each full model: BMIz at the visit closest to the pubertal event, race, caregiver’s education, and mother’s age of menarche. The results of the models will be expressed as hazard ratios (HR) with Wald 95% confidence intervals (CI). The proportional hazards assumption for each variable will be tested using the log of each baseline variable. Girls who either dropped out of the study before its end or did not reach the pubertal event during the study period will be right censored. PROC LIFETEST and PROC PHREG can be used in SAS for the survival analysis.
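The Kaplan-Meier estimate with right censoring can be sketched as follows. This is a minimal illustration in plain Python rather than the SAS procedures named above; the function name `kaplan_meier` and the example ages are hypothetical. Girls who are censored at a given age are counted as at risk for events at that same age, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the probability of not yet having reached
    the milestone.

    times  : age at the milestone or at censoring for each girl
    events : True if the milestone was observed, False if right censored
    Returns a list of (time, estimate) pairs, one per distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # milestones observed exactly at time t
        n_leaving = 0  # events plus censored observations at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_leaving
    return curve


if __name__ == "__main__":
    # Hypothetical ages (years) and event indicators (illustration only)
    ages = [9.0, 9.5, 9.5, 10.0, 10.5]
    reached = [True, True, False, True, False]
    for age, s in kaplan_meier(ages, reached):
        print(age, round(s, 3))
```

Curves computed separately for each phenotype could then be compared visually or with a log-rank test before fitting the Cox models.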

STUDY POPULATION

The Puberty Study of the Breast Cancer and the Environment Research Program (BCERP) is a three-site observational, longitudinal, prospective cohort study of the impact of environmental exposures on pubertal maturation in young girls. Between 2004 and 2006, the cohort recruited 1,239 girls aged six to eight years from East Harlem, New York; the greater Cincinnati, Ohio metropolitan area; and the San Francisco, California Bay Area. The proposed study will utilize data from the Cincinnati site of BCERP, conducted by Cincinnati Children’s Hospital Medical Center and the University of Cincinnati. The Cincinnati area girls were recruited from public and parochial schools as well as the Breast Cancer Registry of Greater Cincinnati to enrich the study population with girls with a family history of breast cancer. Informed consent was obtained from the participants’ parents or guardians (4).

The Cincinnati girls (n=379) were seen every six months for the study years 2004-2010 and every twelve months thereafter, with a study visit window of ± four weeks. The Cincinnati sub-cohort was the only one to have blood drawn every six months in order to assay for sex and adrenal hormone levels. Girls eligible for the longitudinal analyses included those who had at least two serum hormone measurements in two different time periods around thelarche. Girls were excluded if they self-reported taking oral contraceptives or had an underlying hormone condition.

Analyses used existing data including measurements of serum hormones E2, E1, T, and DHEA-S at time points relative to thelarche (-18, -12, -6, 0, +6 months). The time periods were defined as:

-18 months if the age at the blood collection was 21 months prior to thelarche up to 15 months prior to thelarche

-12 months if the age at the blood collection was 15 months prior to thelarche up to 9 months prior to thelarche

-6 months if the age at the blood collection was 9 months prior to thelarche up to 3 months prior to thelarche

0 months if the age at the blood collection was 3 months prior to thelarche up to 3 months after thelarche

6 months if the age at the blood collection was at least 3 months after up to 9 months after thelarche

Due to the ±4 week study window, it is possible that a girl had two hormone measurements in one time period. If this occurred, values from the study visit closest to the time period name were used (e.g. if a girl had visits at both -7 and -3 months relative to thelarche, the study visit at -7 months is closer to the time period of -6 months and therefore the values from the -7 month visit were used for the -6 month time period). If the two study visits were an equal number of months from the time period, then the values from the study visit closest to thelarche were used.
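The window assignment and tie-breaking rule described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and the boundary convention (a visit exactly on a shared boundary goes to the earlier window) is an assumption chosen so that the -7/-3 example in the text falls into a single window.

```python
# Window centres (months relative to thelarche) named in the text.
CENTRES = [-18, -12, -6, 0, 6]

def window_for(months):
    """Map a visit's timing to its window centre, or None if outside all windows.

    Assumption: each window is (centre - 3, centre + 3], so a visit exactly on
    a shared boundary is placed in the earlier window.
    """
    for centre in CENTRES:
        if centre - 3 < months <= centre + 3:
            return centre
    return None

def pick_visits(visit_months):
    """Keep one visit per window: closest to the centre, ties go to the visit nearer thelarche."""
    chosen = {}
    for m in visit_months:
        centre = window_for(m)
        if centre is None:
            continue
        key = (abs(m - centre), abs(m))  # distance to centre, then distance to thelarche
        if centre not in chosen or key < chosen[centre][0]:
            chosen[centre] = (key, m)
    return {centre: m for centre, (key, m) in chosen.items()}

print(pick_visits([-7, -3]))  # the example from the text: the -7 month visit is kept
print(pick_visits([-8, -4]))  # equal distance from -6: the visit nearer thelarche is kept
```

Sorting on the pair (distance to centre, distance to thelarche) expresses both rules in one comparison, so the tie-break only matters when the first distances are equal.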

Hormone levels were measured in the fasting blood serum of each participant, drawn early in the morning at every study visit. High performance liquid chromatography with tandem mass spectrometry (HPLC-MS) is a more sensitive approach that allowed us to evaluate hormones that are typically too low to measure. Esoterix Laboratories, certified by the Centers for Disease Control and Prevention, measured estrone, estradiol, and testosterone by HPLC-MS, and DHEA-S by radioimmunoassay for one batch and HPLC-MS for a second batch. The average bias estimation from proficiency studies in this laboratory is less than 2%. Further details of the collection of the serum hormones have been published previously (22).

FORMAT OF DISSERTATION

This dissertation is written in the format of two manuscripts written for submission to a journal publication, Chapters 2 and 3. Chapter 1 provides background for the dissertation, the study objective, hypothesis and aims along with information regarding the BCERP Puberty Study cohort. The first manuscript, Chapter 2, is written as a methods paper. It describes the hormone levels in the cohort across the five time periods and the development of the hormone phenotypes using principal component and cluster analyses. The second manuscript, Chapter 3, is written as a hypothesis testing study investigating the impact of the hormone phenotypes on the age of thelarche, pubarche, and menarche. Chapter 4 discusses the strengths and limitations of the project and details how the findings in Chapter 2 impact the design and interpretation of the analysis detailed in Chapter 3. Appendices include an additional contributor’s statement page and SAS code.

IRB APPROVAL

The University of Cincinnati Institutional Review Board and Cincinnati Children’s Hospital Medical Center approved this study. The data collection of the GUF study was reviewed and approved by the Cincinnati Children’s Hospital Medical Center’s Institutional Review Board, protocol numbers 2008-0170 and 2010-1637.

REFERENCES

1. Marshall WA, Tanner JM. Variations in pattern of pubertal changes in girls. Arch. Dis. Child. 1969;44(235):291–303.
2. Euling SY, Herman-Giddens ME, Lee PA, Selevan SG, Juul A, Sørensen TIA, Dunkel L, Himes JH, Teilmann G, Swan SH. Examination of US puberty-timing data from 1940 to 1994 for secular trends: panel findings. Pediatrics 2008;121(Supplement 3):S172–S191.
3. McDowell MA, Brody DJ, Hughes JP. Has age at menarche changed? Results from the National Health and Nutrition Examination Survey (NHANES) 1999–2004. J. Adolesc. Heal. 2007;40(3):227–231.
4. Biro FM, Galvez MP, Greenspan LC, Succop PA, Vangeepuram N, Pinney SM, Teitelbaum S, Windham GC, Kushi LH, Wolff MS. Pubertal assessment method and baseline characteristics in a mixed longitudinal study of girls. Pediatrics 2010;126(3):e583-90.
5. Cabrera SM, Bright GM, Frane JW, Blethen SL, Lee PA. Age of thelarche and menarche in contemporary US females: a cross-sectional analysis. J. Pediatr. Endocrinol. Metab. 2014;27(1–2):47–51.
6. Biro FM, Greenspan LC, Galvez MP, Pinney SM, Teitelbaum S, Windham GC, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Onset of breast development in a longitudinal cohort. Pediatrics 2013;132(6):1019–1027.
7. Anderson SE, Dallal GE, Must A. Relative weight and race influence average age at menarche: results from two nationally representative surveys of US girls studied 25 years apart. Pediatrics 2003;11(4):844–850.
8. Ouyang F, Perry MJ, Venners SA, Chen C, Wang B, Yang F, Fang Z, Zang T, Wang L, Xu X, Wang X. Serum DDT, age at menarche, and abnormal menstrual cycle length. Occup. Environ. Med. 2005;62(12):878–84.
9. Wolff MS, Britton JA, Boguski L, Hochman S, Maloney N, Serra N, Liu Z, Berkowitz G, Larson S, Forman J. Environmental exposures and puberty in inner-city girls. Environ. Res. 2008;107(3):393–400.
10. Özen S, Darcan Ş. Effects of environmental endocrine disruptors on pubertal development. J. Clin. Res. Pediatr. Endocrinol. 2011;3(1):1–6.
11. Rockhill B, Moorman PG, Newman B. Age at menarche, time to regular cycling, and breast cancer (North Carolina, United States). Cancer Causes Control 1998;9(4):447–453.
12. Garland M, Hunter DJ, Colditz GA, Manson JE, Stampfer MJ, Spiegelman D, Speizer F, Willett WC. Menstrual cycle characteristics and history of ovulatory infertility in relation to breast cancer risk in a large cohort of US women. Am. J. Epidemiol. 1998;147(7):636–643.
13. Clavel-Chapelon F. Differential effects of reproductive factors on the risk of pre- and postmenopausal breast cancer. Results from a large cohort of French women. Br. J. Cancer 2002;86(5):723–727.
14. Bodicoat DH, Schoemaker MJ, Jones ME, McFadden E, Griffin J, Ashworth A, Swerdlow AJ. Timing of pubertal stages and breast cancer risk: the Breakthrough Generations Study. Breast Cancer Res. 2014;16(1):R18.
15. Sizonenko PC, Paunier LUC. Hormonal changes in puberty III: correlation of plasma dehydroepiandrosterone, testosterone, FSH, and LH with stages of puberty and bone age in normal boys and girls and in patients with Addison’s Disease or Hypogonadism or with premature or late adrena. J. Clin. Endocrinol. Metab. 1975;(March):894–904.
16. Nottelmann ED, Susman EJ, Inoff-Germain G, Cutler GB, Loriaux DL, Chrousos GP. Developmental processes in early adolescence: relationships between adolescent adjustment problems and chronologic age, pubertal stage, and puberty-related serum hormone levels. J. Pediatr. 1987;110(3):473–80.
17. Shirtcliff EA, Dahl RE, Pollak SD. Pubertal development: correspondence between hormonal and physical development. Child Dev. 2009;80(2):327–37.
18. Aksglaede L, Sorensen K, Petersen JH, Skakkebaek NE, Juul A. Recent decline in age at breast development: The Copenhagen Puberty Study. Pediatrics 2009;123(5):e932–e939.
19. Zheng F, Sheng N, Zhang H, Yan S, Zhang J, Wang J. Perfluorooctanoic acid exposure disturbs glucose metabolism in mouse liver. Toxicol. Appl. Pharmacol. 2017;335:41–48.
20. Courant F, Aksglaede L, Antignac JP, Monteau F, Sorensen K, Andersson AM, Skakkebaek NE, Juul A, Bizec B Le. Assessment of circulating sex steroid levels in prepubertal and pubertal boys and girls by a novel ultrasensitive gas chromatography-tandem mass spectrometry method. J. Clin. Endocrinol. Metab. 2010;95(1):82–92.
21. Mouritsen A, Aksglaede L, Soerensen K, Hagen CP, Petersen JH, Main KM, Juul A. The pubertal transition in 179 healthy Danish children: associations between pubarche, , , and body composition. Eur. J. Endocrinol. 2013;168(2):129–136.
22. Biro FM, Pinney SM, Huang B, Baker ER, Walt Chandler D, Dorn LD. Hormone changes in peripubertal girls. J. Clin. Endocrinol. Metab. 2014;99(10):3829–3835.
23. Biro FM, Huang B, Chandler DW, Fassler CL, Pinney SM. Impact of pubertal maturation and chronologic age on sex steroids in peripubertal girls. J. Clin. Endocrinol. Metab. 2019. doi:10.1210/jc.2018-02684.
24. Wolff MS, Teitelbaum SL, Pinney SM, Windham G, Liao L, Biro F, Kushi LH, Erdmann C, Hiatt RA, Rybak ME, Calafat AM. Investigation of relationships between urinary biomarkers of phytoestrogens, phthalates, and phenols and pubertal stages in girls. Environ. Health Perspect. 2010;118(7):1039–1046.
25. Colón I, Caro D, Bourdony CJ, Rosario O. Identification of phthalate esters in the serum of young Puerto Rican girls with premature breast development. Environ. Health Perspect. 2000;108(9):895–900.
26. Peper JS, Dahl RE. The teenage brain: surging hormones—brain-behavior interactions during puberty. Curr. Dir. Psychol. Sci. 2013;22(2):134–139.
27. Braude P, Hamilton-Fairley D. Hormonal changes during puberty, pregnancy, and the menopause. In: Obstetric and Gynecologic Dermatology. Third Edit. Elsevier; 2008:3–12.
28. Swerdloff RS, Odell WD. Hormonal mechanisms in the onset of puberty. Postgrad. Med. J. 1975;51(594):200–8.
29. Biro FM, Huang B, Crawford PB, Lucky AW, Striegel-Moore R, Barton BA, Daniels S. Pubertal correlates in black and white girls. J. Pediatr. 2006;148(2):234–240.
30. Conley CS, Rudolph KD. The emerging sex difference in adolescent depression: interacting contributions of puberty and peer stress. Dev. Psychopathol. 2009;21(02):593.
31. Kaltiala-Heino R, Kosunen E, Rimpelä M. Pubertal timing, sexual behaviour and self-reported depression in middle adolescence. J. Adolesc. 2003;26(5):531–545.
32. Copeland W, Shanahan L, Miller S, Costello EJ, Angold A, Maughan B. Outcomes of early pubertal timing in young women: a prospective population-based study. Am. J. Psychiatry 2010;167(10):1218–1225.
33. Deardorff J, Gonzales NA, Christopher FS, Roosa MW, Millsap RE, Lumeng J, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Early puberty and adolescent pregnancy: the influence of alcohol use. Pediatrics 2005;116(6):1451–6.
34. Downing J, Bellis MA. Early pubertal onset and its relationship with sexual risk taking, substance use and anti-social behaviour: a preliminary cross-sectional study. BMC Public Health 2009;9(1):446.
35. Stice E, Presnell K, Bearman SK. Relation of early menarche to depression, eating disorders, substance abuse, and comorbid psychopathology among adolescent girls. Dev. Psychol. 2001;37:608–619.
36. Mendle J, Turkheimer E, Emery RE. Detrimental psychological outcomes associated with early pubertal timing in adolescent girls. Dev. Rev. 2007;27(2):151–171.
37. Prentice P, Viner RM. Pubertal timing and adult obesity and cardiometabolic risk in women and men: a systematic review and meta-analysis. Int. J. Obes. 2013;37(8):1036–1043.
38. Widén E, Silventoinen K, Sovio U, Ripatti S, Cousminer DL, Hartikainen A-L, Laitinen J, Pouta A, Kaprio J, Järvelin M-R, Peltonen L, Palotie A. Pubertal timing and growth influences cardiometabolic risk factors in adult males and females. Diabetes Care 2012;35(4):850–6.
39. Lakshman R, Forouhi NG, Sharp SJ, Luben R, Bingham SA, Khaw K-T, Wareham NJ, Ong KK. Early age at menarche associated with cardiovascular disease and mortality. J. Clin. Endocrinol. Metab. 2009;94(12):4953–4960.
40. Rudolph KD, Troop-Gordon W, Lambert SF, Natsuaki MN. Long-term consequences of pubertal timing for youth depression: identifying personal and contextual pathways of risk. Dev. Psychopathol. 2014;26(4pt2):1423–1444.
41. Baumrind D. The influence of parenting style on adolescent competence and substance use. J. Early Adolesc. 1991;11(1):56–95.
42. Chassin L, Flora DB, King KM. Trajectories of alcohol and drug use and dependence from adolescence to adulthood: the effects of familial alcoholism and personality. J. Abnorm. Psychol. 2004;113(4):483–498.
43. Tschann JM, Adler NE, Irwin CE, Millstein SG, Turner RA, Kegeles SM. Initiation of substance use in early adolescence: the roles of pubertal timing and emotional distress. Heal. Psychol. 1994;13(4):326–333.
44. Pierce MB, Kuh D, Hardy R. Role of lifetime body mass index in the association between age at puberty and adult lipids: findings from men and women in a British birth cohort. Ann. Epidemiol. 2010;20(9):676–682.
45. La Vecchia C, Negri E, Bruzzi P, Dardanoni G, Decarli A, Franceschi S, Palli D, Talamini R. The role of age at menarche and at menopause on breast cancer risk: combined evidence from four case-control studies. Ann. Oncol. 1992;3(8):625–629.
46. Newandee DA, Reisman SS, Bartels AN, De Meersman RE. COPD severity classification using principal component and cluster analysis on HRV parameters. Proc. IEEE Annu. Northeast Bioeng. Conf. NEBEC 2003;2003–Janua:134–135.
47. Burgel PR, Paillasseur JL, Caillaud D, Tillie-Leblond I, Chanez P, Escamilla R, Court-Fortune I, Perez T, Carré P, Roche N. Clinical COPD phenotypes: A novel approach using principal component and cluster analyses. Eur. Respir. J. 2010;36(3):531–539.
48. Cho MH, Washko GR, Hoffmann TJ, Criner GJ, Hoffman EA. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir. Res. 2010;11(30). doi:10.1186/1465-9921-11-30.
49. Goodman E. Factor analysis of clustered cardiovascular risks in adolescence: obesity is the predominant correlate of risk among youth. Circulation 2005;111(15):1970–1977.
50. Vavougios GD, Natsios G, Pastaka C, Zarogiannis SG, Gourgoulianis KI. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis. J. Sleep Res. 2016;25(1):31–38.
51. Kim SR, So HY, Choi E, Kang JH, Kim HY, Chung SJ. Influencing effect of non-motor symptom clusters on quality of life in Parkinson’s disease. J. Neurol. Sci. 2014;347(1–2):310–315.
52. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, Wardlaw AJ, Green RH. Cluster analysis and clinical asthma phenotypes. Am. J. Respir. Crit. Care Med. 2008;178(3):218–224.

FIGURES

[Figure] (attributed to Dr. Frank Biro, 1989)

[Figure] (attributed to Dr. Frank Biro, 1989)

CHAPTER 2 - Sex hormone phenotypes in young girls identified by principal component and cluster analyses in a longitudinal cohort

ABSTRACT

Background: No studies have considered patterns in sex hormone levels as risk factors for early or late pubertal events, even though it is widely known that throughout puberty numerous hormones stimulate development and growth within the female body.

Methods: We measured hormones including DHEA-S, estrone, estradiol, and testosterone at five time periods relative to the timing of thelarche in 269 girls from the greater Cincinnati area. Principal components analysis (PCA) was performed to select a subset of relevant hormone phenotypic variables. Cluster analysis (CA) was then applied to identify phenotypes of girls based on the predictive hormone variables.

Results: PCA yielded three components accounting for 74% of the shared variability among our initial phenotypic variables. K-means CA among 260 of the girls then identified four distinct hormone phenotypes. Phenotype 1 was defined by high DHEA-S values. Phenotype 2 was defined by high estradiol values as well as estrone, testosterone, and DHEA-S values at 6 months after thelarche. Phenotype 3a was defined by hormone values that were not high but changed over time, and Phenotype 3b was defined by low hormone values that changed minimally over time relative to thelarche.

Conclusions: PCA-CA analysis identified four meaningful and distinct hormonal phenotypes in a longitudinal cohort of girls. This supports the heterogeneity of hormone profiles prior to puberty. These analyses underscore the need to better understand hormones prior to puberty based on timing of pubertal events rather than chronological age.

INTRODUCTION

Puberty is a time of great developmental change in a child’s life. It can last up to seven years in girls due to individual variation in timing and speed of development. During puberty, a young girl’s body experiences both physical growth and sexual maturation, resulting in a body with full sexual reproductive capabilities. There are many pubertal milestones reached before puberty ends in a young girl. These milestones include, but are not limited to, thelarche (breast development as defined by Tanner sexual maturation stage 2), pubarche (appearance of pubic hair as defined by Tanner sexual maturation stage 2), the pubertal growth spurt, and menarche (start of menstrual bleeding). Menarche is often considered one of the later stages of puberty (1).

Studies conducted on data from 1940-1994 show that the average ages of both thelarche and menarche in the United States have been declining (2–7). The National Health and Nutrition Examination Survey (NHANES) reported an average age of menarche of 13.3 years for women born prior to 1920 versus an average age of 12.4 years for girls born between 1980 and 1984 (2,3,7). Similar studies showed the age of thelarche dropped from an average of 10.4 years (2) to 9.6 years (5).

Earlier pubertal timing leads to an increased risk of breast cancer (8–11). A pooled analysis reported that the risk of breast cancer decreases by nine percent for each year of delayed menarche in premenopausal women and by four percent for each delayed year in postmenopausal women (10). Further studies have shown earlier menarche to be a risk factor for other adverse health outcomes later in life, including earlier sexual activity, depression, eating disorders, increased BMI, increased insulin, increased diastolic blood pressure, and increased levels of cardiovascular disease (12–22).

The causes of earlier menarche and thelarche are not fully understood, although several factors have been identified as risk markers, including race/ethnicity (African-American) (3,5,6) and high body mass index (BMI) (2,7). No studies have considered patterns in sex and adrenal hormone levels as risk factors for early or late pubertal events, even though it is widely known that throughout puberty numerous hormones stimulate development and growth within the female body. These hormones include, but are not limited to, dehydroepiandrosterone sulfate (DHEA-S), testosterone, estrone, and estradiol.

Furthermore, most studies describing hormone levels prior to puberty are either based on chronological age regardless of pubertal status or on pubertal status, comparing those who display secondary sexual characteristics to those who do not. No studies have sought to determine the heterogeneity of a combination of hormone values within individuals and their effect on the age of the pubertal milestones. Consequently, it is not known if the changes in hormones are the same across all girls at all times prior to puberty or if some girls experience changes in these hormones differently which could bring about puberty at an earlier age.

The findings of this study may bridge this knowledge gap. Statistical methods exist to determine patterns and trends in data in order to identify phenotypes, specifically principal components analysis (PCA) and cluster analysis (CA). These methods have been applied to identify phenotypes of different health outcomes including cardiovascular risk (23), asthma (24–26), sleep apnea (27,28), Parkinson’s (29), inflammatory brain disease (30), and chronic obstructive pulmonary disease (COPD) (31–33). To our knowledge, these strategies have not been applied to longitudinal hormone levels in young girls relative to the timing of pubertal events. This study seeks to characterize longitudinal levels of a combination of hormones and changes in hormones within an individual girl over time. In this cohort, we measured sex hormones in girls in 6 month increments from 18 months prior to thelarche to 6 months after thelarche. We hypothesized that a Principal Component-Cluster Analysis (PCA-CA) approach would identify distinct clusters of hormone phenotypes in the girls.

METHODS

Study Population

The present study used data from the longitudinal prospective Cincinnati cohort of the puberty study of the Breast Cancer and Environmental Research Centers (BCERC). Overall study aims and design have been previously detailed (4). Briefly, eligible subjects were females, age six to seven years old at enrollment, living in the greater Cincinnati metropolitan area in Ohio and Kentucky. Approximately 85% of the participants were recruited through public and parochial schools. The other 15% of the participants were recruited through the Breast Cancer Registry of Greater Cincinnati because they had a first- or second-degree family member with breast cancer. During 2004-2006, 379 participants were recruited. The girls were seen every six months during 2004-2010 and every twelve months thereafter, with a study window of ±4 weeks. Anthropometric data were collected at each study visit, pubertal maturation was staged, and blood was drawn for hormone assay. Girls were excluded from this analysis if they self-reported taking oral contraceptives, had an underlying hormone condition, presented with breast development at enrollment, or did not enter thelarche during the study. The Institutional Review Board of Cincinnati Children’s Hospital Medical Center approved this study (IRB protocol #2010-1637).

Definition of Pubertal Milestones

The pubertal milestone examined in this study was age of thelarche, measured in months. Thelarche assessment was performed by trained and certified clinicians. Thelarche status was determined using both observation and palpation of breast tissue. Status was assigned using the Tanner Sexual Maturity Rating’s standardized methods (Tanner Stages) and defined as reaching a minimum of breast Stage 2 (breast bud stage) (1). A previously described algorithm was used to estimate the age of thelarche and the visit closest to that age (6). Methods for pubertal assessment are described in further detail elsewhere (4).

Hormone Measurements

A fasting blood specimen from each participant was drawn early in the morning at every study visit. The serum was banked and stored in freezers at -80 °C. Hormone analyses were conducted for those girls who had entered puberty as defined by breast Stage 2 or greater and had two or more blood serum samples from predefined time points. Specimens for each girl at 18, 12, and 6 months prior to thelarche, the thelarche visit, and 6 months post-thelarche were sent to Esoterix Laboratory for analysis.

Radioimmunoassay (RIA) was used to measure DHEA-S for an initial batch of hormones. A second batch of DHEA-S and all estrone, estradiol, and testosterone were measured using high performance liquid chromatography with tandem mass spectrometry (HPLC-MS). HPLC-MS is a method for hormone analysis with greater sensitivity than prior methods. Further details of hormone analysis have been previously described (34,35).

The lower limit of quantification (LOQ) is the lowest concentration of a hormone that a particular assay can measure with a specified level of precision. The LOQs for the initial 252 girls included in the analysis were as follows: DHEA-S 10 mg/dL, estrone 2.5 pg/mL, estradiol 1 pg/mL, testosterone 3 ng/dL. For the later batch of hormones, the LOQ for testosterone was 2.5 ng/dL. For levels both the initial batch of hormones and the additional batch was less than 20% for all hormones (Table 1).

Pubertal Time Periods

Each girl had measurements of four hormones for up to five time points. Measurements were missing because the participant was recruited later than eighteen months prior to thelarche (timing of breast development), refused phlebotomy at a study visit, or missed a study visit, or because the serum volume drawn at a study visit was insufficient for analysis. The pubertal time windows used in this analysis were defined as

-18 months if the age at the blood collection was between 21 months prior and up to 15 months prior to thelarche

-12 months if the age at the blood collection was between 15 months prior and up to 9 months prior to thelarche

-6 months if the age at the blood collection was between 9 months prior and up to 3 months prior to thelarche

0 months if the age at the blood collection was between 3 months prior and up to 3 months after thelarche

6 months if the age at the blood collection was between 3 months after and up to 9 months after thelarche

Due to the ±4 week study window, it was possible that a girl had two hormone measurements in one time period. If this occurred, the value from the study visit closest to the middle of the pubertal time window was used (e.g. if a girl had visits at both -7 and -3 months relative to thelarche, the study visit at -7 months was closer to the time period of -6 months and therefore the values from the -7 month visit were used for the -6 month time period). If the two study visits were the same number of months from the time period, then the values from the study visit closest to thelarche were used.

Statistical Analysis

Descriptive statistics were used to describe this study’s participants. Data were expressed as mean ± standard deviation for continuous variables and percentages for categorical variables. We used an agnostic approach in these analyses, focusing solely on the hormone data without considering any other characteristics of the girls. Statistical methods for developing the phenotypes followed a step-by-step plan detailed below, beginning with the selection of relevant clinical variables.

Absolute hormone values at the time points were log transformed to approximate the normal distribution for both correlations and Principal Component Analysis (PCA), as histogram plots and Kolmogorov-Smirnov (goodness-of-fit) tests indicated a better fit with log transformation. For all analyses, a P-value less than 0.05 was considered statistically significant. All statistical analyses were performed using the SAS statistical package version 9.2 (SAS Institute, Cary, NC, USA).

Correlations between the hormones at the different time periods, as well as correlations of the changes of the hormones between the time periods, were assessed using Pearson product-moment correlations (Proc Corr in SAS). High correlations, either positive or negative, between variables supported the use of the PCA-CA technique.
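For readers unfamiliar with the statistic, a minimal Python sketch of a Pearson product-moment correlation on log-transformed values is given here. It is illustrative rather than a reproduction of Proc Corr: the function names and hormone values are hypothetical, and dropping pairs with a missing value is an assumption about how incomplete pairs are handled.

```python
import math

def log_or_none(value):
    """Natural log of a hormone value, passing missing values (None) through."""
    return None if value is None else math.log(value)

def pearson_r(x, y):
    """Pearson product-moment correlation over pairs where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for four girls at one time period; None marks a missing assay.
estrone = [2.5, 5.0, 10.0, 20.0]
testosterone = [3.0, 6.0, None, 24.0]
r = pearson_r([log_or_none(v) for v in estrone],
              [log_or_none(v) for v in testosterone])
print(round(r, 3))
```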

PCA was then performed as a method of variable reduction (Proc Factor in SAS). PCA is an exploratory, mathematical technique that groups information into orthogonal, linear combinations (components) of variables while removing noisy variables and still representing the information from the original set of variables. This reduction of variables into components is only possible if there is redundancy among the variables because they are all measuring the same thing or are highly correlated.

In PCA, each component is a combination of correlated variables that explain shared variance. A variable loads onto a component if its loading (the bivariate correlation between the observed variable and the component) is >0.35. Based on the Kaiser criterion, components with eigenvalues (sum of the squared component loadings) >1 are retained and interpreted. A component with an eigenvalue >1 accounts for a greater amount of variance than that contributed by only one variable. It follows that a component with an eigenvalue <1, although it combines multiple variables, contributes less variance than an individual variable and should not be retained, as the goal is variable reduction. If all the possible variables loaded onto only one component, that component would explain all the variance. The first component is a linear combination of x-variables and represents the largest portion of the variance; the second component is a linear combination of k-variables and represents the second largest portion of the variance not explained by the first component, and so forth. Ideally a component should consist of three or more variables, and the components extracted should together explain at least 70% of the variance. All principal components are uncorrelated with each other.
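The Kaiser criterion described above can be illustrated with a small, self-contained Python sketch. It is not the Proc Factor implementation: the cyclic Jacobi eigenvalue routine is a textbook stand-in chosen to avoid external libraries, the function names are hypothetical, and the 4 x 4 correlation matrix is invented for the example (two pairs of variables correlated at 0.6 within each pair and uncorrelated across pairs).

```python
import math

def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal entries are negligible: the diagonal holds the eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_retained(corr):
    """Eigenvalues > 1 and the share of total variance they explain."""
    kept = [ev for ev in jacobi_eigenvalues(corr) if ev > 1.0]
    # The trace of a correlation matrix equals the number of variables.
    return kept, sum(kept) / len(corr)

# Invented correlation matrix for four standardized variables.
corr = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
kept, explained = kaiser_retained(corr)
print(len(kept), round(explained, 2))
```

In this invented case two components exceed an eigenvalue of 1 and together account for 80% of the variance, which would satisfy the 70% guideline mentioned in the text.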

As Proc Factor does not handle missing data, missing hormone data were imputed using Proc MI in SAS, a procedure for multiple imputation. In order to improve the stability of the estimated data, thirty datasets were imputed and PCA was then conducted on each of the imputed datasets. Component loadings from the thirty PCAs were averaged. As a sensitivity analysis, these results were compared with those from a subset of the data (n=67) consisting of girls who had all four hormone measurements at three time periods (-6, 0, 6) with no missing data.

Cluster analysis followed, based on the objective predictive variables identified in the PCA. Prior to cluster analysis, the hormone measurements were standardized using z scores, as the clustering algorithm emphasizes variables with larger variances. Convergent K-means clustering allocated the girls into clusters (hormone phenotypes) in which participants in one cluster were more similar to each other than to participants in another cluster (Proc Fastclus in SAS). Inclusion in a cluster was defined by Euclidean distance (least squares estimation). This distance was imputed for missing data. Several iterations of cluster analysis were performed in order to determine the optimal number of clusters, as the number of clusters is not defined a priori. Another eleven girls were identified as outliers and were not included in any cluster. The differences in the hormone values at each time period between the phenotypes were examined by performing Kruskal-Wallis for overall group differences and Wilcoxon Signed Rank for pairwise comparisons (Proc Npar1way in SAS).
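The standardization and clustering steps can be sketched in Python as follows. This is a simplified stand-in, not the Proc Fastclus algorithm: it uses plain Lloyd-style k-means with user-supplied seeds, ignores missing data and outlier handling, and the function names and hormone values are hypothetical.

```python
import math

def zscores(column):
    """Standardize one variable to mean 0 and sample standard deviation 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def kmeans(points, seeds, max_iter=100):
    """Lloyd-style k-means with Euclidean distance; returns (labels, centroids)."""
    centroids = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical values for six girls on two hormones with very different scales.
dheas = [20, 22, 21, 60, 62, 61]
estradiol = [2.0, 3.0, 2.5, 9.0, 10.0, 9.5]
points = list(zip(zscores(dheas), zscores(estradiol)))
labels, centroids = kmeans(points, seeds=[points[0], points[3]])
print(labels)
```

Standardizing first keeps the variable with the larger raw variance (here the hypothetical DHEA-S values) from dominating the Euclidean distances, which is the reason the text gives for using z scores.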

RESULTS

Characteristics of participants

The general demographic characteristics of the participants are displayed in Table 2. This analysis included 269 girls who had entered thelarche, as defined by breast stage 2 or greater, and had at least two blood samples obtained between 18 months prior to thelarche and 6 months after reaching thelarche as detailed above. A majority of the girls were white and 32% were black. Very few of the girls were Hispanic or Asian. The average age of thelarche for the girls was 9.02 years. There were 3,493 hormone measurements (four hormones at five possible time windows) from 935 serum samples generated by the 269 participants. Mean and median values of the hormones as a whole and by time relative to thelarche are reported in Tables 1 and 3.

Relationships between the hormones

Relationships between the four hormones at the five time periods were assessed using Pearson product-moment correlations (Table 4). All statistically significant correlations between hormone levels were positive. There was a lack of correlation of estradiol levels at the various time points, while for the other hormones, levels at various time points were positively correlated with each other. Overall, estradiol also was not correlated with either DHEA-S or testosterone. Estradiol was highly correlated with estrone at the corresponding time periods. DHEA-S, testosterone, and estrone were highly correlated with themselves and each other across the time periods. The high degree of correlation among the hormones supported the use of PCA.

Relationships among the differences in the hormone values between the time periods (e.g., the difference in testosterone values between -18 and -12) were also assessed using Pearson product-moment correlations. Overall, there was less correlation than in the prior analysis; this lack of correlation suggested that girls with a large change in one hormone between two time periods did not experience a large change in another hormone between time periods. For each time period, the change in estrone level was negatively correlated with the change over the ensuing time period. There was no correlation between the changes in DHEA-S and changes in either estrone or estradiol. This overall lack of correlation did not support inclusion of these “change” variables in PCA. Results that illustrate the correlations are shown in Table 5.
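The “change” variables can be sketched as differences between consecutive time windows, with a change left missing whenever either window lacks a sample. The values below are hypothetical and the array layout (rows as girls, columns as windows) is an assumption for illustration.

```python
import numpy as np

# Rows = girls; columns = windows (-18, -12, -6, 0, +6). NaN = no sample.
# Hypothetical testosterone values (ng/dL), not study data.
testosterone = np.array([
    [3.0, 3.5, np.nan, 4.5, 6.0],
    [2.0, 2.4, 3.0, 3.2, 4.0],
    [4.0, 4.4, 5.0, 5.1, 7.5],
])

# Columns of `changes`: -18 to -12, -12 to -6, -6 to 0, 0 to +6.
# A change is NaN if either of its two windows is missing.
changes = np.diff(testosterone, axis=1)

# Number of girls contributing to each change variable.
n_per_change = (~np.isnan(changes)).sum(axis=0)
print(changes)
print(n_per_change)
```

A single missing window removes two adjacent change variables, so these variables lose observations faster than the absolute values do.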

PCA Results

We performed PCA with a varimax rotation to transform the absolute values of the four hormones at each time period. Missing data at the early time periods (-18 and -12) and for the “changes” in values between the time periods (as well as a general lack of correlation) directed us to focus on the absolute hormone values at the -6, 0, and +6 time periods. Remaining missing data were assumed to be missing at random. To overcome the missing data and ensure the stability of the estimates, only girls with at least one hormone value from the three time periods (-6, 0, and +6) were included in the analyses. Nine girls had no hormone values in the three time periods used in PCA and were not included in the PCA or CA. Thirty imputed datasets were generated (Proc MI in SAS), and a PCA was run on each. The reported component loadings are the averages of the loadings across the 30 PCAs.
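The impute, rotate, and average sequence can be sketched in Python as follows. This is an illustration under stated assumptions, not the SAS workflow: the data are simulated, `impute_once` is a crude random-draw stand-in for Proc MI, and the helper `tidy` (which fixes the order and sign of the components before averaging) is a hypothetical addition, needed because rotated solutions are only defined up to column order and sign.

```python
import numpy as np

def pca_loadings(X, n_components):
    """Unrotated loadings from a PCA of the correlation matrix of X."""
    corr = np.corrcoef(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)
    top = np.argsort(eigval)[::-1][:n_components]
    return eigvec[:, top] * np.sqrt(eigval[top])

def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix (variables x components)."""
    p, k = L.shape
    R = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ R
        grad = L.T @ (LR ** 3 - LR @ np.diag((LR ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < crit * (1 + tol):
            break
        crit = s.sum()
    return L @ R

def tidy(L):
    """Order components by variance explained and make each column sum positive."""
    L = L[:, np.argsort((L ** 2).sum(axis=0))[::-1]]
    signs = np.sign(L.sum(axis=0))
    signs[signs == 0] = 1.0
    return L * signs

def impute_once(X, rng):
    """Stand-in for multiple imputation: fill each NaN with a randomly
    drawn observed value from the same column (NOT equivalent to Proc MI)."""
    filled = X.copy()
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        filled[missing, j] = rng.choice(X[~missing, j], size=missing.sum())
    return filled

# Simulated data: four variables driven by one latent factor, two by another.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
weights = np.array([[1, 0], [1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
X = latent @ weights.T + 0.3 * rng.normal(size=(n, 6))
X[rng.random(X.shape) < 0.2] = np.nan  # about 20% of values set to missing

# One PCA per imputed dataset, then average the rotated loadings.
rotated = []
for _ in range(30):
    filled = impute_once(X, rng)
    rotated.append(tidy(varimax(pca_loadings(filled, 2))))
mean_loadings = np.mean(rotated, axis=0)
print(np.round(mean_loadings, 2))
```

Averaging is only meaningful if the same component occupies the same column in every run, which is why the sketch aligns the solutions first.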

The first three components contributed significantly to explaining the relationships among absolute hormone values at the three time periods (eigenvalues >1), and explained 74% of the shared variance (Table 6). The components were interpreted as 1) testosterone (-6, 0, +6), DHEA-S (-6, 0, +6), and estrone (-6 and 0); 2) estradiol and estrone at -6 and +6; and 3) estradiol and estrone at thelarche (0). All twelve variables (all hormones at -6, 0, and +6) loaded on the first three components. Therefore, variable reduction was not possible, and it was necessary to include all variables as the objective predictive variables in cluster analysis to define the phenotypes. As a sensitivity analysis, PCA was performed on the subset of girls (n=67) with no missing data for the four hormones at the three time periods -6, 0, and +6. Results were comparable to the results of the main PCA (from the 30 PCAs of the 30 imputed datasets, n=260), although components 2 and 3 were reversed (Table 7).
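The reported figures can be checked with a short calculation. The sketch below takes the proportions of variance listed in Table 6 and converts them to eigenvalues, assuming the PCA was run on the correlation matrix of the 12 variables (so that the eigenvalues sum to 12) and that the tabulated proportions are eigenvalue shares; both are assumptions.

```python
# Proportion of variance explained by each of the first three components (Table 6).
proportions = [0.5013023, 0.1541, 0.0861067]
n_variables = 12  # four hormones at three time periods

# Under the correlation-matrix assumption, eigenvalue = proportion * number of variables.
eigenvalues = [p * n_variables for p in proportions]

# Running total of variance explained.
cumulative = []
total = 0.0
for p in proportions:
    total += p
    cumulative.append(total)

print([round(e, 3) for e in eigenvalues])
print([round(c, 4) for c in cumulative])
```

Under these assumptions, all three implied eigenvalues exceed 1 and the cumulative proportion matches the 74% quoted in the text.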

Classification of the participants into phenotypes using cluster analysis

Classification of the 260 participants using CA was based on the variables loading onto the first three components identified in PCA. Three to six clusters were modeled to identify the optimal number of clusters. Ultimately four distinct clusters were identified. The analysis creating three clusters (with 9 outliers) identified the most disjoint clusters [Figure 1A vs. SA (4 clusters), SB (5 clusters), SC (6 clusters)]. Clusters 1 (n=42) and 2 (n=37) could be easily defined by the objective predictive variables, but cluster 3 (n=172) could not.
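Comparing solutions with three to six clusters can be sketched as below. This section does not name the clustering algorithm or the separation criterion, so the plain k-means routine and the pseudo-F statistic (between-cluster over within-cluster variance) are stand-ins chosen for illustration, and the data are simulated.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Plain k-means with a deterministic farthest-first start."""
    centres = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest existing centre.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d)])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        converged = np.allclose(updated, centres)
        centres = updated
        if converged:
            break
    return labels, centres

def pseudo_f(X, labels, centres):
    """Between-cluster variance divided by within-cluster variance."""
    n, k = len(X), len(centres)
    wss = ((X - centres[labels]) ** 2).sum()
    tss = ((X - X.mean(axis=0)) ** 2).sum()
    return ((tss - wss) / (k - 1)) / (wss / (n - k))

# Simulated data: three well-separated groups of 30 points each.
rng = np.random.default_rng(1)
true_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(size=(30, 2)) for c in true_centres])

# Fit 3 to 6 clusters and keep the solution with the best separation.
scores = {k: pseudo_f(X, *kmeans(X, k)) for k in range(3, 7)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A numerical criterion of this kind complements, rather than replaces, visual inspection of the cluster plots and the interpretability of each cluster.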

PCA-CA was then performed only on the girls included in cluster 3 to see if that cluster could be broken into two clusters. PCA again indicated no possible variable reduction of the twelve objective predictive variables (Table 8). CA identified two distinct clusters within Cluster 3, with 2 outliers (Figure 2). These clusters were named cluster 3a (n=74) and cluster 3b (n=96).
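The participant counts across the two clustering passes can be reconciled with a short tally. The sketch below uses only the counts reported in the text; the variable names are illustrative.

```python
# Counts reported for the first clustering pass (260 girls).
first_pass = {"cluster 1": 42, "cluster 2": 37, "cluster 3": 172, "outliers": 9}

# Counts reported for the second pass, run within cluster 3 only.
second_pass = {"cluster 3a": 74, "cluster 3b": 96, "outliers": 2}

total_first = sum(first_pass.values())
total_second = sum(second_pass.values())
total_outliers = first_pass["outliers"] + second_pass["outliers"]
girls_in_phenotypes = 42 + 37 + 74 + 96
share_3b = 96 / total_first * 100.0  # cluster 3b as a percentage of the 260 girls

print(total_first, total_second, total_outliers, girls_in_phenotypes, round(share_3b, 1))
```

The totals agree with the 260 girls entering cluster analysis, the 172 girls in cluster 3, and the eleven outliers excluded overall.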

Major differences in the objective predictive variables (the hormones at six months prior to thelarche, thelarche and six months after thelarche) were found between the clusters (Table 9).

The four derived clusters were attributed to the following phenotypes:

Phenotype 1 (cluster 1, high DHEA-S) – There were 42 girls included in this phenotype profile. Besides very high DHEA-S, these girls had high testosterone and estrone values across the time windows. There was a significant increase in their DHEA-S from six months prior to thelarche to the time of thelarche (-6 to 0) when compared to the entire cohort and the other phenotypes.

Phenotype 2 (cluster 2, high estradiol) – These girls (n=37) represented the smallest phenotype profile group. Not only can these girls be identified by their high estradiol levels, but they can also be identified by high DHEA-S, estrone, and testosterone at six months after thelarche. These girls also experienced a significant decrease in DHEA-S, estrone, estradiol and testosterone levels between time window -6 and the time of thelarche (time=0) but then an increase in the three hormones from thelarche to 6 months after thelarche. A decrease in levels between any time points was not seen in any hormone for the other phenotypes.

Phenotype 3a (cluster 3a, no high hormones) – The 74 girls in this phenotype had at least 20% lower levels of hormones than the cohort as a whole. The hormone levels increased across the time related to thelarche. Similar to the girls in phenotype 1 with the high DHEA-S, these girls experienced a significantly large increase (14.52%) in DHEA-S from -6 until the time of thelarche.

Phenotype 3b (cluster 3b, all low hormones) – This was the largest cluster, representing 36.9% of the girls (n=96). Their hormone levels were at least 30% lower than the cohort as a whole over all time windows related to thelarche, and the girls experienced minimal changes in the hormones over time.
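The percentage statements in the phenotype descriptions rest on two simple calculations, sketched below. The cohort medians are taken from Table 3; the phenotype value is hypothetical, and the choice of the median as the summary statistic is an assumption, since this section does not state whether means or medians were compared.

```python
def percent_change(before, after):
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100.0

def percent_below(group_value, cohort_value):
    """How far a group's value sits below the cohort value, in percent."""
    return (cohort_value - group_value) / cohort_value * 100.0

# Cohort medians for DHEA-S (ug/dL) from Table 3.
cohort_dheas_minus6 = 21.00
cohort_dheas_0 = 23.50
cohort_change = percent_change(cohort_dheas_minus6, cohort_dheas_0)

# Hypothetical phenotype median at thelarche, NOT taken from the study.
hypothetical_group_dheas_0 = 16.0
below = percent_below(hypothetical_group_dheas_0, cohort_dheas_0)

print(round(cohort_change, 2), round(below, 2))
```

On these figures the cohort median for DHEA-S rises by about 11.9% from -6 to 0, which gives a reference point for the 14.52% increase reported for Phenotype 3a.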

DISCUSSION

The ability to classify hormone heterogeneity around the time of thelarche is very useful due to the disparity in the ages of pubertal milestones among young girls and the associated risks due to early puberty. To the best of our knowledge, this is the first attempt to develop sex hormonal phenotypes using a data-driven approach. Our results suggest there are four distinct hormone phenotypes in young girls. The major strengths of this study are the sensitive methods for measuring sex and adrenal hormones in serum, the innovative statistical analyses, and the unique longitudinal cohort.

PCA-CA has been used to identify clinical phenotypes in other medical conditions (24,25,28,29,31–33). PCA was used for variable reduction and then unsupervised CA identified phenotypes in chronic obstructive pulmonary disease (31–33), asthma (24,25), obstructive sleep apnea (28), and Parkinson’s disease (29). In each analysis, none of the phenotypes included all of the symptoms considered typical of the medical conditions; instead, each phenotype presented with a different set of symptoms, thus supporting the heterogeneity of the phenotypes.

Using 12 variables representing DHEA-S, estrone, estradiol, and testosterone measured at three time periods, our study identified four phenotypes of girls differing in hormone characteristics.

Knowing what phenotype a girl may belong to will allow physicians to understand which girls are at risk for earlier puberty or other health risks. For example, rapid increases in estradiol levels, such as those seen in Phenotype 2 (high estradiol), differentiate the breast tissue more and make the breast more resistant to mutagens versus slow growing breast tissue which has more time to be susceptible to exogenous chemicals (36).

This longitudinal cohort offered a unique opportunity to investigate hormones relative to thelarche rather than based on chronological age. The average age of pubertal milestones varies greatly among girls. In this study sample, girls entered thelarche as early as 6.08 years old and as late as 12.42 years. Examining hormones at chronological age (e.g., age 7 vs. age 8 vs. age 9) regardless of pubertal status would have diluted the differences in hormone levels at times relative to the timing of thelarche.
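The contrast between chronological age and time relative to thelarche can be illustrated with a small Python sketch that assigns a serum sample to the nearest six-month window. The half-width of 3 months and the function name `assign_window` are assumptions for illustration only; they are not the window definitions used in the study.

```python
WINDOWS = (-18, -12, -6, 0, 6)  # months relative to estimated thelarche
HALF_WIDTH = 3.0                # assumed half-width of each window, in months

def assign_window(age_at_sample, age_at_thelarche):
    """Return the time window (months) for a sample, or None if it falls
    outside every window. Both ages are in years."""
    months = (age_at_sample - age_at_thelarche) * 12.0
    nearest = min(WINDOWS, key=lambda w: abs(months - w))
    return nearest if abs(months - nearest) <= HALF_WIDTH else None

# Two hypothetical girls sampled at the same chronological age (9.0 years).
early = assign_window(9.0, 8.5)   # thelarche at 8.5 years
late = assign_window(9.0, 10.5)   # thelarche at 10.5 years
print(early, late)
```

The two samples share a chronological age but fall at opposite ends of the time axis relative to thelarche, which is the dilution an age-based analysis would introduce.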

There are several potential limitations to this study. Although the study population came from one site, the cohort is racially and socioeconomically diverse (4). The staging of thelarche can be seen as a limitation, as thelarche is not an instantaneous event. In order to conduct this analysis, we used previously published methods to estimate age and date of thelarche (6) and then defined time windows around this date. By using a broad period for each window, we minimized this limitation. Additionally, in other epidemiologic studies, concerns have been raised regarding breast tissue potentially being confused with adipose tissue, making some experts question the validity of the breast maturation staging. We included palpation in the determination of breast maturation stage, minimizing the confusion with adipose tissue in the breast area. As previously documented in a subset of this cohort, estradiol levels differed between girls with high versus low BMI%. Girls with BMI% higher than the median had significantly lower estradiol levels at thelarche than girls with BMI% lower than the median. This finding necessitated investigation into the validity of the breast maturation staging, as it would imply that girls with higher BMI% but lower estradiol might have had fat tissue mistaken for breast maturation. However, upon further investigation, girls with lower BMI% had pubertal growth spurts and height velocities similar to those with a high BMI% (34). These similarities lend support to the accuracy of the breast maturation staging.

Although we conducted sensitivity analysis to ensure the stability of the phenotypes, further validation on a larger cohort is necessary to ensure these findings are generalizable to all girls. Once done, more definite clinical reference data may be given to physicians, enabling them to personalize care among their pubertal patients.

CONCLUSIONS

PCA-CA analysis in a longitudinal cohort of young girls identified four meaningful and distinct hormonal phenotypes. Results indicated that hormone values at time points relative to the age of thelarche are not the same in all girls. These analyses underscore the need to better understand hormone changes during puberty based on time related to a pubertal milestone rather than chronological age. An improved understanding of hormone phenotypes and their relation to age of pubertal outcomes could facilitate a better understanding of why some girls are at risk for puberty at an earlier age than others.

REFERENCES

1. Marshall WA, Tanner JM. Variations in pattern of pubertal changes in girls. Arch. Dis. Child. 1969;44(235):291–303.
2. Euling SY, Herman-Giddens ME, Lee PA, Selevan SG, Juul A, Sørensen TIA, Dunkel L, Himes JH, Teilmann G, Swan SH. Examination of US puberty-timing data from 1940 to 1994 for secular trends: panel findings. Pediatrics 2008;121(Supplement 3):S172–S191.
3. McDowell MA, Brody DJ, Hughes JP. Has age at menarche changed? Results from the National Health and Nutrition Examination Survey (NHANES) 1999–2004. J. Adolesc. Heal. 2007;40(3):227–231.
4. Biro FM, Galvez MP, Greenspan LC, Succop PA, Vangeepuram N, Pinney SM, Teitelbaum S, Windham GC, Kushi LH, Wolff MS. Pubertal assessment method and baseline characteristics in a mixed longitudinal study of girls. Pediatrics 2010;126(3):e583-90.
5. Cabrera SM, Bright GM, Frane JW, Blethen SL, Lee PA. Age of thelarche and menarche in contemporary US females: a cross-sectional analysis. J. Pediatr. Endocrinol. Metab. 2014;27(1–2):47–51.
6. Biro FM, Greenspan LC, Galvez MP, Pinney SM, Teitelbaum S, Windham GC, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Onset of breast development in a longitudinal cohort. Pediatrics 2013;132(6):1019–1027.
7. Anderson SE, Dallal GE, Must A. Relative weight and race influence average age at menarche: results from two nationally representative surveys of US girls studied 25 years apart. Pediatrics 2003;111(4):844–850.
8. Rockhill B, Moorman PG, Newman B. Age at menarche, time to regular cycling, and breast cancer (North Carolina, United States). Cancer Causes Control 1998;9(4):447–453.
9. Garland M, Hunter DJ, Colditz GA, Manson JE, Stampfer MJ, Spiegelman D, Speizer F, Willett WC. Menstrual cycle characteristics and history of ovulatory infertility in relation to breast cancer risk in a large cohort of US women. Am. J. Epidemiol. 1998;147(7):636–643.
10. Clavel-Chapelon F. Differential effects of reproductive factors on the risk of pre- and postmenopausal breast cancer. Results from a large cohort of French women. Br. J. Cancer 2002;86(5):723–727.
11. Bodicoat DH, Schoemaker MJ, Jones ME, McFadden E, Griffin J, Ashworth A, Swerdlow AJ. Timing of pubertal stages and breast cancer risk: the Breakthrough Generations Study. Breast Cancer Res. 2014;16(1):R18.
12. Conley CS, Rudolph KD. The emerging sex difference in adolescent depression: interacting contributions of puberty and peer stress. Dev. Psychopathol. 2009;21(02):593.
13. Stice E, Presnell K, Bearman SK. Relation of early menarche to depression, eating disorders, substance abuse, and comorbid psychopathology among adolescent girls. Dev. Psychol. 2001;37:608–619.
14. Prentice P, Viner RM. Pubertal timing and adult obesity and cardiometabolic risk in women and men: a systematic review and meta-analysis. Int. J. Obes. 2013;37(8):1036–1043.
15. Kaltiala-Heino R, Kosunen E, Rimpelä M. Pubertal timing, sexual behaviour and self-reported depression in middle adolescence. J. Adolesc. 2003;26(5):531–545.
16. Rudolph KD, Troop-Gordon W, Lambert SF, Natsuaki MN. Long-term consequences of pubertal timing for youth depression: identifying personal and contextual pathways of risk. Dev. Psychopathol. 2014;26(4pt2):1423–1444.
17. Copeland W, Shanahan L, Miller S, Costello EJ, Angold A, Maughan B. Outcomes of early pubertal timing in young women: a prospective population-based study. Am. J. Psychiatry 2010;167(10):1218–1225.
18. Mendle J, Turkheimer E, Emery RE. Detrimental psychological outcomes associated with early pubertal timing in adolescent girls. Dev. Rev. 2007;27(2):151–171.
19. Deardorff J, Gonzales NA, Christopher FS, Roosa MW, Millsap RE, Lumeng J, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Early puberty and adolescent pregnancy: the influence of alcohol use. Pediatrics 2005;116(6):1451–6.
20. Downing J, Bellis MA. Early pubertal onset and its relationship with sexual risk taking, substance use and anti-social behaviour: a preliminary cross-sectional study. BMC Public Health 2009;9(1):446.
21. Baumrind D. The influence of parenting style on adolescent competence and substance use. J. Early Adolesc. 1991;11(1):56–95.
22. Chassin L, Flora DB, King KM. Trajectories of alcohol and drug use and dependence from adolescence to adulthood: the effects of familial alcoholism and personality. J. Abnorm. Psychol. 2004;113(4):483–498.
23. Goodman E. Factor analysis of clustered cardiovascular risks in adolescence: obesity is the predominant correlate of risk among youth. Circulation 2005;111(15):1970–1977.
24. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, Wardlaw AJ, Green RH. Cluster analysis and clinical asthma phenotypes. Am. J. Respir. Crit. Care Med. 2008;178(3):218–224.
25. Just J, Gouvis-Echraghi R, Rouve S, Wanin S, Moreau D, Annesi-Maesano I. Two novel, severe asthma phenotypes identified during childhood using a clustering approach. Eur. Respir. J. 2012;40(1):55–60.
26. Kurukulaaratchy RJ, Zhang H, Raza A, Patil V, Karmaus W, Ewart S, Arshad SH. The diversity of young adult wheeze: a cluster analysis in a longitudinal birth cohort. Clin. Exp. Allergy 2014;44(5):724–35.
27. Ye L, Pien GW, Ratcliffe SJ, Björnsdottir E, Arnardottir ES, Pack AI, Benediktsdottir B, Gislason T. The different clinical faces of obstructive sleep apnoea: a cluster analysis. Eur. Respir. J. 2014;44(6):1600–7.
28. Vavougios GD, Natsios G, Pastaka C, Zarogiannis SG, Gourgoulianis KI. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis. J. Sleep Res. 2016;25(1):31–38.
29. Kim SR, So HY, Choi E, Kang JH, Kim HY, Chung SJ. Influencing effect of non-motor symptom clusters on quality of life in Parkinson’s disease. J. Neurol. Sci. 2014;347(1–2):310–315.
30. Cellucci T, Tyrrell PN, Twilt M, Sheikh S, Benseler SM. Distinct phenotype clusters in childhood inflammatory brain diseases: implications for diagnostic evaluation. Arthritis Rheumatol. 2014;66(3):750–756.
31. Burgel PR, Paillasseur JL, Caillaud D, Tillie-Leblond I, Chanez P, Escamilla R, Court-Fortune I, Perez T, Carré P, Roche N. Clinical COPD phenotypes: A novel approach using principal component and cluster analyses. Eur. Respir. J. 2010;36(3):531–539.
32. Newandee DA, Reisman SS, Bartels AN, De Meersman RE. COPD severity classification using principal component and cluster analysis on HRV parameters. Proc. IEEE Annu. Northeast Bioeng. Conf. NEBEC 2003;2003–Janua:134–135.
33. Cho MH, Washko GR, Hoffmann TJ, Criner GJ, Hoffman EA. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir. Res. 2010;11(30). doi:10.1186/1465-9921-11-30.
34. Biro FM, Pinney SM, Huang B, Baker ER, Walt Chandler D, Dorn LD. Hormone changes in peripubertal girls. J. Clin. Endocrinol. Metab. 2014;99(10):3829–3835.
35. Biro FM, Huang B, Chandler DW, Fassler CL, Pinney SM. Impact of pubertal maturation and chronologic age on sex steroids in peripubertal girls. J. Clin. Endocrinol. Metab. 2019. doi:10.1210/jc.2018-02684.
36. Russo J, Russo IH. The role of estrogen in the initiation of breast cancer. J. Steroid Biochem. Mol. Biol. 2006;102(1–5):89–96.

TABLES

Table 1 - Description of hormone values for the study cohort across the 5 time periods (-18,-12,-6,0, and 6)

Hormone               N    Median  Mean   Standard   Minimum  Maximum  LOD        # < LOD  % < LOD
                                          Deviation
DHEA-S (ug/dL)        920  22.00   30.16  26.57      7.07     211.00   10.00      170      18.48
Estradiol (pg/mL)     856  1.80    3.42   6.26       0.71     114.00   1.00       207      24.18
Estrone (pg/mL)       858  3.60    4.36   3.41       1.77     51.00    2.50       254      29.60
Testosterone (ng/dL)  859  4.10    4.89   3.59       1.77     50.00    3 or 2.5*  242      28.17

All values


Table 2 - Baseline demographics of 269 young girls
Means and standard deviation unless noted

                                     Mean     Standard Deviation
Age of Thelarche (years)             9.02     1.10
Age of Pubarche (years)              9.85     1.35
Age of Menarche (years)              12.30    1.15
BMIZ                                 0.33     1.02
BMI Percentile                       58.97    29.56
Ethnicity (%)
  Black                              31.60%
  Hispanic, White, Asian, All Other  68.40%

Note: BMI=body mass index (kg/m2)
BMI% based on Centers for Disease Control and Prevention 2000 growth curve reference data
Thelarche - Breast maturation stage 2 or greater
Pubarche - Pubic maturation stage 2 or greater

Table 3 - Description of hormone values for the study cohort, by time period relative to thelarche

18 Months Prior to Thelarche
Hormone               N    Median  Mean   Std Dev  Minimum  Maximum
DHEA-S (ug/dL)        147  16.00   23.52  23.29    7.07     161.00
Estradiol (pg/mL)     136  1.30    1.70   1.85     0.71     19.00
Estrone (pg/mL)       136  2.70    2.97   1.42     1.77     7.60
Testosterone (ng/dL)  120  3.20    3.78   2.21     1.77     16.00

12 Months Prior to Thelarche
Hormone               N    Median  Mean   Std Dev  Minimum  Maximum
DHEA-S (ug/dL)        175  19.00   27.63  24.77    7.07     150.00
Estradiol (pg/mL)     164  1.50    3.36   10.10    0.71     114.00
Estrone (pg/mL)       165  3.00    3.88   4.40     1.77     51.00
Testosterone (ng/dL)  168  3.30    4.59   5.30     1.77     50.00

6 Months Prior to Thelarche
Hormone               N    Median  Mean   Std Dev  Minimum  Maximum
DHEA-S (ug/dL)        201  21.00   28.23  24.99    7.07     163.00
Estradiol (pg/mL)     191  1.80    2.83   3.32     0.71     31.00
Estrone (pg/mL)       192  3.55    4.00   2.19     1.77     14.00
Testosterone (ng/dL)  188  3.70    4.48   2.69     1.77     16.00

Thelarche
Hormone               N    Median  Mean   Std Dev  Minimum  Maximum
DHEA-S (ug/dL)        190  23.50   30.79  26.08    7.07     184.00
Estradiol (pg/mL)     180  2.00    3.37   4.22     0.71     42.00
Estrone (pg/mL)       181  3.90    4.47   2.46     1.77     12.00
Testosterone (ng/dL)  184  4.30    4.96   2.76     1.77     17.00

6 Months After Thelarche
Hormone               N    Median  Mean   Std Dev  Minimum  Maximum
DHEA-S (ug/dL)        207  30.00   38.32  29.16    7.07     211.00
Estradiol (pg/mL)     185  3.00    5.38   7.16     0.71     48.00
Estrone (pg/mL)       184  5.30    6.09   4.46     1.77     40.00
Testosterone (ng/dL)  199  5.40    6.16   3.64     1.77     23.00

* All values

45 Table 4 - Pearson correlations of absolute hormones values across the time periods R ≤0.25 Correlations of the Hormones R>0.25 to 0.4 R>0.4 to 0.6 Prob > |r| under H0: Rho=0 R>0.6 Estradiol Estrone Testosterone DHEA-S -18 -12 -6 0 +6 -18 -12 -6 0 +6 -18 -12 -6 0 +6 -18 -12 -6 0 +6 1.00000 0.15998 0.16729 0.70701 0.00209 0.37863 0.15172 0.15393 0.32956 0.10145 0.01617 0.06485 0.10064 0.11009 0.10420 0.00396 -0.00092 0.06593 -0.01327 0.01748 Estradiol 0.1256 0.0880 <.0001 0.9838 <.0001 0.1466 0.1169 0.0023 0.3228 0.8674 0.5236 0.3216 0.3218 0.2997 0.9636 0.9927 0.4978 0.9035 0.8582 -18 136 93 105 83 97 136 93 105 83 97 109 99 99 83 101 135 102 108 86 107 0.15998 1.00000 0.15079 0.02929 0.13030 0.25881 0.84194 0.11513 0.10693 0.16173 -0.00996 0.55499 0.03894 -0.04270 0.10209 -0.03810 0.18465 0.00950 0.01845 0.11363 0.1256 0.0959 0.7624 0.1670 0.0122 <.0001 0.2048 0.2685 0.0856 0.9288 <.0001 0.6715 0.6504 0.2632 0.7052 0.0190 0.9142 0.8441 0.2016 -12 93 164 123 109 114 93 164 123 109 114 83 155 121 115 122 101 161 131 116 128 0.16729 0.15079 1.00000 0.17799 0.26571 -0.02067 0.06923 0.30035 0.09281 0.40442 -0.01215 -0.01769 0.07310 0.02703 0.28938 -0.00012 -0.00276 0.01610 -0.06293 0.12473 0.0880 0.0959 0.0470 0.0018 0.8342 0.4467 <.0001 0.3013 <.0001 0.9111 0.8473 0.3378 0.7611 0.0004 0.9990 0.9754 0.8274 0.4718 0.1258 -6 105 123 191 125 135 105 123 191 126 134 87 121 174 129 146 109 127 186 133 152 0.70701 0.02929 0.17799 1.00000 0.11547 0.27278 0.04140 0.18869 0.50106 0.27551 -0.02977 -0.04451 0.08198 0.26122 0.22521 0.00618 -0.00125 0.02198 0.03138 0.09136 <.0001 0.7624 0.0470 0.1997 0.0126 0.6676 0.0343 <.0001 0.0020 0.8039 0.6489 0.3693 0.0006 0.0092 0.9544 0.9896 0.8025 0.6802 0.2866 0 83 109 125 180 125 83 110 126 180 124 72 107 122 171 133 88 111 132 175 138 0.00209 0.13030 0.26571 0.11547 1.00000 0.07224 0.30193 0.26700 0.02034 0.69192 0.10477 0.16326 0.26358 0.06049 0.43415 0.16612 0.24587 0.24150 0.05960 0.08910 0.9838 0.1670 0.0018 0.1997 0.4820 0.0011 
0.0017 0.8211 <.0001 0.3613 0.0799 0.0019 0.4976 <.0001 0.0986 0.0068 0.0034 0.4989 0.2291 +6 97 114 135 125 185 97 114 135 126 184 78 116 137 128 176 100 120 145 131 184 0.37863 0.25881 -0.02067 0.27278 0.07224 1.00000 0.64306 0.59362 0.55022 0.43363 0.68544 0.55792 0.52180 0.49678 0.40474 0.49981 0.48329 0.50929 0.56724 0.48931 Estrone <.0001 0.0122 0.8342 0.0126 0.4820 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -18 136 93 105 83 97 136 93 105 83 97 109 99 99 83 101 135 102 108 86 107 0.15172 0.84194 0.06923 0.04140 0.30193 0.64306 1.00000 0.71594 0.50795 0.48849 0.15304 0.64450 0.53639 0.51491 0.55938 0.10262 0.34627 0.44059 0.26788 0.33625 0.1466 <.0001 0.4467 0.6676 0.0011 <.0001 <.0001 <.0001 <.0001 0.1672 <.0001 <.0001 <.0001 <.0001 0.3072 <.0001 <.0001 0.0035 0.0001 -12 93 164 123 110 114 93 165 123 110 114 83 156 121 116 122 101 162 131 117 128 0.15393 0.11513 0.30035 0.18869 0.26700 0.59362 0.71594 1.00000 0.63389 0.61997 0.58512 0.25903 0.70292 0.59853 0.54261 0.44836 0.45648 0.48211 0.43159 0.50045 0.1169 0.2048 <.0001 0.0343 0.0017 <.0001 <.0001 <.0001 <.0001 <.0001 0.0041 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -6 105 123 191 126 135 105 123 192 127 134 87 121 175 129 146 109 127 187 134 152 0.32956 0.10693 0.09281 0.50106 0.02034 0.55022 0.50795 0.63389 1.00000 0.56620 0.54291 0.23422 0.49228 0.72790 0.48426 0.27556 0.33219 0.42521 0.40462 0.51015 0.0023 0.2685 0.3013 <.0001 0.8211 <.0001 <.0001 <.0001 <.0001 <.0001 0.0152 <.0001 <.0001 <.0001 0.0094 0.0004 <.0001 <.0001 <.0001 0 83 109 126 180 126 83 110 127 181 125 72 107 123 172 134 88 111 133 176 139 0.10145 0.16173 0.40442 0.27551 0.69192 0.43363 0.48849 0.61997 0.56620 1.00000 0.29152 0.32677 0.43298 0.49133 0.70939 0.33697 0.36589 0.40236 0.44092 0.38204 0.3228 0.0856 <.0001 0.0020 <.0001 <.0001 <.0001 <.0001 <.0001 0.0096 0.0003 <.0001 <.0001 <.0001 0.0006 <.0001 <.0001 <.0001 <.0001 +6 97 114 134 124 184 97 114 134 125 
184 78 116 136 127 175 100 120 144 130 184 0.01617 -0.00996 -0.01215 -0.02977 0.10477 0.68544 0.15304 0.58512 0.54291 0.29152 1.00000 0.30368 0.73321 0.66217 0.52310 0.54304 0.57421 0.56761 0.62001 0.58219 Testosterone 0.8674 0.9288 0.9111 0.8039 0.3613 <.0001 0.1672 <.0001 <.0001 0.0096 0.0040 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -18 109 83 87 72 78 109 83 87 72 78 120 88 80 73 83 120 91 90 75 88 0.06485 0.55499 -0.01769 -0.04451 0.16326 0.55792 0.64450 0.25903 0.23422 0.32677 0.30368 1.00000 0.41128 0.28095 0.31471 0.20162 0.32158 0.25666 0.18786 0.23882 0.5236 <.0001 0.8473 0.6489 0.0799 <.0001 <.0001 0.0041 0.0152 0.0003 0.0040 <.0001 0.0025 0.0004 0.0382 <.0001 0.0035 0.0444 0.0062 -12 99 155 121 107 116 99 156 121 107 116 88 168 118 114 122 106 166 128 115 130 0.10064 0.03894 0.07310 0.08198 0.26358 0.52180 0.53639 0.70292 0.49228 0.43298 0.73321 0.41128 1.00000 0.61781 0.61184 0.55301 0.55325 0.52658 0.49673 0.49775 0.3216 0.6715 0.3378 0.3693 0.0019 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -6 99 121 174 122 137 99 121 175 123 136 80 118 188 125 144 103 124 187 130 150 0.11009 -0.04270 0.02703 0.26122 0.06049 0.49678 0.51491 0.59853 0.72790 0.49133 0.66217 0.28095 0.61781 1.00000 0.66151 0.41605 0.48773 0.50349 0.51430 0.57693 0.3218 0.6504 0.7611 0.0006 0.4976 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0.0025 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0 83 115 129 171 128 83 116 129 172 127 73 114 125 184 139 89 118 136 180 142 0.10420 0.10209 0.28938 0.22521 0.43415 0.40474 0.55938 0.54261 0.48426 0.70939 0.52310 0.31471 0.61184 0.66151 1.00000 0.33753 0.40425 0.46873 0.42147 0.47916 0.2997 0.2632 0.0004 0.0092 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0.0004 <.0001 <.0001 0.0005 <.0001 <.0001 <.0001 <.0001 +6 101 122 146 133 176 101 122 146 134 175 83 122 144 139 199 104 125 156 141 198 0.00396 -0.03810 -0.00012 0.00618 0.16612 0.49981 0.10262 0.44836 0.27556 0.33697 
0.54304 0.20162 0.55301 0.41605 0.33753 1.00000 0.87529 0.87399 0.81573 0.79568 DHEA-S 0.9636 0.7052 0.9990 0.9544 0.0986 <.0001 0.3072 <.0001 0.0094 0.0006 <.0001 0.0382 <.0001 <.0001 0.0005 <.0001 <.0001 <.0001 <.0001 -18 135 101 109 88 100 135 101 109 88 100 120 106 103 89 104 147 110 115 92 110 -0.00092 0.18465 -0.00276 -0.00125 0.24587 0.48329 0.34627 0.45648 0.33219 0.36589 0.57421 0.32158 0.55325 0.48773 0.40425 0.87529 1.00000 0.86628 0.84445 0.83328 0.9927 0.0190 0.9754 0.9896 0.0068 <.0001 <.0001 <.0001 0.0004 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -12 102 161 127 111 120 102 162 127 111 120 91 166 124 118 125 110 175 135 119 134 0.06593 0.00950 0.01610 0.02198 0.24150 0.50929 0.44059 0.48211 0.42521 0.40236 0.56761 0.25666 0.52658 0.50349 0.46873 0.87399 0.86628 1.00000 0.89845 0.87650 0.4978 0.9142 0.8274 0.8025 0.0034 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0.0035 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 -6 108 131 186 132 145 108 131 187 133 144 90 128 187 136 156 115 135 201 140 162 -0.01327 0.01845 -0.06293 0.03138 0.05960 0.56724 0.26788 0.43159 0.40462 0.44092 0.62001 0.18786 0.49673 0.51430 0.42147 0.81573 0.84445 0.89845 1.00000 0.88023 0.9035 0.8441 0.4718 0.6802 0.4989 <.0001 0.0035 <.0001 <.0001 <.0001 <.0001 0.0444 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 0 86 116 133 175 131 86 117 134 176 130 75 115 130 180 141 92 119 140 190 146 0.01748 0.11363 0.12473 0.09136 0.08910 0.48931 0.33625 0.50045 0.51015 0.38204 0.58219 0.23882 0.49775 0.57693 0.47916 0.79568 0.83328 0.87650 0.88023 1.00000 0.8582 0.2016 0.1258 0.2866 0.2291 <.0001 0.0001 <.0001 <.0001 <.0001 <.0001 0.0062 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 <.0001 +6 107 128 152 138 184 107 128 152 139 184 88 130 150 142 198 110 134 162 146 207

Table 5 - Pearson correlations of the hormone differences between time periods

Correlations of Difference in Hormones Between Time Periods

Standardized differences of the hormones between two time periods

Cell contents: Pearson Correlation Coefficients; Prob > |r| under H0: Rho=0; Number of Observations
Legend: Positively Correlated - R>0.25; Negatively Correlated - R<-0.25

Estradiol Estrone Testosterone DHEA-S

-18 to -12 -12 to -6 -6 to 0 0 to +6 -18 to -12 -12 to -6 -6 to 0 0 to +6 -18 to -12 -12 to -6 -6 to 0 0 to +6 -18 to -12 -12 to -6 -6 to 0 0 to +6 1.00000 -0.85571 -0.15953 0.79127 0.67510 0.12728 -0.05240 0.19664 0.40230 -0.09589 -0.15445 0.09652 -0.01137 -0.02248 -0.14081 0.17689 Estradiol <.0001 0.2788 <.0001 <.0001 0.2700 0.7236 0.2008 0.0004 0.4367 0.2946 0.5187 0.9148 0.8462 0.3052 0.2143 -18 to -12 93 77 48 44 93 77 48 44 73 68 48 47 91 77 55 51 -0.85571 1.00000 -0.44727 -0.00063 -0.13761 0.07271 -0.03429 -0.15800 -0.07444 0.06355 0.11636 -0.13909 0.06933 0.08888 0.01430 -0.12152 <.0001 <.0001 0.9962 0.2327 0.4242 0.7612 0.2279 0.5686 0.5195 0.3136 0.2616 0.5491 0.3364 0.8960 0.3127 -12 to -6 77 123 81 60 77 123 81 60 61 105 77 67 77 119 86 71 -0.15953 -0.44727 1.00000 -0.15913 -0.12706 -0.09540 0.37729 -0.13150 -0.10119 -0.00218 0.10956 -0.08239 0.12502 0.03038 0.08456 -0.06970 0.2788 <.0001 0.1433 0.3895 0.3969 <.0001 0.2303 0.5344 0.9858 0.2636 0.4454 0.3972 0.7904 0.3647 0.5115 -6 to 0 48 81 125 86 48 81 125 85 40 69 106 88 48 79 117 91 0.79127 -0.00063 -0.15913 1.00000 0.41882 -0.02828 -0.21876 0.54027 0.16688 0.07623 -0.22372 0.21155 0.11933 0.10787 -0.04730 0.02472 <.0001 0.9962 0.1433 0.0047 0.8302 0.0430 <.0001 0.3099 0.5661 0.0447 0.0232 0.4192 0.3924 0.6598 0.7887 0 to +6 44 60 86 125 44 60 86 124 39 59 81 115 48 65 89 120 0.67510 -0.13761 -0.12706 0.41882 1.00000 -0.39604 -0.09543 -0.03191 0.60969 -0.18868 0.04182 -0.04919 0.19229 -0.08330 -0.12131 0.05485 Estrone <.0001 0.2327 0.3895 0.0047 0.0004 0.5188 0.8371 <.0001 0.1233 0.7778 0.7427 0.0678 0.4714 0.3776 0.7022 -18 to -12 93 77 48 44 93 77 48 44 73 68 48 47 91 77 55 51 0.12728 0.07271 -0.09540 -0.02828 -0.39604 1.00000 -0.29978 0.05116 -0.25166 0.12652 -0.16669 -0.07813 0.04372 0.16310 -0.01235 0.20546 0.2700 0.4242 0.3969 0.8302 0.0004 0.0065 0.6979 0.0504 0.1984 0.1474 0.5297 0.7058 0.0764 0.9102 0.0856 -12 to -6 77 123 81 60 77 123 81 60 61 105 77 67 77 119 86 71 -0.05240 -0.03429 
0.37729 -0.21876 -0.09543 -0.29978 1.00000 -0.55340 -0.18284 0.03193 0.46689 -0.25402 0.05248 0.06425 0.16528 -0.04613 0.7236 0.7612 <.0001 0.0430 0.5188 0.0065 <.0001 0.2588 0.7945 <.0001 0.0163 0.7232 0.5737 0.0724 0.6624 -6 to 0 48 81 125 86 48 81 127 86 40 69 107 89 48 79 119 92 0.19664 -0.15800 -0.13150 0.54027 -0.03191 0.05116 -0.55340 1.00000 0.39963 0.10420 -0.36709 0.67557 -0.01166 -0.03443 0.08217 0.18208 0.2008 0.2279 0.2303 <.0001 0.8371 0.6979 <.0001 0.0117 0.4322 0.0007 <.0001 0.9373 0.7854 0.4440 0.0456 0 to +6 44 60 85 124 44 60 86 125 39 59 81 115 48 65 89 121 0.40230 -0.07444 -0.10119 0.16688 0.60969 -0.25166 -0.18284 0.39963 1.00000 -0.39924 -0.13167 0.19521 0.50312 0.02684 -0.07218 0.03442 Testosterone 0.0004 0.5686 0.5344 0.3099 <.0001 0.0504 0.2588 0.0117 0.0016 0.4307 0.2213 <.0001 0.8306 0.6415 0.8245 -18 to -12 73 61 40 39 73 61 40 39 88 60 38 41 87 66 44 44 -0.09589 0.06355 -0.00218 0.07623 -0.18868 0.12652 0.03193 0.10420 -0.39924 1.00000 0.06676 -0.21403 0.00770 0.05882 0.08424 -0.17062 0.4367 0.5195 0.9858 0.5661 0.1233 0.1984 0.7945 0.4322 0.0016 0.5589 0.0895 0.9481 0.5323 0.4575 0.1642 -12 to -6 68 105 69 59 68 105 69 59 60 118 79 64 74 115 80 68 -0.15445 0.11636 0.10956 -0.22372 0.04182 -0.16669 0.46689 -0.36709 -0.13167 0.06676 1.00000 -0.42012 0.09220 0.06547 0.32140 -0.03375 0.2946 0.3136 0.2636 0.0447 0.7778 0.1474 <.0001 0.0007 0.4307 0.5589 <.0001 0.5331 0.5590 0.0003 0.7481 -6 to 0 48 77 106 81 48 77 107 81 38 79 125 95 48 82 122 93 0.09652 -0.13909 -0.08239 0.21155 -0.04919 -0.07813 -0.25402 0.67557 0.19521 -0.21403 -0.42012 1.00000 -0.12287 -0.10876 -0.07689 0.28284 0.5187 0.2616 0.4454 0.0232 0.7427 0.5297 0.0163 <.0001 0.2213 0.0895 <.0001 0.3904 0.3631 0.4447 0.0009 0 to +6 47 67 88 115 47 67 89 115 41 64 95 139 51 72 101 135 -0.01137 0.06933 0.12502 0.11933 0.19229 0.04372 0.05248 -0.01166 0.50312 0.00770 0.09220 -0.12287 1.00000 -0.41418 0.26656 0.02176 DHEA-S 0.9148 0.5491 0.3972 0.4192 0.0678 0.7058 0.7232 0.9373 
<.0001 0.9481 0.5331 0.3904 <.0001 0.0450 0.8747 -18 to -12 91 77 48 48 91 77 48 48 87 74 48 51 110 88 57 55 -0.02248 0.08888 0.03038 0.10787 -0.08330 0.16310 0.06425 -0.03443 0.02684 0.05882 0.06547 -0.10876 -0.41418 1.00000 -0.14567 -0.08977 0.8462 0.3364 0.7904 0.3924 0.4714 0.0764 0.5737 0.7854 0.8306 0.5323 0.5590 0.3631 <.0001 0.1659 0.4406 -12 to -6 77 119 79 65 77 119 79 65 66 115 82 72 88 135 92 76 -0.14081 0.01430 0.08456 -0.04730 -0.12131 -0.01235 0.16528 0.08217 -0.07218 0.08424 0.32140 -0.07689 0.26656 -0.14567 1.00000 -0.41274 0.3052 0.8960 0.3647 0.6598 0.3776 0.9102 0.0724 0.4440 0.6415 0.4575 0.0003 0.4447 0.0450 0.1659 <.0001 -6 to 0 55 86 117 89 55 86 119 89 44 80 122 101 57 92 140 108 0.17689 -0.12152 -0.06970 0.02472 0.05485 0.20546 -0.04613 0.18208 0.03442 -0.17062 -0.03375 0.28284 0.02176 -0.08977 -0.41274 1.00000 0.2143 0.3127 0.5115 0.7887 0.7022 0.0856 0.6624 0.0456 0.8245 0.1642 0.7481 0.0009 0.8747 0.4406 <.0001 0 to +6 51 71 91 120 51 71 92 121 44 68 93 135 55 76 108 146

Table 6 - Factor loadings from principal component analysis of hormones (n=260)
Factor loadings greater than 35 are flagged by an '*'. Average of loadings from 30 PCAs of 30 imputations.

                     Factor 1    Factor 2    Factor 3
Testosterone -6       70 *        34           6
Testosterone 0        73 *        26          31
Testosterone +6       55 *        63 *         7
DHEA-S -6             87 *         9           4
DHEA-S 0              88 *         2           3
DHEA-S +6             87 *        16           8
Estrone -6            61 *        46 *        36
Estrone 0             60 *        35          60 *
Estrone +6            33          80 *        20
Estradiol -6           0          40 *        67 *
Estradiol 0           11          32          85 *
Estradiol +6           5          82 *        26

Variance explained by factor    0.5013023   0.1541      0.0861067
Cumulative variance             0.5013023   0.6554023   0.741509

*Hormones log transformed

Table 7 - Factor loadings from principal component analysis of hormones (n=67)
Factor loadings greater than 35 are flagged by an '*'. n=67 (girls who have all hormone values at times = -6, 0, +6).

                     Factor 1    Factor 2    Factor 3
Testosterone -6       81 *         5           8
Testosterone 0        80 *        14          10
Testosterone +6       78 *        -4          35 *
DHEA-S -6             86 *         8           4
DHEA-S 0              87 *        12           7
DHEA-S +6             91 *         4          14
Estrone -6            79 *        35          12
Estrone 0             59 *        54 *        26
Estrone +6            60 *        22          63 *
Estradiol -6          20          81 *        -4
Estradiol 0           -9          82 *        26
Estradiol +6           8          15          94 *

Variance explained by factor    0.5314      0.1413      0.0944
Cumulative variance             0.5314      0.6727      0.7671

*Hormones log transformed
All factors had eigenvalues >1

Table 8 - Factor loadings from principal component analysis restricted to girls in cluster 3 (n=172)
Factor loadings greater than 35 are flagged by an '*'. Average of loadings from 30 PCAs of 30 imputations.

                     Factor 1    Factor 2    Factor 3
Testosterone -6       59 *        28           5
Testosterone 0        63 *        37 *        22
Testosterone +6       48 *        64 *        -3
DHEA-S -6             81 *        -1           5
DHEA-S 0              83 *         7           5
DHEA-S +6             78 *        27           5
Estrone -6            44 *        40 *        43 *
Estrone 0             47 *        32          57 *
Estrone +6            21          80 *        21
Estradiol -6          -9           4          65 *
Estradiol 0            5          19          70 *
Estradiol +6          -9          72 *        28

*Hormones log transformed
All factors had eigenvalues >1

Table 9 - Hormone phenotype objective predictors
Characteristics of the girls according to the four phenotypes identified using principal component based cluster analysis.
Units: DHEA-S (ug/dL), estrone and estradiol (pg/mL), testosterone (ng/dL). Values are mean (standard deviation).

                   Cohort           Phenotype 1      Phenotype 2      Phenotype 3a       Phenotype 3b
                                    High DHEA-S      High E2          No High Hormones   All Low Hormones
                   N=269            N=42             N=37             N=74               N=96               p-value*
Testosterone -6    4.48 (2.69)      6.89 (2.53)      7.00 (3.22)      3.79 (1.60)        2.97 (1.29)        <0.0001
Testosterone 0     4.96 (2.76)      8.23 (2.80)      5.40 (1.93)      4.55 (1.55)        3.39 (1.72)        <0.0001
Testosterone +6    6.16 (3.64)      9.23 (3.22)      10.79 (4.01)     5.72 (1.52)        3.44 (1.26)        <0.0001
Estrone -6         4.00 (2.19)      6.18 (1.16)      5.24 (2.11)      3.54 (1.68)        2.65 (1.27)        <0.0001
Estrone 0          4.47 (2.46)      6.81 (2.37)      5.17 (2.31)      4.10 (1.72)        3.43 (1.98)        <0.0001
Estrone +6         6.09 (4.46)      8.10 (2.39)      11.78 (2.92)     5.45 (1.94)        3.53 (1.72)        <0.0001
DHEA-S -6          28.23 (24.99)    62.97 (25.60)    32.93 (14.39)    22.51 (10.93)      12.56 (6.83)       <0.0001
DHEA-S 0           30.79 (26.08)    68.13 (24.80)    24.91 (12.42)    31.00 (10.77)      13.63 (6.15)       <0.0001
DHEA-S +6          38.32 (29.16)    79.00 (29.29)    39.64 (19.37)    38.00 (14.01)      18.55 (8.13)       <0.0001
Estradiol -6       2.83 (3.32)      2.46 (1.75)      4.60 (2.78)      2.07 (1.75)        2.22 (1.96)        <0.0001
Estradiol 0        3.37 (4.22)      3.68 (2.61)      4.46 (4.71)      2.67 (2.37)        2.88 (3.05)        <0.0001
Estradiol +6       5.38 (7.16)      5.40 (4.04)      17.25 (10.48)    3.51 (2.60)        3.05 (3.55)        <0.0001

* P values represent tests for groupwise differences between the phenotypes; values for the phenotypes represent mean values within the phenotypes. Comparison between phenotypes using Kruskal-Wallis for continuous variables and χ2 for categorical.

Figure 1 - Scatterplots for clusters using criteria of K-means in the population of 260 peripubertal girls. Each data point represents one girl.

Figure 2 - Scatterplots for clusters using criteria of K-means in the population of 172 peripubertal girls from Cluster 3. Each data point represents one girl.

SUPPLEMENTAL FIGURES

Scatterplots for clusters using criteria of K-means in the population of 260 peripubertal girls. Each data point represents one girl.

Figure S1A - Scatter plot of 4 defined clusters

Figure S1B - Scatter plot of 5 defined clusters

Figure S1C - Scatter plot of 6 defined clusters

CHAPTER 3 - Sex hormone phenotypes in young girls and the age at pubertal milestones

ABSTRACT

Context – The age of pubertal onset in young girls is influenced by many variables. Previous studies have not examined sex hormones longitudinally around the time of breast development or their relationship to pubertal onset.

Objective – We sought to use an unbiased statistical approach to identify phenotypes of sex hormones in young girls and examine their relationship with pubertal milestones.

Methods – Serum concentrations of steroid sex hormones (estradiol, estrone, testosterone, and DHEA-S) were measured by HPLC-MS at time points before, at, and after thelarche. Girls were classified into four hormone phenotypes using objective principal components and cluster analyses of longitudinal hormone data. The association between the identified phenotypes and age of pubertal milestones was estimated using Cox proportional hazards modeling.

Results – In total, 269 girls were included in this study. Over the ten-year study, 78.44% reached menarche. Mean ages at thelarche, pubarche, and menarche were 9.02, 9.85, and 12.30 years, respectively. Girls with low levels of all four hormones were youngest at thelarche (8.67 years); those with the highest estradiol levels and an estradiol surge six months after thelarche were youngest at menarche (11.87 years) and had the shortest pubertal tempo. After controlling for race, maternal age of menarche, caregiver education, and body mass, the phenotypes were associated with the ages of pubertal events.

Conclusion – Hormone phenotypic clustering can identify clinically relevant subgroups with differing ages of thelarche, pubarche, and menarche. These findings may enhance the understanding of timing of pubertal milestones and risk of adult disease.

INTRODUCTION

Young girls vary greatly in their ages at thelarche, pubarche, and menarche (pubertal milestones). Girls can enter puberty (thelarche, defined as breast development) between the ages of 7 and 11 years, and the interval from thelarche to menarche (pubertal tempo) differs from girl to girl. Many factors, including obesity and race, have been linked to alterations in the ages of attaining pubertal milestones (1–4). More importantly, earlier age of menarche has been identified as a risk factor for breast cancer (5–8) and other adverse health outcomes in later life (9–17). Earlier puberty, therefore, is associated with an increased lifetime burden of adverse health events.

Although female sex hormones change dramatically throughout puberty, previous studies have not been able to assess patterns of change across multiple hormones in a longitudinal cohort. Most studies examining hormone values have been cross-sectional, reporting values by chronologic age regardless of pubertal status, or relative to having achieved a specific pubertal milestone (e.g., girls who have reached pubarche versus those who have not). We have described hormone levels longitudinally at time periods relative to thelarche (breast maturation), examining each hormone separately (18). Previous studies have not been able to determine whether the patterns of pubertal hormone change among girls are heterogeneous or homogeneous.

The limited number of studies examining sex hormones longitudinally, coupled with the wide variability in ages of pubertal milestones among girls, has led to an interest in better understanding longitudinal hormone changes in young girls. The objective of this study was to identify groups of girls with similar patterns of changes in multiple hormones over multiple time periods during puberty. This novel approach differs from the cross-sectional examination of hormone values at different time points in a longitudinal cohort (18). An agnostic approach was used to examine only hormone values through principal component analysis (PCA) and cluster analysis (CA), without any attempt to associate the hormones with age of pubertal milestones or characterize them further by variables such as race or body mass. PCA-CA is a validated statistical approach for identifying subgroups or phenotypes of patients with different health outcomes, including cardiovascular risk (19), chronic obstructive pulmonary disease (20–22), asthma (23–25), and sleep apnea (26,27). The clinical relevance of these phenotypes was validated using survival analyses to examine whether the phenotypes were related to different ages of pubertal milestones. Identifying hormone phenotypes of girls and quantifying their risk of earlier pubertal timing may also produce a better understanding of girls at risk for adult-onset diseases such as polycystic ovarian syndrome (PCOS) or breast cancer.

MATERIALS AND METHODS

Study population

All participants were members of the Cincinnati site of the Puberty Cohort Study of the Breast Cancer and Environment Research Program (BCERP). Study aims have been published elsewhere (28). Briefly, this longitudinal prospective cohort comprised girls from the Greater Cincinnati area recruited at age six to seven years. The girls (n=379) were recruited and seen for clinical study visits every six months between 2004 and 2010 and yearly until 2015. Approximately eighty-five percent were recruited through schools, with the remaining girls recruited through the Breast Cancer Registry of Greater Cincinnati to enrich the cohort with girls with a family history of breast cancer. The study visit window was ±4 weeks. The Institutional Review Board of Cincinnati Children’s Hospital Medical Center approved this study, and informed consent was given by each participant’s parent or caregiver.

Sex and adrenal hormones

Early morning serum samples were drawn from each study participant at each study visit and banked in freezers at -80 degrees Celsius. Samples later identified as those obtained at time points around thelarche were sent to the laboratory and analyzed for concentrations of estradiol (E2), estrone (E1), testosterone (T), and dehydroepiandrosterone sulfate (DHEA-S). Missing sex hormones were due to a participant refusing to have blood drawn or insufficient blood serum for a measurement.

Estradiol, estrone, and testosterone were measured by high-performance liquid chromatography with tandem mass spectrometry (HPLC-MS), a method for hormone analysis with greater sensitivity than past methods. An initial batch of DHEA-S was measured by radioimmunoassay (RIA) and a second batch was measured by HPLC-MS. The correlation between the two approaches (RIA and HPLC-MS) for DHEA-S was 0.948.

The laboratory performing the hormone assay was certified by the Centers for Disease Control and Prevention and has an average bias estimation from proficiency studies less than 2%.

Hormone assay performance was quantified against bioanalytical validation standards. Inter-assay precision, expressed as percent coefficients of variation for the low, medium, and high controls, was as follows: E2 – 4.4, 3.5, 3.3%; E1 – 4.9, 4.6, 4.7%; DHEA-S – 6.5, 8.4, 7.3%; T – 9.9, 7.9, 5.0%. These percentages are well below the standard expectation that inter-assay precision be less than fifteen percent, which indicates a low level of dispersion in the measurement of these data. Further details of the hormone assays, along with lower limits of quantification, have been previously detailed (18).
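As an illustration of the precision metric reported above, the percent coefficient of variation is the standard deviation of repeated control measurements divided by their mean, multiplied by 100. The following is a minimal Python sketch; the control values are hypothetical and are not data from this study, and the function name is our own.

```python
from statistics import mean, stdev

def percent_cv(values):
    """Percent coefficient of variation: 100 * sample SD / mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical repeated measurements of one control sample across assay runs
qc_runs = [10.2, 9.8, 10.5, 9.9, 10.1]

cv = percent_cv(qc_runs)
print(f"inter-assay CV = {cv:.1f}%")
print("within the 15% expectation" if cv < 15 else "exceeds the 15% expectation")
```

A control with a CV under 15% would meet the expectation cited in the text.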

Clinical characteristics

Height and weight were obtained at each study visit (28). Race (ethnicity), caregiver’s education, and mother’s age of menarche were obtained from the caregivers at the initial study visit. Family history of breast cancer was obtained from questionnaires or interviews at semiannual study visits.

Maturation staging

The primary endpoints of age at pubertal milestone, measured in months from birth, include thelarche, pubarche, and menarche. Trained and certified female staff assessed pubertal maturation. Both thelarche and pubarche were based on reaching Tanner sexual maturation stage 2 or greater. Staff observed and palpated to determine breast stage. To be assigned to a stage, all criteria had to have been met for that stage; girls with inconsistent breast staging over time were considered at the lower stage until consistently at the higher stage. Pubarche was determined using accessory light sources to assess the presence or absence of terminal hair in the pubic area. Age of menarche was self-reported from the study participant’s and/or mother’s answers to questions regarding first menstrual cycle, asked each year, beginning when girls were 11 or 12 years old (29). Further details regarding staging have been previously published (28). Tempo was defined as the interval in months between thelarche and menarche. Pubertal pathway described the sequence of initiation of secondary sex characteristics, and was categorized as pubarche prior to thelarche, thelarche prior to pubarche, or synchronous.

Statistical Methods

Variable categorization

Caregiver’s education was categorized as a high school degree or less, an associate or bachelor’s degree, or more than a bachelor’s degree. Maternal age of menarche was reported as a yearly age and categorized as less than 12 years, 12 years or greater but less than 14 years, and 14 years or older. Family history of breast cancer was coded as yes if either the mother or a maternal second-degree blood relative had breast cancer, and no otherwise. Race was categorized as black versus all other, including non-Hispanic white, Hispanic, and Asian. BMI, derived from the average of two measurements of height and weight taken at each study visit, was calculated as weight in kilograms divided by the height in meters squared. BMIz scores were determined using age, race, and sex specific 2000 CDC growth charts.

Values of hormones less than the limit of quantification (LOQ) were imputed using LOQ/√2. This imputation is commonly used for estimating analyte concentrations below the LOQ when the data are not highly skewed and the percent of values less than the LOQ is less than 30% (30). The proportion of values below the LOQ in this study was less than 20%.
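The substitution rule described above can be sketched in a few lines of Python. This is an illustration only: the LOQ of 1.0 and the measurements are hypothetical, below-LOQ results are represented as None by our own convention, and the function names are not from the study's SAS code.

```python
import math

def impute_below_loq(values, loq):
    """Replace below-LOQ results (stored as None) with LOQ / sqrt(2)."""
    substitute = loq / math.sqrt(2)
    return [substitute if v is None else v for v in values]

def fraction_below_loq(values):
    """Share of results that fell below the LOQ."""
    return sum(v is None for v in values) / len(values)

# Hypothetical estradiol results (pg/mL) with an assumed LOQ of 1.0 pg/mL
raw = [2.4, None, 3.1, 1.7, None, 5.0, 2.2, 4.3, 1.9, 2.8]

share = fraction_below_loq(raw)
# The text notes the rule is commonly applied when fewer than 30% of values are below the LOQ
if share < 0.30:
    imputed = impute_below_loq(raw, loq=1.0)
    print(f"{share:.0%} below LOQ; imputed values: {imputed}")
```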

Phenotype development

Employing an agnostic approach, principal component analysis (PCA) was used to reduce the number of objective predictive variables (hormone levels at time periods). Cluster analysis (CA) was used to assign girls to phenotypes based on the logarithmically transformed sex hormones at the time of breast development, six months prior to thelarche, and six months after thelarche. K-means clustering assigned the girls to disjoint clusters, based on Euclidean distances. We used an iterative approach over a range of clusters (2 to 6) to determine the optimal number of phenotypes.
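The clustering step can be sketched as follows. This is a plain-Python implementation of Lloyd's k-means algorithm with squared Euclidean distance, run over 2 to 6 clusters; it is not the SAS procedure used in the study, the six two-dimensional points are hypothetical stand-ins for the log-transformed hormone vectors, and the within-cluster sum of squares is shown only as one possible criterion for comparing cluster counts.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centers, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # initial centers drawn from the data
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:             # no change means convergence
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss

# Hypothetical feature vectors forming two well-separated groups
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

for k in range(2, 7):                        # the 2-to-6 range described in the text
    labels, centers, wss = kmeans(points, k)
    print(k, round(wss, 3))
```

In practice, k-means is sensitive to its starting centers, so several random starts per value of k would normally be compared.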

Descriptive statistics were used to characterize the study cohort as a whole and each of the hormone phenotypes. The Kruskal-Wallis and Wilcoxon Signed Rank tests were used to evaluate the groupwise and pairwise differences between the phenotypes. The Chi-square test assessed the difference in distribution for categorical variables.
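For readers unfamiliar with the groupwise test named above, the Kruskal-Wallis H statistic can be computed from the pooled ranks of the observations. The sketch below is a plain-Python illustration with a correction for tied values; the four groups of values are hypothetical, and the study itself ran the test in SAS.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) and its degrees of freedom."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}       # midrank assigned to each distinct value
    tie_term = 0       # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)           # tie correction
    return h, len(groups) - 1

# Hypothetical testosterone values (ng/dL) in four phenotype groups
groups = [
    [6.9, 8.1, 7.4, 9.0],
    [7.0, 5.2, 10.8, 6.6],
    [3.8, 4.5, 5.7, 4.1],
    [3.0, 3.4, 2.7, 3.3],
]
h, df = kruskal_wallis_h(groups)
print(f"H = {h:.2f} on {df} degrees of freedom")
```

The statistic is compared against a chi-square distribution with (number of groups minus 1) degrees of freedom to obtain the p-value.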

Relationship between hormone phenotypes and pubertal milestones

Kaplan-Meier estimates of cumulative probability were used to examine the unadjusted survival (to a pubertal milestone) by phenotype. Factors affecting the age at the pubertal events were assessed by multivariable Cox proportional hazards models. Included in each model, in addition to the phenotype, were the following variables: BMIz at the visit closest to the pubertal event, race, caregiver’s education, and mother’s age of menarche. The results of the final models were expressed as hazard ratios (HR) with Wald 95% confidence intervals (CI). The proportional hazards assumption for each variable was tested using the log of each baseline variable. If needed, interactions with time were added to the models for those variables that failed the proportional hazards assumption. Girls who either dropped out of the study before its end or did not reach the pubertal event during the study period were right censored.
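To make the handling of right-censored girls concrete, the following plain-Python sketch computes a Kaplan-Meier curve. Here "survival" is the estimated proportion of girls who have not yet reached the milestone at a given age. The ages and event indicators are hypothetical, and the function is an illustration rather than the SAS procedure used for the analyses.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times:  age at the milestone, or age at last visit if censored
    events: 1 if the milestone was observed, 0 if right censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose time equals t (events and censored alike)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving       # censored girls leave the risk set without an event
    return curve

# Hypothetical ages (years); the girls with event = 0 left the study before the milestone
ages = [8, 9, 9, 10, 11]
reached = [1, 1, 0, 1, 0]

for age, s in kaplan_meier(ages, reached):
    print(f"age {age}: {s:.2f} not yet at milestone")
```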

All analyses were conducted using SAS 9.2 (SAS Institute Incorporated, Cary, NC). A p-value <0.05 was considered statistically significant for all analyses.

RESULTS

Identification of hormone phenotypes

Two hundred and sixty-nine girls were selected for these analyses because they had blood serum samples assayed for sex hormones at least twice across the five time periods related to thelarche (Table 1). By the end of follow-up, all of these girls reached thelarche, 93.68% of the girls reached pubarche, and 78.44% of the girls reached menarche.

The variables entered into the PCA were reduced from five time periods surrounding breast development for four hormones to three time periods for the four hormones. Due to the extent of the missing data at -18 months and -12 months, we focused the PCA on the three time periods with the most data (six months prior to thelarche, thelarche, and six months after thelarche). Nine girls who did not have hormone measurements for any of the three time periods with the most data were excluded from the PCA. For other missing data during these three time periods, thirty datasets were imputed for the missing hormone measurements. After incorporating the imputed data, we ran a PCA on each of the thirty imputed datasets and then averaged the resulting eigenvalues and factor loadings. Results from the PCA analyses indicated the first three components contributed significantly to explaining the relationships among absolute hormones at three time periods (eigenvalues >1) and explained 74% of the shared variance. All four sex hormones at each of the three time periods loaded on the first three components. Therefore variable reduction was not possible and it was necessary to include all hormone values at the three time periods in the CA. A sensitivity PCA analysis, consisting of a subset (n=67) of girls who had no missing data for all four hormone measurements at three time periods (-6, 0, +6), yielded similar results (data not shown). CA then identified 11 girls as outliers who did not fit into any of the phenotypes. Ultimately 249 girls were included in the four PCA-CA identified phenotypes.
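The eigenvalue criterion and the 74% figure can be cross-checked against the proportions of variance reported in Table 6. The sketch below assumes the PCA was run on standardized variables (a correlation matrix), in which case the total variance equals the number of variables (12: four hormones at three time periods) and each eigenvalue is the proportion of variance multiplied by 12. That assumption is ours and is not stated in the text.

```python
from itertools import accumulate

n_variables = 12                                # 4 hormones x 3 time periods
proportions = [0.5013023, 0.1541, 0.0861067]    # variance explained, from Table 6

# Assumption: correlation-matrix PCA, so eigenvalue = proportion * number of variables
eigenvalues = [p * n_variables for p in proportions]
cumulative = list(accumulate(proportions))

for i, (ev, cum) in enumerate(zip(eigenvalues, cumulative), start=1):
    retained = "retained" if ev > 1 else "dropped"
    print(f"Factor {i}: eigenvalue ~ {ev:.2f} ({retained}), cumulative variance {cum:.1%}")
```

Under this assumption the third eigenvalue sits only slightly above 1, which is consistent with retaining three components.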

The phenotypes indicated that hormone levels relative to the timing of thelarche are not the same for all girls. Girls were assigned to one of four hormone phenotypes: Phenotype 1, characterized by high DHEA-S, E1, and T (n=42); Phenotype 2, characterized by high estradiol (E2) and high E1 and T (n=37); Phenotype 3a, defined by no high hormone levels (n=74); and Phenotype 3b, characterized by low levels of all hormones relative to the rest of the cohort (n=96). Table 2 and Figure 1 detail the variables used to define the phenotypes. Girls within each phenotype varied considerably in ages of pubertal milestones, pubertal pathway, and changes in hormone levels between the time periods (Table 3, Supplemental Figure 1).

The phenotypes are distinguished by these essential features:

Phenotype 1 (high DHEA-S, T, and E1) – DHEA-S serum concentrations in this group were at almost twice the cohort mean, and testosterone (T) and estrone (E1) values were 50% higher than the cohort mean values. The average age of thelarche in this phenotype (mean=9.44 years) was later than in the other phenotypes. There were significant increases (around 40%) in their DHEA-S, E1, and T from thelarche to 6 months after thelarche when compared to the other phenotypes.


Phenotype 2 (high E2, T, and E1) – These girls represent the smallest phenotype group and are distinguished by their high E2 values compared to the other phenotypes, especially at 6 months after thelarche. They are also distinguished by high T and E1 at 6 months after thelarche. They were younger at menarche (mean=11.87 years) than girls in Phenotypes 1, 3a, and 3b (mean=12.64, 12.30, and 12.39 years) and consequently had the shortest tempo (mean=2.54 years). These girls also experienced a significant decrease in DHEA-S, E1, and T levels between six months prior to thelarche and thelarche, followed by an increase in those hormones from thelarche to six months after breast development, a pattern not seen in the other phenotypes. They experienced a very large increase (over 250%) in E2 from thelarche to 6 months after breast development.

Phenotype 3a (hormone levels lower than the cohort mean except for DHEA-S) – The girls in this phenotype entered thelarche at a significantly earlier age (mean=8.86 years) than girls in Phenotypes 1 and 2 (mean=9.44 and 9.37 years), and also entered pubarche (mean=9.49 years) earlier than girls in any of the other phenotypes. The hormone levels on average were 15% lower than those of the cohort as a whole. DHEA-S levels were the same as the cohort mean at thelarche and six months after thelarche.

Phenotype 3b (all low hormone values) – This is the largest cluster, representing 36.9% of the girls. These girls were the earliest to enter thelarche (mean=8.67 years), significantly so, but the latest to enter pubarche (mean=10.13 years). They also experienced a significantly longer tempo (mean=3.76 years) than the other phenotypes, over a year longer than the girls in Phenotype 2 (mean=2.55 years). The hormone values of these girls were at least 30% lower than those of the cohort as a whole and changed minimally between the time periods when compared to hormone value changes seen in the other phenotypes across time.

Phenotypes and age of pubertal events

Kaplan-Meier analyses of ages of pubertal events stratified by phenotypes are presented in Figures 2A-C for the primary end points of thelarche, pubarche, and menarche. They confirmed that the survival curves of the phenotypes differ in probability of having attained thelarche and pubarche (Figure 2A: thelarche log-rank p-value=0.0008; Figure 2B: pubarche log-rank p-value=0.0333). Although the difference for the age of menarche (p-value=0.1975) was not significant, the survival time for Phenotype 2 appeared to be different from those of the other phenotypes (Figure 2C), which led us to further pursue Cox regression analysis for this pubertal outcome.

All Cox models were non-parsimonious, controlling for other risk factors (Supplemental Table 1). Caregiver’s education had no significant effect on pubertal timing. Race, BMIz, and mother’s age of menarche had significant effects on the ages of thelarche, pubarche, and menarche. An increased BMIz closest to the pubertal events had a marked influence on the age of thelarche (HR=7.18, 95% CI 2.65-19.51), age of pubarche (HR=1.16, 95% CI 1.05-1.30), and age of menarche (HR=1.57, 95% CI 1.36-1.81). Belonging to the non-black group (non-Hispanic White, Hispanic, and Asian) resulted in a likelihood of entering a pubertal event later than black girls for all three pubertal milestones (thelarche HR=0.67, 95% CI 0.51-0.88; pubarche HR=0.01, 95% CI 0.00-0.06; menarche HR=0.01, 95% CI 0.00-0.30). Having a mother who reached menarche before the age of 12 years increased the likelihood of earlier thelarche, pubarche, and menarche relative to girls whose mothers’ menarche occurred after the age of 14 years (thelarche HR=1.68, 95% CI 1.10-2.57; pubarche HR=1.58, 95% CI 1.04-2.42; menarche HR=1.61, 95% CI 1.01-2.59).

Phenotype 1 was less likely to have an earlier thelarche than Phenotype 3a (HR=0.63, 95% CI 0.43-0.93). Phenotypes 1, 2, and 3a were less likely to have earlier ages of thelarche than Phenotype 3b (HR=0.45, 95% CI 0.31-0.66; HR=0.48, 95% CI 0.32-0.73; HR=0.71, 95% CI 0.52-0.97).

Phenotype 3a was significantly associated with a higher likelihood of earlier age of pubarche compared to Phenotype 3b (HR=1.71, 95% CI 1.24-2.35). No other phenotypes were associated with the likelihood of earlier pubarche when referenced to other phenotypes.

Cox proportional hazards models, adjusted for other risk factors, did show that girls in Phenotype 2 were significantly more likely to have an earlier age of menarche than girls in the other three phenotypes. Membership in Phenotype 1 almost halved the likelihood of entering menarche earlier relative to Phenotype 2 (HR=0.55, 95% CI 0.32-0.93). Girls in Phenotype 2 were over fifty percent more likely to enter menarche earlier than those in Phenotype 3a (HR=1.61, 95% CI 1.02-2.54) and those in Phenotype 3b (HR=1.59, 95% CI 1.01-2.52). There was no significant difference in the timing of menarche among the other phenotypes.
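The hazard ratios above are reported with Wald 95% confidence intervals, which are symmetric on the log scale: the interval is exp(beta ± 1.96 × SE), where beta is the Cox regression coefficient. The following Python sketch shows how a reader could recover beta and its standard error from a published interval. It uses the thelarche comparison of Phenotype 1 with Phenotype 3b (95% CI 0.31-0.66) as input, and the function names are our own.

```python
import math

Z_95 = 1.96   # normal quantile for a two-sided 95% interval

def wald_ci(beta, se, z=Z_95):
    """Wald confidence interval for a hazard ratio, given the log-HR and its SE."""
    return math.exp(beta - z * se), math.exp(beta + z * se)

def recover_beta_se(lower, upper, z=Z_95):
    """Back-calculate the log-HR and SE from a reported Wald interval."""
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported interval for Phenotype 1 versus Phenotype 3b (age of thelarche)
beta, se = recover_beta_se(0.31, 0.66)
hr = math.exp(beta)            # geometric midpoint of the interval, about 0.45

print(f"log-HR = {beta:.3f}, SE = {se:.3f}, HR = {hr:.2f}")
print("interval rebuilt from beta and SE:", wald_ci(beta, se))
```

Because the bounds are rounded to two decimals in the text, the recovered values are approximate. A hazard ratio below 1 here means the milestone tends to occur later than in the reference phenotype.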

DISCUSSION

We applied an objective statistical approach to define sex hormone phenotypes and associate them with the age of pubertal milestones in a longitudinal cohort of young girls. We identified four clinically relevant heterogeneous hormone phenotypes. Girls within each phenotype varied considerably in measures of hormones, differences in hormone values between time periods, and age of pubertal milestones, as well as other demographic variables. Each phenotype experienced different ages of pubertal milestones. These findings indicate that significant heterogeneity exists in hormones among young girls around the time of breast development. They also underscore the need to better understand why hormone profiles vary among girls during puberty.

To our knowledge, this is the first application of PCA-CA to define hormone phenotypes, as well as the first application of survival analysis to determine ages of pubertal milestones based on the phenotype. Several other studies have used PCA-CA to successfully identify relevant phenotypes in patient subgroups, as stated previously. In each study, different sets of disease symptoms presented in each of the phenotypes, thus supporting the heterogeneity within a specific disease, much as the different hormones presented differently in our pubertal phenotypes. Other studies have further applied these phenotype analyses, using group membership, to predict age of mortality or risk of developing a comorbidity or hospitalization (31–37). This longitudinal cohort provided a unique opportunity to define phenotypes based on hormones from multiple time points related to age of thelarche, as well as to follow the girls throughout achievement of pubertal events and associate the phenotypes with risk of earlier age of pubertal milestones.

To our knowledge, this is the initial description of hormonal phenotypes in girls around the time of thelarche. These hormonal profiles are likely the result of multiple factors, including early life exposures and events. While it is unclear whether these phenotypes are risk factors for adult-onset diseases, they may serve as possible biomarkers of exposure.

The concept of the origins of adult disease in early life is well accepted, and a variety of disease states are thought to be related to fetal, newborn, and childhood factors. Several adult disorders are considered to be related to earlier hormonal factors. In women, these include endometrial cancer (with higher lifelong estrogen exposure) (38–40), breast cancer (with higher lifelong estrogen exposure) (41,42), polycystic ovarian syndrome (PCOS) (fetal and childhood ‘hyperandrogenia phenotype’) (43), and bone health (with greater bioavailable estrogen) (44).

Girls in Phenotypes 3a and 3b are noted to have greater BMI and the earliest age of thelarche in the cohort. These two phenotypes have a longer pubertal tempo than Phenotypes 1 and 2 (mean tempo 3.38 and 3.76 years, contrasted to 3.03 and 2.54 years, respectively). This suggests that breast budding may have been driven by local production of estrogens through aromatization of androgens rather than by reactivation of the hypothalamic-pituitary-ovarian (HPO) axis, presumably with later central activation. Lower circulating estradiol levels in both Phenotypes 3a and 3b are also consistent with a lack of HPO axis activation and therefore a lack of robust ovarian hormonal production.

The Phenotype 1 profile, with high DHEA-S and testosterone levels as well as a notable rise in DHEA-S at thelarche, may be associated with greater risk of PCOS. This group had the latest age of menarche. In addition, higher and rising levels of DHEA-S may also indicate a future risk of PCOS.


Phenotype 2 includes girls with high estradiol levels at all time periods, a large estradiol peak post-thelarche, and significantly earlier menarche than the other phenotypes. This suggests a group with greater lifelong estrogen exposure, lower risk of osteoporosis, and potentially greater risk of breast cancer.

Ultimately, understanding the varying “normal” hormonal patterns during puberty may help guide identification of adult disorders and the potential for earlier interventions. Each phenotype may serve as a marker to identify pubertal anomalies or populations at risk for adult health issues, such as PCOS, breast cancer, and poor bone health. In addition, understanding these innate hormonal fluctuations may inform a more physiologic approach to hormonal replacement therapy in girls with ovarian insufficiency.

The strengths of this study include the objective statistical approach, the longitudinal ability to assess hormones based on timing around thelarche rather than chronologic age, and the use of HPLC-MS to evaluate hormones that are typically too low to measure in young girls with earlier hormone analytic methods. There are several potential limitations. First, this cohort was from only the greater Cincinnati area and therefore not nationally representative, but it was racially and socioeconomically diverse. Additionally, the BMI% of the cohort is similar to National Health and Nutrition Examination Survey data for this age group (45). Second, breast tissue can often be confused with fat tissue in pre-pubertal girls, making some experts question the validity of physical examination to assess breast maturation. The examiners were trained, and agreement on breast maturation was 87% with the master trainer in a subset of blind assessments (28). Also, as previously documented in a subset of this cohort, E2 levels differed between girls with high versus low BMI%. Girls with BMI% higher than the median had significantly lower E2 levels at thelarche than girls with BMI% lower than the median. This finding necessitated investigation into the validity of the breast maturation staging, as it would imply girls with the higher BMI% but lower E2 might have had breast maturation confused with fat tissue. However, upon further investigation, girls with lower BMI% had pubertal growth spurts and height velocities similar to those with a high BMI% (18), supporting the accuracy of the breast maturation staging. Third, the accuracy of recall is often a limitation in studies. However, the accuracy of recalled age of menarche is typically high among girls, as it is not an arbitrary event in their lives. In a longitudinal Swedish cohort, 63% of the girls were able to recall the age of menarche within plus or minus three months, four years after the start of menarche (46). Another study found that 66.1% of girls were able to accurately recall their age of menarche after a mean time of 323 days had passed (47). Recall is less valid as time passes between menarche and recall, but in this study, the shortest average time between study visits, and therefore recall, was six months.

The fourth limitation is that we did not have age of menarche for 21% of the girls, who either did not reach menarche or left the study before it concluded. Phenotype 3b had the lowest proportion of girls attaining menarche (71.99%). Phenotype 3a had 78% attaining menarche, and both high-hormone

phenotypes (1 and 2) had over 83% of their girls attaining menarche.

In conclusion, using PCA-CA on longitudinal sex hormone values related to the time of thelarche identified four distinct heterogeneous phenotypes. Girls within each phenotype varied in hormone values at the time points, changes in the hormones between time points, age of

pubertal milestones and other demographic characteristics. Distinct differences in the ages of

achieving pubertal milestones were seen across the phenotypes. These findings highlight the need to better understand the impact of these hormone phenotypes on adult-related morbidity.

REFERENCES

1. McDowell MA, Brody DJ, Hughes JP. Has age at menarche changed? Results from the National Health and Nutrition Examination Survey (NHANES) 1999–2004. J. Adolesc. Heal. 2007;40(3):227–231.
2. Cabrera SM, Bright GM, Frane JW, Blethen SL, Lee PA. Age of thelarche and menarche in contemporary US females: a cross-sectional analysis. J. Pediatr. Endocrinol. Metab. 2014;27(1–2):47–51.
3. Biro FM, Greenspan LC, Galvez MP, Pinney SM, Teitelbaum S, Windham GC, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Onset of breast development in a longitudinal cohort. Pediatrics 2013;132(6):1019–1027.
4. Anderson SE, Dallal GE, Must A. Relative weight and race influence average age at menarche: results from two nationally representative surveys of US girls studied 25 years apart. Pediatrics 2003;11(4):844–850.
5. Bodicoat DH, Schoemaker MJ, Jones ME, McFadden E, Griffin J, Ashworth A, Swerdlow AJ. Timing of pubertal stages and breast cancer risk: the Breakthrough Generations Study. Breast Cancer Res. 2014;16(1):R18.
6. Clavel-Chapelon F. Differential effects of reproductive factors on the risk of pre- and postmenopausal breast cancer. Results from a large cohort of French women. Br. J. Cancer 2002;86(5):723–727.
7. Garland M, Hunter DJ, Colditz GA, Manson JE, Stampfer MJ, Spiegelman D, Speizer F, Willett WC. Menstrual cycle characteristics and history of ovulatory infertility in relation to breast cancer risk in a large cohort of US women. Am. J. Epidemiol. 1998;147(7):636–643.
8. Rockhill B, Moorman PG, Newman B. Age at menarche, time to regular cycling, and breast cancer (North Carolina, United States). Cancer Causes Control 1998;9(4):447–453.
9. Conley CS, Rudolph KD. The emerging sex difference in adolescent depression: interacting contributions of puberty and peer stress. Dev. Psychopathol. 2009;21(02):593.
10. Stice E, Presnell K, Bearman SK. Relation of early menarche to depression, eating disorders, substance abuse, and comorbid psychopathology among adolescent girls. Dev. Psychol. 2001;37:608–619.
11. Rudolph KD, Troop-Gordon W, Lambert SF, Natsuaki MN. Long-term consequences of pubertal timing for youth depression: identifying personal and contextual pathways of risk. Dev. Psychopathol. 2014;26(4pt2):1423–1444.
12. Copeland W, Shanahan L, Miller S, Costello EJ, Angold A, Maughan B. Outcomes of early pubertal timing in young women: a prospective population-based study. Am. J. Psychiatry 2010;167(10):1218–1225.
13. Mendle J, Turkheimer E, Emery RE. Detrimental psychological outcomes associated with early pubertal timing in adolescent girls. Dev. Rev. 2007;27(2):151–171.
14. Deardorff J, Gonzales NA, Christopher FS, Roosa MW, Millsap RE, Lumeng J, Deardorff J, Herrick RL, Succop PA, Hiatt RA, Kushi LH, Wolff MS. Early puberty and adolescent pregnancy: the influence of alcohol use. Pediatrics 2005;116(6):1451–6.
15. Downing J, Bellis MA. Early pubertal onset and its relationship with sexual risk taking, substance use and anti-social behaviour: a preliminary cross-sectional study. BMC Public Health 2009;9(1):446.
16. Baumrind D. The influence of parenting style on adolescent competence and substance use. J. Early Adolesc. 1991;11(1):56–95.
17. Prentice P, Viner RM. Pubertal timing and adult obesity and cardiometabolic risk in women and men: a systematic review and meta-analysis. Int. J. Obes. 2013;37(8):1036–1043.
18. Biro FM, Pinney SM, Huang B, Baker ER, Walt Chandler D, Dorn LD. Hormone changes in peripubertal girls. J. Clin. Endocrinol. Metab. 2014;99(10):3829–3835.
19. Goodman E. Factor analysis of clustered cardiovascular risks in adolescence: obesity is the predominant correlate of risk among youth. Circulation 2005;111(15):1970–1977.
20. Burgel PR, Paillasseur JL, Caillaud D, Tillie-Leblond I, Chanez P, Escamilla R, Court-Fortune I, Perez T, Carré P, Roche N. Clinical COPD phenotypes: A novel approach using principal component and cluster analyses. Eur. Respir. J. 2010;36(3):531–539.
21. Newandee DA, Reisman SS, Bartels AN, De Meersman RE. COPD severity classification using principal component and cluster analysis on HRV parameters. Proc. IEEE Annu. Northeast Bioeng. Conf. NEBEC 2003;2003–Janua:134–135.
22. Cho MH, Washko GR, Hoffmann TJ, Criner GJ, Hoffman EA. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir. Res. 2010;11(30). doi:10.1186/1465-9921-11-30.
23. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, Wardlaw AJ, Green RH. Cluster analysis and clinical asthma phenotypes. Am. J. Respir. Crit. Care Med. 2008;178(3):218–224.
24. Just J, Gouvis-Echraghi R, Rouve S, Wanin S, Moreau D, Annesi-Maesano I. Two novel, severe asthma phenotypes identified during childhood using a clustering approach. Eur. Respir. J. 2012;40(1):55–60.
25. Kurukulaaratchy RJ, Zhang H, Raza A, Patil V, Karmaus W, Ewart S, Arshad SH. The diversity of young adult wheeze: a cluster analysis in a longitudinal birth cohort. Clin. Exp. Allergy 2014;44(5):724–35.
26. Ye L, Pien GW, Ratcliffe SJ, Björnsdottir E, Arnardottir ES, Pack AI, Benediktsdottir B, Gislason T. The different clinical faces of obstructive sleep apnoea: a cluster analysis. Eur. Respir. J. 2014;44(6):1600–7.
27. Vavougios GD, Natsios G, Pastaka C, Zarogiannis SG, Gourgoulianis KI. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis. J. Sleep Res. 2016;25(1):31–38.
28. Biro FM, Galvez MP, Greenspan LC, Succop PA, Vangeepuram N, Pinney SM, Teitelbaum S, Windham GC, Kushi LH, Wolff MS. Pubertal assessment method and baseline characteristics in a mixed longitudinal study of girls. Pediatrics 2010;126(3):e583-90.
29. Biro FM, Pajak A, Wolff MS, Pinney SM, Windham GC, Galvez MP, Greenspan LC, Kushi LH, Teitelbaum SL. Age of menarche in a longitudinal US cohort. J. Pediatr. Adolesc. Gynecol. 2018;31(4):339–345.
30. Hornung RW, Reed LD. Estimation of average concentration in the presence of nondetectable values. Appl. Occup. Environ. Hyg. 1990;5(1):46–51.
31. Ahmad T, Pencina MJ, Schulte PJ, Emily O, Whellan DJ, Piña IL, Kitzman DW, Lee KL, O’Connor CM, Felker GM. Clinical implications of chronic heart failure phenotypes defined by cluster analysis. J. Am. Coll. Cardiol. 2014;64(17):1765.
32. Bäckström D, Granåsen G, Domellöf ME, Linder J, Jakobson Mo S, Riklund K, Zetterberg H, Blennow K, Forsgren L. Early predictors of mortality in parkinsonism and Parkinson disease: a population-based study. Neurology 2018;91(22):e2045–e2056.
33. Burgel PR, Paillasseur JL, Peene B, Dusser D, Roche N, Coolen J, Troosters T, Decramer M, Janssens W. Two distinct chronic obstructive pulmonary disease (COPD) phenotypes are associated with high risk of mortality. PLoS One 2012;7(12):1–9.
34. Howrylak JA, Fuhlbrigge AL, Strunk RC, Zeiger RS, Weiss ST, Raby BA. Classification of childhood asthma phenotypes and long-term clinical responses to inhaled anti-inflammatory medications. J. Allergy Clin. Immunol. 2014;133(5). doi:10.1016/j.jaci.2014.02.006.
35. Huang R-C, Mori TA, Burrows S, Le Ha C, Oddy WH, Herbison C, Hands BH, Beilin LJ. Sex dimorphism in the relation between early adiposity and cardiometabolic risk in adolescents. J. Clin. Endocrinol. Metab. 2012;97(6):E1014–E1022.
36. Kendzerska T, Gershon AS, Atzema C, Dorian P, Mangat I, Hawker G, Leung RS. Sleep apnea increases the risk of new hospitalized atrial fibrillation: a historical cohort study. Chest 2018;154(6):1330–1339.
37. Kim NH, Seo JA, Cho H, Seo JH, Yu JH, Yoo HJ, Kim SG, Choi KM, Baik SH, Choi DS, Shin C, Cho NH. Risk of the development of diabetes and cardiovascular disease in metabolically healthy obese People: The Korean Genome and Epidemiology Study. Medicine (Baltimore). 2016;95(15):e3384.
38. Henderson BE, Ross RK, Pike MC, Casagrande JT. Endogenous hormones as a major factor in human cancer. Cancer Res. 1982;42(8):3232–9.
39. Brinton LA, Berman ML, Mortel R, Twiggs LB, Barrett RJ, Wilbanks GD, Lannom L, Hoover RN. Reproductive, menstrual, and medical risk factors for endometrial cancer: results from a case-control study. Am. J. Obstet. Gynecol. 1992;167(5):1317–1325.
40. England PC, Skinner LG, Cottrell KM, Sellwood RA. Serum oestradiol-17 beta in women with benign and malignant breast disease. Br. J. Cancer 1974;30(6):571–6.
41. Endogenous Hormones and Breast Cancer Collaborative Group. Steroid hormone measurements from different types of assays in relation to body mass index and breast cancer risk in postmenopausal women: reanalysis of eighteen prospective cohorts. Steroids 2015:49–55.
42. Endogenous Hormones and Breast Cancer Collaborative Group. Sex hormones and risk of breast cancer in premenopausal women: a collaborative reanalysis of individual participant data from seven prospective studies. Lancet Oncol 2013;14(September).
43. Legro RS, Driscoll D, Strauss JF, Fox J, Dunaif A. Evidence for a genetic basis for hyperandrogenemia in polycystic ovary syndrome. Proc. Natl. Acad. Sci. U. S. A. 1998;95(25):14956–60.
44. Khosla S, Melton LJ, Atkinson EJ, Riggs BL, O’Fallon WM, Klee GG. Relationship of serum sex steroid levels and bone turnover markers with bone mineral density in men and women: a key role for bioavailable Estrogen. J. Clin. Endocrinol. Metab. 2014;83(7):2266–2274.
45. Ogden CL, Carroll MD, Flegal KM. High body mass index for age among US children and adolescents, 2003-2006. JAMA 2008;299(20):2401.
46. Bergsten-Brucefors A. A note on the accuracy of recalled age at menarche. Ann. Hum. Biol. 1976;3(1):71–3.
47. Koo MM, Rohan TE. Accuracy of short-term recall of age at menarche. Ann. Hum. Biol. 1997;24(1):61–64.

TABLES

Table 1 - Description of sex hormones* for the study cohort of 269 girls

6 Months Prior to Thelarche
Hormone                 N       Median  Mean    Standard Deviation  Minimum  Maximum
DHEA-S (ug/dL)          201     21.00   28.23   24.99               7.07     163.00
Estradiol (pg/mL)       191     1.80    2.83    3.32                0.71     31.00
Estrone (pg/mL)         192     3.55    4.00    2.19                1.77     14.00
Testosterone (ng/dL)    188     3.70    4.48    2.69                1.77     16.00

Thelarche
Hormone                 N       Median  Mean    Standard Deviation  Minimum  Maximum
DHEA-S (ug/dL)          190     23.50   30.79   26.08               7.07     184.00
Estradiol (pg/mL)       180     2.00    3.37    4.22                0.71     42.00
Estrone (pg/mL)         181     3.90    4.47    2.46                1.77     12.00
Testosterone (ng/dL)    184     4.30    4.96    2.76                1.77     17.00

6 Months After Thelarche
Hormone                 N       Median  Mean    Standard Deviation  Minimum  Maximum
DHEA-S (ug/dL)          207     30.00   38.32   29.16               7.07     211.00
Estradiol (pg/mL)       185     3.00    5.38    7.16                0.71     48.00
Estrone (pg/mL)         184     5.30    6.09    4.46                1.77     40.00
Testosterone (ng/dL)    199     5.40    6.16    3.64                1.77     23.00

* All values

Table 2 - Hormone phenotype objective predictors

Sex hormone serum concentrations of the girls according to the four phenotypes identified using principal component based cluster analysis. DHEA-S (ug/dL), estrone and estradiol (pg/mL), testosterone (ng/dL)

                  Cohort          Phenotype 1     Phenotype 2     Phenotype 3a    Phenotype 3b
                                  High DHEA-S,    High E2, T      No High         All Low
                                  T and E1        and E1          Hormones        Hormones
                  N=269           N=42            N=37            N=74            N=96
Variable          Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD      p-value
Testosterone -6‡  4.48    2.69    6.89    2.53    7.00    3.22    3.79    1.60    2.97    1.29    <0.0001
Testosterone 0‡   4.96    2.76    8.23    2.80    5.40    1.93    4.55    1.55    3.39    1.72    <0.0001
Testosterone +6‡  6.16    3.64    9.23    3.22    10.79   4.01    5.72    1.52    3.44    1.26    <0.0001
Estrone -6‡       4.00    2.19    6.18    1.16    5.24    2.11    3.54    1.68    2.65    1.27    <0.0001
Estrone 0‡        4.47    2.46    6.81    2.37    5.17    2.31    4.10    1.72    3.43    1.98    <0.0001
Estrone +6‡       6.09    4.46    8.10    2.39    11.78   2.92    5.45    1.94    3.53    1.72    <0.0001
DHEA-S -6‡        28.23   24.99   62.97   25.60   32.93   14.39   22.51   10.93   12.56   6.83    <0.0001
DHEA-S 0‡         30.79   26.08   68.13   24.80   24.91   12.42   31.00   10.77   13.63   6.15    <0.0001
DHEA-S +6‡        38.32   29.16   79.00   29.29   39.64   19.37   38.00   14.01   18.55   8.13    <0.0001
Estradiol -6      2.83    3.32    2.46    1.75    4.60    2.78    2.07    1.75    2.22    1.96    <0.0001
Estradiol 0‡      3.37    4.22    3.68    2.61    4.46    4.71    2.67    2.37    2.88    3.05    <0.0001
Estradiol +6‡     5.38    7.16    5.40    4.04    17.25   10.48   3.51    2.60    3.05    3.55    <0.0001

SD = standard deviation.
* Baseline values for the cohort given as mean unless noted. P values represent tests for groupwise differences between the phenotypes; values for the phenotypes represent mean values within the phenotypes. Comparison between phenotypes using Kruskal-Wallis for continuous variables and χ2 for categorical.
‡ Objective predictive variables used for phenotype development.

Table 3 - Hormone phenotype baseline characteristics and changes in hormones

Maturation and clinical characteristics of the girls according to the four phenotypes identified using principal component based cluster analysis. DHEA-S (ug/dL), estrone and estradiol (pg/mL), testosterone (ng/dL)

                                                Cohort          Phenotype 1     Phenotype 2     Phenotype 3a    Phenotype 3b
                                                                High DHEA-S,    High E2, T      No High         All Low
                                                                T and E1        and E1          Hormones        Hormones
                                                N=269           N=42            N=37            N=74            N=96
Variable                                        Mean    SD      Mean    SD      Mean    SD      Mean    SD      Mean    SD      p-value
Age of Thelarche (years)                        9.02    1.10    9.44    0.94    9.37    1.10    8.86    1.08    8.67    1.03    <0.0001
Age of Pubarche (years)                         9.85    1.35    9.74    1.60    9.89    0.95    9.49    1.43    10.13   1.25    0.0345
Age of Menarche (years)                         12.30   1.15    12.56   1.28    11.87   0.99    12.30   1.19    12.39   1.02    0.0606
Tempo (years)**                                 3.25    1.09    3.03    0.96    2.55    0.86    3.38    0.91    3.76    1.05    <0.0001
BMIZ                                            0.33    1.02    0.29    0.96    0.14    0.97    0.42    1.10    0.40    1.03    0.5245
BMI Percentile                                  58.97   29.56   57.67   27.62   52.47   28.56   61.46   30.70   61.60   30.29   0.3667
Mother's Age of Menarche                                                                                                        0.4507
  less than 12 years old                        20.07%          14.29%          27.03%          25.68%          14.58%
  at least 12 years but less than               59.85%          66.67%          51.35%          55.41%          64.58%
  at least 14 years or older                    20.07%          19.05%          21.62%          18.92%          20.83%
Ethnicity (%)                                                                                                                   0.7755
  Black                                         31.60%          35.71%          27.03%          29.73%          34.38%
  Hispanic, White, Asian, All                   68.40%          64.29%          72.97%          70.27%          65.63%
First or Second Degree Maternal Family Member                                                                                   0.8573
  diagnosis of breast cancer                    12.64%          11.90%          13.51%          16.21%          11.46%
  no diagnosis of breast cancer                 80.30%          83.33%          78.84%          78.38%          79.70%
  missing                                       7.06%           4.76%           8.10%           5.40%           9.37%
Caregiver's education (%)                                                                                                       0.4303
  high school degree or less                    29.00%          40.48%          16.22%          27.03%          29.17%
  associate's or bachelor's                     45.35%          38.10%          54.05%          44.59%          43.75%
  more than a bachelor's                        25.62%          21.43%          29.73%          28.38%          27.08%
Pubertal Pathway (%)                                                                                                            0.0017
  thelarche before pubarche                     69.14%          66.67%          64.86%          64.86%          79.17%
  pubarche before thelarche                     17.10%          26.17%          16.22%          28.38%          5.21%
  entered at the same time                      7.43%           7.14%           13.51%          4.05%           4.17%
  missing due to censorship                     6.32%           0.00%           5.41%           2.70%           11.46%
Δ Testosterone from -6 to 0                     1.26    4.68    1.81    4.75    -4.95   6.46    1.64    3.45    1.34    3.42    0.0045
Δ Testosterone from 0 to 6                      2.38    5.50    3.54    4.31    12.76   10.09   1.97    2.94    0.20    3.60    <0.0001
Δ Estrone from -6 to 0                          1.32    3.74    1.24    4.71    -0.73   4.62    1.65    2.87    1.72    3.35    0.1052
Δ Estrone from 0 to 6                           1.88    4.87    2.84    3.76    11.80   5.10    1.98    3.55    0.08    4.53    <0.0001
Δ DHEA-S from -6 to 0                           9.68    24.85   17.15   21.26   -12.71  31.87   14.52   22.22   3.27    11.11   0.0001
Δ DHEA-S from 0 to 6                            15.14   26.71   27.27   41.84   28.38   39.36   15.37   21.16   8.90    14.42   0.1014
Δ E2 from -6 to 0                               1.21    8.31    2.13    3.72    1.66    10.04   1.33    5.53    2.26    5.80    0.4542
Δ E2 from 0 to 6                                2.19    11.79   1.94    7.54    25.74   22.10   1.46    5.97    1.13    6.63    0.0006

SD = standard deviation.
* Baseline values for the cohort given as mean unless noted. P values represent tests for groupwise differences between the phenotypes; values for the phenotypes represent mean values within the phenotypes. Comparison between phenotypes using Kruskal-Wallis for continuous variables and χ2 for categorical.
** Time between thelarche and menarche
Δ = change in hormone value
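The groupwise comparisons named in the table footnotes (Kruskal-Wallis for continuous variables, χ2 for categorical variables) could be run in SAS along the following lines. This is a minimal sketch on simulated data, not the code used for the dissertation: the data set demo and the variables phenotype, age_thelarche and pathway are hypothetical names, and the simulated values carry no meaning.

```sas
/* Simulated data: all names and values are hypothetical */
data demo;
  call streaminit(2019);
  do id = 1 to 200;
    phenotype     = rand('table', 0.17, 0.15, 0.30, 0.38); /* group 1-4 */
    age_thelarche = 9 + rand('normal');                    /* continuous trait */
    pathway       = rand('table', 0.70, 0.20, 0.10);       /* categorical trait, 1-3 */
    output;
  end;
run;

/* With more than two CLASS levels, the WILCOXON option
   reports the Kruskal-Wallis test */
proc npar1way data=demo wilcoxon;
  class phenotype;
  var age_thelarche;
run;

/* Pearson chi-square test for a categorical characteristic */
proc freq data=demo;
  tables pathway*phenotype / chisq;
run;
```

One PROC NPAR1WAY call can list several continuous variables on the VAR statement, which yields one Kruskal-Wallis test per row of the table.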

FIGURES

Figure 1 - Mean hormone values by phenotype

Figures 2A-C - Kaplan Meier plots of time until pubertal milestone by phenotypes

Figure 2A - Kaplan Meier plot of time until thelarche between phenotypes

Figure 2B - Kaplan Meier plot of time until pubarche between phenotypes

Figure 2C - Kaplan Meier plot of time until menarche between phenotypes

Figures 3A-C - Risk estimates of time to pubertal outcomes by phenotype from multiple variable Cox regression models

3A - Risk estimates of time until thelarche
3B - Risk estimates of time until pubarche
3C - Risk estimates of time

* Figures created with: Clark O, Djulbegovic B. Forest plots in Excel software (data sheet). 2001. Available at www.evidencias.com.

SUPPLEMENTAL TABLE

Supplemental Table 1 - Hazard ratio analysis for risk factors for age at pubertal milestones

                                                                           Age of Thelarche (months)     Age of Pubarche (months)      Age of Menarche (months)
Variable                                                                   HR     95% CI       Wald's p  HR     95% CI       Wald's p  HR     95% CI       Wald's p
Phenotypes                                                                                     <0.0001                       0.012                         0.1254
  Phenotype 1 vs Phenotype 2                                               0.93   0.59-1.48    0.7600    0.91   0.57-1.46    0.7013    0.55   0.32-0.93    0.0264
  Phenotype 1 vs Phenotype 3a                                              0.63   0.43-0.93    0.0196    0.77   0.52-1.15    0.2042    0.88   0.57-1.35    0.555
  Phenotype 1 vs Phenotype 3b                                              0.45   0.31-0.66    <0.0001   1.32   0.91-1.93    0.1493    0.87   0.57-1.33    0.514
  Phenotype 2 vs Phenotype 3a                                              0.68   0.45-1.03    0.0656    0.85   0.56-1.29    0.4433    1.61   1.02-2.54    0.0427
  Phenotype 2 vs Phenotype 3b                                              0.48   0.32-0.73    0.0006    1.45   0.96-2.18    0.0752    1.59   1.01-2.52    0.0466
  Phenotype 3a vs Phenotype 3b                                             0.71   0.52-0.97    0.0337    1.71   1.24-2.35    0.0012    0.99   0.70-1.41    0.9647
Race (all other vs black)                                                  0.67   0.51-0.88    0.0256    0.01   0.00-0.06    <0.0001   0.01   0.00-0.30    0.0092
BMIZ closest to outcome                                                    7.18   2.65-19.51   0.0001    1.16   1.05-1.30    0.0056    1.57   1.36-1.81    <0.0001
Mother's age of menarche (years)                                                               0.055                         0.0837                        0.1327
  Under 12 vs ages 12-14                                                   1.34   0.96-1.87    0.0905    1.15   0.81-1.62    0.432     1.20   0.83-1.74    0.3418
  Under 12 vs at least 14                                                  1.68   1.10-2.57    0.0168    1.58   1.04-2.42    0.0336    1.61   1.01-2.59    0.0468
  ages 12-14 vs at least 14                                                1.26   0.89-1.77    0.1931    1.38   0.98-1.95    0.0664    1.35   0.91-1.99    0.1334
Caregiver's education level                                                                    0.9763                        0.6314                        0.3051
  High school or less vs at least an associate's or bachelor's degree      1.00   0.67-1.49    0.9967    0.88   0.58-1.33    0.5491    0.73   0.45-1.18    0.1946
  High school or less vs master's degree or more                           0.97   0.68-1.38    0.8618    0.84   0.59-1.20    0.3376    0.97   0.65-1.44    0.8747
  At least an associate's or bachelor's degree vs master's degree or more  0.97   0.70-1.35    0.8568    0.95   0.68-1.33    0.7829    1.33   0.90-1.96    0.1478
race*age of milestone in months‡                                                                         0.96   0.94-0.98    0.0004    0.97   0.945-0.994  0.0166
bmiz closest to outcome*age of milestone in months‡‡                       0.99   0.98-1.00    0.0036

HR = hazard ratio; CI = confidence interval.
‡ Interaction of race and age included due to violation of proportional hazards by race
‡‡ Interaction of bmiz and age included due to violation of proportional hazards by bmiz
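The footnotes above describe adding a covariate-by-age interaction where a covariate violated proportional hazards. In PROC PHREG this kind of term can be written as a time-dependent covariate with a programming statement. The sketch below is a minimal illustration on simulated data, not the dissertation's code: the data set demo and the variables phenotype, black, bmiz, age_mo, event and black_t are hypothetical names, race is coded as a 0/1 indicator by assumption, and the other adjustment variables (caregiver education, mother's age of menarche) are left out for brevity.

```sas
/* Simulated data: all names and values are hypothetical */
data demo;
  call streaminit(83);
  do id = 1 to 200;
    phenotype = rand('table', 0.17, 0.15, 0.30, 0.38); /* group 1-4 */
    black     = rand('bernoulli', 0.3);                /* 0/1 race indicator */
    bmiz      = rand('normal', 0.3, 1);                /* BMI z-score */
    age_mo    = round(108 + 13*rand('normal'));        /* age at milestone or censoring, months */
    event     = rand('bernoulli', 0.9);                /* 1 = milestone observed, 0 = censored */
    output;
  end;
run;

proc phreg data=demo;
  class phenotype (ref='4') / param=ref;
  model age_mo*event(0) = phenotype black bmiz black_t / risklimits;
  /* time-dependent term: lets the race effect change with age,
     relaxing the proportional hazards assumption for race */
  black_t = black*age_mo;
run;
```

With an interaction of this kind in the model, the main-effect hazard ratio for the covariate is its value extrapolated to age zero, so it should be read together with the interaction term rather than on its own. The reference-cell coding shown here gives each phenotype against one reference group; the pairwise contrasts reported in the table would need additional statements that are not shown.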

SUPPLEMENTAL FIGURE

CHAPTER 4: Conclusions, Limitations, Future Research, and Final Thoughts

CONCLUSIONS

This study yielded many novel findings, made possible by an array of fortunate

circumstances. It was one of the first studies to examine hormones longitudinally in

young girls relative to breast development rather than cross-sectionally by

pubertal status or chronological age. Previous longitudinal studies of hormones have lacked

adequate sample size (1–3) or used urine samples as opposed to serum (4,5). Some studies have been cross-sectional (6–8), but examining sex hormones one at a time by chronological age would not have been as informative as comparing the levels of the four hormones at different time points relative to thelarche. Cross-sectional analyses would not necessarily have identified the same phenotypes, because the clusters were identified from associations among sex hormones at multiple time points in the same girls. Furthermore, some of the previous studies included girls who were menstruating (3,6–9), and hormone levels change throughout the month depending on the phase of the menstrual cycle, further complicating the assessment of hormones. Thus, this longitudinal cohort improved upon past studies.

The girls who were willing to enter and continue to participate in this study make it unique. Longitudinal cohorts are expensive, and it is difficult to recruit and retain

participants in them (10). Recruitment and retention of study participants are

more difficult now than in the past. For example, the NIH National Children’s Study had to alter its

sampling plans and increase the number of study sites to reach an adequate sample size (11). The

retention in this cohort study was remarkable because these young girls allowed their blood to

be drawn at each six-month clinical visit, which also included pubertal maturation staging. Often

families will refuse to enter studies due to the blood draw required of their children (12), and

recruitment into pubertal assessment studies is even more difficult (13). The girls in this study

were recruited at young ages, 6-7 years old, and they entered the study knowing they would

make semi-annual study visits for a number of years. Retention in this cohort was 70% at year 6 and 68% after 10 years. Most of the girls who dropped out did so after the first visit. Throughout the study the girls were invited to galas and special events and received newsletters and incentives in order to maintain retention.

This study was the first application of an established, objective statistical method,

Principal Component-Cluster Analysis, to develop phenotypes from longitudinal sex hormone data in girls. This method has been used to define clinically relevant phenotypes in numerous other medical conditions (9,14–21). The sex hormone phenotypes in this study were developed agnostically, ignoring pre-conceived beliefs about the relationship of sex hormones to pubertal events. Each phenotype included some but not all of the hormones at the different time points,

which supports the heterogeneity of the phenotypes.
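The general shape of a principal component-cluster analysis can be sketched in SAS as follows. This is an illustration on simulated data under stated assumptions, not the dissertation's code. The variable names follow those used in Appendix B, but the data set demo, the Ward linkage, the standardization of variables and the number of simulated girls are assumptions: the text does not state which clustering method was used. Because the abstract describes PCA as a step for selecting hormone variables, the sketch clusters the hormone variables themselves rather than the component scores, and it omits the variable selection step.

```sas
%let hormones = testn6 test0 testp6 estronen6 estrone0 estronep6
                dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;

/* Simulated log-normal hormone values: hypothetical data */
data demo;
  call streaminit(269);
  array h{12} &hormones;
  do id = 1 to 100;
    do j = 1 to 12;
      h{j} = exp(rand('normal', 1, 0.5));
    end;
    output;
  end;
  drop j;
run;

/* Step 1: PCA on the correlation matrix; the eigenvalues and loadings
   show how much variance the first three components explain and which
   variables load on them */
proc princomp data=demo n=3;
  var &hormones;
run;

/* Step 2: hierarchical clustering of the standardized variables
   (Ward's method is an assumption) */
proc cluster data=demo method=ward standard outtree=tree noprint;
  var &hormones;
  id id;
run;

/* Step 3: cut the tree into four clusters and count members */
proc tree data=tree nclusters=4 out=phenotypes noprint;
  id id;
run;

proc freq data=phenotypes;
  tables cluster;
run;
```

The output data set phenotypes holds one row per girl with her cluster number, which can then be merged back onto the analysis data set by id.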

The ability to measure the sex hormones at such low levels prior to menarche was made

possible by the technology of high performance liquid chromatography with mass spectrometry

(HPLC-MS). Most other methods cannot measure hormones at levels as low as those found in

young girls prior to thelarche. HPLC-MS was able to detect hormone levels several times lower

than older methods including radioimmunoassay. Furthermore, the average bias estimations for

the lab conducting the assays, Esoterix Laboratories, from on-going proficiency studies was less

than 2% (22).

LIMITATIONS

As with any research study, the analyses in this dissertation have several limitations. The

limitations of each individual part of this dissertation have been previously stated. Overall, the

most important limitation was the lack of generalizability that comes with a single-site study. However, the girls were racially, ethnically, and socioeconomically diverse, representative of the greater

Cincinnati area, and demographically similar to the United States population. Additionally, the BMI percentile of the girls was similar to the BMI% in NHANES (23).

Missing data are always a concern. Eligibility criteria limited the analysis to girls with a minimal amount of missing data. Data were missing because girls entered the study at or shortly before thelarche or declined a blood draw at one or more study visits. Ideally, hormone values would have been available for all girls at all five time periods and at additional time periods prior to menarche. Imputation techniques ensured the stability of the data during principal component analysis. A sensitivity analysis on a data set of 67 girls who had all four hormone values at each of the time periods -6, 0 and 6 demonstrated that the results of the PCA were similar without imputation. The cost to assay these hormones was over

$215,000, making assaying hormones on a larger longitudinal cohort likely cost prohibitive.

Ultimately, the extent of data available did not restrict the ability to do this analysis, and it is highly unlikely that there will be another longitudinal cohort study with more repeated measures of sex hormone data within the next ten years.
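Selecting the complete-case subset for a sensitivity analysis of this kind is a short DATA step. The sketch below is illustrative only and uses a three-row toy data set: demo and complete are hypothetical data set names, and only six of the twelve hormone variables named in Appendix B are shown to keep the example small.

```sas
/* Toy data: a period (.) marks a missing hormone value */
data demo;
  input id testn6 test0 testp6 e2n6 e20 e2p6;
  datalines;
1 3.7 4.3 5.4 1.8 2.0 3.0
2 .   4.1 5.0 1.5 .   2.8
3 2.9 3.3 3.5 2.2 2.9 3.1
;
run;

/* Keep only girls with no missing value in any listed variable */
data complete;
  set demo;
  if nmiss(of testn6 test0 testp6 e2n6 e20 e2p6) = 0;
run;

proc print data=complete;
run;
```

Here girls 1 and 3 are kept and girl 2 is dropped. The PCA can then be rerun on the complete-case data set and its loadings compared with those from the imputed data.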

FUTURE RESEARCH

This study revealed the heterogeneity of hormone levels around the time of thelarche and the association of the identified sex hormone phenotypes with ages of pubertal milestones. Hopefully, at a future time, these analyses can be conducted in a larger longitudinal cohort in a nationally representative population. The phenotypes identified in

this study will be useful for multiple future analyses, some already identified and many that we have not yet considered.

There is further research that could be done with this existing cohort. One idea is to follow up with data from the previously collected menstrual cycle journals of these girls to see if any of the phenotypes are more likely to experience irregular periods than others. A second idea is to analyze other existing data, including glucose, insulin, and cholesterol levels, for their association with each of the phenotypes.

Following the girls in this cohort as they progress through the “windows of susceptibility to breast cancer” including ages of , lactation duration, age of menopause and diagnoses of adult disease would be extremely informative. Although it would incur large costs, it would be interesting to see if the phenotypes are predictive of different breast densities in early adulthood or if any of the phenotypes are likely to have more girls who are

BRCA gene carriers. Another future study could examine whether the phenotypes are associated with the occurrence of polycystic ovarian syndrome, breast cancer and other adult onset diseases.

FINAL THOUGHTS

Better understanding of why some girls enter puberty at different ages and experience varying lengths of pubertal tempo is crucial for a better understanding of women’s health.

Although this research produced some very useful findings, it also yielded multiple new research questions regarding hormones relative to puberty in young girls. Further research needs to identify the mechanisms by which the release of sex hormones occurs, determine why serum concentrations differ from girl to girl, and investigate ways to reduce the risk factors for early pubertal development in girls, given the anticipated adverse health effects in adulthood.

REFERENCES

1. Sizonenko PC, Paunier L, Carmignac D. Hormonal changes during puberty. Horm. Res. 1976;7(4–5):288–302.
2. Hagen CP, Mieritz MG, Nielsen JE, Anand-Ivell R, Ivell R, Juul A. Longitudinal assessment of circulating insulin-like peptide 3 levels in healthy peripubertal girls. Fertil. Steril. 2015;103(3):780–786.e1.
3. Singh GKS, Balzer BWR, Kelly PJ, Paxton K, Hawke CI, Handelsman DJ, Steinbeck KS. Urinary sex steroids and anthropometric markers of puberty - A novel approach to characterising within-person changes of puberty hormones. PLoS One 2015;10(11):1–13.
4. Shi L, Wudy SA, Buyken AE, Hartmann MF, Remer T. Body fat and animal protein intakes are associated with adrenal secretion in children. Am. J. Clin. Nutr. 2009;90(5):1321–8.
5. Remer T, Manz F. Role of nutritional status in the regulation of adrenarche. J. Clin. Endocrinol. Metab. 1999;84(11):3936–44.
6. Sizonenko PC, Paunier LUC. Hormonal changes in puberty III: correlation of plasma dehydroepiandrosterone, testosterone, FSH, and LH with stages of puberty and bone age in normal boys and girls and in patients with Addison’s Disease or Hypogonadism or with premature or late adrena. J. Clin. Endocrinol. Metab. 1975;(March):894–904.
7. Nottelmann ED, Susman EJ, Inoff-Germain G, Cutler GB, Loriaux DL, Chrousos GP. Developmental processes in early adolescence: relationships between adolescent adjustment problems and chronologic age, pubertal stage, and puberty-related serum hormone levels. J. Pediatr. 1987;110(3):473–80.
8. Shirtcliff EA, Dahl RE, Pollak SD. Pubertal development: correspondence between hormonal and physical development. Child Dev. 2009;80(2):327–37.
9. Newandee DA, Reisman SS, Bartels AN, De Meersman RE. COPD severity classification using principal component and cluster analysis on HRV parameters. Proc. IEEE Annu. Northeast Bioeng. Conf. NEBEC 2003;2003–Janua:134–135.
10. Toledano MB, Smith RB, Brook JP, Douglass M, Elliott P. How to establish and follow up a large prospective cohort study in the 21st century--lessons from UK COSMOS. PLoS One 2015;10(7):e0131521.
11. Duncan GJ, Kirkendall NJ, Citro CF. Panel on the design of the National Children’s Study and implications for generalizability of results. 2014. Available at: https://www.ncbi.nlm.nih.gov/books/NBK242362/. Accessed March 6, 2019.
12. McMurtry CM, Noel M, Chambers CT, McGrath PJ. Children’s fear during procedural pain: preliminary investigation of the Children’s Fear Scale. Heal. Psychol. 2011;30(6):780–788.
13. Brooks-Gunn J. Overcoming barriers to adolescent research on pubertal and reproductive development. J. Youth Adolesc. 1990;19(5):425–440.
14. Goodman E. Factor analysis of clustered cardiovascular risks in adolescence: obesity is the predominant correlate of risk among youth. Circulation 2005;111(15):1970–1977.
15. Burgel PR, Paillasseur JL, Caillaud D, Tillie-Leblond I, Chanez P, Escamilla R, Court-Fortune I, Perez T, Carré P, Roche N. Clinical COPD phenotypes: A novel approach using principal component and cluster analyses. Eur. Respir. J. 2010;36(3):531–539.
16. Cho MH, Washko GR, Hoffmann TJ, Criner GJ, Hoffman EA. Cluster analysis in severe emphysema subjects using phenotype and genotype data: an exploratory investigation. Respir. Res. 2010;11(30). doi:10.1186/1465-9921-11-30.
17. Haldar P, Pavord ID, Shaw DE, Berry MA, Thomas M, Brightling CE, Wardlaw AJ, Green RH. Cluster analysis and clinical asthma phenotypes. Am. J. Respir. Crit. Care Med. 2008;178(3):218–224.
18. Just J, Gouvis-Echraghi R, Rouve S, Wanin S, Moreau D, Annesi-Maesano I. Two novel, severe asthma phenotypes identified during childhood using a clustering approach. Eur. Respir. J. 2012;40(1):55–60.
19. Kurukulaaratchy RJ, Zhang H, Raza A, Patil V, Karmaus W, Ewart S, Arshad SH. The diversity of young adult wheeze: a cluster analysis in a longitudinal birth cohort. Clin. Exp. Allergy 2014;44(5):724–35.
20. Ye L, Pien GW, Ratcliffe SJ, Björnsdottir E, Arnardottir ES, Pack AI, Benediktsdottir B, Gislason T. The different clinical faces of obstructive sleep apnoea: a cluster analysis. Eur. Respir. J. 2014;44(6):1600–7.
21. Vavougios GD, Natsios G, Pastaka C, Zarogiannis SG, Gourgoulianis KI. Phenotypes of comorbidity in OSAS patients: combining categorical principal component analysis with cluster analysis. J. Sleep Res. 2016;25(1):31–38.
22. Biro FM, Pinney SM, Huang B, Baker ER, Walt Chandler D, Dorn LD. Hormone changes in peripubertal girls. J. Clin. Endocrinol. Metab. 2014;99(10):3829–3835.
23. Biro FM, Galvez MP, Greenspan LC, Succop PA, Vangeepuram N, Pinney SM, Teitelbaum S, Windham GC, Kushi LH, Wolff MS. Pubertal assessment method and baseline characteristics in a mixed longitudinal study of girls. Pediatrics 2010;126(3):e583-90.

APPENDIX A

Contributors’ Statement Page

Cecily Fassler, MS: Developed hypotheses and specific aims; designed the study; wrote all of the SAS code; conducted all of the statistical analyses; and wrote the dissertation.

Susan M. Pinney, PhD: Principal investigator, conception of project, obtained funding for these analyses, close collaboration on design of study, edited and contributed to all chapters of the dissertation.

Frank M. Biro, MD: Principal investigator; funding for the past data collection effort; contributed to the design of the dissertation project; edited and contributed to chapters 2 and 3; and especially provided input for clinical translation.

Iris Gutmark-Little, MD: Contributed to the design of the dissertation project; edited and contributed to chapters 2 and 3; and especially provided input for clinical translation.

Changchun Xie, PhD: advised on statistical analyses including imputation methods; contributions to chapters 2 and 3 of dissertation.

Courtney Giannini: contributed to the data preparation

APPENDIX B

SAS CODE

libname Visits 'S:\OCCH\DOCS\BCERC\Hormone Analyses\Fassler - QE Dissertation\AIM 2\SAS Data'; /* dataset with both BCERC and STUDY ID */

data girlspca;
  set visits.logclfdissgirls;
run;

data girlspca;
  set girlspca;
  if bmiz ne .; /* drops the girls who do not have hormones, since all girls with hormones have bmiz */
run;

proc means data=girlspca;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
run;

/* deletes 9 girls who are put into an extraneous, meaningless cluster because they have no data */
data girlspca;
  set girlspca;
  if testn6 = . and estronen6 = . and dheasn6 = . and e2n6 = .
     and test0 = . and estrone0 = . and dheas0 = . and e20 = .
     and testp6 = . and estronep6 = . and dheasp6 = . and e2p6 = .
  then delete;
run;

proc means data=girlspca;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
run;

data girlspca2;
  set girlspca;
run;

proc standard data=girlspca out=girlsstd mean=0 std=1;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc means data=girlspca;
  var testn6;
run;

proc means data=girlsstd;
  var testn6;
run;

proc fastclus data=girlsstd outseed=mean1 impute maxclusters=12 maxiter=0 summary;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc sgscatter data=mean1;
  compare y=(_gap_ _radius_) x=_freq_;
run;

data seed;
  set mean1;
  if _freq_>5;
run;

proc fastclus data=girlsstd seed=seed impute maxclusters=3 strict=6 out=girls3c;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc candisc data=girls3c out=girls3can;
  class cluster;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc sgplot data=girls3can;
  styleattrs datacontrastcolors=(purple green orange black red);
  scatter y=can2 x=can1 /group=cluster;
  xaxis display=(nolabel);
  yaxis display=(nolabel);
  title 'All Girls included in Cluster Analysis - FASTCLUS';
run;

data firstg;
  set girls3c;
  if cluster=1 or cluster=2;
run;

data firstg (keep=study_id cluster);
  set firstg;
run;

data girlslgc;
  set girls3c;
  if cluster=3;
run;

proc contents data=girlslgc; run;

/* writes out the girls in cluster 3 to do a PCA on them to see if this cluster analysis makes sense based on the same objective predictive variables - PCA does not support variable reduction */
/*
data Visits.cluster3;
  set girlslgc;
run;
*/

data girlslgc;
  set girlslgc (drop=cluster distance _impute_);
run;

proc fastclus data=girlslgc seed=seed impute maxclusters=3 strict=6 out=bigc;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc candisc data=bigc out=girls3can;
  class cluster;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6;
run;

proc sgplot data=girls3can;
  scatter y=can2 x=can1 /group=cluster;
  title 'Large Cluster broken down - max cluster=3 FASTCLUS';
run;

proc sort data=bigc;
  by cluster;
run;

/* relabels the sub-clusters of the large cluster so they do not collide with clusters 1 and 2 */
data cls1;
  set bigc (keep=study_id cluster);
  if cluster=1 then cluster=6;
  if cluster=2 then cluster=31;
  if cluster=3 then cluster=32;
run;

proc freq data=cls1;
  tables cluster;
run;

proc sort data=cls1;
  by study_id;
run;

proc freq data=firstg;
  tables cluster;
run;

proc sort data=firstg;
  by study_id;
run;

proc contents data=cls1; run;
proc contents data=firstg; run;

data totalgc;
  set cls1 firstg;
run;

proc sort data=totalgc;
  by study_id;
run;

proc freq data=totalgc;
  tables cluster;
run;

proc sort data=girlspca2;
  by study_id;
run;

data clusm1;
  merge girlspca2 totalgc;
  by study_id;
run;

proc sort data=clusm1;
  by cluster;
run;

proc means data=clusm1;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
run;

data clusm1;
  set clusm1;
  if cluster=6 then delete;
  if cluster=. then delete;
run;

proc freq data=clusm1;
  tables cluster;
run;

data visits.clustergl(keep=study_id cluster);
  set clusm1;
run;

/* now in clustedemographics program */
proc sort data=clusm1;
  by cluster;
run;

proc means data=clusm1;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
  by cluster;
run;

/* Kruskal-Wallis tests with DSCF pairwise multiple comparisons between clusters */
proc npar1way data=clusm1 dscf;
  class cluster;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
run;

ods excel file='S:\OCCH\DOCS\BCERC\Hormone Analyses\Fassler - QE Dissertation\AIM 2\SAS Output\diffhormonemn.xlsx';

proc means data=clusm1;
  var testn6 test0 testp6 estronen6 estrone0 estronep6
      dheasn6 dheas0 dheasp6 e2n6 e20 e2p6
      test3 test4 estro3 estro4 dheas3 dheas4 e2str3 e2str4;
  by cluster;
run;

ods excel close;
