Open Cycle: Forecasting Ovulation for Family Planning Karen Clark Southern Methodist University, [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Southern Methodist University SMU Data Science Review Volume 1 | Number 1 Article 2 2018 Open Cycle: Forecasting Ovulation for Family Planning Karen Clark Southern Methodist University, [email protected] Mridul Jain Southern Methodist University, [email protected] Araya Messa Southern Methodist University, [email protected] Vinh Le Southern Methodist University, [email protected] Eric C. Larson Southern Methodist University, [email protected] Follow this and additional works at: https://scholar.smu.edu/datasciencereview Part of the Investigative Techniques Commons, and the Other Analytical, Diagnostic and Therapeutic Techniques and Equipment Commons Recommended Citation Clark, Karen; Jain, Mridul; Messa, Araya; Le, Vinh; and Larson, Eric C. (2018) "Open Cycle: Forecasting Ovulation for Family Planning," SMU Data Science Review: Vol. 1 : No. 1 , Article 2. Available at: https://scholar.smu.edu/datasciencereview/vol1/iss1/2 This Article is brought to you for free and open access by SMU Scholar. It has been accepted for inclusion in SMU Data Science Review by an authorized administrator of SMU Scholar. For more information, please visit http://digitalrepository.smu.edu. Clark et al.: Forecasting Ovulation for Family Planning Open Cycle: Forecasting Ovulation for Family Planning Karen Clark1, Mridul Jain1, Vinh Le1, Araya Messa1, and Eric C. Larson1 1 Southern Methodist University, Master of Science in Data Science Dallas, Texas 75275 ([email protected], [email protected], [email protected], [email protected], [email protected]) 1 Introduction Historically, biological knowledge of a woman’s menstruation cycle has been a black box. It was not until the 19th Century that doctors realized it was linked to ovulation and a woman’s fecundity. Motherhood is often a positive and exciting time, but unanticipated pregnancies and struggles of getting pregnant are often the most stressful issues dealt within a relationship [1]. Family planning enables a couple to space and limit pregnancies that has a direct impact on a couple’s well-being and, often, directly impacts the health of the mother [2]. Predicting ovulation can be valuable for couples because it allows planning around times of increased fertility. However, it is difficult to predict ovulation because not all women follow a normal cycle. There are many factors such as age, health, and nutritional status that can have a direct impact to women’s menstruation cycle. There have been many techniques used by women to predict ovulation, but each technique has its own issues and no method has been found to be universally applicable for all women. 2 A Primer on Basal Body Temperature and Family Planning Pregnancy, especially unintended pregnancies, are also associated with many negative health and economic consequences that could be avoided through family planning. From an economic perspective, family planning can increase women’s labor-force participation, increase wages, and increase lifetime earnings. From a health perspective, women that understand the benefits of family planning can space pregnancies out at least 2 years apart to allow time to care for the new baby and to recuperate after the birth of a child. Frequent and numerous pregnancies are more likely to lead to maternal death from hemorrhage, toxemia, or septicemia [3]. During pregnancy, a woman may be faced with severe or chronic malnutrition due to sickness. Thus, her defenses to fight infection or illness will decrease. Couples who plan their births for the times when the mother is best prepared avoid high-risk pregnancies [3]. Over time knowledge about the female reproductive physiology has grown and natural methods of family planning have started to gain popularity [4]. A difference in Published by SMU Scholar, 2018 1 SMU Data Science Review, Vol. 1 [2018], No. 1, Art. 2 a woman’s body temperature was first documented in 1868 by Dr. W. Squires, a physician at a Tuberculosis Sanatorium. He noted there were two different temperature levels recorded in female patient charts, but he had no explanation for these differences. Later in 1904, Dr. Van der Velde Holland concluded that the temperature changes observed were related to ovulation [5]. It wasn’t until the 1930’s and early 1940’s that the temperature change was found to correspond to the hormonal and endometrial changes that occur with ovulation. Dr. Zuck, an American physician, established that a woman was at her highest fecundity during these temperature changes [6]. Finally, in 1947, a Belgium physician recommended that women should monitor their basal body temperatures to help with family planning. This sounded easy, but physicians found that the basal body temperature may only increase 0.5 to 1.0-degree Fahrenheit, and in most instances, it could take four to five days for the temperature to stabilize. In 1963, Dr. John Marshall, a neurologist conducted a prospective trial of fertility regulation by using the basal body temperature method from a data set that included temperature readings from 1,798 women who took part in the natural family planning clinic provided by the Catholic Marriage Advisory Council of England and Wales in 1950 [7]. 3 The Catholic Marriage Advisory Council Data Set The data used in our study is based on the same data set that Professor John Marshall (Department of Statistical Sciences of the University of Padua) used in his study of women’s menstrual cycle and predicting ovulation by monitoring a woman’s basal body temperature. The data set is registered information from 1,798 women, each having at least one sequence of six cycles. A cycle is defined as a menstrual cycle starting from the first day of the onset of menstruation up to and including the day before the next cycle begins. The total number of segments were 2,397 with each woman having a maximum of eight distinct recorded segments. The statistical information from the charts sent by women was converted into the electronic data to perform analysis. The women who took part in this study were not charged for services or discriminated against based on race, nationality, or religious affiliation [7]. Data was captured when the family planning service began in 1950, initially using calendar method and later superseded by Basal Body Temperature (BBT) method. This method used the temperature shift associated with ovulation to determine the fertile and infertile phases of the menstrual cycle. The analysis of the BBT was used on a single spike among the last six lower temperature levels. The spike was defined by an increase in BBT of 0.2 degrees Fahrenheit above the corresponding temperatures. The predicted ovulation was then recorded for that cycle. The original prediction of ovulation was done manually as the data was analyzed after the participant sent their temperature recordings back to the family planning service. The complete data set is comprised of 36,140 instances (this number excludes the 662 cases where charts or data were missing) and 112 features. Age was captured for only 1,074 participants and ranged between 18 to 50 years of age. The assumption made was that the temperature spike seen could be used and ovulation could be calculated. The participants included women that were considered to have no fertility issues, https://scholar.smu.edu/datasciencereview/vol1/iss1/2 2 Clark et al.: Forecasting Ovulation for Family Planning therefore, there was no reason to believe their cycles are abnormal. The data set is a time series with only BBT recordings. It does not provide any other data such as cervical mucus observations or hormone levels. Additional information was included in the 14,231 cycles, in which the BBT was recorded every day, from the first to the last day of the cycle. In 14,520 cycles, only the first two temperatures were missing and in 14,564 cycles only the last two temperatures were missing [7]. This information is important because the data set used provided us with a robust set of usable BBT records to use in our predictive analysis. The data was split hierarchically into levels. 1. Unique identifier for each participant. 2. Group identifier for each set of consecutive cycles per participant. 3. The number of cycles within the group. 4. Cycle identifier 5. Total number of cycles for that participant. 6. Date of the first menstrual cycle for the participant. 7. Time span of the cycle. 8. Manually calculated preovulation day for the participant. 9. Length of menstruation cycle in days for the participant. 10. Temperature readings ranging from 1 to 99 individual recordings. Each participant’s menstrual cycles were collected within “groups” of consecutive cycles. The consecutive cycles were then grouped together for a single woman (“Donna” in Italian) [7]. 3.1. Data Characteristics Rudimentary visual analysis of the sequences of cycle length and preovulation phase length provides some common features that were observed [8]: 1. The data was recorded in days, and observed length of cycles are discrete. 2. The length of a woman’s cycle did show a trend downward as a woman’s age increased. 3. There were several instances where women recorded long cycles and there were other instances when women recorded a change in the mean level. We can only assume this is representative of a change in life style, health, physical, or behavior since we were not provided any ancillary information. 4. We did see that long cycles in the data set were commonly followed by a shorter cycle. This did impact our ability to predict ovulation for the second cycle when the temperature set was not long enough for the algorithm. 5. Data from some subjects, included either small or large observations, can only be interpreted as outliers when compared to the woman’s normal pattern.