Have you tried a fertility tracking app, only to find that it didn’t work for you? Fertility apps and devices, which make up the bulk of the rapidly growing, billion-dollar “Femtech” industry, rely on built-in algorithms to give users feedback on their fertility. However, since each algorithm is designed with a single specific purpose in mind, aiming to optimize certain metrics or reduce specific inaccuracies, no one model can optimize every fertility-related goal. That means one algorithm can’t simultaneously optimize for avoiding pregnancy, achieving pregnancy, and health monitoring. And while statistical techniques can be very helpful in detecting shifts in fertility, algorithms rarely, if ever, work with the same efficacy across subgroups of a population, such as older women or women with irregular cycles.
When it comes to various health and fertility goals, not all Femtech algorithms are created equal
Unfortunately, these facts are rarely well-explained to users. As a result, users may be quick to dismiss algorithmic techniques simply because their selected method is a poor fit with their personal intentions or health history. Certain subsets of users may become frustrated with fertility tracking technology, particularly if they are learning a fertility awareness method (FAM) solely through the app without a foundational understanding of reproductive anatomy and physiology and the body’s fertility biomarkers, and sans the guidance of a real-life, trained instructor.
Understanding how a user’s choice of FAM technology may or may not be working in sync with their personal family planning or health goals is key. Here we’ll distinguish between two types of FAM app algorithmic methods—the symptohormonal and symptothermal models—and examine whether each type of model is optimized for pregnancy achievement, avoidance, or health monitoring.
Symptohormonal monitors and test strips may be best for women seeking to achieve pregnancy
FAM app algorithms fall into one of two categories: algorithms based on symptohormonal monitors such as the ClearBlue Easy Fertility Monitor, and ones that use symptothermal basal body temperature (BBT) algorithms. Each monitor is prone to particular intrinsic bias.
A 2003 study assessed how often urinary hormone monitors and symptothermal methods mislabeled days as fertile or infertile (false positives and false negatives, respectively) [1]. The Persona monitor reported more false infertile days (days that the algorithm called infertile but that were actually fertile) than the symptothermal methods did. The study authors therefore concluded that many of the symptohormonal monitors may be a better fit for women looking to conceive.
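To make the two error types concrete, here is a minimal Python sketch that counts them for a single cycle. The day labels are invented for illustration and are not data from the study, and the function name `window_error_rates` is hypothetical.

```python
def window_error_rates(actual_fertile, predicted_fertile):
    """Return (false_fertile_rate, false_infertile_rate) for one cycle.

    Both arguments are equal-length lists of booleans, one entry per cycle day.
    """
    if len(actual_fertile) != len(predicted_fertile):
        raise ValueError("both lists must cover the same days")

    pairs = list(zip(actual_fertile, predicted_fertile))
    false_fertile = sum(1 for actual, pred in pairs if pred and not actual)
    false_infertile = sum(1 for actual, pred in pairs if actual and not pred)
    infertile_days = sum(1 for actual, _ in pairs if not actual)
    fertile_days = sum(1 for actual, _ in pairs if actual)

    # Share of truly infertile days that were flagged as fertile.
    false_fertile_rate = false_fertile / infertile_days if infertile_days else 0.0
    # Share of truly fertile days that were flagged as infertile.
    false_infertile_rate = false_infertile / fertile_days if fertile_days else 0.0
    return false_fertile_rate, false_infertile_rate


# Hypothetical 28-day cycle: days 10-15 are truly fertile (1-indexed).
actual = [10 <= day <= 15 for day in range(1, 29)]
# Hypothetical monitor that flags days 12-17 as fertile.
predicted = [12 <= day <= 17 for day in range(1, 29)]

ff_rate, fi_rate = window_error_rates(actual, predicted)
print(f"false fertile rate: {ff_rate:.2f}, false infertile rate: {fi_rate:.2f}")
```

For a user avoiding pregnancy, the false infertile rate is the costlier error, while a false fertile day only costs her an available day.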
A new at-home ovulation confirmation test called Proov tests urinary levels of a progesterone byproduct, and has quickly gained traction as a tool for women looking to better understand both whether they’ve ovulated and the quality of ovulation. In 2019, researchers found that the test accurately detected ovulation in 82% of participants, though just thirteen women were studied [2]. The study did not verify ovulation by any other method (such as with imaging, blood tests, or basal body temperature rise), so researchers could not calculate false positive or false negative rates. However, the study authors noted in their conclusions that having a higher proportion of older reproductive-age women in the study, whose post-ovulation progesterone levels may naturally be lower than younger women’s, may help explain why the ovulation prediction rate was not 100%. Additional research is needed to reveal whether the Proov test has systematic inaccuracies for certain subgroups, perhaps stratified by age.
Caveat: Research on symptohormonal method monitors is still in its infancy
The literature on sensitivity and accuracy rates for symptohormonal monitors is still sparse and often relies on studies with small sample sizes. The general consensus is that symptohormonal monitors are certainly useful for users looking to conceive. For women looking to avoid pregnancy, some studies suggest using a symptohormonal method in tandem with other biomarkers like BBT.
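To see why small samples limit what can be concluded, the sketch below computes a 95% Wilson score interval for an observed proportion. The counts are hypothetical (16 of 20 versus 1,600 of 2,000) and are not taken from any study cited here, and the function name `wilson_interval` is an illustrative choice.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Same observed rate (80%), very different sample sizes (hypothetical counts).
small_low, small_high = wilson_interval(16, 20)
large_low, large_high = wilson_interval(1600, 2000)

print(f"n=20:   {small_low:.2f} to {small_high:.2f}")
print(f"n=2000: {large_low:.2f} to {large_high:.2f}")
```

With the same observed rate, the small sample leaves a far wider range of plausible true values, which is why accuracy figures from pilot-sized studies should be read cautiously.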
For women in health-monitoring mode who are simply looking to understand their own cycle or perhaps get to the bottom of issues like polycystic ovary syndrome (PCOS) or endometriosis, identifying both the occurrence of ovulation and its quality via a symptohormonal method may also be tremendously helpful.
Symptothermal Femtech may be better optimized for pregnancy avoidance
The 2003 study mentioned above found that the symptothermal method of tracking BBT was the most accurate predictor of fertility and infertility when compared to seven symptohormonal monitors, suggesting its particular usefulness for women seeking to avoid pregnancy. Still, the class of algorithms that govern symptothermal methods is again subject to unique biases.
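For readers curious what a temperature-based rule can look like, below is a minimal Python sketch of the classic "three over six" charting rule, which flags a thermal shift once three consecutive readings all exceed the highest of the six readings before them. This is a simplified textbook rule, not the algorithm used by any app or monitor discussed in this article; taught variants add further conditions. The function name and the temperature values are made up for illustration.

```python
def find_thermal_shift(temps):
    """Return the index of the first of three consecutive readings that all
    exceed the highest of the six readings before them, or None if absent."""
    for i in range(6, len(temps) - 2):
        baseline = max(temps[i - 6:i])  # highest of the previous six readings
        if all(t > baseline for t in temps[i:i + 3]):
            return i
    return None


# Made-up daily waking temperatures in degrees Celsius.
temps = [36.40, 36.35, 36.45, 36.40, 36.38, 36.42, 36.44,
         36.41, 36.70, 36.75, 36.72, 36.78]

shift = find_thermal_shift(temps)
print("thermal shift begins at index:", shift)
```

A single disturbed reading (from illness, alcohol, or poor sleep, for example) can delay or falsely trigger a rule like this, which is one way a temperature-based method can end up working differently for different users.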
BBT algorithm-based Femtech options include the Natural Cycles and Clue apps, both of which recently received FDA clearance as contraceptives. Each app charges a high monthly fee and uses a proprietary algorithm that processes users’ data with complex models to predict fertility. Without examining the source code of these algorithmic options, it’s impossible to know which groups of women might be disadvantaged in their ability to use the method effectively.
Some women who experienced unexpected pregnancies while using Natural Cycles have criticized the company for this lack of transparency. Although the number of women reporting unplanned pregnancies falls within the reported expected failure rates of the Natural Cycles app, these women say they did not understand whether the app was a good fit for them in the first place. One woman reported learning only after the fact that because she experienced highly irregular cycles and struggled with undiagnosed polycystic ovary syndrome, she was a bad candidate for using the app.
For other groups of women, though, BBT algorithms may be a poor fit for other reasons. The Natural Cycles algorithm is tuned to be exceedingly cautious, erring on the side of reporting “red days” (i.e., days on which one should avoid intercourse) for users who intend to use the app to avoid pregnancy. As a result, a user with very regular cycles might get more “green days” with another method that includes additional biomarkers. The number of reported infertile days for BBT algorithm users varies tremendously, but the reasons why are hidden from users.
A note on wearable thermometers
BBT algorithms often rely on readings from a standard oral thermometer, but wearable thermometers are becoming increasingly popular. Wearable thermometers have algorithms of their own, meaning that a woman using a charting app will be subject to an algorithm twice (once when her wearable thermometer processes its readings, and again once that data is entered into the charting app).
Oura Ring
Research examining sensitivity of several temperature reading algorithms found that one wearable device, the Oura ring, generally reported accurate thermometer readings [3]. The study sample only included 22 participants in Finland, but even within that small sample size, temperature readings were less reliable for women with excess weight.
Ava bracelet
The Ava Fertility bracelet also tracks wrist temperature during sleep. In 2021, researchers compared the accuracy of Ava bracelet readings against standard oral BBT readings [4]. They reported that wrist temperature measurement detected a greater temperature difference between the follicular and luteal phases. Ava bracelet readings also correctly detected ovulation more often than standard thermometer BBT readings did (a higher true positive rate). However, Ava bracelet readings also produced a higher false positive rate than BBT readings. This means that the Ava bracelet indicated ovulation when it had not occurred more often than BBT readings did. Researchers suggested that women who are seeking to become pregnant could benefit from the increased accuracy of the Ava bracelet. But there is no general advantage (or disadvantage) to the Ava bracelet for women looking to avoid pregnancy.
As more women chart their fertility using mobile applications and wearables, companies should make a greater effort to report accuracy and sensitivity rates by subgroup and by health or fertility goal. For women who are new to FAM methods, using these apps and devices without adequate training or a more thorough understanding of biomarkers may reduce their efficacy.
Learn from a trained instructor in real life first
The shortcomings of algorithmic methods, be they symptohormonal or symptothermal, need not necessarily deter women from using algorithmic apps to determine their fertility. When data is collected with preventative measures in place to ensure privacy, these datasets can also be rich resources for scientists to better understand women’s health. Still, relying on an algorithm currently necessitates a certain level of trust in an as-yet opaque model.
Women who want to use Femtech will likely be best served by first learning a method from a real-life trained FAM instructor who covers all the basics of reproductive anatomy, physiology, and biomarkers, and who helps each woman learn to properly identify her own biomarkers. When the model outcomes are as important and personal as women’s health monitoring or pregnancy-related goals, matching user intent with the right method is critical.
References:
[1] Freundl, Günter, et al. “Estimated maximum failure rates of cycle monitors using daily conception probabilities in the menstrual cycle.” Human Reproduction, vol. 18, no. 12 (2003): pp. 2628-33. DOI: 10.1093/humrep/deg488
[2] Bouchard, Thomas P., Richard J. Fehring, and Mary Schneider. “Pilot evaluation of a new urine progesterone test to confirm ovulation in women using a fertility monitor.” Frontiers in Public Health, vol. 7 (2019): p. 184. DOI: 10.3389/fpubh.2019.00184
[3] Maijala, Anna, et al. “Nocturnal finger skin temperature in menstrual cycle tracking: ambulatory pilot study using a wearable Oura ring.” BMC Women’s Health, vol. 19, no. 1 (2019): pp. 1-10. DOI: 10.1186/s12905-019-0844-9
[4] Zhu, Tracy Y., et al. “The accuracy of wrist skin temperature in detecting ovulation compared to basal body temperature: prospective comparative diagnostic accuracy study.” Journal of Medical Internet Research, vol. 23, no. 6 (2021): e20710. DOI: 10.2196/20710
Additional Reading:
Is your period and ovulation tracker legitimate?
Can you trust fertility tracking apps with your data?
Before switching to a fertility app, consider this
Discerning the efficacy of fertility awareness apps
The road ahead for fertility awareness, charting apps, and women’s health technology