01/13/2025
Researchers investigate machine learning models’ ability to analyze electronic health records for patterns in disease development.
Researchers from Cleveland Clinic’s Center for Quantitative Metabolic Research are exploring how unsupervised machine learning (ML) algorithms can analyze electronic health records (EHR) gathered across a lifetime. Findings published in PLOS Digital Health show how unsupervised ML clustering methods using this data – otherwise known as longitudinal EHR – can support effective clinical research.
Unsupervised longitudinal ML can identify groups within massive amounts of data based on clinical similarities, which makes it useful for finding patterns that might go unnoticed with other ML approaches. These methods are particularly useful for analyzing patient data over time: factors like BMI or blood pressure may change, and tracking those changes helps project future risks. For clinical trials, unsupervised ML also has the potential to identify why a certain drug may not have worked for a group of patients – and who might respond.
“Identifying patient subtypes is important because it can help us understand why disease affects different patients in different ways,” says Daniel Rotroff, PhD, Director of the Center for Quantitative Metabolic Research. “By identifying clinical similarities among patients, researchers can better predict which patients are most likely to respond to a certain drug or treatment.”
Longitudinal EHR contains patient health information across their lifetime. These records include demographics, lab results, vital signs, medications and other information.
By analyzing patients’ EHR, researchers have an opportunity to identify patterns in disease progression from real-world cases and apply those insights to future patients. This can help researchers identify subtypes of patients who have the same disease and are also similar in genetics, preexisting conditions and other clinical factors.
Many factors can influence whether a drug will be effective, including genetics, sex and BMI.
“One of the primary determinants for whether a medication will be used to treat patients is success in clinical trials,” says Arshiya Mariam, PhD, a postdoctoral fellow and lead author of the study. “However, without a clear understanding of which patients are likely to respond, the trial may be unsuccessful, even if the drug works well in a subset of patients. Tools for identifying clinically relevant patient subtypes may help inform whether a drug should go to market and which patients are best suited for it.”
Before a clinical trial, it can be difficult to determine what factors are going to have the greatest impact on a drug’s effectiveness. This can lead researchers to select a patient group with factors that make a drug appear ineffective – even though it may be able to help a different group of patients.
The team first tested their unsupervised ML algorithms on a large, simulated dataset before evaluating them on data from more than 43,000 pediatric patients with varying BMIs and metabolic health statuses (e.g., healthy, diabetes). Patients’ medical records can have gaps depending on provider history and what information is recorded, so simulated data helps reveal which methods handle these real-world challenges best.
One of the models that performed the best on the simulated information was used to classify patients at higher risk of pediatric metabolic syndrome, which is when a patient has a group of conditions that elevates risk for diseases like type 2 diabetes. The algorithm identified five different groups based on how a patient’s body mass index (BMI) increases or decreases over time, and used this to identify which patients are at risk of future disease.
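For readers curious what grouping patients by BMI trajectory can look like in practice, here is a minimal Python sketch. It is not the study's method: the article does not name the algorithms evaluated, so a simple one-dimensional k-means on each patient's fitted BMI slope stands in for them. The cohort is simulated, and every name and number in it (simulate_visits, bmi_slope, kmeans_1d, the slopes, the noise level, the 30% missing-visit rate) is a hypothetical choice made only for illustration.

```python
import random
from statistics import mean

def simulate_visits(rng, start_bmi, slope, n_years=10, missing_rate=0.3, noise_sd=0.1):
    """Simulate yearly (year, BMI) records for one patient, dropping some visits."""
    visits = []
    for year in range(n_years):
        interior = 0 < year < n_years - 1
        # Randomly skip interior visits to mimic gaps in a real record;
        # the first and last visits are always kept so a slope can be fitted.
        if interior and rng.random() < missing_rate:
            continue
        visits.append((year, start_bmi + slope * year + rng.gauss(0, noise_sd)))
    return visits

def bmi_slope(visits):
    """Least-squares slope of BMI against time (BMI units per year)."""
    x_bar = mean([t for t, _ in visits])
    y_bar = mean([b for _, b in visits])
    num = sum((t - x_bar) * (b - y_bar) for t, b in visits)
    den = sum((t - x_bar) ** 2 for t, _ in visits)
    return num / den

def kmeans_1d(values, k, n_iter=20):
    """Plain k-means on scalar values with deterministic, quantile-based starts."""
    ordered = sorted(values)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    def assign(v):
        return min(range(k), key=lambda c: abs(v - centroids[c]))

    for _ in range(n_iter):
        labels = [assign(v) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[c] = mean(members)
    return [assign(v) for v in values], centroids

# Hypothetical cohort: five trajectory types, 20 simulated patients each.
TRUE_SLOPES = [-0.4, -0.2, 0.0, 0.2, 0.4]  # assumed values, not from the study
rng = random.Random(0)
true_groups, slopes = [], []
for group, slope in enumerate(TRUE_SLOPES):
    for _ in range(20):
        visits = simulate_visits(rng, start_bmi=rng.uniform(18, 24), slope=slope)
        true_groups.append(group)
        slopes.append(bmi_slope(visits))

labels, centroids = kmeans_1d(slopes, k=5)
for c, centre in enumerate(centroids):
    print(f"cluster {c}: {labels.count(c)} patients, mean slope {centre:+.2f} BMI/year")
```

Because the simulated groups are known in advance, the cluster labels can be checked against them, which mirrors the reason the article gives for testing on simulated data first. Real records would call for methods that cope with irregular visit timing and more than one clinical variable, which a single slope per patient does not capture.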
“Our findings demonstrate the algorithms’ ability to successfully group patients based on clinical factors that can predict disease development,” says Dr. Rotroff. “This information sets the stage for future tools to help clinicians determine exactly which patients are best suited for clinical trials, drugs and other treatments.”