Cutting-edge research: machine learning identifies early predictors of type 1 diabetes

In a study recently published in the journal Cell Reports Medicine, scientists used plasma proteomics to identify proteins associated with the onset of type 1 diabetes.

Machine learning analyses of more than 2,250 samples from 184 participants identified 376 regulated proteins that predict the autoimmunity preceding type 1 diabetes.

These results give insights into the pathways altered during type 1 diabetes development and allow the prediction of the disease six months before its onset.

What is type 1 diabetes?

Type 1 diabetes (T1D) is an autoimmune disorder estimated to affect 20 million people worldwide and is responsible for reducing life expectancy in patients by 11 years. It is characterized by the immune system's destruction of the body's own β cells following the development of autoantibodies against the individual's pancreatic islet proteins, a process termed "seroconversion." A cure for this condition does not yet exist.

β cells are responsible for insulin production, and their destruction results in many ailments, including blindness, kidney failure, and cardiovascular disease. To date, the triggers and mechanisms of T1D remain poorly understood.

Recent programs, including The Environmental Determinants of Diabetes in the Young (TEDDY) study, have arisen to elucidate T1D, thus enabling future therapeutic interventions.

These programs have identified plasma proteomics as a viable means to identify biomarkers associated with T1D, thereby gaining insight into the genetic and environmental determinants of the disease.

Analysis of these proteins may increase researchers' predictive power and give healthcare practitioners viable means to treat T1D in the future. Unfortunately, many previous studies did not systematically validate their findings, which confounds the interpretation of their results.

About the study

In the present study, researchers conducted a nested case-control study on individuals in the TEDDY cohort. The two-phase research was divided into the discovery phase and the subsequent validation phase.

In the discovery phase, 184 randomly selected donors aged 0–6 years (92 cases and 92 controls) provided a total of 2,252 plasma samples collected at multiple time points over 18 months. These samples were profiled using mass spectrometry, and the resulting proteomes were analyzed to identify the 14 most abundant proteins in each sample.

The validation phase comprised 990 donors specifically selected based on their biomarker, genetic, and demographic characteristics. To ensure data collection quality, researchers developed and deployed a quality control analysis in real time (QC-ART) system, which automated data management over the 18-month study.

In total, 36,252 peptides from 1,720 proteins were identified. Of these, the 376 proteins that had the highest coefficient of variation and were detected most often were used for statistical analyses.
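To make the variability criterion concrete, the following is a minimal sketch of ranking proteins by their coefficient of variation (standard deviation divided by the mean). It is not the study's pipeline: the function names, the toy abundance values, and the assumption that abundances are on a linear (not log) scale are all illustrative.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean abundance is zero; CV is undefined")
    return pstdev(values) / m


def top_variable_proteins(abundances, k):
    """Rank proteins by CV (highest first) and return the top k names."""
    ranked = sorted(
        abundances,
        key=lambda name: coefficient_of_variation(abundances[name]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: protein name -> abundance across four samples.
toy = {
    "protein_A": [10.0, 10.0, 10.0, 10.0],  # no variation
    "protein_B": [5.0, 15.0, 5.0, 15.0],    # high variation
    "protein_C": [9.0, 11.0, 9.0, 11.0],    # modest variation
}
selected = top_variable_proteins(toy, 2)
print(selected)
```

In this toy example, the two proteins that vary most across samples are kept, and the constant one is dropped.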

Researchers finally utilized machine learning (ML) models to predict phenotype based on the 376 proteins identified during phases one and two.

The models specifically tested whether the identified proteins could serve as biomarkers to predict whether a donor would remain in the islet autoimmunity (IA) phase or progress to T1D. One hundred bootstrap iterations of these models were run, and logistic regression with LASSO penalization was used to build and identify the best-fit models.
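The general idea of combining bootstrap resampling with LASSO-penalized logistic regression can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, the solver is a plain proximal gradient descent written for readability, and the penalty strength `lam`, learning rate, and iteration counts are arbitrary choices. The sketch counts how often each feature keeps a nonzero weight across resamples, which is one common way to judge how stable a candidate biomarker is.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam=0.1, lr=0.1, n_iter=300):
    """Fit L1-penalized logistic regression by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b  # the intercept is not penalized
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def bootstrap_selection_frequency(X, y, n_boot=100, lam=0.1, seed=0):
    """Refit on resampled data; return how often each weight stays nonzero."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        w, _ = fit_lasso_logistic([X[i] for i in idx], [y[i] for i in idx], lam=lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Hypothetical synthetic data: feature 0 drives the outcome, features 1-2 are noise.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(80)]
y = [1 if row[0] + 0.3 * data_rng.gauss(0, 1) > 0 else 0 for row in X]

# 20 resamples keep the demo fast; the article describes 100 iterations.
freq = bootstrap_selection_frequency(X, y, n_boot=20, lam=0.1)
print(freq)
```

The L1 penalty drives the weights of uninformative features to exactly zero, so a feature that survives in most resamples is a more credible predictor than one that appears only occasionally.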

Study findings

The present study identified 376 proteins associated with the spectrum of IA, ranging from normoglycemia to complete T1D onset.

These proteins were overrepresented in coagulation- and complement cascade-related processes, which are known to co-occur with T1D-related nutrient digestion and absorption, inflammatory signaling, blood clotting, and cellular metabolism.

Proteins identified in three- to nine-month-old donors successfully predicted their development of T1D by six years of age. Shifts in protein composition before seroconversion were observed in donor metabolic profiles, and the ML models used these shifts to predict T1D 6–12 months before disease onset.

The study identified and validated 83 biomarkers that can be used in future clinical studies to identify T1D in patients with a genetic predisposition to the disease.

The study's main limitation was that all donors were derived from the TEDDY study cohort – individuals with a genetic predisposition to T1D and of American and European descent. Further studies including individuals from a more diverse set of regions and those without a family history of T1D would help improve the robustness of these results.


Researchers utilized thousands of TEDDY donor samples to identify 376 proteins associated with the future onset of type 1 diabetes.

Machine learning models could use these proteins to predict, up to six months before disease onset, whether individuals with different combinations of these proteins would remain in the islet autoimmunity phase or progress to T1D.

Of the proteins identified, 83 were termed 'biomarkers' and can be used in clinical and scientific trials in the future. This research is the first robustly validated step in understanding the underlying genetic mechanisms and environmental triggers of T1D.

It forms the basis for future studies with more geographically diverse samples to build upon. Eventually, this study may pave the way for hitherto unavailable therapeutic interventions for this widespread condition.
