Convergence of evolving artificial intelligence and machine learning techniques in precision oncology

Digital pathology
There are multiple areas within the field of digital pathology where AI/ML is being explored. Key applications include automation in immunohistochemistry (IHC) scoring, the inference of clinically relevant features beyond histology from hematoxylin and eosin (H&E) images, and novel insights from emerging tools for measuring multiplex, single-cell, and spatially resolved analytes from tumor tissue.
The role of AI in automating IHC biomarker scoring
AI-based technology may help standardize IHC assessments, including those used in routine practice for treatment selection based on biomarkers (e.g., PD-L1, HER2, ER, PR, Ki-67). This would be especially valuable as an assistance tool for pathologists because the standard manual approach is time consuming and is associated with high intra-observer variability33,34,35. An automated and quantitative AI-based technology has the potential to standardize the quality of patient care across centers and geographic areas by overcoming variability in assessment by pathologists, specifically in rare and complex cases, increasing accuracy and reproducibility, and reducing turnaround time33,36,37,38,39.
Automated AI-based IHC scoring systems have been evaluated by analyzing scans of whole-slide images (WSIs) of tumor samples in settings where the standard of care currently requires manual determination of protein expression by IHC33,40,41,42,43,44,45. For example, several independent groups have demonstrated the potential of AI-supported quantitative PD-L1 evaluation using CNNs38,40,41. Two separate groups developed CNN systems that were able to automatically detect the tumor area within WSIs and to calculate the IHC-based PD-L1 tumor proportion score (TPS) with high consistency between the AI systems and pathologists40,41.
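To illustrate the quantitative readout that such systems automate, the minimal sketch below computes a slide-level TPS once a model has assigned a class to each detected cell; the label encoding, function name, and example values are hypothetical and are not drawn from the cited AI systems.

```python
import numpy as np

def tumor_proportion_score(cell_labels: np.ndarray) -> float:
    """Compute a PD-L1 tumor proportion score (TPS) from per-cell predictions.

    cell_labels: integer array with one entry per detected cell, where
        0 = non-tumor cell, 1 = PD-L1-negative tumor cell,
        2 = PD-L1-positive tumor cell (illustrative encoding).
    Returns TPS as the percentage of viable tumor cells that stain positive.
    """
    tumor_cells = np.isin(cell_labels, (1, 2)).sum()
    positive_tumor_cells = (cell_labels == 2).sum()
    if tumor_cells == 0:
        return float("nan")
    return 100.0 * positive_tumor_cells / tumor_cells

# Example: per-cell labels aggregated over all tiles of a whole-slide image
labels = np.array([0, 2, 1, 2, 2, 1, 0, 1])
tps = tumor_proportion_score(labels)
print(f"TPS = {tps:.1f}%  ->  PD-L1 positive at 1% cutoff: {tps >= 1}")
```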
Others developed a similar CNN PD-L1 TPS classifier and retrospectively analyzed 1746 samples across CheckMate studies of nivolumab combined with ipilimumab for the treatment of patients with various cancers38. The automated AI system classified more patients as PD-L1 positive (at both the 1% and 5% expression levels) compared with manual scoring in most tumor types. Importantly, similar improvements in response and survival were observed using both AI-powered and manual scoring. However, automated AI-powered digital analysis may identify more patients who would benefit from immunotherapy treatment compared with manual assessment38. This is because AI-powered methods can analyze larger datasets, detect subtle patterns, and provide more consistent evaluations, potentially reducing the variability inherent in manual assessments.
Recent advances in context-aware attention mechanisms, such as the Context-Aware Multiple Instance Learning (CAMIL) model, have significantly improved diagnostic accuracy in medical imaging. CAMIL prioritizes relevant regions within WSIs by analyzing spatial relationships and contextual interactions between neighboring areas. This approach reduces misclassification rates and enhances diagnostic reliability46.
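The sketch below illustrates the general principle of attention-based multiple instance learning on WSIs, in which a slide is treated as a bag of patch embeddings and a learned attention module weights the patches that drive the slide-level prediction. It is a generic, minimal example with placeholder dimensions, not the published CAMIL architecture.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Toy attention-based MIL pooling: a slide is a bag of patch embeddings."""
    def __init__(self, feat_dim=512, hidden=128, n_classes=2):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):             # (n_patches, feat_dim)
        scores = self.attention(patch_feats)    # (n_patches, 1)
        weights = torch.softmax(scores, dim=0)  # attention over patches
        slide_feat = (weights * patch_feats).sum(dim=0)  # weighted pooling
        return self.classifier(slide_feat), weights

model = AttentionMIL()
patches = torch.randn(1000, 512)          # e.g., CNN features of 1000 tiles
logits, attn = model(patches)
print(logits.shape, attn.shape)           # torch.Size([2]) torch.Size([1000, 1])
```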
The workflow for pathologists in the setting of breast cancer diagnosis is burdensome, as it includes manual quantitative IHC assessment with clinically relevant cutoff levels of multiple proteins including HER2, ER, PR, Ki-67, and PD-L1. HER2 assessment is known to be associated with significant diagnostic variability. Intra-tumoral heterogeneity within WSIs of tumor tissue hinders the accurate identification of all cells expressing the respective protein. In addition, manual counting of tumor cells to evaluate biomarker expression levels is associated with low efficiency and poor reproducibility. In clinical practice, the training and experience of pathologists significantly influence the accuracy of biomarker assessment (e.g., PD-L1 expression)47,48. For instance, untrained pathologists exhibit lower intraclass concordance in PD-L1 expression compared to their highly trained colleagues41.
One group assessed various ML and DL approaches to automated quantitative HER2 IHC scoring and found that a CNN model outperformed classical ML approaches49. Using 71 breast tumor samples, a concordance of 83% between the automated scoring system and a pathologist’s assessment was demonstrated. Discordance between automated and manual scoring was found to be associated with HER2 staining heterogeneity in these cases; notably, an independent review of the discordant cases led to a modification of the initial pathologist assessment in 8/12 cases, highlighting the potential utility of AI assistance for the identification of ambiguous cases49.
This potential benefit of using AI as an assistance tool was demonstrated in a separate study using a CNN to classify cells as either tumor or non-tumor and to quantify IHC staining intensity for ER/PR and Ki6742. The goal of the study was to evaluate the reliability of using an AI system as a diagnostic decision support tool in a routine clinical pathology setting (6 WSI scanners/microscopes; 3 staining machines; manual scoring by 10 pathologists from 8 different centers) by ensuring that the use of AI did not adversely impact the pathologist assessment. Individual AI analysis results were confirmed by pathologists in 95.8% of the Ki-67 cases and 93.2% of the ER/PR cases, indicating the reliability of IHC scoring with the support of the CNN AI tool. Statistical analysis also demonstrated high interobserver variance between pathologists in conventional IHC quantification, which decreased slightly with AI assistance42.
These reports indicate that AI can assist pathologists by automating IHC scoring, reducing inter-observer variability (a challenge associated with the determination of clinically relevant expression cutoffs), and shortening the diagnostic workup period. Prospective trials are needed to confirm the clinical validity and utility of these promising technologies.
The use of AI to predict biologic characteristics from H&E-stained WSIs
Given that DL models such as CNNs exhibit “representation learning” and are able to extract “deep” patterns from input data, they have demonstrated the ability to reveal molecular characteristics from H&E-stained WSIs, as histology reflects biology33,50,51,52,53. Although these DL models can be difficult to “explain” in terms of how they develop their predictions, they also offer the opportunity to identify human-interpretable features (HIFs) based on cell morphology and histological patterns54. Investigators have prioritized identification of HIFs derived from CNN models when analyzing H&E WSIs from patients with cancer to predict molecular phenotypes. HIFs were correlated with established markers of the tumor microenvironment that are predictive of diverse molecular signatures, including expression of immune checkpoint proteins and homologous recombination deficiency, indicating that their application should be further explored54.
DL analysis of H&E images can predict molecular alterations prior to, and potentially in lieu of, performing IHC or molecular confirmatory testing. HER2 and BRCA expression was predicted from H&E-stained WSIs from patients with breast cancer using a CNN that separately processes H&E-stained slide patches or tiles and outputs an IHC label for the WSI54. The study demonstrated 83.3% and 53.8% prediction accuracy for HER2 and BRCA, respectively54. Similarly promising early results for BRCA prediction have been reported by others50. In addition, CNN-based analyses of H&E-stained WSIs have been used to prioritize patients for microsatellite instability (MSI)/mismatch repair deficiency (dMMR) testing to select patients for treatment with immunotherapy55,56. A CNN model to predict MSI was trained using 100 H&E-stained WSIs from patients with colorectal cancer and then validated on an independent validation cohort of 484 H&E-stained WSIs57. The model was associated with high levels of concordance, with an area under the receiver operating characteristic (AUROC) of 0.931 and 0.779 for the training and independent cohorts, respectively. A large international consortium trained and validated a CNN model to predict MSI/dMMR from nine cohorts that included 8,343 patients with colorectal cancer across different countries and ethnicities56. The CNN model achieved “clinical grade” performance, with an AUROC of up to 0.96, indicating that this AI system can rule out 25-50% of patients for MSI/dMMR testing56.
CNN models have also been used to predict EGFR, KRAS, and STK11 mutations from pathology images with high accuracy50,51,58,59,60. For instance, CNN-based analyses of two large H&E WSI datasets with matched genetic profiling across diverse tumor types were used to predict genetic alterations: The Cancer Genome Atlas (TCGA) dataset was used for model training and the Clinical Proteomic Tumor Analysis Consortium dataset was used for validation60. Multiple clinically relevant mutations were predicted (i.e., PTEN and TP53 in endometrial cancer, KRAS and BRAF in colorectal cancer, and EGFR in non-small cell lung cancer [NSCLC]) in both the training and validation sets, demonstrating the potential role of prioritizing patients for confirmatory genetic testing60.
Another CNN model was developed to predict the molecular classification using H&E WSIs from 2028 patients with endometrial cancer. The patient data were derived from three randomized trials and four clinical cohorts and divided into training and independent validation sets61. Using genomic and IHC assessments, patients were classified into one of four prognostic groups: POLEmut, dMMR, p53 abnormal (p53abn), and no specific molecular profile (NSMP). In the independent validation set, the model achieved class-wise AUROCs of 0.849 for POLEmut, 0.844 for dMMR, 0.883 for NSMP, and 0.928 for p53abn61. Subsequent analysis using ML techniques demonstrated that morphological features including inflammatory, stromal, and tumor cell counts as well as tumor nuclear size and shape were associated with the molecular phenotypes, suggesting the potential for integration into an improved risk stratification system.
Other investigators compared the typical workflow for the diagnosis of prostate cancer (using H&E-stained needle biopsies) with the workflow after introduction of a tool that identifies the need for IHC analysis39. They used an ensemble of CNNs to segment tissue from debris and from foci of interest in the H&E-stained WSI and an ML classifier to classify cases as clearly malignant, clearly benign, or ambiguous. At the time of H&E staining, the classifier triggered an automated request for IHC in ambiguous cases without waiting for a pathologist’s manual review. The AI assistance tool attained 99% accuracy and a 0.99 area under the curve (AUC) on the test data; on a validation set, the average agreement with pathologists was 0.81, with a mean AUC of 0.80. This AI tool to automate IHC requests would, therefore, result in a significantly leaner workflow39.
These studies indicate that DL computer vision capabilities for predicting molecular characteristics, e.g., genetic mutations and MSI, from H&E-stained WSIs may streamline pathology workflows for known biomarkers.
AI-based biomarker prediction from H&E-stained WSIs is limited by the following: only molecular biomarkers that have an impact on tissue morphology can be identified; the sensitivity and specificity of AI-based mutation identification is suboptimal; and concordance with validated methodologies is limited (owing to limited tumor tissue availability, poor DNA quality, inaccuracy of laboratory procedures, and lack of personnel experience or other resources). In order for AI-based biomarker prediction from H&E-stained WSIs to be applied in clinical practice, extensive validation in external datasets and within clinical trials is needed.
The use of AI to predict novel prognostic and predictive biomarkers from H&E-stained WSIs
Challenges associated with the complexity and heterogeneity of the immune tumor microenvironment and predictive/prognostic biomarkers may be overcome by computational pathology technologies62,63,64,65. The Multiomics Multicohort Assessment platform analyzed H&E-stained WSIs from patients with early-stage colorectal cancer using large publicly available datasets, such as TCGA, that included digital H&E-stained WSIs annotated with sequencing and clinical data62. The investigators employed CNNs and vision transformers to investigate whether DL analysis of H&E-stained WSIs could predict clinical and molecular profiles of interest. The model accurately predicted clinical outcomes, including overall and disease-free survival, as well as molecular aberrations including copy number alterations, expression levels of key genes in cancer development, MSI, BRAF mutation, and CpG island methylator and consensus molecular subtypes62.
A prognostic model for prostate cancer was developed that incorporated CNN analysis of prostate biopsy H&E-stained WSIs with six clinical variables (combined Gleason score, Gleason primary, Gleason secondary, T-stage, baseline PSA, age) from 5654 patients from the Radiation Therapy Oncology Group prostate cancer studies63. This model was shown to have better prognostic accuracy than the commonly used National Comprehensive Cancer Network (NCCN) risk-stratification tool63. Similar multimodal DL approaches have been used to predict outcomes for patients with gliomas64 and high-grade serous ovarian cancer65. A graph neural network-based local-global distillation model (ALL-IN), which combines local and global histological features, improved stratification of patient risk groups, with clinical utility66.
Investigators developed a CNN tumor-infiltrating lymphocyte (TIL) “analyzer” to identify three immune phenotypes (IPs)—inflamed, immune-excluded, and immune-desert—based on the concentrations of TILs in tumor epithelium and tumor stroma on H&E-stained WSIs67. The inflamed IP (high TIL concentration in tumor epithelium) was associated with higher response rates and longer PFS in studies of immune checkpoint inhibitor therapy in patients with NSCLC. The TIL analyzer provided prognostic insight in addition to the PD-L1 TPS in the subset of patients with a TPS of 1%–49%. The 42.5% of patients with an inflamed IP had a 22% response rate compared with a response rate of only 3.9% in patients with immune-excluded or immune-desert IPs67.
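As a simplified illustration of how an immune phenotype might be assigned once TIL densities in the tumor epithelium and stroma have been quantified, the sketch below applies hypothetical cutoffs; the published analyzer derives its decision rules from data rather than fixed thresholds.

```python
def immune_phenotype(til_epithelium: float, til_stroma: float,
                     epi_cutoff: float = 100.0, stroma_cutoff: float = 100.0) -> str:
    """Classify an immune phenotype from TIL densities (cells/mm^2).

    The cutoff values are placeholders chosen for illustration only.
    """
    if til_epithelium >= epi_cutoff:
        return "inflamed"            # TILs infiltrate the tumor epithelium
    if til_stroma >= stroma_cutoff:
        return "immune-excluded"     # TILs confined to the surrounding stroma
    return "immune-desert"           # few TILs anywhere in the tumor region

print(immune_phenotype(250.0, 40.0))   # inflamed
print(immune_phenotype(20.0, 300.0))   # immune-excluded
```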
These studies demonstrate that DL approaches based on H&E image analysis alone or combined with clinical data hold promise for improving prognostic and predictive biomarkers in precision oncology.
Challenges in the implementation of AI/ML tools in digital pathology
AI/ML tools can complement the ability of medical doctors to interpret, analyze, and draw conclusions from large-scale datasets. The integration and analysis of large-scale datasets such as genomic, radiomic/radiogenomic, digital pathology, real-world, and EHR datasets require advanced computational tools and increased computing power, owing to their complexity and heterogeneity. The TCGA includes more than 10,000 digital pathology images from patients with diverse tumor types, along with associated clinicopathological and genomic data. The Virchow2G Pathology Dataset includes over 3 million pathology slides from 225,000 patients across 45 countries and was used to train Virchow2G, a large pathology model68.
The Cancer Imaging Archive comprises de-identified medical images of cancer that are associated with patient outcomes, treatment, and genomic data69. These large-scale datasets present challenges related to the management and storage of large volumes of data, increased variety of data sources and formats, assessment of batch effects, high processing power requirements, and tool integration, along with relevant feature selection, which is often hindered by nonlinear associations of different features and inter-tumor and intra-tumor heterogeneity. AI/ML algorithms enable the extraction of clinically relevant features from these datasets, providing useful insights that could not be identified by traditional methods or human intelligence.
Digital pathology, while transformative for the application of precision oncology, poses several challenges. The generation of “big data” requires efficient data management and storage systems, and interoperability issues associated with the lack of compatibility of different digital pathology systems across platforms and institutions limit data sharing and integration. The regulatory and legal framework for the use of digital pathology is evolving, and concerns regarding data privacy and the need for standardization of practices should be addressed. In addition, quality control and methodology validation, along with pathologist training, are critical for the application of digital pathology in clinical practice. The transition from traditional to digital workflows may be challenging, requiring time and adaptation. Finally, the increased costs associated with the integration of digital pathology, including scanning equipment, specialized software, data storage, technology infrastructure, and extensive physician training, may be a significant barrier for smaller institutions.
Multiplex, single-cell, and digital spatial analyses
AI/ML tools have the potential to analyze the emerging complex and high-dimensional measurements of disease, offering a deeper understanding of tumor biology, including the interaction of the tumor microenvironment with the tumor. They can help analyze results derived from digital pathology multiplex platforms that measure multiple analytes in a single sample, such as gene expression at the protein (IHC, immunofluorescence, or imaging mass cytometry) and mRNA (bulk or single-cell RNA sequencing) levels. AI/ML tools are increasingly employed for the characterization of individual cells using protein, DNA, RNA, and metabolite analysis to pinpoint single-nucleotide mutations70,71,72,73 and for the investigation of epigenomic phenomena such as the DNA methylome74,75,76, ChIP-seq analysis, and chromatin accessibility data77,78.
AI algorithms can classify various tissue types based on their spatial characteristics (texture, shape, and color). The spatial distribution of cancer and neighboring cells can be combined with other clinicopathological data to establish prognostic and predictive algorithms. For instance, imaging mass cytometry was applied to evaluate the tumor and immunological landscape of tissue samples from 416 patients with NSCLC and to assess a prognostic model79. Investigators demonstrated that CNN-based spatial analysis of immune lineages and activation status identified five markers (CD14, CD16, CD94, αSMA, and CD117) that correlated with OS79.
In another study, imaging mass cytometry-labeled brain tumor biopsies were used to create high-dimensional maps of the brain tumor microenvironment80. CNN algorithms enabled fully automated high-throughput segmentation and identification of individual cells across diverse tissues. Differences in the tumor immune landscapes between patients with high-grade glioma and brain metastasis were observed. Spatial cellular neighborhoods (CNs) that were associated with OS were identified in patients with glioblastoma. Furthermore, CNs enriched in M1-like monocyte-derived macrophages were associated with improved OS, highlighting the value of spatial cellular relationships and showing the complexity of tumor CNs80. Others used multiplexed ion beam imaging by time-of-flight (MIBI-TOF) with a CNN segmentation tool to evaluate in situ expression of 36 immune-related proteins in patients with triple-negative breast cancer and to define the tumor-immune microenvironment, including identification of CNs81.
Other researchers developed a weakly supervised (e.g., not requiring manual expert annotation) DL framework to identify tumor-immune interrelations and CNs and to predict which patients with low-risk early-stage endometrial cancer have a higher risk of recurrence82. Using multiplexed immunofluorescence of tissue microarrays from tumor samples for the simultaneous visualization and quantification of CD68+ macrophages, CD8 + T cells, FOXP3+ regulatory T cells, PD-L1/PD-1 protein expression, and tumor cells, they trained and validated a multilevel interpretable DL framework (using a CNN for patch feature extraction, a graph neural network to capture CNs and tissue areas, and a multilayer perceptron for recurrence risk classification) to predict the risk of recurrence. This model achieved an AUROC of 0.90, and predictions resulted in concordance for 96.8% of cases. The authors concluded that the model could assess the risk of recurrence in this study population, outperforming current prognostic factors, including molecular subtyping82.
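The sketch below outlines the general shape of such a multilevel pipeline: patch-level features are refined by a simple message-passing step over a spatial adjacency matrix (standing in for the graph neural network component) and pooled into an MLP risk head. It is schematic only, with made-up dimensions, and does not reproduce the published framework.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    """One round of neighborhood message passing over a patch adjacency matrix."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):                 # x: (n, dim), adj: (n, n)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighborhood = adj @ x / deg           # mean over spatial neighbors
        return torch.relu(self.linear(neighborhood + x))

class RecurrenceRiskModel(nn.Module):
    """Patch features -> graph message passing -> MLP risk score."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.gnn = GraphLayer(feat_dim)
        self.head = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, patch_feats, adj):
        x = self.gnn(patch_feats, adj)
        slide_feat = x.mean(dim=0)             # pool neighborhood-aware features
        return torch.sigmoid(self.head(slide_feat))

feats = torch.randn(500, 256)                  # e.g., CNN embeddings of 500 patches
adj = (torch.rand(500, 500) > 0.99).float()    # toy spatial adjacency matrix
print(RecurrenceRiskModel()(feats, adj))       # recurrence-risk probability
```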
Another promising approach combines AI-driven image analysis of cellular phenotypes with automated single-cell or single-nucleus laser microdissection and ultra-high-sensitivity mass spectrometry. This approach links protein abundance to cellular and subcellular phenotypes while preserving spatial context, offering the potential to elucidate pathways that change in a spatial manner as cancer progresses83.
In addition, RNA sequencing (RNAseq) plays a crucial role in multiplex analyses, providing a comprehensive view of gene expression profiles within the tumor microenvironment. The integration of RNAseq data with AI/ML tools allows for the identification of novel biomarkers and gene signatures that are pivotal for understanding tumor biology and patient outcomes. Investigators have used an autoencoder, an unsupervised DL methodology that utilizes input data to create representative features, to regenerate output data and integrate DNA methylation, RNAseq, and miRNAseq data from patients with colorectal cancer84. This approach enabled the identification of a subgroup of patients with improved OS. Another study highlighted that the clustering algorithms applied to RNAseq data can uncover distinct gene expression patterns that correlate with specific tumor characteristics, thereby facilitating the identification of potential therapeutic targets85. This synergy between RNAseq and AI-driven analyses not only enhances the characterization of tumor-immune interactions but also supports the development of prognostic models that can predict patient responses to therapies. By leveraging the high dimensionality of RNAseq data in conjunction with spatial and multiplex imaging techniques, researchers can gain deeper insights into the complex interplay between tumor cells and their microenvironment, ultimately advancing precision medicine approaches in oncology.
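A minimal sketch of the kind of unsupervised autoencoder-based integration described above is shown below: concatenated omics features are compressed into a latent representation that can subsequently be clustered into patient subgroups. Feature counts, dimensions, and training details are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OmicsAutoencoder(nn.Module):
    """Compress concatenated methylation/RNAseq/miRNAseq features into a latent code."""
    def __init__(self, n_features=2000, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

x = torch.randn(128, 2000)                 # 128 patients x 2000 combined omics features
model = OmicsAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                         # brief reconstruction training loop
    recon, latent = model(x)
    loss = nn.functional.mse_loss(recon, x)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
# 'latent' can then be clustered (e.g., with k-means) to define patient subgroups
```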
In summary, advanced multiplex imaging technologies coupled with AI analytics enable a deepened understanding of tumor-immune interactions in the tumor microenvironment and may enable the discovery of novel biomarkers and therapeutic targets.
Digital radiology (radiomics)
In the past decade, the field of medical image analysis has grown exponentially, with an increased number of pattern recognition tools and larger datasets. Radiomics refers to the high-throughput mining of quantitative image features from standard-of-care medical imaging that enables data to be extracted and applied within clinical decision support systems to identify complex patterns and trends for improving diagnostic, prognostic, and predictive accuracy86. This approach expands the utility of radiologic data beyond the use of medical images as simple visual aids for human interpretation82.
Digital medical images are converted into mineable high-dimensional quantitative data in a matrix format where each element, known as a voxel, corresponds to a small section of the body. These voxels contain x-ray attenuation values directly proportional to the density of the material being scanned, with a total range of more than 4096 intensities, although only a small fraction of these intensities can be perceived by humans. The limited discriminatory capacity of the human eye suggests the potential for DL methods87. Quantitative radiomic features representing intensity, geometry, and texture, whether directly measured or mathematically transformed, may reflect aspects of the tumor phenotype and microenvironment that can predict clinical outcomes and support clinical decisions.
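As a simple illustration of how quantitative features can be derived from voxel intensities, the sketch below computes a few first-order statistics over a segmented region; standardized radiomics pipelines define many more intensity, shape, and texture features, so this is illustrative only.

```python
import numpy as np

def first_order_features(roi: np.ndarray) -> dict:
    """Simple first-order radiomic features from voxel intensities (e.g., HU values)
    inside a segmented tumor region. Illustrative only."""
    voxels = roi.ravel().astype(float)
    hist, _ = np.histogram(voxels, bins=64)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "mean": voxels.mean(),
        "variance": voxels.var(),
        "skewness": ((voxels - voxels.mean()) ** 3).mean() / voxels.std() ** 3,
        "entropy": float(-(p * np.log2(p)).sum()),   # intensity heterogeneity
    }

roi = np.random.normal(40, 12, size=(30, 30, 20))     # toy 3D tumor region in HU
print(first_order_features(roi))
```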
Image segmentation involves partitioning an image into meaningful regions, which is essential for accurately identifying tumors and organs at risk in radiation oncology. Accurate segmentation is crucial for treatment planning, as it directly impacts the precision of radiation delivery. Traditional manual segmentation is not only time-consuming but also prone to inter-observer variability, which can lead to inconsistent results.
For instance, the BRATS (Brain Tumor Segmentation) challenge, an annual international competition, has been instrumental in driving advancements in this field. This challenge encourages the development of innovative segmentation algorithms and fosters collaboration among researchers, leading to improved methodologies and performance benchmarks. One research group introduced a weakly supervised approach to pan-cancer segmentation, showcasing the potential of AI/ML to tackle complex segmentation tasks, even with limited annotation88. Their method leverages slide-level annotations to train segmentation models, demonstrating that effective tumor segmentation can be achieved without extensive pixel-level labeling, which is often a bottleneck in clinical practice88. Many investigators have reported on segmentation algorithms for various organs, such as the liver89, brain90, pancreas91, and prostate92,93. Guidelines for the development, clinical validation, and reporting of AI models in radiation therapy have been developed by the European Society for Radiotherapy and Oncology and the American Association of Physicists in Medicine for the standardization of this approach94.
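Segmentation performance in such challenges is commonly summarized with overlap metrics such as the Dice similarity coefficient; the minimal example below compares an automated mask against a manual reference using synthetic masks.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity coefficient between a predicted and a reference mask
    (1.0 = perfect overlap), a standard measure of segmentation agreement."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * intersection / denom if denom else 1.0

auto_mask = np.zeros((128, 128), bool); auto_mask[30:80, 40:90] = True
manual_mask = np.zeros((128, 128), bool); manual_mask[35:85, 45:95] = True
print(f"Dice = {dice_coefficient(auto_mask, manual_mask):.3f}")
```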
Investigators used statistics and ML (Least Absolute Shrinkage and Selection Operator [LASSO]) to develop a radiomic model to predict TIL density, as determined from AI-powered analysis of H&E-stained WSIs using the same technology as previously described67, and baseline CT imaging from a training cohort of 220 patients with NSCLC treated with immunotherapy95. The final ML-based TIL-prediction model included only two features, both indicative of intralesional texture heterogeneity, and demonstrated that high predicted TIL density (≥ median) was associated with longer PFS than low predicted TIL density (median PFS, 4.0 vs. 2.1 months; p = 0.002) when applied to a 294-patient validation cohort. TIL density was significantly associated with PFS independent of PD-L1 status, and patients with high TIL density and high PD-L1 (TPS ≥ 50%) had the longest PFS compared with patients with low TIL density and/or low PD-L1 TPS95.
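The sketch below shows the general pattern of LASSO-based radiomic feature selection with scikit-learn, in which an L1 penalty shrinks most coefficients to zero; the data are random placeholders and do not represent the features or cohort of the cited study.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Toy data: 220 patients x 100 candidate radiomic features; the target is a
# continuous TIL-density estimate (values here are random placeholders).
rng = np.random.default_rng(0)
X = rng.normal(size=(220, 100))
y = 0.8 * X[:, 3] - 0.5 * X[:, 17] + rng.normal(scale=0.5, size=220)

X_scaled = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5).fit(X_scaled, y)            # L1 penalty zeroes most weights
selected = np.flatnonzero(lasso.coef_)
print(f"{selected.size} features retained out of {X.shape[1]}: {selected}")
```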
In addition, radiomics has been used to predict immunotherapy outcomes96. For instance, investigators have evaluated CT imaging data from 54 patients with hepatocellular carcinoma treated with immunotherapy using nine ML and two ensemble learning techniques to construct predictive models97. The models were validated in an external set comprising 29 patients; selected ML models were shown to accurately predict the short-term efficacy of immunotherapy in patients with hepatocellular carcinoma97. Other investigators used radiological images annotated with clinical and outcome data from 2552 patients to develop ML models to predict OS in patients with head and neck cancer and validated the models in three external cohorts comprising 873 patients98. Among 12 different models, one achieved the highest prognostic accuracy using multitask learning on clinical data and tumor volume. However, model performance decreased significantly in the external datasets, and the results could not be validated98.
Other investigators constructed and validated a sub-regional radiomics model based on a support vector machine algorithm using 1896 features from each tumor sub-region (5688 features per sample) from 264 patients with NSCLC99. In the validation set, the model demonstrated improved accuracy in predicting immunotherapy response compared to conventional radiomics, tumor mutational burden (TMB), or PD-L199.
In another study, an ML (random forest) prognostic radiomic model was developed using CT images from patients with advanced melanoma who participated in multicenter clinical trials of pembrolizumab100. The model achieved a high AUC for OS estimation in the validation set, suggesting that this tool could be used for clinical decisions100. Other investigators used pre-operative CT images from 127 patients with NSCLC from TCGA to construct a radiomics-based TMB prediction model101. Three radiomics features (flatness [a shape feature of the original image], autocorrelation [gray-level co-occurrence matrix, GLCM], and minimum [a first-order wavelet feature]) were found to be associated with TMB levels and were significantly different between the high- and low-TMB groups101.
Additional ML radiomics models have been developed to identify patients who may benefit from immunotherapy (e.g., patients with melanoma, NSCLC, or breast cancer)100,101,102. The above examples employed ML techniques to assist in the analysis of non-ML-derived, classical, “handcrafted” (i.e., human-defined) radiomic features. Recent investigations employing CNNs for DL of features may outperform approaches using handcrafted radiomic features103,104. For example, transformers and novel architecture methodologies have shown promising results in improving feature extraction and diagnostic accuracy in medical imaging tasks105,106.
An ML model trained on dual-energy CT (DECT) radiomics was shown to be superior to one based on standard CT imaging; DECT also enables quantification of iodine and fat concentrations in lesions, in addition to visual inspection107. The application of DECT radiomics in an ML-based model significantly improved immunotherapy response prediction for patients with stage IV melanoma compared with standard CT imaging107.
The application of AI/ML algorithms has been shown to improve or surpass the performance of physicians in cancer diagnosis and staging108,109,110. In one study, an AI model trained on 506 CT images exhibited better diagnostic accuracy in distinguishing benign vs. malignant pulmonary nodules compared to different groups of physicians108. In another study, an AI-based model developed and validated on 170,230 mammography images demonstrated higher diagnostic performance in terms of breast cancer detection compared to radiologists109. However, with the addition of AI, the performance of radiologists significantly improved.
In summary, use of AI/ML techniques in radiomics analysis can transform medical imaging data into quantifiable variables that may be used as noninvasive prognostic and predictive biomarkers for response to treatment, overcoming the limitations of tissue-based analysis for clinical decision-making. However, these preliminary data warrant validation in larger patient cohorts.
Challenges regarding the use of AI/ML techniques in radiomics include the following: lack of prospective analyses of imaging data; lack of evaluation of radiomics within prospective clinical trials or standardized and homogeneous frameworks; a limited number of studies with independent validation of the results and their interpretability; and a lack of training and knowledge of physicians on radiomics. Data reproducibility across different datasets is hindered by various methodological approaches, including variability in imaging protocols among different hospitals, heterogeneity in patient populations, preprocessing (image normalization, noise reduction, and image segmentation), feature selection, and model training111. Developing multicenter studies assessing the standardization of protocols and workflows in medical imaging is important to ensure reproducibility and applicability across institutions.
Molecular medicine
The exponential growth of techniques to assess “omics” data, including next-generation sequencing (NGS) techniques, has contributed to the identification of novel prognostic and predictive biomarkers and drug targets. A challenge in genomic analysis using NGS is the annotation of molecular alterations and variant calling, i.e., identifying the differences between the analyte sequence (patient’s sample) and the reference sequence112. This process is prone to errors, with rates ranging from 0.1% to 10%, and has important clinical consequences. Variant callers based on ML models (logistic regression, hidden Markov models, and naïve Bayes classifiers), such as the Genome Analysis Toolkit, had less than optimal accuracy, even on short-read sequencing technologies such as Illumina with 75-250 bases, and generalized poorly to the newer long-read NGS technologies, such as Pacific Biosciences with 15,000 bases and Oxford Nanopore with up to 1 million bases113,114. A major step forward was the implementation of CNNs in variant calling, as exemplified by DeepVariant113. This model outperformed all other existing tools, achieving the highest performance in an FDA-administered variant calling challenge. Furthermore, this model performs well on both short-read and long-read whole genome and exome sequencing technologies and generalizes even to other mammalian species113.
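A toy sketch of the idea behind CNN-based variant calling follows: reads spanning a candidate site are encoded as a multi-channel pileup “image” and classified into genotype probabilities. The encoding, dimensions, and architecture are illustrative assumptions and do not reproduce DeepVariant.

```python
import torch
import torch.nn as nn

class PileupClassifier(nn.Module):
    """Toy CNN over a read-pileup 'image' centered on a candidate variant site.
    Channels could encode base identity, base quality, and strand; the output
    is a probability over genotypes (hom-ref / het / hom-alt)."""
    def __init__(self, channels=3, n_genotypes=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, n_genotypes)

    def forward(self, pileup):                      # (batch, channels, reads, window)
        return self.head(self.conv(pileup).flatten(1))

pileup = torch.rand(1, 3, 100, 221)                 # 100 reads x 221-bp window (toy)
print(torch.softmax(PileupClassifier()(pileup), dim=1))
```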
AI/ML tools have also been used to analyze large-scale epigenomic datasets to identify patterns associated with specific tumor types115, which can serve as biomarkers for early detection75, accurate diagnosis116, and prediction of patient outcomes117. By analyzing large-scale genomic and epigenomic data sets, AI can help discover novel epigenetic drugs, repurpose existing drugs, identify potential candidates that target specific epigenetic modifications118, and develop predictive models with integration of epigenomic, clinical, and patient outcomes data70,71.
AI/ML tools have also been used for the analysis of the output of proteomic measurement techniques. A “sample-to-data” roadmap for integrating AI/ML throughout the proteomic workflow has been suggested73. In another study, an AI algorithm was developed to identify protein interaction networks for individual patients based on their proteomic profiling data119, indicating that interaction networks may be accurately reconstructed, representing an advancement over standard methods119.
Integrative (multimodal) analyses
Most applications of AI/ML in precision oncology represent “narrow” tasks using one data modality such as pathology, radiology, or molecular sequencing data. However, oncologists integrate all relevant available modes of data when evaluating patients. The task of modality conversion is central to advancing AI in network medicine applications120,121,122,123. Modality conversion involves transforming data from one form to another, which is crucial for enabling AI to mimic human-like sensory integration and interpretation. One example in the field of radiation therapy is the use of DL tools for the generation of synthetic CT images from magnetic resonance images to aid in radiation therapy planning124. Transformer-based text, vision, and speech models can facilitate these conversions. Multiomic or panomic technologies using AI/ML/DL tools may improve the discovery of molecular biomarkers125. Emerging AI methodologies can drive progress in network medicine, ultimately improving patient outcomes and uncovering novel therapeutic targets120,121.
The development of multimodal AI models incorporating all relevant sources of data—eventually including biosensor (devices that continuously detect and measure physiologic or environmental parameters to assess specific biomarkers), social determinants, and environmental data—is becoming potentially feasible126. Investigators developed a multimodal classifier to predict response to PD-L1 blockade in patients with NSCLC127 that included the clinical, pathological, radiomic, and genomic characteristics of 247 patients treated at a single center. Radiomic features were extracted using classical radiomics techniques; PD-L1 tumor cell expression was assessed as the standard TPS; a CNN model was also used to develop an automated PD-L1 classifier on digital PD-L1-stained WSIs; and genomic analysis assessed somatic mutations, copy number alterations, and fusions in 341-468 genes most associated with cancer and TMB. Clinical data included neutrophil-to-lymphocyte ratio, pack-years smoking history, age, albumin, tumor burden, presence of brain and liver metastases, tumor histology, and scanner parameters. An attention-based DL model was developed that could account for non-linear relationships across the input modalities. The model was able to predict objective responses better than any modality separately or linearly combined and led to enhanced separation of Kaplan-Meier survival curves (indicating potential as a useful biomarker for longer-term outcome). Analysis of the model revealed that all data modalities (radiomics, genomics, and pathology) contributed to the prognostic classification success127. Frameworks, such as Prototypical Information Bottlenecking and Disentangling, were used to address redundancy issues in multimodal data, thereby improving cancer survival predictions128. In summary, the application of AI/ML algorithms to integrated medical multimodal data has great promise and will depend on the assembly of large, well-annotated, multi-institutional training datasets127.
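The sketch below illustrates the general idea of attention-weighted fusion across modality embeddings for response prediction; the modality dimensions, architecture, and output are placeholders rather than the published classifier.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Toy attention-weighted fusion of modality embeddings (e.g., clinical,
    radiomic, pathology, genomic) for a binary response prediction."""
    def __init__(self, dims=(16, 64, 128, 32), hidden=32):
        super().__init__()
        self.encoders = nn.ModuleList([nn.Linear(d, hidden) for d in dims])
        self.attention = nn.Linear(hidden, 1)
        self.head = nn.Linear(hidden, 1)

    def forward(self, modalities):                        # list of (batch, dim) tensors
        embeds = torch.stack([torch.relu(enc(m))
                              for enc, m in zip(self.encoders, modalities)], dim=1)
        weights = torch.softmax(self.attention(embeds), dim=1)   # per-modality weights
        fused = (weights * embeds).sum(dim=1)                    # non-linear combination
        return torch.sigmoid(self.head(fused)), weights.squeeze(-1)

batch = 4
inputs = [torch.randn(batch, d) for d in (16, 64, 128, 32)]
prob, modality_weights = MultimodalFusion()(inputs)
print(prob.shape, modality_weights.shape)    # torch.Size([4, 1]) torch.Size([4, 4])
```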
Large language models and generative AI
Many useful applications in the field may result from AI advances in NLP, especially with the development of LLMs. The advent of LLMs with a user interface, which enables communication between the AI system and a human using natural language, has facilitated the emergence of “generative” AI, i.e., technology that can generate text, images, or other data (video, sound) based on features learned from input training data129.
After training on big data, LLMs can perform various tasks, including summarization, translation, text completion, and imaginative writing130. LLMs have been leveraged to facilitate decision support for patients with cancer131,132,133,134,135. A panel of clinicians evaluated the responses of Almanac, an LLM augmented with retrieval capabilities from curated medical sources, to clinical questions including medical guidelines and treatment recommendations134. Almanac’s responses to 314 clinical questions were better than those of other LLMs (ChatGPT-4, Bing, and Gemini) that were not augmented with medical data134.
The use of Med-PaLM Multimodal, a multimodal generative LLM fine-tuned on medical data, was associated with high performance across diverse tasks including responses to medical questions, interpretation of mammography and dermatology images, radiology report generation and summarization, and genomic variant calling. The application of this multimodal LLM indicates the potential for the broader use of medical AI systems136.
Other applications of LLMs include mining of EHRs to identify clinically relevant data, such as treatment-related adverse events, and to support insurance reimbursement137. The LLM GatorTron was successful in recognizing adverse events attributed to certain drugs137. If validated, this approach may improve patient care138.
However, the outputs of LLMs should be interpreted with caution because their application is associated with challenges. One example is the poor performance of an LLM chatbot (ChatGPT) in terms of providing treatment recommendations concordant with NCCN guidelines139. High rates of discordant responses were observed, and “hallucinations” (i.e., responses not related to any recommended treatment) were identified in 13 (12.5%) of 104 ChatGPT outputs. These “hallucinations” have been previously described as a critical issue with AI chatbots140. Other challenges related to the use of LLMs are accountability, research integrity, and data security. In summary, LLMs cannot be incorporated into clinical practice at this time. Thorough clinical validation using stringent criteria is required from developers to ensure high rates of accuracy in generative AI predictions and responses, and clinicians should be aware of their limitations.
FDA-approved AI/ML-enabled medical devices
As of December 20, 2024, the FDA has approved 1016 AI/ML-enabled medical devices that are authorized for marketing in the United States141. Specific examples where AI has successfully impacted clinical outcomes, underscoring the real-world applicability, are listed in Table 2.
Ethical and regulatory aspects of AI deployment in precision oncology
The rapid evolution of AI in precision oncology necessitates thorough ethical and regulatory considerations related to biases associated with data, model transparency, and accountability.
Data bias
One of the major concerns is data bias, as AI models are often trained on non-representative or biased datasets142. Unintentional existing biases within the healthcare system may contribute to treatment inequities among marginalized racial and ethnic groups if the training data do not adequately represent these populations143. Biases in healthcare research and public health databases may mislead AI outputs, which may negatively affect treatment recommendations and patient outcomes144,145.
Model transparency and trust
The complexity of AI algorithms is often associated with lack of transparency, which may result in healthcare professionals feeling uncertain about the reliability of AI applications146. Clinicians may hesitate to rely on AI recommendations due to the “black box” nature of many models147,148. Explainable AI (XAI) methods are essential for building trust in AI recommendations, helping users to understand the reasoning behind the suggestions, providing transparency, and boosting confidence in the decisions made149,150.
Accuracy and reliability
The development of clinical decision support systems is ongoing, and these systems cannot yet be utilized because of the inaccuracy and unreliability of AI predictions147,148. Rigorous clinical validation, standardization, and real-world testing are essential before deployment. Transparency about model limitations and monitoring of performance post-deployment are critical to maintaining clinical safety.
Accountability
As AI systems become integrated into healthcare, issues regarding accountability and liability that may adversely affect a patient’s health should be addressed. A clear guideline that delineates the responsibilities of AI developers, healthcare providers, and institutions is necessary151. Effective post-market surveillance mechanisms to monitor the performance of AI systems after deployment and ensure that they continue to operate within ethical and clinical standards should be implemented by the regulatory agencies152.
Data privacy and ethical use
AI systems require extensive patient data, raising privacy and ethical concerns regarding consent, ownership, and secondary use147,153. Transparent policies governing how patient data are collected, stored, and shared that align with regulations such as the General Data Protection Regulation (GDPR) and the Health Insurance Portability and Accountability Act (HIPAA) are essential to ensure ethical standards and respect for patient rights147,153. As the field evolves, continuous collaboration among stakeholders, including ethicists, legislators, and medical practitioners, is necessary to advance the ethical and effective integration of AI in healthcare. Harmonization of policy and practice is an essential component of the implementation of AI/ML in the clinical workflow.
Data privacy and inter-institutional collaboration in AI-driven oncology
The advancement of AI applications in oncology requires extensive, diverse datasets for model training and validation. However, sharing sensitive patient data across institutions presents significant privacy, regulatory, and ethical challenges. Data that were once a byproduct of clinical research are increasingly becoming a resource154. Data management includes ensuring the safety, accessibility, and accuracy of the data. Guidelines and processes for accessing and curating data, along with alignment with regulatory and compliance departments, are essential elements of data management155.
Federated learning (FL) has emerged as a transformative solution that enables multi-institutional collaboration without compromising patient privacy. This approach allows AI models to be trained across multiple institutions while keeping patient data securely within their original locations156. In FL, instead of centralizing data, the training algorithm travels to each institution’s secure environment, learns from local data, and only shares model parameters rather than raw patient information156,157.
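A toy sketch of one common aggregation scheme, federated averaging, is shown below: each site performs a local update on its own data and the server aggregates only the resulting model weights, weighted by sample counts. The linear model and data are placeholders.

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.1):
    """One local training round at an institution (toy gradient step on a linear model).
    Only the updated weights leave the site, never the patient-level data."""
    X, y = local_data
    preds = X @ global_weights
    grad = X.T @ (preds - y) / len(y)
    return global_weights - lr * grad

def federated_average(updates, sizes):
    """Server-side aggregation: weight each site's model by its number of samples."""
    sizes = np.asarray(sizes, float)
    return np.average(np.stack(updates), axis=0, weights=sizes / sizes.sum())

rng = np.random.default_rng(1)
sites = [(rng.normal(size=(n, 5)), rng.normal(size=n)) for n in (120, 300, 80)]
weights = np.zeros(5)
for _ in range(10):                                   # communication rounds
    updates = [local_update(weights, data) for data in sites]
    weights = federated_average(updates, [len(d[1]) for d in sites])
print(weights)
```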
A critical component of successful multi-institutional collaboration is data harmonization. Modern platforms implement standardized clinical data harmonization pipelines that enable FL, including adoption of Fast Healthcare Interoperability Resources (FHIR) standards and automated data transformation workflows158. Furthermore, FL architectures are designed to comply with major privacy regulations, including GDPR and HIPAA, ensuring that data remain within institutional boundaries and that there is no direct sharing of protected health information156,157.
This privacy-preserving approach to multi-institutional collaboration represents a paradigm shift in how healthcare data can be utilized for research while maintaining the highest standards of patient privacy and data security. This is especially important for the future of AI in oncology to ensure that training datasets are large, diverse, and inclusive of low-frequency “rare” cancers, thereby ensuring generalizability and clinical utility.
Future directions and emerging trends
Biosensors are devices or platforms that continuously detect and measure physiologic or environmental parameters to assess specific biomarkers associated with diverse diseases, including cancer. They comprise a biological sensing component and a transducer responsible for converting the identified signal into a quantifiable output. The combination of AI/ML with biosensors for the real-time continuous monitoring of physiologic parameters may provide new clinically relevant insights into the early diagnosis, prognosis, and treatment of cancer. AI-based biosensors are being evaluated in diverse tumor types to improve early detection159,160,161,162, diagnosis163,164,165, and treatment outcomes166,167, and they should be further validated in large studies. In addition, several biosensors continuously measure parameters including metabolites such as glucose or lactate, electrolytes, skin temperature, and cortisol levels using microneedle patches, smart textiles, wristbands, and/or electronic epidermal tattoos168. Biosensors offer individuals real-time monitoring of various physiologic functions and laboratory parameters, enabling them to address cancer-associated risk factors such as diabetes, hypertension, and physical inactivity.
Simple AI models are commonly more transparent, but less accurate, than complex ones. In contrast, complex models (e.g., CNNs) achieve higher accuracy but often lack interpretability. As mentioned earlier, explainable AI (XAI)150 aims to make AI-based predictions more transparent169,170, interpretable, and trustworthy in cancer care. XAI can reveal potential biases in AI-based predictions, strengthening their credibility. The enhanced transparency of XAI algorithms may facilitate their application in clinical decision-making and real-world clinical scenarios171.
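One simple, model-agnostic XAI technique is permutation importance, which measures how much a model’s performance degrades when each input feature is shuffled; the minimal scikit-learn sketch below uses synthetic tabular data standing in for candidate biomarkers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy tabular example: which input features drive the model's predictions?
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))                         # e.g., 6 candidate biomarkers
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=300) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=20, random_state=0)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: importance {result.importances_mean[i]:.3f}")
```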