Advancements in artificial intelligence and machine learning in revolutionising biomarker discovery

Abstract The article explores the significance of biomarkers in clinical research and the advantages of utilizing artificial intelligence (AI) and machine learning (ML) in the discovery process. Biomarkers provide a more comprehensive understanding of disease progression and response to therapy compared to traditional indicators. AI and ML offer a new approach to biomarker discovery, leveraging large amounts of data to identify patterns and optimize existing biomarkers. Additionally, the article touches on the emergence of digital biomarkers, which use technology to assess an individual’s physiological and behavioural states, and the importance of properly processing omics and multi-omics data for efficient handling by computer systems. However, the article acknowledges the challenges posed by AI/ML in the identification of biomarkers, including potential biases in the data and the need for diversity in data representation. To address these challenges, the article suggests the importance of regulation and diversity in the development of AI/ML algorithms.


INTRODUCTION
A biomarker is a measurable biological indicator that reflects normal biological processes, disease progression, or the response to therapy. It can be a cell, molecule, protein, or physical sign that is objectively assessed (Biomarkers Definitions Working Group, 2001). Biomarkers offer advantages over traditional indicators of disease in that they can predict not just disease presence, but also its progression and changes in underlying biological processes (Chen, Sun, Shen, 2015). Clinical researchers are continually searching for new biomarkers, and have recently shifted focus to digital, non-traditional markers. Digital biomarkers often combine biological, neurological, socioeconomic, and environmental data to create an intermediary biomarker (Kyriazakos et al., 2021). Biomarker discovery has played a crucial role in advancing clinical research by enabling the design of personalized treatments and immunotherapy. In recent years, breakthroughs in genomic research and immunotherapy have resulted in the development of approved biomarker-based therapies (Subbiah, 2023). In 2021, more than two-thirds of the drugs approved were based on genetic research (Ochoa et al., 2022). At present, areas such as oncology and genetics benefit greatly from approved biomarker-based therapies, along with other areas such as cardiology, nephrology and pulmonology.

OUTCOME DRIVEN BIOMARKERS
Outcome-driven biomarkers are biological measurements that can predict clinical outcomes in patients, such as disease progression or treatment response. These biomarkers can be used to guide clinical decision-making, help identify at-risk patients, and facilitate the development of new therapies. Figure 1 shows the classification of outcome-driven biomarkers.

Braz. J. Pharm. Sci. 2023;59: e23146. Gokuldas (Vedant) Sarvesh Raikar, Amisha Sarvesh Raikar, Sandesh Narayan Somnache
Diagnostic biomarkers have strong evidence of improving diagnostic accuracy. For instance, the carcinoembryonic antigen (CEA) blood test is commonly used to evaluate colorectal cancer and to determine response to treatment. Elevated levels of CEA can indicate the presence of cancer in the body (de Jong et al., 2021). Troponin, procalcitonin, alpha-fetoprotein (AFP) and CEA are the most common examples of diagnostic biomarkers. Troponin can be used as a diagnostic biomarker for acute myocardial infarction (Jaffe et al., 2000). Procalcitonin is a biomarker for the diagnosis of sepsis (Wacker et al., 2013). AFP serves as a diagnostic tool for liver cancer (Lok et al., 2010). CEA serves as a diagnostic marker for colorectal cancer (Saltz et al., 2000).
A prognostic biomarker predicts the overall outcome of a patient's illness, regardless of the therapy or treatment used. It provides insight into the natural course of the disease, allowing researchers to study and understand its underlying biological processes. In therapeutic research, prognostic biomarkers are often used in combination with predictive biomarkers to evaluate the effectiveness of various treatments. Prognostic biomarkers give information about disease outcome, independent of treatment, while predictive biomarkers predict treatment response. B-type natriuretic peptide (BNP) is a prognostic biomarker for heart failure (Maisel et al., 2002), C-reactive protein (CRP) is used as a prognostic biomarker for cardiovascular diseases (Ridker et al., 2002), the lymphocyte-to-monocyte ratio (LMR) serves as a prognostic biomarker for cancer (Zhou et al., 2016), and the albumin-bilirubin (ALBI) grade is used as a prognostic biomarker for liver cirrhosis (Wai et al., 2003).
Predictive biomarkers are a type of biological measure that can be used to anticipate patient outcomes. They assist in the selection of appropriate therapies and estimate the probability of a positive response to the treatment. For instance, mutations in the KRAS gene are often used as a predictive biomarker in colorectal cancer to determine the likelihood of response to certain therapies. Similarly, the presence of the BCR-ABL fusion gene is used as a predictive biomarker for chronic myeloid leukaemia (CML) to guide therapy decisions (de Jong et al., 2021). Squamous differentiation in NSCLC, CFTR mutations, BRCA1/2 alterations and thiopurine methyltransferase genotype are the most common examples of predictive biomarkers. Squamous differentiation in NSCLC can be used to predict poor response to pemetrexed, compared to other chemotherapies such as docetaxel/cisplatin gemcitabine (Draisma et al., 2003). CFTR mutations help to predict cystic fibrosis treatment efficacy (Molinski et al., 2018). BRCA1/2 alterations are used to predict platinum sensitivity in ovarian cancer (Tung, Garber, 2018). The thiopurine methyltransferase genotype is used as a predictive biomarker to evaluate toxicity risk in treatment with 6-mercaptopurine or azathioprine (Gauba et al., 2006).

SYSTEMS BIOLOGY-DRIVEN BIOMARKERS
Another approach to the classification of biomarkers is systems biology-driven biomarkers. These are biological characteristics identified using systems biology approaches, which aim to understand biological systems as a whole rather than as individual parts. These biomarkers are identified through the integration of multiple sources of data, including genomics, transcriptomics, proteomics, metabolomics, and other "omics" data, as well as clinical data and environmental factors. Examples of systems biology-driven biomarkers include gene expression signatures, metabolite profiles, and protein networks. Figure 2 shows the classification of systems biology-driven biomarkers.
Molecular biomarkers play a crucial role in the diagnosis and prognosis of disease by comparing the expression and concentration of individual molecules in a pathological condition. Popular methods for identifying molecular biomarkers include DESeq2 and edgeR, which analyse gene expression levels to identify differentially expressed genes (DEGs) (Tang, Yuan, Chen, 2022).
Edge/module biomarkers combine differentially expressed genes into a comprehensive framework. These biomarkers can express complex biological processes and interactions. Thus, they help in the early diagnosis and effective treatment of disease (Zeng et al., 2016).
Network biomarkers are used to analyse molecular and systems-level interactions in order to understand the mechanisms of health and disease. They may be static (SNBs) or dynamic (DNBs). The latter can track changes during disease progression and provide a comprehensive view of molecular interactions (Sonawane et al., 2019).
Phenotype biomarkers encompass both imaging screening and the analysis of individual symptoms. Screening techniques such as CT, MRI, ECG, and EEG can identify physical anomalies, whereas the analysis of individual symptoms covers symptoms such as pain, fever, and bleeding. Understanding the complex pathogenic mechanisms behind changes in clinical phenotypes, such as those caused by molecular disorders, is crucial for effective medical treatments and a deeper understanding of the human body (Carr, Kraft, 2018).
Cross-tiered biomarkers are related to environmental influences that can affect the progression of diseases. Recent research findings suggest that lifestyle habits and living environments can be key factors in the development of conditions such as cancer (Subbiah, 2023).

ARTIFICIAL INTELLIGENCE
Artificial Intelligence (AI) aims to understand and replicate intelligent behaviour by creating computer programs, seeking to uncover the underlying principles that govern both artificial and biological systems. AI can be classified in two ways, based on capability and on functionality. The detailed classification is presented in Figure 3.
By capability, narrow AI is weak AI, which is trained to perform specific tasks within limited, predetermined parameters. General AI is strong AI, with the capability to think. Super AI can surpass human intelligence with its capacity for thinking, reasoning, solving puzzles, making judgments, learning, and communicating on its own.
By functionality, reactive machines respond to stimuli using predefined rules and lack the ability to remember past experiences. Limited memory machines are able to learn and improve over time using data, typically by using artificial neural networks or another programming model. Deep learning, which is a subtype of machine learning, falls into this category. Theory of mind is a hypothetical stage of AI that has human-like decision-making abilities and can understand and recall emotions, as well as respond to social situations. Self-aware AI refers to machines that have an awareness of their own existence and have human-like intellectual and emotional capabilities. Theory of mind and self-aware AI are still at the developmental stage.
Artificial neural networks are the training models used in AI for the classification and analysis of data. Some of the most common artificial neural networks are summarised in Table I.
Various techniques are available for analysing data, including machine learning algorithms, deep learning, natural language processing and data mining techniques. AI is also enhancing medical diagnosis, reducing diagnosis time and improving patient care (Lin et al., 2019).
Natural language processing (NLP) is a sub-domain of artificial intelligence (AI) that focuses on the interaction between computers and human language (Holmes et al., 2021). It is potentially valuable for identifying and extracting pertinent information from diverse scientific literature, including research papers, clinical trials, and medical publications. By analysing vast quantities of unstructured text, NLP methods may uncover associations among genes, proteins, and other biological components, thus aiding the detection of potential biomarkers.
A 2021 study (Silverman et al., 2021) highlighted the use of NLP techniques for the discovery of biomarkers in the context of COVID-19. Specifically, the study compared two methods for extracting symptoms from Emergency Department (ED) admission notes, which are typically unstructured and therefore not readily available for clinical decision making. This study showcased the power of NLP techniques in biomarker discovery and has implications for the development of symptomatology-based models for other diseases.
Artificial neural networks (ANNs) facilitate the analysis of extensive genetic and clinical datasets, enabling the identification of relationships and patterns among variables. ANNs can analyse gene expression data obtained from cancer patients in order to identify new and distinctive biomarkers linked to specific subtypes of cancer. Biomarkers identified in this way have the potential to make cancer diagnosis and treatment more accurate and individualized by providing precise, tailored information about the patient's medical status. One application is the identification of biomarkers associated with prostate cancer: researchers applied an ANN to gene-expression data from individuals diagnosed with prostate cancer and obtained a set of genes that correlated significantly with the progression of the disease (Takeuchi et al., 2019). This set of genes was subsequently validated in a distinct cohort of patients, indicating the promise of ANNs in the identification of biomarkers. AI and machine learning have enormous potential to transform healthcare, providing faster discovery of biomarkers, improved understanding of diseases and treatments, and enabling personalized medicine. AI algorithms can analyse diverse data sources, uncover correlations, and lead to more effective treatments.

Recurrent neural networks
These networks use time-series data or sequences and have the ability to "remember" previous events in the sequence.

Long short-term memory
This is an advanced form of RNN that uses memory cells to remember events from many steps earlier in the sequence.

Convolutional neural networks
These networks are commonly used in image recognition and use several layers to filter different parts of an image.

Generative Adversarial Networks
These networks involve two neural networks competing with each other to improve accuracy, where one network (the generator) creates examples and the other network (the discriminator) tries to classify them as real or generated.
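As a small illustration of the "filtering" performed by the convolutional networks listed above, the sketch below slides one hand-written filter over a tiny image. It is only a hedged example: the function name `conv2d_valid`, the toy image and the edge-detecting kernel are assumptions made for illustration, and a real CNN learns its filter values from data rather than having them written by hand.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for di in range(kernel_h):
                for dj in range(kernel_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 3x4 "image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only where the filter sits across the boundary between the dark and bright regions, which is the sense in which a layer "filters" part of an image.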

MACHINE LEARNING MODEL
A machine learning model simulates a real-world phenomenon by using training data to generate an artifact, commonly referred to as the model. This involves fitting the algorithm on a training dataset and testing the model's accuracy on a separate dataset, known as the testing data. Overfitting occurs when the model is so complex that it fits noise in the training data, while underfitting occurs when the model is too simple to capture the patterns in the training data. Bias and variance are the two main sources of error in machine learning models. Bias arises from incorrect assumptions about the relationship between the features and the target variable, leading to an underfit model. Variance results from the model's sensitivity to slight fluctuations in the training data, leading to overfitting. A successful machine learning model balances the trade-off between bias and variance.
Supervised, unsupervised, semi-supervised, and reinforcement learning are the common learning paradigms used to train AI. Supervised learning maps inputs to outputs using labelled training data. Unsupervised learning categorizes unlabelled data based on attributes. Semi-supervised learning uses a combination of labelled and unlabelled data, while reinforcement learning rewards good performance and punishes poor performance. Machine learning (ML) algorithms can analyse large and complex datasets, such as genomic or proteomic data, to identify patterns and relationships between variables and to identify potential biomarkers.
Deep learning refers to a subclass of machine learning techniques that rely on artificial neural networks to model and analyse intricate data. Applied to high-dimensional biological data, such as genomic and proteomic data, it has displayed considerable potential in the identification of biomarkers. Several deep learning techniques have been used in the pursuit of biomarker discovery, including the following.
Convolutional Neural Networks (CNNs) have demonstrated exceptional proficiency in image analysis, with a heightened capability for detecting and extracting meaningful features within images. They have been applied to medical imaging technologies such as MRI and CT scans in biomarker discovery, where the crucial objective is to identify particular features linked to specific ailments.
Recurrent Neural Networks (RNNs) are a class of neural networks that are notably proficient in analysing sequential data, including time-series data. In biomarker discovery they have been used to examine gene expression data and discern sequential patterns that are linked to distinct diseases.
Autoencoders are a type of artificial neural network used for unsupervised learning (Saha et al., 2019). They have been employed to detect potential patterns in biological data that may indicate the presence of particular diseases.
These deep learning models can be used to construct predictive models for identifying and tracking the progression of various illnesses.
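The train/test procedure and the bias-variance trade-off described in this section can be illustrated with a minimal sketch. Everything in it is an assumption chosen for illustration rather than something taken from the article: the synthetic noisy data, the 40/20 split, and the two deliberately extreme models (a constant mean predictor standing in for a high-bias, underfit model, and a 1-nearest-neighbour memoriser standing in for a high-variance, overfit model).

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_predictor(train_y, query_x):
    """High-bias model: ignores the input and always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return [mean_y for _ in query_x]


def nearest_neighbour_predictor(train_x, train_y, query_x):
    """High-variance model: copies the label of the closest training point."""
    predictions = []
    for x in query_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        predictions.append(train_y[nearest])
    return predictions


# Synthetic data (assumption): a linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(60)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Hold out a separate test set, as described in the text.
indices = list(range(len(xs)))
rng.shuffle(indices)
train_idx, test_idx = indices[:40], indices[40:]
train_x = [xs[i] for i in train_idx]
train_y = [ys[i] for i in train_idx]
test_x = [xs[i] for i in test_idx]
test_y = [ys[i] for i in test_idx]

underfit_train = mean_squared_error(train_y, mean_predictor(train_y, train_x))
underfit_test = mean_squared_error(test_y, mean_predictor(train_y, test_x))
overfit_train = mean_squared_error(
    train_y, nearest_neighbour_predictor(train_x, train_y, train_x)
)
overfit_test = mean_squared_error(
    test_y, nearest_neighbour_predictor(train_x, train_y, test_x)
)

print("mean predictor   train/test MSE:", underfit_train, underfit_test)
print("1-NN memoriser   train/test MSE:", overfit_train, overfit_test)
```

The memoriser reproduces its training data exactly, so its training error is zero while its test error is not; that gap is the signature of overfitting. The mean predictor has a large error on both sets, which is the signature of underfitting.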

Supervised learning model
Supervised learning involves training a model on labelled data, where the target variables are already known. The objective of training is to learn patterns from the labelled training data that allow accurate predictions on new, previously unseen data. The model's accuracy is evaluated on the test data, and the model is refined iteratively until the desired level of accuracy is achieved. Supervised learning is frequently employed for diverse applications, including object recognition in images, voice recognition technology, and language processing for computers (Milali et al., 2020). It can also be trained on datasets comprising patients' health status and diverse biomarkers in order to predict disease status or response to treatment from the biomarker data. An instance of supervised machine learning could involve the development of a model capable of predicting the probability of a patient developing Alzheimer's disease through analysis of a range of biological markers, including genetic, imaging, and cognitive biomarkers. The model would be trained using a dataset containing patient records with established disease status and their corresponding biomarker data (Fan et al., 2018). This training process facilitates the identification and analysis of discernible patterns and relationships between biomarkers and disease outcomes. The steps involved in supervised learning are summarised in Figure 4.
The supervised learning process typically involves several steps that can vary depending on the problem and data being used. Factors such as data quality, the choice of algorithm, and the adjustment of hyperparameters can greatly impact the accuracy of the resulting model. Figure 5 shows the classification of supervised and unsupervised learning.
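A minimal sketch of the supervised workflow described above, training on labelled patient records and then predicting a disease probability for a new patient, is given below. It uses logistic regression fitted by gradient descent as one possible algorithm. The biomarker values, labels, learning rate and number of epochs are all invented for illustration and are not taken from the cited studies.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples, n_features = len(features), len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            predicted = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)
            error = predicted - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_probability(weights, bias, x):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical labelled records: two normalised biomarker levels per patient,
# with label 1 = disease and 0 = healthy.
biomarkers = [
    [0.2, 0.1], [0.4, 0.3], [0.1, 0.4], [0.3, 0.2],
    [0.8, 0.9], [0.7, 0.6], [0.9, 0.7], [0.6, 0.8],
]
disease_status = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train_logistic_regression(biomarkers, disease_status)
new_patient = [0.9, 0.9]
print("estimated disease probability:", predict_probability(weights, bias, new_patient))
```

The learning rate and the number of epochs are the hyperparameters in this sketch. As the text notes, their adjustment, together with data quality and the choice of algorithm, affects the accuracy of the resulting model.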

REGRESSION
The aim of regression is to establish a relationship between the independent variables and the dependent variable, and to use this relationship to generate predictions for unseen data. The regression algorithm applies statistical techniques to fit a line or curve that best represents the relationship between two or more variables, enabling researchers to predict the value of one variable from the values of the others. The quality of the model's predictions is typically measured using metrics such as mean absolute error, mean squared error, and R-squared (Rong, Bao-wen, 2018). In biomedicine, regression models may be employed to identify a set of biomarkers that correlate strongly with the progression of a specific disease within a given patient population. By analysing the correlation between the identified biomarkers and the clinical outcome, a regression model can determine which biomarkers have the highest predictive value for disease progression. These biomarkers can then be combined into a biomarker panel that serves both diagnostic and prognostic objectives (Que et al., 2019).
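The line-fitting step and the three quality metrics named above can be sketched as follows. This is an illustrative least-squares fit for a single predictor. The biomarker levels and severity scores are made-up numbers, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_metrics(y_true, y_pred):
    """Mean absolute error, mean squared error and R-squared."""
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    total_sum_squares = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / total_sum_squares
    return {"mae": mae, "mse": mse, "r2": r2}


# Hypothetical data: biomarker level (x) against a disease severity score (y).
biomarker_level = [1.0, 2.0, 3.0, 4.0, 5.0]
severity_score = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(biomarker_level, severity_score)
predicted = [slope * x + intercept for x in biomarker_level]
metrics = regression_metrics(severity_score, predicted)
print(slope, intercept, metrics)
```

In a biomarker setting, one such model per candidate marker (or a multivariable model) would let the metrics be compared across markers. The metrics here are computed on the data used for fitting, so they describe goodness of fit rather than performance on unseen patients.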

CLASSIFICATION ALGORITHM
Classification is a supervised learning technique used to categorize new data based on pre-labelled training data. It maps inputs to a categorical output variable with pre-defined classes, which may be binary (yes/no) or multi-class (more than two outcomes). Classification learners are of two types: lazy learners, which simply store the training data and defer computation until a prediction is requested, and eager learners, which build a model from the training data before any prediction is requested. Examples of classification algorithms include Decision Tree, Random Forest, KNN, SVM, Naive Bayes and Logistic Regression (Chowdhury, Schoen, 2020).
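K-nearest neighbours (KNN), one of the algorithms listed above, is a convenient example of a lazy learner: there is no training step, and all the work happens when a prediction is requested. The sketch below is a minimal version under assumed data. The two-dimensional biomarker points and the risk labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical stored training data: (biomarker A, biomarker B) per patient.
train_points = [
    (1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
    (3.0, 3.2), (3.2, 2.9), (2.9, 3.1),
]
train_labels = ["low risk"] * 3 + ["high risk"] * 3

prediction_a = knn_predict(train_points, train_labels, (1.0, 1.0))
prediction_b = knn_predict(train_points, train_labels, (3.0, 3.0))
print(prediction_a, prediction_b)
```

An eager learner such as logistic regression would instead compress the training data into fitted parameters first and discard the raw points at prediction time.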

UNSUPERVISED LEARNING
Machine learning that operates without labelled data is called unsupervised learning. The algorithms used in unsupervised learning work by finding patterns or structures within an unlabelled dataset. The objective of unsupervised learning is to identify relationships and connections within the data, such as grouping similar data points together into clusters. Examples of tasks accomplished through unsupervised learning include reducing the number of features in a dataset, identifying unusual or abnormal instances, and presenting data in a visual format (Amruthnath, Gupta, 2018). Popular algorithms in unsupervised learning include clustering techniques such as the K-means method, hierarchical clustering, and density-based clustering methods. These algorithms can be used to gain insights into the underlying structure of the data and make predictions based on that structure.
Unsupervised learning algorithms can be used to discover previously unknown relationships between biomarkers, or to cluster patients based on their biomarker profiles. Dimensionality reduction methods, such as principal component analysis, decrease the number of features or variables in a dataset while retaining the crucial information. This is valuable in biomarker discovery because it helps researchers pinpoint the most pertinent biomarkers for a specific disease or clinical outcome, reducing the risk of overfitting and of extraneous data in the analysis.
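As a hedged illustration of dimensionality reduction, the sketch below extracts the first principal component of a small dataset using power iteration on the covariance matrix. This is one simple way to compute it, chosen here so the example needs only the standard library; production analyses would normally use an established numerical package. The three-feature data is invented.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the leading principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centred features.
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance and renormalise.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[a][b] * direction[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(v * v for v in product))
        direction = [v / norm for v in product]
    # Project each sample onto the component: one number instead of d features.
    scores = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, scores


# Hypothetical samples with three biomarkers; the first two vary together,
# the third is almost constant.
samples = [
    [1.0, 2.1, 0.5],
    [2.0, 3.9, 0.4],
    [3.0, 6.2, 0.6],
    [4.0, 7.8, 0.5],
    [5.0, 10.1, 0.5],
]
direction, scores = first_principal_component(samples)
print(direction)
print(scores)
```

The component weights show which features carry most of the variation, and the scores replace three correlated measurements with a single summary value per sample.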
Clustering is a method of unsupervised learning that organizes data into groups based on similarities, revealing patterns and structure within the data. The goal is to partition data into clusters of similar objects; clustering is used for tasks such as customer segmentation, image segmentation, and market segmentation. Some well-known clustering algorithms include DBSCAN, Hierarchical Clustering, Gaussian Mixture Model, Affinity Propagation, Mean Shift, Spectral Clustering, Fuzzy C-Means and K-Means. This technique can be used to identify subgroups of patients with distinct biomarker profiles, which may have different disease outcomes or responses to treatment.
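K-Means, one of the algorithms named above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The following is a minimal sketch under assumptions: the patient profiles are invented, and the initial centroids are fixed by hand so that the run is reproducible (practical implementations usually choose them randomly or with a seeding heuristic).

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain K-Means: returns (centroids, cluster index for each point)."""
    centroids = [tuple(c) for c in initial_centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        assignments = []
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: math.dist(point, centroids[c]),
            )
            assignments.append(nearest)
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignments


# Hypothetical biomarker profiles (two markers per patient).
patients = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
centroids, assignments = k_means(patients, [patients[0], patients[3]])
print(assignments)
print(centroids)
```

Each cluster index can then be treated as a candidate patient subgroup and compared against outcomes or treatment response.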
Association rule learning is an unsupervised machine learning approach for identifying correlations and frequent combinations of items in large datasets. It is a widely used computational method that looks for recurring patterns or rules, and it can be employed in tandem with other methods for biomarker discovery. For example, association rule mining could be used to identify sets of biomarkers that are frequently co-expressed or co-regulated. Such co-expression or co-regulation may suggest their involvement in a common biological pathway or process.
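The two basic quantities behind association rules, support (how often an itemset occurs) and confidence (how often the consequent occurs when the antecedent does), can be sketched as below. The gene names and the sample sets are placeholders invented for illustration. A full miner such as Apriori would additionally enumerate candidate itemsets, which is not shown here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical data: the set of over-expressed genes observed in each sample.
samples = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_C"},
    {"GENE_B", "GENE_C"},
    {"GENE_A", "GENE_B"},
]

support_ab = support(samples, {"GENE_A", "GENE_B"})
confidence_a_to_b = confidence(samples, {"GENE_A"}, {"GENE_B"})
print(support_ab, confidence_a_to_b)
```

A pair of markers with high support and high confidence in both directions would be a candidate for the frequent co-expression described in the text.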

SEMI-SUPERVISED LEARNING
Semi-supervised machine learning merges aspects of both supervised and unsupervised techniques. Instead of relying solely on labelled or unlabelled data, it uses a modest quantity of labelled data alongside a large quantity of unlabelled data. This approach offers the advantages of both unsupervised and supervised learning while mitigating the difficulty of obtaining a large amount of labelled data (van Engelen, Hoos, 2020). Because the model can be trained with fewer labelled examples than purely supervised learning requires, the process is more efficient.
To make use of the unlabelled data, the process of pseudo-labelling is used. The steps in the process are given in Figure 6.
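One common form of pseudo-labelling can be sketched as follows: train on the labelled data, predict labels for the unlabelled data, keep only the confident predictions as pseudo-labels, and retrain on the enlarged set. The sketch is not a reproduction of the steps in Figure 6. The nearest-centroid classifier, the distance-margin confidence rule, the `min_margin` threshold and the data are all assumptions made for illustration.

```python
import math


def centroids_by_class(points, labels):
    """'Train' a nearest-centroid classifier: one mean point per class."""
    grouped = {}
    for point, label in zip(points, labels):
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }


def predict_with_margin(centroids, point):
    """Return the nearest class and how much closer it is than the runner-up."""
    ranked = sorted((math.dist(point, c), label) for label, c in centroids.items())
    best_distance, best_label = ranked[0]
    margin = ranked[1][0] - best_distance
    return best_label, margin


def pseudo_label(labelled_x, labelled_y, unlabelled_x, min_margin=1.0):
    # Step 1: train on the small labelled set.
    centroids = centroids_by_class(labelled_x, labelled_y)
    train_x, train_y = list(labelled_x), list(labelled_y)
    # Step 2: predict the unlabelled points and keep confident predictions only.
    for point in unlabelled_x:
        label, margin = predict_with_margin(centroids, point)
        if margin >= min_margin:
            train_x.append(point)
            train_y.append(label)
    # Step 3: retrain on labelled plus pseudo-labelled data.
    return centroids_by_class(train_x, train_y), train_x, train_y


# Hypothetical data: two labelled patients and three unlabelled ones.
labelled_x = [(1.0, 1.0), (5.0, 5.0)]
labelled_y = ["control", "case"]
unlabelled_x = [(1.2, 0.9), (4.8, 5.1), (3.0, 3.0)]

model, train_x, train_y = pseudo_label(labelled_x, labelled_y, unlabelled_x)
print(train_y)
print(model)
```

The point at (3.0, 3.0) lies equally far from both classes, so it receives no pseudo-label. Discarding ambiguous points in this way limits how many wrong labels are fed back into training.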

AI AND ML ENABLED BIOMARKERS
The digital revolution in healthcare is transforming medical research, diagnostics, and therapeutics with the widespread adoption of digital health technologies (DHTs), such as sensors and wearable devices with AI/ML capabilities. One popular DHT is the wearable sensor, which provides continuous real-time monitoring and health data reporting. These sensors can track various health metrics, such as blood pressure, body temperature,

REINFORCEMENT LEARNING
Reinforcement learning (RL) is a method of artificial intelligence which focuses on teaching a machine to make decisions by rewarding and punishing its actions. In RL, an agent interacts with its environment and learns to choose the best course of action to maximize its reward. This process is similar to how a child learns through exploration and trial-and-error. The key difference between RL and other forms of machine learning is that RL does not rely on a supervisor to provide feedback, but instead learns through its own interactions with the environment. As a result, RL is capable of discovering optimal behaviour in an unseen environment and has the potential to be highly effective in a variety of real-world applications (Hammoudeh, 2018).
More formally, the agent takes actions to attain a goal, and the environment provides rewards or penalties that guide the agent's decision-making. The agent's behaviour is governed by a policy, and the goal is to maximize cumulative rewards. The value function measures the worth of a state, that is, the cumulative reward the agent can expect from it, and the goal is to determine the policy that yields the highest expected value. RL is based on Markov Decision Processes (MDPs), a mathematical model for decision-making in uncertain environments. Reinforcement learning algorithms can be separated into two types: model-free and model-based. Model-free algorithms do not build a precise representation of the environment and operate through trial-and-error. Model-based algorithms, by contrast, use a model of the environment's transitions and rewards to plan. The classes of reinforcement learning algorithms are given in Figure 7.
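The ideas of reward, value and policy can be made concrete with tabular Q-learning, a widely used model-free algorithm. The sketch below is only an illustration under assumptions: the five-state chain environment, the reward of 1 at the final state, the discount factor, the learning rate, and the purely random exploration are all invented for this example.

```python
import random

N_STATES = 5        # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, 1)   # move left or move right along the chain
GAMMA = 0.9         # discount factor for future rewards
ALPHA = 0.5         # learning rate


def step(state, action):
    """Hypothetical environment: deterministic moves, reward 1 at the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=300, max_steps=200, seed=0):
    rng = random.Random(seed)
    # q[state][action index] estimates the value of taking that action there.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore with random actions; Q-learning still learns the greedy values.
            action_index = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[action_index])
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action_index] += ALPHA * (target - q[state][action_index])
            state = next_state
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy: in each non-terminal state, pick the action with the higher value.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

No transition model is ever built here: the agent only updates its value estimates from the rewards it observes, which is what distinguishes the model-free family from model-based methods.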
Digital biomarkers can be used to evaluate the effectiveness and safety of interventions. Digital biomarkers are either passive or active. Passive digital biomarkers, such as heart rate or oxygen saturation, are collected without patient effort. Active digital biomarkers, like tapping a button to record bladder leakage or using a smartphone camera to capture eye movements, are collected through intentional patient interaction with a device. Various biomarkers and their biosensing mechanisms are summarised in Table III.
Flexible electronic materials are key in integrating electronic circuits into AI-biosensors. These materials serve as a structural foundation for a range of flexible device forms such as films, textiles, bandages, patches, and tattoos. Polyimide (PI) and polyethylene terephthalate (PET) are favoured for their exceptional characteristics, comfort, high oxygen permeability, and suitability for roll-to-roll fabrication. Carbon materials are also becoming a viable option due to their light weight and high electrical conductivity. Additionally, intelligent hydrogels that alter their shape and behaviour in response to environmental conditions such as pH, ions, temperature, and molecules are being studied. These hydrogels can even self-assemble or disassemble when interacting with the target analyte (Attia et al., 2021).

Smartphone-based platforms
The integration of multiple sensors and functions into smartphone-based AI biosensors has made them powerful tools for data processing, storage, sharing, and interaction with the cloud. Add-on bio-sensing modules such as electrochemistry, fluorescence, and plasmon resonance, along with additional hardware components, enhance their functionality. Machine learning and AI enable precise detection and analysis of complex biological processes for faster and more accurate diagnoses and treatment. Past studies on smartphone-based AI biosensors are summarised in Table IV.

AI-Biosensors
AI-biosensors are a combination of artificial intelligence and biosensors that have the potential to revolutionize various industries. An AI-biosensor comprises three fundamental components: data acquisition, signal transformation, and AI-based data analysis. Data acquisition uses various biosensors to monitor different types of information, which the signal transformation system then converts into an electrical output signal. The AI-based data analysis layer processes the output signal to make predictions and decisions based on the data collected by the biosensors. With their ability to provide accurate and precise measurements, AI-biosensors have applications in medical diagnostics, environmental monitoring, and more (Jin et al., 2020).

Wireless communication
AI-biosensors are optimally connected via wireless communication to transfer information between the biosensors and smartphone platforms or other intelligent devices.Popular wireless technologies used in AI-biosensor networks (AIBN) include Bluetooth, radio-frequency identification (RFID), near-field communication (NFC), Wi-Fi, and ZigBee (Galderisi et al., 2019).

Omics data studies
Omics data study involves analysing large datasets that capture information about various biological molecules, such as genes, proteins, and metabolites.This approach can provide a comprehensive view of biological systems and help researchers understand how different molecules interact and contribute to cellular processes and disease.To analyse omics data, researchers typically use computational and statistical methods to identify patterns and relationships among different molecules and to gain insights into their functions and roles in biological systems.Figure 9. Gives an Overview of different Omics data studies

BIOMARKER IDENTIFICATION USING MACHINE LEARNING AND DEEP LEARNING
The application of Machine Learning (ML) and Deep Learning (DL) techniques is gaining traction in the identification of biomarkers. Both supervised and unsupervised learning algorithms are used to detect biomarkers from diverse biological data sources. Supervised approaches include methodologies such as Random Forest, Support Vector Machines (SVM), and Logistic Regression, as well as Deep Learning techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) (Lötsch et al., 2018).

Multi-omics data type
In order to fully comprehend the biological processes behind diverse phenotypes, a multi-omics approach is often required. Integrating multiple omics data sets can be difficult because of their large size, measurement variations, and the complexity of the analysis. Various approaches to integrating omics data have been proposed in the literature, including matrix factorization algorithms, pathway-based data integration, supervised learning methods, correlation analysis, computational frameworks, and Random Forests. Integrating unstructured clinical phenotypic datasets, such as medical images, electronic health records, and questionnaires, presents further challenges and requires standardization. Strategies to incorporate these data include converting them into numerical features, such as word vectors produced by word2vec models. Past studies are summarised in Table V.
Genomics: Genomics is the study of all genes within an organism, including their interactions and effects. It enables the identification of genetic variants that are linked to health, disease, and therapeutic response. Genomics provides extensive insight into the genetic makeup of an organism and how it influences health and well-being. The technologies used for this purpose include high-throughput sequencing, genotyping, and gene expression analysis (van Dijk et al., 2014). Genomic studies have changed the way we view complex diseases by identifying specific biomarkers.
Proteomics: The proteome refers to the full collection of proteins produced by a cell, tissue, or organism during a specific period. Proteomics is used to investigate the expression levels, structure, and functions of proteins and to study how they change over time. Proteomics can be used to identify protein expression patterns in response to external stimuli, to investigate the effects of disease on the proteins expressed in a tissue, and to understand the complex interaction networks of proteins at the cellular and tissue level (Al-Amrani et al., 2021).
Transcriptomics: The transcriptome is the collection of all transcripts produced by a particular cell or tissue, including mRNA, miRNA, lncRNA, and other noncoding RNAs. RNA-seq is a powerful technique for studying transcriptomes, producing vast amounts of data that can be used to uncover new insights into gene expression, gene regulation, and other biological processes. Transcriptomes can provide valuable insights into cellular processes, gene expression patterns, and the underlying mechanisms of disease (Lowe et al., 2017).
Metabolome: The metabolome consists of all the metabolites, which are small-molecule groups such as carbohydrates, proteinogenic amino acids, sugars, and lipid acids. Metabolomics studies can focus on different levels of metabolites, and deviations or imbalances in their relative levels can indicate disease when they fall outside normal ranges (Smirnov et al., 2016).
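As a rough illustration of the integration problem discussed in this section, the sketch below standardises each omics block separately and then concatenates the features sample by sample. This "early integration" strategy is an assumption chosen for simplicity; it is not one of the approaches named above, and the function names and toy values are hypothetical. Scaling each block first is one simple way to reduce the effect of measurement variations between platforms.

```python
from statistics import mean, pstdev


def zscore_columns(block):
    """Standardise each feature (column) of one omics block to mean 0, SD 1."""
    scaled_columns = []
    for col in zip(*block):
        mu, sd = mean(col), pstdev(col)
        # A constant feature carries no information; map it to zeros.
        scaled_columns.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*scaled_columns)]


def integrate_blocks(blocks):
    """Early integration: scale each block, then join features per sample."""
    sample_counts = {len(block) for block in blocks.values()}
    if len(sample_counts) != 1:
        raise ValueError("all omics blocks must describe the same samples")
    n_samples = sample_counts.pop()
    scaled = [zscore_columns(block) for block in blocks.values()]
    return [sum((block[i] for block in scaled), []) for i in range(n_samples)]


# Hypothetical toy data: 3 samples, 2 transcript features, 1 metabolite feature.
blocks = {
    "transcriptomics": [[1200.0, 15.0], [900.0, 30.0], [1500.0, 45.0]],
    "metabolomics": [[0.2], [0.4], [0.6]],
}
integrated = integrate_blocks(blocks)  # one row of 3 scaled features per sample
```

The resulting matrix could then be passed to any of the supervised or unsupervised methods mentioned in this review. In practice, blocks with very different numbers of features may also need weighting or dimensionality reduction before concatenation.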

BIOMARKER IDENTIFICATION: PROGNOSTIC, DIAGNOSTIC, AND PREDICTIVE APPROACHES USING ML
Various machine learning algorithms, namely Random Forest, Support Vector Machines, K-Nearest Neighbours, Artificial Neural Networks, and Logistic Regression, have been used to identify diagnostic, prognostic, and predictive biomarkers. Transcriptomic, proteomic, metabolomic, and epigenetic data sets derived from diverse biological samples have been employed to uncover biomarkers potentially linked to pathological manifestations such as disease onset, evolution, and therapeutic response. The biomarkers identified have the potential to facilitate timely diagnosis, predict the course of disease progression, and determine the most efficacious treatment alternatives. The use of machine learning algorithms in biomarker discovery has yielded substantial improvements in patient outcomes, with forecasts indicating its potential for transformative impact on healthcare. Studies conducted on biomarker identification using ML and DL from multi-omics data are given in Table VI.

The Cancer Genome Atlas (TCGA) (Li et al., 2022)

K-Nearest Neighbours
Metabolomics data from urine samples of patients with a particular disease and healthy controls.
K-Nearest Neighbours was used to predict disease status based on the levels of metabolites. Several metabolites were found to be associated with disease status, and these metabolites were used as potential metabolomic biomarkers for disease diagnosis (Gowda et al., 2008).
Diagnostic biomarkers

Support Vector Machines
Proteomics data from serum samples of patients with a particular disease and healthy controls.
Support Vector Machines were used to predict disease status based on protein levels. Several proteins were found to be associated with disease status, and these proteins were used as potential proteomic biomarkers.
Chemotherapy resistance prediction in small cell lung cancer (Han et al., 2012).
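To make the K-Nearest Neighbours approach described above more concrete, the following minimal sketch classifies a new sample from its metabolite levels by majority vote among its closest training samples. It is not the method of any cited study; the function name, the two-metabolite feature layout, and all values are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled levels of two urinary metabolites per sample.
train_X = [
    [1.0, 0.9], [1.1, 1.0], [0.9, 1.1],   # healthy controls
    [2.0, 2.1], [2.2, 1.9], [1.9, 2.0],   # patients
]
train_y = ["control"] * 3 + ["disease"] * 3

new_patient = knn_predict(train_X, train_y, [2.1, 2.0])
new_control = knn_predict(train_X, train_y, [1.0, 1.0])
```

Because the method relies on Euclidean distance, features measured on very different scales would normally be standardised first. The choice of k is also an assumption here and would usually be tuned by cross-validation.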

STUDIES CONDUCTED ON BIOMARKER IDENTIFICATION USING ML AND DL FROM OMICS DATA
Studies conducted on biomarker identification using Machine Learning and Deep Learning from omics data are listed in Table VII. The biomarkers were identified using different machine learning and statistical algorithms applied to omics data such as miRNA expression levels, DNA methylation patterns, gene expression profiles, and circulating tumour DNA (ctDNA). These biomarkers were found to have potential for diagnosis, prognosis, and prediction of disease outcomes in several types of cancer, including liver, lung, colon, breast, ovarian, and non-small cell lung cancer. Additionally, protein biomarkers were identified for Alzheimer's disease. These studies highlight the importance of omics data in identifying novel biomarkers for different diseases and the potential of machine learning and statistical algorithms to analyse large-scale omics data for biomarker discovery.

GSEA: A gene set enrichment analysis tool that assesses whether a specified group of genes shows statistically significant, concordant differences between two biological conditions (such as phenotypes). This tool helps identify pathways and biological processes that are linked to a particular phenotype of interest (Subramanian et al., 2005).
Ingenuity Pathway Analysis (IPA): A web-based software platform that provides knowledge-based analysis of complex biological data, including genomic, proteomic, and metabolomic data, to help researchers understand the biological significance of their data. It offers a suite of tools for pathway analysis, gene network analysis, and more (Shao et al., 2020).
Pathway Studio: A proprietary software platform that provides knowledge-based analysis of biological data to help researchers understand the biological significance of their data. It offers a suite of tools for pathway analysis, gene network analysis, and more (Nikitin et al., 2003).
MetaboAnalyst: A comprehensive web-based platform for metabolomics data analysis and interpretation. It provides a wide range of functions including quality control, normalization, statistical analysis, pathway analysis, and more (Chen et al., 2022).
ArrayExpress: A publicly accessible database that stores microarray gene expression data. It provides access to raw and normalized data and supports various types of analyses, including differential gene expression analysis and functional annotation (Rustici et al., 2012).
MassIVE: The Mass Spectrometry Interactive Virtual Environment is a free-access repository for mass spectrometry proteomics data. It provides a platform for researchers to upload, store, and share large-scale mass spectrometry data sets with the scientific community. The repository is searchable and enables users to perform data analysis, visualization, and data integration. MassIVE provides access to raw and processed data, as well as to the analysis pipelines and results (Haider, Pal, 2013).

Trans-Proteomic Pipeline (TPP): A software suite for the analysis of mass spectrometry-based proteomics data. It provides a complete set of tools for data processing, database search, peptide and protein identification, quantification, and functional annotation. The TPP is designed to be highly flexible and customizable, allowing researchers to choose the most appropriate algorithms and parameters for their data (Käll et al., 2007).
Gene Expression Omnibus (GEO): A publicly accessible resource that offers access to microarray and RNA-sequencing data generated by researchers worldwide. GEO provides a centralized resource for the storage and analysis of large-scale gene expression data sets, enabling researchers to compare and integrate data from different studies (Edgar, Domrachev, Lash, 2002).
The Cancer Genome Atlas (TCGA): A comprehensive multi-omics data repository focused on understanding the molecular basis of cancer. TCGA provides access to large-scale genomic data, including DNA sequencing, gene expression, epigenetic modifications, and microRNA expression data, for multiple cancer types (Tomczak, Czerwińska, Wiznerowicz, 2015).
Other tools that can also be used include Illumina Platinum Genomes, ProteomicsDB, Metabolomics Workbench, Human Protein Atlas, the Immune Epitope Database (IEDB), BioMart, STRING, and QIIME 2.
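Of the tools listed above, GSEA rests on a statistic that is simple enough to sketch. The code below computes an unweighted running-sum enrichment score over a ranked gene list and estimates a p-value by drawing random gene sets of the same size. It is a simplified illustration under stated assumptions, not the GSEA software itself: the published method weights hits by their correlation with the phenotype and permutes phenotype labels. The gene names are hypothetical.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: walk down the ranking, step up on a hit
    and down on a miss, and return the largest deviation from zero."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking without covering it")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Share of random same-size gene sets scoring at least as extremely."""
    rng = random.Random(seed)
    observed = abs(enrichment_score(ranked_genes, gene_set))
    size = sum(gene in gene_set for gene in ranked_genes)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        if abs(enrichment_score(ranked_genes, random_set)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical ranking, most up-regulated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})
bottom_score = enrichment_score(ranked, {"g5", "g6"})
p_value = permutation_p_value(ranked, {"g1", "g2"}, n_perm=200)
```

A positive score indicates that the gene set is concentrated near the top of the ranking and a negative score that it is concentrated near the bottom. With only six genes the p-value is not meaningful; it is computed here only to show the mechanics.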

CONCLUSION
In conclusion, biomarkers play a crucial role in advancing clinical research and enabling the design of personalized treatments. The integration of Artificial Intelligence, Machine Learning, and digital biomarkers has the potential to revolutionize clinical research by providing optimized disease diagnosis and treatment strategies. AI and ML algorithms can identify patterns in large data sets, discover novel biomarkers, and improve the accuracy of existing biomarkers. Digital biomarkers, leveraging the widespread adoption of digital health technologies, offer objective and quantifiable data for healthcare assessment. However, challenges in AI/ML algorithm development, such as bias due to incomplete data and the need for regulation, highlight the importance of diversity in data representation. A multidisciplinary approach to healthcare, involving collaboration between specialists, is necessary to address chronic diseases and improve patient care. With the advancements in bioinformatics, biostatistics, and multiple "omics" methods, the future of clinical research looks promising, with the potential for more targeted treatments for various conditions.

ACKNOWLEDGMENT
The authors of this comprehensive review humbly extend their deepest gratitude to the individuals who have played a seminal role in its realization. Our profound thanks are extended to the esteemed experts in the field who were so generous with their time, knowledge, and expertise, and who provided immeasurable insights and counsel. Additionally, the authors are beholden to the publisher for their support and for providing a platform that has enabled our work to reach a wider audience. This review serves as an emblem of the dedication and perseverance of all those who have been instrumental in guiding us on this intellectually stimulating journey.

FIGURE 1 - Identifying outcome-driven biomarkers for early disease detection and therapeutic interventions.

FIGURE 2 - Harnessing the power of systems biology for the discovery of novel biomarkers.

FIGURE 3 - AI classification based on capability and functionality.

FIGURE 4 - A roadmap for supervised learning: the key stages in developing a predictive model.

FIGURE 5 - A Guide to Supervised and Unsupervised Learning: Choosing the Right Approach for Your Data.
Supervised learning algorithms are generally trained on a dataset that contains both positive and negative examples. Positive examples are individuals who exhibit the target condition or clinical outcome, while negative examples are individuals who do not. The algorithm learns to discern patterns in the biomarker data and their association with positive examples. These patterns are then used to predict the likelihood of disease or clinical outcome in previously unseen patients.
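The training procedure described above can be sketched with a small logistic regression fitted by gradient descent. This is a minimal illustration under stated assumptions, not a production model: the two biomarker features, the learning rate, the number of epochs, and all values are hypothetical, with label 1 marking positive examples and 0 marking negative examples.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # error = predicted probability minus true label
            error = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


def predict_proba(weights, bias, x):
    """Estimated probability that a new patient belongs to the positive class."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


# Hypothetical biomarker levels: three negative and three positive examples.
X = [[0.2, 0.1], [0.4, 0.3], [0.3, 0.2], [1.6, 1.4], [1.8, 1.7], [1.5, 1.9]]
y = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(X, y)
risk_high = predict_proba(weights, bias, [1.7, 1.6])
risk_low = predict_proba(weights, bias, [0.3, 0.2])
```

The learned weights indicate which biomarkers push the prediction towards the positive class. In a real study the model would be evaluated on patients held out from training, since performance on the training examples overstates how well it generalises.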

FIGURE 6 - The Power of Semi-Supervised Learning: Using Unlabelled Data to Enhance Model Accuracy.

FIGURE 7 - Classifying Reinforcement Learning Algorithms: From Model-Based to Model-Free Approaches.

FIGURE 8 - Applying Machine Learning and Deep Learning to Biomarker Discovery: A Step-by-Step Framework.

FIGURE 9 - A Comprehensive Overview of Omics Data Studies: From Genomics to Proteomics and Beyond.

TABLE II - Categories of Digital Biomarkers

TABLE III - Biomarkers and their biosensing mechanisms

Unsupervised methods include PCA, ICA, and clustering algorithms for innovative biomarker identification. Various techniques for the identification of biomarkers using Machine Learning and Deep Learning are summarised in Figure 8.

TABLE IV - Smartphone-based AI biosensors

TABLE V - Past research for Biomarker identification using Machine learning and Deep learning

TABLE VI - Studies conducted on Biomarker identification using ML and DL from multi-omics data

Various tools and databases are available to aid in the identification and analysis of biomarkers. Biomarker Base, GSEA, IPA, Pathway Studio, MetaboAnalyst, ArrayExpress, MassIVE, TPP, GEO, and TCGA are just a few examples of these tools and databases. These resources provide researchers with access to large-scale multi-omics data, as well as a range of analysis and visualization tools for genomic, transcriptomic, proteomic, and metabolomic data.

TABLE VII - Studies conducted on Biomarker identification using ML and DL from omics data