
# Literature Review

The application of machine learning in medicine has garnered enormous attention over the past decade [@rabbani2022]. Artificial intelligence (AI), and especially its subdiscipline machine learning (ML), has generated increasing interest among laboratory professionals. AI is a broad term that can be defined as the theory and development of computer systems that perform complex tasks typically requiring human intelligence, such as decision-making, visual perception, speech recognition, and translation between languages. ML is the science of programming computers so that they learn from data without being explicitly programmed [@debruyne2021]. The increasingly extensive use of ML in clinical and basic medical research is reflected in the titles and abstracts of papers indexed on PubMed: comparing papers published until 2006 with those published in 2007--2017 shows a nearly 10-fold increase, from 1000 to slightly more than 9000 articles [@cabitza2018]. A literature review by Rabbani et al. found 39 articles applying ML to clinical chemistry in laboratory medicine between 2011 and 2021 [-@rabbani2022].

## A Brief Primer on Machine Learning

While this literature review does not aim to provide an extensive account of the mathematics behind ML algorithms, some basic concepts are introduced to allow a sufficient understanding of the topics discussed in the paper. ML models can be classified into broad categories based on several criteria: the type of supervision, whether or not the algorithm can learn incrementally from an incoming stream of data (online versus batch learning), and how they generalize (instance-based versus model-based learning) [@debruyne2021]. Rabbani et al. further classified the specific clinical chemistry uses into five broad categories: predicting laboratory test values, improving laboratory utilization, automating laboratory processes, promoting precision laboratory test interpretation, and improving laboratory medicine information systems [-@rabbani2022].

### Supervised vs Unsupervised Learning

Four important categories can be distinguished based on the amount and type of supervision the models receive during training: supervised, unsupervised, semi-supervised, and reinforcement learning. In supervised learning, training data are labeled, and data samples are predicted with knowledge about the desired solutions [@debruyne2021]. Supervised models are typically used for classification and regression. Some of the essential supervised algorithms are Linear Regression, Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), Decision Trees (DTs), Random Forests (RFs), and supervised neural networks. In unsupervised learning, training data are unlabeled. In other words, observations are classified without prior data sample knowledge [@debruyne2021]. Unsupervised algorithms can be used for clustering (e.g., k-means clustering, density-based spatial clustering of applications with noise, hierarchical cluster analysis), visualization and dimensionality reduction (e.g., principal component analysis (PCA), kernel PCA, locally linear embedding, t-distributed stochastic neighbor embedding), anomaly detection and novelty detection (e.g., one-class SVM, isolation forest), and association rule learning (e.g., apriori, eclat). Some models can also deal with partially labeled training data (i.e., semi-supervised learning). Finally, in reinforcement learning, an agent (i.e., the learning system) learns what actions to take to optimize the outcome of a strategy (i.e., a policy) or to get the maximum cumulative reward [@debruyne2021]. This approach resembles how humans learn to ride a bike and is typically used for learning games, such as Go, chess, or even poker, or in settings where the outcome is continuous rather than dichotomous (i.e., right or wrong) [@debruyne2021]. The proposed study will use supervised learning, as the data are labeled and a particular outcome is expected.
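The distinction between supervised and unsupervised learning can be illustrated with a minimal sketch. The example below is not drawn from the cited sources: the analyte values, the labels, and the function names `nearest_neighbor_predict` and `two_means` are all invented for illustration. It uses a one-nearest-neighbor classifier as a stand-in for a supervised model (it needs the labels) and a bare-bones two-cluster k-means as a stand-in for an unsupervised model (it sees only the values).

```python
# Toy one-dimensional data: hypothetical analyte values, invented for illustration.
values = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
labels = ["normal", "normal", "normal", "abnormal", "abnormal", "abnormal"]


def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: return the label of the closest labeled training point (1-NN)."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def two_means(xs, iterations=10):
    """Unsupervised: group xs into two clusters with a simple k-means (k = 2)."""
    centers = [min(xs), max(xs)]  # deterministic starting centers
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            # Assign each point to the nearer of the two centers.
            idx = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[idx].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    assignments = [
        0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1 for x in xs
    ]
    return centers, assignments


# The supervised model uses the labels to predict a class for a new value.
prediction = nearest_neighbor_predict(values, labels, 4.5)

# The unsupervised model never sees the labels; it only groups similar values.
centers, assignments = two_means(values)
```

In this sketch the supervised model returns a named class for a new observation, whereas the unsupervised model returns only anonymous cluster indices that an analyst would still have to interpret.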

### Machine Learning Workflow

Since this study will use supervised learning, the remainder of this review focuses on that approach. Machine learning can be broken into three broad steps: cleaning and processing the data; training and testing the model; and evaluating, deploying, and monitoring the model [@debruyne2021]. In the first phase, data are collected, cleaned, and labeled. Data cleaning, or pre-processing, is one of the essential steps in designing a reliable model [@debruyne2021]. Common pre-processing steps include handling missing data, detecting outliers, and encoding categorical data. At this stage the data are also split into training and testing sets, typically close to a 70-30 split, and the two sets serve different purposes in the rest of model building. The training set is used to develop feature sets, train algorithms, tune hyperparameters, compare models, and carry out all the other activities required to choose a final model (e.g., the model to be put into production) [@boehmke2020]. Once the final model is chosen, the test set is used to obtain an unbiased estimate of the model's performance, which is referred to as the generalization error [@boehmke2020]. Most of the time (as much as 80%) is invested in this data-processing stage. In the second phase, after feature engineering, an ML model is trained and tested on the collected data. Feature engineering is performed on the training set to select a good set of features to train on. The ML model will only be able to learn efficiently if the training data contain enough relevant features and minimal irrelevant ones [@géron2019]. The data are then run through various models, such as Linear Regression, Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), Decision Trees (DTs), and Random Forests (RFs).
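The 70-30 split described above can be sketched in a few lines. This is only an illustrative outline using the Python standard library, not the procedure the proposed study will necessarily follow: the function name `train_test_split`, the seed value, and the 100 placeholder records are assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)  # copy so the caller's data are left untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: integers standing in for 100 labeled patient records.
records = list(range(100))
train, test = train_test_split(records)
```

The key design point is that the test set is set aside once, before any feature engineering or tuning, so that it plays no part in choosing the final model.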

Once a model is selected, the third phase begins: evaluating the model's performance. Historically, the performance of statistical models was primarily based on goodness-of-fit tests and the assessment of residuals. Unfortunately, misleading conclusions may follow from predictive models that pass these assessments [@breiman2001]. Today, it has become widely accepted that a sounder approach to assessing model performance is to assess the predictive accuracy via loss functions [@boehmke2020]. *Loss functions* are metrics that compare the predicted values to the actual values (the output of a loss function is often referred to as the error or pseudo residual). When performing resampling methods, we assess the predicted values for a validation set compared to the actual target values. The overall validation error of the model is computed by aggregating the errors across the entire validation data set [@boehmke2020].

<!--# should I talk about Model types ?-->
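As a concrete illustration of a loss function, the sketch below computes two common regression losses, mean squared error (MSE) and root mean squared error (RMSE), and one common classification loss, the misclassification rate. The choice of these three metrics and all of the numbers are assumptions made for the example; the cited sources discuss loss functions in general and do not prescribe these particular values.

```python
import math


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class differs from the actual class."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


# Invented validation-set values for a regression problem.
actual_values = [2.0, 4.0, 6.0, 8.0]
predicted_values = [2.5, 3.5, 6.0, 9.0]
mse = mean_squared_error(actual_values, predicted_values)
rmse = root_mean_squared_error(actual_values, predicted_values)

# Invented validation-set labels for a classification problem.
actual_classes = ["normal", "abnormal", "normal", "normal"]
predicted_classes = ["normal", "normal", "normal", "abnormal"]
error_rate = misclassification_rate(actual_classes, predicted_classes)
```

Each function aggregates the per-observation errors over the whole validation set into a single number, so that candidate models can be compared on the same scale.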

### Machine Learning in the Clinical Laboratory

<!--# Table needs to be modified -->

| **Author and Year** | **Objective and Machine Learning Task**                                                                         | **Best Model** | **Major Themes** |
|-----------|---------------------------------------|-----------|-----------|
| Azarkhish (2012)    | Predict iron deficiency anemia and serum iron levels from CBC indices                                           | Neural Network | Prediction       |
| Cao (2012)          | Triage manual review for urinalysis samples                                                                     | Tree-based     | Automation       |
| Yang (2013)         | Predict normal reference ranges of ESR for various laboratories based on geographic and other clinical features | Neural Network | Interpretation   |

: Table 1. Summary of characteristics of machine learning algorithms [@rabbani2022].

<!--# Need to fill in this section -->

## Reflex Testing

The laboratory diagnosis of thyroid dysfunction relies on the measurement of circulating concentrations of thyrotropin (TSH), free thyroxine (fT4), and, in some cases, free triiodothyronine (fT3). TSH measurement is generally regarded as the most sensitive initial laboratory test for screening individuals for thyroid hormone abnormalities [@woodmansee2018]. TSH and fT4 have a complex, nonlinear relationship, such that small changes in fT4 result in relatively large changes in TSH [@plebani2020]. Many clinicians and laboratories check TSH alone as the initial test for thyroid problems and add an fT4 measurement only if the TSH is abnormal (outside the laboratory's normal reference range); this practice is known as reflex testing [@woodmansee2018]. Reflex testing became possible with the advent of laboratory information systems (LIS) that were sufficiently flexible to permit modification of existing test requests at various stages of the analytical process [@srivastava2010]. Reflex testing is widely used, with the major aim of optimizing the use of laboratory tests. However, the common practice of reflex testing relies on hard-coded rules that allow no flexibility. In the case of TSH, for instance, fT4 is added to the patient order whenever the TSH value falls outside the established laboratory reference range. A further issue is that the thresholds used to trigger the reflex addition of tests vary widely. Murphy found that the hypocalcaemic threshold used to trigger magnesium measurement varied from 1.50 mmol/L up to 2.20 mmol/L [-@murphy2021]. Even allowing for differences in the nature, size, and staffing of hospital laboratories, and in the populations served, the extent of the observed variation invites scrutiny [@murphy2021].
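A hard-coded reflex rule of the kind described above can be sketched as follows. The TSH limits of 0.4 and 4.0 mIU/L are hypothetical placeholders, since actual reference ranges are laboratory-specific, and the function names and the example calcium result of 2.00 mmol/L are likewise assumptions made for illustration. Only the two magnesium trigger thresholds (1.50 and 2.20 mmol/L) come from the range reported by Murphy.

```python
# Hypothetical TSH reference limits in mIU/L; real limits are laboratory-specific.
TSH_LOWER = 0.4
TSH_UPPER = 4.0


def reflex_tests(tsh, lower=TSH_LOWER, upper=TSH_UPPER):
    """Return the tests a fixed rule would add for a given TSH result."""
    if tsh < lower or tsh > upper:
        return ["fT4"]  # TSH outside the reference range: add free T4
    return []  # TSH within range: no reflex test


def magnesium_reflex(calcium_mmol_l, threshold):
    """Return True if a calcium result falls below a laboratory's trigger threshold."""
    return calcium_mmol_l < threshold


# The same invented calcium result under the lowest and highest reported thresholds.
calcium = 2.00
added_at_low_threshold = magnesium_reflex(calcium, threshold=1.50)
added_at_high_threshold = magnesium_reflex(calcium, threshold=2.20)
```

The second function shows why threshold variation matters: an identical calcium result triggers a magnesium measurement in one laboratory and not in another, purely because of where the fixed cutoff was set.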

<!--# insert table and study from srivastava about hypo/hyper thyroid -->

<!--# data from woodmansee and plebani -->