# Literature Review

The application of machine learning in medicine has garnered enormous
attention over the past decade [@rabbani2022]. Artificial intelligence
(AI), and especially its subdiscipline machine learning (ML), has become
a hot topic that is generating increasing interest among laboratory
professionals. AI is a rather broad term and can be defined as the
theory and development of computer systems that perform complex tasks
normally requiring human intelligence, such as decision-making, visual
perception, speech recognition, and translation between languages. ML is
the science of programming computers so that they can learn from data
without being explicitly programmed [@debruyne2021]. The ever wider use
of ML in clinical and basic medical research is reflected in the number
of titles and abstracts of papers indexed on PubMed: compared with
papers published until 2006, the period 2007--2017 saw a nearly 10-fold
increase, from 1000 to slightly more than 9000 articles [@cabitza2018].
A literature review by Rabbani et al. found 39 articles pertaining to
the field of clinical chemistry in laboratory medicine between 2011 and
2021 [-@rabbani2022].

## A Brief Primer on Machine Learning

While the aim of this literature review is not to provide an extensive
representation of the mathematics behind ML algorithms, some basic
concepts will be introduced to allow a sufficient understanding of the
topics discussed in the paper. ML models can be classified into broad
categories based on several criteria, such as the type of supervision,
whether or not the algorithm can learn incrementally from an incoming
stream of data (batch versus online learning), and how they generalize
(instance-based versus model-based learning) [@debruyne2021]. Rabbani et
al. further classified the specific clinical chemistry uses into five
broad categories: predicting laboratory test values, improving
laboratory utilization, automating laboratory processes, promoting
precision laboratory test interpretation, and improving laboratory
medicine information systems [-@rabbani2022].

### Supervised vs Unsupervised Learning

Four important categories can be distinguished based on the amount and
type of supervision the models receive during training: supervised,
unsupervised, semi-supervised, and reinforcement learning. In supervised
learning, training data are labeled and data samples are predicted with
knowledge about the desired solutions [@debruyne2021]. Supervised models
are typically used for classification and regression. Some of the most
important supervised algorithms are Linear Regression, Logistic
Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs),
Decision Trees (DTs), Random Forests (RFs), and supervised neural
networks. In unsupervised learning, training data are unlabeled. In
other words, observations are grouped without any prior knowledge of the
data samples [@debruyne2021]. Unsupervised algorithms can be used for
clustering (e.g. k-means clustering, density-based spatial clustering of
applications with noise, hierarchical cluster analysis), visualization
and dimensionality reduction (e.g. principal component analysis (PCA),
kernel PCA, locally linear embedding, t-distributed stochastic neighbor
embedding), anomaly detection and novelty detection (e.g. one-class SVM,
isolation forest), and association rule learning (e.g. apriori, eclat).
Some models can also deal with partially labeled training data (i.e.
semi-supervised learning). Finally, in reinforcement learning, an agent
(i.e. the learning system) learns what actions to take to optimize the
outcome of a strategy (i.e. a policy) or to get the maximum cumulative
reward [@debruyne2021]. This system resembles humans learning to ride a
bike and can typically be used in learning games, such as Go, chess, or
even poker, or in settings where the outcome is continuous rather than
dichotomous (i.e. right or wrong) [@debruyne2021]. The proposed study
will use supervised learning, as the data are labeled and a particular
outcome is expected.

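
The difference between the first two categories can be illustrated with
a minimal Python sketch. Everything in it is an assumption made for
illustration and not part of the proposed study: the toy `points` and
`labels`, the helper names `nearest_neighbor_predict` and `k_means`, and
the naive choice of initial centroids. The supervised function (a
1-nearest-neighbor classifier) needs the labels to make a prediction,
whereas the unsupervised function (a basic k-means) groups the same
points without ever seeing them.

```python
import math

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
# Labels are only available (and only used) in the supervised case.
labels = ["low", "low", "low", "high", "high", "high"]


def nearest_neighbor_predict(train_x, train_y, query):
    """Supervised: return the label of the labeled point closest to `query`."""
    distances = [math.dist(x, query) for x in train_x]
    return train_y[distances.index(min(distances))]


def k_means(data, k=2, iterations=10):
    """Unsupervised: assign each point to one of k clusters, using no labels."""
    # Naive deterministic initialisation: pick k points spread through the list.
    step = len(data) // k
    centroids = [data[i * step] for i in range(k)]
    assignments = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in data
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments


print(nearest_neighbor_predict(points, labels, (4.5, 4.7)))  # uses the labels
print(k_means(points, k=2))  # cluster indices only; the clusters have no names
```
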
### Machine Learning Workflow

Because the proposed study will use supervised learning, this section
focuses on the supervised workflow. A machine learning project can be
broken into three broad phases: data cleaning and processing; training
and testing the model; and evaluating, deploying, and monitoring the
model [@debruyne2021].

In the first phase, data are collected, cleaned, and labeled. Data
cleaning, or pre-processing, is one of the most important steps in
designing a reliable model [@debruyne2021]. Common pre-processing steps
include handling missing data, detecting outliers, and encoding
categorical data. At this stage the data are also split into training
and testing sets, typically with roughly 70% of the data for training
and 30% for testing. The two sets serve different purposes during the
rest of model building. The training set is used to develop feature
sets, train the algorithms, tune hyperparameters, compare models, and
carry out all of the other activities required to choose a final model
(e.g., the model we want to put into production) [@boehmke2020]. Once
the final model is chosen, the test set is used to obtain an unbiased
estimate of the model's performance, which we refer to as the
generalization error [@boehmke2020]. Most of the project time (as much
as 80%) is invested in this data processing stage.

In the second phase, an ML model is trained and tested on the collected
data after feature engineering. Feature engineering is performed on the
training set to select a good set of features to train on. The ML model
will only be able to learn efficiently if the training data contain
enough relevant features and minimal irrelevant ones [@géron2019]. The
data are then run through several candidate models, such as Linear
Regression, Logistic Regression, K-Nearest Neighbors (KNN), Support
Vector Machines (SVMs), Decision Trees (DTs), and Random Forests (RFs).

Once a model is selected, the third phase begins with an evaluation of
the model's performance. Historically, the assessment of statistical
models was largely based on goodness-of-fit tests and assessment of
residuals. Unfortunately, misleading conclusions may follow from
predictive models that pass these kinds of assessments [@breiman2001].
Today, it has become widely accepted that a sounder approach is to
assess predictive accuracy via loss functions [@boehmke2020]. Loss
functions are metrics that compare the predicted values to the actual
values (the output of a loss function is often referred to as the error
or pseudo residual). When performing resampling methods, we compare the
predicted values for a validation set with the actual target values. The
overall validation error of the model is computed by aggregating the
errors across the entire validation data set [@boehmke2020].

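
The split-train-evaluate sequence described above can be sketched in a
few lines of Python. This is an illustrative sketch under stated
assumptions, not the pipeline of the proposed study: the data are
synthetic, the helper names (`train_test_split`,
`fit_simple_linear_regression`, `rmse`) are hypothetical, the model is a
one-feature linear regression, and root mean squared error (RMSE) is
used as one common example of a loss function. Resampling (for example
k-fold cross-validation) is not shown.

```python
import math
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows, then split into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_simple_linear_regression(rows):
    """Least-squares fit of y = intercept + slope * x on (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(rows, intercept, slope):
    """Loss function: root mean squared difference between actual and predicted y."""
    squared_errors = [(y - (intercept + slope * x)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic (hypothetical) data: y is roughly 2x + 1 plus a little noise.
noise = random.Random(0)
data = [(float(x), 2.0 * x + 1.0 + noise.uniform(-0.5, 0.5)) for x in range(20)]

train, test = train_test_split(data)  # 14 training rows, 6 test rows
intercept, slope = fit_simple_linear_regression(train)  # training data only
test_rmse = rmse(test, intercept, slope)  # estimate of the generalization error
print(f"slope={slope:.2f} intercept={intercept:.2f} test RMSE={test_rmse:.2f}")
```

The test rows are never used during fitting, so the RMSE computed on
them plays the role of the generalization error estimate.
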
<!--# should I talk about Model types ?-->

### Machine Learning in the Clinical Laboratory

<!--# Table needs to be modified -->

| **Author and Year** | **Objective and Machine Learning Task** | **Best Model** | **Major Themes** |
|---------------|-----------------------------|---------------|---------------|
| Azarkhish (2012) | Predict iron deficiency anemia and serum iron levels from CBC indices | Neural Network | Prediction |
| Cao (2012) | Triage manual review for urinalysis samples | Tree-based | Automation |
| Yang (2013) | Predict normal reference ranges of ESR for various laboratories based on geographic and other clinical features | Neural Network | Interpretation |

: Table 1. Summary of characteristics of machine learning algorithms
[@rabbani2022].

<!--# Need to fill in this section -->

## Reflex Testing

The laboratory diagnosis of thyroid dysfunction relies on the
measurement of circulating concentrations of thyrotropin (TSH), free
thyroxine (fT4), and, in some cases, free triiodothyronine (fT3). TSH
measurement is generally regarded as the most sensitive initial
laboratory test for screening individuals for thyroid hormone
abnormalities [@woodmansee2018]. TSH and fT4 have a complex, nonlinear
relationship, such that small changes in fT4 result in relatively large
changes in TSH [@plebani2020]. Many clinicians and laboratories check
TSH alone as the initial test for thyroid problems and add an fT4
measurement only if the TSH is abnormal (outside the laboratory's normal
reference range). This practice is known as reflex testing
[@woodmansee2018]. Reflex testing became possible with the advent of
laboratory information systems (LIS) that were sufficiently flexible to
permit modification of existing test requests at various stages of the
analytical process [@srivastava2010]. Reflex testing is widely used, the
major aim being to optimize the use of laboratory tests. However, the
common practice of reflex testing relies simply on hard-coded rules that
allow no flexibility. For instance, in the case of TSH, fT4 will be
added to the patient order whenever the TSH value falls outside the
established laboratory reference range. This raises a further issue: the
thresholds used to trigger reflex addition of tests vary widely. Murphy
found that the hypocalcaemic threshold used to trigger magnesium
measurement varied from 1.50 mmol/L up to 2.20 mmol/L [-@murphy2021].
Even allowing for differences in the nature, size and staffing of
hospital laboratories, and populations served, the extent of the
observed variation invites scrutiny [@murphy2021].

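
A hard-coded reflex rule of the kind described above might look like the
following Python sketch. It is illustrative only: the `Order` class, the
`apply_tsh_reflex` function, and the reference limits of 0.4 and 4.0
mIU/L are hypothetical placeholders, not values or interfaces taken from
any cited study or laboratory information system. Real limits are set by
each laboratory.

```python
from dataclasses import dataclass

# Hypothetical TSH reference limits in mIU/L; real limits are laboratory-specific.
TSH_LOWER_LIMIT = 0.4
TSH_UPPER_LIMIT = 4.0


@dataclass
class Order:
    """Minimal stand-in for a patient test order (hypothetical structure)."""
    patient_id: str
    tests: list


def apply_tsh_reflex(order, tsh_result, lower=TSH_LOWER_LIMIT, upper=TSH_UPPER_LIMIT):
    """Add fT4 to the order when the TSH result is outside the reference range."""
    out_of_range = tsh_result < lower or tsh_result > upper
    if out_of_range and "fT4" not in order.tests:
        order.tests.append("fT4")
    return order


# A fixed rule: the same threshold is applied to every patient.
print(apply_tsh_reflex(Order("A", ["TSH"]), 6.2).tests)  # above range, fT4 is added
print(apply_tsh_reflex(Order("B", ["TSH"]), 2.0).tests)  # within range, no change

# Two laboratories with different lower limits treat the same result differently.
print(apply_tsh_reflex(Order("C", ["TSH"]), 0.35).tests)
print(apply_tsh_reflex(Order("C", ["TSH"]), 0.35, lower=0.3).tests)
```

The last two calls show how the same TSH result leads to different
orders when only the threshold changes, which is the kind of
between-laboratory variation noted above.
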
<!--# insert table and study from strivastava about hypo/hyper thyroid -->

<!--# data from woodmansee and plebani -->