# Literature Review

The application of machine learning in medicine has garnered enormous
attention over the past decade [@rabbani2022]. Artificial intelligence
(AI), and especially its subdiscipline machine learning (ML), has
become a topic of increasing interest among laboratory professionals.
AI is a broad term that can be defined as the theory and development
of computer systems able to perform complex tasks that typically
require human intelligence, such as decision-making, visual perception,
speech recognition, and translation between languages. ML is the
science of programming computers so that they can learn from data
without being explicitly programmed [@debruyne2021]. The growing use of
ML in clinical and basic medical research is reflected in the number of
related papers indexed on PubMed: a comparison of titles and abstracts
published until 2006 with those published in 2007--2017 shows a nearly
10-fold increase, from 1000 to slightly more than 9000 articles
[@cabitza2018]. A literature review by Rabbani et al. identified 39
articles on ML in clinical chemistry published between 2011 and 2021
[-@rabbani2022].

## A Brief Primer on Machine Learning

Although this literature review does not aim to provide an extensive
treatment of the mathematics behind ML algorithms, some basic concepts
are introduced to allow a sufficient understanding of the topics
discussed in the paper. ML models can be classified into broad
categories based on several criteria: the type of supervision, whether
or not the algorithm can learn incrementally from an incoming stream of
data (batch versus online learning), and how the models generalize
(instance-based versus model-based learning) [@debruyne2021]. Rabbani
et al. further classified the specific clinical chemistry uses into
five broad categories: predicting laboratory test values, improving
laboratory utilization, automating laboratory processes, promoting
precision laboratory test interpretation, and improving laboratory
medicine information systems [-@rabbani2022].

### Supervised vs Unsupervised Learning

Four important categories can be distinguished based on the amount and
type of supervision the models receive during training: supervised,
unsupervised, semi-supervised, and reinforcement learning. In supervised
learning, training data are labeled, and data samples are predicted with
knowledge about the desired solutions [@debruyne2021]. Supervised models
are typically used for classification and regression. Some of the
essential supervised algorithms are Linear Regression, Logistic
Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVMs),
Decision Trees (DTs), Random Forests (RFs), and supervised neural
networks. In unsupervised learning, training data are unlabeled. In
other words, observations are classified without prior knowledge about
the data samples [@debruyne2021]. Unsupervised algorithms can be used
for clustering (e.g., k-means clustering, density-based spatial
clustering of applications with noise, hierarchical cluster analysis),
visualization and dimensionality reduction (e.g., principal component
analysis (PCA), kernel PCA, locally linear embedding, t-distributed
stochastic neighbor embedding), anomaly detection and novelty detection
(e.g., one-class SVM, isolation forest), and association rule learning
(e.g., Apriori, Eclat). Some models can also deal with partially
labeled training data (i.e., semi-supervised learning). Finally, in
reinforcement learning, an agent (i.e., the learning system) learns what
actions to take to optimize the outcome of a strategy (i.e., a policy)
or to get the maximum cumulative reward [@debruyne2021]. This process
resembles a human learning to ride a bike and is typically used for
learning games, such as Go, chess, or even poker, or in settings where
the outcome is continuous rather than dichotomous (i.e., right or
wrong) [@debruyne2021]. The proposed study will use supervised learning,
because the data are labeled and a particular outcome is expected.
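
The difference between supervised and unsupervised learning can be illustrated with a minimal sketch. The data values, the labels, and the function names `predict_1nn` and `two_means` are invented for illustration and do not come from the cited sources. The first function is a one-nearest-neighbor classifier that relies on labels; the second is a one-dimensional k-means with two clusters that never sees them.

```python
# Toy one-dimensional data; values and labels are invented for illustration.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["low", "low", "low", "high", "high", "high"]


def predict_1nn(x, train_values, train_labels):
    """Supervised: return the label of the closest labeled training value."""
    nearest = min(range(len(train_values)),
                  key=lambda i: abs(train_values[i] - x))
    return train_labels[nearest]


def two_means(data, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means)."""
    centers = [min(data), max(data)]  # simple deterministic starting centers
    assignment = [0] * len(data)
    for _ in range(iterations):
        # Assign each value to its nearest center.
        assignment = [
            0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            for x in data
        ]
        # Move each center to the mean of the values assigned to it.
        for k in (0, 1):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assignment, centers


# The supervised model uses the labels; the unsupervised one does not.
supervised_prediction = predict_1nn(4.7, values, labels)
cluster_ids, cluster_centers = two_means(values)
print(supervised_prediction)
print(cluster_ids)
```

The clustering step recovers two groups but cannot name them; attaching meaning such as "low" or "high" requires the labels that only the supervised setting provides.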

### Machine Learning Workflow

Because this study will use supervised learning, the remainder of this
review focuses on the supervised workflow. An ML project can be broken
into three broad phases: cleaning and processing the data; training and
testing the model; and evaluating, deploying, and monitoring the model
[@debruyne2021]. In the first phase, data are collected, cleaned, and
labeled. Data cleaning or pre-processing is one of the essential steps
in designing a reliable model [@debruyne2021]. Common pre-processing
steps include handling missing data, detecting outliers, and encoding
categorical data. At this stage the data are also split into training
and testing sets, typically following somewhere near a 70-30 split.
These two data sets are used for different portions of the rest of the
model building. The training set is used to develop feature sets, train
our algorithms, tune hyperparameters, compare models, and perform all
the other activities required to choose a final model (e.g., the model
we want to put into production) [@boehmke2020]. Once the final model is
chosen, the test set is used to obtain an unbiased estimate of the
model's performance, which we refer to as the generalization error
[@boehmke2020]. Most of the time (as much as 80%) is invested in this
data-processing phase. In the second phase, features are engineered and
an ML model is trained and tested on the collected data. Feature
engineering is performed on the training set to select a good set of
features to train on. The ML model will only be able to learn
efficiently if the training data contain enough relevant features and
minimal irrelevant ones [@géron2019]. The data are then run through
several models: Linear Regression, Logistic Regression, KNN, SVMs, DTs,
and RFs.
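
The 70-30 split described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the function name `train_test_split`, the fixed seed, and the stand-in records are assumptions made for the example and are not the tooling used in the proposed study.

```python
import random


def train_test_split(rows, test_fraction=0.30, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                      # leave the input untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(100))  # stand-in for 100 labeled samples
train_set, test_set = train_test_split(records)
print(len(train_set), len(test_set))
```

Shuffling before splitting guards against any ordering in the source data, and fixing the seed makes the split repeatable. The test set is then set aside until a final model has been chosen.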

Once a model is selected, the third phase begins: evaluating the model's
performance. Historically, the performance of statistical models was
judged primarily by goodness-of-fit tests and the assessment of
residuals. Unfortunately, misleading conclusions may follow from
predictive models that pass these assessments [@breiman2001]. Today, it
has become widely accepted that a sounder approach to assessing model
performance is to assess the predictive accuracy via loss functions
[@boehmke2020]. *Loss functions* are metrics that compare the predicted
values to the actual values (the output of a loss function is often
referred to as the error or pseudo residual). When performing resampling
methods, we compare the predicted values for a validation set with the
actual target values. The overall validation error of the model is
computed by aggregating the errors across the entire validation data set
[@boehmke2020].

<!--# should I talk about Model types ?-->
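
As a minimal sketch of how a loss function aggregates errors over a validation set, the example below computes the mean squared error and its root for a regression target and the misclassification rate for a class label. These particular loss functions and all data values are chosen for illustration; the source does not state which losses the study will use.

```python
import math


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the units of the target."""
    return math.sqrt(mean_squared_error(actual, predicted))


def misclassification_rate(actual, predicted):
    """Fraction of observations whose predicted class is wrong."""
    return sum(a != p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical validation set for a regression model.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.0, 9.0]
mse = mean_squared_error(y_true, y_pred)
rmse = root_mean_squared_error(y_true, y_pred)

# Hypothetical validation set for a classifier.
class_true = ["normal", "abnormal", "normal", "normal"]
class_pred = ["normal", "normal", "normal", "normal"]
error_rate = misclassification_rate(class_true, class_pred)

print(mse, rmse, error_rate)
```

Each function reduces the per-observation errors to a single number, which is what allows candidate models to be compared on the same validation data.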

### Machine Learning in the Clinical Laboratory

<!--# Table needs to be modified -->

| **Author and Year** | **Objective and Machine Learning Task** | **Best Model** | **Major Themes** |
|-------------|---------------------------------|-------------|-------------|
| Azarkhish (2012) | Predict iron deficiency anemia and serum iron levels from CBC indices | Neural Network | Prediction |
| Cao (2012) | Triage manual review for urinalysis samples | Tree-based | Automation |
| Yang (2013) | Predict normal reference ranges of ESR for various laboratories based on geographic and other clinical features | Neural Network | Interpretation |

: Summary of characteristics of machine learning algorithms [@rabbani2022]. {#tbl-lab_ml}

<!--# Need to fill in this section -->

## Reflex Testing

The laboratory diagnosis of thyroid dysfunction relies on the
measurement of circulating concentrations of thyrotropin (TSH), free
thyroxine (fT4), and, in some cases, free triiodothyronine (fT3). TSH
measurement is generally regarded as the most sensitive initial
laboratory test for screening individuals for thyroid hormone
abnormalities [@woodmansee2018]. TSH and fT4 have a complex, nonlinear
relationship, such that small changes in fT4 result in relatively large
changes in TSH [@plebani2020]. Many clinicians and laboratories check
TSH alone as the initial test for thyroid problems and add an fT4
measurement only if the TSH is abnormal (outside the laboratory's
normal reference range); this is known as reflex testing
[@woodmansee2018]. Reflex testing became possible with the advent of
laboratory information systems (LIS) that were sufficiently flexible to
permit modification of existing test requests at various stages of the
analytical process [@srivastava2010]. Reflex testing is widely used, the
major aim being to optimize the use of laboratory tests. However, the
common practice of reflex testing relies simply on hard-coded rules that
allow no flexibility. For instance, in the case of TSH, fT4 will be
added to the patient order whenever the TSH value falls outside the
established laboratory reference range. A related problem is that the
thresholds used to trigger reflex addition of tests vary widely. Murphy
found that the hypocalcaemic threshold used to trigger magnesium
measurement varied from 1.50 mmol/L up to 2.20 mmol/L [-@murphy2021].
Even allowing for differences in the nature, size, and staffing of
hospital laboratories, and in the populations served, the extent of the
observed variation invites scrutiny [@murphy2021].
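
The hard-coded rule described above amounts to a single threshold comparison, as the following minimal sketch shows. The function name `needs_reflex_ft4` and the reference limits of 0.4 and 4.0 mIU/L are placeholder assumptions for illustration only; they are not taken from the cited sources, and each laboratory establishes its own reference range.

```python
def needs_reflex_ft4(tsh, lower=0.4, upper=4.0):
    """Return True when TSH falls outside the [lower, upper] reference range.

    The default limits are illustrative placeholders in mIU/L, not values
    from the cited literature.
    """
    return tsh < lower or tsh > upper


# Hypothetical TSH results in mIU/L.
tsh_results = [0.1, 0.4, 2.3, 4.0, 7.8]
reflexed = [tsh for tsh in tsh_results if needs_reflex_ft4(tsh)]
print(reflexed)

# The same result is handled differently when the upper limit changes.
print(needs_reflex_ft4(4.2), needs_reflex_ft4(4.2, upper=4.5))
```

The rule considers nothing beyond the single TSH value, and the last line shows how an identical result triggers or skips the reflex depending only on where a laboratory sets its threshold.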

<!--# insert table and study from Srivastava about hypo/hyper thyroid -->

<!--# data from woodmansee and plebani -->