# Discussion

## Summary of Results

The findings of this study indicate that, using other commonly ordered
laboratory tests, the diagnostic value of Free T4 can be predicted with
80% accuracy. When examining only the elevated TSH results, the
algorithm had a false positive rate of 2% and a false negative rate of
16%; in the original data, the Free T4 result was non-diagnostic for
hypothyroidism 76% of the time. For the decreased TSH results, the
algorithm had a false positive rate of 8% and a false negative rate of
20%; in the original data, the Free T4 result was non-diagnostic for
hyperthyroidism 67% of the time.

While the model achieved an overall accuracy of 80%, it struggled to
identify positives, with a sensitivity of only 63%. However, the model
did achieve a specificity of 89%. Sensitivity refers to a test's ability
to designate an individual with the disease as positive; a highly
sensitive test produces few false negative results, so fewer disease
cases are missed. Specificity refers to a test's ability to designate an
individual who does not have the disease as negative; a highly specific
test produces few false positive results. A test with low specificity
may not be feasible for screening, since many people without the disease
will screen positive and potentially receive unnecessary diagnostic
procedures [@newyorkstatedepartmentofhealth].
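
The sensitivity, specificity, and accuracy figures above all derive from
a 2×2 confusion matrix. The minimal Python sketch below spells out those
definitions, assuming "positive" means a diagnostic Free T4 result. The
`ConfusionMatrix` class and its counts are hypothetical, chosen only so
that the ratios reproduce the reported 63%, 89%, and 80%; they are not
the study's actual results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary outcome counts; "positive" is assumed to mean a diagnostic Free T4."""

    tp: int  # diagnostic Free T4, predicted diagnostic
    fn: int  # diagnostic Free T4, predicted non-diagnostic (a missed case)
    fp: int  # non-diagnostic Free T4, predicted diagnostic
    tn: int  # non-diagnostic Free T4, predicted non-diagnostic

    def sensitivity(self) -> float:
        # Share of truly positive cases that the model calls positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of truly negative cases that the model calls negative.
        return self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Share of all cases classified correctly.
        total = self.tp + self.fn + self.fp + self.tn
        return (self.tp + self.tn) / total


# Hypothetical counts chosen only so the ratios match the reported summary
# metrics; they are not the study's actual confusion matrix.
cm = ConfusionMatrix(tp=567, fn=333, fp=187, tn=1513)
print(f"sensitivity={cm.sensitivity():.2f}")  # sensitivity=0.63
print(f"specificity={cm.specificity():.2f}")  # specificity=0.89
print(f"accuracy={cm.accuracy():.2f}")        # accuracy=0.80
```

Because most reflex Free T4 results in this dataset were non-diagnostic,
negatives outnumber positives, so overall accuracy is pulled toward
specificity. This is how a model can reach 80% accuracy while still
missing more than a third of positive cases.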

In a study by Xu et al. [-@xu2019], a machine learning model was used to
predict laboratory test results as normal or abnormal in order to
identify low-yield, repetitive laboratory tests. Their group performed a
multi-site study of nearly 200,000 inpatient laboratory testing orders
to identify the most repetitive laboratory tests and then attempted to
predict the result of each one. They achieved an AUROC of \> 0.90 for 20
common laboratory tests, including sodium, hemoglobin, and lactate
dehydrogenase. They proposed a sensitive decision threshold, set to
achieve a negative predictive value of 95%, to power a clinical decision
support tool aimed at reducing low-yield, repetitive testing [@xu2019].
No other published studies in the clinical laboratory were found that
propose a threshold for judging the success of a machine learning model.
If the same 95% value is applied as a specificity threshold, the current
model, with a specificity of 89%, does not achieve the result necessary
to be considered final.
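
Xu et al. describe choosing a decision threshold by negative predictive
value (NPV) rather than by specificity. As an illustration only, the
sketch below shows one way such a threshold could be chosen from
predicted probabilities; it is not necessarily their procedure, and the
function names, scores, and labels are hypothetical. Here a score is a
model's estimated probability that a result is abnormal, every case
scoring below the threshold is called normal, and NPV is the fraction of
those called-normal cases that truly were normal.

```python
from typing import Optional, Sequence


def npv_at_threshold(scores: Sequence[float], labels: Sequence[bool],
                     threshold: float) -> Optional[float]:
    """NPV when every case with score < threshold is called normal (negative).

    scores: hypothetical model probability that the result is abnormal.
    labels: True if the result was actually abnormal.
    Returns None when no case falls below the threshold.
    """
    called_normal = [label for score, label in zip(scores, labels) if score < threshold]
    if not called_normal:
        return None
    true_negatives = sum(1 for label in called_normal if not label)
    return true_negatives / len(called_normal)


def highest_threshold_meeting_npv(scores: Sequence[float], labels: Sequence[bool],
                                  target_npv: float = 0.95) -> Optional[float]:
    """Largest candidate threshold whose NPV still meets target_npv.

    A larger threshold calls more tests normal (low-yield), so candidates
    are scanned from high to low and the first one that qualifies is kept.
    """
    for t in sorted(set(scores), reverse=True):
        npv = npv_at_threshold(scores, labels, t)
        if npv is not None and npv >= target_npv:
            return t
    return None


# Hypothetical scores and outcomes, for illustration only.
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.80, 0.90]
labels = [False, False, False, True, True, True, True]
print(highest_threshold_meeting_npv(scores, labels))  # 0.3
```

Scanning from the highest threshold downward returns the most permissive
cut-off that still meets the NPV target, which maximizes the number of
tests flagged as low-yield. NPV also depends on prevalence, so a
threshold tuned on ICU patients may not carry over to a screening
population.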

While TSH was expected to be the most important variable in building the
random forest models, it was entirely unexpected that the next three
most important variables would be hematology results. In the clinical
laboratory, TSH and CBCs are often run on different analyzers and in
different departments. Finding this slight association could be valuable
in building further algorithms. The currently available literature
states that TSH and fT4 have a complex, nonlinear relationship, such
that small changes in fT4 result in relatively large changes in TSH
[@plebani2020]. However, no literature was found that explores a
relationship between TSH and any of the CBC tests. If this link can be
explored further, it may help explain the small changes between fT4 and
TSH. While this study only focuses on high-level CBC testing, most
automated CBC analyzers can report many more parameters, which could be
used in the development of future algorithms.
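
The section does not state how variable importance was computed for the
random forest models. scikit-learn's `feature_importances_` attribute,
for example, is impurity-based; a model-agnostic alternative is
permutation importance, sketched below in plain Python as an
illustration rather than the study's method. Each feature column is
shuffled in turn, and the resulting drop in accuracy indicates how much
the model depends on that feature. The `toy_model` rule, its cut-off,
and the data are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows where the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances: List[float] = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with column j replaced by the shuffled values.
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:])
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    # Hypothetical rule that looks only at feature 0 (a TSH-like value);
    # the 10.0 cut-off is illustrative, not a clinical decision limit.
    return int(row[0] > 10.0)


# Feature 0: TSH-like values; feature 1: a hemoglobin-like column the model ignores.
X = [[float(tsh), 13.0 + (tsh % 3)] for tsh in range(1, 21)]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because `toy_model` ignores feature 1, shuffling that column leaves every
prediction unchanged and its importance is exactly 0.0, while shuffling
feature 0 lowers accuracy. Permutation importance can understate
features that are strongly correlated with one another, which is worth
keeping in mind when interpreting related CBC parameters.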

## Real World Applications

While the current algorithm did not quite achieve an accuracy ready for
deployment, it is hypothesized that a system like this could be
implemented in clinical decision support systems. As stated previously,
in current practice a physician (or other care provider) orders a TSH,
and if the value is outside the laboratory-established reference range,
a Free T4 is added on. In the current study database, this reflex
testing was non-diagnostic 76% of the time for elevated TSH values and
67% of the time for decreased TSH values. By first using clinical
decision support to predict whether the Free T4 would be diagnostic, the
care provider could weigh this prediction alongside the patient's other
signs and symptoms to determine whether a Free T4 is needed.
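
The workflow described above can be sketched as a small decision
function. This is a hypothetical illustration, not a deployed system:
the reference range values, the 0.5 probability cut-off, and the message
wording are assumptions, and `p_diagnostic` stands in for the output of
a model such as the one in this study. The function only advises; the
care provider still decides whether to order the Free T4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceRange:
    low: float   # lower limit of the laboratory TSH reference range
    high: float  # upper limit of the laboratory TSH reference range


def reflex_triggered(tsh: float, ref: ReferenceRange) -> bool:
    """Current practice as described: add Free T4 when TSH is out of range."""
    return tsh < ref.low or tsh > ref.high


def cds_message(tsh: float, ref: ReferenceRange, p_diagnostic: float,
                threshold: float = 0.5) -> str:
    """Advisory text for the care provider; the provider makes the final call.

    p_diagnostic: a model's estimated probability that the Free T4 will be
    diagnostic. threshold: a hypothetical cut-off that would need tuning.
    """
    if not reflex_triggered(tsh, ref):
        return "TSH within range: no Free T4 reflex"
    if p_diagnostic >= threshold:
        return "Free T4 likely diagnostic: reflex recommended"
    return "Free T4 likely non-diagnostic: review signs and symptoms before ordering"


# Hypothetical reference range in mU/L; each laboratory establishes its own.
ref = ReferenceRange(low=0.4, high=4.0)
print(cds_message(2.1, ref, p_diagnostic=0.9))
print(cds_message(7.8, ref, p_diagnostic=0.2))
```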

Similar to the argument of Luo et al. [-@luo2016], the observation that
the diagnostic information offered by Free T4 often duplicates what
other diagnostic tests provide suggests a notion of "informationally"
redundant testing. Informationally redundant testing is speculated to
occur in various diagnostic settings and workups, and to be much more
frequent than the more traditionally defined and narrowly framed notion
of redundant testing, which most often involves unintended duplication
of the same or similar tests. Under this narrow definition, redundant
laboratory testing is estimated to waste more than \$5 billion annually
in the United States, an amount potentially dwarfed by the waste from
informationally redundant testing [@luo2016]. However, since Free T4 and
all other tests used in this study are performed on automated
instruments, the cost savings to the laboratory and patient may be
minimal.

As the study by Rabbani et al. [-@rabbani2022] showed, machine learning
in the clinical laboratory is an emerging field; however, few existing
studies address predicting laboratory values from other results. The
few studies that do exist follow a similar premise: all aim to reduce
redundant laboratory testing and thereby lower the patient's cost.

## Study Limitations

While the MIMIC-IV database allowed for a first run of the study, it has
some limitations compared with other sources of patient results. The
MIMIC-IV database only contains results from ICU patients; thus, the
results may not be representative of patients typically screened for
hyperthyroidism or hypothyroidism. Tyler et al. [-@tyler2018] found that
laboratory value ranges from critically ill patients deviate
significantly from those of healthy controls. In their study,
distribution curves based on ICU data differed considerably from the
standard hospital range (mean \[SD\] overlapping coefficient, 0.51
\[0.32-0.69\]) [@tyler2018]. The data also span 2008 to 2019, and during
this time there could have been several unknown laboratory changes:
laboratories often change methods, reference ranges, or even vendors.
None of this information is available in the MIMIC database. A change in
method or vendor could cause a shift in results, thus causing the
algorithm to assign incorrect outcomes.
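
Tyler et al. summarize the mismatch between ICU and standard ranges with
an overlapping coefficient (OVL), the area shared by two probability
densities, $\mathrm{OVL} = \int \min(f_1(x), f_2(x))\,dx$. The Python
sketch below computes OVL numerically for two normal distributions,
purely to make the 0.51 figure concrete; the normality assumption and
the function names are illustrative and not taken from their method.

```python
import math


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def overlapping_coefficient(mean1: float, sd1: float, mean2: float, sd2: float,
                            n_steps: int = 20_000) -> float:
    """Area shared by two normal densities: integral of min(f1, f2) dx.

    1.0 means identical distributions; values near 0 mean almost no overlap.
    Uses the midpoint rule over +/- 8 SD around both means.
    """
    lo = min(mean1 - 8 * sd1, mean2 - 8 * sd2)
    hi = max(mean1 + 8 * sd1, mean2 + 8 * sd2)
    h = (hi - lo) / n_steps
    total = 0.0
    for i in range(n_steps):
        x = lo + (i + 0.5) * h
        total += min(normal_pdf(x, mean1, sd1), normal_pdf(x, mean2, sd2))
    return total * h


# Two hypothetical unit-SD distributions whose means differ by one SD.
print(round(overlapping_coefficient(0.0, 1.0, 1.0, 1.0), 3))  # 0.617
```

For two normal curves with equal SD, OVL has the closed form
$2\Phi(-|\mu_1 - \mu_2| / (2\sigma))$, so a shift of one SD already
lowers OVL to about 0.62. Under this equal-SD normal assumption, an OVL
of 0.51 corresponds to a shift of roughly 1.3 SD.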

The dataset also suffers from incompleteness. Because the database was
not explicitly designed for this study, many patients do not have
complete sets of lab results. The study therefore had to be selective
about which lab tests to include in order to retain as many paired TSH
and Free T4 results as possible. For instance, Luo et al. [-@luo2016]
selected a total of 42 different lab tests for their machine learning
study, compared with only 16 selected for this study. The patient
demographic data suffered from the same incompleteness; as a result,
only the age and gender of the patient were used in developing the
algorithm. An early study by Schectman et al. [-@schectman1991] found
that the mean TSH level of Black participants was 0.4 (SE 0.053) mU/L
lower than that of White participants after age and sex adjustment, with
race explaining 6.5 percent of the variation in TSH levels. This
variation should potentially be included in developing a future
algorithm; however, the current dataset has incomplete data for patient
race and ethnicity.
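
The trade-off between including more lab tests and keeping enough
complete records can be illustrated with complete-case filtering. The
sketch below is hypothetical: the field names, values, and
`complete_cases` helper are assumptions for illustration, and lab
results stored one measurement per row would first need to be reshaped
into one row per patient or encounter.

```python
from typing import Dict, List, Optional, Sequence

# One dictionary per patient; None marks a test that was never resulted.
Record = Dict[str, Optional[float]]


def complete_cases(records: Sequence[Record], required: Sequence[str]) -> List[Record]:
    """Keep only patients with a value for every required test."""
    return [r for r in records if all(r.get(test) is not None for test in required)]


# Hypothetical patients and field names, for illustration only.
records: List[Record] = [
    {"tsh": 6.2, "free_t4": 0.9, "hemoglobin": 11.8, "platelets": 210.0},
    {"tsh": 0.1, "free_t4": 2.3, "hemoglobin": None, "platelets": 180.0},
    {"tsh": 5.4, "free_t4": 1.1, "hemoglobin": 12.5, "platelets": None},
    {"tsh": 7.9, "free_t4": None, "hemoglobin": 10.2, "platelets": 250.0},
]

panel = ["tsh", "free_t4"]
for extra in ([], ["hemoglobin"], ["hemoglobin", "platelets"]):
    usable = complete_cases(records, panel + extra)
    print(panel + extra, len(usable))
```

Each added test can only shrink the usable cohort, which is why a
smaller panel (16 tests here versus 42 in Luo et al.) can preserve more
paired TSH and Free T4 results.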

As machine learning algorithms become more powerful, it is also vital
from an infrastructure standpoint to have the processing power needed to
run them. This becomes even more important when putting an algorithm
into practice, as the computer must be able to return results in mere
milliseconds.
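
Whether a model can respond within a few milliseconds can be checked
empirically before deployment. The sketch below times individual
prediction calls with Python's `time.perf_counter`; the 10 ms budget and
the stand-in model are hypothetical, and a real check would time the
full path, including data retrieval from the laboratory information
system.

```python
import time
from statistics import median
from typing import Any, Callable, Sequence


def median_latency_ms(predict: Callable[[Any], Any], inputs: Sequence[Any],
                      repeats: int = 50) -> float:
    """Median wall-clock time, in milliseconds, of one predict() call."""
    timings = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            predict(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)


def within_budget(predict: Callable[[Any], Any], inputs: Sequence[Any],
                  budget_ms: float = 10.0) -> bool:
    # budget_ms is a hypothetical service-level target, not a published requirement.
    return median_latency_ms(predict, inputs) <= budget_ms


# Stand-in model: a hypothetical one-line rule in place of a trained forest.
def stand_in_model(row):
    return int(row[0] > 10.0)


print(within_budget(stand_in_model, [[5.0], [12.0]]))
```

The median is used here for robustness to one-off pauses; in practice a
high percentile, such as the 95th or 99th, is often the more relevant
target because occasional slow responses are what users notice.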

## Future Studies

While the current algorithm is not quite ready for production use, it
does lead to many promising ideas. The first step in further developing
this algorithm would be collecting data on non-ICU patients, ideally
patients much closer to those screened for hypothyroidism and
hyperthyroidism. With data that better reflect this population, the
hyperparameters could be tuned further and the model retrained on these
data. It may also be worthwhile to test the current algorithm on
different patient data to assess its performance. This would be similar
to what Li et al. [-@li2022] did in their study to identify unnecessary
laboratory tests: after developing their algorithm on the MIMIC-III
database, they gathered data from Memorial Hermann Hospital in Houston,
Texas. However, their algorithm was designed for ICU patients, so this
was a more direct performance comparison. In the case of this study, the
algorithm was intended more as a proof of concept than as a
production-ready tool.

One of the most challenging parts of this study, and of any machine
learning in the clinical laboratory, is implementing it after the fact.
Developing an algorithm that can predict laboratory results is only half
the work. Many current laboratory information systems would be unable to
handle this type of clinical decision support, as it falls well outside
the expected behavior of these systems.