updates

2023-06-12 21:07:37 -04:00 · 8e5096be77
commit 8e5096be77
parent cc110f39ef
3 changed files with 104 additions and 11 deletions
--- a/chapter4.qmd
+++ b/chapter4.qmd
@@ -8,7 +8,13 @@ load(here::here("figures", "strata_table.Rda"))
 ```
-The final dataset used for this analysis consisted of 11,340 observations. All observations contained a TSH and Free T4 result and fewer than three missing results from all other analytes selected for the study. The dataset was then randomly split into a training set containing 9071 observations and a testing set containing 2269 observations. The split was stratified by the Free T4 laboratory diagnostic value. @tbl-strata shows the split percentages.
+The final dataset used for this analysis consisted of 11,340
 observations. All observations contained a TSH and Free T4 result and
 fewer than three missing results from all other analytes selected for the
 study. The dataset was then randomly split into a training set
 containing 9071 observations and a testing set containing 2269
 observations. The split was stratified by the Free T4 laboratory
 diagnostic value. @tbl-strata shows the split percentages.
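
As a quick check on the counts above, the split proportions can be worked out directly. This is a sketch of the arithmetic only and assumes nothing about how the split was coded:

$$
9071 + 2269 = 11340, \qquad \frac{9071}{11340} \approx 0.800, \qquad \frac{2269}{11340} \approx 0.200
$$

The counts are consistent with a split of roughly 80% training and 20% testing.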
 ```{r}
 #| label: tbl-strata
@@ -19,26 +25,65 @@ strata_table %>% knitr::kable()
 ```
-First, the report shows the ability of classification algorithms to predict whether Free T4 will be diagnostic, with the prediction quality measured by Area Under Curve (AUC) and accuracy. The importance of each predictor analyte to the Free T4 diagnostic value is then presented. Finally, the extent to which Free T4 can be predicted is assessed by examining the correlation statistics between measured and predicted Free T4 values.
+First, the report shows the ability of classification algorithms to
 predict whether Free T4 will be diagnostic, with the prediction quality
 measured by Area Under Curve (AUC) and accuracy. The importance of each
 predictor analyte to the Free T4 diagnostic value is then presented.
 Finally, the extent to which Free T4 can be predicted is assessed by
 examining the correlation statistics between measured and predicted Free
 T4 values.
 ## Predictability of Free T4 Classifications
-In clinical decision-making, a key consideration in interpreting numerical laboratory results is often simply whether the results fall within the normal reference range [@luo2016]. In the case of Free T4 reflex testing, the results will either fall within the normal range, indicating the Free T4 is not diagnostic of hyper- or hypothyroidism, or they will fall outside that range, indicating they are diagnostic. The final model achieved an accuracy of 0.796 and an AUC of 0.918. @fig-roc_curve provides ROC curves for each of the four outcome classes.
+In clinical decision-making, a key consideration in interpreting
 numerical laboratory results is often simply whether the results fall
 within the normal reference range [@luo2016]. In the case of Free T4
 reflex testing, the results will either fall within the normal range,
 indicating the Free T4 is not diagnostic of hyper- or hypothyroidism, or
 they will fall outside that range, indicating they are diagnostic. The
 final model achieved an accuracy of 0.796 and an AUC of 0.918.
@fig-roc_curve provides ROC curves for each of the four outcome classes.
-![ROC curves for each of the four outcome classes](figures/roc_curve_class){#fig-roc_curve}
+![ROC curves for each of the four outcome
 classes](figures/roc_curve_class){#fig-roc_curve}
-@fig-conf-matrix-class shows the confusion matrix of the final testing data. Of the 2269 total results, 1805 were predicted correctly, leaving 464 incorrectly predicted results. Of the incorrectly predicted results, 72 results predicted a diagnostic Free T4 when the correct result was non-diagnostic. 392 of the incorrectly predicted results were predicted as non-diagnostic when the correct result was diagnostic.
+@fig-conf-matrix-class shows the confusion matrix of the final testing
 data. Of the 2269 total results, 1805 were predicted correctly, leaving
 464 incorrectly predicted results. Of the incorrectly predicted results,
 72 results predicted a diagnostic Free T4 when the correct result was
 non-diagnostic. 392 of the incorrectly predicted results were predicted
 as non-diagnostic when the correct result was diagnostic.
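
As a consistency check, the reported accuracy can be recovered from the confusion matrix totals quoted above. The labels "correct" and "total" are introduced here for illustration and are not names from the analysis code:

$$
\text{accuracy} = \frac{\text{correct}}{\text{total}} = \frac{1805}{2269} \approx 0.796, \qquad 72 + 392 = 464 = 2269 - 1805
$$

The two error counts sum to the 464 incorrect predictions, and the ratio agrees with the accuracy of 0.796 reported for the final model.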
-![Final Model Confusion Matrix](figures/conf_matrix_class){#fig-conf-matrix-class}
+![Final Model Confusion
 Matrix](figures/conf_matrix_class){#fig-conf-matrix-class}
 ## Contributions of Individual Analytes
-Understanding how an ML model makes predictions helps build trust in the model and is the fundamental idea of the emerging field of interpretable machine learning (IML) [@greenwell2020]. @fig-vip-class shows the importance of features in the final model. Importance can be defined as the extent to which a feature has a "meaningful" impact on the predicted outcome [@laan2006]. As expected, TSH is the leading variable in importance rankings, leading all other variables by over 2,000 points. The next three variables are all components of a Complete Blood Count (CBC), followed by the patient's glucose value.
+Understanding how an ML model makes predictions helps build trust in the
 model and is the fundamental idea of the emerging field of interpretable
 machine learning (IML) [@greenwell2020]. @fig-vip-class shows the
 importance of features in the final model. Importance can be defined as
 the extent to which a feature has a "meaningful" impact on the predicted
 outcome [@laan2006]. As expected, TSH is the leading variable in
 importance rankings, leading all other variables by over 2,000 points.
 The next three variables are all components of a Complete Blood Count
 (CBC), followed by the patient's glucose value.
 ![Variable Importance Plot](figures/vip_class){#fig-vip-class}
 ## Predictability of Free T4 Results (Regression)
-Today, it has become widely accepted that a sounder approach to assessing model performance is to assess the predictive accuracy via loss functions. Loss functions are metrics that compare the predicted values to the actual value (the output of a loss function is often referred to as the error or pseudo residual) [@boehmke2020]. Root Mean Square Error (RMSE) was selected as the loss function for evaluating the final model, which achieved an RMSE of 0.334 on the final testing data. @fig-reg-pred shows the plotted results. The predicted values were also used to assign the diagnostic classification of Free T4. These classifications achieved an accuracy of 0.790, very similar to that of the classification model.
+Today, it has become widely accepted that a sounder approach to
 assessing model performance is to assess the predictive accuracy via
 loss functions. Loss functions are metrics that compare the predicted
 values to the actual value (the output of a loss function is often
 referred to as the error or pseudo residual) [@boehmke2020]. Root Mean
 Square Error (RMSE) was selected as the loss function for evaluating the
 final model, which achieved an RMSE of 0.334 on the final testing data.
@fig-reg-pred shows the plotted results. The predicted values were also
 used to assign the diagnostic classification of Free T4. These
 classifications achieved an accuracy of 0.790, very similar to that of
 the classification model.
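
For reference, the standard definition of RMSE over $n$ test observations is sketched below. The notation is introduced here and does not come from the analysis code: $y_i$ denotes a measured Free T4 value and $\hat{y}_i$ the corresponding predicted value.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

Because the squared differences are averaged before the square root is taken, RMSE is expressed in the same units as the measured Free T4 values, and larger individual errors are penalized more heavily than smaller ones.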
 ![Regression Predictions Plot](figures/reggression_pred){#fig-reg-pred}
--- a/chapter5.qmd
+++ b/chapter5.qmd
@@ -1,13 +1,46 @@
 # Discussion
-Intro Paragraph <!--# Write after I write everything else -->
+Intro Paragraph - In
 progress<!--# Write after I write everything else -->
 ## Summary of Results
 ## Study Limitations
-Section overview
+Section overview - In progress
 ### MIMIC Database
-While the MIMIC-IV database allowed for a first run of the study, it does suffer from some issues compared to other sources of patient results. The MIMIC-IV database only contains results from ICU patients. Thus, the results may not represent normal results for patients typically screened for hyper- or hypothyroidism. Tyler et al. found that laboratory value ranges from critically ill patients deviate significantly from those of healthy controls [-@tyler2018]. In their study, distribution curves based on ICU data differed significantly from the hospital standard range (mean \[SD\] overlapping coefficient, 0.51 \[0.32-0.69\]) [@tyler2018]. The data ranges from 2008 to 2019. During this time, there could have been several unknown laboratory changes. Laboratories often change methods, reference ranges, or even vendors. None of this information is available in the MIMIC database. A change in method or vendor could cause a shift in results, thus causing the algorithm to assign incorrect outcomes.
+While the MIMIC-IV database allowed for a first run of the study, it
 does suffer from some issues compared to other sources of patient
 results. The MIMIC-IV database only contains results from ICU patients.
 Thus, the results may not represent normal results for patients
 typically screened for hyper- or hypothyroidism. Tyler et al. found that
 laboratory value ranges from critically ill patients deviate
 significantly from those of healthy controls [-@tyler2018]. In their
 study, distribution curves based on ICU data differed significantly from
 the hospital standard range (mean \[SD\] overlapping coefficient, 0.51
 \[0.32-0.69\]) [@tyler2018]. The data ranges from 2008 to 2019. During
 this time, there could have been several unknown laboratory changes.
 Laboratories often change methods, reference ranges, or even vendors.
 None of this information is available in the MIMIC database. A change in
 method or vendor could cause a shift in results, thus causing the
 algorithm to assign incorrect outcomes.
 The dataset also suffers from incompleteness. Because the database was
 not explicitly designed for this study, many patients do not have
 complete sets of lab results. The study also had to pick and choose lab
 tests to allow for as many sets of TSH and Free T4 results as possible.
 For instance, in a study by Luo et al., a total of 42 different lab
 tests were selected for a Machine Learning study, compared to only 16
 selected for this study [-@luo2016]. The patient demographic data
 suffered from the same incompleteness. As a result, only the age and
 gender of the patient were used in developing the algorithm.
 An early study by Schectman et al. found the mean TSH level of Blacks
 was 0.4 (SE .053) mU/L lower than that for Whites after age and sex
 adjustment, race explaining 6.5 percent of the variation in TSH levels
 [-@schectman1991]. This variation in results should potentially be
 included in developing a future algorithm. However, as it stands, the
 current data set has incomplete data for patient race and ethnicity.
 ## Real World Applications 
--- a/references.bib
+++ b/references.bib
@@ -395,3 +395,18 @@ DOI: 10.13026/S6N6-XD98}
 	note = {PMID: 30646358
 PMCID: PMC6324400}
 }
@article{schectman1991,
 	title = {Report of an association between race and thyroid stimulating hormone level.},
 	author = {Schectman, J M and Kallenberg, G A and Hirsch, R P and Shumacher, R J},
 	year = {1991},
 	month = {04},
 	date = {1991-04},
 	journal = {American Journal of Public Health},
 	pages = {505--506},
 	volume = {81},
 	number = {4},
 	url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1405055/},
 	note = {PMID: 2003636
 PMCID: PMC1405055}
 }