updates

2023-06-07 12:16:22 -04:00 · 65f773674d
commit 65f773674d
parent 3251c95310
6 changed files with 69 additions and 2 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -67,3 +67,4 @@ ML/outputs
 Final Paper/
 test.Rda
--- a/ML/2-modeling.R
+++ b/ML/2-modeling.R
@@ -47,9 +47,12 @@ ds_train <- rsample$training(model_data_split)
 ds_test  <- rsample$testing(model_data_split)
 # verify distribution of data
-table(ds_train$ft4_dia) %>% prop.table()
+strata1 <- table(ds_train$ft4_dia) %>% prop.table() %>% tibble::enframe() %>% dplyr::rename(Train = value)
-table(ds_test$ft4_dia) %>% prop.table()
+strata2 <- table(ds_test$ft4_dia) %>% prop.table() %>% tibble::enframe() %>% dplyr::rename(Test = value)
 strata_table <- strata1 %>%
  dplyr::left_join(strata2, by = "name") %>%
  dplyr::rename(Class = name)
 # random forest classification -----------------------------------------------------------
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -10,6 +10,8 @@ book:
    - chapter1.qmd
    - chapter2.qmd
    - chapter3.qmd
    - chapter4.qmd
    - chapter5.qmd
    - references.qmd
  abstract: "This is a test to see what happens with this"
--- a/chapter4.qmd
+++ b/chapter4.qmd
@@ -0,0 +1,45 @@
 # Results
 ```{r}
 #| include: false
 #| cache: true
 library(magrittr)
 load("test.Rda")
 ```
 The final data set used for this analysis consisted of 11,340
 observations. Every observation contained a TSH and a Free T4 result and
 fewer than three missing results among the other analytes selected for
 the study. The data set was then randomly split into a training set
 containing 9,071 observations and a testing set containing 2,269
 observations, stratified on the Free T4 laboratory diagnostic value.
 @tbl-strata shows the resulting class proportions in each set.
 ```{r}
 #| label: tbl-strata
 #| tbl-cap: Data Stratification 
 #| echo: false
 strata_table %>% knitr::kable()
 ```
 First, the report shows the ability of classification algorithms to
 predict whether Free T4 will be diagnostic, with prediction quality
 measured by area under the curve (AUC) and accuracy. Data on the
 univariate association between each predictor analyte and the Free T4
 diagnostic value are then presented. Finally, the extent to which Free
 T4 can be predicted is assessed using correlation statistics for the
 relationship between measured and predicted Free T4 values.
 ## Predictability of Free T4 Classifications
 In clinical decision-making, a key consideration in interpreting
 numerical laboratory results is often simply whether the results fall
 within the normal reference range [@luo2016]. In the case of Free T4
 reflex testing, a result either falls within the normal range,
 indicating that the Free T4 is not diagnostic of hyperthyroidism or
 hypothyroidism, or falls outside that range, indicating that it is
 diagnostic.
--- a/chapter5.qmd
+++ b/chapter5.qmd
@@ -0,0 +1 @@
--- a/references.bib
+++ b/references.bib
@@ -335,3 +335,18 @@ DOI: 10.13026/S6N6-XD98}
 	url = {https://dl.acm.org/doi/10.1145/2939672.2939785},
 	address = {New York, NY, USA}
 }
@article{luo2016,
 	title = {Using Machine Learning to Predict Laboratory Test Results},
 	author = {Luo, Yuan and Szolovits, Peter and Dighe, Anand S. and Baron, Jason M.},
 	year = {2016},
 	month = {06},
 	date = {2016-06},
 	journal = {American Journal of Clinical Pathology},
 	pages = {778--788},
 	volume = {145},
 	number = {6},
 	doi = {10.1093/ajcp/aqw064},
 	note = {PMID: 27329638},
 	langid = {eng}
 }
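
The stratification check that this commit adds to `ML/2-modeling.R` can be tried in isolation. The following is a minimal base-R sketch, not the `rsample` code the commit uses: the simulated `labels` vector, its class names, and the `strata_check` data frame are hypothetical stand-ins for `ft4_dia` and `strata_table`, and the 80/20 ratio is an assumption that roughly matches the 9,071/2,269 split described in `chapter4.qmd`.

```r
# Sketch only: simulated labels stand in for the ft4_dia column (hypothetical data).
set.seed(42)
labels <- factor(sample(c("Hypo", "Normal", "Hyper"), size = 1000,
                        replace = TRUE, prob = c(0.1, 0.8, 0.1)))

# Stratified 80/20 split: draw 80% of the row indices within each class.
train_idx <- unlist(lapply(split(seq_along(labels), labels), function(idx) {
  idx[sample.int(length(idx), size = round(0.8 * length(idx)))]
}))
is_train <- seq_along(labels) %in% train_idx

# Class proportions in each partition, side by side (analogous to strata_table).
strata_check <- data.frame(
  Class = levels(labels),
  Train = as.vector(prop.table(table(labels[is_train]))),
  Test  = as.vector(prop.table(table(labels[!is_train])))
)
print(strata_check)
```

If the split is stratified correctly, the `Train` and `Test` columns should agree closely for every class, which is what the table in `chapter4.qmd` is meant to confirm.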