# Methods

need brief description and IRB statement

## Population and Data
This study used the Medical Information Mart for Intensive Care (MIMIC)
database [@johnsonalistair]. MIMIC is an extensive, freely available
database comprising de-identified health-related data from patients who
were admitted to the critical care units of the Beth Israel Deaconess
Medical Center. The database contains many kinds of information, but
only data from the patients and laboratory events tables are used in
this study. The study uses version IV of the database, comprising data
from 2008 to 2019.

## Data Variables and Outcomes
```{r}
#| include: false
source(here::here("ML", "1-data-exploration.R"))
```

A total of 18 variables were chosen for this study. The age and gender
of the patient were pulled from the patients table in the MIMIC database.
While this database contains some additional demographic information, it
is incomplete and thus unusable for this study. Sixteen lab values were
selected for this study:

- **BMP**: BUN, bicarbonate, calcium, chloride, creatinine, glucose,
    potassium, sodium

- **CBC**: hematocrit, hemoglobin, platelet count, red blood cell
    count, white blood cell count

- TSH

- Free T4

The unique patient ID and chart time were also retained to identify
each sample. Each sample contains one set of 16 lab values for a single
patient. A patient may have several samples in the data set, run at
different times. Rows were retained as long as they had fewer than
three missing results; these missing results can be filled in by
imputation later in the process. Samples were then restricted to those
with a TSH value above or below the reference range of 0.27 - 4.2
uIU/mL, since these are the samples that would have reflexed to Free T4
testing. After filtering, the final data set contained `r nrow(ds1)`
rows.

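
The two filtering rules above can be illustrated with a minimal sketch in base R. The toy data frame `labs`, its column names, and the helper `filter_samples()` are assumptions made for this example only; they are not the objects created by the study's data exploration script.

```r
# Hypothetical toy data: one row per sample, one column per lab value.
# Column names are illustrative, not the study's actual names.
labs <- data.frame(
  subject_id = c(1, 1, 2, 3, 4),
  tsh        = c(5.10, 1.20, 0.10, 6.30, 0.05),
  sodium     = c(140, 138, NA, NA, 141),
  potassium  = c(4.1, 4.0, NA, NA, 3.9),
  glucose    = c(95, 101, 88, NA, NA)
)

# TSH reference range in uIU/mL, as stated in the text.
tsh_low  <- 0.27
tsh_high <- 4.2

filter_samples <- function(df, lab_cols, max_missing = 2) {
  # Count the missing lab results in each row.
  n_missing <- rowSums(is.na(df[, lab_cols, drop = FALSE]))
  # Rule 1: keep rows with fewer than three missing results.
  keep_complete <- n_missing <= max_missing
  # Rule 2: keep rows whose TSH falls outside the reference range.
  keep_abnormal <- !is.na(df$tsh) & (df$tsh < tsh_low | df$tsh > tsh_high)
  df[keep_complete & keep_abnormal, , drop = FALSE]
}

filtered <- filter_samples(
  labs,
  lab_cols = c("tsh", "sodium", "potassium", "glucose")
)
```

In this toy example the second row is dropped because its TSH is within the reference range, and the fourth row is dropped because it has three missing results.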

Once the final data set was assembled, an outcome column was added to
indicate whether the Free T4 value was diagnostic. After the outcome
variable was added, the Free T4 value was dropped from each row.
@tbl-outcome_var shows how the outcomes were assigned, and
@tbl-data_summary shows the summary statistics of each variable selected
for the study.

| TSH Value     | Free T4 Value | Outcome             |
|---------------|---------------|---------------------|
| \>4.2 uIU/mL  | \>0.93 ng/dL  | Non-Hypothyroidism  |
| \>4.2 uIU/mL  | \<0.93 ng/dL  | Hypothyroidism      |
| \<0.27 uIU/mL | \<1.7 ng/dL   | Non-Hyperthyroidism |
| \<0.27 uIU/mL | \>1.7 ng/dL   | Hyperthyroidism     |

: Outcome Variable {#tbl-outcome_var}
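
The mapping in @tbl-outcome_var can be written as a small labeling function. The following is a minimal sketch in base R, not the study's code: the function name `label_outcome()`, its arguments, and the example values are assumptions for illustration. Values that fall exactly on a cutoff are not covered by the table, so this sketch leaves them as `NA`.

```r
# Cutoffs are taken from the outcome table
# (TSH in uIU/mL, Free T4 in ng/dL).
label_outcome <- function(tsh, ft4) {
  outcome <- rep(NA_character_, length(tsh))
  high_tsh <- tsh > 4.2
  low_tsh  <- tsh < 0.27
  # which() drops NA comparisons, so missing inputs stay NA.
  outcome[which(high_tsh & ft4 > 0.93)] <- "Non-Hypothyroidism"
  outcome[which(high_tsh & ft4 < 0.93)] <- "Hypothyroidism"
  outcome[which(low_tsh & ft4 < 1.7)]   <- "Non-Hyperthyroidism"
  outcome[which(low_tsh & ft4 > 1.7)]   <- "Hyperthyroidism"
  outcome
}

# Hypothetical example values, one per row of the outcome table.
samples <- data.frame(
  tsh = c(6.0, 8.5, 0.10, 0.05),
  ft4 = c(1.10, 0.60, 1.20, 2.40)
)
samples$outcome <- label_outcome(samples$tsh, samples$ft4)

# Drop Free T4 once the outcome is assigned, as described in the text.
samples$ft4 <- NULL
```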

```{r}
#| label: tbl-data_summary
#| tbl-cap: Data Summary
#| echo: false

summary_tbl %>% gtsummary$as_kable()
```

## Data Inspection
{#fig-distro_histo}

{#fig-corr_plot}

## Data Transformations
In progress

## Model Selection
In progress