I have since converted this blog to a Quarto blog, but I am leaving this post up in case anyone finds it useful.

Introduction

This metapost describes converting my personal blog from a blogdown site to a distill blog. I will not cover starting a site from scratch, as there are already several great resources for that. Instead, I will cover some of the challenges I ran into, along with the changes and tips I picked up along the way. If you are looking to start a site from scratch, check out these great resources:

This last post goes into some of the pros and cons of using distill instead of blogdown. If you want simplicity and don't need much customization, go with distill. If you want extensive customization and don't mind some frustration with Hugo, go with blogdown.

Challenges

The good thing about switching from blogdown to distill was that I ran into very few challenges! The distill documentation, combined with the two posts I listed, solved most of my problems. The only real issue involved distill's import_post() function, which, according to the docs, needs only the URL of a published post. For reasons I never figured out, I could not import the posts from my old blogdown site. This wasn't a big deal because I still had the original R Markdown documents, but it could be a problem if you don't.

Going Outside the Box

Code Folding

When I converted my blog on 12/30/2020, distill did not offer code folding out of the box. At that time, an excellent package called Codefolder added the functionality. Since the blog went live, code folding has been added to distill.¹ It can be enabled either for the whole document or for individual code chunks. The default caption is “Show Code”, but instead of setting code_folding=TRUE, you can pass a string to use as the caption.

+
# Some awesome code
+# That does awesome things
+
+
+
+
Customizing the Home Page

By default, a distill blog’s home page is the blog index page. I chose to make my home page a landing page about myself and to move the blog index to a separate page. When you create a new blog, this is the default YAML header for your index page:

---
title: "New Site"
site: distill::distill_website
listing: posts
---

The critical piece here is the line site: distill::distill_website, which tells R Markdown which site generator to use when rendering the website. For my home page, I decided to use the postcards package, which generates simple landing pages. I won’t go into every step, as there is already a great post by Alison Hill on how to do that. However, I will point out the most crucial part of the new index page: the YAML header needs to contain these two entries.

output:
  postcards::trestles
site: distill::distill_website

I have enjoyed the simplicity of Distill. While it is not nearly as customizable as blogdown, getting a blog site up and running in under an hour is pretty lovely. I hope to keep exploring what Distill has to offer and keep posting my updates!

Footnotes

¹ Note that, as of publishing, code folding is available only in the development version of distill.
Introduction

Recently I was struggling to find a data project to work on. I felt a bit stuck with some of my current projects, so I began scouring the internet for something new. I stumbled upon TidyTuesday (https://github.com/rfordatascience/tidytuesday), a weekly project that posts untidy data from various sources so people can practice cleaning and visualizing it. There are no right or wrong answers in TidyTuesday, which was exactly what I was looking for! This week's data set (well, a few weeks ago by the time this was posted) was about Historically Black Colleges and Universities. The posted data contained a few different data sets; I chose to work with the one on high school graduation rates. Throughout this post I will explain my cleaning steps and then present a few different graphs. Note that in the first section my code blocks build on one another, so the same code is repeated as I add more steps to it.

Load Data

In this first block we load some required libraries and read in the raw data. This dataset contains high school graduation rates by race. One thing to point out is the use of import::from(): while it is a bit overkill here, I used it mostly for practice. In this case I am importing the %nin% function from the Hmisc package, which is the opposite of the %in% operator from base R.

library(dplyr)
library(ggplot2)

import::from(Hmisc, `%nin%`)

hs_students_raw <- readxl::read_xlsx("104.10.xlsx", sheet = 1)

glimpse(hs_students_raw)

Rows: 48
Columns: 19
$ Total <dbl> 1910…
$ `Total, percent of all persons age 25 and over` <dbl> 13.5…
$ `Standard Errors - Total, percent of all persons age 25 and over` <chr> "(—)…
$ White1 <chr> "—",…
$ `Standard Errors - White1` <chr> "(†)…
$ Black1 <chr> "—",…
$ `Standard Errors - Black1` <chr> "(†)…
$ Hispanic <chr> "—",…
$ `Standard Errors - Hispanic` <chr> "(†)…
$ `Total - Asian/Pacific Islander` <chr> "—",…
$ `Standard Errors - Total - Asian/Pacific Islander` <chr> "(†)…
$ `Asian/Pacific Islander - Asian` <chr> "—",…
$ `Standard Errors - Asian/Pacific Islander - Asian` <chr> "(†)…
$ `Asian/Pacific Islander - Pacific Islander` <chr> "—",…
$ `Standard Errors - Asian/Pacific Islander - Pacific Islander` <chr> "(†)…
$ `American Indian/\r\nAlaska Native` <chr> "—",…
$ `Standard Errors - American Indian/\r\nAlaska Native` <chr> "(†)…
$ `Two or more race` <chr> "—",…
$ `Standard Errors - Two or more race` <chr> "(†)…

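If you would rather not pull in Hmisc just for the %nin% operator described above, a base-R stand-in is a one-liner. The definition below is a local sketch built with Negate(), an assumption on my part and not the Hmisc source code:

```r
# Local stand-in for Hmisc's %nin%: TRUE where x is NOT found in table
`%nin%` <- Negate(`%in%`)

years <- c(1985, 19103, 2016)
odd_years <- c(19103, 19203, 19303)

# The opposite of years %in% odd_years
keep <- years %nin% odd_years
print(keep)
```

Negate() simply wraps %in% and flips its logical result, so the operator behaves like a "not in" check.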
Now we start cleaning the data. First I filter for 1985 and later; before 1985 the data set is a bit sporadic, so to keep things clean I only look at 1985 onward. There are also three odd year values (19103, 19203, 19303) that I am not sure about, so I remove those rows as well. Because these values are numerically larger than 1985, the first filter does not drop them, which is why they need a filter of their own.
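To see why the odd values need their own filter, here is a small base-R check on a made-up vector of year values (the vector is hypothetical, not the real Total column):

```r
# Hypothetical year values, including one of the odd codes from the file
year_values <- c(1980, 1985, 19103, 2000, 2016)

# 19103 is numerically larger than 1985, so it survives the first filter
after_first <- year_values[year_values >= 1985]

# A second filter is needed to drop the odd codes explicitly
odd_codes <- c(19103, 19203, 19303)
after_second <- after_first[!(after_first %in% odd_codes)]

print(after_first)
print(after_second)
```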
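Next I convert all columns to numeric. Because missing values in the original spreadsheet are marked with placeholder symbols (a dash, or "(†)"), most of the columns were read in as character rather than numeric. as.numeric() turns those placeholders into NA, with a coercion warning.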
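A minimal base-R sketch of what the conversion does to those placeholders, using a hypothetical vector modeled on the glimpse() output above. suppressWarnings() is added here only to silence the "NAs introduced by coercion" warning; the pipeline in this post does not suppress it:

```r
# Placeholder strings like those in the raw data, plus two real-looking values
raw_values <- c("\u2014", "(\u2020)", "13.5", "85.2")

# as.numeric() cannot parse the placeholders, so they become NA
num_values <- suppressWarnings(as.numeric(raw_values))
print(num_values)
```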
Next I rename the columns. First I rename the Total column to year, since that column holds the year! Then I use stringr::str_remove_all() to remove the long phrase ", percent of all persons age 25 and over", as well as the number 1. For some reason the Black and White columns each end with a 1; I think this is some sort of footnote marker, but we will just remove it.

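hs_students <- hs_students_raw %>%
  # keep 1985 onward
  filter(Total >= 1985) %>%
  # drop the odd year values, which also pass the filter above
  filter(Total %nin% c(19103, 19203, 19303)) %>%
  # placeholder strings become NA
  mutate(across(everything(), as.numeric)) %>%
  rename(year = Total) %>%
  # strip the long suffix and the stray footnote "1"
  rename_with(
    ~stringr::str_remove_all(
      .
      ,", percent of all persons age 25 and over|1"
    )
  )
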
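The renaming step can be sketched in base R with gsub(), applying the same regular expression to a few column names copied from the glimpse() output. This is an illustrative sketch, not the code used in this post. One caveat of the `|1` alternative is that it removes every "1" in a name: that is fine for this file, but it would mangle a hypothetical column such as "Grade 12".

```r
col_names <- c(
  "Total",
  "Total, percent of all persons age 25 and over",
  "Standard Errors - Total, percent of all persons age 25 and over",
  "White1",
  "Standard Errors - White1",
  "Black1"
)

# Mirror rename(year = Total) first, so the year column is handled separately
col_names[col_names == "Total"] <- "year"

# Remove the long suffix and any stray footnote "1"
pattern <- ", percent of all persons age 25 and over|1"
renamed <- gsub(pattern, "", col_names)
print(renamed)

# The caveat: every "1" is removed, not just a trailing one
print(gsub(pattern, "", "Grade 12"))
```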
Then I drop the "Total - Asian/Pacific Islander" column along with its standard error. Each of these races is also stored in its own column, so if I needed the combined total later I could reconstruct it from those columns, given each group's population. I also remove the prefix "Asian/Pacific Islander - " from the beginning of each of those columns, so the names now tell me just which race each column refers to.
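A base-R sketch of the same two steps on the relevant column names: grepl() stands in for select(-contains(...)) and gsub() for str_remove_all(). It is an approximation rather than the post's code. One difference is that dplyr's contains() ignores case by default, whereas the fixed = TRUE match here is case-sensitive.

```r
col_names <- c(
  "Total - Asian/Pacific Islander",
  "Standard Errors - Total - Asian/Pacific Islander",
  "Asian/Pacific Islander - Asian",
  "Standard Errors - Asian/Pacific Islander - Asian",
  "Asian/Pacific Islander - Pacific Islander"
)

# Drop the combined column and its standard error
kept <- col_names[!grepl("Total - Asian/Pacific Islander", col_names, fixed = TRUE)]

# Strip the group prefix wherever it appears
stripped <- gsub("Asian/Pacific Islander - ", "", kept, fixed = TRUE)
print(stripped)
```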
Now I simply pivot the data longer. A nice trick I learned: since I want to pivot every column except year, I can use the minus sign (-year) to select every column except year in the pivot.
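To make the shape change concrete, here is a base-R sketch of what pivot_longer(-year) produces, using a tiny made-up wide table (the numbers are hypothetical). The rows come out in the same order tidyr uses: all columns for the first year, then all columns for the next.

```r
# A tiny hypothetical wide table: one year column plus value columns
wide <- data.frame(
  year  = c(1985, 1990),
  White = c(80, 82),
  Black = c(60, 66)
)

# Everything except year, like -year in the pivot
value_cols <- setdiff(names(wide), "year")

# One row per (year, column) pair
long <- data.frame(
  year  = rep(wide$year, each = length(value_cols)),
  name  = rep(value_cols, times = nrow(wide)),
  value = as.vector(t(as.matrix(wide[value_cols]))),
  stringsAsFactors = FALSE
)
print(long)
```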
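With the data in long form, I separate the automatically generated name column into two columns, stat and race. The data contains both the percentage that graduated and its standard error. Names without a "Standard Errors - " prefix have nothing to split, so their stat value is left as NA; I replace those NAs with Total, since those rows hold the total percentage and the remaining rows hold the standard error. Finally, I change "Standard Errors" to "Standard Error" to make it singular.
I then pivot the data back to wide form and use the janitor package to clean the column names, which puts them in lowercase with underscores instead of spaces. To make graphing easier with the scales package, I also divide both value columns by 100; we will see why in the graphs.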
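The split-and-fill logic can be sketched in base R on a hypothetical subset of the long-form names. A name with no "- " separator keeps its single piece as the race and leaves stat missing, which mirrors separate(..., fill = "left"). Splitting on "- " leaves a trailing space on "Standard Errors ", so this sketch trims it with trimws() for readability.

```r
# Hypothetical subset of names as they look after pivoting longer
names_long <- c("White", "Standard Errors - White", "Total", "Standard Errors - Total")

pieces <- strsplit(names_long, "- ", fixed = TRUE)

# Two pieces: stat and race. One piece: race only, stat missing
stat <- vapply(pieces, function(p) if (length(p) == 2) trimws(p[1]) else NA_character_, character(1))
race <- vapply(pieces, function(p) p[length(p)], character(1))

# Rows without a standard-error prefix hold the total percentage
stat[is.na(stat)] <- "Total"
stat <- sub("Standard Errors", "Standard Error", stat, fixed = TRUE)

print(data.frame(stat, race, stringsAsFactors = FALSE))
```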
It’s now time to graph. Notice the use of scales::label_percent() for the y-axis labels. label_percent() multiplies values by 100, so if the numbers had been left on their original scale (75 instead of 0.75), the labels would have read 7500%, which is obviously very wrong! I also use geom_ribbon() to draw a standard error band around each line. Notice the use of color = NA: because color is mapped in the main aes() call, the ribbon would otherwise draw colored outlines, which I did not like, and color = NA turns them off. (There are a few other ways to turn them off, but this seemed the easiest to me.) Finally, notice the aesthetics argument in scale_color_brewer(). Setting it applies the same palette to both color and fill; without it, the colors of the bands and the lines don't match!
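A quick base-R check of the scaling argument above. percent_label() is a hypothetical helper that only mimics label_percent()'s default scale of 100 and rounds to whole percentages. It is not the scales implementation. The same sketch computes the band limits as total plus or minus the standard error, with ymin below the line and ymax above it; the proportions are made up.

```r
# Hypothetical helper: multiply by 100, round, append "%"
percent_label <- function(x) sprintf("%.0f%%", x * 100)

print(percent_label(c(0.75, 75)))

# Hypothetical proportions (already divided by 100) and standard errors
total <- c(0.80, 0.85)
standard_error <- c(0.01, 0.02)

# Band limits for geom_ribbon(): lower edge below the line, upper edge above
ymin <- total - standard_error
ymax <- total + standard_error
print(data.frame(total, ymin, ymax))
```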
While I am sure there is much more that could be done with this data, this is where I am going to stop for today. Our graphs clearly show a divide in graduation rates by race; sex, however, does not seem to have much of an effect on graduation rates.
\ No newline at end of file
diff --git a/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-10-1.png b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-10-1.png
new file mode 100644
index 0000000..1394b8b
Binary files /dev/null and b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-10-1.png differ
diff --git a/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-14-1.png b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-14-1.png
new file mode 100644
index 0000000..fee0412
Binary files /dev/null and b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-14-1.png differ
diff --git a/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-15-1.png b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-15-1.png
new file mode 100644
index 0000000..ae64cfa
Binary files /dev/null and b/_site/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment_files/figure-html/unnamed-chunk-15-1.png differ
diff --git a/_site/search.json b/_site/search.json
index 8f97459..c916d96 100644
--- a/_site/search.json
+++ b/_site/search.json
@@ -39,7 +39,7 @@
"href": "blog.html",
"title": "Posts",
"section": "",
- "text": "Diabetes in Rural North Carolina : Data Collection and Cleaning\n\n\nThis is the second post in the series exploring Diabetes in rural North Carolina. This post will explore the data used for this project, from collection, cleaning, and analysis ready data.\n\n\n\n\n\n\n\n\n\nJul 25, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nImporting Excel Data with Multiple Header Rows\n\n\nA solution for importing Excel Data that contains two header rows.\n\n\n\n\n\n\n\n\n\nJun 22, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nBasic Exploration of WHO Tuberculosis Data\n\n\nToday I am going to dive into some real life data from the World Health Organization (WHO), exploring new and relapse cases of Tuberculosis. I clean up the data, and then make a few graphs to explore different variables.\n\n\n\n\n\n\n\n\n\nFeb 13, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nLine Graphs and Interactivity\n\n\nTableau for Healthcare Chapter 10. Static and Interactive examples\n\n\n\n\n\n\n\n\n\nFeb 10, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nFacets and a Lesson in Humility\n\n\nA look at Tableau for Healthcare Chapter 8. Table Lens graph.\n\n\n\n\n\n\n\n\n\nJan 29, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nMy Start to R\n\n\nA short introduction to my blog, and R journey.\n\n\n\n\n\n\n\n\n\nJan 24, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\nNo matching items"
+ "text": "TidyTuesday 2021 Week 6: HBCU Enrollment\n\n\nTidyTuesday 2021 Week 6: HBCU Enrollment. Posts looks at tidying the data ,as well as making some graphs about the data.\n\n\n\n\nTidyTuesday\n\n\n\n\n\n\n\n\n\n\n\nFeb 26, 2021\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nConverting From Blogdown to Distill\n\n\nA meta post on transferring from a blogdown to distill blog site\n\n\n\n\nDistill\n\n\n\n\n\n\n\n\n\n\n\nJan 12, 2021\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nDiabetes in Rural North Carolina : Data Collection and Cleaning\n\n\nThis is the second post in the series exploring Diabetes in rural North Carolina. This post will explore the data used for this project, from collection, cleaning, and analysis ready data.\n\n\n\n\n\n\n\n\n\nJul 25, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nImporting Excel Data with Multiple Header Rows\n\n\nA solution for importing Excel Data that contains two header rows.\n\n\n\n\n\n\n\n\n\nJun 22, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nBasic Exploration of WHO Tuberculosis Data\n\n\nToday I am going to dive into some real life data from the World Health Organization (WHO), exploring new and relapse cases of Tuberculosis. I clean up the data, and then make a few graphs to explore different variables.\n\n\n\n\n\n\n\n\n\nFeb 13, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nLine Graphs and Interactivity\n\n\nTableau for Healthcare Chapter 10. Static and Interactive examples\n\n\n\n\n\n\n\n\n\nFeb 10, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nFacets and a Lesson in Humility\n\n\nA look at Tableau for Healthcare Chapter 8. Table Lens graph.\n\n\n\n\n\n\n\n\n\nJan 29, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\n \n\n\n\n\nMy Start to R\n\n\nA short introduction to my blog, and R journey.\n\n\n\n\n\n\n\n\n\nJan 24, 2020\n\n\nKyle Belanger\n\n\n\n\n\n\nNo matching items"
},
{
"objectID": "posts/post-with-code/index.html",
@@ -173,5 +173,40 @@
"title": "Importing Excel Data with Multiple Header Rows",
"section": "",
"text": "Problem\nRecently I tried to important some Microsoft Excel data into R, and ran into an issue were the data actually had two different header rows. The top row listed a group, and then the second row listed a category within that group. Searching goggle I couldn’t really find a good example of what I was looking for, so I am putting it here in hopes of helping someone else!\n\n\nExample Data\nI have created a small Excel file to demonstrate what I am talking about. Download it here. This is the data from Excel. \n\n\nCheck Data\nFirst we will read the file in using the package readxl and view the data without doing anything special to it.\n\nlibrary(readxl) # load the readxl library\nlibrary(tidyverse) # load the tidyverse for manipulating the data\nfile_path <- \"example_data.xlsx\" # set the file path\nds0 <- read_excel(file_path) # read the file\nds0\n\n# A tibble: 7 × 7\n Name `Test 1` ...3 ...4 `Test 2` ...6 ...7 \n <chr> <chr> <chr> <chr> <chr> <chr> <chr>\n1 <NA> Run 1 Run 2 Run 3 Run 1 Run 2 Run 3\n2 Max 22 23 24 25 26 27 \n3 Phoebe 34 34 32 34 51 12 \n4 Scamp 35 36 21 22 23 24 \n5 Chance 1234 1235 1236 1267 173 1233 \n6 Aimee 420 123 690 42 45 12 \n7 Kyle 22 23 25 26 67 54 \n\n\n\n\nNew Header Names\n\nStep 1\nFirst lets read back the data, this time however with some options. 
We will set the n_max equal to 2, to only read the first two rows, and set col_names to FALSE so we do not read the first row as headers.\n\nds1 <- read_excel(file_path, n_max = 2, col_names = FALSE)\nds1\n\n# A tibble: 2 × 7\n ...1 ...2 ...3 ...4 ...5 ...6 ...7 \n <chr> <chr> <chr> <chr> <chr> <chr> <chr>\n1 Name Test 1 <NA> <NA> Test 2 <NA> <NA> \n2 <NA> Run 1 Run 2 Run 3 Run 1 Run 2 Run 3\n\n\n\n\nStep 2\nNow that we have our headers lets first transpose them to a vertical matrix using the base function t(), then we will turn it back into a tibble to allow us to use tidyr fill function.\n\nnames <- ds1 %>%\n t() %>% #transpose to a matrix\n as_tibble() #back to tibble\nnames\n\n# A tibble: 7 × 2\n V1 V2 \n <chr> <chr>\n1 Name <NA> \n2 Test 1 Run 1\n3 <NA> Run 2\n4 <NA> Run 3\n5 Test 2 Run 1\n6 <NA> Run 2\n7 <NA> Run 3\n\n\nNote that tidyr fill can not work row wise, thus the need to flip the tibble so it is long vs wide.\n\n\nStep 3\nNow we use tidyr fill function to fill the NA’s with whatever value it finds above.\n\nnames <- names %>% fill(V1) #use dplyr fill to fill in the NA's\nnames\n\n# A tibble: 7 × 2\n V1 V2 \n <chr> <chr>\n1 Name <NA> \n2 Test 1 Run 1\n3 Test 1 Run 2\n4 Test 1 Run 3\n5 Test 2 Run 1\n6 Test 2 Run 2\n7 Test 2 Run 3\n\n\n\n\nStep 4\nThis is where my data differed from many of the examples I could find online. Because the second row is also a header we can not just get rid of them. 
We can solve this using paste() combined with dplyr mutate to form a new column that combines the first and second column.\n\nnames <- names %>%\n mutate(\n new_names = paste(V1,V2, sep = \"_\")\n )\nnames\n\n# A tibble: 7 × 3\n V1 V2 new_names \n <chr> <chr> <chr> \n1 Name <NA> Name_NA \n2 Test 1 Run 1 Test 1_Run 1\n3 Test 1 Run 2 Test 1_Run 2\n4 Test 1 Run 3 Test 1_Run 3\n5 Test 2 Run 1 Test 2_Run 1\n6 Test 2 Run 2 Test 2_Run 2\n7 Test 2 Run 3 Test 2_Run 3\n\n\n\n\nStep 4a\nOne more small clean up task, in the example data the first column header Name, did not have a second label, this has created a name with an NA attached. We can use stringr to remove this NA.\n\nnames <- names %>% mutate(across(new_names, ~str_remove_all(.,\"_NA\")))\nnames\n\n# A tibble: 7 × 3\n V1 V2 new_names \n <chr> <chr> <chr> \n1 Name <NA> Name \n2 Test 1 Run 1 Test 1_Run 1\n3 Test 1 Run 2 Test 1_Run 2\n4 Test 1 Run 3 Test 1_Run 3\n5 Test 2 Run 1 Test 2_Run 1\n6 Test 2 Run 2 Test 2_Run 2\n7 Test 2 Run 3 Test 2_Run 3\n\n\n\n\nStep 5\nNow that are new name column is the way we want it, we can use dpylrs pull to return a vector of just that column\n\nnames <- names %>% pull(new_names)\n\n\n\n\nFinal Data\nNow that we have a vector of column names lets read in the original file using our new names. We set the skip argument to 2, to skip the first two rows, and set col_names equal to our vector of names. 
Note the last step I used the janitor package to provide names in snake case (the default for the clean names function.)\n\nexample_data <- readxl::read_excel(file_path, col_names = names, skip = 2) %>%\n janitor::clean_names()\nexample_data\n\n# A tibble: 6 × 7\n name test_1_run_1 test_1_run_2 test_1_run_3 test_2_run_1 test_2_run_2\n <chr> <dbl> <dbl> <dbl> <dbl> <dbl>\n1 Max 22 23 24 25 26\n2 Phoebe 34 34 32 34 51\n3 Scamp 35 36 21 22 23\n4 Chance 1234 1235 1236 1267 173\n5 Aimee 420 123 690 42 45\n6 Kyle 22 23 25 26 67\n# ℹ 1 more variable: test_2_run_3 <dbl>\n\n\n\n\nOther Help\nWhile searching for some solutions to my problem I found two good examples, however neither did exactly what I was trying to do.\n\nThis post by Lisa Deburine is pretty close to what I was trying to accomplish and gave me a good starting point. Read it here\nThis post by Alison Hill solves a simlar but slightly different problem. In her data the 2nd row is actually metadata not a second set of headers. Read it here\n\n\n\n\n\nReusehttps://creativecommons.org/licenses/by/4.0/CitationBibTeX citation:@online{belanger2020,\n author = {Belanger, Kyle},\n title = {Importing {Excel} {Data} with {Multiple} {Header} {Rows}},\n date = {2020-06-22},\n langid = {en}\n}\nFor attribution, please cite this work as:\nBelanger, Kyle. 2020. “Importing Excel Data with Multiple Header\nRows.” June 22, 2020."
+ },
+ {
+ "objectID": "posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment.html",
+ "href": "posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment.html",
+ "title": "TidyTuesday 2021 Week 6: HBCU Enrollment",
+ "section": "",
+ "text": "Introduction\nRecently I was struggling to find a data project to work on, I felt a bit stuck with some of my current projects, so I begun to scour the internet to find something to work on. I stumbled upon (TidyTuesday)[https://github.com/rfordatascience/tidytuesday] a weekly project where untidy data is posted from various sources, for the goal of practicing cleaning and visualizing. There is not right or wrong answers for TidyTuesday, this was exactly what I was looking for! This week (well by the time this was posted, a few weeks ago) the data set was about Historically Black Colleges and Universities. Within the posted data there were a few different data sets, I chose to work with the set dealing with High school Graduation rates, throughout this post I will explain my steps for cleaning and then present a few different graphs. It should also be noted that in the first section my code blocks will build upon themselves, so the same code will be duplicated as I add more steps to it.\n\n\nLoad Data\nIn this first block we will load some required libraries as well as load in the raw data. This dataset contains data for Highschool graduation rates by race. One thing to point out here is the use of import::from(), will its use here is a bit overkill, it was more for my practice. 
In this case I am importing the function %nin from the Hmisc package, which in the opposite of the function %in% from base R.\n\nlibrary(dplyr)\nlibrary(ggplot2)\n\nimport::from(Hmisc, `%nin%`)\n\nhs_students_raw <- readxl::read_xlsx(\"104.10.xlsx\", sheet = 1)\n\nglimpse(hs_students_raw)\n\nRows: 48\nColumns: 19\n$ Total <dbl> 1910…\n$ `Total, percent of all persons age 25 and over` <dbl> 13.5…\n$ `Standard Errors - Total, percent of all persons age 25 and over` <chr> \"(—)…\n$ White1 <chr> \"—\",…\n$ `Standard Errors - White1` <chr> \"(†)…\n$ Black1 <chr> \"—\",…\n$ `Standard Errors - Black1` <chr> \"(†)…\n$ Hispanic <chr> \"—\",…\n$ `Standard Errors - Hispanic` <chr> \"(†)…\n$ `Total - Asian/Pacific Islander` <chr> \"—\",…\n$ `Standard Errors - Total - Asian/Pacific Islander` <chr> \"(†)…\n$ `Asian/Pacific Islander - Asian` <chr> \"—\",…\n$ `Standard Errors - Asian/Pacific Islander - Asian` <chr> \"(†)…\n$ `Asian/Pacific Islander - Pacific Islander` <chr> \"—\",…\n$ `Standard Errors - Asian/Pacific Islander - Pacific Islander` <chr> \"(†)…\n$ `American Indian/\\r\\nAlaska Native` <chr> \"—\",…\n$ `Standard Errors - American Indian/\\r\\nAlaska Native` <chr> \"(†)…\n$ `Two or more race` <chr> \"—\",…\n$ `Standard Errors - Two or more race` <chr> \"(†)…\n\n\nNow we are going to start cleaning the data. First I am going to filter for years 1985 and up, prior to this year the data set is a bit spardic, so to keep it clean I am only going to look at 1985 and up. 
There are also 3 odd years (19103,19203,19303) that I am not sure what those are so I will remove that data as well.\n\nhs_students <- hs_students_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303))\n\nNext I am going to convert all columns to be numeric, because of some blanks in the original import all of the columns read in as characters instead of numeric.\n\nhs_students <- hs_students_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303)) %>% \n mutate(across(everything(), as.numeric))\n\nNext I am going to rename the columns. First I rename the column Total, into year, as this column holds the year! Then I use stringr::str_remove_all to remove the long phrase ‘percent of all persons age 25 and over’, as well as the number 1. For some reason the Black and White columns each have a number 1 at the end, I think this is for some sort of footnote but we will just remove it.\n\nhs_students <- hs_students_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303)) %>% \n mutate(across(everything(), as.numeric)) %>% \n rename(year = Total) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\", percent of all persons age 25 and over|1\"\n )\n )\n\nThen I am going to drop the column ‘Total - Asian/Pacific Islander’, each of these races is stored in a seperate column so if I needed the total later for some reason I could calculate it. 
I am also going to drop the string “Asian/Pacific Islander -”, from the begin of each of those columns, so they will now tell me just which race each column refers too.\n\nhs_students <- hs_students_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303)) %>% \n mutate(across(everything(), as.numeric)) %>% \n rename(year = Total) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\", percent of all persons age 25 and over|1\"\n )\n ) %>% \n select(-contains(\"Total - Asian/Pacific Islander\")) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\"Asian/Pacific Islander - \"\n )\n )\n\nI now simply pivot the data longer. A nice trick I learned since I want to pivot everything expect the year column is to use the minus sign to select every column expect the year column in the pivot.\n\nhs_students_long <- hs_students %>% \n tidyr::pivot_longer(-year)\n\nWith the data now in long form I am going to separate the automatically generate name column into two columns titled, stat and race. The data contains both the percent that graduated and the standard error. Then I replace all the NA’s in the stat column with Total, as these are the total percentage and the other rows will be the standard error. Last I dropped the s from standard errors to make it singular.\n\nhs_students_long <- hs_students %>% \n tidyr::pivot_longer(-year) %>% \n tidyr::separate(name, c(\"stat\", \"race\"), sep = \"- \", fill = \"left\") %>% \n tidyr::replace_na(list(stat = \"Total\")) %>% \n mutate(\n across(\n stat\n ,~stringr::str_replace(\n .\n ,\"Standard Errors\"\n ,\"Standard Error\"\n )\n )\n )\n\nI know pivot the date back to wide form, and use the Janitor package to clean the column names. This puts them in lowercase with _ for spaces.\n\nhs_students_wide <- hs_students_long %>% \n tidyr::pivot_wider(names_from = stat, values_from = value) %>% \n janitor::clean_names()\n\nTo make graphing a bit easier with the scales package, I divide both columns by 100. 
We will see why in the graphs.\n\nhs_students_wide <- hs_students_long %>% \n tidyr::pivot_wider(names_from = stat, values_from = value) %>% \n janitor::clean_names() %>% \n mutate(across(total:standard_error, ~.x/100))\n\nIt’s now time to graph. Notice the use scales::label_percent() as the labels value for the y axis. If the numbers were left as the default values (75 vs 0.75) the percentages would have been 750%, which is obviously very wrong! I also use geom_ribbon to draw the standard error bars around each line. Notice the use of color = NA, by default the ribbon has outlines, I did not like this so doing color = NA turns them off. (It should be noted there are a few other solutions to turning them off but this seemed the easiest to me). Last we see the use of the aesthetics argument in scale_color_brewer. By setting this we match the color and fill to be the same color, without setting this, the colors of the error bars and lines don’t match!\n\nhs_students_wide <- hs_students_wide %>% \n mutate(\n ymax = total - standard_error\n ,ymin = total + standard_error\n )\n\ng1 <- hs_students_wide %>% \n filter(race != \"Total\") %>% \n ggplot(aes(x = year, y = total, group = race, color = race)) +\n geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = race), alpha = 0.3, color = NA) +\n geom_line() +\n scale_x_continuous(breaks = seq(1985,2016,3)) +\n scale_y_continuous(labels = scales::label_percent()) +\n scale_color_brewer(palette = \"Dark2\", aesthetics = c(\"color\", \"fill\")) +\n theme_bw() +\n labs(\n x = NULL\n ,y = NULL\n ,title = glue::glue(\"Percentage of High School Graduates by Race\"\n ,\"\\n\"\n ,\"1985 - 2016\")\n ,color = \"Race\" \n ,fill = \"Race\"\n ) +\n theme(\n plot.title = element_text(hjust = 0.5)\n ,legend.title = element_text(hjust = 0.5)\n )\n \ng1\n\n\n\n\n\n\nLoad Male/Female Data\nNow the file also contains the same information but split by male and female. 
I am going to load in that data.\n\nmale_hs_raw <- readxl::read_excel(\"104.10.xlsx\", sheet = 3)\nfemale_hs_raw <- readxl::read_excel(\"104.10.xlsx\", sheet = 5)\n\nHere I will use the same manipulations as above, the only addition is adding a column for sex.\n\nmale_hs <- male_hs_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303)) %>% \n mutate(across(everything(), as.numeric)) %>% \n rename(year = Total) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\", percent of all persons age 25 and over|1\"\n )\n ) %>% \n select(-contains(\"Total - Asian/Pacific Islander\")) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\"Asian/Pacific Islander - \"\n )\n ) %>% \n tidyr::pivot_longer(-year) %>% \n tidyr::separate(name, c(\"stat\", \"race\"), sep = \"- \", fill = \"left\") %>% \n tidyr::replace_na(list(stat = \"Total\")) %>% \n mutate(\n across(\n stat\n ,~stringr::str_replace(\n .\n ,\"Standard Errors\"\n ,\"Standard Error\"\n )\n )\n ,sex = \"Male\"\n )\n\n\nfemale_hs <- female_hs_raw %>% \n filter(Total >= 1985) %>% \n filter(Total %nin% c(19103, 19203, 19303)) %>% \n mutate(across(everything(), as.numeric)) %>% \n rename(year = Total) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\", percent of all persons age 25 and over|1\"\n )\n ) %>% \n select(-contains(\"Total - Asian/Pacific Islander\")) %>% \n rename_with(\n ~stringr::str_remove_all(\n .\n ,\"Asian/Pacific Islander - \"\n )\n ) %>% \n tidyr::pivot_longer(-year) %>% \n tidyr::separate(name, c(\"stat\", \"race\"), sep = \"- \", fill = \"left\") %>% \n tidyr::replace_na(list(stat = \"Total\")) %>% \n mutate(\n across(\n stat\n ,~stringr::str_replace(\n .\n ,\"Standard Errors\"\n ,\"Standard Error\"\n )\n )\n ,sex = \"Female\"\n )\n\nHere we will combine the two data frames and then pivot to our final graphing form.\n\nmale_female_hs_wide <- male_hs %>% \n bind_rows(female_hs) %>% \n tidyr::pivot_wider(names_from = stat, values_from = value) %>% \n 
janitor::clean_names() %>% \n mutate(across(total:standard_error, ~.x/100)) %>% \n mutate(\n ymax = total - standard_error\n ,ymin = total + standard_error\n )\n\nLets first graph the total for Male and Female graduation rates.\n\ng2 <- male_female_hs_wide %>% \n filter(race == \"Total\") %>% \n ggplot(aes(x = year, y = total, group = sex, color = sex)) +\n geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = sex), alpha = 0.3, color = NA) +\n geom_line() +\n scale_x_continuous(breaks = seq(1985,2016,3)) +\n scale_y_continuous(labels = scales::label_percent()) +\n scale_color_brewer(palette = \"Dark2\", aesthetics = c(\"color\", \"fill\")) +\n theme_bw() +\n labs(\n x = NULL\n ,y = NULL\n ,title = glue::glue(\"Percentage of High School Graduates by Sex\"\n ,\"\\n\"\n ,\"1985 - 2016\")\n ,color = \"Sex\" \n ,fill = \"Sex\"\n ) +\n theme(\n plot.title = element_text(hjust = 0.5)\n ,legend.title = element_text(hjust = 0.5)\n )\n\ng2\n\n\n\n\nNow I am going to graph by Sex and Race.\n\nrace_filter <- c(\"White\", \"Black\", \"Hispanic\")\n\nmake_label <- function(label){\n # browser()\n result <- stringr::str_split(label, \"\\\\.\")\n unlist(lapply(result, function(x) paste(x[2],x[1])))\n}\n\n\ng2 <- male_female_hs_wide %>% \n filter(race %in% race_filter) %>% \n ggplot(aes(x = year, y = total, group = interaction(sex,race), color = interaction(sex,race))) +\n geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = interaction(sex,race)), alpha = 0.3, color = NA) +\n geom_line() +\n scale_x_continuous(breaks = seq(1985,2016,3)) +\n scale_y_continuous(labels = scales::label_percent()) +\n scale_color_brewer(palette = \"Dark2\", aesthetics = c(\"color\", \"fill\"), labels = make_label) +\n theme_bw() +\n labs(\n x = NULL\n ,y = NULL\n ,title = glue::glue(\"Percentage of High School Graduates by Race and Sex\"\n ,\"\\n\"\n ,\"1985 - 2016\")\n ,color = \"Race & Sex\" \n ,fill = \"Race & Sex\"\n ) +\n theme(\n plot.title = element_text(hjust = 0.5)\n ,legend.title = element_text(hjust 
= 0.5)\n )\n\ng2\n\n\n\n\n\n\nConclusion\nWhile I am sure there is much more that could be done with this data this is where I am going to stop for today. Our graphs clearly show a divide in graduation rates by race, however Sex does not seem to have much of an effect on graduation rates.\n\n\n\n\nReusehttps://creativecommons.org/licenses/by/4.0/CitationBibTeX citation:@online{belanger2021,\n author = {Belanger, Kyle},\n title = {TidyTuesday 2021 {Week} 6: {HBCU} {Enrollment}},\n date = {2021-02-26},\n langid = {en}\n}\nFor attribution, please cite this work as:\nBelanger, Kyle. 2021. “TidyTuesday 2021 Week 6: HBCU\nEnrollment.” February 26, 2021."
+ },
+ {
+ "objectID": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html",
+ "href": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html",
+ "title": "Converting From Blogdown to Distill",
+ "section": "",
+ "text": "I have since converted this blog to a quarto blog, but am leaving this post up in case anyone finds it useful"
+ },
+ {
+ "objectID": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#code-folding",
+ "href": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#code-folding",
+ "title": "Converting From Blogdown to Distill",
+ "section": "Code Folding",
+ "text": "Code Folding\nWhen I converted my blog on 12/30/2020, code folding was not included as an option by default in distill. At that time, an excellent package called Codefolder added the functionality. Since going live with the blog, code folding has been added to distill.1 Code folding is available for either the whole document or individual code sections. The default caption is “Show Code”, but instead of typing code_folding=TRUE, you can provide a string to change the caption.\n\n# Some awesome code \n# That does awesome things"
+ },
+ {
+ "objectID": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#customizing-the-home-page",
+ "href": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#customizing-the-home-page",
+ "title": "Converting From Blogdown to Distill",
+ "section": "Customizing the Home Page",
+ "text": "Customizing the Home Page\nBy default, a distill blog’s home page will be the blog index page. I chose to edit my home page to be a landing page for myself and then have the blog index as a separate page. When creating a new blog, this is the default YAML header for your index page.\n---\ntitle: \"New Site\"\nsite: distill::distill_website\nlisting: posts\n---\nThe critical piece here is the line site: distill::distill_website. This line is what is needed to render the website. For my home page, I decided to use the package Postcard, which is used to generate simple landing pages. I won’t go into every step as there is already a great post by Alison Hill on how to do that. However, I will point out the most crucial part of the new index page the YAML header needs to contain these two lines.\noutput:\n postcards::trestles\nsite: distill::distill_website"
+ },
+ {
+ "objectID": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#footnotes",
+ "href": "posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.html#footnotes",
+ "title": "Converting From Blogdown to Distill",
+ "section": "Footnotes",
+ "text": "Footnotes\n\n\nNote that as of publishing, code folding is only available in the development version of distill↩︎"
}
]
\ No newline at end of file
diff --git a/posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.qmd b/posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.qmd
new file mode 100644
index 0000000..07a8ba6
--- /dev/null
+++ b/posts/2021-01-12_blogdown-to-distill/creating-a-distill-blog.qmd
@@ -0,0 +1,63 @@
+---
+title: "Converting From Blogdown to Distill"
+subtitle: |
+ A meta post on transferring from a blogdown to distill blog site
+date: 01-12-2021
+categories:
+ - Distill
+---
+
+# Author's Note
+
+I have since converted this blog to a Quarto blog, but I am leaving this post up in case anyone finds it useful.
+
+# Introduction
+
+This meta post describes converting my personal blog from a blogdown site to a distill site. I will not cover starting a site from scratch, as there are already several great resources for that. Instead, I will cover the challenges I ran into, along with some of the changes and tips I picked up along the way. If you are looking to start a site from scratch, check out these great resources:
+
+- The Distill for R Markdown page on creating a [blog](https://rstudio.github.io/distill/blog.html)
+- This excellent post from Shamindra Shrotriya on setting up a [blog](https://www.shamindras.com/posts/2019-07-11-shrotriya2019distillpt1/)
+- This post from the Mockup [blog](https://themockup.blog/posts/2020-08-01-building-a-blog-with-distill/)
+
+This last post goes into some of the pros and cons of using distill instead of blogdown. If you want simplicity and can live without much customization, go with distill. If you want a lot of room for customization and don't mind some frustration with Hugo, go with blogdown.
+
+# Challenges
+
+The good thing about switching from blogdown to distill was that I had very few challenges! The distill documentation, combined with the two posts listed above, helped me with most of my troubles. The only issue I ran into was with distill's `import_post()` function, which, according to the docs, only needs the URL of a published post to work. I never worked out why, but I could not pull in the posts from my old blogdown site. This wasn't a big deal, as I had the original R Markdown documents, but it could pose an issue if you don't.
+
+# Going Outside the Box
+
+## Code Folding
+
+When I converted my blog on 12/30/2020, code folding was not available by default in distill. At that time, an excellent package called [Codefolder](https://github.com/ijlyttle/codefolder) added the functionality. Since the blog went live, code folding has been added to distill.^[Note that as of publishing, code folding is only available in the development version of distill] Code folding can be enabled for either the whole document or individual code chunks. The default caption is "Show Code", but instead of setting `code_folding=TRUE`, you can provide a string to change the caption.
+
+```{r, code_folding="Let's See It", echo=TRUE}
+# Some awesome code
+# That does awesome things
+```
+
+## Customizing the Home Page
+
+By default, a distill blog's home page will be the blog index page. I chose to edit my home page to be a landing page for myself and then have the blog index as a separate page. When creating a new blog, this is the default YAML header for your index page.
+
+```{.yaml}
+---
+title: "New Site"
+site: distill::distill_website
+listing: posts
+---
+```
+
+The critical piece here is the line `site: distill::distill_website`, which is what is needed to render the website. For my home page, I decided to use the package [postcards](https://github.com/seankross/postcards), which generates simple landing pages. I won't go into every step, as there is already a great post by [Alison Hill](https://alison.rbind.io/post/2020-12-22-postcards-distill/) on how to do that. However, I will point out the most crucial part: the YAML header of the new index page needs to contain these two lines.
+
+```{.yaml}
+output:
+ postcards::trestles
+site: distill::distill_website
+```
+
+# Final Thoughts
+
+I have enjoyed the simplicity of Distill. While not nearly as customizable as blogdown, getting a blog site up and running in under an hour is pretty lovely. I hope to keep exploring what Distill has to offer and keep posting my updates!
+
+
diff --git a/posts/2021-02-26_tidytuesday-hbcu-enrollment/104.10.xlsx b/posts/2021-02-26_tidytuesday-hbcu-enrollment/104.10.xlsx
new file mode 100644
index 0000000..1c61d7e
Binary files /dev/null and b/posts/2021-02-26_tidytuesday-hbcu-enrollment/104.10.xlsx differ
diff --git a/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment.qmd b/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment.qmd
new file mode 100644
index 0000000..21f49f3
--- /dev/null
+++ b/posts/2021-02-26_tidytuesday-hbcu-enrollment/tidytuesday-2021-week-6-hbcu-enrolment.qmd
@@ -0,0 +1,362 @@
+---
+title: "TidyTuesday 2021 Week 6: HBCU Enrollment"
+subtitle: |
+  TidyTuesday 2021 Week 6: HBCU Enrollment. This post looks at tidying the data
+  and making some graphs from it.
+date: 02-26-2021
+categories:
+ - TidyTuesday
+---
+
+# Introduction
+
+Recently I was struggling to find a data project to work on. I felt a bit stuck with some of my current projects, so I began to scour the internet for something new. I stumbled upon [TidyTuesday](https://github.com/rfordatascience/tidytuesday), a weekly project where untidy data from various sources is posted so that people can practice cleaning and visualizing it. There are no right or wrong answers for TidyTuesday, which was exactly what I was looking for! This week's data set (well, a few weeks ago by the time this was posted) was about Historically Black Colleges and Universities. The posted data included a few different data sets; I chose to work with the one on high school graduation rates. Throughout this post I will explain my cleaning steps and then present a few different graphs. Note that in the first section my code blocks build on one another, so the same code is repeated as I add more steps to it.
+
+
+# Load Data
+
+In this first block, we load the required libraries and read in the raw data. This data set contains high school graduation rates by race. One thing to point out here is the use of `import::from()`; while it is a bit of overkill here, I used it mostly for practice. In this case, I am importing the function `%nin%` from the *Hmisc* package, which is the opposite of the `%in%` operator from base R.
+
+```{r}
+library(dplyr)
+library(ggplot2)
+
+import::from(Hmisc, `%nin%`)
+
+hs_students_raw <- readxl::read_xlsx("104.10.xlsx", sheet = 1)
+
+glimpse(hs_students_raw)
+
+```
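+
+If you would rather skip the *import* and *Hmisc* dependencies, the operator is easy to sketch in base R. This is a minimal stand-in that I am assuming is enough for this post's use; it behaves like Hmisc's `%nin%` for plain vectors but is not that package's source code.
+
+```r
+# Base-R stand-in for Hmisc's `%nin%`: TRUE where x is NOT found in table
+`%nin%` <- function(x, table) !(x %in% table)
+
+# Made-up year values, including one of the odd codes from the spreadsheet
+years <- c(1984, 1985, 19103, 2016)
+keep <- years %nin% c(19103, 19203, 19303)
+print(keep)
+```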
+
+Now we can start cleaning the data. First, I filter for years 1985 and later; before 1985 the data set is a bit sporadic, so to keep things clean I only look at 1985 onward. There are also three odd year values (19103, 19203, 19303) that I am not sure about, so I remove those rows as well.
+
+```{r}
+
+hs_students <- hs_students_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303))
+
+```
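+
+One subtlety worth knowing: if `Total` is still a character column at this point (the next step explains that every column was read in as character), `Total >= 1985` is a string comparison, because R coerces `1985` to the string `"1985"` before comparing. The sketch below uses made-up values rather than the real spreadsheet to show the difference.
+
+```r
+# Made-up year values stored as text, as they might arrive from a spreadsheet
+years_chr <- c("1984", "1985", "2016", "19103")
+
+# 1985 is coerced to "1985", so this compares strings character by character
+lexical <- years_chr >= 1985
+
+# After as.numeric(), the comparison is by value
+by_value <- as.numeric(years_chr) >= 1985
+
+print(lexical)   # "19103" sorts before "1985" as text, so it is FALSE
+print(by_value)  # 19103 is larger than 1985 as a number, so it is TRUE
+```
+
+With the string comparison, the odd 19103-style values already fail the first filter; once the column is numeric they pass it, which is where the explicit `%nin%` filter earns its keep. Converting to numeric before filtering is arguably the more predictable order.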
+
+Next, I convert all columns to numeric. Because of some blanks in the original spreadsheet, all of the columns were read in as character instead of numeric.
+
+```{r}
+
+hs_students <- hs_students_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303)) %>%
+ mutate(across(everything(), as.numeric))
+
+```
+
+Next, I rename the columns. First, I rename the column Total to year, as this column holds the year! Then I use `stringr::str_remove_all()` to remove the long phrase ', percent of all persons age 25 and over', as well as the number 1. For some reason the Black and White columns each have a 1 at the end; I think this is for some sort of footnote, but we will just remove it.
+
+```{r}
+
+hs_students <- hs_students_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303)) %>%
+ mutate(across(everything(), as.numeric)) %>%
+ rename(year = Total) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,", percent of all persons age 25 and over|1"
+ )
+ )
+
+```
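+
+To see what the `rename_with()` pattern does, here it is applied to two example column names. The names are illustrative, assuming the spreadsheet headers follow the "Race1, percent of all persons age 25 and over" shape described above. Note that the `|1` alternative removes *every* 1 in a name, not just a trailing footnote marker, so it relies on no other header containing that digit.
+
+```r
+library(stringr)
+
+# Illustrative headers in the shape described above
+example_names <- c(
+  "White1, percent of all persons age 25 and over",
+  "Standard Errors - Black1, percent of all persons age 25 and over"
+)
+
+# Same pattern as in the post: drop the long phrase and every "1"
+cleaned <- str_remove_all(example_names, ", percent of all persons age 25 and over|1")
+print(cleaned)
+```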
+
+Then I drop the column 'Total - Asian/Pacific Islander'. Each of these races is stored in a separate column, so if I needed the total later for some reason, I could calculate it. I also drop the string 'Asian/Pacific Islander - ' from the beginning of each of those column names, so each name now tells me just which race the column refers to.
+
+```{r}
+
+hs_students <- hs_students_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303)) %>%
+ mutate(across(everything(), as.numeric)) %>%
+ rename(year = Total) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,", percent of all persons age 25 and over|1"
+ )
+ ) %>%
+ select(-contains("Total - Asian/Pacific Islander")) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,"Asian/Pacific Islander - "
+ )
+ )
+
+```
+
+Now I simply pivot the data longer. Since I want to pivot every column except year, a nice trick I learned is to use the minus sign (`-year`) to select every column except year.
+
+```{r}
+hs_students_long <- hs_students %>%
+ tidyr::pivot_longer(-year)
+```
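+
+A toy version of this step, with made-up numbers, shows how `-year` keeps `year` as an identifier while every other column is stacked into `name`/`value` pairs.
+
+```r
+library(tibble)
+library(tidyr)
+
+# Made-up wide data with one rate column and one standard error column
+toy <- tibble(
+  year = c(1985, 1990),
+  White = c(80, 85),
+  `Standard Errors - White` = c(0.4, 0.2)
+)
+
+# Every column except `year` is gathered into name/value pairs
+toy_long <- pivot_longer(toy, -year)
+print(toy_long)
+```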
+
+With the data now in long form, I separate the automatically generated name column into two columns, stat and race, since the data contains both the percentage that graduated and its standard error. Then I replace all the NAs in the stat column with Total, as these rows hold the total percentage and the other rows hold the standard error. Last, I drop the s from Standard Errors to make it singular.
+
+```{r}
+
+hs_students_long <- hs_students %>%
+ tidyr::pivot_longer(-year) %>%
+ tidyr::separate(name, c("stat", "race"), sep = "- ", fill = "left") %>%
+ tidyr::replace_na(list(stat = "Total")) %>%
+ mutate(
+ across(
+ stat
+ ,~stringr::str_replace(
+ .
+ ,"Standard Errors"
+ ,"Standard Error"
+ )
+ )
+ )
+
+```
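+
+Here is the same split on two made-up rows. A name without `"- "` has only one piece, so `fill = "left"` places it in `race` and leaves `stat` as `NA`, which `replace_na()` then turns into "Total".
+
+```r
+library(tibble)
+library(tidyr)
+
+# Two made-up rows in the shape produced by pivot_longer()
+toy_long <- tibble(
+  year  = c(1985, 1985),
+  name  = c("White", "Standard Errors - White"),
+  value = c(80, 0.4)
+)
+
+# "White" has no "- ", so fill = "left" puts it in race and leaves stat NA
+toy_split <- separate(toy_long, name, c("stat", "race"), sep = "- ", fill = "left")
+toy_split <- replace_na(toy_split, list(stat = "Total"))
+print(toy_split)
+```
+
+Because the separator is `"- "`, the space before the hyphen stays on the left piece, so the stat is `"Standard Errors "` with a trailing space. It appears harmless here, since the later use of a `standard_error` column suggests `janitor::clean_names()` tidies it, but `sep = " - "` would avoid it in the first place.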
+
+I now pivot the data back to wide form and use the *janitor* package to clean the column names, which puts them in lowercase with `_` in place of spaces.
+
+```{r}
+
+hs_students_wide <- hs_students_long %>%
+ tidyr::pivot_wider(names_from = stat, values_from = value) %>%
+ janitor::clean_names()
+
+
+```
+
+To make graphing a bit easier with the *scales* package, I divide both columns by 100. We will see why in the graphs.
+
+```{r}
+
+hs_students_wide <- hs_students_long %>%
+ tidyr::pivot_wider(names_from = stat, values_from = value) %>%
+ janitor::clean_names() %>%
+ mutate(across(total:standard_error, ~.x/100))
+
+
+```
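+
+The reason for dividing by 100 shows up with `scales::label_percent()`, which by default multiplies its input by 100 before adding the `%` sign. A small check is below; `accuracy = 1` is fixed only to make the output predictable, and the `scale = 1` variant is an alternative I am suggesting, not what the post uses, that would skip the division entirely.
+
+```r
+library(scales)
+
+# Default scale = 100: proportions (0.75) become percentages ("75%")
+from_proportions <- label_percent(accuracy = 1)(c(0.75, 0.9))
+
+# Values already on a 0-100 scale can use scale = 1 instead of dividing by 100
+from_percentages <- label_percent(accuracy = 1, scale = 1)(c(75, 90))
+
+print(from_proportions)
+print(from_percentages)
+```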
+
+It's now time to graph. Notice the use of `scales::label_percent()` as the labels value for the y-axis. If the numbers had been left on their original scale (75 instead of 0.75), the labels would have read 7,500%, which is obviously very wrong! I also use `geom_ribbon()` to draw a standard error band around each line. Notice the use of `color = NA`: by default the ribbon has outlines, which I did not like, and setting `color = NA` turns them off. (There are a few other ways to turn them off, but this seemed the easiest to me.) Last, notice the `aesthetics` argument in `scale_color_brewer()`. Setting it applies the same palette to both color and fill; without it, the colors of the error bands and the lines don't match!
+
+```{r}
+
+hs_students_wide <- hs_students_wide %>%
+ mutate(
+    # upper and lower edges of the standard error band
+    ymax = total + standard_error
+    ,ymin = total - standard_error
+ )
+
+g1 <- hs_students_wide %>%
+ filter(race != "Total") %>%
+ ggplot(aes(x = year, y = total, group = race, color = race)) +
+ geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = race), alpha = 0.3, color = NA) +
+ geom_line() +
+ scale_x_continuous(breaks = seq(1985,2016,3)) +
+ scale_y_continuous(labels = scales::label_percent()) +
+ scale_color_brewer(palette = "Dark2", aesthetics = c("color", "fill")) +
+ theme_bw() +
+ labs(
+ x = NULL
+ ,y = NULL
+ ,title = glue::glue("Percentage of High School Graduates by Race"
+ ,"\n"
+ ,"1985 - 2016")
+ ,color = "Race"
+ ,fill = "Race"
+ ) +
+ theme(
+ plot.title = element_text(hjust = 0.5)
+ ,legend.title = element_text(hjust = 0.5)
+ )
+
+g1
+
+
+```
+
+# Load Male/Female Data
+
+The file also contains the same information split by male and female, so next I load that data.
+
+```{r}
+male_hs_raw <- readxl::read_excel("104.10.xlsx", sheet = 3)
+female_hs_raw <- readxl::read_excel("104.10.xlsx", sheet = 5)
+
+```
+
+Here I apply the same manipulations as above; the only addition is a column for sex.
+
+```{r}
+
+male_hs <- male_hs_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303)) %>%
+ mutate(across(everything(), as.numeric)) %>%
+ rename(year = Total) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,", percent of all persons age 25 and over|1"
+ )
+ ) %>%
+ select(-contains("Total - Asian/Pacific Islander")) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,"Asian/Pacific Islander - "
+ )
+ ) %>%
+ tidyr::pivot_longer(-year) %>%
+ tidyr::separate(name, c("stat", "race"), sep = "- ", fill = "left") %>%
+ tidyr::replace_na(list(stat = "Total")) %>%
+ mutate(
+ across(
+ stat
+ ,~stringr::str_replace(
+ .
+ ,"Standard Errors"
+ ,"Standard Error"
+ )
+ )
+ ,sex = "Male"
+ )
+
+
+female_hs <- female_hs_raw %>%
+ filter(Total >= 1985) %>%
+ filter(Total %nin% c(19103, 19203, 19303)) %>%
+ mutate(across(everything(), as.numeric)) %>%
+ rename(year = Total) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,", percent of all persons age 25 and over|1"
+ )
+ ) %>%
+ select(-contains("Total - Asian/Pacific Islander")) %>%
+ rename_with(
+ ~stringr::str_remove_all(
+ .
+ ,"Asian/Pacific Islander - "
+ )
+ ) %>%
+ tidyr::pivot_longer(-year) %>%
+ tidyr::separate(name, c("stat", "race"), sep = "- ", fill = "left") %>%
+ tidyr::replace_na(list(stat = "Total")) %>%
+ mutate(
+ across(
+ stat
+ ,~stringr::str_replace(
+ .
+ ,"Standard Errors"
+ ,"Standard Error"
+ )
+ )
+ ,sex = "Female"
+ )
+
+```
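+
+Since the male and female pipelines differ only in the `sex` value, they could be wrapped in one function. The sketch below is my own refactor, not code from the original analysis: `clean_hs_sheet()` is a hypothetical helper name, it swaps `%nin%` for base R's `!(... %in% ...)`, and it converts to numeric *before* filtering so the year comparison is by value. It is exercised here on a made-up one-race sheet rather than the real workbook.
+
+```r
+library(dplyr)
+library(stringr)
+library(tidyr)
+
+# Hypothetical helper: apply this post's cleaning steps to one raw sheet
+clean_hs_sheet <- function(raw, sex_label) {
+  raw %>%
+    mutate(across(everything(), as.numeric)) %>%
+    rename(year = Total) %>%
+    filter(year >= 1985, !(year %in% c(19103, 19203, 19303))) %>%
+    rename_with(~ str_remove_all(.x, ", percent of all persons age 25 and over|1")) %>%
+    select(-contains("Total - Asian/Pacific Islander")) %>%
+    rename_with(~ str_remove_all(.x, "Asian/Pacific Islander - ")) %>%
+    pivot_longer(-year) %>%
+    separate(name, c("stat", "race"), sep = "- ", fill = "left") %>%
+    replace_na(list(stat = "Total")) %>%
+    mutate(
+      stat = str_replace(stat, "Standard Errors", "Standard Error"),
+      sex = sex_label
+    )
+}
+
+# Made-up sheet with the same header shape, values stored as text
+toy_raw <- tibble(
+  Total = c("1980", "1985", "19103", "1990"),
+  `White1, percent of all persons age 25 and over` = c("70", "80", "50", "85"),
+  `Standard Errors - White1, percent of all persons age 25 and over` = c("0.5", "0.4", "0.3", "0.2")
+)
+
+toy_clean <- clean_hs_sheet(toy_raw, "Male")
+print(toy_clean)
+```
+
+With a helper like this, the two data frames would become `clean_hs_sheet(male_hs_raw, "Male")` and `clean_hs_sheet(female_hs_raw, "Female")`, so any later fix only has to be made once.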
+
+Here we will combine the two data frames and then pivot to our final graphing form.
+
+```{r}
+
+male_female_hs_wide <- male_hs %>%
+ bind_rows(female_hs) %>%
+ tidyr::pivot_wider(names_from = stat, values_from = value) %>%
+ janitor::clean_names() %>%
+ mutate(across(total:standard_error, ~.x/100)) %>%
+ mutate(
+    # upper and lower edges of the standard error band
+    ymax = total + standard_error
+    ,ymin = total - standard_error
+ )
+
+
+```
+
+Let's first graph the total graduation rates for males and females.
+
+```{r}
+
+
+
+g2 <- male_female_hs_wide %>%
+ filter(race == "Total") %>%
+ ggplot(aes(x = year, y = total, group = sex, color = sex)) +
+ geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = sex), alpha = 0.3, color = NA) +
+ geom_line() +
+ scale_x_continuous(breaks = seq(1985,2016,3)) +
+ scale_y_continuous(labels = scales::label_percent()) +
+ scale_color_brewer(palette = "Dark2", aesthetics = c("color", "fill")) +
+ theme_bw() +
+ labs(
+ x = NULL
+ ,y = NULL
+ ,title = glue::glue("Percentage of High School Graduates by Sex"
+ ,"\n"
+ ,"1985 - 2016")
+ ,color = "Sex"
+ ,fill = "Sex"
+ ) +
+ theme(
+ plot.title = element_text(hjust = 0.5)
+ ,legend.title = element_text(hjust = 0.5)
+ )
+
+g2
+```
+
+Now I am going to graph by Sex and Race.
+
+```{r}
+
+race_filter <- c("White", "Black", "Hispanic")
+
+make_label <- function(label){
+  # Turn interaction labels like "Female.Black" into "Black Female"
+  result <- stringr::str_split(label, "\\.")
+  unlist(lapply(result, function(x) paste(x[2],x[1])))
+}
+
+
+g2 <- male_female_hs_wide %>%
+ filter(race %in% race_filter) %>%
+ ggplot(aes(x = year, y = total, group = interaction(sex,race), color = interaction(sex,race))) +
+ geom_ribbon(aes(ymax = ymax, ymin = ymin, fill = interaction(sex,race)), alpha = 0.3, color = NA) +
+ geom_line() +
+ scale_x_continuous(breaks = seq(1985,2016,3)) +
+ scale_y_continuous(labels = scales::label_percent()) +
+ scale_color_brewer(palette = "Dark2", aesthetics = c("color", "fill"), labels = make_label) +
+ theme_bw() +
+ labs(
+ x = NULL
+ ,y = NULL
+ ,title = glue::glue("Percentage of High School Graduates by Race and Sex"
+ ,"\n"
+ ,"1985 - 2016")
+ ,color = "Race & Sex"
+ ,fill = "Race & Sex"
+ ) +
+ theme(
+ plot.title = element_text(hjust = 0.5)
+ ,legend.title = element_text(hjust = 0.5)
+ )
+
+g2
+
+```
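+
+To see what `make_label()` is doing: `interaction(sex, race)` produces labels such as "Female.Black", and the function swaps the two halves. The snippet below restates that behaviour with base R's `strsplit()` (a restatement, not the post's exact code; `swap_label()` is a hypothetical name) and shows an alternative I am suggesting: ordering the factors as `race, sex` and passing `sep = " "` to `interaction()` gives readable labels directly, at the cost of possibly changing the legend order. The sample values are made up.
+
+```r
+# Base-R restatement of make_label(): "Female.Black" -> "Black Female"
+swap_label <- function(label) {
+  parts <- strsplit(label, ".", fixed = TRUE)
+  vapply(parts, function(x) paste(x[2], x[1]), character(1))
+}
+
+sex_vals  <- c("Female", "Male")
+race_vals <- c("Black", "White")
+
+# Default interaction() labels are "sex.race"; swap them afterwards
+swapped <- swap_label(as.character(interaction(sex_vals, race_vals)))
+
+# Alternative: build the label in the desired order from the start
+direct <- as.character(interaction(race_vals, sex_vals, sep = " "))
+
+print(swapped)
+print(direct)
+```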
+
+# Conclusion
+
+While I am sure there is much more that could be done with this data, this is where I am going to stop for today. The graphs show a clear gap in graduation rates by race; sex, however, does not seem to have much of an effect on graduation rates.
+
+
+