<p>Recently I tried to import some Microsoft Excel data into R and ran into an issue where the data actually had two different header rows. The top row listed a group, and the second row listed a category within that group. Searching Google, I couldn’t really find a good example of what I was looking for, so I am putting it here in hopes of helping someone else!</p>
</section>
<section id="example-data" class="level1">
<h1>Example Data</h1>
<p>I have created a small Excel file to demonstrate what I am talking about. Download it <a href="https://github.com/mmmmtoasty19/kyleb/tree/master/content/post/2020-06-15-importing-excel-data-with-multiple-headers/example_data.xlsx">here</a>. This is the data from Excel. <img src="example_data_img1.png" class="img-fluid" alt="image of example data"></p>
</section>
<section id="check-data" class="level1">
<h1>Check Data</h1>
<p>First we will read the file in using the package readxl and view the data without doing anything special to it.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(readxl) <span class="co"># load the readxl library</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyverse) <span class="co"># load the tidyverse for manipulating the data</span></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>file_path <span class="ot">&lt;-</span> <span class="st">"example_data.xlsx"</span> <span class="co"># set the file path</span></span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a>ds0 <span class="ot">&lt;-</span> <span class="fu">read_excel</span>(file_path) <span class="co"># read the file</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a>ds0</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>First, let's read the data back in, this time with some options. We will set n_max to 2 so that only the first two rows are read, and set col_names to FALSE so that the first row is not treated as headers.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>ds1 <span class="ot">&lt;-</span> <span class="fu">read_excel</span>(file_path, <span class="at">n_max =</span> <span class="dv">2</span>, <span class="at">col_names =</span> <span class="cn">FALSE</span>)</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>ds1</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Now that we have our headers, let's first transpose them into a vertical matrix using the base function t(), then turn the result back into a tibble so that we can use tidyr's fill() function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>names <span class="ot">&lt;-</span> ds1 <span class="sc">%&gt;%</span></span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>  <span class="fu">t</span>() <span class="sc">%&gt;%</span> <span class="co"># transpose to a matrix</span></span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>  <span class="fu">as_tibble</span>() <span class="co"># back to tibble</span></span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>names</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 7 × 2
  V1     V2
  &lt;chr&gt;  &lt;chr&gt;
1 Name   &lt;NA&gt;
2 Test 1 Run 1
3 &lt;NA&gt;   Run 2
4 &lt;NA&gt;   Run 3
5 Test 2 Run 1
6 &lt;NA&gt;   Run 2
7 &lt;NA&gt;   Run 3</code></pre>
</div>
</div>
<p>Note that tidyr's fill() cannot work row-wise, hence the need to flip the tibble so that it is long rather than wide.</p>
<p>Now we use tidyr's fill() function to fill each NA with the value found above it.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>names <span class="ot">&lt;-</span> names <span class="sc">%&gt;%</span> <span class="fu">fill</span>(V1) <span class="co"># use tidyr fill to fill in the NA's</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>names</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>This is where my data differed from many of the examples I could find online. Because the second row is also a header, we cannot just get rid of it. We can solve this by using paste() inside dplyr's mutate() to form a new column that combines the first and second columns.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>names <span class="ot">&lt;-</span> names <span class="sc">%&gt;%</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>names</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
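<p>The combining step described above can be sketched as follows. This is a minimal reconstruction and not necessarily the exact original code: the tibble names_filled is typed in by hand to mirror the header table after fill(), and the "_" separator is an assumption inferred from the "_NA" suffix that gets removed in the next step.</p>

```r
library(dplyr) # provides tibble(), mutate(), and the %>% pipe

# Header table as it looks after fill(), typed in by hand for illustration
names_filled <- tibble(
  V1 = c("Name", rep("Test 1", 3), rep("Test 2", 3)),
  V2 = c(NA, "Run 1", "Run 2", "Run 3", "Run 1", "Run 2", "Run 3")
)

# Combine group and category into one label; the "_" separator is assumed
names_combined <- names_filled %>%
  mutate(new_names = paste(V1, V2, sep = "_"))

names_combined$new_names
```

<p>Because paste() turns a missing value into the text "NA", the first label comes out as "Name_NA", which is why a cleanup step is still needed afterwards.</p>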
<p>One more small cleanup task: in the example data the first column header, Name, did not have a second label, which has created a name with an NA attached. We can use stringr to remove this NA.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>names <span class="ot">&lt;-</span> names <span class="sc">%&gt;%</span> <span class="fu">mutate</span>(<span class="fu">across</span>(new_names, <span class="sc">~</span><span class="fu">str_remove_all</span>(.,<span class="st">"_NA"</span>)))</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>names</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<p>Now that our new name column is the way we want it, we can use dplyr's pull() to return a vector of just that column.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>names <span class="ot">&lt;-</span> names <span class="sc">%&gt;%</span> <span class="fu">pull</span>(new_names)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
</section>
</section>
<section id="final-data" class="level1">
<h1>Final Data</h1>
<p>Now that we have a vector of column names, let's read in the original file using our new names. We set the skip argument to 2 to skip the first two rows, and set col_names equal to our vector of names. Note that in the last step I used the janitor package to convert the names to snake case (the default for the clean_names function).</p>
<span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>example_data</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 7
  name test_1_run_1 test_1_run_2 test_1_run_3 test_2_run_1 test_2_run_2
# ℹ 1 more variable: test_2_run_3 &lt;dbl&gt;</code></pre>
</div>
</div>
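<p>The whole workflow, including the final read described above, might be wrapped up roughly like this. The helper read_two_row_headers() is hypothetical and not part of the original post; it assumes readxl, tibble, tidyr, stringr, and janitor are installed, that the separator is "_", and it builds the header table directly from the two rows instead of transposing with t().</p>

```r
# Hypothetical helper (not from the original post) that chains the steps above.
# Assumes readxl, tibble, tidyr, stringr, and janitor are installed.
read_two_row_headers <- function(path) {
  # Read only the two header rows, without treating either as column names
  hdr <- readxl::read_excel(path, n_max = 2, col_names = FALSE)

  # One row per spreadsheet column: V1 = group (top row), V2 = category
  hdr_long <- tibble::tibble(
    V1 = as.character(unlist(hdr[1, ], use.names = FALSE)),
    V2 = as.character(unlist(hdr[2, ], use.names = FALSE))
  )

  # Carry each group label down over the NA values that follow it
  hdr_long <- tidyr::fill(hdr_long, V1)

  # Combine both levels, then drop the "_NA" left where no category exists
  new_names <- paste(hdr_long$V1, hdr_long$V2, sep = "_")
  new_names <- stringr::str_remove_all(new_names, "_NA")

  # Re-read the file, skipping the header rows and supplying the new names
  out <- readxl::read_excel(path, skip = 2, col_names = new_names)
  janitor::clean_names(out)
}

# Usage, guarded so the sketch does nothing when the file is absent
if (file.exists("example_data.xlsx")) {
  example_data <- read_two_row_headers("example_data.xlsx")
  print(example_data)
}
```

<p>Keeping the header handling in one function makes it easy to reuse on other workbooks with the same two-row layout, though files with more header rows would need the logic extended.</p>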
</section>
<section id="other-help" class="level1">
<h1>Other Help</h1>
<p>While searching for solutions to my problem I found two good examples; however, neither did exactly what I was trying to do.</p>
<ol type="1">
<li><p>This post by Lisa DeBruine is pretty close to what I was trying to accomplish and gave me a good starting point. Read it <a href="https://debruine.github.io/posts/multi-row-headers/">here</a>.</p></li>
<li><p>This post by Alison Hill solves a similar but slightly different problem. In her data the second row is actually metadata, not a second set of headers. Read it <a href="https://alison.rbind.io/post/2018-02-23-read-multiple-header-rows/">here</a>.</p></li>
title = {Importing {Excel} {Data} with {Multiple} {Header} {Rows}},
date = {2020-06-22},
langid = {en}
}
</code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-belanger2020" class="csl-entry quarto-appendix-citeas" role="listitem">
Belanger, Kyle. 2020. <span>“Importing Excel Data with Multiple Header
function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) {