Notebooks A-Go-Go

I’ve been using RMarkdown notebooks pretty much ever since they were introduced. Being able to weave code chunks together with their output and full prose, plus being able to come back to a notebook and have all the output still visible without having to rerun chunks, was a no-brainer for me. The format has become so reflexive that when I see junior analysts select an html_document output type, I assume it’s a typo and correct what must surely have been an error on their part. But perhaps my use of the format has been a bit too reflexive and has become dogmatic?

In the span of two short weeks I had two major uses of html_document pop up that took me by surprise. The first was in getting myself familiar with the codebase and workflows at my wonderful new day job. In reviewing the existing workflows, I noticed a lot of html_document and was introduced to a pattern of saving graphics out for incorporating into reports. This tickled a memory, and I went out and rediscovered Bob Rudis’ post on publishing documents out to Google Drive, a workflow very similar to what I was now doing as well. It turns out that Bob also uses the html_document format. I have mad respect for Bob, though I’ve never grokked his dislike of notebooks. (Will buy $BEVERAGES for a discussion on this at some point!)

Faced with two big uses of html_document specifically around a workflow I needed to support, I went back to basics and documented my requirements:

Mix long-form text with code (usually, but not always, R)
Be able to share a document with non-R users for reviewing document outlines and research directions
Save out pre-publication quality graphics in both PDF and PNG format for sharing, insertion, and final tweaks in layout software
Review cached chunk output in documents without having to re-run them every time

Surprisingly, this doesn’t require my beloved html_notebook format. By default, RStudio helpfully inserts the output of any evaluated code chunk inline into a document (though some people change their configs to keep dumping output to the console). The notebook format itself is not required for this handy feature. What the notebook format does get you is an HTML file (specifically a .nb.html) that is auto-generated every time the source Rmd is saved. This is a big win for me over having to remember to knit documents before every commit to a repository.

Additionally, a .nb.html document is fully self-contained. It can be opened on its own (without any of the other RStudio Project files), and the Rmd file that generated it will be auto-recreated. That Rmd will have the evaluated code chunks still displayed inline, just like the last state of the original Rmd file. This part is super handy for picking up work that another analyst has been performing. I can just open the notebook file and see their graphs and dataframes without having to do a lot of costly processing on my end. If you use html_document (or any other output format), RStudio has to cache the chunk output in your local .Rproj.user directory, which another user won’t have. The result is that someone opening an html_document-only Rmd file gets none of that nifty cached output.

There is one downside of notebooks for my workflow, and it’s a doozy. The nifty knitr and googledrive process that Bob outlines in his blog post requires that visuals are saved out to a standalone directory, from which they are then uploaded to a Google Drive location. Of the two formats, only html_document honors the knitr fig.path option. This is by design, as having images in a separate directory runs directly against the fully self-contained goal of notebooks.
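For anyone who hasn’t used fig.path before, here is a minimal sketch of a setup chunk for an html_document. The "figures/" directory and the dpi value are my own illustrative choices, not something taken from Bob’s post, and I’m assuming the figure files are left on disk after knitting because they sit outside the directory rmarkdown cleans up.

```r
# Setup chunk for an Rmd knitted as html_document.
# "figures/" is an illustrative directory name (an assumption, not from the post).
knitr::opts_chunk$set(
  fig.path = "figures/",   # prefix for the figure files knitr writes
  dev = c("png", "pdf"),   # render every plot with both graphics devices
  dpi = 300                # resolution used for the PNG output
)
```

With these options, a chunk labelled revenue-plot should produce files along the lines of figures/revenue-plot-1.png and figures/revenue-plot-1.pdf when the document is knitted.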

I’ve looked into various knitr hooks to get both the self-contained notebook and the externally saved figures, and haven’t found an entirely satisfactory solution. For now, I add html_notebook as an additional output format on my templates. Even when the notebook format is not the default, RStudio will still generate the .nb.html file. So I can have html_document as my default, work with the notebooks as my primary format, then knit to HTML when I’m ready to share graphics.
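The front matter for that kind of template looks roughly like the sketch below. The title is a placeholder, and listing html_document first is what makes it the default format when knitting.

```yaml
---
title: "Placeholder title"
output:
  html_document: default   # first entry, so this is the default knit target
  html_notebook: default   # keeps RStudio writing the .nb.html on save
---
```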

I don’t love this approach, with its dependence on my remembering to knit on a regular basis, but my team has a small helper function to save graphics to an external directory that scratches part of my itch. I’d love to hear from others if there’s a more elegant approach!
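I can’t share the team’s helper here, but a hypothetical version might look something like the sketch below. The function names, the "figures" default directory, and the size defaults are all made up for illustration, and it assumes the plots are ggplot2 objects.

```r
# Hypothetical helper (not the team's actual function): save one ggplot
# object as both PDF and PNG in an external directory.

# Build the output paths for a figure name, one per format.
figure_paths <- function(name, dir = "figures", formats = c("pdf", "png")) {
  file.path(dir, paste0(name, ".", formats))
}

save_figure <- function(plot, name, dir = "figures",
                        width = 7, height = 5, dpi = 300) {
  # Create the target directory if it does not exist yet.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  paths <- figure_paths(name, dir)
  for (path in paths) {
    # ggsave() picks the graphics device from the file extension;
    # width and height are in inches by default.
    ggplot2::ggsave(path, plot = plot, width = width, height = height, dpi = dpi)
  }
  invisible(paths)
}
```

Calling save_figure(p, "revenue-by-quarter") from a notebook chunk would then write figures/revenue-by-quarter.pdf and figures/revenue-by-quarter.png without any knitting involved.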