I’ve been using RMarkdown notebooks pretty much ever since they were
introduced. Being able to
weave in code chunks with output and full prose, plus being able to come
back to a notebook and have all the output still visible without having
to rerun chunks, was a no-brainer for me. It’s been so reflexive a format for
me that when I see junior analysts with an `html_document` output type
selection I assume it’s a typo and correct them on what must surely have
been an error on their part. But perhaps my use of the format has been a bit
too reflexive and has become dogmatic?
In the span of two short weeks I had two major uses of `html_document` pop up
that took me by surprise. The first was in getting myself familiar with the
codebase and workflows at my wonderful new dayjob.
In reviewing existing workflows, I noticed a lot of `html_document` and was
introduced to a pattern of saving graphics out for incorporating into reports.
This tickled a memory and I went out and re-discovered Bob Rudis’ post on
publishing documents out to Google Drive,
a workflow very similar to what I was now doing as well. It turns out that Bob
also uses the `html_document` format. I have mad respect for Bob, though
I’ve never grokked his dislike of notebooks.
(Will buy $BEVERAGES for a discussion on this at some point!)
Faced with two big uses of `html_document` specifically around a workflow I
needed to support, I went back to basics and documented my requirements:
- Mix long-form text with code (usually, but not always, R)
- Be able to share a document with non-R users for reviewing document outlines and research directions
- Save out pre-publication quality graphics in both PDF and PNG format for sharing, insertion, and final tweaks in layout software
- Review cached chunks in documents without having to re-run documents every time
Surprisingly, this doesn’t require my beloved `html_notebook` format. RStudio
helpfully will insert the output of any evaluated code chunk inline into a
document by default (though some change their configs to continue to dump to
console). The notebook format itself is not required for this handy feature.
What a notebook format does get you is an HTML file (specifically a `.nb.html`
file) that is auto-generated every time the source Rmd is saved. This is a big
win for me over having to remember to knit documents all the time before
committing to a repository.
Additionally, a `.nb.html` document is fully self-contained and can be opened on
its own (without any of the other RStudio Project files), and the generating
Rmd file will be auto-recreated. The Rmd will have the evaluated code chunks
still displayed inline, just like the last state of the Rmd file. This part is
super handy for picking up work that another analyst has been performing. I can
just open the notebook file and see the graphs and dataframes of their output
without having to do a lot of costly processing on my end. If you use
`html_document` (or any other markdown flavor), RStudio has to cache the chunk
output in your local `.Rproj.user` directory, which another user won’t have.
This results in someone opening an `html_document`-only markdown file and not
having any of that nifty cached output.
There is one downside of notebooks for my workflow, and it’s a doozy. The
nifty `knitr` and `googledrive` process that Bob outlines in his blog post
requires that visuals are saved out to a standalone directory and then
uploaded to a Google Drive location. Only the `html_document` format can
handle the `fig.path` option. This is by design, as having images in a separate
directory is directly against the fully self-contained goal of notebooks.
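
For reference, here is a minimal sketch of what the `html_document` side of that setup can look like: a setup chunk that points `fig.path` at an external directory and asks knitr to write every figure with both the PNG and PDF devices. The directory name `figures/` and the `dpi` value are placeholders of mine, not settings taken from Bob's post.

```r
# Setup chunk for an Rmd that is knitted with html_document.
# "figures/" is a placeholder directory; the trailing slash makes knitr
# treat it as a folder rather than as a filename prefix.
knitr::opts_chunk$set(
  fig.path = "figures/",
  dev      = c("png", "pdf"),  # write each figure with both devices
  dpi      = 300               # resolution used for the PNG output
)
```

knitr names each figure file after the label of the chunk that produced it, so labelling chunks gives predictable file names for the upload step.
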
I’ve tried to look into various `knitr` hooks to recreate this self-contained
and external-save approach and haven’t found an entirely satisfactory
approach. I currently stick an additional output format of `html_notebook` on
my templates. Even when the notebook format is not the default, RStudio will
still generate the `.nb.html` file. So I can have `html_document` as my default,
work with the notebooks as my primary format, then knit to HTML when I’m ready to
share graphics.
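
The dual-format setup described above can be sketched as a YAML front matter fragment like the one below. It assumes RStudio behaves as described (the first listed format is the Knit default, and the notebook entry keeps the `.nb.html` file coming); the `default` values simply mean no per-format options are set.

```yaml
output:
  html_document: default   # listed first, so Knit renders this format by default
  html_notebook: default   # extra entry so the .nb.html file is still generated
```
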
I don’t love this approach with its dependence on my remembering to knit on a regular basis, but my team has a small helper function to save graphics to an external directory that scratches part of my itch. I’d love to hear from others if there’s a more elegant approach!
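
A helper along those lines might look like the sketch below. It is not the team function mentioned above: the name `save_graphic`, its arguments, and the default `figures` directory are all hypothetical, and it sticks to base graphics devices to avoid extra dependencies.

```r
# Hypothetical helper: draw the same plot to a PNG and a PDF in an external
# directory. `draw` is a zero-argument function that produces the plot, so it
# can be replayed once per graphics device.
save_graphic <- function(draw, name, dir = "figures", width = 7, height = 5) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  png_path <- file.path(dir, paste0(name, ".png"))
  pdf_path <- file.path(dir, paste0(name, ".pdf"))

  # PNG copy; the device is closed even if drawing fails
  grDevices::png(png_path, width = width, height = height, units = "in", res = 300)
  tryCatch(draw(), finally = grDevices::dev.off())

  # PDF copy for final tweaks in layout software
  grDevices::pdf(pdf_path, width = width, height = height)
  tryCatch(draw(), finally = grDevices::dev.off())

  invisible(c(png = png_path, pdf = pdf_path))
}

# Usage: write both files into a temporary directory
out <- save_graphic(
  function() plot(cars),
  name = "cars-scatter",
  dir  = file.path(tempdir(), "figures")
)
```

Because it is called explicitly from a chunk, a helper like this works the same way whether the document is run as a notebook or knitted to HTML.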