I’ve been using RMarkdown notebooks pretty much ever since they were
introduced. Being able to
weave in code chunks with output and full prose, plus being able to come
back to a notebook and have all the output still visible without having
to rerun chunks, was a no-brainer for me. It’s been so reflexive a format for
me that when I see junior analysts select an html_document output type, I
assume it’s a typo and correct what must surely have been an error on their
part. But perhaps my use of the format has been a bit too reflexive and has
become dogmatic?
In the span of two short weeks I had two major uses of html_document pop up
that took me by surprise. The first was in getting myself familiar with the
codebase and workflows at my wonderful new day job.
In reviewing existing workflows, I noticed a lot of html_document and was
introduced to a pattern of saving graphics out for incorporating into reports.
This tickled a memory and I went out and re-discovered Bob Rudis’ post on
publishing documents out to Google Drive, a workflow very similar to what I was
now doing as well. It turns out that Bob also uses the html_document format. I
have mad respect for Bob, though I’ve never grokked his dislike of notebooks.
(Will buy $BEVERAGES for a discussion on this at some point!)
Faced with two big uses of html_document specifically around a workflow I
needed to support, I went back to basics and documented my requirements:
- Mix long-form text with code (usually, but not always, R)
- Be able to share a document with non-R users for reviewing document outlines and research directions
- Save out pre-publication quality graphics in both PDF and PNG formats for sharing, insertion, and final tweaks in layout software
- Review cached chunks in documents without having to re-run documents every time
Surprisingly, this doesn’t require my beloved html_notebook format. RStudio
helpfully will insert the output of any evaluated code chunk inline into a
document by default (though some change their configs to continue to dump to
console). The notebook format itself is not required for this handy feature.
What a notebook format does get you is an HTML file (specifically a .nb.html
file) that is auto-generated every time the source Rmd is saved. This is a big
win for me over having to remember to knit documents all the time before
committing to a repository.
The .nb.html document is fully self-contained and can be opened on
its own (without any of the other RStudio Project files), and the generating
Rmd file will be auto-recreated. The Rmd will have the evaluated code chunks
still displayed inline, just like the last state of the Rmd file. This part is
super handy for picking up work that another analyst has been performing. I can
just open the notebook file and see the graphs and dataframes of their output
without having to do a lot of costly processing on my end. If you use
html_document (or any other markdown flavor), RStudio has to cache the chunk
output in your local .Rproj.user directory, which another user won’t have.
This results in someone opening an html_document-only markdown file and not
having any of that nifty cached output.
There is one downside of notebooks for my workflow, and it’s a doozy. The
googledrive process that Bob outlines in his blog post requires that visuals be
saved out to a standalone directory, from which they are then uploaded to a
Google Drive location. Only the html_document format can use the fig.path
option. This is by design, as having images in a separate directory runs
directly against the fully self-contained goal of notebooks.
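For reference, here is a minimal sketch of the kind of setup chunk that sends figures to an external directory when knitting to html_document. The directory name figures/ is an assumption for illustration; fig.path and dev are standard knitr chunk options.

```r
# Goes in a setup chunk near the top of the Rmd.
# "figures/" is a hypothetical directory name; pick whatever suits the project.
knitr::opts_chunk$set(
  fig.path = "figures/",   # prefix for generated figure files
  dev = c("png", "pdf")    # write each figure in both formats
)
```

With these options set, knitting should leave one PNG and one PDF per plot-producing chunk under figures/, named after the chunk label.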
I’ve tried to look into various knitr hooks to recreate this self-contained
and external-save approach and haven’t found an entirely satisfactory
one. I currently stick an additional output format in my templates. Even when
the notebook format is not the default, RStudio will still generate the
nb.html file. So I can have html_document as my default,
work with the notebooks as my primary format, then knit to HTML when I’m ready to
I don’t love this approach, with its dependence on my remembering to knit on a regular basis, but my team has a small helper function to save graphics to an external directory that scratches part of my itch. I’d love to hear from others if there’s a more elegant approach!
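A helper along these lines might look like the sketch below. It is not the team's actual function: the name save_plot, its arguments, and the default figures directory are all assumptions for illustration. It uses only base R graphics devices and takes a function that draws the plot, so the same drawing code can be replayed once per output format.

```r
# Hypothetical helper: draw the same plot to both a PDF and a PNG file.
# `draw` is a zero-argument function that produces the plot.
save_plot <- function(draw, name, dir = "figures", width = 7, height = 5) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  paths <- c(
    pdf = file.path(dir, paste0(name, ".pdf")),
    png = file.path(dir, paste0(name, ".png"))
  )

  # Vector copy for final tweaks in layout software
  grDevices::pdf(paths[["pdf"]], width = width, height = height)
  draw()
  grDevices::dev.off()

  # Raster copy for quick sharing and insertion
  grDevices::png(paths[["png"]], width = width, height = height,
                 units = "in", res = 300)
  draw()
  grDevices::dev.off()

  invisible(paths)
}

# Example: write wt-vs-mpg.pdf and wt-vs-mpg.png under a temporary directory.
paths <- save_plot(
  function() plot(mtcars$wt, mtcars$mpg),
  name = "wt-vs-mpg",
  dir = file.path(tempdir(), "figures")
)
```

Because the helper is called explicitly from a chunk, it works the same whether the document is an html_notebook or an html_document, at the cost of one extra call per figure.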