Opinionated Analysis Development
Opinionated Approach | Question Addressed | Tool¹ | Section¹ |
---|---|---|---|
Literate programming | Can a second analyst easily understand your code? | Quarto | Literate Programming |
Source: Parker, Hilary. n.d. “Opinionated Analysis Development.” https://doi.org/10.7287/peerj.preprints.3210v1.
¹ Added by Aaron R. Williams
4 Quarto
4.1 Literate (Statistical) Programming
Donald Knuth, the creator of TeX, is a founder of literate programming. Here is his motivation in his own words:
Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do. ~ (Knuth 1984)
4.1.1 Example
We used a linear model because there is reason to believe that the population model is linear. The observations are independent and the errors are independently and identically distributed with an approximately normal distribution.
Call:
lm(formula = dist ~ speed, data = cars)
Coefficients:
(Intercept) speed
-17.579 3.932
An increase in travel speed of one mile per hour is associated with a 3.93 foot increase in stopping distance on average.
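The example above pairs a model fit with a sentence that interprets it. As a minimal sketch of how a literate document can keep the two in sync, the R code below refits the model on the built-in cars data and builds the sentence from the estimated slope. The object names (stopping_model, speed_slope, interpretation) are illustrative assumptions, not names from the original example.

```r
# Fit the simple linear regression from the example using the built-in cars data.
stopping_model <- lm(dist ~ speed, data = cars)

# Extract the slope on speed and round it for reporting.
speed_slope <- round(coef(stopping_model)[["speed"]], 2)

# Build the interpretation from the estimate instead of typing the number by hand.
interpretation <- sprintf(
  "An increase in travel speed of one mile per hour is associated with a %.2f foot increase in stopping distance on average.",
  speed_slope
)

print(interpretation)
```

In a rendered document, the same value could be placed in the text with inline code, so the prose would update automatically if the data or model changed.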
4.2 Quarto
4.2.1 Why Quarto
Quarto is a literate statistical programming tool for R, Julia, Python, JavaScript, and more that was released by Posit in 2022. Quarto is an important tool for reproducible research. It combines narrative text with styles, code, and the output of code, and it can be used to create many types of documents including PDFs, HTML websites, slides, and more.
Quarto builds on the success of R Markdown. In fact, Quarto will render R Markdown (.Rmd) documents without any edits or changes.
Jupyter (Julia, Python, and R) is a competing framework that is popular for Python but has not caught on for R.
According to Chapter 27 of Wickham and Grolemund (2016), there are three main reasons to use R Markdown, and they hold for Quarto as well:
- “For communicating to decision makers, who want to focus on the conclusions, not the code behind the analysis.”
- “For collaborating with other data scientists (including future you!), who are interested in both your conclusions, and how you reached them (i.e. the code).”
- “As an environment in which to do data science, as a modern day lab notebook where you can capture not only what you did, but also what you were thinking.”
4.2.2 Quarto Process
Quarto uses .qmd files. These files can be used interactively, and they can be run from top to bottom by clicking “Render”. Clicking the “Render” button starts the rendering process:
- Quarto opens a fresh R session and runs all code from scratch. Quarto calls library(knitr) and “knits” .qmd (Quarto files) into .md (Markdown files).
- Pandoc converts the .md file into any specified output type.
Quarto and library(knitr) don’t need to be explicitly loaded, and most of the magic happens “under the hood.”
Source: Quarto website
Quarto, library(knitr), and Pandoc are all installed with RStudio.²
The “Render” workflow has a few advantages:
- All code is rerun in a clean environment when “Rendering”. This ensures that the code runs in order and is reproducible.
- It is easier to document code than with inline comments.
- The output types are really appealing. By creating publishable documents with code, there is no need to copy-and-paste or transpose results.
- The process is iterable and scalable.
4.3 Three Ingredients in a .qmd
- YAML header
- Markdown text
- Code chunks
4.3.1 1. YAML header
YAML originally stood for “yet another markup language” and now officially stands for “YAML Ain’t Markup Language.” The YAML header contains meta information about the document including output type, document settings, and parameters that can be passed to the document. The YAML header starts with --- and ends with ---.
Here is the simplest YAML header for an HTML document:
---
format: html
---
YAML headers can contain many output specific settings. This YAML header creates an HTML document with code folding and a floating table of contents:
---
format:
  html:
    code-fold: true
    toc: true
---
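The YAML header can also carry parameters, as noted above. The header below is a minimal sketch of a parameterized document. The title and the parameter name state, along with its default value, are hypothetical and chosen only for illustration.

```yaml
---
title: "State Fact Sheet"   # hypothetical title
format: html
params:
  state: "Virginia"         # hypothetical parameter with a default value
---
```

With the knitr engine, the value would then be available inside R code chunks as params$state.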
4.3.2 2. Markdown text
Markdown is a lightweight markup language that works as a shorthand for HyperText Markup Language (HTML). Essentially, simple meta characters corresponding to formatting are added to plain text.
Titles and subtitles
------------------------------------------------------------
# Title 1
## Title 2
### Title 3
Text formatting
------------------------------------------------------------
*italic*
**bold**
`code`
Lists
------------------------------------------------------------
- Bulleted list item 1
- Item 2
    - Item 2a
    - Item 2b

1. Item 1
2. Item 2
Links and images
------------------------------------------------------------
[text](http://link.com)
![Image caption](images/image.png)
4.3.3 3. Code chunks
Code can be added inline, but more frequently it is added in code chunks.
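As a minimal sketch, assuming the knitr engine and R, a code chunk in a .qmd file opens with three backticks followed by the language engine in braces and closes with three backticks. The contents of the chunk here are an illustrative placeholder, not the chapter's original example.

````markdown
```{r}
# Any R code can go here; this chunk simply adds two numbers.
2 + 2
```
````

When the document is rendered, both the code and its result appear in the output unless chunk options say otherwise.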
The first argument inline or in a code chunk is the language engine. Most commonly, this will just be a lower case r. knitr allows for many different language engines:
- R
- Julia
- Python
- SQL
- Bash
- Rcpp
- Stan
- Javascript
- CSS
Quarto has a rich set of options that go inside of the chunks and control the behavior of Quarto.
For example, eval: false prevents the code in a chunk from running. Other chunk-specific options can be added in the same way. Here³ are the most important options:
Option | Effect |
---|---|
echo: false | Hides code in output |
eval: false | Turns off evaluation |
output: false | Hides code output |
warning: false | Turns off warnings |
message: false | Turns off messages |
fig-height: 8 | Changes figure height in inches⁴ |
fig-width: 8 | Changes figure width in inches⁵ |
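The options in the table are written at the top of a chunk, one per line, after the #| prefix. As a hedged sketch, the lines below show the body of an R chunk that hides its code and warnings and sets the figure size. In a .qmd file they would sit between the chunk's opening and closing delimiters. The plot and the specific option values are illustrative assumptions.

```r
#| echo: false
#| warning: false
#| fig-width: 8
#| fig-height: 5

# The lines above are chunk options; R itself treats them as comments.
# Scatter plot of stopping distance against speed from the built-in cars data.
plot(
  dist ~ speed,
  data = cars,
  xlab = "Speed (mph)",
  ylab = "Stopping distance (ft)"
)
```

Because the options are ordinary R comments, the same lines also run unchanged in the console or in a plain .R script.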
4.4 Organizing a Quarto Document
It is important to clearly organize a Quarto document and the constellation of files that typically support an analysis.
- Always use .Rproj files.
- Use sub-directories to sort images, .css, and data.
4.5 Outputs
The Quarto gallery contains many example outputs.
4.5.1 HTML Notebooks and Reports
---
format: html
---
4.5.2 PDF Notebooks, Reports, and Journal Articles
---
format: pdf
---
- This blog details how Cameron Patrick wrote their PhD thesis in Quarto.
4.5.3 MS Word Reports
---
format: docx
---
4.5.4 Presentations
---
format:
  revealjs:
    css: styles.css
    incremental: true
    slide-number: true
    preview-links: true
---
4.5.5 Websites
4.5.6 Books
Bookdown is an R package by Yihui Xie for authoring books in R Markdown. Many books, including the first edition of R for Data Science (Wickham and Grolemund 2016), have been written with bookdown.
Quarto books replace bookdown and are oriented around Quarto projects. The second edition of R for Data Science (Wickham, Çetinkaya-Rundel, and Grolemund 2023) was written in Quarto.
4.5.7 GitHub README
---
format: gfm
---
4.5.8 Fact Sheets and Fact Pages
An alternative to rendering a Quarto document with the Render button is to use the quarto::quarto_render() function. This allows for iterating the rendering of documents, which is particularly useful for the development of fact sheets and fact pages. The next chapter of the book expands on this use case.
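As a rough sketch of that iteration, the R code below renders one document per state from a single template. The template name fact-sheet.qmd, the parameter name state, and the list of states are all hypothetical, and the template is assumed to declare state under params in its YAML header. The loop only runs if the quarto package and the template file are available.

```r
# Hypothetical inputs: the template file and state names are illustrative only.
template <- "fact-sheet.qmd"
states <- c("Virginia", "Maryland")

# Build one output file name per state.
output_files <- paste0("fact-sheet-", tolower(states), ".html")

# Render only when the quarto package and the template are both present.
if (requireNamespace("quarto", quietly = TRUE) && file.exists(template)) {
  for (i in seq_along(states)) {
    quarto::quarto_render(
      input = template,
      output_file = output_files[i],
      execute_params = list(state = states[i])
    )
  }
}
```

Each pass through the loop hands a different value of state to the same template, so one .qmd file can produce many fact sheets.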
- The Urban Institute State and Local Finance Initiative creates State Fiscal Briefs by iterating R Markdown documents.
- Data@Urban
4.6 Conclusion
Quarto is a successor to R Markdown that can handle not only R but also Python and Julia code (among other languages). Quarto combines a YAML header, Markdown text, and code chunks. It can be used in a variety of settings to create technical documents and presentations. We love Quarto and hope you will learn to love it too!
4.6.1 Suggestions
4.6.2 Resources
1. Pandoc is free software that converts documents between markup formats. For example, Pandoc can convert files to and from Markdown, LaTeX, Jupyter notebook (.ipynb), and Microsoft Word (.docx) formats, among many others. You can see a comprehensive list of files Pandoc can convert on their About Page.
2. Rendering to PDF requires a LaTeX distribution. Follow these instructions to install library(tinytex) if you want to make PDF documents.
3. This table was typed as Markdown code. But sometimes it is easier to use a code chunk to create and print a table. Pipe any data frame into knitr::kable() to create a table that will be formatted in the output of a rendered Quarto document.
4. The default dimensions for figures change based on the output format. Visit here to learn more.
5. The default dimensions for figures change based on the output format. Visit here to learn more.
6. library(tidyverse) isn’t the focus of today. Feel free to write 2 + 2.