Literate Programming

2023 DSS Bootcamp

Colin Rundel

2022-08-25

R, RMarkdown, and Quarto

What is markdown?

  • Markdown is a lightweight markup language for creating HTML (and other formatted) documents.

  • Markup languages are designed to produce documents from human readable text (and annotations).

  • Some of you may be familiar with LaTeX. This is another (less human-friendly) markup language for creating PDF documents.

  • Why markdown is great:

    • Easy to learn and use.
    • Focus on content, rather than coding and debugging errors.
    • Once you have the basics down, you can get fancy via HTML, JavaScript, and CSS.
    • Used by RMarkdown, Jupyter Notebooks, and Quarto

What is Quarto?

R Markdown / Quarto - Input


R Markdown:

---
title: "Untitled"
output: html_document
date: "2023-08-22"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

```{r cars}
summary(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=FALSE}
plot(pressure)
```

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Quarto:

---
title: "Untitled"
format: html
---

## Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see <https://quarto.org>.

## Running Code

When you click the **Render** button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

```{r}
1 + 1
```

You can add options to executable code like this 

```{r}
#| echo: false
2 * 2
```

The `echo: false` option disables the printing of code (only output is displayed).

R Markdown / Quarto - Output

Something simple

Something fancy

R Markdown resources

Quarto resources

Quarto / RMarkdown demo

R packages

  • Packages are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and sample data.

  • In the following exercises we’ll use the tidyverse package.

    • The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
    • The core tidyverse consists of the ggplot2, tibble, tidyr, readr, purrr, dplyr, stringr, and forcats packages.
  • This package is already installed for you on the DSS servers. If needed, you can install it by running the following in the Console:

    install.packages("tidyverse")

A note on environments

  • Your R Markdown / Quarto document and your Console do not share their global environments.

    • This is different than the computational model of Jupyter Notebooks!
  • This is good for reproducibility, but can sometimes result in frustrating errors.

  • This also means any packages or data needed for your analysis need to be loaded in your R Markdown document as well.
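
A rough analogy for the separate environments, sketched in Python rather than R: two dictionaries stand in for the Console's and the document's global environments. The names `console_env`, `document_env` and `x` are made up for illustration; this is not how knitr or Quarto is implemented.

```python
# Two independent namespaces standing in for the Console and the rendered document.
console_env = {}
document_env = {}

# Defined interactively in the "Console" only.
exec("x = 42", console_env)

# The "document" never ran that line, so the name does not exist there.
try:
    exec("print(x)", document_env)
    document_sees_x = True
except NameError:
    document_sees_x = False

print("x" in console_env)   # defined in the Console namespace
print(document_sees_x)      # the document namespace cannot see it
```

This is the kind of error you hit when an object exists in your Console but was never created inside the document itself.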

Unvotes data analysis

To get started,

  • open examples/unvotes.qmd,

  • try rendering the entire document and examine the results.

  • try changing one or more of the selected countries, re-render the document and observe any changes.

  • commit and push your changes to GitHub (include the newly generated unvotes.html file as well)


R Markdown / Quarto suggestions

  • Remember to name your code chunks

  • Familiarize yourself with chunk options (https://yihui.org/knitr/options/)

    • Use global chunk options to reduce duplication
    • Using #| syntax enables tab completion for chunk options
  • Load packages at the start of a document, generally the chunk after your setup chunk

  • Familiarize yourself with various output formats: Make slides with revealjs, PDFs, books, etc.
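
A minimal sketch of a named chunk using the `#|` option syntax, written as the body of a Python cell. In a .qmd file the fence would read `{python}` instead of `python`. The label `mean-values` and the data are hypothetical; run as plain Python, the `#|` lines are ordinary comments.

```python
#| label: mean-values
#| echo: false

# Hypothetical values, only to give the chunk something to compute.
values = [2, 4, 6]
mean_value = sum(values) / len(values)
print(mean_value)
```

With `echo: false` the rendered document would show only the printed result, not the code.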

R programming resources

More R / RStudio resources

  • Some useful resources from RStudio: https://www.rstudio.com/resources/cheatsheets/

    • RStudio IDE Cheat Sheet
    • R Markdown Cheat Sheet
    • R Markdown Reference Guide
    • Data Import Cheat Sheet
    • Data Transformation Cheat Sheet
    • Data Visualization Cheat Sheet

Some of the above cheat sheets are available in RStudio: Help > Cheatsheets

Jupyter notebook demo

Why python?

RStudio Workbench + Jupyter

If you return to https://rstudio.stat.duke.edu you can launch a new session and select Jupyter Lab as your editor of choice.



Overview of the notebook

Bimodal interface: edit mode and command mode

Click in a cell or hit enter to enter edit mode

When in edit mode you can type code or write text with markdown.

Hit esc to enter command mode

When in command mode you can edit the notebook as a whole (add, move, or delete cells), but not the contents of individual cells.

Notebook shortcuts

In edit mode:

  • shift + enter - Run cell and move to the next cell (a new cell is added if it was the last one)
  • enter - Add a line within a cell

In command mode:

  • s - Save the notebook
  • m - Change cell to markdown
  • y - Change cell to code
  • x, c, v, d d - Cut, copy, paste, delete a cell (press d twice to delete)
  • a, b - Add a cell above, below

The point-and-click interface is also an option to execute these commands.

Jupyter and Terminal

  • Jupyter Lab provides a direct interface to the terminal (similar to RStudio) under Launcher > Other

  • Terminal commands can also be included in notebooks by prefixing with !, e.g.

!pip install --user statsmodels
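
The `!` prefix only works inside a notebook (it is IPython syntax, not Python). In a plain script, a rough equivalent can be sketched with the standard library's `subprocess` module. The helper `run_command` is a made-up name for illustration, and the demo command is a harmless placeholder rather than a real install.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return (exit code, captured stdout)."""
    completed = subprocess.run(args, capture_output=True, text=True)
    return completed.returncode, completed.stdout.strip()


# sys.executable is the interpreter running this script, so the command
# uses the same Python installation as the calling code.
code, output = run_command([sys.executable, "-c", "print('hello from the shell')"])
print(code, output)

# The install from the slide would look like this (not run here):
# run_command([sys.executable, "-m", "pip", "install", "--user", "statsmodels"])
```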

Jupyter and Git

  • The departmental server has the JupyterLab Git extension installed.

  • This provides a GUI similar to RStudio’s for interacting with Git repositories

  • Navigate to a repo’s root directory and then switch to the Git pane.

Unvotes data analysis

To get started,

  • open examples/unvotes.ipynb,

  • Render the entire document by selecting Run > Run All Cells

  • Try changing one or more of the selected countries, rerun the notebook, and observe any changes.

  • commit and push your changes to GitHub (include the newly generated unvotes.html file as well)

Jupyter notebook versus R Markdown

  • Similar to R Markdown, Jupyter notebooks allow you to write code and text in one easy to read document that is reproducible and easy to share with others.

  • Jupyter notebooks include the text, code and computational output.

  • A Jupyter notebook does not knit to an HTML, PDF or Word file. However, you can embed HTML into a notebook.

    • Exports are possible with packages like nbconvert
  • For a more detailed comparison see The First Notebook War.
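
To make "text, code and computational output" concrete, here is a small sketch of what an .ipynb file stores, assuming the nbformat 4 JSON layout. The notebook below is hand-written for illustration and is not taken from unvotes.ipynb.

```python
import json

# A minimal notebook: one markdown cell and one code cell with its saved output.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A heading\n", "Some *text*."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Round-trip through JSON text, as happens when the file is saved and reopened.
loaded = json.loads(json.dumps(notebook))

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
stored_output = "".join(loaded["cells"][1]["outputs"][0]["text"])
print(cell_types)
print(repr(stored_output))
```

Because the output is saved inside the file, a notebook can be viewed without re-running it, unlike an .Rmd or .qmd source file.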

Jupyter notebook + Quarto

Quarto was built to unify the scientific publishing process across the most commonly used notebook formats, including Jupyter notebooks.

Specifically, Quarto has a couple of neat tricks:

  • Render ipynb files using the jupyter engine

    quarto render unvotes.ipynb --execute
    quarto render unvotes.ipynb --execute --to pdf
  • Convert between ipynb and qmd files seamlessly

    quarto convert unvotes.ipynb

Additional Python resources