Chapter 1 Introduction

STA 101 is the primary introductory statistics course at Duke University, where students are assumed to have entered the class without any statistical knowledge. The majority of students enrolled in the course do not plan on enrolling in future statistics curricula, focusing the course on applications of statistics, as students learn how to relate STA 101’s concepts to their future works.

At the beginning of the course, the students are divided into groups where they complete assignments throughout the course together. The class is split into seven units, and every section has at least one lab, taught in R, to bolster the learning segment.

For introductory statistics education, the Guidelines for Assessment and Instruction in Statistics Education (GAISE) sets quality and scope standards, as it provides a set of general recommendations, as well as specific goals to be accomplished by the course’s completion. With its updated version released in 2016, the GAISE, which is endorsed by the American Statistical Association, recommends that students use programming, if accessible, to explore real-world examples of concepts covered in the classroom (Carver et al., 2016).

To conclude the STA 101 semester, students display their grasp of the concepts by completing a group project conducted in R. A sufficient project submission adheres to seven of the nine stated goals of the GAISE, which are listed in the literature review section. The project, while constrained to a specific dataset, is relatively open-ended, as groups can analyze various features of the data.

Due to the flexibility promoted by the project assignment, students are encouraged to display aspects of creativity in their analyses, whether it is focusing on Warner Bros. Entertainment Inc. movies, or assembling an indicator variable tracking if a member of the film received a nomination for best actor or actress. Creativity often stems from the exploratory data analysis process, where groups can uncover interesting aspects to further scrutinize before beginning the analysis portion of the project.

Since the majority of students will apply these concepts in their future work, statistical programming provides students with a platform to individually explore datasets in the future. When using R as a course supplement, there are two prevailing and competing techniques for beginners (Robinson, 2017, Leek (2016)). Students are instructed to work with either a relatively new suite of packages called the tidyverse or base R commands that have been in use for far longer. The tidyverse was created to make coding in R more consistent, but it does not contain as many internet resources, such as debugging responses on Stack Overflow, as base R.

At Duke University, the programming aspect of STA 101 classes have been taught in either base R or the tidyverse syntax, creating an optimal platform to diagnose the direct affects of the syntax. Through the analysis of R code from STA 101 final group projects, we hope to uncover if one programming syntax encourages a more advanced level of creativity while simultaneously adhering to the GAISE recommendations and guidelines.

A study released March 3, 2019 attempted to uncover the answer to a similar question, as it studied the difference in visualization quality between base R plots and ggplot2, a package within the tidyverse, for beginning R programmers (Myint, Hadavand, Jager, & Leek, 2019). Although the evaluation favored visualizations crafted using ggplot2, it focused on the output of the code, rather than the code quality. This study concerns the code itself, and whether beginning coders were encouraged to be more creativity and produce higher-quality final projects based on the syntax.