Event details

When: Friday, March 1 at 5:00pm - Sunday, March 3 at 5:00pm

Where: Penn Pavilion, Duke University

On Friday we will start with a welcome event where you will be introduced to the surprise data set you’ll be working with over the weekend. You’ll learn more about the data provider and a bit about what they’d like to get out of the data. The data will likely be much more complex than what you are used to seeing in your classes, and you will be given free rein to analyze it however you like. In other words, you will come up with a research question that is of interest to you, and conduct the appropriate analysis to answer your question. But you are welcome, and encouraged, to take cues from the client’s introduction when shaping your research question(s).

Projects will be due on Sunday afternoon. Each team will give a brief presentation of their findings to a panel of judges made up of faculty and professionals from a variety of fields. There will be prizes in many categories, such as best visualization, best use of external data, and best findings. A finalized list of categories will be announced at the beginning of the competition.

What is DataFest?


ASA DataFest™ is a data analysis competition where teams of up to five students attack a large, complex, and surprise data set over a weekend. Your job is to represent your school by finding and communicating insights into these data. The teams that impress the judges will win prizes as well as glory for their school. Everyone will have a great experience, lots of food, and fun!

ASA DataFest™ is also a great opportunity to gain experience that employers are looking for. Having worked on a data analysis problem at this scale will certainly help make you a good candidate for any position that involves analysis and critical thinking, and it will provide a concrete example to demonstrate your experience during interviews.

ASA DataFest™ at Duke is organized by the Department of Statistical Science and the Statistical Majors Union at Duke University.


While ASA DataFest™ is a competition, the main goal of the event is to promote collaboration. Here are some testimonials from past participants:

It was a great experience, with a fun and interesting challenge. One of my favorite parts is how varied the presentations and projects from each team are. I love learning about ways in which others looked at and analyzed the same problem/data.

DataFest was an awesome experience. To me, the best part was working in a team of friends that I usually hung out with, but had not had a chance to work together intensively on a project. We enjoyed analyzing the situations and solving problems together for our client. At the end of the day, we just got to know each other better. It was also fun to interact with other teams to explore other approaches while keeping in mind that we were in competition. The fact that we were given a huge amount of data really challenges us to come up with creative and practical approaches. Another important part was the presentation. Every team had to explain well to the judges their objectives and solutions. Our team won the Best Visualization award which is really awesome. Lastly, the food was fantastic.

Past DataFests at Duke

DataFest 2023 - American Bar Association

Goal: Analyze data to provide advice to the American Bar Association on how best to ensure the appropriate legal experts are available to support their pro bono legal advice site.

DataFest 2022 - Play2Prevent Lab at Yale School of Medicine

Goal: Analyze game logs of the Elm City game to determine if there are coherent styles of play that might be useful for characterizing middle school students’ attitudes towards risky behaviors.

DataFest 2021 - Rocky Mountain Poison and Drug Safety

Goal: Use data from surveys conducted in the United States, Canada, Germany, and the United Kingdom to discover and identify patterns of drug use, with particular attention paid to identifying misuse. The analysis results could potentially be used to predict future drug misuse and to inform the development of a questionnaire physicians can use to predict drug misuse.

DataFest 2020 - COVID-19 Virtual Data Challenge

Goal: Explore data to understand a societal impact of the COVID-19 pandemic other than its direct health outcomes. What have been the effects on pollution levels, transportation levels, or working from home? Has there been a change in the number of posts on TikTok? What is the impact on online education? The focus is up to you!

DataFest 2019 - Data source: Canadian National Women’s Rugby Team

Goal: How do we quantify the role of fatigue and workload in a team’s performance in Rugby 7s? How reliable are the subjective wellness data? Should the quality of the opponent or the outcome of the game be considered when examining fatigue during a game? Can widely used measurements of training load and fatigue be improved? How reliable are GPS data in quantifying fatigue?

DataFest 2018 - Data source: Indeed

Goal: What advice would you give a new high school student about what major to choose in college? How does Indeed’s data compare to official government data on the labor market? Can it be used to provide good economic indicators?

DataFest 2017 - Data source: Expedia

Goal: How do visitors’ searches relate to the choices of hotels booked or not booked? What role do external factors play in hotel choice?

Expedia provided DataFesters with data from search results from millions of visitors around the world who were interested in traveling to destinations all over the world. The data were in two files, one of which included data collected on search results from visitors’ sessions, and another which contained detailed information about the destinations that visitors searched for.

DataFest 2016 - Data source: Ticketmaster

Goal: How can site visits be converted to ticket sales, and how can Ticketmaster identify “true fans” of an artist or band?

Data consisted of three sets. One included events from the last 12 months that tracked customer travel through the website. Another provided information about advertising campaigns on Google, and the third included data on the events themselves.

DataFest 2015 - Data source:

Goal: Detect insights into the process of car shopping that can help make the process easier for customers.

Data consisted of visitor ‘pathways’ through a website that helps customers configure car features and shop for cars. Five data files were linked by a customer key and included data about the customer, about his or her visits to the webpage, and, when applicable, about the car purchased and the dealership where the car was purchased.

DataFest 2014 - Data source: GridPoint

Goal: Help understand how customers can best save money and energy.

Data consisted of a random sample of customers, with five-minute aggregates over a year of energy consumption that was then aggregated across important features of the commercial properties, as well as supporting climate and location data.

DataFest 2013 - Data source: eHarmony

Goal: Help understand what qualities people look for in prospective dates.

The DataFest students worked with a large sample of prospective matches. For each customer, data were provided on his or her preferences, as well as four matches, their preferences, and information about whether parties contacted one another.

DataFest 2012 - Data source:

Goal: Help understand what motivates people to lend money to developing-nation entrepreneurs and what factors are associated with repaying these loans.

Several data sets were provided, including characteristics of lenders and borrowers and loan pay-back data.