Chapter 4 Data Cleaning
Initially provided as XML files, the datasets were converted to CSV files and then merged into a final dataset with 132,000 observations and 98 features. Because this project focuses primarily on passing, the data for each game was then converted into network data. Each game is represented as an array of matrices, one per possession, in which each entry counts the passes between a pair of players during that possession.
Below is an example of a 10x10 matrix for a single possession. Rows indicate the passer and columns indicate the receiver; the row and column labels are player IDs.
Passer / Receiver | 100023 | 100283 | 839023 | 456782 | 222789 | 134783 | 111124 | 098783 | 352671 | 213416 |
---|---|---|---|---|---|---|---|---|---|---|
100023 | 0 | 1 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 |
100283 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
839023 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
456782 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
222789 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
134783 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
111124 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
098783 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
352671 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
213416 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
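As a rough illustration of how such a matrix could be assembled, the sketch below counts passes for one possession. It is not the code used in this project: the function name `passing_matrix`, the list-of-pairs input format, and the sample passes are assumptions chosen only to reproduce the nonzero entries of the table above. Player IDs are kept as strings so that an ID such as 098783 retains its leading zero.

```python
import numpy as np

def passing_matrix(passes, player_ids):
    """Count passes for each ordered (passer, receiver) pair.

    passes: iterable of (passer_id, receiver_id) tuples for one possession.
    player_ids: ordered list of IDs defining the row and column order.
    """
    index = {pid: i for i, pid in enumerate(player_ids)}
    matrix = np.zeros((len(player_ids), len(player_ids)), dtype=int)
    for passer, receiver in passes:
        # Rows are passers, columns are receivers.
        matrix[index[passer], index[receiver]] += 1
    return matrix

# Hypothetical possession matching the nonzero cells of the example table.
player_ids = ["100023", "100283", "839023", "456782", "222789",
              "134783", "111124", "098783", "352671", "213416"]
passes = [
    ("100023", "100283"),
    ("100023", "456782"),
    ("100023", "456782"),
    ("100023", "456782"),
    ("839023", "100283"),
]
possession = passing_matrix(passes, player_ids)
```

Stacking one such matrix per possession, for example with `np.stack`, would give an array of matrices for a game, assuming the same ten player IDs are used for every possession.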
4.1 Changes in Shot Clock Time
As college basketball is a constantly changing sport, the NCAA changed the rules of play for the 2013-2014 college basketball season, replacing the 35-second shot clock with a 30-second shot clock. Since this work does not have a temporal component, the rule change is not expected to affect the model-building results drastically. However, the extra five seconds under the 35-second clock may have allowed players to pass the ball more frequently within a possession, which would affect the passing matrices.
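One simple way to gauge whether the shot clock change shifted passing volume would be to compare the total number of passes per possession before and after the change. The sketch below is only an assumption about how such a check might look; the helper `passes_per_possession` and the tiny 2x2 placeholder matrices are hypothetical and are not drawn from the dataset.

```python
import numpy as np

def passes_per_possession(matrices):
    """Return the total pass count for each possession matrix."""
    return np.array([m.sum() for m in matrices])

# Placeholder possessions (not real data), one list per shot clock setting.
clock_35 = [np.array([[0, 2], [1, 0]]), np.array([[0, 1], [0, 0]])]
clock_30 = [np.array([[0, 1], [1, 0]]), np.array([[0, 0], [0, 0]])]

# Average number of passes per possession under each setting.
mean_35 = passes_per_possession(clock_35).mean()
mean_30 = passes_per_possession(clock_30).mean()
difference = mean_35 - mean_30
```

A large difference between the two means on the real matrices would suggest that the rule change is worth accounting for when possessions from both periods are pooled.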