Statistical Analysis of Gut Microbiome Data for Memorial Sloan Kettering Patients Recovering from Leukemia
May 2019
Abstract
Human gut microbiome data from Memorial Sloan Kettering Cancer Center were analyzed to characterize potential associations between patient traits and gut bacterial composition. The DADA2 pipeline was first applied to rRNA reads collected from stool samples of patients recovering from leukemia to create an amplicon sequence variant (ASV) table. Principal Coordinates Analysis was then applied to create ordination plots representing the distribution of microbiome compositions. Interactive visualizations were developed in Tableau to explore trends in patients' microbial dynamics. Summary statistics measuring microbiome dynamics were also devised, and linear regressions were performed to identify patient traits potentially related to changes in the microbiome over time. These analyses suggested an association between microbiome movement and both vital status and graft source. Finally, a phylogenetic tree decomposition was conducted to transform the bacterial abundance data in a way that provides contextual insight into connections between patient traits and their microbiomes. Through this process, we found that variables including care environment, the presence of chronic graft-versus-host disease, the number of days post-transplant that the absolute neutrophil count (ANC) was greater than 500, and graft source may all be significantly associated with the composition of patient gut bacteria. Overall, our methodology shows promise for identifying associations between patients' recovery from leukemia and their gut microbiomes.
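The ordination step mentioned above can be illustrated with a minimal sketch of classical Principal Coordinates Analysis (PCoA) on an ASV table. The abstract does not state which dissimilarity or software was used, so Bray-Curtis dissimilarity on relative abundances is an assumption here, the helper names `bray_curtis` and `pcoa` are hypothetical, and the toy counts are random numbers, not patient data.

```python
import numpy as np


def bray_curtis(abund):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a samples x taxa matrix.

    Assumption: Bray-Curtis is used for illustration; the original analysis may differ.
    """
    n = abund.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(abund[i] - abund[j]).sum()
            den = (abund[i] + abund[j]).sum()
            d[i, j] = d[j, i] = num / den if den > 0 else 0.0
    return d


def pcoa(dist, k=2):
    """Classical PCoA: double-center the squared distances and take the top-k eigenvectors."""
    n = dist.shape[0]
    a = -0.5 * dist ** 2
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ a @ centering
    eigvals, eigvecs = np.linalg.eigh(b)       # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]          # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Non-Euclidean dissimilarities (e.g. Bray-Curtis) can give negative eigenvalues; clip them.
    scale = np.sqrt(np.clip(eigvals[:k], 0.0, None))
    coords = eigvecs[:, :k] * scale
    explained = eigvals[:k] / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical toy data: 6 stool samples x 20 ASVs of random counts.
rng = np.random.default_rng(0)
asv_counts = rng.poisson(lam=5.0, size=(6, 20))
rel_abund = asv_counts / asv_counts.sum(axis=1, keepdims=True)
coords, explained = pcoa(bray_curtis(rel_abund), k=2)
print(coords.round(3))     # one (PC1, PC2) point per sample, ready to plot
print(explained.round(3))  # fraction of positive-eigenvalue variance per axis
```

Each row of `coords` is one sample's position on the first two principal coordinates; plotting these points, colored by a patient trait, gives an ordination plot of the kind the abstract describes.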