Conclusion

Overall, our work this year gave us considerable insight into the methods required to process and interpret microbiome data. It also revealed the potential of new methods for visualizing and analyzing associations between patient information and microbiome composition.

There are several avenues for future work on this project. With respect to processing, we could investigate different filtration parameters in the initial generation of our ASV table. For our visualizations and linear regressions, we could look for new summary statistics that still represent the dynamics of the microbiome over time effectively but are more robust to the addition and deletion of data.

With respect to phylogenetic tree decomposition, it would be useful to repeat our procedure either with fewer variables in our mixed effects model or with more data. Our results are relatively sparse, particularly at nodes lower in the phylogenetic tree. With more data, counts would be higher at a larger number of internal nodes, potentially allowing us to fit regression models at more than just twenty-two nodes. Sparsity is unlikely to be the sole reason the model failed at most nodes, however: if it were, we would expect higher success rates at nodes closer to the root. Instead, the twenty-two nodes at which the model could be fit appear to be scattered throughout the tree with no clear pattern. It would be worthwhile to investigate why the mixed effects model could be fit at these particular nodes and not at the others. With more results, we could also analyze the output of our regression along chains of related nodes, thus controlling for inter-node variation.

It is also possible that function, rather than phylogeny, is a better way to characterize the relationships among microbiota. Instead of building a tree based solely on taxonomy, it would be interesting to use a tree based on microbial functional groups, which could give a clearer indication of the relationships between microbiome functions.

Finally, regardless of the type of tree used to establish a contextual relationship between the ASVs in our data, it would be useful to consider regression models on maximum and last distance within the framework of phylogenetic tree decomposition. If we could find a way to examine change in composition over time node by node, we might gain more insight into the relationship between our covariates and microbiome change than our linear regressions alone could provide. Ultimately, our methodology shows promise for identifying connections between patients' recovery from leukemia and their gut microbiomes.