High Throughput Reproducible Literate Phylogenetic Analysis

Loading...
Thumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Publisher

IEEE

Abstract

We present a holistic approach from a literate programming perspective to frame and solve systems biology problems. In particular, given the large data-sets required for answering questions relating to evolutionary histories we focus on the generalization and workflow required on a typical SLURM or PBS TORQUE queue driven high performance computing cluster. We demonstrate how to leverage multiple CLI tools compiled for efficient use in a portable manner on heterogeneous computational resources and further demonstrating the use of R to generate literate data-driven plots and analysis. High Performance Computing cluster (HPC) bottlenecks and installation barriers are also discussed and mitigation strategies are developed. As a concrete example we demonstrate the estimation of a phylogenetic tree, used to pose and answer questions on evolutionary lineages. In this manner, a generalized approach which can be used for systems biology is elucidated for manipulating phylogenetic data, including its validation, multiple sequence alignment, tree estimation through different models and reproduction.

Description

Only IISERM authors are available in the record.

Citation

PDGC 2022 - 2022 7th International Conference on Parallel Distributed and Grid Computing, 337-340.

Endorsement

Review

Supplemented By

Referenced By