High Throughput Reproducible Literate Phylogenetic Analysis

S., Ruhila

Files

Primary Need To Add…Full Text_PDF. (15.36 KB)

Date

2022

Authors

S., Ruhila

Publisher

IEEE

Abstract

We present a holistic approach, from a literate programming perspective, to framing and solving systems biology problems. In particular, given the large datasets required to answer questions about evolutionary histories, we focus on the generalization and workflow required on a typical SLURM or PBS TORQUE queue-driven high performance computing (HPC) cluster. We demonstrate how to leverage multiple CLI tools, compiled for efficient and portable use on heterogeneous computational resources, and further demonstrate the use of R to generate literate, data-driven plots and analyses. HPC cluster bottlenecks and installation barriers are also discussed, and mitigation strategies are developed. As a concrete example, we demonstrate the estimation of a phylogenetic tree, which is used to pose and answer questions about evolutionary lineages. In this manner, we elucidate a generalized approach for systems biology that covers the manipulation of phylogenetic data, including its validation, multiple sequence alignment, tree estimation through different models, and reproduction.

Description

Only IISERM authors are available in the record.

Keywords

High Throughput Reproducible Literate, Phylogenetic Analysis

Citation

PDGC 2022 - 2022 7th International Conference on Parallel Distributed and Grid Computing, 337-340.

URI

https://doi.org/10.1109/PDGC56933.2022.10053210
http://hdl.handle.net/123456789/4745

Collections

Research Articles
