Please use this identifier to cite or link to this item: http://hdl.handle.net/123456789/4747
Title: Reproducible High Performance Computing without Redundancy with Nix
Authors: S., Ruhila
Keywords: Reproducible High Performance Computing
Redundancy with Nix
Issue Date: 2022
Publisher: IEEE
Citation: PDGC 2022 - 2022 7th International Conference on Parallel Distributed and Grid Computing, 238-242.
Abstract: High performance computing (HPC) clusters are typically managed in a restrictive manner; the large user base makes cluster administrators unwilling to allow privilege escalation. Here we discuss existing methods of package management, including those which have been developed with scalability in mind, and enumerate the drawbacks and advantages of each management methodology. We contrast the paradigms of containerization via Docker, virtualization via KVM, pod-infrastructures via Kubernetes, and specialized HPC packaging systems via Spack, and identify key areas of neglect. We demonstrate how functional programming, through its reliance on immutable state, has been leveraged for deterministic package management via Nix language expressions. We show that its associated ecosystem is a prime candidate for HPC package management. We further develop guidelines, identify bottlenecks in the existing structure, and present the methodology by which the Nix ecosystem should be developed further as an optimal tool for HPC package management. We assert that the caveats of the Nix ecosystem can easily be mitigated by considerations relevant only to HPC systems, without compromising on the functional methodology and features of the Nix language. We show that the benefits of adoption, in terms of generating reproducible derivations in a secure manner, allow for workflows to be scaled across heterogeneous clusters. In particular, from the implementation hurdles faced during the compilation and running of the d-SEAMS scientific software engine, distributed as a Nix derivation on an HPC cluster, we identify communication protocols for working with SLURM and TORQUE user resource allocation queues. These protocols are heuristically defined and described in terms of the reference implementation required for queue-efficient Nix builds.
Description: Only IISERM authors are available in the record.
URI: https://doi.org/10.1109/PDGC56933.2022.10053342
http://hdl.handle.net/123456789/4747
Appears in Collections: Research Articles

Files in This Item:
File: Need To Add…Full Text_PDF.
Size: 15.36 kB
Format: Unknown

