scida: scalable analysis for scientific big data

Chris Byrohl, Dylan Nelson

October 2023

Preprint PDF GitHub Webpage

Abstract

scida is a Python package for reading and analyzing large scientific data sets. Data access is provided through a hierarchical dictionary-like data structure after a simple load() function. Using the dask library for scalable, parallel and out-of-core computation, all computation requests from a user session are first collected in a task graph. Arbitrary custom analysis, as well as all available dask (array) operations, can be performed. The subsequent computation is executed only upon request, on a target resource (e.g. a HPC cluster).

Type

Journal article

Publication

JOSS

Public development taking place on github.

Scientific Big Data