scida: scalable analysis for scientific big data

Abstract

scida is a Python package for reading and analyzing large scientific data sets. Data access is provided through a hierarchical dictionary-like data structure after a simple load() function. Using the dask library for scalable, parallel and out-of-core computation, all computation requests from a user session are first collected in a task graph. Arbitrary custom analysis, as well as all available dask (array) operations, can be performed. The subsequent computation is executed only upon request, on a target resource (e.g. a HPC cluster).

Publication

Public development taking place on github.