This page is under construction.
Efficient, convenient, and robust execution of data-driven workflows, together with enhanced data management, is key to productivity in computer-aided RD&E. Yet the storage stack is still based on low-level POSIX I/O (or on objects in cloud storage). We are establishing a forum to define an open, community-driven, next-generation storage interface, in a fashion similar to existing forums such as the MPI Forum. The forum would bring together vendors, storage experts, and users to discuss key features of the API and to establish governance strategies. The envisioned coarse-grained API aims to overcome current obstacles for highly parallel workflows, but it would also be beneficial in the domain of big data and even on desktop PCs. It offers the opportunity to create a new ecosystem.
The envisioned API abstracts storage and data-flow computation with the clear goal of separating concerns: the definition of tasks is kept apart from their execution on specific hardware. Additionally, it offers means to express workflow execution and lifecycle management. This allows smart schedulers to dispatch compute and I/O across the available hardware in an efficient manner. Since the user shall not have to care about data and compute locations, we coin the term liquid computing, which extends stream processing. We aim to support the development of a prototype on top of existing software technology (e.g. NetCDF, HDF5, iRODS, libfabric, parallel file systems, and NoSQL solutions), but we remain agnostic to the specific solution and do not compete with these solutions. The final component is a way to enable vendors to implement differentiating optimizations without violating the common interface.
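The separation between defining tasks and executing them can be illustrated with a small sketch. It is not the envisioned API: the names `Task`, `Workflow`, and `run` are hypothetical, and the toy scheduler keeps all data in one in-memory dictionary instead of choosing compute and storage locations. It only shows that tasks can name logical inputs and outputs while the runtime decides the execution order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    """Hypothetical task: names logical inputs and outputs, not files or nodes."""
    name: str
    inputs: Tuple[str, ...]
    output: str
    func: Callable[..., object]


@dataclass
class Workflow:
    """Hypothetical container holding task definitions only."""
    tasks: List[Task] = field(default_factory=list)

    def add(self, name, inputs, output, func):
        self.tasks.append(Task(name, tuple(inputs), output, func))


def run(workflow: Workflow, initial: Dict[str, object]) -> Dict[str, object]:
    """Toy scheduler: runs each task once all of its inputs exist.

    Assumption: a real scheduler would also pick compute and storage
    locations; here everything lives in one dictionary.
    """
    data = dict(initial)
    pending = list(workflow.tasks)
    while pending:
        ready = [t for t in pending if all(i in data for i in t.inputs)]
        if not ready:
            names = ", ".join(t.name for t in pending)
            raise ValueError("tasks with unsatisfiable inputs: " + names)
        for task in ready:
            data[task.output] = task.func(*(data[i] for i in task.inputs))
            pending.remove(task)
    return data


wf = Workflow()
# Declared out of order on purpose: the scheduler resolves the order.
wf.add("mean", ["cleaned"], "mean_value", lambda xs: sum(xs) / len(xs))
wf.add("clean", ["raw"], "cleaned", lambda xs: [x for x in xs if x is not None])

result = run(wf, {"raw": [1.0, None, 3.0]})
print(result["mean_value"])  # prints 2.0
```

Because the workflow only states what depends on what, the same definition could in principle be handed to a different scheduler that places data and compute elsewhere, which is the property the text above calls liquid computing.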
Main features of the resulting prototype:
We explain what we mean by this using use cases.
For example, consider a heterogeneous storage landscape:
The system shall make placement decisions and partially replicate data, depending on the availability and characteristics of the storage and also on the usage patterns, particularly those of the workflows. Thus, all available storage shall be used concurrently, with data stored and used where it is best suited. In contrast, existing systems rely on data migration and policies.
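As a rough illustration of such a placement decision, the sketch below assigns each dataset to one storage system using a simple greedy heuristic. Everything in it is an assumption made for illustration: the `Storage` and `Dataset` types, the `place` function, the tier names, and all numbers are hypothetical, and partial replication is left out. It is not how the envisioned system decides.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Storage:
    name: str
    free_gb: float
    read_gbps: float        # illustrative throughput, not a measurement
    available: bool = True


@dataclass
class Dataset:
    name: str
    size_gb: float
    reads_per_day: float    # stand-in for an observed usage pattern


def place(datasets: List[Dataset], tiers: List[Storage]) -> Dict[str, str]:
    """Greedy heuristic: the most frequently read data gets the fastest
    available storage that still has room for it."""
    usable = [t for t in tiers if t.available]
    free = {t.name: t.free_gb for t in usable}
    by_speed = sorted(usable, key=lambda t: t.read_gbps, reverse=True)
    placement: Dict[str, str] = {}
    for ds in sorted(datasets, key=lambda d: d.reads_per_day, reverse=True):
        for tier in by_speed:
            if free[tier.name] >= ds.size_gb:
                free[tier.name] -= ds.size_gb
                placement[ds.name] = tier.name
                break
        else:
            raise RuntimeError(f"no available storage can hold {ds.name}")
    return placement


# Hypothetical storage landscape and workload.
tiers = [
    Storage("nvme", free_gb=100, read_gbps=5.0),
    Storage("pfs", free_gb=10_000, read_gbps=1.0),
    Storage("tape", free_gb=100_000, read_gbps=0.1),
]
datasets = [
    Dataset("archive", size_gb=20_000, reads_per_day=0.1),
    Dataset("checkpoint", size_gb=80, reads_per_day=50),
    Dataset("inputs", size_gb=60, reads_per_day=10),
]

placement = place(datasets, tiers)
print(placement)
```

With these example numbers, the frequently read checkpoint lands on the fast tier, the inputs fall through to the parallel file system once the fast tier is full, and the rarely read archive goes to tape, so all three systems are in use at the same time.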
We are building the NGI forum along the lines of the MPI Forum: driven by use cases, a standard is created and demonstrated in prototypes.
We welcome contributions. If you are interested in this topic, subscribe to our mailing list.
We will soon list the initial contributors to this effort.