Next Generation Interfaces

Efficient, convenient, and robust execution of data-driven workflows, together with enhanced data management, is key to productivity in computer-aided Research, Development and Engineering (RD&E). Still, the storage stack is based on low-level POSIX I/O (or on objects in cloud storage). The forum would bring together vendors, storage experts, and users to discuss key features of alternative APIs and to establish governance strategies.

We are exploring the establishment of an open, community-driven, next-generation data-driven interface, organized in a similar fashion to existing forums. The envisioned coarse-grained API aims to overcome current obstacles for highly parallel workflows, but it would also benefit the big data domain and even desktop PCs. It offers the opportunity to create a new ecosystem.

The envisioned API abstracts storage and data-flow computation with the clear goal of a separation of concerns between the definition of tasks and their execution on specific hardware. Additionally, it offers means to express workflow execution and lifecycle management. This allows smart schedulers to dispatch compute and I/O across the available hardware in an efficient manner. Since the user shall not have to care about data and compute locations, we coin the term liquid computing, which extends stream processing. We aim to support the development of a prototype on top of existing software technology (e.g. NetCDF, HDF5, iRODS, libfabric, parallel file systems, and NoSQL solutions) but remain agnostic to the specific solution and do not intend to compete with these solutions. A final component is enabling vendors to implement differentiating optimizations without violating the common interface.
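To illustrate the separation between task definition and execution, here is a minimal Python sketch. It is not part of any NGI specification: the names `Task`, `Workflow`, and `run` are hypothetical, and the in-process loop merely stands in for a smart scheduler that would dispatch compute and I/O across real hardware. Tasks name only the datasets they read and write, never where those datasets live or where the code runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical task: described only by the datasets it reads and writes."""
    name: str
    inputs: List[str]
    outputs: List[str]
    func: Callable[..., tuple]  # returns one value per output name


@dataclass
class Workflow:
    """Hypothetical workflow: an unordered collection of tasks."""
    tasks: List[Task] = field(default_factory=list)

    def add(self, task: Task) -> "Workflow":
        self.tasks.append(task)
        return self

    def run(self, initial: Dict[str, object]) -> Dict[str, object]:
        """Stand-in scheduler: run each task once all of its inputs exist."""
        datasets = dict(initial)
        pending = list(self.tasks)
        while pending:
            ready = [t for t in pending if all(i in datasets for i in t.inputs)]
            if not ready:
                raise RuntimeError("unsatisfiable dependencies")
            for t in ready:
                results = t.func(*(datasets[i] for i in t.inputs))
                datasets.update(zip(t.outputs, results))
                pending.remove(t)
        return datasets


# Tasks are declared in arbitrary order; the data dependencies decide execution order.
wf = Workflow()
wf.add(Task("mean", ["squared"], ["mean"], lambda xs: (sum(xs) / len(xs),)))
wf.add(Task("square", ["raw"], ["squared"], lambda xs: ([x * x for x in xs],)))
result = wf.run({"raw": [1, 2, 3]})
```

Because the workflow states only data dependencies, a different scheduler could replace `run` and place each task near its data without any change to the task definitions.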

Main features of the resulting prototype could be:

  • Smart hardware and software components
  • Storage and compute are covered together
  • User metadata and workflows as first-class citizens
  • Self-aware instead of unconscious
  • Improving over time (self-learning, hardware upgrades)

We explain what we mean by this using use cases. For example, consider a heterogeneous storage landscape.

The system shall make placement decisions, including partial replication of data, depending on the availability and characteristics of the storage but also on the usage patterns, particularly those of the workflows. Thus, all available storage shall be used concurrently, storing and using data where it is best suited. In contrast, existing systems rely on data migration and policies.
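A placement decision of this kind could be sketched as follows. This is purely illustrative: the `Tier` fields, the `place` function, and the `hot_threshold` parameter are assumptions made for the example, and the heuristic (primary copy on the slowest tier that fits, an extra replica on the fastest tier for frequently read data) is just one simple policy, not the one the envisioned system would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    """Hypothetical description of one storage system in the landscape."""
    name: str
    free_gb: float
    read_gbps: float
    available: bool = True


def place(size_gb: float, reads_per_day: int, tiers: List[Tier],
          hot_threshold: int = 10) -> List[str]:
    """Return the names of the tiers that should hold a copy of the dataset."""
    # Only tiers that are reachable and have room are candidates,
    # ordered from slowest to fastest read bandwidth.
    candidates = sorted(
        (t for t in tiers if t.available and t.free_gb >= size_gb),
        key=lambda t: t.read_gbps,
    )
    if not candidates:
        raise ValueError("no tier can hold the dataset")
    # Assumption: the slowest suitable tier is the cheapest place for the primary copy.
    targets = [candidates[0].name]
    # Frequently read data additionally gets a replica on the fastest tier.
    if reads_per_day >= hot_threshold and len(candidates) > 1:
        targets.append(candidates[-1].name)
    return targets


tiers = [
    Tier("nvme", free_gb=100, read_gbps=5.0),
    Tier("disk", free_gb=1000, read_gbps=1.0),
    Tier("tape", free_gb=10000, read_gbps=0.1, available=False),
]
hot = place(50, reads_per_day=100, tiers=tiers)
cold = place(50, reads_per_day=1, tiers=tiers)
large = place(500, reads_per_day=100, tiers=tiers)
```

In this sketch the usage pattern (`reads_per_day`) feeds directly into the decision, whereas a migration-based system would move data between tiers only after a policy fires.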

Standardization

We are building the NGI forum along the lines of the MPI Forum. Driven by use cases, a standard is created and demonstrated in prototypes.

Contribute

We welcome contributions. If you are interested in this topic, subscribe to our mailing list.

Current contributors

We will soon list the initial contributors to this effort.