Next Generation Storage Interfaces

The efficient, convenient, and robust execution of data-driven workflows and enhanced data management are key to productivity in computer-aided RD&E. Still, the storage stack is based on low-level POSIX I/O (or on objects in cloud storage). We are establishing an open, community-driven forum for a next-generation storage interface, similar to existing standardization forums such as the MPI Forum. The forum would bring together vendors, storage experts, and users to discuss key features of the API and establish governance strategies. The envisioned coarse-grained API aims to overcome current obstacles for highly parallel workflows, but it would also be beneficial in the domain of big data and even on desktop PCs. It offers the opportunity to create a new ecosystem.

The envisioned API abstracts storage and data-flow computation with the clear goal of a separation of concerns between the definition of tasks and their execution on specific hardware. Additionally, it offers means to express workflow execution and lifecycle management. This allows smart schedulers to dispatch compute and I/O across the available hardware efficiently. Since the user shall not have to care about data and compute locations, we broaden the term liquid computing. We aim to develop the first prototype on top of existing software technology such as libfabric, parallel file systems, and NoSQL solutions. A remaining question is how to enable vendors to implement differentiating optimizations without violating the common interface.
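To make the separation between task definition and execution more concrete, here is a minimal sketch in Python. It is not the NGI API: the names `Task` and `run_workflow` are hypothetical, and an in-memory dictionary stands in for storage. The user only declares which data each task consumes and produces; the runner derives the execution order from those declarations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Task:
    """Hypothetical task description: named inputs, one named output, one function."""
    name: str
    inputs: Tuple[str, ...]
    output: str
    func: Callable[..., object]


def run_workflow(tasks: List[Task], store: Dict[str, object]) -> List[str]:
    """Run every task once its inputs exist in the store; return the execution order.

    The store is a plain dict here (an assumption for illustration); a real
    system would decide where data lives and where each task runs.
    """
    pending = list(tasks)
    order: List[str] = []
    while pending:
        ready = [t for t in pending if all(i in store for i in t.inputs)]
        if not ready:
            names = ", ".join(t.name for t in pending)
            raise ValueError("unsatisfiable dependencies: " + names)
        for t in ready:
            store[t.output] = t.func(*(store[i] for i in t.inputs))
            order.append(t.name)
            pending.remove(t)
    return order


# The tasks are declared out of order on purpose; the data dependencies
# alone determine that "clean" must run before "mean".
tasks = [
    Task("mean", ("cleaned",), "average", lambda xs: sum(xs) / len(xs)),
    Task("clean", ("raw",), "cleaned", lambda xs: [x for x in xs if x is not None]),
]
store = {"raw": [1.0, None, 3.0]}
order = run_workflow(tasks, store)
```

Because the task definitions never mention locations or devices, a smarter runner could replace this loop and dispatch the same tasks across different hardware without changes to the user's description.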

Main features of the resulting prototype:

  • Smart hardware and software components
  • Storage and compute are covered together
  • User metadata and workflows as first-class citizens
  • Self-aware instead of unconscious
  • Improving over time (self-learning, hardware upgrades)

We illustrate what we mean by this with use cases. For example, consider a heterogeneous storage landscape:

The system shall make placement decisions and partially replicate data depending on the availability and characteristics of the storage, while also considering usage patterns, particularly those of the workflows. Thus, all available storage shall be used concurrently, storing and using data where it is best suited. In contrast, existing systems rely on data migration and policies.
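The kind of placement decision described above can be sketched as a toy heuristic. Everything in this example is an assumption made for illustration: the `Tier` and `Usage` types, the `place` function, the `hot_threshold` parameter, and the rule itself (frequently read data gets a copy on the fastest tier in addition to the tier with the most free space). A real system would presumably use much richer characteristics and learned usage patterns.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    """Hypothetical description of one storage system in the landscape."""
    name: str
    free_gb: float
    read_gbps: float
    available: bool = True


@dataclass
class Usage:
    """Hypothetical usage pattern of a dataset."""
    size_gb: float
    reads_per_day: float


def place(usage: Usage, tiers: List[Tier], hot_threshold: float = 10.0) -> List[str]:
    """Return the names of the tiers that should hold a copy of the data."""
    # Only tiers that are reachable and have enough space are candidates.
    candidates = [t for t in tiers if t.available and t.free_gb >= usage.size_gb]
    if not candidates:
        raise RuntimeError("no available tier can hold the data")
    capacity = max(candidates, key=lambda t: t.free_gb)
    if usage.reads_per_day < hot_threshold:
        return [capacity.name]
    # Hot data: keep a copy on the fastest tier as well.
    fastest = max(candidates, key=lambda t: t.read_gbps)
    if fastest is capacity:
        return [fastest.name]
    return [fastest.name, capacity.name]


tiers = [
    Tier("nvme", free_gb=500, read_gbps=6.0),
    Tier("disk", free_gb=50_000, read_gbps=1.0),
    Tier("tape", free_gb=1_000_000, read_gbps=0.2, available=False),
]
hot = place(Usage(size_gb=100, reads_per_day=50), tiers)
cold = place(Usage(size_gb=100, reads_per_day=1), tiers)
large = place(Usage(size_gb=1000, reads_per_day=50), tiers)
```

In this sketch the unavailable tape tier is never chosen, and a dataset too large for the fast tier falls back to a single copy on disk, so the decision follows availability and characteristics rather than a fixed migration policy.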

Standardization

We are building the NGI forum similarly to the MPI Forum. Driven by use cases, a standard is created and demonstrated in prototypes.

Contribute

We welcome contributions. If you are interested in this topic, subscribe to our mailing list.

Current contributors

We will soon release a more detailed description of this effort.