IBM Research: Storlets: From research prototype to open source technology

In a previous blog post devoted to storlets, IBM Fellow Michael Factor highlighted how storlets can be used to turn a software-defined object store into a smart storage platform. This is done by allowing the computation to run near the data, rather than bringing the data to the servers doing the computations. Michael's post addresses the potential of storlets in cost reduction as well as in enabling new services.

In this post we concentrate on the technology itself. First, though, here is an update on what has happened with storlets since they were a research prototype.

Storlets advancements

We have begun discussing with the OpenStack community whether and how to add storlet support to the official Swift release. We will hold a design session on this topic at the upcoming OpenStack Summit in Vancouver, and we encourage all interested parties to attend.

Storlets on OpenStack Swift

Our implementation of storlets is integrated with OpenStack Swift. Swift is an open source implementation of an object store and is behind several public object store services, including the IBM SoftLayer object store. Part of the idea behind storlets is to provide a flexible means of extending the function of the object store, by giving Swift users the ability to upload code to be executed near the data.

Running user-written code inside the storage system calls for adequate security and isolation measures. This is where Docker comes into the picture. Docker is a popular Linux container management framework. Linux containers (LXC) are similar to virtual machines, except that instead of virtualizing the hardware they virtualize the operating system.

In addition to providing security and isolation, Docker has tools for packaging and deploying executable images. Using Docker, our implementation allows the user to upload the storlet’s code along with a tailored image in which the storlet will execute. Thus, if a storlet relies on a non-trivial software stack, that stack can be packaged into a Docker image and deployed in a Swift cluster, to be used later for executing the user's storlets.

Writing a storlet involves implementing a single-method Java interface called IStorlet. That method, called invoke, has two major parameters: an input stream and an output stream. The input stream is used for consuming the data of the object on which the storlet is operating, and the output stream is used for writing the results of the storlet's computation. Storlets work in a streaming fashion, i.e., they start outputting data before reading all the input data. This is because storlets are invoked synchronously as part of the upload or download operation, as described next.
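To make the streaming idea concrete, here is a minimal sketch of a storlet-like function. It is written in Python purely for illustration and is not the Java IStorlet interface itself; the function name `invoke`, the `params` dictionary and the `match` key are assumptions made for this example. It filters lines as they arrive, so output is produced before the whole input has been read.

```python
import io


def invoke(in_stream, out_stream, params):
    """Copy only the lines containing params["match"] from in_stream to out_stream.

    Hypothetical Python analogue of a storlet's single method: one input
    stream, one output stream, and a dictionary of user-supplied parameters.
    """
    needle = params["match"].encode("utf-8")
    # Binary file objects iterate line by line, so each matching line is
    # written out as soon as it is read instead of buffering the whole object.
    for line in in_stream:
        if needle in line:
            out_stream.write(line)


if __name__ == "__main__":
    source = io.BytesIO(b"GET /a 200\nGET /b 500\nPUT /c 500\n")
    sink = io.BytesIO()
    invoke(source, sink, {"match": "500"})
    print(sink.getvalue().decode("utf-8"), end="")
```

A filter like this corresponds to the pre-filtering use case: only the lines of interest ever leave the object store.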

Once the Docker images and storlets are deployed, they can be invoked on data objects in Swift. Storlets can be invoked in two ways:

Invocation during object download. In this case the storlet transforms the object before it is returned to the user. This can be used for scenarios such as pre-filtering data being retrieved for an analytics engine or as an on-the-fly resolution reduction when downloading to a mobile device.
Invocation during object upload. In this case the data that gets stored is a transformed version of the data PUT by the user. One example use case is metadata enrichment, where a storlet tags a data object with additional metadata while it is being uploaded.

In our current implementation, invoking a storlet during the upload or the download of an object involves adding a single header to the upload or download request. This header identifies the storlet to execute on the object that is the target of the request. Once the request is received by Swift, a pluggable middleware intercepts it at the appropriate point: during download this point is when the response data is on its way back to the user, and during upload this point is along the input path of the request.
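The interception points described above can be sketched as a small WSGI middleware. This is an illustration under stated assumptions, not our actual Swift plugin: the header name `X-Run-Storlet` (seen by WSGI as `HTTP_X_RUN_STORLET`), the class `StorletMiddleware`, and the helpers `upper` and `fake_swift` are all hypothetical, and the sketch ignores details such as Content-Length and error handling for unknown storlet names.

```python
import io

# WSGI spelling of an assumed "X-Run-Storlet" request header.
HEADER_KEY = "HTTP_X_RUN_STORLET"


def _chunks(stream, size=65536):
    """Yield successive byte chunks read from a file-like object."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


class _IterReader:
    """File-like wrapper exposing read() over an iterator of byte chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = b""

    def read(self, size=-1):
        while size < 0 or len(self._buf) < size:
            try:
                self._buf += next(self._chunks)
            except StopIteration:
                break
        if size < 0:
            data, self._buf = self._buf, b""
        else:
            data, self._buf = self._buf[:size], self._buf[size:]
        return data


class StorletMiddleware:
    """Run a named transform on the request body (PUT) or the response body."""

    def __init__(self, app, storlets):
        self.app = app
        self.storlets = storlets  # name -> callable(chunk iterator) -> chunk iterator

    def __call__(self, environ, start_response):
        name = environ.get(HEADER_KEY)
        if name is None:
            return self.app(environ, start_response)  # no storlet requested
        transform = self.storlets[name]
        if environ["REQUEST_METHOD"] == "PUT":
            # Upload: transform the data along the input path of the request.
            body = transform(_chunks(environ["wsgi.input"]))
            environ["wsgi.input"] = _IterReader(body)
            return self.app(environ, start_response)
        # Download: transform the response data on its way back to the user.
        return transform(self.app(environ, start_response))


def upper(chunks):
    """Toy stand-in for a storlet: upper-case every chunk."""
    for chunk in chunks:
        yield chunk.upper()


stored = {}


def fake_swift(environ, start_response):
    """Toy stand-in for the object store behind the middleware."""
    if environ["REQUEST_METHOD"] == "PUT":
        stored["obj"] = environ["wsgi.input"].read()
        start_response("201 Created", [])
        return [b""]
    start_response("200 OK", [])
    return [b"hello ", b"world"]
```

Wrapping `fake_swift` as `StorletMiddleware(fake_swift, {"upper": upper})` and sending a request whose environ carries the assumed header exercises both paths; requests without the header pass through untouched.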

Our code then routes the data to the storlet using file descriptors that are passed over Unix domain sockets to the storlet code running inside the Docker container. Other than these file descriptors, the Docker container has no access to any I/O device. This means the storlet's code has no network access and no access to Swift's own disks. All I/O is done via the file descriptors provided by our Swift plugin.
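For readers unfamiliar with passing file descriptors between processes, the following sketch shows the underlying mechanism (SCM_RIGHTS ancillary data on a Unix domain socket) using Python's standard library. It is not our plugin code: the helper names and the `demo` function are made up for this example, and for simplicity both ends of the socket live in one process, whereas in the real setup one end sits inside the Docker container. It runs on Unix-like systems only.

```python
import array
import os
import socket


def send_fds(sock, fds):
    """Send open file descriptors over a Unix domain socket via SCM_RIGHTS."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    # The descriptors travel as ancillary data next to one byte of ordinary data.
    sock.sendmsg([b"F"], ancillary)


def recv_fds(sock, max_fds):
    """Receive up to max_fds file descriptors sent with send_fds."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop any trailing partial integer before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return list(fds)


def demo():
    """Hand a 'storlet' an input and an output descriptor, then use them."""
    gateway, sandbox = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    in_r, in_w = os.pipe()    # object data flowing toward the storlet
    out_r, out_w = os.pipe()  # storlet results flowing back

    send_fds(gateway, [in_r, out_w])
    storlet_in, storlet_out = recv_fds(sandbox, 2)

    os.write(in_w, b"object bytes")                             # gateway side
    os.write(storlet_out, os.read(storlet_in, 1024).upper())    # storlet side
    result = os.read(out_r, 1024)                               # gateway side

    for fd in (in_r, in_w, out_r, out_w, storlet_in, storlet_out):
        os.close(fd)
    gateway.close()
    sandbox.close()
    return result


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the receiving side needs no path, hostname or device of its own: it can only read and write through the descriptors it was handed.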