Editor's note: this article is by Eran Rom, storage researcher at IBM Research - Haifa.
In a previous
blog post devoted to storlets, IBM Fellow Michael Factor highlighted how
storlets can be used to turn a software-defined object store into a smart
storage platform. This is done by allowing the computation to run near the
data, rather than bringing the data to the servers doing the computations.
Michael's post addresses the potential of storlets in cost reduction as well as
in enabling new services.
While in this
post we want to concentrate on the technology itself, here are a few things
that have happened with storlets since they were a research prototype.
- Storlets played a central role in a first-of-a-kind solution, called Active Media Store, developed with Radiotelevisione Italiana (RAI).
- The work with RAI was presented at the Paris OpenStack Summit and adopted by OpenStack as a "superuser" story.
- A reference implementation of storlets is now publicly available under 'Open-I-Beam' on GitHub.
- We have begun interactions with the OpenStack community on the question of ‘if and how’ to add storlet support to the official Swift release. We will hold a design session on this topic at the upcoming Vancouver OpenStack Summit, and we encourage all interested parties to attend.
Storlets on OpenStack Swift
Our implementation of storlets is integrated with OpenStack Swift. Swift is an open
source implementation of an object store and is behind several public object
store services, including the IBM SoftLayer object store. Part of the idea
behind storlets is to provide a flexible means of extending the function of the
object store, by giving Swift users the ability to upload code to be executed
near the data.
Running user code inside the storage system calls for adequate security and isolation
measures. This is where Docker comes into the picture. Docker is a popular Linux container
management framework. Linux containers
(LXC) are similar to virtual machines, only instead of virtualizing the
hardware they virtualize the operating system.
In addition to
providing security and isolation, Docker has tools for packaging and deploying executable
images. Using Docker, our implementation allows the user to upload the storlet’s
code, along with a tailored image where the storlet will execute. Thus, if a storlet relies on some non-trivial
software stack, that stack can be packaged into a Docker image and deployed in a
Swift cluster, to be later used for executing the user's storlets.
Writing a storlet
involves implementing a single-method Java interface called IStorlet. That
method, called invoke, has two major parameters: an input stream and an
output stream. The input stream is used for consuming the data of an object on
which the storlet is operating and the output stream is used to write the results
of the storlet's computation. Storlets work in a streaming fashion, i.e., they
start outputting data before reading all the input data. This follows from the
synchronous way storlets are invoked as part of the upload or download
operations, as described next.
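As a rough illustration of the streaming style described above, the sketch below defines a simplified, hypothetical `SimpleStorlet` interface whose single `invoke` method takes an input stream and an output stream. It is not the actual `IStorlet` interface from the reference implementation, whose exact signature is not given in this post. `LineFilterStorlet`, its keyword parameter, and `LineFilterDemo` are likewise invented for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified stand-in for IStorlet (NOT the real interface).
interface SimpleStorlet {
    void invoke(InputStream in, OutputStream out) throws IOException;
}

// Example storlet: keeps only the lines that contain a given keyword.
class LineFilterStorlet implements SimpleStorlet {
    private final String keyword;

    LineFilterStorlet(String keyword) {
        this.keyword = keyword;
    }

    @Override
    public void invoke(InputStream in, OutputStream out) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        BufferedWriter writer =
                new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        String line;
        // Lines are processed one at a time, so the storlet never needs to
        // hold the whole object in memory before it starts producing output.
        while ((line = reader.readLine()) != null) {
            if (line.contains(keyword)) {
                writer.write(line);
                writer.write('\n');
            }
        }
        writer.flush();
    }
}

class LineFilterDemo {
    // Runs the filter over an in-memory string and returns the filtered text.
    static String run(String keyword, String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            new LineFilterStorlet(keyword).invoke(
                    new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(run("ERROR", "INFO start\nERROR disk full\nINFO done\n"));
    }
}
```

In a real deployment the streams would be supplied by the storage system rather than by in-memory byte arrays; the arrays here only make the sketch runnable on its own.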
Once the Docker images and storlets are deployed, they can be invoked on data objects in Swift. Storlets can be invoked in two ways:
- Invocation during object download. In this case the storlet transforms the object before it is returned to the user. This can be used for scenarios such as pre-filtering data being retrieved for an analytics engine or as an on-the-fly resolution reduction when downloading to a mobile device.
- Invocation during object upload. In this case, the data that is stored is a transformed version of the data PUT by the user. One example use case is metadata enrichment, where a storlet tags a data object with additional metadata while it is being uploaded.
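The difference between the two invocation points can be sketched with a toy in-memory object store. Everything here is hypothetical (`ToyObjectStore`, `StreamTransform`, the upper-casing transform) and is unrelated to Swift's actual code; the sketch only shows that a transform applied on upload changes what is stored, while a transform applied on download changes only what the caller receives.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a storlet: reads one stream, writes another.
interface StreamTransform {
    void apply(InputStream in, OutputStream out) throws IOException;
}

// Toy in-memory object store; it only illustrates where a storlet runs.
class ToyObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    // Upload path: the storlet (if any) runs before the data is stored.
    void put(String name, byte[] data, StreamTransform storlet) {
        objects.put(name, run(data, storlet));
    }

    // Download path: the storlet (if any) runs on the way back to the caller;
    // the stored object itself is left unchanged.
    byte[] get(String name, StreamTransform storlet) {
        byte[] data = objects.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such object: " + name);
        }
        return run(data, storlet);
    }

    private static byte[] run(byte[] data, StreamTransform storlet) {
        if (storlet == null) {
            return data.clone();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            storlet.apply(new ByteArrayInputStream(data), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}

class InvocationDemo {
    // Example transform: upper-cases ASCII letters a-z, one byte at a time.
    static final StreamTransform UPPER = (in, out) -> {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b >= 'a' && b <= 'z' ? b - 32 : b);
        }
    };

    static String text(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        ToyObjectStore store = new ToyObjectStore();
        byte[] data = "hello".getBytes(StandardCharsets.US_ASCII);
        store.put("plain", data, null);         // stored as uploaded
        store.put("transformed", data, UPPER);  // transformed on upload
        System.out.println(text(store.get("plain", UPPER)));       // transformed on download
        System.out.println(text(store.get("plain", null)));        // stored copy is untouched
        System.out.println(text(store.get("transformed", null)));  // stored copy was transformed
    }
}
```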
In our current implementation, invoking
a storlet during the upload or the download of an object involves adding a
single header to the upload / download request. This header identifies the storlet to
execute on the object that is the target of the request. Once the request is received
by Swift, a pluggable middleware intercepts the request at the appropriate
point: during download, this point is when the response data is on its way back
to the user, and during upload, it is along the upload's input path.
Our code then routes the data to the storlet using file descriptors
that are passed over Unix domain sockets to the storlet code running inside
the Docker container. Other than these file descriptors, the Docker container
has no access to any I/O device. This means the storlet's code has no network
access and no access to Swift's own disks. All I/O is done via the file
descriptors provided by our Swift plugin.
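From the client's side, the only change is the extra header. The sketch below builds (but does not send) a download and an upload request using Java's standard `java.net.http` API. The header name `X-Run-Storlet` is an assumption, since this post does not name the header, so check the reference implementation's documentation for the exact name. The URL, token, and storlet name are placeholders, and `StorletRequests` is a hypothetical helper class.

```java
import java.net.URI;
import java.net.http.HttpRequest;

class StorletRequests {
    // ASSUMPTION: the post does not name the header; verify it in the
    // reference implementation's documentation before relying on it.
    static final String STORLET_HEADER = "X-Run-Storlet";

    // Download: the named storlet would transform the object on its way back.
    static HttpRequest download(String objectUrl, String token, String storletName) {
        return HttpRequest.newBuilder(URI.create(objectUrl))
                .header("X-Auth-Token", token)        // standard Swift auth header
                .header(STORLET_HEADER, storletName)  // the single extra header
                .GET()
                .build();
    }

    // Upload: the named storlet would transform the data before it is stored.
    static HttpRequest upload(String objectUrl, String token, String storletName, byte[] body) {
        return HttpRequest.newBuilder(URI.create(objectUrl))
                .header("X-Auth-Token", token)
                .header(STORLET_HEADER, storletName)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint, token and storlet name; nothing is sent.
        HttpRequest request = download(
                "http://swift.example.com:8080/v1/AUTH_demo/photos/cat.jpg",
                "placeholder-token",
                "my-storlet.jar");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Sending either request would use `java.net.http.HttpClient` against a real Swift endpoint with the storlet middleware installed.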
For those interested in learning more, GitHub has
comprehensive documentation, including storlet samples and automated
installation and configuration, to give you a quick start.