Data Containers
New in SingularityCE 4.2 (OCI-Mode only).
Overview
Workflows in HPC often involve three distinct inputs:
User data, which needs to be analyzed.
A software application, which will analyze the user data.
Reference data, which the software uses to make sense of the user data.
Packaging the software application into an OCI-SIF, with SingularityCE in OCI-Mode, makes it easy to run and share. User data is also easy to handle with SingularityCE; simply bind your project directories or files from the HPC system into the container.
Reference data is a little more complicated, as it tends to be specific to the software being used and the data being analyzed. Perhaps you are aligning RNA-Seq data to a reference genome sequence, or passing medical images through a neural network model. Different inputs may need different reference data (human vs. mouse sequences, CT vs. MRI images). Although the software is containerized and ready to go, you will probably have to download the reference data from a third party, assemble it, and often pre-process it before it can be used with the specific program you need to run.
Putting all the reference data that might ever be needed into the same container as the software application could simplify things, but could make that container very large. What if we could easily distribute different sets of reference data alongside, but separately from, the software application? The solution is a data container.
Creating a Data Container
SingularityCE 4.2 introduces the singularity data package command, which creates a data container OCI-SIF by ‘packaging’ files and directories on the host:
$ singularity data package <source file/dir> <data container>
For example, to create a data container from the content of the directory mydata/ on the host:
$ singularity data package mydata mydata.oci.sif
INFO: Converting layers to SquashFS
The resulting OCI-SIF file contains the packaged data as a SquashFS image, stored as an OCI artifact with an associated manifest. This allows it to be pushed to and pulled from standard OCI registries.
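As a minimal sketch, the script below creates a small sample directory holding two files, foo and bar (names chosen to match the listing shown later in this section; their contents are illustrative assumptions), and then packages it with the command above. The packaging step only runs if a singularity command is found on the PATH.

```shell
#!/bin/sh
set -eu

# Illustrative reference data: two small files in mydata/ (hypothetical content)
mkdir -p mydata
printf 'reference A\n' > mydata/foo
printf 'reference B\n' > mydata/bar

# Package the directory into a data container, if SingularityCE is installed
if command -v singularity >/dev/null 2>&1; then
    singularity data package mydata mydata.oci.sif
else
    echo "singularity not found; skipping packaging of mydata/"
fi
```

Replace the sample files with your real reference data; the packaging command itself is unchanged.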
Using a Data Container
Note
OCI-SIF data containers can only be used in OCI-Mode (when running containers with --oci).
To use a data container with an application container, pass the --data flag to run, shell, or exec in OCI-Mode. The --data flag takes one or more comma-separated <data container>:<dest> pairs, where <data container> is the path to the data container to use, and <dest> is the path in the application container at which its content should be made available.
For example, to make the content of the mydata.oci.sif data container available under /mydata in an application container:
$ singularity run --oci --data mydata.oci.sif:/mydata application.oci.sif
dtrudg-sylabs@mini:~$ ls /mydata/
bar foo
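Where an interactive session is not wanted, for example in a batch script, the same --oci and --data flags can be passed to exec to run a single command and exit. The sketch below wraps this in a hypothetical list_data function that lists the mounted directory. It assumes the application container provides an ls command, and it only prints a message if singularity or either image file is missing.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: mount a data container at <dest> and list its content.
# Usage: list_data <data container> <application container> <dest>
list_data() {
    data=$1
    app=$2
    dest=$3
    if command -v singularity >/dev/null 2>&1 && [ -f "$data" ] && [ -f "$app" ]; then
        # Non-interactive equivalent of running 'ls' inside the container
        singularity exec --oci --data "$data:$dest" "$app" ls "$dest"
    else
        echo "skipping: singularity, $data or $app not available"
    fi
}

list_data mydata.oci.sif application.oci.sif /mydata
```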
You can use more than one data container by specifying the --data flag multiple times, or by listing comma-separated <data container>:<dest> pairs:
$ singularity run --oci \
--data mydata.oci.sif:/mydata,otherdata.oci.sif:/otherdata \
application.oci.sif
This is equivalent to:
$ singularity run --oci \
--data mydata.oci.sif:/mydata \
--data otherdata.oci.sif:/otherdata \
application.oci.sif
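When the set of data containers varies between jobs, it can help to build the comma-separated value in a script. The sketch below uses a hypothetical helper, join_data_pairs, to join <data container>:<dest> pairs into a single --data value, and then prints (rather than runs) the resulting command. It assumes that container paths and destinations do not themselves contain , or : characters, which would make the pairs ambiguous.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: join <data container>:<dest> pairs with commas,
# producing the value for a single --data flag.
join_data_pairs() {
    out=""
    for pair in "$@"; do
        if [ -z "$out" ]; then
            out=$pair
        else
            out="$out,$pair"
        fi
    done
    printf '%s\n' "$out"
}

DATA_ARG=$(join_data_pairs mydata.oci.sif:/mydata otherdata.oci.sif:/otherdata)

# Dry run: prints
# singularity run --oci --data mydata.oci.sif:/mydata,otherdata.oci.sif:/otherdata application.oci.sif
echo "singularity run --oci --data $DATA_ARG application.oci.sif"
```

Passing one --data flag per pair, as in the second form above, avoids the need for any joining and is an equally valid choice.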