Data Containers

New in SingularityCE 4.2 OCI-Mode.

Overview

Workflows in HPC often involve three distinct inputs:

  • User data, which needs to be analyzed.

  • A software application, which will analyze the user data.

  • Reference data, which the software uses to make sense of the user data.

Packaging the software application into an OCI-SIF, with SingularityCE in OCI-Mode, makes it easy to run and share. User data is also easy to handle with SingularityCE; simply bind your project directories or files from the HPC system into the container.

Reference data is a little more complicated, as it tends to be specific to the software being used and the data being analyzed. Perhaps you are aligning RNA-Seq data to a reference genome sequence, or passing medical images through a neural network model. Different reference data might be needed for different inputs (human vs mouse sequences, CT vs MRI images). Although software is containerized and ready to go, you will probably have to download reference data from a 3rd party, assemble, and often pre-process it before it can be used with the specific program that you need to run.

Putting all the reference data that might ever be needed into the same container as the software application could simplify things, but could make that container very large. What if we could easily distribute different sets of reference data alongside, but separately from the software application? The solution is a data container.

Creating a Data Container

SingularityCE 4.2 introduces the data package command, to create a data container OCI-SIF, by ‘packaging’ files and directories on the host:

$ singularity data package <source file/dir> <data container>

For example, to create a data container from the content of the directory mydata/ on the host:

$ singularity data package mydata mydata.oci.sif
INFO:    Converting layers to SquashFS

The resulting OCI-SIF file contains the packaged data as a SquashFS image, stored as an OCI artifact, with associated manifest. This allows it to be pushed / pulled to and from standard OCI registries.

Using a Data Container

Note

OCI-SIF data containers can only be used in OCI-Mode (when running containers with --oci).

To use a data container with an application container, the --data flag is passed to run / shell / exec in OCI-Mode. The data flag takes one or more comma separated <data container>:<dest> pairs, where <data container> is the path to the data container to use, and <dest> is the path in the application container at which its content should be made available.

For example, to make the content of the mydata.oci.sif data container available under /mydata in an application container:

$ singularity run --oci --data mydata.oci.sif:/mydata application.oci.sif
dtrudg-sylabs@mini:~$ ls /mydata/
bar  foo

You can use more than one data container by specifying the --data flag multiple times, or listing comma separated <data container>:<dest> pairs:

$ singularity run --oci \
    --data mydata.oci.sif:/mydata,otherdata.oci.sif:/otherdata \
    application.oci.sif

Is equivalent to:

$ singularity run --oci \
    --data mydata.oci.sif:/mydata \
    --data otherdata.oci.sif:/otherdata \
    application.oci.sif

Sharing a Data Container

As mentioned above, a data container stores a SquashFS filesystem as an OCI artifact. This means it can be pushed to, and pulled from, standard OCI registries alongside application container images.

To push to the container library:

$ singularity push -U mydata.oci.sif library://example/datac/mydata:latest
WARNING: Skipping container verification
INFO:    Pushing an OCI-SIF to the library OCI registry. Use `--oci` to pull this image.
4.0KiB / 4.0KiB [=================================================================] 100 %0s

To pull from the container library:

$ singularity pull --oci mydata.oci.sif library://example/datac/mydata:latest
WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
INFO:    Cleaning up.
WARNING: integrity: signature not found for object group 1
WARNING: Skipping container verification

To push to Docker Hub, or a similar OCI registry, after authenticating:

$ singularity push mydata.oci.sif docker://dctrud/mydata:latest
4.0KiB / 4.0KiB [=================================================================] 100 %0s
INFO:    Upload complete

To pull from Docker Hub, or a similar OCI registry:

$ singularity pull --oci docker://dctrud/mydata:latest
WARNING: OCI image doesn't declare a platform. It may not be compatible with this system.
INFO:    Using cached OCI-SIF image