Security Options
SingularityCE can make use of various Linux kernel features to modify the security scope and context of running containers. Non-root users may be granted additional permissions using Linux capabilities. SELinux, AppArmor, and Seccomp can be used to restrict the operations that can be performed by a container.
Linux Capabilities
Native runtime / non-OCI-Mode
In SingularityCE’s default configuration, without --oci, a container started by root receives all capabilities, while a container started by a non-root user receives no capabilities.
Additionally, SingularityCE supports granting and revoking Linux capabilities on a per-user or per-group basis. For example, suppose that an administrator has decided to grant a user named pinger the capability to open raw sockets, so that they can use ping in a container where the ping binary is controlled via capabilities. For information about managing capabilities as an administrator, please refer to the capability admin docs.
Note
In SingularityCE’s default setuid and non-OCI mode, containers are only isolated in a mount namespace. A user namespace, which limits the scope of capabilities, is not used by default.
Therefore, it is extremely important to recognize that granting users Linux capabilities with the capability command group is usually equivalent to granting those users root-level access on the host system. Most, if not all, capabilities allow users to “break out” of the container and become root on the host. This feature is targeted at special use cases (like cloud-native architectures) where an admin or developer wants to limit the attack surface within a container that normally runs as root. It is not a good option in multi-tenant HPC environments where an admin wants to grant a user special privileges within a container. For that and similar use cases, the fakeroot feature is a better option.
To take advantage of this granted capability as a user, pinger must also request the capability when executing a container, using the --add-caps flag like so:
$ singularity exec --add-caps CAP_NET_RAW library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=73.1 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 73.178/73.178/73.178/0.000 ms
If the admin decides that it is no longer necessary to allow the user pinger to open raw sockets within SingularityCE containers, they can revoke the appropriate Linux capability, and pinger will not be able to add that capability to their containers anymore:
$ singularity exec --add-caps CAP_NET_RAW library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
WARNING: not authorized to add capability: CAP_NET_RAW
ping: socket: Operation not permitted
Another scenario, atypical of shared-resource environments but useful in cloud-native architectures, is dropping capabilities when spawning containers as the root user to help minimize the attack surface. With a default installation of SingularityCE, containers created by the root user retain all capabilities. This behavior is configurable if desired; see the capability configuration and root default capabilities sections of the admin docs for more information.
Assuming the root user executes containers with the CAP_NET_RAW capability by default, running the same container that pinger ran above works without granting any capabilities:
# singularity exec library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=52 time=59.6 ms
--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 59.673/59.673/59.673/0.000 ms
Now we can manually drop the CAP_NET_RAW capability like so:
# singularity exec --drop-caps CAP_NET_RAW library://sylabs/tests/ubuntu_ping:v1.0 ping -c 1 8.8.8.8
ping: socket: Operation not permitted
The container no longer has the ability to create raw sockets, so the ping command fails.
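Whether CAP_NET_RAW is in effect can also be checked without relying on ping. The CapEff line of /proc/<pid>/status holds a process's effective capability set as a hexadecimal bitmask, and CAP_NET_RAW is bit 13 in linux/capability.h. The bash sketch below tests that bit; has_cap_net_raw is a hypothetical helper written for illustration, not part of SingularityCE.

```shell
#!/usr/bin/env bash
# Sketch: test whether CAP_NET_RAW is set in a capability mask copied from
# the CapEff (or CapPrm/CapBnd) line of /proc/<pid>/status.
# has_cap_net_raw is a hypothetical helper, not a SingularityCE command.

CAP_NET_RAW_BIT=13   # bit position of CAP_NET_RAW in linux/capability.h

has_cap_net_raw() {
    local mask_hex=$1
    # Reject anything that is not a plain hexadecimal string.
    [[ $mask_hex =~ ^[0-9a-fA-F]+$ ]] || return 2
    # Bash arithmetic is 64-bit, wide enough for the kernel's capability masks.
    if (( (16#$mask_hex >> CAP_NET_RAW_BIT) & 1 )); then
        echo yes
    else
        echo no
    fi
}
```

As a usage sketch for the root examples above, you might run grep CapEff /proc/self/status through singularity exec, once plain and once with --drop-caps CAP_NET_RAW, and pass each hexadecimal value to has_cap_net_raw; the bit would be expected to be cleared in the second case.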
The --add-caps and --drop-caps options also accept the keyword all. Appropriate caution should, of course, be exercised when using this keyword.
OCI-Mode
When containers are run in OCI-mode by a non-root user, initialization is always performed inside a user namespace. The capabilities granted to a container are specific to this user namespace. For example, CAP_SYS_ADMIN granted to an OCI-mode container does not give the user the ability to mount a filesystem outside of the container’s user namespace.
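Whether a process is running inside a user namespace, as OCI-mode containers started by non-root users are, can be estimated from /proc/self/uid_map. In the initial (host) user namespace this file contains the single identity mapping 0 0 4294967295, while a container's user namespace maps container IDs onto a limited range of host IDs. The bash sketch below applies that rule; in_initial_userns is a hypothetical helper, and the check is only a heuristic, since a privileged process could in principle create a child namespace with the same full mapping.

```shell
#!/usr/bin/env bash
# Heuristic sketch: given the contents of /proc/<pid>/uid_map, decide whether
# the process appears to be in the initial (host) user namespace.
# in_initial_userns is a hypothetical helper, not a SingularityCE command.

in_initial_userns() {
    local map_contents=$1 inside outside count extra lines
    # The initial namespace has exactly one mapping line.
    lines=$(printf '%s\n' "$map_contents" | grep -c . || true)
    [[ $lines -eq 1 ]] || return 1
    # uid_map pads its three columns with spaces; read splits on whitespace.
    read -r inside outside count extra <<< "$map_contents"
    # An identity mapping of the full 32-bit ID range indicates the host namespace.
    [[ $inside == 0 && $outside == 0 && $count == 4294967295 && -z $extra ]]
}
```

For example, in_initial_userns "$(cat /proc/self/uid_map)" would be expected to return non-zero inside an OCI-mode container started by a non-root user, and to succeed in a native-mode container, which by default does not use a user namespace.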
Because capabilities are isolated in this way, users can add and drop capabilities with --add-caps and --drop-caps without the administrator having granted permission to do so with the singularity capability command.
OCI-mode containers do not inherit the user’s own capabilities, but instead run with a default set of capabilities that matches the defaults of other OCI runtimes:
CAP_NET_RAW
CAP_NET_BIND_SERVICE
CAP_AUDIT_READ
CAP_AUDIT_WRITE
CAP_DAC_OVERRIDE
CAP_SETFCAP
CAP_SETPCAP
CAP_SETGID
CAP_SETUID
CAP_MKNOD
CAP_CHOWN
CAP_FOWNER
CAP_FSETID
CAP_KILL
CAP_SYS_CHROOT
When the container is entered as the root user (e.g. with --fakeroot), these default capabilities are added to the effective, permitted, and bounding sets.
When the container is entered as a non-root user, these default capabilities are added to the bounding set.
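To see which of these capabilities a process actually holds, you can read the CapEff, CapPrm, and CapBnd lines of /proc/<pid>/status, which report the effective, permitted, and bounding sets as hexadecimal bitmasks whose bit positions are defined in linux/capability.h. The bash sketch below converts between such masks and capability names; decode_caps and encode_caps are hypothetical helpers, and bits above CAP_CHECKPOINT_RESTORE (bit 40) are ignored. Encoding the fifteen default capabilities listed above gives the mask 00000020a80425fb.

```shell
#!/usr/bin/env bash
# Sketch: convert between /proc/<pid>/status capability masks and names.
# Array index == bit number, following linux/capability.h.
# decode_caps and encode_caps are hypothetical helpers, not SingularityCE commands.

CAP_NAMES=(
    CAP_CHOWN CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_FSETID
    CAP_KILL CAP_SETGID CAP_SETUID CAP_SETPCAP CAP_LINUX_IMMUTABLE
    CAP_NET_BIND_SERVICE CAP_NET_BROADCAST CAP_NET_ADMIN CAP_NET_RAW
    CAP_IPC_LOCK CAP_IPC_OWNER CAP_SYS_MODULE CAP_SYS_RAWIO CAP_SYS_CHROOT
    CAP_SYS_PTRACE CAP_SYS_PACCT CAP_SYS_ADMIN CAP_SYS_BOOT CAP_SYS_NICE
    CAP_SYS_RESOURCE CAP_SYS_TIME CAP_SYS_TTY_CONFIG CAP_MKNOD CAP_LEASE
    CAP_AUDIT_WRITE CAP_AUDIT_CONTROL CAP_SETFCAP CAP_MAC_OVERRIDE
    CAP_MAC_ADMIN CAP_SYSLOG CAP_WAKE_ALARM CAP_BLOCK_SUSPEND CAP_AUDIT_READ
    CAP_PERFMON CAP_BPF CAP_CHECKPOINT_RESTORE
)

# Print the name of every known capability whose bit is set in a hex mask.
decode_caps() {
    local mask bit
    mask=$(( 16#$1 ))
    for (( bit = 0; bit < ${#CAP_NAMES[@]}; bit++ )); do
        if (( (mask >> bit) & 1 )); then
            echo "${CAP_NAMES[bit]}"
        fi
    done
}

# Build a 16-digit hex mask, in the /proc format, from capability names.
encode_caps() {
    local mask=0 name bit found
    for name in "$@"; do
        found=0
        for (( bit = 0; bit < ${#CAP_NAMES[@]}; bit++ )); do
            if [[ ${CAP_NAMES[bit]} == "$name" ]]; then
                (( mask |= 1 << bit ))
                found=1
                break
            fi
        done
        if (( ! found )); then
            echo "unknown capability: $name" >&2
            return 1
        fi
    done
    printf '%016x\n' "$mask"
}
```

For example, passing the CapBnd value from grep Cap /proc/self/status, run inside an OCI-mode container with no --add-caps or --drop-caps, to decode_caps would be expected to list the default set above.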
Encrypted containers
Beginning in Singularity 3.4.0, it is possible to build and run encrypted containers. The containers are decrypted at runtime entirely in kernel space, meaning that no intermediate decrypted data is ever present on disk. See encrypted containers for more details.