
Lightweight Registry with Oras OCI-Layouts and Object Storage in the Cloud



Setting up a centralized cloud OCI registry is sometimes too much for small-scale tests and proofs of concept!
What if I told you there was a lightweight alternative?

Everything Should Be a File

One of the great Unix philosophies is the concept of making everything a file. Modems, keyboards, printers, disk drives, serial interfaces, configuration, kernel parameters, IPC: everything and anything is a file that you can read() and/or write().

In a nutshell, this delegates many concerns to the file system layer instead of your program.
It allows, among other things, plugging stuff into any program without writing special integration code. That code was already written in the "driver" for that "device", so why repeat yourself in the program?

In General: programs write and read files
Making everything a file means that you can generally interface with every program.
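A minimal illustration on a typical Linux box: kernel parameters and devices are just paths you can read and write with ordinary tools.

cat /proc/sys/kernel/hostname                    # read a kernel parameter as a file
head -c 16 /dev/urandom | base64                 # read bytes from a "device" as a file
echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward  # write a kernel parameter as a file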

Oras is EIAF Compatible

oras has an --oci-layout flag

Instead of talking to an upstream service over http/https, oras can create the OCI artifact structure as files and directories.

Most (if not every) operation that can be performed on an upstream registry can also be performed locally as an OCI directory structure.
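For example, pushing to a plain directory produces the standard OCI image layout on disk (file.txt and the digest below are placeholders for illustration):

echo "some data" > file.txt
oras push --oci-layout layout/demo:v1 file.txt:text/plain
find layout/demo
# layout/demo
# layout/demo/oci-layout
# layout/demo/index.json
# layout/demo/blobs
# layout/demo/blobs/sha256
# layout/demo/blobs/sha256/<digest>   (config, layer, and manifest blobs)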

Cloud Object Storage is EIAF Compatible

The file system is the foundation on which EIAF rests.
File systems are special components that convert media (CDs, tape, HDDs, controllers, etc.) into interface layers through which you can read/write files.

If you can mount it as a file system, you've got EIAF.

Your preferred Cloud provider should have its own FUSE driver for their object storage offering, and if it doesn't, there is probably an alternative.
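For instance, Google Cloud Storage has gcsfuse and Azure Blob Storage has blobfuse2. Rough shapes only, since flags vary by tool and version; check each project's docs:

gcsfuse my-bucket gcs/                                  # Google Cloud Storage
blobfuse2 mount azure/ --container-name=my-container    # Azure Blob Storage (usually also needs account config)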


It is useful to think of Object Storage as a key: data database instead of the file system it pretends to be.

Mounting one as a file system should be seen as "unintended functionality" and it breaks some assumptions programs make about file systems.

Example: local retrieval times that are effectively instant turn into millisecond or even second-long affairs.
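Once a bucket is mounted (as we do below), you can feel this for yourself; exact numbers depend entirely on your setup:

time ls -l /tmp    # local file system: effectively instant
time ls -l s3/     # FUSE-mounted bucket: every call is a network round trip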

Putting it together

Now that we have all the parts, it's just a matter of combining them!

For this tutorial we are using S3 with s3fs, but feel free to use a different provider and follow along!

Prerequisites

  • Install ORAS (one way shown below)
  • Install s3fs (one way shown below)
  • Obtain an AWS account and create an S3 bucket
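One way to knock out the first two on a Debian/Ubuntu amd64 box (the ORAS version and arch here are examples; grab the right artifact from the ORAS releases page):

sudo apt install s3fs
curl -LO "https://github.com/oras-project/oras/releases/download/v1.1.0/oras_1.1.0_linux_amd64.tar.gz"
tar -zxf oras_1.1.0_linux_amd64.tar.gz oras
sudo mv oras /usr/local/bin/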

Creating some test artifacts

Let's first create some artifacts!

echo "Hello World!" > hello.txt
echo "Goodbye World!" > goodbye.txt

Mounting your s3 bucket

Now we should mount our S3 bucket so that we can use it as an oci-layout target.
Check out s3fs --help for the different ways you can authenticate; here we are using credential profiles.

mkdir s3
s3fs my-bucket s3 -o profile=MY_AWS_PROFILE
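If you'd rather not use a profile, s3fs also reads a plain passwd file (keys elided; see s3fs --help for the other options):

echo "ACCESS_KEY_ID:SECRET_ACCESS_KEY" > "${HOME}/.passwd-s3fs"
chmod 600 "${HOME}/.passwd-s3fs"
s3fs my-bucket s3 -o passwd_file="${HOME}/.passwd-s3fs"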

Using your new "registry"

Pushing our newly created artifacts is as easy as:

oras push --oci-layout s3/hello:latest hello.txt:text/plain
oras push --oci-layout s3/goodbye:latest goodbye.txt:text/plain
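You can verify the pushes landed by listing tags straight off the layout:

oras repo tags --oci-layout s3/hello
# latest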

Pulling them is just as simple!

rm hello.txt
rm goodbye.txt
oras pull --oci-layout s3/hello:latest
oras pull --oci-layout s3/goodbye:latest
cat hello.txt
cat goodbye.txt

Congratulations!

Now you are running your own lightweight OCI registry on the cloud!

Caveats

You can't delete yet!

You can't delete your artifacts yet!

Currently¹ you can't delete your local oci-layout artifacts with oras manifest delete or oras blob delete.

Normally it is the registry's responsibility to garbage collect blobs without any references, but in this case the CLI does not have that functionality (yet).

It is still possible to just do rm -rf s3/my-repo, but that is probably not what you want to do.

Concurrency?

The same concurrency rules as your chosen Object Store apply, generally last-write-wins. Please check your provider's documentation.

Using this trick in your application

You can use this trick in your applications as well!

Instead of integrating the Object Store into your application code (hard dependency, vendor lock-in, a complicated interface: "how do I pass in creds?"), you mount it in the container and then "just use the file system" (transferable, simple, external, interfaceable).

There are some caveats to this approach, and your application might leverage the capabilities of the native Object Storage API extremely well, but if it's just a simple interaction, why complicate things?
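As a sketch (assuming Docker and a host mountpoint at /mnt/s3): mount on the host, bind-mount into the container, and the app only ever sees a directory.

s3fs my-bucket /mnt/s3 -o profile=MY_AWS_PROFILE
docker run --rm -v /mnt/s3:/data my-app   # my-app is hypothetical; it just reads/writes /data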

Upgrading to Registry

AWS Simple Storage Service (S3) to ZOT

You can just point ZOT at the S3 bucket, as it is already expecting an oci-layout.
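A rough sketch of the idea using zot's S3 storage driver (the exact schema comes from zot's docs; the region, bucket, and paths here are placeholders):

cat > zot-config.json <<'EOF'
{
  "storage": {
    "rootDirectory": "/var/lib/zot",
    "storageDriver": { "name": "s3", "region": "us-east-1", "bucket": "my-bucket" }
  },
  "http": { "address": "0.0.0.0", "port": "5000" }
}
EOF
zot serve zot-config.json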

Other

If you have a "normal" registry, you'll need to export all your repos and their tags to that registry.

Script

NOTE: won't work with any repo containing a space or tab

ls -1 s3/ | xargs -P0 -n1 sh -c 'oras repo tags --oci-layout "s3/$1" | xargs -P0 -n1 -I{} oras copy --recursive --from-oci-layout "s3/$1:{}" "yourregistry/$1:{}"' _

There is no --oci-layout on oras repo ls, so we are substituting with: ls -1 s3/

Performance

Goofys was the definitive champion, combining the strengths of a local caching file system with the powers of a remote Object Store; it was the fastest both in the cloud and locally.

s3fs made a commendable effort and was faster than the native API on small objects, but started to lose performance on the bigger objects.

Findings

tl;dr

Local performance:
Goofys made testing very difficult because it so effectively cached and reused objects, but installing it was a pain.
s3fs installed easily, but suffered around 512MiB, was bad at 1GiB, and was awful at 2GiB.

Cloud performance:
Amazing throughout.
Use Goofys.

Note: goofys is cheating! It is a caching fs, so the numbers below are not an accurate representation of all use-cases.

Local

Done inside WSL Ubuntu over WiFi.

This test was overall pretty slow, especially with the bigger objects. Chunking the artifact did not lead to any substantial speed increases.

goofys isn't displayed, but you should try it out yourself!
Its asynchronous update and caching nature is so powerful that it led to testing complications.

[plot: local performance]

AWS

Welp it's WAY faster over here!

goofys is an entire order of magnitude faster, but normalized around 2GiB.
s3fs was also faster than native up until about 2GiB.

Chunking the artifacts did not lead to any substantial speed increases... Except for goofys, which showed upwards of a 2x speed increase!

[plot: AWS performance]

Amazon Linux 2023 has removed the Extra Packages for Enterprise Linux (EPEL) repositories, so I used Debian instead. I used the apt distribution of s3fs and the go install of goofys.

Scripts used

Feel free to modify and test how your flavors of fuse drivers stack up!

test-lib.sh

#!/bin/bash

# bucket - the s3 bucket
# s3dir - the s3 mounted directory
# size - a byte size, e.g. 15MiB, 1GiB, 2MB; should be valid for `head -c`
# count - an integer
# datafile - the file that contains the time information
# jtag - a valid `junk` repo tag

# bucket dir
s3fsSetup() {
dirname="$2"
mkdir -p "$dirname"
s3fs "$1" "$dirname" -o iam_role; #setup for cloud, you should modify for local testing
echo "$dirname"
}

# bucket dir
goofysSetup() {
dirname="$2"
mkdir -p "$dirname"
goofys "$1" "$dirname" #setup for cloud, you should modify for local testing
echo "$dirname"
}

# s3dir
unmountS3() {
umount "$1" &&
rmdir "$1"
}

# size
genSingle() {
base64 < /dev/random | head -c "$1" > "${1}.junk"
echo "${1}.junk"
}

# size count
genMulti() {
seq 1 "$2" | xargs -n1 sh -c 'base64 < /dev/random | head -c "$1" > "${1}.${2}.junk"; echo "${1}.${2}.junk"' _ "$1"
}

# stdin: `time` output; emits real/user/sys as tab-separated fields
time2line() {
grep -Eo '[0-9]+m.+s' | tr '\n' '\t'
}

# s3dir size
singleBlobPush() {
file=$(genSingle "$2")
oras push --oci-layout "${1}/junk:${2}" "$file"
rm "$file"
}

# s3dir size count
multiBlobPush() {
files=$(genMulti "$2" "$3")
echo "$files" | xargs -d'\n' -n"${3}" oras push --oci-layout "${1}/junk:${2}x${3}"
echo "$files" | xargs -d'\n' rm -f
}

# datafile
resetDataFile() {
printf '#%s\t%s\t%s\t%s\t%s\t\n' "type" "sizelabel" "realtime" "usertime" "systime" > "$1"
}

# s3dir jtag datafile
timeOrasPull() {
exec 3>&1 # fd 3 saves stdout so the pull's output still reaches the terminal while time's stderr goes down the pipe
{ time oras pull --oci-layout "${1}/junk:${2}"; } 2>&1 1>&3 | printf '%s\t%s\t%s\n' "${1}" "${2}" "$(cat | time2line)" >> "$3"
rm -f *.junk
}

# bucket jtag datafile
timeS3Junk() {
exec 3>&1
tmpdir=$(mktemp -d)
{ time { aws s3 cp --recursive "s3://$1/junk/blobs/sha256/" "${tmpdir}/" 2>&1 ; } ; } 2>&1 1>&3 | printf '%s\t%s\t%s\n' "aws-cli-s3" "${2}" "$(cat | time2line)" >> "$3"
rm -rf "$tmpdir"
}

# s3dir
deleteJunkRepo() {
rm -rf "$1/junk"
}

oci-test.sh

Remember to set the AWS environment variables.
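For example (keys elided; pick your own region):

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1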

#!/bin/bash

. test-lib.sh

set -u
set -e
set -o pipefail

: "${S3_BUCKET:="$1"}"

: "${DAT_FILE:="orastime.dat"}"
: "${S3FS_TDIR:="s3fs3"}"
: "${S3FS_TEST:="true"}"
: "${S3FS_S3_BUCKET:="$S3_BUCKET"}"
: "${GOOFYS_TDIR:="goofys3"}"
: "${GOOFYS_TEST:="true"}"
: "${GOOFYS_S3_BUCKET:="$S3_BUCKET"}"
: "${SINGLE_LIST:="8MiB 16MiB 32MiB 64MiB 128MiB 256MiB 512MiB 1GiB 2GiB"}"
: "${MULTI_LIST:="16MiBx8 64MiBx8 64MiBx16"}"

# time postmessage
sleepy() {
echo "Sleeping for ${1}"
sleep "$1"
echo "$2"
}

# dir bucket datafile singlelist multilist
testit() {
dir="$1"
bucket="$2"
datfile="$3"
singlelist=($4)
multilist=($5)

# size
tripleTimePull() {
s="$1"
echo "Pull 1 Starting..." &&
timeOrasPull "$dir" "$s" "$datfile" && sleepy 1.3 "Pull 2 Starting..." &&
timeOrasPull "$dir" "$s" "$datfile" && sleepy 1.3 "Pull 3 Starting..." &&
timeOrasPull "$dir" "$s" "$datfile"
}

# size
tripleTimeS3() {
s="$1"
echo "Copy 1 Starting..." &&
timeS3Junk "$bucket" "$s" "$datfile" && sleepy 1.3 "Copy 2 Starting..." &&
timeS3Junk "$bucket" "$s" "$datfile" && sleepy 1.3 "Copy 3 Starting..." &&
timeS3Junk "$bucket" "$s" "$datfile"
}

echo "Starting: Oras S3 test on ${dir} into ${datfile}"

for size in "${singlelist[@]}"
do
echo "Testing ${size}" &&
singleBlobPush "$dir" "$size" &&
trippleTimePull "$size" &&
trippleTimeS3 "$size" && echo "Cleaning up..." &&
deleteJunkRepo "$dir"
done

for sizexcount in "${multilist[@]}"
do
count="${sizexcount#*x}"
size="${sizexcount%x*}"
echo "Testing ${count} chunks of ${size}" &&
multiBlobPush "$dir" "$size" "$count" &&
trippleTimePull "$sizexcount" &&
trippleTimeS3 "$sizexcount" && echo "Cleaning up..." &&
deleteJunkRepo "$dir"
done
}


if [ "S3FS-TDIR" = "s3fs" ]; then
echo "IDFK Why but if you make the s3fs directory 's3fs' it breaks"
exit 1
fi

echo "Init Data File"
resetDataFile "$DAT_FILE"

if [ "$S3FS_TEST" = "true" ]; then
echo "Setting up s3fs..."
echo "$S3FS_S3_BUCKET"
s3fsSetup "$S3FS_S3_BUCKET" "$S3FS_TDIR"
echo "Lets do this..."
testit "$S3FS_TDIR" "$S3FS_S3_BUCKET" "$DAT_FILE" "$SINGLE_LIST" "$MULTI_LIST"
echo "Unmounting..."
unmountS3 "$S3FS_TDIR"
fi

if [ "$GOOFYS_TEST" = "true" ]; then
echo "Setting up goofys..."
goofysSetup "$GOOFYS_S3_BUCKET" "$GOOFYS_TDIR"
echo "Lets do this..."
testit "$GOOFYS_TDIR" "$GOOFYS_S3_BUCKET" "$DAT_FILE" "$SINGLE_LIST" "$MULTI_LIST"
echo "Unmounting..."
unmountS3 "$GOOFYS_TDIR"
fi
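With the defaults above, an invocation might look like:

./oci-test.sh my-bucket
# or, naming the bucket via the environment and skipping goofys:
S3_BUCKET=my-bucket GOOFYS_TEST=false ./oci-test.sh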

Footnotes

  1. It will probably be available in releases after v1.1.0. See this merged issue.