17 Jan 2023 - tsp
Last update 27 Jan 2023
10 mins
So who doesn’t know this? You’re running a git repository and have to store large files like assets in the repository - or you want to use it to keep track of, for example, some kind of log- or lab book that also includes larger files. In this case git is usually not the best solution and one uses other repositories - for binary artifacts this is most of the time a repository like Nexus or Artifactory. Don’t get me wrong - those tools are great. But sometimes you really want to keep that data associated and automatically versioned in sync with your git repository - for example when building static webpages or, as mentioned above, when hacking a lab book using git instead of some other custom, better suited solution.
Git itself is not suited for files larger than 100 MB, and even for moderate file sizes it gets pretty slow - it has been designed for textual content like source code anyways. To solve that problem a third party solution - git large file storage (LFS) - has been developed. Note that the LFS plugin does not store large content in the git repository. It only keeps references there and pushes the objects (identified by their SHA-256 hash) to a referenced external web service. Please note that there are drawbacks when using git LFS: the tracked files are not directly stored in your git repositories but on an external web service (a third component). Only references are kept in the repository itself.

Anyways, this should be a summary on how to run a server that supports git LFS and how to use it from your clients in a simple fashion. Please note that the described LFS server seems to be used in some particular production scenarios but might not be usable for production in small scale deployments the way described in this blog article.
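To illustrate the references mentioned above: a file tracked by LFS is replaced in the git repository by a small text pointer roughly like the following (oid and size of course depend on the actual file):

version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345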
As mentioned above, using git LFS requires a separate web service that’s reachable by all clients under the same common name - which usually means it’s exposed to the web. There is a number of implementations of the git LFS protocol. Most of them are in an experimental, non-mature stage. In addition there is a number of commercial implementations out there (for example GitHub runs their own service). Anyways one really has to think about whether one wants to host such a service oneself. But nevertheless the simplest solution as of today is, in the author’s opinion, the usage of giftless.
Giftless is a WSGI application and thus should run in a WSGI application server such as uwsgi or gunicorn. giftless has been designed to support a variety of authentication backends (though not many of them are available besides a generic read-only one, a generic one that allows anyone to read and write, and a JWT based schema). It also supports a number of storage backends - besides storing to the local filesystem it allows one to use one of the major cloud storage providers (Amazon’s S3, Google Cloud Storage or Microsoft Azure Blob Storage) as backend.
In the following example local file storage will be used to illustrate how to use git LFS for an in-house solution. Still keep in mind that you have to make backups of your repository yourself. There won’t be an automatic copy of the whole repository on any client, and pushing to multiple remotes does not copy the large files!
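Since nothing backs up the LFS objects automatically, a minimal backup approach might look like the following - the paths simply refer to the example server and repository set up later in this article:

$ # Mirror the git repository itself (history, refs and the LFS pointers)
$ git clone --mirror /path/to/testrepo testrepo-backup.git
$ # Copy the LFS object store of the giftless server alongside it
$ rsync -a giftless-server/ giftless-server-backup/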
First one needs to install at least:

- git
- the git-lfs plugin, when one wants to create repositories
- uwsgi
- the giftless git LFS server

It’s assumed that Python and pip are already available since uwsgi and giftless are Python applications.
To install the software on FreeBSD one can use either ports or packages. When using packages for example:
pkg install git
pkg install git-lfs
pkg install www/uwsgi
pip install giftless
Since giftless does not declare all of its dependencies as one would expect, one has to bootstrap them oneself:
fetch https://raw.githubusercontent.com/datopian/giftless/master/requirements.txt
# Inspect the fetched file!
pip install -Ur requirements.txt
The next step is to run the server - here it’s illustrated how one could do this from the command line. In a usual deployment scenario one would of course launch the service using the rc.init system. For the sake of simplicity let’s first look at how one launches and configures it manually. Usually one configures the service using a configuration file that’s then referenced via the GIFTLESS_CONFIG_FILE environment variable.
Currently the configuration file allows one to configure:

- TRANSFER_ADAPTERS that interface the different storage backends. Those specify a storage class as well as options for the given storage class.
- AUTH_PROVIDERS that control authentication. The stock implementation only supports three providers that are usually of limited use for a publicly reachable service.
- MIDDLEWARE configuration for WSGI middleware (for example when running behind a proxy, so giftless knows under which URI it is served or which CORS headers should be set). An example is shown below.

There is some documentation for the configuration options available.
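As an illustration for the MIDDLEWARE section: when running behind a reverse proxy, the giftless documentation wraps werkzeug’s ProxyFix middleware around the application so that the externally visible host and path prefix are honoured. The exact keys (class, kwargs) follow that documentation and should be double checked against the installed version:

MIDDLEWARE:
  - class: werkzeug.middleware.proxy_fix.ProxyFix
    kwargs:
      x_host: 1
      x_prefix: 1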
The most simple - but never to be used for a publicly run service - authentication provider is giftless.auth.allow_anon:read_write. This just allows anonymous users to store and read arbitrarily large files on the service. To configure that authentication provider one would use the following YAML snippet in one’s configuration file:
AUTH_PROVIDERS:
- giftless.auth.allow_anon:read_write
This provider is useful for local deployments for testing purposes only. Usually one will use a JWT based provider or an anonymous read-only provider for an exposed service - for example, to use HMAC protected JWT tokens one could use giftless.auth.jwt:factory:
AUTH_PROVIDERS:
  - factory: giftless.auth.jwt:factory
    options:
      algorithm: HS256
      private_key: XXXX
Unfortunately documentation for authentication provider configuration is not really usable at the current point in time - it can be assumed that any production use of the service uses a cloud backend and JWT tokens for authentication.
Storage backends can be configured for the major cloud storage systems, which seems to be the typical use case for this LFS server. As of the time of writing this summary the official documentation only contains information for the Amazon S3, Google Cloud Storage and Microsoft Azure Blob Storage backends - and just mentions that there is a local file storage backend, but says nothing about how to configure it. When launching giftless with uwsgi though, the local storage backend is the default backend - it just uses the current working directory. Not a clean way to solve this, but enough to play around.
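If one wants to pin the storage location explicitly instead of relying on the working directory, a transfer adapter configuration along the following lines might work. Be aware that this is an unverified sketch: the factory and storage class paths (giftless.transfer.basic_streaming:factory, giftless.storage.local_storage:LocalStorage) and the path option are taken from the giftless sources and may change between versions:

TRANSFER_ADAPTERS:
  basic:
    factory: giftless.transfer.basic_streaming:factory
    options:
      storage_class: giftless.storage.local_storage:LocalStorage
      storage_options:
        path: /var/lib/giftless/objects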
So one can simply create a directory, create a configuration file in there and launch the service in the uwsgi container on an arbitrary local port - in this case the service should only listen on port 1234 on the local host:
$ mkdir giftless-server
$ cd giftless-server
$ cat > giftless.conf.yaml << EOL
AUTH_PROVIDERS:
- giftless.auth.allow_anon:read_write
EOL
$ env GIFTLESS_CONFIG_FILE=giftless.conf.yaml uwsgi -M -T --threads 2 -p 2 --manage-script-name --module giftless.wsgi_entrypoint --callable app --http 127.0.0.1:1234
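To verify that the server is up, one can talk to the LFS batch API by hand. The endpoint path below already uses the organization/repository pattern introduced later in this article, and the request body follows the git LFS batch API specification (oid and size are dummy values for illustration):

$ curl -X POST \
    -H "Content-Type: application/vnd.git-lfs+json" \
    -H "Accept: application/vnd.git-lfs+json" \
    -d '{"operation": "download", "transfers": ["basic"], "objects": [{"oid": "0000000000000000000000000000000000000000000000000000000000000000", "size": 1}]}' \
    http://127.0.0.1:1234/my-organization/test-repo/objects/batch

Any JSON answer - even one that just reports the object as missing - shows that the WSGI application is reachable and parsing requests.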
First one has to create a new (bare) git repository on the server as usual - this is the ordinary git remote and independent of the LFS service:
mkdir -p testrepo
cd testrepo
git init --bare
It’s assumed that git and git-lfs are already installed on the client. In case the client complains about an unknown lfs command, the git-lfs package is missing. First one has to clone the repository as usual:
git clone REPOURI
In this case REPOURI is the URI of the repository - this can reference the local filesystem, an SSH server or the git protocol - as usual it does not matter. Then one has to install lfs in the local repository and tell it which files to track (inside the repository directory) and also which LFS service to use - in this example it’s assumed to be running on 127.0.0.1:1234 and the repository is assumed to be referenced by the path my-organization/test-repo, which is a pattern suggested by the developers of the giftless server. The URI should be reachable from any client that is supposed to access files stored in the repositories using LFS - using the same pattern (i.e. no distinction when accessing from outside your network or from inside, etc.).
git lfs install
git config -f .lfsconfig lfs.url http://127.0.0.1:1234/my-organization/test-repo
git lfs track "*.bin"
This would track all files matching the *.bin pattern - which works by creating two local files in the repository. The first one is .lfsconfig, which tells the lfs module about its configuration. The second is the .gitattributes file, which prevents git from storing the matched files in the repository itself and instead redirects them to the lfs module for filter, diff and merge operations - which means that each client has to install the lfs extension! One should add .lfsconfig and .gitattributes to the repository and push them with the next commit after enabling lfs, as shown below.
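Before committing, one can inspect the two files - they should look roughly like this (the URL of course reflects this article’s example configuration):

$ cat .lfsconfig
[lfs]
    url = http://127.0.0.1:1234/my-organization/test-repo
$ cat .gitattributes
*.bin filter=lfs diff=lfs merge=lfs -text

Afterwards the files are committed and pushed: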
git add .gitattributes .lfsconfig
git commit
git push
Pushing now already transfers all newly tracked files into the LFS store - the *.bin files that one has added and committed / pushed are not found in the git directory on the remote but on the LFS server, identified by their SHA-256 hash. Do not forget to install lfs on each client after cloning the repository - LFS really is just a redirection wrapper around git that has some additional maintenance to do.
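To double check that this worked, one can list the LFS managed files on the client and look at the server’s object store - the file name data.bin is just a stand-in for whatever was committed, and the on-disk layout on the server depends on the storage backend configuration:

$ git lfs ls-files
4d7a214614 * data.bin
$ # On the server (default local storage backend): the content shows up
$ # as a file named after its SHA-256 hash below the working directory
$ find giftless-server -type f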
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/