07 Oct 2021 - tsp
Last update 07 Oct 2021
9 mins
First off - what is everything in the context of this blog entry, what is a version control system, and who is this article targeted at? It’s not targeted at the experienced software developer who already manages their code using git or SVN; in that case it will be boring and sound somewhat strange. It’s targeted at people who currently don’t use SCM (source code management) for any task. By everything I mean stuff like:
The stuff that I don’t mean are large binary files such as your media collection, photo collection, etc. and temporary files that can easily be regenerated at a later time as well as large databases, scraped data, extracted data that can be regenerated, etc.
What is version control? Version control systems (sometimes also called revision
control, source control or source code management systems) allow one to manage
collections of files, each in many versions, either centrally or in a decentralized
fashion. Imagine you change something in your computer program’s source code or in
your thesis and want to look into the old version later on. Often one sees people
calling their files `thesis`, `thesis_final`, `thesis_finallyfinal`,
`thesis_finallyfinal_really` and so on, and then shifting the files around on
external storage devices such as external hard disks or USB flash drives, many times
with colliding names, later overwriting much of their new work, not locating the
most current version, not being able to locate comments, etc. Version control
systems solve that problem, including the moving around on USB sticks. They usually
provide a blame feature that can show who changed what and when in case one is
working in a team. And they usually allow for seamless collaboration by including
merge tools: if many people modify the same file at different positions, they are
usually able to merge the differences automatically (if proper file formats are
used) or at least to highlight merge conflicts. And you never lose any old content,
so think about what you put inside a repository: if everything goes right nothing
will ever be deleted, and most systems do not even support that without major
hacking around in their internal representation.
These systems have mainly been developed for software development, but the problem of revision management is as old as writing itself, and they can be applied very efficiently to all textual content. In fact this web page is built out of a source control system.
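The automatic merging and the blame feature described above can be tried in a throwaway directory. The following is only a minimal sketch using Git (one of the tools introduced later in this article); the file name `thesis.txt`, the branch name `bob` and the identities "Alice Example" and "Bob Example" are made up for illustration.

```shell
#!/bin/sh
set -eu

demo=$(mktemp -d)                       # throwaway directory, can be deleted afterwards
cd "$demo"
git init -q .
git config user.name  "Alice Example"   # placeholder identity for this demo only
git config user.email "alice@example.org"

# First version of a small text file
printf 'Introduction\n\nMethods\n\nResults\n' > thesis.txt
git add thesis.txt
git commit -q -m "First draft"
base=$(git symbolic-ref --short HEAD)   # name of the default branch, whatever it is called

# A collaborator changes the last line on a separate branch ...
git checkout -q -b bob
sed 's/^Results$/Results and discussion/' thesis.txt > thesis.tmp && mv thesis.tmp thesis.txt
git -c user.name="Bob Example" -c user.email="bob@example.org" \
    commit -q -am "Rename results section"

# ... while the first line is changed on the original branch
git checkout -q "$base"
sed 's/^Introduction$/Introduction and motivation/' thesis.txt > thesis.tmp && mv thesis.tmp thesis.txt
git commit -q -am "Rename introduction"

# The edits are at different positions, so the merge needs no manual work
git merge -q --no-edit bob
cat thesis.txt

# Show for every line which commit and which author touched it last
git --no-pager blame thesis.txt
```

If both branches had edited the same line, the merge would stop and mark the conflicting region in the file instead of silently picking one version.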
There exist two main models for source control (and only two really popular software packages, though there is a huge number of different tools out there).
First there are centralized version control systems. These are built around a
central repository that’s usually hosted on a server that’s reachable on the
network or via the internet. A typical representative is Subversion (SVN).
One creates a repository on the server (one should set up automated backups there)
and then checks out (copies) the version or branch one requires from the server
using the `svn` tool. Changes are made locally and then committed (copied back onto
the server) into the central storage. Locally one only stores the working copy in
one fixed version. The main advantages of a centralized version control system are
that one only checks out a given version or a given subset of the project, and that
one is able to perform centralized rule checking and centralized linting of the
commits. To use SVN one usually only needs to know three commands:

- `Checkout` creates a new local working copy of the centralized repository, or of a subset of it, in a given revision. This is usually the first operation one ever performs after creating a repository on the server side.
- `Update` pulls the most current version of the content from the server into the local working copy.
- `Commit` pushes local changes into the remote repository. If there is a conflict that cannot be resolved automatically the commit fails and one is able to perform a local merge of the changes before trying again.

In addition SVN also supports locking and unlocking resources so one can negotiate
who modifies which resources, but usually this is not needed. Another operation
that one might need is `Revert`, which discards local changes to a file that have
not been committed yet and restores the revision that was last fetched from the
server. The `blame` utility helps identifying who made which modification.
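The SVN commands listed above could be tried out roughly as follows. This is only a sketch under a few assumptions: the repository is created locally with `svnadmin` and accessed through a `file://` URL as a stand-in for a real server, and the file name `notes.txt` is made up for illustration.

```shell
#!/bin/sh
set -eu

# Skip the demonstration when the Subversion tools are not installed
command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1 \
    || { echo "Subversion tools not found"; exit 0; }

base=$(mktemp -d)                     # throwaway directory, can be deleted afterwards
svnadmin create "$base/repo"          # normally done once on the server
url="file://$base/repo"               # stand-in for the URL of a real server

svn checkout -q "$url" "$base/wc"     # Checkout: create a local working copy
cd "$base/wc"

echo "first line" > notes.txt
svn add -q notes.txt                  # new files have to be put under version control once
svn commit -q -m "Add notes"          # Commit: send local changes to the repository

svn update -q                         # Update: bring the working copy up to date

echo "an edit I regret" >> notes.txt
svn revert -q notes.txt               # Revert: throw away the uncommitted local change
svn blame notes.txt                   # who changed which line in which revision
```

With a real server only the URL changes; the working copy is used in exactly the same way.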
Then there are distributed version control systems such as the really popular Git (note that Git is a separate project from the well known GitHub hosting service, though that’s a really easy starting point for newcomers) or the less well known, older darcs. Git provides the ability to run in distributed mode by keeping its own complete local repository including all versions, but also allows one to synchronize with remote repositories like in the centralized case. This makes using git a little bit more cumbersome and harder to think about than using SVN, but for source code in the open source environment it’s currently more popular than SVN due to its distributed nature. You can simply take the whole repository with you offline and you always have a complete copy (which solves the backup problem if you simply clone / pull the repositories on different machines and keep them in sync).
To use git one requires at least the following commands:
- `clone` is similar to checkout in SVN. It copies a remote repository, but in contrast to SVN it copies everything including all old revisions and branches. Later on, when one uses nested repositories, one will see that it only clones them recursively when instructed to, but this is nothing a beginner will usually have to worry about. It also registers the cloned repository as a remote of the new local repository.
- `pull` fetches the latest version from the registered remote and integrates the latest changes into the local repository. Note that any changes to local files should be committed first; otherwise the pull will fail in case there is a conflict, to prevent data loss. In case the commit histories differ the system will try to merge them automatically.
- `push` uploads all local commits to the remote repository.
- `add` adds files to the local staging area. Staged data will be included in the next local commit.
- `rm` and `mv` should also be done through the git utility; the resulting changes are added to the staging area as well.
- `checkout` can be used to revert local changes that have not yet been committed.
- `commit` creates a new commit / revision in the commit hierarchy from all staged changes. A commit can also be signed using OpenPGP to prove the identity of the author even when using some untrusted repository storage.

The previously mentioned GitHub service is a nice external storage solution for your git repositories if they are either public or should be shared only with a small number of collaborators.
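The git commands listed above fit together roughly as in the following sketch. It makes a few assumptions: a local bare repository stands in for a remote such as GitHub, the directories `laptop` and `desktop` stand in for two machines, and the file names and the identity "Alice Example" are made up for illustration.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)                                # throwaway directory, can be deleted afterwards
git init -q --bare "$base/remote.git"            # stand-in for a repository on a server

# clone: copy the (still empty) remote; git warns about the empty repository
git clone -q "$base/remote.git" "$base/laptop"
cd "$base/laptop"
git config user.name  "Alice Example"            # placeholder identity for this demo only
git config user.email "alice@example.org"

echo "Measurement setup" > labbook.md
git add labbook.md                               # add: stage the new file
git commit -q -m "Start lab book"                # commit: record the staged changes locally
git mv labbook.md notes.md                       # mv: rename through git so the rename is staged
git commit -q -m "Rename lab book"
branch=$(git symbolic-ref --short HEAD)          # name of the default branch
git push -q -u origin "$branch"                  # push: upload the local commits

# A second machine gets the complete history with a single clone
git clone -q "$base/remote.git" "$base/desktop"

# Back on the first machine: discard one edit, keep another
cd "$base/laptop"
echo "Second entry" >> notes.md
git checkout -- notes.md                         # checkout: revert the uncommitted edit
echo "Calibration done" >> notes.md
git commit -q -am "Add calibration note"
git push -q

# pull: fetch the new commit on the second machine and integrate it
cd "$base/desktop"
git pull -q
git --no-pager log --oneline
```

Every clone contains the full history, so each of the two directories alone would be enough to restore the project.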
Previously I’ve written a short git cheat-sheet that should provide a nice summary
on how to do common stuff using `git`. It’s really worth it and, unlike centralized
systems, it does not require one to perform proper server administration for the
central repository.
- [...] `git` when using a central remote such as GitHub or a GitLab instance). No more guessing which USB stick now has your current version. Just always push your changes to your remotes. And you can use multiple remotes to increase reliability.
- [...] GitHub it even formats your Markdown documents in a nice fashion, which is nice for documenting stuff. One can of course also build a fully blown wiki solution on top of version control if this is really required, but if you build your lab book around Markdown that’s pretty efficient.
- [...] `tag`. For example if you have a pre-print for your paper or something that you handed in you can simply tag it to identify it later on. This is also done for software when a version is released into testing or to the general public. For software one can also decide which commits form a given release, for example to include only some of the features.
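Tagging a handed-in version and pushing to more than one remote, as described in the list above, might look like the following sketch. The two local bare repositories stand in for two independent hosting locations, and the remote name `backup`, the tag name `preprint-v1` and the file name `paper.md` are made up for illustration.

```shell
#!/bin/sh
set -eu

base=$(mktemp -d)                                # throwaway directory, can be deleted afterwards
git init -q --bare "$base/primary.git"           # stand-ins for two independent remotes
git init -q --bare "$base/backup.git"

git init -q "$base/paper"
cd "$base/paper"
git config user.name  "Alice Example"            # placeholder identity for this demo only
git config user.email "alice@example.org"

echo "Draft of the paper" > paper.md
git add paper.md
git commit -q -m "Version handed in as pre-print"
git tag -a preprint-v1 -m "Pre-print as submitted"   # tag: give this exact state a name

echo "Revised after review" >> paper.md
git commit -q -am "Address reviewer comments"

# Register both remotes and push the branch together with all tags to each
git remote add origin "$base/primary.git"
git remote add backup "$base/backup.git"
branch=$(git symbolic-ref --short HEAD)          # name of the default branch
for remote in origin backup; do
    git push -q --tags "$remote" "$branch"
done

# The tagged state stays retrievable no matter how the file changes later
git --no-pager show preprint-v1:paper.md
```

Because the tag is pushed to both remotes, the handed-in version can still be identified even if one of the hosting locations is lost.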
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/