Why one should use version control like GIT or SVN for nearly everything
07 Oct 2021 - tsp
Last update 07 Oct 2021
Introduction
First off: what is everything in the context of this blog entry, what is a
version control system, and who is this article targeted at? It's not targeted
at experienced software developers who already manage their code using git or
SVN; for them it will be boring and sound somewhat strange. It's targeted at
people who currently don't use source code management (SCM) for any task. By
everything I mean stuff like:
- Program source code (obviously)
- Articles, theses, papers, document collections, your school homework, etc.
- Your curriculum vitae
- Your website or the sources your website is generated from
- Your exam samples
- Books
- Configuration files
What I don't mean are large binary files such as your media or photo collection,
temporary files that can easily be regenerated at a later time, and large
databases or scraped and extracted data that can be regenerated.
What is version control? Version control systems (sometimes also called revision
control, source control or source code management systems) allow one to manage
collections of files, each in many different versions, either centrally or in a
decentralized fashion. Imagine you change something in your computer program's
source code or in your thesis and want to look at the old version later on.
Often one sees people calling their files thesis, thesis_final,
thesis_finallyfinal, thesis_finallyfinal_really and so on, and then shifting the
files around on external storage devices such as external hard disks or USB
flash drives, many times with colliding names. Later on they overwrite much of
their new work, cannot locate the most current version, cannot find their
comments, etc.
Version control systems solve that problem, including the moving around on USB
sticks. They usually provide a blame feature that shows who changed what and
when, which helps when one is working in a team. And they usually allow for
seamless collaboration by including merge tools: if many people modify the same
file at different positions, these tools are usually able to merge the
differences automatically (if proper file formats are used) or at least
highlight merge conflicts. And you never lose any old content, so think about
what you put inside a repository: if everything goes right nothing will ever be
deleted, and most systems do not even support deletion without major hacking
around in their internal representation.
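The automatic merging mentioned above can be illustrated with a small sketch. It is a deliberately simplified three-way merge, not the algorithm any particular tool uses: it assumes that all three versions have the same number of lines (only in-place edits), whereas real merge tools first align the versions with a diff. The function name `merge_lines` and the sample texts are made up for illustration.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equally long line lists.

    Returns (merged_lines, conflicting_line_numbers).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles in-place line edits")
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or nobody touched the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both changed the same line in different ways
            merged.append(f"<<< {o} ||| {t} >>>")
            conflicts.append(number)
    return merged, conflicts


base = ["Title", "Intro text", "Results", "Conclusion"]
alice = ["Title", "Better intro text", "Results", "Conclusion"]
bob = ["Title", "Intro text", "Results", "Stronger conclusion"]

merged, conflicts = merge_lines(base, alice, bob)
print(merged)
print(conflicts)
```

Since the two authors edited different lines, both changes end up in the merged result and the conflict list stays empty. Had both rewritten the same line, that line would be marked and its number reported instead of one edit silently overwriting the other.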
These systems have mainly been developed for software development, but
the problem of revision management is as old as writing itself, and they
can be applied very efficiently to all textual content.
In fact this web page is built from a source control system.
Different models and basic operations
There are two main models for source control (and only two really
popular software packages, though there is a huge number of different tools
out there).
First there are centralized version control systems. These are built around a
central repository that's usually hosted on a server reachable on the local
network or via the internet. A typical representative is Subversion (SVN).
One creates a repository on the server (and should set up automated backups
there) and then checks out (copies) the version or branch one requires from the
server using the svn tool. Changes are made locally and then committed (copied
back to the server) into the central storage. Locally one only stores the
working copy in one fixed version. The main advantages of a centralized version
control system are that one only checks out a given version or a given subset
of the project, and that rule checking and linting of the commits can be
performed centrally. To use SVN one usually only needs to know 3 different
commands:
- Checkout creates a new working copy of a centralized repository, or of a
  subset of it, in a given revision. This is usually the first operation one
  ever performs after creating a repository on the server side.
- Update pulls the most current version of the content from the server into
  the local working copy.
- Commit pushes local changes into the remote repository. If there is a conflict
  that cannot be solved automatically the commit fails, and one can merge the
  changes locally before trying again.
In addition SVN also supports locking and unlocking resources so one can negotiate
who modifies which resources, but usually this is not needed. Another operation
that one might need is Revert, which restores a file to the revision that was
last checked out or updated, discarding any local changes made since. The blame
utility helps to identify who made which modification.
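To make the checkout, update and commit cycle more tangible, here is a toy model of a central repository. It is only a sketch of the idea, not how Subversion is implemented: the class `CentralRepository`, its methods and the file name `thesis.txt` are invented for illustration, and `update` simply replaces the working copy instead of merging local edits.

```python
class CentralRepository:
    """Toy central server holding one linear history of revisions."""

    def __init__(self):
        self.revision = 0
        self.files = {}         # path -> current content
        self.last_changed = {}  # path -> revision that last changed it

    def checkout(self):
        """Hand out a working copy: file contents plus their revisions."""
        return {"files": dict(self.files), "revs": dict(self.last_changed)}

    def update(self, working_copy):
        """Bring a working copy up to date with the server."""
        working_copy["files"] = dict(self.files)
        working_copy["revs"] = dict(self.last_changed)

    def commit(self, working_copy, path, content):
        """Store a change; refuse it if the file changed on the server meanwhile."""
        if self.last_changed.get(path, 0) != working_copy["revs"].get(path, 0):
            raise RuntimeError(f"{path} is out of date, update first")
        self.revision += 1
        self.files[path] = content
        self.last_changed[path] = self.revision
        working_copy["files"][path] = content
        working_copy["revs"][path] = self.revision
        return self.revision


server = CentralRepository()
alice = server.checkout()
bob = server.checkout()

server.commit(alice, "thesis.txt", "Alice's draft")

rejected = False
try:
    # Bob's working copy predates Alice's commit, so the server refuses.
    server.commit(bob, "thesis.txt", "Bob's draft")
except RuntimeError:
    rejected = True

server.update(bob)  # Bob fetches Alice's work first ...
server.commit(bob, "thesis.txt", "Alice's draft, revised by Bob")  # ... then commits
print(rejected, server.revision, server.files["thesis.txt"])
```

The point of the sketch is the rejected commit: the server notices that Bob's copy is outdated and forces him to update before his change is accepted, so nobody's work is overwritten unnoticed.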
Then there are distributed version control systems such as the really
popular Git (note that this is not directly related
to the well known GitHub hosting service, though that's a
really easy starting point for newcomers) or the less well known, older darcs.
Git provides the ability to run in distributed mode by keeping its own complete
local repository including all versions, but it also allows one to synchronize
with remote ones like in the centralized case. This makes using git a little bit
more cumbersome and harder to think about than using SVN, but for source code
in the open source environment it's currently more popular than SVN due to its
distributed nature. You can simply take the whole repository with you offline
and you have a whole copy (which solves the backup problem if you simply clone
or pull the repositories on different machines and keep them in sync).
To use git one requires at least the following commands:
- Remote repository:
  - clone is similar to checkout in SVN. It copies a remote repository, but
    in contrast to SVN it copies everything including all old revisions and
    branches. Later on, when one uses nested repositories (submodules), one will
    see that it only clones them recursively when instructed to, but this is
    nothing a beginner will usually have to worry about. It also registers the
    source as a remote of the new local repository.
  - pull fetches the latest version from the registered remotes and includes
    the latest changes into the local repository. Note that any changes to local
    files should be committed first; to prevent data loss the pull will fail in
    case there is a conflict with uncommitted changes. In case the local and
    remote commit chains differ, git will try to merge them automatically.
  - push uploads all local commits to the remote repository.
- Local repository:
  - add adds files to the local staging area. Staged data will be included in
    the next local commit.
  - Operations such as rm and mv should also be done through the git utility
    and will be added to the staging area.
  - checkout can be used to revert local changes that have not yet been
    committed.
  - commit creates a new commit / revision in the commit hierarchy from
    all staged changes. A commit can also be signed using OpenPGP to prove the
    identity of the author even when using some untrusted repository storage.
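The local commands listed above can be tried out safely in a throwaway repository. The following sketch drives the real git command line from Python; it assumes git is installed and on the PATH (otherwise it does nothing), and the helper names `git` and `demo`, the file `thesis.txt` and the author identity are made up for this example.

```python
import pathlib
import shutil
import subprocess
import tempfile


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def demo():
    """Commit twice in a temporary repository, then discard a stray edit."""
    if shutil.which("git") is None:
        return None  # git is not installed, nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init", "--quiet")
        # Identity for this throwaway repository only (hypothetical values).
        git(repo, "config", "user.name", "Example Author")
        git(repo, "config", "user.email", "author@example.com")
        git(repo, "config", "commit.gpgsign", "false")

        thesis = repo / "thesis.txt"
        thesis.write_text("First draft\n")
        git(repo, "add", "thesis.txt")                 # stage the new file
        git(repo, "commit", "--quiet", "-m", "Add first draft")

        thesis.write_text("Second draft\n")
        git(repo, "add", "thesis.txt")                 # stage the modification
        git(repo, "commit", "--quiet", "-m", "Revise draft")

        thesis.write_text("Accidental edit\n")
        git(repo, "checkout", "--", "thesis.txt")      # drop the uncommitted change
        restored = thesis.read_text()

        subjects = git(repo, "log", "--format=%s").splitlines()
        return subjects, restored


result = demo()
print(result)
```

The log lists the newest commit first, and the accidental edit is gone after the checkout because it was never staged or committed. Working against a remote adds clone, pull and push on top of exactly this cycle.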
The previously mentioned GitHub service is a nice external storage solution for
your git repositories if they are either public or should be shared only with
a small number of collaborators or a small group.
Previously I've written a short git cheat-sheet
that should provide a nice summary of how to do common stuff using git. Using
git is really worth it and, unlike centralized systems, it does not require one
to perform proper server administration for a central repository.
Advantages
- You never ever have to worry about millions of filenames again. You have a
single object that you write your changes into and can walk the log to
see what has been changed when
- Collaboration gets easier since multiple people can in fact work on the
same documents in parallel and merge their changes later on
- You get a consistent copy of everything on all machines that you are using
- When using build automation you just have to upload your changes and the
build automation system builds your software package, book, documents, webpage
or articles in a clean fashion. No more "this project only builds on
developer X's machine" or "I don't know how to format the LaTeX document"
- You do not accidentally lose your hard work
- It makes backups easier
- You get a central repository (even with a distributed revision control
system such as git, as long as you use a central remote such as GitHub or
a GitLab instance). No more guessing which USB stick now has your current
version. Just always push your changes to your remotes. And you can use multiple
remotes to increase reliability.
- Services like GitHub even render your Markdown documents nicely, which is
handy for documenting stuff. One can of course also build a full-blown wiki
solution on top of version control if this is really required, but if you build
your lab book around Markdown that's pretty efficient.
- You can easily track progress and locate problematic changes later on.
- In case you have some version that you want to remember for a given reason
you can add a tag. For example, if you have a pre-print of your paper
or something that you handed in, you can simply tag it to identify it later on.
This is also done for software when a version is released into testing or to
the general public. For software one can also decide which commits form a
given release, for example to include only some of the features.
- Version control systems integrate very well with build automation systems
like Jenkins. Even though these systems are designed
for continuous integration for software development one can extend this concept
for example to book publishing or web publishing.
Drawbacks
- Automatic merging of Microsoft Office files does not work, of course. I don't
consider this a drawback since I think this writing tool is a mistake anyway.
- You have to invest some time to learn and set up the tools.