In this blog post I look at some nice features and aspects of the ingenious
Erlang language and the OTP framework - and at what one can learn from them
even when programming in other languages.
What is Erlang anyhow? Erlang is a solid functional programming language with
built-in concurrency. It was designed by Joe Armstrong, Robert Virding, and
Mike Williams at Ericsson back in 1986 and later open sourced.
The Open Telecom Platform (OTP) is a set of libraries and design principles used
on top of Erlang that was developed specifically to meet requirements of the
telecommunication market. Erlang is still used in many products today and is
often credited for the legendary 99.9999999% (nine nines) reliability of the
Ericsson AXD301 ATM switch. Please note that this nine nines reliability might
be somewhat misleading since it's of course just a statistical figure. Still,
it shows that Erlang offers a solid basis for reliable products. There have
even been implementations of drone control systems for UAVs that allowed hot
code reloading in flight.
Erlang is a pretty easy language to learn - one might refer to the great
Learn You Some Erlang book (also available on Amazon; note: affiliate link,
this page's author profits from qualifying purchases). It takes about one to
two weeks to learn the basic language, even if it's the first programming
language one is learning. But don't be fooled: a good understanding of the OTP
framework and of functional programming patterns in general might be hard to
gain, especially if this is one's first functional language after having
programmed in object oriented or imperative languages for a long time. It will
be rewarding though.
So what's the great stuff about the Erlang ecosystem that one can use even
when not programming in Erlang?
Functional programming
Even though I'm also a huge fan of object oriented programming, the functional
paradigm provides a really interesting approach to problem solving. Especially
if one is more inclined to a mathematical approach, concepts like variables
that can only be bound once and pure functions that depend only on their input
parameters and not on external shared state (of course that's not possible for
persistent data storage, etc.) can really help when developing applications.
The basic idea is - as usual - to reduce coupling and side effects of any
component one is writing. For example, this requires one not to use global
variables - which is a good idea anyway since it allows easier parallelization
and produces code that's easier to modify later on. Of course reducing coupling
is one of the key ideas of any modern paradigm (like object oriented
programming). This is especially important when building something like a
microservice architecture.
Of course this doesn't mean one should directly carry over concepts like tail
recursion or keeping state in simple lists to other programming languages - but
keeping functions as pure as possible helps a lot, especially if one wants to
do formal proofs using ACSL/Frama-C later on, which is a good idea anyway.
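To make single assignment and pure functions a bit more tangible, here is a
minimal sketch in Erlang. The module name pure_sketch and the functions total/1
and demo/0 are made up for illustration only.

```erlang
-module(pure_sketch).
-export([total/1, demo/0]).

%% Pure function: the result depends only on the argument list.
total(Prices) ->
    total(Prices, 0).

%% Tail recursive helper. The running sum is passed along as an argument
%% instead of being kept in a mutable or global variable.
total([], Acc) ->
    Acc;
total([Price | Rest], Acc) ->
    total(Rest, Acc + Price).

demo() ->
    Prices = [3, 4, 5],
    Sum = total(Prices),
    %% "Sum = 12" is a pattern match, not an assignment. It succeeds because
    %% Sum is already bound to 12; "Sum = 13" would raise a badmatch error.
    Sum = 12,
    Sum.
```

Because total/1 touches no shared state, it can be called from any number of
processes at once without locking, and it is trivial to test in isolation.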
Concurrency pattern
In Erlang concurrency is done with lightweight processes. These lightweight
processes communicate via message passing and can be distributed
transparently over a whole cluster of systems. Since Erlang is a (not purely)
functional language one is more or less forced to have no shared state between
different lightweight processes, except for persistent data stores like
databases or files. Since processes in Erlang are really lightweight one might
launch one process (or even a process tree) for each and every independent
task - for example a network service might handle incoming requests with one
process tree per request.
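A minimal sketch of a process that owns its state and is only reachable via
messages could look like the following. The module counter_sketch and its
message formats are hypothetical and chosen just for this illustration.

```erlang
-module(counter_sketch).
-export([start/0, increment/1, value/1, loop/1]).

%% Spawn a new lightweight process that keeps a counter as its private state.
start() ->
    spawn(?MODULE, loop, [0]).

%% Asynchronous request: just send a message and return.
increment(Pid) ->
    Pid ! increment,
    ok.

%% Synchronous request: send our own pid along and wait for the reply.
value(Pid) ->
    Pid ! {value, self()},
    receive
        {counter_value, Pid, Count} -> Count
    after 1000 ->
        timeout
    end.

%% The state lives only in the argument of the tail recursive loop.
loop(Count) ->
    receive
        increment ->
            loop(Count + 1);
        {value, From} ->
            From ! {counter_value, self(), Count},
            loop(Count)
    end.
```

No other process can read or modify the counter directly, so there is nothing
to lock. The same shape (a private state plus a mailbox) is what one can borrow
in other languages, for example with a worker thread and a queue.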
Of course threads and processes are not as lightweight in other runtime
systems, and especially native threads and processes are more expensive, so one
might not be able to launch as many of them in other environments. But one can
learn from avoiding shared state and using message passing to handle coupling
between the various components of a larger project. This might map to
microservice architectures as well as to loosely coupled asynchronous solutions
built around work queues. Leaving out shared state allows one to easily scale
one's application - for example, simply don't use global variables.
Clustering
Erlang even supports transparent clustering. Since there is no shared state,
processes can be placed on any node of a predefined cluster and still be
addressed in the same way. This is not easily configured in Erlang, and when
developing something similar today one would probably add some kind of
auto-configuration and allow scaling at runtime. But the basic idea of simply
spawning a process and having it run on any cluster member transparently is a
powerful one, especially if one supports autoscaling.
In my opinion this behavior is somewhat comparable to the way AWS Lambda
works - processes are launched on demand on a transparently scaling number of
nodes.
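The idea of spawning a process on an arbitrary cluster member can be sketched
with the built-in spawn/2 that takes a node name. The module
remote_spawn_sketch and the function run_on/2 are made-up names, and the sketch
assumes that the target node is already connected and has the code for the
given fun available.

```erlang
-module(remote_spawn_sketch).
-export([run_on/2]).

%% Run Fun in a fresh process on Node and wait for its result.
run_on(Node, Fun) ->
    Caller = self(),
    Ref = make_ref(),
    %% The new process reports the node it actually runs on and the result.
    spawn(Node, fun() -> Caller ! {Ref, node(), Fun()} end),
    receive
        {Ref, WhereItRan, Result} -> {WhereItRan, Result}
    after 5000 ->
        timeout
    end.
```

Passing node() as the first argument runs the fun locally, which is handy for
trying the sketch without a cluster. With a second node (say a hypothetical
'worker@otherhost') the calling code stays exactly the same, which is the
transparency this section is about.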
Error handling
The way error handling is normally realized in Erlang/OTP applications is
rather unique. The code is written to match only the correct, supported states.
If some invalid state is entered, no clause matches that state and the
lightweight process simply crashes in a controlled way (i.e. is terminated).
This leads one to inherently check for valid arguments and valid state during
development of software. This is something one should always do:
- Validate that input parameters (even to internal functions) are sane and
valid - don't try to catch invalid state but instead verify that the state is
valid. One might for example also use ACSL annotations when programming in ANSI C
or JML annotations when programming in Java to prove valid state.
- In case of an error don't try to recover gracefully by interpreting or repairing
data, except in some really specific cases - like invalid readings from
sensors when building flight control systems or other highly reliable systems.
In fact, proving the validity of function arguments on a whole-program basis
eliminates a whole bunch of programming errors - this is also true for missing
initialization of variables or undefined random state anywhere in one's code.
One should always state which conditions have to be fulfilled by all parameters
passed to a function and which conditions are guaranteed for its result values.
Since any undefined state leads to clean termination of the worker process,
the likelihood of exploits based on running with invalid state or on some
condition exhibiting undefined behavior is greatly reduced.
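As a small illustration of writing code only for the valid states, consider
the following sketch. The module crash_sketch and the shapes it accepts are
hypothetical; the point is that there is deliberately no catch-all clause.

```erlang
-module(crash_sketch).
-export([area/1, try_area/1]).

%% Only valid shapes are described. Anything else matches no clause and
%% raises a function_clause error, which terminates the calling process.
area({square, Side}) when is_number(Side), Side > 0 ->
    Side * Side;
area({rectangle, W, H}) when is_number(W), is_number(H), W > 0, H > 0 ->
    W * H.

%% Evaluate area/1 in a separate, monitored process so that a crash takes
%% down only that worker and the caller merely observes how it ended.
try_area(Shape) ->
    Caller = self(),
    {Pid, Ref} = spawn_monitor(
                   fun() -> Caller ! {area, self(), area(Shape)} end),
    receive
        {area, Pid, Area} ->
            erlang:demonitor(Ref, [flush]),
            {ok, Area};
        {'DOWN', Ref, process, Pid, Reason} ->
            {crashed, Reason}
    end.
```

Calling try_area({square, -1}) does not return a half-valid result. The worker
dies with a function_clause reason (the runtime also logs an error report) and
the caller gets to decide what to do about it.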
Supervisor pattern
The supervisor pattern is closely linked to the way error handling is done in
Erlang - but it can also be used independently. The basic idea is that no
worker process is launched directly; instead it is launched via a supervisor
tree. Any launched process is monitored for termination and automatically
restarted according to a given policy. One might run a supervisor that monitors
a bunch of other supervisors or worker processes - hence the idea of a
supervisor tree.
The assumption is that the condition that led to the unexpected termination of
a process was an exception and not the norm, so the service is simply restarted
and kept up and running even though an undefined condition or invalid state has
led to the (safe) termination of a process.
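A minimal sketch of this pattern using OTP's supervisor behaviour might look
as follows. The module sup_sketch, the child id worker and the helper
worker_pid/0 are made up for this example, and the worker is a bare process
rather than a proper gen_server to keep the sketch short.

```erlang
-module(sup_sketch).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0, worker_pid/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Restart policy: restart only the crashed child (one_for_one), but give up
%% if more than 5 restarts happen within 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Worker = #{id => worker,
               start => {?MODULE, start_worker, []},
               restart => permanent},
    {ok, {SupFlags, [Worker]}}.

%% Called by the supervisor; the worker must be linked to it.
start_worker() ->
    {ok, spawn_link(?MODULE, worker_loop, [])}.

worker_loop() ->
    receive
        {ping, From} ->
            From ! pong,
            worker_loop()
    end.

%% Look up the pid of the currently running worker.
worker_pid() ->
    [{worker, Pid, _Type, _Modules}] = supervisor:which_children(?MODULE),
    Pid.
```

After sup_sketch:start_link(), killing the worker with
exit(sup_sketch:worker_pid(), kill) makes the supervisor start a fresh one, so
a subsequent worker_pid() call returns a different pid shortly afterwards.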
The supervisor pattern makes sense on a large scale - for example as
implemented in systemd - but also on a small scale when monitoring local
processes. One might for example monitor any long running network service,
since it's entirely possible for any process on most modern Unix-like systems
to be killed by an out-of-memory killer. That is, by the way, a really crude
way to deal with out-of-memory conditions; one can of course change this
behavior and let memory allocation functions report an out-of-memory condition
instead, but that's unfortunately not the default.
Runtime module switching (Hot code reloading)
This is another one of the killer features of Erlang. With Erlang it's entirely
possible to run a mixture of different module versions at the same time and
replace code - for example during an upgrade - without any service interruption.
This is done using hot code reloading. The BEAM virtual machine allows two
versions of the same module to be loaded at the same time - an old one and
a current one. Processes currently executing old code simply keep running inside
the old module until they explicitly call into the new version at runtime.
Note that there is a limit of two versions being loaded at the same time - so one
has to keep track of how many processes are still running old code. In the most
basic implementation one might have a single process handling each client
connected to a service with medium processing time. One might simply load a new
service version and let all clients that are already connected be handled by
the old version of the code. As soon as all those processes have terminated the
old module version can be purged and unloaded.
This limitation is of course something one might improve on when implementing
a similar mechanism - for example using runtime swappable C modules.
The basic idea is:
- Use code:purge to remove old versions of the module. This unloads any code
marked as old. Any processes still executing old code are killed at this
stage. If one doesn't purge and code marked as old is still present, loading
yet another version can fail or terminate the processes lingering in the old
code. There is also code:soft_purge, which works similarly but only purges if
no processes are lingering in the old module any more.
- Use compile:file to generate bytecode for the modified module and load it
using code:load_file. This marks the previously current version as old and
makes the newly loaded version current. Any call that names the module
explicitly (for example ?MODULE:examplefunction()) now calls into the new
version.
A short example:
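-module(hotcodereloadsample).
-export([loop/0]).

loop() ->
    receive
        replace ->
            %% Drop any old version, then compile and load the new one.
            code:purge(?MODULE),
            compile:file(?MODULE),
            code:load_file(?MODULE),
            %% Fully qualified call: continues in the newly loaded version.
            ?MODULE:loop();
        helloworld ->
            io:format("Hello WOld with a typo~n"),
            %% Local call: stays in the version that is currently running.
            loop();
        _ ->
            loop()
    end.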
As one can see, this example has a single function that runs in an
endless loop and receives incoming messages. Two different messages are
handled - helloworld simply writes a message to the console, while the replace
message purges old code, compiles a new module version and loads it into
memory. It then uses ?MODULE:loop() to call into the new version. The
helloworld branch, in contrast, loops via the local tail recursive call loop(),
so it always stays in the version of the module it is currently running.
Now one can compile the module and spawn a process running this version, which
still contains the typo.
$ erl
1> HotCodeReloadSample = spawn(hotcodereloadsample, loop, []).
Sending a helloworld message outputs, as one would expect, the message
including the typographic error.
2> HotCodeReloadSample ! helloworld.
Hello WOld with a typo
Now one can fix the offending line in the source code:
        helloworld ->
            io:format("Hello World!~n"),
            loop();
The process still behaves as previously:
3> HotCodeReloadSample ! helloworld.
Hello WOld with a typo
Now one can trigger the replace:
4> HotCodeReloadSample ! replace.
After the replace message has been handled the message is output by the new
version, since the process entered the new version using ?MODULE:loop():
5> HotCodeReloadSample ! helloworld.
Hello World!
In this example one did of course not exploit the ability of having multiple
versions up and running at the same time. That is especially interesting if
one has a system that one doesn't want to restart - like the previously
mentioned ATM switch that doesn't drop connections during a software upgrade, or
the mentioned drone flight control system that has to keep the drone in a safe
flying condition and react to external events while its software is being
upgraded.
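If processes that still run old code must not be killed during an upgrade, the
purge step of the replace branch in the example above could be swapped for
code:soft_purge. The following is only a sketch under that assumption; the
module safe_reload_sketch, the function reload/1 and the old_code_in_use atom
are made up for illustration.

```erlang
-module(safe_reload_sketch).
-export([reload/1]).

%% Load the newest compiled version of Module, but never at the cost of
%% killing processes that are still executing old code.
reload(Module) ->
    case code:soft_purge(Module) of
        true ->
            %% No old code is left (or nobody was using it), so the current
            %% version can safely become the old one.
            code:load_file(Module);
        false ->
            %% Some process still lingers in old code; the caller may retry
            %% later once those processes have finished.
            {error, old_code_in_use}
    end.
```

On success code:load_file returns {module, Module}, otherwise an {error, Reason}
tuple. A caller would typically retry after a delay when it gets
{error, old_code_in_use}, which gives long running clients time to finish on
the old version.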
In my opinion the idea of being able to load different versions of a module
at the same time is a really powerful one - even if it's somewhat hard to
realize in many languages like Java (one has to write one's own class loader
hierarchy) or the .NET language family (where, to my knowledge, keeping
multiple versions of an assembly loaded at the same time is not
straightforward). Even when programming in C one has to make local copies of
each version to be able to overwrite existing module versions - and one has to
implement some kind of message routing and module registry to be able to load
different versions at the same time. This requires some discipline and clear
planning during the architecture stage. I've previously implemented such a
system for the Steamboat Forgery project that aims to implement services that
allow easy and cheap (mainly hobbyist) CNC machining DIY style.
There are existing implementations of such behavior, for example for Java -
think of Java EE servlet containers that allow replacement and redeployment at
runtime, although unfortunately most of the time with a really short downtime
rather than seamlessly.