What (in my opinion) one can learn from Erlang/OTP for other programming languages

01 Jul 2020 - tsp
Last update 05 Jul 2020
Reading time 12 mins

In the following blog post I look at some nice features and aspects of the ingenious Erlang language and the OTP framework, and at what one can learn from them even when programming in other languages.

What is Erlang anyhow? Erlang is a solid functional programming language with built-in concurrency. It was designed by Joe Armstrong, Robert Virding, and Mike Williams at Ericsson back in 1986 and later open-sourced. The Open Telecom Platform (OTP) is a set of libraries and design principles used on top of Erlang, also developed specifically to meet requirements of the telecommunication market. Erlang is still used in many products today and is credited with the legendary 99.9999999% (nine nines) reliability of the Ericsson AXD301 ATM switch. Please note that this nine nines figure might be somewhat misleading since it is, of course, just a statistical figure. But it shows that Erlang offers a solid basis for reliable products. There have even been drone control systems for UAVs that allowed hot code reloading in flight.

Erlang is a pretty easy language to learn; one might refer to the great Learn You Some Erlang book (also available on Amazon; note: affiliate link, this page's author profits from qualified purchases). Learning the basic language takes about one to two weeks, even if it is the first programming language one learns. But don't be fooled: a good understanding of the OTP framework, and of functional programming patterns in general, might be hard to grasp. This is especially true when learning one's first functional language after having programmed in object-oriented or imperative languages for a long time. It will be rewarding, though.

So what are the great parts of the Erlang ecosystem that one can use even when not programming in Erlang?

Functional programming
Concurrency pattern
Error handling
Supervisor pattern
Runtime module switching (Hot code reloading)

Functional programming

Even though I'm also a huge friend of object-oriented programming, the functional paradigm provides a really interesting approach to problem solving. This is especially true if one is inclined towards a mathematical approach. Two concepts can really help when developing applications: variables that can only be bound once, and pure functions that depend only on their input parameters and not on any external shared state (of course that's not possible for persistent data storage, etc.).

The basic idea is, as usual, to reduce coupling and side effects of any component one writes. For example, this requires one not to use global variables. That is a good idea anyway, since it allows easier parallelization and produces code that is easier to modify later on. Of course, reducing coupling is one of the key ideas of any modern paradigm (like object-oriented programming). This is especially important when building architectures like microservice architectures.

Of course this doesn't mean one should directly adopt concepts like tail recursion or keeping state in simple lists in other programming languages. But keeping functions as pure as possible helps a lot, especially in case one wants to do formal proofs using ACSL/Frama-C later on, which is also a good idea anyway.
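A minimal sketch in Erlang of what such pure functions look like (the module and function names are hypothetical). The results depend only on the arguments. The running sum is threaded through a fold accumulator instead of being kept in a global variable.

```erlang
%% Hypothetical module illustrating pure functions.
-module(pure_readings).
-export([add_reading/2, total/1, average/1]).

%% Pure: returns a new list instead of modifying shared state.
add_reading(Reading, Readings) when is_number(Reading), is_list(Readings) ->
    [Reading | Readings].

%% The running sum lives in the fold accumulator, not in a global variable.
total(Readings) ->
    lists:foldl(fun(R, Acc) -> Acc + R end, 0, Readings).

%% Pure: the result depends only on the argument.
average([]) ->
    undefined;
average(Readings) ->
    total(Readings) / length(Readings).
```

Because nothing outside the arguments is read or written, these functions can be called from any number of processes in parallel without coordination.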

Concurrency pattern

In Erlang, concurrency is done with lightweight processes. These lightweight processes communicate via message passing and can be distributed transparently over a whole cluster of systems. Erlang is an (impure) functional language and its processes do not share memory. As a result, one is somewhat forced to keep shared state out of the lightweight processes, except for persistent data stores like databases or files. Since processes in Erlang are really lightweight, one might launch one process (or even a process tree) for each and every independent task. An example is a network service that handles incoming requests with one process tree per request.
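The following is a minimal sketch of the one-process-per-task idea. It assumes a hypothetical module per_request with a stand-in handle/1 function. Each request gets its own process. The workers share nothing with the caller except the reply message, which is tagged with a unique reference. The 5000 ms timeout is an arbitrary placeholder.

```erlang
%% Hypothetical module: one lightweight process per request.
-module(per_request).
-export([handle_all/1]).

%% Spawns one process per request; workers share no state and report
%% back to the caller only by sending a tagged message.
handle_all(Requests) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, handle(Req)} end),
                Ref
            end || Req <- Requests],
    %% Replies are matched by their unique reference, so they are
    %% collected in request order regardless of completion order.
    [receive
         {Ref, Result} -> Result
     after 5000 ->
         {error, timeout}
     end || Ref <- Refs].

%% Stand-in for real request processing (hypothetical).
handle(Req) when is_integer(Req) ->
    Req * Req.
```

For example, per_request:handle_all([1, 2, 3]) returns [1, 4, 9].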

Of course, threads and processes are not as lightweight in other runtime systems, and native operating system threads and processes are even heavier. One might therefore not fork as many of them in other environments. But one can learn from keeping out shared state and from using message passing to handle coupling between the various components of a larger project. This might map to microservice architectures as well as to loosely coupled asynchronous solutions built around work queues. Leaving out shared state, for example by simply not using global variables, allows one to scale one's application easily.
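A sketch of the work-queue variant (the module name and message formats are assumptions for illustration). The queue's state is owned by a single process, and producers and consumers interact with it only through messages. Erlang delivers messages between a pair of processes in order. So a job pushed by a process is always visible to a later pull by that same process.

```erlang
%% Hypothetical module: a work queue owned by a single process.
-module(job_queue).
-export([start/0, push/2, pull/1]).

%% The queue state lives only inside the owning process.
start() ->
    spawn(fun() -> loop(queue:new()) end).

%% Producers enqueue jobs asynchronously.
push(Queue, Job) ->
    Queue ! {push, Job},
    ok.

%% Consumers ask for the next job and wait for the tagged reply.
pull(Queue) ->
    Ref = make_ref(),
    Queue ! {pull, self(), Ref},
    receive
        {Ref, Reply} -> Reply
    after 5000 ->
        {error, timeout}
    end.

loop(Jobs) ->
    receive
        {push, Job} ->
            loop(queue:in(Job, Jobs));
        {pull, From, Ref} ->
            case queue:out(Jobs) of
                {{value, Job}, Rest} ->
                    From ! {Ref, {ok, Job}},
                    loop(Rest);
                {empty, _} ->
                    From ! {Ref, empty},
                    loop(Jobs)
            end
    end.
```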

Clustering

Erlang even supports transparent clustering. This allows processes to be distributed transparently inside a predefined, sharded cluster infrastructure (since there is no shared state). Setting this up in Erlang is not easy. When developing something similar today, one might add some kind of auto-configuration and allow runtime scaling. But the basic idea is powerful: one simply spawns a process and has it assigned to any cluster member transparently. This is especially true if one supports autoscaling.
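A sketch of the "spawn anywhere" idea using the built-in spawn/4, which takes a node name as its first argument. The module name and the random placement are assumptions; real systems would use a proper placement strategy. Without distribution enabled, nodes() is empty, so the process is simply spawned locally.

```erlang
%% Hypothetical module: spawn a process on any connected node.
-module(cluster_spawn).
-export([pick_node/0, spawn_anywhere/3]).

%% Candidates are the local node plus all currently connected nodes.
%% A random pick is only a naive stand-in for real load balancing.
pick_node() ->
    Nodes = [node() | nodes()],
    lists:nth(rand:uniform(length(Nodes)), Nodes).

%% The caller gets a pid back either way; sending messages to it works
%% the same whether the process runs locally or on a remote node.
spawn_anywhere(Module, Function, Args) ->
    spawn(pick_node(), Module, Function, Args).
```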

In my opinion this behavior is somewhat comparable to how the AWS Lambda infrastructure works: processes are forked on demand on a transparently scaling number of nodes.

Error handling

The way error handling is normally realized in Erlang/OTP applications is rather unique. The code is written to match only the correct, supported states. If an invalid state is entered, no clause matches it, and the lightweight process simply crashes in a controlled way (i.e. it is terminated). This inherently leads one to check for valid arguments and valid state while developing software. This is something one should always do:

Validate that input parameters (even to internal functions) are sane and valid. Don't try to catch invalid state; instead verify that the state is valid. One might for example also use ACSL annotations when programming in ANSI C, or JML annotations when programming in Java, to prove valid state.
In case of an error, don't try to recover gracefully by interpreting or repairing data. The exception is some really specific cases, like invalid sensor readings when building flight control systems or other highly reliable systems.
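A minimal sketch of this "let it crash" style (the module and the 0..100 speed range are hypothetical). Only valid input is matched, so anything else raises function_clause and terminates the process. The safe_call/1 helper exists only to observe this from the outside via a monitor.

```erlang
%% Hypothetical module illustrating crash-on-invalid-state.
-module(let_it_crash).
-export([set_speed/1, safe_call/1]).

%% Only the valid state is matched; any other argument raises
%% function_clause instead of continuing with bad data.
set_speed(Speed) when is_integer(Speed), Speed >= 0, Speed =< 100 ->
    {ok, Speed}.

%% Demonstration helper: runs set_speed/1 in a separate, monitored
%% process and reports how that process ended.
safe_call(Arg) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({done, set_speed(Arg)}) end),
    receive
        {'DOWN', Ref, process, Pid, {done, Result}} -> Result;
        {'DOWN', Ref, process, Pid, Reason} -> {crashed, Reason}
    end.
```

For example, safe_call(-1) reports a crash with a reason of the form {function_clause, Stacktrace}.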

In fact, proving the validity of function arguments on a whole-program basis eliminates a whole class of programming errors. This also covers missing initialization of variables and undefined random state anywhere in one's code. One should always state which conditions all parameters passed to a function have to fulfil. One should also state which conditions are guaranteed for its result values.
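Stating pre- and postconditions can be sketched in plain Erlang as well, assuming a hypothetical percent/2 function. The guard encodes the precondition, and a match against true encodes the postcondition. The -spec documents the contract for tools like Dialyzer, which checks it statically rather than at runtime.

```erlang
%% Hypothetical module: explicit pre- and postconditions.
-module(contracts).
-export([percent/2]).

%% Precondition (guard): 0 =< Part =< Total and Total > 0.
%% Violating it raises function_clause instead of returning garbage.
-spec percent(non_neg_integer(), pos_integer()) -> float().
percent(Part, Total)
  when is_integer(Part), is_integer(Total), Total > 0, Part >= 0, Part =< Total ->
    Result = Part * 100 / Total,
    %% Postcondition: if the result were out of range, this match
    %% would raise badmatch.
    true = (Result >= 0 andalso Result =< 100),
    Result.
```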

Since any undefined state leads to clean termination of the worker process, exploits become much less likely. This applies to exploits that rely on running with invalid state or on some condition exhibiting undefined behavior.

Supervisor pattern

The supervisor pattern is closely linked to the way error handling is done in Erlang, but it can also be used independently. The basic idea is that no worker process is launched directly; instead, it is launched via a supervision tree. Any launched process is monitored for termination and automatically restarted according to a given policy. A supervisor might monitor a bunch of other supervisors or worker processes, hence the idea of a supervision tree.

The underlying assumption is that whatever caused the unexpected termination was an exception and not the norm. So the process is simply restarted and the service is kept up and running, even though an undefined condition or invalid state has led to the (safe) termination of a process.
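With OTP this is usually expressed through the supervisor behaviour. The following is a minimal sketch, assuming a hypothetical demo_sup module whose single worker is a plain linked process registered as demo_worker. The one_for_one strategy restarts only the crashed child. If more than 5 restarts occur within 10 seconds, the supervisor terminates itself.

```erlang
%% Hypothetical supervisor module using the OTP supervisor behaviour.
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0, worker_loop/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Restart policy and the child specification of the single worker.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => demo_worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => 5000,
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% Child start function; must return {ok, Pid}. The process is
%% registered under a (hypothetical) name so it can be found easily.
start_worker() ->
    Pid = spawn_link(?MODULE, worker_loop, []),
    register(demo_worker, Pid),
    {ok, Pid}.

%% Worker: crashes on request to demonstrate the restart.
worker_loop() ->
    receive
        crash -> exit(simulated_failure);
        _ -> worker_loop()
    end.
```

After sending crash to demo_worker, whereis(demo_worker) returns a new pid once the supervisor has restarted the worker.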

The supervisor pattern makes sense on a large scale, for example as implemented in systemd, but also on a local scale when monitoring local processes. One might for example monitor any long-running network service, since on most modern Unix-like systems any process might be killed by the out-of-memory killer. This is, by the way, a really crude way to deal with out-of-memory conditions. One can of course change this behavior and let memory allocation functions report an out-of-memory condition, but unfortunately that's not the default behavior.
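The pattern itself doesn't require OTP. A hand-rolled sketch using monitors might look like this. The module name and the simple "restart at most N times" policy are assumptions. The same structure translates to other environments, e.g. a parent process waiting on its children and restarting them.

```erlang
%% Hypothetical module: a minimal supervisor without OTP.
-module(mini_supervisor).
-export([start/2]).

%% Runs Fun in a monitored process and restarts it whenever it
%% terminates abnormally, at most MaxRestarts times.
start(Fun, MaxRestarts) ->
    spawn(fun() -> watch(Fun, MaxRestarts) end).

watch(Fun, RestartsLeft) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;                                  % clean exit: no restart
        {'DOWN', Ref, process, Pid, _Reason} when RestartsLeft > 0 ->
            watch(Fun, RestartsLeft - 1);        % crash: restart
        {'DOWN', Ref, process, Pid, Reason} ->
            exit({gave_up, Reason})              % policy exhausted
    end.
```

With MaxRestarts set to 2, a function that always crashes is started three times in total before the watcher gives up.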

Runtime module switching (Hot code reloading)

This is another one of Erlang's killer features. With Erlang it's entirely possible to run a mixture of different module versions at the same time. One can replace code, for example during an upgrade, without any service interruption. This is done using hot code reloading. The BEAM virtual machine allows two versions of the same module to be loaded at the same time: an old one and a current one. Processes currently executing old code simply keep running in the old module. They switch only when they explicitly call into the new version at runtime using a fully qualified call (Module:function(...)).

Note that there is a limit of two versions being loaded at the same time, so one has to monitor how many processes are still running old code. In the most basic implementation, one might have a single process handling each client connected to a service with medium processing time. One might simply load a new service version and let all clients that are already connected be handled by the old version of the code. As soon as all these processes have terminated, the old module version can be purged and unloaded.
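A sketch of waiting for old connections to drain before the old version is unloaded, assuming a hypothetical helper module. code:soft_purge/1 only purges when no process is still executing old code, and returns false otherwise. Polling it therefore lets the remaining clients finish on their own. The number of attempts and the polling interval are arbitrary placeholders.

```erlang
%% Hypothetical module: wait until old code is no longer in use.
-module(drain_old_code).
-export([wait_and_purge/3]).

%% Gives up after the given number of attempts.
wait_and_purge(_Module, 0, _IntervalMs) ->
    {error, old_code_still_in_use};
wait_and_purge(Module, Attempts, IntervalMs) ->
    %% soft_purge returns true if there is no old code or it could be
    %% purged, false if some process still executes old code.
    case code:soft_purge(Module) of
        true ->
            ok;
        false ->
            timer:sleep(IntervalMs),
            wait_and_purge(Module, Attempts - 1, IntervalMs)
    end.
```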

Of course, one might lift this limitation when implementing a similar mechanism, for example using runtime-swappable C modules.

The basic idea is:

Purge old versions of the module with code:purge. This unloads any code marked as old; any processes still executing old code are killed at this point. If one doesn't purge old code before loading a new version, loading fails while code marked as old is still present (code:load_file returns {error, not_purged}). There is also code:soft_purge, which works similarly. It only purges if no processes are lingering in the old code any more, and returns false otherwise.
Use compile:file to compile the modified module to bytecode and load it using code:load_file. This marks the previously current version as old and loads the new version. If one now calls any function using the module name (for example ?MODULE:examplefunction()), one calls into the new module.
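The two steps can be wrapped into a small helper. This is a sketch under assumptions: the module name is hypothetical, and the source file is expected in the current working directory, which is on the code path by default. It uses the hard code:purge/1, so processes still running old code are killed, as described above.

```erlang
%% Hypothetical helper module wrapping the purge/compile/load steps.
-module(reload_steps).
-export([reload/1]).

reload(Module) ->
    %% Step 1: remove the old version; processes still in it are killed.
    _ = code:purge(Module),
    %% Step 2: compile Module.erl from the current directory to Module.beam.
    case compile:file(Module) of
        {ok, Module} ->
            %% Step 3: load it; the previously current version becomes old.
            code:load_file(Module);
        _Error ->
            {error, compile_failed}
    end.
```

On success this returns {module, Module}, as code:load_file/1 does.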

A short example:

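-module(hotcodereloadsample).
-export([loop/0]).

loop() ->
	receive
		replace ->
			%% Drop any old version, recompile the source and load it;
			%% the currently running version becomes "old".
			code:purge(?MODULE),
			compile:file(?MODULE),
			code:load_file(?MODULE),
			%% Fully qualified call: continue in the newly loaded version.
			?MODULE:loop();
		helloworld ->
			io:format("Hello WOld with a typo~n"),
			%% Local call: stays in the version currently executing.
			loop();
		_ ->
			%% Ignore any other message.
			loop()
	end.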

As one can see, this example has a single function that runs in an endless loop and receives incoming messages. Two messages are handled. The helloworld message simply writes a message to the console. The replace message purges old code, compiles a new module version and loads it into memory. It then uses ?MODULE:loop() to call into the new version. Any other message is ignored. The helloworld clause simply loops via the tail-recursive local call loop(), so it always stays in the version the process is currently executing.

Now one can compile the module and spawn a process running this version, which still contains a typo.

$ erlc hotcodereloadsample.erl
$ erl
1> HotCodeReloadSample = spawn(hotcodereloadsample, loop, []).

Sending a helloworld message outputs, as one would expect, the message including the typo.

2> HotCodeReloadSample ! helloworld.
Hello WOld with a typo

Now one can fix the offending line in the source code:

helloworld ->
	io:format("Hello World!~n"),
	loop();

The process still behaves as before, since the running code has not been replaced yet:

3> HotCodeReloadSample ! helloworld.
Hello WOld with a typo

Now one can trigger the replace:

4> HotCodeReloadSample ! replace.

After the replace message has been handled, the message is output by the new version, since the process entered the new version via ?MODULE:loop():

5> HotCodeReloadSample ! helloworld.
Hello World!

In this example one of course did not exploit the ability to have multiple versions up and running at the same time. That ability is especially interesting for a system that one doesn't want to restart. One case is the previously mentioned ATM switch, which doesn't drop connections during a software upgrade. Another is the drone flight control system, which has to keep the drone in a safe flying condition and react to external events even while its software is being upgraded.

In my opinion, the idea of being able to load different versions of a module at the same time is a really powerful one. It is somewhat hard to realize in many languages, though, like Java (one has to write one's own class-loader hierarchy) or the .NET language family (to my knowledge, keeping multiple versions of the same assembly loaded at the same time is not possible). Even when programming in C, one has to make a local copy of each version to be able to overwrite existing module versions. One also has to implement some kind of message routing and module registry to be able to load different versions at the same time. This requires some discipline and clear planning during the architecture stage. I've previously implemented such a system for the Steamboat Forgery project, which aims to implement services that allow easy and cheap (mainly hobbyist) DIY-style CNC machining.

There are existing implementations of such behavior for Java, for example JavaEE servlet containers that allow runtime replacement and redeployment. Unfortunately, most of the time these cause a really short downtime and are not seamless.