Running JupyterLab behind an authenticating haproxy setup using basic auth

27 Jan 2023 - tsp
Last update 27 Jan 2023
Reading time 9 mins

TL;DR In theory one can just put any JupyterLab installation behind any reverse proxy that authenticates via HTTP authentication. But since Jupyter attaches its internal token to all XML HTTP Requests (XHR), it leaks that token and replaces any HTTP basic or digest authentication information during XHR requests. This means the proxy has to match the shared persistent token, which is leaked to all users (they could use this token to access all content on the Jupyter notebook), so this might be no solution at all depending on the environment in which those notebooks are used.

I had to set up a JupyterLab instance at work and we required a proxied setup: the Jupyter instance should run behind a reverse proxy that handles SSL termination, validation of requests, load balancing and authentication. Since this turned out to be not as straightforward as one would like, I decided to write this short summary about the usage of haproxy in front of JupyterLab. The work relation is also the reason why all actions in this blog article are described for Manjaro Linux as well as for my personal all-time favorite Unixoid operating system - FreeBSD.

Note that there currently is a problem with the way JupyterLab handles authentication for its XML HTTP requests: this solution leaks the authentication token, which never changes and is shared by all users, to every authenticated user. This token in turn can be used to perform arbitrary requests on the JupyterLab instance - one can see that by design JupyterLab is not a multiuser solution. It looks as if users get their own username and password combination, but they are still able to fetch a universal authentication token and perform actions using this token even after their user accounts have been removed. This has to be fixed in JupyterLab though, there is no way to do this on the proxy (and as far as the bugtracker history goes I don’t think there is a huge intent to fix this).
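To make the consequence concrete, here is a minimal Python sketch of what a client that has learned the shared token could send. The hostname, the base path and the token value are placeholders, and `/api/contents` is assumed to be the Jupyter Server REST endpoint for listing files. The request is only constructed here, not sent.

```python
import urllib.request


def build_token_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that authenticates with the shared token only."""
    url = base_url.rstrip("/") + "/api/contents"
    # No username or password is involved: the token alone is the credential.
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


# Placeholder host and token, not real values.
req = build_token_request("https://jupyter.example.com/examplebook", "XXXXXXX")
print(req.full_url)
print(req.get_header("Authorization"))
```

Nothing in this request depends on a user account, which is why removing a user from the proxy does not revoke access for anyone who kept the token.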

Installing `haproxy`

The first step is the installation of the proxy server. In this case haproxy has been used. This is an open source reverse proxy and load balancer for HTTP and TCP connections.

FreeBSD

On FreeBSD the simplest way to install haproxy is the package manager pkg

$ sudo pkg install haproxy

One could also install using ports:

# cd /usr/ports/net/haproxy
# make install clean

This installs haproxy system wide as well as the rc.d startup script at /usr/local/etc/rc.d/haproxy. To launch haproxy on boot one can add the line

haproxy_enable="YES"

to /etc/rc.conf. Starting and stopping works as usual for a FreeBSD application:

Starting: /usr/local/etc/rc.d/haproxy start
Stopping: /usr/local/etc/rc.d/haproxy stop
Reloading: /usr/local/etc/rc.d/haproxy reload

Manjaro Linux

The simplest way to install haproxy on Manjaro is via the pamac command

sudo pamac install haproxy

This installs haproxy system wide as well as the systemd startup files to launch haproxy on boot. The configuration file is created at /etc/haproxy/haproxy.conf

To launch haproxy on boot one has to enable the service:

sudo systemctl enable haproxy

Starting, restarting and stopping can be done using systemctl:

Starting: systemctl start haproxy
Stopping: systemctl stop haproxy
Restarting: systemctl restart haproxy

Configuring `haproxy`

SSL certificates

Since one usually wants to expose JupyterLab to the public using SSL, one has to either generate a self-signed certificate as shown below or use some kind of certificate deployment mechanism (I personally use acme.sh with the DNS-01 method and some custom distribution mechanism from the certificate bot machine).

To generate a self signed certificate for internal use or testing one can use OpenSSL:

openssl genrsa -out selfsigned.key 2048
openssl req -new -key selfsigned.key -out selfsigned.csr
openssl x509 -req -days 365 -in selfsigned.csr -signkey selfsigned.key -out selfsigned.crt
cat selfsigned.crt selfsigned.key > selfsigned.pem

The PEM file including key and certificate is best stored in the same location as the configuration file, for example by copying selfsigned.pem to /usr/local/etc/haproxy/ssl.pem on FreeBSD or to /etc/haproxy/ssl.pem on Manjaro Linux.
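Before pointing haproxy at the bundle it can be worth checking that the concatenated file really contains both parts. The following Python sketch only inspects the PEM framing lines and does not validate the certificate or the key. The function names are made up for this example and the sample content is a dummy.

```python
import re

# Matches the opening line of a PEM block and captures its label.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_block_types(pem_text: str) -> list[str]:
    """Return the labels of all PEM blocks in the order they appear."""
    return PEM_BLOCK.findall(pem_text)


def looks_like_haproxy_bundle(pem_text: str) -> bool:
    """True if the text holds at least one certificate and one private key."""
    labels = pem_block_types(pem_text)
    has_cert = "CERTIFICATE" in labels
    # Depending on the OpenSSL version the key label is
    # "RSA PRIVATE KEY" or just "PRIVATE KEY".
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_cert and has_key


# Dummy content that only mimics the PEM framing; the payload is not a real key.
sample = (
    "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    "-----BEGIN RSA PRIVATE KEY-----\nBBBB\n-----END RSA PRIVATE KEY-----\n"
)
print(pem_block_types(sample))
print(looks_like_haproxy_bundle(sample))
```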

`haproxy.conf`

The next step is frontend, backend and user configuration. This is done in the haproxy configuration file (FreeBSD: /usr/local/etc/haproxy.conf, Manjaro Linux: /etc/haproxy/haproxy.conf)

The global section looks different on Linux and FreeBSD since FreeBSD sets users, chroot and logging as well as pidfile and daemon operation in its rc.d script while Manjaro does not. For FreeBSD the following global section is sufficient:

global
	daemon
	maxconn 20000

For Manjaro the following could be used:

global
	maxconn 20000
	log 127.0.0.1 local0
	user haproxy
	chroot /usr/share/haproxy
	pidfile /run/haproxy.pid
	daemon

The next section is the user configuration. When one stores the user database directly inside the configuration file one will use hashed passwords. Those are generated by the mkpasswd command in the shell (this can be installed on Manjaro using the whois package for example).

userlist examplerealm
	user exampleuser password $y$j....

The user entry always starts with user followed by a username; the string password indicates that a hashed password is going to follow. One can add one user per line.
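When there are more than a handful of users, the section can be generated instead of typed. This is a minimal Python sketch under the assumption that the hashes have already been produced with mkpasswd. The function name is hypothetical, and the check for a leading `$` is a simplification that only fits modern crypt(3) hash formats such as the `$y$` one shown above.

```python
def render_userlist(realm: str, users: dict[str, str]) -> str:
    """Render a haproxy userlist section from {username: hashed_password}."""
    lines = [f"userlist {realm}"]
    for name, hashed in users.items():
        # Assumption: modern crypt(3) hashes start with "$"; anything else is
        # treated as an accidentally pasted plain-text password.
        if not hashed.startswith("$"):
            raise ValueError(f"password for {name!r} does not look hashed")
        lines.append(f"\tuser {name} password {hashed}")
    return "\n".join(lines)


# The hash below is the shortened placeholder from above, not a usable hash.
section = render_userlist("examplerealm", {"exampleuser": "$y$j...."})
print(section)
```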

Now the frontend can be configured. A frontend is the component of haproxy that accepts incoming connections. Since JupyterLab uses token authentication for its XML HTTP requests (XHR), one has to prevent haproxy from rejecting requests that carry this token instead of basic auth credentials. This is simple since JupyterLab only uses a single static token all the time - one can directly match the correct token and skip authentication in this case. For all other requests one performs authentication, adds an X-Forwarded-For header as any good proxy should do and might want to enable the HTTP log:

frontend wwwsport
	bind :80
	bind :443 ssl crt /usr/local/etc/haproxy/ssl.pem
	mode http

	option httplog
	option dontlognull
	option forwardfor except 127.0.0.0/8

	acl correcttoken req.hdr(Authorization) -i -m str "token XXXXXXX"
	acl jupyauthok http_auth(examplerealm)

	http-request auth realm SampleRealm if !correcttoken !jupyauthok

	maxconn 3000
	timeout client 30s

	acl url_jupynotebook path_beg -i /examplebook

	use_backend examplejupyterbook if url_jupynotebook
	default_backend defaultwww
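The decision logic of this frontend can be modeled in a few lines of Python. This is only a sketch of the two ACLs and the backend selection, not of haproxy itself: the user database is a plain-text dictionary for brevity (haproxy compares against the hashed passwords of the userlist), and the function names, user name and password are made up for the example.

```python
import base64


def needs_auth_challenge(headers: dict[str, str], token: str, users: dict[str, str]) -> bool:
    """Mirror `http-request auth ... if !correcttoken !jupyauthok`."""
    auth = headers.get("Authorization", "")
    # correcttoken: case-insensitive exact match, like `-i -m str` in the ACL.
    correct_token = auth.lower() == f"token {token}".lower()
    # jupyauthok: valid basic auth credentials (plain-text lookup, see above).
    basic_ok = False
    scheme, _, payload = auth.partition(" ")
    if scheme.lower() == "basic":
        try:
            user, _, password = base64.b64decode(payload).decode().partition(":")
            basic_ok = users.get(user) == password
        except ValueError:  # malformed base64 or non-UTF-8 payload
            basic_ok = False
    # Both conditions are ANDed: challenge only if neither credential is valid.
    return not correct_token and not basic_ok


def choose_backend(path: str) -> str:
    """Mirror `use_backend examplejupyterbook if url_jupynotebook` (path_beg -i)."""
    return "examplejupyterbook" if path.lower().startswith("/examplebook") else "defaultwww"


users = {"exampleuser": "secret"}  # hypothetical credentials
basic = "Basic " + base64.b64encode(b"exampleuser:secret").decode()
print(needs_auth_challenge({}, "XXXXXXX", users))
print(needs_auth_challenge({"Authorization": "token XXXXXXX"}, "XXXXXXX", users))
print(needs_auth_challenge({"Authorization": basic}, "XXXXXXX", users))
print(choose_backend("/examplebook/lab"))
```

The second call shows the weak spot discussed in this article: a request that carries only the token is never asked for a username or password.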

Now the only missing part for haproxy is the backend configuration:

backend examplejupyterbook
	mode http
	balance roundrobin

	option forwardfor
	option http-server-close
	option redispatch

	timeout connect 10s
	timeout server 300s

	http-response del-header Authorization
	http-request set-header Authorization "token XXXXX"

	server examplejupyterbook jupyter.example.com:8888 check

Note that the token supplied in both `Authorization` header rules also has to be specified in the JupyterLab configuration later on.
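What the two header rules of the backend do to a request and a response can be sketched in Python as follows. Headers are modeled as plain dictionaries and the function names are hypothetical. The point is only that the basic auth credentials checked by the frontend are replaced by the token before the request reaches JupyterLab.

```python
def rewrite_request_for_backend(headers: dict[str, str], token: str) -> dict[str, str]:
    """Mirror `http-request set-header Authorization "token ..."`."""
    forwarded = dict(headers)
    # set-header replaces any existing value, so the basic auth credentials
    # never reach JupyterLab; it only ever sees the shared token.
    forwarded["Authorization"] = f"token {token}"
    return forwarded


def strip_response_header(headers: dict[str, str]) -> dict[str, str]:
    """Mirror `http-response del-header Authorization`."""
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}


incoming = {"Host": "www.example.com", "Authorization": "Basic ZXhhbXBsZQ=="}
print(rewrite_request_for_backend(incoming, "XXXXX"))
print(strip_response_header({"Content-Type": "text/html", "Authorization": "token XXXXX"}))
```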

In my case I also configured a default backend that serves a static site with an index at /:

backend defaultwww
	mode http
	balance roundrobin

	timeout connect 5s
	timeout server 5s

	server staticwwwserver www.example.com:80 check

After finishing up the configuration one can simply reload the configuration or start haproxy.

Installing JupyterLab

Ideally one creates a new Unix user that will later on run JupyterLab. Then install as usual using pip. In the following example it’s assumed this user is called myjupyteruser:

$ su myjupyteruser
$ cd ~
$ pip install jupyterlab

Then one can generate a new configuration file when one performs the installation manually:

$ jupyter-lab --generate-config

The configuration file will be stored in ~/.jupyter/jupyter_lab_config.py. Some minor modifications will be required:

c.NotebookApp.token="XXX" provides the same shared authentication token that has been specified above in the haproxy configuration.
c.NotebookApp.password="..." might be set to any password in addition when one also accesses the notebook server directly via its port instead of via the proxy.
c.ServerApp.base_url="/examplebook" sets the base path relative to the URL so one can share the same hostname and domain with other applications or run multiple instances.
c.NotebookApp.allow_origin="*" can be used to set the CORS policy. * is pretty unsafe, usually one should list all allowed hosts.
c.ServerApp.port = 8888 might be set to ensure that JupyterLab is always launched on the same port. This is especially important when one runs multiple instances.
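Since the configuration file is plain Python, the lines for several instances can be generated from a small helper. The sketch below only produces the text of the settings listed above, which would be appended to the generated file, where the object `c` is provided when Jupyter loads it. The option names are copied from the list above (newer releases may expect `c.ServerApp.*` for all of them, so check against the generated file), and the origin is a placeholder.

```python
def render_jupyter_config(token: str, base_url: str, port: int, allow_origin: str) -> str:
    """Return the configuration lines discussed above as Python source text."""
    settings = {
        "c.NotebookApp.token": token,
        "c.ServerApp.base_url": base_url,
        "c.NotebookApp.allow_origin": allow_origin,
        "c.ServerApp.port": port,
    }
    # repr() yields valid Python literals: quoted strings, bare integers.
    return "\n".join(f"{name} = {value!r}" for name, value in settings.items())


# Placeholder token and origin, not real values.
config_text = render_jupyter_config("XXX", "/examplebook", 8888, "https://jupyter.example.com")
print(config_text)
```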

Starting Jupyter

This basically was all of the required configuration. One can now start Jupyter and try out the new configuration. This could be done by simply launching Jupyter from the command line as a quick test:

/usr/bin/jupyter-lab --ip="192.0.2.1" --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative

FreeBSD

The best way to launch services on FreeBSD is an rc.d script, which could be placed at /usr/local/etc/rc.d/jupyterlab for example.

Manjaro Linux

The best way to launch services is a systemd unit file. This could be put into /etc/systemd/system/jupyter.service for example:

[Unit]
Description=Jupyter Lab

[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/usr/bin/jupyter-lab --ip=192.0.2.1 --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative
User=myjupyteruser
Group=myjupyteruser
WorkingDirectory=/home/myjupyteruser/notebooks
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Now one can start and enable the service:

$ sudo systemctl enable jupyter
$ sudo systemctl start jupyter
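Once both services are running, a quick smoke test through the proxy can be scripted. The Python sketch below builds a basic auth request for the status endpoint. The proxy hostname, user name and password are placeholders, and `/api/status` is assumed to be available below the configured base path. The request is only constructed here; sending it against a self-signed certificate would additionally require a matching SSL context.

```python
import base64
import urllib.request


def build_status_request(proxy_url: str, base_path: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a basic-auth request for the server status endpoint."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    url = proxy_url.rstrip("/") + "/" + base_path.strip("/") + "/api/status"
    return urllib.request.Request(url, headers={"Authorization": f"Basic {credentials}"})


# Hypothetical proxy host and credentials.
status_req = build_status_request("https://proxy.example.com", "/examplebook", "exampleuser", "secret")
print(status_req.full_url)
# To actually send it: urllib.request.urlopen(status_req, timeout=10)
```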

A word of caution

One cannot emphasize this enough - the workaround presented in this blog article leaks the authentication token used between JupyterLab and the proxy. Using this token anyone can perform any action - even through the proxy. This is due to a design problem in JupyterLab, which simply does not assume multiuser operation, and there is no simple stateless fix on the proxy side for this problem. So even when you remove a user or change a password, anyone who knows the token can still access JupyterLab and perform arbitrary actions - and thus also perform arbitrary actions with the Unix user account that JupyterLab is running under.
