27 Jan 2023 - tsp
Last update 27 Jan 2023
TL;DR In theory one can put any JupyterLab installation behind any reverse proxy that performs HTTP authentication. But since Jupyter passes its internal token along with all XMLHttpRequest (XHR) calls, it leaks that token and replaces any HTTP basic or digest authentication information during those requests. This means one has to match the shared persistent token on the proxy, and that token is also leaked to all users, who could use it to access all content on the Jupyter notebook. Depending on the environment in which those notebooks are used this might be no solution at all.
So since I had to set up a JupyterLab instance at work and we required a proxied setup (that is, we wanted to run the Jupyter instance behind a reverse proxy that would handle SSL termination, validation of requests, load balancing and authentication), and since it turned out to be not as straightforward as one would like, I decided to write this short summary about the usage of haproxy in front of JupyterLab. The work relation is also the reason why in this blog article all actions will be described for Manjaro Linux as well as my personal all time favorite Unixoid operating system, FreeBSD.
Note that there currently is a problem with the way JupyterLab handles authentication for its XMLHttpRequest calls: this solution leaks the authentication token, which never changes and is shared by all users, to every authenticated user. This token can in turn be used to perform arbitrary requests on the JupyterLab instance; by design JupyterLab is not a multiuser solution. So although it looks like users get their own username and password combination, they can still fetch the universal authentication token and perform actions with it even after their user accounts have been removed. This has to be fixed in JupyterLab itself, there is no way to do this on the proxy (and as far as the bugtracker history goes I don’t think there is a huge intent to fix this).
haproxy
The first step is the installation of the proxy server. In this case haproxy has been used. This is an open source reverse proxy and load balancer for TCP and HTTP connections.
On FreeBSD the simplest way to install haproxy is the package manager pkg:
$ sudo pkg install haproxy
One could also install using ports:
# cd /usr/ports/net/haproxy
# make install clean
This installs haproxy system wide as well as the rc.d startup script at /usr/local/etc/rc.d/haproxy. To launch haproxy on boot one can add the line
haproxy_enable="YES"
to /etc/rc.conf. Starting and stopping works as usual for a FreeBSD application:
/usr/local/etc/rc.d/haproxy start
/usr/local/etc/rc.d/haproxy stop
/usr/local/etc/rc.d/haproxy reload
The simplest way to install haproxy on Manjaro is via the pamac command:
sudo pamac install haproxy
This installs haproxy system wide as well as the systemd startup files to launch haproxy on boot. The configuration file is created at /etc/haproxy/haproxy.conf. To enable haproxy on boot one has to enable it using
sudo systemctl enable haproxy
Starting, restarting and stopping can be done using systemctl:
systemctl start haproxy
systemctl stop haproxy
systemctl restart haproxy
haproxy
SSL certificates
Since one usually wants to expose JupyterLab to the public using SSL one has to either generate a self signed certificate as shown below or use some kind of certificate deployment mechanism (I personally use acme.sh with DNS-01 method and some custom distribution mechanism from the certificate bot machine).
To generate a self signed certificate for internal use or testing one can use OpenSSL:
openssl genrsa -out selfsigned.key 2048
openssl req -new -key selfsigned.key -out selfsigned.csr
openssl x509 -req -days 365 -in selfsigned.csr -signkey selfsigned.key -out selfsigned.crt
cat selfsigned.crt selfsigned.key > selfsigned.pem
The PEM file including key and certificate is best stored in the same location as the configuration file (/usr/local/etc/haproxy/ssl.pem for FreeBSD or /etc/haproxy/ssl.pem for Manjaro Linux).
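Before pointing haproxy at the combined file it may be worth checking that the certificate and the key actually belong together. The following is a minimal sketch using Python's standard ssl module, assuming the file is called selfsigned.pem as in the commands above; the helper name pem_is_loadable is made up for this example.

```python
import ssl


def pem_is_loadable(pem_path: str) -> bool:
    """Return True if pem_path holds a certificate and its matching private key."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # With no separate keyfile given, the key is read from the same file.
        context.load_cert_chain(pem_path)
    except OSError:
        # Covers a missing file as well as ssl.SSLError (bad or mismatched key).
        return False
    return True


if __name__ == "__main__":
    # Assumed file name, taken from the openssl commands above.
    print(pem_is_loadable("selfsigned.pem"))
```

This only checks that the file can be loaded as a certificate chain with key. It says nothing about whether clients will trust the certificate.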
haproxy.conf
The next step is frontend, backend and user configuration. This is done in the haproxy configuration file (FreeBSD: /usr/local/etc/haproxy.conf, Manjaro Linux: /etc/haproxy/haproxy.conf).
The global section looks different on Linux and FreeBSD since FreeBSD sets users, chroot and logging as well as pidfile and daemon operation in its rc.d script while Manjaro does not. For FreeBSD the following global section is sufficient:
global
daemon
maxconn 20000
For Manjaro the following could be used:
global
maxconn 20000
log 127.0.0.1 local0
user haproxy
chroot /usr/share/haproxy
pidfile /run/haproxy.pid
daemon
The next section is the user configuration. When one stores the user database directly inside the configuration file one should use hashed passwords. Those are generated by the mkpasswd command in the shell (on Manjaro this can be installed using the whois package for example).
userlist examplerealm
user exampleuser password $y$j....
The user entry always starts with user followed by a username; the string password indicates that a hashed password is going to follow. One can add one user per line.
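When more than a handful of users have to be maintained, the userlist section could be rendered from a small table instead of being typed by hand. This is only a sketch: render_userlist is a hypothetical helper, the four space indentation is a cosmetic choice, and the hash in the usage line is a placeholder that has to be replaced by real mkpasswd output.

```python
def render_userlist(realm: str, users: dict[str, str]) -> str:
    """Render a haproxy userlist section with one 'user' line per entry.

    users maps each user name to an already hashed password.
    """
    lines = [f"userlist {realm}"]
    for name, password_hash in users.items():
        # A user name is a single word on the config line, so reject whitespace.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid user name: {name!r}")
        lines.append(f"    user {name} password {password_hash}")
    return "\n".join(lines)


if __name__ == "__main__":
    # HASH_FROM_MKPASSWD is a placeholder, not a usable hash.
    print(render_userlist("examplerealm", {"exampleuser": "HASH_FROM_MKPASSWD"}))
```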
Now the frontend can be configured. A frontend is the component of haproxy that accepts incoming connections. Since JupyterLab uses token authentication for its XMLHttpRequest (XHR) calls one has to prevent haproxy from stripping or failing requests that carry the token. This is simple since JupyterLab only uses a single static token all the time: one can directly match the correct token and skip authentication in this case. Otherwise one performs authentication on all unauthenticated requests, adds an X-Forwarded-For header (option forwardfor) as any good proxy should do and might want to enable HTTP logging:
frontend wwwsport
bind :80
bind :443 ssl crt /usr/local/etc/haproxy/ssl.pem
mode http
option httplog
option dontlognull
option forwardfor except 127.0.0.0/8
acl correcttoken req.hdr(Authorization) -i -m str "token XXXXXXX"
acl jupyauthok http_auth(examplerealm)
http-request auth realm SampleRealm if !correcttoken !jupyauthok
maxconn 3000
timeout client 30s
acl url_jupynotebook path_beg -i /examplebook
use_backend examplejupyterbook if url_jupynotebook
default_backend defaultwww
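The interplay of the two ACLs and the http-request auth rule can be easier to follow as plain code. The following Python sketch is only a model of the decision, not of how haproxy is implemented: the function names are invented, and the user table holds a plaintext password purely for illustration, whereas haproxy compares against the hashes from the userlist.

```python
import base64

# Hypothetical stand-ins for the values in the haproxy configuration above.
SHARED_TOKEN = "XXXXXXX"
USERS = {"exampleuser": "examplepassword"}  # plaintext only for illustration


def token_matches(headers: dict) -> bool:
    """Model of the 'correcttoken' ACL: case-insensitive full string match."""
    value = headers.get("Authorization", "")
    return value.lower() == f"token {SHARED_TOKEN}".lower()


def basic_auth_ok(headers: dict) -> bool:
    """Rough model of http_auth(examplerealm): check HTTP basic credentials."""
    scheme, _, payload = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        user, _, password = base64.b64decode(payload).decode().partition(":")
    except ValueError:
        # Malformed base64 or undecodable bytes count as failed authentication.
        return False
    return USERS.get(user) == password


def must_challenge(headers: dict) -> bool:
    """Model of 'http-request auth ... if !correcttoken !jupyauthok'."""
    return not token_matches(headers) and not basic_auth_ok(headers)
```

A request is thus only challenged when it carries neither the shared token nor valid credentials, which is exactly why anybody who has learned the token passes the proxy without a user account.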
Now the only missing part for haproxy is the backend configuration:
backend examplejupyterbook
mode http
balance roundrobin
option forwardfor
option http-server-close
option redispatch
timeout connect 10s
timeout server 300s
http-response del-header Authorization
http-request set-header Authorization "token XXXXXXX"
server examplejupyterbook jupyter.example.com:8888 check
Note that the token supplied in both Authorization header rules (the correcttoken ACL in the frontend and the set-header rule in the backend) also has to be specified in the JupyterLab configuration later on.
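Since the same token ends up in three places it makes sense to generate it once and paste it everywhere. A minimal sketch using Python's secrets module follows; the helper name new_shared_token and the length of 32 bytes are arbitrary choices for this example, not something JupyterLab or haproxy require.

```python
import secrets


def new_shared_token(num_bytes: int = 32) -> str:
    """Return a random hex token (two hex characters per byte)."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    token = new_shared_token()
    # The three places that have to agree on the token:
    print(f'acl correcttoken req.hdr(Authorization) -i -m str "token {token}"')
    print(f'http-request set-header Authorization "token {token}"')
    print(f'c.NotebookApp.token="{token}"')
```

Keep in mind that a stronger token does not fix the leak described in this article. It only makes the token harder to guess for people who have never been logged in.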
In my case I also configured a default backend serving a static site for an index at /:
backend defaultwww
mode http
balance roundrobin
timeout connect 5s
timeout server 5s
server staticwwwserver www.example.com:80 check
After finishing up the configuration one can simply reload the configuration or start haproxy.
In the best case one creates a new Unix user that will later on run JupyterLab. Then install as usual using pip. In the following example it’s assumed this user is called myjupyteruser:
$ su myjupyteruser
$ cd ~
$ pip install jupyterlab
Then one can generate a new configuration file when one performs the installation manually:
$ jupyter-lab --generate-config
The configuration file will be stored in ~/.jupyter/jupyter_lab_config.py. Some minor modifications will be required:
c.NotebookApp.token="XXX" will be used to provide the same shared authentication token as has been specified above in the haproxy configuration.
c.NotebookApp.password="..." might be set to any password in addition when also accessing the notebook server via its port directly instead of via the proxy.
c.ServerApp.base_url="/examplebook" allows one to set the base path relative to the URL so one can share the same hostname and domain with other applications or run multiple instances.
c.NotebookApp.allow_origin="*" can be used to set the CORS policy. * is pretty unsafe, usually one should list all allowed hosts.
c.ServerApp.port = 8888 (commented out by default) might be set to ensure that JupyterLab is always launched at the same port. This is especially important when one runs multiple instances.
This basically was all of the required configuration. One can now start Jupyter and try out the new configuration. This could be done by simply launching Jupyter from the command line as a quick test:
/usr/bin/jupyter-lab --ip="192.0.2.1" --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative
On FreeBSD the best way to launch services is an rc.d script. This could be put into /usr/local/etc/rc.d/jupyterlab for example.
On Manjaro the best way to launch services is a systemd unit file. This could be put into /etc/systemd/system/jupyter.service for example:
[Unit]
Description=Jupyter Lab
[Service]
Type=simple
PIDFile=/run/jupyter.pid
ExecStart=/usr/bin/jupyter-lab --ip=192.0.2.1 --no-browser --notebook-dir=/home/myjupyteruser/notebooks --collaborative
User=myjupyteruser
Group=myjupyteruser
WorkingDirectory=/home/myjupyteruser/notebooks
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
Now one can start and enable the service:
$ sudo systemctl enable jupyter
$ sudo systemctl start jupyter
One cannot emphasize this enough: the workaround presented in this blog article leaks the authentication token used between JupyterLab and the proxy. Using this token anyone can perform any action, even through the proxy. This is due to a design problem in JupyterLab, which simply does not assume multiuser operation, and there is no simple stateless fix on the proxy side for this problem. So even when you remove a user or change a password, anyone who knows the token can still access JupyterLab and perform arbitrary actions, and thus also perform arbitrary actions with the Unix user account that JupyterLab is running under.
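To make the consequence concrete, the following sketch shows what a request from somebody who only knows the token would look like. It assumes the proxy is reachable as https://proxy.example.com (a made up host name), that the base URL is /examplebook as configured above, and that the contents endpoint of the Jupyter REST API is used as target; the function name token_only_request is hypothetical as well.

```python
import urllib.request


def token_only_request(base_url: str, token: str) -> urllib.request.Request:
    """Build a request that carries only the shared token, no user credentials."""
    url = base_url.rstrip("/") + "/api/contents"
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


# Usage (not executed here, since it needs a running setup):
#   req = token_only_request("https://proxy.example.com/examplebook", "XXXXXXX")
#   with urllib.request.urlopen(req) as response:
#       print(response.status)
```

With the haproxy configuration shown above such a request is expected to match the token ACL and to be forwarded without any basic authentication challenge, independent of which user accounts still exist in the userlist.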
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/