03 Jun 2025 - tsp
Last update 03 Jun 2025
I remember a time when I didn’t particularly enjoy working with Jupyter Notebooks - or Python, for that matter. It felt too unstructured, too informal for “real” work. But as my professional toolkit expanded and the needs of my work in science evolved, I found myself returning to this environment more and more often. What started as reluctant usage slowly turned into recognition: Jupyter has a very particular kind of strength, especially when used as a digital lab book and interactive control interface for experiments. Suddenly, I could document everything I did - measurements, observations, thoughts, ideas - right next to the actual code that generated or analyzed the data or interacted with distributed control systems. That sort of closeness between narrative and logic is hard to beat.
Over time, I began using Jupyter as a central hub for a growing number of such applications.
Despite not relying on collaboration features - mainly due to concerns around data loss and platform stability (based on experience; look up the reported issues about blank notebooks, especially with unstable network connections - behavior that is not justifiable for any web application) - I managed to create a robust workflow. My mobile phone automatically tags images with GPS metadata, uploads them via the (non-free) FolderSync application when I'm back in a trusted network, and makes them available for documentation without any manual intervention. From within the notebook, I can schedule tasks and notifications, edit my calendar, query sensor data or issue real-time commands to my automation infrastructure using Python, MQTT and a few custom libraries that are simply importable.
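To give an impression of what such a control cell can look like, here is a minimal sketch that publishes a command over MQTT using the plain paho-mqtt helper instead of my custom libraries - the broker hostname, topic layout and payload are hypothetical placeholders:
# Minimal sketch: publish a command to an automation node over MQTT.
# Broker hostname, topic and payload are hypothetical examples.
import json
import paho.mqtt.publish as publish

# Ask a (hypothetical) relay node to switch its first channel on
publish.single(
    "home/lab/relay01/set",
    payload=json.dumps({"channel": 1, "state": "on"}),
    qos=1,
    hostname="broker.example.local"
)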
However, despite all the interactivity, when it comes to searching, sharing or archiving these notebooks, I prefer something more lightweight and durable: static HTML. It's cleaner, faster and easier to access on any device. There's no need for client-side JavaScript, no risk of losing functionality due to future incompatibilities and no risk of accidentally editing or destroying content. In this article, I'll show how I convert my Jupyter notebooks into static HTML in a very simple and minimalistic way - no fancy indexing or dashboards, just clean, readable output that can serve as a lasting part of a lab book or digital diary. It's a modest setup, but it captures what I like about Jupyter today: the seamless blend of narrative, computation and automation. With Python, Matplotlib, Pillow, MQTT and some glue code of my own, I've turned Jupyter into a personal mixture of notebook, lab console and creative canvas.
The solution presented in this article is of course not the only way - and compared to other solutions it's very simplistic. I wanted a solution with minimal dependencies that I can trust and that will most likely still run decades from now. It's based on Jupyter's own nbconvert utility; more advanced solutions exist, but they come with far more moving parts than I wanted for this purpose.
The basic idea is to run a simple script that iterates over all directories that should be converted. For each ipynb file it checks whether a corresponding HTML file already exists at a target path that mirrors the same directory structure. If a corresponding HTML file exists and is newer than the last-modified date of the ipynb file, nothing happens. If the ipynb file is newer or no HTML file exists, jupyter nbconvert is executed - passing it my image resizing filter to provide a sane size for my photographs in the rendered output. After all pages have been processed, a very simple index page is generated (note that this keeps a list of all pages in memory, so it will struggle with very large sites - and the index is very basic, nothing more than a mere list of links containing the first-level heading as link text). The script signals through its exit code whether anything changed (0 when pages were regenerated, 1 otherwise), so the shell script that wraps it can follow up with an rsync call only when something actually changed.
Without further explanation - this is the script that I'm using at the moment:
import os
import subprocess
import json
import sys
from pathlib import Path

# Update the following to match the deployment
SOURCE_DIR = Path("Diary")        # root directory containing .ipynb files
OUTPUT_DIR = Path("Diary HTML")   # parallel directory for .html files
INDEX_PATH = Path("Diary HTML/index.html")

os.chdir("/usr/home/exampleuser/jupyter")

# Do not modify starting from here till the index page
nbindex_toplevel = []
grouped_by_subdir = {}
has_updated = False

def extract_first_heading_from_ipynb(nb_path):
    # Return the first Markdown heading of the notebook as its title;
    # fall back to the file path if none is found or the file is unreadable
    try:
        with open(nb_path, 'r', encoding='utf-8') as f:
            nb = json.load(f)
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "markdown":
                lines = cell.get("source", [])
                for line in lines:
                    if line.strip().startswith("#"):
                        return line.lstrip("#").strip()
    except Exception:
        pass
    return str(nb_path)

for ipynb_path in SOURCE_DIR.rglob("*.ipynb"):
    if ".ipynb_checkpoints" in ipynb_path.parts:
        continue

    rel_path = ipynb_path.relative_to(SOURCE_DIR)
    html_path = OUTPUT_DIR / rel_path.with_suffix(".html")
    html_rel_path = rel_path.with_suffix(".html")

    # Get title (we need this for all files)
    title = extract_first_heading_from_ipynb(ipynb_path.resolve())

    if rel_path.parent == Path("."):
        nbindex_toplevel.append((html_rel_path.as_posix(), title))
    else:
        group = rel_path.parent.as_posix()
        grouped_by_subdir.setdefault(group, []).append((html_rel_path.as_posix(), title))

    if html_path.exists() and html_path.stat().st_mtime >= ipynb_path.stat().st_mtime:
        continue  # skip conversion

    html_path.parent.mkdir(parents=True, exist_ok=True)

    cmd = [
        "jupyter", "nbconvert", str(ipynb_path.resolve()),
        "--embed-images",
        "--to", "html",
        "--output", str(html_path.resolve()),
        "--HTMLExporter.preprocessors=['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']"
    ]
    subprocess.run(cmd)  # , cwd=html_path.parent)
    has_updated = True

if has_updated:
    with open(INDEX_PATH, "w", encoding="utf-8") as f:
        f.write("<html><head><title>Diary</title></head><body><h1>Diary</h1>\n")

        # Top-level files
        if nbindex_toplevel:
            f.write("<h2>Top level entries</h2>\n<ul>\n")
            for rel_link, title in sorted(nbindex_toplevel):
                f.write(f'<li><a href="{rel_link}">{title}</a></li>\n')
            f.write("</ul>\n")

        # Subdirectories
        for group, entries in sorted(grouped_by_subdir.items()):
            f.write(f"<h2>{group}</h2>\n<ul>\n")
            for rel_link, title in sorted(entries):
                f.write(f'<li><a href="{rel_link}">{title}</a></li>\n')
            f.write("</ul>\n")

        f.write("</body></html>")
    sys.exit(0)
else:
    sys.exit(1)
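When it runs, the generated index is nothing more than a plain list of links. For a hypothetical tree with one top-level notebook and one subdirectory, the output would look like this (titles and file names are made up):
<html><head><title>Diary</title></head><body><h1>Diary</h1>
<h2>Top level entries</h2>
<ul>
<li><a href="2025-06-01.html">First June entry</a></li>
</ul>
<h2>Projects</h2>
<ul>
<li><a href="Projects/sensor-node.html">Sensor node bring-up</a></li>
</ul>
</body></html>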
The markdown preprocessor is the same one as shown in a previous article. It resizes all image references in the Markdown sections to a sane size.
from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from PIL import Image
import base64
import os
import re
from io import BytesIO
from pathlib import Path

class InlineMarkdownImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def preprocess(self, nb, resources):
        self.notebook_dir = Path(resources.get('metadata', {}).get('path', '.')).resolve()
        return super().preprocess(nb, resources)

    def process_img(self, path):
        # If the supplied image is not a filename (for example in case it's
        # already a base64 string or similar) we ignore it
        try:
            full_path = (self.notebook_dir / path).resolve()
        except Exception:
            return None
        if not os.path.isfile(full_path):
            return None

        # Now try to open the image using Pillow and utilize the thumbnail
        # method. This only resizes in case the image is larger along one
        # of the supplied dimensions and preserves the aspect ratio (in-place)
        try:
            img = Image.open(full_path).convert("RGB")
            img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)
            buffer = BytesIO()
            img.save(buffer, format="JPEG", quality=self.jpeg_quality)
            encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
            return f'data:image/jpeg;base64,{encoded}'
        except Exception as e:
            # In case we had not been able to modify the image we keep the
            # original
            print(f"Failed to inline image {path}: {e}")
            return None

    # This is the overridden public method that is called by Preprocessor
    def preprocess_cell(self, cell, resources, cell_index):
        # We only handle _markdown_ cells (this does not affect
        # the output of code cells for example).
        if cell.cell_type == "markdown":
            # Find Markdown-style and HTML-style image references
            # utilizing regular expressions - this is not proper
            # parsing of HTML though it should be sufficient for
            # Jupyter notebooks. If not this should be replaced with
            # proper markdown and HTML parsing.
            def replace_match(match):
                path = match.group(1) or match.group(2)
                b64 = self.process_img(path)
                if b64:
                    return f'<img src="{b64}" style="max-width:100%;">'
                else:
                    return match.group(0)

            # Regex for ![alt](file.jpg) or <img src="file.jpg">
            cell.source = re.sub(
                r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>',
                replace_match,
                cell.source
            )
        return cell, resources
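Since max_size and jpeg_quality are declared as configurable traitlets, they can also be overridden on the nbconvert command line without touching any code - the same mechanism nbconvert uses for its built-in preprocessors. A hypothetical invocation with different limits could look like this:
jupyter nbconvert Notebook.ipynb --embed-images --to html \
    --HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']" \
    --InlineMarkdownImagesPreprocessor.max_size=800 \
    --InlineMarkdownImagesPreprocessor.jpeg_quality=70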
Everything is now glued together with a shell script that is executed via cron at fixed intervals. There is no inode-watching to update on demand; checking for modifications is done periodically. The script I'm using at the moment is:
#!/bin/sh
cd /usr/home/exampleuser/jupyter
python3.11 make_html.py
if [ $? -eq 0 ]; then
    rsync -av /home/exampleuser/jupyter/Diary\ HTML/ exampleuser@remote.example.com:/usr/www/www.example.com/www/diary/
fi
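For completeness, the crontab entry triggering this wrapper could look like the following - the script name, interval and log path are just examples:
# Regenerate and publish the diary every 15 minutes
*/15 * * * * /usr/home/exampleuser/jupyter/publish_diary.sh >> /usr/home/exampleuser/logs/publish_diary.log 2>&1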
The solution shown here is a very crude and extremely simple method to generate a static rendering of the Jupyter notebooks contained in a specific directory. It's useful for small-scale deployments where larger solutions would simply be too much effort to set up. I've also used hardcoded paths in this case, since it's really only meant to be used for a single site.