03 Jun 2025 - tsp
Last update 03 Jun 2025
I remember a time when I didn’t particularly enjoy working with Jupyter Notebooks - or Python, for that matter. It felt too unstructured, too informal for “real” work. But as my professional toolkit expanded and the needs of my work in science evolved, I found myself returning to this environment more and more often. What started as reluctant usage slowly turned into recognition: Jupyter has a very particular kind of strength, especially when used as a digital lab book and interactive control interface for experiments. Suddenly, I could document everything I did - measurements, observations, thoughts, ideas - right next to the actual code that generated or analyzed the data or interacted with distributed control systems. That sort of closeness between narrative and logic is hard to beat.
Over time, I began using Jupyter as a central hub for a growing number of such applications.
Despite not relying on collaboration features - mainly due to concerns around data loss and platform stability (based on experience; look up the reported issues about blank notebooks, especially with unstable network connections - behavior that is not justifiable for any web application) - I managed to create a robust workflow. My mobile phone automatically tags images with GPS metadata, uploads them via the (non-free) FolderSync application when I'm back in a trusted network, and makes them available for documentation without any manual intervention. From within the notebook, I can schedule tasks and notifications, edit my calendar, query sensor data or issue real-time commands to my automation infrastructure using Python, MQTT and a few custom libraries that are simply importable.
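To give an impression of what such a control cell can look like, here is a minimal sketch that publishes a command over MQTT using the plain paho-mqtt helper instead of my custom libraries - the broker hostname, topic layout and payload are hypothetical placeholders:
# Minimal sketch: publish a command to an automation node over MQTT.
# Broker hostname, topic and payload are hypothetical examples.
import json
import paho.mqtt.publish as publish

# Ask a (hypothetical) relay node to switch its first channel on
publish.single(
    "home/lab/relay01/set",
    payload=json.dumps({"channel": 1, "state": "on"}),
    qos=1,
    hostname="broker.example.local"
)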
However, despite all the interactivity, when it comes to searching, sharing or archiving these notebooks, I prefer something more lightweight and durable: static HTML. It's cleaner, faster and easier to access on any device. There's no need for client-side JavaScript, no risk of losing functionality due to future incompatibilities and no risk of accidentally editing or destroying content. In this article, I'll show how I convert my Jupyter notebooks into static HTML in a very simple and minimalistic way - no fancy indexing or dashboards, just clean, readable output that can serve as a lasting part of a lab book or digital diary. It's a modest setup, but it captures what I like about Jupyter today: the seamless blend of narrative, computation and automation. With Python, Matplotlib, Pillow, MQTT and some glue code of my own, I've turned Jupyter into a personal mixture of notebook, lab console and creative canvas.
The solution presented in this article is of course not the only way - and compared to other solutions it's very simplistic. I wanted a solution with minimal dependencies that I can trust and that will most likely still run decades from now. It's based on Jupyter's own nbconvert utility; more advanced solutions exist, but they come with far more moving parts than I wanted for this purpose.
The basic idea is to run a simple script that iterates over all directories that should be converted. For each ipynb file it checks whether a corresponding HTML file already exists at a target path that mirrors the same directory structure. If a corresponding HTML file exists and is newer than the last-modified date of the ipynb file, nothing happens. If the ipynb file is newer or no HTML file exists, jupyter nbconvert is executed - passing it my image resizing filter to provide a sane size for my photographs in the rendered output. After all pages have been processed, a very simple index page is generated (note that this keeps a list of all pages in memory, so it will struggle with very large sites - and the index is very basic, nothing more than a mere list of links containing the first-level heading as link text). The script signals through its exit code whether anything changed (0 when pages were regenerated, 1 otherwise), so the shell script that wraps it can follow up with an rsync call only when something actually changed.
Without further explanation - this is the script that I'm using at the moment:
import os
import subprocess
import json
import sys
from pathlib import Path

# Update the following to match the deployment
SOURCE_DIR = Path("Diary")        # root directory containing .ipynb files
OUTPUT_DIR = Path("Diary HTML")   # parallel directory for .html files
INDEX_PATH = Path("Diary HTML/index.html")

os.chdir("/usr/home/exampleuser/jupyter")

# Do not modify starting from here till the index page
nbindex_toplevel = []
grouped_by_subdir = {}
has_updated = False

def extract_first_heading_from_ipynb(nb_path):
    # Return the first Markdown heading of the notebook as its title;
    # fall back to the file path if none is found or the file is unreadable
    try:
        with open(nb_path, 'r', encoding='utf-8') as f:
            nb = json.load(f)
        for cell in nb.get("cells", []):
            if cell.get("cell_type") == "markdown":
                lines = cell.get("source", [])
                for line in lines:
                    if line.strip().startswith("#"):
                        return line.lstrip("#").strip()
    except Exception:
        pass
    return str(nb_path)

for ipynb_path in SOURCE_DIR.rglob("*.ipynb"):
    if ".ipynb_checkpoints" in ipynb_path.parts:
        continue

    rel_path = ipynb_path.relative_to(SOURCE_DIR)
    html_path = OUTPUT_DIR / rel_path.with_suffix(".html")
    html_rel_path = rel_path.with_suffix(".html")

    # Get title (we need this for all files)
    title = extract_first_heading_from_ipynb(ipynb_path.resolve())

    if rel_path.parent == Path("."):
        nbindex_toplevel.append((html_rel_path.as_posix(), title))
    else:
        group = rel_path.parent.as_posix()
        grouped_by_subdir.setdefault(group, []).append((html_rel_path.as_posix(), title))

    if html_path.exists() and html_path.stat().st_mtime >= ipynb_path.stat().st_mtime:
        continue  # skip conversion

    html_path.parent.mkdir(parents=True, exist_ok=True)

    cmd = [
        "jupyter", "nbconvert", str(ipynb_path.resolve()),
        "--embed-images",
        "--to", "html",
        "--output", str(html_path.resolve()),
        "--HTMLExporter.preprocessors=['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']"
    ]
    subprocess.run(cmd)  # , cwd=html_path.parent)
    has_updated = True

if has_updated:
    with open(INDEX_PATH, "w", encoding="utf-8") as f:
        f.write("<html><head><title>Diary</title></head><body><h1>Diary</h1>\n")

        # Top-level files
        if nbindex_toplevel:
            f.write("<h2>Top level entries</h2>\n<ul>\n")
            for rel_link, title in sorted(nbindex_toplevel):
                f.write(f'<li><a href="{rel_link}">{title}</a></li>\n')
            f.write("</ul>\n")

        # Subdirectories
        for group, entries in sorted(grouped_by_subdir.items()):
            f.write(f"<h2>{group}</h2>\n<ul>\n")
            for rel_link, title in sorted(entries):
                f.write(f'<li><a href="{rel_link}">{title}</a></li>\n')
            f.write("</ul>\n")

        f.write("</body></html>")
    sys.exit(0)
else:
    sys.exit(1)
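When it runs, the generated index is nothing more than a plain list of links. For a hypothetical tree with one top-level notebook and one subdirectory, the output would look like this (titles and file names are made up):
<html><head><title>Diary</title></head><body><h1>Diary</h1>
<h2>Top level entries</h2>
<ul>
<li><a href="2025-06-01.html">First June entry</a></li>
</ul>
<h2>Projects</h2>
<ul>
<li><a href="Projects/sensor-node.html">Sensor node bring-up</a></li>
</ul>
</body></html>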
The markdown preprocessor is the same one as shown in a previous article. It resizes all image references in the Markdown sections to a sane size.
from nbconvert.preprocessors import Preprocessor
from traitlets import Integer
from PIL import Image
import base64
import os
import re
from io import BytesIO
from pathlib import Path

class InlineMarkdownImagesPreprocessor(Preprocessor):
    max_size = Integer(600, help="Maximum width/height in pixels").tag(config=True)
    jpeg_quality = Integer(85, help="JPEG compression quality").tag(config=True)

    def preprocess(self, nb, resources):
        self.notebook_dir = Path(resources.get('metadata', {}).get('path', '.')).resolve()
        return super().preprocess(nb, resources)

    def process_img(self, path):
        # If the supplied image is not a filename (for example in case it's
        # already a base64 string or similar) we ignore it
        try:
            full_path = (self.notebook_dir / path).resolve()
        except Exception:
            return None
        if not os.path.isfile(full_path):
            return None

        # Now try to open the image using Pillow and utilize the thumbnail
        # method. This only resizes in case the image is larger along one
        # of the supplied dimensions and preserves the aspect ratio (in-place)
        try:
            img = Image.open(full_path).convert("RGB")
            img.thumbnail((self.max_size, self.max_size), Image.LANCZOS)
            buffer = BytesIO()
            img.save(buffer, format="JPEG", quality=self.jpeg_quality)
            encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
            return f'data:image/jpeg;base64,{encoded}'
        except Exception as e:
            # In case we had not been able to modify the image we keep the
            # original
            print(f"Failed to inline image {path}: {e}")
            return None

    # This is the overridden public method that is called by Preprocessor
    def preprocess_cell(self, cell, resources, cell_index):
        # We only handle _markdown_ cells (this does not affect
        # the output of code cells for example).
        if cell.cell_type == "markdown":
            # Find Markdown-style and HTML-style image references
            # utilizing regular expressions - this is not proper
            # parsing of HTML though it should be sufficient for
            # Jupyter notebooks. If not this should be replaced with
            # proper markdown and HTML parsing.
            def replace_match(match):
                path = match.group(1) or match.group(2)
                b64 = self.process_img(path)
                if b64:
                    return f'<img src="{b64}" style="max-width:100%;">'
                else:
                    return match.group(0)

            # Regex for ![alt](file.jpg) or <img src="file.jpg">
            cell.source = re.sub(
                r'!\[.*?\]\(([^)]+)\)|<img\s+[^>]*src="([^"]+)"[^>]*>',
                replace_match,
                cell.source
            )
        return cell, resources
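Since max_size and jpeg_quality are declared as configurable traitlets, they can also be overridden on the nbconvert command line without touching any code - the same mechanism nbconvert uses for its built-in preprocessors. A hypothetical invocation with different limits could look like this:
jupyter nbconvert Notebook.ipynb --embed-images --to html \
    --HTMLExporter.preprocessors="['inline_markdown_images_preprocessor.InlineMarkdownImagesPreprocessor']" \
    --InlineMarkdownImagesPreprocessor.max_size=800 \
    --InlineMarkdownImagesPreprocessor.jpeg_quality=70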
Everything is now glued together with a shell script that is executed via cron at fixed intervals. There is no inode-watching to update on demand; checking for modifications is done periodically. The script I'm using at the moment is:
#!/bin/sh
cd /usr/home/exampleuser/jupyter
python3.11 make_html.py
if [ $? -eq 0 ]; then
    rsync -av /home/exampleuser/jupyter/Diary\ HTML/ exampleuser@remote.example.com:/usr/www/www.example.com/www/diary/
fi
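For completeness, the crontab entry triggering this wrapper could look like the following - the script name, interval and log path are just examples:
# Regenerate and publish the diary every 15 minutes
*/15 * * * * /usr/home/exampleuser/jupyter/publish_diary.sh >> /usr/home/exampleuser/logs/publish_diary.log 2>&1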
The solution shown here is a very crude and extremely simple method to generate a static rendering of the Jupyter notebooks contained in a specific directory. It's useful for small-scale deployments where larger solutions would simply be too much effort to set up. I've also used hardcoded paths in this case, since it's really only meant to be used for a single site.