Using Aspell to perform spellchecking (manually and inside the build pipeline)

05 Jul 2020 - tsp
Last update 05 Jul 2020

Now that my page has grown larger and larger, and since most articles are written late at night or during short breaks, I decided it was time to include some kind of spell checking in my writing and publishing process. The typical tool for this on Unices is aspell.

Installation

It’s easily installable on FreeBSD using the textproc/aspell package together with the desired dictionaries.

sudo pkg install textproc/aspell
sudo pkg install textproc/en-aspell

Manual (interactive) checking

The basic interactive usage on the command line to check a markdown document is rather simple:

aspell --dont-backup -p dictionary.pwd -M check 2020-06-12-nacaprofiles3d.md

The -p switch specifies a user dictionary. This allows me to keep a user dictionary inside my repository so that the fully automated pipeline uses the same user dictionary without anyone having to modify the master dictionary on the build machine manually.

The --dont-backup switch suppresses the generation of a backup file that has the same filename as the checked file, just with a .bak extension. The -M switch enables the Markdown filter, which suppresses error detection inside Markdown markup and inside code tags.
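A user dictionary for the -p switch can also be seeded by hand before the first interactive run. The following is only a minimal sketch: the sample words are made up for illustration, and the header line follows the personal word list format as I understand it from the aspell documentation (format tag, language, and a word count that is only used as a hint).

```shell
#!/bin/sh
# Create a project-local personal word list for aspell.
# DICT and the sample words are assumptions made for this illustration.
DICT="${DICT:-./dictionary.pwd}"

if [ ! -f "$DICT" ]; then
    # Header line: format tag, language, word count (treated as a hint only)
    printf 'personal_ws-1.1 en 0\n' > "$DICT"
fi

# Append each word on its own line unless it is already present
for WORD in FreeBSD aspell Jenkins; do
    grep -qx -- "$WORD" "$DICT" || printf '%s\n' "$WORD" >> "$DICT"
done

cat "$DICT"
```

Since the file is plain text, it can be committed to the repository and reviewed like any other change.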

Batch mode checking

For batch mode checking there are two options:

Automatically replacing every candidate with the most probable correctly spelled word. This is a really bad idea since spell checkers often do not know names, specific terminology, etc.
Running the spell checker in list mode to count the number of errors. If the error count exceeds a given threshold, the build pipeline simply rejects the Markdown file(s). This is a much better approach.

The basic idea of the second approach is to modify the build script to execute a simple command that counts the spelling error candidates for every .md file:

aspell -p dictionary.pwd -M list < "${FILENAME}" | wc -l

To get a total error candidate count:

find ./_posts/ -name "*.md" -exec cat {} \; | aspell -p dictionary.pwd -M list | wc -l
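A bare count does not tell which words are responsible for it. As a rough sketch, the list output can be ranked by frequency to spot names and terms that probably belong in the user dictionary. The function name rank_candidates and the sample words are made up for this illustration; the demonstration feeds in stand-in words so that it runs without aspell.

```shell
#!/bin/sh
# Rank spelling error candidates by how often they occur.
# Reads one word per line on stdin, the format produced by "aspell list".
rank_candidates() {
    sort | uniq -c | sort -rn
}

# Intended use (assumes aspell and dictionary.pwd are available):
#   find ./_posts/ -name "*.md" -exec cat {} \; \
#       | aspell -p dictionary.pwd -M list | rank_candidates

# Self-contained demonstration with made-up words standing in for aspell output
printf '%s\n' FreeBSD aspell FreeBSD teh aspell FreeBSD | rank_candidates
```

Words near the top of the list that are spelled correctly are good candidates for the user dictionary; the remainder are likely real typos.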

Another option is to execute the spell checker for every file separately, which is far more useful. I’ve done this by implementing a small shell script that accepts either a directory or a filename. If one specifies a directory name, the script simply iterates over the specified directory and executes itself for every file. I didn’t use find since a POSIX-conforming find does not pass on the return value of a program or script executed via -exec ... \; in its own exit status.
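The find behaviour mentioned above can be checked with a small experiment. This is just a sketch: the temporary directory and file names are made up, and false stands in for a spell check that rejects every file.

```shell
#!/bin/sh
# Show that find's exit status ignores failures of "-exec ... \;".
# The temporary directory and file names are made up for this illustration.
DEMO_DIR=$(mktemp -d)
touch "$DEMO_DIR/a.md" "$DEMO_DIR/b.md"

# "false" always fails, standing in for a check that rejects every file
find "$DEMO_DIR" -type f -name "*.md" -exec false \;
FIND_STATUS=$?
echo "find exit status: $FIND_STATUS"

# A plain loop can record the failures itself
FAILING=0
for FNAME in "$DEMO_DIR"/*.md; do
    false "$FNAME" || FAILING=1
done
echo "loop result: $FAILING"

rm -rf "$DEMO_DIR"
```

The -exec ... {} + form is specified by POSIX to make find return a non-zero status when an invocation fails, but it passes many files to a single invocation, which does not fit a script that expects exactly one filename.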

In case the script has been called with a filename the aspell -M list command gets executed and the output gets counted. In case the spelling error count is above a configurable threshold the script returns an error code.

#!/bin/sh

# Maximum number of spelling error candidates tolerated per file
THRESHOLD=10

if [ $# -lt 1 ]; then
        echo "Specify directory or filename"
        exit 1
fi

if [ -d "${1}" ]; then
        # find "${1}" -type f -name "*.md" -exec "${0}" {} \;
        FAILING=0
        for FNAME in "${1}"/*.md; do
                # Re-invoke this script for every Markdown file
                if ! "${0}" "${FNAME}"; then
                        FAILING=1
                fi
        done

        if [ "${FAILING}" -ne 0 ]; then
                echo "Aborting - too many spellchecking errors"
        fi

        exit "${FAILING}"
else
        # Count the candidates; tr strips the padding some wc implementations add
        ERRCOUNT=$(aspell -p dictionary.pwd -M list < "${1}" | wc -l | tr -d ' ')
        echo "${ERRCOUNT}   ${1}"

        if [ "${ERRCOUNT}" -gt "${THRESHOLD}" ]; then
                exit 1
        fi
        exit 0
fi

This script is then called as usual from the Makefile in its own Jenkins stage, in parallel to the automatic tag page generation.