Using Aspell to perform spellchecking (manually and inside the build pipeline)
05 Jul 2020 - tsp
Last update 05 Jul 2020
Now that my page has grown larger and larger, and most articles are written
late at night or during short breaks, I decided it was time to include some
kind of spell checking in my writing and publishing process. A typical
tool for this on Unix systems is aspell.
Installation
It's easily installed on FreeBSD from the textproc/aspell package, together
with the desired dictionaries (textproc/en-aspell for English):
sudo pkg install textproc/aspell
sudo pkg install textproc/en-aspell
Manual (interactive) checking
Interactively checking a Markdown document on the command line is rather
simple:
aspell --dont-backup -p dictionary.pwd -M check 2020-06-12-nacaprofiles3d.md
The -p switch specifies a personal (user) dictionary. This allows me to keep
the user dictionary inside my repository, so the fully automated pipeline uses
the same dictionary without anyone having to modify the master dictionary on
the build machine manually.
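A minimal sketch for bootstrapping such a repository dictionary, assuming aspell's plain-text personal word list format (a header line personal_ws-1.1, the language and the word count, followed by one word per line). The helper name make_personal_dict and the example words are hypothetical:

```sh
# Hypothetical helper: write an aspell personal word list.
# Assumed format: "personal_ws-1.1 <lang> <count>" header, one word per line.
make_personal_dict() {
	if [ $# -lt 2 ]; then
		echo "usage: make_personal_dict FILE WORD..." >&2
		return 1
	fi
	dict="$1"
	shift
	# Sort and de-duplicate first so the header count matches the list
	words=$(printf '%s\n' "$@" | sort -u)
	count=$(printf '%s\n' "$words" | wc -l | tr -d ' ')
	{
		echo "personal_ws-1.1 en ${count}"
		printf '%s\n' "$words"
	} > "$dict"
}
```

For example, make_personal_dict dictionary.pwd FreeBSD Jenkins Makefile creates a four-line file. Words added with "a" during an interactive check should then end up in the same file, so the list grows with the blog.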
The --dont-backup switch suppresses the backup file aspell would otherwise
create next to the checked file, with the same name plus a .bak extension.
The -M switch enables the Markdown filter so that no errors are reported
inside Markdown markup or inside code blocks.
Batch mode checking
For batch mode checking there are a few options:
- Automatically replacing every misspelled word with the most probable
correct spelling. This is a really bad idea since spell checkers often
do not know names, specific terminology and the like, and would happily
"correct" them.
- Running the spellchecker non-interactively to count the number of errors
and letting the build pipeline reject the Markdown file(s) when the count
exceeds a given threshold. This is a much better approach.
The basic idea of the second approach is to add a simple command to the
build script that counts the spelling error candidates in every .md file:
aspell -p dictionary.pwd -M list < "${FILENAME}" | wc -l
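Wrapped in shell functions this could look like the following sketch (the function names are hypothetical). aspell list prints one line per unknown word, so the plain count includes repeats; the second variant counts distinct words instead:

```sh
# Sketch: number of spelling error candidates in one file.
# aspell list prints one line per unknown word (repeats included);
# tr removes the leading blanks some wc implementations add.
count_candidates() {
	aspell -p dictionary.pwd -M list < "$1" | wc -l | tr -d ' '
}

# Variant: count every distinct unknown word only once.
count_unique_candidates() {
	aspell -p dictionary.pwd -M list < "$1" | sort -u | wc -l | tr -d ' '
}
```

Counting distinct words keeps a single repeated term, such as a product name missing from the dictionary, from pushing a post over the threshold on its own; which variant fits better is a matter of taste.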
To get a total error candidate count:
find ./_posts/ -name "*.md" -exec cat {} + | aspell -p dictionary.pwd -M list | wc -l
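The same pipeline can also help maintain the dictionary: sorting the candidates by frequency shows which terms keep coming up and probably belong in dictionary.pwd. A sketch, where top_candidates is a hypothetical helper and its optional second argument (default 20) limits the number of entries:

```sh
# Sketch: show the N most frequent error candidates across all posts
# (default 20), as "count word" lines sorted by descending count.
top_candidates() {
	find "$1" -name '*.md' -exec cat {} + |
		aspell -p dictionary.pwd -M list |
		sort | uniq -c | sort -rn | head -n "${2:-20}"
}
```

For example, top_candidates ./_posts 10 lists the ten most common unknown words of the whole blog.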
Another option, and a much more useful one, is to run the spellchecker on every
file separately. I've done this with a small shell script that accepts either
a directory or a filename. If a directory is specified, the script iterates
over the .md files in that directory and executes itself for every file. I
didn't use find for this: with -exec ... \; the exit status of the executed
program or script only serves as the result of that test and is not reflected
in find's own exit status.
If the script is called with a filename, it runs aspell -M list on the file
and counts the output lines. If that spelling error count is above a threshold
(hard-coded to 10 in the script), the script returns a non-zero exit code.
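#!/bin/sh
# Spellcheck one Markdown file, or every *.md file in a directory.
# Exits with a non-zero status if any file exceeds THRESHOLD error candidates.
THRESHOLD=10
if [ $# -lt 1 ]; then
	echo "Specify directory or filename" >&2
	exit 1	# "return" is only defined inside functions and sourced scripts
fi
if [ -d "${1}" ]; then
	# Call this script again for every .md file and remember any failure
	FAILING=0
	for FNAME in "${1}"/*.md; do
		[ -f "${FNAME}" ] || continue	# glob matched nothing
		if ! "${0}" "${FNAME}"; then
			FAILING=1
		fi
	done
	if [ "${FAILING}" -ne 0 ]; then
		echo "Aborting - too many spellchecking errors" >&2
	fi
	exit "${FAILING}"
else
	# aspell list prints one line per unknown word; tr strips wc's padding
	ERRCOUNT=$(aspell -p dictionary.pwd -M list < "${1}" | wc -l | tr -d ' ')
	echo "${ERRCOUNT} ${1}"
	if [ "${ERRCOUNT}" -gt "${THRESHOLD}" ]; then
		exit 1
	fi
	exit 0
fi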
The script is then called from the Makefile in its own Jenkins stage, in
parallel to the automatic tag page generation.
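A side note on find: the {} + form of -exec behaves differently from the \; form, since POSIX requires find to exit with a non-zero status if any invocation returns a non-zero status. So the directory case could also be sketched with a single find call. check_all and its optional threshold argument are hypothetical, and unlike the glob loop in the script this version also descends into subdirectories:

```sh
# Sketch: spellcheck every *.md file below a directory with one find call.
# find passes the files to an inline sh in batches ({} +); the inline sh
# exits 1 if any file has more than LIMIT error candidates, which makes
# find exit non-zero as well.
check_all() {
	limit="${2:-10}"
	find "$1" -type f -name '*.md' -exec sh -c '
		limit=$1
		shift
		rc=0
		for f in "$@"; do
			n=$(aspell -p dictionary.pwd -M list < "$f" | wc -l | tr -d " ")
			echo "$n $f"
			if [ "$n" -gt "$limit" ]; then
				rc=1
			fi
		done
		exit "$rc"
	' sh "$limit" {} +
}
```

For example, check_all ./_posts 10 prints one "count filename" line per post and fails if any post has more than 10 candidates.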