20 Mar 2020 - tsp
Last update 23 Dec 2021
11 mins
Back when I started experimenting with computer vision algorithms I always
had the major problem of having to use either proprietary toolboxes (back
then when I started playing around with computer vision at university we
used MatLab with the highly flexible image processing toolbox) or some
ugly hacks (like a half baked JPEG decoder implementation written
by myself that unfortunately was never really finished). After some
time I came to the conclusion that using libjpeg
or libjpeg-turbo would be a good idea
since it provides a solid implementation of JPEG and a rather
simple programming interface. But the documentation was hard to read
for me and I felt that I was just missing an easy example of how to use libjpeg
for accessing JPEG images. The same problem arose later when I tried
to process images captured via a Video4Linux device (i.e. a webcam) and
the RaspberryPi camera. So I decided to write this really short introduction - and
provide a basic method that just reads a JPEG file into a bitmap buffer that
one can simply copy and paste into existing projects without any
dependencies other than libjpeg.
Update 2021: I’ve also written a short summary on how to capture frames using webcams from C when using the Video4Linux API on FreeBSD or Linux which is also rather simple.
To do experimentation in the field of computer vision it’s often simple and feasible to keep a whole uncompressed bitmap of the source images in main memory. This of course assumes that either enough memory is present or one wants to rely on swapping and/or memory mapping large regions of data into memory. Note that this approach is really nice when doing experimentation and is also feasible for some real world tasks (like image classification) but might be problematic when one has to deal with high resolution images without downsampling or wants to keep a huge number of images in main memory (for example when doing reconstruction in radio astronomy, etc.). In this case one should start thinking about resource management before implementing anything.
Keeping the image inside main memory as a continuous block also allows easy transfer to OpenCL or CUDA processing pipelines.
To store the image for easy access the following datastructure will be defined:
struct imgRawImage {
	unsigned int numComponents;
	unsigned long int width, height;
	unsigned char* lpData;
};
numComponents specifies the number (but not the type) of components.
Usually these are 3 components for RGB and YCbCr, 4 for ARGB
and AYCbCr and 1 for the greyscale color space. Note that these functions
do not store any information about the used color space - this is of
course a drawback in real world scenarios but the extension is pretty simple.

width and height specify the number of pixels per scanline
and the number of scanlines (i.e. the width and height of the image).

The basic process is rather simple as one can also see from the code given below:
The decoder uses a libc FILE reference as source when reading from disk.

Note that the following code does not perform proper error handling. This has been left out for readability of the code. Error handling has to be implemented in any real life scenario that’s going to be used for more than a simple experiment. Crashing a program is no error handling (except when developing with a framework like Erlang/OTP of course)!
#include <jpeglib.h>
#include <jerror.h>
struct imgRawImage* loadJpegImageFile(char* lpFilename) {
	struct jpeg_decompress_struct info;
	struct jpeg_error_mgr err;

	struct imgRawImage* lpNewImage;

	unsigned long int imgWidth, imgHeight;
	int numComponents;

	unsigned long int dwBufferBytes;
	unsigned char* lpData;

	unsigned char* lpRowBuffer[1];

	FILE* fHandle;

	fHandle = fopen(lpFilename, "rb");
	if(fHandle == NULL) {
		#ifdef DEBUG
			fprintf(stderr, "%s:%u: Failed to read file %s\n", __FILE__, __LINE__, lpFilename);
		#endif
		return NULL; /* ToDo */
	}

	info.err = jpeg_std_error(&err);
	jpeg_create_decompress(&info);

	jpeg_stdio_src(&info, fHandle);
	jpeg_read_header(&info, TRUE);
	jpeg_start_decompress(&info);

	imgWidth = info.output_width;
	imgHeight = info.output_height;
	numComponents = info.output_components; /* Components per pixel after decompression */

	#ifdef DEBUG
		fprintf(
			stderr,
			"%s:%u: Reading JPEG with dimensions %lu x %lu and %d components\n",
			__FILE__, __LINE__,
			imgWidth, imgHeight, numComponents
		);
	#endif

	dwBufferBytes = imgWidth * imgHeight * 3; /* We only read RGB, not A */
	lpData = (unsigned char*)malloc(sizeof(unsigned char)*dwBufferBytes);

	lpNewImage = (struct imgRawImage*)malloc(sizeof(struct imgRawImage));
	lpNewImage->numComponents = numComponents;
	lpNewImage->width = imgWidth;
	lpNewImage->height = imgHeight;
	lpNewImage->lpData = lpData;

	/* Read scanline by scanline */
	while(info.output_scanline < info.output_height) {
		lpRowBuffer[0] = (unsigned char *)(&lpData[3*info.output_width*info.output_scanline]);
		jpeg_read_scanlines(&info, lpRowBuffer, 1);
	}

	jpeg_finish_decompress(&info);
	jpeg_destroy_decompress(&info);
	fclose(fHandle);

	return lpNewImage;
}
First the decompressor is created using the jpeg_create_decompress
function. This function requires a set of error handling routines. In the
most simple case one can use the default ones provided by libjpeg. The
error manager state structure struct jpeg_error_mgr can be initialized
by jpeg_std_error. This function also returns a reference to the newly
initialized structure. (Since a student of mine made that mistake a number of
times: note that this error manager is declared as
a local variable in the example - when modularizing further one
should take care that this error manager stays valid till the decoder is released!)
After that the decompressor has its data supplied from one of the possible sources. In
the example the source is set to a libc FILE reference to
read out of a file located on disk. This is done using jpeg_stdio_src.
Another way would be reading from a memory location using jpeg_mem_src
after a JPEG has been received via any other means (camera device for MJPEG
camera streams, network without caching, using memory mapping for file access,
etc.).
Then the header is read and the decompressor is started (jpeg_read_header
followed by jpeg_start_decompress). Note that both functions might fail
so proper error handling is required.
Then the buffer is allocated. In this example two different data buffers are
used - one might also use a flexible array member for that. I’ve implemented
it that way to allow easy handling (including releasing, re-allocating, etc.)
of the raw data array independent of any metadata. This sample code of course
also lacks error handling (malloc returns NULL in hard out of memory
conditions if no out-of-memory killer is configured or in case resource
limits are reached).
After the buffer has been allocated the code reads the image scanline
by scanline using jpeg_read_scanlines
. One could also read multiple
scanlines at a time but since it might be desired to process them while
streaming this example has been implemented that way. One could of course
replace the single-entry loop

while(info.output_scanline < info.output_height) {
	lpRowBuffer[0] = (unsigned char *)(&lpData[3*info.output_width*info.output_scanline]);
	jpeg_read_scanlines(&info, lpRowBuffer, 1);
}

by requesting several lines per call via an array of row pointers. Note however that jpeg_read_scanlines may return fewer lines than requested, so it still has to be called in a loop:

while(info.output_scanline < info.output_height) {
	unsigned int j, linesRequested;
	unsigned char* lpRows[16];

	linesRequested = info.output_height - info.output_scanline;
	if(linesRequested > 16) { linesRequested = 16; }
	for(j = 0; j < linesRequested; j = j + 1) {
		lpRows[j] = &(lpData[3*info.output_width*(info.output_scanline + j)]);
	}
	jpeg_read_scanlines(&info, lpRows, linesRequested);
}

Note that this function might also fail - add error handling again.
At the end the decompressor is finalized using jpeg_finish_decompress
and then released using jpeg_destroy_decompress
. Again please
take care that jpeg_finish_decompress
might indicate an error in which
case one might not want to use the already read data.
Storing an image works similarly to the example above. No detailed walkthrough will be provided for this section since the idea is the same as for the decompressor described above (except for the direction of data transfer).
#include <jpeglib.h>
#include <jerror.h>
int storeJpegImageFile(struct imgRawImage* lpImage, char* lpFilename) {
	struct jpeg_compress_struct info;
	struct jpeg_error_mgr err;

	unsigned char* lpRowBuffer[1];

	FILE* fHandle;

	fHandle = fopen(lpFilename, "wb");
	if(fHandle == NULL) {
		#ifdef DEBUG
			fprintf(stderr, "%s:%u Failed to open output file %s\n", __FILE__, __LINE__, lpFilename);
		#endif
		return 1;
	}

	info.err = jpeg_std_error(&err);
	jpeg_create_compress(&info);
	jpeg_stdio_dest(&info, fHandle);

	info.image_width = lpImage->width;
	info.image_height = lpImage->height;
	info.input_components = 3;
	info.in_color_space = JCS_RGB;

	jpeg_set_defaults(&info);
	jpeg_set_quality(&info, 100, TRUE);

	jpeg_start_compress(&info, TRUE);

	/* Write every scanline ... */
	while(info.next_scanline < info.image_height) {
		lpRowBuffer[0] = &(lpImage->lpData[info.next_scanline * (lpImage->width * 3)]);
		jpeg_write_scanlines(&info, lpRowBuffer, 1);
	}

	jpeg_finish_compress(&info);
	fclose(fHandle);

	jpeg_destroy_compress(&info);
	return 0;
}
The color space provided has to match the number of components and the supplied data. Normally this is one of:

JCS_RGB with 3 components (red, green and blue channels; each 1 byte in size)
JCS_GRAYSCALE with 1 component (only the luminance channel; 1 byte per pixel)

The following shows, just as an example, how one might implement a simple grayscale filter. In this case it’s assumed that the image should keep 3 color channels, all representing the same intensity value as the greyscale channel. This has the advantage that other processing functions do not have to discriminate based on the number of components they are using. The disadvantage is three times the memory usage.
So how does greyscale conversion work? Basically the intensity value of each channel is mapped to a given contribution to the overall intensity value. This is done to reflect the sensitivity of the eye to different colors. There is a huge number of different greyscale schemes, all differing somewhat in detail (the magnitude of the numbers is similar).
In the most naive way one might weight the red, green and blue channels
with the factors 0.299, 0.587 and 0.114
that are used in the models of PAL and NTSC (other weighting values
are nicely summarized at Wikipedia).
Note that the values supplied are not perceptual luminance preserving.

enum imageLibraryError filterGrayscale(
	struct imgRawImage* lpInput,
	struct imgRawImage** lpOutput
) {
	unsigned long int i;

	if(lpOutput == NULL) {
		lpOutput = &lpInput; /* We will replace our input structure in place ... */
	} else {
		(*lpOutput) = malloc(sizeof(struct imgRawImage));
		(*lpOutput)->width = lpInput->width;
		(*lpOutput)->height = lpInput->height;
		(*lpOutput)->numComponents = lpInput->numComponents;
		(*lpOutput)->lpData = malloc(sizeof(unsigned char) * lpInput->width*lpInput->height*3);
	}

	for(i = 0; i < lpInput->width*lpInput->height; i=i+1) {
		/* Do a grayscale transformation */
		unsigned char luma = (unsigned char)(
			0.299f * (float)lpInput->lpData[i * 3 + 0]
			+ 0.587f * (float)lpInput->lpData[i * 3 + 1]
			+ 0.114f * (float)lpInput->lpData[i * 3 + 2]
		);
		(*lpOutput)->lpData[i * 3 + 0] = luma;
		(*lpOutput)->lpData[i * 3 + 1] = luma;
		(*lpOutput)->lpData[i * 3 + 2] = luma;
	}

	return imageLibE_Ok;
}
Some of the most useful filters and modules I’ve implemented during my early experiments with computer vision have been:
This article is tagged: Programming, Data Mining, Computer Vision
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/