20 Mar 2020 - tsp
Last update 23 Dec 2021
11 mins
Back when I started experimenting with computer vision algorithms I always
had the major problem of having to use either proprietary toolboxes (back
then when I started playing around with computer vision at university we
used MatLab with the highly flexible image processing toolbox) or some
ugly hacks (like a half baked JPEG decoder implementation written
by myself that unfortunately was never really finished). After some
time I came to the conclusion that using libjpeg
or libjpeg-turbo would be a good idea
since it provides a solid implementation of JPEG and a rather
simple programming interface. But the documentation was hard to read
for me and I felt that I was just missing an easy example of how to use libjpeg
for accessing JPEG images. The same problem arose later when I tried
to process images captured via a Video4Linux device (i.e. a webcam) and
the RaspberryPi camera. So I decided to write this really short introduction - and
provide a basic method that just reads a JPEG file into a bitmap buffer that
one can simply copy and paste into existing projects without any
dependencies other than libjpeg.
Update 2021: I’ve also written a short summary on how to capture frames using webcams from C when using the Video4Linux API on FreeBSD or Linux which is also rather simple.
To do experimentation in the field of computer vision it’s often simple and feasible to keep a whole uncompressed bitmap of the source images in main memory. This of course assumes that either enough memory is present or one wants to rely on swapping and/or memory mapping large regions of data into memory. Note that this approach is really nice when doing experimentation and is also feasible for some real world tasks (like image classification) but might be problematic when one has to deal with high resolution images without downsampling or wants to keep a huge number of images in main memory (for example when doing reconstruction in radio astronomy, etc.). In this case one should start thinking about resource management before implementing anything.
Keeping the image inside main memory as a continuous block also allows easy transfer to OpenCL or CUDA processing pipelines.
To store the image for easy access the following datastructure will be defined:
struct imgRawImage {
	unsigned int numComponents;
	unsigned long int width, height;
	unsigned char* lpData;
};
numComponents specifies the number (but not the type) of components.
Usually these are 3 components for RGB and YCbCr, 4 for ARGB
and AYCbCr and 1 for the greyscale color space. Note that these functions
do not store any information about the used color space - this is of
course a drawback in real world scenarios but the extension is pretty simple.

width and height specify the number of pixels per scanline
and the number of scanlines (i.e. the width and height of the image).

The basic process is rather simple as one can also see from the code given below:
The decoder uses a libc FILE reference as source when reading from disk.

Note that the following code does not perform proper error handling. This has been left out for readability of the code. Error handling has to be implemented in any real life scenario that’s going to be used for more than a simple experiment. Crashing a program is no error handling (except when developing with a framework like Erlang/OTP of course)!
#include <jpeglib.h>
#include <jerror.h>
struct imgRawImage* loadJpegImageFile(char* lpFilename) {
	struct jpeg_decompress_struct info;
	struct jpeg_error_mgr err;

	struct imgRawImage* lpNewImage;

	unsigned long int imgWidth, imgHeight;
	int numComponents;

	unsigned long int dwBufferBytes;
	unsigned char* lpData;

	unsigned char* lpRowBuffer[1];

	FILE* fHandle;

	fHandle = fopen(lpFilename, "rb");
	if(fHandle == NULL) {
		#ifdef DEBUG
			fprintf(stderr, "%s:%u: Failed to read file %s\n", __FILE__, __LINE__, lpFilename);
		#endif
		return NULL; /* ToDo */
	}

	info.err = jpeg_std_error(&err);
	jpeg_create_decompress(&info);

	jpeg_stdio_src(&info, fHandle);
	jpeg_read_header(&info, TRUE);
	jpeg_start_decompress(&info);

	imgWidth = info.output_width;
	imgHeight = info.output_height;
	numComponents = info.output_components; /* Components per pixel after decompression */

	#ifdef DEBUG
		fprintf(
			stderr,
			"%s:%u: Reading JPEG with dimensions %lu x %lu and %d components\n",
			__FILE__, __LINE__,
			imgWidth, imgHeight, numComponents
		);
	#endif

	dwBufferBytes = imgWidth * imgHeight * 3; /* We only read RGB, not A */
	lpData = (unsigned char*)malloc(sizeof(unsigned char)*dwBufferBytes);

	lpNewImage = (struct imgRawImage*)malloc(sizeof(struct imgRawImage));
	lpNewImage->numComponents = numComponents;
	lpNewImage->width = imgWidth;
	lpNewImage->height = imgHeight;
	lpNewImage->lpData = lpData;

	/* Read scanline by scanline */
	while(info.output_scanline < info.output_height) {
		lpRowBuffer[0] = (unsigned char *)(&lpData[3*info.output_width*info.output_scanline]);
		jpeg_read_scanlines(&info, lpRowBuffer, 1);
	}

	jpeg_finish_decompress(&info);
	jpeg_destroy_decompress(&info);
	fclose(fHandle);

	return lpNewImage;
}
First the decompressor is created using the jpeg_create_decompress
function. This function requires a set of error handling routines. In the
most simple case one can use the default ones provided by libjpeg. The
error manager state structure struct jpeg_error_mgr can be initialized
by jpeg_std_error. This function also returns a reference to the newly
initialized structure. (Since a student of mine made that mistake a number of
times: note that this error manager is declared as
a local variable in the example - when modularizing further one
should take care that this error manager stays valid till the decoder is released!)
After that the decompressor has its data supplied from one of the possible sources. In
the example the source is set to a libc FILE reference to
read out of a file located on disk. This is done using jpeg_stdio_src.
Another way would be reading from a memory location using jpeg_mem_src
after a JPEG has been received via any other means (camera device for MJPEG
camera streams, network without caching, using memory mapping for file access,
etc.).
Then the header is read and the decompressor is started (jpeg_read_header
followed by jpeg_start_decompress). Note that both functions might fail
so proper error handling is required.
Then the buffer is allocated. In this example two different data buffers are
used - one might also use a flexible array member for that. I’ve implemented
it that way to allow easy handling (including releasing, re-allocating, etc.)
of the raw data array independent of any metadata. This sample code of course
also lacks error handling (malloc returns NULL in hard out of memory
conditions if no out-of-memory killer is configured or in case resource
limits are reached).
After the buffer has been allocated the code reads the image scanline
by scanline using jpeg_read_scanlines
. One could also read multiple
scanlines at a time but since it might be desired to process them while
streaming this example has been implemented that way. One could of course
replace the single-entry loop

while(info.output_scanline < info.output_height) {
	lpRowBuffer[0] = (unsigned char *)(&lpData[3*info.output_width*info.output_scanline]);
	jpeg_read_scanlines(&info, lpRowBuffer, 1);
}

by requesting several lines per call via an array of row pointers. Note however that jpeg_read_scanlines may return fewer lines than requested, so it still has to be called in a loop:

while(info.output_scanline < info.output_height) {
	unsigned int j, linesRequested;
	unsigned char* lpRows[16];

	linesRequested = info.output_height - info.output_scanline;
	if(linesRequested > 16) { linesRequested = 16; }
	for(j = 0; j < linesRequested; j = j + 1) {
		lpRows[j] = &(lpData[3*info.output_width*(info.output_scanline + j)]);
	}
	jpeg_read_scanlines(&info, lpRows, linesRequested);
}

Note that this function might also fail - add error handling again.
At the end the decompressor is finalized using jpeg_finish_decompress
and then released using jpeg_destroy_decompress
. Again please
take care that jpeg_finish_decompress
might indicate an error in which
case one might not want to use the already read data.
Storing an image works similarly to the example above. No detailed walkthrough will be provided for this section since the idea is the same as for the decompressor described above (except for the direction of data transfer).
#include <jpeglib.h>
#include <jerror.h>
int storeJpegImageFile(struct imgRawImage* lpImage, char* lpFilename) {
	struct jpeg_compress_struct info;
	struct jpeg_error_mgr err;

	unsigned char* lpRowBuffer[1];

	FILE* fHandle;

	fHandle = fopen(lpFilename, "wb");
	if(fHandle == NULL) {
		#ifdef DEBUG
			fprintf(stderr, "%s:%u Failed to open output file %s\n", __FILE__, __LINE__, lpFilename);
		#endif
		return 1;
	}

	info.err = jpeg_std_error(&err);
	jpeg_create_compress(&info);
	jpeg_stdio_dest(&info, fHandle);

	info.image_width = lpImage->width;
	info.image_height = lpImage->height;
	info.input_components = 3;
	info.in_color_space = JCS_RGB;

	jpeg_set_defaults(&info);
	jpeg_set_quality(&info, 100, TRUE);

	jpeg_start_compress(&info, TRUE);

	/* Write every scanline ... */
	while(info.next_scanline < info.image_height) {
		lpRowBuffer[0] = &(lpImage->lpData[info.next_scanline * (lpImage->width * 3)]);
		jpeg_write_scanlines(&info, lpRowBuffer, 1);
	}

	jpeg_finish_compress(&info);
	fclose(fHandle);

	jpeg_destroy_compress(&info);
	return 0;
}
The color space provided has to match the number of components and the supplied data. Normally this is one of:

JCS_RGB with 3 components (red, green and blue channels; each 1 byte in size)
JCS_GRAYSCALE with 1 component (only the luminance channel; 1 byte per pixel)

The following shows, just as an example, how one might implement a simple grayscale filter. In this case it’s assumed that the image should keep 3 color channels, all representing the same intensity value as the greyscale channel. This has the advantage that other processing functions do not have to discriminate based on the number of components they are using. The disadvantage is three times the memory usage.
So how does greyscale conversion work? Basically the intensity value of each channel is mapped to a given contribution to the overall intensity value. This is done to reflect the sensitivity of the eye to different colors. There is a huge number of different greyscale schemes, all differing somewhat in detail (the magnitude of the numbers is similar).
In the most naive way one might weight the red, green and blue channels
with the factors 0.299, 0.587 and 0.114
that are used in the models of PAL and NTSC (other weighting values
are nicely summarized at Wikipedia).
Note that the values supplied are not perceptual luminance preserving.

enum imageLibraryError filterGrayscale(
	struct imgRawImage* lpInput,
	struct imgRawImage** lpOutput
) {
	unsigned long int i;

	if(lpOutput == NULL) {
		lpOutput = &lpInput; /* We will replace our input structure in place ... */
	} else {
		(*lpOutput) = malloc(sizeof(struct imgRawImage));
		(*lpOutput)->width = lpInput->width;
		(*lpOutput)->height = lpInput->height;
		(*lpOutput)->numComponents = lpInput->numComponents;
		(*lpOutput)->lpData = malloc(sizeof(unsigned char) * lpInput->width*lpInput->height*3);
	}

	for(i = 0; i < lpInput->width*lpInput->height; i=i+1) {
		/* Do a grayscale transformation */
		unsigned char luma = (unsigned char)(
			0.299f * (float)lpInput->lpData[i * 3 + 0]
			+ 0.587f * (float)lpInput->lpData[i * 3 + 1]
			+ 0.114f * (float)lpInput->lpData[i * 3 + 2]
		);
		(*lpOutput)->lpData[i * 3 + 0] = luma;
		(*lpOutput)->lpData[i * 3 + 1] = luma;
		(*lpOutput)->lpData[i * 3 + 2] = luma;
	}

	return imageLibE_Ok;
}
Some of the most useful filters and modules I’ve implemented during my early experiments with computer vision have been:
This article is tagged: Programming, Data Mining, Computer Vision
Dipl.-Ing. Thomas Spielauer, Wien (webcomplains389t48957@tspi.at)
This webpage is also available via TOR at http://rh6v563nt2dnxd5h2vhhqkudmyvjaevgiv77c62xflas52d5omtkxuid.onion/