The API used on most Unixoid operating systems (e.g. Linux, FreeBSD, etc.) is
Video4Linux. It basically consists of a specification for device naming (i.e.
the /dev/videoN devices) as well as:
Capability querying
Defined data formats
Audio and Video I/O operations
These are realized using the standard Unix read / write and ioctl
APIs. V4L does not only support webcams but also tuners, video capture cards,
satellite receivers, etc. - this page focuses only on cameras, though most of the
operations are the same for other video capture devices.
For webcams there are three different methods that can be used to read or stream
frames from the camera:
A simple interface based around the read syscall, indicated by the
capability flag V4L2_CAP_READWRITE. Using this API no metadata is
passed besides the image data itself (i.e. no frame counters, timestamps, etc.) which
would be required when synchronizing with other streams or detecting frame drops.
This is the simplest I/O method.
Mapping stream buffers via shared memory regions using mmap. This mode is
supported whenever the V4L2_CAP_STREAMING flag is set and the mmap
mode is supported by VIDIOC_REQBUFS. This has been one of the most
efficient streaming modes and is usually widely supported. The application can
provide multiple buffers to allow seamless streaming.
A way for kernel mode drivers to write directly into usermode memory using
a usermode memory pointer. This mode is only supported if V4L2_CAP_STREAMING
is set and the user pointer mode is supported by VIDIOC_REQBUFS.
The main difference to mmap is that the application allocates the buffers
itself, so they can for example easily be shared with different processes
or swapped out - the application just passes a pointer to the driver, the
driver then locks the buffer if required and reads data into the application's
memory space. Metadata is passed in an extra structure.
To my knowledge USB webcam drivers currently only support the mmap mode,
so this is what this blog post will look into first. Note that the V4L2
specification does not specify any mandatory interface, so for a truly portable
application it would be a good idea to support both streaming methods as well
as the method based on read/write.
Header files used
All Video4Linux2 methods and data types are defined in a single header file
that's usually contained in linux/videodev2.h
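A minimal sketch of the includes used by the snippets in this post - the exact set besides linux/videodev2.h is an assumption derived from the syscalls that appear below:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>      /* calloc */
#include <errno.h>
#include <fcntl.h>       /* open, O_RDWR, O_NONBLOCK */
#include <unistd.h>      /* close */
#include <sys/stat.h>    /* stat, S_ISCHR */
#include <sys/ioctl.h>   /* ioctl */
#include <sys/mman.h>    /* mmap, PROT_READ, MAP_SHARED */
#include <sys/select.h>  /* select */

#include <linux/videodev2.h>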
Getting the frames
Opening the device
The first thing is obviously opening the device file. The naming is specified by
the Video4Linux specification, but it's a good idea to allow the user to override
the device path anyway - since one usually has to support systems with multiple
capture devices, this is not much extra effort.
The devices are usually named:
/dev/video0 to /dev/video63 for video capture devices. There might
also be a /dev/video device for the default capture device though this
doesn't always exist.
For video capture from DVB and analog tuner cards there might be /dev/bttv0
as well as /dev/vbi0 to /dev/vbi31
Radio receivers use /dev/radio0 up to /dev/radio63 and the optional
default device /dev/radio
Teletext decoders use /dev/vtx0 up to /dev/vtx31 and the optional
default device /dev/vtx
Before one opens the device it's a good idea to check if the file exists and
is really a device file:
enum cameraError deviceOpen(
    int* lpDeviceOut,
    char* deviceName
) {
    struct stat st;
    int hHandle;

    if(lpDeviceOut == NULL) { return cameraE_InvalidParam; }
    (*lpDeviceOut) = -1;
    if(deviceName == NULL) { return cameraE_InvalidParam; }

    /* Check if the device exists */
    if(stat(deviceName, &st) == -1) {
        return cameraE_UnknownDevice;
    }

    /* Check if it's a device file */
    if(!S_ISCHR(st.st_mode)) {
        return cameraE_UnknownDevice;
    }

    hHandle = open(deviceName, O_RDWR | O_NONBLOCK, 0);
    if(hHandle < 0) {
        switch(errno) {
            case EACCES: return cameraE_PermissionDenied;
            case EPERM: return cameraE_PermissionDenied;
            default: return cameraE_Failed;
        }
    }

    (*lpDeviceOut) = hHandle;
    return cameraE_Ok;
}
Since we opened the device using open we have to close it again using
close when we're done:
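A matching cleanup routine - this is the deviceClose function used by the later examples - could look like the following minimal sketch, reusing the same error enumeration as above:

enum cameraError deviceClose(
    int hHandle
) {
    if(hHandle < 0) { return cameraE_InvalidParam; }

    if(close(hHandle) != 0) {
        return cameraE_Failed;
    }
    return cameraE_Ok;
}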
The next step is to query the capabilities of the opened device. This is first done
via the VIDIOC_QUERYCAP ioctl. This call fills a struct v4l2_capability
structure. This structure contains:
Human readable strings:
16 characters of driver information (driver)
32 characters of card information (card)
32 characters of bus information (bus_info)
A 32 bit version field (version)
A 32 bit capability bitmask (capabilities)
A 32 bit device capability bitmask (device_caps)
Some reserved bytes (12)
The most important field is the capabilities field. This bitmask can be checked
against a number of interesting flags:
V4L2_CAP_VIDEO_CAPTURE identifies a capture device - which is what one's looking
for when looking for a webcam.
Flags indicating the I/O interfaces supported:
V4L2_CAP_READWRITE is set if read and write syscalls are supported to
read and write data
V4L2_CAP_ASYNCIO signals support for asynchronous I/O mechanisms. Since
this is usually not supported by drivers it is rarely of interest.
V4L2_CAP_STREAMING is required to support streaming input and
output which includes userspace buffer pointers and memory mapping.
V4L2_CAP_VIDEO_OUTPUT and V4L2_CAP_VIDEO_OVERLAY would identify
video output and overlay devices, V4L2_CAP_VBI_CAPTURE and V4L2_CAP_VBI_OUTPUT
raw VBI devices. In the same category are V4L2_CAP_SLICED_VBI_CAPTURE
and V4L2_CAP_SLICED_VBI_OUTPUT.
V4L2_CAP_RDS_CAPTURE devices allow one to capture RDS packets, V4L2_CAP_RDS_OUTPUT
is an RDS encoder
V4L2_CAP_VIDEO_OUTPUT_OVERLAY signals that the device supports video
output overlay
V4L2_CAP_HW_FREQ_SEEK supports hardware frequency seeking
V4L2_CAP_VIDEO_CAPTURE_MPLANE and V4L2_CAP_VIDEO_OUTPUT_MPLANE signal
input and output support for multiplanar formats.
V4L2_CAP_VIDEO_M2M_MPLANE indicates multi planar format support on
memory to memory devices.
V4L2_CAP_VIDEO_M2M identifies a memory to memory device.
V4L2_CAP_TUNER for tuner support, V4L2_CAP_AUDIO for audio
as well as V4L2_CAP_RADIO for radio and V4L2_CAP_MODULATOR for
modulator support.
The first thing to check for when capturing from a webcam or video camera is
that the device really supports V4L2_CAP_VIDEO_CAPTURE and either
the V4L2_CAP_READWRITE mode for single frame capture or V4L2_CAP_STREAMING
for mmap or userptr mode.
Since ioctl calls can be interrupted - which is indicated by an EINTR
error code - libraries usually supply an xioctl wrapper that retries the ioctl
until it either succeeds or fails with a different error:
static int xioctl(int fh, int request, void *arg) {
    int r;

    do {
        r = ioctl(fh, request, arg);
    } while ((r == -1) && (errno == EINTR));

    return r;
}
To fetch the capability flags one simply uses this xioctl method and
checks for the required flags:
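A minimal sketch of such a check - the error constants reuse the enumeration from deviceOpen above:

struct v4l2_capability cap;

memset(&cap, 0, sizeof(cap));
if(xioctl(hHandle, VIDIOC_QUERYCAP, &cap) == -1) {
    return cameraE_Failed; /* Query failed - probably not a V4L2 device */
}

if((cap.capabilities & V4L2_CAP_VIDEO_CAPTURE) == 0) {
    return cameraE_Failed; /* Not a video capture device */
}
if((cap.capabilities & V4L2_CAP_STREAMING) == 0) {
    return cameraE_Failed; /* Streaming I/O (mmap / userptr) is not supported */
}

printf("Driver: %s, Card: %s, Bus: %s\n", (const char*)cap.driver, (const char*)cap.card, (const char*)cap.bus_info);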
The next step is to query cropping capabilities and pixel aspects. This is done using
the VIDIOC_CROPCAP call. This call requires a pointer to a struct v4l2_cropcap
that's initialized to the requested stream type and then filled by the driver. Since this
blog post describes video capture, the buffer type will be V4L2_BUF_TYPE_VIDEO_CAPTURE.
Now one can simply call the driver:
struct v4l2_cropcap cropcap;

memset(&cropcap, 0, sizeof(cropcap));
cropcap.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;

if(xioctl(hHandle, VIDIOC_CROPCAP, &cropcap) == -1) {
    return cameraE_Failed; /* failed to fetch crop capabilities */
    /*
        Note that some applications simply ignore this error
        and simply don't set any cropping rectangle later on
        since there are drivers that don't support cropping.
    */
}
The v4l2_cropcap structure contains three interesting members:
bounds is a struct v4l2_rect that specifies the boundary of the
window in which cropping is possible - this is the maximum possible window size.
defrect is the default cropping rectangle that would cover the whole
image. For a pixel aspect ratio of 1:1 this would for example be 640 × 480 for NTSC
images.
The last interesting value is pixelaspect which is a struct v4l2_fract.
This specifies the aspect ratio (y/x) when no scaling is applied. This is the ratio
required to get square pixels.
Each rect contains left, top, width and height.
Initializing device
Setting cropping region
After querying one can initialize cropping - for example to the default cropping
rectangle that should usually cover the whole image. This is done using
the VIDIOC_S_CROP call supplying a struct v4l2_crop. Usually this
should not be required, but since there are drivers that do not initialize to
the default cropping rectangle it's a good idea anyway. The structure basically
only contains the stream type and a cropping rectangle c.
struct v4l2_crop crop;

/*
    Note that this should only be done if VIDIOC_CROPCAP was successful
*/
crop.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
crop.c = cropcap.defrect;

if(xioctl(hHandle, VIDIOC_S_CROP, &crop) == -1) {
    /* Failed. Maybe only not supported (EINVAL) */
}
Format negotiation
To be able to negotiate a format one should usually query the formats supported
by the device to locate one supported by the application. The code sample
accompanying this blog post does not perform this negotiation but simply
assumes the webcam supports the YUYV color model and at least 640x480
resolution to make the code easier to read. But I'll cover the format negotiation
here - it's rather simple.
The first thing one has to know is that there are two major basic representations
for colors used:
A single value per primary color (red-green-blue or RGB models)
Luma and chroma based models that use relative luminance (Y) and
chrominance channels (usually for red and blue, called Cr and Cb). In general
one might associate these two chrominance channels with different wavelengths,
out of which the generic names U and V emerged - in most use cases UV equals CrCb
but technically that would not be required.
The main advantage of luma and chroma based models is that one immediately has
a grayscale image available when just looking at the luma channel. This is also
how these encoding schemes emerged historically - YUV models just added two
subcarrier encoded chroma channels to transmit color information in addition
to the backwards compatible grayscale image for TV usage.
RGB models on the other hand are usually easier to use on modern input and output
devices.
All color models basically carry the same information but, depending on their
encoding, at different resolutions and scales. Nearly all models allow one
to add an optional alpha channel that covers transparency. Since we're interested
in video capture, alpha channels usually don't play a role.
The biggest difference between the color formats is the way they encode the data.
Again there are two major encoding methods:
Planar, where there is a separate buffer (plane) for each channel
Interleaved, where all information is encoded per pixel (or pixel group). For
RGB888 for example there are 3 bytes per pixel that encode the red, green
and blue channels, followed by the next 3 bytes for the next pixel and so on.
Depending on the chosen format the information for each channel may be of the
same amount or there may be different amounts of information per pixel. For
the commonly used YUYV format (that's also selected by the example and is
often called YUV 4:2:2) there are, for every two pixels, two luminance values
but only one U and one V value shared by both. The idea is that the human eye
is more sensitive to luminance changes than to chroma changes so one has to encode
far less chromatic information. These four values then occupy - for YUYV - four
bytes in a specific pattern that has to be decoded.
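For illustration, this is the byte layout of two horizontally adjacent pixels in a YUYV buffer:

/*
    YUYV (YUV 4:2:2) byte layout for two adjacent pixels:

    byte 0: Y0 - luminance of pixel 0
    byte 1: U  - blue chrominance, shared by pixel 0 and pixel 1
    byte 2: Y1 - luminance of pixel 1
    byte 3: V  - red chrominance, shared by pixel 0 and pixel 1

    A 640x480 YUYV frame therefore occupies 640 * 480 * 2 bytes.
*/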
There is a huge number of supported formats - the usual way to handle this inside
media processing libraries is to decide on one or two internally supported formats
and to decode as well as re-encode at the application boundaries. For example I
personally usually decide to support:
RGB888 with 3 interleaved bytes per pixel encoding R, G and B as 8 bit values
YUV888 encoding a luma and two chroma channels per pixel in an interleaved
way.
A grayscale Y only format. This is particularly interesting in case one
wants to do CV. It's of course possible to access a YUV image with a stride
of 3 but having a more compact representation is often useful.
For more specialized algorithms I personally also use:
RGB with double precision values. This is also encoded interleaved and I usually
use it when doing HDR reconstruction or calculations. Since file formats and
output devices usually do not support such numeric ranges one has to tone-map
again in the end.
Grayscale with double precision values. Again this is used for some specialized
applications - like for example integral images of luminance plots (which
are especially interesting for classifier cascades built on top of wavelets)
To determine which formats a capture device supports one can use the VIDIOC_ENUM_FMT
function call. This is built around the struct v4l2_fmtdesc structure:
The basic idea is that the application just fills the index and type
fields, calls the VIDIOC_ENUM_FMT function and the driver fills the fields
with available information. To query information about our capture device
one will iterate the index value from 0 and count upwards till the
driver fails with an error code of EINVAL. The type has to be set
to V4L2_BUF_TYPE_VIDEO_CAPTURE:
for(int idx = 0;; idx = idx + 1) {
    struct v4l2_fmtdesc fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    fmt.index = idx;

    if(xioctl(hHandle, VIDIOC_ENUM_FMT, &fmt) < 0) {
        /* Failed, usually one should check the error code (EINVAL means no more formats) ... */
        break;
    }

    /* We got some format information. For demo purposes just display it */
    printf(
        "Detected format %08x (is compressed: %s): %s\n",
        fmt.pixelformat,
        ((fmt.flags & V4L2_FMT_FLAG_COMPRESSED) != 0) ? "yes" : "no",
        fmt.description
    );
}
Setting the format
The next step is setting the desired format. There are three calls involved with
setting, trying or getting the format:
VIDIOC_G_FMT queries the current format
VIDIOC_S_FMT sets the format (but might change the width and height)
VIDIOC_TRY_FMT passes a format to the driver like S_FMT but does not
change the driver state. It fails if the format is not supported and might change width/height
just like S_FMT does. Note that drivers are not required to implement this call so it
might also fail every time.
Setting the format usually requires negotiating it with the device, but most webcams
support the YUYV color format and an interlaced field layout. This can be set
in a struct v4l2_format:
struct v4l2_format fmt;
unsigned int width, height;

/*
    Select 640 x 480 resolution (you should use dimensions
    as previously determined while setting cropping parameters),
    YUYV color format and interlaced field order
*/
memset(&fmt, 0, sizeof(fmt));
fmt.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
fmt.fmt.pix.width = 640;
fmt.fmt.pix.height = 480;
fmt.fmt.pix.pixelformat = V4L2_PIX_FMT_YUYV;
fmt.fmt.pix.field = V4L2_FIELD_INTERLACED;

if(xioctl(hHandle, VIDIOC_S_FMT, &fmt) == -1) {
    /* Failed to set format ... */
}

/* Now one should query the real size the driver selected ... */
width = fmt.fmt.pix.width;
height = fmt.fmt.pix.height;
In some code like v4l2grab there is some additional handling of buggy
drivers. Since webcams are usually cheap products there are some buggy
drivers, so on Linux these tools check that fmt.fmt.pix.bytesperline is at least
two times fmt.fmt.pix.width and that fmt.fmt.pix.sizeimage
is at least 2 * fmt.fmt.pix.width * fmt.fmt.pix.height.
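The corresponding fix-up - modeled on the workaround found in such capture examples - might look like this:

unsigned int min;

/* Work around buggy drivers that report too small values */
min = fmt.fmt.pix.width * 2;
if(fmt.fmt.pix.bytesperline < min) {
    fmt.fmt.pix.bytesperline = min;
}
min = fmt.fmt.pix.bytesperline * fmt.fmt.pix.height;
if(fmt.fmt.pix.sizeimage < min) {
    fmt.fmt.pix.sizeimage = min;
}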
Capturing frames
Streaming I/O using mmap
The interface supported for most webcams is streaming I/O using memory mapped
buffers. This has been the most efficient streaming method for a long time - allowing
an application to virtually map device memory areas (for example memory contained
on an PCI capture card) directly into application memory. Later on a second method
using userptr has been added that allows one also to exploit DMA transfer
into real main memory when using devices supporting busmastering. For cheap USB
webcams this usually doesnāt make a difference though and userptr streaming I/O
mode is usually not supported by most hardware anyways.
Note that there is no way for a driver to indicate which streaming methods
it supports other than actually requesting the allocation of buffers.
The basic idea is:
The application requests a number of buffers to be allocated inside the
driver's address space. Buffers for use with the mmap method have
to be allocated using the V4L2_MEMORY_MMAP memory type via
the VIDIOC_REQBUFS ioctl. Note that though the buffer descriptors
seem to contain real memory offsets these are just some kind of magic cookie
that is used by the driver to recognize the allocated buffers (for example
they might be real addresses or just arbitrary identifiers).
After a given number of buffers has been allocated successfully they can be
mapped into the application's address space using virtual memory mapping.
There is either a single buffer per frame in single-planar mode or multiple
buffers per frame in multi-planar mode.
Buffers are allocated in the dequeued state, i.e. the device won't write
data into them. To allow writing into a buffer it has to
be enqueued using VIDIOC_QBUF. Whenever a buffer has been filled
successfully it has to be dequeued using VIDIOC_DQBUF.
The buffer state (mapped, enqueued, full, empty) can be queried using VIDIOC_QUERYBUF
The dequeue ioctl can be executed synchronously, which is the default behavior, or
asynchronously - the device file descriptor is then used like a network socket with the
select, poll or kqueue event notification mechanisms to determine when new frames are ready.
Streaming can be started and stopped using VIDIOC_STREAMON and VIDIOC_STREAMOFF
There is a common structure used by the queue and dequeue operations that's
called struct v4l2_buffer. This structure contains:
An index. This is a linear index into a sequence of allocated buffers - used only
with memory mapped buffers.
The type which identifies either input (V4L2_BUF_TYPE_VIDEO_CAPTURE) or
output (V4L2_BUF_TYPE_VIDEO_OUTPUT) buffers.
The size in bytes (length). The size of the allocated buffer has to
be able to contain a full frame of the requested data. After dequeueing
a capture buffer the driver has also set bytesused which might be equal to or
smaller than length. For output buffers bytesused is set by
the application to indicate the size of the data really used.
field indicates the field order of the captured frame (relevant for interlaced video).
timestamp might be set to indicate when the buffer had been captured. For
output the timestamp can specify at which point in time the buffer should
be transmitted by the output device.
timecode is another method to determine the position inside the data
stream.
sequence allows tracking of lost frames. It's a monotonically increasing
sequence number.
memory indicates the type of the buffer (memory mapped or userptr)
userptr and offset, contained in the same union, provide either a pointer
into the application's user mode memory range (for userptr I/O) or a cookie
to pass as the offset to mmap (for memory mapped I/O).
input would allow switching between multiple supported data sources on
the same device.
flags can be a combination of:
V4L2_BUF_FLAG_MAPPED indicates that a buffer is mapped into the application
address space.
V4L2_BUF_FLAG_QUEUED indicates a buffer is currently enqueued for the
device driver to be used. The application should not modify the buffer. The
buffer is said to be in the driver incoming queue.
V4L2_BUF_FLAG_DONE indicates that a buffer is already processed by
the driver and is waiting to be dequeued by the application.
V4L2_BUF_FLAG_KEYFRAME signals that a buffer contains a keyframe - which is
interesting when resynchronizing within compressed streams.
V4L2_BUF_FLAG_TIMECODE is set whenever the timecode field is valid.
V4L2_BUF_FLAG_INPUT is only set if the input field is valid.
As shown in the outline above the first step is to request buffers from the
device driver. One can request multiple buffers - the driver itself determines
the lower (!) and upper bound on the number of buffers that can be requested.
It's a good idea to support a variable number of buffers in case the driver
requires one to use more or fewer.
To request buffers one can use the VIDIOC_REQBUFS ioctl, which on the driver side
resembles the function call int (*vidioc_reqbufs) (struct file *file, void *private_data, struct v4l2_requestbuffers *req);
The struct v4l2_requestbuffers structure contains:
The number of requested buffers count. This is an input and output field
that might be increased or decreased arbitrarily by the driver. Note that setting
count to 0 has the special meaning of releasing all buffers.
The type (V4L2_BUF_TYPE_VIDEO_CAPTURE or V4L2_BUF_TYPE_VIDEO_OUTPUT)
of the buffers
A memory specifier. This identifies how the buffer memory is passed. For buffers
that are mapped into userspace the V4L2_MEMORY_MMAP constant is used;
if one wanted to use userptr style DMA transfers one would set the
constant to V4L2_MEMORY_USERPTR.
If the driver does not support mmap (or userptr mode, if that has been requested)
it will return EINVAL. This is the only way to determine the supported
streaming data transfer mode.
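A sketch of such a buffer request - here asking for 4 buffers and accepting whatever count the driver settles on; the resulting bufferCount is used by the mapping loop below:

struct v4l2_requestbuffers reqBuffers;
unsigned int bufferCount;

memset(&reqBuffers, 0, sizeof(reqBuffers));
reqBuffers.count = 4; /* Wish for 4 buffers, the driver may adjust this */
reqBuffers.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
reqBuffers.memory = V4L2_MEMORY_MMAP;

if(xioctl(hHandle, VIDIOC_REQBUFS, &reqBuffers) == -1) {
    if(errno == EINVAL) {
        /* Memory mapped streaming is not supported by this device */
    }
    return cameraE_Failed;
}
if(reqBuffers.count < 2) {
    /* Not enough buffers granted for seamless streaming */
    return cameraE_Failed;
}
bufferCount = reqBuffers.count;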
After the buffers have been requested they have to be mapped into memory. To do
so one has to query each buffer using VIDIOC_QUERYBUF to determine the parameters that
will be passed to mmap, in the same way as when mapping a memory mapped file.
On entry into QUERYBUF one just has to pass type and index.
struct imageBuffer* lpBuffers;
{
    lpBuffers = calloc(bufferCount, sizeof(struct imageBuffer));
    if(lpBuffers == NULL) {
        printf("%s:%u Out of memory\n", __FILE__, __LINE__);
        deviceClose(hHandle);
        return 2;
    }

    int iBuf;
    for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
        struct v4l2_buffer vBuffer;
        memset(&vBuffer, 0, sizeof(struct v4l2_buffer));

        /*
            Query a buffer identifying magic cookie from the driver
        */
        vBuffer.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        vBuffer.memory = V4L2_MEMORY_MMAP;
        vBuffer.index = iBuf;

        if(xioctl(hHandle, VIDIOC_QUERYBUF, &vBuffer) == -1) {
            printf("%s:%u Failed to query buffer %d\n", __FILE__, __LINE__, iBuf);
            deviceClose(hHandle);
            return 2;
        }

        /*
            Use the mmap syscall to map the driver's buffer into our
            address space at an arbitrary location.
        */
        lpBuffers[iBuf].lpBase = mmap(NULL, vBuffer.length, PROT_READ|PROT_WRITE, MAP_SHARED, hHandle, vBuffer.m.offset);
        lpBuffers[iBuf].sLen = vBuffer.length;
        if(lpBuffers[iBuf].lpBase == MAP_FAILED) {
            printf("%s:%u Failed to map buffer %d\n", __FILE__, __LINE__, iBuf);
            deviceClose(hHandle);
            return 2;
        }
    }
}
Then one has to enqueue all buffers that one wants to provide to the driver (typically
all of them before starting the processing loop) by using the VIDIOC_QBUF
function. One just has to supply type, memory and index when using memory
mapped buffers.
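Enqueueing all previously mapped buffers might look like this:

{
    int iBuf;
    for(iBuf = 0; iBuf < bufferCount; iBuf = iBuf + 1) {
        struct v4l2_buffer vBuffer;
        memset(&vBuffer, 0, sizeof(struct v4l2_buffer));
        vBuffer.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
        vBuffer.memory = V4L2_MEMORY_MMAP;
        vBuffer.index = iBuf;

        if(xioctl(hHandle, VIDIOC_QBUF, &vBuffer) == -1) {
            /* Failed to enqueue buffer iBuf ... */
        }
    }
}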
Whenever the device is ready the processing loop will use VIDIOC_DQBUF to
pop the oldest filled buffer from the output queue. This is a blocking call - it
can also be combined with the standard select, epoll or kqueue
asynchronous processing functions in case O_NONBLOCK had been set during
open. Usually one wants to re-enqueue the buffer after having
finished processing it or after having copied the data for further processing.
The last two important functions start and stop stream processing. These
are VIDIOC_STREAMON and VIDIOC_STREAMOFF. Of course one should
start streaming before running the event processing loop, as shown in the sketch below.
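Putting these pieces together, a minimal capture loop could look like the following sketch. The processImage function is a hypothetical placeholder for whatever the application does with a frame - for example the YUYV to RGB conversion and JPEG writing described below:

enum v4l2_buf_type bufType = V4L2_BUF_TYPE_VIDEO_CAPTURE;

if(xioctl(hHandle, VIDIOC_STREAMON, &bufType) == -1) {
    /* Failed to start streaming ... */
}

for(;;) {
    fd_set fds;
    struct timeval tv;
    struct v4l2_buffer vBuffer;

    /* Wait (with timeout) until the driver signals a filled buffer */
    FD_ZERO(&fds);
    FD_SET(hHandle, &fds);
    tv.tv_sec = 2;
    tv.tv_usec = 0;
    if(select(hHandle + 1, &fds, NULL, NULL, &tv) <= 0) {
        continue; /* Timeout or interrupted system call - just retry */
    }

    /* Dequeue the oldest filled buffer */
    memset(&vBuffer, 0, sizeof(vBuffer));
    vBuffer.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
    vBuffer.memory = V4L2_MEMORY_MMAP;
    if(xioctl(hHandle, VIDIOC_DQBUF, &vBuffer) == -1) {
        if(errno == EAGAIN) { continue; } /* Nothing ready yet (non blocking mode) */
        break; /* Real error */
    }

    /* vBuffer.index identifies the mapped buffer containing the frame */
    processImage(lpBuffers[vBuffer.index].lpBase, vBuffer.bytesused);

    /* Hand the buffer back to the driver so it can be reused */
    if(xioctl(hHandle, VIDIOC_QBUF, &vBuffer) == -1) {
        break;
    }
}

if(xioctl(hHandle, VIDIOC_STREAMOFF, &bufType) == -1) {
    /* Failed to stop streaming ... */
}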
The usage of the read/write interface will (hopefully) be added in the near future. Note
that it's usually not supported by webcams on FreeBSD anyway.
Writing frames into a JPEG file using libjpeg
The process of writing a raw image into a JPEG file has been discussed
in a previous blog post. The major remaining
task is to convert the captured image into the format accepted by libjpeg. In
my application I had to convert the YUYV (YUV 4:2:2) format into RGB888. In YUYV there
are always two luminance values as well as a single set of chroma values per
sample - two pixels share the chroma values but have different luminance values.
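A minimal conversion sketch, assuming the common integer BT.601 approximation (the exact coefficients differ slightly between implementations):

/*
    Convert a YUYV (YUV 4:2:2) buffer into a packed RGB888 buffer.
    Every 4 input bytes (Y0 U Y1 V) describe two output pixels (6 bytes).
*/
static void yuyvToRgb888(
    const unsigned char* lpIn,
    unsigned char* lpOut,
    unsigned int width,
    unsigned int height
) {
    unsigned int i;
    for(i = 0; i < (width * height) / 2; i = i + 1) {
        int y0 = lpIn[4*i+0];
        int u  = lpIn[4*i+1] - 128;
        int y1 = lpIn[4*i+2];
        int v  = lpIn[4*i+3] - 128;
        int j;

        for(j = 0; j < 2; j = j + 1) {
            int y = (j == 0) ? y0 : y1;
            int r = y + ((351 * v) >> 8);
            int g = y - ((179 * v + 86 * u) >> 8);
            int b = y + ((443 * u) >> 8);

            /* Clamp each channel to the valid 0..255 range */
            lpOut[6*i+3*j+0] = (unsigned char)((r < 0) ? 0 : ((r > 255) ? 255 : r));
            lpOut[6*i+3*j+1] = (unsigned char)((g < 0) ? 0 : ((g > 255) ? 255 : g));
            lpOut[6*i+3*j+2] = (unsigned char)((b < 0) ? 0 : ((b > 255) ? 255 : b));
        }
    }
}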
Simple sample (FreeBSD, streaming mmap)
External references
One really great resource that I've found while writing this article
has been the Video4Linux2 API introduction on
LWN.