WebRTC Media Streams
14 Apr 2020 - tsp
Last update 05 Jul 2020
8 mins
WebRTC is currently a really hot topic in the web development and JavaScript
community. It's a set of protocols and APIs that can be used for real-time
communication from within the typical browser environment. It supports APIs
that allow media capture into media streams (from screen sharing, webcams
and other similar sources) as well as APIs and protocols that allow for
peer to peer communication from within the browser. Strictly speaking these
two features are not required to be used together. One can use the
media capture APIs to simply gather data inside the browser, do some
image processing (like implementing object recognition using tensorflow.js)
or perform any other manipulation on the captured data. On the other hand
the streaming APIs can be used for peer to peer communication between
browsers and provide a more flexible way of communication than classical
WebSockets, which only allow communicating with a webserver. They can be used
for audio and video streams, file transfers and any other data transfer between
clients. To establish P2P connections the framework also uses Interactive
Connectivity Establishment (ICE) as specified by RFC 8445.
This allows clients behind network address translation to build up P2P connections
in many cases (at least via UDP; there exists a similar technique, STUNT, for
TCP, but that would require root privileges on network interfaces to inject
fake packets into non-established TCP connections).
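To give an idea what ICE looks like from the JavaScript side, here is a minimal
sketch of creating an RTCPeerConnection with a STUN server configured - the
server URL is just a placeholder, not a real endpoint:

let peerConnection = new RTCPeerConnection({
    iceServers : [
        // Placeholder STUN server, replace with your own infrastructure
        { urls : "stun:stun.example.org:3478" }
    ]
});

// ICE candidates discovered by the browser are delivered via events and
// have to be forwarded to the remote peer over some signaling channel
peerConnection.onicecandidate = (event) => {
    if(event.candidate) {
        // send event.candidate to the remote peer here
    }
};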
The following article provides a summary on how to capture media; the data
transfer will be handled in a later blog post.
First off there are two different media device classes known to WebRTC:
userMedia covers webcams, microphones and similar capture devices
displayMedia is primarily used to implement screen sharing
Requesting a media device is always an asynchronous action. The user
will always be asked - for webcams on the first visit of the application,
for desktop sharing on every request - which device to use or which window
or screen should be shared. As usual for asynchronous operations these APIs
return a promise that can be used to determine when the action has completed.
If one wants to code in a synchronous style one has, as usual, to wrap the
call in an async function and can then await the promise. If one prefers
a more event driven style one can attach a handler using then instead.
The following example is going to illustrate both ways of querying
a shared screen or a webcam and assigning the resulting stream as a local
video source. The video target is simply the well known HTML5 video element:
Note: Currently browsers require webpages to be served from a secure context
for these APIs to be available. That means the page cannot be opened from a
local file and cannot be served over plain HTTP (an exception is usually made
for localhost).
<!DOCTYPE html>
<html>
    <head>
        <title> Simple capture example </title>
    </head>
    <body>
        <h1 id="top">Simple capture example</h1>
        <!-- controls is a boolean attribute: omitting it disables the controls,
             controls="false" would still enable them -->
        <video id="videoTarget" autoplay playsinline> </video>
    </body>
</html>
Now one can simply request the media device for screen sharing and after
successful completion assign it to a video element:
async function screenCaptureExample() {
    if(!navigator.mediaDevices) {
        console.log("Failed to query media devices");
        return;
    }

    let targetElement = document.getElementById('videoTarget');
    let mediaStream = await navigator.mediaDevices.getDisplayMedia();
    targetElement.srcObject = mediaStream;
}
Alternatively one can use the event driven approach:
function screenCaptureEventExample() {
    if(!navigator.mediaDevices) {
        console.log("Failed to query media devices");
        return;
    }

    // getDisplayMedia returns a promise; the stream only becomes available in then()
    let mediaStreamPromise = navigator.mediaDevices.getDisplayMedia();
    mediaStreamPromise.then((stream) => {
        let targetElement = document.getElementById('videoTarget');
        targetElement.srcObject = stream;
    }).catch((err) => {
        console.log("Capture example failed: " + err);
    });
}
The same method can be used to query user media (for example webcams):
async function webcamCaptureExample() {
    if(!navigator.mediaDevices) {
        console.log("Failed to query media devices");
        return;
    }

    let targetElement = document.getElementById('videoTarget');
    let mediaStream = await navigator.mediaDevices.getUserMedia({ audio : false, video : true });
    targetElement.srcObject = mediaStream;
}
function webcamCaptureEventExample() {
    if(!navigator.mediaDevices) {
        console.log("Failed to query media devices");
        return;
    }

    let mediaStreamPromise = navigator.mediaDevices.getUserMedia({ audio : false, video : true });
    mediaStreamPromise.then((stream) => {
        let targetElement = document.getElementById('videoTarget');
        targetElement.srcObject = stream;
    }).catch((err) => {
        console.log("Capture example failed: " + err);
    });
}
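To actually launch these functions one has to wire them to some user
interaction - getDisplayMedia in particular usually has to be invoked from a
user gesture such as a button click. A minimal sketch of such wiring might
look like this (the button IDs are just examples):

<button id="startScreenCapture"> Share screen </button>
<button id="startWebcamCapture"> Start webcam </button>

window.onload = function() {
    document.getElementById('startScreenCapture').addEventListener('click', () => {
        screenCaptureExample();
    });
    document.getElementById('startWebcamCapture').addEventListener('click', () => {
        webcamCaptureExample();
    });
}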
As one can see one has to specify constraints for the user media source (i.e.
should it support audio or video). It's possible to specify additional constraints
like a minimum or maximum resolution for video:
{
    video : {
        width : { min : 320, ideal : 640, max : 1280 },
        height : { min : 240, ideal : 480, max : 720 }
    }
}
Note that there are additional constraints (for example to select a specific
device via its deviceId, etc.). Most of them won't be covered in this short
blog post, but a short sketch of selecting a specific camera is shown below.
Taking snapshots
Taking snapshots of video frames into a canvas is pretty simple. Just create
a 2D context and use drawImage to transfer image data from the video frame
into the canvas. This works because drawImage directly accepts an
HTMLVideoElement as its image source.
To demonstrate this we just add a button and a simple canvas to the page:
<button id="takeSnapshot"> Take snapshot </button>
<canvas id="snapshotCanvas" width="320" height="240"> </canvas>
window.onload = function() {
    // ... old code ...
    document.getElementById('takeSnapshot').addEventListener('click', () => {
        takeSnapshotToCanvas();
    });
}
function takeSnapshotToCanvas() {
    let canvas = document.getElementById("snapshotCanvas");
    let ctx = canvas.getContext('2d');
    let videoElement = document.getElementById('videoTarget');

    // drawImage accepts the video element directly and copies the current frame
    ctx.drawImage(videoElement, 0, 0, 320, 240);
}
Doing simple manipulation directly inside the browser
Now one can do simple image manipulation inside the canvas as usual. For
example we can implement a simple greyscale filter, again just to show how
to access pixels. Note that this is by far not the most performant way
to do such things - in fact it's the least performant one. One shouldn't
use this in production, but it's nice for experimentation.
Again we'll add another button to the page and attach an event handler
inside the onload function:
<button id="takeSnapshotGreyscale"> Take snapshot in greyscale </button>
window.onload = function() {
// ... old code ...
document.getElementById('takeSnapshotGreyscale').addEventListener('click', () => {
takeSnapshotToCanvas();
filterCanvas();
});
}
function filterCanvas() {
    let canvas = document.getElementById("snapshotCanvas");
    let ctx = canvas.getContext('2d');

    let imgData = ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height);
    let pixelData = imgData.data;

    // Pixels are stored as RGBA quadruples; replace R, G and B with the
    // luminance value and keep the alpha channel untouched.
    for(let i = 0; i < pixelData.length; i += 4) {
        let intensity = Math.round(pixelData[i] * 0.299 + pixelData[i+1] * 0.587 + pixelData[i+2] * 0.114);
        pixelData[i] = intensity;
        pixelData[i+1] = intensity;
        pixelData[i+2] = intensity;
    }

    ctx.putImageData(imgData, 0, 0);
}
Converting a captured image into something to be passed around
The easiest way is to use a data URL - in this case the whole image content will
be Base64 encoded in the requested format so it can be passed around or stored
as a URL safe parameter:
let canvas = document.getElementById("snapshotCanvas");
let dataUri = canvas.toDataURL("image/png");
// now dataUri can be passed to img.src or via any available data channel
There is a simple trick of assigning the image's data URI to the href
attribute of an anchor element (<a>) and dispatching a click event on it.
This triggers the browser's default action for the link, i.e. the image
is opened or downloaded.
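A minimal sketch of that trick might look like the following - the filename
is of course arbitrary:

function downloadSnapshot() {
    let canvas = document.getElementById("snapshotCanvas");

    // Create a temporary anchor, point it at the data URI and click it
    // programmatically so the browser runs its default download action
    let anchor = document.createElement('a');
    anchor.href = canvas.toDataURL("image/png");
    anchor.download = "snapshot.png";
    document.body.appendChild(anchor);
    anchor.click();
    document.body.removeChild(anchor);
}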
As one has seen above it's pretty simple to access pixels out of a canvas.
This filter and snapshot function can of course be applied on a continuous
basis. For example one can use the requestAnimationFrame function to
periodically fetch frames from the video element and transfer them into
a canvas. This canvas doesn't have to be visible on the page - it can
be kept as a JavaScript object. The same is true for the video element:
var videoElement = document.createElement('video');
videoElement.srcObject = mediaStream;
// A detached video element has to be started explicitly
videoElement.play();

var processingBuffer = document.createElement('canvas');
processingBuffer.width = 320;
processingBuffer.height = 240;

function processFrame() {
    let ctx = processingBuffer.getContext('2d');
    ctx.drawImage(videoElement, 0, 0, 320, 240);

    // Add additional manipulation code here

    window.requestAnimationFrame(processFrame);
}
processFrame();
Now that's of course only half of the story since the slow pixel by
pixel access stays the same, which is not really wise from a performance
standpoint. One should - if possible - use the power of the GPU to
perform such calculations. And that's exactly what one can do using WebGL shaders.
The trick is to pass the original canvas content as a texture into the
GLSL pipeline and do the processing inside the fragment shader, using
a pretty simple vertex model (for example a quad consisting of two triangles)
onto which the original image is mapped as a texture (data access into
the texture happens using texture2D inside the fragment shader).
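As a small teaser, a sketch of how the greyscale filter from above might look
as a fragment shader - the uniform and varying names are just examples and
have to match the rest of the WebGL pipeline:

const greyscaleFragmentShader = `
    precision mediump float;
    uniform sampler2D uSampler;    /* the video frame uploaded as a texture */
    varying vec2 vTextureCoord;    /* texture coordinate from the vertex shader */

    void main() {
        vec4 color = texture2D(uSampler, vTextureCoord);
        float intensity = dot(color.rgb, vec3(0.299, 0.587, 0.114));
        gl_FragColor = vec4(vec3(intensity), color.a);
    }
`;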
This article might be expanded to contain a full sample of such processing
soon.
This article is tagged: Programming, Computer Vision