Ermir's programming writings - Ermir Suldashi writes about programming and other topics

TL;DR

The following code is a minimal implementation of a WebRTC video connection between two peers, written in browser JavaScript. The implementation has four major parts, each explained in detail below:

Offer/answer exchange between peers
ICE candidate exchange
Handling incoming/outgoing streams
Signaling server

The Code

class PeerConnection {
	constructor() {
		this.incomingIceBuffer = [];
		this.outgoingIceBuffer = [];
		this.canAcceptIce = false;
		this.pc = new RTCPeerConnection();
		this.pc.onicecandidate = (event) => {
			if(event && event.candidate) {
				this.outgoingIceBuffer.push(event.candidate);
			}
		}
	}

	addIceCandidate(candidate) {
		if(candidate) {
			if(this.canAcceptIce) {
				this.pc.addIceCandidate(candidate);	
			}
			else {
				this.incomingIceBuffer.push(candidate);
			}		
		}
	}

	onIceCandidate(callback) {
		this.pc.onicecandidate = (event) => {
			if(event && event.candidate) {
				callback(event.candidate);	
			}
		}

		for(const candidate of this.outgoingIceBuffer) {
			callback(candidate);
		}
		this.outgoingIceBuffer = [];
	}

	startAcceptingIce() {
		this.canAcceptIce = true;
		for(const candidate of this.incomingIceBuffer) {
			this.pc.addIceCandidate(candidate);
		}
		this.incomingIceBuffer = [];
	}

	getRemoteStream() {
		let remoteReceivers = this.pc.getReceivers();
		return new MediaStream(remoteReceivers.map(el => el.track));
	}
}

class PeerConnectionSender extends PeerConnection {
	constructor() {
		super();
	}

	async createOffer() {
		let userStream = await navigator.mediaDevices.getUserMedia({
			audio: true,
			video: true
		});
		this.pc.addStream(userStream);
		let offer = await this.pc.createOffer();
		await this.pc.setLocalDescription(offer);
		return offer;
	}

	async setAnswer(answer) {
		await this.pc.setRemoteDescription(answer);
		this.startAcceptingIce();
	}
}

class PeerConnectionReceiver extends PeerConnection {
	constructor() {
		super();
	}

	async setOffer(offer) {	
		let userStream = await navigator.mediaDevices.getUserMedia({
			audio: true,
			video: true
		});
		this.pc.addStream(userStream);
		await this.pc.setRemoteDescription(offer);
		this.startAcceptingIce();
		let answer = await this.pc.createAnswer();
		await this.pc.setLocalDescription(answer);
		return answer;
	}
}

async function makeConnection() {
	let sender = new PeerConnectionSender();
	let receiver = new PeerConnectionReceiver();
	sender.onIceCandidate((candidate) => {
		receiver.addIceCandidate(candidate);
	});
	receiver.onIceCandidate((candidate) => {
		sender.addIceCandidate(candidate);
	});
	let offer = await sender.createOffer();
	let answer = await receiver.setOffer(offer);
	await sender.setAnswer(answer);
	let remoteStream = receiver.getRemoteStream();
	let sinkElement = document.getElementById("sinkElement");	
	sinkElement.srcObject = remoteStream;
}

Detailed explanation

Offer/answer exchange between peers

WebRTC is a peer-to-peer technology that is used for voice and video communication. The process of establishing a WebRTC connection begins with a negotiation between the two peers, where they exchange their capabilities through an offer/answer exchange. Peer A, who initiates the peer connection, generates an offer and sends it to peer B. This offer describes capabilities such as the number of streams, the resolution of the video streams, the bitrate and encoding of the audio streams, networking information, and other necessary data.

The offer is accepted by peer B, who then generates an answer and sends it back to peer A. Once peer A accepts the answer, the negotiation is complete. However, further information needs to be exchanged between the peers before data can actually start flowing between them.

There are two methods that are used to accept the offer or answer. They are called setLocalDescription and setRemoteDescription. The input of these functions is either the offer or answer, and depends on who generated them. In general, you need to call setLocalDescription on the offer or answer that you generated, and setRemoteDescription on the offer or answer sent by the other peer.

Peer A, who generated the initial offer, uses setLocalDescription on the offer that he generated, and setRemoteDescription on the answer he receives. Peer B first calls setRemoteDescription on the offer sent by Peer A, then calls setLocalDescription on the answer that he generated.

The code that implements this functionality is split between two classes: PeerConnectionSender, which represents the initiator of the offer, and PeerConnectionReceiver, which accepts the offer and generates the answer.
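The call order described above can be checked against the signalingState values that an RTCPeerConnection reports. The sketch below is a simplified model of those transitions in plain JavaScript, so it runs without a browser. The helper name nextSignalingState is hypothetical, and provisional answers and rollback are deliberately left out.

```javascript
// Simplified model of RTCPeerConnection.signalingState transitions.
// nextSignalingState is a hypothetical helper, not part of the WebRTC API.
// Provisional answers ("pranswer") and rollback are intentionally omitted.
const TRANSITIONS = {
	"stable": {
		"setLocalDescription:offer": "have-local-offer",
		"setRemoteDescription:offer": "have-remote-offer"
	},
	"have-local-offer": {
		"setRemoteDescription:answer": "stable"
	},
	"have-remote-offer": {
		"setLocalDescription:answer": "stable"
	}
};

function nextSignalingState(state, method, descriptionType) {
	const next = (TRANSITIONS[state] || {})[method + ":" + descriptionType];
	if (next === undefined) {
		throw new Error(method + "(" + descriptionType + ") is not valid in state " + state);
	}
	return next;
}

// Peer A: creates the offer, later receives the answer.
let peerA = "stable";
peerA = nextSignalingState(peerA, "setLocalDescription", "offer");
peerA = nextSignalingState(peerA, "setRemoteDescription", "answer");

// Peer B: receives the offer, then creates the answer.
let peerB = "stable";
peerB = nextSignalingState(peerB, "setRemoteDescription", "offer");
peerB = nextSignalingState(peerB, "setLocalDescription", "answer");
```

Both peers end up back in the "stable" state, which is the point at which the negotiation is complete. Calling the methods in any other order, such as applying an answer before any offer exists, throws in this model.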

ICE candidate exchange

The peers also need to exchange ICE candidates, which contain the networking information needed to establish the peer-to-peer link. ICE candidates are generated asynchronously once setLocalDescription has been called. Therefore, you must assign an event handler to the onicecandidate property to capture them. In this implementation, the PeerConnection class installs a default handler that stores the candidates in an internal buffer, and a helper method, onIceCandidate, delivers the buffered candidates to your callback, in case you forget to supply a handler before calling setLocalDescription.

The generated ICE candidates should be sent to the remote peer, which adds them by calling addIceCandidate. The native addIceCandidate method cannot be called until setRemoteDescription has been called, so this implementation adds a helper method, also called addIceCandidate, that stores the incoming ICE candidates in a buffer until the peer connection is ready to accept them.
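One detail the class in this article glosses over is that the native addIceCandidate returns a promise, which rejects if the candidate cannot be applied. Below is a minimal sketch of draining a buffer while collecting failures. The helper name flushCandidates is hypothetical, and a stub object stands in for the peer connection so that the snippet also runs outside a browser.

```javascript
// flushCandidates is a hypothetical helper, not part of the WebRTC API.
// `pc` is anything exposing addIceCandidate(candidate) that returns a promise,
// such as an RTCPeerConnection whose remote description is already set.
async function flushCandidates(pc, buffer) {
	const failures = [];
	// splice(0) empties the buffer and hands back its previous contents.
	for (const candidate of buffer.splice(0)) {
		try {
			await pc.addIceCandidate(candidate);
		} catch (error) {
			failures.push({ candidate, error });
		}
	}
	return failures;
}

// Stub standing in for a real peer connection; it rejects one made-up candidate.
const stubPc = {
	added: [],
	async addIceCandidate(candidate) {
		if (candidate === "bad") {
			throw new Error("rejected");
		}
		this.added.push(candidate);
	}
};

const pending = ["c1", "bad", "c2"];
const flushResult = flushCandidates(stubPc, pending);
```

Here flushResult is a promise for the list of candidates that failed, so one bad candidate does not prevent the remaining ones from being applied.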

Once the ICE candidates have been sent and accepted by both parties, the connection is fully established and data can flow between the peers.

Handling incoming/outgoing streams

Of course, to communicate using WebRTC we must have a way to add our streams to the peer connection object and get the remote streams once the connection is established.

Local streams can come from many places, such as files, a canvas element, the microphone and camera, etc. In our example, we want to access the microphone and camera, and to do this we use the getUserMedia method of the navigator.mediaDevices object. Once called, getUserMedia prompts the user to give access to the requested devices, and if access is given, it resolves with a media stream object that is ready to be used. This media stream object is passed to the addStream method of the peer connection, which must be called before setting the local or remote description. Note that addStream is a legacy API; newer code generally calls addTrack once per track instead.

You can in fact add streams to the peer connection at a later point, but that use case is not covered here.

Once the WebRTC connection is established, getting the remote streams is not hard at all. On the peer connection, we get the remote receivers by calling getReceivers, and we pass the tracks of these receivers to the constructor of a new MediaStream object. In our case, there is a helper method called getRemoteStream that does all of this and returns the MediaStream object.

To display the remote video, simply assign the remote stream to a video element’s srcObject property.

After all of this effort, we finally got a WebRTC video chat working.

Signaling server

Throughout the article, it’s been taken for granted that the clients have some way of communicating and sending the offers, answers, and ICE candidates to each other. I will be explaining some aspects of how this communication should take place, but I will not be supplying any code, just some rough guidelines.

In our example, both peers are running in the same browser window, so they don’t need an external service to coordinate their messaging. In a real app, the two peers would be running in two different browser instances, meaning they would need to coordinate their communication through a centralized server. This server is responsible for registering the peers, sending the offers, answers, and ICE candidates to the proper recipients, and notifying the peers of any changes in the state of the call. The existence of this server makes the claim that WebRTC is a peer-to-peer technology a bit shaky: while it is true that the data flows directly between the two peers, a server is needed to coordinate between them.

The messaging channel to the server usually requires a real-time component, so WebSockets is my preferred tech in this case. Typically, you serialize the offer using JSON.stringify, then send it to the server over a WebSocket, and the server forwards it to the other peer. The other peer then deserializes the offer, generates the answer, and sends it back to the server the same way. The same technique can be used for ICE candidates, as well as control messages, such as user joined, user left, text messaging, and more.
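A minimal sketch of the kind of message such a channel might carry. The envelope shape ({ type, to, payload }), the peer name, and the function names are assumptions made for illustration; they are not part of WebRTC or WebSockets. The resulting JSON text is what would be handed to the WebSocket's send method.

```javascript
// Hypothetical message envelope for a signaling channel.
const SIGNAL_TYPES = ["offer", "answer", "candidate"];

function encodeSignal(type, to, payload) {
	if (!SIGNAL_TYPES.includes(type)) {
		throw new Error("Unknown signal type: " + type);
	}
	return JSON.stringify({ type, to, payload });
}

function decodeSignal(text) {
	const message = JSON.parse(text);
	if (!SIGNAL_TYPES.includes(message.type)) {
		throw new Error("Unknown signal type: " + message.type);
	}
	return message;
}

// Dispatches a decoded message to the handler registered for its type.
function handleSignal(text, handlers) {
	const message = decodeSignal(text);
	return handlers[message.type](message.payload);
}

// An offer serializes to a { type, sdp } object.
// The SDP string here is a placeholder, not a real session description.
const wireText = encodeSignal("offer", "peer-b", { type: "offer", sdp: "v=0 (placeholder)" });
const received = decodeSignal(wireText);
```

On the receiving side, the handlers object would map "offer" to something like the receiver's setOffer method, and "candidate" to addIceCandidate.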

Is that it?

Yes, this is a complete and working implementation, but there are quite a few edge cases to consider before it is ready for production, such as STUN and TURN servers, additional events, error handling, and more. Still, it is a good starting point that will get you pointed in the right direction.
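As one example of those production concerns, STUN and TURN servers are passed to the RTCPeerConnection constructor through the iceServers option. The sketch below only builds that configuration object. The helper name is hypothetical, and the server URLs and credentials are placeholders, not real services.

```javascript
// buildRtcConfiguration is a hypothetical helper; the URLs and credentials
// passed to it below are placeholders, not real servers.
function buildRtcConfiguration(stunUrls, turnServer) {
	const iceServers = [];
	if (stunUrls.length > 0) {
		iceServers.push({ urls: stunUrls });
	}
	if (turnServer) {
		iceServers.push({
			urls: turnServer.url,
			username: turnServer.username,
			credential: turnServer.credential
		});
	}
	return { iceServers };
}

const rtcConfiguration = buildRtcConfiguration(
	["stun:stun.example.org:3478"],
	{ url: "turn:turn.example.org:3478", username: "demo-user", credential: "demo-secret" }
);

// In a browser, the result would be used as: new RTCPeerConnection(rtcConfiguration)
```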

Calculating distances between elements is a common operation when dealing with graphics. In our case, we want the minimal distance between a point, represented by two coordinates $(P_x,P_y)$, and a line, represented by the line formula $y=ax+b$. There are many different methods of determining this distance, but we will be using calculus, so this method should hopefully be clear to anyone who understands the basics of calculus.

There are two major steps to solving our problem:

Find the distance between our point and all other points in the line
Pick out the smallest of these distances

Finding the distance between two points is not difficult; usually a slightly modified Pythagorean formula is used. Given two points $(x_1,y_1)$ and $(x_2,y_2)$, the distance $d$ is equal to:

$d=\sqrt{(x_2-x_1)^2+(y_2-y_1)^2}$

This formula will always give us a non-negative number, since distances can’t be negative. However, we are not trying to find the distance between two particular points, but between one point $(P_x,P_y)$ and ALL points on the line $y=ax+b$. So, we will have to modify our formula to accept not two points, but a point and two variables, $x$ and $y$, that will give us the distance between $(P_x,P_y)$ and our variables. However, our variable $y$ is dependent on $x$ ($y$ is a function of $x$), so we can rewrite our variables like so: $(x,f(x))$ where $f(x)=ax+b$.

$d(x)=\sqrt{(P_x-x)^2+(P_y-f(x))^2}$

So now we have a function, the distance function, that will give us the distance between our point $(P_x, P_y)$ and any point on the line $f(x)=ax+b$. So, given a particular $x$ value, we will have the distance between $(P_x,P_y)$ and $(x,f(x))$.
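As a quick sanity check, the distance function can be written directly in code. This is only a sketch; the function name distanceAt and the sample values are chosen here for illustration.

```javascript
// Distance between the point (px, py) and the point (x, a*x + b) on the line.
function distanceAt(px, py, a, b, x) {
	const fx = a * x + b;
	return Math.sqrt((px - x) ** 2 + (py - fx) ** 2);
}

// Point (1, 3), line y = 2x + 3, evaluated at x = 0: the line point is (0, 3),
// which lies exactly one unit to the left of (1, 3).
const sampleDistance = distanceAt(1, 3, 2, 3, 0);
```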

But how do we find the minimal value of our distance function? This is where the calculus part comes in. A differentiable function can reach its minimal value only where its derivative is equal to 0, so we need to calculate the derivative of our function $d(x)$. Let’s do it step by step, rewriting some of the terms to make it clearer.

$f(x)=ax+b$
$f'(x)=a$

$g(x)=(P_x-x)^2+(P_y-f(x))^2$
$g(x) = {P_x}^2-2{P_x}x+x^2+ {P_y}^2-2{P_y}f(x) + {f(x)}^2$
$g(x) = {P_x}^2-2{P_x}x+x^2+ {P_y}^2-2{P_y}ax-2{P_y}b + {a^2}{x^2}+2axb+{b^2}$

$g'(x)=-2{P_x}+2x-2{P_y}f'(x)+2f(x)f'(x)$
$g'(x)=-2{P_x}+2x-2{P_y}a+2{a^2}x +2ab$

Note the use of the chain rule on ${f(x)}^2$. The constant term $-2{P_y}b$ of $g(x)$ vanishes when differentiated.

$d(x) = \sqrt{g(x)}$

$d'(x) = {{g'(x)}\over{2\sqrt{g(x)}}}$

We are using the chain rule here as well, on the square root.

$d'(x)={{-2{P_x}+2x-2{P_y}a+2{a^2}x +2ab}\over{2\sqrt{{P_x}^2-2{P_x}x+x^2+ {P_y}^2-2{P_y}ax-2{P_y}b + {a^2}{x^2}+2axb+{b^2}}}}$

To find the minimal distance, we set $d'(x)=0$. Knowing this, we can greatly simplify the formula above:

$d'(x)={{-2{P_x}+2x-2{P_y}a+2{a^2}x +2ab}\over{2\sqrt{{P_x}^2-2{P_x}x+x^2+ {P_y}^2-2{P_y}ax-2{P_y}b + {a^2}{x^2}+2axb+{b^2}}}}=0$

We can multiply both sides by the denominator. This simplification is not always valid, because the denominator could be zero. In our case, however, the denominator is zero only when the distance is zero. In that case, the numerator is also zero, and we already have a solution, since zero is the smallest possible distance. So we will accept this simplification, as it always leads to a correct answer.

$-2{P_x}+2x-2{P_y}a+2{a^2}x +2ab=0$

So let’s give our new formula a try. Given the point $(1,3)$ and the line $y=2x+3$, we will plug these numbers into the equation above to find a solution:

$-2*1+2x-2*3*2+2*{2^2}*x+2*2*3=0$
$10x-2=0$
$5x-1=0$
$x={1\over5}$
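Substituting $f(x)=ax+b$ and $f'(x)=a$ into $g'(x)=0$ and solving for $x$ in general gives $x={{P_x+a(P_y-b)}\over{1+a^2}}$. The sketch below (function names are chosen here for illustration) evaluates that closed form and cross-checks it against a brute-force scan of the distance function, so the result does not rest on the algebra alone.

```javascript
// x-coordinate of the point on y = a*x + b closest to (px, py),
// obtained by solving g'(x) = 0 for x.
function closestX(px, py, a, b) {
	return (px + a * (py - b)) / (1 + a * a);
}

// Distance from (px, py) to the closest point on the line.
function minimalDistance(px, py, a, b) {
	const x = closestX(px, py, a, b);
	return Math.hypot(px - x, py - (a * x + b));
}

// Brute-force cross-check: scan x over [-10, 10] in steps of 0.001
// and keep the x that gives the smallest distance.
function bruteForceClosestX(px, py, a, b) {
	let bestX = -10;
	let bestDistance = Infinity;
	for (let i = 0; i <= 20000; i++) {
		const x = -10 + i * 0.001;
		const distance = Math.hypot(px - x, py - (a * x + b));
		if (distance < bestDistance) {
			bestDistance = distance;
			bestX = x;
		}
	}
	return bestX;
}

// Point (1, 3) and line y = 2x + 3.
const exactX = closestX(1, 3, 2, 3);
const scannedX = bruteForceClosestX(1, 3, 2, 3);
const shortest = minimalDistance(1, 3, 2, 3);
```

For the point $(1,3)$ and the line $y=2x+3$, both approaches agree on $x\approx0.2$, and the resulting distance of about $0.894$ matches the well-known point-to-line formula ${{|aP_x-P_y+b|}\over{\sqrt{a^2+1}}}$.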
