TL;DR

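The following code is a minimal implementation of a WebRTC video connection between two peers, written in browser JavaScript. The implementation has at least four major parts, each explained in detail below: the offer/answer exchange between peers, the ICE candidate exchange, the handling of incoming and outgoing streams, and the signaling server.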

The Code

class PeerConnection {
	constructor() {
		this.incomingIceBuffer = [];
		this.outgoingIceBuffer = [];
		this.canAcceptIce = false;
		this.pc = new RTCPeerConnection();
		this.pc.onicecandidate = (event) => {
			if(event && event.candidate) {
				this.outgoingIceBuffer.push(event.candidate);
			}
		}
	}

	addIceCandidate(candidate) {
		if(candidate) {
			if(this.canAcceptIce) {
				this.pc.addIceCandidate(candidate);	
			}
			else {
				this.incomingIceBuffer.push(candidate);
			}		
		}
	}

	onIceCandidate(callback) {
		this.pc.onicecandidate = (event) => {
			if(event && event.candidate) {
				callback(event.candidate);	
			}
		}

		for(const candidate of this.outgoingIceBuffer) {
			callback(candidate);
		}
		this.outgoingIceBuffer = [];
	}

	startAcceptingIce() {
		this.canAcceptIce = true;
		for(const candidate of this.incomingIceBuffer) {
			this.pc.addIceCandidate(candidate);
		}
		this.incomingIceBuffer = [];
	}

	getRemoteStream() {
		let remoteReceivers = this.pc.getReceivers();
		return new MediaStream(remoteReceivers.map(el => el.track));
	}
}

class PeerConnectionSender extends PeerConnection {
	constructor() {
		super();
	}

	async createOffer() {
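		// Request both the camera and the microphone, since this is a video connection.
		let userStream = await navigator.mediaDevices.getUserMedia({
			audio: true,
			video: true
		});
		// Note: addStream is a legacy API; addTrack is its modern replacement.
		this.pc.addStream(userStream);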
		let offer = await this.pc.createOffer();
		await this.pc.setLocalDescription(offer);
		return offer;
	}

	async setAnswer(answer) {
		await this.pc.setRemoteDescription(answer);
		this.startAcceptingIce();
	}
}

class PeerConnectionReceiver extends PeerConnection {
	constructor() {
		super();
	}

	async setOffer(offer) {	
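		// Request both the camera and the microphone, since this is a video connection.
		let userStream = await navigator.mediaDevices.getUserMedia({
			audio: true,
			video: true
		});
		// Note: addStream is a legacy API; addTrack is its modern replacement.
		this.pc.addStream(userStream);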
		await this.pc.setRemoteDescription(offer);
		this.startAcceptingIce();
		let answer = await this.pc.createAnswer();
		await this.pc.setLocalDescription(answer);
		return answer;
	}
}

async function makeConnection() {
	let sender = new PeerConnectionSender();
	let receiver = new PeerConnectionReceiver();
	sender.onIceCandidate((candidate) => {
		receiver.addIceCandidate(candidate);
	});
	receiver.onIceCandidate((candidate) => {
		sender.addIceCandidate(candidate);
	});
	let offer = await sender.createOffer();
	let answer = await receiver.setOffer(offer);
	await sender.setAnswer(answer);
	let remoteStream = receiver.getRemoteStream();
	let sinkElement = document.getElementById("sinkElement");	
	sinkElement.srcObject = remoteStream;
}

Detailed explanation

Offer/answer exchange between peers

WebRTC is a peer-to-peer technology used for voice and video communication. Establishing a WebRTC connection begins with a negotiation in which the two peers exchange their capabilities through an offer/answer exchange. Peer A, who initiates the peer connection, generates an offer and sends it to peer B. This offer describes capabilities such as the number of streams, the resolution of the video streams, the bitrate and encoding of the audio streams, networking information, and other necessary data.

The offer is accepted by peer B, who then generates an answer and sends it back to peer A. Once peer A accepts the answer, the negotiation is complete. However, further information needs to be exchanged between the peers before data actually starts flowing between them.

Two methods are used to accept an offer or answer: setLocalDescription and setRemoteDescription. Each takes either an offer or an answer as input, and which method to call depends on who generated it. In general, you call setLocalDescription on the offer or answer that you generated, and setRemoteDescription on the offer or answer sent by the other peer.

Peer A, who generated the initial offer, uses setLocalDescription on the offer that he generated, and setRemoteDescription on the answer he receives. Peer B first calls setRemoteDescription on the offer sent by Peer A, then calls setLocalDescription on the answer that he generated.

The code that implements this functionality is defined in the PeerConnectionSender class, which represents the initiator of the offer, and the PeerConnectionReceiver class, which accepts the offer and generates the answer.
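To make the order of calls easier to see, here is a minimal sketch that runs the whole exchange in one function. The name negotiate is made up for this example, and caller and callee are assumed to be RTCPeerConnection instances (or any objects exposing the same four methods).

```javascript
// Sketch only: negotiate is a hypothetical helper, not part of the code above.
// `caller` and `callee` are assumed to behave like RTCPeerConnection objects.
async function negotiate(caller, callee) {
	// Peer A creates the offer and applies it as its local description.
	const offer = await caller.createOffer();
	await caller.setLocalDescription(offer);

	// Peer B applies the offer as its remote description, then answers.
	await callee.setRemoteDescription(offer);
	const answer = await callee.createAnswer();
	await callee.setLocalDescription(answer);

	// Peer A applies the answer as its remote description.
	await caller.setRemoteDescription(answer);
	return { offer, answer };
}
```

In a real app the offer and answer would travel through the signaling channel between these steps instead of being passed around in one function.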

ICE candidate exchange

The peers also need to exchange ICE candidates, which contain the networking information needed to establish the peer-to-peer link. ICE candidates are generated asynchronously once setLocalDescription has been called. Therefore, you must assign an event handler to the onicecandidate property to capture them. In this implementation, the PeerConnection constructor installs a default handler that stores the candidates in an internal buffer, in case you forget to supply a handler before calling setLocalDescription. The helper method onIceCandidate abstracts some of the details: it replaces the default handler with your callback and hands it any candidates that were buffered in the meantime.
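As a rough illustration of capturing local candidates, the sketch below listens for the icecandidate event with addEventListener instead of assigning the onicecandidate property. The function name forwardLocalCandidates and its send and onComplete parameters are invented for this example.

```javascript
// Sketch only: forwardLocalCandidates, send and onComplete are hypothetical names.
// `pc` is assumed to be an RTCPeerConnection.
function forwardLocalCandidates(pc, send, onComplete = () => {}) {
	pc.addEventListener("icecandidate", (event) => {
		if (event.candidate) {
			// A new local candidate is ready to be sent to the remote peer.
			send(event.candidate);
		} else {
			// A null candidate means ICE gathering has finished.
			onComplete();
		}
	});
}
```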

The generated ICE candidates should be sent to the remote peer, which adds them by calling addIceCandidate on its peer connection. The native addIceCandidate method cannot be called until setRemoteDescription has been called first, so this implementation adds a helper method, also called addIceCandidate, that stores incoming ICE candidates in a buffer until the peer connection is ready to accept them.
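The buffering idea can also be looked at in isolation. The following sketch is one possible way to write it, with the addition that rejected addIceCandidate promises are routed to an error callback. The class name IncomingCandidateQueue and its method names are assumptions made for this example.

```javascript
// Sketch only: IncomingCandidateQueue and its method names are hypothetical.
class IncomingCandidateQueue {
	// `apply` is assumed to wrap the native call, e.g. (c) => pc.addIceCandidate(c).
	constructor(apply, onError = console.error) {
		this.apply = apply;
		this.onError = onError;
		this.pending = [];
		this.ready = false;
	}

	// Call for every candidate received from the remote peer.
	add(candidate) {
		if (!candidate) return;
		if (this.ready) {
			this.deliver(candidate);
		} else {
			this.pending.push(candidate);
		}
	}

	// Call once setRemoteDescription has resolved; flushes in arrival order.
	markReady() {
		this.ready = true;
		const queued = this.pending;
		this.pending = [];
		queued.forEach((candidate) => this.deliver(candidate));
	}

	// addIceCandidate returns a promise, so rejections are passed to onError.
	deliver(candidate) {
		Promise.resolve(this.apply(candidate)).catch(this.onError);
	}
}
```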

Once the ICE candidates have been sent and accepted by both parties, the connection is fully established and data starts flowing between the peers.
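If you want to know when that moment arrives, one option is to watch the connectionState property of the peer connection. The sketch below wraps this in a promise; the function name waitForConnected is made up for this example, and pc is assumed to be an RTCPeerConnection.

```javascript
// Sketch only: waitForConnected is a hypothetical helper.
function waitForConnected(pc) {
	return new Promise((resolve, reject) => {
		const check = () => {
			const state = pc.connectionState;
			if (state === "connected") {
				pc.removeEventListener("connectionstatechange", check);
				resolve();
			} else if (state === "failed" || state === "closed") {
				pc.removeEventListener("connectionstatechange", check);
				reject(new Error("Connection " + state));
			}
		};
		pc.addEventListener("connectionstatechange", check);
		// Covers the case where the connection is already in a final state.
		check();
	});
}
```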

Handling incoming/outgoing streams

Of course, to communicate using WebRTC we must have a way to add our streams to the peer connection object and get the remote streams once the connection is established.

Local streams can come from many places, such as files, a canvas element, or the microphone and camera. In our example, we want to access the microphone and camera, and to do this we use the getUserMedia method of the navigator.mediaDevices object. Once called, getUserMedia prompts the user to give access to the requested devices, and if access is granted, it resolves with a media stream object that is ready to be used. This media stream object is passed to the addStream method of the peer connection, which must happen before setting the local or remote description. Note that addStream is a legacy API; current versions of the specification favor addTrack, which adds one track at a time.

You can in fact add streams to the peer connection at a later point, but that use case is not covered here.
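For reference, attaching local media track by track might look like the sketch below. The function name addLocalTracks is an assumption for this example; pc is assumed to be an RTCPeerConnection and stream a MediaStream such as the one returned by getUserMedia.

```javascript
// Sketch only: addLocalTracks is a hypothetical helper.
function addLocalTracks(pc, stream) {
	// addTrack returns one RTCRtpSender per track; passing the stream as the
	// second argument lets the remote side group the tracks together again.
	return stream.getTracks().map((track) => pc.addTrack(track, stream));
}
```

As with addStream in the main code, this would be called before the offer or answer is created.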

Once the WebRTC connection is established, getting the remote streams is not hard at all. On the peer connection, we get the remote receivers by calling getReceivers, and we pass the tracks of these receivers to the constructor of a new MediaStream object. In our case, a helper method called getRemoteStream does all of this and returns the MediaStream object.

To display the remote video, simply assign the remote stream to a video element’s srcObject property.
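An alternative to polling getReceivers is to react to the track event, which fires on the peer connection when a remote track arrives. The sketch below is one way to do that; the function name showRemoteStream is invented here, and it assumes the remote side associated its tracks with a stream, otherwise event.streams is empty and nothing is displayed.

```javascript
// Sketch only: showRemoteStream is a hypothetical helper.
// `pc` is assumed to be an RTCPeerConnection, `mediaElement` a <video> element.
function showRemoteStream(pc, mediaElement) {
	pc.addEventListener("track", (event) => {
		// event.streams lists the streams the incoming track belongs to.
		const [stream] = event.streams;
		if (stream && mediaElement.srcObject !== stream) {
			mediaElement.srcObject = stream;
		}
	});
}
```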

After all of this effort, we finally have a working WebRTC video chat.

Signaling server

Throughout the article, it’s been taken for granted that the clients have some way of communicating and sending the offers, answers, and ICE candidates to each other. I will be explaining some aspects of how this communication should take place, but I will not be supplying any code, just some rough guidelines.

In our example, both peers are running in the same browser window, so they don’t need an external service to coordinate their messaging. In a real app, the two peers would be running in two different browser instances, meaning they would need to coordinate their communication through a centralized server. This server is responsible for registering the peers, sending the offers, answers, and ICE candidates to the proper recipients, and notifying the peers of any changes in the state of the call. The existence of this server makes the claim that WebRTC is a peer-to-peer technology a bit shaky: while it is true that the data flows directly between the two peers, a server is needed to coordinate between them.
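To give a rough idea of those responsibilities, here is an in-memory sketch of the routing logic alone, with no networking attached. Everything in it is an assumption made for illustration: the class name SignalingRouter, the method names, and the joined and left message shapes.

```javascript
// Sketch only: SignalingRouter and its message shapes are hypothetical.
class SignalingRouter {
	constructor() {
		// Maps a peer id to a function that delivers a message to that peer.
		this.peers = new Map();
	}

	// Registers a peer and tells everyone else that it joined.
	register(peerId, deliver) {
		this.peers.set(peerId, deliver);
		this.broadcast({ type: "joined", peerId }, peerId);
	}

	// Removes a peer and tells everyone else that it left.
	unregister(peerId) {
		if (this.peers.delete(peerId)) {
			this.broadcast({ type: "left", peerId }, peerId);
		}
	}

	// Forwards an offer, answer or ICE candidate to a single recipient.
	relay(fromId, toId, message) {
		const deliver = this.peers.get(toId);
		if (!deliver) return false;
		deliver({ ...message, from: fromId });
		return true;
	}

	// Sends a message to every registered peer except one.
	broadcast(message, exceptId) {
		for (const [peerId, deliver] of this.peers) {
			if (peerId !== exceptId) deliver(message);
		}
	}
}
```

On a real server, each deliver function would write to that peer's network connection.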

The messaging channel to the server usually requires a real-time component, so WebSockets is my preferred technology in this case. Typically, you serialize the offer using JSON.stringify and send it to the server over a WebSocket, and the server forwards it to the other peer. The other peer then deserializes the offer, generates the answer, and sends it back through the server the same way. The same technique can be used for ICE candidates, as well as for control messages such as user joined, user left, text messaging, and more.
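The serialization step could be sketched as follows. The { type, payload } envelope and the function names encodeSignal, decodeSignal and dispatchSignal are assumptions made for this example, not a standard format.

```javascript
// Sketch only: the message envelope and all names here are hypothetical.
const SIGNAL_TYPES = ["offer", "answer", "candidate"];

// Wraps an offer, answer or ICE candidate in a JSON string for the socket.
function encodeSignal(type, payload) {
	if (!SIGNAL_TYPES.includes(type)) {
		throw new Error("Unknown signal type: " + type);
	}
	return JSON.stringify({ type, payload });
}

// Parses a received string and rejects messages of an unexpected type.
function decodeSignal(text) {
	const message = JSON.parse(text);
	if (!SIGNAL_TYPES.includes(message.type)) {
		throw new Error("Unknown signal type: " + message.type);
	}
	return message;
}

// Routes a received message to the handler registered for its type.
function dispatchSignal(text, handlers) {
	const { type, payload } = decodeSignal(text);
	return handlers[type](payload);
}
```

With a WebSocket named socket, the sending side would call socket.send(encodeSignal("offer", offer)), and the receiving side would call dispatchSignal(event.data, handlers) from its message handler, where handlers maps "offer", "answer" and "candidate" to functions such as the setOffer, setAnswer and addIceCandidate methods shown earlier.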

Is that it?

Yes, this is a complete and working implementation, but there are quite a few things to consider before it is ready for production, such as STUN and TURN servers, additional events, error handling, and more. Still, it is a good starting point that will get you pointed in the right direction.
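As a pointer for the first of those items, STUN and TURN servers are supplied through the configuration object passed to the RTCPeerConnection constructor. The sketch below builds such an object; the helper name buildRtcConfig is invented, and the example.org hosts are placeholders, not real servers.

```javascript
// Sketch only: buildRtcConfig is a hypothetical helper, and the
// example.org hosts are placeholders for servers you would run or rent.
function buildRtcConfig(turn) {
	// A STUN server lets a peer discover its public address.
	const iceServers = [{ urls: "stun:stun.example.org:3478" }];
	if (turn) {
		// A TURN server relays the media when no direct path can be found.
		iceServers.push({
			urls: turn.urls,
			username: turn.username,
			credential: turn.credential
		});
	}
	return { iceServers };
}
```

The PeerConnection constructor above could then create its connection with new RTCPeerConnection(buildRtcConfig()) instead of calling the constructor without arguments.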