Let's talk about videos on the internet

A short but detailed overview of how video on the internet works. We will walk through the pitfalls and some recommended solutions.
@KayandraJT

In 2018 I took on a side project, you know, like one of the eight billion side projects we developers start and abandon. What I wanted to build was similar to what Mux is today.

Fast-forward to November 2022, I start rebuilding my website for the first time in 5 years and while browsing through my abandoned git repositories on GitHub, I stumbled across this project.

I decided to make a series on my newly formed YouTube channel where we build a simple video platform.

But before that, let's talk a bit about videos on the internet.

What makes up a video?

Like most file formats out there, a video is technically a container, a bit like a zip archive, made up of pictures, graphics, audio, text, and everything else that makes up a video.

There are many video file formats or containers like the infamous MP4, you may have also heard of WebM, Ogg, Matroska (MKV), AVI, and many more. They are all known as video containers and they each define how pictures, graphics, etc should be organized within the container.

By default, containers are not that great for the internet. Let us use our phones as an example: if you record a minute of video on your phone, the file size can vary from a few megabytes to as much as 12GB and possibly more, depending on the phone camera, the resolution, and whether you have the RAW option in your camera turned on.

Imagine going to YouTube to watch a video that is 10 minutes long and having to download 120GB of video data. No one would watch or make videos for YouTube, or the internet in general... At least not yet.

Data requirements aside, it means most people would need powerful computers with the latest graphics cards and plenty of RAM just to be able to make or play videos online. Devices like mobile phones, smartwatches, and tablets would not even make the cut.

This is where coder-decoders, or codecs, come in. We use encoders to compress the contents of a video container into more computer gibberish that is smaller and far more efficient to move around.

Decoders understand this gibberish and can decode it on end devices.

Using the video example above, if we encode the video with the H.265 codec we can reduce it to roughly 25%–65% of its original size. The best part is, to the untrained eye (90% of end users) the video quality is barely affected.
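To put rough numbers on that claim, here is a quick back-of-the-envelope sketch; the compression ratios are illustrative, since real savings depend on the footage and encoder settings:

```javascript
// Estimate the encoded size for a given compression ratio.
// The 0.25–0.65 range mirrors the rough claim above and is not
// a guarantee; actual results vary with content and settings.
function encodedSizeMB(originalMB, ratio) {
  if (ratio <= 0 || ratio > 1) throw new RangeError('ratio must be in (0, 1]')
  return originalMB * ratio
}

// One minute of 12GB (12288MB) raw phone footage:
console.log(encodedSizeMB(12288, 0.25)) // 3072 MB at the optimistic end
console.log(encodedSizeMB(12288, 0.65)) // 7987.2 MB at the pessimistic end
```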

While some significant improvements have been made, it's still not enough for the internet.

Adaptive and live streaming

Going forward let us assume we have a 10-minute video that is 1080p in quality and compressed to about 1GB of data.

As you can probably guess, we don't want users waiting for a gigabyte of video to download before they can play it. Network conditions may also not be ideal for playing the video. This is where "adaptive streaming" comes in.

Adaptive streaming allows us to break a large video file into smaller segments, typically 2–10 seconds long.

Instead of this monolith:

media
└── demo-video.mp4

We break it down into segments. In our case each segment is 5 seconds long, which for our 10-minute, ~1GB video works out to 120 segments of roughly 8.5MB each (only the first minute of segments is shown below):

media
└── demo-video
    ├── segment-0-5.mp4
    ├── segment-6-10.mp4
    ├── ...
    └── segment-56-60.mp4
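Assuming a perfectly even bitrate, which real encoders only approximate, the segment arithmetic is easy to sanity-check:

```javascript
// Segment count and average size for a video of a given duration.
// Real segment sizes vary with scene complexity; this assumes the
// bitrate is spread evenly across the whole video.
function segmentStats(durationSec, totalMB, segmentSec) {
  const count = Math.ceil(durationSec / segmentSec)
  return { count, sizeMB: totalMB / count }
}

// 10 minutes (600s), ~1GB (1024MB), 5-second segments:
const { count, sizeMB } = segmentStats(600, 1024, 5)
console.log(count) // 120 segments
console.log(sizeMB.toFixed(1)) // "8.5" MB per segment on average
```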

Then we can tell our video player to play the first segment; while that plays, it preloads the next segment, rinse and repeat until the video has been fully played.

This way we download little chunks of video data to play which results in a smoother experience.
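A minimal sketch of that rinse-and-repeat loop might look like the following; the segment URLs mirror the tree above, and swapping src like this is a simplification, since production players splice segments together seamlessly instead:

```javascript
// Naive sequential player: play a segment, prefetch the next one,
// advance when the current one ends. Swapping `src` leaves a small
// gap between segments, which is why real players use the Media
// Source API instead.
const segments = [
  '/media/demo-video/segment-0-5.mp4',
  '/media/demo-video/segment-6-10.mp4',
  // ...and so on through the final segment
]

// Pure helper so the advancing logic is easy to reason about
function nextIndex(current, total) {
  return current + 1 < total ? current + 1 : null
}

if (typeof document !== 'undefined') {
  const player = document.querySelector('video')
  let index = 0

  const load = (i) => {
    player.src = segments[i]
    player.play()
    const next = nextIndex(i, segments.length)
    if (next !== null) fetch(segments[next]) // warm the browser cache
  }

  player.addEventListener('ended', () => {
    index = nextIndex(index, segments.length)
    if (index !== null) load(index)
  })

  load(0)
}
```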

To make the experience even better, we can adjust for internet conditions by further encoding the video into smaller resolutions.

media
└── demo-video
    ├── 144p
    │   ├── segment-0-5.mp4
    │   ├── segment-6-10.mp4
    │   ├── ...
    │   └── segment-56-60.mp4
    ├── ....
    └── 1080p
        ├── segment-0-5.mp4
        ├── segment-6-10.mp4
        ├── ...
        └── segment-56-60.mp4

After making this adjustment, our video player can first do a test to determine the user's internet speed and serve them a video quality that will give them a better experience. Or, the user can pick a preferred quality and that gets automatically served.
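The selection step can be as simple as mapping the measured bandwidth to the highest rendition it can sustain. A sketch, with made-up bitrate numbers rather than real encode settings:

```javascript
// Pick the highest rendition whose bitrate fits within the measured
// bandwidth. The bitrates below are illustrative placeholders; a real
// ladder comes from your encoder settings. Must be sorted ascending.
const renditions = [
  { name: '144p', kbps: 200 },
  { name: '360p', kbps: 800 },
  { name: '720p', kbps: 2500 },
  { name: '1080p', kbps: 5000 },
]

function pickRendition(measuredKbps, ladder = renditions) {
  const fits = ladder.filter((r) => r.kbps <= measuredKbps)
  // On very slow connections, fall back to the lowest rendition
  return fits.length > 0 ? fits[fits.length - 1] : ladder[0]
}

console.log(pickRendition(3000).name) // "720p"
console.log(pickRendition(100).name) // "144p"
```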

The original video that was uploaded can be kept in cold storage, somewhere without public access, while these generated bits and pieces can be kept on a CDN for faster delivery.

Live streaming

For live streaming, we do the same thing as above, only in real time. The real pipeline is more complex than this simplified overview lets on, but we will revisit the old code and build an example video platform, so follow along on YouTube.

Streaming formats

Now that our video has been broken into bits and pieces, we need to let the video player know the locations of those bits. You may be tempted to return a JSON file from your API containing the segments in the order that they should be played.

Thankfully, we don't have to worry about that; standards have been in the works for many years to achieve exactly this.

Similar to how the HTML specification describes a website's content, we have streaming formats, most notably HLS and MPEG-DASH, whose manifest files describe video content.
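For a flavor of what such a manifest looks like, here is a heavily trimmed, illustrative MPEG-DASH manifest (MPD); real manifests carry many more attributes, such as codec strings and precise timing:

```xml
<!-- Illustrative only: two renditions of our demo video, 5-second
     segments addressed by a template. Not a complete manifest. -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT10M">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="144p" bandwidth="200000" width="256" height="144">
        <SegmentTemplate media="144p/segment-$Number$.mp4" duration="5" />
      </Representation>
      <Representation id="1080p" bandwidth="5000000" width="1920" height="1080">
        <SegmentTemplate media="1080p/segment-$Number$.mp4" duration="5" />
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```

The player reads this once and then knows every available rendition and where to fetch each segment, no custom JSON API required.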

HTML5 video

We are finally at the point where we can put a video on a website. HTML5 has a <video> element where we can define our video source and enable playback controls.

<video src="/media/demo-video.mp4" controls></video>

Then we can programmatically control the video player using JavaScript:

const player = document.querySelector('video')
// Assumes a button with this id exists somewhere in the markup
const playPauseButton = document.querySelector('#play-pause')

playPauseButton.addEventListener('click', function () {
  // The element already tracks its own state, so checking
  // `player.paused` beats keeping a separate flag in sync
  if (player.paused) {
    player.play()
  } else {
    player.pause()
  }
})

You might expect to point the player straight at a DASH manifest (MPD), the same way we specify an MP4:

<video src="/media/demo-video.mpd" controls></video>

Unfortunately, most browsers do not play DASH manifests natively; Safari is the notable exception for the similar HLS format, where an .m3u8 playlist can be set as the source directly. Instead, we use the Media Source Extensions (MSE) API, usually through a library like dash.js, to fetch the manifest, pick a rendition that matches the user's network conditions, and feed segments to the <video> element.
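A stripped-down sketch of what such a library does under the hood with MSE; the codec string and segment URLs here are assumptions for illustration, and the segments must be packaged as fragmented MP4 for this to work:

```javascript
// Feed video segments to a <video> element through a MediaSource.
// Assumed for this sketch: an H.264 baseline codec string, our demo
// segment URLs, and segments packaged as fragmented MP4.
const MIME = 'video/mp4; codecs="avc1.42E01E"'

async function playSegments(video, urls) {
  const mediaSource = new MediaSource()
  video.src = URL.createObjectURL(mediaSource)

  // Wait for the MediaSource to attach to the element
  await new Promise((resolve) =>
    mediaSource.addEventListener('sourceopen', resolve, { once: true })
  )

  const buffer = mediaSource.addSourceBuffer(MIME)
  for (const url of urls) {
    const data = await (await fetch(url)).arrayBuffer()
    buffer.appendBuffer(data)
    // Only one append may be in flight at a time
    await new Promise((resolve) =>
      buffer.addEventListener('updateend', resolve, { once: true })
    )
  }
  mediaSource.endOfStream()
}

if (typeof document !== 'undefined' && 'MediaSource' in window) {
  playSegments(document.querySelector('video'), [
    '/media/demo-video/1080p/segment-0-5.mp4',
    '/media/demo-video/1080p/segment-6-10.mp4',
  ])
}
```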

Real-world pipeline

Now that we have some understanding of video, file formats, and codecs, how would we build our app to handle videos?

Recap: we need a way to make sure the video works on every web browser, smartphone, smart device, computer, operating system, internet connection, and what have you.

A less naive pipeline

  • Upload a video to the server.
  • Convert the video to different resolutions like 144p, 240p, 360p, 480p, 720p, etc. using the proper containers and codecs.
  • Put the video assets and generated manifests on a CDN.
  • Deliver using streaming formats like DASH.
  • Use the native HTML5 <video> element in conjunction with the Media Source API to play the video.
  • Support live streaming.
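The conversion step could be sketched like this, building ffmpeg argument lists for each rung of the resolution ladder. The output paths, the H.264/AAC choices, and ffmpeg itself are assumptions rather than the only viable pipeline, and actually spawning the processes is left out:

```javascript
// Build one ffmpeg invocation per target resolution. Running these
// (e.g. with child_process.spawn) and then packaging the outputs for
// DASH are separate steps; paths and codec choices are illustrative.
const LADDER = [144, 240, 360, 480, 720]

function transcodeArgs(input, height) {
  return [
    '-i', input,
    '-c:v', 'libx264', // widely supported video codec
    '-vf', `scale=-2:${height}`, // keep aspect ratio, force even width
    '-c:a', 'aac',
    `${height}p.mp4`,
  ]
}

const jobs = LADDER.map((h) => transcodeArgs('demo-video.mp4', h))
console.log(jobs[0].join(' '))
// -i demo-video.mp4 -c:v libx264 -vf scale=-2:144 -c:a aac 144p.mp4
```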

Up next

We will build a simple YouTube clone on YouTube, so subscribe to stay updated. If you want to stay in touch, please give me a follow on Twitter.

kayandra