Let's talk about videos on the internet
In 2018 I took on a side project, you know, like one of the eight billion side projects we developers start and abandon. What I wanted to build was similar to what Mux is today.
Fast-forward to November 2022: I started rebuilding my website for the first time in 5 years, and while browsing through my abandoned git repositories on GitHub, I stumbled across this project.
I decided to make a series on my newly formed YouTube channel where we build a simple video platform.
But before that, let’s talk a bit about videos on the internet.
What makes up a video?
Like most file formats out there, a video file is technically a container, a bit like a zip archive, made up of pictures, graphics, audio, text, and everything else that makes up a video.
There are many video file formats, or containers, like the infamous MP4; you may have also heard of WebM, Ogg, Matroska (MKV), AVI, and many more. They are all known as video containers, and each defines how the pictures, graphics, audio, etc. should be organized within the container.
Out of the box, raw video in a container is not great for the internet. Let us use our phones as an example: if you record a minute of video on your phone, the file size can vary from a few megabytes to as much as 12GB, possibly more, depending on the phone camera, the resolution, and whether you turned the RAW option on in your camera.
Imagine going to YouTube to watch a video that is 10 minutes long and having to download 120GB of video data. No one would watch or make videos for YouTube, or the internet in general… at least not yet.
Data requirements aside, most people would need powerful computers with the latest graphics cards and plenty of RAM just to be able to make or play videos online. Devices like mobile phones, smartwatches, and tablets would not even make the cut.
This is where coder-decoders, or codecs, come in. Since we already know a video file is basically an archive of streams, we use encoders to compress those contents into more computer gibberish that is smaller and far more efficient to move around. Decoders on end-devices understand this gibberish and turn it back into playable video.
Using the video example above, if we encode the video with the H.265 (HEVC) codec, we can reduce that video to roughly 25%–65% of its original size. The best part is, to the untrained eye (90% of end-users), the video quality is barely affected.
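As a rough sketch, here is how you might re-encode a video with H.265 using ffmpeg; the filenames are placeholders, and -crf 28 is libx265's default quality level (lower values mean better quality and bigger files):

# re-encode the video stream with H.265 (libx265) and the audio with AAC
ffmpeg -i demo-video.mp4 -c:v libx265 -crf 28 -c:a aac demo-video-h265.mp4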
While some significant improvements have been made, it’s still not enough for the internet.
Adaptive and Live streaming
Going forward, let us assume we have a 10-minute video that is 1080p in quality and compressed to about 1GB of data.
As you can probably guess, we don’t want users waiting for a whole gigabyte of video to download before they can play it. Network conditions may also not be ideal for playing a file that size. This is where “Adaptive Streaming” comes in.
Adaptive streaming allows us to break a large video file into smaller segments, typically 2–10 seconds long.
Instead of this monolith:
media
└── demo-video.mp4
We break it down into segments; in our case each segment is 5 seconds long, which gives us 12 segments of about 85MB each:
media
└── demo-video
├── segment-0-5.mp4
├── segment-6-10.mp4
├── ...
└── segment-56-60.mp4
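This split is, as a sketch, something ffmpeg's segment muxer can do for us (the naming pattern differs slightly from the tree above, and with stream copy the cuts can only land on keyframes, so segment lengths are approximate):

# split the video into ~5-second chunks without re-encoding
ffmpeg -i demo-video.mp4 -c copy -f segment -segment_time 5 \
  -reset_timestamps 1 demo-video/segment-%03d.mp4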
Then we can tell our video player to play the first segment; while that plays, it preloads the next segment, rinse and repeat until the video has been fully played.
This way we download little chunks of video data to play which results in a smoother experience.
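For the curious, here is a bare-bones sketch of that loop using the browser's Media Source API, which we will meet again later. It assumes the segments are fragmented MP4, uses a placeholder codec string and file names, and skips all the error handling a real player needs:

// append each segment to a SourceBuffer as it is fetched
const video = document.querySelector('video')
const mediaSource = new MediaSource()
video.src = URL.createObjectURL(mediaSource)

mediaSource.addEventListener('sourceopen', async () => {
  // the codec string is a placeholder; it must match how the segments were encoded
  const buffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f"')
  for (let i = 0; i < 12; i++) {
    const res = await fetch(`/media/demo-video/segment-${i}.mp4`)
    buffer.appendBuffer(await res.arrayBuffer())
    // wait until the buffer has ingested this chunk before appending the next
    await new Promise(resolve => buffer.addEventListener('updateend', resolve, { once: true }))
  }
  mediaSource.endOfStream()
})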
To make the experience even better, we can adjust for internet conditions by further encoding the video into smaller resolutions.
media
└── demo-video
├── 144p
│ ├── segment-0-5.mp4
│ ├── segment-6-10.mp4
│ ├── ...
│ └── segment-56-60.mp4
├── ....
└── 1080p
├── segment-0-5.mp4
├── segment-6-10.mp4
├── ...
└── segment-56-60.mp4
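Generating these renditions is roughly one encode per target resolution. A hedged ffmpeg sketch for the 144p rung (repeat with other heights to build the full ladder):

# scale=-2:144 keeps the aspect ratio and rounds the width to an even number
ffmpeg -i demo-video.mp4 -vf scale=-2:144 -c:v libx264 -c:a aac demo-video/144p.mp4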
After making this adjustment, our video player can first run a quick test to estimate the user’s internet speed and serve the quality that will give them the best experience. Or the user can pick a preferred quality, and that gets served instead.
The original video that was uploaded can be kept in cold storage, somewhere without public access, while these generated bits and pieces can be kept on a CDN for faster delivery.
Live streaming
For live streaming, we do the same thing as above, only in real time. The real pipeline is more complex than this article makes it look, but we will revisit the old code and build an example video platform. Follow along on YouTube.
Streaming formats
Now that our video has been broken into bits and pieces, we need to let the video player know the locations of those bits. You may be tempted to return a JSON file from your API containing the segments in the order that they should be played.
Thankfully, we don’t have to invent that ourselves; standards have been in the works for many years to solve exactly this.
Similar to how the HTML specification describes a website’s content, we have streaming protocols that describe video content.
- Dynamic Adaptive Streaming over HTTP (DASH): This is my favorite and recommended format and it’s used by companies such as YouTube and Netflix. It is an independent, open, and international standard.
- HTTP Live Streaming (HLS): Created by Apple, it was the dominant format at one point, probably still is, but it’s not as open as DASH.
- Others are Smooth Streaming from Microsoft and Adobe’s HTTP Dynamic Streaming.
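To make this concrete, here is a heavily simplified, illustrative sketch of a DASH manifest (an MPD file, which is plain XML) describing a single rendition of our segmented video; a manifest produced by a real encoder carries much more detail:

<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT10M" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="1080p" width="1920" height="1080" bandwidth="4000000">
        <!-- $Number$ is filled in by the player: segment-0.mp4, segment-1.mp4, ... -->
        <SegmentTemplate media="1080p/segment-$Number$.mp4"
                         initialization="1080p/init.mp4"
                         duration="5" startNumber="0" />
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>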
HTML5 video
We are finally at the point where we can put a video on a website. HTML5 has a <video> tag where we can define our video source and enable playback controls.
<video src="/media/demo-video.mp4" controls />
Then we can programmatically control the video player using JavaScript:
const player = document.querySelector('video')
// assumes a <button id="play-pause"> exists somewhere on the page
const playPauseButton = document.querySelector('#play-pause')

playPauseButton.addEventListener('click', function () {
  // the element's own `paused` property is the source of truth,
  // so we never drift out of sync with the native controls
  if (player.paused) {
    player.play()
  } else {
    player.pause()
  }
})
Browsers do not play DASH manifests natively the way they play an MP4 file, so simply pointing the src attribute at an .mpd file will not work. DASH playback is instead built on the Media Source API: a script fetches the manifest, downloads the segments that best fit the current network conditions, and feeds them into the <video> element, automatically adjusting playback quality as conditions change. In practice, we rarely hand-roll this; open-source players like dash.js and Shaka Player handle it for us.
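A minimal sketch using dash.js (the manifest path is a placeholder carried over from our earlier example):

<script src="https://cdn.dashjs.org/latest/dash.all.min.js"></script>
<video controls></video>
<script>
  // initialize(element, manifestUrl, autoPlay)
  const video = document.querySelector('video')
  dashjs.MediaPlayer().create().initialize(video, '/media/demo-video/manifest.mpd', false)
</script>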
Real-world pipeline
Now that we have some understanding of video files, containers, and codecs, how would we build our app to handle videos?
Recap: we need a way to make sure the video works on every web browser, smartphone, smart device, computer, operating system, internet connection, and what have you.
A less naive pipeline
- Upload a video to the server.
- Convert the video to different resolutions like 144p, 240p, 360p, 480p, 720p, etc. using the proper containers and codecs (see the sketch after this list).
- Put the video assets and generated manifests on a CDN.
- Deliver using streaming protocols like DASH.
- Use the native HTML5 <video> tag in conjunction with the Media Source API to play the video.
- Support live streaming.
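As one hedged example of the convert-and-package steps, ffmpeg's dash muxer can produce the renditions, the segments, and the manifest in a single pass; the bitrates, sizes, and paths below are illustrative, and the flags assume a reasonably recent ffmpeg:

# two video renditions (144p and 1080p) plus one audio track, packaged as DASH
ffmpeg -i demo-video.mp4 \
  -map 0:v -map 0:v -map 0:a \
  -c:v libx264 -c:a aac \
  -s:v:0 256x144   -b:v:0 400k \
  -s:v:1 1920x1080 -b:v:1 5000k \
  -seg_duration 5 \
  -adaptation_sets "id=0,streams=v id=1,streams=a" \
  -f dash media/demo-video/manifest.mpd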
Up next
We will build a simple YouTube clone on YouTube, so subscribe to get updated. If you want to stay in touch and receive the latest updates, please give me a follow on Twitter.