Comparing Adaptive HTTP Streaming Technologies
A Comparison of Apple’s HTTP Live Streaming (HLS), Microsoft’s Silverlight Smooth Streaming (MSS) and Adobe’s HTTP Dynamic Streaming (HDS)
1. Introduction
Watching video online or on the go on a tablet or mobile phone is no longer a novelty, nor is streaming Internet-delivered content to the TV in your living room. Driven by the boom in video-enabled devices, including PCs, smartphones, and Internet-enabled set-top boxes and televisions, consumers have rapidly moved through the early-adopter phase of TV Everywhere to the stage where a growing mass of consumers expect that any media should be available on any device over any network connection, delivered at the same high quality they’ve come to expect from traditional television services. This explosion of multiscreen IP video – whether regarded as disruption for traditional pay-TV providers or an opportunity to expand their services – is definitely here to stay.
While tremendous advancements in core and last mile bandwidth have been achieved in the last decade – primarily driven by Web-based data consumption – video traffic represents a leap in bandwidth requirements. This, coupled with the fact that the Internet at large is not a managed quality-of-service (QoS) environment, requires that new methods of video transport be considered to provide the same quality of video experience across any device and network that consumers have come to expect from managed TV delivery networks.
The evolution of video delivery transport has led to a new set of de facto standard adaptive delivery protocols from Apple, Microsoft and Adobe that are now positioned for broad adoption. Consequently, networks must now be equipped with servers that can take high-quality video content from live streams or file sources and ‘package’ it for transport to devices ready to accept these new delivery protocols.
This paper gives a technical comparison of the three main HTTP adaptive streaming technologies available today: Apple’s HTTP Live Streaming (HLS), Microsoft Silverlight Smooth Streaming (MSS) and Adobe’s HTTP Dynamic Streaming (HDS), also previously referred to as ‘Zeri.’ The paper is divided into three main parts: (1) it begins by giving an overview of adaptive HTTP streaming, discussing delivery architectures, highlighting its strengths and weaknesses, and discussing live and video-on-demand (VoD) delivery; (2) it then delves into each technology, explaining how they work and highlighting how each technology is different from the others; (3) finally, it looks at specific features and describes how they are implemented or deployed. This last section focuses on:
Delivery of multiple audio channels
Encryption and DRM
Closed captions / subtitling
Ability to insert ads
Custom VOD playlists
Trick modes (fast-forward/rewind, pause)
Fast channel change
Failover due to upstream issues
Stream latency
Ability to send other data to the client, including manifest compression
2. Adaptive HTTP Video Delivery
Background
In ‘traditional’ IP streaming, a video server sends video to a client at a fixed bandwidth. The client and server must have synchronized states: e.g. ‘video is stopped,’ ‘video is playing,’ ‘video is paused,’ etc. Traditional video streaming is typically delivered over UDP, a connectionless protocol in which packet losses result in poor quality-of-experience (QoE) and which has difficulty passing through firewalls in home routers. Traditional streaming can adapt to changes in network bandwidth, but only with complex synchronization between the server and client, and as a result, such adaptive protocols were never widely adopted.
In comparison, in adaptive HTTP streaming the source video, whether a file or a live stream, is encoded into short files – referred to as ‘chunks’ or ‘segments’ – using a desired format, which includes a container, video codec, audio codec, encryption protocol, etc. Segments typically represent two to ten seconds of video. The stream is broken into segments at video Group of Pictures (GOP) boundaries that begin with an IDR frame (a frame that can be decoded independently, with no dependencies on other frames), giving each segment independence from previous and successive segments. The segments are subsequently hosted on a regular HTTP server. Each sequence of segments is called a profile. Profiles may differ in bitrate, resolution, codec or codec profile/level.
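To make the chunking rule concrete, here is a minimal sketch of a segmenter, assuming frames arrive with presentation timestamps and an IDR flag. The Frame type and the 2-second target duration are illustrative assumptions for this sketch, not part of any of the three specifications.

```python
# Illustrative segmenter: cut the frame sequence into ~2-second chunks,
# opening each chunk at an IDR frame so every chunk decodes independently.
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class Frame:
    pts: float      # presentation timestamp in seconds
    is_idr: bool    # True if the frame is an IDR (independently decodable)
    data: bytes

def segment(frames: Iterable[Frame], target_duration: float = 2.0) -> List[List[Frame]]:
    chunks: List[List[Frame]] = []
    current: List[Frame] = []
    chunk_start = 0.0
    for f in frames:
        # Close the current chunk only at an IDR frame, and only once the
        # target duration has elapsed since the chunk started.
        if f.is_idr and current and f.pts - chunk_start >= target_duration:
            chunks.append(current)
            current = []
        if not current:
            chunk_start = f.pts
        current.append(f)
    if current:
        chunks.append(current)
    return chunks
```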
Clients play the stream by requesting segments in a profile from a Web server, downloading them via HTTP. As the segments are downloaded, the client plays back the segments in the order requested. Since the segments are sliced along GOP boundaries with no gaps between, video playback is seamless – even though it is actually just a collection of independent file downloads via a sequence of HTTP GET requests.
Adaptive delivery enables a client to ‘adapt’ to fluctuating network conditions by selecting video segments from different profiles. The client can easily compute the available network bandwidth by comparing the download time of a segment with its size. If the client has a list of available profile bitrates (or resolutions or codecs), it can determine if it must change to a lower bitrate/resolution profile or whether the available bandwidth allows it to download segments from a higher bitrate/resolution profile. This list of available profiles is called a manifest or playlist. The client’s bandwidth calculation is repeated at every chunk download, and so the client can adapt to changing network bandwidth or other conditions every few seconds. Aside from network bandwidth, conditions that may affect the client’s choice of profile may include local CPU load or the client’s ability to play back a specific codec or resolution.
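As an illustration of this adaptation step, the sketch below estimates throughput from the last chunk download and selects a profile. The bitrate ladder and the 0.8 safety margin are assumed values for the example, not figures from any of the three technologies.

```python
# Client-side adaptation sketch: estimate throughput from the last chunk
# download, then pick the highest profile that still fits with headroom.
def estimate_throughput_bps(chunk_bytes: int, download_seconds: float) -> float:
    return chunk_bytes * 8 / download_seconds

def choose_profile(bitrates_bps, throughput_bps, safety=0.8):
    """Return the highest advertised bitrate below the usable throughput."""
    usable = throughput_bps * safety
    candidates = [b for b in sorted(bitrates_bps) if b <= usable]
    return candidates[-1] if candidates else min(bitrates_bps)

# Example: a 1 MB chunk downloaded in 0.9 s implies ~8.9 Mbps of bandwidth,
# so the client steps up to the 4.5 Mbps profile from this assumed ladder.
profiles = [400_000, 1_200_000, 2_500_000, 4_500_000, 8_000_000]
bw = estimate_throughput_bps(1_000_000, 0.9)   # ~8.9e6 bps
print(choose_profile(profiles, bw))            # 4500000
```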
This delivery model works for both live- and file-based sources. In either case, a manifest file is provided to the client. The manifest lists the bitrates (and potentially other data) associated with the available stream profiles and the client uses it to determine how to download the chunks; that is, what URL to use to fetch chunks from specific profiles. In the case of an on-demand file request, the manifest contains information on every chunk in the content. In the case of live streaming, this isn’t possible. HLS and HDS deliver a ‘rolling window’ manifest that contains references to the last few available chunks, as shown in Figure 1. The client must update its manifest repeatedly in order to know about the most recently available chunks. MSS delivers information in each chunk that lets the client access subsequent chunks, so no rolling window-type manifest is needed.
Figure 1. The content delivery chain for a live adaptive HTTP stream. The client downloads the ‘rolling window’ manifest files that refer to the latest available chunks. The client then uses these references to download chunks and play them back sequentially. In the figure, the first manifest refers to chunks 3, 4, and 5, which are available in multiple bitrates. As new chunks become available the playlist is updated to reference the latest available chunks.
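The following sketch shows the shape of a live client loop against a rolling-window manifest. The manifest format is deliberately simplified to one chunk URL per line (real HLS/HDS manifests carry far more metadata), and the endpoint URL and 2-second poll interval are assumptions for the example.

```python
# Minimal live-client loop for a rolling-window manifest, standard library only.
import time
import urllib.request

MANIFEST_URL = "http://example.com/live/manifest.txt"   # hypothetical endpoint
seen = set()

def fetch(url: str) -> bytes:
    with urllib.request.urlopen(url) as r:
        return r.read()

while True:
    # Re-download the manifest; with a rolling window it lists only the
    # most recent chunks, so new entries appear as the stream advances.
    chunk_urls = fetch(MANIFEST_URL).decode().splitlines()
    for url in chunk_urls:
        if url not in seen:        # skip chunks already downloaded
            seen.add(url)
            data = fetch(url)      # download and hand the chunk to the decoder
    time.sleep(2)                  # poll roughly once per chunk duration
```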
Adaptive HTTP streaming has a number of advantages over traditional streaming:
Lower infrastructure costs for content providers by replacing specialty streaming servers with generic HTTP caches/proxies (which may already be in place for HTTP data delivery);
Content delivery is dynamically adapted to the weakest link in the end-to-end delivery chain, including highly varying last mile conditions;
Subscribers no longer need to statically select a bitrate on their own, as the client can now perform that function dynamically and automatically;
Subscribers enjoy fast start-up and seek times, as playback control functions can be initiated via the lowest bitrate chunks and subsequently ratcheted up to higher bitrates;
Annoying user experience shortcomings, including long initial buffer time, disconnects and playback start/stop are virtually eliminated;
The client can control bitrate switching – with no intelligence in the server – taking into account CPU load, available bandwidth, resolution, codec and other local conditions;
HTTP delivery works through firewalls and NAT.
Adaptive HTTP streaming also has some consequences:
The clients must buffer a few chunks to make sure they don’t starve their input buffers, which increases the end-to-end latency of live streams (see the latency sketch after this list);
HTTP is based on TCP; when packet loss is low, TCP recovers well, and video playback shows no artifacts caused by missing data. However, when packet loss rises, TCP can fail completely. Overall, then, clients will typically see either good-quality playback or playback that stops completely, as opposed to quality that degrades in proportion to the amount of packet loss in the delivery network. The Internet is generally reliable enough that the benefit of completely clean video at low packet-drop rates (with TCP) outweighs the value of receiving some video, albeit of very poor quality, at high packet-drop rates (with UDP).
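A quick back-of-the-envelope calculation shows how client buffering dominates live latency. The buffer depth and pipeline delay below are assumed, typical values, not protocol constants.

```python
# Rough live-latency estimate: the client buffer alone puts playback
# several chunk durations behind live. All values here are assumptions.
chunk_seconds = 2.0
buffered_chunks = 3            # typical client safety buffer
encode_and_package_s = 2.0     # assumed transcoder/packager pipeline delay
latency_s = buffered_chunks * chunk_seconds + encode_and_package_s
print(f"end-to-end live latency ≈ {latency_s:.0f} s")   # ≈ 8 s
```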
2.1 Video Delivery Components
At a high level, the components in an adaptive HTTP streaming data flow consist of an encoder or transcoder, a packager (also called a segmenter or a fragmenter), and a content delivery network (CDN). This section discusses the features of these components (see Figure 2) that are related to adaptive streaming.
Figure 2. The components of an HTTP streaming system, with some highlighted features.
The Encoder/Transcoder
The transcoder (or encoder, if the input is not already encoded in another format) is responsible for ingesting the content and preparing it for segmentation. The transcoder must process the video in the following way (a configuration sketch follows the list):
The output video must be in progressive format. The transcoder must therefore de-interlace the input.
The output video must be scaled to resolutions suitable for the client device.
The different output profiles must be IDR-aligned so that the client playback of the chunks created from each profile is continuous and smooth.
Audio must be transcoded into AAC audio, the codec used by HLS, HDS, and MSS.
The same encoded audio stream needs to be streamed on all the output video profiles; this avoids clicking artifacts during client-side profile changes.
If SCTE 35 is used for ad insertion, it is desirable to have the transcoder add IDR frames at the ad insertion points so that the video is ready for ad insertion. It is then possible to align the chunk boundaries with the ad insertion points, so that ads can be inserted simply by substituting chunks, which is far easier than traditional stream splicing.
A desirable fault tolerance mechanism allows two different transcoders that ingest the same input to create identically IDR-aligned output. This can be used to create a redundant backup of encoded content in such a way that any failure of the primary transcoder is seamlessly backed up by the secondary transcoder.
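As a concrete, hypothetical example of meeting the deinterlacing, scaling, IDR-alignment and AAC requirements above, the sketch below drives ffmpeg from Python. The profile ladder, bitrates and 2-second keyframe interval are assumptions; any production transcoder would expose equivalent controls.

```python
# Drive ffmpeg to produce IDR-aligned multi-profile output: deinterlaced,
# scaled, AAC audio, a forced keyframe every 2 seconds, and scene-cut
# keyframes disabled so all profiles share identical IDR positions.
import subprocess

PROFILES = [(1920, 1080, "4500k"), (1280, 720, "2500k"), (640, 360, "800k")]

def transcode(src: str, width: int, height: int, vbitrate: str, out: str):
    cmd = [
        "ffmpeg", "-i", src,
        "-vf", f"yadif,scale={width}:{height}",        # deinterlace + scale
        "-c:v", "libx264", "-b:v", vbitrate,
        "-force_key_frames", "expr:gte(t,n_forced*2)", # IDR every 2 s
        "-sc_threshold", "0",                          # no extra scene-cut IDRs
        "-c:a", "aac", "-b:a", "128k",                 # same AAC settings on every profile
        out,
    ]
    subprocess.run(cmd, check=True)

for w, h, vb in PROFILES:
    transcode("input.ts", w, h, vb, f"out_{h}p.mp4")
```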
Because the quality of experience of the client depends on having a number of different profiles, it is desirable to have an encoder that can output a large number of different profiles for each input. Deployments may use anywhere from 4 to 16 different output profiles for each input, with more profiles resulting in more supported devices and a better user experience.
The Packager
The packager is the component that takes the output of the transcoder and packages the video for a specific delivery protocol. The packager should have the following features:
Encryption capability – the packager should be able to encrypt the outgoing chunks in a format compatible with the delivery protocol.
Integration with third party key management systems – the packager should be able to receive encryption keys from a third party key management server that is also used to manage and distribute the keys to the clients.
Packagers may ingest live streams or files, depending on whether the workflow is live or on-demand.
Packagers should support multiple ways to deliver the chunks: either letting the origin pull them via HTTP, or pushing them via a network share or HTTP PUT/POST (a push sketch follows this list).
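Here is a minimal sketch of the HTTP PUT push path using only the Python standard library. The origin URL, chunk naming and TS content type are illustrative assumptions.

```python
# Packager pushing a finished chunk to an origin server via HTTP PUT.
import urllib.request

ORIGIN = "http://origin.example.com/live/channel1/"    # hypothetical origin

def push_chunk(name: str, payload: bytes):
    req = urllib.request.Request(
        ORIGIN + name,
        data=payload,
        method="PUT",
        headers={"Content-Type": "video/mp2t"},   # TS chunk; fMP4 would differ
    )
    with urllib.request.urlopen(req) as resp:
        # Origins typically answer 200/201/204 on a successful store.
        assert resp.status in (200, 201, 204), resp.status

# e.g. push_chunk("chunk_00042.ts", open("chunk_00042.ts", "rb").read())
```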
The CDN
The content delivery network (CDN) needed for HTTP streaming is not specialized – it is HTTP-based and doesn’t require any special streaming servers. For live delivery, it is beneficial to tune the CDN to age out older chunks rapidly, as there is no need to keep them around long. The actual duration depends on the duration of the chunks and latency in the client, but a minute is normally sufficient.
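One way to express such tuning is a per-resource cache TTL, as in the sketch below. The file extensions and max-age values are assumptions for illustration, not recommendations.

```python
# Illustrative cache-TTL policy for a live HTTP streaming origin/CDN:
# live manifests change every chunk and must expire quickly, while chunks
# are immutable but can be aged out after about a minute.
def cache_control(path: str) -> str:
    if path.endswith((".m3u8", ".f4m", ".ism")):   # assumed manifest extensions
        return "max-age=2"                         # roughly one chunk duration
    return "max-age=60"                            # chunks: age out after a minute

print(cache_control("/live/ch1/manifest.m3u8"))    # max-age=2
print(cache_control("/live/ch1/chunk_17.ts"))      # max-age=60
```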
It is important to note that the number of chunks can be very large. For example, a day’s worth of 2-second chunks delivered in 10 different profiles for 100 different channels creates 43 million files! Thus the CDN must be able to cope with a large number of files as well.
The Client
HLS is available on iOS devices, but availability on Windows is via third-party products that are not always complete or robust. Interestingly, however, Android 3.0 supports HLS natively, though some features aren’t well supported (e.g. stream discontinuity indications). MSS on a PC depends on a Silverlight runtime client, which must be installed, but native Smooth Streaming clients have been developed for multiple devices, including iOS tablets, phones and iPods. HDS is native to Flash 10.1 and later, and comes with the Flash plug-in on PCs as well as on Android 2.3 and later devices.
Workflow Architecture
It is valuable to have an architecture that allows the transcoder and packager to be separate. This has the advantage that the input video can be transcoded just once at the core, delivered over a core network to the edge, and packaged into multiple formats at the edge of the network. Without this separation, all the final delivered formats must be delivered over the core network, unnecessarily increasing its bandwidth utilization. This is shown in Figure 3. Note, however, that packet loss on the core network would result in unrecoverable segment losses.
Figure 3. Integrated and remote segmentation of streams: When multiple formats are used, segmenting closer to the edge of the network (shown on the right) can save core bandwidth, as streams only need to be delivered once and can be packaged into multiple delivery formats at the edge. However, if the core network is susceptible to packet loss, segmenting at the core ensures that segments will always be delivered to the CDN (shown on the left).
Other Adaptive HTTP Streaming Technologies
A number of other adaptive streaming technologies are also available, with varying market penetration:
Move Networks went out of business and was purchased by Echostar – their team is now integrating the technology into their home devices. Move was instrumental in popularizing adaptive HTTP streaming and has a number of patents on the technology (though chunked streaming was used before Move popularized it).
3GPP’s Adaptive HTTP Streaming (AHS) is part of the 3GPP Release 9 specification (see [AHS]), and 3GPP Release 10 is working on a related specification called DASH as well.
The Open IPTV Forum has an HTTP Adaptive Streaming (HAS) specification (see [HAS]).
MPEG Dynamic Adaptive Streaming over HTTP (DASH) is based on 3GPP’s AHS and the Open IPTV Forum’s HAS and is close to completion. It specifies the use of either fMP4 or transport stream (TS) chunks and an XML manifest (called the media presentation description, or MPD) that is repeatedly downloaded. DASH may well become the format of choice in the future, but currently the lack of client support makes this specification interesting only in theory. DASH does make allowances for multiple scenarios, including separate or joined streaming of audio, video and data, as well as encryption. However, its generality is as much a drawback as an advantage, since it makes clients complex to implement.
Many DRM vendors have their own variation of these schemes.