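Media File Format Analysis: fMP4

The most basic unit of an MP4 file is the Box: the file is built by concatenating independent boxes one after another. We therefore start with the Box itself. Every Box consists of a Header and Data. A FullBox is an extension of Box that adds an 8-bit version field and a 24-bit flags field to the Header.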
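The Header and FullBox layout described above can be sketched as a small parser. This is a minimal sketch, not code from any particular library: the names `box_header`, `parse_box_header` and `parse_fullbox` are made up for illustration, and it assumes the usual ISO BMFF header of a 32-bit big-endian size followed by a four-character type, with size == 1 meaning that a 64-bit size follows.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the decoded header of one box. */
typedef struct {
    uint64_t size;        /* total box size in bytes, header included */
    char     type[5];     /* four-character code, NUL-terminated */
    size_t   header_len;  /* 8, or 16 when a 64-bit size is present */
} box_header;

/* Read a big-endian 32-bit integer. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a big-endian 64-bit integer. */
static uint64_t read_be64(const uint8_t *p)
{
    return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Parse a Box header; returns 0 on success, -1 if the buffer is too short. */
static int parse_box_header(const uint8_t *buf, size_t len, box_header *h)
{
    if (len < 8) {
        return -1;
    }
    h->size = read_be32(buf);
    memcpy(h->type, buf + 4, 4);
    h->type[4] = '\0';
    h->header_len = 8;
    if (h->size == 1) {           /* the real size is in the next 64 bits */
        if (len < 16) {
            return -1;
        }
        h->size = read_be64(buf + 8);
        h->header_len = 16;
    }
    return 0;
}

/* Parse the extra FullBox fields: 8-bit version, then 24-bit flags. */
static int parse_fullbox(const uint8_t *p, size_t len,
                         uint8_t *version, uint32_t *flags)
{
    if (len < 4) {
        return -1;
    }
    *version = p[0];
    *flags = read_be32(p) & 0x00FFFFFF;
    return 0;
}
```

A caller would run `parse_box_header` at offset 0, skip `size` bytes to reach the next sibling box, and call `parse_fullbox` on the bytes right after the header for boxes that are declared as FullBox.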
1. Terminology
2. The smallest unit: Box
2.1 Common MP4 file structure (simplified)

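Here we walk through the format following the MP4 box hierarchy. First, a structure diagram of an MP4 file:

When parsing a media file, the information of most interest is usually the video's width and height, duration, bitrate, codec, the list of frames, the list of key frames, and each frame's timestamp and position in the file. In MP4 this information is split, using specific encoding schemes, across several boxes under the stbl box, so every box under stbl has to be parsed to reconstruct it. The table below describes what each of these important boxes stores.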

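The ftyp (File Type) box is usually placed at the very start of an MP4 file. It tells the decoder the basic format version and which formats the file is compatible with.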
    aligned(8) class FileTypeBox
        extends Box('ftyp') {
        unsigned int(32) major_brand;
        unsigned int(32) minor_version;
        unsigned int(32) compatible_brands[]; // to end of the box
    }

    ngx_int_t
    ngx_rtmp_mp4_write_ftyp(ngx_buf_t *b)
    {
        u_char  *pos;

        pos = ngx_rtmp_mp4_start_box(b, "ftyp");

        /* major brand */
        ngx_rtmp_mp4_box(b, "iso6");

        /* minor version */
        ngx_rtmp_mp4_field_32(b, 1);

        /* compatible brands */
        ngx_rtmp_mp4_box(b, "isom");
        ngx_rtmp_mp4_box(b, "iso6");
        ngx_rtmp_mp4_box(b, "dash");

        ngx_rtmp_mp4_update_box_size(b, pos);

        return NGX_OK;
    }
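The writer above relies on a "start the box, write the payload, then patch the size" pattern. The following is a minimal standalone sketch of that pattern. `out_buf`, `start_box`, `update_box_size` and the other names are hypothetical stand-ins and are not the nginx-rtmp helpers themselves; the fixed 256-byte buffer has no bounds checking and is only meant for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical minimal output buffer (nginx uses ngx_buf_t instead).
 * No bounds checking: the caller must stay below 256 bytes. */
typedef struct {
    uint8_t data[256];
    size_t  len;
} out_buf;

/* Store a big-endian 32-bit value at a given position. */
static void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Append a big-endian 32-bit value. */
static void put_be32(out_buf *b, uint32_t v)
{
    store_be32(b->data + b->len, v);
    b->len += 4;
}

/* Append a four-character code. */
static void put_fourcc(out_buf *b, const char *cc)
{
    memcpy(b->data + b->len, cc, 4);
    b->len += 4;
}

/* Begin a box: reserve the size field, write the type, return the offset. */
static size_t start_box(out_buf *b, const char *type)
{
    size_t pos = b->len;
    put_be32(b, 0);                 /* placeholder, patched later */
    put_fourcc(b, type);
    return pos;
}

/* Patch the size field once the whole payload has been written. */
static void update_box_size(out_buf *b, size_t pos)
{
    store_be32(b->data + pos, (uint32_t)(b->len - pos));
}

/* Write the same ftyp payload as the function above. */
static void write_ftyp(out_buf *b)
{
    size_t pos = start_box(b, "ftyp");
    put_fourcc(b, "iso6");          /* major_brand */
    put_be32(b, 1);                 /* minor_version */
    put_fourcc(b, "isom");          /* compatible_brands[] */
    put_fourcc(b, "iso6");
    put_fourcc(b, "dash");
    update_box_size(b, pos);
}
```

Patching the size afterwards means the writer never has to compute a box's length in advance, which is convenient for container boxes whose children vary in size.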
The moov (Movie) box is a container box that holds the trak boxes and the related metadata.
    aligned(8) class MovieBox extends Box('moov'){ }
3.2.1 Movie Header Box (mvhd)
mvhd is the first box under moov. It describes the media as a whole, for example its timescale and duration:
    aligned(8) class MovieHeaderBox extends FullBox('mvhd', version, 0) {
        if (version==1) {
            unsigned int(64) creation_time;
            unsigned int(64) modification_time;
            unsigned int(32) timescale;
            unsigned int(64) duration;
        } else { // version==0
            unsigned int(32) creation_time;
            unsigned int(32) modification_time;
            unsigned int(32) timescale;
            unsigned int(32) duration;
        }

        template int(32) rate = 0x00010000; // typically 1.0
        template int(16) volume = 0x0100; // typically, full volume
        const bit(16) reserved = 0;
        const unsigned int(32)[2] reserved = 0;
        template int(32)[9] matrix =
            { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
            // Unity matrix
        bit(32)[6] pre_defined = 0;
        unsigned int(32) next_track_ID;
    }

    static ngx_int_t
    ngx_rtmp_mp4_write_mvhd(ngx_buf_t *b)
    {
        u_char  *pos;

        pos = ngx_rtmp_mp4_start_box(b, "mvhd");

        /* version (8 bits) and flags (24 bits) */
        ngx_rtmp_mp4_field_32(b, 0);

        /* creation time */
        ngx_rtmp_mp4_field_32(b, 0);

        /* modification time */
        ngx_rtmp_mp4_field_32(b, 0);

        /* timescale: 1000 units per second */
        ngx_rtmp_mp4_field_32(b, 1000);

        /* duration */
        ngx_rtmp_mp4_field_32(b, 0);

        /* rate (1.0), volume (full), then the reserved fields */
        ngx_rtmp_mp4_field_32(b, 0x00010000);
        ngx_rtmp_mp4_field_16(b, 0x0100);
        ngx_rtmp_mp4_field_16(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);

        /* unity matrix */
        ngx_rtmp_mp4_write_matrix(b, 1, 0, 0, 1, 0, 0);

        /* pre_defined[6] */
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);
        ngx_rtmp_mp4_field_32(b, 0);

        /* next track id */
        ngx_rtmp_mp4_field_32(b, 1);

        ngx_rtmp_mp4_update_box_size(b, pos);

        return NGX_OK;
    }
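Going the other way, a reader has to branch on version to find timescale and duration, because the two time fields before them are 32 or 64 bits wide. A minimal sketch of that, assuming `p` points at the version byte right after the box header; `mvhd_timing`, `parse_mvhd_timing` and `mvhd_seconds` are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit integer. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Read a big-endian 64-bit integer. */
static uint64_t rd64(const uint8_t *p)
{
    return ((uint64_t)rd32(p) << 32) | rd32(p + 4);
}

/* Hypothetical result type: just the timing fields of mvhd. */
typedef struct {
    uint32_t timescale;   /* time units per second */
    uint64_t duration;    /* in timescale units */
} mvhd_timing;

/* p points at the FullBox version byte of an mvhd box.
 * Returns 0 on success, -1 on a short buffer or unknown version. */
static int parse_mvhd_timing(const uint8_t *p, size_t len, mvhd_timing *out)
{
    if (len < 4) {
        return -1;
    }
    if (p[0] == 1) {
        /* 4 (version/flags) + 8 + 8 (times) + 4 (timescale) + 8 (duration) */
        if (len < 32) {
            return -1;
        }
        out->timescale = rd32(p + 20);
        out->duration  = rd64(p + 24);
    } else if (p[0] == 0) {
        /* 4 (version/flags) + 4 + 4 (times) + 4 (timescale) + 4 (duration) */
        if (len < 20) {
            return -1;
        }
        out->timescale = rd32(p + 12);
        out->duration  = rd32(p + 16);
    } else {
        return -1;
    }
    return 0;
}

/* Duration in seconds = duration / timescale. */
static double mvhd_seconds(const mvhd_timing *t)
{
    return t->timescale ? (double)t->duration / t->timescale : 0.0;
}
```

For the values written by the function above (timescale 1000, duration 0), `mvhd_seconds` returns 0.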
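3.2.2 Movie Extends Box (mvex) (fMP4 only)
mvex is the box that marks a file as fMP4. It tells the decoder that the file is fragmented: the actual sample information is no longer stored in trak but in each moof. mvex itself is only a container, declared as aligned(8) class MovieExtendsBox extends Box('mvex'){ }. Its optional header child, mehd, has the following format: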
    aligned(8) class MovieExtendsHeaderBox extends FullBox('mehd', version, 0) {
        if (version==1) {
            unsigned int(64) fragment_duration;
        } else { // version==0
            unsigned int(32) fragment_duration;
        }
    }
3.2.2.1 Track Extends Box (trex) (fMP4 only)
trex is a child box of mvex. It sets the default values used by the samples of an fMP4 track. Its layout is:
    aligned(8) class TrackExtendsBox extends FullBox('trex', 0, 0){
        unsigned int(32) track_ID;
        unsigned int(32) default_sample_description_index;
        unsigned int(32) default_sample_duration;
        unsigned int(32) default_sample_size;
        unsigned int(32) default_sample_flags;
    }
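As an illustration of how these defaults are meant to be used, here is a minimal sketch that reads the trex payload and falls back to its default when a sample carries no value of its own. `trex_defaults`, `parse_trex` and `sample_duration` are hypothetical names, and the `has_own_value` flag is a simplification: in a real file the per-fragment and per-sample overrides come from boxes inside moof.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit integer. */
static uint32_t trex_rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical in-memory form of the trex fields. */
typedef struct {
    uint32_t track_id;
    uint32_t default_sample_description_index;
    uint32_t default_sample_duration;
    uint32_t default_sample_size;
    uint32_t default_sample_flags;
} trex_defaults;

/* p points at the FullBox version byte of a trex box (24 bytes needed). */
static int parse_trex(const uint8_t *p, size_t len, trex_defaults *out)
{
    if (len < 24) {
        return -1;
    }
    out->track_id                         = trex_rd32(p + 4);
    out->default_sample_description_index = trex_rd32(p + 8);
    out->default_sample_duration          = trex_rd32(p + 12);
    out->default_sample_size              = trex_rd32(p + 16);
    out->default_sample_flags             = trex_rd32(p + 20);
    return 0;
}

/* Use the sample's own duration when it has one, else the trex default. */
static uint32_t sample_duration(const trex_defaults *d,
                                int has_own_value, uint32_t own_value)
{
    return has_own_value ? own_value : d->default_sample_duration;
}
```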
3.2.3 Track Box (trak)
The trak box is the main container for the content of one media stream.
3.2.3.1 Track Header Box (tkhd)
tkhd is a child box of trak. It describes the properties of that particular track. Its layout is:
    aligned(8) class TrackHeaderBox
        extends FullBox('tkhd', version, flags){
        if (version==1) {
            unsigned int(64) creation_time;
            unsigned int(64) modification_time;
            unsigned int(32) track_ID;
            const unsigned int(32) reserved = 0;
            unsigned int(64) duration;
        } else { // version==0
            unsigned int(32) creation_time;
            unsigned int(32) modification_time;
            unsigned int(32) track_ID;
            const unsigned int(32) reserved = 0;
            unsigned int(32) duration;
        }

        const unsigned int(32)[2] reserved = 0;
        template int(16) layer = 0;
        template int(16) alternate_group = 0;
        template int(16) volume = {if track_is_audio 0x0100 else 0};
        const unsigned int(16) reserved = 0;
        template int(32)[9] matrix=
            { 0x00010000,0,0,0,0x00010000,0,0,0,0x40000000 };
            // unity matrix
        unsigned int(32) width;
        unsigned int(32) height;
    }
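The constants in this layout are fixed-point numbers, which is why 1.0 appears as 0x00010000 in most matrix entries and as 0x40000000 in the last one. A minimal sketch of the conversions, assuming the usual ISO BMFF conventions: width, height and most matrix entries are 16.16 fixed point, and the third matrix column is 2.30 fixed point. The function names are made up for this example.

```c
#include <stdint.h>

/* 16.16 fixed point: upper 16 bits integer part, lower 16 bits fraction. */
static double fixed16_16_to_double(uint32_t v)
{
    return v / 65536.0;
}

/* 2.30 fixed point: upper 2 bits integer part, lower 30 bits fraction. */
static double fixed2_30_to_double(uint32_t v)
{
    return v / 1073741824.0;
}

/* Encode a whole pixel count as the 16.16 value stored in width/height. */
static uint32_t pixels_to_fixed16_16(uint16_t pixels)
{
    return (uint32_t)pixels << 16;
}
```

With these helpers a 1920-pixel-wide track stores 0x07800000 in width, and both 0x00010000 (16.16) and 0x40000000 (2.30) decode to 1.0.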

3.2.3.2 Media Box (mdia)
mdia wraps the media-related information of the track.
(1) Media Header Box (mdhd)
    aligned(8) class MediaHeaderBox extends FullBox('mdhd', version, 0) {
        if (version==1) {
            unsigned int(64) creation_time;
            unsigned int(64) modification_time;
            unsigned int(32) timescale;
            unsigned int(64) duration;
        } else { // version==0
            unsigned int(32) creation_time;
            unsigned int(32) modification_time;
            unsigned int(32) timescale;
            unsigned int(32) duration;
        }

        bit(1) pad = 0;
        unsigned int(5)[3] language; // ISO-639-2/T language code
        unsigned int(16) pre_defined = 0;
    }
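The language field packs three 5-bit values behind a single pad bit, so it occupies exactly 16 bits. A minimal sketch of decoding and encoding it, assuming the ISO BMFF convention that each 5-bit value is the lowercase ASCII letter minus 0x60; the function names are hypothetical.

```c
#include <stdint.h>

/* Unpack the 16-bit pad+language field into a 3-letter code plus NUL. */
static void decode_mdhd_language(uint16_t packed, char out[4])
{
    out[0] = (char)(((packed >> 10) & 0x1F) + 0x60);
    out[1] = (char)(((packed >> 5)  & 0x1F) + 0x60);
    out[2] = (char)(( packed        & 0x1F) + 0x60);
    out[3] = '\0';
}

/* Pack a 3-letter lowercase ISO-639-2/T code; the pad bit stays 0. */
static uint16_t encode_mdhd_language(const char *code)
{
    return (uint16_t)((((unsigned)(code[0] - 0x60) & 0x1F) << 10) |
                      (((unsigned)(code[1] - 0x60) & 0x1F) << 5)  |
                       ((unsigned)(code[2] - 0x60) & 0x1F));
}
```

For example, the code "und" (undetermined) packs to 0x55C4.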

(2) Handler Reference Box(hdlr)
    aligned(8) class HandlerBox extends FullBox('hdlr', version = 0, 0) {
        unsigned int(32) pre_defined = 0;
        unsigned int(32) handler_type;
        const unsigned int(32)[3] reserved = 0;
        string name;
    }

handler_type takes one of the following values:
- vide : Video track
- soun : Audio track
- hint : Hint track
- meta : Timed Metadata track
- auxv : Auxiliary Video track
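A parser typically branches on handler_type to decide how to treat the rest of the track. A minimal sketch of that lookup, covering only the values listed above; `handler_kind` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Map a handler_type four-character code to the track kind listed above. */
static const char *handler_kind(const char *fourcc)
{
    static const struct {
        const char *code;
        const char *kind;
    } table[] = {
        { "vide", "Video track" },
        { "soun", "Audio track" },
        { "hint", "Hint track" },
        { "meta", "Timed Metadata track" },
        { "auxv", "Auxiliary Video track" },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        /* compare exactly 4 bytes: the code in the file is not NUL-terminated */
        if (memcmp(fourcc, table[i].code, 4) == 0) {
            return table[i].kind;
        }
    }
    return "unknown";
}
```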
3.2.3.3 Media Information Box (minf)
minf is an important container box among the children of mdia. It holds the basic descriptive information of the current track.
(1) Video Media Header Box(vmhd)
    aligned(8) class VideoMediaHeaderBox
        extends FullBox('vmhd', version = 0, 1) {
        template unsigned int(16) graphicsmode = 0; // copy
        template unsigned int(16)[3] opcolor = {0, 0, 0};
    }
(2) Sound Media Header Box(smhd)
    aligned(8) class SoundMediaHeaderBox
        extends FullBox('smhd', version = 0, 0) {
        template int(16) balance = 0;
        const unsigned int(16) reserved = 0;
    }
(3) Data Information Box(dinf)
dinf describes where the media data of the trak is located. It is itself just a container with no fields of its own:
    aligned(8) class DataInformationBox extends Box('dinf') {
    }
(4) Data Reference Box(dref)
dref holds the data_entry items that describe the data referenced by the current box.
    aligned(8) class DataReferenceBox
        extends FullBox('dref', version = 0, 0) {
        unsigned int(32) entry_count;

        for (i=1; i <= entry_count; i++) {
            DataEntryBox(entry_version, entry_flags) data_entry;
        }
    }
