Super fast HLS trimming with frame accuracy

Description

This article is an evolution of this code: Trim HLS stream with frame accuracy using ffmpeg and Bash script

  • The implementation takes a group of HLS chunks (.ts) and, based on in and out points (timestamps), creates a frame-accurately trimmed mp4 output file with audio and video perfectly aligned.
  • Thanks to the underlying algorithm, the trimming takes a very small amount of time even when the input file is huge.
  • This code is JavaScript (JS). It is more understandable and better organized than the previous Bash version.


Possible usages:

  • On the fly live stream trimming (live highlights)
  • VOD frame-accurate trimming
  • Dramatically increase the speed of video editors (mostly cloud video editors), because only the chunks at the trim points are decoded / encoded, NOT the rest of the file / chunks

Source code:

A minimal h264 “encoder” (C++)

Introduction

I have always thought that the best approach to complex things is to start from the basics and, once you master that part, move step by step towards the more complex parts. With this idea in mind, and with the aim of improving my understanding of the h264 bitstream and its techniques, I have created the following C++ code, which generates a compliant h264 file from a YUV420p file.


Figure 1: Simple block diagram

First of all I have to say that the following source code is NOT an h264 encoder, but it does generate a compliant (playable) h264 stream. This means that the output file will be slightly bigger than the input file, because I have used only I_PCM macroblock coding (an uncompressed format) to describe the images in the h264 stream.

I think that reading and understanding this code could be a good starting point to start flirting with the h264 standard, better than diving straight into the standard itself (more than 700 pages of dense text).


Input files

  • Size:
    • Multiple of 16 in height and width
  • Pixel format:
    • yuv420p
  • Frame rate:
    • Any
  • Aspect ratio:
    • Any

To create a compliant input file you can use ffmpeg; here are 2 examples. The first example converts any video to a YUV420p 128×96 file.

ffmpeg.exe -i anyvideo.avi -s 128x96 -pix_fmt yuv420p out.yuv

The second example generates a yuv420p blank (green background) video file of 10 seconds, 128×96 pixels, and 25 fps (10s × 25fps = 250 progressive YUV frames):

ffmpeg.exe -t 10 -s 128x96 -f rawvideo -pix_fmt yuv420p -r 25 -i /dev/zero out.yuv

Using the h264 simple coder

Using this basic h264 coder is very easy; just follow these steps:

  1. Open a YUV420p format file
  2. Open the destination file
  3. Create an instance of CJOCh264encoder (passing the parameter of destination file pointer)
  4. Call IniCoder with the following parameters:
    • nImW: Frame width in pixels
    • nImH: Frame height in pixels
    • nFps: Desired frames per second of the output file (typical values are: 25, 30, 50, etc)
    • SampleFormat: Sample format of the input file. In this implementation only SAMPLE_FORMAT_YUV420p is allowed
    • nSARw: Indicates the horizontal size of the sample aspect ratio (typical values are: 1, 4, 16, etc)
    • nSARh: Indicates the vertical size of the sample aspect ratio (typical values are: 1, 3, 9, etc)
  5. For each frame in the input file:
    • Get the frame data pointer calling GetFramePtr
    • Load a new frame from source file over the pointer returned by GetFramePtr
    • Call CodeAndSaveFrame to code the frame and save it to destination file
  6. Finally, call CloseCoder and close the opened files

If you compile h264simpleCoder.cpp, you can call the resulting application using this expression:

h264simpleCoder AVsyncTest.yuv OutTest.h264 128 96 25 16 9

The following source code shows how to use the CJOCh264encoder class including error handling.


//============================================================================
// Name        : h264simpleCoder.cpp
// Author      : Jordi Cenzano (www.jordicenzano.name)
// Version     : 1.0
// Copyright   : Copyright Jordi Cenzano 2014
// Description : Simple h264 encoder
//============================================================================

#include <iostream>
#include "CJOCh264encoder.h"

using namespace std;

int main(int argc, char **argv)
{
	int nRc = 1;

	puts("Simple h264 coder by Jordi Cenzano (www.jordicenzano.name)");
	puts("This is NOT a video compressor, only uses I_PCM macroblocks (intra without compression)");
	puts("It is made only for learning purposes");
	puts("**********************************************************");

	if (argc < 3)
	{
		puts("------------------------------------------------------------------------");
		puts("Usage: h264mincoder input.yuv output.h264 [image width] [image height] [fps] [AR SARw] [AR SARh]");
		puts("Default parameters: Image width=128 Image height=96 Fps=25 SARw=1 SARh=1");
		puts("Assumptions: Input file is yuv420p");
		puts("------------------------------------------------------------------------");
		return nRc;
	}

	char szInputFile[512];
	char szOutputFile[512];
	int nImWidth = 128;
	int nImHeight = 96;
	int nFps = 25;
	int nSARw = 1;
	int nSARh = 1;

	//Get input file (ensure null termination)
	strncpy (szInputFile,argv[1],511);
	szInputFile[511] = 0;

	//Get output file (ensure null termination)
	strncpy (szOutputFile,argv[2],511);
	szOutputFile[511] = 0;

	//Get image width 	
	if (argc > 3)
	{
		nImWidth = (int) atol (argv[3]);
		if (nImWidth == 0)
			puts ("Error reading image width input parameter");
	}

	//Get image height
	if (argc > 4)
	{
		nImHeight = (int) atol (argv[4]);
		if (nImHeight == 0)
			puts ("Error reading image height input parameter");
	}

	//Get fps
	if (argc > 5)
	{
		nFps = (int) atol (argv[5]);
		if (nFps == 0)
			puts ("Error reading fps input parameter");
	}

	//Get SARw
	if (argc > 6)
	{
		nSARw = (int) atol (argv[6]);
		if (nSARw == 0)
			puts ("Error reading AR SARw input parameter");
	}

	//Get SARh
	if (argc > 7)
	{
		nSARh = (int) atol (argv[7]);
		if (nSARh == 0)
			puts ("Error reading AR SARh input parameter");
	}

	FILE *pfsrc = NULL;
	FILE *pfdst = NULL;

	pfsrc = fopen (szInputFile,"rb");
	if (pfsrc == NULL)
	{
		puts ("Error opening source file");
		return nRc;
	}

	pfdst = fopen (szOutputFile,"wb");
	if (pfdst == NULL)
	{
		puts ("Error opening destination file");
		fclose (pfsrc);
		return nRc;
	}

	try
	{
		//Instantiate the h264 coder
		CJOCh264encoder *ph264encoder = new CJOCh264encoder(pfdst);

		//Initialize the h264 coder with frame parameters
		ph264encoder->IniCoder(nImWidth,nImHeight,nFps,CJOCh264encoder::SAMPLE_FORMAT_YUV420p, nSARw, nSARh);

		int nSavedFrames = 0;
		char szLog[256];

		//Iterate through all frames
		while (! feof(pfsrc))
		{
			//Get frame pointer to fill
			void *pFrame = ph264encoder->GetFramePtr ();

			//Get the size allocated in pFrame
			unsigned int nFrameSize = ph264encoder->GetFrameSize();

			//Load frame from disk and load it into pFrame
			size_t nreaded = fread (pFrame,1, nFrameSize, pfsrc);
			if (nreaded != nFrameSize)
			{
				if (! feof(pfsrc))
					throw "Error: Reading frame";
			}
			else
			{
				//Encode & save frame
				ph264encoder->CodeAndSaveFrame();

				//Get the number of saved frames
				nSavedFrames = ph264encoder->GetSavedFrames();

			//Show the number of saved / encoded frames in console
				sprintf(szLog,"Saved frame num: %d", nSavedFrames - 1);
				puts (szLog);
			}
		}

		//Close encoder
		ph264encoder->CloseCoder();

		//Free the encoder instance
		delete ph264encoder;

		//Set return code to 0
		nRc = 0;
	}
	catch (const char *szErrorDesc)
	{
		//Show the error description on console
		puts (szErrorDesc);
	}

	//Close previously opened files
	if (pfsrc != NULL)
		fclose (pfsrc);

	if (pfdst != NULL)
		fclose (pfdst);

	return nRc;
}

Classes

The implementation of this minimal h264 encoder is based on two classes:

  • CJOCh264bitstream
    • It contains useful functions to create the bit-oriented stream, including an exp-Golomb coder.
  • CJOCh264encoder : CJOCh264bitstream
    • It derives from CJOCh264bitstream and contains the h264-oriented functions.

Figure 2: The CJOCh264bitstream class with its public functions


Figure 3: The CJOCh264encoder class with its public functions

Source code

  • I have tried to make the code readable, including comments and keeping a logical order of functions.
  • You can see and download the latest version of the source code of this “experiment” from this github link: h264simpleCoder
    • You will find the following files: h264simpleCoder.cpp, CJOCh264bitstream.h, CJOCh264bitstream.cpp, CJOCh264encoder.h, CJOCh264encoder.cpp, makefile
  • In the following sections you can see the source code of the CJOCh264bitstream and CJOCh264encoder classes

CJOCh264bitstream (.h and .cpp)

/*
 * CJOCh264bitstream.h
 *
 *  Created on: Aug 23, 2014
 *      Author: Jordi Cenzano (www.jordicenzano.name)
 */

#ifndef CJOCH264BITSTREAM_H_
#define CJOCH264BITSTREAM_H_

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

//! h264 bitstream class
/*!
 It is used to create the h264 bit-oriented stream; it contains different functions that help you create an h264 compliant stream (bit-oriented output, exp-Golomb coder)
 */
class CJOCh264bitstream
{
#define BUFFER_SIZE_BITS 24			/*! Buffer size in bits used for emulation prevention */
#define BUFFER_SIZE_BYTES (24/8)	/*! Buffer size in bytes used for emulation prevention */

#define H264_EMULATION_PREVENTION_BYTE 0x03		/*! Emulation prevention byte */

private:

	/*! Buffer  */
	unsigned char m_buffer[BUFFER_SIZE_BITS];

	/*! Bit buffer index  */
	unsigned int m_nLastbitinbuffer;

	/*! Starting byte indicator  */
	unsigned int m_nStartingbyte;

	/*! Pointer to output file */
	FILE *m_pOutFile;

	//! Clears the buffer
	void clearbuffer();

	//! Returns the nNumbit value (1 or 0) of lval
	/*!
	 	 \param lval number to extract the nNumbit value
	 	 \param nNumbit Bit position that we want to know if it is 1 or 0 (from 0 to 63)
	 	 \return bit value (1 or 0)
	 */
	static int getbitnum (unsigned long lval, int nNumbit);

	//! Adds 1 bit to the end of h264 bitstream
	/*!
		 \param nVal bit to add at the end of h264 bitstream
	 */
	void addbittostream (int nVal);

	//! Adds 8 bits to the end of the h264 bitstream (it is optimized for byte aligned situations)
	/*!
		 \param nVal byte to add at the end of h264 bitstream (from 0 to 255)
	 */
	void addbytetostream (int nVal);

	//! Save all buffer to file
	/*!
		 \param bemulationprevention Indicates if it will insert the emulation prevention byte or not (when it is needed)
	 */
	void savebufferbyte(bool bemulationprevention = true);

public:
	//! Constructor
	/*!
		 \param pOutBinaryFile The output file pointer
	 */
	CJOCh264bitstream(FILE *pOutBinaryFile);

	//! Destructor
	virtual ~CJOCh264bitstream();

	//! Add 4 bytes to the h264 bitstream without taking into account the emulation prevention. Used to add the NAL header to the h264 bitstream
	/*!
		 \param nVal The 32b value to add
		 \param bDoAlign Indicates if the function will insert 0s in order to byte-align the stream before adding the 4 bytes of nVal. If this is false and the stream is not byte aligned, an exception will be thrown
	 */
	void add4bytesnoemulationprevention (unsigned int nVal, bool bDoAlign = false);

	//! Adds nNumbits of lval to the end of h264 bitstream
	/*!
		 \param lval value to add at the end of the h264 stream (only the LAST nNumbits will be added)
		 \param nNumbits number of bits of lval that will be added to the h264 stream (the least significant bits)
	 */
	void addbits (unsigned long lval, int nNumbits);

	//! Adds lval to the end of h264 bitstream using exp golomb coding for unsigned values
	/*!
		 \param lval value to add at the end of the h264 stream
	 */
	void addexpgolombunsigned (unsigned long lval);

	//! Adds lval to the end of h264 bitstream using exp golomb coding for signed values
	/*!
		 \param lval value to add at the end of the h264 stream
	 */
	void addexpgolombsigned (long lval);

	//! Adds 0s to the end of the h264 bitstream in order to leave a byte aligned stream (it will insert at most seven 0s)
	void dobytealign();

	//! Adds cByte (8 bits) to the end of the h264 bitstream. This function is optimized for byte-aligned streams.
	/*!
		 \param cByte value to add at the end of the h264 stream (from 0 to 255)
	 */
	void addbyte (unsigned char cByte);

	//! Closes the h264 stream, saving to disk the last remaining bits in the buffer
	void close();
};

#endif /* CJOCH264BITSTREAM_H_ */

/*
 * CJOCh264bitstream.cpp
 *
 *  Created on: Aug 23, 2014
 *      Author: Jordi Cenzano (www.jordicenzano.name)
 */

#include "CJOCh264bitstream.h"

CJOCh264bitstream::CJOCh264bitstream(FILE *pOutBinaryFile)
{
	clearbuffer();

	m_pOutFile = pOutBinaryFile;
}

CJOCh264bitstream::~CJOCh264bitstream()
{
	close();
}

void CJOCh264bitstream::clearbuffer()
{
	memset (&m_buffer,0,sizeof (unsigned char)* BUFFER_SIZE_BITS);
	m_nLastbitinbuffer = 0;
	m_nStartingbyte = 0;
}

int CJOCh264bitstream::getbitnum (unsigned long lval, int nNumbit)
{
	int lrc = 0;

	unsigned long lmask = 1UL << nNumbit;
	if ((lval & lmask) > 0)
		lrc = 1;

	return lrc;
}

void CJOCh264bitstream::addbittostream (int nVal)
{
	if (m_nLastbitinbuffer >= BUFFER_SIZE_BITS)
	{
		//Must be aligned, no need to do dobytealign();
		savebufferbyte();
	}

	//Use circular buffer of BUFFER_SIZE_BYTES
	int nBytePos = (m_nStartingbyte + (m_nLastbitinbuffer / 8)) % BUFFER_SIZE_BYTES;
	//The first bit to add is on the left
	int nBitPosInByte = 7 - m_nLastbitinbuffer % 8;

	//Get the byte value from buffer
	int nValTmp = m_buffer[nBytePos];

	//Change the bit
	if (nVal > 0)
		nValTmp = (nValTmp | (1 << nBitPosInByte));
	else
		nValTmp = (nValTmp & ~(1 << nBitPosInByte));

	//Save the new byte value to the buffer
	m_buffer[nBytePos] = (unsigned char) nValTmp;

	m_nLastbitinbuffer++;
}

void CJOCh264bitstream::addbytetostream (int nVal)
{
	if (m_nLastbitinbuffer >= BUFFER_SIZE_BITS)
	{
		//Must be aligned, no need to do dobytealign();
		savebufferbyte();
	}

	//Used circular buffer of BUFFER_SIZE_BYTES
	int nBytePos = (m_nStartingbyte + (m_nLastbitinbuffer / 8)) % BUFFER_SIZE_BYTES;
	//The first bit to add is on the left
	int nBitPosInByte = 7 - m_nLastbitinbuffer % 8;

	//Check if it is byte aligned
	if (nBitPosInByte != 7)
		throw "Error: inserting a byte into a non byte-aligned stream";

	//Add all byte to buffer
	m_buffer[nBytePos] = (unsigned char) nVal;

	m_nLastbitinbuffer = m_nLastbitinbuffer + 8;
}

void CJOCh264bitstream::dobytealign()
{
	//Check if the last bit in buffer is byte aligned
	int nr = m_nLastbitinbuffer % 8;
	if (nr != 0)
		m_nLastbitinbuffer = m_nLastbitinbuffer + (8 - nr);
}

void CJOCh264bitstream::savebufferbyte(bool bemulationprevention)
{
	bool bemulationpreventionexecuted = false;

	if (m_pOutFile == NULL)
		throw "Error: out file is NULL";

	//Check if the last bit in buffer is multiple of 8
	if ((m_nLastbitinbuffer % 8)  != 0)
		throw "Error: Save to file must be byte aligned";

	if ((m_nLastbitinbuffer / 8) <= 0)
		throw "Error: NO bytes to save";

	if (bemulationprevention == true)
	{
		//Emulation prevention will be used:
		/*As per h.264 spec,
		rbsp_data shouldn't contain
				- 0x 00 00 00
				- 0x 00 00 01
				- 0x 00 00 02
				- 0x 00 00 03

		rbsp_data shall be in the following way
				- 0x 00 00 03 00
				- 0x 00 00 03 01
				- 0x 00 00 03 02
				- 0x 00 00 03 03
		*/

		//Check if emulation prevention is needed (emulation prevention is byte align defined)
		if ((m_buffer[((m_nStartingbyte + 0) % BUFFER_SIZE_BYTES)] == 0x00)&&(m_buffer[((m_nStartingbyte + 1) % BUFFER_SIZE_BYTES)] == 0x00)&&((m_buffer[((m_nStartingbyte + 2) % BUFFER_SIZE_BYTES)] == 0x00)||(m_buffer[((m_nStartingbyte + 2) % BUFFER_SIZE_BYTES)] == 0x01)||(m_buffer[((m_nStartingbyte + 2) % BUFFER_SIZE_BYTES)] == 0x02)||(m_buffer[((m_nStartingbyte + 2) % BUFFER_SIZE_BYTES)] == 0x03)))
		{
			int nbuffersaved = 0;
			unsigned char cEmulationPreventionByte = H264_EMULATION_PREVENTION_BYTE;

			//Save 1st byte
			fwrite(&m_buffer[((m_nStartingbyte + nbuffersaved) % BUFFER_SIZE_BYTES)], 1, 1, m_pOutFile);
			nbuffersaved ++;

			//Save 2nd byte
			fwrite(&m_buffer[((m_nStartingbyte + nbuffersaved) % BUFFER_SIZE_BYTES)], 1, 1, m_pOutFile);
			nbuffersaved ++;

			//Save emulation prevention byte
			fwrite(&cEmulationPreventionByte, 1, 1, m_pOutFile);

			//Save the rest of bytes (usually 1)
			while (nbuffersaved < BUFFER_SIZE_BYTES)
			{
				fwrite(&m_buffer[((m_nStartingbyte + nbuffersaved) % BUFFER_SIZE_BYTES)], 1, 1, m_pOutFile);
				nbuffersaved++;
			}

			//All bytes in buffer are saved, so clear the buffer
			clearbuffer();

			bemulationpreventionexecuted = true;
		}
	}

	if (bemulationpreventionexecuted == false)
	{
		//No emulation prevention was used

		//Save the oldest byte in buffer
		fwrite(&m_buffer[m_nStartingbyte], 1, 1, m_pOutFile);

		//Move the index
		m_buffer[m_nStartingbyte] = 0;
		m_nStartingbyte++;
		m_nStartingbyte = m_nStartingbyte % BUFFER_SIZE_BYTES;
		m_nLastbitinbuffer = m_nLastbitinbuffer - 8;
	}
}

//Public functions

void CJOCh264bitstream::addbits (unsigned long lval, int nNumbits)
{
	if ((nNumbits <= 0 )||(nNumbits > 64))
		throw "Error: numbits must be between 1 ... 64";

	int nBit = 0;
	int n = nNumbits-1;
	while (n >= 0)
	{
		nBit = getbitnum (lval,n);
		n--;

		addbittostream (nBit);
	}
}

void CJOCh264bitstream::addbyte (unsigned char cByte)
{
	//Byte alignment optimization
	if ((m_nLastbitinbuffer % 8)  == 0)
	{
		addbytetostream (cByte);
	}
	else
	{
		addbits (cByte, 8);
	}
}

void CJOCh264bitstream::addexpgolombunsigned (unsigned long lval)
{
	//it implements unsigned exp golomb coding

	unsigned long lvalint = lval + 1;
	int nnumbits = log2 (lvalint) + 1;

	for (int n = 0; n < (nnumbits-1); n++)
		addbits(0,1);

	addbits(lvalint,nnumbits);
}

void CJOCh264bitstream::addexpgolombsigned (long lval)
{
	//it implements a signed exp golomb coding

	unsigned long lvalint = abs(lval) * 2 - 1;
	if (lval <= 0)
		lvalint = 2 * abs(lval);

	addexpgolombunsigned(lvalint);
}

void CJOCh264bitstream::add4bytesnoemulationprevention (unsigned int nVal, bool bDoAlign)
{
	//Used to add NAL header stream
	//Remember: NAL header is byte oriented
	if (bDoAlign == true)
		dobytealign();

	if ((m_nLastbitinbuffer % 8) != 0)
		throw "Error: Save to file must be byte aligned";

	while (m_nLastbitinbuffer != 0)
		savebufferbyte();

	unsigned char cbyte = (nVal & 0xFF000000)>>24;
	fwrite(&cbyte, 1, 1, m_pOutFile);

	cbyte = (nVal & 0x00FF0000)>>16;
	fwrite(&cbyte, 1, 1, m_pOutFile);

	cbyte = (nVal & 0x0000FF00)>>8;
	fwrite(&cbyte, 1, 1, m_pOutFile);

	cbyte = (nVal & 0x000000FF);
	fwrite(&cbyte, 1, 1, m_pOutFile);
}

void CJOCh264bitstream::close()
{
	//Flush the data in stream buffer

	dobytealign();

	while (m_nLastbitinbuffer != 0)
		savebufferbyte();
}

CJOCh264encoder (.h and .cpp)


/*
 * CJOCh264encoder.h
 *
 *  Created on: Aug 17, 2014
 *      Author: Jordi Cenzano (www.jordicenzano.name)
 */

#ifndef CJOCH264ENCODER_H_
#define CJOCH264ENCODER_H_

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "CJOCh264bitstream.h"

//! h264 encoder class
/*!
 It is used to create the h264 compliant stream
 */
class CJOCh264encoder : CJOCh264bitstream
{
public:

	/**
	 * Allowed sample formats
	 */
	enum enSampleFormat
	{
		SAMPLE_FORMAT_YUV420p//!< SAMPLE_FORMAT_YUV420p
	};

private:
	/*!Set the used Y macroblock size for I PCM in YUV420p */
	#define MACROBLOCK_Y_WIDTH	16
	#define MACROBLOCK_Y_HEIGHT	16

	/*!Set time base in Hz */
	#define TIME_SCALE_IN_HZ 	27000000

	/*!Pointer to pixels */
	typedef struct
	{
		unsigned char *pYCbCr;
	}YUV420p_frame_t;

	/*! Frame  */
	typedef struct
	{
		enSampleFormat sampleformat; 	/*!< Sample format */
		unsigned int nYwidth; 			/*!< Y (luminance) block width in pixels */
		unsigned int nYheight;			/*!< Y (luminance) block height in pixels */
		unsigned int nCwidth;			/*!< C (Chrominance) block width in pixels */
		unsigned int nCheight;			/*!< C (Chrominance) block height in pixels */

		unsigned int nYmbwidth;			/*!< Y (luminance) macroblock width in pixels */
		unsigned int nYmbheight;		/*!< Y (luminance) macroblock height in pixels */
		unsigned int nCmbwidth;			/*!< C (Chrominance) macroblock width in pixels */
		unsigned int nCmbheight;		/*!< C (Chrominance) macroblock height in pixels */

		YUV420p_frame_t yuv420pframe;	/*!< Pointer to current frame data */
		unsigned int nyuv420pframesize;	/*!< Size in bytes of yuv420pframe */
	}frame_t;

	/*! The frame var*/
	frame_t m_frame;

	/*! The frames per second var*/
	int m_nFps;

	/*! Number of frames sent to the output */
	unsigned long m_lNumFramesAdded;

	//! Frees the memory allocated for the frame yuv420pframe
	void free_video_src_frame ();

	//! Allocates the frame yuv420pframe memory according to the frame properties
	void alloc_video_src_frame ();

	//! Creates the SPS NAL and adds it to the output
	/*!
		\param nImW Frame width in pixels
		\param nImH Frame height in pixels
		\param nMbW macroblock width in pixels
		\param nMbH macroblock height in pixels
		\param nFps frames per second (typical values are: 25, 30, 50, etc)
		\param nSARw Indicates the horizontal size of the sample aspect ratio (typical values are: 1, 4, 16, etc)
		\param nSARh Indicates the vertical size of the sample aspect ratio (typical values are: 1, 3, 9, etc)
	 */
	void create_sps (int nImW, int nImH, int nMbW, int nMbH, int nFps, int nSARw, int nSARh);

	//! Creates the PPS NAL and adds it to the output
	void create_pps ();

	//! Creates the slice header NAL and adds it to the output
	/*!
		\param lFrameNum number of frame
	 */
	void create_slice_header(unsigned long lFrameNum);

	//! Creates the macroblock header and adds it to the output
	void create_macroblock_header ();

	//! Creates the slice footer and adds it to the output
	void create_slice_footer();

	//! Creates a macroblock (I_PCM) and adds it to the output
	/*!
		\param nYpos First vertical macroblock pixel inside the frame
		\param nXpos First horizontal macroblock pixel inside the frame
	 */
	void create_macroblock(unsigned int nYpos, unsigned int nXpos);

public:
	//! Constructor
	/*!
		 \param pOutFile The output file pointer
	 */
	CJOCh264encoder(FILE *pOutFile);

	//! Destructor
	virtual ~CJOCh264encoder();

	//! Initializes the coder
	/*!
		\param nImW Frame width in pixels
		\param nImH Frame height in pixels
		\param nFps Desired frames per second of the output file (typical values are: 25, 30, 50, etc)
		\param SampleFormat Sample format of the input file. In this implementation only SAMPLE_FORMAT_YUV420p is allowed
		\param nSARw Indicates the horizontal size of the sample aspect ratio (typical values are:1, 4, 16, etc)
		\param nSARh Indicates the vertical size of the sample aspect ratio (typical values are:1, 3, 9, etc)
	*/
	void IniCoder (int nImW, int nImH, int nImFps, CJOCh264encoder::enSampleFormat SampleFormat, int nSARw = 1, int nSARh = 1);

	//! Returns the frame pointer
	/*!
		\return Frame pointer ready to fill with frame pixel data (the format to fill the data is indicated by the SampleFormat parameter when the coder is initialized)
	*/
	void* GetFramePtr();

	//! Returns the allocated frame memory in bytes
	/*!
		\return The allocated memory to store the frame data
	*/
	unsigned int GetFrameSize();

	//! It codes the frame that is in frame memory and saves the coded data to disk
	void CodeAndSaveFrame();

	//! Returns number of coded frames
	/*!
		\return The number of coded frames
	*/
	unsigned long GetSavedFrames();

	//! Flush all data and save the trailing bits
	void CloseCoder ();
};

#endif /* CJOCH264ENCODER_H_ */

/*
 * CJOCh264encoder.cpp
 *
 *  Created on: Aug 17, 2014
 *      Author: Jordi Cenzano (www.jordicenzano.name)
 */

#include "CJOCh264encoder.h"

//Private functions

//Constructor
CJOCh264encoder::CJOCh264encoder(FILE *pOutFile):CJOCh264bitstream(pOutFile)
{
	m_lNumFramesAdded = 0;

	memset (&m_frame, 0, sizeof (frame_t));
	m_nFps = 25;
}

//Destructor
CJOCh264encoder::~CJOCh264encoder()
{
	free_video_src_frame ();
}

//Free the allocated video frame mem
void CJOCh264encoder::free_video_src_frame ()
{
	if (m_frame.yuv420pframe.pYCbCr != NULL)
		free (m_frame.yuv420pframe.pYCbCr);

	memset (&m_frame, 0, sizeof (frame_t));
}

//Alloc mem to store a video frame
void CJOCh264encoder::alloc_video_src_frame ()
{
	if (m_frame.yuv420pframe.pYCbCr != NULL)
		throw "Error: frame already allocated";

	int nYsize = m_frame.nYwidth * m_frame.nYheight;
	int nCsize = m_frame.nCwidth * m_frame.nCheight;
	m_frame.nyuv420pframesize = nYsize + nCsize + nCsize;

	m_frame.yuv420pframe.pYCbCr = (unsigned char*) malloc (sizeof (unsigned char) * m_frame.nyuv420pframesize);

	if (m_frame.yuv420pframe.pYCbCr == NULL)
		throw "Error: memory alloc";
}

//Creates and saves the NAL SPS (including VUI) (one per file)
void CJOCh264encoder::create_sps (int nImW, int nImH, int nMbW, int nMbH, int nFps, int nSARw, int nSARh)
{
	add4bytesnoemulationprevention (0x000001); // NAL header
	addbits (0x0,1); // forbidden_bit
	addbits (0x3,2); // nal_ref_idc
	addbits (0x7,5); // nal_unit_type : 7 ( SPS )
	addbits (0x42,8); // profile_idc = baseline ( 0x42 )
	addbits (0x0,1); // constraint_set0_flag
	addbits (0x0,1); // constraint_set1_flag
	addbits (0x0,1); // constraint_set2_flag
	addbits (0x0,1); // constraint_set3_flag
	addbits (0x0,1); // constraint_set4_flag
	addbits (0x0,1); // constraint_set5_flag
	addbits (0x0,2); // reserved_zero_2bits /* equal to 0 */
	addbits (0x0a,8); // level_idc: 1.0 (0x0a)
	addexpgolombunsigned(0); // seq_parameter_set_id
	addexpgolombunsigned(0); // log2_max_frame_num_minus4
	addexpgolombunsigned(0); // pic_order_cnt_type
	addexpgolombunsigned(0); // log2_max_pic_order_cnt_lsb_minus4
	addexpgolombunsigned(0); // max_num_ref_frames
	addbits(0x0,1); // gaps_in_frame_num_value_allowed_flag

	int nWinMbs = nImW / nMbW;
	addexpgolombunsigned(nWinMbs-1); // pic_width_in_mbs_minus_1
	int nHinMbs = nImH / nMbH;
	addexpgolombunsigned(nHinMbs-1); // pic_height_in_map_units_minus_1

	addbits(0x1,1); // frame_mbs_only_flag
	addbits(0x0,1); // direct_8x8_inference_flag
	addbits(0x0,1); // frame_cropping_flag
	addbits(0x1,1); // vui_parameter_present

	//VUI parameters (AR, timing)
	addbits(0x1,1); //aspect_ratio_info_present_flag
	addbits(0xFF,8); //aspect_ratio_idc = Extended_SAR

	//AR
	addbits(nSARw, 16); //sar_width
	addbits(nSARh, 16); //sar_height

	addbits(0x0,1); //overscan_info_present_flag
	addbits(0x0,1); //video_signal_type_present_flag
	addbits(0x0,1); //chroma_loc_info_present_flag
	addbits(0x1,1); //timing_info_present_flag

	unsigned int nnum_units_in_tick = TIME_SCALE_IN_HZ / (2*nFps);
	addbits(nnum_units_in_tick,32); //num_units_in_tick
	addbits(TIME_SCALE_IN_HZ,32); //time_scale
	addbits(0x1,1);  //fixed_frame_rate_flag

	addbits(0x0,1);  //nal_hrd_parameters_present_flag
	addbits(0x0,1);  //vcl_hrd_parameters_present_flag
	addbits(0x0,1);  //pic_struct_present_flag
	addbits(0x0,1);  //bitstream_restriction_flag
	//END VUI

	addbits(0x0,1); // frame_mbs_only_flag
	addbits(0x1,1); // rbsp stop bit

	dobytealign();
}

//Creates and saves the NAL PPS (one per file)
void CJOCh264encoder::create_pps ()
{
	add4bytesnoemulationprevention (0x000001); // NAL header
	addbits (0x0,1); // forbidden_bit
	addbits (0x3,2); // nal_ref_idc
	addbits (0x8,5); // nal_unit_type : 8 ( PPS )
	addexpgolombunsigned(0); // pic_parameter_set_id
	addexpgolombunsigned(0); // seq_parameter_set_id
	addbits (0x0,1); // entropy_coding_mode_flag
	addbits (0x0,1); // bottom_field_pic_order_in_frame_present_flag
	addexpgolombunsigned(0); // num_slice_groups_minus1
	addexpgolombunsigned(0); // num_ref_idx_l0_default_active_minus1
	addexpgolombunsigned(0); // num_ref_idx_l1_default_active_minus1
	addbits (0x0,1); // weighted_pred_flag
	addbits (0x0,2); // weighted_bipred_idc
	addexpgolombsigned(0); // pic_init_qp_minus26
	addexpgolombsigned(0); // pic_init_qs_minus26
	addexpgolombsigned(0); // chroma_qp_index_offset
	addbits (0x0,1); // deblocking_filter_control_present_flag
	addbits (0x0,1); // constrained_intra_pred_flag
	addbits (0x0,1); // redundant_pic_cnt_present_flag
	addbits(0x1,1); // rbsp stop bit

	dobytealign();
}

//Creates and saves the NAL SLICE (one per frame)
void CJOCh264encoder::create_slice_header(unsigned long lFrameNum)
{
	add4bytesnoemulationprevention (0x000001); // NAL header
	addbits (0x0,1); // forbidden_bit
	addbits (0x3,2); // nal_ref_idc
	addbits (0x5,5); // nal_unit_type : 5 ( Coded slice of an IDR picture  )
	addexpgolombunsigned(0); // first_mb_in_slice
	addexpgolombunsigned(7); // slice_type
	addexpgolombunsigned(0); // pic_param_set_id

	unsigned char cFrameNum = lFrameNum % 16; //(2⁴)
	addbits (cFrameNum,4); // frame_num ( numbits = v = log2_max_frame_num_minus4 + 4)

	unsigned long lidr_pic_id = lFrameNum % 512;
	//idr_pic_flag = 1
	addexpgolombunsigned(lidr_pic_id); // idr_pic_id
	addbits(0x0,4); // pic_order_cnt_lsb (numbits = v = log2_max_pic_order_cnt_lsb_minus4 + 4)
	addbits(0x0,1); //no_output_of_prior_pics_flag
	addbits(0x0,1); //long_term_reference_flag
	addexpgolombsigned(0); //slice_qp_delta

	//Probably NOT byte aligned!!!
}

//Creates and saves the SLICE footer (one per SLICE)
void CJOCh264encoder::create_slice_footer()
{
	addbits(0x1,1); // rbsp stop bit
}

//Creates and saves the macroblock header(one per macroblock)
void CJOCh264encoder::create_macroblock_header ()
{
	addexpgolombunsigned(25); // mb_type (I_PCM)
}

//Creates & saves a macroblock (coded INTRA 16x16)
void CJOCh264encoder::create_macroblock(unsigned int nYpos, unsigned int nXpos)
{
	unsigned int x,y;

	create_macroblock_header();

	dobytealign();

	//Y
	unsigned int nYsize = m_frame.nYwidth * m_frame.nYheight;
	for(y = nYpos * m_frame.nYmbheight; y < (nYpos+1) * m_frame.nYmbheight; y++)
	{
		for (x = nXpos * m_frame.nYmbwidth; x < (nXpos+1) * m_frame.nYmbwidth; x++)
		{
			addbyte (m_frame.yuv420pframe.pYCbCr[(y * m_frame.nYwidth +  x)]);
		}
	}

	//Cb
	unsigned int nCsize = m_frame.nCwidth * m_frame.nCheight;
	for(y = nYpos * m_frame.nCmbheight; y < (nYpos+1) * m_frame.nCmbheight; y++)
	{
		for (x = nXpos * m_frame.nCmbwidth; x < (nXpos+1) * m_frame.nCmbwidth; x++)
		{
			addbyte(m_frame.yuv420pframe.pYCbCr[nYsize + (y * m_frame.nCwidth +  x)]);
		}
	}

	//Cr
	for(y = nYpos * m_frame.nCmbheight; y < (nYpos+1) * m_frame.nCmbheight; y++)
	{
		for (x = nXpos * m_frame.nCmbwidth; x < (nXpos+1) * m_frame.nCmbwidth; x++)
		{
			addbyte(m_frame.yuv420pframe.pYCbCr[nYsize + nCsize + (y * m_frame.nCwidth +  x)]);
		}
	}
}

//public functions

//Initializes the h264 coder (mini-coder)
void CJOCh264encoder::IniCoder (int nImW, int nImH, int nImFps, CJOCh264encoder::enSampleFormat SampleFormat, int nSARw, int nSARh)
{
	m_lNumFramesAdded = 0;
	if (SampleFormat != SAMPLE_FORMAT_YUV420p)
		throw "Error: SAMPLE FORMAT not allowed. Only yuv420p is allowed in this version";

	free_video_src_frame ();

	//Ini vars
	m_frame.sampleformat = SampleFormat;
	m_frame.nYwidth = nImW;
	m_frame.nYheight = nImH;
	if (SampleFormat == SAMPLE_FORMAT_YUV420p)
	{
		//Set macroblock Y size
		m_frame.nYmbwidth = MACROBLOCK_Y_WIDTH;
		m_frame.nYmbheight = MACROBLOCK_Y_HEIGHT;
		//Set macroblock C size (in YUV420 is 1/2 of Y)
		m_frame.nCmbwidth = MACROBLOCK_Y_WIDTH/2;
		m_frame.nCmbheight = MACROBLOCK_Y_HEIGHT/2;
		//Set C size
		m_frame.nCwidth = m_frame.nYwidth / 2;
		m_frame.nCheight = m_frame.nYheight / 2;
		//In this implementation only picture sizes multiples of macroblock size (16x16) are allowed
		if (((nImW % MACROBLOCK_Y_WIDTH) != 0)||((nImH % MACROBLOCK_Y_HEIGHT) != 0))
			throw "Error: size not allowed. Only multiples of macroblock are allowed (macroblock size is: 16x16)";
	}

	m_nFps = nImFps;

	//Alloc mem for 1 frame
	alloc_video_src_frame ();

	//Create h264 SPS & PPS
	create_sps (m_frame.nYwidth , m_frame.nYheight, m_frame.nYmbwidth, m_frame.nYmbheight, nImFps , nSARw, nSARh);
	create_pps ();
}

//Returns the frame pointer to load the video frame 
void* CJOCh264encoder::GetFramePtr() 
{ 	
	if (m_frame.yuv420pframe.pYCbCr == NULL) 		
		throw "Error: video frame is null (not initialized)"; 	

	return (void*) m_frame.yuv420pframe.pYCbCr; 
} 

//Returns the allocated size for the video frame 
unsigned int CJOCh264encoder::GetFrameSize() 
{ 	
	return m_frame.nyuv420pframesize; 
} 

//Codifies & saves the video frame (it only uses 16x16 intra PCM -> NO COMPRESSION!)
void CJOCh264encoder::CodeAndSaveFrame()
{
	//The slice header is not byte aligned, so the first macroblock header is not byte aligned
	create_slice_header (m_lNumFramesAdded);

	//Loop over macroblock size
	unsigned int y,x;
	for (y = 0; y < m_frame.nYheight / m_frame.nYmbheight; y++)
	{
		for (x = 0; x < m_frame.nYwidth / m_frame.nYmbwidth; x++)
		{
			create_macroblock(y, x);
		}
	}

	create_slice_footer();
	dobytealign();

	m_lNumFramesAdded++;
}

//Returns the number of codified frames
unsigned long CJOCh264encoder::GetSavedFrames()
{
	return m_lNumFramesAdded;
}

//Closes the h264 coder saving the last bits in the buffer
void CJOCh264encoder::CloseCoder ()
{
	close();
}
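The buffer returned by GetFramePtr must hold one full YUV420p frame, and the value returned by GetFrameSize should follow the classic w*h*3/2 relation. A quick sanity-check sketch (the 640x480 frame size is a hypothetical example; note it is a multiple of 16 in both dimensions, as IniCoder requires):

```shell
# YUV420p frame size: full-resolution Y plane plus two quarter-resolution chroma planes
w=640; h=480                          # hypothetical frame size (multiple of 16)
ysize=$(( w * h ))                    # luma (Y) plane
csize=$(( (w / 2) * (h / 2) ))        # one chroma plane (Cb or Cr)
framesize=$(( ysize + 2 * csize ))
echo "$framesize"                     # 460800 bytes = w * h * 3 / 2
```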

 

Future work

  • Evolve the code implementing h264 intra frame compression techniques, such as intra prediction, CAVLC, etc.
  • Implement h264 inter frame compression techniques, such as block matching

The audio video alignment nightmare in audio compressed formats

Introduction

While doing some research on automatic editing systems I found strange timing behaviors in compressed audio streams, so I wanted to take some measurements. Here they are:

Audio tests – Audacity

  • Using Audacity 2.0.5 (a free audio editor) I generated a test timeline, see figure 1
Audacity audio test time line

Figure 1: Audacity audio test time line

  • After that I have exported the test timeline to 3 different formats:
    • PCM
    • MP3 (using integrated coder)
    • AAC (using integrated ffmpeg)
  • I have imported the previously exported files into Audacity (see figures 2 and 3); you can see the results in the following table:
    Audacity – Test num | Export coder | Delay vs. original [ms] | Obs
    1 | PCM | 0 |
    2 | MP3 | +51 |
    3 | AAC | +24 |

 

Zoom of the original and exported->imported tracks in Audacity

Figure 2: Zoom of the original and exported->imported tracks in Audacity

Original and exported->imported tracks in Audacity

Figure 3: Original and exported->imported tracks in Audacity

Audio tests – Adobe audition CC

  • I have repeated the same test that I did with Audacity, using Adobe Audition CC (July 2014 version)
  • The results are the following (see figures 3 and 4):
    Audition – Test num | Export coder | Delay vs. original [ms] | Obs
    1 | PCM | 0 |
    2 | MP3 | 0 |
    3 | AAC | -21 | It has lost the first 21ms

     

Original and exported->imported tracks in Audition

Figure 3: Original and exported->imported tracks in Audition

Zoom of original and exported->imported tracks in Audition

Figure 4: Zoom of original and exported->imported tracks in Audition

 

Audio-video test – Adobe Premiere

Audio-Video alignment test timeline

Figure 5: Audio-Video alignment test timeline

  • After that I have exported the test timeline to 3 different formats:
    • MXF OP1a (Video: AVCI50 720p50, Audio: PCM 48KHz 16b)
    • MP4-AAC (Video: H264 main 4.1 VBR 10Mbps 720p50, Audio: AAC 320Kbps 48KHz)
    • MP4-MPEG (Video: H264 main 4.1 VBR 10Mbps 720p50, Audio: MPEG1 L2 320Kbps 48KHz)
  • Finally I have imported the previously exported media files into Premiere, and you can see the results in the following table:
Premiere – Test num | Export coder | Audio delay vs. video (or original) [ms] | Obs
1 | MXF OP1a | 0 | Non-compressed audio format
2 | MP4-AAC | +10 |
3 | MP4-MPEG | +10 |

 

Adobe Premiere CC timelines comparison (original and exported -> imported media: MP4-AAC and MP4-MPEG)

Figure 6: Adobe Premiere CC timelines comparison (original and exported -> imported media: MP4-AAC and MP4-MPEG)

 

Audio-video test – Final cut 7

  • I have exported the test timeline to:
    • MOV (Video: XDCAM EX 720p50 35Mbps VBR, Audio: PCM 48KHz 16b)
    • MOV (Video: MPEG-4 6.4Mbps 720p50 ,Audio: AAC-LC 320Kbps 48KHz)
  • The results are:
FinalCut7 – Test num | Export coder | Audio delay vs. video (or original) [ms] | Obs
1 | MOV-XDCAM, PCM | 0 | Non-compressed audio format
2 | MOV-MPEG4, AAC-LC | 0 |

 

Final Cut 7 timelines comparison (original and exported -> Imported media: MOV XDCAM-PCM, and MOV MPEG4-AAC-LC)

Figure 7: Final Cut 7 timelines comparison (original and exported -> Imported media: MOV XDCAM-PCM, and MOV MPEG4-AAC-LC)

Future work

  • Extend the experiment to other audio / video editors
  • In order to simplify the experiment I have measured only within the same software (i.e. export from Premiere -> import into Premiere). A good next step would be to create an import/export matrix between applications.
  • Play the exported files with a broadcast player and measure the audio video delay in SDI
  • Establish a reliable software method to measure the audio video delay in media files

Preliminary conclusions

  • When we use NON-compressed formats everything works as expected, but when we use a compressed audio format (AAC, MP3) it seems that in some cases audio and video editors do NOT properly account for the audio encoding delay.
  • If you work only with audio you can perhaps tolerate a small delay, but if you work with audio and video, tens of ms of audio advance or delay will be noticeable to the audience, according to EBU R27
  • Recommendation: in the video processing chain, work with UNcompressed audio formats whenever you can; you will increase quality and avoid a lot of headaches. Use compressed audio formats only at the last step, just for distribution

Trim HLS stream with frame accuracy using ffmpeg and Bash script

Introduction

The main idea of this post is to practise with ffmpeg and write down my experiences, the issues that I have found, possible solutions, etc. To achieve that I developed a group of bash scripts that are able to trim an HLS stream using ffmpeg.

The purpose of these scripts is to perform VERY FAST frame-accurate trimming of HLS streaming files (.ts: h264, AAC); only the first and the last segment of the whole stream are re-encoded. For this to work the TS files must be h264 baseline and must start with an I frame (see the section: Known problems and future work).

To use these scripts you will need to install ffmpeg (v2.2+) and libxml2 (v20901+) (also known as xmllint). I have developed and tested them in a virtual Ubuntu (14.04 LTS).

You can download the scripts from this link: 20140807_HLStrim_JCF_v1

Procedure

  •  Download media files from HLS URL
    • You can use this ruby gem or you can do it manually (downloading the manifest and media chunks)
  • At the end of downloading process you will have all media files (.ts) in your local disc, see figure 1
HLS Downloaded files

Figure 1: Downloaded files

  • Use the bash script: hlsframeaccuratetrim dest_file Tin Tout src_file1 src_file2 … src_fileN
    • dest_file: Destination file (.mp4 format)
    • Tin: In trim point in seconds. It must be inside the first segment: if the first segment duration is 10s, Tin must be < 10s (see figure 2)
    • Tout: Out trim point in seconds. It is relative to the start of the last segment and must be less than the last segment duration: if the last segment duration is 10s, Tout must be < 10s (see figure 2).
Tin & Tout trim point references

Figure 2: Tin & Tout trim point references

  • Example:

hlsframeaccuratetrim media_join_3_3.mp4 3.0 3.0 media_w1514283453_173.ts media_w1514283453_174.ts media_w1514283453_175.ts

Source TS and MP4 destination with their durations

Figure 3: Source TS and MP4 destination with their durations

As we can see in figure 3, the final clip duration is 21.66s ≈ (13.37-3) + 8.34 + 3 (the small difference with respect to the exact sum comes from snapping the cut points to frame boundaries)

Bash script – hlsframeaccuratetrim

It implements the whole trimming & joining procedure, performing the following actions:

  1. Correct the trim in point: it subtracts 5ms because this time is later used to seek the next available video packet in the stream. This gives us frame accuracy; if we didn't subtract these 5ms the cut would always land 1 frame late.
  2. Call tsaccuratetrim_rv_pa for the first and the last segments (see source code 2)
    1. This script trims & re-encodes the video and parses the audio, generating 2 separate TSs (one for video and one for audio)
    2. It returns the audio delay, that is, the time difference between the first video frame and the first audio frame (see AVDelay in figure 4)
  3. Call the rewrapts script for all other segments
    1. This script creates two new transport streams (TS) from the original TS, one for audio and one for video, and sets the start PTS of both streams to 0
  4. Call the catts script for video
    1. It joins all video segments in one TS
  5. Call the catts script for audio
    1. It joins all audio segments in one TS
  6. Finally, call the rewraptomp4 script
    1. It re-wraps the video and audio TSs into an MP4 file, keeping the audio delay
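The 5ms correction of step 1 can be illustrated with a small sketch (hypothetical numbers, assuming a 25 fps stream whose frame PTS grid is 0.000, 0.040, 0.080, …): if Tin falls exactly on a frame PTS, searching for the "next PTS after Tin" could skip to the following frame, so the script backs Tin up slightly first.

```shell
# Hypothetical Tin that coincides exactly with a frame PTS at 25 fps (40 ms/frame)
tin=3.000
# Back it up by 5 ms, as hlsframeaccuratetrim does, so the "next PTS" search
# lands on the intended frame (3.000) instead of the following one (3.040)
tin_corr=$(awk -v t="$tin" 'BEGIN { printf "%.3f", t - 0.005 }')
echo "$tin_corr"   # 2.995
```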

Here you can find the commented source code of hlsframeaccuratetrim:


#!/bin/bash
#Use this script to trim with frame accuracy a TS (h264-AAC)
#It re-encodes the video of first and last segment, and it parses audio (keeping the AV delay)
#
#Version: 1.0
#
#Usage:
# hlsframeaccuratetrim dest_file Tin Tout src_file1, src_file2, .... ,src_fileN
#
#Example:
# hlsframeaccuratetrim dest.mp4 3.1 4.2 src1.ts, src2.ts, src3.ts,src4.ts

#Get input params
#----------------------------
outMP4=$1
inpoint=$2
outdur=$3
for ((i=4; i<=$#; i++))
do
	echo "${!i}"
	tssegments[i-4]=${!i}
done

#Apply a correction of -5ms in order to adjust the input trim point (later will find the next PTS)
#----------------------------
inpoint=$(printf "%f\n" $(bc -q <<< scale=0\;${inpoint}-0.005))

#Show input params
#----------------------------
echo "out file: ${outMP4}"
echo "Trim points: in=${inpoint}, out=${outdur}"
echo "Input files: "
for var in ${tssegments[@]}; do echo $var; done
 
#Set intermediate files 
#----------------------------
outTSvideo=${outMP4}_video.ts
outTSaudio=${outMP4}_audio.ts
outTS=${outMP4}.ts
let i=0;
for var in ${tssegments[@]}; do tssegmentsoutvideo[i]=${var}_out_video.ts; let i++; done
let i=0;
for var in ${tssegments[@]}; do tssegmentsoutaudio[i]=${var}_out_audio.ts; let i++; done

#Clean before
#----------------------------
rm -f ${tssegmentsoutvideo[@]}
rm -f ${tssegmentsoutaudio[@]}
rm -f $outTS
rm -f $outMP4

#Trim FIRST segment (re-encode video & parse audio)
#----------------------------
#The tsaccuratetrim_rv_pa does some tasks in order to keep the AV alignment (at PTS level)
audiodelay=$(./tsaccuratetrim_rv_pa ${tssegments[0]} ${tssegmentsoutvideo[0]} ${tssegmentsoutaudio[0]} IN $inpoint)

#Rewrap middle TSs (in order to clean all streams except the main video & audio)
#----------------------------
#The rewrapping set the AV PTS to 0
for ((i=1; i<=${#tssegments[*]}-2; i++))
do
	./rewrapts ${tssegments[i]} ${tssegmentsoutvideo[i]} ${tssegmentsoutaudio[i]}
done

#Trim LAST segment (re-encode video & parse audio)
#----------------------------
#The tsaccuratetrim_rv_pa does some tasks in order to keep the AV alignment (at PTS level)
./tsaccuratetrim_rv_pa ${tssegments[${#tssegments[*]}-1]} ${tssegmentsoutvideo[${#tssegments[*]}-1]} ${tssegmentsoutaudio[${#tssegments[*]}-1]} DUR $outdur

#Join all segments
#----------------------------
./catts $outTSvideo ${tssegmentsoutvideo[@]}
./catts $outTSaudio ${tssegmentsoutaudio[@]}

#Create final MP4 (wrap)
#----------------------------
./rewraptomp4 $outTSvideo $outTSaudio $audiodelay $outMP4

Bash script – tsaccuratetrim_rv_pa

It trims a TS file with frame accuracy and generates one TS for video and another TS for audio. The strategy that tsaccuratetrim_rv_pa implements is:

  1. Get the information from the TS file using ffprobe (dumps all file info to xml)
  2. Use xmllint to parse the information from the xml file
  3. Use the getnextpts script to get the PTS of the first video packet after the trim point (see figure 4 for clarification)
  4. Use the getnextpts script to get the PTS of the first audio packet after the PTS returned in step 3 (see figure 4 for clarification)
    1. This is done in order to start the file with video
    2. This step also computes the difference between the first video PTS and the first audio PTS after the trim point (avdelay); this parameter will be used when we create the final MP4 file to keep the AV delay
  5. Extract the video and audio tracks (h264 and AAC)
  6. Trim the video track, re-encoding it with frame accuracy (using the encoding settings extracted from the original TS, and the PTS computed in step 3)
  7. Trim the audio track by parsing the audio packets, using the PTS computed in step 4
  8. Create two TSs, one for the trimmed video and the other for the parsed audio
  9. Return the avdelay value. We will use it later when we create the final MP4 file
Audio and video cut points selection

Figure 4: Audio and video cut points selection

Here you can find the commented source code of tsaccuratetrim_rv_pa:

#!/bin/bash
#Accurate trimming of .ts (H264-AAC) based on ffmpeg
#Strategy: re-encode video & parse audio keeping the AV delay
#
#This script requires ffmpeg (ffmpeg, ffprobe) and libxml2-utils (xmllint)
#
#Version: 1.0
#
#Usage:
# tsaccuratetrim_rv_pa source destTSvideo destTSaudio IN/DUR timeIN
#
#Examples:
# tsaccuratetrim_rv_pa source.ts destvideo.ts destaudio.ts IN 1.32
#
# tsaccuratetrim_rv_pa source.ts destvideo.ts destaudio.ts DUR 2.33

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"
ffprobe="ffprobe"
xmllint="xmllint"
getinfoparams="-print_format xml -show_streams"

#Set in vars
#----------------------------
sourceTS=$1
destTSvideo=$2
destTSaudio=$3
trimtype=$4
timetrim=$5

#Set intermediate files 
#----------------------------
sourceinfoxml=${sourceTS}.xml
rawvideo=${sourceTS}.h264
rawtrimmedaudio=${sourceTS}.aac
rawtrimedvideo=${sourceTS}_trimmed.h264

#Clean
#----------------------------
rm -f $sourceinfoxml  
rm -f $rawvideo
rm -f $rawtrimmedaudio
rm -f $rawtrimedvideo
rm -f $destTSvideo
rm -f $destTSaudio
rm -f $destMP4

#Create video information file
#----------------------------
${ffprobe} $sourceTS ${getinfoparams} > ${sourceinfoxml}
if [ "$?" != "0" ]; then
	echo "Error getting TS information!" 1>&2
	exit 1
fi

#Load information data
#----------------------------
numvideotracks=$(${xmllint} --xpath "count(/ffprobe/streams/stream[@codec_type='video'])" ${sourceinfoxml})
videocoder=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@codec_name)" ${sourceinfoxml})
videocodingprofilesrc=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@profile)" ${sourceinfoxml})
videocodingleveloriginal=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@level)" ${sourceinfoxml})
videobitrate=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@bit_rate)" ${sourceinfoxml})
videoduration=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@duration)" ${sourceinfoxml})
videoindex=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@index)" ${sourceinfoxml})

#Correct the level string from XX to X.X (Ex: 31 to 3.1)
#----------------------------
if [ ${#videocodingleveloriginal} == "2" ]; then
	videocodinglevel=${videocodingleveloriginal:0:1}.${videocodingleveloriginal:1:1}
else
	videocodinglevel=$videocodingleveloriginal
fi

#Adapt the profile name
videocodingprofile=${videocodingprofilesrc,,}
if [[ $videocodingprofile == *"baseline"* ]]; then
	videocodingprofile="baseline"
fi

#Check allowed profiles
if [ ${videocodingprofile} != "baseline" ] && [ ${videocodingprofile} != "main" ] && [ ${videocodingprofile} != "high" ]; then
	echo "Error: the profile ${videocodingprofile} is not allowed!" 1>&2
	exit 1
fi

#Approximate the video bitrate using the file length
#----------------------------
if [ ${#videobitrate} == "0" ]; then
	sourcefilesize=$(stat -c %s ${sourceTS})
	videobitrate=$(bc <<< "scale=0;(${sourcefilesize}*8)/${videoduration}")
fi

#To check
#----------------------------
echo "Num video tracks: $numvideotracks" 1>&2
echo "Video codec is: $videocoder" 1>&2
echo "Video bitrate is: $videobitrate" 1>&2
echo "Video duration is: $videoduration" 1>&2
echo "Profile is: $videocodingprofile" 1>&2
echo "Level is: $videocodingleveloriginal" 1>&2
echo "Level length: ${#videocodingleveloriginal}" 1>&2
echo "Level modified is: $videocodinglevel" 1>&2
echo "Video index is: $videoindex" 1>&2

#Validate input TS
#----------------------------
if [ $numvideotracks != "1" ]; then
	echo "Error: the input TS must contain exactly 1 video track!" 1>&2
	exit 1
fi
if [ $videocoder != "h264" ]; then
	echo "Error: the video codec must be h264!" 1>&2
	exit 1
fi

#Set video encoding modifiers
#----------------------------
#This must be copied from input file video stream (OK from xml, bitrate approx from filesize / duration)
videoencodingparams="-vcodec libx264 -profile:v ${videocodingprofile} -level ${videocodinglevel} -b:v ${videobitrate}" 

#To check
#echo "videoencodingparams is: $videoencodingparams"

#Get precise trim points (PTS based)
#----------------------------
#Get video trim PTS
timetrimvideo=$(./getnextpts $sourceTS $timetrim video)
#Get audio trim PTS (it will be after the video trim point)
timetrimaudio=$(./getnextpts $sourceTS $timetrimvideo audio)

#To check
echo "Trim point video: $timetrimvideo" 1>&2
echo "Trim point audio: $timetrimaudio" 1>&2

#Use FFMPEG to extract video track in raw format (h264)
#----------------------------
${ffmpeg} -i $sourceTS -vcodec copy -f mpeg2video $rawvideo

#Use FFMPEG to extract audio track (and parse it)
#----------------------------
if [ $trimtype == "IN" ]; then
	${ffmpeg} -i $sourceTS -ss ${timetrimaudio} -acodec copy -f mp2 ${rawtrimmedaudio}
else
	${ffmpeg} -i $sourceTS -t ${timetrimaudio} -acodec copy -f mp2 ${rawtrimmedaudio}
fi

#Use FFMPEG to trim and re-encode the video track (re-encoding uses similar parameters as original file)
#----------------------------
if [ $trimtype == "IN" ]; then
	${ffmpeg} -i $rawvideo -ss ${timetrimvideo} ${videoencodingparams} ${rawtrimedvideo}
else
	${ffmpeg} -i $rawvideo -t ${timetrimvideo} ${videoencodingparams} ${rawtrimedvideo}
fi

#Create TSs for the video & audio streams
#----------------------------
#Comment: perhaps the PIDs change!!
#Comment: compute the avdelay parameter in order to preserve the AV delay
avdelay=$(printf "%f\n" $(bc -q <<< scale=0\;${timetrimaudio}-${timetrimvideo}))

#To check
#echo "Audio delay applied: $avdelay" 1>&2

#Comment: It does not work with h264 main profile. See https://trac.ffmpeg.org/ticket/1598
${ffmpeg} -i ${rawtrimedvideo} -vcodec copy -mpegts_copyts 1 -f mpegts -copyts ${destTSvideo}
${ffmpeg} -i ${rawtrimmedaudio} -acodec copy -mpegts_copyts 1 -f mpegts -copyts ${destTSaudio}

#Create final MP4 for test purposes (remux)
#----------------------------
#if [ $trimtype == "IN" ]; then
#	./rewraptomp4 ${destTSvideo} ${destTSaudio} ${avdelay} ${destMP4}
#else
#	./rewraptomp4 ${destTSvideo} ${destTSaudio} 0 ${destMP4}
#fi

#Return audio delay
#----------------------------
echo "$avdelay"	

Bash script – getnextpts

It finds the PTS of the first packet of a given stream type (video or audio) after a given stream time. It performs the following actions:

  1. Using ffprobe, dump all packet information into an xml file
  2. It uses xmllint and xpath to process that file and get the desired information
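The time arithmetic in getnextpts can be sketched as follows (the PTS values are hypothetical, and the ffprobe/xmllint lookup is replaced by a hard-coded packet PTS): the requested time is first shifted to the file timeline by adding the first PTS of the file, the next packet PTS is looked up, and the result is normalized back to a zero-based timeline.

```shell
# Hypothetical values: the TS starts at PTS 1.400 s; we want the first video
# packet at or after stream time 3.0 s.
startpts=1.400
timeIN=3.0
# Shift the requested time to the file timeline
timeINnorm=$(awk -v s="$startpts" -v t="$timeIN" 'BEGIN { printf "%.3f", s + t }')  # 4.400
# Pretend ffprobe/xmllint returned the first packet PTS >= 4.400:
nextpts=4.420
# Normalize back to a zero-based timeline
nextptsn=$(awk -v p="$nextpts" -v s="$startpts" 'BEGIN { printf "%.3f", p - s }')
echo "$nextptsn"   # 3.020
```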

Here is the source code:

#!/bin/bash
#Get the next PTS to timeIN in specified type stream
#
#This script requires ffmpeg (ffmpeg, ffprobe) and libxml2-utils (xmllint)
#
#Version: 1.0
#
#Usage:
# getnextpts source timeIN video/audio
#
#Examples:
# getnextpts in.ts 1.2 video
#
# getnextpts in.ts 3.0 audio

#Set global vars 
#----------------------------
ffprobe="ffprobe"
xmllint="xmllint"
getinfoparams="-print_format xml -show_packets"

#Set in vars
#----------------------------
source=$1
timeIN=$2
streamtype=$3

#Set intermediate files 
#----------------------------
sourceinfoxml=${source}.packets.xml

#Clean
#----------------------------
rm -f $sourceinfoxml
#Create video information file
#----------------------------
${ffprobe} $source ${getinfoparams} > ${sourceinfoxml}
if [ "$?" != "0" ]; then
	echo "Error getting TS information!" 1>&2
	exit 1
fi

#Get first pts in TS file (any stream type!)
#----------------------------
#TODO: Use function min to ensure that the returned PTS is the minimum
startpts=$(${xmllint} --xpath "string(//ffprobe/packets/packet[@pts_time>=0]/@pts_time)" ${sourceinfoxml})

#Normalize the trim time to the file timeline
#----------------------------
timeINnorm=$(bc <<< "${timeIN}+${startpts}")

#Get the next PTS (>= timeINnorm) of the specified stream type
#----------------------------
nextpts=$(${xmllint} --xpath "string(//ffprobe/packets/packet[@codec_type='${streamtype}'][@pts_time>=${timeINnorm}]/@pts_time)" ${sourceinfoxml})

#Normalize to 0
#----------------------------
nextptsn=$(bc <<< "${nextpts}-${startpts}")

#To check
#echo "Start PTS: $startpts" 1>&2
#echo "Timein in file: $timeINnorm" 1>&2
#echo "Next PTS in file: $nextpts" 1>&2

#Return param
#----------------------------
echo "$nextptsn"	

Bash script – rewraptomp4

This is a simple script whose mission is to create an MP4 file from input video and audio TSs. It also delays the audio stream (with respect to the video) according to the audiodelay parameter.

  1. It uses ffmpeg to mux the input audio and video TSs into an MP4 file without re-encoding; it is a simple re-wrapping.

Here is the source code:


#!/bin/bash
#Re-wrap the source stream to mp4 
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# rewraptomp4 sourcevideo sourceaudio audiodelay dest
#
#Example:
# rewraptomp4 invideo.ts inaudio.ts 0.03 out.mp4

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
sourceTSvideo=$1
sourceTSaudio=$2
avdelay=$3
destMP4=$4

#Clean
#----------------------------
rm -f $destMP4  

#Create final MP4
#----------------------------
#This param is used to correct AAC coding problems
muxmp4encodingparams="-absf aac_adtstoasc" 

#echo "Delay: ${avdelay}"

${ffmpeg} -i ${sourceTSvideo} -itsoffset ${avdelay} -i ${sourceTSaudio} -vcodec copy -acodec copy ${muxmp4encodingparams} ${destMP4}

Bash script – rewrapts

This is a simple script that extracts the video and audio streams from the source and wraps them into separate TSs.

  1. It uses ffmpeg to extract the video and audio streams from the input file and wrap each of them into a separate TS
    1. This is useful to clean the input TS of any streams other than video and audio
    2. It is important to point out that the start PTS is set to 0 in every destination TS

Here is the source code:

#!/bin/bash
#Extract video and audio streams and wrap them into separate TSs
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# rewrapts source destvideo destaudio
#
#Example:
# rewrapts in.ts outvideo.ts outaudio.ts


#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
sourceTS=$1
destTSvideo=$2
destTSaudio=$3

#Clean
#----------------------------
rm -f $destTSvideo
rm -f $destTSaudio

#Create final TSs
#----------------------------
${ffmpeg} -i ${sourceTS} -map 0:v -vcodec copy -mpegts_copyts 1 ${destTSvideo}
${ffmpeg} -i ${sourceTS} -map 0:a -acodec copy -mpegts_copyts 1 ${destTSaudio}

Bash script – catts

This script is used to concatenate different TS files

  1. It creates a txt file with the list of TS to concatenate
  2. It uses ffmpeg to concatenate the input TSs and generates an output TS that is the concatenation of all of them
    1. It’s very important to know that ffmpeg concat uses the last PTS of input file N to set the first PTS of file N+1. This means that if you concatenate audio and video streams and stream N finishes with an audio packet, the first video frame PTS of file N+1 will be incorrect: you will get one video frame that lasts longer than the others. In the end you will see an audio-video misalignment, because players derive an incorrect frame rate: fps_avg = number of frames / total duration[s] (and the total duration will be incorrect)
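The impact of a single over-long frame on the derived frame rate can be checked with some quick arithmetic (hypothetical numbers: a 10 s, 50 fps segment in which one frame duration is wrongly doubled to 40 ms):

```shell
frames=500                                   # 10 s of 50 fps video
total_ms=$(( 499 * 20 + 40 ))                # one frame wrongly lasts 40 ms -> 10020 ms
# fps_avg = number of frames / total duration [s]
fps_avg=$(awk -v f="$frames" -v d="$total_ms" 'BEGIN { printf "%.3f", f / (d / 1000) }')
echo "$fps_avg"   # 49.900 instead of 50.000 -> players slowly drift out of AV sync
```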

Here is the source code:

#!/bin/bash
#Concatenate TS streams
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# catts streamout streamIn1 streamIn2 ...streamInN
#
#Example:
# catts out.ts in1.ts in2.ts in3.ts in4.ts

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
destTS=$1

#Set intermediate files 
#----------------------------
catlist=${destTS}_list.txt

#Clean
#----------------------------
rm -f $catlist
rm -f $destTS

#Create list of files to merge
#----------------------------
noout="0"

echo "#List of files to merge" > $catlist
for var in "$@"
do
	if [ $noout == "0" ]; then
		noout="1"
	else	
		echo "file '${var}'" >> $catlist
	fi
done

#Cat mpeg TS streams
#----------------------------
${ffmpeg} -f concat -i $catlist -codec copy -mpegts_copyts 1 -f mpegts ${destTS}

Accuracy test

  • Create an AV sync file (MP4: h264 baseline 3.1, AAC, 720×576@25p; I have used Adobe Premiere CC 2014), see figure 5
  • Use the Apple mediafilesegmenter tool to generate the .ts files. Use the option -start-segments-with-iframe
  • Use the script hlsframeaccuratetrim to trim & join the .ts files using several input times
  • For every resulting file, check the cut point precision by viewing the first frame (frame numbers are burned into the video); you can see the results in table 1.
  • For every file, check in Adobe Premiere’s timeline whether the first and last audio clicks in the resulting files have the same AV alignment as the original MP4 file (relative, not absolute, AV alignment), see figure 6; you can see the results in table 2.
    • Definitions:
      • AV total delay = max (first click delay, last click delay)
      • AV Drift = max (first click delay, last click delay)-min (first click delay, last click delay)
Test num | Tin | 1st frame expected | 1st frame real | Difference
1 | 3.0 | 75 | 75 | 0
2 | 2.0 | 50 | 50 | 0
3 | 2.12 | 53 | 53 | 0
4 | 3.2 | 80 | 80 | 0
5 | 4.0 | 100 | 100 | 0

Table 1: Trimming accuracy table

Accuracy test results are perfect!!!!
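The expected first frame numbers in table 1 follow directly from the 25 fps frame rate of the test file; a quick sketch of the relation:

```shell
fps=25
# Expected first frame number for a trim point Tin (in seconds) at a given fps
expected_frame() { awk -v t="$1" -v f="$fps" 'BEGIN { printf "%d", int(t * f) }'; }
echo "Tin=3.0  -> frame $(expected_frame 3.0)"    # 75
echo "Tin=2.12 -> frame $(expected_frame 2.12)"   # 53
```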

AV sync test timeline

Figure 5: AV sync test timeline

Checking AV alignment

Figure 6: Checking AV alignment

Test num | Tin | 1st click delay [ms] | Last click delay [ms] | AV total delay [ms] | AV drift [ms]
1 | 3.0 | 0 (*) | 0 (*) | 0 | 0
2 | 2.0 | 0 (*) | 0 (*) | 0 | 0
3 | 2.12 | -10 | -10 | -10 | 0
4 | 3.2 | -15 | 15 | -15 | 0
5 | 4.0 | 0 (*) | 0 (*) | 0 | 0

Table 2: Audio video delay test

(*) Impossible to measure with this method (less than 5ms)

The results of the audio-video delay test are pretty good: all under 15ms, with no drift.

Test in wowza environment

  • I also tested these scripts in a more realistic environment: a Wowza server (3.6.3 build8031) installed in an EC2 instance (see figure 7):
Test environment

Figure 7: Test environment

In this test environment the scripts continued to work as expected.

Known problems & future work

  • The script tsaccuratetrim_rv_pa fails if the source media is encoded in h264 MAIN profile (it works if it is h264 baseline) [see: ffmpeg posted bug]
    • Implement a workaround to avoid this bug (for instance, using an MP4 wrapper before the join phase?)
  • If you use a 50fps source file (20ms/frame), the first packet of the re-encoded files (the first and last segment) indicates a frame duration of 40ms, while all other frames last 20ms. In the end this problem provokes an audio-video misalignment, because players will derive an incorrect frame rate: fps_avg = number of frames / total duration[s] (the total duration is incorrect because 2 frames in the whole file last twice as long as they should)
    • Notify this issue to ffmpeg.org & try to work around the problem
  • Avoid using xmllint by porting the scripts to Ruby
  • Implement download HLS media inside the script
    • Implement HLS live stream downloading
  • Use absolute trim points related to whole HLS stream (not chunk related trim points)
  • Work to reduce the small audio-video delay that appears in some cases
  • Use longer files to increase the precision in audio video alignment tests

Other methods tested

Use ffmpeg to convert the video track into pngs:

ffmpeg -i c:\tmpDownload\TS\Wowza\media_14.ts -f image2 c:\tmpDownload\TS\fileSequence0\im%05d.png

Result: OK

Use FFmpeg to trim TS (copying audio stream):

ffmpeg -i input.ts -ss 1 -acodec copy -vcodec libx264 -profile:v main -level 3.1 out.ts

Result: ERRORS (repeated 1st frame)

JOCHLSDownloader (Ruby gem)

The JOCHLSDownloader ruby gem is a very simple native ruby code that downloads all files linked by an m3u8 manifest (used in HLS). I have used it several times for test purposes.

  You can download the JOCHLSDownloader (Ruby version) from: Gem Version

Note:

  • For LIVE and EVENT playlist types it downloads only the first media segments.
  • For VOD playlist type it will download all media content.

 

Usage examples

require 'JOCHLSDownloader.rb'

#URL from HLS apple example
url = "https://devimages.apple.com.edgekey.net/streaming/examples/bipbop_4x3/bipbop_4x3_variant.m3u8"
downloadpath = "./donloadtestfiles"
logfilename = "jochlsdownloader.log"

begin
  hlsdownloader = CJOCHLSDownloader.new(url, downloadpath, logfilename, Logger::DEBUG)
  hlsdownloader.startdownload
  puts "End!"
rescue Exception => e
  puts "Error: #{e.message}, Trace: #{e.backtrace.inspect}"
end

 

Multiview depth video coding using 3D Wavelets

This work was done for the 3D video coding subject of the Merit Master (UPC). It evaluates the use of a 3D wavelet scheme to code depth video images.

Download / view PDF

Paper PDF

RTP Protocol

It describes the RTP protocol and proposes C code to implement and test it.

Download / View paper

Paper PDF