Trim HLS stream with frame accuracy using ffmpeg and Bash script

Introduction

The main idea of this post is to practise with ffmpeg and write down my experiences: the issues I found, possible solutions, and so on. To do that I developed a set of bash scripts that trim an HLS stream using ffmpeg.

The purpose of these scripts is to perform a VERY FAST, frame-accurate trim of HLS media files (.ts: h264, aac). Only the first and the last segment of the whole stream are re-encoded. For this to work the TS files must be h264 baseline and each one must start with an I frame (see the section: Known problems & future work).

To use these scripts you will need to install ffmpeg (v2.2+) and libxml2 (v20901+), which provides xmllint (the script headers refer to it as libxml2-utils). I developed and tested them on a virtual Ubuntu machine (14.04 LTS).

You can download the scripts from this link: 20140807_HLStrim_JCF_v1

Procedure

  • Download the media files from the HLS URL
    • You can use this ruby gem or you can do it manually (downloading the manifest and the media chunks)
  • When the download finishes you will have all the media files (.ts) on your local disk, see figure 1
Figure 1: Downloaded files
  • Run the bash script: hlsframeaccuratetrim dest_file Tin Tout src_file1 src_file2 … src_fileN
    • dest_file: Destination file (.mp4 format)
    • Tin: In trim point, in seconds. It must fall inside the first segment: if the first segment lasts 10s, Tin must be < 10s (see figure 2)
    • Tout: Out trim point, in seconds. It is measured from the start of the last segment and must be less than the last segment duration: if the last segment lasts 10s, Tout must be < 10s (see figure 2).
Figure 2: Tin & Tout trim point references
  • Example:

hlsframeaccuratetrim media_join_3_3.mp4 3.0 3.0 media_w1514283453_173.ts media_w1514283453_174.ts media_w1514283453_175.ts

Figure 3: Source TS and MP4 destination with their durations

As figure 3 shows, the final clip lasts 21.66s, which is approximately (13.37 - 3) + 8.34 + 3: the first segment minus Tin, plus the middle segment, plus Tout.
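
The arithmetic behind that duration can be sketched as a small helper. This is only an illustration: the name expected_duration is hypothetical and not part of the downloadable scripts, and the last segment duration used below (10.0) is a made-up value, since only Tout is taken from the last segment. The helper returns the nominal duration, which comes out slightly above the 21.66s reported in figure 3.

```shell
#!/bin/bash
# Nominal duration of the trimmed clip (hypothetical helper, for illustration only).
# Usage: expected_duration Tin Tout dur_first [dur_middle ...] dur_last
# Assumes at least two segments, as the trimming scripts do.
expected_duration() {
	local tin=$1 tout=$2
	shift 2
	# The remaining arguments are the segment durations in seconds, one per line for awk.
	printf '%s\n' "$@" | awk -v tin="$tin" -v tout="$tout" '
		{ dur[NR] = $1 }
		END {
			total = dur[1] - tin                       # first segment: from Tin to its end
			for (i = 2; i < NR; i++) total += dur[i]   # middle segments are kept whole
			total += tout                              # last segment: from its start to Tout
			printf "%.2f\n", total
		}'
}

expected_duration 3.0 3.0 13.37 8.34 10.0   # prints 21.71
```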

Bash script – hlsframeaccuratetrim

It implements the whole trimming & joining procedure, performing the following actions:

  1. Corrects the in trim point by subtracting 5ms, because this time is later used to seek the next available video packet in the stream. This is what gives frame accuracy: if we don’t subtract these 5ms, the cut always lands 1 frame late.
  2. Calls tsaccuratetrim_rv_pa for the first and the last segments (see the tsaccuratetrim_rv_pa section)
    1. This script trims & re-encodes the video and parses the audio, generating 2 separate TSs (one for video and the other for audio)
    2. It returns the audio delay, that is, the time difference between the first video frame and the first audio frame; see AVDelay in figure 4
  3. Calls the rewrapts script for all other segments
    1. This script creates two new transport streams (TS) from the original TS, one for audio and the other for video, and it sets the start PTS of both streams to 0
  4. Calls the catts script for video
    1. It joins all video segments into one TS
  5. Calls the catts script for audio
    1. It joins all audio segments into one TS
  6. Finally it calls the rewraptomp4 script
    1. It re-wraps the video and audio TSs into an MP4 file, keeping the audio delay

Here you can find the commented source code of hlsframeaccuratetrim:


#!/bin/bash
#Use this script to trim with frame accuracy a TS (h264-AAC)
#It re-encodes the video of first and last segment, and it parses audio (keeping the AV delay)
#
#Version: 1.0
#
#Usage:
# hlsframeaccuratetrim dest_file Tin Tout src_file1 src_file2 ... src_fileN
#
#Example:
# hlsframeaccuratetrim dest.mp4 3.1 4.2 src1.ts src2.ts src3.ts src4.ts

#Get input params
#----------------------------
outMP4=$1
inpoint=$2
outdur=$3
for ((i=4; i<=$#; i++))
do
	echo "${!i}"
	tssegments[i-4]=${!i}
done

#Apply a correction of -5ms in order to adjust the input trim point (later will find the next PTS)
#----------------------------
inpoint=$(printf "%f\n" $(bc -q <<< scale=0\;${inpoint}-0.005))

#Show input params
#----------------------------
echo "out file: ${outMP4}"
echo "Trim points: in=${inpoint}, out=${outdur}"
echo "Input files: "
for var in ${tssegments[@]}; do echo $var; done
 
#Set intermediate files 
#----------------------------
outTSvideo=${outMP4}_video.ts
outTSaudio=${outMP4}_audio.ts
outTS=${outMP4}.ts
let i=0;
for var in ${tssegments[@]}; do tssegmentsoutvideo[i]=${var}_out_video.ts; let i++; done
let i=0;
for var in ${tssegments[@]}; do tssegmentsoutaudio[i]=${var}_out_audio.ts; let i++; done

#Clean before
#----------------------------
rm -f ${tssegmentsoutvideo[@]}
rm -f ${tssegmentsoutaudio[@]}
rm -f $outTS
rm -f $outMP4

#Trim FIRST segment (re-encode video & parse audio)
#----------------------------
#tsaccuratetrim_rv_pa does some tasks in order to keep the AV alignment (at PTS level)
audiodelay=$(./tsaccuratetrim_rv_pa ${tssegments[0]} ${tssegmentsoutvideo[0]} ${tssegmentsoutaudio[0]} IN $inpoint)

#Rewrap middle TSs (in order to clean all streams except the main video & audio)
#----------------------------
#The rewrapping set the AV PTS to 0
for ((i=1; i<=${#tssegments[*]}-2; i++))
do
	./rewrapts ${tssegments[i]} ${tssegmentsoutvideo[i]} ${tssegmentsoutaudio[i]}
done

#Trim LAST segment (re-encode video & parse audio)
#----------------------------
#tsaccuratetrim_rv_pa does some tasks in order to keep the AV alignment (at PTS level)
#The three arrays have the same length, so one index selects the last element of each
last=$((${#tssegments[@]}-1))
./tsaccuratetrim_rv_pa ${tssegments[last]} ${tssegmentsoutvideo[last]} ${tssegmentsoutaudio[last]} DUR $outdur

#Join all segments
#----------------------------
./catts $outTSvideo ${tssegmentsoutvideo[@]}
./catts $outTSaudio ${tssegmentsoutaudio[@]}

#Create final MP4 (wrap)
#----------------------------
./rewraptomp4 $outTSvideo $outTSaudio $audiodelay $outMP4
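
The per-segment dispatch performed by the script above can be sketched as a dry run that only prints what would be called. The function name plan_segments is hypothetical, the printed commands omit the intermediate file arguments, and the sketch assumes at least two input segments.

```shell
#!/bin/bash
# Dry run of the per-segment dispatch (hypothetical helper: prints instead of calling ffmpeg).
# Usage: plan_segments Tin Tout seg1.ts seg2.ts ... segN.ts
plan_segments() {
	local inpoint=$1 outdur=$2
	shift 2
	local segs=("$@")
	local last=$(( ${#segs[@]} - 1 ))
	local i
	for (( i = 0; i <= last; i++ )); do
		if (( i == 0 )); then
			# First segment: re-encode from the in point onwards
			echo "tsaccuratetrim_rv_pa ${segs[i]} IN $inpoint"
		elif (( i == last )); then
			# Last segment: re-encode up to the out duration
			echo "tsaccuratetrim_rv_pa ${segs[i]} DUR $outdur"
		else
			# Middle segments: re-wrap only, no re-encoding
			echo "rewrapts ${segs[i]}"
		fi
	done
}

plan_segments 2.995 3.0 a.ts b.ts c.ts
```

Only the first and last lines of the plan involve a re-encode, which is where the speed of the approach comes from.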

Bash script – tsaccuratetrim_rv_pa

It trims a TS file with frame accuracy and generates one TS for video and another TS for audio. The strategy that tsaccuratetrim_rv_pa implements is:

  1. Get the information from the TS file using ffprobe (dumps the file info to XML)
  2. Use xmllint to parse the information from the XML file
  3. Use the getnextpts script to get the PTS of the first video packet after the trim point (see figure 4 for clarification)
  4. Use the getnextpts script to get the PTS of the first audio packet after the PTS returned in step 3 (see figure 4 for clarification)
    1. This is done so that the file starts with video
    2. This step also yields the difference between the first video PTS and the first audio PTS after the trim point (avdelay). This parameter is used when the final MP4 file is created, to keep the AV delay
  5. Extract the video and audio tracks (h264 and AAC)
  6. Trim the video track, re-encoding it with frame accuracy (using the encoding settings extracted from the original TS and the PTS computed in step 3)
  7. Trim the audio track by parsing the audio packets, using the PTS computed in step 4
  8. Create two TSs, one for the trimmed video and the other for the parsed audio
  9. Return the avdelay value, which is used later when the final MP4 file is created
Figure 4: Audio and video cut points selection
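
Steps 3 and 4 and the resulting avdelay can be sketched with plain lists of PTS values. This is a simplified illustration, not the script itself: next_pts is a hypothetical helper, the PTS lists are made-up numbers, and awk stands in for the ffprobe/xmllint lookup that getnextpts performs.

```shell
#!/bin/bash
# Print the first PTS in the list that is at or after the given time.
# Usage: next_pts time pts1 pts2 ...   (hypothetical helper; PTS in seconds, ascending)
next_pts() {
	local t=$1
	shift
	printf '%s\n' "$@" | awk -v t="$t" '!found && $1 + 0 >= t + 0 { print $1; found = 1 }'
}

video_pts=(0.00 0.04 0.08 0.12)             # made-up 25fps video packets
audio_pts=(0.000 0.021 0.043 0.064 0.085)   # made-up audio packets

v=$(next_pts 0.05 "${video_pts[@]}")        # step 3: first video packet after the trim point
a=$(next_pts "$v" "${audio_pts[@]}")        # step 4: first audio packet after that video packet
avdelay=$(awk -v a="$a" -v v="$v" 'BEGIN { printf "%.3f", a - v }')
echo "video=$v audio=$a avdelay=$avdelay"   # prints video=0.08 audio=0.085 avdelay=0.005
```

Because the audio search starts from the video PTS, avdelay is never negative, which matches the goal of starting the file with video.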

Here you can find the commented source code of tsaccuratetrim_rv_pa:

#!/bin/bash
#Accurate trimming of .ts (H264-AAC) based on ffmpeg 
#Strategy: re-encode video & parse audio keeping the AV delay
#
#This script requires ffmpeg (ffmpeg, ffprobe) and libxml2-utils (xmllint)
#
#Version: 1.0
#
#Usage:
# tsaccuratetrim_rv_pa source destTSvideo destTSaudio IN/DUR timeIN
#
#Examples:
# tsaccuratetrim_rv_pa source.ts destvideo.ts destaudio.ts IN 1.32
#
# tsaccuratetrim_rv_pa source.ts destvideo.ts destaudio.ts DUR 2.33

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"
ffprobe="ffprobe"
xmllint="xmllint"
getinfoparams="-print_format xml -show_streams"

#Set in vars
#----------------------------
sourceTS=$1
destTSvideo=$2
destTSaudio=$3
trimtype=$4
timetrim=$5

#Set intermediate files 
#----------------------------
sourceinfoxml=${sourceTS}.xml
rawvideo=${sourceTS}.h264
rawtrimmedaudio=${sourceTS}.aac
rawtrimedvideo=${sourceTS}_trimmed.h264

#Clean
#----------------------------
rm -f $sourceinfoxml  
rm -f $rawvideo
rm -f $rawtrimmedaudio
rm -f $rawtrimedvideo
rm -f $destTSvideo
rm -f $destTSaudio
rm -f $destMP4

#Create video information file
#----------------------------
${ffprobe} $sourceTS ${getinfoparams} > ${sourceinfoxml}
if [ "$?" != "0" ]; then
	echo "Error getting TS information!" 1>&2
	exit 1
fi

#Load information data
#----------------------------
numvideotracks=$(${xmllint} --xpath "count(/ffprobe/streams/stream[@codec_type='video'])" ${sourceinfoxml})
videocoder=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@codec_name)" ${sourceinfoxml})
videocodingprofilesrc=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@profile)" ${sourceinfoxml})
videocodingleveloriginal=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@level)" ${sourceinfoxml})
videobitrate=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@bit_rate)" ${sourceinfoxml})
videoduration=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@duration)" ${sourceinfoxml})
videoindex=$(${xmllint} --xpath "string(//ffprobe/streams/stream[@codec_type='video']/@index)" ${sourceinfoxml})

#Correct the level string from XX to X.X (Ex: 31 to 3.1)
#----------------------------
if [ ${#videocodingleveloriginal} == "2" ]; then
	videocodinglevel=${videocodingleveloriginal:0:1}.${videocodingleveloriginal:1:1}
else
	videocodinglevel=$videocodingleveloriginal
fi

#Adapt the profile name
videocodingprofile=${videocodingprofilesrc,,}
if [[ $videocodingprofile == *"baseline"* ]]; then
	videocodingprofile="baseline"
fi

#Check allowed profiles (quoted, so a profile name containing spaces does not break the test)
if [ "${videocodingprofile}" != "baseline" ] && [ "${videocodingprofile}" != "main" ] && [ "${videocodingprofile}" != "high" ]; then
	echo "Error: the profile ${videocodingprofile} is not allowed!" 1>&2
	exit 1
fi

#Approximate the video bitrate using the file length
#----------------------------
if [ ${#videobitrate} == "0" ]; then
	sourcefilesize=$(stat -c %s ${sourceTS})
	videobitrate=$(bc <<< "scale=0;(${sourcefilesize}*8)/${videoduration}")
fi

#To check
#----------------------------
echo "Num video tracks: $numvideotracks" 1>&2
echo "Video codec is: $videocoder" 1>&2
echo "Video bitrate is: $videobitrate" 1>&2
echo "Video duration is: $videoduration" 1>&2
echo "Profile is: $videocodingprofile" 1>&2
echo "Level is: $videocodingleveloriginal" 1>&2
echo "Level length: ${#videocodingleveloriginal}" 1>&2
echo "Level modified is: $videocodinglevel" 1>&2
echo "Video index is: $videoindex" 1>&2

#Validate input TS
#----------------------------
if [ $numvideotracks != "1" ]; then
	echo "Error number of video tracks. Must be 1 video track in input TS!" 1>&2
	exit 1
fi
if [ $videocoder != "h264" ]; then
	echo "Error video codec. Must be h264" 1>&2
	exit 1
fi

#Set video encoding modifiers
#----------------------------
#This must be copied from input file video stream (OK from xml, bitrate approx from filesize / duration)
videoencodingparams="-vcodec libx264 -profile:v ${videocodingprofile} -level ${videocodinglevel} -b:v ${videobitrate}" 

#To check
#echo "videoencodingparams is: $videoencodingparams"

#Get precise trim points (PTS based)
#----------------------------
#Get video trim PTS
timetrimvideo=$(./getnextpts $sourceTS $timetrim video)
#Get audio trim PTS (it will be after the video trim point)
timetrimaudio=$(./getnextpts $sourceTS $timetrimvideo audio)

#To check
echo "Trim point video: $timetrimvideo" 1>&2
echo "Trim point audio: $timetrimaudio" 1>&2

#Use FFMPEG to extract video track in raw format (h264)
#----------------------------
${ffmpeg} -i $sourceTS -vcodec copy -f mpeg2video $rawvideo

#Use FFMPEG to extract audio track (and parse it)
#----------------------------
if [ $trimtype == "IN" ]; then
	${ffmpeg} -i $sourceTS -ss ${timetrimaudio} -acodec copy -f mp2 ${rawtrimmedaudio}
else
	${ffmpeg} -i $sourceTS -t ${timetrimaudio} -acodec copy -f mp2 ${rawtrimmedaudio}
fi

#Use FFMPEG to trim and re-encode the video track (re-encoding uses similar parameters as original file)
#----------------------------
if [ $trimtype == "IN" ]; then
	${ffmpeg} -i $rawvideo -ss ${timetrimvideo} ${videoencodingparams} ${rawtrimedvideo}
else
	${ffmpeg} -i $rawvideo -t ${timetrimvideo} ${videoencodingparams} ${rawtrimedvideo}
fi

#Create TSs for the video & audio streams
#----------------------------
#Comment: perhaps the PIDs change!!
#Comment: Compute the avdelay parameter in order to preserve the AV delay
avdelay=$(printf "%f\n" $(bc -q <<< scale=0\;${timetrimaudio}-${timetrimvideo}))

#To check
#echo "Audio delay applied: $avdelay" 1>&2

#Comment: It does not work with h264 main profile. See https://trac.ffmpeg.org/ticket/1598
${ffmpeg} -i ${rawtrimedvideo} -vcodec copy -mpegts_copyts 1 -f mpegts -copyts ${destTSvideo}
${ffmpeg} -i ${rawtrimmedaudio} -acodec copy -mpegts_copyts 1 -f mpegts -copyts ${destTSaudio}

#Create final MP4 for test purposes (remux)
#----------------------------
#if [ $trimtype == "IN" ]; then
#	./rewraptomp4 ${destTSvideo} ${destTSaudio} ${avdelay} ${destMP4}
#else
#	./rewraptomp4 ${destTSvideo} ${destTSaudio} 0 ${destMP4}
#fi

#Return audio delay
#----------------------------
echo "$avdelay"	
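
Three of the small computations in the script above (level string, profile name, fallback bitrate) can be isolated into functions that are easy to try on their own. The function names are hypothetical, and awk is swapped in for bc in the bitrate fallback; the logic otherwise follows the script.

```shell
#!/bin/bash
# Hypothetical helpers that isolate three small computations from the script above.

# ffprobe reports the level as two digits (31); the script passes it to -level as 3.1.
normalize_level() {
	local level=$1
	if [ ${#level} -eq 2 ]; then
		echo "${level:0:1}.${level:1:1}"
	else
		echo "$level"
	fi
}

# Lower-case the profile and collapse any baseline variant to "baseline" (needs bash 4+).
normalize_profile() {
	local profile=${1,,}
	if [[ $profile == *baseline* ]]; then
		profile="baseline"
	fi
	echo "$profile"
}

# Fallback bitrate in bits per second: file size in bytes * 8 / duration in seconds.
approx_bitrate() {
	awk -v size="$1" -v dur="$2" 'BEGIN { printf "%d\n", size * 8 / dur }'
}

normalize_level 31                         # prints 3.1
normalize_profile "Constrained Baseline"   # prints baseline
approx_bitrate 1250000 10                  # prints 1000000
```

The bitrate fallback uses the size of the whole TS, audio and container overhead included, so it overestimates the video bitrate somewhat.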

Bash script – getnextpts

It finds the PTS of a specific type of stream (video or audio) after a given stream time. It performs the following actions:

  1. Uses ffprobe to dump all packet information into an XML file
  2. Uses xmllint and xpath to process that file and get the desired information

Here is the source code:

#!/bin/bash
#Get the next PTS to timeIN in specified type stream
#
#This script requires ffmpeg (ffmpeg, ffprobe) and libxml2-utils (xmllint)
#
#Version: 1.0
#
#Usage:
# getnextpts source timeIN video/audio
#
#Examples:
# getnextpts in.ts 1.2 video
#
# getnextpts in.ts 3.0 audio

#Set global vars 
#----------------------------
ffprobe="ffprobe"
xmllint="xmllint"
getinfoparams="-print_format xml -show_packets"

#Set in vars
#----------------------------
source=$1
timeIN=$2
streamtype=$3

#Set intermediate files 
#----------------------------
sourceinfoxml=${source}.packets.xml

#Clean
#----------------------------
rm -f $sourceinfoxml

#Create video information file
#----------------------------
${ffprobe} $source ${getinfoparams} > ${sourceinfoxml}
if [ "$?" != "0" ]; then
	echo "Error getting TS information!" 1>&2
	exit 1
fi

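#Get first pts in TS file (any stream type!)
#----------------------------
#TODO: Use function min to ensure that the returned PTS is the minimum
startpts=$(${xmllint} --xpath "string(//ffprobe/packets/packet[@pts_time>=0]/@pts_time)" ${sourceinfoxml})

#Shift the requested time by the start PTS, then find the next packet of the requested type
#(these expressions are reconstructed from the surviving fragments and variable names)
#----------------------------
timeINnorm=$(bc <<< "scale=6;${timeIN}+${startpts}")
nextpts=$(${xmllint} --xpath "string(//ffprobe/packets/packet[@codec_type='${streamtype}' and @pts_time>=${timeINnorm}]/@pts_time)" ${sourceinfoxml})

#Normalize to 0
#----------------------------
nextptsn=$(bc <<< "scale=6;${nextpts}-${startpts}")

#To check
#echo "Timein in file: $timeINnorm" 1>&2
#echo "Next PTS in file: $nextpts" 1>&2	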

#Return param
#----------------------------
echo "$nextptsn"	
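
The lookup and the normalization to 0 can be tried without ffprobe or xmllint. The sketch below swaps the XML/xpath step for awk over a simplified "type pts_time" listing, which is an assumption made for illustration; next_pts_normalized is a hypothetical name and the packet values are made up. The comparison is done in floating point, so it relies on the requested time not sitting exactly on a packet PTS (the 5ms correction applied by hlsframeaccuratetrim gives that margin for video).

```shell
#!/bin/bash
# Find the next PTS of one stream type, relative to the first PTS in the file.
# Reads "type pts_time" lines on stdin (a simplified stand-in for the ffprobe XML).
# Usage: next_pts_normalized timeIN video|audio < packets.txt   (hypothetical helper)
next_pts_normalized() {
	awk -v t="$1" -v type="$2" '
		NR == 1 { start = $2 }                            # start PTS: first packet in the file
		!found && $1 == type && $2 - start >= t + 0 {     # first matching packet at or after timeIN
			printf "%.6f\n", $2 - start                   # normalized so the file starts at 0
			found = 1
		}
	'
}

# Made-up packet listing: the TS starts at PTS 10s, not at 0
packets='video 10.000000
audio 10.010000
audio 10.031333
video 10.040000
audio 10.052667
video 10.080000'

next_pts_normalized 0.035 video <<< "$packets"   # prints 0.040000
next_pts_normalized 0.040 audio <<< "$packets"   # prints 0.052667
```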

Bash script – rewraptomp4

This is a simple script that creates an MP4 file from an input video TS and an input audio TS. It also delays the audio stream (with respect to the video) according to the audiodelay parameter.

  1. It uses ffmpeg to mux the input audio and video TSs into an MP4 file without re-encoding; it is a simple re-wrap.

Here is the source code:


#!/bin/bash
#Re-wrap the source stream to mp4 
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# rewraptomp4 sourcevideo sourceaudio audiodelay dest
#
#Example:
# rewraptomp4 invideo.ts inaudio.ts 0.03 out.mp4

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
sourceTSvideo=$1
sourceTSaudio=$2
avdelay=$3
destMP4=$4

#Clean
#----------------------------
rm -f $destMP4  

#Create final MP4
#----------------------------
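#Bitstream filter that converts the AAC headers from ADTS (as carried in TS) to the form MP4 expects
#(-bsf:a is the current spelling of the older -absf option)
muxmp4encodingparams="-bsf:a aac_adtstoasc"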

#echo "Delay: ${avdelay}"

${ffmpeg} -i ${sourceTSvideo} -itsoffset ${avdelay} -i ${sourceTSaudio} -vcodec copy -acodec copy ${muxmp4encodingparams} ${destMP4}

Bash script – rewrapts

This is a simple script that extracts the video and audio streams from the source and wraps them into separate TSs.

  1. It uses ffmpeg to extract the video and audio streams from the input file and wrap each one into its own TS
    1. This is useful to strip any streams other than video and audio from the input TS
    2. Note that the start PTS is set to 0 in every destination TS

Here is the source code:

#!/bin/bash
#Extract video and audio streams and wrap them into separate TSs
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# rewrapts source destvideo destaudio
#
#Example:
# rewrapts in.ts outvideo.ts outaudio.ts


#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
sourceTS=$1
destTSvideo=$2
destTSaudio=$3

#Clean
#----------------------------
rm -f $destTSvideo
rm -f $destTSaudio

#Create final TSs
#----------------------------
${ffmpeg} -i ${sourceTS} -map 0:v -vcodec copy -mpegts_copyts 1 ${destTSvideo}
${ffmpeg} -i ${sourceTS} -map 0:a -acodec copy -mpegts_copyts 1 ${destTSaudio}

Bash script – catts

This script is used to concatenate several TSs.

  1. It creates a txt file with the list of TSs to concatenate
  2. It uses ffmpeg to concatenate the input TSs into a single output TS
    1. It’s very important to know that ffmpeg concat uses the last PTS of input file N to set the first PTS of file N+1. This means that if you concatenate TSs that contain both audio and video, and file N finishes with an audio packet, the PTS of the first frame of file N+1 will be incorrect: one video frame will last longer than the others. In the end you will see an audio video misalignment, because players will read an incorrect frame rate: fps_avg = number of frames / total duration[s] (and the total duration will be incorrect)
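
The effect of a single over-long frame on fps_avg can be put into numbers. The helper name avg_fps and the figures below are made up for illustration (a 10s clip at 25fps in which one frame lasts 80ms instead of 40ms).

```shell
#!/bin/bash
# Average frame rate a player would derive: number of frames / total duration.
# Usage: avg_fps frames duration_seconds   (hypothetical helper)
avg_fps() {
	awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

avg_fps 250 10.00   # 250 frames of 40ms each: prints 25.00
avg_fps 250 10.04   # same frames, but one of them lasts 80ms: prints 24.90
```

A player that spreads the 250 frames evenly over 10.04s shows each frame slightly late, and the error accumulates against the audio. This is the reason for concatenating the video and audio TSs separately.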

Here is the source code:

#!/bin/bash
#Concatenate TS streams
#
#This script requires ffmpeg
#
#Version: 1.0
#
#Usage:
# catts streamout streamIn1 streamIn2 ...streamInN
#
#Example:
# catts out.ts in1.ts in2.ts in3.ts in4.ts

#Set global vars 
#----------------------------
ffmpeg="ffmpeg"

#Set in vars
#----------------------------
destTS=$1

#Set intermediate files 
#----------------------------
catlist=${destTS}_list.txt

#Clean
#----------------------------
rm -f $catlist
rm -f $destTS

#Create list of files to merge
#----------------------------
noout="0"

echo "#List of files to merge" > $catlist
for var in "$@"
do
	if [ $noout == "0" ]; then
		noout="1"
	else	
		echo "file '${var}'" >> $catlist
	fi
done

#Cat mpeg TS streams
#----------------------------
${ffmpeg} -f concat -i $catlist -codec copy -mpegts_copyts 1 -f mpegts ${destTS}

Accuracy test

  • Create an AV sync file (MP4 – h264 baseline 3.1, AAC, 720×576@25p). I used Adobe Premiere CC 2014, see figure 5
  • Use the Apple mediafilesegmenter tool to generate the .ts files, with the option -start-segments-with-iframe
  • Use the hlsframeaccuratetrim script to trim & join the .ts files using several in points
  • For every resulting file, check the cut point precision by viewing the first frame (frame numbers are burned into the video). You can see the results in table 1.
  • For every file, check in the Adobe Premiere timeline whether the first and last audio clicks in the resulting file have the same AV alignment as the original MP4 file (the same alignment, not necessarily the correct one), see figure 6. You can see the results in table 2.
    • Definitions:
      • AV total delay = max (first click delay, last click delay)
      • AV Drift = max (first click delay, last click delay) - min (first click delay, last click delay)
Test num   Tin    1st frame expected   1st frame real   Difference
1          3.0    75                   75               0
2          2.0    50                   50               0
3          2.12   53                   53               0
4          3.2    80                   80               0
5          4.0    100                  100              0

Table 1: Trimming accuracy table
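
The "1st frame expected" column can be reproduced with a small model of the 5ms correction followed by the "next PTS" search. This is an idealised sketch, not taken from the scripts: it assumes frame n has PTS n/fps starting at 0, and first_frame is a hypothetical name.

```shell
#!/bin/bash
# Index of the first frame whose PTS is at or after (Tin - 5ms).
# Assumes frame n has PTS n/fps, starting at 0 (an idealised model, not taken from the scripts).
# Usage: first_frame Tin fps   (hypothetical helper)
first_frame() {
	awk -v t="$1" -v fps="$2" 'BEGIN {
		x = (t - 0.005) * fps        # corrected trim point, in frame units
		n = int(x)
		if (n < x) n++               # round up to the next frame boundary
		print n
	}'
}

for tin in 3.0 2.0 2.12 3.2 4.0; do
	echo "Tin=$tin -> frame $(first_frame "$tin" 25)"
done
```

For the five in points of table 1 this prints frames 75, 50, 53, 80 and 100. The 5ms margin is much smaller than one 40ms frame, so it never pulls the cut back to the previous frame.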

The accuracy test results are perfect: in all five tests the first frame is exactly the expected one.

Figure 5: AV sync test timeline
Figure 6: Checking AV alignment
Test num   Tin    1st click delay [ms]   Last click delay [ms]   AV total delay [ms]   AV drift [ms]
1          3.0    0 (*)                  0 (*)                   0                     0
2          2.0    0 (*)                  0 (*)                   0                     0
3          2.12   -10                    -10                     -10                   0
4          3.2    -15                    15                      -15                   0
5          4.0    0 (*)                  0 (*)                   0                     0

Table 2: Audio video delay test

(*) Impossible to measure with this method (less than 5ms)

The results of the audio-video delay test are pretty good: no delay exceeds 15ms in magnitude, and there is no drift.
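
The two definitions used in table 2 can be written as a short function. The name av_metrics is hypothetical, and the sketch works on whole milliseconds with bash integer arithmetic.

```shell
#!/bin/bash
# AV total delay and AV drift from the two click measurements, in whole milliseconds.
# Usage: av_metrics first_click_delay last_click_delay   (hypothetical helper)
av_metrics() {
	local first=$1 last=$2 max min
	if (( first > last )); then
		max=$first; min=$last
	else
		max=$last; min=$first
	fi
	# AV total delay = max of both delays; AV drift = max - min
	echo "total=${max}ms drift=$(( max - min ))ms"
}

av_metrics -10 -10   # test 3 in table 2: prints total=-10ms drift=0ms
```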

Test in Wowza environment

  • I also tested these scripts in a more realistic environment: a Wowza server (3.6.3 build8031) installed on an EC2 instance (see figure 7)
Figure 7: Test environment

In this test environment the scripts kept working as expected.

Known problems & future work

  • The script tsaccuratetrim_rv_pa fails if the source media is encoded in h264 MAIN profile (it works if it is h264 baseline) [see: ffmpeg posted bug, https://trac.ffmpeg.org/ticket/1598]
    • Implement a workaround to avoid this bug (for instance, could we use an MP4 wrapper before the join phase?)
  • If you use a 50fps source file (20ms/frame), the first packet of each re-encoded file (first and last segment) reports a frame duration of 40ms, versus 20ms for all other frames. In the end this causes an audio video misalignment, because players will read an incorrect frame rate: fps_avg = number of frames / total duration[s] (and the total duration will be incorrect, because 2 frames in the whole file last twice as long as they should)
    • Report this issue to ffmpeg.org & try to work around the problem
  • Avoid using xmllint by translating the scripts into ruby
  • Implement the HLS media download inside the script
    • Implement HLS live stream downloading
  • Use absolute trim points relative to the whole HLS stream (not chunk-relative trim points)
  • Work to reduce the small audio-video delay that appears in some cases
  • Use longer files to increase the precision of the audio video alignment tests

Other methods tested

Use ffmpeg to convert the video track into pngs:

ffmpeg -i c:\tmpDownload\TS\Wowza\media_14.ts -f image2 c:\tmpDownload\TS\fileSequence0\im%05d.png

Result: OK

Use FFmpeg to trim TS (copying audio stream):

ffmpeg -i input.ts -ss 1 -acodec copy -vcodec libx264 -profile:v main -level 3.1 out.ts

Result: ERRORS (repeated 1st frame)

