Sunday, April 10, 2016

AAC encoding using AudioConverter and writing to AVAssetWriter


I'm struggling to encode audio buffers received from an AVCaptureSession using AudioConverter and then append them to an AVAssetWriter.

I'm not getting any errors (including OSStatus responses), and the CMSampleBuffers I generate seem to contain valid data; however, the resulting file simply has no playable audio. When writing audio together with video, the video frames stop getting appended a couple of frames in (appendSampleBuffer() returns false, but AVAssetWriter.error is nil), probably because the asset writer is waiting for the audio to catch up. I suspect it's related to the way I'm setting up the priming for AAC.
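For context, this is roughly how the encoded buffers reach the writer. This is a simplified sketch of the real pipeline, not the actual code; `processAudio` is a stand-in name, and `encodeAudioBuffer` is the function shown further down:

// Sketch only: how encoded audio reaches the writer. `processAudio` is a
// stand-in name; `writerAudioInput`, `assetWriter` and `settings` are the
// objects set up below.
func processAudio(sampleBuffer: CMSampleBuffer) throws {
    // Drop the buffer if the writer isn't ready, rather than blocking the capture queue
    guard writerAudioInput.readyForMoreMediaData else { return }

    guard let encoded = try encodeAudioBuffer(settings, buffer: sampleBuffer) else { return }

    if !writerAudioInput.appendSampleBuffer(encoded) {
        // This is also where the video appends start failing a few frames in
        Log.error?.message("Audio append failed, writer error: \(assetWriter.error)")
    }
}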

The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to understand for a wider audience.

Please see the comments in the code below for more detail.

Given a settings struct:

import Foundation
import AVFoundation
import CleanroomLogger

public struct AVSettings {

    let orientation: AVCaptureVideoOrientation = .Portrait
    let sessionPreset                          = AVCaptureSessionPreset1280x720
    let videoBitrate: Int                      = 2_000_000
    let videoExpectedFrameRate: Int            = 30
    let videoMaxKeyFrameInterval: Int          = 60

    let audioBitrate: Int                      = 32 * 1024

    /// Fields set to `0` indicate a variable rate.
    /// `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
    /// with values based on the input stream.
    let audioOutputABSD = AudioStreamBasicDescription(
                              mSampleRate: AVAudioSession.sharedInstance().sampleRate,
                              mFormatID: kAudioFormatMPEG4AAC,
                              mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
                              mBytesPerPacket: 0,
                              mFramesPerPacket: 1024,
                              mBytesPerFrame: 0,
                              mChannelsPerFrame: 1,
                              mBitsPerChannel: 0,
                              mReserved: 0)

    let audioEncoderClassDescriptions = [
        AudioClassDescription(
            mType: kAudioEncoderComponentType,
            mSubType: kAudioFormatMPEG4AAC,
            mManufacturer: kAppleSoftwareAudioCodecManufacturer)
    ]

}
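One sanity check I can think of (not in the app, just a sketch): asking Core Audio to complete the partially-filled ASBD via kAudioFormatProperty_FormatInfo, which should confirm whether the description above is one the encoder actually accepts:

// Sketch: let Core Audio complete/validate the partially-filled ASBD.
// `settings` is the AVSettings instance from above.
var absd = settings.audioOutputABSD
var size = UInt32(sizeof(AudioStreamBasicDescription))
let status = AudioFormatGetProperty(kAudioFormatProperty_FormatInfo,
                                    0, nil,    // no specifier needed
                                    &size,
                                    &absd)
if status == noErr {
    // For AAC this should come back with mFramesPerPacket == 1024
    print("Completed ASBD: \(absd)")
}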

Some helper functions:

public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
    switch (settings.sessionPreset, settings.orientation) {
    case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
    case (AVCaptureSessionPreset1280x720, .Portrait):  return (720, 1280)
    default: fatalError("Unsupported session preset and orientation")
    }
}

public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
    var result = noErr
    var absd = settings.audioOutputABSD
    var description: CMAudioFormatDescription?
    withUnsafePointer(&absd) { absdPtr in
        result = CMAudioFormatDescriptionCreate(nil,
                                                absdPtr,
                                                0, nil,
                                                0, nil,
                                                nil,
                                                &description)
    }

    if result != noErr {
        Log.error?.message("Could not create audio format description")
    }

    return description!
}

public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
    var result = noErr
    var description: CMVideoFormatDescription?
    let (width, height) = getVideoDimensions(fromSettings: settings)
    result = CMVideoFormatDescriptionCreate(nil,
                                            kCMVideoCodecType_H264,
                                            Int32(width),
                                            Int32(height),
                                            [:],
                                            &description)

    if result != noErr {
        Log.error?.message("Could not create video format description")
    }

    return description!
}

This is how the asset writer is initialized:

guard let audioDevice = defaultAudioDevice() else {
    throw RecordError.MissingDeviceFeature("Microphone")
}

guard let videoDevice = defaultVideoDevice(.Back) else {
    throw RecordError.MissingDeviceFeature("Camera")
}

let videoInput      = try AVCaptureDeviceInput(device: videoDevice)
let audioInput      = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                          outputSettings: nil,
                                          sourceFormatHint: videoFormatHint)

let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                          outputSettings: nil,
                                          sourceFormatHint: audioFormatHint)

writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
    .URLByAppendingPathExtension("mp4")

let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)

if !assetWriter.canAddInput(writerVideoInput) {
    throw RecordError.Unknown("Could not add video input")
}

if !assetWriter.canAddInput(writerAudioInput) {
    throw RecordError.Unknown("Could not add audio input")
}

assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
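Not shown above (it lives elsewhere in the app): the writer session is started once the first buffer arrives, roughly like this; `firstSampleBuffer` here is a stand-in for the first captured buffer:

// Sketch: starting the writer session at the first buffer's timestamp.
if assetWriter.startWriting() {
    let firstBufferPTS = CMSampleBufferGetPresentationTimeStamp(firstSampleBuffer)
    assetWriter.startSessionAtSourceTime(firstBufferPTS)
} else {
    Log.error?.message("Could not start writing: \(assetWriter.error)")
}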

And this is how audio samples are encoded; the problem is most likely somewhere around here. I've rewritten this so that it doesn't use any Rx-isms:

var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

var converter: AudioConverterRef = nil

// Indicates whether priming information has been attached to the first buffer
var primed = false

func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {

    var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(
                        CMSampleBufferGetFormatDescription(buffer)!).memory

    // Create the audio converter if it's not available
    if converter == nil {
        var classDescriptions = settings.audioEncoderClassDescriptions
        var outputABSD = settings.audioOutputABSD
        outputABSD.mSampleRate = inputABSD.mSampleRate
        outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

        var result = noErr
        result = withUnsafePointer(&outputABSD) { outputABSDPtr in
            return withUnsafePointer(&inputABSD) { inputABSDPtr in
                return AudioConverterNewSpecific(inputABSDPtr,
                                                 outputABSDPtr,
                                                 UInt32(classDescriptions.count),
                                                 &classDescriptions,
                                                 &converter)
            }
        }

        if result != noErr { throw RecordError.Unknown("Could not create audio converter") }

        // At this point I made an attempt to retrieve priming info from
        // the audio converter, assuming that it would give me back default
        // values I could use, but I ended up with `nil`
        var primeInfo: AudioConverterPrimeInfo? = nil
        var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

        // The following returns `noErr`, but `primeInfo` is still `nil`
        AudioConverterGetProperty(converter,
                                  kAudioConverterPrimeInfo,
                                  &primeInfoSize,
                                  &primeInfo)

        // I've also tried to set `kAudioConverterPrimeInfo` so that it knows
        // the leading frames that are being primed, but the set didn't seem to work
        // (`noErr`, but getting the property afterwards still returned `nil`)
    }

    // Need to give a big enough output buffer.
    // The assumption is that it will always be <= the input size
    let numSamples = CMSampleBufferGetNumSamples(buffer)
    // This becomes 1024 * 2 = 2048
    let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
    let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)

    defer {
        outputBufferPtr.dealloc(outputBufferSize)
    }

    var result = noErr

    var outputPacketCount = UInt32(1)
    var outputData = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(
                      mNumberChannels: outputABSD.mChannelsPerFrame,
                      mDataByteSize: UInt32(outputBufferSize),
                      mData: outputBufferPtr))

    // See below for `EncodeAudioUserData`
    var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                       inputBytesPerPacket: inputABSD.mBytesPerPacket)

    withUnsafeMutablePointer(&userData) { userDataPtr in
        // See below for `fetchAudioProc`
        result = AudioConverterFillComplexBuffer(
                     converter,
                     fetchAudioProc,
                     userDataPtr,
                     &outputPacketCount,
                     &outputData,
                     nil)
    }

    if result != noErr {
        Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
        return nil
    }

    // See below for `CMSampleBufferCreateCopy`
    guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                   fromAudioBufferList: &outputData,
                                                   newFormatDescription: outputFormatDescription) else {
        Log.error?.message("Could not create sample buffer from audio buffer list")
        return nil
    }

    if !primed {
        primed = true
        // Simply picked 2112 samples based on convention; is there a better way to determine this?
        let samplesToPrime: Int64 = 2112
        let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
        let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)

        // Without setting the attachment the asset writer will complain about the
        // first buffer missing the `TrimDurationAtStart` attachment; is there a way
        // to infer the value from the given `AudioBufferList`?
        CMSetAttachment(newBuffer,
                        kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                        CMTimeCopyAsDictionary(primingDuration, nil),
                        kCMAttachmentMode_ShouldNotPropagate)
    }

    return newBuffer

}
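One thing I noticed while writing this up: in the snippet above I pass an Optional (AudioConverterPrimeInfo?) to AudioConverterGetProperty, so the pointer refers to the Optional's storage rather than to a plain struct, which may be why the read keeps coming back empty. A sketch of the query with a non-optional value (untested):

// Sketch: query prime info into a non-optional struct so the out-pointer
// actually points at an AudioConverterPrimeInfo.
var primeInfo = AudioConverterPrimeInfo(leadingFrames: 0, trailingFrames: 0)
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
let status = AudioConverterGetProperty(converter,
                                       kAudioConverterPrimeInfo,
                                       &primeInfoSize,
                                       &primeInfo)
if status == noErr {
    // For AAC, leadingFrames is conventionally 2112; this would replace
    // the hard-coded `samplesToPrime` above.
    let samplesToPrime = Int64(primeInfo.leadingFrames)
    print("Converter priming: \(samplesToPrime) leading frames")
}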

Below is the proc that fetches samples for the audio converter, and the data structure that gets passed to it:

private class EncodeAudioUserData {
    var inputSampleBuffer: CMSampleBuffer?
    var inputBytesPerPacket: UInt32

    init(inputSampleBuffer: CMSampleBuffer,
         inputBytesPerPacket: UInt32) {
        self.inputSampleBuffer   = inputSampleBuffer
        self.inputBytesPerPacket = inputBytesPerPacket
    }
}

private let fetchAudioProc: AudioConverterComplexInputDataProc = {
    (inAudioConverter,
     ioDataPacketCount,
     ioData,
     outDataPacketDescriptionPtrPtr,
     inUserData) in

    var result = noErr

    if ioDataPacketCount.memory == 0 { return noErr }

    let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

    // If it's already been processed
    guard let buffer = userData.inputSampleBuffer else {
        ioDataPacketCount.memory = 0
        return -1
    }

    var inputBlockBuffer: CMBlockBuffer?
    var inputBufferList = AudioBufferList()
    result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
                 buffer,
                 nil,
                 &inputBufferList,
                 sizeof(AudioBufferList),
                 nil,
                 nil,
                 0,
                 &inputBlockBuffer)

    if result != noErr {
        Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
        ioDataPacketCount.memory = 0
        return result
    }

    let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
    ioDataPacketCount.memory = packetsCount

    ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
    ioData.memory.mBuffers.mDataByteSize   = inputBufferList.mBuffers.mDataByteSize
    ioData.memory.mBuffers.mData           = inputBufferList.mBuffers.mData

    // Mark the buffer as consumed so the next call reports no more data
    userData.inputSampleBuffer = nil

    if outDataPacketDescriptionPtrPtr != nil {
        outDataPacketDescriptionPtrPtr.memory = nil
    }

    return noErr
}
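A note on outDataPacketDescriptionPtrPtr: since the input coming from the capture session is LPCM (constant bytes per packet), leaving it nil should be fine; as far as I understand, the input proc only has to supply packet descriptions when the input format itself has variable-sized packets.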

This is how I am converting AudioBufferLists to CMSampleBuffers:

public func CMSampleBufferCreateCopy(
    buffer: CMSampleBuffer,
    inout fromAudioBufferList bufferList: AudioBufferList,
    newFormatDescription formatDescription: CMFormatDescription? = nil)
    -> CMSampleBuffer? {

    var result = noErr

    var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]

    // Copy timing info from the previous buffer
    var timingInfo = CMSampleTimingInfo()
    result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)

    if result != noErr { return nil }

    var newBuffer: CMSampleBuffer?
    result = CMSampleBufferCreateReady(
        kCFAllocatorDefault,
        nil,
        formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
        Int(bufferList.mNumberBuffers),
        1, &timingInfo,
        1, &sizeArray,
        &newBuffer)

    if result != noErr { return nil }
    guard let b = newBuffer else { return nil }

    CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
    return newBuffer

}
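An alternative I've been considering instead of CMSampleBufferCreateReady plus CMSampleBufferSetDataBufferFromAudioBufferList: copy the encoded bytes into a block buffer and build the sample buffer with an explicit packet description via CMAudioSampleBufferCreateWithPacketDescriptions. A sketch (untested; `presentationTime` and `formatDescription` are stand-ins for the timestamp carried over from the input buffer and the AAC format description from earlier):

// Sketch: build a compressed-audio CMSampleBuffer with an explicit
// AudioStreamPacketDescription.
let dataByteSize = Int(bufferList.mBuffers.mDataByteSize)

var blockBuffer: CMBlockBuffer?
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                   nil,             // let Core Media allocate the memory
                                   dataByteSize,
                                   kCFAllocatorDefault,
                                   nil,
                                   0,
                                   dataByteSize,
                                   0,
                                   &blockBuffer)
CMBlockBufferReplaceDataBytes(bufferList.mBuffers.mData,
                              blockBuffer!,
                              0,
                              dataByteSize)

// One AAC packet, starting at offset 0
var packetDescription = AudioStreamPacketDescription(
    mStartOffset: 0,
    mVariableFramesInPacket: 0,
    mDataByteSize: UInt32(dataByteSize))

var sampleBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(kCFAllocatorDefault,
                                                blockBuffer,
                                                true, nil, nil,
                                                formatDescription,
                                                1,               // one packet
                                                presentationTime,
                                                &packetDescription,
                                                &sampleBuffer)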

Is there anything that I am obviously doing wrong? Is there a proper way to construct CMSampleBuffers from AudioBufferList? How do you transfer priming information from the converter to CMSampleBuffers that you create?

For my use case I need to do the encoding manually as the buffers will be manipulated further down the pipeline (although I've disabled all transformations after the encode in order to make sure that it works.)

Any help would be much appreciated. Sorry that there's so much code to digest, but I wanted to provide as much context as possible.

Thanks in advance :)


