I'm struggling to encode audio buffers received from `AVCaptureSession` using `AudioConverter` and then append them to an `AVAssetWriter`.
I'm not getting any errors (including `OSStatus` responses), and the `CMSampleBuffer`s generated seem to have valid data; however, the resulting file simply does not have any playable audio. When writing together with video, the video frames stop getting appended a couple of frames in (`appendSampleBuffer()` returns `false`, but with no `AVAssetWriter.error`), probably because the asset writer is waiting for the audio to catch up. I suspect it's related to the way I'm setting up the priming for AAC.
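For context, this is the relationship I'm assuming between the AAC encoder delay and the trim duration (plain Swift, no CoreMedia, just to show the arithmetic; the 2112-sample figure is the conventional AAC priming length, not something I've managed to read back from the converter):

```swift
// Priming arithmetic I'm relying on (illustrative only, not part of the pipeline).
// An AAC encoder conventionally emits 2112 "priming" samples before real audio,
// so the first output buffer should be trimmed by that many samples.
let samplesToPrime: Int64 = 2112
let sampleRate: Int32 = 44_100

// Duration of the leading audio to trim, expressed in seconds
let primingSeconds = Double(samplesToPrime) / Double(sampleRate)
// ≈ 0.0479 s at 44.1 kHz
```

This is the same value/timescale pair I later hand to `CMTimeMake` for the `TrimDurationAtStart` attachment.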
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to understand for a wider audience.
Please see the comments in the code below for more detail.
Given a settings struct:
```swift
import Foundation
import AVFoundation
import CleanroomLogger

public struct AVSettings {

    let orientation: AVCaptureVideoOrientation = .Portrait
    let sessionPreset = AVCaptureSessionPreset1280x720
    let videoBitrate: Int = 2_000_000
    let videoExpectedFrameRate: Int = 30
    let videoMaxKeyFrameInterval: Int = 60
    let audioBitrate: Int = 32 * 1024

    /// Settings of `0` mean variable rate.
    /// `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
    /// with values based on the input stream.
    let audioOutputABSD = AudioStreamBasicDescription(
        mSampleRate: AVAudioSession.sharedInstance().sampleRate,
        mFormatID: kAudioFormatMPEG4AAC,
        mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
        mBytesPerPacket: 0,
        mFramesPerPacket: 1024,
        mBytesPerFrame: 0,
        mChannelsPerFrame: 1,
        mBitsPerChannel: 0,
        mReserved: 0)

    let audioEncoderClassDescriptions = [
        AudioClassDescription(
            mType: kAudioEncoderComponentType,
            mSubType: kAudioFormatMPEG4AAC,
            mManufacturer: kAppleSoftwareAudioCodecManufacturer)
    ]
}
```
Some helper functions:
```swift
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
    switch (settings.sessionPreset, settings.orientation) {
    case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
    case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
    default: fatalError("Unsupported session preset and orientation")
    }
}

public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
    var result = noErr
    var absd = settings.audioOutputABSD
    var description: CMAudioFormatDescription?

    withUnsafePointer(&absd) { absdPtr in
        result = CMAudioFormatDescriptionCreate(nil, absdPtr, 0, nil, 0, nil, nil, &description)
    }

    if result != noErr {
        Log.error?.message("Could not create audio format description")
    }

    return description!
}

public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
    var result = noErr
    var description: CMVideoFormatDescription?
    let (width, height) = getVideoDimensions(fromSettings: settings)

    result = CMVideoFormatDescriptionCreate(nil,
                                            kCMVideoCodecType_H264,
                                            Int32(width),
                                            Int32(height),
                                            [:],
                                            &description)

    if result != noErr {
        Log.error?.message("Could not create video format description")
    }

    return description!
}
```
This is how the asset writer is initialized:
```swift
guard let audioDevice = defaultAudioDevice() else {
    throw RecordError.MissingDeviceFeature("Microphone")
}

guard let videoDevice = defaultVideoDevice(.Back) else {
    throw RecordError.MissingDeviceFeature("Camera")
}

let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)

let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                          outputSettings: nil,
                                          sourceFormatHint: videoFormatHint)

let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                          outputSettings: nil,
                                          sourceFormatHint: audioFormatHint)

writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
    .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
    .URLByAppendingPathExtension("mp4")

let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)

if !assetWriter.canAddInput(writerVideoInput) {
    throw RecordError.Unknown("Could not add video input")
}

if !assetWriter.canAddInput(writerAudioInput) {
    throw RecordError.Unknown("Could not add audio input")
}

assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
```
And this is how audio samples are being encoded; the problem area is most likely around here. I've re-written this so that it doesn't use any Rx-isms.
```swift
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

var converter: AudioConverterRef = nil

// Indicates whether priming information has been attached to the first buffer
var primed = false

func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
    var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(
        CMSampleBufferGetFormatDescription(buffer)!).memory

    // Create the audio converter if it's not available
    if converter == nil {
        var classDescriptions = settings.audioEncoderClassDescriptions
        var outputABSD = settings.audioOutputABSD
        outputABSD.mSampleRate = inputABSD.mSampleRate
        outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

        var result = noErr
        result = withUnsafePointer(&outputABSD) { outputABSDPtr in
            return withUnsafePointer(&inputABSD) { inputABSDPtr in
                return AudioConverterNewSpecific(inputABSDPtr,
                                                 outputABSDPtr,
                                                 UInt32(classDescriptions.count),
                                                 &classDescriptions,
                                                 &converter)
            }
        }

        if result != noErr { throw RecordError.Unknown("Could not create audio converter") }

        // At this point I made an attempt to retrieve priming info from
        // the audio converter assuming that it will give me back default values
        // I can use, but ended up with `nil`
        var primeInfo: AudioConverterPrimeInfo? = nil
        var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

        // The following returns `noErr` but `primeInfo` is still `nil`
        AudioConverterGetProperty(converter,
                                  kAudioConverterPrimeInfo,
                                  &primeInfoSize,
                                  &primeInfo)

        // I've also tried to set `kAudioConverterPrimeInfo` so that it knows
        // the leading frames that are being primed, but the set didn't seem to work
        // (`noErr` but getting the property afterwards still returned `nil`)
    }

    // Need to give a big enough output buffer.
    // The assumption is that it will always be <= to the input size
    let numSamples = CMSampleBufferGetNumSamples(buffer)

    // This becomes 1024 * 2 = 2048
    let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
    let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)

    defer {
        outputBufferPtr.destroy()
        outputBufferPtr.dealloc(outputBufferSize)
    }

    var result = noErr
    var outputPacketCount = UInt32(1)
    var outputData = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(
            mNumberChannels: outputABSD.mChannelsPerFrame,
            mDataByteSize: UInt32(outputBufferSize),
            mData: outputBufferPtr))

    // See below for `EncodeAudioUserData`
    var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                       inputBytesPerPacket: inputABSD.mBytesPerPacket)

    withUnsafeMutablePointer(&userData) { userDataPtr in
        // See below for `fetchAudioProc`
        result = AudioConverterFillComplexBuffer(
            converter,
            fetchAudioProc,
            userDataPtr,
            &outputPacketCount,
            &outputData,
            nil)
    }

    if result != noErr {
        Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
        return nil
    }

    // See below for `CMSampleBufferCreateCopy`
    guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                   fromAudioBufferList: &outputData,
                                                   newFromatDescription: outputFormatDescription) else {
        Log.error?.message("Could not create sample buffer from audio buffer list")
        return nil
    }

    if !primed {
        primed = true

        // Simply picked 2112 samples based on convention, is there a better way to determine this?
        let samplesToPrime: Int64 = 2112
        let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
        let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)

        // Without setting the attachment the asset writer will complain about the
        // first buffer missing the `TrimDurationAtStart` attachment, is there a way
        // to infer the value from the given `AudioBufferList`?
        CMSetAttachment(newBuffer,
                        kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                        CMTimeCopyAsDictionary(primingDuration, nil),
                        kCMAttachmentMode_ShouldNotPropagate)
    }

    return newBuffer
}
```
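To make the sizing assumption above concrete (plain Swift, numbers only; the per-frame figure assumes my actual input, 16-bit mono LPCM):

```swift
// The buffer-size arithmetic the encode step relies on (illustrative only).
// The input is 16-bit mono LPCM (2 bytes per frame), and the AAC encoder
// consumes 1024 frames per output packet.
let framesPerPacket = 1024
let bytesPerFrame = 2                                        // 16-bit mono LPCM
let inputBytesPerPacket = framesPerPacket * bytesPerFrame    // 2048 bytes consumed per packet

// My assumption: one encoded AAC packet never exceeds the raw input it
// consumed, so I allocate the same number of bytes for the output buffer.
let outputBufferSize = inputBytesPerPacket                   // 2048
```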
Below is the proc that fetches samples for the audio converter, and the data structure that gets passed to it:
```swift
private class EncodeAudioUserData {
    var inputSampleBuffer: CMSampleBuffer?
    var inputBytesPerPacket: UInt32

    init(inputSampleBuffer: CMSampleBuffer, inputBytesPerPacket: UInt32) {
        self.inputSampleBuffer = inputSampleBuffer
        self.inputBytesPerPacket = inputBytesPerPacket
    }
}

private let fetchAudioProc: AudioConverterComplexInputDataProc = {
    (inAudioConverter, ioDataPacketCount, ioData, outDataPacketDescriptionPtrPtr, inUserData) in

    var result = noErr

    if ioDataPacketCount.memory == 0 { return noErr }

    let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

    // If it has already been processed
    guard let buffer = userData.inputSampleBuffer else {
        ioDataPacketCount.memory = 0
        return -1
    }

    var inputBlockBuffer: CMBlockBuffer?
    var inputBufferList = AudioBufferList()
    result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
        buffer,
        nil,
        &inputBufferList,
        sizeof(AudioBufferList),
        nil,
        nil,
        0,
        &inputBlockBuffer)

    if result != noErr {
        Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
        ioDataPacketCount.memory = 0
        return result
    }

    let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
    ioDataPacketCount.memory = packetsCount

    ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
    ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
    ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData

    if outDataPacketDescriptionPtrPtr != nil {
        outDataPacketDescriptionPtrPtr.memory = nil
    }

    return noErr
}
```
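The packet-count calculation in the proc boils down to the following (valid only because the input is constant-bit-rate LPCM, where `mBytesPerPacket` is fixed; a variable-rate input would need packet descriptions instead). Isolated as a plain-Swift helper for illustration:

```swift
// Packet-count math from the proc, isolated (illustrative helper, not part
// of the pipeline). For constant-bit-rate input such as LPCM:
//   packets = total data bytes / bytes per packet
func packetCount(dataByteSize: UInt32, bytesPerPacket: UInt32) -> UInt32 {
    precondition(bytesPerPacket > 0, "CBR input required; VBR needs packet descriptions")
    return dataByteSize / bytesPerPacket
}

// e.g. 2048 bytes of 16-bit mono LPCM (2 bytes per packet) → 1024 packets
```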
This is how I am converting `AudioBufferList`s to `CMSampleBuffer`s:
```swift
public func CMSampleBufferCreateCopy(
    buffer: CMSampleBuffer,
    inout fromAudioBufferList bufferList: AudioBufferList,
    newFromatDescription formatDescription: CMFormatDescription? = nil)
    -> CMSampleBuffer? {

    var result = noErr

    var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]

    // Copy timing info from the previous buffer
    var timingInfo = CMSampleTimingInfo()
    result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
    if result != noErr { return nil }

    var newBuffer: CMSampleBuffer?
    result = CMSampleBufferCreateReady(
        kCFAllocatorDefault,
        nil,
        formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
        Int(bufferList.mNumberBuffers),
        1,
        &timingInfo,
        1,
        &sizeArray,
        &newBuffer)
    if result != noErr { return nil }

    guard let b = newBuffer else { return nil }
    CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)

    return newBuffer
}
```
Is there anything that I am obviously doing wrong? Is there a proper way to construct `CMSampleBuffer`s from an `AudioBufferList`? How do you transfer priming information from the converter to the `CMSampleBuffer`s that you create?
For my use case I need to do the encoding manually as the buffers will be manipulated further down the pipeline (although I've disabled all transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Some related questions:
- CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
- Can I use AVCaptureSession to encode an AAC stream to memory?
- Writing video + generated audio to AVAssetWriterInput, audio stuttering
- How do I use CoreAudio's AudioConverter to encode AAC in real-time?