AAC Audio Encoding: Principles and Configuration

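AAC (Advanced Audio Coding) is an audio coding technology based on MPEG-2 that appeared in 1997. It was jointly developed by Fraunhofer IIS, Dolby Laboratories, AT&T, Sony, and other companies, with the goal of replacing the MP3 format. After the MPEG-4 standard appeared in 2000, AAC was integrated into it and extended with SBR and PS; to distinguish it from the original MPEG-2 AAC, this version is called MPEG-4 AAC.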

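iOS supports AAC encoding, mainly through the AudioConverter API in AudioToolbox. I needed an AAC encoder because I was building an HLS feature: the TS files that HLS requires must carry H.264 video and AAC audio. H.264 can be encoded in hardware or in software, as described previously. AAC can likewise be encoded in hardware or in software, and iOS supports both.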

First, create a converter, which in this case is the AAC encoder, using the following interface:

extern OSStatus
AudioConverterNew(const AudioStreamBasicDescription *inSourceFormat,
                  const AudioStreamBasicDescription *inDestinationFormat,
                  AudioConverterRef *outAudioConverter) __OSX_AVAILABLE_STARTING(__MAC_10_1,__IPHONE_2_0);

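The input parameters are the source and destination data formats.

For AAC encoding, the source format is the captured PCM data and the destination format is AAC.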

// Source format: 16-bit signed integer, packed, mono linear PCM at 44.1 kHz.
// See also FillOutASBDForLPCM().
AudioStreamBasicDescription inAudioStreamBasicDescription = {0};
inAudioStreamBasicDescription.mFormatID = kAudioFormatLinearPCM;
inAudioStreamBasicDescription.mSampleRate = 44100;
inAudioStreamBasicDescription.mBitsPerChannel = 16;
inAudioStreamBasicDescription.mFramesPerPacket = 1;  // uncompressed audio: one frame per packet
inAudioStreamBasicDescription.mBytesPerFrame = 2;    // 16 bits x 1 channel
inAudioStreamBasicDescription.mBytesPerPacket = inAudioStreamBasicDescription.mBytesPerFrame * inAudioStreamBasicDescription.mFramesPerPacket;
inAudioStreamBasicDescription.mChannelsPerFrame = 1;
inAudioStreamBasicDescription.mFormatFlags = kLinearPCMFormatFlagIsPacked | kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsNonInterleaved;
inAudioStreamBasicDescription.mReserved = 0;

// Destination format: MPEG-4 AAC, mono.
// Always zero-initialize a new AudioStreamBasicDescription.
AudioStreamBasicDescription outAudioStreamBasicDescription = {0};
outAudioStreamBasicDescription.mChannelsPerFrame = 1;
outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC;
outAudioStreamBasicDescription.mSampleRate = inAudioStreamBasicDescription.mSampleRate;  // same sample rate as the PCM source

// Let Core Audio fill in the remaining fields for this format.
UInt32 size = sizeof(outAudioStreamBasicDescription);
AudioFormatGetProperty(kAudioFormatProperty_FormatInfo, 0, NULL, &size, &outAudioStreamBasicDescription);

OSStatus status = AudioConverterNew(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, &_audioConverter);
if (status != noErr) {
    NSLog(@"setup converter failed: %d", (int)status);
}
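The byte-size fields of the PCM description follow directly from the sample width and channel layout. The sketch below is plain C, independent of AudioToolbox, and shows that arithmetic for packed linear PCM; the function name pcm_bytes_per_frame is made up for this illustration and is not part of any Apple API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes occupied by one frame inside a single buffer of packed linear PCM.
 * Interleaved: one frame holds a sample for every channel.
 * Non-interleaved: each channel has its own buffer, so a frame in a
 * given buffer holds exactly one sample. */
uint32_t pcm_bytes_per_frame(uint32_t bits_per_channel,
                             uint32_t channels,
                             bool interleaved)
{
    uint32_t bytes_per_sample = bits_per_channel / 8;
    return interleaved ? bytes_per_sample * channels : bytes_per_sample;
}
```

For the 16-bit mono format used here the result is 2 bytes either way, which is why mBytesPerFrame is 2; with 16-bit interleaved stereo it would be 4.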

This creates the AAC encoder. By default Apple creates a hardware encoder, and falls back to a software encoder if the hardware is not available.

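In my testing, the hardware AAC encoder has very high encoding latency: it has to buffer roughly 2 seconds of data before it starts encoding. The software encoder has normal latency and starts encoding as soon as it has been fed 1024 samples.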
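To put those two figures side by side, the following C sketch converts them into time and packet counts. It assumes 1024 samples per AAC packet, as stated above; the function names are hypothetical helpers for this illustration.

```c
#include <stdint.h>

#define AAC_SAMPLES_PER_PACKET 1024

/* Duration of one AAC packet in milliseconds at the given sample rate. */
double aac_packet_duration_ms(double sample_rate_hz)
{
    return 1000.0 * AAC_SAMPLES_PER_PACKET / sample_rate_hz;
}

/* Number of whole AAC packets covered by a given buffering time. */
uint32_t aac_packets_in(double seconds, double sample_rate_hz)
{
    return (uint32_t)(seconds * sample_rate_hz / AAC_SAMPLES_PER_PACKET);
}
```

At 44100 Hz one packet covers about 23 ms of audio, so a 2-second startup buffer corresponds to 86 whole packets of delay.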

So how do you request the software encoder when creating the converter? Use the following method:

// Looks up the AudioClassDescription of the encoder for the given format
// type and manufacturer. Returns NULL if no matching encoder is found.
- (AudioClassDescription *)getAudioClassDescriptionWithType:(UInt32)type
                                           fromManufacturer:(UInt32)manufacturer
{
    // Static storage, so the returned pointer stays valid after this method returns.
    static AudioClassDescription desc;

    UInt32 encoderSpecifier = type;
    UInt32 size = 0;
    OSStatus st;

    // Ask how many bytes the list of available encoders occupies.
    st = AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders,
                                    sizeof(encoderSpecifier),
                                    &encoderSpecifier,
                                    &size);
    if (st != noErr) {
        NSLog(@"error getting audio format property info: %d", (int)st);
        return NULL;
    }

    unsigned int count = size / sizeof(AudioClassDescription);
    if (count == 0) {
        return NULL;  // no encoder available for this format
    }

    // Fetch the encoder list itself.
    AudioClassDescription descriptions[count];
    st = AudioFormatGetProperty(kAudioFormatProperty_Encoders,
                                sizeof(encoderSpecifier),
                                &encoderSpecifier,
                                &size,
                                descriptions);
    if (st != noErr) {
        NSLog(@"error getting audio format property: %d", (int)st);
        return NULL;
    }

    // Pick the entry that matches both the format and the manufacturer.
    for (unsigned int i = 0; i < count; i++) {
        if ((type == descriptions[i].mSubType) &&
            (manufacturer == descriptions[i].mManufacturer)) {
            desc = descriptions[i];
            return &desc;
        }
    }
    return NULL;
}

AudioClassDescription *desc = [self getAudioClassDescriptionWithType:kAudioFormatMPEG4AAC
                                                    fromManufacturer:kAppleSoftwareAudioCodecManufacturer];
OSStatus status = AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, desc, &_audioConverter);

To encode correctly, the encoder bit rate must be set. Otherwise encoding fails with error code 560226676 ('!dat').
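Many Core Audio status codes are four-character codes packed into a 32-bit integer, which is how 560226676 (0x21646174) reads as '!dat'. A minimal C sketch of that decoding, useful when logging a status value, might look like this; format_osstatus is a hypothetical helper name, not an Apple function.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats a status code as its four-character form when all four bytes
 * are printable ASCII, otherwise as a plain decimal number.
 * `out` must have room for at least 16 bytes. */
void format_osstatus(int32_t status, char out[16])
{
    uint32_t v = (uint32_t)status;
    unsigned char c[4] = {
        (unsigned char)(v >> 24), (unsigned char)(v >> 16),
        (unsigned char)(v >> 8),  (unsigned char)v
    };
    int printable = 1;
    for (int i = 0; i < 4; i++) {
        if (c[i] < 0x20 || c[i] > 0x7E) {
            printable = 0;
        }
    }
    if (printable) {
        snprintf(out, 16, "%c%c%c%c", c[0], c[1], c[2], c[3]);
    } else {
        snprintf(out, 16, "%d", (int)status);
    }
}
```

Passing 560226676 yields the string "!dat", while a value such as -50 has non-printable bytes and is left in decimal.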

UInt32 ulBitRate = 64000;
UInt32 ulSize = sizeof(ulBitRate);
status = AudioConverterSetProperty(_audioConverter, kAudioConverterEncodeBitRate, ulSize, &ulBitRate);

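Note that AAC does not accept arbitrary bit rates. For example, with a PCM sample rate of 44100 Hz the bit rate can be set to 64000 bps, and with a 16 kHz sample rate it can be set to 32000 bps.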
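As a rough sanity check on a bit rate and sample rate pair, the average encoded packet size can be estimated from the two values. The C sketch below assumes a constant bit rate and 1024 samples per AAC packet; real packets vary in size around this average, and the function name is hypothetical.

```c
/* Average encoded bytes per AAC packet at a constant bit rate,
 * assuming 1024 samples per packet. */
double aac_avg_packet_bytes(double bit_rate_bps, double sample_rate_hz)
{
    const double samples_per_packet = 1024.0;
    return bit_rate_bps / 8.0 * samples_per_packet / sample_rate_hz;
}
```

For the two pairs mentioned above, 64000 bps at 44100 Hz works out to roughly 186 bytes per packet and 32000 bps at 16 kHz to 256 bytes per packet.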

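Once the converter has been created and the bit rate set, query the maximum encoded output size, which is needed later.

UInt32 value = 0;
size = sizeof(value);
AudioConverterGetProperty(_audioConverter, kAudioConverterPropertyMaximumOutputPacketSize, &size, &value);

The returned value is the largest packet, in bytes, that the encoder can output.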

Then call AudioConverterFillComplexBuffer to encode:

AudioBufferList outAudioBufferList = {0};
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = 1;
outAudioBufferList.mBuffers[0].mDataByteSize = value;  // value is the maximum packet size queried above
outAudioBufferList.mBuffers[0].mData = malloc(value);  // release with free() once the encoded data has been consumed
UInt32 ioOutputDataPacketSize = 1;  // request one encoded packet per call
status = AudioConverterFillComplexBuffer(_audioConverter, inInputDataProc, (__bridge void *)(self), &ioOutputDataPacketSize, &outAudioBufferList, NULL);

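In this call, inInputDataProc is the input-data callback that feeds PCM data to the converter. Setting ioOutputDataPacketSize to 1 makes the call return as soon as one encoded packet has been produced. outAudioBufferList receives the encoded data.

inInputDataProc is implemented as follows: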

static OSStatus inInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData)
{
    AACEncoder *encoder = (__bridge AACEncoder *)(inUserData);
    UInt32 requestedPackets = *ioNumberDataPackets;

    uint8_t *buffer;
    uint32_t bufferLength = requestedPackets * 2;  // 2 bytes per packet: 16-bit mono PCM, one frame per packet
    uint32_t bufferRead;

    // Pull up to bufferLength bytes of PCM from the pool.
    bufferRead = [encoder.pcmPool readBuffer:&buffer withLength:bufferLength];
    if (bufferRead == 0) {
        // No PCM available: report zero packets and return a non-zero status.
        *ioNumberDataPackets = 0;
        return -1;
    }

    ioData->mBuffers[0].mData = buffer;
    ioData->mBuffers[0].mDataByteSize = bufferRead;
    ioData->mNumberBuffers = 1;
    ioData->mBuffers[0].mNumberChannels = 1;
    *ioNumberDataPackets = bufferRead >> 1;  // bytes to 16-bit packets
    return noErr;
}

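pcmPool is a ring buffer that holds PCM data.

Because each capture callback does not necessarily deliver 1024 samples, the data can be buffered first and the encoder called only once 1024 samples are available.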
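The buffering idea can be sketched in plain C as a small fixed-size ring buffer of 16-bit samples that hands out data only in whole 1024-sample frames. This is an illustration under assumptions, not the article's pcmPool: the PcmPool type, its function names, and the capacity of eight frames are all made up here, and the sketch is not thread-safe.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SAMPLES 1024                 /* samples per AAC packet */
#define POOL_CAPACITY (FRAME_SAMPLES * 8)  /* assumed capacity: 8 frames */

typedef struct {
    int16_t data[POOL_CAPACITY];
    size_t  head;   /* index of the oldest stored sample */
    size_t  count;  /* number of samples currently stored */
} PcmPool;

/* Appends up to n samples; returns how many were stored.
 * Samples that do not fit are dropped. */
size_t pcm_pool_write(PcmPool *p, const int16_t *src, size_t n)
{
    size_t stored = 0;
    while (stored < n && p->count < POOL_CAPACITY) {
        p->data[(p->head + p->count) % POOL_CAPACITY] = src[stored++];
        p->count++;
    }
    return stored;
}

/* Copies exactly FRAME_SAMPLES samples into dst when that many are
 * buffered. Returns 1 on success, 0 if a full frame is not ready yet. */
int pcm_pool_read_frame(PcmPool *p, int16_t *dst)
{
    if (p->count < FRAME_SAMPLES) {
        return 0;
    }
    for (size_t i = 0; i < FRAME_SAMPLES; i++) {
        dst[i] = p->data[(p->head + i) % POOL_CAPACITY];
    }
    p->head = (p->head + FRAME_SAMPLES) % POOL_CAPACITY;
    p->count -= FRAME_SAMPLES;
    return 1;
}
```

With 441-sample capture chunks (10 ms at 44100 Hz), the first two writes leave the pool short of a frame; after the third write a full frame can be read and 299 samples remain buffered for the next one.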

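In addition, for TS files every AAC packet needs an ADTS header prepended. The ADTS header is 7 bytes long (when no CRC is present) and carries the encoding parameters of the AAC data, so that a decoder knows how to decode it.

The ADTS header is computed as follows: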

// Builds the 7-byte ADTS header for one AAC packet of packetLength bytes.
- (NSData *)adtsDataForPacketLength:(NSUInteger)packetLength {
    int adtsLength = 7;
    char *packet = (char *)malloc(sizeof(char) * adtsLength);

    int profile = 2;  // AAC LC
    int freqIdx = 4;  // 44.1 kHz, matching the PCM format above (8 would be 16 kHz)
    int chanCfg = 1;  // MPEG-4 audio channel configuration 1: one channel, front-center
    NSUInteger fullLength = adtsLength + packetLength;  // the frame length includes the header

    // Fill in the ADTS data.
    packet[0] = (char)0xFF;  // 1111 1111 = syncword (high 8 bits)
    packet[1] = (char)0xF9;  // 1111 1 00 1 = syncword (low 4 bits), ID = 1 (MPEG-2), layer = 00, protection_absent = 1 (no CRC)
    packet[2] = (char)(((profile - 1) << 6) + (freqIdx << 2) + (chanCfg >> 2));
    packet[3] = (char)(((chanCfg & 3) << 6) + (fullLength >> 11));
    packet[4] = (char)((fullLength & 0x7FF) >> 3);
    packet[5] = (char)(((fullLength & 7) << 5) + 0x1F);  // low 3 bits of the frame length, then high 5 bits of buffer fullness
    packet[6] = (char)0xFC;  // low 6 bits of buffer fullness, then 00 = one AAC frame per ADTS frame

    // NSData takes ownership of the malloc'ed buffer and frees it.
    NSData *data = [NSData dataWithBytesNoCopy:packet length:adtsLength freeWhenDone:YES];
    return data;
}
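The same header layout can be written as portable C, with the sampling frequency index looked up from the sample rate instead of hard-coded, and with a parser that reads the frame length back for verification. This is a sketch under assumptions: the function names are hypothetical, and unlike the listing above it writes 0xF1 in the second byte (ID = 0, MPEG-4) rather than 0xF9 (ID = 1, MPEG-2).

```c
#include <stddef.h>
#include <stdint.h>

/* MPEG-4 sampling frequency index table (index -> Hz). */
static const uint32_t kAdtsSampleRates[] = {
    96000, 88200, 64000, 48000, 44100, 32000, 24000,
    22050, 16000, 12000, 11025, 8000, 7350
};

/* Returns the sampling frequency index, or -1 if the rate is not listed. */
int adts_freq_index(uint32_t sample_rate_hz)
{
    size_t n = sizeof(kAdtsSampleRates) / sizeof(kAdtsSampleRates[0]);
    for (size_t i = 0; i < n; i++) {
        if (kAdtsSampleRates[i] == sample_rate_hz) {
            return (int)i;
        }
    }
    return -1;
}

/* Writes a 7-byte ADTS header (no CRC) for one AAC packet.
 * object_type: 2 = AAC LC. Returns 0 on success, -1 on bad arguments. */
int adts_write_header(uint8_t out[7], int object_type, uint32_t sample_rate_hz,
                      int channel_config, size_t payload_len)
{
    int freq_idx = adts_freq_index(sample_rate_hz);
    size_t frame_len = payload_len + 7;  /* header + payload */

    if (freq_idx < 0 || frame_len > 0x1FFF) return -1;  /* 13-bit length field */
    if (object_type < 1 || object_type > 4) return -1;  /* 2-bit profile field */
    if (channel_config < 0 || channel_config > 7) return -1;

    out[0] = 0xFF;  /* syncword, high 8 bits */
    out[1] = 0xF1;  /* syncword low 4 bits, ID=0 (MPEG-4), layer=00, no CRC */
    out[2] = (uint8_t)(((object_type - 1) << 6) | (freq_idx << 2) |
                       (channel_config >> 2));
    out[3] = (uint8_t)(((size_t)(channel_config & 3) << 6) | (frame_len >> 11));
    out[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    out[5] = (uint8_t)(((frame_len & 7) << 5) | 0x1F);  /* + fullness high bits */
    out[6] = 0xFC;  /* fullness low bits, one AAC frame per ADTS frame */
    return 0;
}

/* Reads the 13-bit frame length (header + payload) back from a header. */
size_t adts_frame_length(const uint8_t h[7])
{
    return ((size_t)(h[3] & 0x03) << 11) | ((size_t)h[4] << 3) | (size_t)(h[5] >> 5);
}
```

For a 186-byte AAC LC packet at 44100 Hz mono, this produces the bytes FF F1 50 40 18 3F FC, and adts_frame_length returns 193 (186 bytes of payload plus the 7-byte header).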