
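I'm using Python to isolate elements from music. To train a model, I split my audio into frames and give each frame a label, 1 or 0. Unfortunately, because of rounding errors, my label vector always comes out 1 or 2 frames short.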

When I convert my audio to frames, I get an array of shape (13, 3709):

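    import librosa

    s = []
    for y in audio:  # audio: list of 1-D signals sampled at 16 kHz
        mfcc = librosa.feature.mfcc(y=y, sr=16000, n_mfcc=13, n_fft=2048, hop_length=1024)
        s.append(mfcc)  # each mfcc has shape (n_mfcc, n_frames), here (13, 3709)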
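
A quick way to see where a one-frame gap can come from, assuming librosa's default center=True: before framing, librosa pads n_fft // 2 samples on each side of the signal, so for an even n_fft such as 2048 a clip of n_samples samples yields 1 + n_samples // hop_length MFCC columns. Flooring duration × sr / hop_length, as int(end_time * 16000 / 1024) does, gives n_samples // hop_length, one fewer. The helper names and the clip length below are hypothetical; working in integer sample counts keeps float rounding out of the comparison.

```python
def mfcc_frame_count(n_samples: int, hop_length: int = 1024) -> int:
    """Columns returned by librosa.feature.mfcc for a clip of n_samples samples,
    assuming center=True and an even n_fft (n_fft // 2 samples padded per side)."""
    return 1 + n_samples // hop_length


def label_frame_count(n_samples: int, hop_length: int = 1024) -> int:
    """Frames obtained by flooring duration * sr / hop_length, i.e. what
    int(end_time * 16000 / 1024) computes when end_time is the clip length."""
    return n_samples // hop_length


n = 3708 * 1024 + 300           # hypothetical clip length in samples at 16 kHz
print(mfcc_frame_count(n))      # 3709
print(label_frame_count(n))     # 3708
```

Any float rounding in the timestamps comes on top of this gap, which would fit the labels sometimes being two frames short instead of one.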

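When I convert the timestamps in my text file (for the mp3 I'm using) from seconds, given to millisecond precision, to frame numbers, I get a vector of length 3708: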

    import numpy as np

    output = []
    for block in textCorpus:
        block_start = int(float(block[0]) * 16000 / 1024)   # converted to frame number
        block_end = int(float(block[1]) * 16000 / 1024)     # converted to frame number
        singing = block[2]
        block_range = np.arange(block_start, block_end, 1)  # step size is 1 (one per frame)
        # extraneous code

I have tried using Decimal, math.floor and math.ceil for my block_start and block_end variables, but I still can't get the label length to match the number of audio frames.
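
One way to make the two lengths agree by construction, sketched under assumptions: size the label list from the MFCC matrix (for example mfcc.shape[1]) instead of from the timestamps, and give frame i the label of the block containing its centre time i * hop_length / sr, which is where librosa centres frame i with its default center=True. The blocks are assumed to be (start, end, singing) triples with times as decimal strings in seconds and singing as "0" or "1"; labels_for_frames is a hypothetical name, and Fraction keeps the boundary arithmetic exact.

```python
import math
from fractions import Fraction


def labels_for_frames(blocks, n_frames, sr=16000, hop_length=1024):
    """Return one 0/1 label per MFCC frame (hypothetical helper).

    blocks   -- iterable of (start, end, singing); times are decimal strings in
                seconds, singing is "0"/"1" (or 0/1)
    n_frames -- number of MFCC columns, e.g. mfcc.shape[1]

    Assumes librosa's default center=True, so frame i is centred at
    time i * hop_length / sr.
    """
    labels = [0] * n_frames  # the length comes from the audio, not the text file
    for start, end, singing in blocks:
        # Frame i belongs to [start, end) when start <= i * hop / sr < end,
        # i.e. ceil(start * sr / hop) <= i < ceil(end * sr / hop).
        first = math.ceil(Fraction(start) * sr / hop_length)
        stop = math.ceil(Fraction(end) * sr / hop_length)
        for i in range(max(first, 0), min(stop, n_frames)):
            labels[i] = int(singing)
    return labels


blocks = [("0.0", "0.128", "0"), ("0.128", "0.320", "1")]
print(labels_for_frames(blocks, 6))  # [0, 0, 1, 1, 1, 0]
```

Frames not covered by any block keep the label 0, and a frame whose centre falls exactly on a boundary goes to the block that starts there.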


2 Answers


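Use the fractions module (its Fraction class) from the standard library: https://docs.python.org/2/library/fractions.html

It provides exact rational-number arithmetic, so if you build the values from the timestamp strings, converting times to frame numbers involves no floating-point rounding.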
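
A minimal sketch of that suggestion, assuming the timestamps in textCorpus are decimal strings in seconds (as the question's float(block[0]) suggests). Passing the string straight to Fraction keeps the value exact; passing a float would carry the float's binary rounding into the Fraction. time_to_frame is a hypothetical helper name.

```python
import math
from fractions import Fraction

SR = 16000         # sample rate used in the question
HOP_LENGTH = 1024  # hop_length used in the question


def time_to_frame(t: str, sr: int = SR, hop_length: int = HOP_LENGTH) -> int:
    """Map a timestamp in seconds (given as a decimal string) to a frame index.

    Fraction("237.312") is exactly 237312/1000, so the multiplication and
    division below involve no rounding; math.floor is the only rounding step.
    """
    return math.floor(Fraction(t) * sr / hop_length)


print(time_to_frame("0.064"))    # 1  (exactly one hop)
print(time_to_frame("237.312"))  # 3708
```

Exact arithmetic removes rounding surprises at block boundaries, but on its own it does not add the extra frame that librosa's centred framing produces, so the label vector can still be one frame shorter than the MFCC matrix.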

answered 2018-04-05T18:43:25.750

If your blocks are in order, you may be able to drop the multiplication and division and step through them with simple addition instead:

    def labelToFrames(textCorpus):
        output    = []
        offset    = 0.0
        increment = 0.064                # seconds per hop: 1024 / 16000
        for block in textCorpus:
            # block[0] (start) is not needed: blocks are assumed sorted and contiguous
            block_end = float(block[1])  # timestamps may be strings, so convert
            singing   = block[2]
            while offset < block_end:
                ms_start = '{0:.3f}'.format(offset)
                offset   = min(block_end, offset + increment)  # clip the last frame to the block end
                ms_end   = '{0:.3f}'.format(offset)
                output.append([ms_start, ms_end, singing])
        return output

answered 2018-04-06T01:31:48.210