这可能不是一个答案,这只是我在开始研究@jonnor 和@paul-john-leonard 的答案之前到达的地方。
我正在查看您可以通过使用 librosastft
和获得的频谱图amplitude_to_db
,并认为如果我将进入图表的数据进行一些四舍五入,我可能会发现正在播放的 1 声音效果:
https://librosa.github.io/librosa/generated/librosa.display.specshow.html
我在下面编写的代码很有效;虽然它:
确实会返回很多误报,这可以通过调整被认为匹配的参数来解决。
我需要用可以一次性解析、舍入和进行匹配检查的东西替换 librosa 函数;作为一个 3 小时的音频文件,导致 python 在具有 16GB RAM 的计算机上用完大约 30 分钟后内存不足,甚至还没有达到舍入位。
import sys
import numpy
import librosa
#--------------------------------------------------
if len(sys.argv) == 3:
source_path = sys.argv[1]
sample_path = sys.argv[2]
else:
print('Missing source and sample files as arguments');
sys.exit()
#--------------------------------------------------
print('Load files')
source_series, source_rate = librosa.load(source_path) # The 3 hour file
sample_series, sample_rate = librosa.load(sample_path) # The 1 second file
source_time_total = float(len(source_series) / source_rate);
#--------------------------------------------------
print('Parse Data')
source_data_raw = librosa.amplitude_to_db(abs(librosa.stft(source_series, hop_length=64)))
sample_data_raw = librosa.amplitude_to_db(abs(librosa.stft(sample_series, hop_length=64)))
sample_height = sample_data_raw.shape[0]
#--------------------------------------------------
print('Round Data') # Also switches X and Y indexes, so X becomes time.
def round_data(raw, height):
length = raw.shape[1]
data = [];
range_length = range(1, (length - 1))
range_height = range(1, (height - 1))
for x in range_length:
x_data = []
for y in range_height:
# neighbours = []
# for a in [(x - 1), x, (x + 1)]:
# for b in [(y - 1), y, (y + 1)]:
# neighbours.append(raw[b][a])
#
# neighbours = (sum(neighbours) / len(neighbours));
#
# x_data.append(round(((raw[y][x] + raw[y][x] + neighbours) / 3), 2))
x_data.append(round(raw[y][x], 2))
data.append(x_data)
return data
source_data = round_data(source_data_raw, sample_height)
sample_data = round_data(sample_data_raw, sample_height)
#--------------------------------------------------
sample_data = sample_data[50:268] # Temp: Crop the sample_data (318 to 218)
#--------------------------------------------------
source_length = len(source_data)
sample_length = len(sample_data)
sample_height -= 2;
source_timing = float(source_time_total / source_length);
#--------------------------------------------------
print('Process series')
hz_diff_match = 18 # For every comparison, how much of a difference is still considered a match - With the Source, using Sample 2, the maximum diff was 66.06, with an average of ~9.9
hz_match_required_switch = 30 # After matching "start" for X, drop to the lower "end" requirement
hz_match_required_start = 850 # Out of a maximum match value of 1023
hz_match_required_end = 650
hz_match_required = hz_match_required_start
source_start = 0
sample_matched = 0
x = 0;
while x < source_length:
hz_matched = 0
for y in range(0, sample_height):
diff = source_data[x][y] - sample_data[sample_matched][y];
if diff < 0:
diff = 0 - diff
if diff < hz_diff_match:
hz_matched += 1
# print(' {} Matches - {} @ {}'.format(sample_matched, hz_matched, (x * source_timing)))
if hz_matched >= hz_match_required:
sample_matched += 1
if sample_matched >= sample_length:
print(' Found @ {}'.format(source_start * source_timing))
sample_matched = 0 # Prep for next match
hz_match_required = hz_match_required_start
elif sample_matched == 1: # First match, record where we started
source_start = x;
if sample_matched > hz_match_required_switch:
hz_match_required = hz_match_required_end # Go to a weaker match requirement
elif sample_matched > 0:
# print(' Reset {} / {} @ {}'.format(sample_matched, hz_matched, (source_start * source_timing)))
x = source_start # Matched something, so try again with x+1
sample_matched = 0 # Prep for next match
hz_match_required = hz_match_required_start
x += 1
#--------------------------------------------------