我一直在尝试训练音频分类模型。当我使用 learning_rate=0.01、momentum=0.0 和 nesterov=False 的 SGD 时,我得到以下损失和准确度图:
我无法弄清楚是什么原因导致损失在 750 轮左右立即减少。我尝试了不同的学习率、动量值及其组合、不同的批量大小、初始层权重等以获得更合适的图表,但完全没有运气. 因此,如果您对导致此问题的原因有任何了解,请告诉我。
我用于此培训的代码如下:
# MFCCs Model
x = tf.keras.layers.Dense(units=512, activation="sigmoid")(mfcc_inputs)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Dense(units=256, activation="sigmoid")(x)
x = tf.keras.layers.Dropout(0.5)(x)
# Spectrograms Model
y = tf.keras.layers.Conv2D(32, kernel_size=(3,3), strides=(2,2))(spec_inputs)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Conv2D(64, kernel_size=(3,3), strides=(1,1), padding="same")(y)
y = tf.keras.layers.AveragePooling2D(pool_size=(2,2), strides=(2,2))(y)
y = tf.keras.layers.BatchNormalization()(y)
y = tf.keras.layers.Activation("sigmoid")(y)
y = tf.keras.layers.Flatten()(y)
y = tf.keras.layers.Dense(units=256, activation="sigmoid")(y)
y = tf.keras.layers.Dropout(0.5)(y)
# Chroma Model
t = tf.keras.layers.Dense(units=512, activation="sigmoid")(chroma_inputs)
t = tf.keras.layers.Dropout(0.5)(t)
t = tf.keras.layers.Dense(units=256, activation="sigmoid")(t)
t = tf.keras.layers.Dropout(0.5)(t)
# Merge Models
concated = tf.keras.layers.concatenate([x, y, t])
# Dense and Output Layers
z = tf.keras.layers.Dense(64, activation="sigmoid")(concated)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(64, activation="sigmoid")(z)
z = tf.keras.layers.Dropout(0.5)(z)
z = tf.keras.layers.Dense(1, activation="sigmoid")(z)
mdl = tf.keras.Model(inputs=[mfcc_inputs, spec_inputs, chroma_inputs], outputs=z)
mdl.compile(optimizer=SGD(), loss="binary_crossentropy", metrics=["accuracy"])
mdl.fit([M_train, X_train, C_train], y_train, batch_size=8, epochs=1000, validation_data=([M_val, X_val, C_val], y_val), callbacks=[tensorboard_cb])