问题
我一直在尝试用 tensorflow 训练 TFBertForMaskedLM 模型。但是当我使用 model.fit() 时总是会遇到一些问题。希望有人可以提供帮助并提出一些解决方案。
参考论文和样本输出
论文题目是“Conditional Bert for Contextual Augmentation”。简而言之,只需将 type_token_ids 更改为 label_ids 即可。如果句子的标签为5,长度为10,max_sequence_length = 16。它将处理输出如下:
input_ids = [101, 523, 791, 3189, 677, 5221, 524, 1920, 686, 102, 0, 0, 0, 0, 0, 0]
attention_mask = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
token_type_ids = [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0]
labels = [-100, -100, 791, -100, 677, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
环境
- 张量流 == 2.2.0
- 拥抱脸 == 3.5.0
- 数据集 == 1.1.2
- 数据集总标签为 5。(1~5)
- 显卡:GCP P100 * 1
数据集输出(max_sequence_length=128,batch_size=1)
{'attention_mask': <tf.Tensor: shape=(128,), dtype=int32, numpy=
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)>,
'input_ids': <tf.Tensor: shape=(128,), dtype=int32, numpy=
array([ 101, 523, 791, 3189, 677, 5221, 524, 1920, 686,
4518, 6240, 103, 2466, 2204, 2695, 100, 519, 5064,
1918, 736, 2336, 520, 103, 2695, 1564, 4923, 8013,
678, 6734, 8038, 8532, 131, 120, 120, 8373, 119,
103, 9989, 103, 8450, 120, 103, 120, 12990, 8921,
8165, 102, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0], dtype=int32)>,
'labels': <tf.Tensor: shape=(128,), dtype=int32, numpy=
array([-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
4634, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
4158, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, 8429, -100, 119, -100, -100, 100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100], dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(128,), dtype=int32, numpy=
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)>}
型号代码
from transformers import AdamWeightDecay, TFBertForMaskedLM, BertConfig
def create_model():
configuration = BertConfig.from_pretrained('bert-base-chinese')
model = TFBertForMaskedLM.from_pretrained('bert-base-chinese',
config=configuration)
model.bert.embeddings.token_type_embeddings = tf.keras.layers.Embedding(5, 768,
embeddings_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.02))
return model
model = create_model()
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = [tf.keras.metrics.Mean(), tf.keras.metrics.SparseCategoricalAccuracy('accuracy')]
model.compile(optimizer = optimizer,
loss = loss,
metrics = metrics)
model.fit(tf_sms_dataset,
epochs=1,
verbose=1)
使用 TFBertForMaskedLM 时的警告消息
Some layers from the model checkpoint at bert-base-chinese were not used when initializing TFBertForMaskedLM: ['nsp___cls']
- This IS expected if you are initializing TFBertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertForMaskedLM were initialized from the model checkpoint at bert-base-chinese.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertForMaskedLM for predictions without further training.
错误信息
ValueError Traceback (most recent call last)
<ipython-input-42-99b78906fef7> in <module>()
5 model.fit(tf_sms_dataset,
6 epochs=1,
----> 7 verbose=1)
10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
64 def _method_wrapper(self, *args, **kwargs):
65 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
---> 66 return method(self, *args, **kwargs)
67
68 # Running inside `run_distribute_coordinator` already.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
846 batch_size=batch_size):
847 callbacks.on_train_batch_begin(step)
--> 848 tmp_logs = train_function(iterator)
849 # Catch OutOfRangeError for Datasets of unknown size.
850 # This blocks until the batch has finished executing.
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
578 xla_context.Exit()
579 else:
--> 580 result = self._call(*args, **kwds)
581
582 if tracing_count == self._get_tracing_count():
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
625 # This is the first call of __call__, so we have to initialize.
626 initializers = []
--> 627 self._initialize(args, kwds, add_initializers_to=initializers)
628 finally:
629 # At this point we know that the initialization is complete (or less
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
504 self._concrete_stateful_fn = (
505 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 506 *args, **kwds))
507
508 def invalid_creator_scope(*unused_args, **unused_kwds):
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2444 args, kwargs = None, None
2445 with self._lock:
-> 2446 graph_function, _, _ = self._maybe_define_function(args, kwargs)
2447 return graph_function
2448
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
2775
2776 self._function_cache.missed.add(call_context_key)
-> 2777 graph_function = self._create_graph_function(args, kwargs)
2778 self._function_cache.primary[cache_key] = graph_function
2779 return graph_function, args, kwargs
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
2665 arg_names=arg_names,
2666 override_flat_arg_shapes=override_flat_arg_shapes,
-> 2667 capture_by_value=self._capture_by_value),
2668 self._function_attributes,
2669 # Tell the ConcreteFunction to clean up its graph once it goes out of
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
979 _, original_func = tf_decorator.unwrap(python_func)
980
--> 981 func_outputs = python_func(*func_args, **func_kwargs)
982
983 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
439 # __wrapped__ allows AutoGraph to swap in a converted function. We give
440 # the function a weak reference to itself to avoid a reference cycle.
--> 441 return weak_wrapped_fn().__wrapped__(*args, **kwds)
442 weak_wrapped_fn = weakref.ref(wrapped_fn)
443
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function *
outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:541 train_step **
self.trainable_variables)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1804 _minimize
trainable_variables))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
([v.name for _, v in grads_and_vars],))
ValueError: No gradients provided for any variable: ['tf_bert_for_masked_lm_2/bert/embeddings/word_embeddings/weight:0', 'tf_bert_for_masked_lm_2/bert/embeddings/position_embeddings/embeddings:0', 'tf_bert_for_masked_lm_2/bert/embeddings/LayerNorm/gamma:0', 'tf_bert_for_masked_lm_2/bert/embeddings/LayerNorm/beta:0', 'tf_bert_for_masked_lm_2/bert/embeddings/embedding_1/embeddings:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/query/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/query/bias:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/key/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/key/bias:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/value/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/self/value/bias:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/output/dense/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/output/dense/bias:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/output/LayerNorm/gamma:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/attention/output/LayerNorm/beta:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/intermediate/dense/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/intermediate/dense/bias:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/output/dense/kernel:0', 'tf_bert_for_masked_lm_2/bert/encoder/layer_._0/output/dense/bias:0', 'tf_bert_f...
有谁能帮忙。我会非常感谢。
其他测试
我用英语句子来测试。示例如下:
from transformers import TFBertForMaskedLM, BertConfig
def create_model():
configuration = BertConfig.from_pretrained('bert-base-uncased')
model = TFBertForMaskedLM.from_pretrained('bert-base-uncased',
config=configuration)
return model
model = create_model()
eng_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
token_info = eng_tokenizer(text="We are very happy to show you the Transformers library.", padding='max_length', max_length=20)
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metrics = [tf.keras.metrics.Mean(), tf.keras.metrics.SparseCategoricalAccuracy("acc")]
dataset = tf.data.Dataset.from_tensor_slices(dict(token_info))
dataset = dataset.batch(1).prefetch(tf.data.experimental.AUTOTUNE)
model.compile(optimizer = optimizer,
loss = model.compute_loss,
metrics = metrics)
model.fit(dataset)
token_info 输出数据集
{
'input_ids': [101, 2057, 2024, 2200, 103, 2000, 2265, 2017, 103, 100, 19081, 3075, 1012, 102, 0, 0, 0, 0, 0, 0]
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
'labels': [-100, -100, -100, -100, 3407, -100, -100, -100, 1996, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100]
}
得到同样的错误......
ValueError: in user code:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function *
outputs = self.distribute_strategy.run(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run **
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
return fn(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:541 train_step **
self.trainable_variables)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:1804 _minimize
trainable_variables))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:521 _aggregate_gradients
filtered_grads_and_vars = _filter_grads(grads_and_vars)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:1219 _filter_grads
([v.name for _, v in grads_and_vars],))
ValueError: No gradients provided for any variable: ['tf_bert_for_masked_lm_2/bert/embeddings/word_embeddings/weight:0', 'tf_bert_for_masked_lm_2/bert/embeddings/position_embeddings/embeddings:0', 'tf_bert_for_masked_lm_2/bert/embeddings/token_type_embeddings/embeddings:0', 'tf_bert_for_masked_lm_2/bert/embeddings/LayerNorm/gamma:0', 'tf_bert_for_masked_lm_2/bert/embeddings/LayerNorm/beta:0',
我不确定将 fit() 集成到模型中是否存在问题?