I know that GPT's attention is masked multi-head attention. I have a pretrained GPT model, and I want to assign its weights to the encoder of an EncoderDecoderModel, like this:

from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained("pretrained_gpt_name", "pretrained_for_decoder", tie_encoder_decoder=True)
# Change masked attention to self-attention here to make the encoder bidirectional and copy the weights.
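
There is no public transformers API for that last step, but one common workaround is to overwrite the causal-mask buffer inside each attention layer of the encoder. Below is a minimal sketch, assuming the encoder checkpoint is GPT-2 and that GPT2Attention keeps its causal mask in a registered buffer named bias (a private implementation detail that may change between library versions); the checkpoint names are placeholders:

from transformers import EncoderDecoderModel

# "gpt2" stands in for your pretrained checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained("gpt2", "gpt2")

# GPT2Attention applies its causal mask from a lower-triangular boolean
# buffer called `bias`. Filling it with True lets every position attend
# to every other position, i.e. the encoder's self-attention becomes
# bidirectional. This relies on transformers internals, so check it
# still holds for your library version.
for block in model.encoder.h:
    block.attn.bias.fill_(True)

The pretrained GPT weights are already copied by from_encoder_decoder_pretrained; only the mask is changed. Two caveats: bias is a buffer, not a parameter, so even with tie_encoder_decoder=True the decoder's causal mask should remain intact, but verify this on your version; and since GPT was trained left-to-right, the unmasked encoder will still need fine-tuning before its bidirectional attention is useful.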

I'm new to transformers and PyTorch.
