I have used BERT with HuggingFace and PyTorch for training and evaluation, feeding the data through a DataLoader. Below is the code:
!pip install transformers==3.5.1

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader
from transformers import AutoModel, BertTokenizerFast

bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

def textToTensor(text, labels=None, paddingLength=30):
    # tokenize, then pad/truncate every text to paddingLength
    tokens = tokenizer.batch_encode_plus(text.tolist(), max_length=paddingLength,
                                         padding='max_length', truncation=True)
    text_seq = torch.tensor(tokens['input_ids'])
    text_mask = torch.tensor(tokens['attention_mask'])
    text_y = None
    if isinstance(labels, np.ndarray):  # only build a label tensor when labels are given
        text_y = torch.tensor(labels.tolist())
    return text_seq, text_mask, text_y
text = test_df['text'].values
seq, mask, _ = textToTensor(text, paddingLength=35)
data = TensorDataset(seq, mask)
dataloader = DataLoader(data, batch_size=1)

# device and the fine-tuned model come from the training step (not shown)
for step, batch in enumerate(dataloader):
    batch = [t.to(device) for t in batch]
    sent_id, mask = batch
    with torch.no_grad():
        print(np.argmax(model(sent_id, mask).detach().cpu().numpy(), 1))
This gives me a numpy array as the result, and since batch_size=1 and no Serializer are used here, I get the class prediction for each row as a single-element array.

I have two questions:
Are the results strictly in the order of the indices of df['text']?
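For context on the first question, a DataLoader leaves shuffle=False by default, so with no sampler it yields rows in dataset order. A minimal check with toy data (the variable names here are illustrative, not from the code above):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-in for the tokenized inputs: row i holds the value i,
# so the order batches come back in is easy to inspect.
seq = torch.arange(5).unsqueeze(1)
data = TensorDataset(seq)
dataloader = DataLoader(data, batch_size=1)  # shuffle defaults to False

order = [batch[0].item() for batch in dataloader]
print(order)  # rows come back in dataset order: [0, 1, 2, 3, 4]
```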
How can I get the prediction for a single sentence, e.g. "hello make my prediction. I am a single sentence"? Can someone help me make such a prediction?
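One possible sketch for the single-sentence case, assuming the tokenizer and the fine-tuned model from above are in scope (predictSingle is a hypothetical helper, not part of the original code): wrap the lone sentence in a list so batch_encode_plus sees a batch of one, then run the same inference step as the loop above.

```python
import numpy as np
import torch

def predictSingle(sentence, tokenizer, model, device, paddingLength=35):
    # batch_encode_plus expects a list of texts, so wrap the single sentence
    tokens = tokenizer.batch_encode_plus([sentence], max_length=paddingLength,
                                         padding='max_length', truncation=True)
    sent_id = torch.tensor(tokens['input_ids']).to(device)
    mask = torch.tensor(tokens['attention_mask']).to(device)
    with torch.no_grad():
        logits = model(sent_id, mask).detach().cpu().numpy()
    # logits has shape (1, num_classes); take the argmax of the single row
    return int(np.argmax(logits, axis=1)[0])
```

Usage would then be something like predictSingle("hello make my prediction. I am a single sentence", tokenizer, model, device).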