我使用 sklearn 创建了一个使用 Tf-Idf 的文本分类器,并且我想使用 BERT 和 Elmo 嵌入而不是 Tf-Idf。
如何做到这一点?
我正在使用以下代码嵌入 Bert:
from flair.data import Sentence
from flair.embeddings import TransformerWordEmbeddings
# init embedding
embedding = TransformerWordEmbeddings('bert-base-uncased')
# create a sentence
sentence = Sentence('The grass is green .')
# embed words in sentence
embedding.embed(sentence)
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
column_trans = ColumnTransformer([
('tfidf', TfidfVectorizer(), 'text'),
('number_scaler', MinMaxScaler(), ['number'])
])
# Initialize data
data = [
['This process, however, afforded me no means of.', 20, 1],
['another long description', 21, 1],
['It never once occurred to me that the fumbling', 19, 0],
['How lovely is spring As we looked from Windsor', 18, 0]
]
# Create DataFrame
df = pd.DataFrame(data, columns=['text', 'number', 'target'])
X = column_trans.fit_transform(df)
X = X.toarray()
y = df.loc[:, "target"].values
# Perform classification
classifier = LogisticRegression(random_state=0)
classifier.fit(X, y)