I am trying to run the AnchorText explainer on a ktrain model for binary text classification. When I predict on a sample, I can see that the model returns a classification as expected. However, when I pass the same variable into explain_instance, it throws an error. I have included print statements showing all of the intermediate results.
from anchor import anchor_text
import spacy

spacy_nlp = spacy.load('en_core_web_sm')

sample_ids = [75]
for idx in sample_ids:
    # Inspect the sample and confirm the model can score it on its own.
    print('Index: %d, Feature: %s' % (idx, test_data.description.values[idx]))
    print('CLASSA Label: %s' % (test_data.CLASSA.values[idx]))
    print('CLASSB Label: %s' % (test_data.CLASSB.values[idx]))
    print(explainer)
    print(predictor.predict_proba(test_data.description.values[idx]))

    # Build the explainer and explain this instance -- this is the call
    # that raises the ValueError below.
    explainer = anchor_text.AnchorText(spacy_nlp, class_names=labels,
                                       use_unk_distribution=True)
    exp = explainer.explain_instance(test_data.description.values[idx],
                                     predictor.predict_proba, threshold=0.8,
                                     use_proba=True, batch_size=30)
    # max_pred = 2
    print('Key Signal from Anchors: %s' % (' AND '.join(exp.names())))
    print('Precision: %.2f' % exp.precision())
    print()
    exp.show_in_notebook()
I get the following output and error:
Index: 75, Feature: While he didnt do it himself those who collected and compiled the information within this text classification dataset repository did so on the merit of his research. The information contained within features social networking data product review data social circles data, and of course question/answer data
CLASSA Label: 1
CLASSB Label: 0
<anchor.anchor_text.AnchorText object at 0x000001675D870D08>
[0.08205518 0.91794485]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-141-4f25bcbdc43f> in <module>
11
12 explainer = anchor_text.AnchorText(spacy_nlp,class_names=labels, use_unk_distribution=True)
---> 13 exp = explainer.explain_instance(test_data.description.values[idx], predictor.predict_proba, threshold=0.8, use_proba=True, batch_size=30)
14
15 #max_pred = 2
C:\conda\lib\site-packages\anchor\anchor_text.py in explain_instance(self, text, classifier_fn, threshold, delta, tau, batch_size, onepass, use_proba, beam_size, **kwargs)
178 sample_fn, delta=delta, epsilon=tau, batch_size=batch_size,
179 desired_confidence=threshold, stop_on_first=True,
--> 180 coverage_samples=1, **kwargs)
181 exp['names'] = [words[x] for x in exp['feature']]
182 exp['positions'] = [positions[x] for x in exp['feature']]
C:\conda\lib\site-packages\anchor\anchor_base.py in anchor_beam(sample_fn, delta, epsilon, batch_size, min_shared_samples, desired_confidence, beam_size, verbose, epsilon_stop, min_samples_start, max_anchor_size, verbose_every, stop_on_first, coverage_samples)
277 (raw_data, np.zeros((prealloc_size, raw_data.shape[1]),
278 raw_data.dtype)))
--> 279 labels = np.hstack((labels, np.zeros(prealloc_size, labels.dtype)))
280 n_features = data.shape[1]
281 state = {'t_idx': collections.defaultdict(lambda: set()),
<__array_function__ internals> in hstack(*args, **kwargs)
C:\conda\lib\site-packages\numpy\core\shape_base.py in hstack(tup)
343 return _nx.concatenate(arrs, 0)
344 else:
--> 345 return _nx.concatenate(arrs, 1)
346
347
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
I am not sure exactly what dimensions explain_instance expects from the classifier function here.
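For reference, the traceback suggests that anchor_base.anchor_beam hstacks the classifier output with a 1-D zero array, so explain_instance may want a function that returns a 1-D array of predicted class indices rather than the 2-D (n_samples, n_classes) array that predict_proba returns for a batch of texts. A minimal wrapper sketch along those lines (untested, assuming predictor.predict_proba accepts a list of strings, and dropping use_proba=True since the wrapper returns hard labels):

import numpy as np

def predict_labels(texts):
    # predict_proba returns an (n_samples, n_classes) probability array for a
    # list of texts (or a 1-D array for a single string, hence atleast_2d);
    # reduce it to a 1-D array of predicted class indices.
    probs = np.atleast_2d(predictor.predict_proba(list(texts)))
    return probs.argmax(axis=1)

exp = explainer.explain_instance(test_data.description.values[idx],
                                 predict_labels, threshold=0.8, batch_size=30)

Is that the right shape, or does use_proba=True change what the classifier function is supposed to return?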