I am trying to run the AnchorText explainer on a ktrain model for binary text classification. When I predict on its own, I can see the model produces a classification. However, when I pass the same input into explain_instance, it throws an error. I have included all intermediate results via print statements.

from anchor import anchor_text
import spacy
spacy_nlp = spacy.load('en_core_web_sm')
sample_ids = [75]

for idx in sample_ids:
    
    print('Index: %d, Feature: %s' % (idx, test_data.description.values[idx]))
    print('CLASSA Label: %s' % (test_data.CLASSA.values[idx]))
    print('CLASSB Label: %s' % (test_data.CLASSB.values[idx]))
    print(explainer)
    print(predictor.predict_proba(test_data.description.values[idx]))
    
    explainer = anchor_text.AnchorText(spacy_nlp, class_names=labels, use_unk_distribution=True)
    exp = explainer.explain_instance(test_data.description.values[idx], predictor.predict_proba, threshold=0.8, use_proba=True, batch_size=30)

    #max_pred = 2
    print('Key Signal from Anchors: %s' % (' AND '.join(exp.names())))
    print('Precision: %.2f' % exp.precision())
    print()

    exp.show_in_notebook()

I get the following output and error:

Index: 75, Feature: While he didnt do it himself those who collected and compiled the information within this text classification dataset repository did so on the merit of his research. The information contained within features social networking data product review data social circles data, and of course question/answer data
CLASSA Label: 1
CLASSB Label: 0
<anchor.anchor_text.AnchorText object at 0x000001675D870D08>
[0.08205518 0.91794485]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-141-4f25bcbdc43f> in <module>
     11 
     12     explainer = anchor_text.AnchorText(spacy_nlp,class_names=labels, use_unk_distribution=True)
---> 13     exp = explainer.explain_instance(test_data.description.values[idx], predictor.predict_proba, threshold=0.8, use_proba=True, batch_size=30)
     14 
     15     #max_pred = 2

C:\conda\lib\site-packages\anchor\anchor_text.py in explain_instance(self, text, classifier_fn, threshold, delta, tau, batch_size, onepass, use_proba, beam_size, **kwargs)
    178             sample_fn, delta=delta, epsilon=tau, batch_size=batch_size,
    179             desired_confidence=threshold, stop_on_first=True,
--> 180             coverage_samples=1, **kwargs)
    181         exp['names'] = [words[x] for x in exp['feature']]
    182         exp['positions'] = [positions[x] for x in exp['feature']]

C:\conda\lib\site-packages\anchor\anchor_base.py in anchor_beam(sample_fn, delta, epsilon, batch_size, min_shared_samples, desired_confidence, beam_size, verbose, epsilon_stop, min_samples_start, max_anchor_size, verbose_every, stop_on_first, coverage_samples)
    277             (raw_data, np.zeros((prealloc_size, raw_data.shape[1]),
    278                                 raw_data.dtype)))
--> 279         labels = np.hstack((labels, np.zeros(prealloc_size, labels.dtype)))
    280         n_features = data.shape[1]
    281         state = {'t_idx': collections.defaultdict(lambda: set()),

<__array_function__ internals> in hstack(*args, **kwargs)

C:\conda\lib\site-packages\numpy\core\shape_base.py in hstack(tup)
    343         return _nx.concatenate(arrs, 0)
    344     else:
--> 345         return _nx.concatenate(arrs, 1)
    346 
    347 

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)
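The failing call is np.hstack inside anchor_base.py, which stacks the accumulated labels array with a fresh 1-D zero buffer. A minimal sketch reproducing the mismatch (using the probability row printed above; the fix via argmax is my assumption about what the library expects, not confirmed):

```python
import numpy as np

# `labels` ends up 2-D when the classifier_fn returns probabilities,
# e.g. shape (1, 2), while the preallocated buffer is always 1-D.
labels_2d = np.array([[0.08205518, 0.91794485]])   # what predict_proba returns
buffer_1d = np.zeros(5, dtype=labels_2d.dtype)

try:
    np.hstack((labels_2d, buffer_1d))
except ValueError as e:
    print(e)  # same "must have same number of dimensions" error as above

# A 1-D array of predicted class ids stacks without error:
labels_1d = np.argmax(labels_2d, axis=1)           # shape (1,)
print(np.hstack((labels_1d, buffer_1d)).shape)     # (6,)
```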

I'm not sure what dimensions explain_instance actually expects. What shape should the classifier function return?
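For reference, here is a sketch of a wrapper that flattens the 2-D probabilities into 1-D class ids, which is the shape I suspect anchor_base's label bookkeeping assumes. The stub predictor and the name predict_label are my own for illustration; the real call would use the ktrain predictor:

```python
import numpy as np

# Stub standing in for the ktrain predictor (assumption, for illustration only).
class _StubPredictor:
    def predict_proba(self, texts):
        # Returns one probability row per input text, shape (n, 2).
        return np.tile([0.08, 0.92], (len(texts), 1))

predictor = _StubPredictor()

# Hypothetical wrapper: convert 2-D probabilities to 1-D predicted class ids.
def predict_label(texts):
    probs = np.asarray(predictor.predict_proba(texts))  # shape (n, 2)
    return np.argmax(probs, axis=1)                     # shape (n,)

print(predict_label(["some text", "another text"]))  # [1 1]

# Then, presumably:
# exp = explainer.explain_instance(text, predict_label, threshold=0.8, batch_size=30)
```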
