我写了一个注释器,可以提取所有带有 CD 标记的令牌,代码如下所示
public class WeightAnnotator extends JCasAnnotator_ImplBase {
private Logger logger = Logger.getLogger(getClass().getName());
public static List<Token> weightTokens = new ArrayList<Token>();
public static final String PARAM_STRING = "stringParam";
@ConfigurationParameter(name = PARAM_STRING)
private String stringParam;
@Override
public void initialize(UimaContext context) throws ResourceInitializationException {
super.initialize(context);
}
@Override
public void process(JCas jCas) throws AnalysisEngineProcessException {
logger.info("Starting processing.");
for (Sentence sentence : JCasUtil.select(jCas, Sentence.class)) {
List<Token> tokens = JCasUtil.selectCovered(jCas, Token.class, sentence);
for (Token token : tokens) {
int begin = token.getBegin();
int end = token.getEnd();
if (token.getPos().equals( PARAM_STRING)) {
WeightAnnotation ann = new WeightAnnotation(jCas, begin, end);
ann.addToIndexes();
System.out.println("Token: " + token.getCoveredText());
}
}
}
}
}
但是当我试图在管道中创建一个迭代器时,迭代器返回 null。这是我的管道的外观。
AnalysisEngineDescription weightDescriptor = AnalysisEngineFactory.createEngineDescription(WeightAnnotator.class,
WeightAnnotator.PARAM_STRING, "CD"
);
AggregateBuilder builder = new AggregateBuilder();
builder.add(sentenceDetectorDescription);
builder.add(tokenDescriptor);
builder.add(posDescriptor);
builder.add(weightDescriptor);
builder.add(writer);
for (JCas jcas : SimplePipeline.iteratePipeline(reader, builder.createAggregateDescription())) {
Iterator iterator1 = JCasUtil.iterator(jcas, WeightAnnotation.class);
while (iterator1.hasNext()) {
WeightAnnotation weights = (WeightAnnotation) iterator1.next();
logger.info("Token: " + weights.getCoveredText());
}
}
我使用 JCasGen 生成了 WeightAnnotator 和 WeightAnnotator_Type。我调试了整个代码,但我不明白我在哪里弄错了。任何关于如何改进这一点的想法都值得赞赏。