I am able to annotate web pages using the Portia web crawler. My question is: how can I use a regex while extracting the data?

For example, I have extracted a Location field from a page.

The output looks like this:

Location : Location xyz,abc

But I need only the xyz,abc values.

I have googled for solutions but could not find much information.

Could you explain how regexes work in Portia (Scrapy)?

1 Answer

You need to use a capture group to extract the data, so in this case:

Location: (.*)

This tells Portia to extract everything that follows the Location: string.
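
To see what the capture group does outside Portia, here is a minimal sketch using Python's standard re module. The sample text and variable names are assumptions made for illustration, and Portia's own handling of the pattern may differ in detail.

```python
import re

# Assumed sample text standing in for the content of the annotated field.
text = "Location: xyz,abc"

# The parentheses form a capture group: only the part they match is kept.
match = re.search(r"Location: (.*)", text)

# Fall back to None if the label is not present in the text.
location = match.group(1) if match else None

print(location)
```

Note that the pattern expects exactly "Location: " (a colon followed by one space), so it has to be adjusted if the extracted text is spaced differently.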

For example, if you only want to extract the data between Location: and a comma (,), you can use the following:

Location: (.*),

You can also put that text inside the capture group, so that everything up to and including your pattern is extracted.
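
The difference between keeping the comma outside or inside the group can be checked with a short sketch, again using Python's re module as a stand-in for Portia. The sample strings are assumptions, and the non-greedy variant at the end is an addition not mentioned in the answer.

```python
import re

# Assumed sample text standing in for the content of the annotated field.
text = "Location: xyz,abc"

# Comma outside the group: the captured text stops just before the comma.
before_comma = re.search(r"Location: (.*),", text).group(1)

# Comma inside the group: the captured text includes the comma itself.
with_comma = re.search(r"Location: (.*,)", text).group(1)

# .* is greedy, so with several commas it runs up to the last one.
# The non-greedy form .*? stops at the first comma instead.
multi = "Location: xyz,abc,def"
greedy = re.search(r"Location: (.*),", multi).group(1)
lazy = re.search(r"Location: (.*?),", multi).group(1)

print(before_comma, with_comma, greedy, lazy)
```

If the value can contain more than one comma, it is worth deciding up front whether the greedy or the non-greedy form matches what you want to keep.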

Answered 2015-01-22T16:05:08.570