I currently have this string and want to extract the names, e.g. Garet Hayes, Ronald Allen, and so on.

Executives
Garet Hayes - Director, Public Relations
Ronald Allen - Chief Executive Officer
Gilbert Danielson - Executive Vice President and Chief Financial Officer
Steven Michaels - President
John Robinson - Executive Vice President and President and Chief Executive Officer, Progressive Finance Holdings LLC

I can extract the name Garet Hayes with the following code:

def partiesExtractor(doc):
    executives = []
    executives.append(doc[doc.lower().index('executives') + len('executives') + 1 : doc.index(' -')])
    return executives

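But I feel there should be a more efficient way, even for getting just the first name, let alone the second one or the rest of the list. How can I do this?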

3 Answers

2

You need to split the content into lines, then split each line on the dash and keep the first part:

def partiesExtractor(doc):
    executives = []
    for line in doc.splitlines()[1:]:
        executives.append(line.split("-")[0].strip())
    return executives
    # return [line.split("-")[0].strip() for line in doc.splitlines()[1:]]  # same thing as a list comprehension


text = """Executives
Garet Hayes - Director, Public Relations
Ronald Allen - Chief Executive Officer
Gilbert Danielson - Executive Vice President and Chief Financial Officer
Steven Michaels - President
John Robinson - Executive Vice President and President and Chief Executive Officer, Progressive Finance Holdings LLC"""

print(partiesExtractor(text))  # ['Garet Hayes', 'Ronald Allen', 'Gilbert Danielson', 'Steven Michaels', 'John Robinson']
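Splitting on a bare "-" and skipping only the first line works for the sample, but it would cut a hyphenated name short and would treat a blank or wrapped line as a name. A minimal sketch of a more tolerant variant, assuming every real entry contains the separator " - " (space, hyphen, space); the function name `extract_names` and the sample name "Jean-Luc Moreau" are made up for illustration:

```python
def extract_names(doc, separator=" - "):
    """Return the text before the first separator on each line that has one."""
    names = []
    for line in doc.splitlines():
        # partition() splits on the first occurrence only; sep is "" if absent
        name, sep, _title = line.partition(separator)
        if sep:  # skips the header, blank lines, and wrapped continuation lines
            names.append(name.strip())
    return names


sample = """Executives
Garet Hayes - Director, Public Relations

Jean-Luc Moreau - Chief Technology Officer
Officer, Progressive Finance Holdings LLC"""

print(extract_names(sample))  # ['Garet Hayes', 'Jean-Luc Moreau']
```

Because lines without the separator are skipped, the header needs no special handling here.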

You can also use a regex:

import re

def partiesExtractor(doc):
    return re.findall(r"^[A-Z][a-z]+ [A-Z][a-z]+", doc, flags=re.MULTILINE)
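The pattern above expects exactly two capitalized words, so it would likely miss names with a hyphen, an apostrophe, or a middle initial. One possible alternative, assuming each entry has the form "name - title", is to capture everything from the start of the line up to the first " - ". The names `NAME_PATTERN` and `extract_names_regex`, and the extra sample names, are hypothetical:

```python
import re

# Lazily capture everything from the start of a line up to the first " - ".
# "." does not match a newline, so a line without the separator yields nothing.
NAME_PATTERN = re.compile(r"^(.+?) - ", flags=re.MULTILINE)


def extract_names_regex(doc):
    return NAME_PATTERN.findall(doc)


sample = """Executives
Garet Hayes - Director, Public Relations
Mary-Ann O'Neil - General Counsel
John Q. Robinson - President"""

print(extract_names_regex(sample))
# ['Garet Hayes', "Mary-Ann O'Neil", 'John Q. Robinson']
```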
Answered 2020-12-28T18:10:43.397
1

A solution similar to @azro's, using a list comprehension:

def partiesExtractor(doc):
  return [line.split(" - ")[0] for line in doc.split("\n")[1:]]
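If only the first name, or the first few, are needed, the same comprehension can be written as a generator expression so that each line is split only when it is requested. This is a rough sketch rather than a benchmarked improvement (the text is still split into lines up front); `iter_names` is a made-up name:

```python
from itertools import islice


def iter_names(doc, separator=" - "):
    """Return a generator that produces one name per entry, on demand."""
    lines = doc.splitlines()[1:]  # skip the "Executives" header
    return (line.split(separator)[0] for line in lines)


text = """Executives
Garet Hayes - Director, Public Relations
Ronald Allen - Chief Executive Officer
Gilbert Danielson - Executive Vice President and Chief Financial Officer"""

names = iter_names(text)
print(next(names))             # Garet Hayes
print(list(islice(names, 2)))  # ['Ronald Allen', 'Gilbert Danielson']
```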
Answered 2020-12-28T18:12:13.833
0

Using a regular expression:


import re

s = '''Executives
Garet Hayes - Director, Public Relations
Ronald Allen - Chief Executive Officer
Gilbert Danielson - Executive Vice President and Chief Financial Officer
Steven Michaels - President
John Robinson - Executive Vice President and President and Chief Executive Officer, Progressive Finance Holdings LLC'''

reg = r'(\w+\s\w+)\s-\s'  # raw string, so the backslashes reach the regex engine unchanged

names = re.findall(reg, s)
print(names)

['Garet Hayes', 'Ronald Allen', 'Gilbert Danielson', 'Steven Michaels', 'John Robinson']
Answered 2020-12-28T18:20:21.043