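With the standard module re you can use r'\d+'
import re
re.findall(r'\d+', "ID is 123 or ID is 234 or ID is 345")
to get the list ['123', '234', '345'] (findall returns the matches as strings).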
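To make sure you only match numbers that follow "ID is", you can also use r'ID is (\d+)'. When the pattern contains one group, findall returns only the captured part:
re.findall(r'ID is (\d+)', "ID is 123 or ID is 234 or ID is 345")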
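Because the captured values are strings, you need one extra step if you really want the integers 123, 234, 345. A minimal sketch using a list comprehension (the variable names are only illustrative):

```python
import re

text = "ID is 123 or ID is 234 or ID is 345"

# findall returns the captured group of every match as a string
ids_str = re.findall(r'ID is (\d+)', text)

# convert each captured string to an int
ids = [int(x) for x in ids_str]

print(ids_str)  # ['123', '234', '345']
print(ids)      # [123, 234, 345]
```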
In a DataFrame you can use .str.findall() to do the same for all rows.
import pandas as pd
df = pd.DataFrame({
    'ID': [
        "ID is 123 or ID is 234 or ID is 345",
        "ID is 123 or ID is 567 or ID is 876",
        "ID is 567 or ID is 567 or ID is 298",
    ]
})
print('\n--- before ---\n')
print(df)
df['result'] = df['ID'].str.findall(r'ID is (\d+)')
print('\n--- after ---\n')
print(df)
Result:
--- before ---
ID
0 ID is 123 or ID is 234 or ID is 345
1 ID is 123 or ID is 567 or ID is 876
2 ID is 567 or ID is 567 or ID is 298
--- after ---
ID result
0 ID is 123 or ID is 234 or ID is 345 [123, 234, 345]
1 ID is 123 or ID is 567 or ID is 876 [123, 567, 876]
2 ID is 567 or ID is 567 or ID is 298 [567, 567, 298]
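If you only need the result column as a numpy array, you can get it with df['result'].values (or df['result'].to_numpy()).
If you need a nested list: df['result'].values.tolist() (or simply df['result'].tolist()).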
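The lists produced by .str.findall() contain strings, just like re.findall. A sketch that converts every row's matches to integers before building the nested list, assuming a two-row version of the example DataFrame; the column name result_int is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [
        "ID is 123 or ID is 234 or ID is 345",
        "ID is 123 or ID is 567 or ID is 876",
    ]
})

# each cell becomes a list of digit strings, e.g. ['123', '234', '345']
df['result'] = df['ID'].str.findall(r'ID is (\d+)')

# hypothetical extra column: the same lists with int elements
df['result_int'] = df['result'].apply(lambda xs: [int(x) for x in xs])

# nested Python list, one inner list per row
nested = df['result_int'].tolist()
print(nested)  # [[123, 234, 345], [123, 567, 876]]
```

Using .apply with a list comprehension keeps one list per row; if you would rather have one row per match, that is a different shape and needs a different approach.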