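With the standard module re you can use the pattern r'\d+'
import re
re.findall(r'\d+', "ID is 123 or ID is 234 or ID is 345")
to get the list ['123', '234', '345']. Note that the items are strings, not integers.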
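A minimal sketch of the call above, showing that re.findall returns strings. It also shows one way to convert them if you need numeric values. The variable names are only for illustration.

```python
import re

text = "ID is 123 or ID is 234 or ID is 345"

# re.findall returns every non-overlapping match as a str
ids = re.findall(r"\d+", text)
print(ids)  # ['123', '234', '345']

# Convert each match to int when numeric values are needed
numbers = [int(x) for x in ids]
print(numbers)  # [123, 234, 345]
```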
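To make sure you match only the numbers that follow "ID is", you can also use the pattern r'ID is (\d+)'. When the pattern contains one capturing group, re.findall returns only the text captured by that group:
re.findall(r'ID is (\d+)', "ID is 123 or ID is 234 or ID is 345")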
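To see why the longer pattern is safer, here is a small sketch on a hypothetical string that also contains an unrelated number. The bare \d+ picks up that number too. The pattern with the capturing group returns only the IDs.

```python
import re

# Hypothetical text: "2" is not an ID
text = "ID is 123 or ID is 234, total 2 items"

# A bare \d+ matches every run of digits, including the unrelated one
all_numbers = re.findall(r"\d+", text)
print(all_numbers)  # ['123', '234', '2']

# With a single capturing group, findall returns only the group's text
ids_only = re.findall(r"ID is (\d+)", text)
print(ids_only)  # ['123', '234']
```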
In a DataFrame you can use .str.findall() to do the same for every row:
import pandas as pd

df = pd.DataFrame({
    'ID': [
        "ID is 123 or ID is 234 or ID is 345",
        "ID is 123 or ID is 567 or ID is 876",
        "ID is 567 or ID is 567 or ID is 298",
    ]
})

print('\n--- before ---\n')
print(df)

# For each row, findall returns the list of strings captured by the group
df['result'] = df['ID'].str.findall(r'ID is (\d+)')

print('\n--- after ---\n')
print(df)
Result:
--- before ---

                                    ID
0  ID is 123 or ID is 234 or ID is 345
1  ID is 123 or ID is 567 or ID is 876
2  ID is 567 or ID is 567 or ID is 298

--- after ---

                                    ID           result
0  ID is 123 or ID is 234 or ID is 345  [123, 234, 345]
1  ID is 123 or ID is 567 or ID is 876  [123, 567, 876]
2  ID is 567 or ID is 567 or ID is 298  [567, 567, 298]
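The printed table hides the quotes, but each cell in result is a list of strings. A minimal sketch, assuming you want integers, maps every list with Series.apply. The column name result_int is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [
        "ID is 123 or ID is 234 or ID is 345",
        "ID is 567 or ID is 298",
    ]
})
df["result"] = df["ID"].str.findall(r"ID is (\d+)")

# Each cell holds a list of str; convert every element to int
df["result_int"] = df["result"].apply(lambda ids: [int(x) for x in ids])

print(df["result"].tolist())      # [['123', '234', '345'], ['567', '298']]
print(df["result_int"].tolist())  # [[123, 234, 345], [567, 298]]
```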
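If you only need the result column as a NumPy array, use df['result'].values. This gives a one-dimensional array of dtype object whose elements are lists.
If you need a nested list, use df['result'].values.tolist().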
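A short sketch contrasting the two forms on a small hypothetical frame. The first is a NumPy array holding one list per row. The second is a plain Python list of lists.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["ID is 123 or ID is 234", "ID is 567"]})
df["result"] = df["ID"].str.findall(r"ID is (\d+)")

# 1-D NumPy array with one element (a Python list) per row
arr = df["result"].values
print(type(arr).__name__, arr.shape)  # ndarray (2,)

# Plain nested Python list
nested = df["result"].values.tolist()
print(nested)  # [['123', '234'], ['567']]
```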