My data in a pandas Series:

data = ["1. stock1 (1991)",  
"3. stock13 (1993)",  
"5. stock19 (1999)",  
"89. stock105 (2001)"] # pandas Series

I need to parse each string and save the result as

s.no    sdata       year  
1       stock1      1991  
3       stock13     1993  
5       stock19     1999  
89      stock105    2001 

I tried using

data = stock["Rank & Title"].str.split(".")

1 Answer

You can try the str.extract method with a regular expression:

import pandas as pd

data = ["1. stock1 (1991)",  
"3. stock13 (1993)",  
"5. stock19 (1999)",  
"89. stock105 (2001)"]

s = pd.Series(data)

# raw string, so the regex backslashes are not treated as string escapes
s.str.extract(r"(?P<sno>\d+)\.\s(?P<sdata>\w+)\s\((?P<year>\d+)\)", expand=True)

# sno      sdata    year
#0  1     stock1    1991
#1  3    stock13    1993
#2  5    stock19    1999
#3  89  stock105    2001

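Breaking down the regex: (?P<sno>\d+)\.\s(?P<sdata>\w+)\s\((?P<year>\d+)\) can be simplified to (\d+)\.\s(\w+)\s\((\d+)\) if you do not name the capture groups (naming is done with ?P<name>); the groups (\d+), (\w+) and (\d+) capture s.no, the stock name and the year, respectively.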
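As a rough sketch of the unnamed-group variant: without names, the extracted columns are labeled 0, 1 and 2, so they have to be named afterwards. The conversion of s.no and year to integers is an extra step that is not part of the original answer; it is included on the assumption that numeric columns are wanted.

```python
import pandas as pd

s = pd.Series([
    "1. stock1 (1991)",
    "3. stock13 (1993)",
    "5. stock19 (1999)",
    "89. stock105 (2001)",
])

# Unnamed groups: the resulting columns are labeled 0, 1, 2
df = s.str.extract(r"(\d+)\.\s(\w+)\s\((\d+)\)", expand=True)

# Assign the column names from the question
df.columns = ["s.no", "sdata", "year"]

# str.extract returns strings; convert the numeric columns (assumed to be desired)
df = df.astype({"s.no": int, "year": int})
```

Whether to name the groups in the pattern or rename the columns afterwards is mostly a matter of taste; named groups keep the pattern and the column names in one place.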


Or you may just want to split on whitespace and then clean up the columns, depending on what your real data looks like:

(s.str.split(" ", expand=True)
  # strip period and parenthesis
 .apply(lambda col: col.str.strip(".()"))
  # rename columns
 .rename(columns={0: "s.no", 1: "sdata", 2: "year"}))

# s.no     sdata    year
#0   1    stock1    1991
#1   3   stock13    1993
#2   5   stock19    1999
#3  89  stock105    2001
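One case where the shape of the real data matters: if a title contains spaces, splitting on whitespace produces more than three columns. The sketch below uses a made-up second row ("some stock") purely for illustration and a slightly looser pattern than the one above; both are assumptions, not part of the original data.

```python
import pandas as pd

# Hypothetical data: the second title contains a space
s2 = pd.Series(["1. stock1 (1991)", "7. some stock (2005)"])

# Splitting on a single space yields four columns here instead of three
parts = s2.str.split(" ", expand=True)

# Anchor on the leading number and the trailing parenthesized year,
# and let the title be whatever lies in between
out = s2.str.extract(
    r"(?P<sno>\d+)\.\s+(?P<sdata>.+?)\s+\((?P<year>\d+)\)\s*$",
    expand=True,
)
```

With this pattern the sdata column holds the full title, spaces included, while sno and year are still returned as strings.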
answered 2017-04-24T15:39:02.613