
I have some options chain data:

Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change

AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65

AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"

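The entry "+1,099.55" looks like a bad value in the data, since no other record is formatted this way, and I need to clean it up before inserting into a SQL database. I have tried a few different approaches, but none of them worked. Any insight would be appreciated: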

optionsChainPuts['Change'] = optionsChainPuts['Change'].map(lambda x: x.lstrip('\"+').rstrip('\"'))
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).str.replace('\D', '')
optionsChainPuts['Change'] = optionsChainPuts['Change'].astype(str).map(lambda x: x.replace('"', ''))

Thanks

2 Answers

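The problem is the number that contains a comma and is wrapped in quotes. The quotes are ordinary CSV quoting and are removed by read_csv; what remains is the string "+1,099.55", which cannot be converted to a float directly because of the comma.

Use the locale module to parse the string with English (en_US) number conventions, where the comma is a thousands separator.

Code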

from io import StringIO
import pandas as pd
import locale

s = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

df = pd.read_csv(StringIO(s))

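# Set the locale to US English so that ',' is treated as the thousands separator
# (the 'en_US.UTF-8' locale must be installed on the system, otherwise this raises locale.Error)
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

# Convert the column from strings to floats
df['Change'] = df['Change'].apply(locale.atof)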

print(df['Change'])

Output

0     900.65
1    1099.55
Name: Change, dtype: float64
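
If changing the process-wide locale is undesirable, the same result can probably be had at load time. The following is a minimal sketch, assuming the data is read with pandas.read_csv and that no other column contains commas that should be kept; the variable name csv_text is only illustrative.

```python
from io import StringIO
import pandas as pd

csv_text = '''Contract Name,Last Trade Date,Strike,Last Price,Bid,Ask,Change
AMZN200605P03320000,2020-05-28 3:24PM EDT,3320.0,900.65,876.0,893.5,+900.65
AMZN200605P03500000,2020-05-28 3:51PM EDT,3500.0,1099.55,1055.5,1073.5,"+1,099.55"'''

# thousands=',' asks the parser to treat commas inside numeric fields
# as digit-group separators, so "+1,099.55" is read as a float
df = pd.read_csv(StringIO(csv_text), thousands=',')

print(df['Change'])
```

With this option the Change column should come back as float64 directly, so no cleanup step is needed before the SQL insert.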
answered 2020-06-01T11:43:44.067

It is the comma that causes the problem. One option is to split on the comma and join the pieces back together:

>>> val = "+1,099.55"
>>> val = val.split(",")
>>> num = float("".join(val))  # join every piece, so values with more than one comma also work
>>> num
1099.55
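
The same idea can be applied to a whole pandas column at once. This is a rough sketch, assuming the column holds strings such as those in the question; the Series name change and the third value are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values; the last one is invented to show several commas
change = pd.Series(['+900.65', '+1,099.55', '+1,234,567.89'])

# Remove every comma literally (regex=False), then convert the strings to numbers
cleaned = pd.to_numeric(change.str.replace(',', '', regex=False))

print(cleaned.tolist())
```

Passing regex=False keeps the replacement a plain substring match, which avoids the escaping problems of patterns like '\D'.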

Hope that helps!

answered 2020-05-31T19:32:06.497