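I am trying to run a Welch two-sample t-test on two variables from a pandas DataFrame. Both variables are strings. I am using Jupyter Notebook and have tried many different approaches, all without success.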
import urllib
import json
import pandas as pd
import numpy as np
from collections import Counter
import scipy.stats
from scipy import stats
url = "https://data.cityofnewyork.us/resource/9w7m-hzhe.json"
response = urllib.urlopen(url)
data = json.loads(response.read())
pdData = pd.DataFrame(data)
#remove na
dataB = pdData.dropna()
#remove unnecessary values
gradeYes = ['A', 'B', 'C']
gradeRm = dataB.query('grade==@gradeYes')
print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade']))
A snippet of the DataFrame:
camis | cuisine_description | dba | boro | zipcode | record_date | inspection_date | score | grade | critical_flag | action | violation_code | violation_description | inspection_type
40372466 | American | MURALS ON 54/RANDOLPHS'S | MANHATTAN | 10019 | 2017-04-26T06:00:59.000 | 2016-03-10T00:00:00.000 | 10 | A | Critical | Violations were cited in the following area(s). | 02H | Food not cooled by an approved method whereby ... | Cycle Inspection / Re-inspection
50012352 | Jewish/Kosher | SUSHI FUSSION | QUEENS | 11375 | 2017-04-26T06:00:59.000 | 2015-12-08T00:00:00.000 | 20 | B | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection / Re-inspection
41028194 | Chinese | SAI'S CAFE | BROOKLYN | 11219 | 2017-04-26T06:00:59.000 | 2015-01-02T00:00:00.000 | 13 | A | Not Critical | Violations were cited in the following area(s). | 10I | Single service item reused, improperly stored,... | Cycle Inspection / Re-inspection
The error I get:
TypeError Traceback (most recent call last)
<ipython-input-228-5ba9bcaf819c> in <module>()
1 from scipy import stats
----> 2 print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade']))
/Users/sharonmorris/anaconda/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
4058 return Ttest_indResult(np.nan, np.nan)
4059
-> 4060 v1 = np.var(a, axis, ddof=1)
4061 v2 = np.var(b, axis, ddof=1)
4062 n1 = a.shape[axis]
/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
3124
3125 return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3126 **kwargs)
/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
103 if isinstance(arrmean, mu.ndarray):
104 arrmean = um.true_divide(
--> 105 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
106 else:
107 arrmean = arrmean.dtype.type(arrmean / rcount)
TypeError: unsupported operand type(s) for /: 'unicode' and 'int'
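The traceback shows np.var trying to divide a string by an integer, so ttest_ind appears to need numeric input. For comparison, here is a minimal sketch of a Welch test call that does run. It assumes the goal is to compare the numeric score column between two grade groups; that choice of columns is only an assumption and may not be the right analysis for this data. The rows in toy are made-up values for illustration, not taken from the dataset.

```python
import pandas as pd
from scipy import stats

# Toy rows standing in for the real data (values are made up for illustration).
# The JSON endpoint returns fields as strings, so score starts out as text here too.
toy = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "score": ["10", "13", "9", "12", "20", "22", "18", "25"],
})

# Convert the string scores to numbers; values that cannot be parsed become NaN.
toy["score"] = pd.to_numeric(toy["score"], errors="coerce")

# Split the numeric scores into two independent samples by grade.
scores_a = toy.loc[toy["grade"] == "A", "score"].dropna()
scores_b = toy.loc[toy["grade"] == "B", "score"].dropna()

# equal_var=False selects Welch's t-test rather than Student's t-test.
result = stats.ttest_ind(scores_a, scores_b, equal_var=False)
print(result)
```

Note that inspection_type and grade are both categorical labels, so even after this change there is no mean or variance to compute for them directly.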