
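I am trying to run a Welch two-sample t-test on two variables in a pandas DataFrame. Both variables are strings. I am working in a Jupyter Notebook and have tried many different approaches, with no luck.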

import urllib
import json
import pandas as pd
import numpy as np
from collections import Counter
import scipy.stats
from scipy import stats

url = "https://data.cityofnewyork.us/resource/9w7m-hzhe.json"
response = urllib.urlopen(url)
data = json.loads(response.read())
pdData = pd.DataFrame(data)

#remove na    
dataB = pdData.dropna()

#remove unnecessary values
gradeYes = ['A', 'B', 'C'] 
gradeRm = dataB.query('grade==@gradeYes')

print(scipy.stats.ttest_ind(gradeRm['inspection_type'], gradeRm['grade']))

A snippet of the DataFrame:

camis        cuisine_description    dba     boro    zipcode record_date inspection_date score   grade   critical_flag   action  violation_code  violation_description   inspection_type
40372466    American    MURALS ON 54/RANDOLPHS'S    MANHATTAN   10019   2017-04-26T06:00:59.000 2016-03-10T00:00:00.000 10  A   Critical    Violations were cited in the following area(s). 02H Food not cooled by an approved method whereby ...   Cycle Inspection / Re-inspection
50012352    Jewish/Kosher   SUSHI FUSSION   QUEENS  11375   2017-04-26T06:00:59.000 2015-12-08T00:00:00.000 20  B   Not Critical    Violations were cited in the following area(s). 10I Single service item reused, improperly stored,...   Cycle Inspection / Re-inspection
41028194    Chinese SAI'S CAFE  BROOKLYN    11219   2017-04-26T06:00:59.000 2015-01-02T00:00:00.000 13  A   Not Critical    Violations were cited in the following area(s). 10I Single service item reused, improperly stored,...   Cycle Inspection / Re-inspection    

TypeError                                 Traceback (most recent call   last)
<ipython-input-228-5ba9bcaf819c> in <module>()
   1 from scipy import stats
----> 2 print(scipy.stats.ttest_ind(gradeRm['inspection_type'],  gradeRm['grade']))

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/scipy/stats/stats.pyc in ttest_ind(a, b, axis, equal_var, nan_policy)
   4058         return Ttest_indResult(np.nan, np.nan)
   4059 
-> 4060     v1 = np.var(a, axis, ddof=1)
  4061     v2 = np.var(b, axis, ddof=1)
  4062     n1 = a.shape[axis]

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in var(a, axis, dtype, out, ddof, keepdims)
  3124 
  3125     return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
 -> 3126                          **kwargs)

/Users/sharonmorris/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _var(a, axis, dtype, out, ddof, keepdims)
103     if isinstance(arrmean, mu.ndarray):
104         arrmean = um.true_divide(
--> 105                 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
106     else:
107         arrmean = arrmean.dtype.type(arrmean / rcount)

TypeError: unsupported operand type(s) for /: 'unicode' and 'int'

1 Answer


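The arguments to scipy.stats.ttest_ind() must be of a numeric data type, because the function compares their means. You cannot use it with strings. Both inspection_type and grade are string columns, so computing the variance ends up dividing a string by an integer, which is the TypeError shown in the traceback.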
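As a minimal sketch of what a valid call might look like, the example below compares a numeric column between two groups defined by a categorical one. The toy data is made up for illustration, and the choice of comparing score between grade A and grade B is an assumption about what the question is after, not something stated in it. The scores are deliberately stored as strings here to show the conversion step with pd.to_numeric. Passing equal_var=False is what makes ttest_ind perform Welch's test rather than the standard Student's t-test.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data shaped like the question's DataFrame (not real inspection results).
df = pd.DataFrame({
    "grade": ["A", "A", "A", "B", "B", "B"],
    "score": ["10", "12", "9", "20", "25", "22"],  # strings on purpose
})

# Convert the string column to numbers; values that cannot be parsed become NaN.
df["score"] = pd.to_numeric(df["score"], errors="coerce")

# Split the numeric column into two samples by group, dropping any NaN values.
scores_a = df.loc[df["grade"] == "A", "score"].dropna()
scores_b = df.loc[df["grade"] == "B", "score"].dropna()

# equal_var=False requests Welch's t-test (unequal variances).
result = stats.ttest_ind(scores_a, scores_b, equal_var=False)
print(result.statistic, result.pvalue)
```

If both columns are truly categorical, as inspection_type and grade are, a t-test is not applicable at all; a test of association on a contingency table (for example scipy.stats.chi2_contingency applied to pd.crosstab output) may be closer to what is needed.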

answered 2017-05-03T03:06:21.907