
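I have to collect data to test my hypothesis that typing with the dominant hand is faster than typing with the non-dominant hand. I wrote the code below, which gives the participant a random word that they then have to copy. The code measures the time taken to type each word and saves that data to a new file. A new CSV file is created for each participant who takes the test.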

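Now I need to write another script that finds each participant's average for each hand, and then builds an array of those averages so that I can create a graph showing whether my hypothesis holds. How would I get the data from the different files and combine it into one array?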

My script:

import random
import time

name = raw_input('Enter name: ')  # get some name for the file
outfile = file(name + '.csv', 'w')  # create a file for this user's data

# load up a list of 1000 common words
words = file('1-1000.txt').read().split()

ntrials = 50

answers = []
print """Type With Dominant Hand"""
for i in range(ntrials):
    word = random.choice(words)
    tstart = time.time()
    ans = raw_input('Please type ' + word + ': ')
    tstop = time.time()
    answers.append((word, ans, tstop - tstart))
    print >>outfile, 'Dominant', word, ans, tstop - tstart  # write the data to the file
    if (i % 5 == 3):
        go = raw_input('take a break, type y to continue: ')

print """Type With Nondominant Hand"""      
for i in range(ntrials):
    word = random.choice(words)
    tstart = time.time()
    ans = raw_input('Please type ' + word + ': ')
    tstop = time.time()
    answers.append((word, ans, tstop - tstart))
    print >>outfile, 'Nondominant', word, ans, tstop - tstart  # write the data to the file
    if (i % 5 == 3):
        go = raw_input('take a break, type y to continue: ')

outfile.close()  # close the file

Sample output from the script above:

Dominant sit sit 1.81511306763
Dominant again again 2.54711103439
Dominant from from 1.53057098389
Dominant general general 1.98939108849
Dominant horse horse 1.93938016891
Dominant of of 1.07597017288
Dominant clock clock 1.6587600708
Dominant save save 1.42030906677
Nondominant story story 3.92807888985
Nondominant of of 0.93910908699
Nondominant test test 1.69210004807
Nondominant low low 1.13296699524
Nondominant hit hit 1.15252614021
Nondominant you you 1.22019600868
Nondominant river river 1.42011594772
Nondominant middle middle 1.61595511436

3 Answers


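If you're not familiar with numpy, this may look like another language, but here is a solution that makes use of it (note the absence of loops!)

For testing, I created a second user data file with 1 second added to each entry.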

import glob
import numpy as np

usecols = [0, 3] # Columns to extract from data file
str2num = {'Dominant': 0, 'Nondominant': 1} # Conversion dictionary
converters = {0: (lambda s: str2num[s])} # Strings -> numbers

userfiles = glob.glob('*.csv')
userdat = np.array([np.loadtxt(f, usecols=usecols, converters=converters)
                    for f in userfiles])

# Create boolean arrays to filter desired results
dom    = userdat[..., 0] == 0 
nondom = userdat[..., 0] == 1 

# Filter and reshape to keep 'per-user' layout
usercnt, _, colcnt = userdat.shape
domdat    = userdat[dom   ].reshape(usercnt, -1, colcnt)
nondomdat = userdat[nondom].reshape(usercnt, -1, colcnt)

domavgs    = np.average(domdat,    axis=1)[:, 1]
nondomavgs = np.average(nondomdat, axis=1)[:, 1]

print 'Dominant averages by user:    ', domavgs
print 'Non-dominant averages by user:', nondomavgs

Output:

Dominant averages by user:     [ 1.74707571  2.74707571]
Non-dominant averages by user: [ 1.63763103  2.63763103]

If you're going to do a lot of analysis, I highly recommend learning numpy.
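
The two-user test described above needs a second data file in which every time is 1 second larger. One way to produce such a file is sketched below. This is only an illustration written for Python 3 (unlike the Python 2 script in the question), not the answerer's actual method; the function name `shift_times` and the file names in the usage comment are hypothetical.

```python
def shift_times(src_path, dst_path, delta=1.0):
    """Copy a results file, adding `delta` seconds to the last field of each line.

    Assumes the whitespace-separated layout written by the question's script:
    hand, word shown, word typed, time in seconds.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # The time is always the last field on the line
            fields[-1] = repr(float(fields[-1]) + delta)
            dst.write(" ".join(fields) + "\n")

# Hypothetical usage:
# shift_times("alice.csv", "alice_plus1.csv")
```

Because only the last field is changed, the averages for the generated file should differ from the original's by exactly `delta`, which matches the pattern in the output shown above.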

answered 2012-11-30T00:43:31.277
persons = ["billy", "bob", "joe", "kim"]  # one CSV file per participant
num_dom, total_dom, num_nondom, total_nondom = 0, 0.0, 0, 0.0
for person in persons:
    data = open('%s.csv' % person, 'r').readlines()
    for line in data:
        # The last field on each line is the typing time in seconds (a float)
        if "Nondominant" in line:
            num_nondom += 1
            total_nondom += float(line.split(' ')[-1].strip())
        elif "Dominant" in line:
            num_dom += 1
            total_dom += float(line.split(' ')[-1].strip())
        else:
            continue
# These averages are pooled across all participants listed above
dom_avg = total_dom / num_dom
nondom_avg = total_nondom / num_nondom
print "Average speed with Dominant hand: %s" % dom_avg
print "Average speed with Non-Dominant hand: %s" % nondom_avg

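Fill the `persons` list with your subjects' names, then do whatever you like with the data.

P.S. Heltonbiker, I noted your idea and added it. I also fixed the newline error by adding `strip`.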

answered 2012-11-29T22:50:27.777
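import os

def avg_one(filename):
    # Collect the typing times for each hand from one participant's file
    vals = { 'Dominant': [], 'Nondominant': [] }
    with open(filename) as infile:
        for line in infile:
            # Each line holds: hand, word shown, word typed, time in seconds
            hand, _, _, t = line.strip().split()
            vals[hand].append(float(t))
    d = vals['Dominant']
    nd = vals['Nondominant']
    return (sum(d)/len(d), sum(nd)/len(nd))

data = []
for f in os.listdir('.'):
    if f.endswith('.csv'):
        data.append(avg_one(f))

# Transpose the list of (dominant, nondominant) pairs into two tuples
doms, nondoms = zip(*data)

print "Dominant: " + repr(doms)
print "Nondominant: " + repr(nondoms)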

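This assumes there are no other .csv files with a different format in the same directory (parsing would fail on them). In general this needs more error checking, but it gets the idea across.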
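
As a rough idea of what that extra error checking might look like, the sketch below skips lines that do not match the expected four-field layout (for example, when a participant typed nothing or typed two words) and ignores files that contain no usable trials. It is written for Python 3 and is only one possible approach; the helper names `parse_line`, `averages_for_file`, and `collect_averages` are hypothetical and do not come from the answer above.

```python
import glob

def parse_line(line):
    """Return (hand, seconds) for a well-formed line, or None otherwise."""
    fields = line.split()
    # Expected fields: hand, word shown, word typed, time in seconds.
    # An empty or multi-word answer changes the field count, so reject it.
    if len(fields) != 4 or fields[0] not in ("Dominant", "Nondominant"):
        return None
    try:
        return fields[0], float(fields[3])
    except ValueError:
        return None  # the last field was not a number

def averages_for_file(path):
    """Return (dominant_avg, nondominant_avg), or None if a hand has no valid trials."""
    times = {"Dominant": [], "Nondominant": []}
    with open(path) as infile:
        for line in infile:
            parsed = parse_line(line)
            if parsed is not None:
                hand, seconds = parsed
                times[hand].append(seconds)
    if not times["Dominant"] or not times["Nondominant"]:
        return None
    return (sum(times["Dominant"]) / len(times["Dominant"]),
            sum(times["Nondominant"]) / len(times["Nondominant"]))

def collect_averages(pattern="*.csv"):
    """Map each matching file name to its (dominant, nondominant) averages."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        avgs = averages_for_file(path)
        if avgs is not None:  # skip files in some other format
            results[path] = avgs
    return results
```

Returning a dictionary keyed by file name keeps each pair of averages tied to its participant, which may be convenient when labelling a chart later.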

answered 2012-11-29T22:53:02.980