python - Python - CSV reader - For loop reads rows but doesn't reach the end of the file

Question

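I developed a function that aggregates the total population for each cohort given in a file. The function is currently used twice: once to get the total [actual] population, and once to get the total number of "cases". My problem is that the function doesn't read to the end of the "cases" file. I added a row counter that prints the number of rows iterated over. For the population file the counter prints 933, but for the cases file it prints 911, which means it isn't reading the last 22 cases. Does anyone know why this is happening?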

Here is the function I defined:

def newPopCount(filename, fileheader):
    rowCount = 0  # Row counter
    import csv
    popholder = []
    cohorts = []
    print (len(fileheader))
    for i in range(3, len(fileheader)):
        cohorts.append(fileheader[i])
    for i in range(len(cohorts)):
        popholder.append(0)

    popcsv = open(filename, 'r', newline = '')
    popreader = csv.reader(popcsv, delimiter = ',')

    for row in popreader:
        rowCount += 1
        counter = 0
        if row[0] == fileheader[0]:
            continue
        else:
            for i in range(3, len(fileheader)):
                popholder[counter] += int(row[i])
                counter += 1

    popcsv.close()  

    print (rowCount)  # Print row counter
    return popholder

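By the way: fileheader is obtained from another function and is exactly what it sounds like, the file's header. Also, the indexing starts at 3 because the first entries in each row are the zip code, the x-coordinate, and the y-coordinate.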
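For comparison, here is a minimal sketch of the same per-cohort aggregation that closes the file automatically with a `with` block and skips the header with `next()` instead of comparing `row[0]` to the header. It assumes, as described above, that the header is the first row and that columns 0 to 2 are the zip code and coordinates; the names `sum_cohorts` and `pop_count` are hypothetical.

```python
import csv

def sum_cohorts(rows, first_cohort_col=3):
    """Hypothetical helper: sum each cohort column over the data rows.

    `rows` is any iterable of lists of strings (e.g. a csv.reader) whose
    first item is the header. Columns before `first_cohort_col` (zip code,
    x and y coordinates) are ignored. Returns (header, totals, data_rows).
    """
    rows = iter(rows)
    header = next(rows)                  # consume the header row
    totals = [0] * (len(header) - first_cohort_col)
    data_rows = 0
    for row in rows:
        if not row:                      # skip blank lines, if any
            continue
        data_rows += 1
        for i, value in enumerate(row[first_cohort_col:]):
            totals[i] += int(value)
    return header, totals, data_rows

def pop_count(filename):
    """Open the file, aggregate it, and close it again (even on error)."""
    with open(filename, newline="") as f:
        return sum_cohorts(csv.reader(f))
```

Keeping the aggregation separate from the file handling makes it easy to test with a plain list of rows.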

If anyone has any ideas, please share!

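Here is the new Cases file, in which the data is properly comma-separated. There is also a second file containing an example of the data in its raw state. That data is aggregated in the main function call, which produces the file we are actually discussing: Cases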

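I've also decided to include the code I use to get the header. I usually call it by assigning its result to a variable, thisHeader = getHeader('Cases.csv'), and then call the other function: caseRecord = newPopCount('Cases.csv', thisHeader)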

Here is the getHeader function:

def getHeader(file):
    import csv
    headername = None
    charList = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '+', '+', "'", '"', '{', '}', '[', ']', '?', '.', ',', '<', '>', '/', '~', '`', '-', '_']
    headercsv = open(file, 'r', newline = '')
    headerreader = csv.reader(headercsv, delimiter = ',')
    for row in headerreader:
        if row[0][0] in charList and row[1][0] in charList:
            headername = row
    headercsv.close()
    return headername
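If the header is always the first row of the file (an assumption; `getHeader` above instead scans every row and keeps the last one whose first two fields start with a letter or punctuation character), the header can be read with a single `next()` call. The names `read_header` and `get_header` are hypothetical.

```python
import csv
import io

def read_header(f):
    """Return the first row of an open CSV file object, or None if it is empty."""
    return next(csv.reader(f), None)

def get_header(filename):
    """Hypothetical file-based wrapper; the with block closes the file."""
    with open(filename, newline="") as f:
        return read_header(f)

# Example with an in-memory file standing in for Cases.csv:
sample = io.StringIO("zcta,xcoord,ycoord,m5064\n51062,211253.4,4733175,0\n")
header = read_header(sample)
```

Unlike `getHeader`, this reads only one row, so it does not depend on which characters the header fields happen to start with.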

Thanks again for looking!

score 1 · Accepted Answer

I downloaded your gist and saved it as cases.tsv.

Then I modified your newPopCount to call popcsv.readline() immediately after opening the file, and changed the next line to use delimiter='\t' instead of delimiter=','.

Then I ran it with this line:

h = newPopCount('cases.tsv', ['zcta', 'xcoord', 'ycoord', 'm5064', 'm6574', 'm75plus', 'f5064', 'f6574', 'f75plus'])

It printed 932.

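Since there are 933 lines, and one of them is the header (which isn't counted, because the readline() call consumes it before the reader starts), that is the correct answer.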
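The modification described above can be sketched as a function that discards the header line with `readline()` and only then hands the file to `csv.reader` with a tab delimiter, so the counter ends at the number of data lines. The function name `count_data_rows` and the in-memory sample are illustrative, not the asker's actual file.

```python
import csv
import io

def count_data_rows(f, delimiter="\t"):
    """Skip the header line with readline(), then count the rows csv sees."""
    f.readline()                          # header is consumed here, not counted
    reader = csv.reader(f, delimiter=delimiter)
    row_count = 0
    for _ in reader:
        row_count += 1
    return row_count

# Hypothetical tab-separated sample: 1 header line + 3 data lines.
sample = io.StringIO(
    "zcta\txcoord\tycoord\tm5064\n"
    "51062\t211253.4\t4733175\t0\n"
    "51011\t212255.6\t4757939\t0\n"
    "51109\t215303.5\t4721048\t0\n"
)
print(count_data_rows(sample))  # 3, i.e. 4 lines minus the header
```

By the same arithmetic, a 933-line file yields 932.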

So my best guess is that you're simply running it on the wrong file, and that's why you're getting the wrong answer.

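It's not impossible that there's a bug in your code and that the incorrect sample data you uploaded happens to cancel out that bug exactly... but that seems very unlikely. If you can give us the actual file, the code you actually ran on that file, and the code that calls newPopCount, it should be trivial to rule that possibility out.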

score 1 · Accepted Answer

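This isn't an answer to your question, so I'm making it CW (community wiki), but you might be interested in looking at the pandas library. It makes working with tabular data a lot more pleasant than it would otherwise be.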

First, read in the data (I'm using your NewCaseFile here, which seems to be comma-separated, so I called it ncf.csv):

>>> import pandas as pd
>>> df = pd.read_csv("ncf.csv")
>>> df
<class 'pandas.core.frame.DataFrame'>
Int64Index: 932 entries, 0 to 931
Data columns (total 9 columns):
zcta       932  non-null values
xcoord     932  non-null values
ycoord     932  non-null values
m5064      932  non-null values
m6574      932  non-null values
m75plus    932  non-null values
f5064      932  non-null values
f6574      932  non-null values
f75plus    932  non-null values
dtypes: float64(1), int64(8)
>>> df.head() # look at the start of the frame
    zcta    xcoord   ycoord  m5064  m6574  m75plus  f5064  f6574  f75plus
0  51062  211253.4  4733175      0      0        1      0      0        0
1  51011  212255.6  4757939      0      0        1      0      0        0
2  51109  215303.5  4721048      0      1        7      0      1        2
3  51001  215651.1  4746655      1      0        4      0      1        0
4  51103  216887.7  4713568      4      9       28      1      1        8

Use the zip, x and y columns as the index, and sum across the population columns:

>>> df = df.set_index(["zcta", "xcoord", "ycoord"])
>>> df["total"] = df.sum(axis=1)
>>> df.head()
                        m5064  m6574  m75plus  f5064  f6574  f75plus  total
zcta  xcoord   ycoord                                                      
51062 211253.4 4733175      0      0        1      0      0        0      1
51011 212255.6 4757939      0      0        1      0      0        0      1
51109 215303.5 4721048      0      1        7      0      1        2     11
51001 215651.1 4746655      1      0        4      0      1        0      6
51103 216887.7 4713568      4      9       28      1      1        8     51

Sum each column:

>>> df.sum()
m5064       981
m6574      1243
m75plus    2845
f5064      1355
f6574      1390
f75plus    1938
total      9752
dtype: int64

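And so on. In particular, it makes many transformations that are simple to describe but annoying to do in practice much easier. For example: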

>>> df = pd.read_csv("ncf.csv")
>>> d2 = pd.melt(df, id_vars=list(df.columns[:3]))
>>> d2["sex"] = d2["variable"].str[:1]
>>> d2["age_lower"] = d2["variable"].str[1:3].astype(float)
>>> d2["age_upper"] = d2["variable"].str[3:].replace("plus", 100).astype(float)
>>> del d2["variable"]
>>> d2.rename(columns={"value": "count"}, inplace=True)

This gives:

>>> d2.head()
    zcta    xcoord   ycoord  count sex  age_lower  age_upper
0  51062  211253.4  4733175      0   m         50         64
1  51011  212255.6  4757939      0   m         50         64
2  51109  215303.5  4721048      0   m         50         64
3  51001  215651.1  4746655      1   m         50         64
4  51103  216887.7  4713568      4   m         50         64
>>> d2.groupby("sex")["count"].sum()
sex
f      4683
m      5069
Name: count, dtype: int64

And so on.

score 0 · Accepted Answer

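First, thank you all for considering my question and trying to help. While answering @abarnert's questions, I discovered that I had forgotten to close NewCaseFile.csv after creating this aggregated file. After adding a .close() statement, everything started working correctly. Thank you all for taking the time to look at my question.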
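The reason a missing .close() can lose rows is write buffering: Python keeps written data in an in-memory buffer and only hands it to the operating system when the buffer fills, when flush() is called, or when the file is closed. Reading the file back while the writer is still open can therefore see only part of it, which is consistent with the last rows going missing in the question. The sketch below reproduces this with a hypothetical temporary file and made-up rows; how many rows are visible before closing depends on buffer size and platform, so only the counts after closing should be relied on. A `with` block is the usual way to guarantee the close.

```python
import csv
import os
import tempfile

def count_rows(path):
    """Count the rows csv.reader can see in the file at `path` right now."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f))

def write_rows(out, n_rows):
    """Write a header plus n_rows data rows (hypothetical sample data)."""
    writer = csv.writer(out)
    writer.writerow(["zcta", "xcoord", "ycoord", "m5064"])
    for i in range(n_rows):
        writer.writerow([51000 + i, 0.0, 0.0, i])

path = os.path.join(tempfile.mkdtemp(), "NewCaseFile.csv")

# Buggy pattern: the writer is still open when the file is read back.
out = open(path, "w", newline="")
write_rows(out, 50)
seen_while_open = count_rows(path)   # may be fewer than 51: data still buffered
out.close()                          # close() flushes the remaining buffer
seen_after_close = count_rows(path)  # header + 50 data rows

# Idiomatic fix: the with block closes (and flushes) the file on exit.
with open(path, "w", newline="") as out:
    write_rows(out, 50)
seen_with_block = count_rows(path)
```

With data this small, CPython usually reports no rows at all while the file is still open, but that is an implementation detail rather than something to depend on.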
