python - Check whether files share the same name and store the line counts of files with the same name

Question

I'm fairly new to Python and could really use some input from you.

I have a script running that stores files in the following format:

201309030700__81.28.236.2.txt
201308240115__80.247.17.26.txt
201308102356__84.246.88.20.txt
201309030700__92.243.23.21.txt
201308030150__203.143.64.11.txt

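Each file contains a number of lines. I want to count them and store the total. For example, I want to go through these files, and if several files share the same date (the first part of the file name), I want to store their counts in the same file, in the following format.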

201309030700__81.28.236.2.txt has 10 lines
201309030700__92.243.23.21.txt has 8 lines

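Create a file named after the date 20130903 (the last 4 digits are the time, which I don't want). The created file, 20130903.txt, then contains two lines: 10 and 8.

I have the following code, but I'm not getting anywhere. Please help.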

import os, os.path
asline = []
ipasline = []

def main():
    p = './results_1/'
    np = './new/'
    fd = os.listdir(p)
    run(fd)

def writeFile(fd, flines):
    fo = np+fd+'.txt'
    with open(fo, 'a') as f:    
        r = '%s\t %s\n' % (fd, flines)
        f.write(r)

def run(path):
    for root, dirs, files in os.walk(path):
       for cfile in files:
            stripFN = os.path.splitext(cfile)[0]
            fileDate = stripFN.split('_')[0]
            fileIP = stripFN.split('_')[-1]     
        if cfile.startswith(fileDate):
                hp = 0
                for currentFile in files.readlines()[1:]:
                    hp += 1
                    writeFile(fdate, hp)

I tried playing around with this script:

if not os.path.exists(os.path.join(p, y)):  
    os.mkdir(os.path.join(p, y))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)
if os.path.exists(os.path.join(p, m)):
    os.chdir(os.path.join(p, month, d))
    np = '%s%s' % (datetime.now().strftime(FORMAT), path)

where FORMAT has the following value

20130903

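But I can't seem to get it to work.

Edit: I have modified the code as follows. It more or less does what I want, but I may be doing something redundant, and I still haven't taken into account that I am dealing with a large number of files, so this may not be the most efficient approach. Please take a look.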

import re, os, os.path


p = './results_1/'
np = './new/'
fd = os.listdir(p)
star = "*"


def writeFile(fd, flines):
    fo = './new/'+fd+'_v4.txt'
    with open(fo, 'a') as f:
        r = '%s\n' % (flines)
        f.write(r)

for f in fd:
    pathN = os.path.join(p, f)
    files = open(pathN, 'r')
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]
    lnum = len(files.readlines())
    writeFile(fdate, lnum)
    files.close()

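At the moment, it writes to a file, adding a new line for the line count of each input file. However, I have already sorted this out. I would appreciate some input; thank you very much.

Edit 2: Now I get one output file per date, with the date as the file name. The files now appear as: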

20130813.txt
20130819.txt
20130825.txt

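Each file now looks like:

Each file goes on for 200+ lines. Ideally, the result would show how many times each value occurs, sorted with the smallest number first.

I have tried something like: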

import sys
from collections import Counter

p = '.txt'
d = []
with open(p, 'r') as f:
    for x in f:
        x = int(x)
        d.append(x)
    d.sort()
    o = Counter(d)
    print o

Does this make sense?

Edit 3:

I have the following script, which counts the unique values for me, but I still can't sort by unique count.

import os
from collections import Counter

p = './newR'
fd = os.listdir(p)

for f in fd:
    pathN = os.path.join(p, f)
    with open(pathN, 'r') as infile:
        fileN = os.path.basename(pathN)
        stripFN = os.path.splitext(fileN)[0]
        fileDate = stripFN.split('_')[0]
        counts = Counter(l.strip() for l in infile)
        for line, count in counts.most_common():
            print line, count

It gives the following results:

The output should look like this:

What would be the most efficient way to do this?

score 0 · Accepted Answer

The following code does what I asked for in my initial question.

import os, os.path, subprocess
from sys import stdout

p = './new/results/v4/TRACE_v4_results_ASN_mh60'
fd = os.listdir(p)

def writeFile(fd, flines):
    fo = './new/newR/'+fd+'_v4.txt'
    with open(fo, 'a') as f:    
        r = '%s\n' % (flines)
        f.write(r)

for pfiles in fd:
    pathN = os.path.join(p, pfiles)
    files = open(pathN, 'r')
    fileN = os.path.basename(pathN)
    stripFN = os.path.splitext(fileN)[0]
    fileDate = stripFN.split('_')[0]
    fdate = fileDate[0:8]  # keep the date, drop the 4-digit time
    numlines = len(files.readlines()[1:])  # the first line of each file is not counted
    writeFile(fdate, numlines)
    files.close()

It produced the following results:

20130813.txt
20130819.txt
20130825.txt
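
For the follow-up in Edit 2 and Edit 3 (how often each line count occurs, smallest first), here is a minimal sketch rather than a definitive solution. It assumes each per-date file holds one integer per line; the function names `count_occurrences` and `count_by_frequency` are made up for illustration. `Counter.most_common()` sorts by frequency, highest first, so the sketch uses `sorted()` on the counter's items instead.

```python
from collections import Counter


def count_occurrences(lines):
    """Return (value, occurrences) pairs sorted by value, smallest first."""
    # Skip blank lines; int() tolerates the trailing newline.
    values = [int(line) for line in lines if line.strip()]
    return sorted(Counter(values).items())


def count_by_frequency(lines):
    """Return the same pairs sorted by occurrences, rarest first."""
    return sorted(count_occurrences(lines), key=lambda pair: (pair[1], pair[0]))


# Stand-in for the lines of one per-date file (hypothetical data).
sample = ["10\n", "8\n", "10\n", "3\n", "3\n", "10\n", "3\n", "3\n"]
print(count_occurrences(sample))   # [(3, 4), (8, 1), (10, 3)]
print(count_by_frequency(sample))  # [(8, 1), (10, 3), (3, 4)]
```

An open file object can be passed in place of the list, since both functions only iterate over lines.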

My sincere apologies if I have not followed the rules.

score 0 · Accepted Answer

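Dictionaries are a great fit for a task like this. If you intend to process input files recursively at different directory depths, you will have to modify the example below. Also remember that you can treat Python strings like lists, which lets you slice them (this can cut down on messy regular expressions).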

import os

D = {}
fnames = os.listdir("txt/")
for fname in fnames:
    print(fname)
    date = fname[0:8] # this extracts the first 8 characters, aka: date
    if date not in D:
        D[date] = []
    # count the lines; the with block closes the file automatically
    with open("txt/" + fname, 'r') as file:
        numlines = len(file.readlines())
    D[date].append(fname + " has " + str(numlines) + " lines")

# write one output file per date (the output/ directory must already exist)
for k in D:
    datelist = D[k]
    with open('output/' + k + '.txt', 'w') as f:
        for m in datelist:
            f.write(m + '\n')
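
If the input files do sit at different directory depths, the modification mentioned above could look roughly like the sketch below. It is one possible approach, not the only one. It assumes every `.txt` file name starts with an 8-digit date, as in the question; the helper names `count_lines`, `line_counts_by_date` and `write_counts` are hypothetical. It also counts lines by iterating over the file instead of calling `readlines()`, which avoids holding a whole file in memory when there are many large files.

```python
import os
from collections import defaultdict


def count_lines(path):
    """Count lines without loading the whole file into memory."""
    with open(path, "r") as handle:
        return sum(1 for _ in handle)


def line_counts_by_date(root):
    """Map each 8-character date prefix to the line counts of its files."""
    counts = defaultdict(list)
    # os.walk visits root and every directory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in sorted(filenames):
            if not fname.endswith(".txt"):
                continue
            full_path = os.path.join(dirpath, fname)
            counts[fname[:8]].append(count_lines(full_path))
    return counts


def write_counts(counts, out_dir):
    """Write one <date>.txt per date, one line count per line."""
    os.makedirs(out_dir, exist_ok=True)
    for date, numbers in counts.items():
        with open(os.path.join(out_dir, date + ".txt"), "w") as out:
            out.writelines("%d\n" % n for n in numbers)
```

With the directory names from the example above, it would be called as `write_counts(line_counts_by_date("txt/"), "output/")`.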
