python - Parsing user string data in the files present in a directory

Question

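I am trying to do the following, but even for a good case, the sample input file below does not match with the full code given below. Why doesn't the code match the sample input file? How can I fix it?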

1. Open every file in the directory and subdirectories given as the argument (which

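2. Check that each file contains exactly these 3 lines of copyright information. The 3 lines do not have to be the first 3 lines of the file: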

 Copyright (c) 2012 Company, Inc. 
 All Rights Reserved.
 Company Confidential and Proprietary.

Sample input file:

File1.txt

/*==========================================================================
 *
 *  @file:     Compiler.h
 *
 *  @brief:    This file 
 *
 *
 *  @author:   david
 *
 *  Copyright (c) 2012 Company, Inc. 
 *  All Rights Reserved.
 *  Company Confidential and Proprietary
 *
 *=========================================================================*/
#ifndef __COMPILER_ABSTRACT_H
#define __COMPILER_ABSTRACT_H

Code:

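import os
import sys
userstring="Copyright (c) 2012 Company, Inc.\nAll Rights Reserved.\nCompany Confidential and Proprietary."
# Check the argument count before touching sys.argv[1], otherwise a missing
# argument raises IndexError instead of printing the usage message.
if len(sys.argv) < 2:
    sys.exit('Usage: python.py <build directory>')
print(len(sys.argv))
print(sys.argv[1])
for r,d,f in os.walk(sys.argv[1]):
    for files in f:
        with open(os.path.join(r, files), "r") as file:
            # Compare the first 3 lines of the file with the expected notice
            if ''.join(file.readlines()[:3]).strip() != userstring:
                print(files)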

score 1 · Accepted Answer

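Check what ''.join(file.readlines()[:3]).strip() gives you. You will notice that the * characters are still there between the lines, and that you are getting the first 3 lines of the file (that is what [:3] does), which in your sample file are certainly not the lines you want. The * characters are not in userstring either, so the comparison cannot succeed. Note also that the third copyright line in the sample file has no trailing period, while the one in userstring does.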
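To see this concretely, here is a minimal Python 3 sketch that evaluates the question's expression on the sample header. It assumes the contents of File1.txt are pasted into a string and read through io.StringIO instead of opening the real file:

```python
import io

# The sample header from File1.txt, pasted in so the example runs without the file.
sample = """/*==========================================================================
 *
 *  @file:     Compiler.h
 *
 *  @brief:    This file 
 *
 *
 *  @author:   david
 *
 *  Copyright (c) 2012 Company, Inc. 
 *  All Rights Reserved.
 *  Company Confidential and Proprietary
 *
 *=========================================================================*/
#ifndef __COMPILER_ABSTRACT_H
#define __COMPILER_ABSTRACT_H
"""

userstring = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

f = io.StringIO(sample)
# Same expression as in the question: join the first 3 lines and strip the ends.
head = ''.join(f.readlines()[:3]).strip()
print(repr(head))           # the comment banner, a lone ' *' and the @file line
print(head == userstring)   # False
```

The printed value starts with the /*==== banner and ends with the @file line, so it never contains the copyright text at all.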

One possible solution is to check each line separately. Something like this:

userlines = userstring.split('\n') # Separate the string into lines
with open(os.path.join(r, files), "r") as file:
    match = 0
    for line in file:
        if userlines[match] in line: # Check if the user line at index `match` is in this file line
            match += 1 # Next time check the following line
        elif match > 0: # If there was no match, reset the counter, re-checking this line against the first user line
            match = 1 if userlines[0] in line else 0
        if match >= len(userlines): # If 3 consecutive lines match, then you found a match
            break
    if match == len(userlines): # You found a match
        print(files)
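The same line-by-line idea can be wrapped in a function so it can be tried without a directory tree. This is a sketch only; the name has_consecutive_lines is made up for this example, and it uses the same substring (in) test as the snippet above:

```python
def has_consecutive_lines(lines, userlines):
    """Return True if each string in userlines is a substring of a line in
    lines, in order, on consecutive lines (the same `in` test as above)."""
    if not userlines:
        return True
    match = 0
    for line in lines:
        if userlines[match] in line:
            match += 1  # this target found; look for the next one on the next line
            if match == len(userlines):
                return True
        else:
            # Restart, re-checking this line against the first target.
            match = 1 if userlines[0] in line else 0
    return False


userstring = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")
userlines = userstring.split('\n')

with_period = [" *  Copyright (c) 2012 Company, Inc. \n",
               " *  All Rights Reserved.\n",
               " *  Company Confidential and Proprietary.\n"]
# The sample file's third line, which has no trailing period.
without_period = with_period[:2] + [" *  Company Confidential and Proprietary\n"]

print(has_consecutive_lines(with_period, userlines))     # True
print(has_consecutive_lines(without_period, userlines))  # False
```

The second call shows why the substring check alone is not enough for File1.txt: its last copyright line lacks the period that userstring contains.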

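The idea behind this is that what you are looking for is not an exact match, because there are blank lines, * characters, spaces, and so on. I accounted for that, more or less, by using the in operator, but you can make the check much more flexible once you work on a per-line basis. This is especially true when processing files...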

Update:

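For more advanced per-line parsing you could use regular expressions from the re module, but that may not be practical in your use case, since you mostly want to match literal strings rather than patterns. So, to ignore the last character, you can try removing (ignoring) any characters such as spaces, dots or asterisks at the beginning or end of each line.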
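If you do want to try the regular-expression route, re.escape takes care of the "literal strings, not patterns" concern. The following is a hedged sketch, not part of the original answer: the helper line_pattern and the choice of which decoration to allow (leading spaces, tabs and asterisks, an optional final dot, trailing spaces and tabs) are assumptions for this example:

```python
import re

USERLINES = ["Copyright (c) 2012 Company, Inc.",
             "All Rights Reserved.",
             "Company Confidential and Proprietary."]


def line_pattern(text):
    """Regex for one notice line: optional leading spaces/tabs/asterisks,
    the text itself (escaped so '(c)' and '.' are literal), an optional
    final dot, and optional trailing spaces/tabs."""
    return r'^[ \t*]*' + re.escape(text.rstrip('.')) + r'\.?[ \t]*$'


# Join the line patterns with a newline so they must be on consecutive lines;
# re.MULTILINE makes ^ and $ match at every line boundary.
notice_re = re.compile('\n'.join(line_pattern(t) for t in USERLINES), re.MULTILINE)

sample = (" *  @author:   david\n"
          " *\n"
          " *  Copyright (c) 2012 Company, Inc. \n"
          " *  All Rights Reserved.\n"
          " *  Company Confidential and Proprietary\n"
          " *\n")

print(notice_re.search(sample) is not None)  # True
```

Applied to a whole file, this would be notice_re.search(f.read()). For three fixed lines, though, the strip-based comparison in this answer is simpler to read and maintain.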

For example:

>>> a = '   This is a string.   '
>>> a.strip()
'This is a string.' # removes the whitespace by default
>>> a.strip('.')
'   This is a string.   ' # removes only dots
>>> a.strip('. ')
'This is a string' # removes dots and spaces
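Applied to the actual lines of File1.txt, the character set has to include the asterisk, because every line of the comment block starts with " *". A minimal sketch, assuming the lines look exactly as they do in the sample file (the name CHARS is just for illustration):

```python
# The sample file's copyright lines, as iterating over the file yields them.
file_lines = [" *  Copyright (c) 2012 Company, Inc. \n",
              " *  All Rights Reserved.\n",
              " *  Company Confidential and Proprietary\n"]
userstring = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

# Characters to ignore at either end of a line: newline, carriage return,
# space, dot and the '*' comment decoration.
CHARS = '\n\r .*'
normalized_file = [line.strip(CHARS) for line in file_lines]
normalized_user = [s.strip(CHARS) for s in userstring.split('\n')]

print(normalized_file == normalized_user)  # True

# Without '*' in the set, the leading asterisk survives and equality fails.
print(repr(file_lines[1].strip('\n\r .')))  # '*  All Rights Reserved'
```

Stripping both sides with the same set also removes the trailing-period difference on the third line.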

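To make your input match userstring, I suggest processing both strings the same way (i.e. stripping the spaces/dots from both), unless you are sure of exactly what you have in userstring. With that modification, you should have something like: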

userlines = [s.strip('\n\r .*') for s in userstring.split('\n')]  # '*' too: sample lines start with " *"
# ...
        if userlines[match] == line.strip('\n\r .*'):
# ...
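Putting the pieces together, here is a hedged Python 3 sketch of a complete script for the original task: walk the directory tree, normalize each line with strip, and look for the three notice lines on consecutive lines. Like the question's code, it reports the files that do not contain the notice. The function names (normalize, has_notice, files_missing_notice, main) and the use of errors='replace' for files that are not valid text are assumptions for illustration, not part of the answer above:

```python
import os
import sys

USERSTRING = ("Copyright (c) 2012 Company, Inc.\n"
              "All Rights Reserved.\n"
              "Company Confidential and Proprietary.")

# Assumed decoration around each notice line: newlines, spaces, dots and the
# '*' used in C-style block comments.
STRIP_CHARS = '\n\r .*'


def normalize(line):
    """Drop the ignorable characters from both ends of a line."""
    return line.strip(STRIP_CHARS)


USERLINES = [normalize(s) for s in USERSTRING.split('\n')]


def has_notice(path):
    """Return True if the notice lines appear on consecutive lines of the file."""
    match = 0
    # errors='replace' keeps undecodable (e.g. binary) files from raising.
    with open(path, 'r', errors='replace') as f:
        for line in f:
            text = normalize(line)
            if text == USERLINES[match]:
                match += 1
                if match == len(USERLINES):
                    return True
            else:
                # Restart, but this line may itself be the first notice line.
                match = 1 if text == USERLINES[0] else 0
    return False


def files_missing_notice(root):
    """Walk root recursively and return the paths of files without the notice."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not has_notice(path):
                missing.append(path)
    return missing


def main(argv):
    """Print every file under argv[1] that lacks the notice; return an exit code."""
    if len(argv) < 2:
        print('Usage: python.py <build directory>', file=sys.stderr)
        return 2
    for path in files_missing_notice(argv[1]):
        print(path)
    return 0
```

To run it from the command line like the question's script, add if __name__ == '__main__': sys.exit(main(sys.argv)) at the bottom of the file. Keeping the logic in functions makes it easy to test on a temporary directory before pointing it at a real build tree.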

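Once you process the file line by line, many useful string methods are available, such as startswith, endswith, strip, count, find, ... Just type help(str) in the interpreter to get the full list.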
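As a small illustration of a few of those methods on one line of the sample file (the variable names are only for this example):

```python
line = " *  Copyright (c) 2012 Company, Inc. \n"

# Remove the comment decoration and the newline, but keep the final dot.
body = line.strip(' *\n')

print(body)                          # Copyright (c) 2012 Company, Inc.
print(body.startswith('Copyright'))  # True
print(body.endswith('Inc.'))         # True
print(body.find('2012'))             # 14, the index where the year starts
print(body.count('C'))               # 2, from "Copyright" and "Company"
```

Checks like startswith('Copyright') are handy when, for example, the year in the notice may differ between files.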
