python - Match any character except a digit

Question

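I'm trying to search for strings that contain 6 digits, but no more than that; other characters may follow. The regex I use is \d{6}[^\d]. For some reason it doesn't catch the digits that \d{6} alone does catch.

Update

I'm now using the regex (\d{6}\D*)$, which does make sense. But I can't get it to work either.

Update 2 - Solution

I should of course have grouped the \d{6} with parentheses. Doh! Otherwise the match includes the non-digit, and the script tries to parse a date out of that.

End of update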

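What I'm trying to achieve (as a rather dirty hack) is to find a date string in the header of an OpenOffice document in one of the following formats: YYMMDD, YYYY-MM-DD or YYYYMMDD. If it finds one of these (and only one), it sets the mtime and atime of that file to that date. Try creating an odt file in /tmp with 100101 in the header and run this script (sample file to download: http://db.tt/9aBaIqqa). According to my tests, it will not change the mtime/atime. However, if you remove the \D in the script below, it will change them.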
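As a minimal sketch of the timestamp step described above, parsing a date string and applying it as atime/mtime might look like the following. It is written in Python 3 syntax, unlike the Python 2 script in this question, and the helper name `set_times_from_date` and the temporary file are illustrative assumptions rather than part of the original script.

```python
import os
import tempfile
import time


def set_times_from_date(path, date_string, fmt):
    """Set atime and mtime of `path` to the date parsed from `date_string`.

    Hypothetical helper for illustration; raises ValueError if the
    string does not match `fmt` exactly.
    """
    parsed = time.strptime(date_string, fmt)   # struct_time at local midnight
    stamp = time.mktime(parsed)                # seconds since the epoch
    os.utime(path, (stamp, stamp))             # (atime, mtime)
    return stamp


# Demonstrate on a throwaway file rather than a real document.
with tempfile.NamedTemporaryFile(suffix='.odt', delete=False) as tmp:
    demo_path = tmp.name

expected = set_times_from_date(demo_path, '100101', '%y%m%d')
actual_mtime = os.stat(demo_path).st_mtime
os.remove(demo_path)

print(time.strftime('%Y-%m-%d', time.localtime(actual_mtime)))
```

Because `time.strptime` insists that the whole string fits the format, any stray character left in the regex match makes this step fail with `ValueError`.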

Here is my full source:

import zipfile
import re
import glob
import time
import os

class OdfExtractor:
    def __init__(self,filename):
        """
        Open an ODF file.
        """
        self._odf = zipfile.ZipFile(filename)

    def getcontent(self): 
        # Read file with header
        return self._odf.read('styles.xml')

if __name__ == '__main__':
    filepattern = '/tmp/*.odt'

    # Possible date formats I've used
    patterns = [('\d{6}\D', '%y%m%d'), ('\d{4}-\d\d-\d\d', '%Y-%m-%d'), ('\d{8}', '%Y%m%d')]

    # go thru all those files
    for f in glob.glob(filepattern):
        # Extract data
        odf = OdfExtractor(f)

        # Create a list for all dates that will be found
        findings = []

        # Try finding date matches
        contents = odf.getcontent()
        for p in patterns:
            matches = re.findall(p[0], contents)
            for m in matches:
                try:
                    # Collect regexp matches that really are dates
                    findings.append(time.strptime(m, p[1]))
                except ValueError:
                    pass

        print f
        if len(findings) == 1: # Don't change if multiple dates were found in file
            print 'ändrar till:', findings[0]
            newtime = time.mktime(findings[0])
            os.utime(f, (newtime, newtime))
        print '-' * 8

score 1 · Accepted Answer

Try this:

r'(\d{6}\D*)$'

(six digits followed by 0 or more non-digits).

Edit: added the "must match to end of string" qualifier.

Edit2: Oh, for Pete's sake:

import re

test_strings = [
    ("12345", False),
    ("123456", True),
    ("1234567", False),
    ("123456abc", True),
    ("123456ab9", False)
]

outp = [
    "  good, matched",
    "FALSE POSITIVE",
    "FALSE NEGATIVE",
    "  good, no match"
]

pattern = re.compile(r'(\d{6}\D*)$')
for s,expected in test_strings:
    res = pattern.match(s)
    print(outp[2*(res is None) + (expected is False)])

returns

  good, no match
  good, matched
  good, no match
  good, matched
  good, no match
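One detail worth keeping in mind about the test above: it uses `pattern.match`, which only tries the pattern at the start of the string. The script in the question uses `re.findall`, which scans the whole text the way `re.search` does, so the same pattern can behave differently there. A small sketch in Python 3 syntax, reusing sample strings from the test list:

```python
import re

pattern = re.compile(r'(\d{6}\D*)$')

# match() is anchored at position 0, so seven digits are rejected.
anchored = pattern.match('1234567')

# search() may start later in the string, so it finds the last six digits.
scanning = pattern.search('1234567')

print(anchored)            # None
print(scanning.group(1))   # 234567

# findall() scans like search(); with one group it returns the group text.
print(pattern.findall('123456abc'))   # ['123456abc']
```

So when scanning a larger document, the `$` anchor alone does not guarantee that the six digits are not preceded by more digits.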

score 1 · Accepted Answer

You can use \D (uppercase D) to match any non-digit character.

Regex:

\d{6}\D

Raw string: (are you sure you escaped the string correctly?)

ex = r"\d{6}\D"

Regular string:

ex = '\\d{6}\\D'
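The two spellings above describe the same pattern, and `\D` behaves like the character class `[^\d]` from the question. A quick check, sketched in Python 3 syntax with a made-up sample string:

```python
import re

raw = r"\d{6}\D"
escaped = '\\d{6}\\D'

# Both literals produce the same pattern text.
print(raw == escaped)   # True

sample = 'date 100101, draft'   # hypothetical sample text

# \D and [^\d] match the same characters, so the results agree.
with_shorthand = re.findall(raw, sample)
with_class = re.findall(r"\d{6}[^\d]", sample)

print(with_shorthand)   # ['100101,']
print(with_class)       # ['100101,']
```

Note that both results include the trailing comma, since the non-digit is part of the match.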

score 0 · Accepted Answer

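I was being stupid. If I add a \D at the end of the pattern, the search will of course also return the non-digit, which I don't want. I have to put parentheses around the part I actually want. I feel silly for not catching this with a simple print statement after the loop. I really need to code more often.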
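A sketch of what went wrong and how the parentheses fix it, in Python 3 syntax, with a hypothetical header fragment standing in for the contents of `styles.xml`. Without a group, `re.findall` returns the trailing non-digit as part of each match, and `time.strptime` then rejects it with a `ValueError`, which the script silently swallows.

```python
import re
import time

header = '<text:p>100101</text:p>'   # hypothetical stand-in for styles.xml

# Without a group, the match includes the trailing non-digit.
ungrouped = re.findall(r'\d{6}\D', header)
print(ungrouped)   # ['100101<']

try:
    time.strptime(ungrouped[0], '%y%m%d')
    parsed_ungrouped = True
except ValueError:
    # strptime refuses the leftover '<' after the date
    parsed_ungrouped = False

# With a group, findall returns only the six digits.
grouped = re.findall(r'(\d{6})\D', header)
print(grouped)     # ['100101']

date = time.strptime(grouped[0], '%y%m%d')
print(date.tm_year, date.tm_mon, date.tm_mday)   # 2010 1 1
```

Printing the matches before the `try` block, as done here, is the kind of check that would have exposed the stray character right away.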
