I'm new to Python and need some help. I have a file and want to extract text from it into another file.

The input file looks like this:

<Datei Kennung="4bc78" Titel="Morgen 1" Bereich="I847YP"> Morgen 1

Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.

</Datei>
<Datei Kennung="469" Titel="Trop Hall W " Bereich="izr"> Trop Hall W

Here is text, contains numbers and text.
Here is text, contains numbers and text.    


</Datei>

For the first section of my file, I need an output file named Morgen 1.txt that contains:

Morgen 1

Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.
Here is text, contains numbers and text.

I got this code from another user:

import re
REG_PARSE=re.compile(r'<Datei[^>]*Titel="\s*([^"]*?)\s*"[^>]*>\s*\1\s*(.*?</Datei>',re.dotall)
with open(filename) as infile:
for outfilename, text = REG_PARSE.finditer(infile.read()):
    with open('%s.txt'%outfilename,'w') as outf:
        outf.write(text)

But it does not work.

3 Answers

See if this works for you:

#!/usr/bin/env python
#-*- coding:utf-8 -*-
from xml.dom import minidom

# minidom.parse expects well-formed XML with a single root element.
xmldoc = minidom.parse('/path/to/file')
items = xmldoc.getElementsByTagName('Datei')

for s in items:
    if s.attributes['Titel'].value == "Morgen 1":
        with open("Morgen 1.txt", "w") as fileOutput:
            # Keep only the non-empty lines, stripped of surrounding whitespace.
            listLines = [line.strip()
                         for line in s.firstChild.nodeValue.strip().split("\n")
                         if line.strip()]

            fileOutput.write("\n".join(listLines))
            break
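
Note that the sample input has several top-level `<Datei>` elements, so it is not a single well-formed XML document on its own. The following is a minimal sketch of one way around that, assuming the text never contains a raw `<` or `&`: wrap the file contents in a hypothetical `<root>` element before parsing, then collect every section instead of only "Morgen 1". The names `split_sections` and `write_sections` are illustrative, not part of the original answer.

```python
import os
from xml.dom import minidom


def split_sections(raw):
    """Return a dict mapping each Titel to the text of its <Datei> element."""
    # Assumption: wrapping the content in <root> makes it well-formed XML.
    doc = minidom.parseString("<root>" + raw + "</root>")
    sections = {}
    for node in doc.getElementsByTagName("Datei"):
        title = node.getAttribute("Titel").strip()
        # Join all text children of the element and trim surrounding whitespace.
        text = "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)
        sections[title] = text.strip()
    return sections


def write_sections(sections, directory="."):
    """Write each section to '<title>.txt' inside the given directory."""
    for title, text in sections.items():
        with open(os.path.join(directory, title + ".txt"), "w") as out:
            out.write(text + "\n")
```

Calling `write_sections(split_sections(open('/path/to/file').read()))` would then produce one file per section. Titles are used as file names unchanged, so a title containing a path separator would need extra handling.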
answered 2012-12-22T09:55:48.510

If you want a quick and dirty way to do this without an XML parser (using one is the recommended approach), this will do the job:

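# Line-based scan: copy everything between the matching <Datei ...> tag
# and its closing </Datei> tag. Both files are closed when the block ends.
with open('path/to/input') as infile, open("Morgen 1.txt", 'w') as outfile:
    found = False
    for line in infile:
        if line.startswith("<Datei") and 'Titel="Morgen 1"' in line:
            found = True
            # The title follows the opening tag on the same line; keep it.
            outfile.write(line.split(">", 1)[1].strip() + "\n")
        elif line.startswith("</Datei"):
            found = False
        elif found:
            outfile.write(line)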
answered 2012-12-22T09:23:30.847

Try this. It works on the sample input:

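# Split on ">" so the pieces alternate between an opening tag and the
# text that follows it (ending in "</Datei").
# This assumes the text itself never contains "<" or ">".
with open("data.txt", "r") as fp:
    data = fp.read().split(">")

for i in range(0, len(data) - 1, 2):
    # Titel is expected to be the second attribute of the opening tag.
    filename = data[i].split('" ')[1].split('"')[1].strip()
    text = data[i + 1].split('<')[0].strip()

    with open(filename + ".txt", "w") as fp1:
        fp1.write(text)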
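
For comparison, the regex approach from the question can also be made to work. The snippet there fails for several reasons: the flag is spelled `re.DOTALL` (upper case), the pattern has an unbalanced parenthesis, and a `for` statement needs `in` rather than `=`, with `finditer` yielding match objects rather than tuples. The following is a minimal sketch of a corrected version, assuming the title is always stored in a `Titel` attribute; `DATEI_RE` and `extract_sections` are illustrative names.

```python
import re

# Group 1 is the Titel value without surrounding spaces; group 2 is
# everything between the opening tag and the closing </Datei>.
DATEI_RE = re.compile(
    r'<Datei[^>]*Titel="\s*([^"]*?)\s*"[^>]*>(.*?)</Datei>',
    re.DOTALL,
)


def extract_sections(raw):
    """Return a list of (title, text) pairs, one per <Datei> section."""
    return [(m.group(1), m.group(2).strip()) for m in DATEI_RE.finditer(raw)]


def write_sections(raw):
    """Write each section to '<title>.txt' in the current directory."""
    for title, text in extract_sections(raw):
        with open("%s.txt" % title, "w") as outf:
            outf.write(text + "\n")
```

Because the captured text starts right after the opening tag, each output file begins with the title line, as in the desired output for Morgen 1.txt.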
answered 2012-12-22T09:16:40.153