
There are two sentences in "test.txt":

sentence1 = A sentence is a grammatical unit consisting of one or more words.

sentence2 = A sentence can also be defined in orthographic terms alone.

count_line = 0
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    file = open('C:/Users/Desktop/test_words.txt', 'w+')
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
             count_word = count_word + 1
             print count_word, word
             file.write(str(count_word) + " " + word + '\n')
        file.close()

The result in my "test_words.txt" only shows the words from the second sentence:

1 A 
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

How can I also write the words of the first sentence to "test_words.txt", together with the words of the second sentence?

Any suggestions?


4 Answers


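In your code you open the output file again for every input line (and close it repeatedly). Because mode 'w+' truncates the file each time it is opened, every new open erases what you wrote for the previous sentence. The simple solution is to open the file only once and close it only once: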

count_line = 0
# Open outside the loop
file = open('C:/Users/Desktop/test_words.txt', 'w+')
for line in open('C:/Users/Desktop/test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    count_word = 0
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
# Also close outside the loop
file.close()
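For readers on Python 3, here is a minimal sketch of the same fix, assuming you want the same per-line numbering as the code above (numbering restarts at 1 on each line). The function name write_numbered_words and its parameters are hypothetical. Both files are opened exactly once with a with statement, so the output is truncated only a single time and is closed automatically:

```python
def write_numbered_words(src_path, dst_path):
    """Write each word of src_path to dst_path as '<n> <word>', restarting n at 1 on every line."""
    # Open both files once, outside the line loop, so 'w' truncates the output only once.
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            # str.split() with no argument splits on any whitespace, tabs and newlines included.
            for count_word, word in enumerate(line.split(), 1):
                dst.write("%d %s\n" % (count_word, word))
```

With the two sentences from the question, the output file ends up with 22 lines: 1 to 12 for the first sentence, then 1 to 10 for the second.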
answered 2012-12-17T18:42:31.920

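This happens because when you open the file the second time, you don't keep the text that was already in it. Opening a file for writing in Python (mode 'w' or 'w+') truncates it, so its previous contents are lost unless you read them into a variable first and write them back.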

Try this code:

count_line = 0
for n, line in enumerate(open('test.txt')):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
    # Close once per line, after all fields have been written
    file.close()

Here is the output when I run it:

1 A
2 sentence
3 is
4 a
5 grammatical
6 unit
7 consisting
8 of
9 one
10 or
11 more
12 words.
1 A
2 sentence
3 can
4 also
5 be
6 defined
7 in
8 orthographic
9 terms
10 alone.

Here is the code without enumerate():

count_line = 0
n = 0
for line in open('test.txt'):
    count_line = count_line +1
    fields = line.rstrip('\n').split('\t')
    ##print count_line, fields
    already_text = open('test_words.txt').read() if n > 0 else ''
    file = open('test_words.txt', 'w+')
    count_word = 0
    file.write(already_text)
    for words in fields:
        wordsplit = words.split()
        for word in wordsplit:
            count_word = count_word + 1
            print count_word, word
            file.write(str(count_word) + " " + word + '\n')
    # Close once per line, after all fields have been written
    file.close()
    n += 1
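The root cause this answer describes, that opening a file for writing erases whatever is already in it, can be seen in isolation with a small Python 3 sketch (the helper write_twice is hypothetical). Mode 'w' (like 'w+') truncates the file on every open; append mode 'a', which is an alternative this answer does not use, keeps the existing contents and writes at the end:

```python
import os
import tempfile

def write_twice(mode):
    """Open a fresh file twice with the given mode, write one line each time, return the final contents."""
    path = os.path.join(tempfile.mkdtemp(), "demo.txt")
    for text in ("first\n", "second\n"):
        # Each open() with 'w' or 'w+' truncates the file; 'a' appends to the end instead.
        with open(path, mode) as f:
            f.write(text)
    with open(path) as f:
        return f.read()

print(repr(write_twice("w")))  # only the second write survives
print(repr(write_twice("a")))  # both writes are kept
```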
answered 2012-12-17T18:27:15.000

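If possible, you should use the with statement when working with files: it is a context manager that makes sure each file is closed properly once you are done with it (which you signal by leaving the indented block). Here we use the optional start argument of enumerate, which is one of several ways to keep the counter running when moving on to the next line: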

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Set our counter
    counter = 1
    # Iterate through the file
    for line in f:
      # Strip out newlines and split on whitespace
      words = line.strip().split()
      # Start our enumeration, which will return the index (starting at 1) and
      # the word itself
      for index, word in enumerate(words, counter):
        # Write the word to the file
        o.write('{0} {1}\n'.format(index, word))
      # Increment the counter
      counter += len(words)
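The same running-counter idea can be sketched in Python 3 as a generator over a plain list of lines, so it can be tried without any files; number_words is a hypothetical name. enumerate(words, counter) starts each line's numbering where the previous line stopped, and counter += len(words) moves the counter past the words just emitted:

```python
def number_words(lines):
    """Yield (index, word) pairs, numbering words continuously across all lines, starting at 1."""
    counter = 1
    for line in lines:
        words = line.strip().split()
        # The second argument of enumerate is the first index used for this line.
        for index, word in enumerate(words, counter):
            yield index, word
        # Advance past the words of this line.
        counter += len(words)

lines = ["A sentence is a grammatical unit consisting of one or more words.\n",
         "A sentence can also be defined in orthographic terms alone.\n"]
for index, word in number_words(lines):
    print("{0} {1}".format(index, word))
```

Unlike the other answers, this numbers the second sentence 13 to 22 instead of restarting at 1.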

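Alternatively, if you want fewer lines: this version uses readlines() to read the file into a list with one item per line. Each line is then split on whitespace and every word is pulled out, so you are effectively iterating over a single stream of all the words in the file. Combined with enumerate, you don't need to increment a counter yourself, because enumerate does it for you: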

# Open the file
with open('test.txt', 'rb') as f:
  # Open the output (in Python 2.7+, this can be done on the same line)
  with open('test_words.txt', 'wb') as o:
    # Iterate through the file
    for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
      o.write('{0} {1}\n'.format(i, w))

With Python 2.7 or later, both files can be opened in a single with statement:

# Open the file
with open('test.txt', 'rb') as f, open('test_words.txt', 'wb') as o:
  # Iterate through the file
  for i, w in enumerate((x for l in f.readlines() for x in l.strip().split()), 1):
    o.write('{0} {1}\n'.format(i, w))
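A Python 3 sketch of the one-liner, with two changes that are not in the answer: it iterates over the file object directly instead of calling readlines(), so the lines are never collected into a list first, and it flattens the words with itertools.chain.from_iterable rather than a nested generator expression. It also uses text modes ('r' and 'w'), because in Python 3 writing a str to a file opened with 'wb' raises a TypeError. The function name write_all_words is hypothetical:

```python
from itertools import chain

def write_all_words(src_path, dst_path):
    """Number every word in src_path continuously (1, 2, 3, ...) and write '<n> <word>' lines to dst_path."""
    with open(src_path) as f, open(dst_path, "w") as o:
        # chain.from_iterable flattens the per-line word lists into one stream of words.
        words = chain.from_iterable(line.split() for line in f)
        for i, w in enumerate(words, 1):
            o.write("{0} {1}\n".format(i, w))
```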
answered 2012-12-17T18:47:02.407

This may not matter much, but I'd suggest writing it more concisely. You don't need three loops:

lines = open('test.txt').readlines()
file = open('test_words.txt', 'w+')
for line in lines:
  words = line.rstrip('\n').split()

  # Start numbering at 1 so the printed and written indices match
  for i, word in enumerate(words, 1):
    print i, word
    file.write('%d %s\n' % (i, word))
file.close()
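The middle loop can go because str.split() with no argument already splits on any run of whitespace, tabs and newlines included, so the question's split('\t') followed by split() yields the same words as a single split(). A small Python 3 check of that claim (the sample string is made up for illustration):

```python
line = "one\ttwo  three\tfour\n"

# The question's approach: split on tabs, then split each field on whitespace.
two_step = [word for field in line.rstrip("\n").split("\t") for word in field.split()]

# A single whitespace split produces the same list.
one_step = line.split()

print(two_step)
print(one_step)
```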
answered 2012-12-17T18:49:07.563