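I have a text file containing normal sentences. I was in a hurry when I typed it, so I only capitalized the first letter of the first word of each sentence (as English grammar requires).
But now I think it would be better if the first letter of every word were capitalized. Something like:
Each Word of This Sentence is Capitalized
Note that in the sentence above, of and is are not capitalized; I actually want to skip words that are three letters or shorter.
What should I do?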
for line in text_file:
    # Title-case only the words longer than three characters.
    print(' '.join(word.title() if len(word) > 3 else word for word in line.split()))
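Edit: to leave punctuation out of the count, replace len with the following function:
def letterlen(s):
    # Count only the alphabetic characters in s.
    return sum(c.isalpha() for c in s)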
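Putting the two pieces together might look like the sketch below (Python 3). The helper name `capitalize_long_words` and the sample sentence are assumptions made for illustration, not part of the original answer.

```python
def letterlen(s):
    # Count only alphabetic characters, so "yes." has length 3.
    return sum(c.isalpha() for c in s)


def capitalize_long_words(line):
    # Hypothetical helper: title-case words with more than three letters.
    return ' '.join(
        word.title() if letterlen(word) > 3 else word
        for word in line.split()
    )


print(capitalize_long_words("each word of this sentence is capitalized, yes."))
```

One thing to watch for: str.title() capitalizes the letter after an apostrophe, so "don't" becomes "Don'T". If that matters for your text, str.capitalize() may be the safer choice.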
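Take a look at NLTK.
Tokenize each word, then capitalize it. Words such as "if" and "of" are called "stop words". If your criterion is just length, Steven's answer is a good way to do it. If you want to look up stop words instead, there is a similar question on SO: How to remove stop words using nltk or python.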
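As a rough sketch of the stop-word idea, the snippet below uses a small hand-picked set so that it runs without any downloads. The set contents and the function name `capitalize_non_stop_words` are assumptions; if NLTK and its stopwords data are installed, nltk.corpus.stopwords.words('english') could supply a fuller list.

```python
# Hypothetical, hand-picked stop words for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "is", "if", "in", "on", "and", "or"}


def capitalize_non_stop_words(line):
    # Capitalize every word unless it appears in the stop-word set.
    return ' '.join(
        w if w.lower() in STOP_WORDS else w.capitalize()
        for w in line.split()
    )


print(capitalize_non_stop_words("each word of this sentence is capitalized"))
```

Note that a real stop-word list will not match the length rule exactly: NLTK's English list, for example, includes longer words such as "this", which the length rule would capitalize.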
You should split the words and capitalize only those that are longer than three letters.
words.txt:
each word of this sentence is capitalized
some more words
an other line
import string

with open('words.txt') as file:
    # List to store the capitalised lines.
    lines = []
    for line in file:
        # Split words by spaces.
        words = line.split(' ')
        for i, word in enumerate(words):
            if len(word.strip(string.punctuation + string.whitespace)) > 3:
                # Capitalise and replace words longer than 3 (without punctuation).
                words[i] = word.capitalize()
        # Join the capitalised words with spaces.
        lines.append(' '.join(words))
    # Join the capitalised lines.
    capitalised = ''.join(lines)

# Optionally, write the capitalised words back to the file.
with open('words.txt', 'w') as file:
    file.write(capitalised)
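If keeping the original spacing and punctuation exactly as typed matters, a regular-expression substitution is another option. This is a different technique from the split-and-join loop above, shown only as a sketch: the function name `capitalize_long` and the `min_len` parameter are assumptions, and the pattern only matches ASCII letters.

```python
import re


def capitalize_long(text, min_len=4):
    # Rewrite each run of ASCII letters; shorter words are left untouched.
    def fix(match):
        word = match.group(0)
        return word.capitalize() if len(word) >= min_len else word

    return re.sub(r"[A-Za-z]+", fix, text)


sample = "each word of this sentence is capitalized\nsome more words\n"
print(capitalize_long(sample), end="")
```

Because only the matched letters are replaced, newlines, repeated spaces and punctuation pass through unchanged.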
What you really want is something called a stop word list. If you don't have such a list, you can build one yourself and do the following:
skipWords = set("of is".split())
punctuation = '.,<>{}][()\'"/\\?!@#$%^&*'  # and any other punctuation that you want to strip out

def capitalize_file(filepath):
    answer = ""
    with open(filepath) as f:
        for line in f:
            for word in line.split():
                for p in punctuation:
                    # You end up losing the punctuation in the output, but this is easy to fix if you really care about it.
                    word = word.replace(p, '')
                if word not in skipWords:
                    answer += word.title() + " "
                else:
                    answer += word + " "
    return answer  # or you can write it to a file continuously
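The comment in the code above notes that the punctuation gets lost. One possible way to keep it, sketched here under assumptions (the function name `title_keep_punctuation` is hypothetical, and the skip list is the same two-word example), is to test a stripped copy of each word but emit the original token:

```python
import string

skip_words = {"of", "is"}  # same illustrative stop list as in the answer


def title_keep_punctuation(line):
    out = []
    for word in line.split():
        # Compare without surrounding punctuation, but keep the original token.
        core = word.strip(string.punctuation)
        out.append(word if core.lower() in skip_words else word.title())
    return ' '.join(out)


print(title_keep_punctuation('this is, "of" course.'))
```

This only strips punctuation at the edges of a word, so tokens with punctuation in the middle are compared as they are.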
You can add all the words in the text file to a list:
words = []
with open('textdocument.txt') as f:
    for line in f:
        # Split each line into words and collect them.
        words.extend(line.split())
Once all the words are in a list, run a for loop that checks the length of each one; if it is longer than three letters, add it with its first letter capitalized:
new_list = []
for item in words:
    if len(item) > 3:
        # str.title() returns a new string, so append its result.
        new_list.append(item.title())
    else:
        # Words of three letters or fewer are added to the new list unchanged.
        new_list.append(item)
See if this works?