There are 5404 entries in "noslang.txt". For example:

...
2mz   tomorrow
2night   tonight
2nite   tonight
soml   story of my life
ssry   so sorry
...

In "test.txt":

ya right
i'll attend the class
2morow will b great 

My code:

 NoSlang = open("noslang.txt")
 for line in NoSlang:
      slang,fulltext = map(str, line.split('\t'))
      dic[slang] = fulltext.strip('\n')


 file = open('test.txt').read().split("\n")
 for line in file:
     sline = line.split(" ")
     for n,i in enumerate(sline):
         if i in dic:
             sline[n] = dic[i]
     print ' '.join(sline)

I tried to build a dictionary from "noslang.txt" and use it to replace the slang words in each sentence from "test.txt". The output is identical to the input; nothing changes.

Any suggestions?

Expected results:

 yeah right
 i'll attend the class
 tomorrow will be great

2 Answers


You could use a regular expression to replace the words in the file:

#!/usr/bin/env python
import re
from functools import partial

with open('noslang.txt') as file:
    # slang word -> translation
    slang_map = dict(map(str.strip, line.partition('\t')[::2])
                     for line in file if line.strip())

slang_words = sorted(slang_map, key=len, reverse=True) # longest first for regex
regex = re.compile(r"\b({})\b".format("|".join(map(re.escape, slang_words))))
substitute_slang = partial(regex.sub, lambda m: slang_map[m.group(1)])

with open('input.txt') as file:
    for line in file:
        print substitute_slang(line),

If input.txt is not very large, you can replace all the slang at once:

with open('input.txt') as file:
    print substitute_slang(file.read()),
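
For readers on Python 3, the same regex idea can be sketched as a small function that works on strings rather than files. This is only an illustration: `build_substituter` and the `sample_map` entries are hypothetical names and data, not part of the answer above, and the real mapping would still be loaded from noslang.txt.

```python
import re

def build_substituter(slang_map):
    """Return a function that expands every whole-word slang term in a string."""
    # Longest terms first, so a longer term wins over a shorter one that is its prefix.
    words = sorted(slang_map, key=len, reverse=True)
    pattern = re.compile(r"\b({})\b".format("|".join(map(re.escape, words))))
    return lambda text: pattern.sub(lambda m: slang_map[m.group(1)], text)

# Hypothetical sample entries standing in for the contents of noslang.txt.
sample_map = {"ya": "yeah", "2morow": "tomorrow", "b": "be", "2nite": "tonight"}
substitute_slang = build_substituter(sample_map)

print(substitute_slang("ya right"))
print(substitute_slang("2morow will b great"))
print(substitute_slang("maybe, ya!"))
```

Because the pattern is anchored with `\b`, a term is only replaced when it stands as a whole word, so the "b" inside "maybe" is left alone while "ya!" next to punctuation is still expanded. One caveat: `\b` relies on word characters, so slang entries that begin or end with punctuation may not match as expected.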
answered 2013-06-24T17:22:12.793

Something like this:

with open('noslang.txt') as f:
    dic = dict(line.strip().split(None,1) for line in f)
with open('test.txt') as f:
    for line in f:                                             
        spl = line.split()
        new_lis =[dic.get(word,word) for word in spl]
        print " ".join(new_lis)
Output:
yeah right
i'll attend the class
tomorrow will b great

where noslang.txt contains:

ya   yeah
2morow   tomorrow 
2mz   tomorrow
2night   tonight
2nite   tonight
2nyt   tonight
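
A Python 3 sketch of the same dictionary-lookup approach, written as two small functions so it can be tried without any files. The names `load_slang` and `expand_line` and the sample lines are assumptions made for illustration. Splitting with `split(None, 1)` accepts either a tab or a run of spaces as the separator, whereas `split('\t')` in the question requires a literal tab, and it keeps multi-word expansions such as "story of my life" intact.

```python
def load_slang(lines):
    """Build a slang -> expansion dict from 'slang<whitespace>expansion' lines."""
    mapping = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace only
        if len(parts) == 2:                  # skip blank or malformed lines
            mapping[parts[0]] = parts[1]
    return mapping

def expand_line(line, mapping):
    """Replace each whitespace-separated token that is a known slang term."""
    return " ".join(mapping.get(word, word) for word in line.split())

# Hypothetical sample lines standing in for noslang.txt (tab or spaces both work).
sample = ["ya   yeah\n", "2morow\ttomorrow \n", "soml   story of my life\n", "\n"]
dic = load_slang(sample)

print(expand_line("ya right", dic))
print(expand_line("2morow will b great", dic))
```

Note that this version compares whole tokens, so a slang word with punctuation attached (for example "ya,") would not be found in the dictionary; and "b" stays unchanged here only because the sample data has no entry for it.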
answered 2013-06-24T14:59:13.670