I have to replace North, South, etc. with N, S, etc. in address fields.
If I have
list = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"
Can I iterate through my dictionary values to replace my address field?
for dir in list[]:
address.upper().replace(key,value)
I know I'm not even close! But any input on whether you can use dictionary values like this would be appreciated.
address = "123 north anywhere street"
for word, initial in {"NORTH":"N", "SOUTH":"S" }.items():
address = address.replace(word.lower(), initial)
print(address)
This works fine too, and it is short and readable.
You are close, actually:
dictionary = {"NORTH":"N", "SOUTH":"S" }
for key in dictionary.iterkeys():
address = address.upper().replace(key, dictionary[key])
Note: for Python 3 users, you should use .keys() instead of .iterkeys():
dictionary = {"NORTH":"N", "SOUTH":"S" }
for key in dictionary.keys():
address = address.upper().replace(key, dictionary[key])
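One option I don't think anyone has suggested yet is to build a regular expression containing all of the keys and then simply do a single replacement on the string: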
>>> import re
>>> l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
>>> pattern = '|'.join(sorted(re.escape(k) for k in l))
>>> address = "123 north anywhere street"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address, flags=re.IGNORECASE)
'123 N anywhere street'
>>>
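The advantage of this is that the regular expression can ignore the case of the input string without modifying it.
If you only want to operate on whole words, you can do that too with a simple modification of the pattern: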
>>> pattern = r'\b({})\b'.format('|'.join(sorted(re.escape(k) for k in l)))
>>> address2 = "123 north anywhere southstreet"
>>> re.sub(pattern, lambda m: l.get(m.group(0).upper()), address2, flags=re.IGNORECASE)
'123 N anywhere southstreet'
You are probably looking for iteritems() (Python 2; in Python 3 use items()):
d = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"
for k,v in d.iteritems():
address = address.upper().replace(k, v)
address is now '123 N ANYWHERE STREET'
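Well, if you want to preserve case, whitespace, and nested words (e.g. Southstreet should not be converted to Sstreet), consider using this simple generator expression: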
import re
l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "North 123 East Anywhere Southstreet West"
new_address = ''.join(l[p.upper()] if p.upper() in l else p for p in re.split(r'(\W+)', address))
new_address is now
N 123 E Anywhere Southstreet W
"Translating" a string with a dictionary is a very common requirement. Here is a function you might want to keep in your toolkit:
def translate(text, conversion_dict, before=None):
"""
Translate words from a text using a conversion dictionary
Arguments:
text: the text to be translated
conversion_dict: the conversion dictionary
before: a function to transform the input
(by default it converts the text to lowercase)
"""
# if empty:
if not text: return text
# preliminary transformation:
before = before or str.lower
t = before(text)
for key, value in conversion_dict.items():
t = t.replace(key, value)
return t
Then you can write:
>>> a = {'hello':'bonjour', 'world':'tout-le-monde'}
>>> translate('hello world', a)
'bonjour tout-le-monde'
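The before parameter accepts any callable that maps a string to a string. As a sketch applied to the original address question, passing str.upper makes the input match the upper-case keys (so the whole result comes out upper case). The translate definition is repeated here so the block runs on its own.

```python
def translate(text, conversion_dict, before=None):
    """Translate words from a text using a conversion dictionary."""
    if not text:
        return text
    # Default preliminary transformation is lowercasing
    before = before or str.lower
    t = before(text)
    for key, value in conversion_dict.items():
        t = t.replace(key, value)
    return t

directions = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}
# Upper-case the input so it matches the upper-case keys
result = translate("123 north anywhere street", directions, before=str.upper)
print(result)  # 123 N ANYWHERE STREET
```

Note that this still uses plain substring replacement, so it inherits the same sub-word caveat as the other str.replace answers.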
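I recommend using regular expressions instead of a simple replace. With replace, you risk replacing parts of words, which may not be what you want.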
import json
import re
with open('filePath.txt') as f:
data = f.read()
with open('filePath.json') as f:
glossar = json.load(f)
for word, initial in glossar.items():
    # re.escape keeps keys containing regex metacharacters (., +, etc.) literal
    data = re.sub(r'\b' + re.escape(word) + r'\b', initial, data)
print(data)
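To illustrate the risk described above, here is a small comparison on a made-up address (the sample string is hypothetical, not from the question): plain str.replace also rewrites the SOUTH inside SOUTHSTREET, while a pattern wrapped in \b word boundaries only touches the stand-alone word.

```python
import re

directions = {'NORTH': 'N', 'SOUTH': 'S', 'EAST': 'E', 'WEST': 'W'}
address = "123 SOUTH SOUTHSTREET"  # hypothetical sample input

# Plain substring replacement: also hits the "SOUTH" inside "SOUTHSTREET"
plain = address
for word, initial in directions.items():
    plain = plain.replace(word, initial)

# Whole-word replacement: \b anchors each match at word boundaries
bounded = address
for word, initial in directions.items():
    bounded = re.sub(r'\b' + re.escape(word) + r'\b', initial, bounded)

print(plain)    # 123 S SSTREET
print(bounded)  # 123 S SOUTHSTREET
```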
def replace_values_in_string(text, args_dict):
for key in args_dict.keys():
text = text.replace(key, str(args_dict[key]))
return text
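A possible usage of the function above, with hypothetical placeholder names: because each value goes through str(), non-string values such as integers can be substituted directly.

```python
def replace_values_in_string(text, args_dict):
    # Replace every occurrence of each key with the str() of its value
    for key in args_dict.keys():
        text = text.replace(key, str(args_dict[key]))
    return text

# ORDER_ID and CITY are made-up placeholders for illustration
template = "Order ORDER_ID ships to CITY"
filled = replace_values_in_string(template, {"ORDER_ID": 42, "CITY": "Boston"})
print(filled)  # Order 42 ships to Boston
```

Keys are replaced one after another, so a key that appears inside another key or inside an earlier replacement value can still be rewritten.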
Try:
import re
l = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
address = "123 north anywhere street"
for k, v in l.items():
t = re.compile(re.escape(k), re.IGNORECASE)
address = t.sub(v, address)
print(address)
Using replace() and format() is not as precise:
data = '{content} {address}'
for k,v in {"{content}":"some {address}", "{address}":"New York" }.items():
data = data.replace(k,v)
# results: some New York New York
'{ {content} {address}'.format(**{'content':'str1', 'address':'str2'})
# results: ValueError: unexpected '{' in field name
If you need exact positions, translating with re.sub() is better:
import re
def translate(text, kw, ignore_case=False):
    search_keys = map(re.escape, kw.keys())
if ignore_case:
kw = {k.lower():kw[k] for k in kw}
regex = re.compile('|'.join(search_keys), re.IGNORECASE)
res = regex.sub( lambda m:kw[m.group().lower()], text)
else:
regex = re.compile('|'.join(search_keys))
res = regex.sub( lambda m:kw[m.group()], text)
return res
# %-formatting fails on this string (the bare '%' in '99.5%' raises ValueError):
# 'score: 99.5% name:%(name)s' % {'name': 'foo'}
res = translate( 'score: 99.5% name:{name}', {'{name}':'foo'})
print(res)
res = translate( 'score: 99.5% name:{NAME}', {'{name}':'foo'}, ignore_case=True)
print(res)
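All of these answers are good, but you are missing Python string substitution. It is simple and fast, but it requires your string to be formatted correctly.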
address = "123 %(direction)s anywhere street"
print(address % {"direction": "N"})
If you are looking for a concise approach, you can use reduce from functools:
from functools import reduce
str_to_replace = "The string for replacement."
replacement_dict = {"The ": "A new ", "for ": "after "}
# fold over the (old, new) pairs, starting from the original string
str_replaced = reduce(lambda s, kv: s.replace(*kv), replacement_dict.items(), str_to_replace)
print(str_replaced)
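The advantage of Duncan's method is that it is careful not to overwrite earlier replacements. For example, given {"Shirt": "Tank Top", "Top": "Sweater"}, the other methods turn "Shirt" into "Tank Sweater".
The following code extends that approach, but sorts the keys so that the longest key is always matched first, and it uses named groups with a case-insensitive search.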
import re
root_synonyms = {'NORTH':'N','SOUTH':'S','EAST':'E','WEST':'W'}
# put the longest search term first. This means the system does not replace "top" before "tank top"
synonym_keys = sorted(root_synonyms.keys(), key=len, reverse=True)
# the groups will be named w0, w1, ... . Determine what each of them should become
number_mapping = {f'w{i}': root_synonyms[key] for i, key in enumerate(synonym_keys)}
# make a regex for each term where "tank top" and "tank   top" match the same:
# escape each word separately and join the words with \s+
search_terms = [r'\s+'.join(re.escape(part) for part in k.split()) for k in synonym_keys]
# wrap each search term in a named group w0, w1, ... with word boundaries
search_terms = [f'(?P<w{i}>\\b{key}\\b)' for i, key in enumerate(search_terms)]
# make one huge regex
search_terms = '|'.join(search_terms)
# compile it for speed
search_re = re.compile(search_terms, re.IGNORECASE)
query = "123 north anywhere street"
result = search_re.sub(lambda x: number_mapping[x.lastgroup], query)
print(result)
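To make the "Shirt"/"Top" problem concrete, this sketch contrasts chained str.replace calls with a single regex pass. It is a simplified version of the idea (no named groups, case-sensitive), not the full code above: re.sub never rescans text it has already substituted, so "Tank Top" is not turned into "Tank Sweater".

```python
import re

synonyms = {"Shirt": "Tank Top", "Top": "Sweater"}
text = "Shirt and Top"

# Chained str.replace: the output of one replacement is fed to the next
chained = text
for key, value in synonyms.items():
    chained = chained.replace(key, value)

# Single pass: one alternation of all keys, longest first, each match replaced once
keys = sorted(synonyms, key=len, reverse=True)
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keys) + r')\b')
single = pattern.sub(lambda m: synonyms[m.group(0)], text)

print(chained)  # Tank Sweater and Sweater
print(single)   # Tank Top and Sweater
```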