假设我有 10 个不同的标记,“(TOKEN)”在一个字符串中。如何用其他字符串替换随机选择的 2 个标记,而其他标记保持不变?
问问题
760 次
7 回答
2
>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'
函数形式:
>>> def random_replace(text, token, replace, num_replacements):
num_tokens = text.count(token)
points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'
测试:
>>> for i in range(0,9):
print random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i)
....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....
于 2012-04-08T05:52:01.150 回答
1
有很多方法可以做到这一点。我的方法是编写一个函数,该函数采用原始字符串、令牌字符串和一个函数,该函数返回原始令牌中出现的令牌的替换文本:
def strByReplacingTokensUsingFunction(original, token, function):
outputComponents = []
matchNumber = 0
unexaminedOffset = 0
while True:
matchOffset = original.find(token, unexaminedOffset)
if matchOffset < 0:
matchOffset = len(original)
outputComponents.append(original[unexaminedOffset:matchOffset])
if matchOffset == len(original):
break
unexaminedOffset = matchOffset + len(token)
replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
outputComponents.append(replacement)
matchNumber += 1
return ''.join(outputComponents)
(你当然可以改变它以使用更短的标识符。我的风格比典型的 Python 风格更冗长。)
鉴于该功能,很容易替换十分之二的随机事件。这是一些示例输入:
sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
random 模块有一种方便的方法可以从总体中选择随机项目(不是两次选择相同的项目):
import random
replacementIndexes = random.sample(range(10), 2)
然后我们可以使用上面的函数来替换随机选择的事件:
sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
(lambda matchNumber, token, **keywords:
'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print sampleOutput
这是一些测试输出:
a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k
这是另一个运行:
a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k
于 2012-04-08T04:20:04.520 回答
1
如果您恰好需要两个,那么:
- 检测令牌(保留一些指向它们的链接,例如字符串的索引)
- 随机选择两个(
random.choice
) - 更换它们
于 2012-04-08T03:31:45.053 回答
1
你到底想做什么?一个好的答案将取决于...
也就是说,想到的蛮力解决方案是:
- 将这 10 个标记存储在一个数组中,例如 tokens[0] 是第一个标记,tokens[1] 是第二个标记,……以此类推
- 创建一个字典,将每个唯一的“(TOKEN)”与两个数字相关联:start_idx、end_idx
- 编写一个小解析器,遍历您的字符串并查找 10 个标记中的每一个。每当找到一个时,在该标记出现的字符串中记录开始/结束索引(如 start_idx、end_idx)。
- 完成解析后,生成范围 [0,9] 内的随机数。让我们称之为R
- 现在,你的随机 "(TOKEN)" 是 tokens[ R ];
- 使用步骤(3)中的字典查找字符串中的start_idx、end_idx值;用“其他字符串”替换那里的文本
于 2012-04-08T03:35:44.777 回答
1
我的代码解决方案:
import random
s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2
def random_replace(s, replace_from, replace_to, amount_to_replace):
parts = s.split(replace_from)
indices = random.sample(xrange(len(parts) - 1), amount_to_replace)
replaced_s_parts = list()
for i in xrange(len(parts)):
replaced_s_parts.append(parts[i])
if i < len(parts) - 1:
if i in indices:
replaced_s_parts.append(replace_to)
else:
replaced_s_parts.append(replace_from)
return "".join(replaced_s_parts)
#TEST
for i in xrange(5):
print random_replace(s, replace_from, replace_to, 2)
解释:
- 使用将字符串拆分为几个部分
replace_from
- 使用 选择要替换的标记索引
random.sample
。此返回的列表包含唯一编号 - 为字符串重建构建一个列表,用生成的索引替换标记
replace_to
。 - 将所有列表元素连接成单个字符串
于 2012-04-08T03:47:59.370 回答
1
试试这个解决方案:
import random
def replace_random(tokens, eqv, n):
random_tokens = eqv.keys()
random.shuffle(random_tokens)
for i in xrange(n):
t = random_tokens[i]
tokens = tokens.replace(t, eqv[t])
return tokens
假设存在带有标记的字符串,并且可以通过替换每个标记来构造合适的等价表:
tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'
equivalences = {
'(TOKEN1)' : 'REPLACEMENT1',
'(TOKEN2)' : 'REPLACEMENT2',
'(TOKEN3)' : 'REPLACEMENT3',
'(TOKEN4)' : 'REPLACEMENT4',
'(TOKEN5)' : 'REPLACEMENT5',
'(TOKEN6)' : 'REPLACEMENT6',
'(TOKEN7)' : 'REPLACEMENT7',
'(TOKEN8)' : 'REPLACEMENT8',
'(TOKEN9)' : 'REPLACEMENT9',
'(TOKEN10)' : 'REPLACEMENT10'
}
你可以这样称呼它:
replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'
于 2012-04-08T04:04:11.093 回答
0
from random import sample
mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'
def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=10, index=0):
choices = sorted(sample(xrange(tokens),n_repl))
for i in xrange(choices[-1]+1):
index = mystr.index(substr, index) + 1
if i in choices:
mystr = mystr[:index-1] + mystr[index-1:].replace(substr,replacement,1)
return mystr
print replace(mystr,'(TOKEN)',2)
于 2012-04-08T05:57:50.753 回答