Suppose I have a dictionary:

mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
          "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
          "g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
          "g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
          "g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
          "g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
          "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,
          "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
          "h18_84pp_3A_MVP3_GoodiesT1-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 9,
          "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}
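  1. I want to extract the common part g18_84pp_2A_MVP_GoodiesT0 that comes before the first -.

  2. I also want to append _MIX after g18_84pp_2A_MVP_GoodiesT0 when a particular word, MIX, is found in the first group. Assuming I can classify the keys of mydict into two groups based on whether they contain MIX or FIX, the final output dictionary would be: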

OutputNameDict= {"g18_84pp_2A_MVP_GoodiesT0_MIX" : 0,
                  "h18_84pp_3A_MVP_GoodiesT1_FIX" : 1,
                  "p18_84pp_2B_MVP_FIX": 2}

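Is there a function I can use to find the common part? How do I extract the word before or after a particular symbol such as - and find a particular word such as MIX or FIX?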

3 Answers

You can use split to get the common part:

s = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"
n = s.split('-')[0]

In fact, split gives you a list of every token separated by '-', so s.split('-') yields:

['g18_84pp_2A_MVP1_GoodiesT0', 'HKJ', 'DFG_MIX', 'CMVP1_Y1000', 'MIX.txt']

To check whether MIX or FIX is in the string, you can use in:

if 'MIX' in s:
    print("then MIX is in the string s")
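
One caveat with a plain in test: some of the keys in this thread contain DFG_MIX in the middle but end in FIX.txt, so 'MIX' in s is true for them as well. A minimal sketch of a stricter check that only looks at the last '-'-separated token follows; suffix_tag is a hypothetical helper name, not something from the question.

```python
import os

def suffix_tag(filename):
    """Return the last '-'-separated token without its extension, e.g. 'MIX'."""
    stem, _ext = os.path.splitext(filename)   # drop the '.txt' extension
    return stem.rsplit('-', 1)[-1]            # text after the final '-'

s = "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt"
print('MIX' in s)       # True, because of 'DFG_MIX' in the middle
print(suffix_tag(s))    # FIX
```

This assumes the tag you care about is always the final token before the extension, as it is in every key shown in the question.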

If you want to get rid of the digits after 'MVP', you can use the re module:

import re
s = 'g18_84pp_2A_MVP1_GoodiesT0'
s = re.sub('MVP[0-9]*','MVP',s)

Here is an example function that returns a list of the common parts:

def foo(mydict):
    return [re.sub('MVP[0-9]*', 'MVP', k.split('-')[0]) for k in mydict]
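
Note that a function like foo returns one entry per key, so the same common part shows up several times. As a rough sketch, you could collapse the duplicates with set and sort the result; common_parts and unique_parts are illustrative names, and the three keys are a shortened sample of the question's dictionary.

```python
import re

def common_parts(keys):
    # Same idea as foo: text before the first '-', with digits after 'MVP' removed
    return [re.sub(r'MVP[0-9]*', 'MVP', k.split('-')[0]) for k in keys]

keys = [
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt",
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt",
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt",
]

# set() drops the repeated entries, sorted() gives a stable order
unique_parts = sorted(set(common_parts(keys)))
print(unique_parts)
# ['g18_84pp_2A_MVP_GoodiesT0', 'h18_84pp_3A_MVP_GoodiesT1']
```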
answered 2013-07-17T14:10:49.487

You can use the index() function to find the dash and then, knowing its position, slice out the part of the string that comes after it. For example,

mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
          "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
          "g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
          "g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
          "g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
          "g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
          "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 6,
          "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt" : 7,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP2_Y1000-FIX.txt" : 8,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP3_Y1000-FIX.txt" : 9}

for value in sorted(mydict):  # iterating a dict yields its keys
    index = value.index('-')
    extracted = value[index+1:-4]  # Everything after the first - with .txt removed from the end
    print(extracted[-3:])  # The last 3 letters of the string

This prints the following:

MIX
MIX
MIX
MIX
MIX
MIX
MIX
FIX
FIX
FIX

An if statement can then be used to do what you want.
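
For instance, the if statement could sort the keys into two groups. This is only a sketch under the assumption that every key ends in MIX.txt or FIX.txt; the groups dictionary and the shortened keys list are made up for illustration.

```python
keys = [
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt",
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt",
    "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP2_Y1000-FIX.txt",
]

groups = {"MIX": [], "FIX": []}
for value in sorted(keys):
    index = value.index('-')
    extracted = value[index + 1:-4]   # same slice as in the loop above
    ending = extracted[-3:]           # 'MIX' or 'FIX'
    if ending == "MIX":
        groups["MIX"].append(value)
    elif ending == "FIX":
        groups["FIX"].append(value)
    else:
        # Assumption: anything else is unexpected input
        raise ValueError("unexpected ending in %r" % value)

print(len(groups["MIX"]), len(groups["FIX"]))   # 1 2
```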

If you only want to extract the common part:

index = value.index('-')
extracted = value[:index] # Will get g18_84pp_2A_MVP1_GoodiesT0

Next, work out which ending to use. If you know the keys of mydict always end in MIX.txt or FIX.txt, you can do this:

for value in sorted(mydict):
    ending = value[-7:-4]  # MIX or FIX, the 3 letters just before .txt
    index = value.index('-')
    extracted = value[:index]
    print("%s_%s" % (extracted, ending))

which prints

g18_84pp_2A_MVP1_GoodiesT0_MIX
g18_84pp_2A_MVP2_GoodiesT0_MIX
g18_84pp_2A_MVP3_GoodiesT0_MIX
g18_84pp_2A_MVP4_GoodiesT0_MIX
g18_84pp_2A_MVP5_GoodiesT0_MIX
g18_84pp_2A_MVP6_GoodiesT0_MIX
g18_84pp_2A_MVP7_GoodiesT0_MIX
h18_84pp_3A_MVP1_GoodiesT1_FIX
h18_84pp_3A_MVP2_GoodiesT1_FIX
h18_84pp_3A_MVP2_GoodiesT1_FIX

Then add that to the extracted dictionary.
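
A possible way to do that last step is sketched below. It additionally strips the digits after MVP with re.sub so that MVP1, MVP2, ... collapse into one name, which is an extra step borrowed from the regex approach and not part of the loop above. extract_dict and the shortened keys list are illustrative names and data.

```python
import re

keys = [
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt",
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt",
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt",
    "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt",
]

extract_dict = {}
for value in sorted(keys):
    ending = value[-7:-4]                      # 'MIX' or 'FIX'
    extracted = value[:value.index('-')]       # text before the first '-'
    extracted = re.sub(r'MVP[0-9]+', 'MVP', extracted)  # optional: fold MVP1, MVP2, ...
    name = "%s_%s" % (extracted, ending)
    # setdefault only inserts when the name is new, so each distinct
    # name gets the next consecutive index in first-seen order
    extract_dict.setdefault(name, len(extract_dict))

for name in sorted(extract_dict):
    print(name, extract_dict[name])
```

With this sample the names come out as g18_84pp_2A_MVP_GoodiesT0_MIX, h18_84pp_3A_MVP_GoodiesT1_FIX and p18_84pp_2B_MVP_GoodiesT2_FIX, numbered 0 to 2.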

answered 2013-07-17T14:20:10.583
Thanks for the answers. My complete code is below. Any suggestions for optimizing it?

import re

mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
          "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
          "g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
          "g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
          "g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
          "g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,    
          "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,    
          "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
          "h18_84pp_3A_MVP3_GoodiesT1-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 9,
          "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}

ExtractDict = {}
start = 0
for stringList in sorted(mydict):
    stringList = stringList.split('.')[0]  # drop the .txt extension
    underscore = stringList.split('_')
    Area = re.split('[0-9]+', underscore[3])[0]  # MVP, etc.
    CaseNameString = underscore[0] + "_" + underscore[1] + "_" + underscore[2] + "_" + Area  # g18_84pp_2A_MVP, etc.
    postfix = stringList.split('-')[4]  # MIX or FIX
    Newstring = CaseNameString + "_" + postfix
    ExtractDict[Newstring] = start
    start += 1
startagain = 0
OutputNameDict = {}
for OutputNameList in sorted(ExtractDict):
    OutputNameDict[OutputNameList] = startagain
    startagain += 1

# OutputNameDict = {'h18_84pp_3A_MVP_FIX': 1, 'p18_84pp_2B_MVP_FIX': 2, 'g18_84pp_2A_MVP_MIX': 0}
answered 2013-07-18T15:44:29.343