python - Python: From 2 Strings Delete Dissimilar Lines

Question

I'm using Python to look at String A and String B.

String A contains only words, one per line (separated by the \n newline character).

Next, I have String B, which contains lots of words, some of which are found in String A and others that are not. I would like to retain only the words in String B that are also in String A. The only complication is that each word in String B is followed by other fields that I would also like to retain.

Example:

String_A='apple\nbanana\nkiwi\npear'
String_B='cow|0.0|0.25|apple|0.0|0.99|pig|0.0|horse|0.2|banana|0.0|dog|0.2|kiwi|0.25|'

I would like String_C to have an end format of:

String_C='apple|0.0|0.99|banana|0.0|kiwi|0.25|'

Please see if you can assist! Thanks.

score 0 · Accepted Answer

If there are always exactly two values after each word in String_B, you can do the following:

import itertools

def foo(stringA, stringB):
    # Words to keep, one per line in stringA
    sawords = frozenset(stringA.split('\n'))
    sbparts = stringB.split('|')
    # Chunk the fields into [word, value, value] groups
    sbgroups = [sbparts[i:i+3] for i in range(0, len(sbparts), 3)]
    filtered = [group for group in sbgroups if group[0] in sawords]
    return '|'.join(itertools.chain.from_iterable(filtered))
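To make the fixed-width assumption concrete, here is a minimal sketch of the same chunking idea run on made-up input where every word really is followed by exactly two values. The name `filter_fixed_groups`, the `group_size` parameter, and the sample strings are illustrative assumptions, not part of the answer above.

```python
import itertools

def filter_fixed_groups(string_a, string_b, group_size=3):
    """Keep the fixed-size groups of string_b whose first field is a word in string_a."""
    wanted = frozenset(string_a.split('\n'))
    parts = string_b.split('|')
    # Chunk the fields into consecutive groups of group_size
    groups = [parts[i:i + group_size] for i in range(0, len(parts), group_size)]
    kept = [group for group in groups if group[0] in wanted]
    return '|'.join(itertools.chain.from_iterable(kept))

# Hypothetical input: every word is followed by exactly two values
sample_a = 'apple\nkiwi'
sample_b = 'cow|0.0|0.25|apple|0.0|0.99|kiwi|0.25|0.5'
print(filter_fixed_groups(sample_a, sample_b))  # apple|0.0|0.99|kiwi|0.25|0.5
```

With the String_B from the question, where the number of values per word varies, the chunks drift out of alignment after the pig group and only the apple group survives, so this approach fits only strictly regular data.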

score 0 · Accepted Answer


'|'.join([elem for elem in String_A.split('\n') if elem in String_B.split('|')])

answered 2012-08-13T08:14:38.783

score 0 · Accepted Answer

This isn't the best implementation, but it works:

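a = String_A.split('\n')
b = String_B.split('|')
c = []
for word in a:
    try:
        # index() finds the first occurrence only and raises ValueError if absent
        found = b.index(word)
    except ValueError:
        continue
    c.append(b[found])
    found += 1
    # Copy the following fields while they look numeric (digits, '.' and '-')
    while found < len(b) and all(ch.isdigit() for ch in b[found] if ch not in '.-'):
        c.append(b[found])
        found += 1
c = '|'.join(c)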

score 0 · Accepted Answer

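This approach doesn't care what the name fields look like: they can contain digits, "-", and "." as long as they also contain something else. Instead, the function uses the re module to test for non-name fields. If you want to allow other characters in the non-name fields, you can modify the regex. I made some changes to String_B to check other non-decimal characters.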

import re
import itertools

def filter_strings(stra, strb):
    splita = stra.split("\n")
    splitb = strb.split("|")
    bnestlist = []
    sublist = []

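    for segment in splitb:
        # A field is a value only if it consists entirely of digits, "." and "-"
        if re.fullmatch(r"[\d.-]+", segment):
            sublist.append(segment)
        else:
            # Any other field starts a new group
            if sublist: bnestlist.append(sublist)
            sublist = [segment]
    # Flush the final group, which is otherwise lost when strb has no trailing "|"
    if sublist: bnestlist.append(sublist)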

    filtered = [group for group in bnestlist if group[0] in splita]
    return "|".join(itertools.chain.from_iterable(filtered))

Example:

>>> String_A='apple\nbanana\nkiwi\npear'
>>> String_B='cow|0.0|0.25|apple|0.0|-0.99|pig|0.0|horse|0.2|banana|0.0|dog|0.2|kiwi|0.25|'
>>> result = filter_strings(String_A, String_B)
>>> print(result)
apple|0.0|-0.99|banana|0.0|kiwi|0.25
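The output above drops the trailing "|" that the question's String_C ends with. If that delimiter matters, one possible variation is a single pass that tracks whether the current word is wanted and re-adds "|" after every kept field. This is only a sketch: `keep_matching_groups` and `NUMERIC` are hypothetical names, and the pattern assumes values are plain decimals with an optional leading minus sign.

```python
import re

# Assumption: a value is an optionally negative decimal such as 0.25 or -0.99
NUMERIC = re.compile(r"-?\d+(?:\.\d+)?")

def keep_matching_groups(words, record):
    """Keep each wanted word and the values that follow it, ending every field with '|'."""
    wanted = set(words.split("\n"))
    kept = []
    keeping = False  # True while the most recent word is one of the wanted words
    for field in record.split("|"):
        if field == "":
            continue  # skip the empty field produced by a trailing "|"
        if NUMERIC.fullmatch(field):
            if keeping:
                kept.append(field)
        else:
            # A non-numeric field is a word and starts a new group
            keeping = field in wanted
            if keeping:
                kept.append(field)
    return "".join(field + "|" for field in kept)

String_A = 'apple\nbanana\nkiwi\npear'
String_B = 'cow|0.0|0.25|apple|0.0|0.99|pig|0.0|horse|0.2|banana|0.0|dog|0.2|kiwi|0.25|'
print(keep_matching_groups(String_A, String_B))  # apple|0.0|0.99|banana|0.0|kiwi|0.25|
```

Using a set for the wanted words keeps each membership test constant-time, and no intermediate list of groups is built.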
