python - Splitting a string at a particular occurrence of a character

Question

I have a data file full of strings like this:

1682|Scream of Stone (Schrei aus Stein) (1991)|08-Mar-1996

I have parsed the string, split it at "|", and dumped the result into a list, so I have:

['1682', 'Scream of Stone (Schrei aus Stein) (1991)', '08-Mar-1996']

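What I need to do is further split the item at position 1 of the list at the parentheses surrounding the year. If the movie title contained no parentheses of its own I could do this easily, but that is not the case here.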

How can I write something that skips splitting at a parenthesis when the next character is not a digit? I want to end up with:

['1682', 'Scream of Stone (Schrei aus Stein)', '1991', '08-Mar-1996']

Some help would be great! Thanks

score 2 · Accepted Answer

This looks like a job for regular expressions!

import re

data = ['1682', 'Scream of Stone (Schrei aus Stein) (1991)', '08-Mar-1996']

def handleYear(matchobj):
    data.insert(2, matchobj.group(1))
    return ''

data[1] = re.sub(r'\s*\((\d+)\)$', handleYear, data[1])

This removes a trailing parenthesised run of digits such as (1991), together with any whitespace before it, from the end of data[1] and inserts the digits at index 2 of data, the position right after the title.
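If mutating data from inside the callback feels fragile, a variant that returns a new list may be easier to reuse across many records. This is only a sketch: the helper name split_year and the restriction to exactly four digits are assumptions, not part of the answer above.

```python
import re

# Assumption: the year is always four digits in parentheses at the end of the title.
YEAR_AT_END = re.compile(r'\s*\((\d{4})\)\s*$')

def split_year(fields, index=1):
    """Return a new list with a trailing '(YYYY)' in fields[index] split out."""
    match = YEAR_AT_END.search(fields[index])
    if match is None:
        return list(fields)  # no trailing year: return an unchanged copy
    title = fields[index][:match.start()]  # everything before the year
    return fields[:index] + [title, match.group(1)] + fields[index + 1:]

data = ['1682', 'Scream of Stone (Schrei aus Stein) (1991)', '08-Mar-1996']
print(split_year(data))
```

Because the pattern is anchored at the end with $, the earlier "(Schrei aus Stein)" is never touched.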

score 1 · Accepted Answer

You can use regex split:

import re
title="1682|Scream of Stone (Schrei aus Stein) (1991)|08-Mar-1996"
print(re.split(r'\((\d+)\)', title.split("|")[1]))

re.split splits on regular expressions, i.e., it uses regex matches as delimiters. If the pattern contains a capturing group, the text matched by that group is also kept in the result rather than discarded.

The split expression \((\d+)\) first matches literal parentheses \( ... \) and within them matches only digits \d+. We also capture the digits to keep them, hence \((\d+)\).
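One thing to watch for: since the year sits at the end of the title, re.split leaves a trailing space on the title and an empty string after the year. A minimal sketch of cleaning that up and putting the pieces back into the record (the variable names here are just for illustration):

```python
import re

line = "1682|Scream of Stone (Schrei aus Stein) (1991)|08-Mar-1996"
fields = line.split("|")

parts = re.split(r'\((\d+)\)', fields[1])
# parts is ['Scream of Stone (Schrei aus Stein) ', '1991', '']

# Strip whitespace, drop empty pieces, and replace the one title
# field with the cleaned pieces using slice assignment.
fields[1:2] = [p.strip() for p in parts if p.strip()]
print(fields)
```

The slice assignment fields[1:2] = ... swaps a single element for however many pieces survive, so a title without a year stays as one field.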

score 0 · Accepted Answer

You can use Python's re module.

>>> import re
>>> s = 'Scream of Stone (Schrei aus Stein) (1991)'
>>> re.findall(r'\([0-9]+\)', s)
['(1991)']
>>> re.findall(r'\((\d+)\)', s)
['1991']

Once you have the year parsed out, you can insert it at whichever index you want in the list.
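A rough sketch of that last step, assuming the year is the last parenthesised number in the title (that assumption, and the use of rpartition to cut it out, are mine rather than part of the findall example):

```python
import re

data = ['1682', 'Scream of Stone (Schrei aus Stein) (1991)', '08-Mar-1996']

years = re.findall(r'\((\d+)\)', data[1])
if years:
    year = years[-1]  # assumption: the last parenthesised number is the year
    # Cut the last "(year)" out of the title and trim leftover whitespace.
    head, _, tail = data[1].rpartition('(' + year + ')')
    data[1] = (head + tail).strip()
    data.insert(2, year)  # place the year right after the title

print(data)
```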
