python - Splitting a string into a required form using Python

Question

I have data in the following form:

<a> <b> _:h1 <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> <f> .
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .
_:h1 <server> "Apache/2" <df> .
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" <hf> .

I need to convert it to the following form using Python:

<a> <b> _:h1.
<1> <c>.
_:h1 <e> "200"^^<http://www.w3.org/2001/XMLSchema#integer> .
<1> <f>.
_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT".
<1> <p>.
_:h1 <server> "Apache/2" .
<1> <df>.
_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" .
<1> <hf>.

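I wrote code in Python using the str.split() method, which splits the string on whitespace. However, it does not serve my purpose, because "Sun, 25 Mar 2012 14:15:37 GMT" gets split as well. Is there another way to achieve this in Python?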

score 2 · Accepted Answer

You can use the rfind or rindex method to find the last occurrence of < in the line.

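data = """[your data]"""
data_new = ""
for line in data.splitlines():
    # Index of the last "<" in the line, or -1 if there is none
    i = line.rfind("<")
    # Lines without "<" are kept unchanged; every line keeps its newline
    data_new += (line if i == -1 else line[:i] + ". \n<1> " + line[i:]) + "\n"
data_new = data_new.strip()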
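
The same rfind idea can be wrapped in a small function that also tidies the trailing dot. This is only a sketch: the name split_last_term and the label parameter are made up for illustration, and it assumes every line ends with an angle-bracketed term optionally followed by a dot, as in the sample data. It also normalizes each statement to end in " .", which the desired output in the question does not do consistently.

```python
def split_last_term(line, label="<1>"):
    """Split a line into two statements at its last '<'.

    `label` is the placeholder subject used in the question's desired output.
    """
    i = line.rfind("<")
    if i == -1:
        return [line]  # nothing to split; keep the line unchanged
    head = line[:i].rstrip()
    # Drop trailing spaces and the terminating dot from the last term
    tail = line[i:].rstrip().rstrip(".").rstrip()
    return [head + " .", label + " " + tail + " ."]


sample = '_:h1 <date> "Mon, 30 Apr 2012 07:01:51 GMT" <p> .'
for statement in split_last_term(sample):
    print(statement)
```

Because the line is cut at a character position rather than split on whitespace, the quoted date string stays in one piece.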

score 0 · Accepted Answer

Is that N3/Turtle? If so, I think you want RDFLib.

See also: Reading a Turtle/N3 RDF File with Python

score 0 · Accepted Answer

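What is wrong with the spaces in the string? You seem to be interested only in the last two fields, and those will be there no matter how many pieces your line is split into.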

line = '_:h1 <server> "Apache/2" <df> .'  # one line of the input
fields = line.split()
# Assumes the final "." is separated from the tag by whitespace
tag = fields[-2]
dot = fields[-1]
# Now print your line without the last two fields
l1 = " ".join(fields[:-2])
l2 = "<1> " + tag + dot

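Well, I don't know exactly what should be done with the trailing dot, but unless you have to keep the spacing in your string exactly the same, this should be fine.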
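
If the spacing inside the quoted literal does matter, one possible variation on the last-field idea is to split only once, from the right, with str.rsplit, so everything to the left is never broken up or re-joined. A rough sketch follows; the function name split_off_last_field and the label parameter are hypothetical, and it assumes the last whitespace-separated field is the term to move and that a dot may or may not be attached to it.

```python
def split_off_last_field(line, label="<1>"):
    """Move the last whitespace-separated field of a statement onto its own line."""
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()  # drop the terminating dot, attached or not
    parts = body.rsplit(None, 1)  # split once, from the right, on whitespace
    if len(parts) < 2:
        return [line]  # fewer than two fields: nothing to move
    head, tag = parts
    return [head + " .", label + " " + tag + " ."]


lines = [
    '<a> <b> _:h1 <c>.',
    '_:h1 <last-modified> "Sun, 25 Mar 2012 14:15:37 GMT" <hf> .',
]
for line in lines:
    print("\n".join(split_off_last_field(line)))
```

Stripping the dot before splitting means a line such as `<a> <b> _:h1 <c>.`, where the dot is glued to the last term, is handled the same way as the others.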
