I am trying to split a string in the following way. Here is a sample string:

"Hello this is a string.-2.34 This is an example1 string."

Note that the character between "string." and "-2.34" is the Unicode character U+F8FF, and the string is of type unicode.

I want to break the string up into:

"Hello this is a string.","-2.34"," This is an example1 string."

I have written a regular expression to split the string, but with it I cannot get the numeric part that I want (the -2.34 in the sample string above).

My code:

import re
import os
from django.utils.encoding import smart_str, smart_unicode

text = open(r"C:\data.txt").read()
text = text.decode('utf-8')
print(smart_str(text))

pat = re.compile(u"\uf8ff-*\d+\.*\d+")
newpart = pat.split(text)
firstpart = newpart[::1]

print ("first part of the string ----")
for f in firstpart:
    f = smart_str(f)
    print ("-----")
    print f

1 Answer

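You need to put parentheses around -*\d+\.*\d+ if you want to keep it in the result of re.split. When the pattern contains a capturing group, re.split also returns the text matched by that group: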

import re
text = u"Hello this is a string.\uf8ff-2.34 This is an example1 string."
print(re.split(u'\uf8ff(-*\d+\.*\d+)', text))

yields

[u'Hello this is a string.', u'-2.34', u' This is an example1 string.']
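To make the difference visible, here is a minimal sketch that runs the same split with and without the capturing group. The names SEP, without_group, with_group and strict are only for illustration, and the stricter pattern at the end is my own suggestion rather than something the question asked for: the original -*\d+\.*\d+ needs at least two digits, so it would not match a single-digit number such as 7.

```python
import re

SEP = u"\uf8ff"  # the private-use separator character from the question
text = u"Hello this is a string." + SEP + u"-2.34 This is an example1 string."

# Without a capturing group, re.split discards the matched number.
without_group = re.split(SEP + r"-*\d+\.*\d+", text)

# With a capturing group, the matched number is kept as its own list item.
with_group = re.split(SEP + r"(-*\d+\.*\d+)", text)

# Assumed stricter pattern (not from the question): optional sign, digits,
# optional fractional part. It also accepts single-digit numbers.
strict = re.split(SEP + r"(-?\d+(?:\.\d+)?)", text)

print(without_group)
print(with_group)
print(strict)
```

The pattern is built by concatenating the separator with a raw string so that \d and \. reach the regex engine unchanged, which should behave the same way on Python 2 and Python 3.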
answered 2012-11-18T02:49:43.087