Suppose I have the following text:

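Steps toward this goal include: Increasing efficiency of mobile networks, data centers, data transmission, and spectrum allocation Reducing the amount of data apps have to pull from networks through caching, compression, and futuristic technologies like peer-to-peer data transfer Making investments in accessibility profitable by educating people about the uses of data, creating business models that thrive when free data access is offered initially, and building out credit card infrastructure so carriers can move from pre-paid to post-paid models that facilitate investment If the plan works, mobile operators will gain more customers and invest more in accessibility; phone makers will see people wanting better devices; Internet providers will get to connect more people;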

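As you can tell from reading it, the text contains several sentences (it was a bullet list) run together without sentence-ending punctuation. How can I split this text into sentences? I tried Python NLTK with no luck. Checking for capital letters doesn't work either, because it isn't reliable.

Any ideas on how to solve this?

Thanks.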

1 Answer

If I understand correctly, this small snippet might help (note: tested on Python 2.7.5):

paragraph = 'Steps toward this goal include: Increasing efficiency of mobile networks, data centers, data transmission, and spectrum allocation Reducing the amount of data apps have to pull from networks through caching, compression, and futuristic technologies like peer-to-peer data transfer Making investments in accessibility profitable by educating people about the uses of data, creating business models that thrive when free data access is offered initially, and building out credit card infrastructure so carriers can move from pre-paid to post-paid models that facilitate investment If the plan works, mobile operators will gain more customers and invest more in accessibility; phone makers will see people wanting better devices; Internet providers will get to connect more people; and people will receive affordable Internet so they can join the knowledge economy and connect with the people they care about.'
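words = []  # collected text fragments (clauses, not single words)
separators = ['.', ',', ':', ';']
oldValue = 0  # index where the current fragment starts
for value in range(len(paragraph)):
    # Close the current fragment each time a separator character is reached.
    if paragraph[value] in separators:
        words.append(paragraph[oldValue:value+1].strip())
        # Resume right after the separator; strip() removes the space, if any.
        oldValue = value+1
# Keep trailing text that does not end with a separator.
if paragraph[oldValue:].strip():
    words.append(paragraph[oldValue:].strip())
for word in words:
    print(word)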
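
The same separator-based split can also be sketched with the standard `re` module. This is only an illustration of the idea above, not a sentence tokenizer; the function name `split_on_separators` and its default separator set are my own assumptions, chosen to mirror the `separators` list.

```python
import re


def split_on_separators(text, separators=".,:;"):
    """Split text after each separator character, dropping empty pieces."""
    # Build a character class from the separators; the capturing group
    # makes re.split return the separators as well as the chunks.
    pattern = "([" + re.escape(separators) + "])"
    pieces = re.split(pattern, text)
    # pieces looks like [chunk, sep, chunk, sep, ..., tail]
    fragments = []
    for i in range(0, len(pieces) - 1, 2):
        fragment = (pieces[i] + pieces[i + 1]).strip()
        if fragment:
            fragments.append(fragment)
    # Whatever follows the last separator is kept too.
    tail = pieces[-1].strip()
    if tail:
        fragments.append(tail)
    return fragments


for fragment in split_on_separators("Steps toward this goal include: caching, compression; and more"):
    print(fragment)
```

Note that this splits on every comma as well, so it yields clauses rather than sentences, just like the loop above.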

[Edit] You can also add a capital-letter check quite easily:

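# isupper() is True only for uppercase letters; comparing against upper()
# would also match spaces, digits and punctuation.
if paragraph[value].isupper():
    # Cut just before the capital so it begins the next fragment.
    words.append(paragraph[oldValue:value])
    ...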
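
For what it's worth, here is a minimal sketch of the capital-letter idea on its own, assuming plain ASCII text and that every new item starts with an uppercase letter after whitespace. The helper name `split_before_capitals` is hypothetical. It also shows why the question calls this check unreliable: a capitalized word in the middle of a clause, such as "Internet", causes a split too.

```python
import re


def split_before_capitals(text):
    """Split text before every word that starts with an uppercase ASCII letter."""
    # Match the whitespace in front of a capital letter; the lookahead keeps
    # the capital itself at the start of the next piece.
    return [piece for piece in re.split(r"\s+(?=[A-Z])", text) if piece]


sample = ("Steps toward this goal include: Increasing efficiency of mobile networks "
          "Reducing the amount of data apps have to pull from networks")
for piece in split_before_capitals(sample):
    print(piece)

# A capitalized word inside a clause is split as well (a false positive):
print(split_before_capitals("people will receive affordable Internet so they can join"))
```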
Answered 2013-09-14T12:41:49.500