Questions tagged [text-segmentation]

189 questions

0 votes

6 answers

28318 views

algorithm - Split a string to a string of valid words using Dynamic Programming

I need to find a dynamic programming algorithm to solve this problem. I tried but couldn't figure it out. Here is the problem:

You are given a string of n characters s[1...n], which you believe to be a corrupted text document in which all punctuation has vanished (so that it looks something like "itwasthebestoftimes..."). You wish to reconstruct the document using a dictionary, which is available in the form of a Boolean function dict(*) such that, for any string w, dict(w) has value 1 if w is a valid word, and has value 0 otherwise.

Give a dynamic programming algorithm that determines whether the string s[*] can be reconstituted as a sequence of valid words. The running time should be at most O(n^2), assuming that each call to dict takes unit time.
In the event that the string is valid, make your algorithm output the corresponding sequence of words.
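
A minimal sketch of one way to meet these requirements, written in Python. The names `segment`, `is_word`, and `back` are illustrative and not part of the problem statement; `is_word` stands in for the `dict` oracle. The table stores, for each prefix that can be segmented, where its last word starts, so the words can be recovered by walking backwards.

```python
def segment(s, is_word):
    """Return a list of valid words that concatenate to s, or None if impossible.

    is_word plays the role of dict(w) from the problem statement (assumed name).
    """
    n = len(s)
    # back[i] = start index of a valid last word of s[:i], or -1 if s[:i]
    # cannot be split into valid words.
    back = [-1] * (n + 1)
    back[0] = 0  # the empty prefix is trivially valid
    for i in range(1, n + 1):
        for j in range(i):
            if back[j] != -1 and is_word(s[j:i]):
                back[i] = j
                break
    if back[n] == -1:
        return None
    # Walk the back-pointers from the end to recover the words.
    words = []
    i = n
    while i > 0:
        j = back[i]
        words.append(s[j:i])
        i = j
    words.reverse()
    return words


vocabulary = {"it", "was", "the", "best", "of", "times"}
print(segment("itwasthebestoftimes", vocabulary.__contains__))
```

The two nested loops make O(n^2) calls to the oracle, which matches the stated bound under the unit-cost assumption. In real Python the slice `s[j:i]` costs extra time, so this is a sketch of the algorithm rather than a tuned implementation.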

2011-03-15T11:02:54.387

0 votes

7 answers

16204 views

php - How to capitalize the first letter of the first word in a sentence?

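I am trying to write a function to clean up user input.

I am not trying to make it perfect. I would rather have a few names and acronyms in lowercase than a full paragraph in uppercase.

I think the function should use regular expressions, but I am pretty bad with those and I need some help.

If any of the following expressions is followed by a letter, I want to make that letter uppercase.

Even better, the function could add a space after a ".", "!" or "?" if it is followed by a letter.

How can this be achieved?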
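
A rough sketch of the two regex steps described above, shown in Python rather than PHP. The function name `tidy` is made up for illustration, and the patterns only handle ASCII letters. The same patterns could be ported to PHP's `preg_replace_callback`.

```python
import re


def tidy(text):
    """Illustrative clean-up: space after . ! ? and capitalized sentence starts."""
    # Step 1: insert a space after ".", "!" or "?" when a letter follows directly.
    text = re.sub(r"([.!?])(?=[A-Za-z])", r"\1 ", text)
    # Step 2: uppercase the first letter of the text and any lowercase letter
    # that follows ".", "!" or "?" plus whitespace.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(tidy("hello world.this is it! ok?yes"))
```

This deliberately leaves everything else alone, so names and acronyms typed in lowercase stay lowercase. It will also put a space inside things like "e.g.this", which fits the "not perfect" goal but is worth knowing.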

php regex user-input text-segmentation

2011-03-21T20:46:32.350

0 votes

2 answers

504 views

algorithm - Recurrence relation for DP?

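Suppose you have a dictionary that contains valid words.

Given an input string with all spaces removed, determine whether the string is composed of valid words.

You can assume the dictionary is a hash table that provides O(1) lookup.

Please give a recurrence relation for this. I found this problem in a book, but the book does not give the answer.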
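
One common way to state such a recurrence is sketched below. It assumes a 1-based string s[1..n]. The symbols V (a Boolean table over prefixes) and D (the dictionary lookup) are names chosen here for illustration. V(i) is true exactly when the prefix s[1..i] can be split into valid words, and the answer is V(n).

```latex
V(0) = \text{true}, \qquad
V(i) = \bigvee_{j=0}^{i-1} \Bigl( V(j) \wedge D\bigl(s[j+1 \ldots i]\bigr) \Bigr),
\quad 1 \le i \le n
```

Each V(i) looks at no more than i earlier entries, so filling the table takes O(n^2) lookups in total.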

algorithm dynamic-programming text-segmentation

2011-04-06T18:13:16.127

0 votes

5 answers

2629 views

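c# - Split a text file at sentence boundaries

I have to process a text file (an e-book). I would like to process it so that there is one sentence per line (a "newline-separated file", is that right?). How would I do this with the sed UNIX utility? Does it have a symbol for "sentence boundary", like the one for "word boundary" (I think the GNU version has that)? Note that a sentence can end with a period, an ellipsis, a question mark, or an exclamation mark, and the last two can be combined (e.g. ?, !, !?, !!!!! are all valid "sentence terminators"). The input file is formatted so that some sentences contain line breaks, which must be removed.

I came up with the script s/...|. |[!?]+ |/\n/g (unescaped for readability). But it does not remove the line breaks from within the sentences.

How about in C#? Would it be much faster if I used regular expressions as in sed? (I think not.) Is there another, faster way?

Either solution (sed or C#) is fine. Thank you.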
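
The two steps described above (remove the line breaks inside sentences, then break after a sentence terminator) can be sketched as follows. This is written in Python rather than sed or C#, and the function name `one_sentence_per_line` is hypothetical. It treats any run of ".", "!", "?" or "…" followed by whitespace as a sentence end, so abbreviations such as "Dr." will be split incorrectly.

```python
import re


def one_sentence_per_line(text):
    """Naive sentence splitter: one sentence per output line."""
    # Collapse all whitespace, including the line breaks inside sentences.
    flat = " ".join(text.split())
    # Split at whitespace that directly follows a terminator character.
    # The lookbehind keeps the terminator attached to its sentence.
    sentences = re.split(r"(?<=[.!?…])\s+", flat)
    return "\n".join(sentences)


print(one_sentence_per_line("Hello world. How\nare you!? Fine..."))
```

A regex with the same shape should work in C# as well, since .NET regular expressions also support lookbehind.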

c# sed nlp text-segmentation

2011-04-11T11:19:03.307

0 votes

3 answers

280 views

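regex - How to get the number of sentences from input?

It seems hard to detect sentence boundaries in text. Marks such as . ! ? can be used to delimit sentences, but this is not very accurate, because there may be ambiguous words and abbreviations such as USA, Prof. or Dr. I am studying the TPerlRegEx library and the Regular Expressions Cookbook by Jan Goyvaerts, but I do not know how to write an expression that detects sentences.

What would be a reasonably accurate expression using TPerlRegEx in Delphi?

Thanks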

regex delphi nlp text-segmentation

2011-04-20T15:59:43.767

0 votes

4 answers

9917 views

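algorithm - Are there any good open-source or freely available Chinese word segmentation algorithms?

As the title says, I am looking for a free and/or open-source algorithm for Chinese text segmentation. I do understand that this is a very difficult task, because there are many ambiguities involved. I know there is Google's API, but it is a black box, i.e. not much information about what it is doing gets through.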

algorithm open-source cjk text-segmentation

2011-04-29T15:59:09.850

0 votes

15 answers

236058 views

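python - Converting a string to a list of words?

I am trying to convert a string to a list of words using Python. I want to take something like the following:

And convert it into something like this:

Notice the omission of punctuation and spaces. What would be the fastest way of going about this?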
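
The question's own sample input and output are not preserved here, so the sketch below uses a made-up sentence. It shows one simple option, `re.findall` with `\w+`, which keeps runs of letters, digits and underscores and drops everything else. The function name `to_words` is illustrative.

```python
import re


def to_words(text):
    """Return the words in text, without punctuation or whitespace."""
    # \w+ matches maximal runs of word characters, so punctuation and
    # spaces act as separators and never appear in the result.
    return re.findall(r"\w+", text)


print(to_words("Hello, world! This is a test."))
```

One consequence of this pattern is that an apostrophe also acts as a separator, so "don't" becomes two items.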

python string list words text-segmentation

2011-05-31T00:09:24.393

0 votes

10 answers

110731 views

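python - Python: Cut off the last word of a sentence?

What is the best way to slice the last word off a block of text?

I can think of

1. Splitting it into a list (by spaces), removing the last item, then re-joining the list.
2. Using a regular expression to replace the last word.

I am currently taking approach #1, but I do not know how to concatenate the list...

Any code examples are much appreciated.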
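
Both approaches listed above can be sketched in a few lines. The function names are illustrative. For approach #1, the missing piece is `str.join`, which concatenates a list of strings with a separator.

```python
import re


def drop_last_word(text):
    """Approach 1: split on whitespace, drop the last item, re-join."""
    words = text.split()
    return " ".join(words[:-1])


def drop_last_word_regex(text):
    """Approach 2: remove the final word with a regular expression."""
    # Matches optional whitespace, the last run of non-space characters,
    # and any trailing whitespace up to the end of the text.
    return re.sub(r"\s*\S+\s*$", "", text)


print(drop_last_word("the quick brown fox"))
print(drop_last_word_regex("the quick brown fox"))
```

Note that approach #1 also collapses repeated spaces between the remaining words, while the regex version leaves the earlier part of the text untouched.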

python split concatenation word text-segmentation

2011-06-07T14:26:36.403

0 votes

7 answers

2055 views

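c# - Parsing words out of a continuous string

If I have a string that contains words and no spaces, how should I parse out those words, given that I have a dictionary/list containing them?

For example, if my string is "thisisastringwithwords", how could I use the dictionary to create the output "this is a string with words"?

I hear that the trie data structure could help, but could someone help with the pseudocode? For example, I was thinking that you could index the dictionary into a trie structure and then follow each character down the trie; the problem is that I am not familiar with how to do this in (pseudo)code.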
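
The trie idea can be sketched as follows, in Python rather than C#. The function names and the nested-dict trie layout are assumptions made for illustration. The empty-string key marks the end of a word, which is safe because a single character never equals the empty string. From every reachable position the code walks down the trie one character at a time and records each position where a dictionary word ends.

```python
_END = ""  # end-of-word marker; never collides with a one-character key


def build_trie(words):
    """Index the dictionary words into a trie made of nested dicts."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root


def split_words(s, trie):
    """Return s with spaces inserted between dictionary words, or None."""
    n = len(s)
    # back[i] = start index of the last word in some split of s[:i].
    back = [None] * (n + 1)
    back[0] = 0
    for start in range(n):
        if back[start] is None:
            continue  # no valid split ends here, so no word can start here
        node = trie
        for end in range(start, n):
            node = node.get(s[end])
            if node is None:
                break  # no dictionary word continues with this character
            if _END in node and back[end + 1] is None:
                back[end + 1] = start
    if back[n] is None:
        return None
    words = []
    i = n
    while i > 0:
        words.append(s[back[i]:i])
        i = back[i]
    return " ".join(reversed(words))


trie = build_trie(["this", "is", "a", "string", "with", "words"])
print(split_words("thisisastringwithwords", trie))
```

When several splits exist, this returns just one of them, and it does not try to pick the most natural reading.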

c# algorithm data-structures text-segmentation

2011-06-20T07:54:11.400

0 votes

5 answers

1706 views

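python - Sentence segmentation using regex

I have a few text (SMS) messages and I want to segment them using the period ('.') as a delimiter. I am unable to handle the following types of messages. How can I segment these messages using regex in Python?

Before segmentation:

After segmentation:

Each line is a separate message

Update:

I am doing natural language processing, and I feel it is fine to treat '16.8mmmol/l' and 'no of beds 8.2 cups of tea.' the same way. 80% accuracy is good enough for me, but I want to reduce false positives as much as possible.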
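
The sample messages are not preserved here, so the following is only a sketch of one rule consistent with the update: split on a period unless it sits between two digits, so values like 16.8 and 8.2 stay intact. The function name `split_messages` and the test message are made up for illustration.

```python
import re

# A period is a delimiter unless a digit precedes it AND a digit follows it.
_DELIMITER = re.compile(r"(?<!\d)\.|\.(?!\d)")


def split_messages(text):
    """Split text on periods that are not decimal points; drop empty pieces."""
    parts = _DELIMITER.split(text)
    return [p.strip() for p in parts if p.strip()]


print(split_messages("glucose 16.8mmol/l.call dr now. ok"))
```

This rule favors fewer false splits: a real sentence boundary between two digits, with no space after the period, is left unsplit.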

python regex text-segmentation

2011-07-19T10:17:35.907
