Define your rules:
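// 1. Sentences start with a capital letter
// 2. Sentences are preceded by nothing or by [.!?], but not by [,:;]
// 3. Sentences may be preceded by a quote such as ["'] if the text is not formatted properly
// 4. If the word following a quote is a name, the sentence may be detected incorrectly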
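To make the rules concrete, here is a rough sketch (separate from the approach below) that checks rules 1 to 3 for a pair of neighbouring words. The name startsSentence is hypothetical, and rule 4 (names after quotes) is not handled at all:

```javascript
// Hypothetical helper: does `word` start a sentence, given the word before it?
// `prevWord` is null when `word` is the very first word of the text.
function startsSentence(prevWord, word) {
  // Rules 1 and 3: an optional opening quote, then an uppercase letter.
  if (!/^["']?[A-Z]/.test(word)) return false;
  // Rule 2: nothing before it...
  if (prevWord === null) return true;
  // ...or a previous word ending in [.!?], optionally followed by a closing quote.
  // A word ending in [,:;] does not match, so it cannot precede a sentence.
  return /[.!?]["']?$/.test(prevWord);
}

console.log(startsSentence('here.', 'Sometimes')); // true
console.log(startsSentence('mind:', '"What'));     // false
```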
Are there any additional rules?
Define your goal:
// 1. Remove the last sentence
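Assumptions:
If you start from the last character of the string and work backwards, you identify the start of a sentence when:
1. the text before the character (skipping the separating space) ends in [.?!], or
2. the text before the character ends in ["'] and the character itself is an uppercase letter.
This also assumes that:
3. every [.] that ends a sentence is followed by a space,
4. we are not correcting for HTML tags, and
5. these assumptions are not robust and will need frequent adjustment.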
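As a sketch of assumptions 1 to 3 at the character level (the helper name lastSentenceStart is hypothetical, and HTML is ignored per assumption 4), a backwards scan could look like this. It returns the index where the last sentence starts, or -1 if no boundary is found:

```javascript
// Hypothetical helper: index of the first character of the last sentence, or -1.
function lastSentenceStart(text) {
  // Walk backwards; a boundary needs at least "<punct><space><char>".
  for (let i = text.length - 1; i >= 2; i--) {
    if (text[i - 1] !== ' ') continue; // assumption 3: boundaries are followed by a space
    const before = text[i - 2];
    if ('.?!'.includes(before)) return i; // assumption 1
    if (/["']/.test(before) && /[A-Z]/.test(text[i])) return i; // assumption 2
  }
  return -1;
}
```

For example, lastSentenceStart('A b. C d') returns 5, so text.slice(0, 5 - 1) gives 'A b.'.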
Possible solution:
Read in your string and split it on the space character, so that we can look at the chunks of the string in reverse order:
var characterGroups = $('#this-paragraph').text().split(' ').reverse();
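Outside the browser, a plain string can stand in for the jQuery call. This small sketch (the sample text is made up) also shows why the loop below checks element.length > 0: splitting on a single space turns a double space into an empty chunk.

```javascript
// A plain string standing in for $('#this-paragraph').text()
const text = 'One two.  Three four.'; // note the double space after "two."
const characterGroups = text.split(' ').reverse();
console.log(characterGroups);
// [ 'four.', 'Three', '', 'two.', 'One' ]
```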
If your string is:
var originalString = 'Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said. He later described it as: "Something insane."';
then your characterGroups array will be:
['insane."', '"Something', 'as:', 'it', 'described', 'later', 'He',
 'said.', 'quickly', 'she', 'fence!",', 'the', 'past', 'move', 'should', 'we',
 'think', '"I', 'know,', 'not', 'did', 'She', 'there?"', 'up', 'doing', 'it',
 'is', '"What', 'mind:', 'to', 'came', 'that', 'thing', 'first', 'the', 'asked',
 'I', 'over.', 'flying', 'plane', 'a', 'saw', 'I', 'and', 'window', 'the', 'up',
 'looked', 'I', 'harder!', 'any', 'sentence', 'the', 'of', '"selection"', 'the',
 'make', 'not', 'should', 'that', 'but', 'used', 'is', 'code', 'html', 'basic',
 'Sometimes', 'here.', 'text', 'more', 'some', 'Blabla,']
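Note: the <span> tags (and any other markup) do not appear in this array because jQuery's .text() method strips them; .html() would keep them.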
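Each chunk was followed by a single space in the original string, so once we know (by array index) which chunk starts the last sentence, we also know which space precedes it, and we can cut the original string at that space.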
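The same index-to-position idea can also be done with arithmetic instead of a string search. A minimal sketch, assuming chunks are separated by exactly one space (assumption 3); cutBeforeChunk is a hypothetical name, and reversedIndex is an index into the reversed array, such as the index the loop finds:

```javascript
// Hypothetical helper: cut `text` just before the chunk at `reversedIndex`
// (counted in the reversed array), dropping the space in front of it.
function cutBeforeChunk(text, reversedIndex) {
  const chunks = text.split(' ');
  const forwardIndex = chunks.length - 1 - reversedIndex;
  // Each earlier chunk contributes its length plus the one space after it.
  let offset = 0;
  for (let k = 0; k < forwardIndex; k++) offset += chunks[k].length + 1;
  // offset is now the first character of the chunk; stop before the space.
  return text.slice(0, Math.max(0, offset - 1));
}

console.log(cutBeforeChunk('One two. Three four.', 1)); // One two.
```

With the example string below and index 6 ("He"), this yields the same text as the lastIndexOf approach, as long as every separator really is a single space.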
Set up a variable to flag whether we have found it, and another to hold the index of the array element we identify as the start of the last sentence:
var found = false;
var index = null;
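Loop through the array and look for any element that ends in [.!?], or that ends in " while the previously visited element (the word that follows it in the text) starts with a capital letter.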
var position = 1, // skip the first chunk, since we know that's the end anyway
    elements = characterGroups.length,
    element = null,
    prevHadUpper = false,
    last = null,
    lookFor = ''; // "<boundary char> <first word of last sentence>", used to trim later
while (!found && position < elements) {
    element = characterGroups[position].split('');
    if (element.length > 0) {
        last = element[element.length - 1];
        // test last character rule
        if (
            last == '.' ||                  // ends in '.'
            last == '!' ||                  // ends in '!'
            last == '?' ||                  // ends in '?'
            (last == '"' && prevHadUpper)   // ends in '"' and previous started [A-Z]
        ) {
            found = true;
            index = position - 1;
            lookFor = last + ' ' + characterGroups[position - 1];
        } else {
            // note: punctuation such as '"' also equals its own uppercase form
            prevHadUpper = element[0] == element[0].toUpperCase();
        }
    } else {
        // empty chunk (from a double space): reset the flag
        prevHadUpper = false;
    }
    position++;
}
If you run the script above, it correctly identifies "He" as the start of the last sentence:
console.log(characterGroups[index]); // He at index=6
Now you can trim the original string just before that sentence:
var trimPosition = originalString.lastIndexOf(lookFor) + 1; // +1 keeps the boundary punctuation
var updatedString = originalString.slice(0, trimPosition);
console.log(updatedString);
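To see why the +1 is there, here is a tiny worked example with made-up text. It also shows what happens when lookFor is still the empty string (the function version below initialises it to ''): nothing is cut.

```javascript
const text = 'First one. He left.';
// lookFor is "<last char of the boundary chunk> <first word of the last sentence>"
const at = text.lastIndexOf('. He');   // index of the '.' that ends "First one"
const trimmed = text.slice(0, at + 1); // +1 keeps that '.' in the result
console.log(trimmed); // First one.

// With lookFor still '' (no boundary found), lastIndexOf('') is text.length,
// so the +1 slice returns the whole string unchanged.
const unchanged = text.slice(0, text.lastIndexOf('') + 1);
console.log(unchanged === text); // true
```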
// Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?" She did not know, "I think we should move past the fence!", she quickly said.
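Run it again and you get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over. I asked the first thing that came to mind: "What is it doing up there?"
Run it again and you get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder! I looked up the window and I saw a plane flying over.
Run it again and you get: Blabla, some more text here. Sometimes <span>basic</span> html code is used but that should not make the "selection" of the sentence any harder!
Run it again and you get: Blabla, some more text here.
Run it once more and the string is unchanged, because there is no earlier sentence boundary left to find: Blabla, some more text here.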
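The repeated runs above can be automated by applying the trimmer until the output stops changing. A minimal sketch: trimUntilStable and toyTrim are hypothetical names, and toyTrim is a simplified stand-in (it only cuts at '. ') so that the example runs on its own. trimSentence can be passed in its place, since it also returns its input unchanged once no boundary is left.

```javascript
// Hypothetical helper: keep applying `trim` until the string stops changing,
// collecting every intermediate result. Assumes `trim` only ever shortens
// its input or returns it unchanged, so the loop terminates.
function trimUntilStable(text, trim) {
  const steps = [text];
  let current = text;
  while (true) {
    const next = trim(current);
    if (next === current) return steps;
    steps.push(next);
    current = next;
  }
}

// Toy trimmer for demonstration only: cut after the last ". ".
const toyTrim = s => {
  const at = s.lastIndexOf('. ');
  return at === -1 ? s : s.slice(0, at + 1);
};

const steps = trimUntilStable('One. Two. Three.', toyTrim);
console.log(steps); // [ 'One. Two. Three.', 'One. Two.', 'One.' ]
```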
So I think this meets your requirements?
As a function:
function trimSentence(string) {
    var found = false;
    var index = null;
    var characterGroups = string.split(' ').reverse();
    var position = 1, // skip the first chunk, since we know that's the end anyway
        elements = characterGroups.length,
        element = null,
        prevHadUpper = false,
        last = null,
        lookFor = '';
    while (!found && position < elements) {
        element = characterGroups[position].split('');
        if (element.length > 0) {
            last = element[element.length - 1];
            // test last character rule
            if (
                last == '.' ||                  // ends in '.'
                last == '!' ||                  // ends in '!'
                last == '?' ||                  // ends in '?'
                (last == '"' && prevHadUpper)   // ends in '"' and previous started [A-Z]
            ) {
                found = true;
                index = position - 1;
                lookFor = last + ' ' + characterGroups[position - 1];
            } else {
                prevHadUpper = element[0] == element[0].toUpperCase();
            }
        } else {
            prevHadUpper = false;
        }
        position++;
    }
    // if nothing was found, lookFor is '' and lastIndexOf('') + 1 keeps the whole string
    var trimPosition = string.lastIndexOf(lookFor) + 1;
    return string.slice(0, trimPosition);
}
It would be trivial to turn this into a plugin, but beware of the assumptions! :)
Does this help?
Thanks, AE