javascript - Function to truncate words in JavaScript (learning from dojo's code)

Question

'Truncate words' takes a string of words and returns only the first, let's say, 10 words.

dojo (a JavaScript library) has such a function, and its code looks like this:

truncatewords: function(value, arg){
    // summary: Truncates a string after a certain number of words
   // arg: Integer
   //              Number of words to truncate after
   arg = parseInt(arg);
   if(!arg){
           return value;
   }

   for(var i = 0, j = value.length, count = 0, current, last; i < value.length; i++){
           current = value.charAt(i);
           if(dojox.dtl.filter.strings._truncatewords.test(last)){
                   if(!dojox.dtl.filter.strings._truncatewords.test(current)){
                           ++count;
                           if(count == arg){
                                   return value.substring(0, j + 1);
                           }
                   }
           }else if(!dojox.dtl.filter.strings._truncatewords.test(current)){
                   j = i;
           }
           last = current;
   }
   return value;
}

where dojox.dtl.filter.strings._truncatewords is /(&.*?;|<.*?>|(\w[\w-]*))/g

Why isn't it written like this instead:

function truncate(value,arg) {
    var value_arr = value.split(' ');
    if(arg < value_arr.length) {
        value = value_arr.slice(0,arg).join(' '); }
    return value;
}

What are the differences?

score 3 · Accepted Answer

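Your split should take into account that any sequence of whitespace characters is a word separator. You should split on a regular expression such as \s+.

Beyond that, dojo's code also seems to treat entities and XML tags as words. If you know your strings contain nothing of the sort, your implementation could do the trick. Be careful, though, that your slice does not go beyond the number of words found; that may need some checking too.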
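A minimal sketch of the whitespace-aware split suggested above, assuming plain text with no entities or tags. The name truncateOnWhitespace is hypothetical and not part of dojo.

```javascript
// Hypothetical helper, not part of dojo: truncate on runs of whitespace.
function truncateOnWhitespace(value, count) {
    // trim() avoids an empty first element when the string starts with whitespace
    var words = value.trim().split(/\s+/);
    if (count >= words.length) {
        return value; // fewer words than requested: nothing to cut
    }
    return words.slice(0, count).join(' ');
}

console.log(truncateOnWhitespace('one  two\tthree\nfour', 2)); // "one two"

// For comparison, splitting on a single space keeps empty strings as "words":
console.log('one  two'.split(' ')); // [ 'one', '', 'two' ]
```

Note that joining with ' ' collapses whatever whitespace separated the kept words into single spaces, so the result is not always a prefix of the input.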

score 0 · Accepted Answer

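The code you are looking at comes from the dtl library, which exists to support the Django template language (http://www.dojotoolkit.org/book/dojo-book-0-9/part-5-dojox/dojox-dtl). I'm sure the code in there is not just meant to do a straightforward string split, but to parse the templates they are using.

Also, looking at that regex, they are handling more scenarios than just whitespace... for example, <.*?> will cause any run of text enclosed in angle brackets, such as a tag together with its attributes, to be treated as a single "word".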

score 0 · Accepted Answer

Function declaration: this is probably a method on a JavaScript object, and using function_name: function(params) {... helps keep functions out of the global scope.
By checking the arg variable, they're ensuring that an integer was passed. Using parseInt() allows both 10 and "10" to be accepted.
Because of the regex being used, this method can handle more delimiters than just spaces.
This code is safe against array overflow. You can't count to 10 if there are only 8 words in value. Otherwise, you'd get an array-out-of-bounds or object-does-not-exist error.
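To illustrate the parseInt() point, here is a small sketch of the guard at the top of the quoted function. The name normalizeCount and the explicit radix are my own additions; the quoted code calls parseInt(arg) with no radix.

```javascript
// Hypothetical helper that mirrors the `arg = parseInt(arg); if(!arg)` guard.
function normalizeCount(arg) {
    var n = parseInt(arg, 10); // explicit radix; the quoted code omits it
    return n ? n : null;       // NaN and 0 are both falsy, so both are rejected
}

console.log(normalizeCount(10));    // 10
console.log(normalizeCount('10'));  // 10
console.log(normalizeCount('abc')); // null (parseInt returns NaN)
console.log(normalizeCount(0));     // null
```

In the quoted function the rejected cases simply return value unchanged.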

score 0 · Accepted Answer

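The regular expression has 3 parts:

&.*?; will match character entities (such as &amp;)
<.*?> will match anything in angle brackets
(\w[\w-]*) will match strings that start with [a-zA-Z0-9_], followed by any number of word characters or dashes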
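A quick way to see the three parts at work is to run the pattern from the question over a sample string with String.prototype.match. The sample string is made up for illustration.

```javascript
// Pattern as quoted in the question.
var wordPattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;

// Made-up sample containing an entity, two tags and a hyphenated word.
var sample = 'Fish &amp; chips <b>to-go</b>';

console.log(sample.match(wordPattern));
// [ 'Fish', '&amp;', 'chips', '<b>', 'to-go', '</b>' ]
```

Each tag and the entity come back as their own token, and the hyphenated word stays in one piece.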

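This is not just splitting on spaces. It looks for anything it thinks could be part of a word, and as soon as it finds something that isn't, it increments the word count.

It should accept a comma- or pipe-separated list and work just as well as it does with a space-separated one.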
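A rough sketch of that idea, assuming the same pattern as in the question. This is not dojo's implementation, which walks the string one character at a time; truncateByPattern is a hypothetical name, and it uses RegExp.prototype.exec to find where the Nth token ends.

```javascript
// Hypothetical sketch, not dojo's code: cut the string right after the
// Nth token matched by the pattern quoted in the question.
function truncateByPattern(value, count) {
    // A fresh regex per call, so lastIndex always starts at 0.
    var pattern = /(&.*?;|<.*?>|(\w[\w-]*))/g;
    var seen = 0;
    while (pattern.exec(value) !== null) {
        seen += 1;
        if (seen === count) {
            // lastIndex points just past the end of the current match
            return value.substring(0, pattern.lastIndex);
        }
    }
    return value; // fewer than `count` tokens: return the input unchanged
}

console.log(truncateByPattern('a,b,c,d', 2)); // "a,b"
console.log(truncateByPattern('a|b|c|d', 3)); // "a|b|c"
console.log(truncateByPattern('one two', 5)); // "one two"
```

A split(' ') based version would see 'a,b,c,d' as a single word, which is the practical difference between the two approaches.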
