javascript - How to get an array of certain parts of a text

Question

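I have some text that contains article numbers. I need an array of those numbers (together with the article text) that are followed by "marker words". For example, in the text:

"123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter"

for "marker words" = [hand, ten], the resulting array would be:

["123456/9902/001 one two three hand", "123456/9902/004 seven ten butter"]

My code finds something, but it works incorrectly. What is the correct regular expression?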

let markers = ["hand", "ten"],
  fin = [];
let num = "(\\d{6}\/\\d{4}\/\\d{3}).*?";
markers.forEach(item => {
  let reg = new RegExp(num + item, 'gmi');
  found = text.match(reg);
  found.forEach(item => fin.push(item));
  if (result) {
    console.log(`for ${item} : ${found.length}`);
    console.log(found);
  } else {
    (console.log('Nothing'))
  }
})
console.log(fin)

score 0 · Accepted Answer

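You can use a lookahead regular expression to split the text into an array of articles, then filter that array with a regular expression built from the markers: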

let text = "123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter";

let markers = ["hand","ten"];
let regex = new RegExp("\\b("+markers.join("|")+")\\b", "");
let result = text.split(/(?=\s*\d{6}\/\d{4}\/\d{3})/).filter(art => regex.test(art));

console.log(result);

If your markers contain characters that have a special meaning in a regular expression, you will need to escape them.
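A minimal sketch of that escaping step, assuming markers such as "(net)" or "c++" that contain regex metacharacters. The helper name `escapeRegExp` and the sample text are made up for illustration. The `\b` anchors are dropped here because a word boundary does not behave as expected next to non-word characters like "(" or "+".

```javascript
// Hypothetical helper (not a built-in): backslash-escape every character
// that has a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Sample data invented for this sketch
const text = "123456/9902/001 price (net) 123456/9902/002 c++ notes 123456/9902/003 five six";
const markers = ["(net)", "c++"];

// Escape each marker before joining them into one alternation
const regex = new RegExp(markers.map(escapeRegExp).join("|"));

const result = text
  .split(/(?=\d{6}\/\d{4}\/\d{3})/) // split in front of every article number
  .map(art => art.trim())           // drop the whitespace left between articles
  .filter(art => regex.test(art));  // keep articles that contain a marker

console.log(result);
```

Without the escaping, `new RegExp("c++")` throws a syntax error, and "(net)" would be read as a capture group rather than literal parentheses.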

score 0 · Accepted Answer

You can first parse the text with the following code:

function findArticles(text) {
  // With the g flag, match() returns the full matched strings, or null when nothing matches
  return text.match(/(?:\d{6}\/\d{4}\/\d{3})(?: [a-zA-Z]+)+/g) || []
}

Then get the article by marker:

function getArticleByMarker(articles, marker) {
    // Return the first article that contains the marker, or null if none does
    return articles.find(article => article.includes(marker)) ?? null
}
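The functions above look up one marker at a time. A minimal sketch of the same idea for a list of markers follows; `parseArticles` and `filterByMarkers` are hypothetical names chosen for this example, and the word pattern assumes articles consist only of ASCII letters separated by single spaces.

```javascript
// Hypothetical helper: extract every "number + following words" article
function parseArticles(text) {
  return text.match(/\d{6}\/\d{4}\/\d{3}(?: [a-zA-Z]+)+/g) || [];
}

// Hypothetical helper: keep each article that contains at least one marker
function filterByMarkers(articles, markers) {
  return articles.filter(article =>
    markers.some(marker => article.includes(marker))
  );
}

const text = "123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter";
const matches = filterByMarkers(parseArticles(text), ["hand", "ten"]);

console.log(matches);
// [ '123456/9902/001 one two three hand', '123456/9902/004 seven ten butter' ]
```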

score 0 · Accepted Answer

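Instead of using a regular expression to extract the desired articles, you can use one to split the string into the individual articles and then filter out those that do not contain any of the marker words. Here is an example: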

const markers = ['hand', 'ten']
const str = `123456/9902/001 one two three hand 123456/9902/002 fat got lot 123456/9902/003 five six 123456/9902/004 seven ten butter`;

const articleNames = str.split(/(?=\d{6}\/\d{4}\/\d{3})/);

const articleNamesWithMarkers = articleNames.filter(articleName => markers.some(marker => articleName.includes(marker)));

console.log(articleNamesWithMarkers);
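Two details of this approach may matter in practice: the split leaves trailing whitespace on each article, and `includes` matches substrings, so "ten" is also found inside "often". The sketch below, using sample text made up for illustration, trims the pieces and compares substring matching with whole-word matching via `\b`. It assumes the markers contain no regex metacharacters.

```javascript
const markers = ["hand", "ten"];
// Sample text invented for this sketch: "often" contains "ten" as a substring
const str = "123456/9902/001 one two three hand 123456/9902/002 often late 123456/9902/003 seven ten butter";

// Split in front of each article number and trim the leftover whitespace
const articles = str.split(/(?=\d{6}\/\d{4}\/\d{3})/).map(a => a.trim());

// Substring match: also picks up the article containing "often"
const bySubstring = articles.filter(a => markers.some(m => a.includes(m)));

// Whole-word match: \b requires a word boundary on both sides of the marker
const wordRegex = new RegExp("\\b(?:" + markers.join("|") + ")\\b");
const byWord = articles.filter(a => wordRegex.test(a));

console.log(bySubstring.length); // 3
console.log(byWord.length);      // 2
```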
