
I've been using this regex

/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi

to extract selected terms and the preceding and succeeding 3 words. I'd like to change the regex so that I extract the line containing the selected terms. The line will be bounded by \n but I'd also like to trim leading and trailing spaces.
How do I alter the regex to do that?

example input:

   This line, containing  term2, I'd like to extract.  
        This line contains term13 and I'd like to ignore it  
  This line, on the other hand, contains term1, so let's keep it.

Output would be:

This line, containing  term2, I'd like to extract.
This line, on the other hand, contains term1, so let's keep it.

See code to be altered below.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>

<body>
<script>
// Each line ends with \n so the input matches the description above.
var Input = "   This line, containing  term2, I'd like to extract.\n";
Input += "        This line contains term13 and I'd like to ignore it.\n";
Input += "  This line, on the other hand, contains term1, so let's keep it.";

// match() returns null when nothing matches, so fall back to an empty array.
var matches = Input.match(/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi) || [];
var myMatches = "";
for (var i = 0; i < matches.length; i++) {
  myMatches += ("..." + matches[i] + "...\n"); // collect each match in a variable
}
alert(myMatches);
</script>


</body>
</html>

1 Answer


As Asad pointed out, you can use \b for word boundaries, so that term1, for example, does not match term13.
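To illustrate the word-boundary point, here is a small sketch in JavaScript. The variable names are made up for the example, and the sample strings are shortened versions of the lines in the question.

```javascript
// \b matches between a word character (letter, digit, underscore)
// and a non-word character, or at the start/end of the string.
var withBoundary = /\b(?:term1|term2)\b/;
var withoutBoundary = /(?:term1|term2)/;

// "term1" followed by "," : there is a boundary after the 1, so this matches.
var boundaryHit = withBoundary.test("contains term1, so keep it");

// "term13": the 1 is followed by another word character (3), so no boundary.
var boundaryMiss = withBoundary.test("contains term13 and more");

// Without \b the same line matches, because "term1" is a prefix of "term13".
var plainHit = withoutBoundary.test("contains term13 and more");
```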

The regex:

^ *(.*\b(?:term1|term2)\b.*) *$

should do what you're after. Your matches will be in the first (and only) capture group. Just iterate over them and you're done.

See it on Rubular.
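For the JavaScript in the question, iterating over the capture group could look roughly like the sketch below. A few assumptions that are not part of the pattern as tested on Rubular: JavaScript needs the `m` flag for `^` and `$` to anchor at line breaks (Ruby does this by default), `g` and `i` are carried over from the question's regex, and the final `.*` is made lazy (`.*?`) because a greedy `.*` would pull the trailing spaces into the capture group. `extractLines` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the trimmed lines that contain
// term1 or term2 as whole words.
function extractLines(text) {
  // m: ^ and $ match at line breaks; g: find every line; i: ignore case.
  // The lazy .*? leaves trailing spaces to be consumed by " *$"
  // instead of ending up inside the capture group.
  var re = /^ *(.*\b(?:term1|term2)\b.*?) *$/gim;
  var lines = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    lines.push(m[1]); // first (and only) capture group
  }
  return lines;
}

var input =
  "   This line, containing  term2, I'd like to extract.  \n" +
  "        This line contains term13 and I'd like to ignore it  \n" +
  "  This line, on the other hand, contains term1, so let's keep it.";

var result = extractLines(input);
// result holds the first and third lines, without the
// leading and trailing spaces.
```

In the question's script, `result.join("\n")` could then be passed to `alert` in place of `myMatches`.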

Answered 2012-10-17T08:38:20.487