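I'm new to programming and am trying to learn JavaScript. I'm currently working on a project in which I'm trying to parse a large text file (Shakespeare's 154 sonnets, found here) into an array of objects, with the following data structure: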

var obj = {
  'property 1': [ 'value 1',
       'value 2',
      ],
  'property 2': [ 'value 1',
       'value 2',
      ]
};

and so on, where each Roman numeral is a property name and each line of a sonnet is one value in that property's array.

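I have to use regular expressions to parse the text file. So far I have been searching for the right regular expression to split the text, but I don't know whether I'm going about it the right way. Ultimately I want to create a drop-down menu in which each item in the list is a sonnet.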

Edit: I'm now actually getting the source text from this URL: http://pizzaboys.biz/xxx/sonnets.php

and doing the same thing as above, but instead of doing a $get I put the text into a variable...

I tried this:

$(document).ready(function(){
    var data = new SonnetizerArray();
});

function SonnetizerArray(){
    this.data = [];
    var rawText = "text from above link"
    var rx = /^\\n[CDILVX]/$\\n/g;

    var array_of_sonnets = rawText.exec(rx);
    for (var i = 0; i < array_of_sonnets.length; i ++){
        var s = $.split(array_of_sonnets[i]);
        if (s.length > 0) this.data.push(s);
    }
}
1 Answer

Description

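This regular expression parses the text into Roman-numeral headings and sonnet bodies. Each body can then be split on the newline character \n. Run it with the global, multiline, and dot-matches-newline options (the g, m, and s flags in JavaScript). JavaScript has no \Z anchor, so the end of the input is written as (?![\s\S]). The lookahead uses the same character class as the heading, so that numerals containing C or D (sonnets 100 and up) also end the preceding body.

^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))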

Capture groups

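Group 0 holds the entire matched section.

  1. holds the Roman numeral
  2. holds the body of the section, excluding the Roman numeral
  3. holds the Roman numeral of the following section (captured inside the lookahead); it is empty for the last section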

JavaScript code example:

Sample text pulled from your link

  VII

  Lo! in the orient when the gracious light
  Lifts up his burning head, each under eye
  Doth homage to his new-appearing sight,


  VIII

  Music to hear, why hear'st thou music sadly?
  Sweets with sweets war not, joy delights in joy:
  Why lov'st thou that which thou receiv'st not gladly,
  Or else receiv'st with pleasure thine annoy?


  IX

  Is it for fear to wet a widow's eye,
  That thou consum'st thy self in single life?
  Ah! if thou issueless shalt hap to die,
  The world will wail thee like a makeless wife;

Sample code

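<script type="text/javascript">
  // Flags: g = find every match (without it this loop never advances),
  //        m = ^ and $ match at line breaks, s = . also matches newlines.
  // JavaScript has no \Z, so (?![\s\S]) stands in for "end of input".
  var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;
  var sourcestring = "source string to match with pattern";
  var results = [];
  var i = 0;
  // With the g flag, each exec() call resumes from re.lastIndex and returns null when no match is left.
  for (var matches = re.exec(sourcestring); matches != null; matches = re.exec(sourcestring)) {
    results[i] = matches;
    for (var j=0; j<matches.length; j++) {
      // Log each group instead of raising one alert per group per sonnet.
      console.log("results["+i+"]["+j+"] = " + results[i][j]);
    }
    i++;
  }
</script>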

Sample output

$matches Array:
(
    [0] => Array
        (
            [0] =>   VII

  Lo! in the orient when the gracious light
  Lifts up his burning head, each under eye
  Doth homage to his new-appearing sight,

            [1] => 

  VIII

  Music to hear, why hear'st thou music sadly?
  Sweets with sweets war not, joy delights in joy:
  Why lov'st thou that which thou receiv'st not gladly,
  Or else receiv'st with pleasure thine annoy?

            [2] => 

  IX

  Is it for fear to wet a widow's eye,
  That thou consum'st thy self in single life?
  Ah! if thou issueless shalt hap to die,
  The world will wail thee like a makeless wife;
        )

    [1] => Array
        (
            [0] => VII
            [1] => VIII
            [2] => IX
        )

    [2] => Array
        (
            [0] => 
  Lo! in the orient when the gracious light
  Lifts up his burning head, each under eye
  Doth homage to his new-appearing sight,

            [1] => 
  Music to hear, why hear'st thou music sadly?
  Sweets with sweets war not, joy delights in joy:
  Why lov'st thou that which thou receiv'st not gladly,
  Or else receiv'st with pleasure thine annoy?

            [2] => 
  Is it for fear to wet a widow's eye,
  That thou consum'st thy self in single life?
  Ah! if thou issueless shalt hap to die,
  The world will wail thee like a makeless wife;
        )

    [3] => Array
        (
            [0] => VIII
            [1] => IX
            [2] => 
        )

)
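
To get from these matches to the structure asked for in the question, groups 1 and 2 can be combined into a single object. The following is only a minimal sketch: parseSonnets is a hypothetical helper name that is not part of the answer above, the trimming and dropping of blank lines is an assumption about what the asker wants, and the s flag assumes an ES2018-capable engine.

```javascript
// Hypothetical helper (not from the original answer): builds
// { 'VII': [line, line, ...], 'VIII': [...] } from the raw text.
function parseSonnets(text) {
  // Same expression as above. g: all matches, m: ^/$ per line, s: . matches newlines.
  var re = /^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))/gms;
  var sonnets = {};
  var match;
  while ((match = re.exec(text)) !== null) {
    // Group 1 is the numeral, group 2 is the body.
    // Assumption: leading whitespace and blank lines are not wanted.
    sonnets[match[1]] = match[2]
      .split(/\r?\n/)
      .map(function (line) { return line.trim(); })
      .filter(function (line) { return line.length > 0; });
  }
  return sonnets;
}

// Shortened sample in the same layout as the source text.
var sample = [
  "  VII",
  "",
  "  Lo! in the orient when the gracious light",
  "  Lifts up his burning head, each under eye",
  "",
  "",
  "  VIII",
  "",
  "  Music to hear, why hear'st thou music sadly?",
  ""
].join("\n");

var parsed = parseSonnets(sample);
console.log(Object.keys(parsed)); // the two numerals, VII and VIII
console.log(parsed["VII"].length); // 2
```

Object.keys(parsed) then gives the list of numerals that a drop-down menu could be filled from.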

Roman numeral validation

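The expression above only tests that the heading is made up of Roman numeral characters; it does not check that the number is actually valid. If you also need to verify that the Roman numeral is well formed, you can use the expression below. It adds three nested groups to the numeral, so the body moves to group 5 and the following numeral to group 6.

^\s+\b(M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}))\b(?:\r|\n|$).*?(?:^.*?)(^.*?)(?=^\s+\b([CDMLXVI]{1,12})\b(?:\r|\n|$)|(?![\s\S]))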
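
The numeral part of that expression can also be tried on its own. This is a small sketch under stated assumptions: isValidRomanNumeral is a hypothetical helper that does not appear in the answer, and the leading lookahead is an addition that rejects the empty string, which the numeral pattern alone would accept.

```javascript
// Hypothetical helper (not from the original answer): checks a single
// heading string against the numeral sub-pattern used above.
function isValidRomanNumeral(s) {
  // (?=[MDCLXVI]) is an added guard: every part of the pattern is optional,
  // so without it an empty string would pass.
  return /^(?=[MDCLXVI])M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/.test(s);
}

console.log(isValidRomanNumeral("CLIV")); // true  (154, the last sonnet)
console.log(isValidRomanNumeral("IIII")); // false (four I's in a row)
console.log(isValidRomanNumeral("VX"));   // false (V cannot precede X)
console.log(isValidRomanNumeral(""));     // false
```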

answered 2013-06-13T04:02:25.757