php - Creating an accurate Bible search

Question

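I'm building a Bible search. The problem with a Bible search is that people enter many different kinds of queries, and I need to split them up accordingly. So I figured the best way to start is to strip all the spaces and work through the string from there. The different kinds of searches might be:

Genesis 1:1 - Genesis chapter 1, verse 1

1 Kings 2:5 - 1 Kings chapter 2, verse 5

Job 3 - Job chapter 3

Romans 8:1-7 - Romans chapter 8, verses 1 to 7

1 John 5:6-11 - 1 John chapter 5, verses 6 to 11.

I'm not too fazed by the different kinds of searches, but if anyone can find a simpler way to do this, or knows of a good way of doing it, please tell me how!

Thanks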

score 1 · Accepted Answer

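The simplest thing to do here is to write a regular expression that captures the text, then parse what was captured to see what you got. To start, say you have these test cases: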

$tests = array( 
    'Genesis 1:1' => 'Genesis Chapter 1, Verse 1',
    '1 Kings 2:5' => '1 Kings Chapter 2, Verse 5',
    'Job 3' => 'Job Chapter 3',
    'Romans 8:1-7' => 'Romans Chapter 8, Verses 1 to 7',
    '1 John 5:6-11' => '1 John Chapter 5, Verses 6 to 11'
);

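So you have, from left to right:

A book name, optionally prefixed with a number
A chapter number
A verse number, which is optional and may itself be followed by a range.

So we can write a regular expression that matches all of these cases: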

((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?

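Now see what we get back from the regex:

// The pattern above, wrapped in delimiters so preg_match() can use it
$regex = '/((?:\d+\s)?\w+)\s+(\d+)(?::(\d+(?:-\d+)?))?/';

foreach( $tests as $test => $answer) {
    // Match the regex against the test case
    preg_match( $regex, $test, $match);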

    // Ignore the first entry, the 2nd and 3rd entries hold the book and chapter
    list( , $book, $chapter) = array_map( 'trim', $match);

    $output = "$book Chapter $chapter";

    // If the fourth match exists, we have a verse entry
    if( isset( $match[3])) {
        // If there is no dash, it's a single verse
        if( strpos( $match[3], '-') === false) {
            $output .= ", Verse " . $match[3];
        } else {
            // Otherwise it's a range of verses
            list( $start, $end) = explode( '-', $match[3]);
            $output .= ", Verses $start to $end";
        }
    }
    // Here $output matches the value in $answer from our test cases
    echo $answer . "\n" . $output . "\n\n";
}

You can see it working in this demo.
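If you would rather get structured data back than a formatted string, the same idea can be wrapped in a function. This is only a minimal sketch: `parseReference` is a hypothetical name, and the pattern is a variant of the one above, anchored and with the verse range split into two named groups. Like the original pattern, it only handles single-word book names.

```php
<?php
// Hypothetical helper: parse a reference such as "1 John 5:6-11" into its parts.
// Returns null when the input does not look like a reference.
function parseReference(string $input): ?array
{
    // Variant of the pattern above: anchored, with named groups,
    // and the verse range split into separate start/end groups.
    $regex = '/^\s*(?<book>(?:\d+\s)?\w+)\s+(?<chapter>\d+)'
           . '(?::(?<start>\d+)(?:-(?<end>\d+))?)?\s*$/';

    if (!preg_match($regex, $input, $m)) {
        return null;
    }

    // Optional groups that did not take part in the match are absent or empty.
    $start = isset($m['start']) && $m['start'] !== '' ? (int) $m['start'] : null;
    $end   = isset($m['end']) && $m['end'] !== '' ? (int) $m['end'] : null;

    return [
        'book'    => $m['book'],
        'chapter' => (int) $m['chapter'],
        'start'   => $start,
        'end'     => $end,
    ];
}

print_r(parseReference('Romans 8:1-7'));
```

A `null` start means the whole chapter was requested, and a `null` end with a start set means a single verse.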

score 0 · Accepted Answer

I know this is crazy talk, but why not just have a form with 4 fields so they can specify:

Book
Chapter
Starting verse
Ending verse [optional]
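On the server side, handling such a form could look roughly like this. It is a sketch under assumptions: the field names (`book`, `chapter`, `start_verse`, `end_verse`) and the function name `referenceFromForm` are made up for illustration, and checking that the book actually exists is left out.

```php
<?php
// Hypothetical handler for a four-field form; field names are assumptions.
// Returns null when a required field is missing or invalid.
function referenceFromForm(array $form): ?array
{
    $book = trim((string) ($form['book'] ?? ''));

    // FILTER_VALIDATE_INT yields false for anything that is not a whole number >= 1
    $opts    = ['options' => ['min_range' => 1]];
    $chapter = filter_var($form['chapter'] ?? '', FILTER_VALIDATE_INT, $opts);
    $start   = filter_var($form['start_verse'] ?? '', FILTER_VALIDATE_INT, $opts);

    if ($book === '' || $chapter === false || $start === false) {
        return null;
    }

    // The ending verse is optional, but if given it must not precede the start
    $end    = null;
    $rawEnd = trim((string) ($form['end_verse'] ?? ''));
    if ($rawEnd !== '') {
        $end = filter_var($rawEnd, FILTER_VALIDATE_INT, $opts);
        if ($end === false || $end < $start) {
            return null;
        }
    }

    return ['book' => $book, 'chapter' => $chapter, 'start' => $start, 'end' => $end];
}

print_r(referenceFromForm(['book' => 'Romans', 'chapter' => '8', 'start_verse' => '1', 'end_verse' => '7']));
```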

score 0 · Accepted Answer

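I think I understand what you're asking here. You want to devise an algorithm that extracts the information (e.g. book name, chapter, verse).

To me this looks like a job for pattern matching (e.g. regular expressions), because you can define patterns, extract the data for every scenario that makes sense, and work from there.

There could actually be quite a lot of variations, so perhaps you should also look at natural language processing. Fuzzy string matching on the names could give better results (e.g. when people misspell a book name).

Good luck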
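As a rough illustration of fuzzy matching on book names, PHP's built-in `levenshtein()` can pick the known name closest to what was typed. The helper name `closestBook`, the cutoff of 2 edits, and the shortened book list are all assumptions for this sketch.

```php
<?php
// Hypothetical helper: return the known book name closest to what the user typed,
// or null when nothing is within $maxDistance single-character edits.
function closestBook(string $typed, array $books, int $maxDistance = 2): ?string
{
    $best = null;
    $bestDistance = $maxDistance + 1;

    foreach ($books as $book) {
        // Compare case-insensitively; levenshtein() counts insertions,
        // deletions and substitutions needed to turn one string into the other
        $distance = levenshtein(strtolower($typed), strtolower($book));
        if ($distance < $bestDistance) {
            $best = $book;
            $bestDistance = $distance;
        }
    }

    return $best;
}

// Shortened list for illustration; a real one would hold every book.
$books = ['Genesis', 'Job', 'Romans', '1 Kings', '1 John'];
echo closestBook('Romens', $books), "\n";
```

A small cutoff keeps short names such as "Job" from matching unrelated input, at the cost of missing heavier misspellings.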

score 0 · Accepted Answer

Try something based on preg_match_all, for example:

$ php -a
Interactive shell

php > $s = '1 kings 2:4 and 1 sam 4-5';
php > preg_match_all("/(\\d*|[^\\d ]*| *)/", $s, $parts);
php > print serialize($parts);
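The pattern above also matches empty strings and runs of spaces, so the result needs filtering before use. A slightly tighter variant, offered only as a sketch and not as the answerer's exact method, keeps just the digit runs, letter runs, and the `:` and `-` separators:

```php
<?php
$s = '1 kings 2:4 and 1 sam 4-5';

// Capture runs of digits, runs of letters, and single ':' or '-' marks.
// Spaces match none of the alternatives, so they are skipped.
preg_match_all('/\d+|[a-z]+|[:\-]/i', $s, $parts);

// $parts[0] holds the full matches in the order they appear
print_r($parts[0]);
```

From that token list you can walk left to right and decide whether each token is a book number, a book name, a chapter, or a verse.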

score 0 · Accepted Answer

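OK, well, I'm not too sure about regular expressions and I haven't studied them yet, so I stuck with a more procedural approach. I did the following (it is still a huge improvement on the code I wrote 5 years ago, which was my goal), and it seems to work flawlessly:

First you need this function: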

    function varType($str) {
        if(is_numeric($str)) {return false;}    
        if(is_string($str)) {return true;}  
    }


    // $collapse is assumed to hold the search string with all spaces removed, e.g. "1John5:6-11".
    $bible = array("BookNumber" => "", "Book" => "", "Chapter" => "", "StartVerse" => "", "EndVerse" => "");
    $pos = 1; // 1 - Book Number
              // 2 - Book
              // 3 - Chapter
              // 4 - ':' or 'v'
              // 5 - StartVerse
              // 6 - is a dash for spanning verses '-'
              // 7 - EndVerse
    $scan = ""; $compile = array();
    //Divide into character type groups.    
    for($x=0;$x<=(strlen($collapse)-1);$x++)
    {   if($x>=1) {if(varType($collapse[$x]) != varType($collapse[$x-1])) {array_push($compile,$scan);$scan = "";}}
        $scan .= $collapse[$x];
        if($x==strlen($collapse)-1) {array_push($compile,$scan);}
    }
    //If the first element is not a number, then it is not a numbered book (AKA 1 John, 2 Kings), So move the position forward.
    if(varType($compile[0])) {$pos=2;}
    foreach($compile as $val)
    {   if(!varType($val)) 
        {   switch($pos) 
            {   case 1: $bible['BookNumber'] = $val;    break;      
                case 3: $bible['Chapter'] = $val;   break; 
                case 5: $bible['StartVerse'] = $val;    break; 
                case 7: $bible['EndVerse'] = $val;  break; 
            }
        } else {switch($pos) 
            {   case 2: $bible['Book'] = $val;      break;      
                case 4:     //Colon or 'v'
                case 6: break;  //Dash for verse spanning. 
            }}
        $pos++;
    }

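At the end this gives you an array called $bible containing all the data you need to run a query against an SQL database or whatever else you might want. Hope this helps someone else.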
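To show how that array might feed a query, here is a hedged sketch that builds a parameterised SQL string from it. The table name `verses` and its columns (`book`, `chapter`, `verse`, `text`) are invented for illustration, as is the function name `buildVerseQuery`; adapt them to your own schema.

```php
<?php
// Hypothetical: build a parameterised query from the $bible array.
// The table name (verses) and its columns are assumptions for illustration only.
function buildVerseQuery(array $bible): array
{
    // Re-join "1" and "John" into "1 John" when a book number is present
    $book   = trim($bible['BookNumber'] . ' ' . $bible['Book']);
    $sql    = 'SELECT text FROM verses WHERE book = ? AND chapter = ?';
    $params = [$book, (int) $bible['Chapter']];

    // No start verse means the whole chapter was requested
    if ($bible['StartVerse'] !== '') {
        $start = (int) $bible['StartVerse'];
        // A missing end verse means a single verse
        $end = $bible['EndVerse'] !== '' ? (int) $bible['EndVerse'] : $start;
        $sql .= ' AND verse BETWEEN ? AND ?';
        $params[] = $start;
        $params[] = $end;
    }

    return [$sql, $params];
}
```

The returned pair can be handed to a prepared statement, for example `$pdo->prepare($sql)` followed by `execute($params)`, so that user input never ends up inside the SQL text itself.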
