
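For brevity...
I want to pull items out of a string, put them into a separate array, replace the extracted values in the string with ID tokens, parse the string, and then put the extracted items back in their original positions (in the correct order). (If that makes sense, skip the rest :D)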

I have the following string:
“My sentence contains URLs to [url] and [url], which makes my life difficult.”

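For various reasons I want to remove the URLs. But I need to keep track of their positions and reinsert them later (after the rest of the string has been processed).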

So I would like:
“My sentence contains URLs to [url] and [url], which makes my life difficult.”
to become:
“My sentence contains URLs to [token1fortheURL] and [token2fortheURL], which makes my life difficult.”

I have tried this several times, in various ways. All I manage to do is hit brick walls and invent new swear words!

I use the following code for setup:

$mystring = 'my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.';
$myregex = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';
$myextractions = array();

Then I call preg_replace_callback:

$matches = preg_replace_callback($myregex,'myfunction',$mystring);

My function is as follows:

function myfunction ($matches) {}

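This is where the brick walls start. I can put things into the empty extractions array, but they are not available outside the function. I can update the string with tokens, but I cannot access the URLs that were replaced. I cannot seem to pass additional values to the function that preg_replace_callback calls.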

I hope someone can help, because this is driving me crazy.


Update:

Based on the solution suggested by @Lepidosteus, I think I have the following working?

$mystring = 'my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.';
$myregex = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';
$tokenstart = ":URL:";
$tokenend = ":";


function extraction ($myregex, $mystring, $tokenstart, $tokenend) {
    // preg_match_all fills $mymatches; index 0 holds the full text of every match
    preg_match_all($myregex, $mystring, $mymatches);
    $thematches = array();

    foreach ($mymatches[0] as $key => $url) {
        // pair each URL with its token, e.g. :URL:0:
        $thematches[] = array($url, $tokenstart.$key.$tokenend);
    }

    return $thematches;
}
$matches = extraction ($myregex, $mystring, $tokenstart, $tokenend);
echo "1) ".$mystring."<br/>";
// 1) my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.



function substitute($matches,$mystring) {
foreach ($matches as $match) {
    $mystring = str_replace($match[0], $match[1], $mystring);
}
return $mystring;
}
$mystring = substitute($matches,$mystring);
echo "2) ".$mystring."<br/>";
// 2) my sentence contains URLs to :URL:0: and :URL:1: which makes my life difficult.


function reinsert($matches,$mystring) {
foreach ($matches as $match) {
    $mystring = str_replace($match[1], $match[0], $mystring);
}
return $mystring;
}
$mystring = reinsert($matches,$mystring);
echo "3) ".$mystring."<br/>";
// 3) my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.

This seems to work?
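
One caveat with the str_replace loops above: if one extracted URL is a prefix of another, the shorter one can be replaced inside the longer one. A minimal sketch of a possible workaround using strtr, which tries the longest keys first and does not rescan text it has already replaced. The sample string and the two pairs below are made up for illustration; they only mimic the shape of what extraction() returns.

```php
<?php
// Hypothetical input: the first URL is a prefix of the second.
$mystring = 'links: http://www.yahoo.com and http://www.yahoo.com/mail';
$matches = array(
    array('http://www.yahoo.com', ':URL:0:'),
    array('http://www.yahoo.com/mail', ':URL:1:'),
);

// The loop from substitute(): the short URL also hits inside the long one.
$naive = $mystring;
foreach ($matches as $match) {
    $naive = str_replace($match[0], $match[1], $naive);
}

// Build an original => token map and let strtr do all replacements at once.
$map = array();
foreach ($matches as $match) {
    $map[$match[0]] = $match[1];
}
$tokenized = strtr($mystring, $map);

// Reinsert by flipping the map (token => original).
$restored = strtr($tokenized, array_flip($map));

var_dump($naive);     // the second URL is only partly replaced
var_dump($tokenized); // each URL has its own token
var_dump($restored);  // same as $mystring
```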


1 Answer


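The key to solving your problem is to store the list of URLs in an external container that both your callback and your main code can access, so either can make whatever changes are needed. To remember where each URL was, we put a custom token in the string.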

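Note that I use closures to access the container. If for some reason you cannot use PHP 5.3, you will need to replace them with another way of reaching the $url_tokens container from the callback, which should not be a problem.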
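
As one possible closure-free variant, the container can live on an object and the callback can be passed as array($object, 'method'). This is only a sketch: the UrlTokenizer class, its store() method, the shortened sample string and the simplified pattern are all assumptions made for illustration, not part of the code in this answer.

```php
<?php
// Hypothetical helper: it owns the list of extracted URLs, so the callback
// needs neither a closure nor a global variable.
class UrlTokenizer
{
    public $urls = array();

    // Callback for preg_replace_callback(): store the match, return a token.
    public function store($matches)
    {
        $index = count($this->urls);
        $this->urls[$index] = $matches[0];
        return '[[URL::' . $index . ']]';
    }
}

// Simplified pattern, assumed for this sketch only.
$pattern = '~https?://\S+~';
$string  = 'see http://www.google.com/this.html and http://www.yahoo.com now';

$tokenizer = new UrlTokenizer();
$string = preg_replace_callback($pattern, array($tokenizer, 'store'), $string);

var_dump($string);          // the text with [[URL::0]] and [[URL::1]] in place
var_dump($tokenizer->urls); // the two original URLs, in order
```

The rest of this answer uses the closure form.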

<?php
// the string you start with

$string = "my sentence contains URLs to http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion and http://www.google.com/ which makes my life difficult.";

// the url container, you will store the urls found here

$url_tokens = array();

// the callback for the first replace: it takes every url, stores it in $url_tokens, then replaces it with [[URL::X]], X being a unique number for each url
//
// note that the closure uses $url_tokens by reference, so that we can add entries to it from inside the function

$callback = function ($matches) use (&$url_tokens) {
  static $token_iteration = 0;

  $token = '[[URL::'.$token_iteration.']]';

  $url_tokens[$token_iteration] = $matches;

  $token_iteration++;

  return $token;
};

// replace our urls with our callback

$pattern = '/(((?:https?|ftps?)\:\/\/)?([a-zA-Z0-9:]*[@])?([a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}|([0-9]+))([a-zA-Z0-9-._?,\'\/\+&%\$#\=~:]+)?)/';

$string = preg_replace_callback($pattern, $callback, $string);

// some debug code to check what we have at this point

var_dump($url_tokens);
var_dump($string);

// you can do changes to the url you found in $url_tokens here

// now we will replace our previous tokens with a specific string, just as an example of how to re-replace them when you're done

$callback_2 = function ($matches) use ($url_tokens) {
  $token = $matches[0];
  $token_iteration = $matches[1];

  if (!isset($url_tokens[$token_iteration])) {
    // if we don't know what this token is, leave it untouched
    return $token;
  }

  return '- there was an url to '.$url_tokens[$token_iteration][4].' here -';
};

$string = preg_replace_callback('/\[\[URL::([0-9]+)\]\]/', $callback_2, $string);

var_dump($string);

When executed, this gives the following result:

// the $url_tokens array after the first preg_replace_callback
array(2) {
  [0]=>
  array(7) {
    [0]=>
    string(110) "http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
    [1]=>
    string(110) "http://stackoverflow.com/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
    [2]=>
    string(7) "http://"
    [3]=>
    string(0) ""
    [4]=>
    string(17) "stackoverflow.com"
    [5]=>
    string(0) ""
    [6]=>
    string(86) "/questions/7619843/php-preg-replace-call-extract-specific-values-for-later-reinsertion"
  }
  [1]=>
  array(7) {
    [0]=>
    string(22) "http://www.google.com/"
    [1]=>
    string(22) "http://www.google.com/"
    [2]=>
    string(7) "http://"
    [3]=>
    string(0) ""
    [4]=>
    string(14) "www.google.com"
    [5]=>
    string(0) ""
    [6]=>
    string(1) "/"
  }
}
// the $string after the first preg_replace_callback
string(85) "my sentence contains URLs to [[URL::0]] and [[URL::1]] which makes my life difficult."

// the $string after the second replace
string(154) "my sentence contains URLs to - there was an url to stackoverflow.com here - and - there was an url to www.google.com here - which makes my life difficult."
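
The second callback above substitutes a descriptive string. If the goal is to put the original URLs back unchanged, returning the stored match should be enough. A minimal round-trip sketch under a few assumptions: the pattern is simplified, $urls stands in for $url_tokens, and strtoupper() is only a placeholder for whatever processing you do between the two steps. That processing has to leave the tokens themselves intact.

```php
<?php
$string = 'my sentence contains URLs to http://www.google.com/this.html and http://www.yahoo.com which makes my life difficult.';

// Simplified pattern, assumed for this sketch only.
$pattern = '~https?://\S+~';

$urls = array();

// Step 1: swap every URL for a numbered token and remember the original.
$tokenized = preg_replace_callback($pattern, function ($m) use (&$urls) {
    $urls[] = $m[0];
    return '[[URL::' . (count($urls) - 1) . ']]';
}, $string);

// Step 2: process the remaining text (placeholder transformation).
$processed = strtoupper($tokenized);

// Step 3: put each original URL back where its token sits.
$restored = preg_replace_callback('/\[\[URL::([0-9]+)\]\]/', function ($m) use ($urls) {
    $i = (int) $m[1];
    // unknown tokens are left untouched
    return isset($urls[$i]) ? $urls[$i] : $m[0];
}, $processed);

echo $restored, "\n";
```
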
answered 2011-10-01T11:37:21.063