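I have a WordPress database that contains some embedded iframes from SoundCloud. I want to replace those iframes with some kind of shortcode. I have already created a shortcode, and it works pretty well.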

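The problem is that I have an old database with around 2000 posts that already contain the embed code. What I want to do is write code that replaces each iframe with the shortcode.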

This is the code I'm using to find the URL in the content, but it always returns blank:

$string = 'Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson\'s retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad\'s record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you\'ve ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>\'s Jeffree\'s sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

preg_match("/url=(.*?)/", $string, $matches);

print_r($matches);

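The code above doesn't work, and I'm not very familiar with regular expressions, so it would be great if someone could figure out what's going wrong here. It would also be great if someone could point me to the right process for doing this.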

5 Answers

Since you're working with HTML here, I'd suggest using the DOM functions:

$doc = new DOMDocument;
$doc->loadHTML($string);

foreach ($doc->getElementsByTagName('iframe') as $iframe) {
    $url = $iframe->getAttribute('src');
    // parse the query string
    parse_str(parse_url($url, PHP_URL_QUERY), $args);
    // save the modified attribute
    $iframe->setAttribute('src', $args['url']);
}

echo $doc->saveHTML();

This outputs the complete document (with the added <html> and <body> wrappers), so you'll need to trim it:

$body = $doc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $node) {
    echo $doc->saveHTML($node);
}

Output:

<p>Think Kavinsky meets Futurecop! meets your favorite 80s TV show theme song and you might be pretty close to Swedish producer Johan Bengtsson's retro project, <a href="https://soundcloud.com/daataa"><strong>Mitch Murder</strong></a>. Title track, "The Touch," is genuinely lighthearted and fun, crossing over from 80s synth work into a bit of French Touch influence; also including a big time guitar solo straight out of your dad's record collection. B-side "Race Day" could very easily be the soundtrack to a video montage of all of your favorite beach scenes from every 80s movie you've ever watched, or as the PR put it, "quite possibly a contender to be the title screen music to a Wave Race 64 sequel." Sounds awesome to me. Also included in this package out today on <a href="https://soundcloud.com/maddecent/">Mad Decent</a>'s Jeffree's sub-label are two remixes of the A-side from Lifelike and Nite Sprite. Download below.
<iframe src="http://api.soundcloud.com/playlists/8087281" frameborder="no" scrolling="no" width="100%" height="350"></iframe></p>
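The snippet above only rewrites the iframe's src; the question asks to replace the whole iframe with a shortcode. A possible extension of the same DOM approach is sketched below. It assumes the shortcode format is [soundcloud url="..."], the function name iframes_to_shortcodes is hypothetical, and the extra <div> wrapper is just a trick so the original fragment can be serialized back without the <html>/<body> wrappers.

```php
<?php
// Sketch: swap each SoundCloud player <iframe> for a [soundcloud url="..."] shortcode
// using PHP's built-in DOM extension. The function name is hypothetical.
function iframes_to_shortcodes(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // The meta tag declares UTF-8; the <div> is a wrapper that is stripped again below.
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        . '</head><body><div>' . $html . '</div></body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live NodeList first, because the loop removes iframes from the tree.
    $iframes = iterator_to_array($doc->getElementsByTagName('iframe'));
    foreach ($iframes as $iframe) {
        parse_str((string) parse_url($iframe->getAttribute('src'), PHP_URL_QUERY), $args);
        $url = $args['url'] ?? '';
        if (!is_string($url) || strpos($url, 'soundcloud.com') === false) {
            continue; // not a SoundCloud player embed; leave it alone
        }
        $shortcode = $doc->createTextNode('[soundcloud url="' . $url . '"]');
        $iframe->parentNode->replaceChild($shortcode, $iframe);
    }

    // Serialize only the wrapper's children, i.e. the original fragment.
    $out = '';
    foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $node) {
        $out .= $doc->saveHTML($node);
    }
    return $out;
}

$post = "Download below.\n"
    . '<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000" frameborder="no" width="100%" height="350"></iframe>';
echo iframes_to_shortcodes($post), "\n";
```

parse_str() already percent-decodes the url parameter, so no separate urldecode() call is needed. Iframes whose url parameter doesn't point at soundcloud.com are left untouched.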
answered 2013-08-21T06:33:01.393

This should work for what you described:

$new_string = preg_replace('/(?:<iframe[^\>]+src="[^\"]*url=([^\"]*soundcloud\.com[^\"]*))"[^\/]*\/[^\>]*>/i', '[soundcloud url="$1"]', $string);

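It only matches iframes whose src attribute contains a url=...soundcloud... part, and it replaces the entire iframe code with [soundcloud url="{part after url=}"]. Note that the captured part is still URL-encoded and includes the remaining player parameters (&amp;color=... and so on).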
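If the shortcode should receive a plain URL without the player parameters, one option is preg_replace_callback() with rawurldecode(). The sketch below assumes every embed uses the https://w.soundcloud.com/player/?url=... form from the question and is closed with </iframe>; the function name is hypothetical.

```php
<?php
// Sketch (hypothetical function name): replace SoundCloud player iframes with
// [soundcloud url="..."], keeping only the decoded value of the url= parameter.
function soundcloud_iframes_to_shortcode(string $html): string
{
    // Group 1 stops at the first "&" (the start of "&amp;color=...") or at the closing quote.
    $pattern = '~<iframe\b[^>]*\bsrc="https?://w\.soundcloud\.com/player/\?url=([^"&]+)[^"]*"[^>]*>\s*</iframe>~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] is still percent-encoded, e.g. "http%3A%2F%2Fapi.soundcloud.com%2F...".
        return '[soundcloud url="' . rawurldecode($m[1]) . '"]';
    }, $html);
}

$post = "Download below.\n"
    . '<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000&amp;auto_play=false&amp;show_artwork=true" frameborder="no" scrolling="no" width="100%" height="350"></iframe>';

echo soundcloud_iframes_to_shortcode($post), "\n";
// Download below.
// [soundcloud url="http://api.soundcloud.com/playlists/8087281"]
```

Using ~ as the delimiter avoids escaping the slashes in the URL and in </iframe>.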

answered 2013-08-23T10:25:45.373

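For a one-time fix, you could consider a SQL solution. The SQL below makes a few assumptions:

  • Only one iframe per post is replaced (if some posts contain more than one iframe, the SQL can be run multiple times).
  • ALL the iframes to be replaced have the following form:

<iframe src="https://w.soundcloud.com/player/?url=..." other-stuff></iframe>

  • You only care about the value of the url parameter inside the quotes
  • The end result is [soundcloud url="..."]

If all of these hold, the SQL below should do the trick. It can be adjusted if you want a different shortcode, etc.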

Be sure to back up your wp_posts table before doing any mass update:

CREATE TABLE wp_posts_backup SELECT * FROM wp_posts
;

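Once the backup is done, the following SQL should fix all the posts in one go (the + 50 offsets skip past the 50-character '<iframe src="https://w.soundcloud.com/player/?url=' prefix):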

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

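I'd suggest testing on a few posts before running it against all of them. An easy way to test is to add the following to the WHERE clause above (right before the ';'), replacing each '?' with the ID of a post you want to test: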

AND p.ID IN (?,?,?)

If for any reason you need to restore your posts, you can do the following:

UPDATE wp_posts p
  JOIN wp_posts_backup b
    ON b.ID = p.ID
   SET p.post_content = b.post_content
;

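One more thing to consider: I wasn't sure whether you wanted to pass along the parameters that are currently part of the URL, so I included them. You can easily remove them by changing: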

                               ,'?'
                               ,SUBSTRING_INDEX( SUBSTR( p.post_content
                                                       , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                       + LOCATE( '&amp;', SUBSTR( p.post_content
                                                                                , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                                                )
                                                               ) + 4
                                                       )
                                               , ' ', 1
                                               )
                               ,']'

to:

                           ,'"]'

resulting in:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX( SUBSTR( p.post_content
                                                        , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                        )
                                                , '&amp;', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;

Updated to allow URLs with no parameters:

UPDATE wp_posts p

   SET p.post_content = CONCAT( SUBSTRING_INDEX( p.post_content, '<iframe src="https://w.soundcloud.com/player/?url=', 1 )
                               ,'[soundcloud url="'
                               , REPLACE( REPLACE(
                                 SUBSTRING_INDEX(
                                     SUBSTRING_INDEX( SUBSTR( p.post_content
                                                            , LOCATE( '<iframe src="https://w.soundcloud.com/player/?url=', p.post_content ) + 50
                                                            )
                                                    , '&amp;', 1
                                                    )
                                                , '"', 1
                                                )
                               , '%3A', ':' ), '%2F', '/' )
                               ,'"]'
                               ,SUBSTR( p.post_content, LOCATE( '</iframe>', p.post_content ) + 9 )
                              )

 WHERE p.post_content LIKE '%<iframe src="https://w.soundcloud.com/player/?url=%</iframe>%'
;
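Before running the UPDATE on all posts, it can help to preview what it will produce for a given post. The PHP sketch below mirrors the string operations of the last query (the version that allows URLs without parameters) one step at a time. The names are hypothetical, and note that strpos() is case-sensitive while MySQL's LOCATE() is not under the default collations, so this is only an approximate dry run.

```php
<?php
// Dry-run sketch (hypothetical names) that mirrors the string functions used in the
// last UPDATE statement, so a post's content can be checked before touching wp_posts.
const SC_PREFIX = '<iframe src="https://w.soundcloud.com/player/?url=';

function preview_soundcloud_update(string $content): ?string
{
    $start = strpos($content, SC_PREFIX);   // LOCATE(prefix, post_content), but 0-based
    $end = strpos($content, '</iframe>');   // first </iframe>, exactly as the SQL does
    if ($start === false || $end === false) {
        return null;                        // roughly: the WHERE clause would skip this post
    }
    $before = substr($content, 0, $start);                  // SUBSTRING_INDEX(post_content, prefix, 1)
    $rest = substr($content, $start + strlen(SC_PREFIX));   // SUBSTR(post_content, LOCATE(...) + 50)
    $url = explode('&amp;', $rest, 2)[0];                   // SUBSTRING_INDEX(..., '&amp;', 1)
    $url = explode('"', $url, 2)[0];                        // SUBSTRING_INDEX(..., '"', 1)
    $url = str_replace(['%3A', '%2F'], [':', '/'], $url);   // the two nested REPLACE() calls
    $after = substr($content, $end + strlen('</iframe>'));  // SUBSTR(post_content, LOCATE('</iframe>', ...) + 9)

    return $before . '[soundcloud url="' . $url . '"]' . $after;
}

$post = "Intro.\n"
    . '<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000" width="100%" height="350"></iframe>'
    . "\nOutro.";
echo preview_soundcloud_update($post), "\n";
```

Like the SQL, it only decodes %3A and %2F; any other percent-escapes in the URL would be left as they are.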

Good luck.

answered 2013-08-24T22:20:37.277

I'd suggest looking into simplehtmldom. It's a DOM parser that uses jQuery/CSS-like selectors.

http://simplehtmldom.sourceforge.net/

include 'simple_html_dom.php';

$html = str_get_html($html_from_database);
// Find all iframes
foreach ($html->find('iframe') as $element) {
   $source = $element->src; // extract the source from the iframe
   // This is where you do your magic, like changing links.
   $element->src = $source; // this is where you replace the old source
}

// UPDATE the table with $html->save(), which returns the modified HTML as a string.

Make sure you take a full backup of all your tables before updating any of them with the parsed result :)

http://simplehtmldom.sourceforge.net/manual.htm

answered 2013-08-26T13:44:43.617
<?php
    preg_match("/url\=([^\"]+)/i", $string, $matches);

So basically you want to match one or more characters after url=, as long as they are not a ".
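To see why the question's pattern comes back empty, here is a small comparison. The $src string is a trimmed-down copy of the question's iframe; the third pattern, which also stops at "&", is an added variation rather than part of this answer.

```php
<?php
// Why "/url=(.*?)/" captures nothing: the lazy quantifier matches as few characters as
// possible, and since nothing follows the group in the pattern, zero characters is enough.
$src = '<iframe src="https://w.soundcloud.com/player/?url=http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000"></iframe>';

preg_match('/url=(.*?)/', $src, $lazy);
var_dump($lazy[1]);                  // string(0) ""

// The negated character class from this answer stops at the closing quote.
preg_match('/url\=([^\"]+)/i', $src, $quoted);
echo $quoted[1], "\n";               // http%3A%2F%2Fapi.soundcloud.com%2Fplaylists%2F8087281&amp;color=000000

// Also stopping at "&" drops the extra player parameters; rawurldecode() restores ":" and "/".
preg_match('/url=([^"&]+)/i', $src, $plain);
echo rawurldecode($plain[1]), "\n";  // http://api.soundcloud.com/playlists/8087281
```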

answered 2013-08-13T12:40:20.430