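How can I get the domain from the URL in an embed code? I have 400k videos collected from many websites; some of them are embedded with an iframe and others with an object tag. What is the simplest and best way to get the domain of the embed code?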

iframe code example:

<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>

Embed code example:

<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>

So, for these examples, the result should be $Domain_Embed = websites-test.com


2 Answers


I suggest that you parse the HTML code (see "How do you parse and process HTML/XML in PHP?") and then extract the domain from the appropriate attribute. For example:

<?php

function getDomainFromEmbed($html, $all = false)
{
    $result = array();
    $doc = new DOMDocument;
    @$doc->loadHTML($html);

    $iframes = $doc->getElementsByTagName('iframe');
    if (!empty($iframes)) {
        foreach ($iframes as $iframe) {
            if ($iframe->hasAttribute('src')) {
                $url = parse_url($iframe->getAttribute('src'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }
        }
    }

    $objects = $doc->getElementsByTagName('object');
    if (!empty($objects)) {
        foreach ($objects as $object) {
            if ($object->hasAttribute('data')) {
                $url = parse_url($object->getAttribute('data'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }

            $params = $object->getElementsByTagName('param');
            if (!empty($params)) {
                foreach ($params as $param) {
                    if ($param->hasAttribute('name') && $param->hasAttribute('value') && 'movie' === $param->getAttribute('name')) {
                        $url = parse_url($param->getAttribute('value'), PHP_URL_HOST);
                        if ($all) {
                            $result[] = $url;
                        } else {
                            return $url;
                        }
                    }
                }
            }
        }
    }

    $embeds = $doc->getElementsByTagName('embed');
    if (!empty($embeds)) {
        foreach ($embeds as $embed) {
            if ($embed->hasAttribute('src')) {
                $url = parse_url($embed->getAttribute('src'), PHP_URL_HOST);
                if ($all) {
                    $result[] = $url;
                } else {
                    return $url;
                }
            }
        }
    }

    return $all ? $result : null;
}

echo '<pre>';
var_dump(getDomainFromEmbed('<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>'));
var_dump(getDomainFromEmbed('<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>'));
echo '</pre>';
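
The same DOM-based idea can be written more compactly with XPath. The following is only a sketch, not part of the original answer: the function name getEmbedHosts is hypothetical, it always returns every host it finds (with duplicates removed), and it uses libxml_use_internal_errors() instead of the @ operator to silence parser warnings.

```php
<?php

/**
 * Hypothetical helper (an assumption, not from the answer above):
 * returns the unique hosts of all embed URLs found in $html.
 */
function getEmbedHosts(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // One query covers the same four places the longer function inspects.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//iframe/@src | //embed/@src | //object/@data | //param[@name="movie"]/@value'
    );

    $hosts = [];
    foreach ($nodes as $attr) {
        $host = parse_url($attr->nodeValue, PHP_URL_HOST);
        if (is_string($host) && $host !== '') {
            $hosts[$host] = true; // array keys remove duplicates
        }
    }

    return array_keys($hosts);
}

print_r(getEmbedHosts('<iframe src="http://www.websites-test.com/video231/"></iframe>'));
```

Because an object tag usually repeats the same URL in its movie param and its nested embed, removing duplicates keeps the result to one entry per host.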
Answered 2013-09-16T08:39:43.137

Try this code:

function getDomain($html) {
    // Capture the src value from the first tag that has a src attribute.
    preg_match('`<[^>]*src=["\'\s]?([^"\'\s]+)["\'\s][^>]*>`i', $html, $matches);
    if (isset($matches[1])) {
        return parse_url($matches[1], PHP_URL_HOST);
    }
    return false;
}

$html = '<iframe src="http://www.websites-test.com/video231/" frameborder=0 width=510 height=400 scrolling=no></iframe>';
echo getDomain($html);

echo '<br />';

$html = '<object width="990" height="750"> <param name="movie" value="http://www.websites-test.com/video231/"></param><param name="AllowScriptAccess" value="always"></param><param name="wmode" value="transparent"></param><embed src="http://www.websites-test.com/video231/" type="application/x-shockwave-flash" wmode="transparent"` AllowScriptAccess="always" width="990" height="750"></embed></object>';
echo getDomain($html);

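Of course, instead of echo getDomain($html) you can assign the result to your variable as needed, e.g. $Domain_Embed = getDomain($html). Here $html is the HTML code containing the tags with the src attribute that you mentioned.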
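
Note that parse_url() returns the full host, so for the sample URLs the value is www.websites-test.com rather than the bare domain. If the leading www. is not wanted, it can be stripped afterwards. A minimal sketch, assuming a hypothetical helper named stripWww (it only removes one leading "www." label and does not try to work out the registrable domain of hosts such as video.example.co.uk):

```php
<?php

/**
 * Hypothetical helper (an assumption for illustration):
 * lower-cases a host and drops one leading "www." label.
 */
function stripWww(string $host): string
{
    $host = strtolower($host);

    return strncmp($host, 'www.', 4) === 0 ? substr($host, 4) : $host;
}

$host = parse_url('http://www.websites-test.com/video231/', PHP_URL_HOST);

// parse_url() gives null or false when no host can be extracted.
$Domain_Embed = is_string($host) ? stripWww($host) : null;

echo $Domain_Embed; // websites-test.com
```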

If the same $html contains multiple embedded objects, you can change the function to return an array of results:

function getDomains($html) {
    $results = array();

    // Capture the src value from every tag that has a src attribute.
    preg_match_all('`<[^>]*src=["\'\s]?([^"\'\s]+)["\'\s][^>]*>`i', $html, $matches);
    if (isset($matches[1]) && is_array($matches[1])) {
        foreach ($matches[1] as $match) {
            $results[] = parse_url($match, PHP_URL_HOST);
        }
    }

    return empty($results) ? false : $results;
}

echo '<pre>' . print_r(getDomains($html), true) . '</pre>';
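
The array returned this way can contain repeated hosts, and it can contain null or false entries when a src value has no host (for example a relative URL). A small sketch of cleaning such a list, assuming a hypothetical helper named uniqueHosts that is not part of the function above:

```php
<?php

/**
 * Hypothetical helper (an assumption for illustration):
 * drops empty or non-string hosts and duplicates, then re-indexes.
 */
function uniqueHosts(array $hosts): array
{
    $valid = array_filter($hosts, function ($host) {
        return is_string($host) && $host !== '';
    });

    return array_values(array_unique($valid));
}

// Sample input shaped like a getDomains() result with a duplicate and two misses.
print_r(uniqueHosts(['www.websites-test.com', null, 'www.websites-test.com', false]));
```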
Answered 2013-09-16T07:58:36.057