<?php
include('../simple_html_dom.php');

$fname = "http://www.myurl.com";

$html = file_get_html($fname);

$divs = $html->find('h6');
foreach($divs as $element)
{
 $title = $element->find('a', 0)->plaintext;
 echo $title.'<br>';
}
echo '<br>';
?>

I get this error:

"failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error..."

My URL is very long: it is actually 750 characters. If I use wget, it says "File name too long".

How can I fix this? I need it to work with Simple HTML DOM.


3 Answers


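A URL can be 750 characters long without any problem. The most commonly cited practical limit is about 2,000 characters, which comes from old versions of Internet Explorer.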
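To see how a URL measures up against that rule of thumb, here is a minimal sketch. `urlLengthReport()` is a hypothetical helper, and the 2,000 default is only the practical limit mentioned above, not a standard.

```php
<?php
// Sketch: measure a URL and compare it with the ~2,000-character practical
// limit (old IE capped URLs at 2,083 characters). The default limit is an
// assumption based on that rule of thumb, not a standard.
function urlLengthReport(string $url, int $limit = 2000): array
{
    // parse_url() returns null when there is no query string
    $query = (string) parse_url($url, PHP_URL_QUERY);

    return [
        'total'       => strlen($url),
        'query'       => strlen($query),
        'withinLimit' => strlen($url) <= $limit,
    ];
}

// A 750-character URL like the one in the question is well under the limit.
$url = 'http://www.myurl.com/?q=' . str_repeat('a', 726);
$report = urlLengthReport($url);
printf("%d chars total, %d in the query, within limit: %s\n",
    $report['total'], $report['query'], $report['withinLimit'] ? 'yes' : 'no');
// prints "750 chars total, 728 in the query, within limit: yes"
```

So the length alone is not what makes the server answer with a 500.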

You should try to emulate a web browser making the request. See another question on this.
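Without switching to cURL, PHP's `http://` stream wrapper can also send browser-like headers through a stream context. This is a sketch: `browserContext()` is a hypothetical helper and the header values are illustrative assumptions, since nothing here shows which headers the site actually checks. In the simple_html_dom versions I know of, the third parameter of `file_get_html()` is a stream context, but check your copy of the library.

```php
<?php
// Sketch: an HTTP stream context that makes PHP's http:// wrapper send
// browser-like headers. The header values are illustrative assumptions;
// the target site is not known to require any particular ones.
function browserContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'header'     => "Accept: text/html,application/xhtml+xml\r\n"
                          . "Accept-Language: en-US,en;q=0.8\r\n",
        ],
    ]);
}

$ctx = browserContext('Mozilla/5.0 (Windows NT 10.0; Win64; x64)');

// Usage sketch (needs simple_html_dom.php and network access):
// $html = file_get_html($fname, false, $ctx);
// or, without the library:
// $raw = file_get_contents($fname, false, $ctx);
```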

Edit: using cURL in your code:

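<?php

// include is not a function, so don't use parentheses (and prefer require here)
require '../simple_html_dom.php';

$fname = "http://www.myurl.com";

$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';

$ch = curl_init();
// Disabling peer verification is insecure; only keep this for broken HTTPS setups
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// don't want to pollute your output
//curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL, $fname);
$result = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($result === false || $status >= 400) {
    // stop here instead of handing false or an error page to the parser
    die('Request failed: ' . ($result === false ? curl_error($ch) : "HTTP $status"));
}
curl_close($ch);

$html = new simple_html_dom();
$html->load($result);

$divs = $html->find('h6');
foreach($divs as $element)
{
 $link = $element->find('a', 0);
 if ($link === null) continue; // skip headings that contain no link
 $title = $link->plaintext;
 echo $title.'<br>';
}
echo '<br>';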
answered 2013-06-01T13:31:53.213

The URL length is fine. The link is probably broken or expired. I tried the link shown below and the result looks fine:

<?php
include("simple_html_dom.php");

$fname = "http://www.youtubeonfire.com/?genre=0&language=0&next_token=rO0ABXNyACdjb20uYW1hem9uLnNkcy5RdWVyeVByb2Nlc3Nvci5Nb3JlVG9rZW7racXLnINNqwMA%0AC0kAFGluaXRpYWxDb25qdW5jdEluZGV4WgAOaXNQYWdlQm91bmRhcnlKAAxsYXN0RW50aXR5SURa%0AAApscnFFbmFibGVkSQAPcXVlcnlDb21wbGV4aXR5SgATcXVlcnlTdHJpbmdDaGVja3N1bUkACnVu%0AaW9uSW5kZXhaAA11c2VRdWVyeUluZGV4TAANY29uc2lzdGVudExTTnQAEkxqYXZhL2xhbmcvU3Ry%0AaW5nO0wAEmxhc3RBdHRyaWJ1dGVWYWx1ZXEAfgABTAAJc29ydE9yZGVydAAvTGNvbS9hbWF6b24v%0Ac2RzL1F1ZXJ5UHJvY2Vzc29yL1F1ZXJ5JFNvcnRPcmRlcjt4cAAAAAEAAAAAAAABds0AAAAAAQAA%0AAAC71ED7AAAAAAFwdAAQMDAwMDAwMDAwMDAwMjAxM35yAC1jb20uYW1hem9uLnNkcy5RdWVyeVBy%0Ab2Nlc3Nvci5RdWVyeSRTb3J0T3JkZXIAAAAAAAAAABIAAHhyAA5qYXZhLmxhbmcuRW51bQAAAAAA%0AAAAAEgAAeHB0AApERVNDRU5ESU5HeA%3D%3D&sort=2";

$html = file_get_html($fname);

$divs = $html->find("h6");
foreach($divs as $element) {
    $title = $element->find("a", 0)->plaintext;
    echo($title . "<br />");
}
echo("<br />");

Output:

Spider (2013)
500 MPH STORM 2013 HD
Van Diemans Land (Action,Adventure,20...
Good Agent is A Bad Agent (Full HQ En...
Employee of the Month (Full HQ Englis...
The Croods (2013)
GIRLFRIENDS - 2013
Boys Are Pigs-2013
The Patriot -2013
My Daughter&#x27;s Secret -2013
Dead on Arrival [2013]
Flght 2013XViD1
Samsung Galaxy S4 Presentation UNPACK...
Affinity 2013
Golden Globe Awards 2013: Full Show
Parker-2013
Hells&#x27; Kitchen-  New Action Movie 2013
ALIENS [2013]
7 Nights Of Darkness -2013
Hansel And Gretel 2013
The Collection (2012)
Mac And Devin Go To High School 2012
Red Dawn (2012)
Hijacked -2012
Bending The Rules -2012
Inside -2012
VAMPIRELAND-2012
Dead Mine -2012
Devil Seed-2012
Kill Em All -2012
One In The Chamber -2012
The Forger - 2012
Dark Desire -2012
A Common Man -2012 .
The Helpers -2012
Red Dawn- 2012 720p

So fix the problem with your URL and everything will work!
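To tell a broken or expired link apart from a parsing problem, it helps to look at the status code the server actually returns. A minimal sketch using only core PHP: `statusFromHeaders()` is a hypothetical helper that reads the status line PHP stores in `$http_response_header` after an `http://` request. Setting the `ignore_errors` context option keeps a 500 response from aborting the read, so you can also inspect the error page.

```php
<?php
// Hypothetical helper: extract the numeric status code from the header lines
// PHP stores in $http_response_header after an http:// request.
// After redirects there are several status lines; the last one is the final answer.
function statusFromHeaders(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

echo statusFromHeaders(['HTTP/1.1 500 Internal Server Error']), "\n"; // prints 500

// Usage sketch (needs network access; $fname is the URL from the question):
// $ctx  = stream_context_create(['http' => ['ignore_errors' => true]]);
// $body = file_get_contents($fname, false, $ctx);
// $code = statusFromHeaders($http_response_header ?? []);
// if ($code !== 200) {
//     echo "Server answered $code; the link may be broken or expired\n";
// }
```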

answered 2013-06-01T13:50:01.763

You say your URL works in your browser, but everyone here gets a 500 error, just like your script does.

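The site probably checks the token in the URL against your IP address and possibly other request headers. So you need a way to obtain the tokenized URL from within your PHP script.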

To do that, first download the home page from your PHP script, then find the URL of the "next" link on it and use that URL in your script.
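The "find the next link" step could look roughly like this. It is a sketch under assumptions: the link text `Next` and the base URL are guesses about the page, and it swaps in PHP's built-in DOM extension instead of simple_html_dom so it runs on its own (with simple_html_dom you would loop over `$html->find('a')` and read `->href` instead). `findLinkByText()` is a hypothetical helper, and its handling of relative links is deliberately simplified.

```php
<?php
// Sketch: find a link by its visible text (e.g. "Next") and return its href
// as an absolute URL. Uses PHP's DOM extension instead of simple_html_dom.
// The link text and base URL are assumptions; check the real page's markup.
function findLinkByText(string $html, string $baseUrl, string $text = 'Next'): ?string
{
    if (trim($html) === '') {
        return null;                              // loadHTML() rejects empty input
    }
    $doc  = new DOMDocument();
    $prev = libxml_use_internal_errors(true);     // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        if (trim($a->textContent) !== $text) {
            continue;
        }
        $href = $a->getAttribute('href');         // entities such as &amp; are already decoded
        if (preg_match('#^https?://#i', $href)) {
            return $href;                         // already absolute
        }
        $parts = parse_url($baseUrl);
        $root  = $parts['scheme'] . '://' . $parts['host']
               . (isset($parts['port']) ? ':' . $parts['port'] : '');
        if ($href !== '' && $href[0] === '/') {
            return $root . $href;                 // root-relative link
        }
        // Simplification: other relative links are resolved against the
        // base URL as if it were a directory.
        return rtrim($baseUrl, '/') . '/' . $href;
    }
    return null;
}

// Usage sketch (needs network access; fetch both pages from the same script/IP):
// $home    = file_get_contents('http://www.myurl.com');
// $nextUrl = findLinkByText($home, 'http://www.myurl.com');
// if ($nextUrl !== null) { $html = file_get_html($nextUrl); }
```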

answered 2013-06-01T14:03:09.403