0

我试图通过 PHP 从 ieeexplore 下载 pdf 文件,但似乎效果不佳。假设 URL 是http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5534992。我编写了以下 PHP 代码:

function get_web_page($url) {
    $ch  = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/cookie.txt');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);        
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $page = curl_exec($ch);
    curl_close($ch);
    return $page;
}

但是这段代码失败了,没有下载任何东西。我在以下内容中检查了收到的 http 标头:

HTTP/1.0 200
Connection established
HTTP/1.1 302 Moved Temporarily
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 0
Content-type: text/html
Set-Cookie: ERIGHTS=na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D;path=/;domain=.ieee.org
Location: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5534992&tag=1
Set-Cookie: WLSESSION=874668684.20480.0000; expires=Tue, 10-Jul-2012 22:11:48 GMT; path=/

HTTP/1.1 200 OK
Server: Sun-ONE-Web-Server/6.1
Date: Mon, 09 Jul 2012 22:11:50 GMT
Content-length: 203
Content-type: text/html; charset=UTF-8
Cache-Control: private
Product: 254
Inst: 9690
Licenseowner: 9690
Member: 0
Cache-Control: no-cache
Pragma: No-cache
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Set-Cookie: xploreCookies={"standardsLicenseId":"0","openUrl":"http://linkserv.lib.utk.edu:9003/sfx","enterpriseLicenseId":"0","isIp":"true","desktopReportingUrl":"null","openUrlImgLoc":"http://www.lib.utk.edu/eresources/sfx2.gif","products":"IEL|VDE|","contactName":"NA","isChargebackUser":"false","contactEmail":"NA","oldSessionKey":"na2vLnqZwz9xxRfO2zN8Ny66f0vHi85YE*ynGx2BtGx2FmIHkiEyx2Bg89Db6Qx3Dx3D-18x2dHeJj2k3B7UHsoix2BefrHXeAx3Dx3Dusln2oQUqj3KXiQXjOYx2BMwx3Dx3D-UQmTydx2FMwnGJOyKUw5iVDAx3Dx3D-eV0zE6ztXYKrVZluJrMMbAx3Dx3D","userIds":"9690","instImage":"","isInst":"true","isDelegatedAdmin":"false","isMember":"false","instName":"UNIVERSITY OF TENNESSEE","customerSurvey":"NA","smallBusinessLicenseId":"0","openUrlTxt":"NA"}; domain=.ieee.org; path=/
Set-Cookie: JSESSIONID=V6pLP7XH4nvtQYcvmVc1ry1Y51vDHhkG8SGn9y0LG8XJv3k3hmJs!-1711984930; path=/; HttpOnly
X-Powered-By: Servlet/2.5 JSP/2.1

An error has occurred while trying to load your document. Please try again. If you continue to experience issues, please contact Customer Service.2016

However, if you paste the URL in the web browser, you may access the PDF file directly.

由于我在我大学的域中,因此我无需担心此 PDF 文件的访问许可。

有人有想法吗?

谢谢~

4

1 回答 1

1

在此处测试此代码过去的代理 ip(如果您不填写,则不会使用代理)值

        </tr>
        <tr>
            <td> For connceting to Database paste Its Url</td>
            <td><input type="text" name="ssurl" value="http://www.sciencedirect.com/science/article/pii/S0301421504000928" /></td> /></td>

        </tr>
        <tr>
            <td>Proxy Ip</td>
            <td><input type="text" name="ssproxyip" value="202.202.0.163"/></td>

        </tr>
        <tr>
            <td>Proxyport</td>
            <td><input type="text" name="ssproxyport" value="3128"/></td>

        </tr>
        <tr>
            <td>Proxy Username & password   (username:password)</td>
            <td><input type="text" name="ssproxyusernamepassword"/></td>

        </tr>
    </table>
    <input type="submit" name="ssurlsubmit" value="submit" />
    </form>



</body>
</html>

<?php

/**
 * @author nnnnn
 * @copyright 2012
 */
//removes string from the end of other
if (isset($_POST['ssurlsubmit'])) {
function removeFromEnd($string, $stringToRemove) {
     $stringToRemoveLen = strlen($stringToRemove);
     $stringLen = strlen($string);

     $pos = $stringLen - $stringToRemoveLen;    

     $out = substr($string, 0, $pos);

     return $out;
 }

//$string = 'picture.jpg.jpg';
//$string = removeFromEnd($string, '.jpg');
//$url='http://127.0.0.1/leech/';
//$url='http://www.sciencedirect.com/science/article/pii/S0301421504000928';


global $ssurl,$file_n;
$url=$_POST['ssurl'];
$file_n=$_POST['file_name'];
echo "URL:".$url.'<br />';


//$url = 'http://pdn.sciencedirect.com/science?_ob=MiamiImageURL&_cid=271097&_user=2501846&_pii=S0301421504000928&_check=y&_origin=article&_zone=toolbar&_coverDate=2005--31&view=c&originContentFamily=serial&wchp=dGLbVlB-zSkWb&md5=f51bc09e08b4d3eafb759ef5c08724c4&pid=1-s2.0-S0301421504000928-main.pdf';

 if (isset($_POST['ssproxyip'])) {
$proxyip=$_POST['ssproxyip'];
$proxyoprt=$_POST['ssproxyport'];


}
if (isset($_POST['proxyuserpassword'])) {
$proxyuserpassword=$_POST['ssproxyuserpassword'];

}

$mypath = getcwd();
            $mypath = preg_replace('/\\\\/', '/', $mypath);
            $rand = rand(1, 15000);

            if (!file_exists("$mypath/cookies") and !is_dir("$mypath/cookies")) {
mkdir("$mypath/cookies");
} 

            $cookie_file_path = "$mypath/cookies/cookie$rand.txt";
            echo $cookie_file_path.'<br />';
            echo 'cookie2:  '.$cookie_file_path.'<br/>';
if (! file_exists($cookie_file_path) || ! is_writable($cookie_file_path))
{

 //$fp1 = fopen($cookie_file_path, "w");
  //fclose($fp1);
  }
  if (! file_exists($cookie_file_path) || ! is_writable($cookie_file_path))
  {
    echo 'Cookie file missing or not writable.';
    //exit;
}
if ( ! extension_loaded('curl'))
{
    echo "You need to load/activate the curl extension.";
}


 $ss=substr($url,-4);
 $string = removeFromEnd($url, '.pdf');
 echo "ss:  ".$ss.'<br />'; 
 //$url = 'http://www.sciencedirect.com/science/jrnlallbooks/a/fulltext';
//$proxy = '200.93.148.72:3128';
$ext = substr($fileName, strrpos($fileName, '.') + 1);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
//curl_setopt($ch, CURLOPT_PROXY, "203.64.181.50"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 
//curl_setopt($ch, CURLOPT_PROXYUSERPWD, "bjm:12345"); //username:pass 
//curl_setopt($ch, CURLOPT_PROXY, "202.202.0.163"); //your proxy url
//curl_setopt($ch, CURLOPT_PROXYPORT, "3128"); // your proxy port number 

if (isset($_POST['ssproxyip'])) {
curl_setopt($ch, CURLOPT_PROXY, $proxyip); //your proxy url
curl_setopt($ch, CURLOPT_PROXYPORT, $proxyport); // your proxy port number 

}
if (isset($_POST['proxyuserpassword'])) {
curl_setopt($ch, CURLOPT_PROXYUSERPWD,$proxyuserpassword); //username:pass 
}
curl_setopt($ch, CURLOPT_TIMEOUT, 0);
//curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);


curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);



curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; Media Center PC 4.0)');
//curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
//curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//curl_setopt($curl, CURLOPT_VERBOSE, 1);
/*
*/
//echo $string;
echo substr($url,-4).'<br />';
//echo $url;   
$proxy = $proxyip.':'.$proxyoprt;
echo 'proxyip:   '.$proxyip.'<br />';
echo 'proxy:  '.$proxy.'<br />';
$timeout = 5;
$splited = explode(':',$proxy); // Separate IP and port
echo 'splited:   '.$splited.'<br />';
echo $splited[0].'<br />';
echo $splited[1].'<br />';
/*
//if($con = @fsockopen($splited[0], $splited[1], $errorNumber, $errorMessage, $timeout)) 
if($con = @fsockopen($proxyip, $proxyoprt, $errorNumber, $errorMessage, $timeout)) 
{
    echo 'Connection successful, PROXY works!'.'<br />';
} else {
 echo 'Connection FAILED, PROXY FAIL!'.'<br />';
    echo $errorNumber .'<br />';
    echo ' ' . $errorMessage.'<br />';
}
*/
echo "if not run".'<br />';
echo $ss.'<br />';
if ($ss=='.zip' || $ss=='r.gz' ||  $ss=='.pdf') 
 { 
 echo "if runed".'<br />';
 ini_set('max_allowed_packet', '164M');
 ini_set('mysql.wait_timeout', 600);


 ini_set('max_execution_time', '200');

 ini_set('mysql.reconnect', 'On');
  ini_set('mysql.connect_timeout', 300);
ini_set('default_socket_timeout', 300);

 $file = basename($url);    
//$file_extection=extension($url);

echo "file:  ".$file.'<br />';
echo "url:   ".$url.'<br />' ;
$dir= $file;

//    $url  = 'http://www.example.com/a-large-file.zip';
    //$path = $_SERVER['DOCUMENT_ROOT'] . '/downloads/'.$file;
    $path = $_SERVER[$url] . $file;
     if (isset($file_n )&& strlen($file_n)>0) {
    $path=$file_n;
    }
    echo "path:   ".$path.'<br />' ;
        $fp = fopen($path, 'w');
//$fp = fopen(basename($url).'zip', 'w+');
/**
* Ask cURL to write the contents to a file
*/
curl_setopt($ch, CURLOPT_FILE, $fp);
$curl_scraped_page = curl_exec($ch);
$file = 'file.pdf';
$fileName = 'fileName.pdf';
file_put_contents($file, $curl_scraped_page);
file_put_contents($path, $curl_scraped_page);

fclose($fp);

header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');

readfile($file);

echo "File DONE".'<br />';
}else {
    echo "curl_scraped_page   ".$curl_scraped_page.'<br />' ;


$curl_scraped_page = curl_exec($ch);
$file = 'file.pdf';
$fileName = 'fileName.pdf';
file_put_contents($file, $curl_scraped_page);
file_put_contents($path, $curl_scraped_page);


header('Content-type: application/pdf');
header('Content-Disposition: inline; filename="' . $filename . '"');
header('Content-Transfer-Encoding: binary');
header('Content-Length: ' . filesize($file));
header('Accept-Ranges: bytes');
}
/*
$file = $url; // URL to the file

$contents = file_get_contents($file); // read the remote file

touch('somelocal.pdf'); // create a local EMPTY copy

file_put_contents('somelocal.pdf', $contents); // put the fetchted data into the newly created file
*/
curl_close($ch);
echo 'curl_close'.'<br />';
echo $curl_scraped_page.'<br />';
}

?>

你可以在这个链接上测试它:http: //apr9ss.tk/curl-proxy-down-work2.php

于 2014-01-05T06:07:23.993 回答