2

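I have tried saving a .pdf file using different approaches I found on Stack Overflow, including FileUtils IO, but the file always ends up corrupted. When I open the corrupted file in Notepad, I get the following: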

<HEAD>

    <TITLE>
        09010b129fasdf558a-
    </TITLE>

</HEAD>


<HTML>

<SCRIPT language="javascript" src="./js/windowClose.js"></SCRIPT>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">
<LINK href="./theme/additions.css" rel="stylesheet" type="text/css">

<BODY leftmargin="0" topmargin="0">

<TABLE cellpadding="0" cellspacing="0" width="100%">
    <TR>
        <TD class="mainSectionHeader">
            <A href="javascript:windowClose()" class="allLinks">
                CLOSE
            </A>
        </TD>

    </TR>

</TABLE>

                <script language='javaScript'>
                    alert('Session timed out. Please login again.\n');
                    window.close();
                </script>



</BODY>

</HTML>

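Later, I tried saving the .pdf file from the browser using the answer provided by @BalusC. That solution was very helpful: I was able to get rid of the session problems. However, it still produces a corrupted .pdf. When I open it in Notepad, the content is completely different, and the session timeout message is gone: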

<HTML>

    <HEAD>

        <TITLE>
            Evidence System
        </TITLE>

    </HEAD>

<LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE cellpadding="0" cellspacing="0" class="tableWidth760" align="center">
    <TR>
        <TD class="headerTextCtr">
            Evidence System
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">



<HTML>
<HEAD>
<link href="./theme/default.css" rel="stylesheet" type="text/css">
<script language="JavaScript">

function trim(str)
{
    var trmd_str

    if(str != "")
    {
        trmd_str = str.replace(/\s*/, "")
        if (trmd_str != ""){

            trmd_str = trmd_str.replace(/\s*$/, "")
        }

    }else{
        trmd_str = str
    }
    return trmd_str
}  

function validate(frm){
    //check for User name 
    var msg="";
    if(trim(frm.userName.value)==""){
        msg += "Please enter your user id.\n";
        frm.userName.focus();
    }

    if(trim(frm.password.value)==""){
        msg += "Please enter your password.\n";
        frm.userName.focus();
    }

    if (trim(msg)==""){
        frm.submit();
    }else{
        alert(msg);
    }
}

function numCheck(event,frm){
    if( event.keyCode == 13){
            validate(frm);  
    }
}

</script>
</HEAD>

<BODY onLoad="document.frmLogin.userName.focus();">

<FORM name='frmLogin' method='post' action='./ServletVerify'>
    <TABLE width="100%" cellspacing="20">
        <tr>
            <td class="mainTextRt">
                Username
                <input type="text" name="userName" maxlength="32" tabindex="1" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
            <td class="mainTextLt">
                Password
                <input type="password" name="password" maxlength="32" tabindex="2" value="" 
                onKeyPress="numCheck(event,this.form)" class="formTextField120">
            </TD>
        </TR>

        <tr>                    
            <td colspan="2" class="mainTextCtr" style="color:red">
                Unknown Error
            </td>
        </tr>

        <tr>
            <td colspan="2" class="mainTextCtr">
                <input type="button" tabindex="3" value="Submit" onclick="validate(this.form)" >
            </TD>
        </TR>
    </TABLE>

    <INPUT TYPE="hidden" NAME="actionFlag" VALUE="inbox">
</FORM>

</BODY>
</HTML>

        </TD>
    </TR>
    <TR>
        <TD height="2"></TD>
    </TR>
    <TR>
        <TD colspan="2">
            <HR size="1" noshade>
        </TD>
    </TR>
    <TR>
        <TD colspan="2">
            <LINK href="./theme/default.css" rel="stylesheet" type="text/css">

<TABLE width="80%" align="center" cellspacing="0" cellpadding="0">
    <TR>
        <TD class="footerSubtext">
            Evidence Management System
        </TD>
    </TR>

    <!-- For development builds, change the date accordingly when sending EAR files out to Wal-Mart -->
    <TR>
        <TD class="footerSubtext">
            Build:&nbsp;&nbsp;v3.1
        </TD>
    </TR>

</TABLE>
        </TD>
    </TR>
</TABLE>

</HTML>

What other options do I have?

PS: When I save the file manually using CTRL+Shift+S, the file is saved fine.

4 Answers

3

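From the erroneous response, which appears to be just an HTML error page:

alert('Session timed out. Please login again.\n');

So it appears that the PDF file has to be downloaded within a valid HTTP session. The HTTP session is backed by a cookie. The HTTP session, in turn, usually holds server-side information about the currently active and/or logged-in user.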
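
A quick way to tell such an HTML error page apart from a real PDF is to look at the first bytes of the download, since a well-formed PDF starts with the ASCII marker `%PDF-`. The following is only a minimal sketch; the class and method names (`PdfSniffer`, `looksLikePdf`) are hypothetical and not part of any library mentioned here.

```java
import java.nio.charset.StandardCharsets;

final class PdfSniffer {
    // A well-formed PDF file starts with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the given leading bytes start with the PDF marker. */
    static boolean looksLikePdf(byte[] head) {
        if (head == null || head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If the check fails on the saved file, the server most likely returned an error or login page instead of the document, as in the two dumps from the question.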

Selenium WebDriver manages cookies all by itself, fully transparently. You can obtain them programmatically as follows:

Set<Cookie> cookies = driver.manage().getCookies();

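When manually fiddling with java.net.URL outside Selenium's control, you should make sure that your own URL connection uses the same cookies (and thus also maintains the same HTTP session). You can set cookies on a URL connection as follows: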

URLConnection connection = new URL(driver.getCurrentUrl()).openConnection();

for (Cookie cookie : driver.manage().getCookies()) {
    String cookieHeader = cookie.getName() + "=" + cookie.getValue();
    connection.addRequestProperty("Cookie", cookieHeader);
}

InputStream input = connection.getInputStream(); // Write this to file.
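
As a variation on the loop above, the cookies can also be sent as a single `Cookie` header with the pairs separated by `"; "`, which is the form browsers use. This is a hedged sketch only: `CookieHeader` and `build` are hypothetical names, and a plain `Map` of cookie names to values stands in for the Selenium `Cookie` objects so the example has no external dependencies.

```java
import java.util.Map;
import java.util.StringJoiner;

final class CookieHeader {
    /** Joins name/value pairs into one Cookie header value, e.g. "a=1; b=2". */
    static String build(Map<String, String> cookies) {
        StringJoiner joiner = new StringJoiner("; ");
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }
}
```

With Selenium, the map would be filled from `cookie.getName()` and `cookie.getValue()` for each cookie in `driver.manage().getCookies()`, and the result passed to `connection.setRequestProperty("Cookie", ...)`.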
answered 2013-09-30T15:43:30.150
3

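A PDF is considered a binary file, and it gets corrupted because of the way copyUrlToFile() works. By the way, this looks like a duplicate of "JAVA - Download Binary File (eg PDF) file from Webserver".

Try this custom binary download method: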

import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

public void downloadBinaryFile(String path) throws IOException {
    URL u = new URL(path);
    URLConnection uc = u.openConnection();
    String contentType = uc.getContentType();
    int contentLength = uc.getContentLength();
    // Reject text responses (e.g. an HTML error page) and responses of unknown length.
    if ((contentType != null && contentType.startsWith("text/")) || contentLength == -1) {
        throw new IOException("This is not a binary file.");
    }

    byte[] data = new byte[contentLength];
    int offset = 0;
    // try-with-resources closes the stream even if reading fails.
    try (InputStream in = new BufferedInputStream(uc.getInputStream())) {
        while (offset < contentLength) {
            int bytesRead = in.read(data, offset, data.length - offset);
            if (bytesRead == -1) {
                break;
            }
            offset += bytesRead;
        }
    }

    if (offset != contentLength) {
        throw new IOException("Only read " + offset + " bytes; Expected " + contentLength + " bytes");
    }

    // Use the last path segment of the URL as the local file name.
    String file = u.getFile();
    String filename = file.substring(file.lastIndexOf('/') + 1);
    try (FileOutputStream out = new FileOutputStream(filename)) {
        out.write(data);
    }
}
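
The method above depends on the server reporting a content length. As an alternative sketch (a different technique, not what this answer originally proposed), the response stream can be copied straight to disk with `java.nio.file.Files.copy`, which needs no length up front and never interprets the bytes as text. `StreamSaver` and `save` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamSaver {
    /**
     * Copies all bytes from the stream to the target file, replacing any
     * existing file, and returns the number of bytes written.
     */
    static long save(InputStream in, Path target) throws IOException {
        // try-with-resources closes the input stream when the copy is done.
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

It could be called with `uc.getInputStream()` from the method above and a target path of your choice.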

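Edit: it actually sounds like you are not on the page you think you are on. Instead of calling driver.getCurrentUrl(), have your script take the URL from the link to the PDF. Assuming there is a link such as <a href='http://mysite.com/my.pdf' />, instead of clicking it and then getting the URL, just grab the href from that link and download it.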

String pdfPath = driver.findElement(By.id("someId")).getAttribute("href");
downloadBinaryFile(pdfPath);
answered 2013-09-27T21:10:22.803
2

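The server may be compressing the PDF. You can use the following code, borrowed from this answer, to detect and decompress the response from the server: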

InputStream is = new URL(driver.getCurrentUrl()).openStream(); // getCurrentUrl() returns a String
try {
   InputStream decoded = decompressStream(is);
   FileOutputStream output = new FileOutputStream(
       new File("C:\\Users\\myDocs\\myfolder\\myFile.pdf"));
   try {
       IOUtils.copy(decoded, output);
   }
   finally {
       output.close();
   }
} finally {
   is.close();
}

public static InputStream decompressStream(InputStream input) throws IOException {
    // A pushback stream lets us peek at the first two bytes and then put them back.
    PushbackInputStream pb = new PushbackInputStream(input, 2);
    byte[] signature = new byte[2];
    int len = pb.read(signature); // read the signature
    if (len > 0) {
        pb.unread(signature, 0, len); // push back only the bytes actually read
    }
    // Check whether the stream starts with the standard gzip magic number (0x1f 0x8b).
    if (len == 2 && signature[0] == (byte) 0x1f && signature[1] == (byte) 0x8b) {
        return new GZIPInputStream(pb);
    } else {
        return pb;
    }
}
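
Besides sniffing the magic number, compression can also be detected from the response's `Content-Encoding` header, which `URLConnection.getContentEncoding()` exposes. The sketch below shows that alternative under the assumption that the server sets the header correctly; `ContentEncodingSketch` and `decode` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class ContentEncodingSketch {
    /**
     * Wraps the raw response stream in a GZIPInputStream when the
     * Content-Encoding value says "gzip"; otherwise returns it unchanged.
     */
    static InputStream decode(InputStream raw, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }
}
```

A call would look like `decode(connection.getInputStream(), connection.getContentEncoding())`, where `connection` is the `URLConnection` used for the download.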
answered 2013-09-30T04:31:48.473
1

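When I try to save the file manually using CTRL+Shift+S, the file saves fine.

While I would advocate downloading the file using Java, there is a less recommended workaround: pressing Ctrl+Shift+S programmatically with the Robot class.

Relying on a workaround is bad, but as far as I know it works reliably in the browsers and operating systems I have tried. This code should not make it into any serious application, but it is acceptable for testing if you cannot solve your problem the right way.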

Robot robot = new Robot();

Press Ctrl+Shift+S:

robot.keyPress(KeyEvent.VK_CONTROL);
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_S);
robot.keyRelease(KeyEvent.VK_SHIFT);
robot.keyRelease(KeyEvent.VK_CONTROL);

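In the browsers and operating systems I know, you should now be in the Save file dialog with the focus on the file name input. You can type in your absolute path: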

robot.keyPress(KeyEvent.VK_C);        // C
robot.keyRelease(KeyEvent.VK_C);
robot.keyPress(KeyEvent.VK_COLON);    // : (colon)
robot.keyRelease(KeyEvent.VK_COLON);
robot.keyPress(KeyEvent.VK_SLASH);    // / (slash)
robot.keyRelease(KeyEvent.VK_SLASH);
// etc. for the whole file path

robot.keyPress(KeyEvent.VK_ENTER);    // confirm by pressing Enter in the end
robot.keyRelease(KeyEvent.VK_ENTER);

To get the key codes, you can use KeyEvent#getExtendedKeyCodeForChar() (Java 7+ only), or see "How do I get a Robot to type a `:`?" and "Convert String to KeyEvents".
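
As a rough illustration of the first option, the path can be turned into key codes in a loop instead of writing one keyPress/keyRelease pair per character by hand. This is a sketch under assumptions: `KeyCodes` and `forText` are hypothetical names, and it only looks up key codes without handling modifiers.

```java
import java.awt.event.KeyEvent;

final class KeyCodes {
    /** Maps each character of the text to an extended key code (Java 7+). */
    static int[] forText(String text) {
        int[] codes = new int[text.length()];
        for (int i = 0; i < text.length(); i++) {
            int code = KeyEvent.getExtendedKeyCodeForChar(text.charAt(i));
            if (code == KeyEvent.VK_UNDEFINED) {
                throw new IllegalArgumentException("No key code for: " + text.charAt(i));
            }
            codes[i] = code;
        }
        return codes;
    }
}
```

Each returned code would then be passed to `robot.keyPress(code)` followed by `robot.keyRelease(code)`. Characters that need Shift on your keyboard layout (uppercase letters, and `:` on many layouts) still require holding `KeyEvent.VK_SHIFT` around the key press, which this sketch does not do.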

answered 2013-09-30T15:30:56.027