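I have a WebBrowser control on a form, but for the most part it stays hidden from the user. It is there to handle a series of logins and other tasks. I have to use this control because a lot of JavaScript handles the login. (That is, I can't just switch to a WebClient object.)
After hopping around a bit, we eventually want to download a PDF file. But instead of downloading, the file is displayed inside the WebBrowser control, which the user can't see.
How can I download the PDF instead of having it load in the browser control?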
Add a SaveFileDialog control to the form, then add the following code to the WebBrowser's Navigating event:
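    // Namespaces used: System, System.ComponentModel, System.Net, System.Windows.Forms
    private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
    {
        // The last URL segment is the file name. Compare case-insensitively
        // so that "report.PDF" is caught as well.
        string fileName = e.Url.Segments[e.Url.Segments.Length - 1];
        if (fileName.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
        {
            // Stop the browser control from rendering the PDF itself.
            e.Cancel = true;
            saveFileDialog1.FileName = fileName;
            if (saveFileDialog1.ShowDialog() == DialogResult.OK)
            {
                string filepath = saveFileDialog1.FileName;
                WebClient client = new WebClient();
                client.DownloadFileCompleted += new AsyncCompletedEventHandler(client_DownloadFileCompleted);
                client.DownloadFileAsync(e.Url, filepath);
            }
        }
    }

    // Callback invoked when the asynchronous download finishes.
    void client_DownloadFileCompleted(object sender, AsyncCompletedEventArgs e)
    {
        // e.Error is non-null when the download failed.
        MessageBox.Show(e.Error == null
            ? "File downloaded"
            : "Download failed: " + e.Error.Message);
        ((WebClient)sender).Dispose();
    }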
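One caveat with the handler above: a fresh WebClient does not share the browser control's session, so a PDF that sits behind a login may come back as a login page instead. Below is a minimal sketch of one way around that, assuming the session lives in cookies that are readable through `webBrowser1.Document.Cookie` (HttpOnly cookies are not exposed there). `PdfDownloadHelper`, `LooksLikePdf`, and `CreateClient` are hypothetical names used for illustration only.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of the original answer.
public static class PdfDownloadHelper
{
    // True when the last path segment of the URL ends in ".pdf" (any casing).
    // Uri.Segments covers only the path, so a query string does not interfere.
    public static bool LooksLikePdf(Uri url)
    {
        string lastSegment = url.Segments[url.Segments.Length - 1];
        return lastSegment.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);
    }

    // Creates a WebClient that sends the given cookies with each request.
    // browserCookies is assumed to be the "name=value; name2=value2" string
    // read from WebBrowser.Document.Cookie.
    public static WebClient CreateClient(string browserCookies)
    {
        WebClient client = new WebClient();
        if (!string.IsNullOrEmpty(browserCookies))
        {
            client.Headers.Add(HttpRequestHeader.Cookie, browserCookies);
        }
        return client;
    }
}
```

In the Navigating handler, `new WebClient()` would then become `PdfDownloadHelper.CreateClient(webBrowser1.Document.Cookie)`.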
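The solution I ended up using:
I did everything else that was needed to get to the URL I wanted. Knowing that all of the login information, required settings, view state, and so on were stored in cookies, I was finally able to use a hybrid approach: the WebBrowser control to navigate to the file, then a WebClient object to actually grab the file bytes.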
    public byte[] GetPDF(string keyValue)
    {
        DoLogin();

        // Ask the source to generate the PDF. The PDF doesn't
        // exist on the server until you have visited this page
        // at least ONCE. The PDF exists for five minutes after
        // the visit, so you have to snag it pretty quick.
        LoadUrl(string.Format(
            "https://www.theMagicSource.com/getimage.do?&key={0}&imageoutputformat=PDF",
            keyValue));

        // Now that we're logged in (not shown here), and
        // (hopefully) at the right location, snag the cookies.
        // We can use them to download the PDF directly.
        string cookies = GetCookies();

        try
        {
            // We are fully logged in, and by now, the PDF should
            // be generated. GO GET IT!
            using (WebClient wc = new WebClient())
            {
                wc.Headers.Add(HttpRequestHeader.Cookie, cookies);
                // DownloadData returns the bytes directly, so no temp
                // file has to be created and cleaned up.
                return wc.DownloadData(string.Format(
                    "https://www.theMagicSource.com/document?id={0}_final.PDF",
                    keyValue));
            }
        }
        catch (Exception ex)
        {
            // If we can't get the PDF here, wrap the error so the
            // caller can decide how to handle it.
            throw new WebScrapePDFException(
                "Could not find the specified file.", ex);
        }
    }
    private void LoadUrl(string url)
    {
        InternalBrowser.Navigate(url);

        // Let the browser control do what it needs to do to start
        // processing the page.
        Thread.Sleep(100);

        // If EITHER we can't continue OR
        // the web browser has not been idle for 10 consecutive seconds yet,
        // then wait some more.
        // ...
        // ... Some stuff here to make sure the page is fully loaded and ready.
        // ... Removed to reduce complexity, but you get the idea.
        // ...
    }
    private string GetCookies()
    {
        if (InternalBrowser.InvokeRequired)
        {
            return (string)InternalBrowser.Invoke(new Func<string>(() => GetCookies()));
        }
        else
        {
            return InternalBrowser.Document.Cookie;
        }
    }
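The string returned by `Document.Cookie` has the form `name=value; name2=value2`, and HttpOnly cookies are not included in it, so this hand-off only works when the site's session cookies are visible to script. If the cookies need to go to an `HttpWebRequest` rather than into a raw header, a rough sketch of converting that string into a `CookieContainer` could look like the following. `CookieBridge` and `ToContainer` are hypothetical names, and the parsing assumes simple values without commas or semicolons.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not part of the original answer.
public static class CookieBridge
{
    // Splits a "name=value; name2=value2" string and stores each pair
    // as a cookie scoped to the host of the given site, path "/".
    public static CookieContainer ToContainer(string cookieString, Uri site)
    {
        CookieContainer jar = new CookieContainer();
        if (string.IsNullOrEmpty(cookieString))
        {
            return jar;
        }

        foreach (string part in cookieString.Split(';'))
        {
            string pair = part.Trim();
            int eq = pair.IndexOf('=');
            if (eq <= 0)
            {
                continue; // skip empty or nameless entries
            }
            string name = pair.Substring(0, eq);
            string value = pair.Substring(eq + 1);
            jar.Add(new Cookie(name, value, "/", site.Host));
        }
        return jar;
    }
}
```

The resulting container can then be assigned to `HttpWebRequest.CookieContainer` before the request for the PDF is issued.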
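    // Set by the DocumentCompleted handler once the page has loaded.
    bool documentCompleted = false;

    string getInnerText(string url)
    {
        documentCompleted = false;
        web.Navigate(url);

        // Keep pumping the message loop until DocumentCompleted fires,
        // so the UI thread can process the browser's events while waiting.
        while (!documentCompleted)
            Application.DoEvents();

        return web.Document.Body.InnerText;
    }

    private void web_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        documentCompleted = true;
    }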