c# - Convert html to image with pagination using C#

Question

I'm working on a Windows service in C# 4.0 that converts various file types into images (TIFF and JPEG).

I have a problem when I want to convert an HTML file (usually an e-mail) into an image.

I use a WebBrowser control:

var browser = new WebBrowser();
browser.DocumentCompleted += this.BrowserDocumentCompleted;
browser.DocumentText = html;

and DrawToBitmap:

var browser = (WebBrowser)sender;
// Full size of the rendered document; scaleFactor is defined elsewhere
Rectangle scroll = browser.Document.Body.ScrollRectangle;
int width = scroll.Width * scaleFactor;
int height = scroll.Height * scaleFactor;

// Resize the control so the whole document is laid out before rendering
browser.Width = width;
browser.Height = height;

// targetBounds is the destination area inside the bitmap, so it starts at (0, 0)
Bitmap output = new Bitmap(width, height);
browser.DrawToBitmap(output, new Rectangle(0, 0, width, height));
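Since GDI+ fails once a single bitmap gets very large, one way around it is to never allocate more than one page-sized bitmap at a time. A minimal sketch, assuming a fixed page height that you choose (the `PageSlicer` class and `GetPages` method are hypothetical helpers, not part of WinForms), of splitting the total document height into page slices:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits a tall document into page-sized vertical slices
// so that no single bitmap has to cover the whole height.
public static class PageSlicer
{
    public struct PageSlice
    {
        public int Top;    // offset of the page from the top of the document, in px
        public int Height; // height of this page in px (the last page may be shorter)
    }

    public static List<PageSlice> GetPages(int documentHeight, int pageHeight)
    {
        if (documentHeight < 0) throw new ArgumentOutOfRangeException("documentHeight");
        if (pageHeight <= 0) throw new ArgumentOutOfRangeException("pageHeight");

        var pages = new List<PageSlice>();
        for (int top = 0; top < documentHeight; top += pageHeight)
        {
            pages.Add(new PageSlice
            {
                Top = top,
                Height = Math.Min(pageHeight, documentHeight - top)
            });
        }
        return pages;
    }
}
```

For a 22,000 px document and 3,000 px pages this yields 8 slices, the last one 1,000 px high. Each slice can then be rendered into its own `Bitmap(width, slice.Height)`, so memory use is bounded by one page rather than by the whole e-mail.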

It works fine for small and medium-sized HTML, but with long HTML (around 22,000 px high or more) DrawToBitmap throws GDI+ exceptions:

Parameter is not valid
Not a valid GDI+ image

According to what I found online, this kind of error happens because the image is too large.

My question: how can I convert HTML into X images (pagination) without generating one big image and cropping it afterwards, ideally without using a library?

Thank you in advance.

Edit: I found a hacky solution: wrap the HTML in one div that sets the page size and another that sets the offset, for example:

<div style="height:3000px; overflow:hidden">
<div style="margin-top:-3000px"> ... original HTML ... </div>
</div>

But this approach can cut through a line of text or the middle of an image...
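The wrapper above can be generated once per page. A minimal sketch (the `PageWrapper.Build` name is hypothetical), assuming the e-mail HTML can be nested inside a `div` as a fragment, that produces the markup for page `pageIndex`:

```csharp
using System.Globalization;

// Hypothetical helper: wraps the e-mail HTML so that only one page is visible.
// The outer div clips to one page height; the inner div shifts the content up
// by pageIndex * pageHeight pixels.
public static class PageWrapper
{
    public static string Build(string html, int pageHeight, int pageIndex)
    {
        int offset = pageIndex * pageHeight;
        return string.Format(CultureInfo.InvariantCulture,
            "<div style=\"height:{0}px; overflow:hidden\">" +
            "<div style=\"margin-top:-{1}px\">{2}</div></div>",
            pageHeight, offset, html);
    }
}
```

Each result can be assigned to `browser.DocumentText` and captured with a page-sized bitmap. Because the offset is a plain pixel shift, this does not avoid cutting through a line of text or an image.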

score 1 · Accepted Answer

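You could try creating a custom IE print template and using the DEVICERECT and LAYOUTRECT elements to drive the pagination. Lines then won't be cut in the middle, and you could capture a bitmap of each DEVICERECT as a page. You need to issue the CGID_MSHTML/IDM_SETPRINTTEMPLATE command to the MSHTML document object (webBrowser.Document.DomDocument as IOleCommandTarget) to enable print-template-specific element tags such as DEVICERECT. More information about print templates can be found here.

[Edited] You could even use the IHTMLElementRender::DrawToDC API on the DEVICERECT object to draw its content onto a bitmap DC. For your WebBrowser hosting app to use IHTMLElementRender::DrawToDC, you need to enable the FEATURE_IVIEWOBJECTDRAW_DMLT9_WITH_GDI feature control setting and disable FEATURE_GPU_RENDERING.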
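Feature control settings like these are per-process registry values. A minimal sketch, assuming the usual per-user `Software\Microsoft\Internet Explorer\Main\FeatureControl` key with a DWORD value named after the host executable (the `FeatureControl` class and its methods are hypothetical helpers). For a Windows service, HKEY_CURRENT_USER is the hive of the account the service runs under:

```csharp
using System.Diagnostics;
using System.IO;
using Microsoft.Win32;

// Hypothetical helper for the two feature control settings mentioned above.
public static class FeatureControl
{
    const string Root = @"Software\Microsoft\Internet Explorer\Main\FeatureControl\";

    // Subkey path for a given feature, relative to HKEY_CURRENT_USER.
    public static string KeyPath(string feature)
    {
        return Root + feature;
    }

    // Writes one DWORD per feature, named after the current executable
    // (e.g. "MyService.exe"). Windows-only: relies on the Windows registry.
    public static void Apply()
    {
        string exeName = Path.GetFileName(Process.GetCurrentProcess().MainModule.FileName);
        Set("FEATURE_IVIEWOBJECTDRAW_DMLT9_WITH_GDI", exeName, 1); // 1 = enable
        Set("FEATURE_GPU_RENDERING", exeName, 0);                  // 0 = disable
    }

    static void Set(string feature, string exeName, int value)
    {
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(KeyPath(feature)))
        {
            key.SetValue(exeName, value, RegistryValueKind.DWord);
        }
    }
}
```

It is safest to call `FeatureControl.Apply()` before the first WebBrowser control is created in the process, since MSHTML reads these settings when it initializes.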

score 0 · Accepted Answer

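Thanks for your answer, Noseratio.

I built a solution that uses printing to a virtual printer to get the image files.

Save the HTML to a file and remove any encoding declaration: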

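// using System.IO; using System.Text; using System.Text.RegularExpressions;

// Remove the Content-Type <meta> declaration (tag and attribute may be in any case, quoted or not)
html = Regex.Replace(html, "<meta[^>]*http-equiv=[\"']?Content-Type[\"']?[^>]*>", string.Empty, RegexOptions.IgnoreCase);

// Write the HTML using the system's default ANSI code page (Encoding.Default), without a BOM
File.WriteAllText(filePath, html, Encoding.Default);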

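Run the print without showing the WebBrowser or the print dialog:

const int OLECMDID_PRINT = 6;               // OLECMDID value for the Print command
const int OLECMDEXECOPT_DONTPROMPTUSER = 2; // print without showing the print dialog
const short PRINT_WAITFORCOMPLETION = 2;    // ask ExecWB to wait for the print job before returning

// ActiveXInstance is the underlying IWebBrowser2 COM object; dynamic avoids a COM interop reference
dynamic ie = browser.ActiveXInstance;
ie.ExecWB(OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER, PRINT_WAITFORCOMPLETION);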

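I use PDFCreator for the virtual printing; it saves all the files into a folder. Retrieving all of those files isn't easy (knowing when printing has finished, how many files there are, and when they can be used...), but that's not the purpose of this post!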
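One heuristic for the "when is printing finished" problem is to poll the output folder until no new files appear and no file sizes change for a quiet period. A minimal sketch, assuming PDFCreator is configured to write image files into a known folder (the `OutputWatcher` class, the file pattern and the timings are hypothetical):

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;

// Hypothetical helper: waits until the virtual printer's output folder stops changing.
public static class OutputWatcher
{
    // Polls `folder` until the set of matching files and their sizes stay unchanged
    // for `quietPeriod`, or throws TimeoutException after `timeout`.
    public static string[] WaitForStableFiles(string folder, string pattern,
        TimeSpan quietPeriod, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        string snapshot = null;
        DateTime lastChange = DateTime.UtcNow;

        while (DateTime.UtcNow < deadline)
        {
            string[] files = Directory.GetFiles(folder, pattern);
            Array.Sort(files, StringComparer.Ordinal);
            // Signature of the folder state: file names plus their current sizes
            string current = string.Join("|", files.Select(f => f + ":" + new FileInfo(f).Length));

            if (current != snapshot)
            {
                snapshot = current;
                lastChange = DateTime.UtcNow;
            }
            else if (files.Length > 0 && DateTime.UtcNow - lastChange >= quietPeriod)
            {
                return files;
            }
            Thread.Sleep(100);
        }
        throw new TimeoutException("Print output did not settle in time.");
    }
}
```

This is only a heuristic: if the printer pauses for longer than `quietPeriod` in the middle of a job, the method returns early, so the quiet period should be generous compared with the time between pages.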
