
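I have an HTML document that, once parsed, contains only formatted text. Is it possible to get its text the same way I would by selecting it with the mouse, copying it, and pasting it into a new text document?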

I know this is possible with Microsoft.Office.Interop, where I have the .ActiveSelection property to select the contents of the open Word document.

I need to find a way to load the HTML somehow (perhaps into a browser object), then copy all of its content and assign it to a string.

var doc = new HtmlAgilityPack.HtmlDocument();
var documentText = File.ReadAllText("myhtmlfile.html", Encoding.GetEncoding(1251));
documentText = this.PerformSomeChangesOverDocument(documentText);
doc.LoadHtml(documentText);
var stringWriter = new StringWriter();
AgilityPackEntities.AgilityPack.ConvertTo(doc.DocumentNode, stringWriter);
stringWriter.Flush();
var titleNode = doc.DocumentNode.SelectNodes("//title");
if (titleNode != null)
{
    var titleToBeRemoved = titleNode[0].InnerText;
    document.DocumentContent = stringWriter.ToString().Replace(titleToBeRemoved, string.Empty);
}
else
{
    document.DocumentContent = stringWriter.ToString();
}

I then return the document object. The problem is that the string is not always formatted the way I want.


1 Answer


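You should be able to use a StreamReader and, as you read each line, simply write it out with a StreamWriter.

Something like this reads to the end of your file and saves it to a new file. If you need to perform extra logic on the file, I have inserted a comment showing where to do it.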

private void button4_Click(object sender, EventArgs e)
{
    // Both streams sit in using blocks, so they are flushed and closed
    // even if an exception is thrown while copying
    using (System.IO.StreamReader reader = new System.IO.StreamReader("C:\\XXX\\XXX\\XXX\\test.html"))
    using (System.IO.StreamWriter writer = new System.IO.StreamWriter("C:\\XXX\\XXX\\XXX\\test2.html"))
    {
        String line;
        // Read until the end of the file
        while ((line = reader.ReadLine()) != null)
        {
            // You can insert extra logic here if you need to omit lines or change them
            writer.WriteLine(line);
        }
    }
}

You can also save it into a string and then use it however you like. You can use newlines to keep the same formatting.
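As a rough sketch of the string approach, the lines can be collected and joined back together with newlines so the original line structure survives. The class name `HtmlFileText` and both method names are placeholders made up for illustration; they are not part of your project or of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HtmlFileText
{
    // Hypothetical helper: joins the given lines with the platform newline,
    // so the result keeps one line of text per original line.
    public static string JoinLines(IEnumerable<string> lines)
    {
        return string.Join(Environment.NewLine, lines);
    }

    // Hypothetical helper: reads the file at 'path' lazily, line by line,
    // and returns its content as a single string.
    public static string ReadAsString(string path)
    {
        return JoinLines(File.ReadLines(path));
    }
}
```

Calling `HtmlFileText.ReadAsString` with the path to your file gives you the whole content in one string. If the file is not UTF-8, `File.ReadLines` also has an overload that takes an `Encoding`.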

Edit: the following takes your tags into account.

private void button4_Click(object sender, EventArgs e)
{
    String line;
    // StringBuilder avoids building a new string for every appended line
    System.Text.StringBuilder filetext = new System.Text.StringBuilder();
    bool isFirstKeptLine = true;
    using (System.IO.StreamReader reader = new System.IO.StreamReader("C:\\XXXX\\XXXX\\XXXX\\test.html"))
    {
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith("<"))
            {
                // Skip this line, it is formatting markup
                continue;
            }
            // No newline before the first line that is kept
            if (!isFirstKeptLine)
            {
                filetext.Append("\n");
            }
            filetext.Append(line);
            isFirstKeptLine = false;
        }
    }
    // The using block has already closed the reader at this point
    System.Diagnostics.Trace.WriteLine(filetext.ToString());
}
answered 2013-08-27T14:54:01.727