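I wrote some code that uses Jsoup to download an entire page as HTML. The download itself works as expected. My problem is that when I open the downloaded file in a browser, the page content is repeated several times, and I can't tell what is going wrong. Here is the code: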

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import javax.swing.JOptionPane;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class httptest {

    static File file;
    String crawlingNode;
    static BufferedWriter writer = null;
    static httptest ht;

    public httptest() throws IOException {
        file = new File(//***SET HERE YOUR TEST PATH***);
    }

    // Fetches the page and writes its HTML to the open writer.
    private void GetLinks() throws IOException {
        Document doc = Jsoup.connect("http://google.com/search?q=mamamia")
                .userAgent("Mozilla/5.0 (X11; U; Linux x86_64; en-GB; rv:1.8.1.6) Gecko/20070723 Iceweasel/2.0.0.6 (Debian-2.0.0.6-0etch1)")
                .cookie("auth", "token")
                .timeout(3000)
                .get();

        // Selects every element in the document, then joins their HTML.
        Elements links = doc.select("*");
        String crawlingNode = links.html();
        System.out.println(crawlingNode);
        httptest.WriteOnFile(writer, crawlingNode);
    }

    private static void OpenWriter(File file) {
        try {
            writer = new BufferedWriter(new FileWriter(file));
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Failed to open URL Writer");
            e.printStackTrace();
        }
    }

    private static void WriteOnFile(BufferedWriter writer, String crawlingNode) {
        try {
            writer.write(crawlingNode);
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Failed to write URL Node");
            e.printStackTrace();
        }
    }

    private static void CloseWriter(BufferedWriter writer) {
        try {
            writer.close();
        } catch (IOException e) {
            JOptionPane.showMessageDialog(null, "Unable to close URL Writer");
            System.err.println(e);
        }
    }

    public static void main(String[] args) throws IOException {
        ht = new httptest();
        httptest.OpenWriter(file);
        ht.GetLinks();
        httptest.CloseWriter(writer);
    }

}

Some parts of the code may look odd, but keep in mind that this is an SSCCE version. Any ideas that might help? Thanks in advance.

1 Answer

Instead of:

    Elements links = doc.select("*");
    String crawlingNode = links.html();
    System.out.println(crawlingNode);
    httptest.WriteOnFile(writer, crawlingNode);

use:

    Element links = doc.select("*").first();
    String crawlingNode = links.html();
    System.out.println(crawlingNode);
    httptest.WriteOnFile(writer, crawlingNode);

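I think the Elements type is more complex and verbose to work with. The likely cause of the repetition is that select("*") matches every element in the document, and calling html() on an Elements collection joins the inner HTML of each match, so nested content is written once for every ancestor that contains it. I arrived at this change by studying this source: http://jsoup.org/cookbook/extracting-data/attributes-text-html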
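To see the effect in isolation, here is a toy model in plain Java. It does not use jsoup at all: `Node`, `htmlOfAll`, and `count` are hypothetical names made up for this sketch, and the joining behaviour is only an approximation of what happens when the inner HTML of every matched element is concatenated.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an HTML element. This is NOT jsoup's Element class.
final class Node {
    final String tag;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String tag, String text) {
        this.tag = tag;
        this.text = text;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }

    // Inner HTML: own text followed by the outer HTML of each child.
    String innerHtml() {
        StringBuilder sb = new StringBuilder(text);
        for (Node c : children) {
            sb.append(c.outerHtml());
        }
        return sb.toString();
    }

    String outerHtml() {
        return "<" + tag + ">" + innerHtml() + "</" + tag + ">";
    }

    // Collects this node and all descendants, similar in spirit to a "*" selector.
    void collect(List<Node> out) {
        out.add(this);
        for (Node c : children) {
            c.collect(out);
        }
    }
}

final class DuplicationSketch {
    // Joins the inner HTML of every node in the tree (assumed analogue of the original code).
    static String htmlOfAll(Node root) {
        List<Node> all = new ArrayList<>();
        root.collect(all);
        StringBuilder sb = new StringBuilder();
        for (Node n : all) {
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(n.innerHtml());
        }
        return sb.toString();
    }

    // Counts non-overlapping occurrences of sub in s.
    static int count(String s, String sub) {
        int n = 0;
        for (int i = s.indexOf(sub); i >= 0; i = s.indexOf(sub, i + sub.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = new Node("html", "")
                .add(new Node("body", "").add(new Node("p", "hello")));
        // "hello" appears once per enclosing node: html, body, and p.
        System.out.println(count(htmlOfAll(root), "hello"));
        // Serializing only the root contains it once.
        System.out.println(count(root.outerHtml(), "hello"));
    }
}
```

In this three-level tree the paragraph text shows up three times in the joined output and once when only the root is serialized. In jsoup itself, `doc.outerHtml()` is another way to get the whole document as a single string without selecting anything.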

In any case, this solution worked for me.
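Unrelated to the duplication itself, the separate open, write, and close helpers in the question could probably be folded into one method with try-with-resources, so the writer is closed even when the write fails. This is only a sketch using the standard library; `HtmlFileWriter`, `save`, and the temp-file location are hypothetical names chosen for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class HtmlFileWriter {
    // Writes the given HTML to the target file, replacing any existing content.
    // The writer is closed automatically when the try block exits.
    static void save(Path target, String html) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(html);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; substitute your own test path.
        Path target = Files.createTempFile("page", ".html");
        save(target, "<html><body><p>hello</p></body></html>");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

With this shape, the string passed to `save` would be whatever single HTML string you obtained from the parsed document.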

Answered 2013-07-17T10:04:12.217