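I am using jsoup to extract data from an HTML page. When the page has only one iframe, I can extract the data. But if the page has a link that opens another iframe, how can I extract the data from the second iframe and write all of the data to a single XML file? Please help.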

1 Answer

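One approach is to parse the parent page's iframe tags and extract each "src" attribute. The "src" values can then be used to download and parse each iframe's content and, if really necessary, to combine the results.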

    // Imports for the snippet; the statements below belong inside a method that throws IOException.
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    String url = "http://example.com/";
    // Pass the variable, not the string literal "url".
    Document document = Jsoup.connect(url).get();

    // Extract the iframe sources.
    List<String> iframeSrc = new ArrayList<>();
    for (Element e : document.select("iframe")) {
        // Make sure the correct URL is built here: absUrl resolves a
        // relative src against the parent page's base URI.
        String src = e.absUrl("src");
        if (!src.isEmpty()) {
            iframeSrc.add(src);
        }
    }

    // Download and parse the content of each iframe.
    List<Document> iframeDocs = new ArrayList<>();
    for (String src : iframeSrc) {
        iframeDocs.add(Jsoup.connect(src).get());
    }

    /* You now have the parent page and each iframe "child" as a Document.
       I have no experience combining Documents; if nothing else works,
       you could try document.toString(). */
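
An iframe's `src` is often relative, so it has to be resolved against the parent page's URL before it can be fetched. jsoup's `absUrl("src")` is meant to do this for you; the sketch below only illustrates the underlying resolution with the standard library. The class name `IframeUrls`, its method names, and the example URLs are hypothetical, and it assumes every value is a syntactically valid URI.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class IframeUrls {
    /**
     * Resolves an iframe src value against the URL of the page containing it.
     * Absolute values are returned unchanged. URI.create throws
     * IllegalArgumentException if either string is not a valid URI.
     */
    static String resolve(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src.trim()).toString();
    }

    /** Resolves every non-blank src value, keeping the original order. */
    static List<String> resolveAll(String pageUrl, List<String> srcValues) {
        List<String> resolved = new ArrayList<>();
        for (String src : srcValues) {
            if (src != null && !src.trim().isEmpty()) {
                resolved.add(resolve(pageUrl, src));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        String page = "http://example.com/news/index.html"; // assumed parent page
        List<String> srcValues = List.of("frame.html", "/ads/banner.html", "");
        for (String u : resolveAll(page, srcValues)) {
            System.out.println(u);
        }
    }
}
```

With the assumed page URL, `frame.html` resolves relative to the `/news/` directory and `/ads/banner.html` resolves relative to the site root; the empty value is skipped.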

To write a document to a file, I use the following code:

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;

    import org.jsoup.nodes.Document;

    public class Write2File {
        // Writes the document's markup to saveLocation, overwriting any existing file.
        public static void saveFile(Document xmlContent, String saveLocation) throws IOException {
            // try-with-resources closes the writer even if write() throws.
            try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(saveLocation))) {
                bufferedWriter.write(xmlContent.toString());
            }
            System.out.println("File writing completed.");
        }
    }
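
The question also asks for all of the data in one XML file. One possible way to do that, sketched here with the JDK's built-in XML APIs rather than jsoup, is to collect the text extracted from each page (for example via jsoup's `text()`) in a map keyed by source URL and serialize it under a single root element. The class name `CombinedXml`, the element names `pages` and `page`, and the sample values are all hypothetical; this is not something the answer above prescribes.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class CombinedXml {
    /**
     * Builds one XML string with a <page> element per map entry.
     * Key = source URL of the page or iframe, value = text extracted from it.
     */
    static String toXml(Map<String, String> textBySource) {
        try {
            Document xml = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = xml.createElement("pages"); // hypothetical root name
            xml.appendChild(root);
            for (Map.Entry<String, String> entry : textBySource.entrySet()) {
                Element page = xml.createElement("page");
                page.setAttribute("src", entry.getKey());
                // Characters such as < and & are escaped when serialized.
                page.setTextContent(entry.getValue());
                root.appendChild(page);
            }
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(xml), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the combined XML", e);
        }
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps the pages in insertion order.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("http://example.com/first.html", "Text from the first iframe");
        pages.put("http://example.com/second.html", "Prices < 10 & rising");
        System.out.println(toXml(pages));
    }
}
```

The returned string can then be written to disk with a writer like the one above. Note that `org.w3c.dom.Document` and `org.jsoup.nodes.Document` share a simple name, so a class that uses both needs to fully qualify one of them.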

answered 2012-06-10T00:28:19.687