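My teacher asked me to write Java code that reads an HTML file from the school website, cuts out everything on the page that is not needed, keeps only the announcement section in the center of the site, and saves the result as another HTML file.

I can now read the HTML file into Java, but I cannot write the code that edits it (cuts out the unneeded parts) and saves it as an HTML file.

The code I have so far is: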

import java.io.*;
import java.net.*;

public class Html {

    public static void main(String[] args) throws IOException {

        // Open a connection to the school website.
        URL chula = new URL("http://www.ise.eng.chula.ac.th");
        URLConnection yc = chula.openConnection();
        BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream()));

        // Print the page to the console line by line.
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            System.out.println(inputLine);
        }
        in.close();
    }
}
1 Answer

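Replace System.out.println(inputLine); with a write to a file. Create the PrintWriter once, before the while loop:

    PrintWriter output = new PrintWriter("newFile.html");

and then, inside the loop, write each line to it:

    output.println(inputLine);

This creates a new file and writes every line read into inputLine to it. Close the writer after the loop so that the buffered output is actually written to disk.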

I have edited the code you posted, and I think I found the answer you need.

What you have to do is wrap the InputStreamReader in a Scanner. The Scanner reads the input, which in this case is the URL you are using. You then create a new file with the PrintWriter class and change the while loop to this:

    while (in.hasNextLine()) {
        // read one line and write it to the file
    }

The loop goes through the content of the URL line by line and does not stop until it reaches the end. Inside the loop, store each line in a String and write that String to the file. Finally, always make sure to close both the Scanner and the file you are writing to.

Here is the code:

    import java.io.*;
    import java.net.*;
    import java.util.*;

    public class Html {
        public static void main(String[] args) throws IOException {

            URL chula = new URL("http://www.ise.eng.chula.ac.th");
            URLConnection yc = chula.openConnection();

            // A Scanner replaces the BufferedReader from the question.
            Scanner in = new Scanner(new InputStreamReader(yc.getInputStream()));

            // Create the output file once, before the loop.
            PrintWriter output = new PrintWriter("newFile.html");
            while (in.hasNextLine()) {
                String inputLine = in.nextLine();
                output.println(inputLine);
            }

            // Close both the scanner and the file being written.
            in.close();
            output.close();
        }
    }
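
The code above copies the whole page unchanged. For the cutting step the question asks about, one possible approach is to keep only the text between two marker strings that surround the announcement section. The following is a minimal sketch under that assumption: the class name AnnouncementTrimmer, the method extractSection, the output file name announcements.html, and the two markers passed on the command line are all hypothetical, and the real markers have to be found by looking at the page source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;

class AnnouncementTrimmer {

    // Returns the part of html from startMarker through endMarker, inclusive.
    // Both markers are assumptions: pick strings that really surround the
    // announcement section in the page source.
    static String extractSection(String html, String startMarker, String endMarker) {
        int start = html.indexOf(startMarker);
        if (start < 0) {
            throw new IllegalArgumentException("start marker not found: " + startMarker);
        }
        // Search for the end marker only after the start marker.
        int end = html.indexOf(endMarker, start + startMarker.length());
        if (end < 0) {
            throw new IllegalArgumentException("end marker not found: " + endMarker);
        }
        return html.substring(start, end + endMarker.length());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java AnnouncementTrimmer <startMarker> <endMarker>");
            return;
        }

        // Read the whole page into memory, line by line.
        URL chula = new URL("http://www.ise.eng.chula.ac.th");
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(chula.openStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }

        // Keep only the marked section and save it as a new file
        // (the file name is a placeholder).
        String section = extractSection(page.toString(), args[0], args[1]);
        try (PrintWriter output = new PrintWriter("announcements.html")) {
            output.println(section);
        }
    }
}
```

The try-with-resources blocks close the reader and the writer automatically, even if an exception is thrown. Searching for plain strings only works while the page layout stays the same; if the markup is irregular, an HTML parser is likely a more robust choice.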

Hope this helps!

Answered 2013-04-29T12:40:14.893