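I'm trying to collect the HTML from this site: http://movies.about.com/od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm

I open a socket and try to read and print each line of the HTML page. When I run it, I only get "EOF is false" followed by "1".

I'm not at all sure what is going wrong, since I know this approach worked in another example... Any help is much appreciated!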

import java.net.*;
import java.io.*;
import java.util.*;

public class Twitter {

    static final int DEFAULT_PORT = 80;

    protected DataInputStream reply = null;
    protected PrintStream send = null;
    protected Socket sock = null;

    // ***********************************************************
    // *** The constructors create the socket and set up the input
    // *** and output channels on that socket.

    public Twitter() throws UnknownHostException, IOException {
        this(DEFAULT_PORT);
    }

    public Twitter(int port) throws UnknownHostException, IOException {
        sock = new Socket("movies.about.com", port);
        System.out.println(sock);
        reply = new DataInputStream(sock.getInputStream());
        System.out.println();
        send = new PrintStream(sock.getOutputStream());
    }

    // ***********************************************************
    // *** forecast uses the socket that has already been created
    // *** to carry on a conversation with the Web server that it
    // *** has been contacted through the socket.

    public void forecast() {
        int i;
        String HTMLline;
        boolean eof, gotone;

        // *** This issues the same query that a Web browser would issue
        // *** to the Web server.

        try {
            send.println("GET /od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm HTTP/1.1");
        } catch (Exception e) {
            System.out.println("about.com server is down.");
        }

        // *** This section parses the response from the Web server.
        // *** NOTE THAT "real" EOF does not occur until the Web server
        // *** has closed the connection.

        eof = false;
        gotone = false;
        while (!eof) {
            System.out.println("EOF is false");
            try {
                System.out.println("1");
                HTMLline = reply.readLine();
                System.out.println("2");
                System.out.println(HTMLline);
                System.out.println("Here?");
                if (HTMLline != null) {
                    System.out.println("its not null");
                }
                if (HTMLline == null) {
                    System.out.println("WTFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF");
                } else {
                    eof = true;
                    System.out.println("is it?");
                }
            } catch (Exception e) {
                System.out.println("this exception happend");
                e.printStackTrace();
                eof = true;
            }
        }
    }

    // ***********************************************************
    // *** We need to close the socket when this class is destroyed.

    protected void finalize() throws Throwable {
        sock.close();
    }

    // ***********************************************************
    // *** The main program creates a new Twitter class and
    // *** sends that class the command line args (via findNumber).

    public static void main(String[] args) {
        Twitter aboutCom;
        DataInputStream cin = new DataInputStream(System.in);

        try {
            aboutCom = new Twitter();
            aboutCom.forecast();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

1 Answer


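You haven't sent a valid HTTP request, so the server is still waiting for you to finish it. The GET line must end with \r\n, and it must be followed by a blank line that terminates the request headers.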
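If you do want to stay with a raw socket, a minimal sketch of a correctly terminated request could look like the following. The class name `RawHttpGet` and the helper `buildRequest` are made up for illustration. The `Host` header (which HTTP/1.1 requires) and `Connection: close` (so that `readLine()` eventually returns null) are additions beyond the two fixes described above, and whether the server still serves this page is not something this sketch can guarantee.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class RawHttpGet {

    // Hypothetical helper: builds a complete HTTP/1.1 request.
    // Every line ends in CRLF and an empty line ends the headers.
    public static String buildRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"      // required by HTTP/1.1
             + "Connection: close\r\n"       // ask the server to close when done
             + "\r\n";                       // blank line terminates the headers
    }

    public static void main(String[] args) throws IOException {
        String host = "movies.about.com";
        String path = "/od/actorsalphalist/Actors_Detailed_Movie_News_Interviews_Websites.htm";

        try (Socket sock = new Socket(host, 80)) {
            // Write the request bytes explicitly instead of relying on println(),
            // which uses the platform line separator.
            OutputStream out = sock.getOutputStream();
            out.write(buildRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Charset is an assumption; the status line and headers are ASCII.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            while ((line = in.readLine()) != null) {   // null means the server closed the connection
                System.out.println(line);
            }
        }
    }
}
```

Note that the loop keeps reading until `readLine()` returns null, rather than stopping at the first non-null line as the loop in the question does.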

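However, you should use URL, openConnection(), getInputStream(), and so on for this, rather than redundantly reimplementing HTTP yourself. Doing it by hand, as you are, only gives you opportunities to get it wrong.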
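A rough sketch of that approach, assuming the page is still reachable and is text that can be decoded as UTF-8 (both are assumptions). The class name `UrlFetch` and the method `readAll` are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class UrlFetch {

    // Hypothetical helper: reads everything the URL serves and returns it,
    // with a "\n" appended after each line.
    public static String readAll(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        StringBuilder page = new StringBuilder();
        try (InputStream in = conn.getInputStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {  // charset is an assumption
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("http://movies.about.com/od/actorsalphalist/"
                + "Actors_Detailed_Movie_News_Interviews_Websites.htm").toURL();
        System.out.print(readAll(url));
    }
}
```

Here the library builds the request line and headers and parses the response framing, so there is no request terminator to forget.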

answered 2013-01-21T03:56:35.563