
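I need some help with a problem. I'm trying to load my list of 2000 proxies from a text file, but my class only fills 1040 array indexes with what it reads from each line.

I don't know what to do. :(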

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ProxyLoader {

private String[] lineSplit = new String[100000];
private static String[] addresses = new String[100000];
private static int[] ports = new int[100000];
public int i = 0;

public ProxyLoader() {
    readData();
}

public synchronized String getAddr(int i) {
    return this.addresses[i];
}

public synchronized int getPort(int i) {
    return this.ports[i];
}

public synchronized void readData() {
    try {
        BufferedReader br = new BufferedReader(
                new FileReader("./proxy.txt"));
        String line = "";

        try {
            while ((line = br.readLine()) != null) {

                lineSplit = line.split(":");
                i++;

                addresses[i] = lineSplit[0];
                ports[i] = Integer.parseInt(lineSplit[1]);
                System.out.println("Line Number [" + i + "]  Adr: "
                        + addresses[i] + " Port: " + ports[i]);
            }

            for (String s : addresses) {
                if (s == null) {
                    s = "127.0.0.1";
                }
            }

            for (int x : ports) {
                if (x == 0) {
                    x = 8080;
                }
            }

        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}

}
2 Answers

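Let's start by tidying up your code; there are a number of issues that could be causing you trouble. However, without the relevant part of your proxy file we cannot test or reproduce the behavior you are seeing. Consider creating and posting an SSCCE rather than just a code snippet.

  1. Indent and format your code properly.
  2. These methods do not need to be (and should not be) synchronized. Reading from the arrays is safe in a multi-threaded environment, and you should never be constructing multiple ProxyLoader instances on different threads, which means the synchronized on readData() is simply wasted.
  3. Creating enormous arrays is a very poor way to store this data. Allocating that much extra memory is wasteful, and if the file being loaded happens to be larger than the constant size you set, your code will fail. Use a growable data structure such as an ArrayList or a Map.
  4. You are storing addresses and ports in separate arrays. Using a single object to hold both values will save memory and prevent the two from getting out of sync.
  5. Your public int i variable is dangerous. Presumably you are using it to represent the number of lines loaded, but a field like this should be avoided in favor of a size() method: as a public instance variable, anyone using the class can change its value. Also, i is a poor name for the variable; max would be a better choice.
  6. You probably don't want readData() to be public, since calling it more than once does very strange things (it loads the file again, starting from i, and fills your arrays with duplicate data). The best idea is to load the data directly in the constructor (or in a private method the constructor calls), so that the file is loaded exactly once for each ProxyLoader instance created.
  7. You create a huge empty array, lineSplit, and then replace it with the result of String.split(). This is confusing and wasteful; use a local variable to hold the split line.
  8. You don't close the file after reading it, which can cause resource leaks or other inconsistencies. Using the try-with-resources syntax helps simplify this.
  9. After populating the arrays you loop over the whole of the address and port arrays, apparently intending to fill the remaining slots with what is essentially noise. (As written, those loops only assign to the loop variable, so the arrays are never actually changed.) It's not clear what you are trying to accomplish by this, but I'm fairly sure it's a bad plan.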
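On point 9, a small experiment may help. The sketch below (class and method names are made up for illustration) suggests that assigning to the loop variable of a for-each loop only rebinds a local variable, so the default-filling loops in the question leave the arrays as they were; an indexed loop is needed to write into a slot:

```java
import java.util.Arrays;

public class ForEachDefaults {

    // Mirrors the loop in the question: it assigns to the loop variable only,
    // so the array itself is left untouched.
    static String[] fillWithForEach(String[] addresses) {
        for (String s : addresses) {
            if (s == null) {
                s = "127.0.0.1"; // rebinds the local variable, not the array slot
            }
        }
        return addresses;
    }

    // An indexed loop writes back into the array slot.
    static String[] fillWithIndex(String[] addresses) {
        for (int k = 0; k < addresses.length; k++) {
            if (addresses[k] == null) {
                addresses[k] = "127.0.0.1";
            }
        }
        return addresses;
    }

    public static void main(String[] args) {
        // prints [10.0.0.1, null]
        System.out.println(Arrays.toString(fillWithForEach(new String[] {"10.0.0.1", null})));
        // prints [10.0.0.1, 127.0.0.1]
        System.out.println(Arrays.toString(fillWithIndex(new String[] {"10.0.0.1", null})));
    }
}
```

The same applies to the int loop over ports: x is a copy of the element, so x = 8080 changes nothing in the array.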

I'd suggest the following implementation:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;

public class ProxyLoader implements Iterable<ProxyLoader.Proxy> {
  // Remove DEFAULT_PROXY if not needed
  private static final Proxy DEFAULT_PROXY = new Proxy("127.0.0.1", 8080);
  private static final String DATA_FILE = "./proxy.txt";
  private ArrayList<Proxy> proxyList = new ArrayList<>();

  public ProxyLoader() {
    // Try-with-resources ensures file is closed safely and cleanly
    try(BufferedReader br = new BufferedReader(new FileReader(DATA_FILE))) {
      String line;
      while ((line = br.readLine()) != null) {
        String[] lineSplit = line.split(":");
        Proxy p = new Proxy(lineSplit[0], Integer.parseInt(lineSplit[1]));
        proxyList.add(p);
      }
    } catch (IOException e) {
      System.err.println("Failed to open/read "+DATA_FILE);
      e.printStackTrace(System.err);
    }
  }

  // If you request a positive index larger than the size of the file, it will return
  // DEFAULT_PROXY, since that's the behavior your original implementation
  // essentially did.  I'd suggest deleting DEFAULT_PROXY, having this method simply
  // return proxyList.get(i), and letting it fail if you request an invalid index.
  public Proxy getProxy(int i) {
    if(i < proxyList.size()) {
      return proxyList.get(i);
    } else {
      return DEFAULT_PROXY;
    }
  }

  // Lets you safely get the maximum index, without exposing the list directly
  public int getSize() {
    return proxyList.size();
  }

  // lets you run for(Proxy p : proxyLoader) { ... }
  @Override
  public Iterator<Proxy> iterator() {
    return proxyList.iterator();
  }

  // Inner static class just to hold data
  // can be pulled out into its own file if you prefer
  public static class Proxy {
    // note these values are public; since they're final, this is safe.
    // Using getters is more standard, but it adds a lot of boilerplate code
    // somewhat needlessly; for a simple case like this, public final should be fine.
    public final String address;
    public final int port;

    public Proxy(String a, int p) {
      address = a;
      port = p;
    }
  }
}
answered 2013-10-16T14:34:28.593

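I've included some examples that may not fit your use case exactly, but that show some ways to write code that is easier to maintain and read.

Code that is hard to read is hard to debug and maintain.

  • Objects need to validate their input (constructor arguments).
  • Reject bad data. Fail as early as possible, before the problem gets harder to debug.
  • Never catch an exception unless you can recover from it. Either soften it (wrap it in a runtime exception and rethrow it) or add it to your throws clause. If you don't know how to handle it, leave it alone and let it propagate.
  • Never eat an exception. Rethrow it or handle it.
  • Your code holds on to state that it does not need.
  • A class is more self-describing than two parallel arrays.
  • Avoid public variables, unless they are final.
  • Protect the state of your objects.
  • Think about how your methods will be called, and avoid side effects. Calling readData twice causes strange, hard-to-debug side effects.
  • Memory is cheap, but not free. Don't instantiate large arrays you don't need.
  • If you open it, you have to close it.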
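As a hedged illustration of the "soften it" and "if you open it, close it" advice above, the sketch below wraps a checked IOException in java.io.UncheckedIOException (available since Java 8) instead of swallowing it, and uses try-with-resources to close the reader. The class name SoftenExample and the method readAll are invented for this example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SoftenExample {

    // Reads every line from the source; try-with-resources closes it on exit.
    static List<String> readAll(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            // We cannot recover here, so rethrow unchecked, keeping the cause.
            throw new UncheckedIOException("Could not read proxy list", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new StringReader("127.0.0.1:8080\n192.55.55.55:9090\n")));
    }
}
```

Callers that can actually recover may catch UncheckedIOException; everyone else sees the failure immediately, with the original exception attached as the cause.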

Java 7 and 8 let you read lines through the FileSystem API, so most of this code does not need to be written at all:

 Path thePath = FileSystems.getDefault().getPath(location);
 return Files.readAllLines(thePath, Charset.forName("UTF-8"));
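For context, a self-contained version of that two-line snippet might look like the following. It is only a sketch: ReadAllLinesExample, readLines, writeSample, and the temporary file are stand-ins chosen for this example, and Paths.get and StandardCharsets.UTF_8 are used as shorthand for the FileSystems.getDefault().getPath and Charset.forName("UTF-8") calls above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReadAllLinesExample {

    // Reads the whole file into memory, one list element per line.
    static List<String> readLines(String location) {
        try {
            Path path = Paths.get(location);
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes sample data to a temporary file and returns its location.
    static String writeSample(List<String> lines) {
        try {
            Path tmp = Files.createTempFile("proxy", ".txt");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, lines, StandardCharsets.UTF_8);
            return tmp.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String location = writeSample(Arrays.asList("127.0.0.1:8080", "192.55.55.55:9090"));
        System.out.println(readLines(location));
    }
}
```

Because readAllLines loads everything at once, it suits a file of a few thousand proxies; for very large files, reading line by line is likely the better fit.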

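If you need to read lots of small files into lines and don't want to use FileSystem, or you are on Java 6 or Java 5, you would create a utility class like the following (note that the try-with-resources blocks shown here themselves require Java 7; on older versions, use try/finally instead):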

public class IOUtils {

    public final static String CHARSET = "UTF-8";

...

public static List<String> readLines(File file) {
    try (FileReader reader = new FileReader(file)) {
        return readLines(reader);
    } catch (Exception ex) {
        return Exceptions.handle(List.class, ex);
    }
}

It calls the readLines overload that takes a Reader:

public static List<String> readLines(Reader reader) {

    try (BufferedReader bufferedReader = new BufferedReader(reader)) {
          return readLines(bufferedReader);
    } catch (Exception ex) {
        return Exceptions.handle(List.class, ex);
    }
}

Which in turn calls the readLines overload that takes a BufferedReader:

public static List<String> readLines(BufferedReader reader) {
    List<String> lines = new ArrayList<>(80);

    try (BufferedReader bufferedReader = reader) {


        String line = null;
        while ( (line = bufferedReader.readLine()) != null) {
        lines.add(line);
        }

    } catch (Exception ex) {

        return Exceptions.handle(List.class, ex);
    }
    return lines;
}

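Apache has a set of utilities called Apache Commons (http://commons.apache.org/). It includes lang and IO utilities (http://commons.apache.org/proper/commons-io/). Either of these is good to have if you are on Java 5 or Java 6.

Returning to our example, you can turn any location into a list of lines: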

public static List<String> readLines(String location) {
    URI uri =  URI.create(location);

    try {

        if ( uri.getScheme()==null ) {

            Path thePath = FileSystems.getDefault().getPath(location);
            return Files.readAllLines(thePath, Charset.forName("UTF-8"));

        } else if ( uri.getScheme().equals("file") ) {

            Path thePath = FileSystems.getDefault().getPath(uri.getPath());
            return Files.readAllLines(thePath, Charset.forName("UTF-8"));

        } else {
            return readLines(location, uri);
        }

    } catch (Exception ex) {
         return Exceptions.handle(List.class, ex);
    }

}

FileSystem, Path, URI, and so on are all in the JDK.
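To make the dispatch on the URI scheme a little more concrete, here is a small sketch of how java.net.URI reports the scheme for the three kinds of location used in this answer. SchemeExample and classify are hypothetical names used only for illustration:

```java
import java.net.URI;

public class SchemeExample {

    // Classifies a location with the same three-way test as readLines(String).
    static String classify(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return "plain path";
        } else if (scheme.equals("file")) {
            return "file URI";
        } else {
            return "other: " + scheme;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("src/test/resources/testfile.txt")); // prints plain path
        System.out.println(classify("file:///tmp/testfile.txt"));        // prints file URI
        System.out.println(classify("http://localhost:9666/test"));      // prints other: http
    }
}
```

One caveat: URI.create throws IllegalArgumentException for strings that are not valid URIs, such as paths containing spaces, so locations of that kind would need separate handling.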

Continuing the example:

private static List<String> readLines(String location, URI uri) throws Exception {
    try {

        FileSystem fileSystem = FileSystems.getFileSystem(uri);
        Path fsPath = fileSystem.getPath(location);
        return Files.readAllLines(fsPath, Charset.forName("UTF-8"));

    } catch (ProviderNotFoundException ex) {
         return readLines(uri.toURL().openStream());
    }
}

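The above tries to read the URI from a FileSystem and, if no provider can be found for it, falls back to reading it through a URL stream. URL, URI, File, FileSystem, and so on are all part of the JDK.

To turn the URL stream into a Reader and then into a list of strings, we use: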

public static List<String> readLines(InputStream is) {

    try (Reader reader = new InputStreamReader(is, CHARSET)) {

        return readLines(reader);

    } catch (Exception ex) {

        return Exceptions.handle(List.class, ex);
    }
}

:)

Now back to our example (we can now read lines from anywhere, files included):

public static final class Proxy {
    private final String address;
    private final int port;
    private static final String DATA_FILE = "./files/proxy.txt";

    private static final Pattern addressPattern = Pattern.compile("^(\\d{1,3}[.]{1}){3}[0-9]{1,3}$");

    private Proxy(String address, int port) {

        /* Validate address is not null. */
        Objects.requireNonNull(address, "address should not be null");

        /* Validate port is in range. */
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port is not in range port=" + port);
        }

        /* Validate address is of the form 123.12.1.5 .*/
        if (!addressPattern.matcher(address).matches()) {
            throw new IllegalArgumentException("Invalid Inet address");
        }

        /* Now initialize our address and port. */
        this.address = address;
        this.port = port;
    }

    private static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public final String getAddress() {
        return address;
    }

    public final int getPort() {
        return port;
    }

    public static List<Proxy> loadProxies() {
        List <String> lines = IOUtils.readLines(DATA_FILE);
        List<Proxy> proxyList  = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(createProxy(line));
        }
        return proxyList;
    }

}

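Notice that we have no mutable state. This prevents bugs, and it makes your code easier to debug and support.

Notice that our IOUtils.readLines reads the lines from the file system.

Notice the extra work in the constructor to make sure nobody initializes a Proxy instance with bad state. Objects, Pattern, and so on are all in the JDK.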
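One thing worth knowing about the address check: the pattern only verifies the dotted shape, so a string such as 999.999.999.999 passes. If that matters, a range check on each octet could be layered on top. The following is a sketch under that assumption; AddressCheck and isValidIpv4 are hypothetical names, not part of the code above:

```java
import java.util.regex.Pattern;

public class AddressCheck {

    // Same pattern as in the Proxy class: it checks the dotted shape only.
    static final Pattern SHAPE = Pattern.compile("^(\\d{1,3}[.]{1}){3}[0-9]{1,3}$");

    // Hypothetical stricter check: shape first, then each octet within 0..255.
    static boolean isValidIpv4(String address) {
        if (address == null || !SHAPE.matcher(address).matches()) {
            return false;
        }
        for (String octet : address.split("[.]")) {
            if (Integer.parseInt(octet) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(SHAPE.matcher("999.999.999.999").matches()); // prints true
        System.out.println(isValidIpv4("999.999.999.999"));             // prints false
        System.out.println(isValidIpv4("192.55.55.57"));                // prints true
    }
}
```

Either way, host names such as localhost are rejected by this pattern, so a proxy file that mixes names and numeric addresses would need a different rule.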

If you want a reusable ProxyLoader, it would look like this:

public static class ProxyLoader {
    private static final String DATA_FILE = "./files/proxy.txt";


    private List<Proxy> proxyList = Collections.emptyList();
    private final String dataFile;

    public ProxyLoader() {
        this.dataFile = DATA_FILE;
        init();
    }

    public ProxyLoader(String dataFile) {
        this.dataFile = dataFile; // use the caller's file, not the default
        init();
    }

    private void init() {
        List <String> lines = IO.readLines(dataFile);
        proxyList = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(Proxy.createProxy(line));
        }
    }

    public String getDataFile() {
        return this.dataFile;
    }

    public static List<Proxy> loadProxies() {
            return new ProxyLoader().getProxyList();
    }

    public List<Proxy> getProxyList() {
        return proxyList;
    }
   ...

}

public static class Proxy {
    private final String address;
    private final int port;

    ...

    public Proxy(String address, int port) {
        ... 
        this.address = address;
        this.port = port;
    }

    public static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public String getAddress() {
        return address;
    }

    public int getPort() {
        return port;
    }
}

Coding is great. Testing is divine! Here are some tests for this example.

public static class ProxyLoader {
    private static final String DATA_FILE = "./files/proxy.txt";


    private List<Proxy> proxyList = Collections.emptyList();
    private final String dataFile;

    public ProxyLoader() {
        this.dataFile = DATA_FILE;
        init();
    }

    public ProxyLoader(String dataFile) {
        this.dataFile = dataFile; // use the caller's file, not the default
        init();
    }

    private void init() {
        List <String> lines = IO.readLines(dataFile);
        proxyList = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(Proxy.createProxy(line));
        }
    }

    public String getDataFile() {
        return this.dataFile;
    }

    public static List<Proxy> loadProxies() {
            return new ProxyLoader().getProxyList();
    }

    public List<Proxy> getProxyList() {
        return proxyList;
    }

}

public static class Proxy {
    private final String address;
    private final int port;

    public Proxy(String address, int port) {
        this.address = address;
        this.port = port;
    }

    public static Proxy createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy(address, port);
    }

    public String getAddress() {
        return address;
    }

    public int getPort() {
        return port;
    }
}

Here is a single-class alternative (I don't see much point in a separate ProxyLoader).

public static final class Proxy2 {
    private final String address;
    private final int port;
    private static final String DATA_FILE = "./files/proxy.txt";

    private static final Pattern addressPattern = Pattern.compile("^(\\d{1,3}[.]{1}){3}[0-9]{1,3}$");

    private Proxy2(String address, int port) {

        /* Validate address is not null. */
        Objects.requireNonNull(address, "address should not be null");

        /* Validate port is in range. */
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port is not in range port=" + port);
        }

        /* Validate address is of the form 123.12.1.5 .*/
        if (!addressPattern.matcher(address).matches()) {
            throw new IllegalArgumentException("Invalid Inet address");
        }

        /* Now initialize our address and port. */
        this.address = address;
        this.port = port;
    }

    private static Proxy2 createProxy(String line) {
        String[] lineSplit = line.split(":");
        String address = lineSplit[0];
        int port =  parseInt(lineSplit[1]);
        return new Proxy2(address, port);
    }

    public final String getAddress() {
        return address;
    }

    public final int getPort() {
        return port;
    }

    public static List<Proxy2> loadProxies() {
        List <String> lines = IO.readLines(DATA_FILE);
        List<Proxy2> proxyList  = new ArrayList<>(lines.size());

        for (String line : lines) {
            proxyList.add(createProxy(line));
        }
        return proxyList;
    }

}

Now we write the tests (testing and TDD help you work through these kinds of problems):

@Test public void proxyTest() {
    List<Proxy> proxyList = ProxyLoader.loadProxies();
    assertEquals(
            5, len(proxyList)
    );


    assertEquals(
            "127.0.0.1", idx(proxyList, 0).getAddress()
    );



    assertEquals(
            8080, idx(proxyList, 0).getPort()
    );


    //192.55.55.57:9091
    assertEquals(
            "192.55.55.57", idx(proxyList, -1).getAddress()
    );



    assertEquals(
            9091, idx(proxyList, -1).getPort()
    );


}

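idx and friends are defined in my own helper library, called boon. The idx method works like Python or Ruby index and slice notation, so an index of -1 refers to the last element.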
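For readers without boon on the classpath, the negative-index behavior the tests rely on can be approximated in a few lines. This is a hypothetical stand-in written for this explanation, not boon's actual implementation:

```java
import java.util.Arrays;
import java.util.List;

public class NegativeIndex {

    // Hypothetical helper: a negative index counts back from the end of
    // the list, so -1 is the last element and -2 the one before it.
    static <T> T idx(List<T> list, int index) {
        int resolved = index < 0 ? list.size() + index : index;
        return list.get(resolved);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("127.0.0.1:8080", "192.55.55.55:9090", "192.55.55.57:9091");
        System.out.println(idx(lines, 0));  // prints 127.0.0.1:8080
        System.out.println(idx(lines, -1)); // prints 192.55.55.57:9091
    }
}
```

An index that is still out of range after the adjustment fails with the usual IndexOutOfBoundsException from List.get.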

@Test public void proxyTest2() {
    List<Proxy2> proxyList = Proxy2.loadProxies();
    assertEquals(
            5, len(proxyList)
    );


    assertEquals(
            "127.0.0.1", idx(proxyList, 0).getAddress()
    );



    assertEquals(
            8080, idx(proxyList, 0).getPort()
    );


    //192.55.55.57:9091
    assertEquals(
            "192.55.55.57", idx(proxyList, -1).getAddress()
    );



    assertEquals(
            9091, idx(proxyList, -1).getPort()
    );


}

My input file:

127.0.0.1:8080
192.55.55.55:9090
127.0.0.2:8080
192.55.55.56:9090
192.55.55.57:9091

So what about my IOUtils (which is actually called IO)?

Here are the tests, for those who care about the IO utilities:

package org.boon.utils;

import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.util.*;
import java.util.regex.Pattern;

import static java.lang.Integer.parseInt;
import static org.boon.utils.Lists.idx;
import static org.boon.utils.Lists.len;
import static org.boon.utils.Maps.copy;
import static org.boon.utils.Maps.map;
import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

...

That gives you an idea of the imports involved.

public class IOTest {

....

This is a test that reads lines from a file on the file system.

@Test
public void testReadLines() {
    File testDir = new File("src/test/resources");
    File testFile = new File(testDir, "testfile.txt");


    List<String> lines = IO.readLines(testFile);

    assertLines(lines);

}

This is the helper method that asserts the file was read correctly.

private void assertLines(List<String> lines) {

    assertEquals(
            4, len(lines)
    );


    assertEquals(
            "line 1", idx(lines, 0)
    );



    assertEquals(
            "grapes", idx(lines, 3)
    );
}

This test shows reading a file from a String path.

@Test
public void testReadLinesFromPath() {


    List<String> lines = IO.readLines("src/test/resources/testfile.txt");

    assertLines(lines);



}

This test shows reading a file from a URI.

@Test
public void testReadLinesURI() {

    File testDir = new File("src/test/resources");
    File testFile = new File(testDir, "testfile.txt");
    URI uri = testFile.toURI();


    //"file:///....src/test/resources/testfile.txt"
    List<String> lines = IO.readLines(uri.toString());
    assertLines(lines);


}

This is a test showing that you can read the lines of a file served by an HTTP server. First, the handler:

static class MyHandler implements HttpHandler {
    public void handle(HttpExchange t) throws IOException {

        File testDir = new File("src/test/resources");
        File testFile = new File(testDir, "testfile.txt");
        String body = IO.read(testFile);
        // Content-Length must be the byte count, not the character count
        byte[] bytes = body.getBytes(IO.CHARSET);
        t.sendResponseHeaders(200, bytes.length);
        OutputStream os = t.getResponseBody();
        os.write(bytes);
        os.close();
    }
}

Here is the HTTP server test (it starts up an HTTP server).

@Test
public void testReadFromHttp() throws Exception {

    HttpServer server = HttpServer.create(new InetSocketAddress(9666), 0);
    server.createContext("/test", new MyHandler());
    server.setExecutor(null); // creates a default executor
    server.start();

    Thread.sleep(1000);

    try {
        List<String> lines = IO.readLines("http://localhost:9666/test");
        assertLines(lines);
    } finally {
        server.stop(0); // release the port even if the assertions fail
    }

}

The proxy cache tests live in this same test class: the ProxyLoader, Proxy, and Proxy2 classes and the proxyTest and proxyTest2 methods listed earlier appear there as members of IOTest.

}

You can see all the source code for this example and the utility class here:

https://github.com/RichardHightower/boon

https://github.com/RichardHightower/boon/blob/master/src/main/java/org/boon/utils/IO.java

Or come find me:

http://rick-hightower.blogspot.com/

answered 2013-10-17T03:44:07.057