0

我已经将模拟退火算法(Java 中)应用到我正在处理的个人项目中,但想知道如果用另一种语言(即 C++、Python 等)编写,SA 算法在相同数据集上的性能是否会稍好一些。

我创建的虚拟数据集包含五千个用户的姓名、邮政编码和出生日期。

SA 算法应用于数据集以提取许多不同的信息。

目前,我最近的测试是试图让 SA 算法检测出生日期在一周内(在任何给定年份)的所有用户。现在,SA 算法确实运行良好;但是,由于我是一个完美主义者,我想获得稍微快一点的结果,并想知道是否有人对 SA 有任何好的经验,在类似大小的数据集上产生出色的结果,但用其他语言编写?

目前,SA 算法只需不到 5 秒的时间即可执行成功搜索。

4

1 回答 1

3

我会用Java写

public class UserSearchMain {
    static class Person {
        int id;
        int dateOfBirth;

        static Person[] generatePeople(int num) {
            Random rand = new Random();
            Person[] people = new Person[num];
            for (int i = 0; i < num; i++) {
                Person p = new Person();
                p.id = i;
                p.dateOfBirth = rand.nextInt(80 * 365); // any day for 80 years.
                people[i] = p;
            }
            return people;
        }
    }

    public static void main(String... args) {
        Person[] people = Person.generatePeople(5000);
        List<List<Person>> peopleByDOB = new ArrayList<List<Person>>();
        for (Person person : people) {
            int dob = person.dateOfBirth;
            while (peopleByDOB.size() <= dob) peopleByDOB.add(new ArrayList<Person>());
            peopleByDOB.get(dob).add(person);
        }

        Random rand = new Random();

        int warmup = 12 * 1000;
        int runs = 1000 * 1000;
        long start = 0;

        for (int i = -warmup; i < runs; i++) {
            if (i == 0) start = System.nanoTime();
            int dayToSearch = rand.nextInt(80 * 365);
            for (int j = Math.max(0, dayToSearch - 7); j <= dayToSearch + 7 && j < peopleByDOB.size(); j++) {
                List<Person> peopleWithCloseDOM = peopleByDOB.get(j);
            }
        }
        long time = System.nanoTime() - start;
        System.out.printf("Each search took an average of %,d ns%n", time / runs);
    }
}

印刷

Each search took an average of 85 ns

通常,您使用的算法比语言的选择更重要。

编辑:在回答你原来的问题时,你能用相同的算法在 C++ 中让它更快吗?我猜是的,但不是很多。


为了进一步加快速度,您可以使用多个线程。

public static void main(String... args) throws ExecutionException, InterruptedException {
    Person[] people = Person.generatePeople(5000);
    final List<List<Person>> peopleByDOB = new ArrayList<List<Person>>();
    for (Person person : people) {
        int dob = person.dateOfBirth;
        while (peopleByDOB.size() <= dob) peopleByDOB.add(new ArrayList<Person>());
        peopleByDOB.get(dob).add(person);
    }

    final int runs = 10 * 1000 * 1000;
    long start = System.nanoTime();

    int processors = Runtime.getRuntime().availableProcessors();
    ExecutorService service = Executors.newFixedThreadPool(processors);
    List<Future> futures = new ArrayList<Future>();
    for (int i = 0; i < processors; i++) {
        futures.add(service.submit(new Runnable() {
            @Override
            public void run() {
                Random rand = new Random();
                for (int i = 0; i < runs; i++) {
                    int dayToSearch = rand.nextInt(80 * 365);
                    for (int j = Math.max(0, dayToSearch - 7); j <= dayToSearch + 7 && j < peopleByDOB.size(); j++) {
                        List<Person> peopleWithCloseDOM = peopleByDOB.get(j);
                    }
                }
            }
        }));
    }
    for (Future future : futures) {
        future.get();
    }
    service.shutdown();
    double timeSec = (System.nanoTime() - start) / 1e9;
    double rate = processors * runs / timeSec / 1e6;
    System.out.printf("The search throughput was %.1f million per second%n", rate);
}

印刷

The search throughput was 32.4 million per second

这意味着每次搜索的平均时间约为 31 ns。

于 2012-05-01T09:50:30.550 回答