我使用 wordnet 相似性 java api 来测量两个同义词集之间的相似性,如下所示:
 public class WordNetSimalarity {
 private static ILexicalDatabase db = new NictWordNet();
 private static RelatednessCalculator[] rcs = {
                 new HirstStOnge(db), new LeacockChodorow(db), new Lesk(db),  new WuPalmer(db), 
                 new Resnik(db), new JiangConrath(db), new Lin(db), new Path(db)
                 };
 public static double computeSimilarity( String word1, String word2 ) {
         WS4JConfiguration.getInstance().setMFS(true);
         double s=0;
         for ( RelatednessCalculator rc : rcs ) {
                 s = rc.calcRelatednessOfWords(word1, word2);
                // System.out.println( rc.getClass().getName()+"\t"+s );
         }
        return s;
 } 
主班
      public static void main(String[] args) {
         long t0 = System.currentTimeMillis();
         File source = new File ("TagsFiltered.txt");
         File target = new File ("fich4.txt");
         ArrayList<String> sList= new ArrayList<>();
         try {
             if (!target.exists()) target.createNewFile();
            Scanner scanner = new Scanner(source);
            PrintStream psStream= new PrintStream(target);
            while (scanner.hasNext()) {
                sList.add(scanner.nextLine());                  
            }
            for (int i = 0; i < sList.size(); i++) {
            for (int j = i+1; j < sList.size(); j++) {
                psStream.println(sList.get(i)+" "+sList.get(j)+" "+WordNetSimalarity.computeSimilarity(sList.get(i), sList.get(j)));
            }
        }   
            psStream.close();
        } catch (Exception e) {e.printStackTrace();
        }
         long t1 = System.currentTimeMillis();
         System.out.println( "Done in "+(t1-t0)+" msec." );
 }
我的数据库包含 595 个同义词集,这意味着方法computeSimilarity将被调用 (595*594/2) 时间来计算两个单词之间的相似度,它花费的时间超过5000 ms!所以要完成我的任务,我至少需要一个星期!
我的问题是如何减少这段时间!
怎么提高成绩??