java - Interview coding Java sorting

Question

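Write a Java program that reads input from a file and sorts the characters within each word. Once that is done, sort all of the resulting words in ascending order, and finally output the sum of the numeric values in the file.

Remove special characters and stop words while processing the data
Measure the time taken to execute the code

Suppose the file contains: Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.

Output: achins adeklnrtu adn cdeors dio estt nrsu nrsu 32803

Time taken: 3 milliseconds

My code takes 15 milliseconds to execute.

Please suggest any faster way to solve this problem.

Code: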

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public class Sorting {

    public static void main(String[] ags)throws Exception
    {
        long st=System.currentTimeMillis();
        int v=0;
        List ls=new ArrayList();
        //To read data from file
        BufferedReader in=new BufferedReader(
                 new FileReader("D:\\Bhive\\File.txt"));
        String read=in.readLine().toLowerCase();
        //Splitting the string based on spaces
        String[] sp=read.replaceAll("\\.","").split(" ");
        for(int i=0;i<sp.length;i++)
        {
            //Check for the array if it matches number
            if(sp[i].matches("(\\d+)"))
                //Adding the numbers
                v+=Integer.parseInt(sp[i]);
            else
            {
                //sorting the characters
                char[] c=sp[i].toCharArray();
                Arrays.sort(c);
                String r=new String(c);
                //Adding the resulting word into list
                ls.add(r);
            }
        }
        //Sorting the resulting words in ascending order
        Collections.sort(ls);
        //Appending the number in the end of the list
        ls.add(v);
        //Displaying the string using Iterator
        Iterator it=ls.iterator();
        while(it.hasNext())
            System.out.print(it.next()+" ");
        long time=System.currentTimeMillis()-st;
        System.out.println("\n Time Taken:"+time);
    }
}

score 5 · Accepted Answer

Use indexOf() to extract words from your string instead of split(" "). It improves performance.

See this thread: Performance of StringTokenizer class vs. split method in Java

Also, try increasing the size of the input: copy-paste the line "Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs." 50,000 times into the text file and measure the performance. That way, you will see a considerable time difference when you try different optimizations.
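Rather than pasting by hand, the larger test file could be generated with a few lines of Java. This is only a sketch: the class name `MakeBigInput` and the path `File.txt` are placeholders, not something from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MakeBigInput {

    static final String LINE =
            "Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs.";

    // Writes `copies` copies of LINE to the target file, one per line.
    static void write(Path target, int copies) throws IOException {
        try (BufferedWriter out =
                     Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int i = 0; i < copies; i++) {
                out.write(LINE);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "File.txt" is a placeholder; point it at your own test file.
        write(Paths.get("File.txt"), 50_000);
    }
}
```

Note that the programs in this thread call readLine() only once, so they would need a loop over all lines (or the copies would have to sit on a single line) before the extra input is actually processed.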

EDIT

Tested this code (used .indexOf())

        long st = System.currentTimeMillis();
        int v = 0;
        List<String> ls = new ArrayList<String>();
        // To read data from file
        BufferedReader in = new BufferedReader(new FileReader("D:\\File.txt"));
        String read = in.readLine().toLowerCase();
        in.close();
        // Strings are immutable, so keep the result of replaceAll()
        read = read.replaceAll("\\.", "");
        int pos = 0, end;
        while (pos < read.length()) {
            // Find the next space; the last word runs to the end of the line
            end = read.indexOf(' ', pos);
            if (end < 0) {
                end = read.length();
            }
            String curString = read.substring(pos, end);
            pos = end + 1;
            if (curString.isEmpty()) {
                continue; // skip consecutive spaces
            }
            try {
                // Adding the numbers
                v += Integer.parseInt(curString);
            } catch (NumberFormatException e) {
                // Not a number: sort the characters
                char[] c = curString.toCharArray();
                Arrays.sort(c);
                // Adding the resulting word into the list
                ls.add(new String(c));
            }
        }
        // Sorting the list
        Collections.sort(ls);
        // Displaying the words, then the sum
        for (String word : ls) {
            System.out.print(word + " ");
        }
        System.out.print(v);
        long time = System.currentTimeMillis() - st;
        System.out.println("\n Time Taken: " + time + " ms");

Performance using 1 line in file
Your code: 3 ms
My code: 2 ms

Performance using 50K lines in file
Your code: 45 ms
My code: 32 ms

As you see, the difference is significant when the input size increases. Please test it on your machine and share results.

score 3 · Accepted Answer

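The only thing I see is that the following line is unnecessarily expensive:

   System.out.print(it.next()+" ");

Printing this way is inefficient because of all the flushing going on. Instead, build the whole output with a StringBuilder and reduce it to a single print call.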
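A minimal sketch of the single-print idea, assuming the sorted words are already in a list. The class name `JoinOutput` and the method `render` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class JoinOutput {

    // Builds the whole output line in memory: the words separated by
    // spaces, followed by the numeric sum.
    static String render(List<String> words, int sum) {
        StringBuilder sb = new StringBuilder();
        for (String w : words) {
            sb.append(w).append(' ');
        }
        sb.append(sum);
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("achins", "adeklnrtu", "adn");
        // One print call instead of one per word.
        System.out.println(render(words, 32803));
    }
}
```

Whether this helps measurably depends on how System.out is buffered in your environment, so it is worth timing both variants.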

score 1 · Accepted Answer

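I ran the same code using a PriorityQueue instead of a List. Also, as nes1983 suggested, building the output string first, rather than printing each word separately, helps reduce the run time.

My run time definitely went down after these modifications.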
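The answer does not show its code, so the following is only a guess at what the PriorityQueue variant might look like. The class name `QueueSort` and the method `process` are hypothetical, and the tokenizing mirrors the question's split-based approach.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class QueueSort {

    // Sorts the characters of each word, keeps the words in a min-heap,
    // sums the numeric tokens, and returns the final output line.
    static String process(String line) {
        PriorityQueue<String> words = new PriorityQueue<>();
        int sum = 0;
        for (String token : line.toLowerCase().replace(".", "").split(" ")) {
            if (token.isEmpty()) {
                continue; // ignore repeated spaces
            }
            if (token.matches("\\d+")) {
                sum += Integer.parseInt(token);
            } else {
                char[] c = token.toCharArray();
                Arrays.sort(c);
                words.add(new String(c));
            }
        }
        // poll() returns elements in ascending order; iterating the
        // queue directly would not.
        StringBuilder sb = new StringBuilder();
        while (!words.isEmpty()) {
            sb.append(words.poll()).append(' ');
        }
        return sb.append(sum).toString();
    }

    public static void main(String[] args) {
        System.out.println(process(
                "Sachin Tendulkar scored 18111 ODI runs and 14692 Test runs."));
    }
}
```

A heap does the same O(n log n) work as sorting a list once at the end, so any gain over Collections.sort is likely to be small and should be measured rather than assumed.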

score 1 · Accepted Answer

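I removed the list and used only an array. On my machine your code takes 6 ms, while the array-only version takes 4 to 5 ms. Run this code on your machine and let me know the time.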

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;

public class Sorting {
    public static void main(String[] ags) throws Exception
    {
        long st=System.currentTimeMillis();
        int v=0;
        //To read data from file
        BufferedReader in=new BufferedReader(new FileReader("File.txt"));
        String read=in.readLine().toLowerCase();
        //Splitting the string based on spaces
        String[] sp=read.replaceAll("\\.","").split(" ");
        //j counts the words written back into the front of sp
        int j=0;
        for(int i=0;i<sp.length;i++)
        {
            //Check whether the token is a number
            if(sp[i].matches("(\\d+)"))
                //Adding the numbers
                v+=Integer.parseInt(sp[i]);
            else
            {
                //sorting the characters
                char[] c=sp[i].toCharArray();
                Arrays.sort(c);
                read=new String(c);
                sp[j]= read;
                j++;
            }
        }
        //Sorting only the j resulting words; the tail of sp still holds stale tokens
        Arrays.sort(sp, 0, j);
        //Displaying the words, then the sum
        for(int i=0;i<j; i++)
            System.out.print(sp[i]+" ");
        System.out.print(v);
        st=System.currentTimeMillis()-st;
        System.out.println("\n Time Taken:"+st);
    }
}

score 0 · Accepted Answer

I modified the code further by including @Teja's logic, which took it from 2 ms down to 1 ms:

    long st=System.currentTimeMillis();
    BufferedReader in=new BufferedReader(new InputStreamReader(new FileInputStream("D:\\Bhive\\File.txt")));
    String read=in.readLine().toLowerCase();
    String[] sp=read.replaceAll("\\.","").split(" ");
    int v=0;
    int len=sp.length;
    int j=0;
    for(int i=0;i<len;i++)
    {
        if(isNum(sp[i]))
            v+=Integer.parseInt(sp[i]);
        else
        {
            char[] c=sp[i].toCharArray();
            Arrays.sort(c);
            String r=new String(c);
            sp[j]=r;
            j++;
        }
    }
    //Sort only the first j entries; the rest of sp holds leftover tokens
    Arrays.sort(sp, 0, j);
    long time=System.currentTimeMillis()-st;
    System.out.println("\n Time Taken:"+time);
    for(int i=0;i<j; i++)
        System.out.print(sp[i]+" ");
    System.out.print(v);

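I wrote a small utility that checks whether a string is a number (all digits), instead of using a regex:

private static boolean isNum(String cs){
    if(cs.isEmpty())
    {
        return false;
    }
    for(char c : cs.toCharArray())
    {
        //Any non-digit means the token is a word, not a number
        if(!Character.isDigit(c))
        {
            return false;
        }
    }
    return true;
}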

Compute the elapsed time before calling any System.out operation, since printing is a blocking operation.
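As a rough illustration of stopping the clock before any output, here is a sketch that uses System.nanoTime(), which offers finer resolution than currentTimeMillis() for runs this short. The class `TimedRun` and the stand-in method `compute` are hypothetical; the real read, sort and sum work would go there.

```java
public class TimedRun {

    // Stand-in for the real work (read, sort, sum); hypothetical helper
    // that returns a 1000-character string.
    static String compute() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append(i % 10);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        String result = compute();
        // Stop the clock before any I/O so printing is not measured.
        long elapsedNanos = System.nanoTime() - start;

        System.out.println(result);
        System.out.println("Time taken: " + (elapsedNanos / 1_000_000.0) + " ms");
    }
}
```

A single run in a fresh JVM mostly measures class loading and warm-up, so timings of a few milliseconds like those in this thread should be read as rough indications only.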
