java - Performance tuning when loading a CSV

Question

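I have attached the code below.

Functionality

Read a CSV file, substitute the values with WebMacro, and insert the result into the DB.

The first line of the CSV holds the header (NO, NAME). The remaining lines are read one at a time and each value is put into the WebMacro context under its header name, e.g. context.put("NO","1") and context.put("NAME","RAJARAJAN"). WebMacro then replaces $(NO) ==> 1 and $(NAME) ==> RAJARAJAN, and the resulting statement is added to a batch. Once the batch reaches 1000 statements, it is executed.

The code works functionally, but it takes 4 minutes to process 50,000 records. I need a performance improvement or a change of logic. Let me know if anything is unclear. Any change that drastically improves performance is welcome.

Note: I use WebMacro because it replaces $(NO) in the merge query with the values read from the CSV.

Bala.csv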

    NO?NAME
    1?RAJARAJAN
    2?ARUN
    3?ARUNKUMAR

    Connection con = null;
    Statement stmt = null;
    int counter = 0;
    try {
        WebMacro wm = new WM();
        Context context = wm.getContext();
        String strFilePath = "/home/vbalamurugan/3A/email-1822820895/Bala.csv";
        // Merge statement with WebMacro placeholders for the CSV values.
        String msg = "merge into temp2 A using "
                + "(select '$(NO)' NO, '$(NAME)' NAME from dual) B on (A.NO = B.NO) "
                + "when not matched then insert (NO, NAME) "
                + "values (B.NO, B.NAME) when matched then "
                + "update set A.NAME = 'Attai' where A.NO = '$(NO)'";
        con = getOracleConnection("localhost", "raymedi_hq", "raymedi_hq", "XE");
        con.setAutoCommit(false);
        stmt = con.createStatement();
        File file = new File(strFilePath);
        Scanner scanner = new Scanner(file);
        try {
            // The first line holds the column names, separated by '?'.
            String[] header = scanner.nextLine().split("\\?");
            long start = System.currentTimeMillis();
            while (scanner.hasNextLine()) {
                String[] scan = scanner.nextLine().split("\\?");
                for (int i = 0; i < scan.length; i++) {
                    context.put(header[i], scan[i]);
                }
                if (context.size() > 0) {
                    String m = replacingWebMacroStatement(msg, wm, context);
                    // Always add the statement first, so no row is dropped
                    // on the iteration that flushes the batch.
                    stmt.addBatch(m);
                    counter++;
                    if (counter >= 1000) {
                        stmt.executeBatch();
                        stmt.clearBatch();
                        counter = 0;
                    }
                }
            }
            long b = System.currentTimeMillis() - start;
            System.out.println("=======Total Time Taken" + b);
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            scanner.close();
        }
        // Flush whatever is left in the last, partial batch.
        stmt.executeBatch();
        stmt.clearBatch();
        stmt.close();
        // Commit only when everything above succeeded.
        con.commit();
    } catch (Exception e) {
        e.printStackTrace();
        if (con != null) {
            con.rollback();
        }
    }

    // Replaces the $(...) placeholders in the query with values from the context.
    public static String replacingWebMacroStatement(String query, WebMacro wm, Context context) throws Exception {
        // Build and parse a template from the query text, then evaluate it.
        Template template = new StringTemplate(wm.getBroker(), query);
        template.parse();
        return template.evaluateAsString(context);
    }
    // Opens an Oracle connection through the thin driver; returns null on failure.
    public static Connection getOracleConnection(String ipAddress, String username, String password, String tns) throws SQLException {
        Connection connection = null;
        try {
            String baseConnectionUrl = "jdbc:oracle:thin:@" + ipAddress + ":1521:" + tns;
            String driver = "oracle.jdbc.driver.OracleDriver";
            Class.forName(driver);
            connection = DriverManager.getConnection(baseConnectionUrl, username, password);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return connection;
    }

score 0 · Accepted Answer

I can tell you that this code takes about 150 ms on average on my machine:

    StrTokenizer tokenizer = StrTokenizer.getCSVInstance();
    for (int i=0;i<50000;i++) {
        tokenizer.reset("a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z");
        String toks[] = tokenizer.getTokenArray();
    }

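You'll find StrTokenizer in the Apache commons-lang package, but I doubt that String.split(), StringTokenizer, or Scanner.nextLine() would be your bottleneck in any case. I would assume the time is going into your database inserts.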
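If you want to check this without pulling in commons-lang, a rough sketch along the following lines times plain String.split() over 50,000 synthetic rows shaped like Bala.csv. The class and method names (SplitTiming, sampleLines, countFields) are made up for illustration, and the numbers will vary by machine and JVM warm-up.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTiming {
    /** Builds synthetic rows shaped like Bala.csv: "<no>?NAME<no>". */
    static List<String> sampleLines(int rows) {
        List<String> lines = new ArrayList<>(rows);
        for (int i = 1; i <= rows; i++) {
            lines.add(i + "?NAME" + i);
        }
        return lines;
    }

    /** Splits every line on '?' and returns the total number of fields seen. */
    static long countFields(List<String> lines) {
        long fields = 0;
        for (String line : lines) {
            fields += line.split("\\?").length;
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(50_000);
        long start = System.nanoTime();
        long fields = countFields(lines);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(fields + " fields split in " + elapsedMs + " ms");
    }
}
```

If this finishes in a small fraction of the 4 minutes, the parsing step can be ruled out and attention can move to the template evaluation and the database calls.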

If that's the case, you can do one of two things:

Tune the batch size.
Multithread the inserts.
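Both options can be sketched together. The following is a minimal illustration, not a drop-in replacement for your loader: BatchedLoader, chunk, loadInParallel, and the insertBatch callback are hypothetical names, and the callback stands in for whatever actually writes one batch to the database. It assumes each worker would use its own connection, since a single JDBC connection should not be shared between threads doing concurrent batches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

class BatchedLoader {
    /** Cuts rows into consecutive chunks of at most batchSize rows. */
    static <T> List<List<T>> chunk(List<T> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < rows.size(); from += batchSize) {
            int to = Math.min(from + batchSize, rows.size());
            chunks.add(new ArrayList<>(rows.subList(from, to)));
        }
        return chunks;
    }

    /**
     * Hands every chunk to insertBatch on a fixed pool of worker threads,
     * waits for all of them, and returns the number of batches executed.
     */
    static <T> int loadInParallel(List<T> rows, int batchSize, int threads,
                                  Consumer<List<T>> insertBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : chunk(rows, batchSize)) {
                pending.add(pool.submit(() -> insertBatch.accept(batch)));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // surfaces any failure from a worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return pending.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real loader, insertBatch would typically add the rows of one chunk with addBatch() and then call executeBatch() and commit(). Trying a few values for batchSize and threads and timing each run is the tuning step; the best values depend on the database and network.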

As already suggested, a profiler will help determine where your time is being spent.
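If attaching a profiler is not convenient, a crude substitute is to accumulate elapsed time per phase yourself. The sketch below is only that, a stopwatch keyed by phase name; PhaseTimer and its methods are hypothetical names, and a real profiler will give a far more detailed picture.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PhaseTimer {
    // Accumulated nanoseconds per phase, in first-seen order.
    private final Map<String, Long> nanos = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the named phase. */
    void time(String phase, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanos.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total time recorded for a phase, or 0 if it was never timed. */
    long totalNanos(String phase) {
        return nanos.getOrDefault(phase, 0L);
    }

    /** Prints one line per phase with its accumulated milliseconds. */
    void report() {
        nanos.forEach((phase, n) ->
                System.out.println(phase + ": " + n / 1_000_000 + " ms"));
    }
}
```

Wrapping the split, the template evaluation, and the executeBatch() call in separate time(...) calls inside the loop, then calling report() at the end, should show which of the three dominates the 4 minutes.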
