我在使用嵌入式 firebird 数据库引擎和 Java SE 时遇到了一个大问题。我目前正在开发一种过滤工具,供用户过滤数据。所以我做了两个过滤选项,用户可以选择一个或两个:
- 从黑名单中过滤(黑名单由用户控制)。
- 根据记录上传和过滤的每条记录的海量列表进行过滤。
用户上传的数据以纯文本逗号或令牌分隔,如下所示:
(SET OF COLUMNS)| RECORD TO FILTER |
0-MANY COLUMNS | ABC2 |
0-MANY COLUMNS | ABC5 |
当我将它上传到数据库时,我为每个过滤器添加一个标志
(SET OF COLUMNS) | RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC2 | | |
0-MANY COLUMNS | ABC5 | | |
因此,当涉及到第二个过滤器时,程序在第一次运行软件时有一个主空表,然后它用第一次上传的所有记录填充该表。在用户进行几次文本上传后,主表将具有如下表的唯一记录:
Record | Date criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0|
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0|
例如,在处理数据时,软件会同时更新主表和用户表:
(SET OF COLUMNS| RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | |
所以主表会更新:
Record | Day criteria for filtering |
ABC1 | 08/11/2012:1,07/11/2012:3,06/11/2012:5|
ABC2 | 05/11/2012:1,04/11/2012:0,03/11/2012:0|
ABC3 | 12/11/2012:3,11/11/2012:0,10/11/2012:0|
ABC4 | 12/11/2012:1,11/11/2012:0,10/11/2012:0| ->12/11/2012:2,11/11/2012:0,10/11/2012:0
ABC5 | 12/11/2012:3,11/11/2012:0,10/11/2012:3|
ABC9 | 11/11/2012:3,10/11/2012:1,09/11/2012:0| ->12/11/2012:1,11/11/2012:3,10/11/2012:1
如果在过去三天内数据条件事件达到四个以上,用户表将标记过滤器 B。请注意,每个日期旁边都有一个整数。
(SET OF COLUMNS)| RECORD TO FILTER | FLAG FOR FIlTER A | FLAG FOR FILTER B |
0-MANY COLUMNS | ABC4 | | |
0-MANY COLUMNS | ABC9 | | X |
两次更新都在一个事务中,问题是当用户上传超过 800,000 条记录时,我的程序在 while 循环中抛出以下异常。我使用 StringBuilder 解析和附加方法来获得可变天数字符串的最大性能。
java.lang.OutOfMemoryError:Java 堆空间
这是我的代码,我用了五天:
FactoriaDeDatos factoryInstace = FactoriaDeDatos.getInstance();
Connection cnx = factoryInstace.getConnection();
cnx.setAutoCommit(false);
PreparedStatement pstmt = null;
ResultSet rs=null;
pstmt = cnx.prepareStatement("SELECT CM.MAIL,CM.FECHAS FROM TCOMERCIALMAIL CM INNER JOIN TEMPMAIL TMP ON CM.MAIL=TMP."+colEmail);
rs=pstmt.executeQuery();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_MAIL INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("ALTER INDEX IDX_FECHAS INACTIVE");
pstmtDet.executeUpdate();
pstmtDet = cnx.prepareStatement("SET STATISTICS INDEX IDX_FECHAS");
pstmtDet.executeUpdate();
sql_com_local_tranx=0;
int trxNum=0;
int ix=0;
int ixE1=0;
int ixAc=0;
StringBuilder sb;
StringTokenizer st;
String fechas;
int pos1,pos2,pos3,pos4,pos5,pos6,pos7,pos8,pos9;
StringBuilder s1,s2,sSQL,s4,s5,s6,s7,s8,s9,s10;
long startLoop = System.nanoTime();
long time2 ;
boolean ejecutoMax=false;
//int paginador_sql=1000;
//int trx_ejecutada=0;
sb=new StringBuilder();
s1=new StringBuilder();
s2=new StringBuilder();
sSQL=new StringBuilder();
s4=new StringBuilder();
s6=new StringBuilder();
s8=new StringBuilder();
s10=new StringBuilder();
while(rs.next()){
//De aqui
actConteoDia=0;
sb.setLength(0);
sb.append(rs.getString(2));
pos1= sb.indexOf(":",0);
pos2= sb.indexOf(",",pos1+1);
pos3= sb.indexOf(":",pos2+1);
pos4= sb.indexOf(",",pos3+1);
pos5= sb.indexOf(":",pos4+1);
pos6= sb.indexOf(",",pos5+1);
pos7= sb.indexOf(":",pos6+1);
pos8= sb.indexOf(",",pos7+1);
pos9= sb.indexOf(":",pos8+1);
s1.setLength(0);
s1.append(sb.substring(0, pos1));
s2.setLength(0);
s2.append(sb.substring(pos1+1, pos2));
s4.setLength(0);
s4.append(sb.substring(pos3+1, pos4));
s6.setLength(0);
s6.append(sb.substring(pos5+1, pos6));
s8.setLength(0);
s8.append(sb.substring(pos7+1, pos8));
s10.setLength(0);
s10.append(sb.substring(pos9+1));
actConteoDia=Integer.parseInt(s2.toString());
actConteoDia++;
sb.setLength(0);
//sb.append(s1).a
if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
actConteoDia=MAXIMO_LIMITE_POR_SEMANA+1;
}
sb.append(s1).append(":").append(actConteoDia).append(",").append(rs.getString(2).substring(pos2+1, rs.getString(2).length()));
//For every date record it takes aprox 8.3 milisec by record
sSQL.setLength(0);
sSQL.append("UPDATE TCOMERCIALMAIL SET FECHAS='").append(sb.toString()).append("' WHERE MAIL='").append(rs.getString(1)).append("'");
pstmtDet1.addBatch(sSQL.toString());
//actConteoDia=0;
//actConteoDia+=Integer.parseInt(s2.toString());
actConteoDia+=Integer.parseInt(s4.toString());
actConteoDia+=Integer.parseInt(s6.toString());
actConteoDia+=Integer.parseInt(s8.toString());
actConteoDia+=Integer.parseInt(s10.toString());
if(actConteoDia>MAXIMO_LIMITE_POR_SEMANA){
sSQL.setLength(0);
sSQL.append("UPDATE TEMPMAIL SET DIASLIMITE='S' WHERE ").append(colEmail).append("='").append(rs.getString(1)).append("'");
pstmtDet.addBatch(sSQL.toString());
}
sql_com_local_tranx++;
if(sql_com_local_tranx%2000==0 || sql_com_local_tranx%7000==0 ){
brDias.setString("PROCESANDO "+sql_com_local_tranx);
pstmtDet1.executeBatch();
pstmtDet.executeBatch();
}
if(sql_com_local_tranx%100000==0){
System.gc();
System.runFinalization();
}
}
pstmtDet1.executeBatch();
pstmtDet.executeBatch();
cnx.commit();
我已经进行了遥测测试,所以我可以追踪问题所在。我认为这是个大问题,但我不知道问题究竟出在哪里。我正在添加一些遥测测试的图像,我需要正确解释它们。
gc 与 jvm 保持对象存活的时间成反比:
http://imageshack.us/photo/my-images/849/66780403.png
内存堆从 50 MB 变为 250 MB,使用的堆达到 250 MB,从而产生 outOfMemory 异常:
50 MB
http://imageshack.us/photo/my-images/94/52169259.png
达到 250 MB
http://imageshack.us/photo/my-images/706/91313357.png
内存
不足 http://imageshack.us/photo/my-images/825/79083069.png
LiveBytes 排序生成的最终对象堆栈:
http://imageshack.us/photo/my-images/546/95529690.png
任何帮助,建议,答案将不胜感激。