
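I am using the code below to read a 5 GB XML file and load its data into a database with Spring Data JPA. This is only sample logic; in the real code we close the InputStream and the xsr object.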

 XMLInputFactory xf = XMLInputFactory.newInstance();
 XMLStreamReader xsr = xf.createXMLStreamReader(new InputStreamReader(new FileInputStream("test.xml")));

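I have configured up to 8 GB of heap (-Xms7000m and -Xmx8000m), but Hibernate still runs out of heap while saving the data, with the error below. It had inserted about 700,000 of the 2,100,000 records in total.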

[dispatcherServlet]    : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space] with root cause

java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.IdentityHashMap.resize(IdentityHashMap.java:472) ~[na:na]
    at java.base/java.util.IdentityHashMap.put(IdentityHashMap.java:441) ~[na:na]
    at org.hibernate.event.internal.DefaultPersistEventListener.entityIsPersistent(DefaultPersistEventListener.java:159) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:124) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl$$Lambda$1620/0x00000008010a3040.applyEventToListener(Unknown Source) ~[na:na]
    at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:113) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.persistOnFlush(SessionImpl.java:765) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.spi.CascadingActions$8.cascade(CascadingActions.java:341) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeToOne(Cascade.java:492) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:416) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:218) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeCollectionElements(Cascade.java:525) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeCollection(Cascade.java:456) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeAssociation(Cascade.java:419) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascadeProperty(Cascade.java:218) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.internal.Cascade.cascade(Cascade.java:151) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.cascadeOnFlush(AbstractFlushingEventListener.java:158) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.prepareEntityFlushes(AbstractFlushingEventListener.java:148) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:81) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.event.internal.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:39) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl$$Lambda$1597/0x0000000801076040.accept(Unknown Source) ~[na:na]
    at org.hibernate.event.service.internal.EventListenerGroupImpl.fireEventOnEachListener(EventListenerGroupImpl.java:102) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.doFlush(SessionImpl.java:1362) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.managedFlush(SessionImpl.java:453) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.flushBeforeTransactionCompletion(SessionImpl.java:3212) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.internal.SessionImpl.beforeTransactionCompletion(SessionImpl.java:2380) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.jdbc.internal.JdbcCoordinatorImpl.beforeTransactionCompletion(JdbcCoordinatorImpl.java:447) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.beforeCompletionCallback(JdbcResourceLocalTransactionCoordinatorImpl.java:183) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl.access$300(JdbcResourceLocalTransactionCoordinatorImpl.java:40) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.commit(JdbcResourceLocalTransactionCoordinatorImpl.java:281) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.hibernate.engine.transaction.internal.TransactionImpl.commit(TransactionImpl.java:101) ~[hibernate-core-5.4.22.Final.jar:5.4.22.Final]
    at org.springframework.orm.jpa.JpaTransactionManager.doCommit(JpaTransactionManager.java:534) ~[spring-orm-5.2.10.RELEASE.jar:5.2.10.RELEASE]

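Based on the trace above, the problem seems to be in Hibernate's save cascade, but I cannot figure it out. Below are the entity classes used to save the data in the database.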

@Data
@EqualsAndHashCode
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
public class UMEnityPK implements Serializable {
    private static final long serialVersionUID=1L;

    private String batchId;
    private Long batchVersion;
    private BigInteger umId;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"batchId", "batchVersion", "umId"})
@Entity
@Table(name ="um_base")
@IdClass(UMEnityPK.class)
public class UMBase {

    @Id private String batchId;
    @Id private Long batchVersion;
    @Id private BigInteger umId;

    private String firstName;
    private String lastName;
    private String umType;
    private String umLevel;

    @OneToMany(mappedBy = "umBase", cascade = CascadeType.ALL)
    private List<UMAddress> umAddresses;

    @OneToMany(mappedBy = "umBase", cascade = CascadeType.ALL)
    private List<UMIdentifier> umIdentifiers;

    @OneToOne(mappedBy = "umBase", cascade = CascadeType.ALL)
    private UMHierarchy umHierarchy;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"id"})
@Entity
@Table(name = "um_identifier")
public class UMIdentifier {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "um_address")
    @SequenceGenerator(name = "um_address", sequenceName = "SEQ_UM_ADDRESS", allocationSize = 1)
    private Long id;

    private String idValue;
    private String idType;
    private String groupType;

    @ManyToOne
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID", referencedColumnName = "batchId"),
            @JoinColumn(name = "BATCH_VERSION", referencedColumnName = "batchVersion"),
            @JoinColumn(name = "UM_ID", referencedColumnName = "umId")
    })
    private UMBase umBase;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"batchId", "batchVersion", "umId"})
@Entity
@Table(name ="um_hierarchy")
public class UMHierarchy {

    @Id
    private String batchId;
    @Id private Long batchVersion;
    @Id private BigInteger umId;

    private String hierarchyTpe;
    private String umStatusCode;
    private String immediateParentId;
    private Date hierarchyDate;

    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID", referencedColumnName = "batchId"),
            @JoinColumn(name = "BATCH_VERSION", referencedColumnName = "batchVersion"),
            @JoinColumn(name = "UM_ID", referencedColumnName = "umId")
    })
    private UMBase umBase;
}

@Data
@Builder(toBuilder = true)
@NoArgsConstructor(access = AccessLevel.PRIVATE)
@AllArgsConstructor(access = AccessLevel.PRIVATE)
@EqualsAndHashCode(of = {"id"})
@Entity
@Table(name = "um_address")
public class UMAddress {
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "um_address")
    @SequenceGenerator(name = "um_address", sequenceName = "SEQ_UM_ADDRESS", allocationSize = 1)
    private Long id;
    private String addressType;
    private String addressLine1;
    private String getAddressLine2;
    private String city;
    private String state;
    private String postalCode;
    private String country;

    @ManyToOne
    @JoinColumns({
            @JoinColumn(name = "BATCH_ID", referencedColumnName = "batchId"),
            @JoinColumn(name = "BATCH_VERSION", referencedColumnName = "batchVersion"),
            @JoinColumn(name = "UM_ID", referencedColumnName = "umId")
    })
    private UMBase umBase;
}

Is there anything in the Hibernate entity mapping that would consume this much memory?


2 Answers


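After checking the heap dump, the memory turned out to be held by org.hibernate.engine.StatefulPersistenceContext -> org.hibernate.util.IdentityMap: the persistence context keeps a reference to every entity it manages until it is cleared or closed. The approach below worked fine for me. Create a custom JpaRepository implementation with a method like this sample, which commits the transaction and clears the EntityManager every batchSize entities.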

public <S extends T> void saveInBatch(Iterable<S> entities) {

        if (entities == null) {
            return;
        }

        EntityManager entityManager = entityManagerFactory.createEntityManager();
        EntityTransaction entityTransaction = entityManager.getTransaction();

        try {
            entityTransaction.begin();

            int i = 0;
            for (S entity : entities) {
                if (i % batchSize == 0 && i > 0) {
                    entityTransaction.commit();
                    entityTransaction.begin();

                    entityManager.clear();
                }

                entityManager.persist(entity);
                i++;
            }
            entityTransaction.commit();
        } catch (RuntimeException e) {
            if (entityTransaction.isActive()) {
                entityTransaction.rollback();
            }

            throw e;
        } finally {
            entityManager.close();
        }
    }
}
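
To illustrate why the periodic entityManager.clear() matters, here is a toy model, not Hibernate code. ToyPersistenceContext and simulate are hypothetical names; the only thing borrowed from the question is that the stack trace fails inside IdentityHashMap.resize, which is consistent with an identity map that keeps growing. The sketch assumes nothing more than "every persisted object stays referenced until clear() is called".

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Toy stand-in for a persistence context (hypothetical, NOT Hibernate's class).
final class ToyPersistenceContext {

    // Every "managed" object stays strongly reachable from this map until clear().
    private final Map<Object, Boolean> managed = new IdentityHashMap<>();
    private int peak = 0;

    void persist(Object entity) {
        managed.put(entity, Boolean.TRUE);
        peak = Math.max(peak, managed.size());
    }

    void clear() {
        managed.clear();
    }

    int peak() {
        return peak;
    }

    /**
     * Persists count objects, clearing the context every batchSize objects
     * (batchSize 0 means "never clear"). Returns the largest number of
     * objects that were referenced at the same time.
     */
    static int simulate(int count, int batchSize) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        for (int i = 0; i < count; i++) {
            if (batchSize > 0 && i > 0 && i % batchSize == 0) {
                ctx.clear();
            }
            ctx.persist(new Object());
        }
        return ctx.peak();
    }
}
```

Without clearing, the peak equals the total number of objects; with clearing every batchSize objects, it never exceeds batchSize. The same reasoning applies to saveInBatch above, where the peak is bounded by batchSize entities plus whatever they cascade to.
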
answered 2021-07-20T09:05:16.760

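When converting a data set this large, you need to work in batches. Read 100 records from the XML, convert them to entities, save each one with em.persist(record), then call em.flush() and em.clear() to remove them from Hibernate's persistence context, and clear them from your local collection as well. You can then request a garbage collection with System.gc(); this is only a hint to the JVM, which would reclaim the unreachable entities on its own anyway. You may even want to use Hibernate's batching as described in this tutorial.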

In pseudocode, this would be:

boolean finished = false;
List<Entity> locals = new ArrayList<>(100);
while (!finished) {
  for (int records = 0; records < 100; records++) {
    Entity ent = readEntityFrom(xml);
    // readEntityFrom must return null when no records remain to read
    if (ent == null) {
      finished = true;
      break;
    }
    locals.add(ent);
  }
  for (Entity ent : locals) em.persist(ent);
  em.flush(); // send any pending inserts to the database
  em.clear(); // detach the entities so Hibernate no longer references them
  locals.clear(); // drop the references we hold to these entities
  // the entities are now unreachable and eligible for garbage collection
  System.gc(); // optional hint only; the JVM decides when to actually collect
}
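
The reading half of the pseudocode can be sketched with the JDK's StAX API alone. This is a minimal sketch under stated assumptions, not the asker's parser: XmlBatchReader, readInBatches, recordTag and sink are hypothetical names, and each record is assumed to be a text-only element, which the real 5 GB file almost certainly is not. In a real import the sink would persist the chunk, then call em.flush() and em.clear().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams records and hands them over in fixed-size chunks.
final class XmlBatchReader {

    /**
     * Reads the text of every recordTag element and passes it to sink in
     * chunks of at most batchSize. Returns the number of records read.
     */
    static int readInBatches(InputStream in, String recordTag, int batchSize,
                             Consumer<List<String>> sink) {
        int total = 0;
        try {
            XMLStreamReader xsr = XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                List<String> batch = new ArrayList<>(batchSize);
                while (xsr.hasNext()) {
                    if (xsr.next() == XMLStreamConstants.START_ELEMENT
                            && recordTag.equals(xsr.getLocalName())) {
                        batch.add(xsr.getElementText()); // assumes text-only records
                        total++;
                        if (batch.size() == batchSize) {
                            sink.accept(batch);
                            // Start a fresh list so this reader drops its reference to the chunk.
                            batch = new ArrayList<>(batchSize);
                        }
                    }
                }
                if (!batch.isEmpty()) {
                    sink.accept(batch); // final, possibly smaller chunk
                }
            } finally {
                xsr.close(); // closes the reader, not the underlying stream
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return total;
    }

    public static void main(String[] args) {
        String xml = "<ums><um>a</um><um>b</um><um>c</um></ums>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        int total = readInBatches(in, "um", 2, chunk -> System.out.println("chunk: " + chunk));
        System.out.println("records read: " + total);
    }
}
```

Handing the byte stream to the parser directly, rather than wrapping it in an InputStreamReader, lets the parser take the encoding from the XML declaration instead of the platform default charset.
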

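Also, you may want to manually begin and commit a transaction around each insert loop, so that the database does not have to hold your entire import in a single transaction; otherwise the database, rather than the Java application, may be the one that runs out of memory.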
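
For the Hibernate batching mentioned above, the settings could be collected as follows. This is a sketch, not the configuration from the linked tutorial: HibernateBatchSettings and jdbcBatchProperties are hypothetical names, and the batch size is an assumption you would tune. The three property keys are standard Hibernate settings.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects Hibernate's JDBC batching settings.
final class HibernateBatchSettings {

    /** Builds JPA properties that ask Hibernate to group statements into JDBC batches. */
    static Map<String, Object> jdbcBatchProperties(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Object> props = new LinkedHashMap<>();
        // Number of statements Hibernate sends to the driver in one JDBC batch.
        props.put("hibernate.jdbc.batch_size", batchSize);
        // Group inserts and updates by entity type so consecutive statements can share a batch.
        props.put("hibernate.order_inserts", true);
        props.put("hibernate.order_updates", true);
        return props;
    }
}
```

The map can be passed to Persistence.createEntityManagerFactory(unitName, properties); in a Spring Boot application the same keys are usually set with the spring.jpa.properties. prefix. Note that with allocationSize = 1, as in the entities in the question, Hibernate still makes one sequence call per new row, so a larger allocation size may be worth testing alongside these settings.
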

answered 2021-07-15T13:38:34.940