java - How do I stream a large Blob from the database to the application using JPA?

Question

I have a JPA entity class that contains a blob field like this:

@Entity
public class Report {
    private Long id;
    private byte[] content;

    @Id
    @Column(name = "report_id")
    @SequenceGenerator(name = "REPORT_ID_GENERATOR", sequenceName = "report_sequence_id", allocationSize = 1)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "REPORT_ID_GENERATOR")
    public Long getId() {
        return id;
    }
    public void setId(Long id) {
        this.id = id;
    }
    @Lob
    @Column(name = "content")
    public byte[] getContent() {
        return content;
    }

    public void setContent(byte[] content) {
        this.content = content;
    }
}

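I inserted some large data (more than 3 GB) into one record of the database, using a DBMS procedure.

Application users should be able to download the content of such records, so I implemented a method that streams the fetched result to the client's browser.

The problem is that a JPQL select query tends to fetch the entire object from the DB first and only then hand it to the application, so whenever I try to access this record through JPA I get an "unable to allocate enough memory" exception.

I have seen some solutions that use a JDBC connection to stream the data from the database, but I could not find any JPA-specific solution.

Does anyone have a clue how I should solve this problem?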

score 3 · Accepted Answer

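This is a late answer, but for those still looking for a solution, I found a good article by Thorben Janssen on the Thoughts on Java blog. The drawback is that it is Hibernate-specific, but you seem to be using Hibernate anyway. Basically, the solution is to use java.sql.Blob (and java.sql.Clob) data type attributes in your entity: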

@Entity
public class Book {

    @Id
    @GeneratedValue
    private Long id;

    private String title;

    @Lob
    private Clob content;

    @Lob
    private Blob cover;

    ...
}

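You then use Hibernate's BlobProxy to create the Blob from an InputStream when writing, and read the content back through the stream that the Blob exposes. But have a look at the article for the details.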
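Reading such a Blob attribute back without materializing it as a byte[] could look like the sketch below. It relies only on java.sql.Blob.getBinaryStream(); the class name BlobStreaming, the method copyBlob, and the 8 KB buffer size are illustrative assumptions, not something taken from the article or from Hibernate. Whether the bytes are really streamed rather than buffered depends on the JDBC driver, and the Blob generally has to be read while the session and transaction are still open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.SQLException;

// Hypothetical helper, not part of Hibernate or JPA.
final class BlobStreaming {

    private BlobStreaming() {
    }

    // Copies the Blob to the target stream in fixed-size chunks
    // and returns the number of bytes copied.
    static long copyBlob(Blob blob, OutputStream target) throws SQLException, IOException {
        long total = 0;
        try (InputStream in = blob.getBinaryStream()) {
            byte[] buffer = new byte[8192]; // assumed chunk size
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Write only the bytes that were actually read in this pass.
                target.write(buffer, 0, read);
                total += read;
            }
        }
        target.flush();
        return total;
    }
}
```

With the Book entity above, a call might be BlobStreaming.copyBlob(book.getCover(), out), assuming a getCover() accessor and an output stream obtained from the HTTP response, neither of which is shown in the answer.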

score 2 · Accepted Answer

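I solved the problem in the following manner. Note that this solution may only work with the Hibernate implementation of JPA.

First, I obtained a Hibernate session from the entity manager.
Then I created a prepared statement that selects the blob, using the connection extracted from the session.
Then I produced an input stream from the result set of the prepared statement.

Here is the DAO class used to stream the content: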

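import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import javax.persistence.EntityManager;       // jakarta.persistence on newer stacks
import javax.persistence.PersistenceContext;

import org.hibernate.Session;
import org.hibernate.jdbc.Work;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Repository;

@Repository
public class ReportDAO {

    private static final Logger logger = LoggerFactory.getLogger(ReportDAO.class);

    @PersistenceContext
    private EntityManager entityManager;

    // streamToWrite is the stream used to deliver the content to the client
    public void streamReportContent(final Long id, final OutputStream streamToWrite) {
        // Use a dedicated EntityManager instead of overwriting the injected one, and close it when done
        EntityManager em = entityManager.getEntityManagerFactory().createEntityManager();
        try {
            Session session = em.unwrap(Session.class);
            session.doWork(new Work() {
                @Override
                public void execute(Connection connection) throws SQLException {
                    // report_id is the column that Report.getId() is mapped to in the question
                    try (PreparedStatement stmt = connection.prepareStatement(
                            "SELECT content FROM report WHERE report_id = ?")) {
                        stmt.setLong(1, id);
                        try (ResultSet rs = stmt.executeQuery()) {
                            // Only read the blob if a row was actually returned
                            if (rs.next()) {
                                Blob blob = rs.getBlob(1);
                                try (InputStream input = blob.getBinaryStream()) {
                                    byte[] buffer = new byte[1024];
                                    int bytesRead;
                                    // Write only the bytes read in each pass, not the whole buffer
                                    while ((bytesRead = input.read(buffer)) != -1) {
                                        streamToWrite.write(buffer, 0, bytesRead);
                                    }
                                } catch (IOException e) {
                                    logger.error("Failure in streaming report", e);
                                }
                            }
                        }
                    }
                }
            });
        } catch (Exception e) {
            logger.error("A problem happened while streaming the report", e);
        } finally {
            em.close();
        }
    }
}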

score 0 · Accepted Answer

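Since you are using a relational database, storing large (gigabyte-sized) data files in it as BLOBs is not good practice. The usual approach is to store the data itself as a file on a server (possibly FTP) and to keep only the metadata about it (the path of the file, the server, and so on) in a database column. Streaming the data to the client then becomes much easier.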
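A minimal sketch of that idea, assuming the entity keeps only the file's location in a metadata column. The class name FileBackedReport and the column name mentioned in the comment are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: the database row holds only the file's location.
final class FileBackedReport {

    // Value read from a metadata column, e.g. "content_path" (assumed name).
    private final String storedPath;

    FileBackedReport(String storedPath) {
        this.storedPath = storedPath;
    }

    // Streams the file to the given output and returns the number of bytes written.
    // Files.copy reads and writes in chunks, so the file is never held in memory as a whole.
    long streamTo(OutputStream target) throws IOException {
        Path file = Paths.get(storedPath);
        return Files.copy(file, target);
    }
}
```

The trade-off is that the file and its database row are no longer covered by a single transaction, so deletes and backups have to be coordinated by the application.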

score 0 · Accepted Answer

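You should take a look at the community project Spring Content. It gives you a Spring Data-like approach to content: it is to unstructured data (documents, images, videos, etc.) what Spring Data is to structured data. You could add it with something like the following:

pom.xml (Spring Boot starters are also available)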

   <!-- Java API -->
   <dependency>          
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-jpa</artifactId>
      <version>0.9.0</version>
   </dependency>
   <!-- REST API -->
   <dependency>
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-rest</artifactId>
      <version>0.9.0</version>
   </dependency>

Configuration

@Configuration
@EnableJpaStores
@Import(org.springframework.content.rest.config.RestConfiguration.class) // enables the REST API
public class ContentConfig {

   // specify the resource specific to your database
   @Value("/org/springframework/content/jpa/schema-drop-h2.sql")
   private Resource dropBlobTables; // org.springframework.core.io.Resource

   // specify the resource specific to your database
   @Value("/org/springframework/content/jpa/schema-h2.sql")
   private Resource createBlobTables;

   @Bean
   DataSourceInitializer datasourceInitializer() {
     ResourceDatabasePopulator databasePopulator =
            new ResourceDatabasePopulator();

     databasePopulator.addScript(dropBlobTables);
     databasePopulator.addScript(createBlobTables);
     databasePopulator.setIgnoreFailedDrops(true);

     DataSourceInitializer initializer = new DataSourceInitializer();
     initializer.setDataSource(dataSource());
     initializer.setDatabasePopulator(databasePopulator);

     return initializer;
   }
}

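NOTE: this configuration is not needed if you use the Spring Boot starters.

To associate content, add the Spring Content annotations to your entity (Report in this case).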

Example.java

@Entity
public class Report {

   // replace @Lob field with:

   @ContentId
   private String contentId;

   @ContentLength
   private long contentLength = 0L;

   // if you have rest endpoints
   @MimeType
   private String mimeType = "text/plain";
}

Create a "store":

ExampleStore.java

@StoreRestResource(path="reportContent")
public interface ReportContentStore extends ContentStore<Report, String> {
}

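That is all you need to create REST endpoints at /reportContent. When your application starts, Spring Content looks at your dependencies (seeing Spring Content JPA/REST), looks at your ReportContentStore interface, and injects an implementation of that interface for JPA. It also injects a @Controller that forwards HTTP requests to that implementation. This saves you from having to implement any of this yourself.

So...

curl -X POST /reportContent/{reportId} -F 'data=@path/to/local/file'

will store the content of path/to/local/file in the database and associate it with the report entity whose id is reportId.

curl /reportContent/{reportId}

will fetch it again, and so on. Full CRUD is supported.

The project provides a couple of getting-started guides and videos, as well as a reference guide.

HTH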

score 0 · Accepted Answer

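I had a problem similar to yours: I needed to store JSON in a field, and by using a BLOB I caused myself a lot of unnecessary headaches. You are using a blob for content-type data; I respectfully suggest that you use a CLOB for data in character format.

To sum up my answer: if you are using an ORACLE database (a database that always causes problems when it comes to speaking its language), use the following format as a guide or best practice. It is based on the Oracle documentation itself: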

@Lob @Basic(fetch = FetchType.LAZY)
@Column(name="REPORT")
protected String report;

Good luck!

score -2 · Accepted Answer

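Maybe you could compress the files with a compression algorithm (e.g. lossy and lossless compression, Huffman, Facebook's Zstandard), store them in your database in compressed form, and decompress them on retrieval.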
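As a rough sketch of that suggestion, the snippet below compresses and decompresses in a streaming fashion. It uses GZIP from the JDK as a stand-in, because Zstandard is not part of the standard library; the class name StreamCompression and its methods are illustrative assumptions. Compression only reduces the size of the stored data: the result still has to be streamed, and data that is already compressed will barely shrink.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that (de)compresses chunk by chunk.
final class StreamCompression {

    private StreamCompression() {
    }

    // Compresses everything read from raw into compressed.
    // Closing the GZIP stream writes the trailer and also closes the target stream.
    static void compress(InputStream raw, OutputStream compressed) throws IOException {
        try (GZIPOutputStream gzip = new GZIPOutputStream(compressed)) {
            copy(raw, gzip);
        }
    }

    // Decompresses GZIP data read from compressed into raw.
    static void decompress(InputStream compressed, OutputStream raw) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(compressed)) {
            copy(gzip, raw);
        }
        raw.flush();
    }

    // Plain chunked copy so that only one buffer is held in memory at a time.
    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}
```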
