
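I only need to modify the metadata of an OpenOffice file. How can I do this without loading the entire file (file.odt) into memory? I only need to work with the file meta.xml and the tag: ...metadata...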

I am using Apache ODF Toolkit 0.5-incubating. My code loads the meta.xml file, but I cannot get the metadata:

OdfPackage pkg = OdfPackage.loadPackage(new File("file.odt"));
Node d = pkg.getDom("meta.xml").getElementsByTagName("office:document-meta").item(0);

for(int i =0; i<d.getAttributes().getLength();i++) {
  String nombre = d.getAttributes().item(i).getNodeName();
  String valor = d.getAttributes().item(i).getNodeValue();
  System.out.println("Clave: " + nombre + " valor: " + valor);
} 
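
A note on the loop above: it prints the attributes of the root office:document-meta element (typically namespace declarations and office:version), while the metadata values themselves are child elements of office:meta. Below is a minimal sketch that opens only the meta.xml entry of the archive. It uses plain JDK classes rather than the ODF Toolkit, which is a substitution made for this example, and the names OdfMetaReader and readMeta are hypothetical.

```java
import java.io.File;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the ODF Toolkit.
public class OdfMetaReader {
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    /** Parses only meta.xml and returns the child elements of office:meta as name -> text. */
    public static Map<String, String> readMeta(File odf) throws Exception {
        try (ZipFile zip = new ZipFile(odf)) {
            // ZipFile reads the central directory, then only the entry that is requested
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) {
                throw new IllegalArgumentException("no meta.xml in " + odf);
            }
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc;
            try (InputStream in = zip.getInputStream(entry)) {
                doc = factory.newDocumentBuilder().parse(in);
            }
            Map<String, String> result = new LinkedHashMap<>();
            Node meta = doc.getElementsByTagNameNS(OFFICE_NS, "meta").item(0);
            if (meta == null) {
                return result;
            }
            // Walk the child elements (dc:title, meta:creation-date, ...), not the attributes
            NodeList children = meta.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    result.put(child.getNodeName(), child.getTextContent());
                }
            }
            return result;
        }
    }
}
```

Calling OdfMetaReader.readMeta(new File("file.odt")) returns entries keyed by qualified name, such as dc:title. Repeated elements (for example meta:user-defined) overwrite each other in this map, and meta:document-statistic keeps its counts in attributes, so its text value is empty.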

2 Answers


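If you want to work with a variety of file formats, Apache Tika is your best bet. Tika provides a common interface for extracting text and metadata from a large number of formats, and hides the complexity of the different types and formats from you.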

On the command line, to extract the metadata from this sample file, you would run:

java -jar tika-app-1.4.jar --metadata quick.odt

You will get a lot of metadata back:

Author: Jesper Steen Møller
Character Count: 43
Content-Length: 7042
Content-Type: application/vnd.oasis.opendocument.text
Creation-Date: 2005-09-06T23:34:00
Edit-Time: PT2M0S
Image-Count: 0
Keywords: Pangram, fox, dog
Last-Modified: 2005-09-06T23:49:00
Last-Save-Date: 2005-09-06T23:49:00
Object-Count: 0
Page-Count: 1
Paragraph-Count: 1
Table-Count: 0
Word-Count: 9
cp:subject: Gym class featuring a brown fox and lazy dog
creator: Jesper Steen Møller
date: 2005-09-06T23:49:00
dc:creator: Jesper Steen Møller
dc:description: Gym class featuring a brown fox and lazy dog
dc:language: en-US
dc:subject: Pangram, fox, dog
dc:title: The quick brown fox jumps over the lazy dog
dcterms:created: 2005-09-06T23:34:00
dcterms:modified: 2005-09-06T23:49:00
description: Gym class featuring a brown fox and lazy dog
editing-cycles: 5
generator: OpenOffice.org/1.9.125$Win32 OpenOffice.org_project/680m125$Build-8947
initial-creator: Nevin Nollop
language: en-US
meta:author: Jesper Steen Møller
meta:character-count: 43
meta:creation-date: 2005-09-06T23:34:00
meta:image-count: 0
meta:initial-author: Nevin Nollop
meta:object-count: 0
meta:page-count: 1
meta:paragraph-count: 1
meta:save-date: 2005-09-06T23:49:00
meta:table-count: 0
meta:word-count: 9
modified: 2005-09-06T23:49:00
nbCharacter: 43
nbImg: 0
nbObject: 0
nbPage: 1
nbPara: 1
nbTab: 0
nbWord: 9
resourceName: quick.odt
subject: Gym class featuring a brown fox and lazy dog
title: The quick brown fox jumps over the lazy dog
xmpTPg:NPages: 1

From Java, you can get the same result in the following simple way:

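import java.io.File;
import java.io.InputStream;
import org.apache.tika.config.TikaConfig;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.xml.sax.helpers.DefaultHandler;

TikaConfig tika = TikaConfig.getDefaultConfig();
Metadata metadata = new Metadata();
// A no-op handler discards the text content; only the metadata is kept
try (InputStream input = TikaInputStream.get(new File("quick.odt"))) {
    tika.getParser().parse(input, new DefaultHandler(), metadata, new ParseContext());
}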

The metadata will then be available in the Metadata object.

Answered 2013-09-10T20:23:24.607

You can use the OdfDocument class provided by org.odftoolkit. You can get the dependency here => https://mvnrepository.com/artifact/org.odftoolkit/odfdom-java

You can parse your document:

OdfDocument odfDocument = OdfDocument.loadDocument(new URL(URLPath).openStream());

and then read the metadata, for example:

wordCount = odfDocument.getOfficeMetadata().getDocumentStatistic().getWordCount();
Answered 2020-01-10T14:37:40.747
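
Neither answer covers writing metadata back. As a rough sketch, assuming the new meta.xml bytes have already been produced elsewhere: an .odt file is a ZIP archive, so one option is to stream every entry into a new archive and substitute only meta.xml. The class OdfMetaWriter and the method replaceMeta are hypothetical names, and only JDK classes are used. The copy keeps the entry order and the storage method, because the ODF packaging rules expect the mimetype entry to come first and to be uncompressed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the ODF Toolkit or Tika.
public class OdfMetaWriter {
    /** Copies source to target entry by entry, writing newMetaXml in place of meta.xml. */
    public static void replaceMeta(Path source, Path target, byte[] newMetaXml) throws IOException {
        try (ZipFile in = new ZipFile(source.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry original = entries.nextElement();
                if (original.getName().equals("meta.xml")) {
                    // Substitute the new metadata at the same position in the archive
                    out.putNextEntry(new ZipEntry("meta.xml"));
                    out.write(newMetaXml);
                    out.closeEntry();
                    continue;
                }
                ZipEntry copy = new ZipEntry(original.getName());
                copy.setMethod(original.getMethod());
                if (original.getMethod() == ZipEntry.STORED) {
                    // STORED entries need size and CRC up front; reuse the recorded values
                    copy.setSize(original.getSize());
                    copy.setCompressedSize(original.getSize());
                    copy.setCrc(original.getCrc());
                }
                out.putNextEntry(copy);
                // Stream the entry through a small buffer instead of loading it whole
                try (InputStream data = in.getInputStream(original)) {
                    int n;
                    while ((n = data.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
            }
        }
    }
}
```

The target path has to differ from the source, since the source is still being read while the target is written; once the call returns, the new file can be moved over the original, for example with Files.move. If the source has no meta.xml entry, this sketch copies the archive unchanged.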