    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>0.9</version>
    </dependency>

I tried using the following in place of the Tika dependency above, to override Tika's PDFBox dependency with version 1.6.0, but it doesn't work:

    <dependency>
        <groupId>org.apache.tika</groupId>
        <artifactId>tika-parsers</artifactId>
        <version>0.9</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>pdfbox</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>1.6.0</version>
    </dependency>

Tika Parsers depends on PDFBox version 1.4.0. I want to change this Apache Tika dependency to PDFBox version 1.6.0. How can I do this in my pom.xml? Here is my pom.xml file. Any suggestions would be appreciated.

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.xyz.search</groupId>
        <artifactId>xyzz-crawler4j</artifactId>
        <version>0.0.1-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>qcom-crawler4j</name>
        <url>http://maven.apache.org</url>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <repositories>
            <repository>
                <id>repo-for-dsiutils</id>
                <url>http://ir.dcs.gla.ac.uk/~bpiwowar/maven/</url>
            </repository>
            <repository>
                <id>JBoss</id>
                <name>jboss-maven2-release-repository</name>
                <url>https://oss.sonatype.org/content/repositories/JBoss</url>
            </repository>
            <repository>
                <id>oracle</id>
                <url>http://download.oracle.com/maven</url>
            </repository>

            <repository>
                <id>boilerpipe</id>
                <url>http://boilerpipe.googlecode.com/svn/repo/</url>
            </repository>
        </repositories>

        <dependencies>

            <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpclient</artifactId>
                <version>4.0.1</version>
                <!-- 4.1.1 -->
            </dependency>

            <!-- PDFBox version 1.6.0 -->
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>pdfbox</artifactId>
                <version>1.6.0</version>
            </dependency>

            <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpcore</artifactId>
                <version>4.0.1</version>
            </dependency>
            <!-- 4.1 -->

            <dependency>
                <groupId>it.unimi.dsi</groupId>
                <artifactId>fastutil</artifactId>
                <version>6.2.2</version>
            </dependency>

            <dependency>
                <groupId>com.sleepycat</groupId>
                <artifactId>je</artifactId>
                <version>4.0.71</version>
            </dependency>

            <!-- Boilerpipe -->
            <dependency>
                <groupId>de.l3s.boilerpipe</groupId>
                <artifactId>boilerpipe</artifactId>
                <version>1.2.0</version>
            </dependency>
            <!-- Tika (for non-HTML extractions) -->
            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-core</artifactId>
                <version>0.9</version>
            </dependency>

            <dependency>
                <groupId>xerces</groupId>
                <artifactId>xercesImpl</artifactId>
                <version>2.8.1</version>
            </dependency>

            <dependency>
                <groupId>nekohtml</groupId>
                <artifactId>nekohtml</artifactId>
                <version>0.6.5</version>
            </dependency>

            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-parsers</artifactId>
                <version>0.9</version>
            </dependency>
            <!-- I tried using the dependencies below in place of the tika-parsers
                 dependency just above, to override Tika's PDFBox dependency with
                 version 1.6.0, but it isn't working. -->

            <!--
            <dependency>
                <groupId>org.apache.tika</groupId>
                <artifactId>tika-parsers</artifactId>
                <version>0.9</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.apache.pdfbox</groupId>
                        <artifactId>pdfbox</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.pdfbox</groupId>
                <artifactId>pdfbox</artifactId>
                <version>1.6.0</version>
            </dependency>
            -->

        </dependencies>
    </project>

1 Answer


The cleanest way to do this is probably to add a dependencyManagement section that upgrades the PDFBox version throughout your dependency tree. For example:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.pdfbox</groupId>
      <artifactId>pdfbox</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
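In case it helps to see where this goes: the `dependencyManagement` element sits directly under `<project>`, next to `<dependencies>` rather than inside it. Below is a trimmed-down sketch of the question's POM with it added; the repositories and the remaining dependencies are left out for brevity, and the XML comment placeholder stands in for them. Because managed versions also apply to transitive dependencies, the `pdfbox` artifact that `tika-parsers` 0.9 pulls in should then resolve to 1.6.0 without any `exclusions`.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.xyz.search</groupId>
  <artifactId>xyzz-crawler4j</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <!-- Pins the PDFBox version everywhere in the dependency tree,
       including the copy that tika-parsers brings in transitively -->
  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>1.6.0</version>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <!-- No exclusion needed: the managed version above takes effect -->
    <dependency>
      <groupId>org.apache.tika</groupId>
      <artifactId>tika-parsers</artifactId>
      <version>0.9</version>
    </dependency>
    <!-- other dependencies from the question go here, unchanged -->
  </dependencies>
</project>
```

To confirm the override took effect, the output of `mvn dependency:tree` should report `org.apache.pdfbox:pdfbox` at version 1.6.0.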

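Note that many Tika parsers are tightly coupled to specific versions of upstream parser libraries like PDFBox, so if you override dependency versions like this, you'll need to test your system well.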

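An alternative to forcing the dependency version change is to use the latest Tika trunk, where the PDFBox dependency is already at version 1.6.0. Also, the Tika 0.10 release, which will use the updated dependencies, should be out early next week.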
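If you go the upgrade route instead, the POM change is just the version of the two Tika artifacts, kept in sync with each other. A sketch, assuming the 0.10 release mentioned above has reached your Maven repository; since 0.10 is expected to use the updated dependencies, the explicit PDFBox override should then no longer be needed, but check the resolved version before relying on that.

```xml
<!-- Tika (for non-HTML extractions), upgraded from 0.9;
     assumes the 0.10 release is available in your repository -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>0.10</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>0.10</version>
</dependency>
```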

answered 2011-09-22T09:59:34.000