
This problem is really driving me crazy.

To answer what most people will think first: yes, I added snowball.jar to the CLASSPATH.

I have a simple main class that should stem the word "going" to "go":

import weka.core.stemmers.SnowballStemmer;

public class StemmerTest {
    public static void main(String[] args) {
        SnowballStemmer stemmer = new SnowballStemmer();
        stemmer.setStemmer("english");
        System.out.println(stemmer.stem("going"));
    }
}

When I run it in Eclipse it works, and I get the following output:

Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
go

However, when I export it from Eclipse as a runnable jar ("stem.jar") and run it in a terminal with "java -jar stem.jar", it doesn't work, and I get the following output:

Refreshing GOE props...
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
going

I don't understand why snowball.jar isn't recognized inside the exported jar, even though both weka.jar and snowball.jar are included in it. This is the structure of stem.jar:

stem.jar
       |
       |---META-INF
       |---org
       |---StemmerTest.class
       |---snowball.jar
       |---weka.jar
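One way to check what the exported jar really contains is to list its entries. The sketch below is a minimal diagnostic, not part of the original question: the class name `JarInspector` is hypothetical and it uses only `java.util.jar`. It reports which entries are themselves jars (jar-in-jar) and whether any class of the `org.tartarus.snowball` package, which is the package Weka's Snowball wrapper expects, sits directly at the top level of the jar.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarInspector {

    /** Returns the entries that are themselves jar files (jar-in-jar). */
    static List<String> nestedJars(List<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (name.endsWith(".jar")) {
                result.add(name);
            }
        }
        return result;
    }

    /** True if at least one .class file lives at or below the given package path. */
    static boolean hasClassesUnder(List<String> entryNames, String packagePath) {
        for (String name : entryNames) {
            if (name.startsWith(packagePath) && name.endsWith(".class")) {
                return true;
            }
        }
        return false;
    }

    /** Reads the names of all entries of a jar file. */
    static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String jarPath = args.length > 0 ? args[0] : "stem.jar";
        List<String> names = entryNames(jarPath);
        System.out.println("Nested jars: " + nestedJars(names));
        System.out.println("Top-level Snowball classes: "
                + hasClassesUnder(names, "org/tartarus/snowball/"));
    }
}
```

Run it as `java JarInspector stem.jar`. Based on the listing above, one would expect snowball.jar and weka.jar in the nested list and `false` for the Snowball check, because the only top-level `org` classes come from Eclipse's jar-in-jar loader.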

I'd appreciate any help with this problem.

Edit 1: the generated ANT script:

<project default="create_run_jar" name="Create Runnable Jar for Project StemmerTest with Jar-in-Jar Loader">
<!--this file was created by Eclipse Runnable JAR Export Wizard-->
<!--ANT 1.7 is required                                        -->
<target name="create_run_jar">
    <jar destfile="stem.jar">
        <manifest>
            <attribute name="Main-Class" value="org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader"/>
            <attribute name="Rsrc-Main-Class" value="StemmerTest"/>
            <attribute name="Class-Path" value="."/>
            <attribute name="Rsrc-Class-Path" value="./ snowball-2012.jar weka.jar snowball.jar"/>
        </manifest>
        <zipfileset src="jar-in-jar-loader.zip"/>
        <zipfileset dir="resources/lib" includes="snowball-2012.jar"/>
        <fileset dir="bin"/>
        <zipfileset dir="." includes="weka.jar"/>
        <zipfileset dir="." includes="snowball.jar"/>
    </jar>
</target>

Edit 2:

Here are the contents of MANIFEST.MF, as requested.

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 23.25-b01 (Oracle Corporation)
Main-Class: org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader
Rsrc-Main-Class: StemmerTest
Rsrc-Class-Path: ./ weka.jar snowball.jar
Class-Path: .
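The manifest mixes two kinds of attributes. `Main-Class` and `Class-Path` are standard JAR manifest attributes that the JVM reads for `java -jar`; `Rsrc-Main-Class` and `Rsrc-Class-Path` are Eclipse-specific and are only interpreted by `JarRsrcLoader`, which loads the nested jars through its own class loader. The `java.class.path` system property still reports only stem.jar. The sketch below (hypothetical class `ManifestDump`, standard `java.util.jar.Manifest` API) parses the manifest from the question and prints both path attributes next to `java.class.path`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.jar.Manifest;

public class ManifestDump {

    /** Splits a space-separated manifest path attribute into its entries; empty if absent. */
    static List<String> pathEntries(Manifest manifest, String attributeName) {
        String value = manifest.getMainAttributes().getValue(attributeName);
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(value.trim().split("\\s+"));
    }

    /** Parses manifest text; the last attribute line must end with a newline. */
    static Manifest parse(String text) {
        try {
            return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Main attributes copied from the MANIFEST.MF in the question
        Manifest manifest = parse("Manifest-Version: 1.0\n"
                + "Main-Class: org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader\n"
                + "Rsrc-Main-Class: StemmerTest\n"
                + "Rsrc-Class-Path: ./ weka.jar snowball.jar\n"
                + "Class-Path: .\n");
        System.out.println("Class-Path (JVM):           " + pathEntries(manifest, "Class-Path"));
        System.out.println("Rsrc-Class-Path (Eclipse):  " + pathEntries(manifest, "Rsrc-Class-Path"));
        System.out.println("java.class.path (this run): " + System.getProperty("java.class.path"));
    }
}
```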

Thanks in advance, TeFa


4 Answers


Although it's still not clear to me why, I managed to solve this annoying problem (after ~10 hours -.-) by doing the following:

  • Using "zipgroupfileset" instead of "fileset"/"zipfileset" for "snowball.jar", so that its contents are extracted (flattened) into the generated jar file instead of being nested inside it as a jar.

  • Excluding "snowball.jar" from the classpath (since its contents are already included in the generated jar file).

For some UNKNOWN reason, the Snowball wrapper in weka.jar couldn't find snowball.jar until it was flattened (extracted).
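A plausible explanation for the "unknown reason" (an assumption, not confirmed anywhere in this thread): Weka's `SnowballStemmer` does not hard-code its list of stemmers but discovers the `org.tartarus.snowball.ext` classes at runtime, and Weka's class-discovery utility appears to walk the entries of the `java.class.path` property. Under `java -jar stem.jar` that property is just stem.jar, so a scan cannot see classes that exist only inside a nested snowball.jar, even though Eclipse's loader could still load them by name. After flattening, the classes sit directly in stem.jar and the scan finds them. The sketch below (hypothetical class `ClasspathScanner`, JDK only) illustrates that kind of scan; it is not Weka's actual code.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ClasspathScanner {

    /**
     * Finds classes in the given package (and subpackages) by walking each class path
     * element: directories are searched recursively, jar files are read with JarFile.
     * Jars nested inside other jars are NOT opened, which is the point of the sketch.
     */
    static List<String> findClasses(String classPath, String packageName) {
        String prefix = packageName.replace('.', '/') + "/";
        List<String> found = new ArrayList<>();
        for (String element : classPath.split(File.pathSeparator)) {
            File file = new File(element);
            if (file.isDirectory()) {
                scanDirectory(file, "", prefix, found);
            } else if (file.isFile() && element.endsWith(".jar")) {
                try (JarFile jar = new JarFile(file)) {
                    Enumeration<JarEntry> entries = jar.entries();
                    while (entries.hasMoreElements()) {
                        String name = entries.nextElement().getName();
                        if (name.startsWith(prefix) && name.endsWith(".class")) {
                            found.add(toClassName(name));
                        }
                    }
                } catch (IOException e) {
                    // unreadable jar: skip it and keep scanning the other elements
                }
            }
        }
        return found;
    }

    private static void scanDirectory(File dir, String relative, String prefix, List<String> found) {
        File[] children = dir.listFiles();
        if (children == null) {
            return;
        }
        for (File child : children) {
            String path = relative + child.getName();
            if (child.isDirectory()) {
                scanDirectory(child, path + "/", prefix, found);
            } else if (path.startsWith(prefix) && path.endsWith(".class")) {
                found.add(toClassName(path));
            }
        }
    }

    private static String toClassName(String path) {
        return path.substring(0, path.length() - ".class".length()).replace('/', '.');
    }

    public static void main(String[] args) {
        String cp = System.getProperty("java.class.path");
        System.out.println("java.class.path = " + cp);
        System.out.println("Snowball classes visible to a class path scan: "
                + findClasses(cp, "org.tartarus.snowball.ext"));
    }
}
```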

Here is the ant script that works for me:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project default="jar">
    <path id="dep.runtime">
        <fileset dir="./libs">
            <include name="**/*.jar" />
            <exclude name="**/snowball.jar"/>
        </fileset>
    </path>

    <manifestclasspath property="manifest_cp" jarfile="stem.jar">
        <classpath refid="dep.runtime" />
    </manifestclasspath>

    <target name="jar">
        <jar destfile="stem.jar">
            <manifest>
                <attribute name="Main-Class" value="StemmerTest"/>
                <attribute name="Class-Path" value="${manifest_cp}"/>
            </manifest>
            <zipgroupfileset dir="./libs" includes="snowball.jar"/>
            <fileset dir="bin"/>
        </jar>
    </target>
</project>

Hope this helps anyone using the Snowball stemmer.

answered 2013-06-22T06:16:20.573

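I followed this approach and it worked. My IDE is NetBeans. I downloaded the jar from here; it's the second option under the Snowball stemmer heading. I added it to my classpath and used the following code to add the stemmer to the filter: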

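import weka.core.stemmers.SnowballStemmer;
import weka.filters.unsupervised.attribute.StringToWordVector;

SnowballStemmer stemmer = new SnowballStemmer();
stemmer.setStemmer("english");
StringToWordVector filter = new StringToWordVector();
filter.setStemmer(stemmer); // tokens are stemmed when the filter builds the word vector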
answered 2014-11-02T05:25:50.087

I got this working after an hour of testing, since the wiki doesn't say anything about it. The solution is:

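import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.stemmers.SnowballStemmer;
import weka.filters.unsupervised.attribute.StringToWordVector;

SnowballStemmer stemmer = new SnowballStemmer();
stemmer.setStemmer("English"); // note: the question uses lowercase "english"
// wordsToKeep = 1000 (per class, if a class attribute is set)
StringToWordVector STWfilter = new StringToWordVector(1000);
STWfilter.setUseStoplist(true);
STWfilter.setIDFTransform(true);
STWfilter.setTFTransform(true);
STWfilter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
STWfilter.setOutputWordCounts(true);
STWfilter.setStemmer(stemmer);
// train: the training Instances, defined elsewhere (setInputFormat throws Exception)
STWfilter.setInputFormat(train);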

I'm posting the whole example to save you the hour I spent finding the right approach.

answered 2013-11-19T19:48:07.120

I had the same problem using Snowball with multiple threads. I solved it like this:

SnowballStemmer st = new SnowballStemmer();
// busy-wait until the German stemmer has been initialized (no timeout)
while (!st.stemmerTipText().contains("german")) {
    Thread.yield();
}
st.setStemmer("german");
filter.setStemmer(st);

The error message "Stemmer 'porter' unknown!" will still appear, but the stemmer is set correctly, i.e., to the German stemmer.
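The loop above has no upper bound: if the German stemmer is never discovered (for example because of the packaging problem from the question), it spins forever. A hedged alternative is to poll with a timeout. The helper below is a sketch; `Await.until` is a hypothetical name, and it takes any condition as a JDK `BooleanSupplier`.

```java
import java.util.function.BooleanSupplier;

public final class Await {

    private Await() {
    }

    /**
     * Polls the condition every pollMillis until it returns true or timeoutMillis elapses.
     * Returns true if the condition became true, false on timeout or interruption.
     */
    public static boolean until(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the condition never became true
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
                return false;
            }
        }
        return true;
    }
}
```

With Weka, the condition would be something like `() -> st.stemmerTipText().contains("german")`, and `st.setStemmer("german")` would only be called if `until` returned true, so a missing stemmer shows up as a timeout instead of a hang.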

answered 2014-05-05T07:56:08.540