I am trying to run Mallet from Java and I get the following error.

Couldn't open cc.mallet.util.MalletLogger resources/logging.properties file.
Perhaps the 'resources' directories weren't copied into the 'class' directory.
Continuing.

I am trying to run the example from Mallet's website ( http://mallet.cs.umass.edu/topics-devel.php ). My code is below. Any help is appreciated.

package scriptAnalyzer;

import cc.mallet.util.*;
import cc.mallet.types.*;
import cc.mallet.pipe.*;
import cc.mallet.pipe.iterator.*;
import cc.mallet.topics.*;

import java.util.*;
import java.util.regex.*;
import java.io.*;

public class Mallet {

    public static void main(String[] args) throws Exception {

        String filePath = "C:/mallet/ap.txt";
        // Begin by importing documents from text to feature sequences
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();

        // Pipes: lowercase, tokenize, remove stopwords, map to features
        pipeList.add( new CharSequenceLowercase() );
        pipeList.add( new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) );
        pipeList.add( new TokenSequenceRemoveStopwords(new File("stoplists/en.txt"), "UTF-8", false, false, false) );
        pipeList.add( new TokenSequence2FeatureSequence() );

        InstanceList instances = new InstanceList (new SerialPipes(pipeList));

        Reader fileReader = new InputStreamReader(new FileInputStream(new File(filePath)), "UTF-8");
        instances.addThruPipe(new CsvIterator (fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"),
                                               3, 2, 1)); // data, label, name fields

        // Create a model with 5 topics, alpha_t = 0.2, beta_w = 0.01
        //  Note that the first parameter is passed as the sum over topics (alphaSum = 1.0),
        //  while the second is the parameter for a single dimension of the Dirichlet prior.
        int numTopics = 5;
        ParallelTopicModel model = new ParallelTopicModel(numTopics, 1.0, 0.01);

        model.addInstances(instances);

        // Use two parallel samplers, which each look at one half the corpus and combine
        //  statistics after every iteration.
        model.setNumThreads(2);

        // Run the model for 50 iterations and stop (this is for testing only, 
        //  for real applications, use 1000 to 2000 iterations)
        model.setNumIterations(50);
        model.estimate();

        // Show the words and topics in the first instance

        // The data alphabet maps word IDs to strings
        Alphabet dataAlphabet = instances.getDataAlphabet();

        FeatureSequence tokens = (FeatureSequence) model.getData().get(0).instance.getData();
        LabelSequence topics = model.getData().get(0).topicSequence;

        Formatter out = new Formatter(new StringBuilder(), Locale.US);
        for (int position = 0; position < tokens.getLength(); position++) {
            out.format("%s-%d ", dataAlphabet.lookupObject(tokens.getIndexAtPosition(position)), topics.getIndexAtPosition(position));
        }
        System.out.println(out);

        // Estimate the topic distribution of the first instance, 
        //  given the current Gibbs state.
        double[] topicDistribution = model.getTopicProbabilities(0);

        // Get an array of sorted sets of word ID/count pairs
        ArrayList<TreeSet<IDSorter>> topicSortedWords = model.getSortedWords();

        // Show top 5 words in topics with proportions for the first document
        for (int topic = 0; topic < numTopics; topic++) {
            Iterator<IDSorter> iterator = topicSortedWords.get(topic).iterator();

            out = new Formatter(new StringBuilder(), Locale.US);
            out.format("%d\t%.3f\t", topic, topicDistribution[topic]);
            int rank = 0;
            while (iterator.hasNext() && rank < 5) {
                IDSorter idCountPair = iterator.next();
                out.format("%s (%.0f) ", dataAlphabet.lookupObject(idCountPair.getID()), idCountPair.getWeight());
                rank++;
            }
            System.out.println(out);
        }

        // Create a new instance with high probability of topic 0
        StringBuilder topicZeroText = new StringBuilder();
        Iterator<IDSorter> iterator = topicSortedWords.get(0).iterator();

        int rank = 0;
        while (iterator.hasNext() && rank < 5) {
            IDSorter idCountPair = iterator.next();
            topicZeroText.append(dataAlphabet.lookupObject(idCountPair.getID()) + " ");
            rank++;
        }

        // Create a new instance named "test instance" with empty target and source fields.
        InstanceList testing = new InstanceList(instances.getPipe());
        testing.addThruPipe(new Instance(topicZeroText.toString(), null, "test instance", null));

        TopicInferencer inferencer = model.getInferencer();
        double[] testProbabilities = inferencer.getSampledDistribution(testing.get(0), 10, 1, 5);
        System.out.println("0\t" + testProbabilities[0]);
    }

}
4 Answers

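Mallet looks for this logging file if no logging configuration is specified in the system properties. If you are using Maven, the simplest fix is to put the file at

src/main/resources/cc/mallet/util/resources/logging.properties 

The standard Maven build process then copies it automatically to

target/classes/cc/mallet/util/resources/logging.properties 

so you don't need any special configuration. The file can be empty, but it is, logically, left out on purpose so that you can configure your own logging.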
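
A quick way to check that the build really put the file where Mallet can see it is to ask the class loader for it. The sketch below is only an illustration: the class name LoggingResourceCheck is made up, and the resource path is simply the target path above with the target/classes prefix dropped.

```java
import java.net.URL;

class LoggingResourceCheck {

    // Assumed resource path: the Maven target path above, relative to target/classes.
    static final String MALLET_LOGGING_RESOURCE =
            "cc/mallet/util/resources/logging.properties";

    /** Returns the URL of a classpath resource, or null if it is not visible. */
    static URL locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourcePath);
    }

    public static void main(String[] args) {
        URL url = locate(MALLET_LOGGING_RESOURCE);
        if (url == null) {
            System.out.println("not on classpath: " + MALLET_LOGGING_RESOURCE);
        } else {
            System.out.println("found at " + url);
        }
    }
}
```

Run it with the same classpath as your application. If it reports that the file is not on the classpath, the file was not copied and Mallet will most likely print the warning.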

answered 2015-10-16T08:15:32.320

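For anyone else using Maven and trying to configure Mallet logging, try the following:

Create a new text file at src/mallet-resources/logging.properties. It doesn't actually need to specify anything; an empty file is enough to make Mallet shut up.

Then modify your pom.xml to make sure the file is copied to the location mentioned in the other answer. To do that, add the following to the <build><plugins> section: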

<!--Mallet logging is horrifically verbose, and is not easy to configure-->
<!--We have to use this complicated process to copy the logging.properties file to the right location -->
<plugin>
    <artifactId>maven-resources-plugin</artifactId>
    <version>2.6</version>
    <executions>
        <execution>
            <id>copy-resources</id>
            <phase>validate</phase>
            <goals>
                <goal>copy-resources</goal>
            </goals>
            <configuration>
                <outputDirectory>
                    ${basedir}/target/classes/cc/mallet/util/resources
                </outputDirectory>
                <resources>
                    <resource>
                        <directory>src/mallet-resources</directory>
                        <filtering>true</filtering>
                    </resource>
                </resources>
            </configuration>
        </execution>
    </executions>
</plugin>
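
Why an empty file is enough can be seen with java.util.logging alone. The sketch below assumes, without having confirmed it in Mallet's source, that Mallet hands the file's contents to LogManager.readConfiguration; the class name EmptyLoggingConfigDemo is made up. Loading an empty configuration leaves the root logger without handlers, so nothing is printed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.LogManager;
import java.util.logging.Logger;

class EmptyLoggingConfigDemo {

    /** Loads an empty configuration and returns how many root handlers remain. */
    static int rootHandlersAfterEmptyConfig() {
        try {
            // An empty stream stands in for an empty logging.properties file.
            LogManager.getLogManager()
                    .readConfiguration(new ByteArrayInputStream(new byte[0]));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // With no "handlers" property, the root logger gets no handlers.
        return Logger.getLogger("").getHandlers().length;
    }

    public static void main(String[] args) {
        System.out.println("root handlers: " + rootHandlersAfterEmptyConfig());
        // This message has no handler to go to, so it is not printed.
        Logger.getLogger("demo").info("you should not see this");
    }
}
```

Keep in mind that this affects every java.util.logging logger in the JVM, not only Mallet's, until handlers are added again.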
answered 2014-08-19T00:34:23.057

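You will get this error if you try to run Mallet by downloading version 2.0.8-SNAPSHOT ( https://github.com/mimno/Mallet ) or by pulling the currently latest Maven release (2.0.7).

The reason is that Mallet expects the file logging.properties in the target\classes\cc\mallet\util\resources folder created by the build. When the project is built with Maven, this file is not created, so the exception occurs in MalletLogger.java.

Someone should configure Maven properly so that the logging.properties file is created in the target folder. A temporary workaround is to modify the Mallet code for logging.properties.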
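
A workaround that avoids touching Mallet's code might be to give java.util.logging its own configuration file before any Mallet class is loaded. This is a sketch under assumptions: java.util.logging.config.file is the standard JDK property, but whether MalletLogger checks exactly this property should be verified against the Mallet source; the class and method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.LogManager;

class LoggingPropertyWorkaround {

    /** Creates an empty properties file and points java.util.logging at it. */
    static Path useEmptyJulConfig() {
        try {
            Path config = Files.createTempFile("mallet-logging", ".properties");
            config.toFile().deleteOnExit();
            // Standard JDK property naming the logging configuration file.
            System.setProperty("java.util.logging.config.file", config.toString());
            // Re-read the configuration in case logging was already initialized.
            LogManager.getLogManager().readConfiguration();
            return config;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path config = useEmptyJulConfig();
        System.out.println("java.util.logging now configured from " + config);
        // Only after this point should the program touch any Mallet class.
    }
}
```

The same property can be passed on the command line with -Djava.util.logging.config.file=... instead of being set in code.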

answered 2014-07-11T13:19:29.740

Regarding the "Couldn't open edu.umass.cs.mallet.base.util.MalletLogger resources/logging.properties file" error, encountered (for example) when running run.sh (or other scripts or commands) in BANNER named entity recognition (which uses MALLET).

Solution:

Copy "logging.properties" from

src/main/java/edu/umass/cs/mallet/base/util/resources/logging.properties

to

target/scala-2.11/classes/edu/umass/cs/mallet/base/util/resources/logging.properties
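
If you would rather script the copy than do it by hand, it could look like the sketch below. The class and method names are made up, and the two paths in main are the ones given in this answer, resolved relative to the project root.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class CopyLoggingProperties {

    /** Copies source to target, creating missing directories and overwriting. */
    static Path copyInto(Path source, Path target) {
        try {
            Files.createDirectories(target.getParent());
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path source = Paths.get(
                "src/main/java/edu/umass/cs/mallet/base/util/resources/logging.properties");
        Path target = Paths.get(
                "target/scala-2.11/classes/edu/umass/cs/mallet/base/util/resources/logging.properties");
        System.out.println("copied to " + copyInto(source, target));
    }
}
```

A clean build normally deletes the target directory, so the copy has to be repeated afterwards.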

[I am using the BANNER provided at https://github.com/clulab/banner ]

Another error I ran into at the same time (... Logging configuration class "edu.umass.cs.mallet.base.util.Logger.DefaultConfigurator" failed) can safely be ignored:

https://osdir.com/ml/ai.mallet.devel/2007-11/msg00008.html >> "I think this is a bug in the distribution, but it only affects logging. I have been ignoring this warning."

http://comments.gmane.org/gmane.comp.ai.mallet.devel/200 >> "This error should not affect your output."

http://courses.washington.edu/ling572/winter09/teaching_slides/1_08_Mallet.pptx >> slide 20: "Please ignore this message." [Fei Xia, January 2009, "Introduction to Mallet", Andrew McCallum's group at UMass ( https://people.cs.umass.edu/~mccallum/ )]

answered 2015-10-15T00:20:09.393