I'm running boilerpipe in Eclipse and I'm just trying to get it to work. Here is the code:
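package de.l3s.boilerpipe.demo;

import java.io.InputStream;
import java.net.URL;
import org.xml.sax.InputSource;
import de.l3s.boilerpipe.document.TextDocument;
import de.l3s.boilerpipe.extractors.DefaultExtractor;
import de.l3s.boilerpipe.sax.BoilerpipeSAXInput;

// Class name is hypothetical; any name matching the file name works.
public class Demo {
    public static void main(final String[] args) throws Exception {
        final URL url = new URL("http://religion.blogs.cnn.com/2012/11/16/my-take-113th-congress-looks-like-old-america/?hpt=hp_c3");
        // Fetch the page and parse it into boilerpipe's TextDocument model.
        final InputStream urlStream = url.openStream();
        final InputSource is = new InputSource(urlStream);
        final BoilerpipeSAXInput in = new BoilerpipeSAXInput(is);
        final TextDocument doc = in.getTextDocument();
        urlStream.close();
        // Print the text left after boilerplate removal.
        System.out.println(DefaultExtractor.INSTANCE.getText(doc));
        //System.out.println(ArticleExtractor.INSTANCE.getText(doc));
    }
}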
I'm not sure whether I've set it up correctly in Eclipse, but my console only prints:
SAX features:
http://xml.org/sax/features/namespaces
http://xml.org/sax/features/namespace-prefixes
http://xml.org/sax/features/string-interning
http://xml.org/sax/features/validation
http://xml.org/sax/features/external-general-entities
http://xml.org/sax/features/external-parameter-entities