I am using the ExtractingRequestHandler to index HTML and PDF files. In addition to what Tika extracts, I want to add my own metadata, so I use the literal.<fieldname>=<value> parameters. Unless I use dynamic fields ("*_s"), the data is not saved; only the id field seems to work as advertised. I'm sure that I'm doing something wrong. My schema.xml field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- The following fields don't work, need to use dynamic fields for some reason -->
<field name="region" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="href" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="services" type="text_general" indexed="false" stored="true" multiValued="true" />

My Solrj code:

    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    ContentStream contentStream = new ContentStreamBase.StringStream(contentBean.getContent());
    req.addContentStream(contentStream);

    req.setParam("literal.region", region);

    req.setParam("literal.href", contentBean.getHref());

    req.setParam("literal.id", getDocId(url));
    for (Map.Entry<String,String> entry : getFacetsFromURL(url).entrySet()) {
        logger.info("Setting facet field {} to {}", entry.getKey(), entry.getValue());
        req.setParam("literal." + entry.getKey(), entry.getValue());
    }
    // index h1 tag
    req.setParam("fmap.tags_h1", "h1");
    req.setParam("capture", "h1");
    // index img tag
    req.setParam("fmap.img", "tags_img");
    req.setParam("capture", "img");
    // lowercase tag names
    req.setParam("lowernames", "true");
    /*
     * Passing commitWithin as a parameter seems
     * to be the only way to get it to work with
     * this request handler
     */
    req.setParam("commitWithin", "10000");
    /*
     * Now do the work!
     */
    req.process(server);

Changing region to region_s and href to href_s, and appending _s to each key in the map, works. I don't understand why region and the other fields are not saved unless they match the *_s dynamic field in the schema.

I also noticed another issue. I tried to use a copyField to copy one of the literal fields into a field for faceting, but I never see any data in the facet field. Here is one of the ways I tried this:

<field name="services_facet" type="string" indexed="true" stored="false" multiValued="true" />
<copyField source="services_s" dest="services_facet"/>

There is never anything in services_facet. I can facet on services_s but shouldn't this work? Is Solr-Cell broken or just poorly documented?

1 Answer

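This problem was caused by an old Jetty session that was still running Solr, which prevented the schema updates from being picked up. Once I killed Jetty, things worked as expected.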
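
One way to picture why the *_s names were stored while region was not: if the running instance still holds an older schema, only names that resolve against that older schema (here id and the *_s dynamic field) are kept. The following is a minimal sketch of that lookup, not Solr's actual code. The class SchemaFieldCheck, its methods, and the sample values are hypothetical. It only models leading-wildcard dynamic fields, and it assumes that literal fields with no match in the loaded schema are silently dropped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Simplified, hypothetical model of schema field lookup; NOT Solr's implementation. */
final class SchemaFieldCheck {

    /** True if name is an explicit field or matches a "*suffix" dynamic-field pattern. */
    static boolean resolves(String name, Set<String> explicitFields, List<String> dynamicPatterns) {
        if (explicitFields.contains(name)) {
            return true;
        }
        for (String pattern : dynamicPatterns) {
            // Only leading-wildcard patterns such as "*_s" are modelled here.
            if (pattern.startsWith("*") && name.endsWith(pattern.substring(1))) {
                return true;
            }
        }
        return false;
    }

    /** Returns the field names taken from literal.* params that the given schema cannot resolve. */
    static Set<String> unresolvedLiterals(Map<String, String> params,
                                          Set<String> explicitFields,
                                          List<String> dynamicPatterns) {
        Set<String> missing = new TreeSet<>();
        for (String key : params.keySet()) {
            if (!key.startsWith("literal.")) {
                continue; // not a literal parameter, e.g. commitWithin
            }
            String field = key.substring("literal.".length());
            if (!resolves(field, explicitFields, dynamicPatterns)) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed stale schema: only "id" and the "*_s" dynamic field are loaded.
        Set<String> staleFields = new TreeSet<>(Arrays.asList("id"));
        List<String> dynamic = Arrays.asList("*_s");

        Map<String, String> params = new LinkedHashMap<>();
        params.put("literal.id", "doc-1");
        params.put("literal.region", "emea");
        params.put("literal.href_s", "http://example.com/");
        params.put("commitWithin", "10000");

        System.out.println("Unresolved literal fields: "
                + unresolvedLiterals(params, staleFields, dynamic));
    }
}
```

Running the same check against the field list the live instance actually reports, rather than the schema.xml on disk, would show whether the edited schema has been loaded.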

answered 2014-07-22T15:50:49.840