I am importing data using the DIH and need to parse a string, capture two numbers, and then populate a field of type=location (which accepts a "lat,long" coordinate pair). The logical thing to do is:

  <field column="latLong" 
         regex="Latitude is ([-\d.]+)\s+ Longitude is ([-\d.]+)\s+" 
         replaceWith="$1,$2" />

It seems the DIH only knows about a single capture group, so $2 is never used.

Has anyone ever used more than one capture with the regexTransformer? Searching the documentation didn't provide any examples of $2 or $3. What gives, O ye priests of Solr?

1 Answer

It's not true that the Solr DIH doesn't understand $2, $3, and so on.

I just tried it. I added this to the DIH data-config.xml:

<entity name="foo" 
        transformer="RegexTransformer" 
        query="SELECT list_id FROM lists WHERE list_id = ${Lists.id}">
    <field column="firstLastNum" 
           regex="^(\d).*?(\d)$" 
           replaceWith="$1:$2" 
           sourceColName="list_id"/>
</entity>

Then I added the field to my schema.xml:

<field name="firstLastNum" type="string" indexed="true" stored="true"/>

When I index a document with list_id = 390, firstLastNum is 3:0, which is indeed correct.
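To check the same pattern and replacement outside Solr, you can reproduce them with java.util.regex. This is only a sketch. It assumes that replaceWith follows the group-reference semantics of Java's Matcher.replaceAll ($1, $2, ...). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class FirstLastDigits {
    // Same pattern as the regex attribute above: capture the first and the last digit.
    static final Pattern FIRST_LAST = Pattern.compile("^(\\d).*?(\\d)$");

    // Replaces the match with "$1:$2", so both capture groups are used.
    // If there is no match (e.g. a single-digit id), the input is returned unchanged.
    static String firstLast(String listId) {
        return FIRST_LAST.matcher(listId).replaceAll("$1:$2");
    }

    public static void main(String[] args) {
        System.out.println(firstLast("390")); // prints 3:0
    }
}
```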

I suspect the problem is an incorrect regex that matches only the first part and not the second. Maybe try this regex:

regex="Latitude is ([-\d.]+)\s*Longitude is ([-\d.]+)"
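Here is a sketch that compares the question's regex with the suggested one, again assuming replaceWith behaves like Java's Matcher.replaceAll. The sample input string and the class name are hypothetical. With single spaces, the original pattern cannot match. Its "\s+ " requires at least two whitespace characters before "Longitude", and its trailing "\s+" requires whitespace after the longitude. The suggested pattern matches and produces "lat,long".

```java
import java.util.regex.Pattern;

public class LatLongRegex {
    // Regex from the question: "\s+ " needs two or more whitespace characters
    // between the numbers, and the trailing "\s+" needs whitespace at the end.
    static final Pattern ORIGINAL =
        Pattern.compile("Latitude is ([-\\d.]+)\\s+ Longitude is ([-\\d.]+)\\s+");

    // Suggested regex: any amount of whitespace (including none) between the parts,
    // and nothing required after the longitude.
    static final Pattern SUGGESTED =
        Pattern.compile("Latitude is ([-\\d.]+)\\s*Longitude is ([-\\d.]+)");

    // Applies "$1,$2"; returns the input unchanged when the pattern does not match.
    static String toLatLong(Pattern p, String text) {
        return p.matcher(text).replaceAll("$1,$2");
    }

    public static void main(String[] args) {
        String sample = "Latitude is 40.7 Longitude is -74.0"; // hypothetical input
        System.out.println(toLatLong(ORIGINAL, sample));  // no match: prints the input unchanged
        System.out.println(toLatLong(SUGGESTED, sample)); // prints 40.7,-74.0
    }
}
```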

Another possible cause is that latLong is of type location while $1,$2 produces a string, but I'm not sure.

answered 2013-02-23T04:02:17.740