
Here is the problem:

Data structure in SOLR:

<field name="id" type="string" required="true"/> 
<field name="session_id" type="string" required="true"/> 
<field name="action_type" required="true"/> 
<field name="error_msg" required="false"/>

(All fields have indexed="true" stored="true" multiValued="false".) Only the 'error_msg' field is not required (it can be empty).

There is an equivalent table in Oracle:

CREATE TABLE SOLR_TEST
  (
    ID          NUMBER NOT NULL ,
    SESSION_ID  VARCHAR2(20 BYTE) NOT NULL ,
    ACTION_TYPE VARCHAR2(20 BYTE) NOT NULL ,
    ERROR_MSG   VARCHAR2(20 BYTE)
  );

Sample data (identical in SOLR and Oracle):

ID SESSION_ID           ACTION_TYPE          ERROR_MSG          
-- -------------------- -------------------- --------------------
 1 00001                SELECTED_ACTION                           
 2 00001                SELECTED_ACTION                           
 3 00001                OTHER                                     
 4 00002                A2                   ERROR_001            
 5 00002                OTHER                                     
 6 00003                SELECTED_ACTION      ERROR_002            
 7 00004                A1                   ERROR_001            
 8 00005                A2                                        
 9 00005                SELECTED_ACTION                           
10 00005                SELECTED_ACTION      ERROR_003            
11 00006                SELECTED_ACTION                           
12 00006                OTHER                ERROR_004            

Question:

How can I write a SOLR query that returns all session_ids that have the specified action_type but never had that action_type with a non-empty error_msg?

Or, equivalently, this query in Oracle:

select distinct session_id 
    from SOLR_TEST 
    where action_type='SELECTED_ACTION' 
    and not session_id in 
      ( select session_id 
        from SOLR_TEST 
        where action_type='SELECTED_ACTION' 
              and error_msg is not null
      );
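The expected result can be checked independently of either engine. The following is a minimal Python sketch that models the Oracle query above over the sample rows, assuming None stands in for SQL NULL; it is only a model of the SQL semantics, not an Oracle or Solr client.

```python
# Sample rows from the table above: (id, session_id, action_type, error_msg).
# Assumption: None stands in for SQL NULL.
ROWS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def oracle_query(rows):
    """Model of the SELECT DISTINCT ... NOT IN (...) query above."""
    # Subquery: sessions with at least one SELECTED_ACTION row whose error_msg IS NOT NULL.
    failed = {s for _, s, a, e in rows if a == "SELECTED_ACTION" and e is not None}
    # Outer query: distinct sessions with a SELECTED_ACTION row, minus the subquery result.
    return sorted({s for _, s, a, _ in rows if a == "SELECTED_ACTION" and s not in failed})


print(oracle_query(ROWS))  # ['00001', '00006']
```

The key point is that the exclusion is decided per session, not per row: one failed SELECTED_ACTION row removes the whole session.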

The result of this query is:

SESSION_ID         
--------------------
00001                
00006                

For example, a SOLR query like this does not work:

http://solrhost/solr/collection/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION

// EDIT /////////////////////////////////////

The real schema looks like this:

<schema name="elogging" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="action_type" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
    <field name="session_id" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
    <field name="error_msg" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
  </types>
  <updateRequestProcessorChain name="uniq-fields">
    <processor class="org.apache.solr.update.processor.UniqFieldsUpdateProcessorFactory">
      <lst name="fields">
        <str>id</str>
      </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>
</schema>

// EDIT 2 ////////////////////

The SOLR query does not work as I expected: it returns the equivalent of this SQL:

select distinct session_id 
from SOLR_TEST 
where action_type='SELECTED_ACTION' 
and error_msg is null;

SESSION_ID         
--------------------
00001                
00005                
00006

The value '00005' is wrong, because there is this row:

10 00005                SELECTED_ACTION      ERROR_003            
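The likely reason is that q and fq are evaluated one document at a time: with fq=action_type:SELECTED_ACTION, the query -(error_msg:[* TO *] AND action_type:SELECTED_ACTION) keeps every individual SELECTED_ACTION document that has no error_msg, so document 9 lets session 00005 through even though document 10 in the same session carries ERROR_003. The Python sketch below is a model of that per-document filter plus the session_id facet (not Solr code); on the sample data it reproduces the facet counts of the Solr response further down (00001: 2, 00005: 1, 00006: 1).

```python
from collections import Counter

# Sample documents: (id, session_id, action_type, error_msg); None means the field is absent.
DOCS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def per_document_facet(docs):
    """Model of q=-(error_msg:[* TO *] AND action_type:SELECTED_ACTION),
    fq=action_type:SELECTED_ACTION, facet.field=session_id."""
    counts = Counter()
    for _, session, action, error in docs:
        passes_fq = action == "SELECTED_ACTION"
        passes_q = not (error is not None and action == "SELECTED_ACTION")
        # Each document is judged on its own; other rows of the same session are never consulted.
        if passes_fq and passes_q:
            counts[session] += 1
    return dict(counts)


print(per_document_facet(DOCS))  # {'00001': 2, '00005': 1, '00006': 1}
```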

// EDIT 3 ////////////

This SOLR query does not work either (same problem as before):

http://solrhost/solr/collection/select?rows=1&q=action_type:SELECTED_ACTION+AND+-{!join+from=session_id+to=session_id}error_msg:*+AND+action_type:SELECTED_ACTION&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false
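For reference, {!join from=session_id to=session_id} maps the documents matching its inner query to every document whose session_id equals the session_id of one of them. The Python sketch below only models that semantics on the sample data; it is not Solr code and says nothing about how the query string above is actually parsed. It suggests that negating a join whose inner query is error_msg:* would also drop session 00006 (its OTHER row carries ERROR_004), whereas restricting the inner query to SELECTED_ACTION documents with an error_msg gives the Oracle result.

```python
# Sample documents: (id, session_id, action_type, error_msg); None means the field is absent.
DOCS = [
    (1, "00001", "SELECTED_ACTION", None),
    (2, "00001", "SELECTED_ACTION", None),
    (3, "00001", "OTHER", None),
    (4, "00002", "A2", "ERROR_001"),
    (5, "00002", "OTHER", None),
    (6, "00003", "SELECTED_ACTION", "ERROR_002"),
    (7, "00004", "A1", "ERROR_001"),
    (8, "00005", "A2", None),
    (9, "00005", "SELECTED_ACTION", None),
    (10, "00005", "SELECTED_ACTION", "ERROR_003"),
    (11, "00006", "SELECTED_ACTION", None),
    (12, "00006", "OTHER", "ERROR_004"),
]


def join(docs, inner):
    """Model of {!join from=session_id to=session_id}: ids of all docs that share
    a session_id with at least one doc matching the inner predicate."""
    sessions = {d[1] for d in docs if inner(d)}
    return {d[0] for d in docs if d[1] in sessions}


def selected_not_joined(docs, inner):
    """Sessions of SELECTED_ACTION docs that are NOT in join(inner)."""
    excluded = join(docs, inner)
    return sorted({d[1] for d in docs if d[2] == "SELECTED_ACTION" and d[0] not in excluded})


# Inner query as in the question: any document that has an error_msg.
print(selected_not_joined(DOCS, lambda d: d[3] is not None))  # ['00001']
# Inner query restricted to SELECTED_ACTION documents that have an error_msg.
print(selected_not_joined(DOCS, lambda d: d[2] == "SELECTED_ACTION" and d[3] is not None))  # ['00001', '00006']
```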

// EDIT 4 ///////

*Fixed schema: 'error_msg' is now indexed*

// EDIT 5 /////

Here is the sample data for SOLR:

id,session_id,action_type,error_msg
1,00001,SELECTED_ACTION,
2,00001,SELECTED_ACTION,
3,00001,OTHER,
4,00002,A2,ERROR_001
5,00002,OTHER,
6,00003,SELECTED_ACTION,ERROR_002
7,00004,A1,ERROR_001
8,00005,A2,
9,00005,SELECTED_ACTION,
10,00005,SELECTED_ACTION,ERROR_003
11,00006,SELECTED_ACTION,
12,00006,OTHER,ERROR_004

For this data, the query

http://localhost:8983/solr/collection3/select?rows=1&q=-(error_msg:[*+TO+*]+AND+action_type:SELECTED_ACTION)&wt=xml&indent=true&facet=true&facet.field=session_id&facet.zeros=false&fq=action_type:SELECTED_ACTION

returns:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">30</int>
<lst name="params">
<str name="facet.zeros">false</str>
<str name="facet">true</str>
<str name="indent">true</str>
<str name="q">
-(error_msg:[* TO *] AND action_type:SELECTED_ACTION)
</str>
<str name="facet.field">session_id</str>
<str name="wt">xml</str>
<str name="fq">action_type:SELECTED_ACTION</str>
<str name="rows">1</str>
</lst>
</lst>
<result name="response" numFound="4" start="0">
<doc>
<str name="id">1</str>
<str name="session_id">00001</str>
<str name="action_type">SELECTED_ACTION</str>
<long name="_version_">1449881246216749056</long>
</doc>
</result>
<lst name="facet_counts">
<lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="session_id">
<int name="00001">2</int>
<int name="00005">1</int>
<int name="00006">1</int>
</lst>
</lst>
<lst name="facet_dates"/>
<lst name="facet_ranges"/>
</lst>
</response>

1 Answer


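This is a bit tricky because, as far as I know (I would be glad if someone proved me wrong), it is not possible to reuse part of one query's results in another query (for example, in a filter query or a nested query).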

So, here is the closest I have been able to get so far:

Query

http://localhost:8983/solr/stack19588325/select?q=action_type%3A%22SELECTED_ACTION%22&fq=%7B!tag%3Ddt%7Daction_type%3ASELECTED_ACTION+AND+error_msg%3A%5B*+TO+*%5D+AND+_query_%3A%7B!join+from%3Dsession_id+to%3Dsession_id+v%3D%24qq%7D&rows=0&wt=xml&indent=true&facet=true&facet.mincount=1&facet.field={!ex=dt%20key=nonfilter_session_id}session_id&facet.field=session_id&qq=-error_msg:[*%20TO%20*]
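The URL above is percent-encoded and hard to read. This Python sketch uses the standard urllib.parse module to build the same parameters in readable form; the host and the core name stack19588325 are copied from the URL. The parameters are a list of pairs rather than a dict because facet.field is repeated. urlencode writes spaces as '+' where the original URL partly used %20; both decode to a space.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

BASE = "http://localhost:8983/solr/stack19588325/select"

params = [
    ("q", 'action_type:"SELECTED_ACTION"'),
    # Filter tagged "dt" so that one of the facets can ignore it via {!ex=dt}.
    ("fq", "{!tag=dt}action_type:SELECTED_ACTION AND error_msg:[* TO *] "
           "AND _query_:{!join from=session_id to=session_id v=$qq}"),
    # Inner query of the join, referenced above as $qq.
    ("qq", "-error_msg:[* TO *]"),
    ("rows", "0"),
    ("wt", "xml"),
    ("indent", "true"),
    ("facet", "true"),
    ("facet.mincount", "1"),
    # The same field faceted twice: once ignoring the tagged filter, once with it applied.
    ("facet.field", "{!ex=dt key=nonfilter_session_id}session_id"),
    ("facet.field", "session_id"),
]

url = BASE + "?" + urlencode(params)
print(url)

# Decoding the query string yields exactly the (name, value) pairs above.
assert parse_qsl(urlsplit(url).query) == params
```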

Result

<response>    
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="qq">-error_msg:[* TO *]</str>
    <str name="q">action_type:"SELECTED_ACTION"</str>
    <arr name="facet.field">
      <str>{!ex=dt key=nonfilter_session_id}session_id</str>
      <str>session_id</str>
    </arr>
    <str name="indent">true</str>
    <str name="fq">{!tag=dt}action_type:SELECTED_ACTION AND error_msg:[* TO *] AND _query_:{!join from=session_id to=session_id v=$qq}</str>
    <str name="facet.mincount">1</str>
    <str name="rows">0</str>
    <str name="wt">xml</str>
    <str name="facet">true</str>
    <str name="_">1382878844535</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
</result>
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="nonfilter_session_id">
      <int name="00001">2</int>
      <int name="00005">2</int>
      <int name="00003">1</int>
      <int name="00006">1</int>
    </lst>
    <lst name="session_id">
      <int name="00005">1</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>
</response>

As you can see, there are two different facet results:

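  • nonfilter_session_id - computed with the dt-tagged filter excluded, i.e. over q=action_type:"SELECTED_ACTION" alone, so it lists every session_id that has a SELECTED_ACTION record. The count is the total number of such records for that session_id.
  • session_id - computed with the filter applied; it lists the session_ids that have records both with and without error_msg (as is the case for 00005). The count is the number of that session_id's SELECTED_ACTION records that have an error_msg.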

So, if there is no better option, you can combine these two sets on the client side and keep only the expected session_ids.
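With the facet lists shown above, nonfilter_session_id contains 00001, 00003, 00005 and 00006, while session_id contains only 00005. To end up with exactly 00001 and 00006, every session that has a failed SELECTED_ACTION (00003 and 00005) must be removed. The Python sketch below does that client-side step with the standard xml.etree.ElementTree module. It assumes a hypothetical extra facet list, named failed_session_id here purely for illustration, holding the session_ids that have a SELECTED_ACTION document with a non-empty error_msg (for example, from a second request with fq=action_type:SELECTED_ACTION AND error_msg:[* TO *]); that list is not part of the response above.

```python
import xml.etree.ElementTree as ET

# Trimmed response: nonfilter_session_id is copied from the result above;
# failed_session_id is HYPOTHETICAL (sessions with a SELECTED_ACTION doc that has error_msg).
RESPONSE = """<response>
<lst name="facet_counts">
  <lst name="facet_fields">
    <lst name="nonfilter_session_id">
      <int name="00001">2</int>
      <int name="00005">2</int>
      <int name="00003">1</int>
      <int name="00006">1</int>
    </lst>
    <lst name="failed_session_id">
      <int name="00003">1</int>
      <int name="00005">1</int>
    </lst>
  </lst>
</lst>
</response>"""


def facet_keys(xml_text, facet_name):
    """Return the values with a positive count in one facet_fields list (empty set if absent)."""
    root = ET.fromstring(xml_text)
    facet = root.find(f".//lst[@name='facet_fields']/lst[@name='{facet_name}']")
    if facet is None:
        return set()
    return {item.get("name") for item in facet if int(item.text) > 0}


candidates = facet_keys(RESPONSE, "nonfilter_session_id")
failed = facet_keys(RESPONSE, "failed_session_id")
print(sorted(candidates - failed))  # ['00001', '00006']
```

The set difference, rather than an intersection, is what mirrors the NOT IN of the Oracle query.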

Answered on 2013-10-27T11:53:09.133