
I have a Solr query:

group.field=USER_TYPE&group.limit=3&group.format=grouped

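This groups the results by user type perfectly. However, the results within each group contain duplicates. Each user has a unique user_id and a user_group, and a single user can belong to multiple user_groups. As a result, the same user_id appears several times, once per user_group. I want the grouped results to contain no duplicate user_id values.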

{"groupValue":"A","doclist":{"numFound":849956,"start":0,"maxScore":9.992027,
    "docs":[
        {"user_group":"GPA","user_id":"4443510",.....},
        {"user_group":"GPB","user_id":"4443510",.....},
        {"user_group":"GPC","user_id":"4443510",.....},
        ....
        ]

Can anyone help me avoid the duplicates in this case?

Edit: The result I am expecting would look something like this:

{"groupValue":"A","doclist":{"numFound":849956,"start":0,"maxScore":9.992027,       
    "docs":[
        {"groupValue":"4443510",
            "docs":[            
                {"user_group":"GPA","user_id":"4443510",.....},
                {"user_group":"GPB","user_id":"4443510",.....},
                {"user_group":"GPC","user_id":"4443510",.....},
                ....
                ]
        ....
        ]

1 Answer


I don't think it is possible to do grouping within a group.

However, I think you could solve this by changing the way you index the data.

Currently, you have multiple documents for each user_id:

 "docs":[            
                {"user_group":"GPA","user_id":"4443510",.....},
                {"user_group":"GPB","user_id":"4443510",.....},
                {"user_group":"GPC","user_id":"4443510",.....},
                ....
                ]

You can modify it as follows to solve the issue:

 "docs":[            
                {"user_group":["GPA","GPB","GPC"],"user_id":"4443510",.....},
                {"user_group":["GPB"],"user_id":"4443511",.....},
                {"user_group":["GPA","GPC"],"user_id":"4443512",.....},
                ....
                ]

In other words, make user_group multivalued, so that you have only one document per user.
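
As a rough illustration of that reshaping step, here is a minimal Python sketch that collapses one-record-per-(user_id, user_group) rows into one document per user before indexing. The function name `merge_user_docs` and the sample rows are hypothetical, and the sketch assumes that all other fields are identical across a user's rows, so they are simply copied from the first row seen:

```python
def merge_user_docs(rows):
    """Collapse rows that share a user_id into one document per user.

    The user_group values are collected into a list (duplicates dropped,
    first-seen order kept). Other fields are copied from the first row
    for each user_id, which assumes they do not differ between rows.
    """
    merged = {}  # user_id -> merged document; dicts keep insertion order
    for row in rows:
        uid = row["user_id"]
        if uid not in merged:
            doc = dict(row)          # copy so the input row is not mutated
            doc["user_group"] = []   # becomes the multivalued field
            merged[uid] = doc
        groups = merged[uid]["user_group"]
        if row["user_group"] not in groups:
            groups.append(row["user_group"])
    return list(merged.values())


# Hypothetical sample input, mirroring the documents shown above.
rows = [
    {"user_group": "GPA", "user_id": "4443510"},
    {"user_group": "GPB", "user_id": "4443510"},
    {"user_group": "GPC", "user_id": "4443510"},
    {"user_group": "GPB", "user_id": "4443511"},
    {"user_group": "GPA", "user_id": "4443512"},
    {"user_group": "GPC", "user_id": "4443512"},
]

docs = merge_user_docs(rows)
for doc in docs:
    print(doc)
```

For this to work on the Solr side, the user_group field would also need to be declared as multivalued in the schema, and the data reindexed. With one document per user, grouping by USER_TYPE can no longer return the same user_id twice within a group.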

answered 2013-02-04T04:09:34.013