我已经开始使用重要术语聚合来查看与我索引的整个文档集相比,哪些关键字在文档组中是重要的。
在很多文档被索引之前,它都很好用。然后对于曾经有效的相同查询,elasticsearch 只说:
SearchPhaseExecutionException[Failed to execute phase [query],
all shards failed; shardFailures {[OIWBSjVzT1uxfxwizhS5eg][demo_paragraphs][0]:
CircuitBreakingException[Data too large, data for field [text]
would be larger than limit of [633785548/604.4mb]];
我的查询如下所示:
POST /demo_paragraphs/_search
{
"query": {
"match": {
"django_target_id": 1915661
}
},
"aggregations" : {
"signKeywords" : {
"significant_terms" : {
"field" : "text"
}
}
}
}
以及文件结构:
"_source": {
"django_ct": "citations.citation",
"django_target_id": 1915661,
"django_id": 3414077,
"internal_citation_id": "CR7_151",
"django_source_id": 1915654,
"text": "Mucin 1 (MUC1) is a protein heterodimer that is overexpressed in lung cancers [6]. MUC1 consists of two subunits, an N-terminal extracellular subunit (MUC1-N) and a C-terminal transmembrane subunit (MUC1-C). Overexpression of MUC1 is sufficient for the induction of anchorage independent growth and tumorigenicity [7]. Other studies have shown that the MUC1-C cytoplasmic domain is responsible for the induction of the malignant phenotype and that MUC1-N is dispensable for transformation [8]. Overexpression of",
"id": "citations.citation.3414077",
"num_distinct_citations": 0
}
我索引的数据是科学论文的段落。没有文件真的很大。
关于如何分析或解决问题的任何想法?