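I'm using Solr 3.6.2 to extract snippets from documents that I already know contain a particular string. (First of all, is this a valid use of highlighting?) Unfortunately, the snippets I get back don't contain my query string (a simple, single, non-stopword term).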

For example, for document 123456, which I know contains "funmitflags", I run a query of this form:

id:123456 and content_en:funmitflags

fl=id&hl=true&hl.fl=content_en&hl.snippets=2&hl.alternateField=content_en&hl.maxAlternateFieldLength=400&hl.maxAnalyzedCharacters=2147483647&hl.fragsize=400&rows=100

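(I set "content_en" as the alternate field so that I always get some snippet from the document. That field usually holds a large amount of text.) However, what I usually get back now is the first 400 characters, not a fragment containing my term "funmitflags".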

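In any case, I can retrieve the document from the admin page, just not with a proper highlight. This is awkward, because I hit this problem in roughly 75% of my queries.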

In my schema.xml, I define "content_en" as type "text_en":

<field name="content_en" type="text_en" indexed="true" stored="true" />

I changed "text_en" from its original definition to the following:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory" 
            generateWordParts="1" 
            generateNumberParts="1" 
            catenateWords="0" 
            catenateNumbers="0" 
            catenateAll="0" 
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" 
            generateWordParts="1" 
            generateNumberParts="1" 
            catenateWords="0" 
            catenateNumbers="0" 
            catenateAll="0" 
            splitOnCaseChange="1"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="lang/stopwords_en.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

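After reindexing, I still don't get correct snippets in either case. Can someone point me in the right direction? Should I always get a snippet that contains my search term?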

3 Answers

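I just tried highlighting on a field, largetext_en, of type text_en, which uses your analyzer chain. Highlighting works fine: 4 snippets contain the searched word Polyana, as shown below. (The field content below is hard to read here, so copy and paste it into a text editor to view it.)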

http://localhost:8983/solr/collection1/select?q=id:mateva_highlight%20AND%20largetext_en:polyana&wt=json&fl=id,largetext_en&hl=true&hl.fl=largetext_en&hl.snippets=10&hl.alternateField=largetext_en&hl.maxAlternateFieldLength=400&hl.maxAnalyzedCharacters=2147483647&hl.fragsize=400&rows=100

Here is the output:

responseHeader: {
status: 0
QTime: 5
params: {
hl.fragsize: "400"
fl: "id,largetext_en"
hl.snippets: "10"
hl.maxAlternateFieldLength: "400"
q: "id:mateva_highlight AND largetext_en:polyana"
hl.alternateField: "description"
hl.fl: "largetext_en"
wt: "json"
hl: "true"
rows: [
"41"
"100"
]
hl.maxAnalyzedCharacters: "2147483647"
}
}
response: {
numFound: 1
start: 0
docs: [
{
id: "mateva_highlight"
largetext_en: [
"COUNT LEO NIKOLAYEVICH TOLSTOY was born August 28, 1828, at the family estate of Yasna- ya Polyana, in the province of Tula. His moth- er died when he was three and his father six years later. Placed in the care of his aunts, he passed many of his early years at Kazan, where, in 1844, after a preliminary training by French tutors, he entered the university. He cared lit- tle for the university and in 1847 withdrew be- cause of "ill-health and domestic circum- stances." He had, however, done a great deal of reading, of French, English, and Russian novels, the New Testament, Voltaire, and Hegel. The author exercising the greatest in- fluence upon him at this time was Rousseau; he read his complete works and for sometime wore about his neck a medallion of Rousseau. Immediately upon leaving the university, Tolstoy returned to his estate and, perhaps inr spired by his enthusiasm for Rousseau, pre- pared to devote himself to agriculture and to improving the condition of his serfs. His first attempt at social reform proved disappointing, and after six months he withdrew to Moscow and St. Petersburg, where he gave himself over to the irregular life characteristic of his class and time. In 1851, determined to "escape my debts and, more than anything else, my hab- its," he enlisted in the Army as a gentleman- volunteer, and went to the Caucasus. While at Tiflis, preparing for his examinations as a cadet, he wrote the first portion of the trilogy, Childhood, Boyhood, and Youth, in which he celebrated the happiness of "being with Na- ture, seeing her, communing with her." He al- so began The Cossacks with the intention of showing that culture is the enemy of happi- ness. Although continuing his army life, he gradually came to realize that "a military ca- reer is not for me, and the sooner I get out of it and devote myself entirely to literature the better." 
His Sevastopol Sketches (1855) were so successful that Czar Nicholas issued special orders that he should be removed from a post of danger. Returning to St. Petersburg, Tolstoy was re- ceived with great favor in both the official and literary circles of the capital. He soon became interested in the popular progressive move- ment of the time, and in 1857 he decided to go abroad and study the educational and munici- pal systems of other countries. That year, and again in 1860, he traveled in Europe. At Yas- naya Polyana in 1861 he liberated his serfs and opened a school, established on the principle that "everything which savours of compulsion is harmful." He started a magazine to promote his notions on education and at the same time served as an official arbitrator for grievances between the nobles and the recently emanci- pated serfs. By the end of 1863 he was so ex- hausted that he discontinued his activities and retired to the steppes to drink koumis for his health. Tolstoy had been contemplating marriage for some time, and in 1862 he married Sophie Behrs, sixteen years his junior, and the daugh- ter of a fashionable Moscow doctor. Their early married life at Yasnaya Polyana was tranquil. Family cares occupied the Countess, and in the course of her life she bore thirteen children, nine of whom survived infancy. Yet she also acted as a copyist for her husband, who after their marriage turned again to writ- ing. He was soon at work upon "a novel of the i8io's and *2o's" which absorbed all his time and effort. He went frequently to Mos- cow, "studying letters, diaries, and traditions" and "accumulated a whole library" of histori- cal material on the period. He interviewed survivors of the battles of that time and trav- eled to Borodino to draw up a map of the battleground. 
Finally, in 1869, after his work had undergone several changes in conception and he had "spent five years of uninterrupted andjgxceptionally strenuous labor Tnnierthe IbesfcondUtions of life/' he published War and Peace. Its appearance immediately established Tolstoy's reputation, and in the judgment of Turgenev, the acknowledged dean of Russian letters, gave him "first place among all our contemporary writers." The years immediately following the com- pletion of War and Peace were pa**efl in a great variety of occupations, none of which Tohtoy found satisfying. He tried busying VI BIOGRAPHICAL NOTE himself with the affairs of his estate, under- took the learning of Greek to read the ancient classics, turned again to education, wrote a series of elementary school books, and served as school inspector. With much urging from his wife and friends, he completed Anna Kare- nina, which appeared serially between 1875 and 1877. Disturbed by what he considered his unreflective and prosperous existence, Tolstoy became increasingly interested in religion. At first he turned to the orthodox faith of the people. Unable to find rest there, he began a detailed examination of religions, and out of his reading, particularly of the Gospels, gradu- ally evolved his own personal doctrine. Following his conversion, Tolstoy adopted a new mode of life. He dressed like a peasant, devoted much of his time to manual work, learned shoemaking, and followed a vegetari- an diet. With the exception of his youngest daughter, Alexandra, Tolstoy's family re- mained hostile to his teaching. The breach be- tween him and his wife grew steadily wider. In 1879 he wrote the Kreutzer Sonata in which he attacked the normal state of marriage and extolled a life of celibacy and chastity. In 1881 he divided his estate among his heirs and, a few years later, despite the opposition of his wife, announced that he would forego royal- ties on all the works published after his con- version. 
Tolstoy made no attempt at first to propa- gate his religious teaching, although it attracted many followers. After a visit to the Moscow slums iri 1881, he became concerned with social conditions, and he subsequently aided the suf- ferers of the famine by sponsoring two hun- dred and fifty relief kitchens. After his meet- ing and intimacy with Chertkov, "Tolstoyism" began to develop as an organized sect. Tol- stoy's writings became almost exclusively pre- occupied with religious problems. In addition to numerous pamphlets and plays, he wrote IV hat is Art? (1896), in which he explained his new aesthetic theories, and Hadji-Murad, (1904), which became the favorite work of his old age. Although his activities were looked upon with increasing suspicion by the official authorities, Tolstoy escaped official censure until 1901, when he was excommunicated by the Orthodox Church. His followers were f re- quently subjected to persecution, and many were either banished or imprisoned. Tolstoy's last years were embittered by mounting hostility within his own household. Although his personal life was ascetic, he felt the ambiguity of his position as a preacher of poverty living on his great estate. Finally, at the age of eighty-two, with the aid of his daugh- ter, Alexandra, he fled from home. His health broke down a few days later, and he was re- moved from the train to the station-master's hut at Astopovo, where he died, November 7, 1910. He was buried at Yasnaya Polyana, in the first public funeral to be held in Russia without religious rites. "
]
}
]
}
highlighting: {
mateva_highlight: {
largetext_en: [
"COUNT LEO NIKOLAYEVICH TOLSTOY was born August 28, 1828, at the family estate of Yasna- ya <em>Polyana</em>, in the province of Tula. His moth- er died when he was three and his father six years later. Placed in the care of his aunts, he passed many of his early years at Kazan, where, in 1844, after a preliminary training by French tutors, he entered the university. He cared lit"
" and study the educational and munici- pal systems of other countries. That year, and again in 1860, he traveled in Europe. At Yas- naya <em>Polyana</em> in 1861 he liberated his serfs and opened a school, established on the principle that "everything which savours of compulsion is harmful." He started a magazine to promote his notions on education and at the same time served as an official"
" doctor. Their early married life at Yasnaya <em>Polyana</em> was tranquil. Family cares occupied the Countess, and in the course of her life she bore thirteen children, nine of whom survived infancy. Yet she also acted as a copyist for her husband, who after their marriage turned again to writ- ing. He was soon at work upon "a novel of the i8io's and *2o's" which absorbed all his time"
" position as a preacher of poverty living on his great estate. Finally, at the age of eighty-two, with the aid of his daugh- ter, Alexandra, he fled from home. His health broke down a few days later, and he was re- moved from the train to the station-master's hut at Astopovo, where he died, November 7, 1910. He was buried at Yasnaya <em>Polyana</em>, in the first public funeral to be held"
]
}
}
}
answered 2013-03-01T03:10:05.930

Thanks to @arun's experiment, which cut the possibilities in half, I found the solution.

  • Since my text is large, I set the following in solrconfig.xml:

    <maxFieldLength>1000000</maxFieldLength>

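  • To improve speed, I started using the FastVectorHighlighter by adding the following to my query:

    solrQuery.set("hl.useFastVectorHighlighter", true);

    It seems to disable my highlightSimplePre and highlightSimplePost settings, but that doesn't matter to me.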

In addition, I had to add the term* options to my content field:

` <field name="content_en" type="text_en" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />`
  • And, of course, I reindexed.
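
The request side of the steps above can be sketched as follows, in Python with only the standard library. The helper name `fvh_params` and the default marker tags are hypothetical. One assumption to verify against your Solr version: as far as I know, the FastVectorHighlighter takes its markers from `hl.tag.pre` and `hl.tag.post` rather than `hl.simple.pre` and `hl.simple.post`, which would explain why the simple pre/post settings seem to be ignored.

```python
from urllib.parse import urlencode


def fvh_params(field="content_en", pre="<b>", post="</b>"):
    """Return highlight parameters that switch on the FastVectorHighlighter.

    The field must be indexed with termVectors, termPositions and
    termOffsets enabled, as in the schema change above.
    """
    return {
        "hl": "true",
        "hl.fl": field,
        "hl.useFastVectorHighlighter": "true",
        # Assumption: FVH reads its markers from hl.tag.pre / hl.tag.post.
        "hl.tag.pre": pre,
        "hl.tag.post": post,
    }


if __name__ == "__main__":
    # Print the parameters as a URL query string.
    print(urlencode(fvh_params()))
```

These parameters would be merged into the rest of the query (q, fl, rows and so on) before sending the request.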
answered 2013-03-01T13:15:39.020

Note that the query uses hl.maxAnalyzedCharacters=2147483647, but that is the wrong parameter name. The correct name is hl.maxAnalyzedChars.
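
A minimal sketch of the corrected request, in Python with only the standard library. The parameters are the ones from the question, with `hl.maxAnalyzedCharacters` renamed to `hl.maxAnalyzedChars`. The base URL and the helper name `build_highlight_url` are assumptions for illustration, and the query uses an uppercase `AND` as in the first answer's URL. Solr ignores request parameters it does not recognize, so with the misspelled name the default analysis limit (51200 characters, to my knowledge) would still apply. That would fit the symptom of large documents falling back to the alternate field.

```python
from urllib.parse import urlencode

# Assumed base URL; adjust host, port, and core path to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_url(doc_id, term, field="content_en", base_url=SOLR_SELECT_URL):
    """Build a select URL that highlights `term` in `field` for one document."""
    params = {
        "q": f"id:{doc_id} AND {field}:{term}",
        "fl": "id",
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": 2,
        "hl.fragsize": 400,
        "hl.alternateField": field,
        "hl.maxAlternateFieldLength": 400,
        # Correct name: hl.maxAnalyzedChars (not hl.maxAnalyzedCharacters).
        "hl.maxAnalyzedChars": 2147483647,
        "rows": 100,
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_url("123456", "funmitflags"))
```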

answered 2014-11-04T20:30:51.953