
This is what I have in mind, but it's O(n^2):

For example, given the input "Thisisawesome", we need to check whether adding the current character makes the previously found word longer and still meaningful. But to see how far back we need to go, we have to traverse all the way to the beginning. For example, "awe" and "some" are proper words, but "awesome" is the bigger word. Please suggest how we can improve the complexity. Here is the code:

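#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>
using namespace std;

unordered_set<string> dict;  // dictionary of valid words, assumed to be a hash set

// DS[k] = length of the longest dictionary word that ends at index k
vector<size_t> update(const string& in)
{
    size_t len = in.length();
    vector<size_t> DS(len, 0);  // one entry per character, all starting at 0

    for (size_t i = 0; i < len; i++)
        for (size_t j = i + 1; j <= len; j++)
        {
            string word = in.substr(i, j - i);
            if (dict.find(word) != dict.end())
                DS[j - 1] = max(DS[j - 1], word.length());
        }
    return DS;
}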

2 Answers


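There is a dynamic programming solution that at first looks like it will be O(n^2), but turns out to be only O(n) for sufficiently large n and a fixed-size dictionary.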

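Work through the string from left to right. At the ith stage, you need to determine whether the first i characters have a solution. To do this, consider every possible way of splitting those i characters into two chunks. If the second chunk is a word and the first chunk can be broken into words, then there is a solution. You can check the first requirement (the second chunk is a word) against your dictionary. You can check the second requirement (the first chunk can be broken into words) by looking up whether you already found an answer for the first j characters, where j is the length of the first chunk.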
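
A minimal C++ sketch of the check described above, assuming the dictionary is a `std::unordered_set<std::string>` of words in the same case as the input; the function name `canBreak` is hypothetical. `ok[i]` records whether the first i characters can be broken into words, so the second requirement becomes a lookup of `ok[j]`:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Returns true if s can be split into words from dict.
// ok[i] is true when the first i characters can be split; ok[0] is true
// because the empty prefix needs no words.
bool canBreak(const std::string& s, const std::unordered_set<std::string>& dict) {
    const std::size_t n = s.size();
    std::vector<bool> ok(n + 1, false);
    ok[0] = true;
    for (std::size_t i = 1; i <= n; ++i) {
        // Every split of the first i characters into s[0, j) and s[j, i).
        for (std::size_t j = 0; j < i; ++j) {
            // First chunk: already answered in ok[j].
            // Second chunk: must be a dictionary word.
            if (ok[j] && dict.count(s.substr(j, i - j)) > 0) {
                ok[i] = true;
                break;
            }
        }
    }
    return ok[n];
}
```

For example, with the dictionary {"this", "is", "awe", "some", "awesome"}, `canBreak("thisisawesome", dict)` returns true and `canBreak("thisisawesomex", dict)` returns false.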

This would be O(n^2), because for each prefix length 1, 2, 3, ..., n you consider every possible split. However, if you know the length of the longest word in your dictionary, there is no point considering splits that make the second chunk longer than that. So for each prefix length you consider at most w possible splits, where w is the length of the longest word in your dictionary, and the cost is O(nw), which is O(n) for a fixed dictionary.
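
The same dynamic program with the split bounded by the longest word can be sketched like this, again assuming a `std::unordered_set<std::string>` dictionary (the name `wordBreak` is hypothetical). It also records where the last word of each prefix starts, so the actual words can be recovered rather than just a yes/no answer:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Splits s into dictionary words; returns an empty vector if no split exists.
// Only splits whose second chunk is at most w characters long are tried,
// where w is the length of the longest dictionary word: O(n * w) lookups.
std::vector<std::string> wordBreak(const std::string& s,
                                   const std::unordered_set<std::string>& dict) {
    std::size_t w = 0;
    for (const std::string& word : dict) w = std::max(w, word.size());

    const std::size_t n = s.size();
    // start[i] = index where the last word of a split of the first i
    // characters begins, or -1 if the first i characters cannot be split.
    std::vector<long> start(n + 1, -1);
    start[0] = 0;  // the empty prefix is trivially split
    for (std::size_t i = 1; i <= n; ++i) {
        const std::size_t lo = i > w ? i - w : 0;  // second chunk length <= w
        for (std::size_t j = lo; j < i; ++j) {
            if (start[j] >= 0 && dict.count(s.substr(j, i - j)) > 0) {
                start[i] = static_cast<long>(j);
                break;
            }
        }
    }

    std::vector<std::string> words;
    if (start[n] < 0) return words;
    // Walk back from the end of the string, one word at a time.
    for (std::size_t i = n; i > 0;) {
        const std::size_t j = static_cast<std::size_t>(start[i]);
        words.push_back(s.substr(j, i - j));
        i = j;
    }
    std::reverse(words.begin(), words.end());
    return words;
}
```

With the dictionary {"this", "is", "awe", "some", "awesome"}, `wordBreak("thisisawesome", dict)` yields {"this", "is", "awesome"}. Because the inner loop starts at the smallest allowed j, it prefers the longest possible last word; any other tie-breaking rule would be equally valid.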

answered 2013-01-27T06:15:17.800

I wrote my solution today and will put it on a website tomorrow. Anyway, the approach is as follows:

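  1. Put the dictionary into a trie.

    A trie makes multiple matches fast, because all dictionary words that start with the same letters can be matched at the same time.

    (For example, "chairman" matches both "chair" and "chairman" in the trie.)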

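  2. Use Dijkstra's algorithm to find the best match.

    (For example, for "chairman", if we count "c" as position 0, then we have the relations 0->5, 0->8, 1->5, 2->5, 5->8. These relations form a network that is well suited to Dijkstra's algorithm.)

    (Note: where do the edge weights come from? See the next point.)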

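  3. Assign weights to dictionary words.

    Weight the words so that good matches win over bad ones. (For example, "iamahero" should become "i am a hero" rather than a less natural split.)

    The SCOWL dictionaries at http://app.aspell.net/create work well for this, because they come in different sizes. The sizes (10, 20, etc.) make good weights.

    After some experimenting, I found that I needed to reduce the weight of words ending in "s", so that "eyesandme" becomes "eyes and me" instead of "eye sand me".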
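
The three steps above can be sketched in C++ roughly as follows. This is an illustration, not the answerer's program: `Trie`, `insert` and `segment` are hypothetical names, and the per-word cost is whatever the caller supplies, for example the SCOWL size (10, 20, 35, ...) of the smallest list that contains the word, so that common words are cheaper. The input is assumed to use the same case as the dictionary, and the adjustment for words ending in "s" is left out:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Step 1: a trie over the dictionary. Walking it from position i finds
// every dictionary word that starts at i in one pass.
struct Trie {
    struct Node {
        std::unordered_map<char, int> next;  // child node indices
        double cost = -1;                    // word cost; -1 if no word ends here
    };
    std::vector<Node> nodes = std::vector<Node>(1);  // nodes[0] is the root

    // Step 3: each word carries a caller-chosen, non-negative cost
    // (an assumption here, e.g. the word's SCOWL size).
    void insert(const std::string& word, double cost) {
        int cur = 0;
        for (char c : word) {
            auto it = nodes[cur].next.find(c);
            int child;
            if (it != nodes[cur].next.end()) {
                child = it->second;
            } else {
                child = static_cast<int>(nodes.size());
                nodes[cur].next[c] = child;
                nodes.emplace_back();
            }
            cur = child;
        }
        nodes[cur].cost = cost;
    }
};

// Step 2: positions 0..n are graph nodes and each word s[i, j) is an edge
// i -> j weighted by the word's cost. Dijkstra's algorithm finds the
// cheapest path 0 -> n. Returns an empty vector if there is no path.
std::vector<std::string> segment(const std::string& s, const Trie& trie) {
    const std::size_t n = s.size();
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> dist(n + 1, inf);
    std::vector<std::size_t> from(n + 1, 0);      // start of the word ending at j
    using Item = std::pair<double, std::size_t>;  // (distance, position)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[0] = 0;
    pq.push(Item(0.0, 0));
    while (!pq.empty()) {
        const double d = pq.top().first;
        const std::size_t i = pq.top().second;
        pq.pop();
        if (d > dist[i]) continue;  // stale queue entry
        // Follow the trie from position i; each word end found is an edge.
        int node = 0;
        for (std::size_t j = i; j < n; ++j) {
            auto it = trie.nodes[node].next.find(s[j]);
            if (it == trie.nodes[node].next.end()) break;
            node = it->second;
            const double c = trie.nodes[node].cost;
            if (c >= 0 && d + c < dist[j + 1]) {
                dist[j + 1] = d + c;
                from[j + 1] = i;
                pq.push(Item(dist[j + 1], j + 1));
            }
        }
    }

    std::vector<std::string> words;
    if (dist[n] == inf) return words;
    for (std::size_t j = n; j > 0; j = from[j])
        words.push_back(s.substr(from[j], j - from[j]));
    std::reverse(words.begin(), words.end());
    return words;
}
```

For example, after inserting "chair", "chairman", "hair" and "air" with cost 20 and "man" with cost 10, `segment("chairman", trie)` returns {"chairman"}, because the single edge 0->8 (cost 20) is cheaper than 0->5->8 (cost 30). Since every edge points forward, relaxing positions strictly left to right would give the same shortest paths as Dijkstra's algorithm.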

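I have been able to split a paragraph in a few milliseconds. The algorithm's complexity is linear in the length of the string being split, so it scales well as long as there is enough memory.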

Here is the dump (sorry for bragging). (The paragraph used is from the Wikipedia article "Novel".)

D:\GoogleDrive\programs\WordBreaker>"word breaker"<novelnospace.txt>output.txt

D:\GoogleDrive\programs\WordBreaker>type output.txt
Number of words after reading words-10.txt : 4101
Number of words after reading words-20.txt : 11329
Number of words after reading words-35.txt : 43292
Number of words after reading words-40.txt : 49406
Number of words after reading words-50.txt : 87966

Time elapsed in reading dictionary: 0.956782s

Enter the string to be broken into words:

Result:
a novel is along narrative normally in prose which describes fictional character
s and events usually in the form of a sequential story while i an watt in the ri
se of the novel 1957 suggests that the novel came into being in the early 18 th
century the genre has also been described as possessing a continuous and compreh
ensive history of about two thousand years with historical roots in classical gr
eece and rome medieval early modern romance and in the tradition of the novel la
the latter an italian word used to describe short stories supplied the present g
eneric english term in the 18 th century miguel de cervantes author of don quixo
te is frequently cited as the first significant europe an novelist of the modern
 era the first part of don quixote was published in 1605 while a more precise de
finition of the genre is difficult the main elements that critics discuss are ho
w the narrative and especially the plot is constructed the themes settings and c
haracterization how language is used and the way that plot character and setting
 relate to reality the romance is a related long prose narrative w alter scott d
efined it as a fictitious narrative in prose or verse the interest of which turn
s upon marvellous and uncommon incidents whereas in the novel the events are acc
ommodated to the ordinary train of human events and the modern state of society
however many romances including the historical romances of scott emily brontes w
u the ring heights and her man melvilles mo by dick are also frequently called n
ovels and scott describes romance as a kind red term romance as defined here sho
uld not be confused with the genre fiction love romance or romance novel other e
urope an languages do not distinguish between romance and novel a novel isle rom
 and err o ma nil roman z o

Time elapsed in splitting: 0.00495095s

D:\GoogleDrive\programs\WordBreaker>type novelnospace.txt
Anovelisalongnarrativenormallyinprosewhichdescribesfictionalcharactersandeventsu
suallyintheformofasequentialstoryWhileIanWattinTheRiseoftheNovel1957suggeststhat
thenovelcameintobeingintheearly18thcenturythegenrehasalsobeendescribedaspossessi
ngacontinuousandcomprehensivehistoryofabouttwothousandyearswithhistoricalrootsin
ClassicalGreeceandRomemedievalearlymodernromanceandinthetraditionofthenovellaThe
latteranItalianwordusedtodescribeshortstoriessuppliedthepresentgenericEnglishter
minthe18thcenturyMigueldeCervantesauthorofDonQuixoteisfrequentlycitedasthefirsts
ignificantEuropeannovelistofthemodernerathefirstpartofDonQuixotewaspublishedin16
05Whileamoreprecisedefinitionofthegenreisdifficultthemainelementsthatcriticsdisc
ussarehowthenarrativeandespeciallytheplotisconstructedthethemessettingsandcharac
terizationhowlanguageisusedandthewaythatplotcharacterandsettingrelatetorealityTh
eromanceisarelatedlongprosenarrativeWalterScottdefineditasafictitiousnarrativein
proseorversetheinterestofwhichturnsuponmarvellousanduncommonincidentswhereasinth
enoveltheeventsareaccommodatedtotheordinarytrainofhumaneventsandthemodernstateof
societyHowevermanyromancesincludingthehistoricalromancesofScottEmilyBrontesWuthe
ringHeightsandHermanMelvillesMobyDickarealsofrequentlycallednovelsandScottdescri
besromanceasakindredtermRomanceasdefinedhereshouldnotbeconfusedwiththegenreficti
onloveromanceorromancenovelOtherEuropeanlanguagesdonotdistinguishbetweenromancea
ndnovelanovelisleromanderRomanilromanzo
D:\GoogleDrive\programs\WordBreaker>
answered 2015-06-30T16:44:39.187