
This is best described with an example. Given the paragraph:

The longest string in this paragraph is not the shortest string in the paragraph because it is the longest string in the paragraph

I want to list the repeated substrings ordered first by frequency and then by length. In this case it should list (case-insensitive):

  • The longest string in
  • the paragraph
  • is not the shortest string in
  • because
  • it is
  • this

The list above is ordered by how often each substring occurs and then by length. "The longest string in" occurs twice and is the longest repeated substring, so it comes first. "is not the shortest string in" is longer than "the paragraph", but "the paragraph" occurs twice, so "the paragraph" is listed first.
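A minimal sketch of the frequency-then-length ordering, assuming substrings are runs of whole words (word n-grams) and that "repeated" means occurring at least twice. `repeated_phrases` and its `min_count` parameter are hypothetical names chosen for illustration. It enumerates every contiguous run of words, counts them case-insensitively, and sorts by count (descending), then length (descending), then alphabetically as a tie-break.

```python
from collections import Counter


def repeated_phrases(text, min_count=2):
    """Return (phrase, count) pairs for every word n-gram seen at least
    min_count times, ordered by count desc, then length desc."""
    words = text.lower().split()  # case-insensitive, split on whitespace
    counts = Counter()
    # Enumerate every contiguous run of words: O(n^2) candidates.
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            counts[" ".join(words[start:end])] += 1
    repeated = [(phrase, n) for phrase, n in counts.items() if n >= min_count]
    # Higher count first, then longer phrase first, then alphabetical.
    repeated.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return repeated
```

Run on the paragraph above, single words such as "the" (5 occurrences) and "paragraph" (3) rank ahead of "the longest string in" (2). The pure frequency ordering does not produce the desired list by itself, which is the problem the update below addresses.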

Update (based on observations by AlexC and MattBurland):

Even if a substring such as the space character or "in" occurs more often than other substrings, it should not be listed if it is already contained in a longer substring whose length exceeds its occurrences × its length. For example, "in" occurs 3 times, which is 6 characters in total (9 including the trailing spaces), but since 9 characters is shorter than the paragraph, it is not listed. I hope this makes sense.
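One literal reading of the update's rule can be sketched as a single check. This is an interpretation, not a confirmed spec, and it may not reproduce the exact list in the question. The sketch assumes that containment means whole-word containment and that lengths are measured without trailing spaces, so "in" counts as 2 characters rather than 3. `is_subsumed` is a hypothetical helper.

```python
def is_subsumed(candidate, count, longer_phrases):
    """Hypothetical check for one reading of the rule: drop `candidate` if a
    longer phrase contains it as whole words and that phrase is longer than
    count * len(candidate) characters."""
    padded = f" {candidate.lower()} "
    budget = count * len(candidate)  # occurrences x length, no trailing spaces
    for phrase in longer_phrases:
        # Pad with spaces so that "in" does not match inside "string".
        if padded in f" {phrase.lower()} " and len(phrase) > budget:
            return True
    return False
```

With this reading, "in" (3 × 2 = 6) is dropped because "The longest string in" is 21 characters long. "the paragraph" (2 × 13 = 26) is kept when the only longer repeated phrase containing it is "string in the paragraph" (23 characters).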




Yes, as others have said, if you extract the substrings from the example you provided and trim the whitespace, you get an array of strings like this:

string[] myArray = { "the", "longest", ... }; // and so on

Now you can loop through the array, removing duplicate strings while incrementing their occurrence counts. Then you add this information to a list.
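A minimal sketch of that step in Python rather than the answer's C#. It assumes whitespace splitting and case-insensitive matching as in the question, and `count_occurrences` is a hypothetical name. A dictionary keeps one entry per distinct string, so incrementing its count is equivalent to removing the duplicate.

```python
def count_occurrences(text):
    """Split on whitespace, trim and lowercase each piece, and count how many
    times each distinct string appears."""
    counts = {}
    for piece in text.split():
        word = piece.strip().lower()
        counts[word] = counts.get(word, 0) + 1  # first sighting starts at 1
    return counts
```

`list(count_occurrences(text).items())` then gives the list of (string, occurrences) pairs that the answer describes. For the question's paragraph this list holds 11 distinct words, with "the" appearing 5 times.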

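Then you loop again to sort by length. In the end, though, the strings in the list cannot be combinations of words unless the input string is delimited by something other than spaces, such as a $ sign: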
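The sorting step can be sketched with a single sort key. The answer sorts by length only. This sketch assumes the question's order instead: occurrences descending, then length descending. `rank` is a hypothetical name. Because the input is split on spaces, every entry is a single word, which is the limitation the answer points out.

```python
def rank(counts):
    """Order (string, occurrences) pairs by occurrences desc, then length desc."""
    return sorted(counts.items(), key=lambda item: (-item[1], -len(item[0])))
```

Python's `sorted` is stable, so strings with equal count and equal length keep their original insertion order.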

“The longest string in $this paragraph$ is not the shortest string in $the paragraph$ because it is the longest string in $the paragraph”

If that is the case, you follow exactly the same process as above, but split the substrings on the $ sign instead of on spaces.
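A minimal sketch of the delimiter variant. It assumes the phrases have already been marked with `$` by whoever prepares the input. `count_delimited` and its `delimiter` parameter are hypothetical names. Empty pieces, for example from a leading or trailing `$`, are skipped.

```python
def count_delimited(text, delimiter="$"):
    """Split on `delimiter` instead of spaces, trim and lowercase each piece,
    and count how many times each distinct phrase appears."""
    counts = {}
    for piece in text.split(delimiter):
        phrase = piece.strip().lower()
        if phrase:  # skip empty pieces produced by adjacent/edge delimiters
            counts[phrase] = counts.get(phrase, 0) + 1
    return counts
```

The resulting dictionary can be ranked exactly like the space-split version. The difference is that its keys can now be multi-word phrases such as "the paragraph".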

answered 2012-04-18T23:19:30.500