5

I have a List of strings:

List<string> _words = ExtractWords(strippedHtml);

_words contains 1799 elements, each of which is a string.

Some of the strings contain only numbers, for example:

" 2" or "2013"

I want to remove these strings so that, in the end, the List contains only strings that include letters, not strings made up solely of digits.

A string like "001hello" is OK but "001" is not OK and should be removed.

4 Answers

8

You can use LINQ for that:

_words = _words.Where(w => w.Any(c => !Char.IsDigit(c))).ToList();

This filters out strings that consist entirely of digits, along with empty strings. A string such as " 2" is kept, because the space counts as a non-digit character.
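To see what that predicate keeps and drops, here is a minimal sketch run against a handful of made-up values. The class name, method name, and sample list are hypothetical and not from the question's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DigitFilterDemo
{
    // Keeps every word that contains at least one non-digit character.
    // An empty string has no characters at all, so Any(...) is false and it is dropped.
    public static List<string> KeepNonDigitWords(IEnumerable<string> words)
    {
        return words.Where(w => w.Any(c => !char.IsDigit(c))).ToList();
    }

    public static void Main()
    {
        // Hypothetical sample input for illustration only.
        var words = new List<string> { "2013", "001hello", "001", " 2", "", "hello" };
        var kept = KeepNonDigitWords(words);

        // Prints: 001hello| 2|hello
        Console.WriteLine(string.Join("|", kept));
    }
}
```

Note that " 2" survives here because of its leading space, so if whitespace-padded numbers should also go, the predicate needs to ignore whitespace as well.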

Answered 2013-06-27T17:08:04.500
4
_words = _words.Where(w => !w.All(char.IsDigit))
               .ToList();
Answered 2013-06-27T17:08:36.367
2

For removing words that are only made of digits and whitespace:

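// Requires: using System.Text.RegularExpressions;
var good = new List<string>();
// Matches strings made up only of digits and whitespace (including the empty string).
var _regex = new Regex(@"^[\d\s]*$");
foreach (var s in _words) {
    if (!_regex.IsMatch(s))
        good.Add(s);
}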

If you want to use LINQ, something like this should do:

_words = _words.Where(w => w.Any(c => !char.IsDigit(c) && !char.IsWhiteSpace(c)))
               .ToList();
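If the goal is to modify the existing list rather than build a new one, the same test can be inverted and passed to List<T>.RemoveAll. This is a sketch of that variant, not part of the approaches above. The class name, method name, and sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InPlaceFilter
{
    // Removes, in place, every word made up only of digits and/or whitespace
    // (empty strings included). Returns the number of removed elements.
    public static int RemoveDigitOnlyWords(List<string> words)
    {
        return words.RemoveAll(w => w.All(c => char.IsDigit(c) || char.IsWhiteSpace(c)));
    }

    public static void Main()
    {
        // Hypothetical sample input for illustration only.
        var words = new List<string> { " 2", "2013", "001hello", "a 1", "" };
        int removed = RemoveDigitOnlyWords(words);

        // Prints: 3 removed; kept: 001hello|a 1
        Console.WriteLine(removed + " removed; kept: " + string.Join("|", words));
    }
}
```

RemoveAll mutates the list it is called on, so any other reference to the same list sees the change too.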
Answered 2013-06-27T17:08:43.387
1

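You can use a traditional foreach and int.TryParse to detect numbers. This may be faster than Regex or LINQ, although that is worth measuring on your own data. Be aware that int.TryParse also accepts surrounding whitespace and a leading sign, so " 2" and "-5" count as numbers, and that it returns false for digit strings too large to fit in an int, so those would be kept.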

var stringsWithoutNumbers = new List<string>();
foreach (var str in _words)
{
    int n;
    bool isNumeric = int.TryParse(str, out n);
    if (!isNumeric)
    {
        stringsWithoutNumbers.Add(str);
    }
}
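Before relying on this, it helps to check what int.TryParse actually treats as a number. The sketch below uses the overload that takes NumberStyles.Integer (the default style) with CultureInfo.InvariantCulture so the result does not depend on the machine's culture. That overload choice, the class name, and the sample values are assumptions made for illustration.

```csharp
using System;
using System.Globalization;

public static class TryParseDemo
{
    // Same check as the loop above, pinned to the invariant culture.
    public static bool IsInt(string s)
    {
        int n;
        return int.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out n);
    }

    public static void Main()
    {
        // Hypothetical sample input for illustration only.
        foreach (var s in new[] { "2013", " 2", "-5", "001hello", "99999999999" })
        {
            Console.WriteLine("\"" + s + "\" -> " + IsInt(s));
        }
        // "2013"        -> True
        // " 2"          -> True   (leading whitespace is allowed)
        // "-5"          -> True   (a leading sign is allowed)
        // "001hello"    -> False
        // "99999999999" -> False  (too large for a 32-bit int, so it would be kept)
    }
}
```

So this approach removes padded and signed numbers, but lets very long digit strings through. Whether that matches the requirement depends on the data.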
Answered 2013-06-27T17:13:56.490