string - Reading from a text file gives some very strange string length results

Question

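I am trying to read a text file containing Twitter screen names and store them in a database. Screen names cannot be longer than 15 characters, so one of my checks makes sure the name does not exceed 15 characters.

When I tried to upload AmericanExpress, I noticed something very strange.

Here are the contents of my text file: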

americanexpress
AmericanExpress‎
AMERICANEXPRESS

Here is my code:

var names = new List<string>();
var badNames = new List<string>();

using (StreamReader reader = new StreamReader(file.InputStream, Encoding.UTF8))
{
    string line;
    while (!reader.EndOfStream)
    {
        line = reader.ReadLine();
        var name = line.ToLower().Trim();

        Debug.WriteLine(line + " " + line.Length + " " + name + " " + name.Length);
        if (name.Length > 15 || string.IsNullOrWhiteSpace(name))
        {
            badNames.Add(name);
            continue;
        }

        if (names.Contains(name))
        {
            continue;
        }

        names.Add(name);
    }
}

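The first americanexpress passes the 15-character length check, the second fails, and the third passes. When I debug the second loop iteration (AmericanExpress) and hover over name, the debugger shows a string that looks like 15 characters but reports a Length of 16.

[Screenshot of the debugger tooltip; image not available]

Here is the debug output: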

americanexpress 15 americanexpress 15
AmericanExpress‎ 16 americanexpress‎ 16
AMERICANEXPRESS 15 americanexpress 15

I have counted the characters in AmericanExpress at least 10 times, and I am pretty sure there are only 15.

Does anyone know why Visual Studio is telling me AmericanExpress.Length = 16?

Solution

name = Regex.Replace(name, @"[^\u0000-\u007F]", string.Empty);
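
As a quick check of what this pattern does, here is a minimal sketch. The class and method names (AsciiOnlyDemo, StripNonAscii) are made up for illustration, and the input is the second line of the sample file written with an explicit \u200E escape so the hidden character is visible in source.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiOnlyDemo
{
    // Hypothetical helper wrapping the pattern from the solution:
    // it removes every UTF-16 code unit outside U+0000..U+007F.
    public static string StripNonAscii(string s)
    {
        return Regex.Replace(s, @"[^\u0000-\u007F]", string.Empty);
    }

    public static void Main()
    {
        string raw = "americanexpress\u200E";   // 15 letters + invisible mark
        string cleaned = StripNonAscii(raw);
        Console.WriteLine(raw.Length + " -> " + cleaned.Length); // 16 -> 15
    }
}
```

Note that the pattern removes all non-ASCII characters, not only invisible ones, so an accented letter would be dropped as well.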

score 2 · Accepted Answer

After the final s there is one more character. It is invisible, but it still counts toward Length. Look at index 15:

name[15]    8206 '‎'

Char 8206 is U+200E (LEFT-TO-RIGHT MARK); for details see http://www.fileformat.info/info/unicode/char/200e/index.htm
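
To find a character like this without relying on the debugger, one option is to print every UTF-16 code unit with its code point and Unicode category. The sketch below assumes a made-up helper named CharDump.Describe and uses the sample string with the hidden mark written as \u200E. It also suggests why Trim() did not help: the mark is reported as a Format character, not white space.

```csharp
using System;
using System.Linq;

public static class CharDump
{
    // Hypothetical helper: one description per UTF-16 code unit,
    // e.g. "[15] U+200E Format".
    public static string[] Describe(string s)
    {
        return s.Select((c, i) =>
                string.Format("[{0}] U+{1:X4} {2}", i, (int)c, char.GetUnicodeCategory(c)))
            .ToArray();
    }

    public static void Main()
    {
        string name = "americanexpress\u200E";
        Console.WriteLine(name.Length); // 16
        foreach (string entry in Describe(name))
        {
            Console.WriteLine(entry);   // last line: [15] U+200E Format
        }
    }
}
```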

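Possible solution: keep only ASCII characters. A plain round trip through Encoding.ASCII replaces each non-ASCII character with '?' instead of dropping it, so use an encoder fallback that substitutes an empty string:

// Characters that cannot be encoded (such as U+200E) are replaced with "" rather than '?'.
var ascii = Encoding.GetEncoding("us-ascii", new EncoderReplacementFallback(string.Empty), new DecoderReplacementFallback(string.Empty));
var name = ascii.GetString(ascii.GetBytes(line.ToLower().Trim()));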
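
If dropping every non-ASCII character is too aggressive for other inputs, a narrower alternative is to remove only invisible characters. The following is a sketch under that assumption, not part of the original answer: ScreenNameCleaner.Clean is a hypothetical helper that filters out the Unicode Format and Control categories before trimming and lower-casing.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ScreenNameCleaner
{
    // Hypothetical helper: drops format (Cf) and control (Cc) characters,
    // then trims white space and lower-cases the result.
    public static string Clean(string line)
    {
        var visible = line.Where(c =>
        {
            UnicodeCategory cat = char.GetUnicodeCategory(c);
            return cat != UnicodeCategory.Format && cat != UnicodeCategory.Control;
        });
        return new string(visible.ToArray()).Trim().ToLowerInvariant();
    }

    public static void Main()
    {
        string cleaned = Clean("AmericanExpress\u200E ");
        Console.WriteLine(cleaned + " " + cleaned.Length); // americanexpress 15
    }
}
```

With this approach the length check runs on what a reader would actually see, while visible non-ASCII letters are left in place for a later validation step to accept or reject.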
