.net - 0x00 in a VB.NET binary file

Question

Update below

I'm reading a binary file with a BinaryReader in VB.NET. Each row in the file has this structure:

    "Category" = 1 byte
    "Code" = 1 byte
    "Text" = 60 Bytes

    Dim Category As Byte
    Dim Code As Byte
    Dim byText() As Byte
    Dim chText() As Char
    Dim br As New BinaryReader(fs)

    Category = br.ReadByte()
    Code = br.ReadByte()
    byText = br.ReadBytes(60)
    chText = encASCII.GetChars(byText)

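The problem is that the "Text" field contains some funky characters used for padding. Most of them seem to be 0x00 null characters.

Is there a way to get rid of these 0x00 characters through some encoding?
Otherwise, how can I do a replace on the chText array to get rid of the 0x00 characters? I'm trying to serialize the resulting DataTable to XML, and it fails on these incompatible characters. I'm able to loop through the array, but I can't figure out how to do the replace.

UPDATE:

This is where I am now, with a lot of help from the guys/gals below. The first solution works, but is not as flexible as I'd hoped; the second is more generic, but fails for one use case.

Re 1) I can solve the problem by passing the string to this function: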

    Public Function StripBad(ByVal InString As String) As String
        Dim sb As New System.Text.StringBuilder
        For Each ch As Char In InString
            ' Replace every character at or below 0x25 ("%") with a space.
            If AscW(ch) <= &H25 Then
                ch = " "c
            End If
            sb.Append(ch)
        Next

        Return sb.ToString()
    End Function

Re 2) This routine does strip out a few of the offending characters, but it fails on 0x00. It was adapted from MSDN, http://msdn.microsoft.com/en-us/library/kdcak6ye.aspx.

    Public Function StripBadwithConvert(ByVal InString As String) As String
        Dim unicodeString As String
        unicodeString = InString
        ' Create two different encodings.
        Dim ascii As Encoding = Encoding.ASCII
        Dim [unicode] As Encoding = Encoding.UTF8

        ' Convert the string into a byte[].
        Dim unicodeBytes As Byte() = [unicode].GetBytes(unicodeString)

        Dim asciiBytes As Byte() = Encoding.Convert([unicode], ascii, unicodeBytes)

        Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length) - 1) As Char
        ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
        Dim asciiString As New String(asciiChars)

        Return asciiString
    End Function

score 3 · Accepted Answer

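First you should find out what the format of the text is, so that you are not just blindly removing something without knowing what you hit.

Depending on the format, you use different methods to remove the characters.

To remove only the zero characters: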

    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        If byText(pos) <> 0 Then
            ' Compact the non-zero bytes toward the front of the array.
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    strText = Encoding.ASCII.GetString(byText, 0, len)

To remove everything from the first zero character to the end of the array:

    Dim len As Integer = 0
    ' Advance to the first zero byte (or to the end of the array).
    While len < byText.Length AndAlso byText(len) <> 0
        len += 1
    End While
    strText = Encoding.ASCII.GetString(byText, 0, len)

Edit:
If you just want to keep whatever junk happens to be ASCII characters:

    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        ' Keep printable ASCII only (32 to 126); 127 is the DEL control character.
        If byText(pos) >= 32 AndAlso byText(pos) <= 126 Then
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    strText = Encoding.ASCII.GetString(byText, 0, len)
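As a rough illustration, the "cut at the first zero" variant could be wrapped in a helper and applied to each record as it is read. This is only a sketch: the names `DecodePadded` and `ReadAll` are made up for the example, and it assumes the stream is seekable and contains only whole 62-byte records (1 + 1 + 60) as described in the question.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RecordReader
    ' Hypothetical helper: decode a fixed-width field, keeping only
    ' the bytes that come before the first 0x00.
    Function DecodePadded(ByVal field As Byte()) As String
        Dim len As Integer = Array.IndexOf(field, CByte(0))
        If len < 0 Then len = field.Length   ' no padding: use the whole field
        Return Encoding.ASCII.GetString(field, 0, len)
    End Function

    ' Hypothetical driver: read every record from a seekable stream.
    ' Assumes the stream length is a multiple of 62 bytes.
    Sub ReadAll(ByVal fs As Stream)
        Using br As New BinaryReader(fs)
            While fs.Position < fs.Length
                Dim category As Byte = br.ReadByte()
                Dim code As Byte = br.ReadByte()
                Dim text As String = DecodePadded(br.ReadBytes(60))
                Console.WriteLine("{0} {1} {2}", category, code, text)
            End While
        End Using
    End Sub
End Module
```

Doing the trimming on the bytes, before decoding, means the resulting string never contains a 0x00 character in the first place.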

score 0 · Accepted Answer

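If the null characters are used as right padding (i.e. termination) of the text, which would be the normal case, this is fairly easy: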

    Dim strText As String = encASCII.GetString(byText)
    Dim strlen As Integer = strText.IndexOf(Chr(0))
    If strlen <> -1 Then
        ' Keep only the characters before the first null.
        strText = strText.Substring(0, strlen)
    End If

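If not, you can still do a normal Replace on the string. It is slightly "cleaner", though, to trim the byte array before converting it to a string. The principle remains the same.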

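    ' Search for a Byte value; a boxed Integer 0 would not equal any Byte element.
    Dim strlen As Integer = Array.IndexOf(byText, CByte(0))
    If strlen = -1 Then
        strlen = byText.Length   ' no null found: use the whole array
    End If
    Dim strText As String = encASCII.GetString(byText, 0, strlen)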
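The Replace route mentioned above might look like the following. It is a minimal sketch, with the module and function names (`NullCleanup`, `RemoveNulls`, `TrimNulls`) invented for illustration. The first function drops every 0x00 wherever it occurs; the second strips only trailing padding.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module NullCleanup
    ' Remove every 0x00 character, wherever it occurs in the field.
    Function RemoveNulls(ByVal raw As Byte()) As String
        Dim s As String = Encoding.ASCII.GetString(raw)
        Return s.Replace(vbNullChar, String.Empty)
    End Function

    ' Remove only trailing 0x00 padding; embedded nulls are kept.
    Function TrimNulls(ByVal raw As Byte()) As String
        Return Encoding.ASCII.GetString(raw).TrimEnd(ControlChars.NullChar)
    End Function
End Module
```

Which of the two is appropriate depends on whether zeros can legitimately appear in the middle of the field, which is why knowing the file format matters.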

score 0 · Accepted Answer

You can use a structure to load the data:

    // Sequential layout with Pack = 1 gives byte offsets 0, 1 and 2.
    [System.Runtime.InteropServices.StructLayout(
        System.Runtime.InteropServices.LayoutKind.Sequential,
        Pack = 1,
        CharSet = System.Runtime.InteropServices.CharSet.Ansi)]
    internal struct TextFileRecord
    {
        public byte Category;
        public byte Code;
        // Inline, fixed-length character field of 60 bytes.
        [System.Runtime.InteropServices.MarshalAs(System.Runtime.InteropServices.UnmanagedType.ByValTStr, SizeConst = 60)]
        public string Text;
    }

You have to adjust the marshalling attributes (the UnmanagedType argument and the CharSet) to fit your string encoding.
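The structure approach can be used from VB.NET as well. The following is a sketch under stated assumptions, not a definitive implementation: the sequential layout with `Pack:=1`, the `ByValTStr` field, and the helper name `FromBytes` are choices made for this example. `CharSet.Ansi` decodes with the system ANSI code page rather than strict ASCII, and a field that is completely filled, with no terminating zero, may lose its last character when marshalled this way, so it is worth checking against real data.

```vbnet
Imports System
Imports System.Runtime.InteropServices

Module RecordMarshalling
    ' 62-byte record: 1 byte Category, 1 byte Code, 60 bytes of text.
    <StructLayout(LayoutKind.Sequential, Pack:=1, CharSet:=CharSet.Ansi)>
    Public Structure TextFileRecord
        Public Category As Byte
        Public Code As Byte
        <MarshalAs(UnmanagedType.ByValTStr, SizeConst:=60)>
        Public Text As String
    End Structure

    ' Hypothetical helper: copy one raw record into the structure.
    Function FromBytes(ByVal raw As Byte()) As TextFileRecord
        If raw.Length < Marshal.SizeOf(GetType(TextFileRecord)) Then
            Throw New ArgumentException("Record is too short.", "raw")
        End If
        ' Pin the array so the marshaller can read from a stable address.
        Dim handle As GCHandle = GCHandle.Alloc(raw, GCHandleType.Pinned)
        Try
            Return CType(Marshal.PtrToStructure(handle.AddrOfPinnedObject(),
                                                GetType(TextFileRecord)), TextFileRecord)
        Finally
            handle.Free()
        End Try
    End Function
End Module
```

With the reader from the question, a record would then be loaded as `RecordMarshalling.FromBytes(br.ReadBytes(62))`; the marshaller stops at the first zero, so the padding does not end up in `Text`.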
