
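A text string contains invisible consecutive dots (..). I can only see them when I view the file as binary; when I open it in Vim, they do not show up. These dots break further parsing, so I want to replace them. Will

 replaceAll("\\.","dot") 

handle the invisible characters?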

Update: the hex dump of "ATA ..Buffer" is 41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20
I think 09 is the hex value of the "dot".


2 Answers

41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20

Assuming plain ASCII (which in this case is the same as assuming UTF-8), as a Java string this is

"ATA    \t\tBuffer "

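Note that \t stands for the horizontal tab character (byte 0x09). Hex viewers typically render non-printable bytes such as this as ".", which is why the dots appear only in the binary view. The string contains no literal "." characters, so replaceAll("\\.","dot") matches nothing; the characters to replace are the tabs.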
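
To make this concrete, here is a minimal sketch that decodes the hex dump from the question and replaces the tabs. The class name HiddenTabDemo, the helper names fromHexDump and replaceTabs, and the "<TAB>" placeholder are all made up for illustration; ASCII input is assumed.

```java
import java.nio.charset.StandardCharsets;

final class HiddenTabDemo {

    /** Decodes a space-separated hex dump such as "41 54 41" into a String (ASCII assumed). */
    static String fromHexDump(String dump) {
        String[] parts = dump.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    /** Replaces every horizontal tab (0x09) with a visible placeholder. */
    static String replaceTabs(String s, String placeholder) {
        // String.replace works on literal text, so no regex escaping is needed.
        return s.replace("\t", placeholder);
    }

    public static void main(String[] args) {
        String s = fromHexDump("41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20");

        // The regex "\\." matches only a literal '.', and the string has none,
        // so this call returns the string unchanged.
        System.out.println(s.replaceAll("\\.", "dot").equals(s));

        // Replacing the tab character is what actually touches the hidden bytes.
        System.out.println(replaceTabs(s, "<TAB>"));
    }
}
```

If the tabs are just whitespace noise for the parser, s.replace("\t", " ") or s.replaceAll("\\s+", " ") may be a better fit than a visible placeholder.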

answered 2013-08-04T09:15:24.817

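I use the following two methods: toPrintable converts the raw string to a printable string, and fromPrintable converts it back.

I include the percent sign in the conversion because I sometimes want to use the converted string as part of a format string; escaping it keeps the original percent signs from being confused with formatting percent signs.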

/**
 * Converts a string containing control characters to a printable string.
 * Control characters are replaced by \hh, where hh is the hexadecimal
 * representation. The backslash and percent sign are also converted to
 * hexadecimal.
 * 
 * @param raw
 *            The input string to be converted.
 * 
 * @return the printable form of the input, or an empty string if the
 *         input is null.
 */
public static String toPrintable(final String raw) {
    final StringBuilder sb = new StringBuilder();

    if (raw == null) {
        return "";
    }

    for (final char c : raw.toCharArray()) {
        if ((c <= 31) || (c == 127) || (c == '\\') || (c == '%')) {
            sb.append(String.format("\\%02X", (int) c));
        } else {
            sb.append(c);
        }
    }

    /*
     * If the last character is a space, convert it to hexadecimal, to avoid
     * losing it.
     */
    if (raw.endsWith(" ")) {
        sb.setLength(sb.length() - 1);
        sb.append("\\20");
    }

    return sb.toString();
}

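// Lookup tables used by fromPrintable. The original answer does not show
// them; these definitions are an assumption. Each entry of validHexValues
// is the numeric value of the digit at the same index in validHexDigits.
private static final String validHexDigits = "0123456789ABCDEFabcdef";
private static final int[] validHexValues = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
        10, 11, 12, 13, 14, 15, 10, 11, 12, 13, 14, 15 };

/**
 * Converts a string containing coded control characters to the original
 * string. Control characters are represented by \hh, where hh is the
 * hexadecimal representation. The backslash and percent sign are also
 * represented as hexadecimal.
 * 
 * @param t
 *            The converted string to be restored.
 * @return The original string.
 */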
public static String fromPrintable(final String t) {
    final StringBuilder sb = new StringBuilder();

    final int tLength = t.length();
    boolean error = false;

    for (int i = 0; i < tLength; i++) {
        if (t.charAt(i) == '\\') {
            if ((i + 1) < tLength) {
                if (t.charAt(i + 1) == '\\') {
                    sb.append(t.charAt(i++));
                } else {
                    if (i < (tLength - 2)) {
                        final int v1 = validHexDigits.indexOf(t.charAt(i + 1));
                        final int v2 = validHexDigits.indexOf(t.charAt(i + 2));
                        i += 2;
                        if ((v1 < 0) || (v2 < 0)) {
                            error = true;
                        } else {
                            final char cc = (char) ((validHexValues[v1] << 4) + validHexValues[v2]);
                            sb.append(cc);
                        }
                    } else {
                        error = true;
                    }
                }
            } else {
                error = true;
            }
        } else {
            sb.append(t.charAt(i));
        }
    }

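    if (error) {
        // log is assumed to be a logger field defined elsewhere that
        // accepts printf-style arguments; it is not shown in this answer.
        log.warn("fromPrintable: Invalid input [%s]", t);
    }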
    return sb.toString();
}
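
As a rough illustration of the round trip, here is a compact, self-contained sketch of the same idea. It is not the code above. It uses Character.digit instead of lookup tables, throws IllegalArgumentException on malformed input instead of logging, and does not treat a doubled backslash specially. The class name PrintableRoundTrip is made up for the example.

```java
final class PrintableRoundTrip {

    /** Escapes control characters, DEL, backslash, percent and a trailing space as \HH. */
    static String toPrintable(String raw) {
        if (raw == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean trailingSpace = (c == ' ') && (i == raw.length() - 1);
            if (c <= 31 || c == 127 || c == '\\' || c == '%' || trailingSpace) {
                sb.append(String.format("\\%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverses toPrintable; throws IllegalArgumentException on a malformed escape. */
    static String fromPrintable(String t) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            // A backslash must be followed by exactly two hex digits.
            if (i + 2 >= t.length()) {
                throw new IllegalArgumentException("Truncated escape in [" + t + "]");
            }
            int hi = Character.digit(t.charAt(i + 1), 16);
            int lo = Character.digit(t.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex escape in [" + t + "]");
            }
            sb.append((char) ((hi << 4) | lo));
            i += 2; // skip the two hex digits just consumed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "ATA    \t\tBuffer ";
        String printable = toPrintable(raw);
        System.out.println(printable);
        System.out.println(fromPrintable(printable).equals(raw));

        // Because '%' is escaped, the result can be embedded in a format string.
        System.out.println(String.format(toPrintable("50%") + " of %d", 4));
    }
}
```

Without the percent escaping, the last call would pass the raw "%" to String.format, which would try to parse it as a format specifier.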
answered 2013-08-04T19:55:03.590