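A text string contains invisible consecutive dots (..). I can only see them when I view the file as binary; if I open it in Vim, I don't see them. These dots break further parsing, and I want to replace them. Will
replaceAll("\\.","dot")
handle the invisible characters?
Update: the hex dump of "ATA ..Buffer" is 41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20
I think 09 is the hex value of the "dot".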
41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20
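Assuming plain ASCII (which in this case is the same as assuming UTF-8), this is, as a Java string:
"ATA    \t\tBuffer "
Note that \t represents the horizontal tab (0x09). A binary viewer typically displays non-printable bytes as dots, so the file contains no actual period characters (0x2E). replaceAll("\\.","dot") therefore has nothing to match; match "\t" instead.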
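As a minimal sketch of that point, the following decodes the dumped bytes by hand and compares the two replacements. The class name TabDemo and the helper replaceTabs are made up for this illustration and are not part of the original post.

```java
public class TabDemo {
    /** Replaces every horizontal tab (0x09) with the given replacement text. */
    public static String replaceTabs(String input, String replacement) {
        // String.replace performs literal (non-regex) replacement of all occurrences.
        return input.replace("\t", replacement);
    }

    public static void main(String[] args) {
        // Bytes 41 54 41 20 20 20 20 09 09 42 75 66 66 65 72 20 decoded as ASCII.
        String s = "ATA    \t\tBuffer ";

        // The regex \. only matches a literal period (0x2E), so nothing changes.
        System.out.println(s.replaceAll("\\.", "dot").equals(s)); // true

        // Matching the tab character itself does work.
        System.out.println("[" + replaceTabs(s, "dot") + "]"); // [ATA    dotdotBuffer ]
    }
}
```

If the replacement should also cover other whitespace or control characters, a regex character class such as "[\\t]" passed to replaceAll could be extended accordingly.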
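I use the following two methods: toPrintable converts the raw string to a printable string, and fromPrintable converts it back.
I include the percent sign in the conversion because I sometimes want to use the converted string as part of a format string; encoding it prevents the original percent signs from being confused with formatting percent signs.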
/**
 * Converts a string containing control characters to a printable string.
 * Control characters are replaced by \hh, where hh is the hexadecimal
 * representation. The backslash and percent sign are also converted to
 * hexadecimal.
 *
 * @param raw
 *            The input string to be converted.
 *
 * @return The printable form of the input, or an empty string if the input
 *         is null.
 */
public static String toPrintable(final String raw) {
    if (raw == null) {
        return "";
    }
    final StringBuilder sb = new StringBuilder();
    for (final char c : raw.toCharArray()) {
        if ((c <= 31) || (c == 127) || (c == '\\') || (c == '%')) {
            sb.append(String.format("\\%02X", (int) c));
        } else {
            sb.append(c);
        }
    }
    /*
     * If the last character is a space, convert it to hexadecimal, to avoid
     * losing it.
     */
    if (raw.endsWith(" ")) {
        sb.setLength(sb.length() - 1);
        sb.append("\\20");
    }
    return sb.toString();
}
/**
 * Converts a string containing coded control characters back to the
 * original string. Control characters are represented by \hh, where hh is
 * the hexadecimal representation. The backslash and percent sign are also
 * represented as hexadecimal.
 *
 * @param t
 *            The converted string to be restored.
 * @return The original string.
 */
public static String fromPrintable(final String t) {
    final StringBuilder sb = new StringBuilder();
    final int tLength = t.length();
    boolean error = false;
    for (int i = 0; i < tLength; i++) {
        if (t.charAt(i) == '\\') {
            if ((i + 1) < tLength) {
                if (t.charAt(i + 1) == '\\') {
                    // A doubled backslash stands for one literal backslash.
                    sb.append(t.charAt(i++));
                } else {
                    if (i < (tLength - 2)) {
                        // Character.digit returns -1 for a non-hex character;
                        // it stands in for the validHexDigits/validHexValues
                        // lookup tables, whose declarations were not shown.
                        final int v1 = Character.digit(t.charAt(i + 1), 16);
                        final int v2 = Character.digit(t.charAt(i + 2), 16);
                        i += 2;
                        if ((v1 < 0) || (v2 < 0)) {
                            error = true;
                        } else {
                            final char cc = (char) ((v1 << 4) + v2);
                            sb.append(cc);
                        }
                    } else {
                        error = true;
                    }
                }
            } else {
                error = true;
            }
        } else {
            sb.append(t.charAt(i));
        }
    }
    if (error) {
        // log is the enclosing class's logger; its declaration is not shown.
        log.warn("fromPrintable: Invalid input [%s]", t);
    }
    return sb.toString();
}
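To show how the two conversions fit together on the string from the question, here is a compact, self-contained round-trip sketch. It is not the code above: the class name PrintableRoundTrip and the methods encode and decode are hypothetical, the trailing-space rule is folded into the main loop, and malformed input raises an exception instead of being logged, so that the example needs no logger.

```java
public class PrintableRoundTrip {
    /** Encodes control characters, backslash, percent and a trailing space as \hh. */
    public static String encode(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean trailingSpace = (c == ' ') && (i == raw.length() - 1);
            if (c <= 31 || c == 127 || c == '\\' || c == '%' || trailingSpace) {
                sb.append(String.format("\\%02X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Decodes every \hh sequence; throws on malformed input (an assumption of this sketch). */
    public static String decode(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i + 2 >= text.length()) {
                throw new IllegalArgumentException("Truncated escape in: " + text);
            }
            int hi = Character.digit(text.charAt(i + 1), 16);
            int lo = Character.digit(text.charAt(i + 2), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid escape in: " + text);
            }
            sb.append((char) ((hi << 4) + lo));
            i += 2; // skip the two hex digits just consumed
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "ATA    \t\tBuffer ";
        String printable = encode(raw);
        System.out.println(printable);                     // ATA    \09\09Buffer\20
        System.out.println(decode(printable).equals(raw)); // true
    }
}
```

The printable form makes the two tabs and the trailing space visible, which is usually enough to diagnose this kind of parsing problem without a hex dump.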