This class should inspect currentFile and detect its encoding. If the result is UTF-8, it returns true.
Running it produces: java.lang.OutOfMemoryError: Java heap space
To read the data, I use Files.readAllBytes(path) from JDK 7.
Code:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class EncodingsCheck implements Checker {

    @Override
    public boolean check(File currentFile) {
        return isUTF8(currentFile);
    }

    public static boolean isUTF8(File file) {
        // validate input
        if (null == file) {
            throw new IllegalArgumentException("input file can't be null");
        }
        if (file.isDirectory()) {
            throw new IllegalArgumentException(
                    "input file refers to a directory");
        }

        // read input file
        byte[] buffer;
        try {
            buffer = readUTFHeaderBytes(file);
        } catch (IOException e) {
            throw new IllegalArgumentException(
                    "Can't read input file, error = " + e.getLocalizedMessage());
        }

        // an empty file has no bytes to inspect and is trivially valid
        if (buffer.length == 0) {
            return true;
        }

        // only the first character of the file is examined
        if (0 == (buffer[0] & 0x80)) {
            return true; // ASCII subset character, fast path
        } else if (0xF0 == (buffer[0] & 0xF8)) { // start of 4-byte sequence
            if (buffer.length < 4) {
                return false; // sequence is truncated
            }
            if ((0x80 == (buffer[1] & 0xC0)) && (0x80 == (buffer[2] & 0xC0))
                    && (0x80 == (buffer[3] & 0xC0))) {
                return true;
            }
        } else if (0xE0 == (buffer[0] & 0xF0)) { // start of 3-byte sequence
            if (buffer.length < 3) {
                return false; // sequence is truncated
            }
            if ((0x80 == (buffer[1] & 0xC0)) && (0x80 == (buffer[2] & 0xC0))) {
                return true;
            }
        } else if (0xC0 == (buffer[0] & 0xE0)) { // start of 2-byte sequence
            if (buffer.length < 2) {
                return false; // sequence is truncated
            }
            if (0x80 == (buffer[1] & 0xC0)) {
                return true;
            }
        }
        return false;
    }

    private static byte[] readUTFHeaderBytes(File input) throws IOException {
        // read data; note that readAllBytes loads the entire file into memory
        Path path = Paths.get(input.getAbsolutePath());
        return Files.readAllBytes(path);
    }
}
Questions:
- How can I fix this?
- How can UTF-16 be checked in the same way (do we need to worry about this, or is it just a needless hassle)?
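
One possible direction for both questions, offered as a sketch and not a definitive fix: the check above only looks at the first few bytes, so the file does not have to be loaded whole. The class HeaderSniffer and its methods readHeaderBytes and utf16Bom are hypothetical names introduced for illustration. The first reads at most a fixed number of leading bytes; the second looks for a UTF-16 byte order mark in them. A file without a BOM can still be UTF-16, so a null result proves nothing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper class, not part of the original code.
class HeaderSniffer {

    // Reads at most `limit` bytes from the start of the file,
    // so memory use does not depend on the file size.
    static byte[] readHeaderBytes(Path path, int limit) throws IOException {
        byte[] buffer = new byte[limit];
        int total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            while (total < limit) {
                int n = in.read(buffer, total, limit - total);
                if (n < 0) {
                    break; // end of file reached before `limit` bytes
                }
                total += n;
            }
        }
        // Trim the buffer to the number of bytes actually read.
        return Arrays.copyOf(buffer, total);
    }

    // Returns "UTF-16BE" or "UTF-16LE" if the header starts with the
    // corresponding byte order mark, or null if no UTF-16 BOM is present.
    // Caveat: FF FE followed by 00 00 is also the UTF-32LE BOM.
    static String utf16Bom(byte[] header) {
        if (header.length >= 2) {
            int b0 = header[0] & 0xFF;
            int b1 = header[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
        }
        return null;
    }
}
```

With this shape, readUTFHeaderBytes could call readHeaderBytes(path, 4), since the longest UTF-8 sequence is four bytes. Validating the whole file would instead need a streaming pass over the content.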