
I'm wondering whether there is a "right" way to use Jackson to parse a JSON file containing one huge property, without loading the whole stream into memory. I need to keep memory usage low since this is an Android app. I'm not asking here how to parse a large JSON file on Android in general: the point is that one property is very large, while the other properties don't matter.

For example, say I have the following:

{
    "filename": "afilename.jpg",
    "data": "**Huge data here, about 20Mb base64 string**",
    "mime": "mimeType",
    "otherProperties": "..."
}

The data property could be extracted into a new file if needed (via an output stream or by other means), but I haven't managed to do this with Jackson. I'm open to using other libraries; I just figured Jackson would be ideal thanks to its streaming API.

Thanks


2 Answers


In the end, I managed to retrieve my huge chunk of data like this, where in is the input stream of the JSON file I want to extract the data from, and out is the file the data will be written to:

public boolean extrationContenuDocument(FileInputStream in, FileOutputStream out, FileInfo info) 
throws JsonParseException, IOException {

    SerializedString keyDocContent = new SerializedString("data");
    boolean isDone = false;

    JsonParser jp = this.jsonFactory.createJsonParser(in);

    // Let's move our inputstream cursor until the 'data' property is found
    while (!jp.nextFieldName(keyDocContent)) {
        Log.v("Traitement JSON", "Searching for 'data' property ...");
    }

    // Found it? OK, move the stream cursor to the beginning of its
    // content
    JsonToken current = jp.nextToken();

    // If the current token is not a String value, it means the 'data'
    // property was not found or its content is not valid => stop
    if (current == JsonToken.VALUE_STRING) {
        Log.v("Traitement JSON", "Property 'data' found");

        // Here it gets a little tricky: if the file is small enough, the
        // whole content of the 'data' property could have been read
        // directly instead of doing all of this
        if (info.getSize() > TAILLE_MIN_PETIT_FICHER) {
            Log.v("Traitement JSON", "the content of 'data' is too big to be read directly -> using buffered reading");

            // JsonParser reads through an internal buffer, so some of the
            // data may already have been consumed from the stream; fetch it
            ByteArrayOutputStream debutDocStream = new ByteArrayOutputStream();
            int premierePartieRead = jp.releaseBuffered(debutDocStream);
            byte[] debutDoc = debutDocStream.toByteArray();

            // Write the head of the 'data' content: this is what the
            // JsonParser already read from the input stream when
            // jp.nextToken() was called
            Log.v("Traitement JSON", "Write the head");
            out.write(debutDoc);

            // Now we need to write the rest until we find the tail of the
            // content of the 'data' property
            Log.v("Traitement JSON", "Write the middle");

            // Prepare a buffer to continue reading the input stream
            byte[] buffer = new byte[TAILLE_BUFFER_GROS_FICHER];

            // The escape char that determines where to stop reading will be "
            byte endChar = (byte) '"';

            // Fetch me some bytes from the inputstream
            int bytesRead = in.read(buffer);
            int bytesBeforeEndChar = 0;

            int deuxiemePartieRead = 0;
            boolean isDocContentFin = false;

            // Are we at the end of the 'data' property? Keep writing its
            // content as long as we are not
            while ((bytesRead > 0) && !isDocContentFin) {
                bytesBeforeEndChar = 0;

                // Since a buffer is used, the closing quote could be
                // anywhere inside it; scan only the bytes actually read
                // (the rest of the buffer may hold stale data from a
                // previous pass)
                for (int i = 0; i < bytesRead; i++) {
                    if (buffer[i] != endChar) {
                        bytesBeforeEndChar++;
                    } else {
                        isDocContentFin = true;
                        break;
                    }
                }

                if (bytesRead > bytesBeforeEndChar) {
                    Log.v("Traitement JSON", "Write the tail");
                    out.write(buffer, 0, bytesBeforeEndChar);
                    deuxiemePartieRead += bytesBeforeEndChar;
                } else {
                    out.write(buffer, 0, bytesRead);
                    deuxiemePartieRead += bytesRead;
                }

                bytesRead = in.read(buffer);
            }

            Log.v("Traitement JSON", "Bytes read: " + (premierePartieRead + deuxiemePartieRead) + " (" + premierePartieRead + " head,"
                    + deuxiemePartieRead + " tail)");
            isDone = true;
        } else {
            Log.v("Traitement JSON", "File is small enough to be read directly");
            String contenuFichier = jp.getText();
            out.write(contenuFichier.getBytes());
            isDone = true;
        }
    } else {
        throw new JsonParseException("The property " + keyDocContent.getValue() + " couldn't be found in the Json Stream.", null);
    }
    jp.close();

    return isDone;
}

It's not pretty, but it works like a charm! @staxman, let me know what you think.

Edit:


This is now an implemented feature; see https://github.com/FasterXML/jackson-core/issues/14 and JsonParser.readBinaryValue().
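A minimal sketch of what that feature enables (assuming Jackson 2.1+, where readBinaryValue() was added; note that unlike the manual extraction above, this decodes the Base64 content into raw bytes as it streams):

```java
import java.io.*;
import com.fasterxml.jackson.core.*;

public class ExtractData {
    // Streams the Base64-decoded bytes of the "data" property straight
    // into `out`, without ever holding the whole value in memory.
    public static void extract(InputStream in, OutputStream out) throws IOException {
        JsonFactory factory = new JsonFactory();
        try (JsonParser jp = factory.createParser(in)) {
            while (jp.nextToken() != null) {
                if (jp.getCurrentToken() == JsonToken.FIELD_NAME
                        && "data".equals(jp.getCurrentName())) {
                    jp.nextToken();          // advance to the VALUE_STRING token
                    jp.readBinaryValue(out); // incrementally Base64-decode into out
                    return;
                }
            }
        }
    }
}
```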

Answered 2012-07-18T16:18:46.890

Edit: this is not a good answer for this question. It works when the sub-trees are objects to bind, but not when the problem is a single large Base64-encoded string.


If I understood the question correctly: yes, if your input consists of a sequence of JSON objects or arrays, you can read the file incrementally and still use data binding.

If so, you can use a JsonParser to advance the stream to the first object (its START_OBJECT token), and then use the data-binding methods on either JsonParser (JsonParser.readValueAs()) or ObjectMapper (ObjectMapper.readValue(JsonParser, type)).

Something like:

ObjectMapper mapper = new ObjectMapper();
JsonParser jp = mapper.getJsonFactory().createJsonParser(new File("file.json"));
while (jp.nextToken() != null) {
   MyPojo pojo = jp.readValueAs(MyPojo.class);
   // do something
}

(Note: depending on the exact structure of your JSON, you may need to skip some elements; when readValueAs() is called, the parser must be positioned at the START_OBJECT token that begins the object to bind.)
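For example, if the objects sit inside a top-level JSON array (an assumption for illustration; MyPojo here is a hypothetical stand-in for whatever type you bind to), positioning the parser might look like:

```java
import java.io.*;
import com.fasterxml.jackson.core.*;
import com.fasterxml.jackson.databind.*;

public class StreamBindExample {
    // Hypothetical POJO matching each element of the array
    public static class MyPojo {
        public String name;
    }

    public static int readAll(InputStream in) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        int count = 0;
        // A parser created from the mapper's factory has the mapper set as
        // its codec, which is what makes readValueAs() work
        try (JsonParser jp = mapper.getFactory().createParser(in)) {
            if (jp.nextToken() == JsonToken.START_ARRAY) {
                // Each nextToken() now lands on the START_OBJECT of one element
                while (jp.nextToken() == JsonToken.START_OBJECT) {
                    MyPojo pojo = jp.readValueAs(MyPojo.class);
                    count++; // do something with pojo
                }
            }
        }
        return count;
    }
}
```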

Or, even simpler, you can use readValues from ObjectReader:

ObjectReader r = mapper.reader(MyPojo.class);
MappingIterator<MyPojo> it = r.readValues(new File("file.json"));
while (it.hasNextValue()) {
   MyPojo pojo = it.nextValue();
   // do something with it
}

In both cases, the Jackson data binder reads only as many JSON tokens as are needed to produce a single object (a MyPojo, or whatever your type is). The JsonParser itself only needs enough memory to hold information about a single JSON token.

Answered 2012-07-16T17:47:12.787