
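I plan to use Java to process Markdown text files that specify additional meta information, such as title, author, and creation date, in a YAML block at the beginning of the document. Here is an example: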

---
title: An example document
author: Paul
created: 2013-05-19
---

The _body_ of this document is
written in **Markdown**.

To parse the YAML data I can use SnakeYAML. As far as I know, you can load a YAML document from a java.io.InputStream, a java.io.Reader, or a String with the methods yaml.load() and yaml.loadAll() (see the SnakeYAML documentation and API).

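I don't want to use the version that reads from a String, because that would cause performance problems with large files. But processing the file as an InputStream fails, because the stream as a whole is not a valid YAML document; only its first part is.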

So my question is: how can I use java.io.FilterInputStream / java.io.FilterReader, or some other approach, to produce a stream that stops after the second ---, so that the whole stream is valid YAML?
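One possible "other approach", sketched in plain Java under stated assumptions: the class FrontMatterReader and its helper frontMatterOf are hypothetical names; the metadata block must start on the very first line; only a line consisting exactly of --- counts as a divider (so hyphens inside values such as 2013-05-19 are ignored); and line endings are normalized to \n. It extends java.io.Reader directly rather than FilterReader, so inherited methods such as skip() go through the filtering read method instead of bypassing it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/**
 * Hypothetical reader that exposes only the YAML front matter of a document:
 * the opening "---" line and the lines after it, up to (but excluding) the
 * next line that is exactly "---". After that it reports end of stream.
 */
public class FrontMatterReader extends Reader {
    private final BufferedReader source;
    private String current = "";   // current line plus '\n', being handed out
    private int pos = 0;           // next index in current to return
    private int linesRead = 0;
    private int dividers = 0;      // "---" lines seen so far
    private boolean finished = false;

    public FrontMatterReader(Reader in) {
        this.source = new BufferedReader(in);
    }

    /** Loads the next line into current; returns false once the front matter has ended. */
    private boolean nextLine() throws IOException {
        if (finished) {
            return false;
        }
        String line = source.readLine();
        // Stop at end of input, or immediately if the document has no front matter.
        if (line == null || (linesRead == 0 && !line.equals("---"))) {
            finished = true;
            return false;
        }
        linesRead++;
        // The second divider line ends the front matter and is not passed through.
        if (line.equals("---") && ++dividers == 2) {
            finished = true;
            return false;
        }
        current = line + "\n";
        pos = 0;
        return true;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (off < 0 || len < 0 || len > cbuf.length - off) {
            throw new IndexOutOfBoundsException();
        }
        if (len == 0) {
            return 0;
        }
        // Advance until there is an unread character or the front matter is over.
        while (pos >= current.length()) {
            if (!nextLine()) {
                return -1;
            }
        }
        int n = Math.min(len, current.length() - pos);
        current.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    /** Convenience helper: returns the front matter of an in-memory document. */
    public static String frontMatterOf(String document) {
        try (Reader r = new FrontMatterReader(new StringReader(document))) {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[256];
            for (int n; (n = r.read(chunk, 0, chunk.length)) != -1; ) {
                sb.append(chunk, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "---\ntitle: An example document\nauthor: Paul\ncreated: 2013-05-19\n---\n\n"
                + "The _body_ of this document is\nwritten in **Markdown**.\n";
        System.out.print(frontMatterOf(doc));
    }
}
```

With SnakeYAML, such a reader could be handed to yaml.load(Reader), for example wrapping an InputStreamReader over the file with an explicit charset. Because the internal BufferedReader reads ahead, the wrapped reader is not left positioned at the start of the Markdown body, so it should not be reused to read the body afterwards.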


2 Answers


Add "..." (three dots, YAML's document end marker) at the point where you want the YAML parser to stop.

answered 2013-05-21T12:32:49.590

Here is my solution (Scala code):

import java.io.InputStreamReader
import java.io.InputStream
import java.nio.charset.Charset

import scala.collection.mutable.Queue

/**
 * Reader for Metadata that is contained in the given `InputStream`.
 *
 * @constructor Create a new metadata reader with a given `Charset`.
 * @param in underlying input stream
 * @param charset encoding of the stream
 */
class MetadataReader(in: InputStream, charset: Charset)
    extends InputStreamReader(in, charset) {
  private val lookahead = Queue.empty[Int] // buffer for looking ahead
  private var afterNewline = true // indicates that the last char was a newline
  private var divider = 0 // number of divider characters in a row ('-')

  /**
   * Create a new MetadataReader with the system's default `Charset`.
   *
   * @param in underlying input stream
   */
  def this(in: InputStream) = this(in, Charset.defaultCharset())

  /**
   * Read the next character.
   *
   * @return next character
   */
  override def read: Int =
    if (divider == 2) {
      -1
    } else if (!lookahead.isEmpty) {
      lookahead.dequeue
    } else {

      // read next character
      def readNext: Int =
        if (lookahead.length == 3) {
          divider += 1
          read
        } else {
          val c = super.read
          if (c == '-') {
            lookahead.enqueue(c)
            readNext
          } else {
            lookahead.enqueue(c)
            lookahead.dequeue
          }
        }

      readNext
    }

  /**
   * Read characters into a buffer character array.
   *
   * @param buf buffer array
   * @param off offset to start in the array
   * @param len number of characters to read
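   * @return number of characters read, or -1 at the end of the stream
   */
  override def read(buf: Array[Char], off: Int, len: Int): Int = {
    var j = 0
    var eof = false
    // fill buf(off) .. buf(off + len - 1), stopping early at the end of the stream
    while (j < len && !eof) {
      val c = read
      if (c == -1) {
        eof = true
      } else {
        buf(off + j) = c.toChar
        j += 1
      }
    }
    // Reader contract: return -1 only if no character could be read at all
    if (j == 0 && len > 0) -1 else j
  }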
}

You can use it like this:

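import java.io.{File, FileInputStream}
import java.nio.charset.Charset
import org.yaml.snakeyaml.Yaml

val yaml = new Yaml
val mr = new MetadataReader(new FileInputStream(
  new File("src/test/resources/yaml-test.txt")), Charset.forName("UTF-8"))
println(yaml.load(mr))
mr.close()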

Feedback is appreciated.

answered 2013-05-20T15:03:35.917