I'm new to jq, and I have the following code to get the list of values of every element named Abc:

["Abc"], ( .. | objects | select(has("Abc")) | [.["Abc"]] ) | @tsv

This is the current output I get:

"Abc"
"4"
"2"
"1"
"9"
"3"
"2"
"4"
"9"

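I would like to add 4 columns on the left to show the page, row, and column that each Abc value belongs to. In addition, if possible, the first column should be a counter running from 1 to the number of "Abc" elements.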

To clarify, the image below compares the current output with the desired output and the structure of the JSON file: [image]

The input JSON file is as follows:

{
  "document": {
    "page": [
      {
        "@index": "0",
        "image": {
          "Abc": "4"
        }
      },
      {
        "@index": "1",
        "row": [
          {
            "column": [
              {
                "text": {
                  "Abc": "2"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "1"
                }
              },
              {
                "text": {
                  "Abc": "9"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "3"
                }
              }
            ]
          }
        ]
      },
      {
        "@index": "2",
        "row": [
          {
            "column": [
              {
                "text": {
                  "Abc": "2"
                }
              }
            ]
          },
          {
            "column": [
              {
                "text": {
                  "Abc": "4"
                }
              },
              {
                "text": {
                  "Abc": "9"
                }
              }
            ]
          }
        ]
      }
    ]
  }
}

I hope someone can help me. Thanks in advance.

2 Answers

The irregularity of the input data makes the requirements somewhat opaque, but the following produces the desired output.

["counter", "page", "row", "column", "Abc"],
(foreach (.document.page[] | objects) as $page ({page: -1, counter: 0};
  .page += 1
  | if ($page | (has("image") and (.image|has("Abc"))))
    then
      .counter +=1
      | .out = [.counter, .page, null, null, ($page|.image.Abc)]
    else foreach ($page | .row[]?) as $row (.row=-1;
      .row += 1
      | foreach ($row | .column[]) as $column (.column=-1;
          .column +=1
          | foreach ($column | .text | objects) as $x (.;
              .counter += 1
              | .out = [.counter, .page, .row, .column, $x["Abc"]]
              ; . )
           ; . )
      ; . )
    end
    ; .out )
)
| @tsv

Output

Specifically, with the -r command-line option, the given input produces the following output (tab-separated):

counter page    row column  Abc
1   0           4
2   1   0   0   2
3   1   1   0   1
4   1   1   1   9
5   1   2   0   3
6   2   0   0   2
7   2   1   0   4
8   2   1   1   9
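
For readers who want to check the control flow outside jq, the nested foreach above can be sketched in Python. This is only an illustration and not part of the original answer: the function name abc_rows_nested and the SAMPLE document (a trimmed copy of the question's input) are assumptions made for the example.

```python
def abc_rows_nested(doc):
    """Return [counter, page, row, column, value] rows, mirroring the nested foreach."""
    rows = []
    counter = 0  # running counter shared by all pages, as in the jq state object
    # Like `.document.page[] | objects`: only object pages are numbered.
    pages = [p for p in doc["document"]["page"] if isinstance(p, dict)]
    for page_no, page in enumerate(pages):
        image = page.get("image")
        if isinstance(image, dict) and "Abc" in image:
            counter += 1
            rows.append([counter, page_no, None, None, image["Abc"]])
            continue  # mirrors the jq if/else: rows are not visited on image pages
        for row_no, row in enumerate(page.get("row", [])):
            for col_no, column in enumerate(row["column"]):
                text = column.get("text")
                if isinstance(text, dict):  # like `.text | objects`
                    counter += 1
                    rows.append([counter, page_no, row_no, col_no, text.get("Abc")])
    return rows


# Hypothetical sample: the first two pages of the question's input.
SAMPLE = {"document": {"page": [
    {"@index": "0", "image": {"Abc": "4"}},
    {"@index": "1", "row": [
        {"column": [{"text": {"Abc": "2"}}]},
        {"column": [{"text": {"Abc": "1"}}, {"text": {"Abc": "9"}}]},
    ]},
]}}

if __name__ == "__main__":
    for r in abc_rows_nested(SAMPLE):
        print(r)
```

As in the jq version, the page number comes from the position in the page array rather than from the "@index" field.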
answered 2019-05-31T00:19:09.913

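The following solution uses paths and has several advantages: it is concise, simple, and easy to adapt to data in other formats.

For clarity, we first define a function that adds the row numbers: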

# add a sequential id, starting at 1
def tsvRows(s):
  foreach s as $s (0; .+1; [.] + $s)
  | @tsv;

(["counter", "page", "row", "column", "Abc"] | @tsv),
tsvRows(paths as $p
  | select($p[-1] == "Abc")
  | getpath($p) as $v
  | $p
  | .[2] as $page
  | (if .[3] == "row" then .[4] else null end) as $row
  | (if .[5] == "column" then .[6] else null end) as $column
  | [$page, $row, $column, $v] )
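
The same path-based idea can be sketched in Python as a cross-check. This is a minimal illustration under stated assumptions, not the answer's own code: leaf_paths, abc_rows, to_tsv, cell, and DOC are hypothetical names, and the sketch walks only scalar leaves, whereas jq's paths emits every path in the document.

```python
def leaf_paths(node, prefix=()):
    """Yield (path, value) for every scalar leaf of a JSON-like structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from leaf_paths(child, prefix + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from leaf_paths(child, prefix + (index,))
    else:
        yield prefix, node


def abc_rows(doc):
    """Return [counter, page, row, column, value] for every leaf whose key is "Abc"."""
    hits = ((p, v) for p, v in leaf_paths(doc) if p and p[-1] == "Abc")
    rows = []
    # enumerate(..., start=1) plays the role of the foreach counter in tsvRows.
    for counter, (path, value) in enumerate(hits, start=1):
        page = path[2]  # ("document", "page", <page index>, ...)
        row = path[4] if len(path) > 4 and path[3] == "row" else None
        column = path[6] if len(path) > 6 and path[5] == "column" else None
        rows.append([counter, page, row, column, value])
    return rows


def to_tsv(row):
    """Join one row with tabs, rendering None as an empty field."""
    return "\t".join("" if v is None else str(v) for v in row)


def cell(value):
    """Hypothetical helper: build one column entry of the question's input."""
    return {"text": {"Abc": value}}


# The question's input document, written compactly.
DOC = {"document": {"page": [
    {"@index": "0", "image": {"Abc": "4"}},
    {"@index": "1", "row": [
        {"column": [cell("2")]},
        {"column": [cell("1"), cell("9")]},
        {"column": [cell("3")]},
    ]},
    {"@index": "2", "row": [
        {"column": [cell("2")]},
        {"column": [cell("4"), cell("9")]},
    ]},
]}}

if __name__ == "__main__":
    print(to_tsv(["counter", "page", "row", "column", "Abc"]))
    for r in abc_rows(DOC):
        print(to_tsv(r))
```

Because the location is read from fixed positions in each path, adapting the sketch to a different layout only requires changing those index checks, which is the adaptability the answer refers to.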
answered 2019-05-31T02:54:07.000