image-processing - Google Vision API text detection: layout information from bounding boxes

Question

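I am trying to run OCR on my images with the Google Vision API. The JSON output of the API call returns the recognized words together with their bounding box information.

Can someone tell me how to use this bounding box information to do layout analysis on my image?

Is there a library that takes this as input and returns sentences instead of words?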

{
  "description": "Ingredients:",
  "boundingPoly": {
    "vertices": [
      {
        "x": 14,
        "y": 87
      },
      {
        "x": 53,
        "y": 87
      },
      {
        "x": 53,
        "y": 98
      },
      {
        "x": 14,
        "y": 98
      }
    ]
  }
},
{
  "description": "Chicken",
  "boundingPoly": {
    "vertices": [
      {
        "x": 55,
        "y": 87
      },
      {
        "x": 77,
        "y": 87
      },
      {
        "x": 77,
        "y": 98
      },
      {
        "x": 55,
        "y": 98
      }
    ]
  }
},

For example, in the JSON above, the two words "Ingredients:" and "Chicken" are on the same line. Is there a library that gives me this information out of the box?

[Image: source image used for OCR]

score 1 · Accepted Answer

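Try rotating the image with Word or any other tool that lets you rotate it. In my case this produced the correct result: everything on a line was read consecutively.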

score 1 · Accepted Answer

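Several client libraries can be used to get sentences rather than individual words, and there are official samples on GitHub. For example, see the Go sample file detect.go, which contains the following function that prints the text block by block: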

import (
    "context"
    "fmt"
    "io"
    "strings"

    vision "cloud.google.com/go/vision/apiv1"
)

// detectDocumentTextURI gets the full document text from the Vision API for an image at the given URI.
func detectDocumentTextURI(w io.Writer, file string) error {
    ctx := context.Background()

    client, err := vision.NewImageAnnotatorClient(ctx)
    if err != nil {
        return err
    }

    image := vision.NewImageFromURI(file)
    annotation, err := client.DetectDocumentText(ctx, image, nil)
    if err != nil {
        return err
    }

    if annotation == nil {
        fmt.Fprintln(w, "No text found.")
    } else {
        fmt.Fprintln(w, "Document Text:")
        fmt.Fprintf(w, "%q\n", annotation.Text)

        fmt.Fprintln(w, "Pages:")
        for _, page := range annotation.Pages {
            fmt.Fprintf(w, "\tConfidence: %f, Width: %d, Height: %d\n", page.Confidence, page.Width, page.Height)
            fmt.Fprintln(w, "\tBlocks:")
            for _, block := range page.Blocks {
                fmt.Fprintf(w, "\t\tConfidence: %f, Block type: %v\n", block.Confidence, block.BlockType)
                fmt.Fprintln(w, "\t\tParagraphs:")
                for _, paragraph := range block.Paragraphs {
                    fmt.Fprintf(w, "\t\t\tConfidence: %f\n", paragraph.Confidence)
                    fmt.Fprintln(w, "\t\t\tWords:")
                    for _, word := range paragraph.Words {
                        symbols := make([]string, len(word.Symbols))
                        for i, s := range word.Symbols {
                            symbols[i] = s.Text
                        }
                        wordText := strings.Join(symbols, "")
                        fmt.Fprintf(w, "\t\t\t\tConfidence: %f, Symbols: %s\n", word.Confidence, wordText)
                    }
                }
            }
        }
    }

    return nil
}
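
If you only have the per-word bounding boxes shown in the question, a rough way to recover lines is to group words whose vertical centers fall inside the same y-range and then order each group left to right. The sketch below is one possible heuristic, not part of the Vision API or its client library: the `Vertex`, `BoundingPoly`, and `Word` types simply mirror the JSON shape in the question, and `newWord` and `groupIntoLines` are hypothetical helper names. It assumes the text is roughly horizontal and not rotated.

```go
package main

import (
	"sort"
	"strings"
)

// Vertex, BoundingPoly and Word mirror the JSON in the question.
// They are illustrative types, not the client library's own.
type Vertex struct {
	X int `json:"x"`
	Y int `json:"y"`
}

type BoundingPoly struct {
	Vertices []Vertex `json:"vertices"`
}

type Word struct {
	Description  string       `json:"description"`
	BoundingPoly BoundingPoly `json:"boundingPoly"`
}

// newWord is a hypothetical helper that builds a Word from an
// axis-aligned rectangle (top-left x0,y0 to bottom-right x1,y1).
func newWord(text string, x0, y0, x1, y1 int) Word {
	return Word{
		Description: text,
		BoundingPoly: BoundingPoly{Vertices: []Vertex{
			{x0, y0}, {x1, y0}, {x1, y1}, {x0, y1},
		}},
	}
}

// bounds returns the left edge and the vertical extent of a word's box.
func bounds(w Word) (minX, minY, maxY int) {
	vs := w.BoundingPoly.Vertices
	if len(vs) == 0 {
		return 0, 0, 0
	}
	minX, minY, maxY = vs[0].X, vs[0].Y, vs[0].Y
	for _, v := range vs[1:] {
		if v.X < minX {
			minX = v.X
		}
		if v.Y < minY {
			minY = v.Y
		}
		if v.Y > maxY {
			maxY = v.Y
		}
	}
	return minX, minY, maxY
}

// groupIntoLines groups words into text lines. A word joins the current
// line when its vertical center lies within the y-range of that line's
// first word; otherwise it starts a new line.
func groupIntoLines(words []Word) []string {
	type box struct {
		text             string
		minX, minY, maxY int
	}
	boxes := make([]box, 0, len(words))
	for _, w := range words {
		minX, minY, maxY := bounds(w)
		boxes = append(boxes, box{w.Description, minX, minY, maxY})
	}
	// Sort top to bottom by vertical center (minY+maxY is twice the center).
	sort.SliceStable(boxes, func(i, j int) bool {
		return boxes[i].minY+boxes[i].maxY < boxes[j].minY+boxes[j].maxY
	})

	var lines [][]box
	for _, b := range boxes {
		center2 := b.minY + b.maxY
		if n := len(lines); n > 0 {
			first := lines[n-1][0]
			if center2 >= 2*first.minY && center2 <= 2*first.maxY {
				lines[n-1] = append(lines[n-1], b)
				continue
			}
		}
		lines = append(lines, []box{b})
	}

	out := make([]string, 0, len(lines))
	for _, line := range lines {
		// Order the words within a line from left to right.
		sort.SliceStable(line, func(i, j int) bool { return line[i].minX < line[j].minX })
		parts := make([]string, len(line))
		for i, b := range line {
			parts[i] = b.text
		}
		out = append(out, strings.Join(parts, " "))
	}
	return out
}
```

With the two words from the question (both spanning y 87 to 98), `groupIntoLines` returns a single line, "Ingredients: Chicken". For skewed or multi-column images this simple heuristic will likely need tuning, and the block and paragraph structure returned by document text detection may be a better starting point.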
