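I'm using the Azure Cognitive Services Computer Vision API, but I'm having trouble figuring out how to process the results. My use case: I have an image that is a photo of an event calendar for a particular month. I run the image through the Computer Vision API OCR method
and get back JSON made up of regions, lines, and words, each with a bounding box. I'm struggling to find a way to "group" these items into the format I need. Here is the image, followed by the JSON returned for it: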
https://drive.google.com/file/d/12dO0vIjGNQ8_nARTQbFHmaLNQNOWBC2x/view?usp=sharing
{
"textAngle": 0.0,
"orientation": "NotDetected",
"language": "en",
"regions": [
{
"boundingBox": "727,56,1692,119",
"lines": [
{
"boundingBox": "727,56,1692,119",
"words": [
{
"boundingBox": "727,57,727,118",
"text": "CHILDREN!S"
},
{
"boundingBox": "1576,58,583,111",
"text": "JANUARY"
},
{
"boundingBox": "2280,56,139,114",
"text": "20"
}
]
}
]
},
{
"boundingBox": "361,265,159,42",
"lines": [
{
"boundingBox": "361,265,159,42",
"words": [
{
"boundingBox": "361,265,159,42",
"text": "Sunday"
}
]
}
]
},
{
"boundingBox": "279,593,298,1261",
"lines": [
{
"boundingBox": "279,593,17,26",
"words": [
{
"boundingBox": "279,593,17,26",
"text": "7"
}
]
},
{
"boundingBox": "280,633,203,33",
"words": [
{
"boundingBox": "280,633,102,33",
"text": "Library"
},
{
"boundingBox": "394,634,89,32",
"text": "Open"
}
]
},
{
"boundingBox": "282,675,124,32",
"words": [
{
"boundingBox": "282,675,7,26",
"text": "1"
},
{
"boundingBox": "307,675,37,26",
"text": "-5"
},
{
"boundingBox": "356,681,50,26",
"text": "pm"
}
]
},
{
"boundingBox": "280,716,252,31",
"words": [
{
"boundingBox": "280,716,71,25",
"text": "New"
},
{
"boundingBox": "360,716,73,25",
"text": "Year"
},
{
"boundingBox": "444,716,88,31",
"text": "Open"
}
]
},
{
"boundingBox": "281,757,96,26",
"words": [
{
"boundingBox": "281,757,96,26",
"text": "House"
}
]
},
{
"boundingBox": "280,797,297,27",
"words": [
{
"boundingBox": "280,797,67,27",
"text": "Start"
},
{
"boundingBox": "357,797,55,26",
"text": "The"
},
{
"boundingBox": "424,797,71,26",
"text": "New"
},
{
"boundingBox": "503,797,74,26",
"text": "Year"
}
]
},
{
"boundingBox": "281,836,286,34",
"words": [
{
"boundingBox": "281,837,77,33",
"text": "Right"
},
{
"boundingBox": "367,837,25,26",
"text": "@"
},
{
"boundingBox": "401,837,51,26",
"text": "the"
},
{
"boundingBox": "463,836,104,33",
"text": "Library"
}
]
},
{
"boundingBox": "281,878,110,32",
"words": [
{
"boundingBox": "281,878,48,26",
"text": "1-5"
},
{
"boundingBox": "341,885,50,25",
"text": "pm"
}
]
},
{
"boundingBox": "282,976,34,25",
"words": [
{
"boundingBox": "282,976,34,25",
"text": "14"
}
]
},
{
"boundingBox": "281,1034,223,33",
"words": [
{
"boundingBox": "281,1034,103,33",
"text": "Library"
},
{
"boundingBox": "395,1034,109,26",
"text": "Closed"
}
]
}
]
}
]
}
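Each `boundingBox` in this OCR response is a comma-separated string giving the left edge, top edge, width, and height in pixels. Any grouping strategy needs positioned text first, so here is a minimal Python sketch that flattens regions and lines into one list of text lines with parsed boxes. `OcrLine`, `parse_box`, and `flatten_lines` are hypothetical names, not part of the Azure SDK.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrLine:
    left: int
    top: int
    width: int
    height: int
    text: str

    @property
    def right(self) -> int:
        return self.left + self.width

    @property
    def bottom(self) -> int:
        return self.top + self.height


def parse_box(box: str) -> Tuple[int, int, int, int]:
    # boundingBox is "left,top,width,height" in pixels
    left, top, width, height = (int(v) for v in box.split(","))
    return left, top, width, height


def flatten_lines(ocr: dict) -> List[OcrLine]:
    """Flatten regions -> lines, joining each line's words with spaces."""
    result = []
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            text = " ".join(w["text"] for w in line.get("words", []))
            result.append(OcrLine(*parse_box(line["boundingBox"]), text))
    # Reading order: top-to-bottom, then left-to-right
    result.sort(key=lambda l: (l.top, l.left))
    return result
```

Usage would be something like `flatten_lines(json.loads(response_body))`, where `response_body` is a placeholder for the raw JSON text returned by the OCR call.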
For example, I'd like to be able to group all of the lowest-level words by date:
7: Library Open 1 -5 pm New Year Open House Start The New Year Right @ the Library 1-5 pm
14: Library Closed
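One way to get this grouping from the sample is a sequential scan: treat any line whose entire text is a number from 1 to 31 as a day marker, and append every following line to the most recent marker. A minimal Python sketch, assuming each region's lines come back top-to-bottom (true in the sample above, but an assumption, not a documented guarantee) and that no event line consists of a bare number. `group_by_date` is a hypothetical helper name.

```python
import re
from typing import Dict, List, Optional

# A line that is only a day of the month, e.g. "7" or "14" (assumption)
DAY_RE = re.compile(r"^(?:[1-9]|[12][0-9]|3[01])$")


def group_by_date(ocr: dict) -> Dict[int, str]:
    """Group line text under the most recent line that is just a day number.

    Lines seen before any day number in a region (headers such as "Sunday")
    are dropped.
    """
    groups: Dict[int, List[str]] = {}
    for region in ocr.get("regions", []):
        # Reset per region so text never leaks into a previous region's day
        current: Optional[int] = None
        for line in region.get("lines", []):
            text = " ".join(w["text"] for w in line.get("words", [])).strip()
            if DAY_RE.match(text):
                current = int(text)
                groups.setdefault(current, [])
            elif current is not None:
                groups[current].append(text)
    return {day: " ".join(parts) for day, parts in groups.items()}
```

This depends entirely on the order in which the API emits lines, so it can break if a cell's text is split across regions or if regions are ordered row by row instead of column by column.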
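Is there a good algorithm for doing something like this, or is brute force (checking each piece of text for a date and collecting the span between dates into an array) the only way? If it would help, I can post the sample image somewhere.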
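A geometry-based alternative that ignores the order of regions: attach each line to the nearest day-number line above it whose left edge lines up with the line's own left edge. This sketch rests on assumptions read off the sample (day numbers sit in the top-left corner of each cell and event text is left-aligned beneath them); `collect_lines`, `group_by_cell`, and the 20-pixel `x_tolerance` are hypothetical choices, not part of the Computer Vision API.

```python
import re
from typing import Dict, List, Optional, Tuple

DAY_RE = re.compile(r"^(?:[1-9]|[12][0-9]|3[01])$")
Box = Tuple[int, ...]  # (left, top, width, height)


def collect_lines(ocr: dict) -> List[Tuple[Box, str]]:
    """Return (box, text) for every OCR line, regardless of region."""
    out = []
    for region in ocr.get("regions", []):
        for line in region.get("lines", []):
            box = tuple(int(v) for v in line["boundingBox"].split(","))
            text = " ".join(w["text"] for w in line.get("words", []))
            out.append((box, text))
    return out


def group_by_cell(ocr: dict, x_tolerance: int = 20) -> Dict[int, str]:
    """Attach each line to the closest day number above it in the same column.

    "Same column" means the left edges differ by at most x_tolerance pixels.
    Lines with no such day number (titles, weekday headers) are dropped.
    """
    lines = collect_lines(ocr)
    anchors = [(box, int(text)) for box, text in lines if DAY_RE.match(text)]
    groups: Dict[int, List[Tuple[int, str]]] = {day: [] for _, day in anchors}
    for box, text in lines:
        if DAY_RE.match(text):
            continue
        left, top = box[0], box[1]
        best: Optional[Tuple[int, int]] = None  # (anchor top, day)
        for a_box, day in anchors:
            a_left, a_top = a_box[0], a_box[1]
            if abs(a_left - left) <= x_tolerance and a_top <= top:
                if best is None or a_top > best[0]:
                    best = (a_top, day)
        if best is not None:
            groups[best[1]].append((top, text))
    # Within each cell, join lines in top-to-bottom order
    return {day: " ".join(t for _, t in sorted(parts)) for day, parts in groups.items()}
```

The tolerance would need tuning to the image resolution, and centered cell text would need a different column test; the weekday headers (such as `Sunday`) could also be used to derive explicit column boundaries instead.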