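I've been trying to take a weekly menu PDF and split it into grid boxes for cropping, then OCR each box with TesseractOCR.
I've seen that lineJunctions might help here, but I couldn't find them in the ImageMagick PHP documentation. I also saw Hough Lines suggested in a similar Stack Overflow question, but again I couldn't find them in the PHP documentation.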
//read the image
$im = new Imagick();
$im->readimage('menu.png');
//shrink to 1/6 size, then threshold to pure black and white
$im->resizeImage((int) ($im->getImageWidth() / 6), (int) ($im->getImageHeight() / 6), \Imagick::FILTER_QUADRATIC, 1);
$im->thresholdImage(0.65 * \Imagick::getQuantum());
//remove "noise"
//this is done by creating two new images where only horizontal lines, then vertical are preserved using morphology and then combined into one
$horizontalLines = clone $im;
$verticalLines = clone $im;
$horizontalLineKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_RECTANGLE, "19x1");
$horizontalLines->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $horizontalLineKernel);
$verticalLineKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_RECTANGLE, "1x15");
$verticalLines->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $verticalLineKernel);
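//Darken keeps the darker pixel from either image, so black lines from both copies survive
$horizontalLines->compositeImage($verticalLines, \Imagick::COMPOSITE_DARKEN, 0, 0);
$im = clone $horizontalLines;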
$horizontalLines->clear();
$horizontalLines->destroy();
$verticalLines->clear();
$verticalLines->destroy();
// Create boxes at corners
// These mark the points from which I intend to create the individual grid boxes
$plusKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_PLUS, "4");
$im->morphology(\Imagick::MORPHOLOGY_OPEN, 1, $plusKernel);
$squareKernel = \ImagickKernel::fromBuiltIn(\Imagick::KERNEL_SQUARE, "2");
$im->morphology(\Imagick::MORPHOLOGY_CLOSE, 1, $squareKernel);
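Doing this leaves me with an image of small boxes at the grid corners. If I can get the x, y, width and height of each box, I should be able to work out the coordinates, but it misses the bottom-right corner and is very messy. I'm sure there must be a better way.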
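One alternative to hunting for corner boxes, a projection-profile technique swapped in here rather than anything from the Imagick docs, is to count dark pixels along every row and every column of the combined line image (the result of the compositeImage() step, before the corner kernels are applied). Rows or columns where most pixels are dark are grid lines, and runs of adjacent hits collapse into a single line, so one badly detected corner no longer loses a cell. This is a minimal sketch assuming black lines on a white background after thresholdImage(); maskFromImagick(), peaks() and findGridLines() are hypothetical helper names, and the 0.5 threshold is a guess that will need tuning if the lines do not span most of the page.

```php
<?php
// Sketch: locate grid lines with projection profiles
// (an alternative to detecting boxes at the corners).

/**
 * Hypothetical helper: convert a thresholded Imagick image into a
 * row-major 2D array of booleans, true where the pixel is dark.
 * Needs the imagick extension; exports 8-bit intensity ("I") values.
 */
function maskFromImagick(\Imagick $img): array
{
    $w = $img->getImageWidth();
    $h = $img->getImageHeight();
    $pixels = $img->exportImagePixels(0, 0, $w, $h, 'I', \Imagick::PIXEL_CHAR);
    $mask = [];
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            $mask[$y][$x] = $pixels[$y * $w + $x] < 128; // dark = line pixel
        }
    }
    return $mask;
}

/**
 * Collapse each run of consecutive indices whose count is at least
 * $min into one position: the middle index of the run.
 */
function peaks(array $counts, float $min): array
{
    $lines = [];
    $start = null;
    $n = count($counts);
    for ($i = 0; $i <= $n; $i++) {
        $hit = $i < $n && $counts[$i] >= $min;
        if ($hit && $start === null) {
            $start = $i;                            // a run begins
        } elseif (!$hit && $start !== null) {
            $lines[] = intdiv($start + $i - 1, 2);  // run ended at $i - 1
            $start = null;
        }
    }
    return $lines;
}

/**
 * Return the y positions of horizontal lines ('rows') and the x
 * positions of vertical lines ('cols'). A row (column) counts as a
 * line when at least $minFraction of its pixels are dark.
 */
function findGridLines(array $mask, float $minFraction = 0.5): array
{
    $h = count($mask);
    $w = count($mask[0]);
    $rowCounts = array_fill(0, $h, 0);
    $colCounts = array_fill(0, $w, 0);
    foreach ($mask as $y => $row) {
        foreach ($row as $x => $dark) {
            if ($dark) {
                $rowCounts[$y]++;
                $colCounts[$x]++;
            }
        }
    }
    return [
        'rows' => peaks($rowCounts, $minFraction * $w),
        'cols' => peaks($colCounts, $minFraction * $h),
    ];
}
```

On the reduced image this would be called as findGridLines(maskFromImagick($im)). A thick line that covers two or three adjacent rows is reported once, at its centre.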
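The image is scaled down, so I plan to scale the coordinates back up by a factor of 6 to match the $im->resizeImage() call. Is there a better way to approach this?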
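Once line positions are known on the 1/6-scale image, each pair of neighbouring horizontal lines and neighbouring vertical lines bounds one cell, and multiplying by the scale factor maps that cell back onto the full-size page. This is a minimal sketch assuming the line positions arrive as sorted arrays of pixel indices; gridCells() and cropCells() are hypothetical names. It takes the measured ratio (original width divided by reduced width, and likewise for height) instead of a literal 6, because the width and height handed to resizeImage() end up as integers and the true ratio can drift slightly from 6.

```php
<?php
/**
 * Turn line positions from the reduced image into cell rectangles
 * [x, y, width, height] on the original image, clamped to its size.
 * $scaleX = originalWidth / smallWidth, $scaleY = originalHeight / smallHeight.
 */
function gridCells(array $rows, array $cols, float $scaleX, float $scaleY, int $origW, int $origH): array
{
    $cells = [];
    for ($r = 0; $r + 1 < count($rows); $r++) {
        for ($c = 0; $c + 1 < count($cols); $c++) {
            $x0 = (int) round($cols[$c] * $scaleX);
            $y0 = (int) round($rows[$r] * $scaleY);
            $x1 = min($origW, (int) round($cols[$c + 1] * $scaleX));
            $y1 = min($origH, (int) round($rows[$r + 1] * $scaleY));
            if ($x1 > $x0 && $y1 > $y0) { // skip empty cells
                $cells[] = [$x0, $y0, $x1 - $x0, $y1 - $y0];
            }
        }
    }
    return $cells;
}

/**
 * Crop every cell out of the full-size image and save it as a PNG
 * for OCR. Needs the imagick extension.
 */
function cropCells(\Imagick $original, array $cells, string $prefix): void
{
    foreach ($cells as $i => [$x, $y, $w, $h]) {
        $cell = clone $original;
        $cell->cropImage($w, $h, $x, $y);
        $cell->setImagePage(0, 0, 0, 0); // reset the virtual canvas left by the crop
        $cell->writeImage(sprintf('%s%03d.png', $prefix, $i));
        $cell->destroy();
    }
}
```

Each rectangle still contains the border lines themselves; shrinking it by a few pixels on every side before cropping may give Tesseract cleaner input.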