image-processing - Proportions of a perspective-deformed rectangle

Question

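Given a 2D picture of a rectangle distorted by perspective:

[image]

I know that the shape was originally a rectangle, but I do not know its original size.

If I know the pixel coordinates of the corners in this picture, how can I calculate the original proportions, i.e. the quotient (width / height) of the rectangle?

(Background: the goal is to automatically undistort photos of rectangular documents; edge detection will probably be done with a Hough transform.)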

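Update:

There has been some discussion on whether it is possible at all to determine the width:height ratio from the information given. My naive thought was that it must be possible, since I can think of no way to project, for example, a 1:4 rectangle onto the quadrangle depicted above. The ratio clearly appears to be close to 1:1, so there should be a way to determine it mathematically. However, I have no proof for this beyond my intuitive guess.

I have not yet fully understood the arguments presented below, but I think there must be some implicit assumption that we are missing here and that is being interpreted differently.

However, after hours of searching, I have finally found some papers relevant to the problem. I am struggling to understand the math used in them, so far without success. The first paper in particular seems to discuss exactly what I want to do, unfortunately without code examples and with very dense math.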

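Zhengyou Zhang, Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf p.11

"Because of the perspective distortion, the image of a rectangle appears to be a quadrangle. However, since we know that it is a rectangle in space, we are able to estimate both the camera's focal length and the rectangle's aspect ratio."

Robert M. Haralick, "Determining camera parameters from the perspective projection of a rectangle" http://portal.acm.org/citation.cfm?id=87146

"We show how to use the 2D perspective projection of a rectangle of unknown size and position in 3D space to determine the camera look angle parameters relative to the plane of the rectangle."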

score 29 · Accepted Answer

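Here is my attempt at answering my own question after reading the paper

Zhengyou Zhang, Li-Wei He, "Whiteboard scanning and image enhancement" http://research.microsoft.com/en-us/um/people/zhang/papers/tr03-39.pdf

I manipulated the equations in SAGE for some time and came up with this C-style pseudocode: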


// in case it matters: licensed under GPLv2 or later
// legend:
// sqr(x)  = x*x
// sqrt(x) = square root of x

// let m1x,m1y ... m4x,m4y be the (x,y) pixel coordinates
// of the 4 corners of the detected quadrangle
// i.e. (m1x, m1y) are the coordinates of the first corner,
// (m2x, m2y) of the second corner and so on.
// let u0, v0 be the pixel coordinates of the principal point of the image
// for a normal camera this will be the center of the image, 
// i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
// This assumption does not hold if the image has been cropped asymmetrically

// first, transform the image so the principal point is at (0,0)
// this makes the following equations much easier
m1x = m1x - u0;
m1y = m1y - v0;
m2x = m2x - u0;
m2y = m2y - v0;
m3x = m3x - u0;
m3y = m3y - v0;
m4x = m4x - u0;
m4y = m4y - v0;


// temporary variables k2, k3
double k2 = ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x) /
            ((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) ;

double k3 = ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x) / 
            ((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) ;

// f_squared is the focal length of the camera, squared
// if k2==1 OR k3==1 then this equation is not solvable
// if the focal length is known, then this equation is not needed
// in that case assign f_squared= sqr(focal_length)
double f_squared = 
    -((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x - m1x)) / 
                      ((k3 - 1)*(k2 - 1)) ;

//The width/height ratio of the original rectangle
double whRatio = sqrt( 
    (sqr(k2 - 1) + sqr(k2*m2y - m1y)/f_squared + sqr(k2*m2x - m1x)/f_squared) /
    (sqr(k3 - 1) + sqr(k3*m3y - m1y)/f_squared + sqr(k3*m3x - m1x)/f_squared) 
) ;

// if k2==1 AND k3==1, then the focal length equation is not solvable
// but the focal length is not needed to calculate the ratio.
// I am still trying to figure out under which circumstances k2 and k3 become 1
// but it seems to be when the rectangle is not distorted by perspective,
// i.e. viewed straight on. Then the equation is obvious:
if (k2==1 && k3==1) whRatio = sqrt(
    (sqr(m2y-m1y) + sqr(m2x-m1x)) /
    (sqr(m3y-m1y) + sqr(m3x-m1x))
);


// After testing, I found that the above equations
// actually give the height/width ratio of the rectangle,
// not the width/height ratio.
// A likely cause: the expression is the ratio of side m1-m2 to side m1-m3,
// and the test further down places m2 along the height and m3 along the width.
// With that corner order, invert the result:
whRatio = 1/whRatio;
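For readers who want something directly runnable, the pseudocode above can be sketched in Python roughly as follows. This is a minimal sketch, not the author's code: the function name `side_ratio`, the tolerance `eps`, and the error handling are assumptions added for illustration. It returns the ratio of side m1-m2 to side m1-m3 and leaves the decision about which of those is "width" to the caller.

```python
import math

def side_ratio(m1, m2, m3, m4, u0, v0, eps=1e-9):
    """Return |M1M2| / |M1M3| for the rectangle imaged at corners m1..m4.

    m2 and m3 are the two neighbours of m1; m4 is diagonally opposite m1.
    (u0, v0) is the principal point, assumed known (e.g. the image centre).
    """
    # Move the principal point to the origin, as in the pseudocode.
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = [
        (x - u0, y - v0) for (x, y) in (m1, m2, m3, m4)
    ]

    def triple(ax, ay, bx, by, cx, cy):
        # (a x b) . c for the homogeneous points (ax, ay, 1), (bx, by, 1), (cx, cy, 1)
        return (ay - by) * cx - (ax - bx) * cy + ax * by - ay * bx

    k2 = triple(x1, y1, x4, y4, x3, y3) / triple(x2, y2, x4, y4, x3, y3)
    k3 = triple(x1, y1, x4, y4, x2, y2) / triple(x3, y3, x4, y4, x2, y2)

    # n2 = k2*m2 - m1 and n3 = k3*m3 - m1 as homogeneous 3-vectors
    n2 = (k2 * x2 - x1, k2 * y2 - y1, k2 - 1.0)
    n3 = (k3 * x3 - x1, k3 * y3 - y1, k3 - 1.0)

    if abs(n2[2]) < eps and abs(n3[2]) < eps:
        # No perspective foreshortening: plain ratio of the image side lengths.
        return math.hypot(n2[0], n2[1]) / math.hypot(n3[0], n3[1])
    if abs(n2[2]) < eps or abs(n3[2]) < eps:
        raise ValueError("focal length cannot be recovered from these corners")

    f_squared = -(n2[0] * n3[0] + n2[1] * n3[1]) / (n2[2] * n3[2])
    if f_squared <= 0:
        raise ValueError("corners are not consistent with a rectangle")

    num = n2[2] ** 2 + (n2[0] ** 2 + n2[1] ** 2) / f_squared
    den = n3[2] ** 2 + (n3[0] ** 2 + n3[1] ** 2) / f_squared
    return math.sqrt(num / den)
```

If m2 is the horizontal neighbour of m1 and m3 the vertical one, the return value is width/height; with the two swapped it is height/width.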

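Update: here is how these equations were determined:

The following is code in SAGE. It can be accessed online at http://www.sagenb.org/home/pub/704/. (Sage is really useful for solving equations and usable in any browser; check it out.)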

# CALCULATING THE ASPECT RATIO OF A RECTANGLE DISTORTED BY PERSPECTIVE

#
# BIBLIOGRAPHY:
# [zhang-single]: "Single-View Geometry of A Rectangle 
#  With Application to Whiteboard Image Rectification"
#  by Zhengyou Zhang
#  http://research.microsoft.com/users/zhang/Papers/WhiteboardRectification.pdf

# pixel coordinates of the 4 corners of the quadrangle (m1, m2, m3, m4)
# see [zhang-single] figure 1
m1x = var('m1x')
m1y = var('m1y')
m2x = var('m2x')
m2y = var('m2y')
m3x = var('m3x')
m3y = var('m3y')
m4x = var('m4x')
m4y = var('m4y')

# pixel coordinates of the principal point of the image
# for a normal camera this will be the center of the image, 
# i.e. u0=IMAGEWIDTH/2; v0 =IMAGEHEIGHT/2
# This assumption does not hold if the image has been cropped asymmetrically
u0 = var('u0')
v0 = var('v0')

# pixel aspect ratio; for a normal camera pixels are square, so s=1
s = var('s')

# homogeneous coordinates of the quadrangle
m1 = vector ([m1x,m1y,1])
m2 = vector ([m2x,m2y,1])
m3 = vector ([m3x,m3y,1])
m4 = vector ([m4x,m4y,1])


# the following equations are later used in calculating the focal length
# and the rectangle's aspect ratio.
# temporary variables: k2, k3, n2, n3

# see [zhang-single] Equation 11, 12
k2_ = m1.cross_product(m4).dot_product(m3) / m2.cross_product(m4).dot_product(m3)
k3_ = m1.cross_product(m4).dot_product(m2) / m3.cross_product(m4).dot_product(m2)
k2 = var('k2')
k3 = var('k3')

# see [zhang-single] Equation 14,16
n2 = k2 * m2 - m1
n3 = k3 * m3 - m1


# the focal length of the camera.
f = var('f')
# see [zhang-single] Equation 21
f_ = sqrt(
         -1 / (
          n2[2]*n3[2]*s^2
         ) * (
          (
           n2[0]*n3[0] - (n2[0]*n3[2]+n2[2]*n3[0])*u0 + n2[2]*n3[2]*u0^2
          )*s^2 + (
           n2[1]*n3[1] - (n2[1]*n3[2]+n2[2]*n3[1])*v0 + n2[2]*n3[2]*v0^2
          ) 
         ) 
        )


# standard pinhole camera matrix
# see [zhang-single] Equation 1
A = matrix([[f,0,u0],[0,s*f,v0],[0,0,1]])


#the width/height ratio of the original rectangle
# see [zhang-single] Equation 20
whRatio = sqrt (
               (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
               (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
              )

The simplified equations in the C code were determined by

print "simplified equations, assuming u0=0, v0=0, s=1"
print "k2 := ", k2_
print "k3 := ", k3_
print "f  := ", f_(u0=0,v0=0,s=1)
print "whRatio := ", whRatio(u0=0,v0=0,s=1)

    simplified equations, assuming u0=0, v0=0, s=1
    k2 :=  ((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y
    - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    k3 :=  ((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)/((m3y
    - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x)
    f  :=  sqrt(-((k3*m3y - m1y)*(k2*m2y - m1y) + (k3*m3x - m1x)*(k2*m2x
    - m1x))/((k3 - 1)*(k2 - 1)))
    whRatio :=  sqrt(((k2 - 1)^2 + (k2*m2y - m1y)^2/f^2 + (k2*m2x -
    m1x)^2/f^2)/((k3 - 1)^2 + (k3*m3y - m1y)^2/f^2 + (k3*m3x -
    m1x)^2/f^2))

print "Everything in one equation:"
print "whRatio := ", whRatio(f=f_)(k2=k2_,k3=k3_)(u0=0,v0=0,s=1)

    Everything in one equation:
    whRatio :=  sqrt(((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)*m2x/((m2y - m4y)*m3x
    - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) -
    1)^2)/((((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y - m1y*m4x)/((m2y -
    m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x) - 1)*(((m1y -
    m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x
    - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) - m1x)^2/((((m1y - m4y)*m2x -
    (m1x - m4x)*m2y + m1x*m4y - m1y*m4x)*m3y/((m3y - m4y)*m2x - (m3x -
    m4x)*m2y + m3x*m4y - m3y*m4x) - m1y)*(((m1y - m4y)*m3x - (m1x -
    m4x)*m3y + m1x*m4y - m1y*m4x)*m2y/((m2y - m4y)*m3x - (m2x - m4x)*m3y
    + m2x*m4y - m2y*m4x) - m1y) + (((m1y - m4y)*m2x - (m1x - m4x)*m2y +
    m1x*m4y - m1y*m4x)*m3x/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y
    - m3y*m4x) - m1x)*(((m1y - m4y)*m3x - (m1x - m4x)*m3y + m1x*m4y -
    m1y*m4x)*m2x/((m2y - m4y)*m3x - (m2x - m4x)*m3y + m2x*m4y - m2y*m4x)
    - m1x)) - (((m1y - m4y)*m2x - (m1x - m4x)*m2y + m1x*m4y -
    m1y*m4x)/((m3y - m4y)*m2x - (m3x - m4x)*m2y + m3x*m4y - m3y*m4x) -
    1)^2))


# some testing:
# - choose a random rectangle, 
# - project it onto a random plane,
# - insert the corners in the above equations,
# - check if the aspect ratio is correct.

from sage.plot.plot3d.transform import rotate_arbitrary

#redundantly random rotation matrix
rand_rotMatrix = \
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5)) *\
           rotate_arbitrary((uniform(-5,5),uniform(-5,5),uniform(-5,5)),uniform(-5,5))

#random translation vector
rand_transVector = vector((uniform(-10,10),uniform(-10,10),uniform(-10,10))).transpose()

#random rectangle parameters
rand_width =uniform(0.1,10)
rand_height=uniform(0.1,10)
rand_left  =uniform(-10,10)
rand_top   =uniform(-10,10)

#random focal length and principal point
rand_f  = uniform(0.1,100)
rand_u0 = uniform(-100,100)
rand_v0 = uniform(-100,100)

# homogeneous standard pinhole projection, see [zhang-single] Equation 1
hom_projection = A * rand_rotMatrix.augment(rand_transVector)

# construct a random rectangle in the plane z=0, then project it randomly 
rand_m1hom = hom_projection*vector((rand_left           ,rand_top            ,0,1)).transpose()
rand_m2hom = hom_projection*vector((rand_left           ,rand_top+rand_height,0,1)).transpose()
rand_m3hom = hom_projection*vector((rand_left+rand_width,rand_top            ,0,1)).transpose()
rand_m4hom = hom_projection*vector((rand_left+rand_width,rand_top+rand_height,0,1)).transpose()

#change type from 1x3 matrix to vector
rand_m1hom = rand_m1hom.column(0)
rand_m2hom = rand_m2hom.column(0)
rand_m3hom = rand_m3hom.column(0)
rand_m4hom = rand_m4hom.column(0)

#normalize
rand_m1hom = rand_m1hom/rand_m1hom[2]
rand_m2hom = rand_m2hom/rand_m2hom[2]
rand_m3hom = rand_m3hom/rand_m3hom[2]
rand_m4hom = rand_m4hom/rand_m4hom[2]

#substitute random values for f, u0, v0
rand_m1hom = rand_m1hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m2hom = rand_m2hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m3hom = rand_m3hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)
rand_m4hom = rand_m4hom(f=rand_f,s=1,u0=rand_u0,v0=rand_v0)

# printing the randomly chosen values
print "ground truth: f=", rand_f, "; ratio=", rand_width/rand_height

# substitute all the variables in the equations:
print "calculated: f= ",\
f_(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
),"; 1/ratio=", \
1/whRatio(f=f_)(k2=k2_,k3=k3_)(s=1,u0=rand_u0,v0=rand_v0)(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

print "k2 = ", k2_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
), "; k3 = ", k3_(
  m1x=rand_m1hom[0],m1y=rand_m1hom[1],
  m2x=rand_m2hom[0],m2y=rand_m2hom[1],
  m3x=rand_m3hom[0],m3y=rand_m3hom[1],
  m4x=rand_m4hom[0],m4y=rand_m4hom[1],
)

# ATTENTION: testing revealed, that the whRatio 
# is actually the height/width ratio, 
# not the width/height ratio
# This contradicts [zhang-single]
# if anyone can find the error that caused this, I'd be grateful

    ground truth: f= 72.1045134124554 ; ratio= 3.46538779959142
    calculated: f=  72.1045134125 ; 1/ratio= 3.46538779959
    k2 =  0.99114614987 ; k3 =  1.57376280159

score 7 · Accepted Answer

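Update

After reading your update and looking at the first reference (Whiteboard scanning and image enhancement), I see where the missing point is.

The input data of the problem is a quadruple (A,B,C,D), AND the center O of the projected image. In the article, this corresponds to the assumption u0=v0=0. With this point added, the problem becomes constrained enough to yield the aspect ratio of the rectangle.

The problem is then restated as follows: given a quadruple (A,B,C,D) in the Z=0 plane, find the eye position E(0,0,h), h>0, and a 3D plane P such that the projection of (A,B,C,D) onto P is a rectangle.

Note that P is determined by E: to get a parallelogram, P must contain parallels to (EU) and (EV), where U=(AB)x(CD) and V=(AD)x(BC).

Experimentally, this problem seems to have in general one unique solution, corresponding to a unique value of the w/h ratio of the rectangle.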
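The construction above can be sketched numerically. The following is an illustrative sketch only: the helper names are hypothetical, and the `eye_height` step is an added assumption that follows from requiring (EU) and (EV) to be perpendicular, which is what turns the parallelogram into a rectangle. With E=(0,0,h) and U, V measured relative to O, perpendicularity gives h^2 = -(U . V).

```python
import math

def cross(a, b):
    # Cross product of two 3-vectors.
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def line_through(p, q):
    # Homogeneous line through two 2D points.
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))

def intersection(l1, l2, eps=1e-12):
    # Intersection of two homogeneous lines; None if they are parallel.
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        return None
    return (x / w, y / w)

def vanishing_points(A, B, C, D):
    # U = (AB) x (CD) and V = (AD) x (BC), as in the answer.
    U = intersection(line_through(A, B), line_through(C, D))
    V = intersection(line_through(A, D), line_through(B, C))
    return U, V

def eye_height(U, V, O=(0.0, 0.0)):
    # Height h of the eye E above the image centre O such that EU is
    # perpendicular to EV; None if no such h > 0 exists.
    d = (U[0] - O[0]) * (V[0] - O[0]) + (U[1] - O[1]) * (V[1] - O[1])
    return math.sqrt(-d) if d < 0 else None
```

When one pair of sides is parallel in the image, the corresponding vanishing point is at infinity and `vanishing_points` returns `None` for it; `eye_height` needs both points to be finite.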

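[image]

Previous post

No, you can't determine the rectangle ratio from the projection.

In the general case, a quadruple (A,B,C,D) of four non-collinear points in the Z=0 plane is the projection of infinitely many rectangles, with infinitely many width/height ratios.

Consider the two vanishing points U, the intersection of (AB) and (CD), and V, the intersection of (AD) and (BC), as well as the point I, the intersection of the two diagonals (AC) and (BD). To project as ABCD, a parallelogram with center I must lie in a plane containing the line through I parallel to (UV). In one such plane you can find many rectangles projecting to ABCD, all with different w/h ratios.

See these two images made with Cabri 3D. In both cases ABCD is unchanged (in the gray Z=0 plane), and the blue plane containing the rectangle is unchanged as well. The partially hidden green line is the (UV) line; the visible green line is parallel to it and contains I.

[image]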

score 1 · Accepted Answer

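Size isn't really needed, and neither are proportions. And knowing which side is up is rather irrelevant, considering he is using photos/scans of documents. I doubt he is going to scan their back sides.

"Corner intersection" is the method for correcting perspective. This might be of help:

How to draw a perspective-correct grid in 2D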

score 1 · Accepted Answer

On the question of why the results give h/w rather than w/h: I wonder whether the expression for Equation 20 above is correct. What was posted is:

       whRatio = sqrt (
            (n2*A.transpose()^(-1) * A^(-1)*n2.transpose()) / 
            (n3*A.transpose()^(-1) * A^(-1)*n3.transpose())
           )

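When I try to perform that with OpenCV, I get an exception. Everything works correctly, however, when I use the following expression, which to me looks more like Equation 20: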

        whRatio = sqrt (
            (n2.transpose()*A.transpose()^(-1) * A^(-1)*n2) /
            (n3.transpose()*A.transpose()^(-1) * A^(-1)*n3)
           )
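One way to sidestep the row-versus-column question is to note that n^T A^-T A^-1 n is simply the squared length of A^-1 n. Below is a minimal sketch under that reading, assuming the pinhole matrix A = [[f,0,u0],[0,s*f,v0],[0,0,1]] from the SAGE code; the function names are hypothetical and the inverse is written out by hand rather than taken from any library.

```python
import math

def quad_form(n, f, u0, v0, s=1.0):
    # A^-1 * n for A = [[f, 0, u0], [0, s*f, v0], [0, 0, 1]], written out explicitly.
    x = (n[0] - u0 * n[2]) / f
    y = (n[1] - v0 * n[2]) / (s * f)
    z = n[2]
    # n^T A^-T A^-1 n is the squared norm of A^-1 n, a scalar.
    return x * x + y * y + z * z

def ratio_eq20(n2, n3, f, u0, v0, s=1.0):
    # Square root of the quotient of the two quadratic forms (Equation 20).
    return math.sqrt(quad_form(n2, f, u0, v0, s) / quad_form(n3, f, u0, v0, s))
```

Written this way the result is a plain number regardless of whether the vectors are stored as rows or columns, so no transpose can be misplaced.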

score 1 · Accepted Answer

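You can determine width / height using this answer: Calculating rectangle 3D coordinate with coordinate its shadow?. Assume that your rectangle rotates about the intersection point of its diagonals, and calculate its width and height. When you change the distance between the assumed shadow plane and the real shadow plane, the proportions of the rectangle stay the same as the calculated width / height!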

score 0 · Accepted Answer

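It is impossible to know the width of this rectangle without knowing the distance of the "camera".

A small rectangle viewed from 5 centimeters away looks the same as a huge rectangle seen from meters away.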

score 0 · Accepted Answer

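Draw a right isosceles triangle with those two vanishing points and a third point below the horizon (that is, on the same side of the horizon as the rectangle). That third point will be our origin, and the two lines to the vanishing points will be our axes. Call the distance from the origin to a vanishing point pi/2. Now extend the sides of the rectangle from the vanishing points to the axes, and mark where they intersect the axes. Pick an axis, measure the distances from the two marks to the origin, and transform those distances: x->tan(x). The difference will be the "true" length of that side. Do the same for the other axis. Take the ratio of those two lengths and you're done.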

score 0 · Accepted Answer

Dropbox has an extensive article on their tech blog describing how they solved this problem for their scanner app.

https://blogs.dropbox.com/tech/2016/08/fast-document-rectification-and-enhancement/

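Rectifying a Document

We assume that the input document is rectangular in the physical world, but if it is not exactly facing the camera, the resulting corners in the image will form a general convex quadrilateral. So to satisfy our first goal, we must undo the geometric transform applied by the capture process. This transformation depends on the viewpoint of the camera relative to the document (these are the so-called extrinsic parameters), in addition to things like the focal length of the camera (the intrinsic parameters). Here is a diagram of the capture scenario:

In order to undo the geometric transform, we must first determine the said parameters. If we assume a nicely symmetric camera (no astigmatism, no skew, et cetera), the unknowns in this model are:

the 3D location of the camera relative to the document (3 degrees of freedom),

the 3D orientation of the camera relative to the document (3 degrees of freedom),

the dimensions of the document (2 degrees of freedom), and

the focal length of the camera (1 degree of freedom).

On the flip side, the x- and y-coordinates of the four detected document corners give us effectively eight constraints. While there are seemingly more unknowns (9) than constraints (8), the unknowns are not entirely free variables: one could imagine scaling the document physically and placing it further from the camera to obtain an identical photo. This relation places an additional constraint, so we have a fully constrained system to solve. (The actual system of equations we solve involves a few other considerations; the relevant Wikipedia article gives a good summary: https://en.wikipedia.org/wiki/Camera_resectioning)

Once the parameters have been recovered, we can undo the geometric transform applied by the capture process to obtain a nice rectangular image. However, this is potentially a time-consuming process: for each output pixel, one would look up the value of the corresponding input pixel in the source image. Of course, GPUs are specifically designed for tasks like this: rendering a texture in a virtual space. There exists a view transform (which happens to be the inverse of the camera transform we just solved for) with which one can render the full input image and obtain the rectified document. (An easy way to see this is to note that once you have the full input image on the screen of your phone, you can tilt and translate the phone such that the projection of the document region on the screen appears rectilinear to you.)

Lastly, recall that there was an ambiguity with respect to scale: we can't tell whether the document was a letter-sized sheet of paper (8.5" x 11") or a poster board (17" x 22"), for instance. What should the dimensions of the output image be? To resolve this ambiguity, we count the number of pixels within the quadrilateral in the input image and set the output resolution to match this pixel count. The idea is that we don't want to upsample or downsample the image too much.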
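The pixel-count heuristic in the last paragraph can be sketched as follows. This is an illustration, not Dropbox's implementation: it assumes the quadrilateral's pixel count is approximated by its polygon area (shoelace formula), that the corners are given in perimeter order, and that the width/height ratio `wh_ratio` has already been recovered; both function names are hypothetical.

```python
import math

def quad_area(corners):
    # Shoelace formula; corners must be listed in order around the perimeter.
    area2 = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
    return abs(area2) / 2.0

def output_size(corners, wh_ratio):
    # Choose width and height with width / height = wh_ratio and
    # width * height approximately equal to the quadrilateral's area.
    area = quad_area(corners)
    height = math.sqrt(area / wh_ratio)
    width = wh_ratio * height
    return int(round(width)), int(round(height))
```

For example, a quadrilateral covering 120000 pixels with a recovered ratio of 4:3 would map to a 400 x 300 output image.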

score 0 · Accepted Answer

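There still seems to be some confusion about this interesting problem. I want to give an easy-to-follow explanation of when the problem can and cannot be solved.

Constraints and Degrees of Freedom

Typically, when we are faced with a problem like this, the first thing to do is to assess the number of unknown degrees of freedom (DoFs) N and the number of independent equations M that we have for constraining the unknown DoFs. It is impossible to solve the problem if N exceeds M (meaning there are fewer constraints than unknowns). We can rule out all problems where this is the case as unsolvable. If N does not exceed M, then it may be possible to solve the problem with a unique solution, but this is not guaranteed (see the second-to-last paragraph for an example).

Let's use p1, p2, p3 and p4 to denote the positions of the 4 corners of the planar surface in world coordinates. Let's use R and t for the 3D rotation and translation that transform these to camera coordinates. Let's use K to denote the 3x3 camera intrinsic matrix. We will ignore lens distortion for now. The 2D position of the i-th corner in the camera's image is given by qi = f(K(R pi + t)), where f is the projection function f(x,y,z) = (x/z, y/z). Using this equation, we know that each corner in the image gives us two equations (i.e. two constraints) on our unknowns: one from the x component of qi and one from the y component. So we have a total of 8 constraints to work with. The official name of these constraints is the reprojection constraints.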
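The reprojection equation qi = f(K(R pi + t)) can be written out directly. The following is a minimal pure-Python sketch with matrices as nested lists; the helper names `mat_vec` and `project` are hypothetical and chosen only for this illustration.

```python
def mat_vec(m, v):
    # Product of a 3x3 matrix (nested lists) with a 3-vector.
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]

def project(K, R, t, p):
    # q = f(K (R p + t)) with f(x, y, z) = (x / z, y / z).
    cam = [a + b for a, b in zip(mat_vec(R, p), t)]  # point in camera coordinates
    x, y, z = mat_vec(K, cam)
    return (x / z, y / z)
```

Each call yields one (x, y) pair, which is where the two constraints per corner come from.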

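So what are our unknown DoFs? Certainly R and t are unknown, because we do not know the camera's pose in world coordinates. Therefore we already have 6 unknown DoFs: 3 for R (e.g. yaw, pitch and roll) and 3 for t. Therefore there can be at most two unknowns in the remaining terms (K, p1, p2, p3, p4).

Different problems

We can construct different problems depending on which two terms in (K, p1, p2, p3, p4) we consider unknown. At this point let's write K out in the usual form: K = (fx, 0, cx; 0, fy, cy; 0, 0, 1), where fx and fy are the focal length terms (fx/fy is normally called the image aspect ratio) and (cx, cy) is the principal point (the center of projection in the image).

We could obtain one problem by taking fx and fy as our two unknowns and assuming that (cx, cy, p1, p2, p3, p4) are all known. Indeed, this very problem is used and solved within OpenCV's camera calibration method, using images of a checkerboard planar target. This is used to get an initial estimate of fx and fy, by assuming that the principal point is at the image center (which is a very reasonable assumption for most cameras).

Alternatively, we can create a different problem by assuming fx=fy, which again is quite reasonable for many cameras, and assuming that this focal length (denoted f) is the only unknown in K. Therefore we still have one unknown left to play with (recall that we can have at most two unknowns). So let's use it by supposing that we know the shape of the plane: a rectangle (which was the original assumption in the question). We can then define the corners as follows: p1=(0,0,0), p2=(0,w,0), p3=(h,0,0) and p4=(h,w,0), where h and w denote the height and width of the rectangle. Now, because we only have 1 unknown left, let us set it to the plane's aspect ratio: x=w/h. The question now is whether we can simultaneously recover x, f, R and t from the 8 reprojection constraints. The answer, it turns out, is yes! The solution is given in Zhang's paper cited in the question.

The scale ambiguity

One might wonder whether another problem can be solved: what if we assume that K is known and the 2 unknowns are h and w? Can they be solved from the reprojection equations? The answer is no, because there is an ambiguity between the size of the plane and the depth of the plane from the camera. Specifically, if we scale the corners pi by s and scale t by s, then s cancels in the reprojection equations. Therefore the absolute scale of the plane is not recoverable.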
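The cancellation can be spelled out in one line, using the same symbols as above. Because the projection function satisfies f(sx, sy, sz) = (sx/sz, sy/sz) = f(x, y, z) for any s > 0, scaling both the corners and the translation leaves every image point unchanged:

```latex
\[
f\bigl(K\,(R\,(s\,p_i) + s\,t)\bigr)
  = f\bigl(s\,K\,(R\,p_i + t)\bigr)
  = f\bigl(K\,(R\,p_i + t)\bigr)
  = q_i , \qquad s > 0 .
\]
```

So a rectangle of size (h, w) at translation t and one of size (sh, sw) at translation st produce identical corner positions q1, ..., q4, while the ratio w/h is the same in both cases.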

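There may be other problems with different combinations of unknown DoFs, for example with R, t, one of the principal point components and the plane's width as unknowns. However, one needs to think about which cases are of practical use. Nevertheless, I haven't yet seen a systematic set of solutions for all useful combinations!

More points

We might think that if we were to add extra point correspondences between the plane and the image, or exploit the edges of the plane, we could recover more than 8 unknown DoFs. Sadly, the answer is no, because they don't add any extra independent constraints. The reason is that the 4 corners completely describe the transform from the plane to the image. This can be seen by fitting a homography matrix using the four corners, which can then determine the positions of all other points on the plane in the image.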
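To make the last point concrete, here is a small pure-Python sketch that fits a homography to four point pairs and then maps any other plane point into the image. It is an illustration under stated assumptions: the homography is normalised with h33 = 1 (which fails if the plane's origin maps to infinity), the linear system is solved with plain Gaussian elimination, and all function names are hypothetical.

```python
def solve_linear(a, b):
    # Gaussian elimination with partial pivoting for a square system a x = b.
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def homography_from_4_points(src, dst):
    # Two linear equations per correspondence (x, y) -> (u, v), with h33 fixed to 1.
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, p):
    # Map a plane point p = (x, y) to the image.
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

Once `H` is known from the corners, the image position of any further plane point follows from `apply_homography`, so observing that point adds no new information about the unknowns.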

score -1 · Accepted Answer

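You need more information; that transformed figure could come from any parallelogram given an arbitrary perspective.

So I guess you need to do some kind of calibration first.

Edit: for those who said I was wrong, here is the mathematical proof that there are infinitely many combinations of rectangles/cameras that yield the same projection:

To simplify the problem (as we only need the ratio of the sides), let's assume that our rectangle is defined by the following points: R=[(0,0),(1,0),(1,r),(0,r)] (this simplification is the same as transforming any problem into an equivalent one in an affine space).

The transformed polygon is defined as: T=[(tx0,ty0),(tx1,ty1),(tx2,ty2),(tx3,ty3)]

There exists a transformation matrix M = [[m00,m01,m02],[m10,m11,m12],[m20,m21,m22]] that satisfies (Rxi,Ryi,1)*M=wi(txi,tyi,1)'

If we expand the equation above for the points,

for R_0 we get: m02-tx0*w0 = m12-ty0*w0 = m22-w0 = 0

for R_1 we get: m00-tx1*w1 = m10-ty1*w1 = m20+m22-w1 = 0

for R_2 we get: m00+r*m01-tx2*w2 = m10+r*m11-ty2*w2 = m20+r*m21+m22-w2 = 0

and for R_3 we get: m00+r*m01-tx3*w3 = m10+r*m11-ty3*w3 = m20 + r*m21 + m22 -w3 = 0

So far we have 12 equations and 14 unknown variables (9 from the matrix, 4 from the wi, and 1 for the ratio r); the rest are known values (txi and tyi are given).

Even if the system weren't underdetermined, some of the unknowns are multiplied among themselves (the r and mi0 products), making the system nonlinear (you could transform it into a linear system by assigning a new name to each product, but you would still end up with 13 unknowns, 3 of them being expanded to infinitely many solutions).

If you can find any flaw in the reasoning or the maths, please let me know.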
