python - OpenCV - Depth map from an uncalibrated stereo system

Question

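I'm trying to get a depth map with an uncalibrated method. I can obtain the fundamental matrix by finding corresponding points with SIFT and then using cv2.findFundamentalMat. I then use cv2.stereoRectifyUncalibrated to get the homography matrices for each image. Finally, I use cv2.warpPerspective to rectify the images and compute the disparity, but this doesn't create a good depth map. The values are very high, so I'm wondering whether I have to use warpPerspective at all, or whether I have to calculate a rotation matrix from the homography matrices I got from stereoRectifyUncalibrated.

I'm not sure which projection matrix to use with the homography matrices obtained from stereoRectifyUncalibrated.

A part of the code: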

#Obtainment of the correspondent point with SIFT
sift = cv2.SIFT()

###find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(dst1,None)
kp2, des2 = sift.detectAndCompute(dst2,None)

###FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)

flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)

good = []
pts1 = []
pts2 = []

###ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.8*n.distance:
        good.append(m)
        pts2.append(kp2[m.trainIdx].pt)
        pts1.append(kp1[m.queryIdx].pt)
    
    
pts1 = np.array(pts1)
pts2 = np.array(pts2)

#Computation of the fundamental matrix
F,mask= cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS)


# Obtainment of the rectification matrix and use of the warpPerspective to transform them...
pts1 = pts1[:,:][mask.ravel()==1]
pts2 = pts2[:,:][mask.ravel()==1]

pts1 = np.int32(pts1)
pts2 = np.int32(pts2)

p1fNew = pts1.reshape((pts1.shape[0] * 2, 1))
p2fNew = pts2.reshape((pts2.shape[0] * 2, 1))
    
retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,F,(2048,2048))

dst11 = cv2.warpPerspective(dst1,rectmat1,(2048,2048))
dst22 = cv2.warpPerspective(dst2,rectmat2,(2048,2048))

#calculation of the disparity
stereo = cv2.StereoBM(cv2.STEREO_BM_BASIC_PRESET,ndisparities=16*10, SADWindowSize=9)
disp = stereo.compute(dst22.astype(uint8), dst11.astype(uint8)).astype(np.float32)
plt.imshow(disp);plt.colorbar();plt.clim(0,400)#;plt.show()
plt.savefig("0gauche.png")

#plot depth by using disparity focal length `C1[0,0]` from stereo calibration and `T[0]` the distance between cameras

plt.imshow(C1[0,0]*T[0]/(disp),cmap='hot');plt.clim(-0,500);plt.colorbar();plt.show()

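Here are the pictures rectified with the uncalibrated method (and warpPerspective):

[image]

Here are the pictures rectified with the calibrated method:

[image]

I don't know why the difference between the two kinds of pictures is so large. And for the calibrated method, the pictures don't seem to be aligned.

The disparity map using the uncalibrated method:

[image]

The depths are calculated as C1[0,0]*T[0]/(disp), with T from stereoCalibrate. The values are very high.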

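------------ EDIT LATER ------------

I tried to assemble the reconstruction matrix ([Devernay97], [Garcia01]) from the homography matrices obtained with "stereoRectifyUncalibrated", but the result is still not good. Am I doing this correctly?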

Y=np.arange(0,2048)
X=np.arange(0,2048)
(XX_field,YY_field)=np.meshgrid(X,Y)

#I mount the X, Y and disparity in a same 3D array 
stock = np.concatenate((np.expand_dims(XX_field,2),np.expand_dims(YY_field,2)),axis=2)
XY_disp = np.concatenate((stock,np.expand_dims(disp,2)),axis=2)

XY_disp_reshape = XY_disp.reshape(XY_disp.shape[0]*XY_disp.shape[1],3)

Ts = np.hstack((np.zeros((3,3)),T_0)) #i use only the translations obtained with the rectified calibration...Is it correct?


# I establish the projective matrix with the homography matrix
P11 = np.dot(rectmat1,C1)
P1 = np.vstack((np.hstack((P11,np.zeros((3,1)))),np.zeros((1,4))))
P1[3,3] = 1

# P1 = np.dot(C1,np.hstack((np.identity(3),np.zeros((3,1)))))

P22 = np.dot(np.dot(rectmat2,C2),Ts)
P2 = np.vstack((P22,np.zeros((1,4))))
P2[3,3] = 1

lambda_t = cv2.norm(P1[0,:].T)/cv2.norm(P2[0,:].T)


#I define the reconstruction matrix
Q = np.zeros((4,4))

Q[0,:] = P1[0,:].T
Q[1,:] = P1[1,:].T
Q[2,:] = lambda_t*P2[1,:].T - P1[1,:].T
Q[3,:] = P1[2,:].T

#I do the calculation to get my 3D coordinates
test = []
for i in range(0,XY_disp_reshape.shape[0]):
    a = np.dot(inv(Q),np.expand_dims(np.concatenate((XY_disp_reshape[i,:],np.ones((1))),axis=0),axis=1))
    test.append(a)

test = np.asarray(test)

XYZ = test[:,:,0].reshape(XY_disp.shape[0],XY_disp.shape[1],4)

score 17 · Accepted Answer

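TL;DR: Use StereoSGBM (Semi-Global Block Matching) for images with smoother edges, and apply some post-filtering if you want the result smoother still.

OP didn't provide the original images, so I'm using Tsukuba from the Middlebury data set.

Result with regular StereoBM

Result with StereoSGBM (tuned)

Best result I could find in the literature

See the publication here for details.

Example of post-filtering (see the link below)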

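Theory / other considerations from OP's question

The large black areas in your calibrated rectified images lead me to believe that the calibration was not done very well for those. There are many possible reasons: the physical setup, the lighting when you did the calibration, and so on. There are plenty of camera calibration tutorials that cover this, though, and my understanding is that you are asking for a way to get a better depth map from an uncalibrated setup (this isn't 100% clear, but the title seems to support it, and I think that's what people will come here looking for).

Your basic approach is correct, but the results can definitely be improved. This form of depth mapping is not among those that produce the highest-quality maps (especially when uncalibrated). The biggest improvement will likely come from using a different stereo matching algorithm. The lighting may also have a significant effect. The right image (at least to my naked eye) appears to be less well lit, which could interfere with the reconstruction. You could first try brightening it to the same level as the other, or gather new images if that is possible. From here on, I'll assume you have no access to the original cameras, so I'll consider gathering new images, altering the setup, or performing calibration to be out of scope. (If you do have access to the setup and cameras,

You used StereoBM to calculate your disparity (depth map), which does work, but StereoSGBM is much better suited to this application (it handles smoother edges better). You can see the difference below.

This article explains the differences in more depth:

"Block matching focuses on high texture images (think a picture of a tree) and semi-global block matching will focus on sub pixel level matching and pictures with more smooth textures (think a picture of a hallway)."

Without any explicit intrinsic camera parameters, specifics about the camera setup (such as focal distance, distance between the cameras, distance to the subject, etc.), a known dimension in the image, or motion (to use structure from motion), you can only obtain a 3D reconstruction up to a projective transform. You won't have a sense of scale, or necessarily of rotation either, but you can still generate a relative depth map. You will likely suffer from some barrel and other distortions that proper camera calibration could remove, but you can get reasonable results without it as long as the cameras aren't terrible (the lens system isn't too distorted) and are set up close to the canonical configuration (which basically means they are oriented so that their optical axes are as close to parallel as possible and their fields of view overlap sufficiently). This doesn't appear to be the OP's issue, however, as he did manage to get acceptable rectified images with the uncalibrated method.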
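
To make the "relative depth map" point concrete, here is a minimal sketch. It assumes the raw output of StereoBM/StereoSGBM `compute`: 16-bit fixed-point disparities (the true disparity multiplied by 16), with unmatched pixels set to a value below the minimum disparity. The helper name `relative_nearness` and its arguments are illustrative and not part of OpenCV. It only rescales the valid disparities to [0, 1], so it orders points by nearness but carries no metric scale.

```python
import numpy as np


def relative_nearness(raw_disparity, min_disparity=0):
    """Rescale a fixed-point disparity map to relative nearness in [0, 1].

    raw_disparity: int16 array as returned by stereo.compute()
                   (true disparity multiplied by 16).
    min_disparity: the minDisparity the matcher was created with.
    Returns (nearness, valid); unmatched pixels are NaN in nearness.
    """
    disparity = raw_disparity.astype(np.float32) / 16.0
    # OpenCV marks unmatched pixels with a value below minDisparity.
    valid = disparity >= min_disparity
    nearness = np.full(disparity.shape, np.nan, dtype=np.float32)
    if valid.any():
        lo = disparity[valid].min()
        hi = disparity[valid].max()
        span = max(hi - lo, 1e-6)  # avoid division by zero on flat maps
        nearness[valid] = (disparity[valid] - lo) / span
    return nearness, valid


# Tiny synthetic example: disparities of 2, 4 and 6 px plus one unmatched pixel.
raw = np.array([[2 * 16, 4 * 16], [6 * 16, -16]], dtype=np.int16)
nearness, valid = relative_nearness(raw, min_disparity=0)
```

Larger values mean "closer to the cameras". Turning this into metric depth would need the focal length and baseline from a calibration, which the uncalibrated pipeline does not provide.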

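Basic procedure

1. Find at least 8 well-matched points in both images that you can use to calculate the fundamental matrix (findFundamentalMat needs 7 points for FM_7POINT and at least 8 for its other methods). You can use any detector and matcher you like; I kept FLANN but used ORB for detection, as SIFT isn't in the main build of OpenCV 4.2.0.
2. Calculate the fundamental matrix, F, with findFundamentalMat.
3. Rectify your images with stereoRectifyUncalibrated and warpPerspective.
4. Calculate the disparity (depth map) with StereoSGBM.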

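The results are much better:

Matches with ORB and FLANN

Undistorted images (left, then right)

Disparity

StereoBM

This result looks similar to the OP's problems (speckling, gaps, wrong depths in some areas).

StereoSGBM (tuned)

This result looks much better and uses roughly the same method as the OP, minus the final disparity calculation, which makes me think the OP would see similar improvements on his images, had they been provided.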
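
Both matchers assume that corresponding points lie on the same image row after rectification, so it can help to check that before blaming the matcher. The following is a minimal sketch of such a check using only NumPy. The helper names `apply_homography` and `vertical_misalignment` are hypothetical, and the synthetic points stand in for real matches.

```python
import numpy as np


def apply_homography(points, H):
    """Map an (N, 2) array of pixel coordinates through a 3x3 homography."""
    points = np.asarray(points, dtype=np.float64)
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    mapped = homogeneous @ np.asarray(H, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]


def vertical_misalignment(points1, points2, H1, H2):
    """Mean |y1 - y2| (in pixels) of matched points after rectification."""
    y1 = apply_homography(points1, H1)[:, 1]
    y2 = apply_homography(points2, H2)[:, 1]
    return float(np.mean(np.abs(y1 - y2)))


# Synthetic check: the right view is the left view shifted 3 px down,
# and H_right is a translation that undoes that shift.
pts_left = np.array([[10.0, 20.0], [50.0, 80.0], [120.0, 40.0]])
pts_right = pts_left + np.array([-5.0, 3.0])
H_left = np.eye(3)
H_right = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]])
error_before = vertical_misalignment(pts_left, pts_right, np.eye(3), np.eye(3))
error_after = vertical_misalignment(pts_left, pts_right, H_left, H_right)
```

With the variables from the full code, the call would be `vertical_misalignment(points1, points2, H1, H2)`. A value well above one pixel suggests that the matches or the fundamental matrix are poor, and no choice of matcher parameters is likely to fix that.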

Post filtering

There's a good article about this in the OpenCV docs. I'd recommend looking at it if you need really smooth maps.

The example photos above are frame 1 from the scene ambush_2 in the MPI Sintel Dataset.

Full code (tested on OpenCV 4.2.0):

import cv2
import numpy as np
import matplotlib.pyplot as plt

imgL = cv2.imread("tsukuba_l.png", cv2.IMREAD_GRAYSCALE)  # left image
imgR = cv2.imread("tsukuba_r.png", cv2.IMREAD_GRAYSCALE)  # right image


def get_keypoints_and_descriptors(imgL, imgR):
    """Use ORB detector and FLANN matcher to get keypoints, descriptors,
    and corresponding matches that will be good for computing
    homography.
    """
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(imgL, None)
    kp2, des2 = orb.detectAndCompute(imgR, None)

    ############## Using FLANN matcher ##############
    # Each keypoint of the first image is matched with a number of
    # keypoints from the second image. k=2 means keep the 2 best matches
    # for each keypoint (best matches = the ones with the smallest
    # distance measurement).
    FLANN_INDEX_LSH = 6
    index_params = dict(
        algorithm=FLANN_INDEX_LSH,
        table_number=6,  # 12
        key_size=12,  # 20
        multi_probe_level=1,
    )  # 2
    search_params = dict(checks=50)  # or pass empty dictionary
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    flann_match_pairs = flann.knnMatch(des1, des2, k=2)
    return kp1, des1, kp2, des2, flann_match_pairs


def lowes_ratio_test(matches, ratio_threshold=0.6):
    """Filter matches using the Lowe's ratio test.

    The ratio test checks if matches are ambiguous and should be
    removed by checking that the two distances are sufficiently
    different. If they are not, then the match at that keypoint is
    ignored.

    https://stackoverflow.com/questions/51197091/how-does-the-lowes-ratio-test-work
    """
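    filtered_matches = []
    for pair in matches:
        # FLANN with LSH can return fewer than k=2 neighbours for a
        # keypoint; such entries cannot be ratio-tested, so skip them.
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio_threshold * n.distance:
            filtered_matches.append(m)
    return filtered_matches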


def draw_matches(imgL, imgR, kp1, des1, kp2, des2, flann_match_pairs):
    """Draw the first 8 matches between the left and right images."""
    # https://docs.opencv.org/4.2.0/d4/d5d/group__features2d__draw.html
    # https://docs.opencv.org/2.4/modules/features2d/doc/common_interfaces_of_descriptor_matchers.html
    img = cv2.drawMatches(
        imgL,
        kp1,
        imgR,
        kp2,
        flann_match_pairs[:8],
        None,
        flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
    )
    cv2.imshow("Matches", img)
    cv2.imwrite("ORB_FLANN_Matches.png", img)
    cv2.waitKey(0)


def compute_fundamental_matrix(matches, kp1, kp2, method=cv2.FM_RANSAC):
    """Use the set of good matches to estimate the Fundamental Matrix.

    See  https://en.wikipedia.org/wiki/Eight-point_algorithm#The_normalized_eight-point_algorithm
    for more info.
    """
    pts1, pts2 = [], []
    fundamental_matrix, inliers = None, None
    for m in matches[:8]:
        pts1.append(kp1[m.queryIdx].pt)
        pts2.append(kp2[m.trainIdx].pt)
    if pts1 and pts2:
        # You can play with the Threshold and confidence values here
        # until you get something that gives you reasonable results. I
        # used the defaults
        fundamental_matrix, inliers = cv2.findFundamentalMat(
            np.float32(pts1),
            np.float32(pts2),
            method=method,
            # ransacReprojThreshold=3,
            # confidence=0.99,
        )
    return fundamental_matrix, inliers, pts1, pts2


############## Find good keypoints to use ##############
kp1, des1, kp2, des2, flann_match_pairs = get_keypoints_and_descriptors(imgL, imgR)
good_matches = lowes_ratio_test(flann_match_pairs, 0.2)
draw_matches(imgL, imgR, kp1, des1, kp2, des2, good_matches)


############## Compute Fundamental Matrix ##############
F, I, points1, points2 = compute_fundamental_matrix(good_matches, kp1, kp2)


############## Stereo rectify uncalibrated ##############
h1, w1 = imgL.shape
h2, w2 = imgR.shape
thresh = 0
_, H1, H2 = cv2.stereoRectifyUncalibrated(
    np.float32(points1), np.float32(points2), F, imgSize=(w1, h1), threshold=thresh,
)

############## Undistort (Rectify) ##############
imgL_undistorted = cv2.warpPerspective(imgL, H1, (w1, h1))
imgR_undistorted = cv2.warpPerspective(imgR, H2, (w2, h2))
cv2.imwrite("undistorted_L.png", imgL_undistorted)
cv2.imwrite("undistorted_R.png", imgR_undistorted)

############## Calculate Disparity (Depth Map) ##############

# Using StereoBM
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity_BM = stereo.compute(imgL_undistorted, imgR_undistorted)
plt.imshow(disparity_BM, "gray")
plt.colorbar()
plt.show()

# Using StereoSGBM
# Set disparity parameters. Note: disparity range is tuned according to
#  specific parameters obtained through trial and error.
win_size = 2
min_disp = -4
max_disp = 9
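# numDisparities must be positive and divisible by 16, so round the
# requested range (13 here) up to the next multiple of 16.
num_disp = 16 * ((max_disp - min_disp + 15) // 16)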
stereo = cv2.StereoSGBM_create(
    minDisparity=min_disp,
    numDisparities=num_disp,
    blockSize=5,
    uniquenessRatio=5,
    speckleWindowSize=5,
    speckleRange=5,
    disp12MaxDiff=2,
    P1=8 * 3 * win_size ** 2,
    P2=32 * 3 * win_size ** 2,
)
disparity_SGBM = stereo.compute(imgL_undistorted, imgR_undistorted)
plt.imshow(disparity_SGBM, "gray")
plt.colorbar()
plt.show()

score 9 · Accepted Answer

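Several issues can result in a low-quality Depth Channel and Disparity Channel, which in turn lead to a low-quality stereo sequence. Here are six of them:

Possible issue I

Incomplete formula

As the word uncalibrated implies, the stereoRectifyUncalibrated method computes the rectification transformations for you when you don't know, or don't have, the intrinsic parameters of your stereo pair and its relative position in the environment.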

cv.StereoRectifyUncalibrated(pts1, pts2, fm, imgSize, rhm1, rhm2, thres)

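where:

# pts1    -> an array of feature points in the first camera
# pts2    -> an array of feature points in the second camera
# fm      -> input fundamental matrix
# imgSize -> size of the image
# rhm1    -> output rectification homography matrix for the first image
# rhm2    -> output rectification homography matrix for the second image
# thres   -> optional threshold used to filter out outliers

And your call looks like this:

cv2.stereoRectifyUncalibrated(p1fNew, p2fNew, F, (2048, 2048))

So you do not take three parameters into account: rhm1, rhm2 and thres. (In the cv2 Python binding the two homographies are returned as outputs rather than passed in, and the threshold defaults to 5 when omitted.) If threshold > 0, all point pairs that don't comply with the epipolar geometry are rejected before the homographies are computed; otherwise, all points are considered inliers. The rejection test looks like this:

|pts2[i]^T * fm * pts1[i]| > thres

# ^T  -> transpose; pts1[i] and pts2[i] are homogeneous pixel coordinates (x, y, 1)

Thus, I believe visual inaccuracies may appear because the formula is calculated incompletely.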
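
The rejection test can be reproduced outside OpenCV to see how many matches would survive a given threshold. This is a minimal NumPy sketch; the function name `epipolar_residuals` is hypothetical, and the fundamental matrix below is a synthetic one for a purely horizontal camera shift, not one estimated from real images.

```python
import numpy as np


def epipolar_residuals(points1, points2, F):
    """Return |x2^T F x1| for each pair of matched pixel coordinates."""
    pts1 = np.asarray(points1, dtype=np.float64).reshape(-1, 2)
    pts2 = np.asarray(points2, dtype=np.float64).reshape(-1, 2)
    ones = np.ones((len(pts1), 1))
    x1 = np.hstack([pts1, ones])  # homogeneous (x, y, 1)
    x2 = np.hstack([pts2, ones])
    # Row i of (x1 @ F.T) is F x1[i]; the row-wise dot with x2 gives x2^T F x1.
    F = np.asarray(F, dtype=np.float64)
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1))


# For a purely horizontal camera shift, F is (up to scale) the matrix
# below, and the residual reduces to |y1 - y2|.
F_horizontal = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
pts_a = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
pts_b = np.array([[7.0, 20.0], [25.0, 40.0], [48.0, 72.0]])  # last pair is off
residuals = epipolar_residuals(pts_a, pts_b, F_horizontal)
inliers = residuals <= 5  # 5 is the documented default threshold
```

The `reshape(-1, 2)` also accepts the flattened (2N, 1) layout used in the question. Note that the residual depends on the scale of F, so a threshold only has a stable meaning for a consistently normalised matrix.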

You can read about Camera Calibration and 3D Reconstruction in the official resource.

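Possible issue II

Interaxial distance

The interaxial distance between the left and right camera lenses must be not greater than 200 mm. When the interaxial distance is larger than the interocular distance, the effect is called hyperstereoscopy or hyperdivergence, and it leads not only to depth exaggeration in the scene but also to physical discomfort for the viewer. Read Autodesk's Stereoscopic Filmmaking Whitepaper to find out more on this topic.

Possible issue III

Parallel vs Toed-In camera mode

Visual inaccuracies in the resulting Disparity Map may occur due to an incorrect camera mode calculation. Many stereographers prefer Toe-In camera mode, but Pixar, for example, prefers Parallel camera mode.

Possible issue IV

Vertical alignment

In stereoscopy, if a vertical shift occurs (even if one of the views is shifted up by 1 mm), it ruins a robust stereo experience. So, before generating a Disparity Map, you must be sure that the left and right views of your stereo pair are aligned accordingly. Look at the Technicolor Stereoscopic Whitepaper about 15 common problems in stereo.

Stereo rectification matrix: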

   ┌                  ┐
   |  f   0   cx  tx  |
   |  0   f   cy  ty  |   # use "ty" value to fix vertical shift in one image
   |  0   0   1   0   |
   └                  ┘

Here's the StereoRectify method:

cv.StereoRectify(cameraMatrix1, cameraMatrix2, distCoeffs1, distCoeffs2, imageSize, R, T, R1, R2, P1, P2, Q=None, flags=CV_CALIB_ZERO_DISPARITY, alpha=-1, newImageSize=(0, 0)) -> (roi1, roi2)

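Possible issue V

Lens distortion

Lens distortion is a very important topic in stereo composition. Before generating a Disparity Map you need to undistort the left and right views, then generate the disparity channel, and then redistort both views again.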

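Possible issue VI

Low-quality Depth channel without anti-aliasing

To create a high-quality Disparity Map you need left and right Depth Channels that have been generated beforehand. When you work in a 3D package, you can render a high-quality Depth Channel (with crisp edges) with just one click. But generating a high-quality depth channel from a video sequence is not easy, because the stereo pair has to move through your environment to produce the initial data for a depth-from-motion algorithm. If there's no motion in a frame, the depth channel will be extremely poor.

Also, the Depth channel itself has one more drawback: its edges do not match the edges of the RGB because it has no anti-aliasing.

Disparity channel code snippet:

Here I'd like to present a quick approach to generating a Disparity Map: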

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

imageLeft = cv.imread('paris_left.png', 0)
imageRight = cv.imread('paris_right.png', 0)
stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imageLeft, imageRight)
plt.imshow(disparity, 'gray')
plt.show()
