opencv - ArUco 标记，姿势估计 - 究竟是针对哪一点给出平移和旋转？

Question

我检测到了 ArUco 标记并估计了姿势。见下图。但是，我得到的 Xt（X 翻译）是一个正值。根据drawAxis函数，正方向远离图像中心。所以我认为它应该是一个负值。为什么我变得积极了。

我的相机距离成像表面大约 120 毫米。但我得到的 Zt（Z 平移）在 650 毫米的范围内。姿态估计是否给出了标记相对于物理相机或图像平面中心的姿态？我不明白为什么Zt这么高。

我在改变的同时不断地测量 Pose Z，得到 roll、pitch、yaw。我注意到滚动（旋转 wrt 凸轮 X 轴）的符号来回改变幅度变化 166-178，但 Xt 的符号没有随着滚动符号的变化而改变。关于它为什么会这样的任何想法？有什么建议可以获得更一致的数据吗？

image=cv.imread(fname)
arucoDict = cv.aruco.Dictionary_get(cv.aruco.DICT_4X4_1000)
arucoParams = cv.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv.aruco.detectMarkers(image, arucoDict,
    parameters=arucoParams)

    print(corners, ids, rejected)
    
    if len(corners) > 0:
        # flatten the ArUco IDs list
        ids = ids.flatten()
        # loop over the detected ArUCo corners
        #for (markerCorner, markerID) in zip(corners, ids):
        #(markerCorner, markerID)=(corners, ids)
            # extract the marker corners (which are always returned in
            # top-left, top-right, bottom-right, and bottom-left order)
        #corners = corners.reshape((4, 2))
        (topLeft, topRight, bottomRight, bottomLeft) = corners[0][0][0],corners[0][0][1],corners[0][0][2],corners[0][0][3]
            # convert each of the (x, y)-coordinate pairs to integers
        topRight = (int(topRight[0]), int(topRight[1]))
        bottomRight = (int(bottomRight[0]), int(bottomRight[1]))
        bottomLeft = (int(bottomLeft[0]), int(bottomLeft[1]))
        topLeft = (int(topLeft[0]), int(topLeft[1]))
    
            # draw the bounding box of the ArUCo detection
        cv.line(image, topLeft, topRight, (0, 255, 0), 2)
        cv.line(image, topRight, bottomRight, (0, 255, 0), 2)
        cv.line(image, bottomRight, bottomLeft, (0, 255, 0), 2)
        cv.line(image, bottomLeft, topLeft, (0, 255, 0), 2)
            # compute and draw the center (x, y)-coordinates of the ArUco
            # marker
        cX = int((topLeft[0] + bottomRight[0]) / 2.0)
        cY = int((topLeft[1] + bottomRight[1]) / 2.0)
        cv.circle(image, (cX, cY), 4, (0, 0, 255), -1)
        
        if topLeft[1]!=topRight[1] or topLeft[0]!=bottomLeft[0]:
            rot1=np.degrees(np.arctan((topLeft[0]-bottomLeft[0])/(bottomLeft[1]-topLeft[1])))
            rot2=np.degrees(np.arctan((topRight[1]-topLeft[1])/(topRight[0]-topLeft[0])))
            rot=(np.round(rot1,3)+np.round(rot2,3))/2
            print(rot1,rot2,rot)
        else:
            rot=0
    
        # draw the ArUco marker ID on the image
        rotS=",rotation:"+str(np.round(rot,3))
        cv.putText(image, ("position: "+str(cX) +","+str(cY)),
        (100, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX,0.5, (255, 0, 80), 2)
        cv.putText(image, rotS,
        (400, topLeft[1] -15), cv.FONT_HERSHEY_SIMPLEX,0.5, (255, 0, 80), 2)
        print("[INFO] ArUco marker ID: {}".format(ids))
        
        
        d=np.round((math.dist(topLeft,bottomRight)+math.dist(topRight,bottomLeft))/2,3)
        # Get the rotation and translation vectors
        rvecs, tvecs, obj_points = cv.aruco.estimatePoseSingleMarkers(corners,aruco_marker_side_length,mtx,dst)
        
         
        # Print the pose for the ArUco marker
        # The pose of the marker is with respect to the camera lens frame.
        # Imagine you are looking through the camera viewfinder, 
        # the camera lens frame's:
        # x-axis points to the right
        # y-axis points straight down towards your toes
        # z-axis points straight ahead away from your eye, out of the camera
        #for i, marker_id in enumerate(marker_ids):
         
            # Store the translation (i.e. position) information
        transform_translation_x = tvecs[0][0][0]
        transform_translation_y = tvecs[0][0][1]
        transform_translation_z = tvecs[0][0][2]
               
            # Store the rotation information
        rotation_matrix = np.eye(4)
        rotation_matrix[0:3, 0:3] = cv.Rodrigues(np.array(rvecs[0]))[0]
        r = R.from_matrix(rotation_matrix[0:3, 0:3])
        quat = r.as_quat()   
             
            # Quaternion format     
        transform_rotation_x = quat[0] 
        transform_rotation_y = quat[1] 
        transform_rotation_z = quat[2] 
        transform_rotation_w = quat[3] 
             
            # Euler angle format in radians
        roll_x, pitch_y, yaw_z = euler_from_quaternion(transform_rotation_x,transform_rotation_y,transform_rotation_z,transform_rotation_w)
             
        roll_x = math.degrees(roll_x)
        pitch_y = math.degrees(pitch_y)
        yaw_z = math.degrees(yaw_z)

score 1 · Accepted Answer

不检查所有代码（看起来大致没问题），一些关于 OpenCV 和 aruco 的基础知识：

两者都使用右手坐标系。拇指 X，索引 Y，中间 Z。

OpenCV 使用 X 向右，Y 向下，Z 远，用于屏幕/相机帧。屏幕和图片的来源是左上角。对于相机，原点是针孔模型的中心，也就是光圈的中心。我无法评论镜头或镜头系统。假设镜头中心是原点。这可能已经足够接近了。

如果标记平放在桌子上，Aruco 使用 X 向右、Y 远、Z 向上。原点位于标记的中心。标记的左上角被认为是“第一个”角。

可以认为标记具有自己的坐标系/框架。

rvec 和 tvec 给出的位姿是相机帧中标记的位姿。这意味着np.linalg.norm(tvec)为您提供从相机到标记中心的直接距离。tvec 的 Z 只是平行于光轴的分量。

如果标记位于图片的右半部分（“一半”由相机矩阵的 cx,cy 定义），您会期望 tvec 的 X 增长。下半部分，Y 为正/增长。

相反，该转换将标记局部坐标转换为相机局部坐标。尝试转换一些标记局部点，例如原点或轴上的点。我相信这cv::transform会有所帮助。使用 OpenCVprojectPoints将 3D 空间点映射到 2D 图像点，然后您可以绘制标记的轴，或在其顶部的立方体，或任何您喜欢的东西。

假设标记笔直立并正对着相机。当您考虑标记和相机在空间（“世界”空间）中的帧三元组时，两者都将是 X“正确”，但一个的 Y 和 Z 与另一个的 Y 和 Z 相反，所以你会期望看到一个绕 X 轴旋转半圈（旋转 Z 和 Y）。

你可以想象转换是这样发生的：

最初，相机通过标记从标记的背面看向世界。相机将“倒置”。相机看到标记空间。
姿势的旋转组件围绕相机的原点旋转整个标记局部世界。从世界框架（参考点）看，相机旋转，进入你会发现自然的姿态。
姿势的平移将标记的世界移到相机前面（Z 为正），或者等效地，相机远离标记。

如果您得到难以置信的值，请检查aruco_marker_side_length相机矩阵。对于典型分辨率（VGA-4k）和视场（60-80 度），f 约为 500-3000。

opencv - ArUco 标记，姿势估计 - 究竟是针对哪一点给出平移和旋转？

1 回答 1

Related

Reference