
Given two datatable Frames, how can I combine (merge) them column-wise into a single Frame?

dt_f_A =

+--------+--------+--------+-----+--------+
| A_at_1 | A_at_2 | A_at_3 | ... | A_at_m |
+--------+--------+--------+-----+--------+
| v_1    |        |        |     |        |
+--------+--------+--------+-----+--------+
| ...    |        |        |     |        |
+--------+--------+--------+-----+--------+
| v_N    |        |        |     |        |
+--------+--------+--------+-----+--------+

dt_f_B =

+--------+--------+--------+-----+--------+
| B_at_1 | B_at_2 | B_at_3 | ... | B_at_k |
+--------+--------+--------+-----+--------+
| w_1    |        |        |     |        |
+--------+--------+--------+-----+--------+
| ...    |        |        |     |        |
+--------+--------+--------+-----+--------+
| w_N    |        |        |     |        |
+--------+--------+--------+-----+--------+

Expected result (dt_f_A concatenated, i.e. combined or merged, column-wise with dt_f_B):

+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| A_at_1 | A_at_2 | A_at_3 | ... | A_at_m | B_at_1 | B_at_2 | B_at_3 | ... | B_at_k |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| v_1    |        |        |     |        | w_1    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| ...    |        |        |     |        | ...    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+
| v_N    |        |        |     |        | w_N    |        |        |     |        |
+--------+--------+--------+-----+--------+--------+--------+--------+-----+--------+

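We consider three cases:

Case 1: a) both frames have exactly the same number of rows, and b) the column names are unique (no name appears in both frames)

Case 2: the frames have different numbers of rows

Case 3: the column names are not unique (some names appear in both frames)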

@sammywemmy Thanks for your valuable input.


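Case 1: a) both frames have exactly the same number of rows, and b) the column names are unique

1- Using cbind (modifies dt_f_A in place): dt_f_A.cbind(dt_f_B)

2- Using column assignment: dt_f_A[:, dt_f_B.names] = dt_f_B

Example: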

import datatable as dt

dt_f_A = dt.Frame({"a":[1,2,3,4],"b":['a','b','c','d']})
dt_f_B = dt.Frame({"c":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B)

#dt_f_A[:, dt_f_B.names] = dt_f_B  # this works as well

print(dt_f_A)

Case 2: the frames have different numbers of rows

  • dt_f_A.cbind(dt_f_B) raises InvalidOperationError: Cannot cbind frame with X rows to a frame with Y rows (X ≠ Y).
  • dt_f_A[:, dt_f_B.names] = dt_f_B raises ValueError: Frame has X rows, and cannot be used in an expression where Y are expected (X ≠ Y).

Solution: use dt_f_A.cbind(dt_f_B, force=True)

Example:

import datatable as dt

dt_f_A = dt.Frame({"a":[1, 2, 3, 4, 5,6], "b":['a', 'b', 'c', 'd', 'e','f']})
dt_f_B = dt.Frame({"c":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B,force=True)

print(dt_f_A)

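The missing values are filled with NA: dt_f_B has no data for the last two rows of dt_f_A, so those cells in columns c and d become NA.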

Case 3: the column names are not unique (some names appear in both frames)

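  • dt_f_A.cbind(dt_f_B): this works, but emits a warning and renames the duplicated column to a unique name: DatatableWarning: Duplicate column names found, and were assigned unique names: 'a' -> 'a.0'

  • dt_f_A[:, dt_f_B.names] = dt_f_B: this does not raise any error. The duplicated column in dt_f_A is overwritten by the column of the same name from dt_f_B, and the remaining columns of dt_f_B are appended.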

Example:

import datatable as dt

dt_f_A = dt.Frame({"a":[1,2,3,4],"b":['a','b','c','d']})
dt_f_B = dt.Frame({"a":[1.1, 2.2, 3.3, 4.4], "d":['aa', 'bb', 'cc', 'dd']})

dt_f_A.cbind(dt_f_B) # rename the duplicated columns
#dt_f_A[:, dt_f_B.names] = dt_f_B # duplicated columns take dt_f_B's values

print(dt_f_A)


Answered 2020-08-18T17:34:37.863