uproot: handling a TH2D with the uproot method .pandas()

Question

I am very new to uproot and Python, but I hope to catch up quickly. I would like to know why the .pandas() method creates such a strange table from a TH2D histogram:

myhisto = file["angular_distr_el/ID3_mol_e0_valid/EN_gate/check_cthetaEE_x"]
type(myhisto)

Output:

uproot.rootio.TH2D

Finally, myhisto.pandas() returns:

                                        count  variance
cos(theta)   electron energy [eV]
[-inf, -1.0) [-inf, 10.0)                 0.0       0.0
             [10.0, 10.15)                0.0       0.0
             [10.15, 10.3)                0.0       0.0
             [10.3, 10.45)                0.0       0.0
             [10.45, 10.6)                0.0       0.0
...          ...                          ...       ...
[1.0, inf)   [24.4, 24.549999999999997)   0.0       0.0
             [24.549999999999997, 24.7)   0.0       0.0
             [24.7, 24.85)                0.0       0.0
             [24.85, 25.0)                0.0       0.0
             [25.0, inf)                  0.0       0.0

2244 rows × 2 columns

And myhisto.columns returns:

Index(['count', 'variance'], dtype='object')

Where can I find the documentation of the .pandas() method to understand what it does? Is there a way to reorganize myhisto into a DataFrame with proper columns?

score 1 · Accepted Answer

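After some fun but desperate browsing, I understood what kind of object it is: .pandas() returns a sorted MultiIndex DataFrame, with one index level per histogram axis, which is a very clever way to represent the 2D bins. Just type myhisto.index to see it directly: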

MultiIndex([([-inf, -1.0),                [-inf, 10.0)),
            ([-inf, -1.0),               [10.0, 10.15)),
            ([-inf, -1.0),               [10.15, 10.3)),
            ([-inf, -1.0),               [10.3, 10.45)),
            ([-inf, -1.0),               [10.45, 10.6)),
            ([-inf, -1.0),               [10.6, 10.75)),
            ([-inf, -1.0),               [10.75, 10.9)),
            ([-inf, -1.0),               [10.9, 11.05)),
            ([-inf, -1.0),               [11.05, 11.2)),
            ([-inf, -1.0),               [11.2, 11.35)),
            ...
            (  [1.0, inf), [23.65, 23.799999999999997)),
            (  [1.0, inf), [23.799999999999997, 23.95)),
            (  [1.0, inf),               [23.95, 24.1)),
            (  [1.0, inf),               [24.1, 24.25)),
            (  [1.0, inf),               [24.25, 24.4)),
            (  [1.0, inf),  [24.4, 24.549999999999997)),
            (  [1.0, inf),  [24.549999999999997, 24.7)),
            (  [1.0, inf),               [24.7, 24.85)),
            (  [1.0, inf),               [24.85, 25.0)),
            (  [1.0, inf),                 [25.0, inf))],
           names=['cos(theta)', 'electron energy [eV]'], length=2244)
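To see where the 2244 rows come from, here is a minimal sketch that builds an empty DataFrame with the same layout using plain pandas (no uproot involved). The bin edges are an assumption read off the printed output: 20 cos(theta) bins on [-1, 1] and 100 energy bins on [10, 25] eV, plus one underflow and one overflow bin on each axis, which gives 22 × 102 = 2244 rows.

```python
import numpy as np
import pandas as pd

# Assumed binning, inferred from the printed index: 20 cos(theta) bins on
# [-1, 1] and 100 energy bins on [10, 25] eV, each axis with an underflow
# bin (-inf, ...) and an overflow bin (..., inf).
x_edges = np.concatenate([[-np.inf], np.linspace(-1.0, 1.0, 21), [np.inf]])
y_edges = np.concatenate([[-np.inf], np.linspace(10.0, 25.0, 101), [np.inf]])

# Left-closed intervals print as "[a, b)", like the .pandas() output.
x_bins = pd.IntervalIndex.from_breaks(x_edges, closed="left")
y_bins = pd.IntervalIndex.from_breaks(y_edges, closed="left")

# One row per (x bin, y bin) pair: cos(theta) is the outer level,
# electron energy the inner one.
index = pd.MultiIndex.from_product(
    [x_bins, y_bins], names=["cos(theta)", "electron energy [eV]"]
)
df = pd.DataFrame({"count": 0.0, "variance": 0.0}, index=index)

print(len(x_bins), len(y_bins), len(df))  # 22 102 2244
print(df.index.names)
```

The flat "one row per bin" layout is what makes the table look strange at first: the 2D structure lives entirely in the index, not in the columns.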

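The solution is to unstack the DataFrame or to build a pivot table from it. For this particular object the pivot table works better: the original DataFrame has two value columns, count and variance, so unstack() spreads both of them across the columns (hence the 204 columns below), whereas pivot_table() can keep just one. For example: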

myhisto.unstack()

count   ... variance
electron energy [eV]    [-inf, 10.0)    [10.0, 10.15)   [10.15, 10.3)   [10.3, 10.45)   [10.45, 10.6)   [10.6, 10.75)   [10.75, 10.9)   [10.9, 11.05)   [11.05, 11.2)   [11.2, 11.35)   ... [23.65, 23.799999999999997) [23.799999999999997, 23.95) [23.95, 24.1)   [24.1, 24.25)   [24.25, 24.4)   [24.4, 24.549999999999997)  [24.549999999999997, 24.7)  [24.7, 24.85)   [24.85, 25.0)   [25.0, inf)
cos(theta)                                                                                  
[-inf, -1.0)    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[-1.0, -0.9)    0.0 1.0 1.0 0.0 0.0 2.0 0.0 2.0 0.0 1.0 ... 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
[-0.9, -0.8)    0.0 0.0 3.0 3.0 0.0 0.0 0.0 0.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
[-0.8, -0.7)    0.0 0.0 1.0 2.0 0.0 1.0 1.0 2.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[-0.7, -0.6)    0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0
[-0.6, -0.5)    0.0 1.0 1.0 1.0 0.0 0.0 2.0 1.0 0.0 3.0 ... 0.0 1.0 0.0 1.0 1.0 

**22 rows × 204 columns**

Compare this with:

pivot_pipanda = pipanda.pivot_table(values="count", index="cos(theta)", columns="electron energy [eV]")

electron energy [eV]    [-inf, 10.0)    [10.0, 10.15)   [10.15, 10.3)   [10.3, 10.45)   [10.45, 10.6)   [10.6, 10.75)   [10.75, 10.9)   [10.9, 11.05)   [11.05, 11.2)   [11.2, 11.35)   ... [23.65, 23.799999999999997) [23.799999999999997, 23.95) [23.95, 24.1)   [24.1, 24.25)   [24.25, 24.4)   [24.4, 24.549999999999997)  [24.549999999999997, 24.7)  [24.7, 24.85)   [24.85, 25.0)   [25.0, inf)
cos(theta)                                                                                  
[-inf, -1.0)    0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[-1.0, -0.9)    0.0 1.0 1.0 0.0 0.0 2.0 0.0 2.0 0.0 1.0 ... 0.0 0.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
[-0.9, -0.8)    0.0 0.0 3.0 3.0 0.0 0.0 0.0 0.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 1.0 0.0
[-0.8, -0.7)    0.0 0.0 1.0 2.0 0.0 1.0 1.0 2.0 1.0 1.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[-0.7, -0.6)    0.0 0.0 1.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 0.0 1.0 1.0 0.0 0.0 1.0 0.0 0.0
[-0.6, -0.5)    0.0 1.0 1.0 1.0 0.0 0.0 2.0 1.0 0.0 3.0 ... 0.0 1.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 0.0
[-0.5, -0.3999999999999999) 0.0 0.0 2.0 0.0 1.0 1.0 3.0 2.0 3.0 1.0 ... 3.0 0.0 0.0 0.0 0.0 2.0 0.0 1.0 1.0 0.0
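To reproduce the difference between the two calls without the original ROOT file, here is a minimal sketch on a hypothetical toy DataFrame that mimics the layout of the .pandas() output (same index level names and columns, but a made-up 3 × 4 binning and made-up counts). In the question's data, the same calls give 22 rows × 204 columns for unstack() and, presumably, 22 × 102 for the pivot table.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for the .pandas() output: 3 x 4 bins,
# without under/overflow bins, to keep the printout small.
x_bins = pd.IntervalIndex.from_breaks([-1.0, -0.5, 0.0, 0.5], closed="left")
y_bins = pd.IntervalIndex.from_breaks([10.0, 10.5, 11.0, 11.5, 12.0], closed="left")
index = pd.MultiIndex.from_product(
    [x_bins, y_bins], names=["cos(theta)", "electron energy [eV]"]
)
counts = np.arange(len(index), dtype=float)
# Assumption: unweighted fills, so each bin's variance equals its count.
df = pd.DataFrame({"count": counts, "variance": counts}, index=index)

# unstack() moves the inner level (energy) into the columns and takes both
# value columns along: the result has (count | variance) x energy columns.
wide = df.unstack()
print(wide.shape)  # (3, 8)

# pivot_table() refers to the index levels by name and keeps only "count".
# Its default aggfunc is "mean", which changes nothing here because every
# (cos(theta), energy) pair occurs exactly once.
pivot = df.pivot_table(
    values="count", index="cos(theta)", columns="electron energy [eV]"
)
print(pivot.shape)  # (3, 4)

# Same table without any aggregation step: select the column, then unstack.
direct = df["count"].unstack("electron energy [eV]")
print(np.array_equal(pivot.to_numpy(), direct.to_numpy()))  # True
```

Selecting the column first and then unstacking is arguably the more direct route, since nothing actually needs to be aggregated.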

From here, the standard pandas methods can be used!

(For slicing techniques such as loc[] and iloc[], see: https://www.youtube.com/watch?v=tcRGa2soc-c)
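As a hypothetical example of those standard methods, the sketch below works on a small made-up 4 × 4 table laid out like the pivot table above (interval labels, with under/overflow bins on both axes). It uses iloc to drop the flow bins, loc with a plain number to pick the bin that contains that value, and the IntervalIndex.mid attribute to replace the interval labels with bin centres, which is one possible way to get the numeric columns asked about in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical 4 x 4 table shaped like the pivot table: interval labels,
# with an underflow and an overflow bin on each axis.
x_bins = pd.IntervalIndex.from_breaks(
    [-np.inf, -1.0, 0.0, 1.0, np.inf], closed="left", name="cos(theta)"
)
y_bins = pd.IntervalIndex.from_breaks(
    [-np.inf, 10.0, 10.5, 11.0, np.inf], closed="left", name="electron energy [eV]"
)
table = pd.DataFrame(
    np.arange(16, dtype=float).reshape(4, 4), index=x_bins, columns=y_bins
)

# iloc is positional: drop the first and last row/column (the flow bins).
core = table.iloc[1:-1, 1:-1]

# loc with a plain number selects the bin whose interval contains it:
# cos(theta) = -0.5 falls in [-1.0, 0.0), E = 10.2 eV falls in [10.0, 10.5).
value = table.loc[-0.5, 10.2]

# Replace the interval labels with bin centres to get numeric axes.
core_mid = core.set_axis(core.index.mid, axis=0).set_axis(core.columns.mid, axis=1)
print(value)
print(core_mid)
```

Bin centres are only one choice; IntervalIndex also exposes left and right if the bin edges are more convenient as labels.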
