
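I am computing Pearson correlations. In the end I get results like the ones below (correlation1). Why is the second value in every tuple 0.0? Can anyone explain? Also, my correlation code runs slowly. How can I make it faster?

Results (sample):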
(0.52543523179249552, 0.0), (0.52543905756911169, 0.0), (0.52544196572206603, 0.0), (0.52545010637443945, 0.0)...

from scipy.stats import pearsonr

s1_list = []
s2_list = []
s3_list = []
s4_list = []

zip_list1 = []
zip_list2 = []

correlation1 = []
for x, y in zip(speed1_list, speed2_list):
    zip1 = {"s1": float(x), "s2": float(y)}
    s1_list.append(zip1["s1"])
    s2_list.append(zip1["s2"])
    zip_list1.append(zip1)
    correlation1.append(pearsonr(s1_list,s2_list))

print(correlation1)

Input:

speed1_list: [113.0, 116.0, 120.0, 120.0, 117.0, 127.0, 124.0, 118.0, 124.0, 128.0, 128.0, 125.0, 112.0, 122.0, 125.0, 133.0, 128.0, 129.0, 126.0, 123.0, 120.0, 118.0, 114.0, 119.0, 129.0, 127.0, 128.0, 122.0, 120.0, 125.0, 119.0...]

speed2_list: [125.0, 123.0, 120.0, 115.0, 124.0, 120.0, 120.0, 119.0, 119.0, 122.0, 121.0, 116.0, 116.0, 119.0, 116.0, 113.0, 113.0, 115.0, 120.0, 122.0, 122.0, 113.0, 118.0, 121.0, 120.0, 119.0, 116.0...]

correlation1: (0.52543523179249552, 0.0), (0.52543905756911169, 0.0), (0.52544196572206603, 0.0), (0.52545010637443945, 0.0)...


1 Answer


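If you read the documentation for the pearsonr function, you will see that the second item is the p-value. Roughly, it is the probability that uncorrelated data would produce a correlation at least as extreme as the one you observed. It is not the probability that the true correlation is 0.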
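To see how the p-value depends on the number of samples, here is a minimal sketch. The data is synthetic and the helper name `correlated_sample`, the noise level, and the seed are my own assumptions, chosen only so that r lands near the 0.53 in the question. It is not the asker's data.

```python
import numpy as np
from scipy.stats import pearsonr


def correlated_sample(n, noise=1.6, seed=0):
    """Hypothetical helper: n synthetic points with a moderate linear relation.

    With noise=1.6 the expected correlation is about 1 / sqrt(1 + 1.6**2),
    i.e. roughly 0.53. All values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + noise * rng.normal(size=n)
    return x, y


# Same kind of relationship, increasing sample size: r stays in the same
# region while the p-value shrinks rapidly toward zero.
for n in (5, 50, 500, 5000, 20000):
    x, y = correlated_sample(n)
    r, p = pearsonr(x, y)
    print(n, r, p)
```

For the larger sample sizes the p-value should become so small that it is printed as 0.0 or something very close to it, even though r itself is only moderate.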

If I run your code on your sample lists, I get only one p-value of 0:

correlation1 = [(nan, nan), (-1.0, 0.0), (-0.99946642948624609, 0.020797462218684917), (-0.87259228616792028, 0.12740771383207972), (-0.82714719627765909, 0.083995277603981247), (-0.58025386521762756, 0.22730335863992135), (-0.57868746304695651, 0.17345428063365897), (-0.53247171319158504, 0.17427615080621298), ...

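But I guess the correlation1 values you posted come from much further along in the lists, where you have so many samples that a correlation of about 0.53 is overwhelmingly significant. The p-value is then effectively 0, small enough to come out as 0.0 in floating point.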
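On the speed part of the question: the loop calls pearsonr on the whole growing list at every step, so the total work grows quadratically with the list length. One possible approach, sketched below, is to keep running means and co-moments (a Welford-style update) so that each new pair costs constant time. The function name `running_pearson` is hypothetical, it does not use scipy, and it yields only r, not the p-value.

```python
import math


def running_pearson(xs, ys):
    """Yield Pearson's r for each growing prefix of (xs, ys).

    Hypothetical helper, not part of scipy. Uses Welford-style running
    updates, so each step is O(1) instead of re-scanning the whole prefix.
    """
    n = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0  # running sums of squared / cross deviations
    for x, y in zip(xs, ys):
        x, y = float(x), float(y)
        n += 1
        dx = x - mean_x  # deviation from the OLD mean of x
        dy = y - mean_y  # deviation from the OLD mean of y
        mean_x += dx / n
        mean_y += dy / n
        m2_x += dx * (x - mean_x)  # old deviation times new deviation
        m2_y += dy * (y - mean_y)
        c_xy += dx * (y - mean_y)
        denom = math.sqrt(m2_x * m2_y)
        # r is undefined for a single point or a constant series
        yield c_xy / denom if denom > 0 else float("nan")


# First values of the lists from the question
speed1_list = [113.0, 116.0, 120.0, 120.0, 117.0, 127.0, 124.0, 118.0]
speed2_list = [125.0, 123.0, 120.0, 115.0, 124.0, 120.0, 120.0, 119.0]
correlation1 = list(running_pearson(speed1_list, speed2_list))
print(correlation1)
```

If you only need the p-value for the final result, you could call pearsonr once on the complete lists after the loop instead of at every step.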

answered 2016-02-18T13:11:22.530