python - Benford's law: computing leading digits from a csv file

Question

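I'm new to Python and am writing a program that reads values from a .csv file and then displays a chart comparing the test results with the output expected under Benford's law.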

The .csv file has the loan values I need to read in its first column, like this:

Values  Leading Digit   Number of occurances
170     1               88                   
900     9               62          
250     2               44          
450     4               51          
125     1               19          
.....

Main file app.py:

...
filename = filedialog.askopenfilename(filetypes=(
    ("Excel files", "*.csv"), ("All files", "*.*")))
print(filename)
try:
    with open(filename, 'rt') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        next(reader, None)  # skip the headers
        for row in reader:
            minutePriceCloses.append(row[0])
        # calculate the percentage distribution of leading digits
        benford_test_data_dist = calc.getBenfordDist(minutePriceChanges)
        ....
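
Two things in the excerpt above may be worth checking. The loop appends to minutePriceCloses but getBenfordDist receives minutePriceChanges, and csv.reader yields strings, so row[0] is never converted to a number. Since the rest of app.py is elided, both lists may be handled elsewhere. The following is only a minimal sketch of reading the first column as floats; read_first_column and the in-memory sample are hypothetical names and data, not part of the original program.

```python
import csv
import io


def read_first_column(csvfile):
    """Return the first column of an open CSV file as floats, skipping the header."""
    reader = csv.reader(csvfile, delimiter=',')
    next(reader, None)  # skip the header row
    values = []
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines and empty cells
        try:
            values.append(float(row[0]))
        except ValueError:
            continue  # ignore cells that are not numeric
    return values


# Hypothetical in-memory sample standing in for the real file
sample = io.StringIO(
    "Values,Leading Digit,Number of occurances\n"
    "170,1,88\n"
    "900,9,62\n"
    "250,2,44\n"
)
loan_values = read_first_column(sample)
print(loan_values)  # [170.0, 900.0, 250.0]
```

The same list would then be the one passed on to the distribution function, so that the values read and the values counted are the same.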

In calc.py:

import numpy as np


def getBenfordDist(data):
    # set initial dist to zero
    dist = [0, 0, 0, 0, 0, 0, 0, 0, 0]
    # for each figure, check what the first non-zero digit is, hacky multiply
    # by 1000000 to handle small values
    for d in data:
        # sneaky multiply by 1000000 to ensure that the leading digit is unlikely to be zero
        # since benfords law is assumed to relate somehow to scale invariance, this *SHOULDN'T* make a difference
        # but it might, so this might all be wrong :-)
        s = str(np.abs(d) * 1000000)
        for i in range(0, 8):
            if(s.startswith(str(i + 1))):
                dist[i] = dist[i] + 1
                break
    # return fractions of the total for each digit
    percentDist = []
    # convert to % - todo, start using numpy vectors that allow scalar mult/div
    for count in dist:
        percentDist.append(float(count) / len(data))
        # print(float(count))
    return percentDist
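
Note that range(0, 8) in the function above only tests the digits 1 to 8, so a value starting with 9 is never counted, and np.abs(d) will fail if d is still a string from the CSV reader. A possible alternative, sketched under the assumption that each value can be converted with float(), reads the first non-zero digit straight from the number's text form and so needs no multiply-by-1000000 step. The names leading_digit and benford_dist are hypothetical.

```python
def leading_digit(value):
    """Return the first non-zero digit of a number, or None if there is none."""
    # repr() of a float puts the most significant digit first,
    # e.g. 0.0045 -> '0.0045' and 1e-07 -> '1e-07'
    for ch in repr(abs(float(value))):
        if ch in "123456789":
            return int(ch)
    return None  # zero has no leading digit


def benford_dist(data):
    """Return the fraction of values that start with each digit 1..9."""
    counts = [0] * 9
    total = 0
    for value in data:
        digit = leading_digit(value)
        if digit is None:
            continue  # skip zeros so they do not distort the fractions
        counts[digit - 1] += 1
        total += 1
    if total == 0:
        return [0.0] * 9
    return [count / total for count in counts]


print(benford_dist([170, 900, 250, 450, 125]))
# [0.4, 0.2, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.2]
```

Dividing by the number of values actually counted, rather than len(data), keeps the nine fractions summing to 1 even when some entries are zero.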

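The problem I'm having now is that the chart does not correctly show the percentage obtained by dividing each count in the values column by the total number of rows that contain a value. For example, for values whose leading digit is 1, the percentage on the chart should be 0.25, and so on. There are 352 rows.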
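
For reference, the expected side of the comparison follows from the standard Benford formula P(d) = log10(1 + 1/d) for d = 1..9, which is a well-known result and not something stated in the post. The sketch below computes those expected shares and repeats the arithmetic from the question (88 occurrences out of 352 rows); expected_benford_dist is a hypothetical helper name.

```python
import math


def expected_benford_dist():
    """Return the Benford's-law share for each leading digit 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


expected = expected_benford_dist()
observed_share = 88 / 352  # count and row total taken from the question

print(observed_share)         # 0.25
print(round(expected[0], 3))  # 0.301
```

Both lists are fractions between 0 and 1, so they can be plotted on the same axis without rescaling.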

Please help. Thanks.
