
I'm trying to implement Benford's Law in R. So far everything works as expected, except that when some first digit occurs 0 times, an exception is thrown:

Error in data.frame(digit = 1:9, actual.count = first_digit_counts, actual.fraction = first_digit_counts/nrow(fraudDetection),  : 
  arguments imply differing number of rows: 9, 5

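This happens because in my current dataset the first digit is only ever 1, 2, 7, 8 or 9. How can I get 3, 4, 5 and 6 to show up with a count of 0 instead of being missing from the table entirely?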

Current dataset:

Dataset

This is the part that throws the exception:

first_digit_counts <- as.vector(table(fraudDetection$first.digit))

The full code it belongs to is:

# load the required packages
require(reshape)
require(stringr)
require(plyr)
require(ggplot2)
require(scales)

# load in data from CSV file
fraudDetection <- read.csv("Fraud Case in Arizona 1993.csv")
names(fraudDetection)

# take only the columns containing the counts and manipulate the data into a "long" format with only one value per row
# let's try to compare the amount of the fraudulent transactions against the Benford's Law
fraudDetection <- melt(fraudDetection["Amount"])

# add columns containing the first and last digits, extracted using regular expressions
fraudDetection <- ddply(fraudDetection, .(variable), transform,
                        first.digit = str_extract(value, "[123456789]"),
                        last.digit  = str_extract(value, "[[:digit:]]$"))

# compare counts of each actual first digit against the counts predicted by Benford’s Law
first_digit_counts <- as.vector(table(fraudDetection$first.digit))
first_digit_actual_vs_expected <- data.frame(
  digit            = 1:9,
  actual.count     = first_digit_counts,
  actual.fraction  = first_digit_counts / nrow(fraudDetection),
  benford.fraction = log10(1 + 1 / (1:9))
)
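The `benford.fraction` column in the code above is Benford's expected probability for a leading digit d. Written out, the nine probabilities telescope to exactly 1, which makes a quick sanity check on that column (for example, P(1) is about 0.301 and P(9) about 0.046):

```latex
P(d) = \log_{10}\left(1 + \frac{1}{d}\right) = \log_{10}\frac{d+1}{d}, \qquad d = 1, \dots, 9

\sum_{d=1}^{9} P(d) = \log_{10}\prod_{d=1}^{9}\frac{d+1}{d} = \log_{10}\frac{10}{1} = 1
```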

3 Answers


To make sure every digit is represented in first_digit_counts, you can convert first.digit to a factor and set its levels explicitly so that they include all digits from 1 to 9:

first_digit = c(1, 1, 3, 5, 5, 5, 7, 7, 7, 7, 9)
first_digit_factor = factor(first_digit, levels=1:9) # Explicitly set the levels

This makes your table call behave as expected:

> table(first_digit)
first_digit
1 3 5 7 9 
2 1 3 4 1 
> table(first_digit_factor)
first_digit_factor
1 2 3 4 5 6 7 8 9 
2 0 1 0 3 0 4 0 1 
> as.vector(table(first_digit_factor))
[1] 2 0 1 0 3 0 4 0 1
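Applied to the question's pipeline, this is a one-line change to how `first_digit_counts` is computed. The sketch below assumes a small hypothetical stand-in for `fraudDetection` whose `first.digit` column holds character digits (as `str_extract` returns them), with 3 to 6 absent as in the question; `factor()` matches those strings against the levels 1 to 9, so the data frame now gets its nine rows.

```r
# Hypothetical stand-in for the question's data: first.digit is character,
# as returned by str_extract(), and the digits 3-6 never occur.
fraudDetection <- data.frame(
  first.digit = c("1", "1", "1", "2", "2", "7", "8", "9", "9"),
  stringsAsFactors = FALSE
)

# levels = 1:9 keeps every digit in the table, with a count of 0 for absent ones.
first_digit_counts <- as.vector(table(factor(fraudDetection$first.digit, levels = 1:9)))

first_digit_actual_vs_expected <- data.frame(
  digit            = 1:9,
  actual.count     = first_digit_counts,
  actual.fraction  = first_digit_counts / nrow(fraudDetection),
  benford.fraction = log10(1 + 1 / (1:9))
)
print(first_digit_actual_vs_expected)
```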
answered 2013-07-10T07:04:43.550

The rattle package provides a function for this:

library(rattle)
dummy <- rnorm(100)
calcInitialDigitDistr(dummy, split = "none")
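If you would rather not depend on rattle, here is a base-R sketch of the same idea. It is not how `calcInitialDigitDistr` is implemented; it is an alternative that reads the first significant digit from each value's scientific-notation form (so negative values and values below 1, such as those from `rnorm`, are handled) and counts with `tabulate(..., nbins = 9)`, which, like an explicitly levelled factor, reports 0 for digits that never appear. The helper name `first_digit_distribution` is hypothetical.

```r
# Hypothetical helper, not part of rattle: observed first-digit distribution
# of a numeric vector next to the fractions predicted by Benford's Law.
first_digit_distribution <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])  # drop 0, NA, NaN and Inf; ignore the sign
  # In scientific notation the first character is the first significant digit
  # (0.0042 is formatted as "4.20000000000000e-03"). 15 significant digits is
  # the precision a double reliably round-trips, so 0.3, stored as
  # 0.29999999999999998..., is still read as 3.
  d <- as.integer(substr(sprintf("%.14e", x), 1, 1))
  counts <- tabulate(d, nbins = 9)    # bins 1..9; digits that never occur get 0
  data.frame(
    digit            = 1:9,
    actual.count     = counts,
    actual.fraction  = counts / length(d),
    benford.fraction = log10(1 + 1 / (1:9))
  )
}

set.seed(1)
first_digit_distribution(rnorm(100))
```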
answered 2013-07-10T07:16:28.407

A handy one-line function:

benford = function(x) barplot(table(as.numeric(substr(x,1,1))))

benford(ggplot2::diamonds$price)
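The one-liner takes the first character of each value, so a digit that never occurs is simply missing from the plot (the same issue as in the question), and values below 1 or negative values would contribute a leading "0" or "-" (not a problem for `diamonds$price`, whose prices are positive whole numbers). A sketch that keeps all nine bars and overlays Benford's expected proportions, using base graphics only; `benford_plot` is a hypothetical name:

```r
# Hypothetical variant of the one-liner: bars show the observed proportion of
# each leading digit 1-9 (zero counts included), red points show Benford's Law.
benford_plot <- function(x) {
  x <- abs(x[is.finite(x) & x != 0])                  # a leading digit needs a nonzero value
  d <- as.integer(substr(sprintf("%.14e", x), 1, 1))  # first significant digit
  observed <- table(factor(d, levels = 1:9)) / length(d)
  expected <- log10(1 + 1 / (1:9))
  mids <- barplot(observed, ylim = c(0, 1.1 * max(observed, expected)),
                  xlab = "First digit", ylab = "Proportion")
  points(as.vector(mids), expected, pch = 19, col = "red")
  invisible(observed)                                 # return the proportions quietly
}
```

Called as `benford_plot(ggplot2::diamonds$price)`, it plots the same data as the one-liner, but as proportions so the bars are directly comparable with the Benford points.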


answered 2016-05-06T11:50:25.257