
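I am trying to compute the mean duration (a continuous variable) of UFO sightings for each categorical shape. Essentially: what is the average sighting length for each UFO shape?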

I tried:

    a <- aggregate(duration..seconds. ~ shape, data=alien, FUN=mean, na.rm=TRUE)
    barplot(a$duration..seconds., names.arg=a$shape)

and got:

    no non-missing arguments to min; returning Inf
    no non-missing arguments to max; returning -Inf
    Error in plot.window(xlim, ylim, log = log, ...) : need finite 'ylim' values

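I realize I need to change my data somehow. I would like to simply drop every row that is missing the corresponding value (i.e., we know the shape but not the duration, or vice versa), but I am not sure how to do that.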

Thanks for your help!

P.S. The name `duration..seconds.` is correct; that is how it came across from the Excel file.

    shape       duration..seconds.
    us  changing    3600    NA  4/27/2004   29.8830556  
    us  changing    300     NA  12/16/2005  29.38421    
    us  changing    3600    NA  1/21/2008   53.2    
    us  changing    900     NA  1/17/2004   28.9783333  
    ca  changing    1200    NA  1/22/2004   21.4180556  
    us  changing    3600    NA  4/27/2007   36.595  

There are 80,000 UFO sighting records, which is why I am trying to average them, and there are 29 different shapes.


1 Answer


Data

    df <- read.table(text="
    country shape  duration_seconds dummy1 date dummy2
    us  changing    3600    NA  4/27/2004   29.8830556
    us  changing    300     NA  12/16/2005  29.38421
    us  changing    3600    NA  1/21/2008   53.2
    us  changing    900     NA  1/17/2004   28.9783333
    ca  changing    1200    NA  1/22/2004   21.4180556
    us  changing    3600    NA  4/27/2007   36.595
    ", header = TRUE, stringsAsFactors = FALSE)

You can fix the column headers with:

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
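The dotted name in the question is probably the result of R sanitizing the spreadsheet header on import. The sketch below assumes the original header was `duration (seconds)` (the post does not show it) and uses a hypothetical one-row data frame, `alien_demo`, to show how a single column can be renamed without retyping every name.

```r
# Assumed original spreadsheet header (not shown in the post).
excel_header <- "duration (seconds)"

# make.names() is the sanitizer that read.csv()/read.table() apply
# when check.names = TRUE (the default).
sanitized <- make.names(excel_header)
print(sanitized)

# Hypothetical stand-in for the real data, using the sanitized name.
alien_demo <- data.frame(shape = "changing", duration..seconds. = 3600)

# Rename only the awkward column, leaving the others untouched.
names(alien_demo)[names(alien_demo) == "duration..seconds."] <- "duration_seconds"
print(names(alien_demo))
```

Renaming by matching the old name avoids depending on column order, which may differ in the full file.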

Using the dplyr library:

    library(dplyr)
    df %>%
      group_by(shape) %>%
      # na.rm = TRUE skips missing durations instead of returning NA
      summarize(mean_duration_seconds = mean(duration_seconds, na.rm = TRUE))

    #   shape    mean_duration_seconds
    #   <chr>                    <dbl>
    # 1 changing                 2200.

And using the original code:

    names(df) <- c("country", "shape", "duration_seconds", "dummy1", "date", "dummy2")
    a <- aggregate(duration_seconds ~ shape, data=df, FUN=mean, na.rm=TRUE)
    barplot(a$duration_seconds, names.arg=a$shape)

    a
    #   shape    duration_seconds
    # 1 changing             2200
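The sample above works because its duration column is numeric. One possible cause of the `need finite 'ylim' values` error on the full data (an assumption, since the full file is not shown) is that the duration column was read as text or a factor, so every group mean comes out as `NA`. The sketch below uses a hypothetical data frame, `alien_demo`, with a malformed entry and some missing values. It coerces the column to numeric, drops rows missing either shape or duration, and then aggregates.

```r
# Hypothetical stand-in for the real `alien` data: duration arrives as text,
# with one malformed entry ("2`") and some missing values.
alien_demo <- data.frame(
  shape = c("changing", "changing", "circle", "circle", NA, "disk"),
  duration..seconds. = c("3600", "300", "120", "2`", "60", NA),
  stringsAsFactors = FALSE
)

# Check how the column was read; mean() needs a numeric vector.
str(alien_demo$duration..seconds.)

# Coerce to numeric; entries that cannot be parsed become NA.
# as.character() first also covers the case where the column is a factor.
alien_demo$duration..seconds. <- suppressWarnings(
  as.numeric(as.character(alien_demo$duration..seconds.))
)

# Keep only rows where both shape and duration are present.
keep <- complete.cases(alien_demo[, c("shape", "duration..seconds.")])
clean <- alien_demo[keep, ]

# Mean duration per shape, then plot it.
a <- aggregate(duration..seconds. ~ shape, data = clean, FUN = mean)
barplot(a$duration..seconds., names.arg = a$shape)
```

Running `str()` on the real column first shows whether the coercion step is needed at all; if the column is already numeric, the `complete.cases()` filter alone drops the incomplete rows.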
answered 2018-04-01T22:08:29.530