I have a data frame like this:

dput(xx)

structure(list(TimeStamp = structure(c(15705, 15706), class = "Date"), 
    Host = c("Host1", "Host2"), OS = structure(c(1L, 1L), .Label = "solaris", class = "factor"), 
    ID = structure(c(1L, 1L), .Label = "1234", class = "factor"), 
    Class = structure(c(1L, 1L), .Label = "Processor", class = "factor"), 
    Stat = structure(c(1L, 1L), .Label = "CPU", class = "factor"), 
    Instance = structure(c(1L, 1L), .Label = c("_Total", "CPU0", 
    "CPU1", "CPU10", "CPU11", "CPU12", "CPU13", "CPU14", "CPU15", 
    "CPU16", "CPU17", "CPU18", "CPU19", "CPU2", "CPU20", "CPU21", 
    "CPU22", "CPU23", "CPU3", "CPU4", "CPU5", "CPU6", "CPU7", 
    "CPU8", "CPU9"), class = "factor"), Average = c(4.39009345794392, 
    5.3152972972973), Min = c(3.35, -0.01), Max = c(5.15, 72.31
    )), .Names = c("TimeStamp", "Host", "OS", "ID", "Class", 
"Stat", "Instance", "Average", "Min", "Max"), row.names = c(NA, 
-2L), class = "data.frame")

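This data frame is large and contains many hosts. The challenge I am facing is that when a host, like the ones above, does not have enough data points, the ggplot call below fails, basically complaining that there are not enough data points to draw the plot.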

ggplot(xx, aes(TimeStamp, Max, group=Host, colour=Host)) + geom_point() + geom_smooth(method="loess")

How can I check whether a specific host in this data frame has more than 10 data points and, if so, use method="loess"? If the host has fewer than 10 data points, use method="lm".

3 Answers

Yes, it was hard to find, but it seems to be possible:

# for reproducibility
set.seed(42)
# The idea is to first split the data to < 10 and >= 10 points
# I use data.table for that
require(data.table)
dt <- data.frame(Host = rep(paste("Host", 1:10, sep=""), sample(1:20, 10)), 
         stringsAsFactors = FALSE)
dt <- transform(dt, x=sample(1:nrow(dt)), y = 15*(1:nrow(dt)))
dt <- data.table(dt, key="Host")
dt1 <- dt[, .SD[.N >= 10], by = Host]
dt2 <- dt[, .SD[.N < 10], by = Host]

# on to plotting now    
require(ggplot2)
# Now, dt1 has all Hosts with >= 10 observations and dt2 the other way round
# plot now for dt1
p <- ggplot(data = dt1, aes(x = x, y = y, group = Host)) + geom_line() + 
         geom_smooth(method = "loess", se = TRUE)
# plot geom_line for dt2 by telling the data and aes
# The TRICKY part: add geom_smooth by telling data=dt2
p <- p + geom_line(data = dt2, aes(x = x, y = y, group = Host)) + 
            geom_smooth(data = dt2, method = "lm", se = TRUE)

p

(This is an ugly example, but it gives you the idea.)

[Image: resulting ggplot2 plot]
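
If you would rather not depend on data.table, the same split can be sketched in base R. This is only a minimal sketch under assumptions: the `toy` data frame, the helper column `n`, and the names `big` and `small` are made up for illustration, and the threshold of 10 follows the code above.

```r
# Hypothetical toy data: three hosts with 12, 4 and 15 observations
set.seed(1)
toy <- data.frame(Host = rep(c("Host1", "Host2", "Host3"), times = c(12, 4, 15)),
                  stringsAsFactors = FALSE)
toy$x <- seq_len(nrow(toy))
toy$y <- rnorm(nrow(toy))

# Attach the per-host row count to every row of that host
toy$n <- ave(seq_len(nrow(toy)), toy$Host, FUN = length)

# Split on the threshold used in the answer above
big   <- toy[toy$n >= 10, ]  # hosts to smooth with method = "loess"
small <- toy[toy$n <  10, ]  # hosts to smooth with method = "lm"

unique(big$Host)
unique(small$Host)
```

`big` and `small` can then play the roles of `dt1` and `dt2` in the plotting code.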

answered 2013-01-16T15:26:49.810

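Adding to Arun's excellent answer, I think you just need to distinguish the two visually, for example with a solid line for loess and a dashed line for lm: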

p <- ggplot(data=dt1, aes(x = x, y = y, group = Host)) + geom_line() + 
         geom_smooth(method='loess', linetype='solid', se=T)

p <- p + geom_line(data = dt2, aes(x=x, y=y, group = Host)) + 
            geom_smooth(data = dt2, method='lm', linetype='dashed', se=T)
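
As a possible variation, both smoothers can be driven from one data frame by passing a function as the `data` argument of each `geom_smooth()` layer; ggplot2 calls that function with the plot's data. The sketch below is an illustration under assumptions: the `toy` data and the helper column `n` are invented here, and the threshold of 10 is taken from the answers above.

```r
library(ggplot2)

# Hypothetical toy data: Host1 has 15 points, Host2 has 5
set.seed(1)
toy <- data.frame(Host = rep(c("Host1", "Host2"), times = c(15, 5)))
toy$x <- seq_len(nrow(toy))
toy$y <- rnorm(nrow(toy))

# Per-host row count, repeated on every row of that host
toy$n <- ave(seq_len(nrow(toy)), toy$Host, FUN = length)

p <- ggplot(toy, aes(x, y, group = Host, colour = Host)) +
  geom_point() +
  # hosts with at least 10 points: loess, solid line
  geom_smooth(data = function(d) d[d$n >= 10, ], method = "loess",
              formula = y ~ x, linetype = "solid", se = TRUE) +
  # hosts with fewer than 10 points: lm, dashed line
  geom_smooth(data = function(d) d[d$n < 10, ], method = "lm",
              formula = y ~ x, linetype = "dashed", se = TRUE)
p
```

This keeps the points in a single layer, so all hosts share one colour legend.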
answered 2014-03-28T17:25:51.497

The warning messages can be prevented by duplicating data points and setting the span argument of the geom_smooth function. For example:

data <- rbind(dt1, dt2)
# plot the combined data, not just dt1
p <- ggplot(data=data, aes(x = x, y = y, group = Host)) + geom_line() + 
         geom_smooth(method='loess', span = 1.4, se=T)

If the warnings persist, you can try different values for the span argument.

answered 2021-05-05T10:41:14.397