r - Add shapes at the start and end of lines, and at intervals along the lines, defined by a grouping variable

Question

This is my df (almost 100,000 rows and 10 ID values):

               Date.time       P    ID
    1   2013-07-03 12:10:00 1114.3  J9335
    2   2013-07-03 12:20:00 1114.5  K0904
    3   2013-07-03 12:30:00 1114.3  K0904
    4   2013-07-03 12:40:00 1114.1  K1136
    5   2013-07-03 12:50:00 1114.1  K1148
............

Using ggplot I created this plot:

ggplot(df) + geom_line(aes(Date.time, P, group=ID, colour=ID))


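There is nothing wrong with this plot. However, I now also have to print it in black and white, so separating the lines by colour is not a sensible choice. I tried mapping ID to linetype instead, but the result was not very convincing. So my idea is to add a different symbol at the start and end of each line, so that the different IDs can also be identified on a b/w printout.
I added the following lines: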

geom_point(data=df, aes(x=min(Date.time), y=P, shape=ID))+
geom_point(data=df, aes(x=max(Date.time), y=P, shape=ID))

But an error occurs. Any suggestions?

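Given that each line consists of roughly 5000 or 10000 values, it is not possible to plot every value as a different character. One solution might be to draw the lines and then plot points with a different symbol for each ID (for example, one character every 500 values). Is it possible to do this?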

score 3 · Accepted Answer

How about adding geom_points using a subset of the data that contains only the min and max time values?

# some data
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = rep(seq(Sys.time(), by = "hour", length.out = 4), 2),
  P = sample(1:10, 8))
df

# create a subset with min and max time values
# if min(x) and max(x) is the same for each ID:
df_minmax <- subset(x= df, subset = Date.time == min(Date.time) | Date.time == max(Date.time))

# if min(x) and max(x) may differ between ID,
# calculate min and max values *per* ID
# Here I use ddply, but several other aggregating functions in base R will do as well.
library(plyr)
df_minmax <- ddply(.data = df, .variables = .(ID), subset,
             Date.time == min(Date.time) | Date.time == max(Date.time))


gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
  geom_line(aes(group = ID, colour = ID)) +
  geom_point(data = df_minmax, aes(shape = ID))

gg
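
The comment in the code above notes that base R aggregating functions would work as well as ddply. As a minimal sketch of one such alternative, assuming ave() is an acceptable substitute and using a small made-up data set with fixed timestamps (not the asker's data), the per-ID first and last rows could be selected like this:

```r
# Hypothetical example data: two IDs whose time ranges differ
df <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  Date.time = as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
    3600 * c(0:3, 1:4),
  P = c(3, 7, 1, 9, 4, 8, 2, 6))

# ave() returns, for every row, the min (or max) time within that row's ID
t_num <- as.numeric(df$Date.time)
is_first <- t_num == ave(t_num, df$ID, FUN = min)
is_last  <- t_num == ave(t_num, df$ID, FUN = max)

# keep only the first and last observation of each ID
df_minmax <- df[is_first | is_last, ]
df_minmax
```

The resulting df_minmax can be passed to geom_point(data = df_minmax, aes(shape = ID)) in the same way as the ddply version.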

If you want some control over your shapes, have a look at ?scale_shape_discrete, which includes examples.
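
For black-and-white printing it may help to choose the symbols explicitly. The following is only a sketch: it uses scale_shape_manual() rather than scale_shape_discrete(), and the data set and the shape codes 16 (filled circle) and 17 (filled triangle) are illustrative choices, not something from the question.

```r
library(ggplot2)

# Hypothetical toy data with two IDs
df <- data.frame(
  ID = rep(c("a", "b"), each = 2),
  x  = c(1, 2, 1, 2),
  P  = c(1, 3, 2, 4))

# Black lines, with one hand-picked point symbol per ID
gg <- ggplot(df, aes(x = x, y = P)) +
  geom_line(aes(group = ID)) +
  geom_point(aes(shape = ID), size = 3) +
  scale_shape_manual(values = c(a = 16, b = 17))

gg
```

With ten IDs, values would need ten shape codes; ?points lists the available ones.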

Edit following the updated question
For each ID, add a shape to the line at some interval.

# create a slightly larger data set
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(seq(Sys.time(), by = "day", length.out = 100), 2),
  P = c(sample(1:10, 100, replace = TRUE), sample(11:20, 100, replace = TRUE)))


# for each ID:
# create a time sequence from min(time) to max(time), by some time step
# e.g. a week
df_gap <- ddply(.data = df, .variables = .(ID), summarize,
             Date.time =
                  seq(from = min(Date.time), to = max(Date.time), by = "week"))

# add P from df to df_gap
df_gap <- merge(x = df_gap, y = df)


gg <- ggplot(data = df, aes(x = Date.time, y = P)) +
    geom_line(aes(group = ID, colour = ID)) +
    geom_point(data = df_gap, aes(shape = ID)) +
    # if your gaps are not a multiple of the length of the data
    # you may wish to add the max points as well
    # (recompute df_minmax from this larger df first)
    geom_point(data = df_minmax, aes(shape = ID))

gg
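
The question also suggests marking one value out of every N rather than one per time step. A minimal base R sketch of that row-based variant follows. It assumes the rows are already sorted by time within each ID; every_n and the deterministic example data are hypothetical.

```r
# Hypothetical data: 100 daily observations for each of two IDs
df <- data.frame(
  ID = rep(c("a", "b"), each = 100),
  Date.time = rep(as.POSIXct("2013-07-03", tz = "UTC") + 86400 * (0:99), 2),
  P = c(1:100, 101:200))

every_n <- 25  # keep one row out of every 25 within each ID

# running row number within each ID: 1, 2, ..., 100 for "a", then again for "b"
row_in_id <- ave(seq_len(nrow(df)), df$ID, FUN = seq_along)

# rows 1, 26, 51, 76 of each ID
df_gap <- df[(row_in_id - 1) %% every_n == 0, ]
df_gap
```

As with the time-based version, df_gap would then be supplied to geom_point(data = df_gap, aes(shape = ID)). For the asker's data, every_n = 500 would match the spacing mentioned in the question.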

score 1 · Accepted Answer

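The error stems from the fact that the single value min(Date.time) does not match the vectors P or ID in length. Another issue may be that you are re-declaring the data argument even though you already have ggplot(df).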

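The solution that immediately comes to mind is to find the row indices of the minimum and maximum dates. If all IDs share the same minimum and maximum timestamps, this is easy. Use the which() function to get the vector of row numbers you need.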

min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))

Then use these vectors as indices.

geom_point(data = df[min.index, ], aes(x = Date.time, y = P, shape = ID)) +
geom_point(data = df[max.index, ], aes(x = Date.time, y = P, shape = ID))
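
To illustrate what which() returns here, a small sketch on made-up data in which both IDs share the same first and last timestamp (the case this answer assumes):

```r
# Hypothetical data: both IDs cover the same three time points
df <- data.frame(
  ID = rep(c("a", "b"), each = 3),
  Date.time = rep(as.POSIXct("2013-07-03 12:00:00", tz = "UTC") +
                    600 * (0:2), 2),
  P = c(5, 6, 7, 8, 9, 10))

# row numbers of the earliest and latest timestamps across the whole df
min.index <- which(df$Date.time == min(df$Date.time))
max.index <- which(df$Date.time == max(df$Date.time))

# one start row and one end row per ID
df_ends <- df[c(min.index, max.index), ]
df_ends
```

If the IDs start or end at different times, the global min() and max() will miss some lines, and a per-ID calculation would be needed instead.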
