
I recently ran into a problem using ggparcoord() in R. I want to add some labels to the lines of a parallel coordinates plot, but I can't seem to manage it.

Here is an MWE:

library(GGally)  # provides ggparcoord() and attaches ggplot2

A <- rnorm(200, 60, 200)
B <- rnorm(200, 40, 126)
C <- rnorm(200, 200, 800)
D <- c( rep("C1", 50), rep("C2", 50), rep("C3", 50), rep("C4", 50) )

df <- data.frame(A, B, C, D)

ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
  geom_line(size = 0.25) + geom_text(label = "x", hjust = -0.5) +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))

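So this sort of works, and adds an "x" on each of the 3 axes. The problem arises when I want to supply a character vector of the appropriate length instead of "x". For example: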

my_labs <- sample(LETTERS, nrow(df), replace = T)

ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
  geom_line(size = 0.25) + geom_text(label = rep(my_labs, 3), hjust = -0.5 ) +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))

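Here I repeat the my_labs vector 3 times, to match the length that ggparcoord()'s 3 axes should (in theory) need. Surprisingly, though, this still fails: Error: Aesthetics must be either length 1 or the same as the data (4): label, hjust. I don't really understand what this even means, with data (4) in there. Help appreciated!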

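PS. In my real data I actually plan to label only a relevant subset of the rows; the others will get "" in the character vector. So I'm not too worried about the plot becoming overcrowded. Thanks!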


2 Answers


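Following on from the comments above, I'm not sure how much control you'll have over the labels. Another option, though more involved, is to move away from ggparcoord and just use ggplot. If you do that, you can label any point you like. The downside is that it's more work, and you have to do the rescaling yourself.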
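Doing the rescaling by hand only takes a line or two of base R. The sketch below shows two common choices on simulated data shaped like the question's:

- standardizing each column, which is what scale() does and what the code below uses;
- min-max scaling each column onto [0, 1].

For comparison, ggparcoord() offers similar options through its scale argument. Its default "std" standardizes each variable, and "uniminmax" maps each one onto [0, 1]. The helper names standardize() and min_max() are hypothetical, introduced only for this illustration.

```r
# Sketch: two ways to rescale numeric columns before drawing parallel coordinates.
# standardize() and min_max() are illustrative helpers, not part of any package.
set.seed(1)
df <- data.frame(A = rnorm(200, 60, 200),
                 B = rnorm(200, 40, 126),
                 C = rnorm(200, 200, 800))

# Standardize: subtract the column mean, then divide by the column sd
standardize <- function(x) (x - mean(x)) / sd(x)

# Min-max: map each column onto [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

df_std <- as.data.frame(lapply(df, standardize))
df_mm  <- as.data.frame(lapply(df, min_max))

# The hand-rolled standardization matches scale() column by column
isTRUE(all.equal(as.matrix(df_std), scale(df), check.attributes = FALSE))  # TRUE
```

Either df_std or df_mm (with the group column added back) can then be reshaped and plotted in the same way as dfScaled below.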

A <- rnorm(200, 60, 200)
B <- rnorm(200, 40, 126)
C <- rnorm(200, 200, 800)
D <- c( rep("C1", 50), rep("C2", 50), rep("C3", 50), rep("C4", 50) )

df <- data.frame(A, B, C, D)

# Re-scaling the numeric columns, and adding column D to a new data frame
# Use a different type of scaling if needed
dfScaled <- data.frame(scale(df[,1:3]), D)

# Check that we get mean of 0 and sd of 1
colMeans(dfScaled[,1:3])
apply(dfScaled[,1:3], 2, sd)

library(reshape2)  # melt()
library(ggplot2)   # ggplot(), geom_line(), geom_text(), ...
# Turn the data into long format
# Add a "row" variable that will help keep track of what row the data came from
# Use df or dfScaled
df2 <- melt(data.frame(dfScaled, row = 1:nrow(dfScaled)),
            id.vars = c("D", "row"),
            measure.vars = c("A", "B", "C" ),
            variable.name = "OrgCol",
            value.name = "Value"
)

# Reordering may help see the original structure better
# the first 3 rows were your original first row
odf2 <- df2[order(df2$row, df2$OrgCol), ]

# Add whatever labels you want, making them all blank here
odf2$my_labs <- ""

# Here only labeling the end (far right point) of the first line
# (first line is from row 1 of the original df)
odf2$my_labs[3] <- "A"

# See the structure
head(odf2)

# Create the plot with lines connected by row, colored by D
# I colored the one labeled point green just to make it stand out
ggplot(odf2, aes(x = OrgCol, y = Value, group = row, color = D)) + geom_line() +
  geom_text(aes(label = my_labs), colour = "green") +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))
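If you'd rather not add a reshape2 dependency, the same long format can be built in base R. This is a minimal sketch, assuming the dfScaled layout above: numeric columns A, B and C plus the group column D. The name long_df is hypothetical, and the data is re-simulated so the block runs on its own. Each measured column contributes one block of n rows, so unlist() over those columns lines up with rep(..., each = n) for the column names.

```r
# Base-R sketch of the long format that reshape2::melt() produces above.
# long_df is an illustrative name; the data is simulated like the question's.
set.seed(1)
dfScaled <- data.frame(scale(data.frame(A = rnorm(200, 60, 200),
                                        B = rnorm(200, 40, 126),
                                        C = rnorm(200, 200, 800))),
                       D = rep(c("C1", "C2", "C3", "C4"), each = 50))
measure <- c("A", "B", "C")
n <- nrow(dfScaled)

# One block of n rows per measured column, stacked column after column
long_df <- data.frame(
  D      = rep(dfScaled$D, times = length(measure)),
  row    = rep(seq_len(n), times = length(measure)),
  OrgCol = factor(rep(measure, each = n), levels = measure),
  Value  = unlist(dfScaled[measure], use.names = FALSE)
)

# Same reordering as odf2: the three points of each original row sit together
long_df <- long_df[order(long_df$row, long_df$OrgCol), ]

# Blank labels everywhere except the far-right point of the first line
long_df$my_labs <- ""
long_df$my_labs[3] <- "A"
head(long_df)
```

long_df has the same columns as odf2 (D, row, OrgCol, Value, my_labs), so it should drop into the ggplot() call above unchanged.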


answered 2017-04-11T16:23:57.120

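Thanks to @aosmith's help (much appreciated!), I found the answer to my specific question. Having the labels outside the data frame, rather than stored as another column, was not the direct problem. The key issue was that I wasn't wrapping the labels in aes() inside geom_text().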

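So I'll keep my labels outside the actual data, because I want to tune the 600-length vector by hand. A bit hacky, I know, but it works. The reason is that if I put the 200 labels in the data frame, they get repeated on all 3 ggparcoord() axes, which I don't want. I want them on only one side of the plot (one axis), with the remaining positions up to 600 turned into empty fillers (""). So the workaround I found is the following, this time using aes() inside geom_text():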

library(GGally)  # ggparcoord(); also attaches ggplot2

# Given the same data as above:

# Creating a label vector:
my_labs <- sample(LETTERS, nrow(df), replace = T)

# Add some gaps to avoid overcrowding:
# keep only one label in 10, enough to illustrate what the 4 groups are about
to_keep <- seq( 1, length( my_labs ), by = 10 )
to_remove <- setdiff( 1 : length( my_labs ), to_keep )
my_labs[ to_remove ] <- ""

# Pad the vector with "" fillers to a length of 600 (200 labels x 3 axes):
my_labs <- c( my_labs, rep( "", 2 * length( my_labs ) ) )


ggparcoord(df, columns = c(1, 2, 3), groupColumn = 4) + 
  geom_line(size = 0.25) + geom_text( aes(label = my_labs), hjust = 1.5 ) +
  ggtitle("Var relationships across clusters") + 
  xlab("My dimensions") + ylab("Scaled values") +
  scale_colour_manual(values = c("C1" = "#2166ac", 
                                 "C2" = "#67a9cf", 
                                 "C3" = "#ef8a62",
                                 "C4" = "#b2182b"))
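The padding trick can be wrapped in a small function so the labels land on whichever axis you choose. Below is a sketch with a hypothetical helper, pad_labels(). Like the 600-length vector above, it assumes the data inside the ggparcoord() plot is stacked one axis at a time: all 200 rows for the first column, then the second, then the third. That ordering is an assumption worth checking with head(p$data) on the plot object before relying on it. If the rows turn out to be interleaved instead, the indexing in the helper would need to change.

```r
# Hypothetical helper: put one label per original row on a chosen axis of a
# ggparcoord() plot and fill every other position with "".
# ASSUMPTION: the plot's long data is stacked axis by axis (all rows for
# axis 1, then axis 2, ...), the same ordering the 600-length vector relies on.
pad_labels <- function(labels, n_axes, axis = 1) {
  stopifnot(axis >= 1, axis <= n_axes)
  n <- length(labels)
  out <- rep("", n * n_axes)
  out[(axis - 1) * n + seq_len(n)] <- labels
  out
}

set.seed(1)
my_labs <- sample(LETTERS, 200, replace = TRUE)
my_labs[-seq(1, 200, by = 10)] <- ""   # keep one label in ten

first_axis <- pad_labels(my_labs, n_axes = 3, axis = 1)  # same layout as above
last_axis  <- pad_labels(my_labs, n_axes = 3, axis = 3)  # labels on the third axis
```

Passing last_axis via geom_text(aes(label = last_axis), hjust = -0.5) should then place the labels to the right of the third axis. A negative hjust nudges text rightward, whereas the hjust = 1.5 above pushes it to the left of the first axis.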
answered 2017-04-11T16:22:35.393