
I have a .csv table named mailing.csv. It has three columns: receiver, subject, and sender.

       Receiver                                subject       sender
1   Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
2   Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
3   Adrian Cole    RE: [WHIRR-117] Composable services  Adrian Cole
4   Adrian Cole    RE: [WHIRR-117] Composable services  Adrian Cole
5   Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
6   Adrian Cole    RE: [WHIRR-117] Composable services  Adrian Cole
7   Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
8   Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
9   Adrian Cole    RE: [WHIRR-117] Composable services  Adrian Cole
10  Adrian Cole    RE: [WHIRR-117] Composable services  Adrian Cole
11  Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
12  Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
13  Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
14  Adrian Cole    RE: [WHIRR-117] Composable services    Tom White
15 Patrick Hunt RE: [WHIRR-123] Cassandra integration     Tom White
16 Patrick Hunt RE: [WHIRR-123] Cassandra integration   Andrei Savu
17 Patrick Hunt RE: [WHIRR-123] Cassandra integration   Andrei Savu
18 Patrick Hunt RE: [WHIRR-123] Cassandra integration     Tom White
19 Patrick Hunt RE: [WHIRR-123] Cassandra integration     Tom White
20 Patrick Hunt RE: [WHIRR-123] Cassandra integration   Adrian Cole
21 Patrick Hunt RE: [WHIRR-123] Cassandra integration     Tom White
22 Patrick Hunt RE: [WHIRR-123] Cassandra integration  Patrick Hunt

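What I want to do is update/map the information from the table above into a .csv template (named AC_template.csv), then save each result in a separate file, using the subject detail inside the square brackets as the file name (e.g. AC_WHIRR-117). For the table above, this should create two new files, named AC_WHIRR-117 and AC_WHIRR-123.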
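To make the naming rule concrete, here is a minimal sketch of how a subject line could be turned into a file name. It assumes the ticket ID is the text between the square brackets of the subject; the helper name subject_to_filename and the .csv extension are illustrative assumptions, not part of the original data.

```r
# Hypothetical helper (an assumption for illustration): build the output
# file name from the text inside the square brackets of a subject line.
subject_to_filename <- function(subject, prefix = "AC_") {
  # sub() replaces the whole string with the captured bracket contents;
  # a subject without brackets would be returned unchanged
  id <- sub("^.*\\[([^]]*)\\].*$", "\\1", as.character(subject))
  paste0(prefix, id, ".csv")
}

subjects <- c("RE: [WHIRR-117] Composable services",
              "RE: [WHIRR-117] Composable services",
              "RE: [WHIRR-123] Cassandra integration ")

# One file name per distinct ticket ID
file_names <- unique(subject_to_filename(subjects))
print(file_names)
```

With the two subjects in the table above, this yields one name per ticket, which is the grouping the rest of the task relies on.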

A sample .csv template (AC_template.csv) is as follows:

                Adrian.Cole Patrick.Hunt Andrei.Savu Bruno.Dumon Edward.J..Yoon Eugene.Koontz Jakob.Homan Kelvin.Kakugawa Tom.White
Adrian Cole               0            0           0           0              0             0           0               0         0
Patrick Hunt              0            0           0           0              0             0           0               0         0
Andrei Savu               0            0           0           0              0             0           0               0         0
Bruno Dumon               0            0           0           0              0             0           0               0         0
Edward J. Yoon            0            0           0           0              0             0           0               0         0
Eugene Koontz             0            0           0           0              0             0           0               0         0
Jakob Homan               0            0           0           0              0             0           0               0         0
Kelvin Kakugawa           0            0           0           0              0             0           0               0         0
Tom White                 0            0           0           0              0             0           0               0         0
Lars George               0            0           0           0              0             0           0               0         0
Soren Macbeth             0            0           0           0              0             0           0               0         0
                Lars.George Soren.Macbeth
Adrian Cole               0             0
Patrick Hunt              0             0
Andrei Savu               0             0
Bruno Dumon               0             0
Edward J. Yoon            0             0
Eugene Koontz             0             0
Jakob Homan               0             0
Kelvin Kakugawa           0             0
Tom White                 0             0
Lars George               0             0
Soren Macbeth             0             0

The sample output for this problem is as follows:

Sample output for AC_WHIRR-117:

                Adrian.Cole Patrick.Hunt Andrei.Savu Bruno.Dumon Edward.J..Yoon Eugene.Koontz Jakob.Homan Kelvin.Kakugawa Tom.White
Adrian Cole               0            0           0           0              0             0           0               0         9
Patrick Hunt              0            0           0           0              0             0           0               0         0
Andrei Savu               0            0           0           0              0             0           0               0         0
Bruno Dumon               0            0           0           0              0             0           0               0         0
Edward J. Yoon            0            0           0           0              0             0           0               0         0
Eugene Koontz             0            0           0           0              0             0           0               0         0
Jakob Homan               0            0           0           0              0             0           0               0         0
Kelvin Kakugawa           0            0           0           0              0             0           0               0         0
Tom White                 9            0           0           0              0             0           0               0         0
Lars George               0            0           0           0              0             0           0               0         0
Soren Macbeth             0            0           0           0              0             0           0               0         0
                Lars.George Soren.Macbeth
Adrian Cole               0             0
Patrick Hunt              0             0
Andrei Savu               0             0
Bruno Dumon               0             0
Edward J. Yoon            0             0
Eugene Koontz             0             0
Jakob Homan               0             0
Kelvin Kakugawa           0             0
Tom White                 0             0
Lars George               0             0
Soren Macbeth             0             0

Sample output for AC_WHIRR-123:

                Adrian.Cole Patrick.Hunt Andrei.Savu Bruno.Dumon Edward.J..Yoon Eugene.Koontz Jakob.Homan Kelvin.Kakugawa Tom.White
Adrian Cole               0            1           0           0              0             0           0               0         0
Patrick Hunt              1            0           2           0              0             0           0               0         4
Andrei Savu               0            2           0           0              0             0           0               0         0
Bruno Dumon               0            0           0           0              0             0           0               0         0
Edward J. Yoon            0            0           0           0              0             0           0               0         0
Eugene Koontz             0            0           0           0              0             0           0               0         0
Jakob Homan               0            0           0           0              0             0           0               0         0
Kelvin Kakugawa           0            0           0           0              0             0           0               0         0
Tom White                 0            4           0           0              0             0           0               0         0
Lars George               0            0           0           0              0             0           0               0         0
Soren Macbeth             0            0           0           0              0             0           0               0         0
                Lars.George Soren.Macbeth
Adrian Cole               0             0
Patrick Hunt              0             0
Andrei Savu               0             0
Bruno Dumon               0             0
Edward J. Yoon            0             0
Eugene Koontz             0             0
Jakob Homan               0             0
Kelvin Kakugawa           0             0
Tom White                 0             0
Lars George               0             0
Soren Macbeth             0             0

The dput() of head() for mailing.csv is as follows:

structure(list(Receiver = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Adrian Cole", 
"Patrick Hunt"), class = "factor"), subject = structure(c(1L, 
1L, 1L, 1L, 1L, 1L), .Label = c("RE: [WHIRR-117] Composable services", 
"RE: [WHIRR-123] Cassandra integration "), class = "factor"), 
    sender = structure(c(4L, 4L, 1L, 1L, 4L, 1L), .Label = c("Adrian Cole", 
    "Andrei Savu", "Patrick Hunt", "Tom White"), class = "factor")), .Names = c("Receiver", 
"subject", "sender"), row.names = c(NA, 6L), class = "data.frame")

The dput() of head() for AC_template.csv is as follows:

structure(list(Adrian.Cole = c(0L, 0L, 0L, 0L, 0L, 0L), Patrick.Hunt = c(0L, 
0L, 0L, 0L, 0L, 0L), Andrei.Savu = c(0L, 0L, 0L, 0L, 0L, 0L), 
    Bruno.Dumon = c(0L, 0L, 0L, 0L, 0L, 0L), Edward.J..Yoon = c(0L, 
    0L, 0L, 0L, 0L, 0L), Eugene.Koontz = c(0L, 0L, 0L, 0L, 0L, 
    0L), Jakob.Homan = c(0L, 0L, 0L, 0L, 0L, 0L), Kelvin.Kakugawa = c(0L, 
    0L, 0L, 0L, 0L, 0L), Tom.White = c(0L, 0L, 0L, 0L, 0L, 0L
    ), Lars.George = c(0L, 0L, 0L, 0L, 0L, 0L), Soren.Macbeth = c(0L, 
    0L, 0L, 0L, 0L, 0L)), .Names = c("Adrian.Cole", "Patrick.Hunt", 
"Andrei.Savu", "Bruno.Dumon", "Edward.J..Yoon", "Eugene.Koontz", 
"Jakob.Homan", "Kelvin.Kakugawa", "Tom.White", "Lars.George", 
"Soren.Macbeth"), row.names = c("Adrian Cole", "Patrick Hunt", 
"Andrei Savu", "Bruno Dumon", "Edward J. Yoon", "Eugene Koontz"
), class = "data.frame")

The sample output for WHIRR-117 is as follows:

structure(list(Adrian.Cole = c(0L, 0L, 0L, 0L, 0L, 0L), Patrick.Hunt = c(0L, 
0L, 0L, 0L, 0L, 0L), Andrei.Savu = c(0L, 0L, 0L, 0L, 0L, 0L), 
    Bruno.Dumon = c(0L, 0L, 0L, 0L, 0L, 0L), Edward.J..Yoon = c(0L, 
    0L, 0L, 0L, 0L, 0L), Eugene.Koontz = c(0L, 0L, 0L, 0L, 0L, 
    0L), Jakob.Homan = c(0L, 0L, 0L, 0L, 0L, 0L), Kelvin.Kakugawa = c(0L, 
    0L, 0L, 0L, 0L, 0L), Tom.White = c(9L, 0L, 0L, 0L, 0L, 0L
    ), Lars.George = c(0L, 0L, 0L, 0L, 0L, 0L), Soren.Macbeth = c(0L, 
    0L, 0L, 0L, 0L, 0L)), .Names = c("Adrian.Cole", "Patrick.Hunt", 
"Andrei.Savu", "Bruno.Dumon", "Edward.J..Yoon", "Eugene.Koontz", 
"Jakob.Homan", "Kelvin.Kakugawa", "Tom.White", "Lars.George", 
"Soren.Macbeth"), row.names = c("Adrian Cole", "Patrick Hunt", 
"Andrei Savu", "Bruno Dumon", "Edward J. Yoon", "Eugene Koontz"
), class = "data.frame")

The sample output for WHIRR-123 is as follows:

structure(list(Adrian.Cole = c(0L, 1L, 0L, 0L, 0L, 0L), Patrick.Hunt = c(1L, 
0L, 2L, 0L, 0L, 0L), Andrei.Savu = c(0L, 2L, 0L, 0L, 0L, 0L), 
    Bruno.Dumon = c(0L, 0L, 0L, 0L, 0L, 0L), Edward.J..Yoon = c(0L, 
    0L, 0L, 0L, 0L, 0L), Eugene.Koontz = c(0L, 0L, 0L, 0L, 0L, 
    0L), Jakob.Homan = c(0L, 0L, 0L, 0L, 0L, 0L), Kelvin.Kakugawa = c(0L, 
    0L, 0L, 0L, 0L, 0L), Tom.White = c(0L, 4L, 0L, 0L, 0L, 0L
    ), Lars.George = c(0L, 0L, 0L, 0L, 0L, 0L), Soren.Macbeth = c(0L, 
    0L, 0L, 0L, 0L, 0L)), .Names = c("Adrian.Cole", "Patrick.Hunt", 
"Andrei.Savu", "Bruno.Dumon", "Edward.J..Yoon", "Eugene.Koontz", 
"Jakob.Homan", "Kelvin.Kakugawa", "Tom.White", "Lars.George", 
"Soren.Macbeth"), row.names = c("Adrian Cole", "Patrick Hunt", 
"Andrei Savu", "Bruno Dumon", "Edward J. Yoon", "Eugene Koontz"
), class = "data.frame")

Thanks in advance for any expert help...


1 Answer


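Here is something using the plyr package together with the base function table. Some assembly may be required, but this should get you most of the way there.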

#load template
template <- structure(list(Adrian.Cole = c(0L, 0L, 0L, 0L, 0L, 0L), Patrick.Hunt = c(0L, 
0L, 0L, 0L, 0L, 0L), Andrei.Savu = c(0L, 0L, 0L, 0L, 0L, 0L), 
    Bruno.Dumon = c(0L, 0L, 0L, 0L, 0L, 0L), Edward.J..Yoon = c(0L, 
    0L, 0L, 0L, 0L, 0L), Eugene.Koontz = c(0L, 0L, 0L, 0L, 0L, 
    0L), Jakob.Homan = c(0L, 0L, 0L, 0L, 0L, 0L), Kelvin.Kakugawa = c(0L, 
    0L, 0L, 0L, 0L, 0L), Tom.White = c(0L, 0L, 0L, 0L, 0L, 0L
    ), Lars.George = c(0L, 0L, 0L, 0L, 0L, 0L), Soren.Macbeth = c(0L, 
    0L, 0L, 0L, 0L, 0L)), .Names = c("Adrian.Cole", "Patrick.Hunt", 
"Andrei.Savu", "Bruno.Dumon", "Edward.J..Yoon", "Eugene.Koontz", 
"Jakob.Homan", "Kelvin.Kakugawa", "Tom.White", "Lars.George", 
"Soren.Macbeth"), row.names = c("Adrian Cole", "Patrick Hunt", 
"Andrei Savu", "Bruno Dumon", "Edward J. Yoon", "Eugene Koontz"
), class = "data.frame")
#the rownames of this data frame hold the names of senders/receivers 
#that we are interested in
names.to.search <- rownames(template)

#load data frame
mailing <- structure(list(Receiver = structure(c(1L, 1L, 1L, 1L, 1L, 1L), 
    .Label = c("Adrian Cole", "Patrick Hunt"), class = "factor"), 
    subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L), 
    .Label = c("RE: [WHIRR-117] Composable services", 
    "RE: [WHIRR-123] Cassandra integration "), class = "factor"), 
  sender = structure(c(4L, 4L, 1L, 1L, 4L, 1L), .Label = c("Adrian Cole", 
  "Andrei Savu", "Patrick Hunt", "Tom White"), class = "factor")), .Names = c("Receiver", 
    "subject", "sender"), row.names = c(NA, 6L), class = "data.frame")
names(mailing) <- tolower(names(mailing))
#get topic to sort by
mailing$topic <- gsub(".*\\[(.*)\\].*","\\1",mailing$subject)
#restrict to rows that have sender and receiver in names list
mailing <- mailing[mailing$receiver %in% names.to.search & 
    mailing$sender %in% names.to.search,]
library(plyr)
fn <- function(x) {
    with(x, {
        #add NA-name and name-NA to the sender and receiver lists 
        #so that the resulting table is of the right dimension
        receiver <- append(as.character(receiver), 
            c(names.to.search, rep(NA,times=length(names.to.search))))
        sender <- append(as.character(sender), 
            c(rep(NA,times=length(names.to.search)),names.to.search))
        #create the table
        y <- table(receiver,sender)
        #write table to csv file
        write.csv(y,file=paste0("AC_",topic[1],".csv"))
    })
}
#perform fn on each section of data frame by topic
d_ply(mailing,.(topic),fn)
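
The expected output in the question is symmetric with a zero diagonal and keeps the template's row order, whereas table(receiver, sender) counts one direction only, keeps self-addressed messages, and sorts names alphabetically. The following is a minimal base R sketch of one way to close that gap, not a definitive implementation. It assumes that each message should be counted in both directions and that self-addressed messages should be dropped (both inferred from the sample output); count_matrix is a hypothetical helper, and the files are written to tempdir() purely for illustration.

```r
# Names in template order, taken from the full template in the question
people <- c("Adrian Cole", "Patrick Hunt", "Andrei Savu", "Bruno Dumon",
            "Edward J. Yoon", "Eugene Koontz", "Jakob Homan",
            "Kelvin Kakugawa", "Tom White", "Lars George", "Soren Macbeth")

# The 22 rows of mailing.csv as shown in the question
tw <- "Tom White"; ac <- "Adrian Cole"; as <- "Andrei Savu"; ph <- "Patrick Hunt"
mailing <- data.frame(
  receiver = c(rep(ac, 14), rep(ph, 8)),
  subject  = c(rep("RE: [WHIRR-117] Composable services", 14),
               rep("RE: [WHIRR-123] Cassandra integration ", 8)),
  sender   = c(tw, tw, ac, ac, tw, ac, tw, tw, ac, ac, tw, tw, tw, tw,
               tw, as, as, tw, tw, ac, tw, ph),
  stringsAsFactors = FALSE
)

# Ticket ID between the square brackets
mailing$topic <- sub("^.*\\[([^]]*)\\].*$", "\\1", mailing$subject)

# Hypothetical helper: receiver-by-sender counts for one topic
count_matrix <- function(x, people) {
  # Fixed factor levels keep every person as a row and a column, in
  # template order; names outside `people` become NA and are not counted
  counts <- table(factor(x$receiver, levels = people),
                  factor(x$sender, levels = people))
  m <- matrix(as.integer(counts), nrow = length(people),
              dimnames = list(people, make.names(people)))
  m <- m + t(m)  # assumption: count each message in both directions
  diag(m) <- 0L  # assumption: ignore self-addressed messages
  m
}

results <- lapply(split(mailing, mailing$topic), count_matrix, people = people)

# Write one file per topic (tempdir() is used here only for illustration)
out_dir <- tempdir()
for (topic in names(results)) {
  write.csv(results[[topic]],
            file = file.path(out_dir, paste0("AC_", topic, ".csv")))
}
```

Fixing the factor levels replaces the NA-padding trick used above: the table gets its full dimensions from the levels themselves, so no extra rows need to be appended to the sender and receiver vectors.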
answered 2012-09-25T04:57:47.677