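First, some assumptions:
- The merged headers are in the first row of the CSV
- The merged headers start from the second column of the CSV
- The CSV repeats the variable names in the second row (except for the variable in the first column)
Second, your data.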
temp = c(",\"H\",,\"J\",",
"\"Y\",\"M\",\"F\",\"M\",\"F\"",
"\"Y1\",\"V1\",\"V2\",\"V3\",\"V4\"")
Third, a slightly modified version of this answer.
# check.names is set to FALSE to allow variable names to be repeated
ONE = read.csv(textConnection(temp), skip=1, check.names=FALSE,
stringsAsFactors=FALSE)
GROUPS = read.csv(textConnection(temp), header=FALSE,
nrows=1, stringsAsFactors=FALSE)
GROUPS = GROUPS[!is.na(GROUPS)]
# Label each variable with its group. Each group label is repeated once
# per column in the group: (number of non-ID columns) / (number of groups),
# so the same line still works if, say, 3 columns are repeated instead of 2
names(ONE)[-1] = paste0(names(ONE)[-1], ".",
                        rep(GROUPS, each=length(names(ONE)[-1])/length(GROUPS)))
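The renaming step can also be wrapped in a small helper that checks that the columns divide evenly among the group labels before building the names. This is only a sketch: `label_groups` is a hypothetical name of my own, not part of base R, and it assumes every group spans the same number of columns.

```r
# Hypothetical helper (not part of base R): append a group label to each
# variable name, assuming every group spans the same number of columns.
label_groups = function(var_names, groups) {
  per_group = length(var_names) / length(groups)
  # Stop early if the columns cannot be split evenly across the groups
  stopifnot(per_group == round(per_group))
  paste0(var_names, ".", rep(groups, each = per_group))
}

# Two groups of two columns, as in the data above
label_groups(c("M", "F", "M", "F"), c("H", "J"))
# [1] "M.H" "F.H" "M.J" "F.J"

# Three groups of two columns
label_groups(c("M", "F", "M", "F", "M", "F"), c("H", "J", "K"))
# [1] "M.H" "F.H" "M.J" "F.J" "M.K" "F.K"
```

If you prefer this form, `names(ONE)[-1] = label_groups(names(ONE)[-1], GROUPS)` could stand in for the `paste0` line above.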
Fourth, the actual reshaping of the data.
TWO = reshape(ONE, direction="long", ids=1, varying=2:ncol(ONE))
# And, here's the output.
TWO
#      Y time  M  F id
# 1.H Y1    H V1 V2  1
# 1.J Y1    J V3 V4  1
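The short `reshape` call works because, when `varying` is given as a plain vector, `reshape` guesses the variable names and the time values by splitting each column name at `sep`, which defaults to `"."`. As a sketch of what that guess amounts to, the same reshape can be written with the pieces spelled out. The data frame is rebuilt by hand here so the block runs on its own, and `group` as the name of the time variable is my own choice, not something from the question.

```r
# Rebuild the renamed data frame by hand so this block is self-contained
ONE = data.frame(Y = "Y1", M.H = "V1", F.H = "V2", M.J = "V3", F.J = "V4",
                 stringsAsFactors = FALSE)

# Same reshape, with the guessed pieces made explicit:
#  - varying: one vector of wide columns per long variable
#  - v.names: names of the long variables
#  - times:   the group labels, stored in the column named by timevar
TWO = reshape(ONE, direction = "long", ids = 1,
              varying = list(c("M.H", "M.J"), c("F.H", "F.J")),
              v.names = c("M", "F"),
              timevar = "group", times = c("H", "J"))
TWO
```

The result should hold the same two rows as above, with the group labels in a column called `group` instead of `time`. Spelling out `varying` this way is mostly useful when the column names do not follow a clean `name.group` pattern.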