enter code here
我正在使用如下所示的数据集“Final.Export”:
LakeID LakeName SourceVariableName SourceVariableDescription SourceFlags
47 390 Moosehead Acolor(PCU) Apparent color <NA>
48 390 Moosehead Acolor(PCU) Apparent color <NA>
49 390 Moosehead Acolor(PCU) Apparent color <NA>
50 390 Moosehead Acolor(PCU) Apparent color <NA>
51 390 Moosehead Acolor(PCU) Apparent color <NA>
52 390 Moosehead Acolor(PCU) Apparent color <NA>
53 390 Moosehead Acolor(PCU) Apparent color <NA>
54 390 Moosehead Acolor(PCU) Apparent color <NA>
55 390 Moosehead Acolor(PCU) Apparent color <NA>
56 390 Moosehead Acolor(PCU) Apparent color <NA>
LagosVariableID LagosVariableName Value Units CensorCode DetectionLimit Date
47 11 Color, apparent 22 PCU NC NA 2003-08-26
48 11 Color, apparent 17 PCU NC NA 2003-08-26
49 11 Color, apparent 16 PCU NC NA 2003-08-26
50 11 Color, apparent 14 PCU NC NA 2003-08-26
51 11 Color, apparent 14 PCU NC NA 2003-08-26
52 11 Color, apparent 17 PCU NC NA 2003-08-26
53 11 Color, apparent 16 PCU NC NA 2003-08-26
54 11 Color, apparent 17 PCU NC NA 2003-08-26
55 11 Color, apparent 14 PCU NC NA 2003-08-26
56 11 Color, apparent 17 PCU NC NA 2003-08-26
LabMethodName LabMethodInfo SampleType SamplePosition SampleDepth MethodInfo
47 <NA> <NA> INTEGRATED SPECIFIED 6 <NA>
48 <NA> <NA> INTEGRATED SPECIFIED 7 <NA>
49 <NA> <NA> INTEGRATED SPECIFIED 6 <NA>
50 <NA> <NA> INTEGRATED SPECIFIED 10 <NA>
51 <NA> <NA> INTEGRATED SPECIFIED 10 <NA>
52 <NA> <NA> INTEGRATED SPECIFIED 9 <NA>
53 <NA> <NA> INTEGRATED SPECIFIED 10 <NA>
54 <NA> <NA> INTEGRATED SPECIFIED 8 <NA>
55 <NA> <NA> INTEGRATED SPECIFIED 10 <NA>
56 <NA> <NA> INTEGRATED SPECIFIED 10 <NA>
BasinType Subprogram Comments Dup
47 UNKNOWN NA NA NA
48 UNKNOWN NA NA NA
49 UNKNOWN NA NA NA
50 UNKNOWN NA NA NA
51 UNKNOWN NA NA NA
52 UNKNOWN NA NA NA
53 UNKNOWN NA NA NA
54 UNKNOWN NA NA NA
55 UNKNOWN NA NA NA
56 UNKNOWN NA NA NA
我想将所有重复值标记为 1。重复值定义为在“LakeID”、“Date”、“LagosVariableID”、“SampleDepth”和“SamplePosition”列的每一列中具有完全相同值的值。
为此,我使用以下代码创建了一个新的数据表“data1”:
library(data.table)
data1=data.table(Final.Export,key=c('LakeID','Date','LagosVariableID','SampleDepth','SamplePosition','Value'))
data1=data1[,Dup:=duplicated(.SD),.SDcols=c('LakeID','Date', 'LagosVariableID', 'SampleDepth', 'SamplePosition','Value')]
data1$Dup[which(data1$Dup==FALSE)]=NA
data1$Dup[which(data1$Dup==TRUE)]=1
“data1”的问题是只有在第一个唯一行(标记为 NA)之后的重复行(根据我对重复的定义)被标记为“1”。我需要将唯一行和相关的重复行标记为“1”。任何想法如何做到这一点?
如果这令人困惑,请告诉我如何澄清。