I have a dataset that looks like:
ColA ColB ColC ColD ColE
rs778 C Can + C/T
rs778 C Pro + C/T
rs779 P Can + A/G
rs779 P Can - A/G
I want to remove rows that have duplicate entries in ColA, using ColC to decide which row is kept. In other words, if two rows share the same ColA value, the row that stays should be determined by its ColC value. If the ColC values are also the same, the row that stays should be determined by ColD. With the priorities "Can" > "Pro" and "+" > "-", the final output I'm looking for would be:
ColA ColB ColC ColD ColE
rs778 C Can + C/T
rs779 P Can + A/G
I have already removed fully duplicated rows (ignoring the second column) using:
data2 <- data[!duplicated(data[-2]),]
I am hoping the solution is some modification of this that I have yet to discover. Thanks for your help!
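For reference, here is a minimal sketch of one possible modification, not necessarily the best approach. It assumes the priorities stated above ("Can" before "Pro", "+" before "-"), rebuilds the example data inline, and uses the hypothetical helper names `rank_c`, `rank_d`, and `ordered`. The idea is to sort the rows so the preferred row comes first within each ColA group, then let `duplicated()` drop the rest:

```r
# Example data copied from the table above
data <- data.frame(
  ColA = c("rs778", "rs778", "rs779", "rs779"),
  ColB = c("C", "C", "P", "P"),
  ColC = c("Can", "Pro", "Can", "Can"),
  ColD = c("+", "+", "+", "-"),
  ColE = c("C/T", "C/T", "A/G", "A/G"),
  stringsAsFactors = FALSE
)

# Turn each value into an explicit rank: 1 = highest priority.
# Values missing from the priority vector become NA.
rank_c <- match(data$ColC, c("Can", "Pro"))
rank_d <- match(data$ColD, c("+", "-"))

# Sort by ColA, then by ColC priority, then by ColD priority.
# order() places NA ranks last by default.
ordered <- data[order(data$ColA, rank_c, rank_d), ]

# duplicated() flags every row after the first within each ColA value,
# so negating it keeps only the preferred row per group.
data2 <- ordered[!duplicated(ordered$ColA), ]
rownames(data2) <- NULL

print(data2)
```

Note that the result comes back sorted by ColA rather than in the original row order, and that any ColC or ColD value outside the assumed priority lists is treated as lowest priority.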