I have a large pandas dataframe like this:
log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1
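To reproduce the sample above, here is a minimal sketch that builds the frame from the same text. It assumes every value is read as a string (dtype=str), because the columns mix numbers, True and yes; the names raw and df are just illustrative.

```python
import io

import pandas as pd

# The sample table from above, pasted as whitespace-separated text.
raw = """log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1
"""

# Assumption: keep everything as strings so that mixed values such as
# "1", "True" and "yes" are compared literally rather than coerced.
df = pd.read_csv(io.StringIO(raw), sep=" ", dtype=str)
print(df)
```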
How can I label the rows that are the same (ignoring the log column) with a shared ID, for example:
log apple watermelon orange lemon grapes ID
1 1 1 yes 0 0 1
1 2 0 1 0 0 2
1 True 0 0 0 2 3
2 0 0 0 0 2 4
2 1 1 yes 0 0 1
2 0 0 0 0 2 4
2 0 0 0 0 2 4
3 True 0 0 0 2 3
4 0 0 0 0 2.1 5
4 0 0 0 0 2.1 5
I tried to:
df['ID']=df.groupby('log')[df.columns].transform('ID')
And
df['personid'] = df['log'].clip_upper(2) - 2*df.duplicated(subset='apple')
df
However, neither attempt gives me the expected output, and listing the columns one by one is not practical because I have a lot of them.
Any idea how to group and label this dataframe?
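A minimal sketch of one possible approach, assuming rows count as "the same" when every column except log matches and that IDs should be numbered in order of first appearance. It groups on all non-log columns and uses GroupBy.ngroup to number the groups; the variable value_cols is just an illustrative name, and the values are kept as strings so that 1 and True are not treated as equal.

```python
import io

import pandas as pd

raw = """log apple watermelon orange lemon grapes
1 1 1 yes 0 0
1 2 0 1 0 0
1 True 0 0 0 2
2 0 0 0 0 2
2 1 1 yes 0 0
2 0 0 0 0 2
2 0 0 0 0 2
3 True 0 0 0 2
4 0 0 0 0 2.1
4 0 0 0 0 2.1
"""
# Assumption: read everything as strings so values are compared literally.
df = pd.read_csv(io.StringIO(raw), sep=" ", dtype=str)

# Every column except 'log' takes part in the comparison, so the
# column list never has to be typed out by hand.
value_cols = [c for c in df.columns if c != "log"]

# sort=False keeps groups in order of first appearance; ngroup() numbers
# them from 0, so add 1 to start the labels at 1.
df["ID"] = df.groupby(value_cols, sort=False).ngroup() + 1
print(df)
```

On the sample data this should reproduce the ID column shown in the expected output. If the real frame contains missing values, note that groupby drops NaN keys by default, so dropna=False may be needed.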