I have a data frame; below is sample data from it.
Company          Category  Margin
SBI              BK          34.5
PNB              BK          39.5
UCO BANK         BK          39.9
BANK             BK          41.3
INDIAN BANK      BK          42.3
DENA BANK        BK          44.5
VIJAYA BANK      BK          44.5
UNION BANK       BK          47.6
CENTRAL BANK     BK          49.8
INFOSYS          IT           5.6
HCL TECH         IT           5.9
TCS              IT           6.9
CMC              IT          12.6
TECHMAHINDRA     IT          12.6
COGNIZANT        IT          15.8
IGATE            IT          22.4
WIPRO            IT          22.9
HEXAWARE         IT          34.8
MAHINDRA SATYAM  IT          34.8
DR. REDDYS       PH          14.5
SUN PHARMA       PH          19.2
CIPLA            PH          23.9
LUPIN            PH          23.9
DIVIS LABS       PH            29
A careful look at the data shows that it is sorted by Category, then Margin, then Company.
Now I need to add a new column, Ranking, that numbers the companies within each Category: the numbering starts at 1 and restarts at 1 whenever a new Category appears in the list.
Sample Output:
Company          Category  Margin  Ranking
SBI              BK          34.5        1
PNB              BK          39.5        2
UCO BANK         BK          39.9        3
BANK             BK          41.3        4
INDIAN BANK      BK          42.3        5
DENA BANK        BK          44.5        6
VIJAYA BANK      BK          44.5        7
UNION BANK       BK          47.6        8
CENTRAL BANK     BK          49.8        9
INFOSYS          IT           5.6        1
HCL TECH         IT           5.9        2
TCS              IT           6.9        3
CMC              IT          12.6        4
TECHMAHINDRA     IT          12.6        5
COGNIZANT        IT          15.8        6
IGATE            IT          22.4        7
WIPRO            IT          22.9        8
HEXAWARE         IT          34.8        9
MAHINDRA SATYAM  IT          34.8       10
DR. REDDYS       PH          14.5        1
SUN PHARMA       PH          19.2        2
CIPLA            PH          23.9        3
LUPIN            PH          23.9        4
DIVIS LABS       PH            29        5
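One base-R way to add the Ranking column is `ave()` with `FUN = seq_along`, which numbers the rows within each group. This is a minimal sketch, assuming the data frame is called `df` (the name the question uses later), that its columns are named exactly `Company`, `Category` and `Margin`, and that the rows are already sorted as described:

```r
# Sample data from the question (already sorted by Category, Margin, Company)
df <- data.frame(
  Company = c("SBI", "PNB", "UCO BANK", "BANK", "INDIAN BANK", "DENA BANK",
              "VIJAYA BANK", "UNION BANK", "CENTRAL BANK",
              "INFOSYS", "HCL TECH", "TCS", "CMC", "TECHMAHINDRA", "COGNIZANT",
              "IGATE", "WIPRO", "HEXAWARE", "MAHINDRA SATYAM",
              "DR. REDDYS", "SUN PHARMA", "CIPLA", "LUPIN", "DIVIS LABS"),
  Category = rep(c("BK", "IT", "PH"), times = c(9, 10, 5)),
  Margin = c(34.5, 39.5, 39.9, 41.3, 42.3, 44.5, 44.5, 47.6, 49.8,
             5.6, 5.9, 6.9, 12.6, 12.6, 15.8, 22.4, 22.9, 34.8, 34.8,
             14.5, 19.2, 23.9, 23.9, 29),
  stringsAsFactors = FALSE
)

# ave() splits the row indices by Category and applies seq_along() to each
# piece, so the counter restarts at 1 for every Category
df$Ranking <- ave(seq_len(nrow(df)), df$Category, FUN = seq_along)
df
```

Because this numbers rows rather than comparing margins, tied margins (DENA BANK and VIJAYA BANK at 44.5) get consecutive ranks 6 and 7, which matches the sample output. If dplyr is available, `group_by(Category)` followed by `mutate(Ranking = row_number())` is an equivalent alternative.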
Further Requirement
Now assume the input dataset is completely unordered ("zigzagged"). Then:
unique(df$Category) # gives 5 different categories
[1] "BK" "IT" "PH" "MT" "EG"
After formatting, the same call returns:
unique(df$Category) # gives only 3 categories; the other 2 were removed
[1] "BK" "IT" "PH"
Note: While formatting the input dataset to remove missing values, a few categories were removed completely.
Note: The returned data frame should have the categories as its row names.
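With zigzagged input, the ranking is only meaningful after the rows are sorted, and because the examples further down still list MT and EG, the full category list has to be saved before the cleaning step drops them. This is a minimal sketch on a small hypothetical dataset: the values and the `raw`, `clean` and `all_categories` names are illustrative assumptions, and `na.omit()` stands in for whatever formatting step actually removes the missing values.

```r
# Hypothetical zigzagged input; the MT and EG rows hold only missing values
raw <- data.frame(
  Company  = c("PNB", "TCS", "CIPLA", NA, "SBI", NA, "INFOSYS"),
  Category = c("BK", "IT", "PH", "MT", "BK", "EG", "IT"),
  Margin   = c(39.5, 6.9, 23.9, NA, 34.5, NA, 5.6),
  stringsAsFactors = FALSE
)

# Save every category *before* cleaning so it can still be reported later
all_categories <- unique(raw$Category)   # "BK" "IT" "PH" "MT" "EG"

# Stand-in for the formatting step: drop rows with missing values
clean <- na.omit(raw)
unique(clean$Category)                   # "BK" "IT" "PH"

# Sort by Category, then Margin, then Company, and reset the row names
clean <- clean[order(clean$Category, clean$Margin, clean$Company), ]
rownames(clean) <- NULL

# Number the rows within each Category, restarting at 1
clean$Ranking <- ave(seq_len(nrow(clean)), clean$Category, FUN = seq_along)
clean
```

`order()` accepts several sort keys and breaks ties on each key with the next one, which reproduces the Category, Margin, Company ordering of the sample data.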
After ranking the data frame, I would like to write a function that takes a ranking as its parameter and returns a data frame with the Company holding that ranking in each Category. If a Category has no Company with that ranking, NA should be returned for it; this includes the categories that were removed during formatting, as the examples below show.
companyRanks(3) returns:
    COMPANY CATEGORY
BK UCO BANK       BK
IT      TCS       IT
PH    CIPLA       PH
MT     <NA>       MT
EG     <NA>       EG
companyRanks(10) returns:
           COMPANY CATEGORY
BK            <NA>       BK   # no company has rank 10 in category BK, so NA is returned
IT MAHINDRA SATYAM       IT
PH            <NA>       PH
MT            <NA>       MT
EG            <NA>       EG
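A minimal base-R sketch of `companyRanks()`, assuming the ranked data frame and the pre-cleaning category list are available as `ranked` and `all_categories` (hypothetical names, used here as default arguments). `match()` returns NA for any category that has no company at the requested rank, which also covers MT and EG:

```r
# Sample data from the question, sorted by Category, Margin, Company
# (Margin is omitted because only the ranks are needed here)
ranked <- data.frame(
  Company = c("SBI", "PNB", "UCO BANK", "BANK", "INDIAN BANK", "DENA BANK",
              "VIJAYA BANK", "UNION BANK", "CENTRAL BANK",
              "INFOSYS", "HCL TECH", "TCS", "CMC", "TECHMAHINDRA", "COGNIZANT",
              "IGATE", "WIPRO", "HEXAWARE", "MAHINDRA SATYAM",
              "DR. REDDYS", "SUN PHARMA", "CIPLA", "LUPIN", "DIVIS LABS"),
  Category = rep(c("BK", "IT", "PH"), times = c(9, 10, 5)),
  stringsAsFactors = FALSE
)
ranked$Ranking <- ave(seq_len(nrow(ranked)), ranked$Category, FUN = seq_along)

# Assumption: every category seen before cleaning, including the dropped MT and EG
all_categories <- c("BK", "IT", "PH", "MT", "EG")

# Hypothetical helper: the company holding `rank` in each category, or NA
companyRanks <- function(rank, data = ranked, categories = all_categories) {
  hits <- data[data$Ranking == rank, ]
  data.frame(
    COMPANY   = hits$Company[match(categories, hits$Category)],
    CATEGORY  = categories,
    row.names = categories,
    stringsAsFactors = FALSE
  )
}

companyRanks(3)   # COMPANY: "UCO BANK" "TCS" "CIPLA" NA NA
companyRanks(10)  # COMPANY: NA "MAHINDRA SATYAM" NA NA NA
```

Driving the result from the saved category list, rather than from the categories left in the data, is what keeps the removed categories in the output with NA.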
Is there any function that handles this kind of requirement easily?