I have a very large data set with hundreds of thousands of rows. I have managed to split it into two columns like so:
Name: | John
Birth year: | 1982
Favorite sport: | Rugby
Favorite color: | Blue
|
Name: | Mike
Birth year: | 1977
|
Name: | Shayla
Favorite sport: | Soccer
|
Name: | Kimberly
Birth year: | 1983
Favorite sport: | Baseball
Favorite color: | Yellow
Favorite food: | Pizza
However, the category labels are repeated for every entry, and I want to eliminate that repetition. How do I turn each data "entry" into a single row, with each "category" appearing only once as a column header, like so:
Name | Birth year | Favorite sport | Favorite color | Favorite food
John | 1982 | Rugby | Blue |
Mike | 1977 | | |
Shayla | | Soccer | |
Kimberly | 1983 | Baseball | Yellow | Pizza
Note that each entry in the existing data set contains a name plus one or more other categories.
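
To make the target transformation concrete, here is a minimal Python sketch of the reshaping described above. It is only an illustration under stated assumptions, not a definitive solution: it assumes the two columns have already been read into a list of (label, value) pairs, that every entry begins with a "Name:" row, and that a blank row may separate entries. The function name `pivot_entries` and the variable names are hypothetical.

```python
def pivot_entries(pairs):
    """Group (label, value) pairs into one row per entry.

    Returns (headers, rows), where headers lists each category once,
    in first-seen order, and each row has one cell per header.
    """
    headers = []    # category names, in the order they first appear
    records = []    # one dict per entry: {category: value}
    current = None  # the entry currently being filled, if any

    for label, value in pairs:
        label = label.strip().rstrip(":").strip()
        value = value.strip()
        if not label:
            # Blank separator row: close the current entry.
            current = None
            continue
        if current is None or label == "Name":
            # Assumption: a "Name" row always starts a new entry.
            current = {}
            records.append(current)
        current[label] = value
        if label not in headers:
            headers.append(label)

    # Categories an entry lacks become empty cells.
    rows = [[rec.get(h, "") for h in headers] for rec in records]
    return headers, rows


# Sample data copied from the question.
pairs = [
    ("Name:", "John"), ("Birth year:", "1982"),
    ("Favorite sport:", "Rugby"), ("Favorite color:", "Blue"),
    ("", ""),
    ("Name:", "Mike"), ("Birth year:", "1977"),
    ("", ""),
    ("Name:", "Shayla"), ("Favorite sport:", "Soccer"),
    ("", ""),
    ("Name:", "Kimberly"), ("Birth year:", "1983"),
    ("Favorite sport:", "Baseball"), ("Favorite color:", "Yellow"),
    ("Favorite food:", "Pizza"),
]

headers, rows = pivot_entries(pairs)
print(" | ".join(headers))
for row in rows:
    print(" | ".join(row))
```

The sketch makes a single pass over the data, so it should cope with hundreds of thousands of rows. The column order follows the order in which categories first appear in the input.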