I am new to R and I am trying to create an edgelist from two variables. The first is an ID number; the second is a short bio made up of words. I would like to split each bio into its words while keeping them associated with the same ID number, and then reshape the result into an edgelist in which each ID number is paired with each word used in the corresponding bio.
I don't know whether I need to split up each word as a separate step first, or whether it's possible to just separate the words at the spaces. I think I need help both with conceptualizing what to do and, potentially, with doing it.
For example, the first few rows of the two variables can be recreated with the code below:
twitterdata_clean <- read.csv(text="user_id_str,bios2
39,dont worry be happy
63,country girl country music country life
54,taylor logan 21 years young follow me
78,i can wave my head and i can wave my hand
60,i love justin timberlake
46,goals luke brooks giving me a black eye", stringsAsFactors=FALSE)
I would like to create an edgelist that looks like this:
39 dont
39 worry
39 be
39 happy
63 country
63 girl
63 country
63 music
63 country
63 life
54 taylor
54 logan
54 21
54 years
54 young
54 follow
54 me
. . . and so on.
I tried using some code I found online, but it's not running for me, and I'm not sure if my data is in the right format for this to work:
do.call(rbind, lapply(twitterdata_clean$user_id_str, function(x) t(combn(x, 1))))
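For reference, here is a minimal base R sketch of the kind of result described above, assuming that words are separated only by whitespace and that duplicate ID/word pairs should be kept (as in the desired output). The names `words` and `edgelist` are made up for illustration; only `twitterdata_clean`, `user_id_str`, and `bios2` come from the data above.

```r
# Sample data copied from the question
twitterdata_clean <- read.csv(text="user_id_str,bios2
39,dont worry be happy
63,country girl country music country life
54,taylor logan 21 years young follow me
78,i can wave my head and i can wave my hand
60,i love justin timberlake
46,goals luke brooks giving me a black eye", stringsAsFactors=FALSE)

# Split each bio on runs of whitespace.
# The result is a list with one character vector of words per row.
words <- strsplit(trimws(twitterdata_clean$bios2), "\\s+")

# Repeat each ID once per word in its bio, then flatten the word list
# so that the two columns line up row by row.
edgelist <- data.frame(
  user_id_str = rep(twitterdata_clean$user_id_str, lengths(words)),
  word        = unlist(words),
  stringsAsFactors = FALSE
)

head(edgelist, 10)
```

If repeated pairs such as the three `63 country` rows are not wanted, `unique(edgelist)` would drop them.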