I have to create an adjacency matrix of users and TV shows, where the rows are users and the columns are TV shows. If a user follows a TV show, the corresponding entry is 1; otherwise it is 0. I have already collected this information from Twitter. In total there are 140 TV shows and approximately 530,000 unique users. I am using the following Python code to generate the matrix, where:
- NoTvShows: Total number of TV shows (IDs)
- unique_users: All the unique users
- collected_users: A list of lists. Each sublist corresponds to one TV show and lists the IDs of its followers.
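
    import simplejson

    main_matrix = []   # one row per TV show, transposed below
    matrix = []        # 0/1 flags for the current TV show
    for i in range(0, NoTvShows):
        for every_user in unique_users:
            # 1 if this user follows TV show i, else 0
            if every_user in collected_users[i]:
                matrix.append(1)
            else:
                matrix.append(0)
        main_matrix.append(matrix)
        matrix = []
    # Transpose so that rows are users and columns are TV shows
    the_matrix = list(zip(*main_matrix))
    # fwrite is assumed to be a file object already opened for writing
    simplejson.dump(the_matrix, fwrite)
    fwrite.close()
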
When I run this program on the server, it crashes because it takes too much time and memory. I know I can use numpy to reduce the size of the matrix and then use it to compute similarities between users. However, I am not sure how to use numpy in this code to generate the reduced matrix.
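One possible direction, sketched on made-up toy data rather than the real Twitter data: allocate a numpy array of zeros up front and set only the entries for actual followers. The function name `build_follow_matrix`, the `row_of` lookup dictionary, and the `uint8` dtype are assumptions chosen for illustration, not something from the code above.

```python
import numpy as np

def build_follow_matrix(unique_users, collected_users):
    """Return a (users x shows) uint8 array: 1 where the user follows the show."""
    # Map each user ID to its row index once, so each lookup is a dict hit
    # instead of a scan through a list of followers.
    row_of = {user: row for row, user in enumerate(unique_users)}
    matrix = np.zeros((len(unique_users), len(collected_users)), dtype=np.uint8)
    for col, followers in enumerate(collected_users):
        # Visit only the followers of this show; all other entries stay 0.
        rows = np.fromiter(
            (row_of[u] for u in followers if u in row_of), dtype=np.intp
        )
        matrix[rows, col] = 1
    return matrix

# Toy data (hypothetical): 4 users, 3 TV shows.
unique_users = [101, 102, 103, 104]
collected_users = [[101, 103], [102], [101, 102, 104]]

the_matrix = build_follow_matrix(unique_users, collected_users)
print(the_matrix)
print(the_matrix.nbytes)  # one byte per entry with uint8
```

With one byte per entry, an array of about 530,000 x 140 would need roughly 74 MB, and the work is proportional to the total number of follower IDs rather than to users times shows times list length.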
I hope someone can guide me in this regard.
Thank you
Richa