matlab - Processing data from a text file in MATLAB

Question

I am trying to read data from a text file. I can do this by importing it, and that works fine. My data is imported as: UserID|SportID|Rating

There are many users, and any user can rate any sport with any rating, for example:

UserID  SportID  Rating
1       2        10
1       3        5
2       1        10
2       3        2

I am trying to create a new matrix like this:

UserID  Sport1  Sport2  Sport3
 1      (null)    10      5
 2        10    (null)    2

I tried to do this with a for loop, but there are almost 2000 users and 1000 sports, and almost 100000 rows of data. How can I do this?

score 2 · Accepted Answer

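To do this quickly, you can use a sparse matrix with UserID along one dimension and SportID along the other. In most cases a sparse matrix behaves just like a normal matrix. Construct it like this: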

out = sparse(User, SportID, Rating)

where User, SportID and Rating are vectors corresponding to the columns of the text file.
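A minimal sketch with the sample data from the question, assuming the imported data sits in an N-by-3 numeric matrix called `data` (a hypothetical name) whose columns are UserID, SportID and Rating. `full` is used only to display the small result:

```matlab
% Sample data from the question; columns are UserID, SportID, Rating.
% The variable name "data" is a hypothetical stand-in for the imported matrix.
data = [1 2 10;
        1 3  5;
        2 1 10;
        2 3  2];

% Split the N-by-3 matrix into the three column vectors
User    = data(:, 1);
SportID = data(:, 2);
Rating  = data(:, 3);

% Rows are users, columns are sports; only the given ratings are stored
out = sparse(User, SportID, Rating);

% Convert to a full matrix only for display (fine for small examples)
disp(full(out))
```

For the real data you may want to pass the size explicitly, e.g. `sparse(User, SportID, Rating, 2000, 1000)`, so the matrix is 2000-by-1000 even if the highest user or sport IDs have no ratings.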

Note 1: For duplicate (User, SportID) pairs, the Ratings are summed.
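To illustrate Note 1, a small sketch with made-up ratings in which user 1 rated sport 2 twice. If summing is not what you want, one option (an assumption about the desired behaviour, not part of this answer) is to build the matrix with `accumarray` and a different reduction such as `@max`:

```matlab
% Made-up data: user 1 rated sport 2 twice (10 and 4)
User    = [1; 1; 2];
SportID = [2; 2; 1];
Rating  = [10; 4; 7];

% sparse adds duplicate (User, SportID) pairs, so entry (1,2) becomes 14
summed = sparse(User, SportID, Rating);

% accumarray with @max keeps the largest rating per pair instead;
% wrapping the result in sparse() gives the same storage format as above
kept = sparse(accumarray([User, SportID], Rating, [], @max));

disp(full(summed))
disp(full(kept))
```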

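Note 2: Empty entries, written as (null) in the question, are not stored in the sparse matrix; only nonzeros are stored (that is the whole point of a sparse matrix).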
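A short sketch of what Note 2 means in practice, using the question's sample data: cells that were never rated read back as 0 (and a rating of exactly 0 would likewise not be stored). If a full matrix with `NaN` for "no rating" is needed, the stored entries can be copied into a `NaN` matrix via `find`:

```matlab
% Sample data from the question
out = sparse([1; 1; 2; 2], [2; 3; 1; 3], [10; 5; 10; 2]);

% Only the four ratings are stored; the "(null)" cells are not
numStored = nnz(out);

% Reading a cell that was never rated returns 0, not a special null value
missingValue = full(out(1, 1));

% Build a full matrix with NaN for "no rating" from the stored entries only
[r, c, v] = find(out);
M = NaN(size(out));
M(sub2ind(size(M), r, c)) = v;
disp(M)
```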

score 1 · Accepted Answer

You can do the following:

% Test input
inputVar = [1 2 10; 1 3 5; 2 1 10; 2 3 2];

% Determine the number of users and sports to size the new table
numSports = max(inputVar(:,2));
numUsers = max(inputVar(:,1));
newTable = NaN(numUsers, numSports);

% Iterate over each row of the new table (one row per user)
for ii = 1:numUsers
    % Find the input rows for this user, which sports he/she rated, and the ratings
    userRating = find(inputVar(:,1) == ii);
    sportIndex = inputVar(userRating, 2)';
    sportRating = inputVar(userRating, 3)';
    newTable(ii, sportIndex) = sportRating; % Fill in this user's ratings in the new table
end

newTable

which produces the following:

newTable =

   NaN    10     5
    10   NaN     2

The loop only has to run once per user in the input table.

score 1 · Accepted Answer

I assume that, for simplicity, you have defined null as a number:

Null = -1; % or any other value which could not be a rating.

Given:

nSports = 1000; % Number of sports
nUsers = 2000; % Number of users

Pre-allocate the result:

Rating_Mat = ones(nUsers, nSports) * Null; % Pre-allocation

Then use sub2ind (similar to this answer):

Rating_Mat(sub2ind([nUsers nSports], User, SportID)) = Rating;
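A runnable sketch of the `sub2ind` approach on the question's sample data, using `Null = -1` as in this answer. Here `nUsers` and `nSports` are taken from the sample's largest IDs instead of the question's 2000 and 1000:

```matlab
Null = -1;                      % placeholder for "no rating"

% Sample data from the question as N-by-1 column vectors
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];

nUsers  = max(User);            % 2 for this sample
nSports = max(SportID);         % 3 for this sample

Rating_Mat = ones(nUsers, nSports) * Null;   % pre-allocation

% Convert each (user, sport) pair to a linear index, then assign in one step
idx = sub2ind([nUsers nSports], User, SportID);
Rating_Mat(idx) = Rating;
disp(Rating_Mat)
```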

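Or use accumarray, passing the size and Null as the fill value so that unrated cells are not set to 0:

Rating_Mat = accumarray([User, SportID], Rating, [nUsers nSports], [], Null);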

This assumes User and SportID are N-by-1 column vectors.
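By default `accumarray` fills cells that receive no value with 0 and sizes the result from the largest subscripts, so it does not reuse a `Null`-filled pre-allocation. A sketch comparing the two-argument call with one that passes an explicit size and `Null` as the fill value (the fifth argument; passing `[]` as the fourth keeps the default summing of duplicate pairs), on the question's sample data:

```matlab
Null = -1;

% Sample data from the question as N-by-1 column vectors
User    = [1; 1; 2; 2];
SportID = [2; 3; 1; 3];
Rating  = [10; 5; 10; 2];
nUsers  = 2;
nSports = 3;

% Without extra arguments, unrated cells are 0 and the size comes from
% the largest subscripts present
plain = accumarray([User, SportID], Rating);

% With an explicit size and fill value, unrated cells get Null instead
Rating_Mat = accumarray([User, SportID], Rating, [nUsers nSports], [], Null);

disp(plain)
disp(Rating_Mat)
```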

Hope this helps.
