string - Trying to read a text file... but not getting all the contents

Question

I am trying to read a file with the following repeating format (the data is very long, so I have truncated it, even within the first repetition):

1.00 'day' 2011-01-02
'Total Velocity Magnitude RC - Matrix' 'm/day'
    0.190189     0.279141     0.452853      0.61355     0.757833     0.884577 
    0.994502      1.08952      1.17203      1.24442      1.30872      1.36653 
     1.41897      1.46675      1.51035      1.55003      1.58595      1.61824

The actual file with the complete data can be downloaded here.

This is the code I am using to read the data from the file above:

fid = fopen(file_name); % open the file

dotTXT_fileContents = textscan(fid,'%s','Delimiter','\n'); % read it as string ('%s') into one big array, row by row
dotTXT_fileContents = dotTXT_fileContents{1};
fclose(fid); %# don't forget to close the file again

%# find rows containing 'Total Velocity Magnitude RC - Matrix' 'm/day'
data_starts = strmatch('''Total Velocity Magnitude RC - Matrix'' ''m/day''',...
    dotTXT_fileContents); % data_starts contains the line numbers wherever 'Total Velocity Magnitude RC - Matrix' 'm/day' is found

ndata = length(data_starts); % total no. of data values will be equal to the corresponding no. of '**  K' read from the .txt file

%# loop through the file and read the numeric data
for w = 1:ndata-1
    %# read lines containing numbers
    tmp_str = dotTXT_fileContents(data_starts(w)+1:data_starts(w+1)-3); % stores the content from file dotTXT_fileContents of the rows following the row containing 'Total Velocity Magnitude RC - Matrix' 'm/day' in form of string
    %# convert strings to numbers
    tmp_str = tmp_str{:}; % store the content of the string which contains data in form of a character
    %# assign output
    data_matrix_grid_wise(w,:) = str2num(tmp_str); % convert the part of the character containing data into number
end

To give you an idea of the data pattern in my text file, here are some results from the code:

data_starts =

           2
        1672
        3342
        5012
        6682
        8352
       10022

ndata =

     7

Therefore, my data_matrix_grid_wise should contain 1672-2-2-1 (for a new line) = 1667 rows. However, I get this result:

data_matrix_grid_wise =

  Columns 1 through 2

   0.190189000000000   0.279141000000000
   0.423029000000000   0.616590000000000
   0.406297000000000   0.604505000000000
   0.259073000000000   0.381895000000000
   0.231265000000000   0.338288000000000
   0.237899000000000   0.348274000000000

  Columns 3 through 4

   0.452853000000000   0.613550000000000
   0.981086000000000   1.289920000000000
   0.996090000000000   1.373680000000000
   0.625792000000000   0.859638000000000
   0.547906000000000   0.743446000000000
   0.562903000000000   0.759652000000000

  Columns 5 through 6

   0.757833000000000   0.884577000000000
   1.534560000000000   1.714330000000000
   1.733690000000000   2.074690000000000
   1.078000000000000   1.277930000000000
   0.921371000000000   1.080570000000000
   0.934820000000000   1.087410000000000

Where am I going wrong? In my final result, data_matrix_grid_wise should consist of 10000 elements rather than 36. Thanks.

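Update: How can I also capture the number that precedes 'day' on the line before data_starts(w), i.e. 1, 2, 3, and so on? I am using the following inside the loop, but it does not seem to work: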

days_str = dotTXT_fileContents(data_starts(w)-1);
    days_str = days_str{1};
    days(w,:) = sscanf(days_str(w-1,:), '%d %*s %*s', [1, inf]);

score 1 · Accepted Answer

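The problem is in the last two statements. When you write tmp_str{:}, you convert the cell array into a comma-separated list of strings. If you assign this list to a single variable, only the first string is assigned. So tmp_str now holds only the first row of data.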
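The comma-separated list behavior can be seen in a minimal sketch. The three-row cell array and the variable names below are hypothetical stand-ins for tmp_str, not part of the original code:

```matlab
% Hypothetical three-row sample standing in for tmp_str
sample_rows = {'0.19 0.27'; '0.99 1.08'; '1.41 1.46'};

% sample_rows{:} expands into three separate values, one per cell,
% so three output variables are needed to receive all of them
[first_row, second_row, third_row] = sample_rows{:};

% With a single output variable, as in tmp_str = tmp_str{:}, only the
% first of these values (here first_row) is kept
disp(first_row)
disp(third_row)
```

This is why str2num later sees a single row of numbers rather than the whole block.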

Here is what you can do instead of the last two lines:

tmp_mat = cellfun(@str2num, tmp_str, 'uniformoutput',0);
data_matrix_grid_wise(w,:) = cell2mat(tmp_mat);

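However, you will run into a problem with the concatenation (cell2mat), because not all rows have the same number of columns. How you resolve that is up to you.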
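One possible way around the unequal row lengths, offered as a sketch rather than the answerer's method, is to join all rows of a block into one string and read the numbers as a flat vector. The sample rows below are shortened from the question, with the last row deliberately cut to two values (an assumption for illustration); all_text and row_values are hypothetical names:

```matlab
% Sample rows shortened from the question; the last row has fewer values
% (an assumption for illustration) to mimic rows of unequal length
tmp_str = {'0.190189     0.279141     0.452853'; ...
           '0.994502      1.08952      1.17203'; ...
           '1.41897      1.46675'};

% Join every row into one string, adding a space after each row
all_text = sprintf('%s ', tmp_str{:});

% sscanf reads all whitespace-separated numbers into a column vector;
% transpose it to obtain a 1-by-N row
row_values = sscanf(all_text, '%f').';

disp(numel(row_values))
```

Inside the loop, data_matrix_grid_wise(w,:) = row_values; would then succeed only if every block holds the same number of values. If the blocks differ in length, storing each result in a cell array is one alternative.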

score 1 · Accepted Answer

The problem is in the line tmp_str = tmp_str{:};. MATLAB behaves oddly when handling characters. A short solution is to replace the last lines with the following two lines:

y = cell2mat( cellfun(@(z) sscanf(z,'%f'),tmp_str,'UniformOutput',false));
data_matrix_grid_wise(w,:) = y;
