matlab - Reading a block of data with textscan

Question

How can I extract the "Mean" and "Depth" data for each month?

MEAN, S.D., NO. OF OBSERVATIONS


                      January                February       ...            
 Depth       Mean   S.D.  #Obs       Mean   S.D.  #Obs       ...
     0      32.92   0.43     9      32.95   0.32    21      
    10      32.92   0.43    14      33.06   0.37    48      
    20      32.88   0.46    10      33.06   0.37    50      
    30      32.90   0.51     9      33.12   0.35    48      
    50      33.05   0.54     6      33.20   0.42    41      
    75      33.70   1.11     7      33.53   0.67    37      
   100      34.77            1      34.47   0.42    10      
   150                                                                                           
   200

                         July                  August               
 Depth       Mean   S.D.  #Obs       Mean   S.D.  #Obs       
     0      32.76   0.45    18      32.75   0.80    73      
    10      32.76   0.40    23      32.65   0.92   130      
    20      32.98   0.53    24      32.84   0.84   121     
    30      32.99   0.50    24      32.93   0.59   120      
    50      33.21   0.48    16      33.05   0.47   109      
    75      33.70   0.77    10      33.41   0.73    80      
   100      34.72   0.54     3      34.83   0.62    20      
   150                              34.69            1                                                     
   200

It has an undefined number of spaces between the data values and an introductory line at the start.

Thanks!

score 0 · Accepted Answer

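MATLAB's regular expressions are very powerful for pulling data out of loosely structured text. In general, getting familiar with regular expressions is well worth it: http://www.mathworks.com/help/techdoc/ref/regexp.html

In this case, you would define a pattern that captures each observation group (Mean, S.D., #Obs), for example: 32.92 0.43 9

Here I see a pattern for each group of data: each group is preceded by 6 spaces (regex = \s{6}), and its 3 data points are separated by fewer than 6 spaces (\s+). The data itself consists of two floating-point numbers (\d+\.\d+) and one integer (\d+).

Putting these together, your capture pattern looks like this (parentheses surround the data patterns to capture):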

expr = '\s{6}(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)';

We can name each token (that is, each data point captured in a group) by adding ?<name> just inside its opening parenthesis:

expr = '\s{6}(?<mean>\d+\.\d+)\s+(?<sd>\d+\.\d+)\s+(?<obs>\d+)';

Then just read your file into a string variable 'strFile' and extract the data with the pattern defined above:

strFile = fileread('mydata.txt'); % read the whole file into one char vector
[tokens, data] = regexp(strFile, expr, 'tokens', 'names');

The variable 'tokens' will contain the series of observation groups, while 'data' is a structure with the fields .mean, .sd and .obs (since those are the token names in 'expr').
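As a minimal sketch of the pattern in action, the snippet below runs it on two rows copied from the question instead of a file. The in-memory `strFile` and the variable names `means`, `sds` and `obs` are assumptions made for illustration. Because regexp returns text, the named fields are converted with str2double.

```matlab
% Two complete rows from the question, used here as a stand-in for the file contents
strFile = sprintf('%s\n%s\n', ...
    '     0      32.92   0.43     9      32.95   0.32    21      ', ...
    '    10      32.92   0.43    14      33.06   0.37    48      ');

expr = '\s{6}(?<mean>\d+\.\d+)\s+(?<sd>\d+\.\d+)\s+(?<obs>\d+)';
[tokens, data] = regexp(strFile, expr, 'tokens', 'names');

% data is a struct array with one element per matched group;
% the fields hold text, so convert them to numbers
means = str2double({data.mean});
sds   = str2double({data.sd});
obs   = str2double({data.obs});
```

Matches come back in reading order, so with two months per row the groups alternate between the first and the second month. A group with a missing value (such as the January entry at depth 100, which has no S.D.) does not match this pattern, so that alternation only holds for complete rows.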

score 0 · Accepted Answer

If, for example, you only want the first two columns, textscan() is a good choice.

fid = fopen('yourfile.txt');

tline = fgetl(fid);
while ischar(tline)
    oneCell = textscan(tline, '%n'); % read every number on the line into a cell
    allTheNums = oneCell{1}; % open up the cell to get at the columns

    if numel(allTheNums) >= 2 % skip header/blank lines and rows that hold only a depth
        usefulNums = allTheNums(1:2) % get the first two columns
    end

    tline = fgetl(fid); % read the next line, otherwise the loop never ends
end

fclose(fid);

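textscan automatically splits the string you give it wherever there is whitespace, so the undefined number of spaces between columns is not a problem. A line with no numbers yields an empty array, which you can test for to avoid out-of-bounds errors or bad data.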
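To see the splitting behaviour in isolation, here is a small sketch that calls textscan on a single data row copied from the question. The variable names are made up for illustration.

```matlab
% One complete data row from the question
dataLine = '    10      32.92   0.43    14      33.06   0.37    48      ';

oneCell = textscan(dataLine, '%n'); % whitespace of any width separates the fields
nums = oneCell{1};                  % column vector with every number on the line

depth = nums(1);
firstMean = nums(2);
```

The same call on a header line such as the "Depth Mean S.D. #Obs" row is what the answer expects to come back empty, which is why the loop checks the result before indexing into it.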

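If you need to determine programmatically which columns to grab, you can scan for the words "Depth" and "Mean" to find their indices. Regular expressions might help here, but textscan should work fine too.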
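One possible way to find those indices is sketched below using regexp and strcmp on the header row from the question. The names `colNames`, `depthCol` and `meanCols` are illustrative, not part of the original answer.

```matlab
% Header row copied from the question
headerLine = ' Depth       Mean   S.D.  #Obs       Mean   S.D.  #Obs';

% Split the header on runs of whitespace to get one name per column
colNames = regexp(headerLine, '\S+', 'match');

depthCol = find(strcmp(colNames, 'Depth')); % position of the Depth column
meanCols = find(strcmp(colNames, 'Mean'));  % positions of every Mean column

% Apply the indices to a complete data row
oneCell = textscan('    10      32.92   0.43    14      33.06   0.37    48', '%n');
nums = oneCell{1};
depth = nums(depthCol);
means = nums(meanCols);
```

These indices only line up for rows where every value is present. In a row with gaps the remaining numbers shift left, so such rows need a position-based approach instead.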

score 0 · Accepted Answer

Here is an example of how to read lines from a file:

fid = fopen('yourfile.txt');

tline = fgetl(fid);
while ischar(tline)
    disp(tline)
    tline = fgetl(fid);
end

fclose(fid);

Inside the while loop, you will want to use strtok (or something similar) to break each line into whitespace-delimited string tokens.
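A rough sketch of that strtok step, applied to a single line rather than the whole file, might look like this. The sample line is taken from the question, and the variable names `values` and `remainder` are assumptions.

```matlab
% Part of one data row from the question
tline = '    10      32.92   0.43    14';

values = [];
remainder = tline;
while true
    % strtok skips leading whitespace and returns the next token plus the rest
    [token, remainder] = strtok(remainder);
    if isempty(token)
        break; % nothing left on the line
    end
    values(end+1) = str2double(token); % convert the text token to a number
end
```

Here `values` ends up holding the numbers on the line in order. For a header line, str2double returns NaN for each word, so such lines can be detected with isnan.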
