string - Error reading fixed-width strings with textscan in MATLAB

Question

I am using textscan to read fixed-width data (9 characters per field) from a text file. textscan fails on a line containing this string:

'   9574865.0E+10  '

I want to read two numbers from it:

957486 5.0E+10

The problem can be reproduced like this:

dat = textscan('   9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

This returns the following error:

Error using textscan
Mismatch between file and format string.
Trouble reading floating point number from file (row 1u, field 2u) ==> E+10

Surprisingly, if we add a minus sign, we get no error but a wrong result instead:

dat = textscan('  -9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

Now dat{1} is:

    -9574865           0

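Obviously, I need both cases to work. My current workaround is to insert commas between the fields and use the comma as the delimiter in textscan, but that is slow and not a nice solution. Is there any way to read this string correctly using textscan or another built-in (for performance reasons) MATLAB function?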
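
For reference, the comma workaround described above might look roughly like the sketch below. It assumes every line holds exactly two 9-character fields; the variable names `line` and `withComma` are illustrative only.

```matlab
% Sketch of the comma workaround: assumes two fields of exactly 9 characters.
line = '   9574865.0E+10  ';

% Insert a comma at the field boundary (after character 9).
withComma = [line(1:9) ',' line(10:18)];

% With an explicit delimiter, textscan no longer has to guess where
% the first number ends.
dat = textscan(withComma, '%f %f', 'Delimiter', ',', 'CollectOutput', true);
disp(dat{1})
```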

score 0 · Accepted Answer

I suspect textscan first trims leading whitespace and only then parses the format string. I think so because if you change your format string from

'%9f%9f'

to

'%6f%9f'

your one-liner suddenly works. Also, if you try

'%9s%9s'

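you will see that the leading whitespace of the first string is removed (so it has 3 characters "too many"), but for some reason the last string keeps its trailing whitespace.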
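
A small check along these lines can be used to inspect the behavior yourself. This is only a sketch: the 'Delimiter' setting is copied from the question, the 'Whitespace' variant is an assumption about what changes the trimming, and the exact output may depend on the MATLAB release.

```matlab
% Sketch for inspecting how textscan splits the fixed-width string.
s = '   9574865.0E+10  ';

% String fields with the options from the question; look at whether the
% leading blanks of the first field survive.
trimmed = textscan(s, '%9s%9s', 'Delimiter', '');
celldisp(trimmed)

% Same format, but with no characters declared as whitespace, so that
% textscan has nothing to skip as padding (assumption to be verified).
kept = textscan(s, '%9s%9s', 'Whitespace', '');
celldisp(kept)
```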

Obviously, this means you would have to know exactly how many digits each of the two numbers has. I guess that is not desirable.

A workaround could look like this:

% Split string on the "dot"
dat = textscan(<your data>,'%9s%9s',...
    'Delimiter'     , '.',...
    'CollectOutput' , true,...
    'ReturnOnError' , false);

% Correct the strings; move the last digit of the first string to the 
% front of the second string, and put the dot back
dat = cellfun(@(x,y) str2double({y(1:end-1),  [y(end) '.' x]}),  dat{1}(:,2), dat{1}(:,1), 'UniformOutput', false);

% Cast to regular array
dat  = cat(1, dat{:})
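
As a variation that does not depend on where the dot falls, the fields can also be cut by position and converted with str2double. This is a different technique from the dot-splitting above, shown as a minimal sketch under the assumption that the data is a char matrix whose rows are all exactly 18 characters (two 9-character fields); `lines`, `width`, `fields` and `vals` are illustrative names.

```matlab
% Sketch: slice fixed-width fields by position instead of by delimiter.
% Assumes each row is exactly 2 fields x 9 characters.
lines = ['   9574865.0E+10  ';
         '  -9574865.0E+10  '];
width = 9;

% Put each 9-character field on its own row, in the order
% row1/field1, row1/field2, row2/field1, ...
fields = reshape(lines.', width, []).';

% str2double ignores the surrounding blanks; reshape back to one row per line.
vals = reshape(str2double(cellstr(fields)), size(lines, 2) / width, []).';
disp(vals)
```

This handles the signed and unsigned cases the same way, but str2double can be slow on large inputs.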

score 0 · Accepted Answer

I had a similar problem and solved it by calling textscan twice, which turned out to be faster than cellfun or str2double and works for any input that MATLAB can interpret with '%f'.

In your case, I would first call textscan with only string conversion specifiers and Whitespace = '' so that the field widths are applied correctly.

data = '   9574865.0E+10  ';
tmp = textscan(data, '%9s %9s', 'Whitespace', '');

Now you need to interleave and append a delimiter that does not interfere with your data, for example ;

tmp = [char(join([tmp{:}],';',2)) ';'];

Now you can apply the correct format to your data by calling textscan again with that delimiter:

result = textscan(tmp, '%f %f', 'Delimiter', ';', 'CollectOutput', true);
format shortE
result{:}

ans =

9.5749e+05   5.0000e+10

Comparing the speed of this approach with str2double:

n = 50000;
data = repmat('   9574865.0E+10  ', n, 1);
% Approach 1 with str2double
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
result1 = str2double([tmp{:}]);
toc

Elapsed time is 2.435376 seconds.

% Approach 2 with double textscan
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
tmp = [char(join([tmp{:}],';',2)) char(59)*ones(n,1)]; % char(59) is just ';'
result2 = cell2mat(textscan(tmp', '%f %f', 'Delimiter', ';', 'CollectOutput', true));
toc

Elapsed time is 0.098833 seconds.
