regex - Reading a text file with regular expressions and storing it in a struct

Question

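Hi everyone,

I am trying to parse a text file into MATLAB. It consists of several blocks (START_BLOCK/END_BLOCK), each containing strings (variables) and values (associated with the preceding variable).

An example looks like this: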

START_BLOCK_EXTREMEWIND
velocity_v1 29.7


velocity_v50    44.8


velocity_vred1  32.67
velocity_vred50 49.28


velocity_ve1    37.9



velocity_ve50   57


velocity_vref   50



END_BLOCK_EXTREMEWIND

Currently, my code is:

fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
C{1} = reshape(C{1},1,numel(C{1}));
C{2} = reshape(C{2},1,numel(C{2}));



startIdx = find(~cellfun(@isempty, regexp(C{1}, 'START_BLOCK_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(C{1}, 'END_BLOCK_', 'match')));
assert(all(size(startIdx) == size(endIdx)))
extract_parameters = @(n)({C{1}{startIdx(n)+1:endIdx(n) - 1}});
parameters = arrayfun(extract_parameters, 1:numel(startIdx), 'UniformOutput', false);

s = cell2struct(cell(size(parameters{1})),parameters{1}(1:numel(parameters{1})),2);

s.velocity_v1 = C{2}(2);
s.velocity_v50 = C{2}(3);
s.velocity_vred1 = C{2}(4);
s.velocity_vred50 = C{2}(5);
s.velocity_ve1 = C{2}(6);
s.velocity_ve50 = C{2}(7);
s.velocity_vref = C{2}(8);

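It works, but it is completely static. I would rather have code that can:

1. check that the blocks exist (already implemented);
2. use the strings as the field names of the structure;
3. use the numbers as the values of each field.

Finally, if there are multiple blocks, the code should iterate over them to build the whole structure. This is my first time working with structs, so please be patient.

Thank you all in advance.

Kindest regards.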

score 1 · Accepted Answer

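It sounds like you want to use dynamic field names. If you have a struct s, a string fieldName that stores the name of a field, and fieldVal that holds the value you want to assign to that field, you can perform the assignment with the following syntax: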

s.(fieldName) = fieldVal;

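The MATLAB documentation on dynamic field names provides more information.

With that in mind, I took a slightly different approach to parsing the text. I iterate through the text with a for loop. Although for loops are sometimes frowned upon in MATLAB (because MATLAB is optimized for vectorized operations), I think in this case it helps keep the code cleaner. Also, my understanding is that if you have to resort to arrayfun anyway, replacing it with a for loop probably won't cost much in performance.

The following code converts each block in the text into a struct with the specified fields and values. The resulting "block" structs are then added to a higher-level "result" struct.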
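As a minimal sketch of the dynamic field name syntax, the snippet below fills a struct from a list of names and values. The variable names `names` and `values` are assumptions made for illustration; the field names and numbers are borrowed from the example block in the question.

```matlab
% Minimal sketch of dynamic field names. "names" and "values" are
% illustrative variables, not part of the original code.
names  = {'velocity_v1', 'velocity_v50', 'velocity_vref'};
values = [29.7, 44.8, 50];

s = struct();                      % start from an empty scalar struct
for k = 1:numel(names)
    s.(names{k}) = values(k);      % equivalent to s.velocity_v1 = 29.7, etc.
end

disp(s.velocity_v50)               % reads back the value stored above
```

The parenthesized expression can be any variable or expression that evaluates to a valid field name, which is what lets the field names come from the file rather than being hard-coded.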

fid = fopen('test_struct.txt','rt');
C = textscan(fid,'%s %f32 %*[^\n]','CollectOutput',true);
fclose(fid);

paramNames = C{1};
paramVals = C{2};

curBlockName = [];
inBlock = 0;
blockCount = 0;

%// Iterate through all of the entries in "paramNames".  Each block will be a
%// new struct that is then added to a high-level "result" struct.
for i=1:length(paramNames)
    curParamName = paramNames{i};
    isStart = ~isempty(regexp(curParamName, 'START_BLOCK_', 'match'));
    isEnd = ~isempty(regexp(curParamName, 'END_BLOCK_', 'match'));

    %// If at the start of a new block, create a new struct with a single
    %// field - the BlockName (as specified by the text after "START_BLOCK_"
    if(isStart)
        assert(inBlock == 0);
        curBlockName = curParamName(length('START_BLOCK_') + 1:end);
        inBlock = 1;
        blockCount = blockCount + 1;
        s = struct('BlockName', curBlockName);          

    %// If at the end of a block, add the struct that we've just populated to
    %// our high-level "result" struct.
    elseif(isEnd)
        assert(inBlock == 1);
        inBlock = 0;
        %// EDIT - storing result in "structure of structures"
        %//  rather than array of structs
        %// s_array(blockCount) = s;
        result.(curBlockName) = s;

    %// Otherwise, assume that we are inside of a block, so add the current
    %// parameter to the struct.
    else
        assert(inBlock == 1);
        s.(curParamName) = paramVals(i);
    end
end

%// Results stored in "result" structure

Hope this answers your question... or at least provides some helpful hints.
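To show how a "structure of structures" like this might be read back, here is a small sketch that walks over every block and every parameter using fieldnames and dynamic field names. The `result` struct is built by hand here as a hypothetical stand-in for the parsed output, so the snippet runs without the text file.

```matlab
% Hypothetical stand-in for the "result" struct produced by the parsing loop.
result = struct();
result.EXTREMEWIND = struct('BlockName', 'EXTREMEWIND', ...
                            'velocity_v1', 29.7, 'velocity_v50', 44.8);

blockNames = fieldnames(result);
for b = 1:numel(blockNames)
    blk = result.(blockNames{b});
    % Every field except the bookkeeping "BlockName" field is a parameter.
    params = setdiff(fieldnames(blk), {'BlockName'}, 'stable');
    for p = 1:numel(params)
        fprintf('%s.%s = %g\n', blockNames{b}, params{p}, blk.(params{p}));
    end
end
```

Individual values can also be accessed directly, for example `result.EXTREMEWIND.velocity_v50`, once the block and parameter names are known.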

score 0 · Accepted Answer

I edited my code today, and now it almost works as intended:

clc, clear all, close all

%Find all row headers
fid = fopen('test_struct.txt','r');
row_headers = textscan(fid,'%s %*[^\n]','CommentStyle','%','CollectOutput',1);
row_headers = row_headers{1};
fclose(fid);

%Find all attributes
fid1 = fopen('test_struct.txt','r');
attributes = textscan(fid1,'%*s %s','CommentStyle','%','CollectOutput',1);
attributes = attributes{1};
fclose(fid1);

%Collect row headers and attributes in a single cell
parameters = [row_headers,attributes];


%Find all the blocks
startIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_START_', 'match')));
endIdx = find(~cellfun(@isempty, regexp(parameters, 'BLOCK_END_', 'match')));
assert(all(size(startIdx) == size(endIdx)))


%Extract fields between BLOCK_START_ and BLOCK_END_
extract_fields = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,1));
struct_fields = arrayfun(extract_fields, 1:numel(startIdx), 'UniformOutput', false);

%Extract attributes between BLOCK_START_ and BLOCK_END_
extract_attributes = @(n)(parameters(startIdx(n)+1:endIdx(n)-1,2));
struct_attributes = arrayfun(extract_attributes, 1:numel(startIdx), 'UniformOutput', false);


for i = 1:numel(struct_attributes)
    s{i} = cell2struct(struct_attributes{i},struct_fields{i},1);
end

Now, at last, I get a cell array of structs that does what I asked for. The only point I would like to improve is:

- Give each structure the name of the respective block.

Does anyone have a useful hint?

Thank you all for your support.

Regards, Francesco
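One possible way to name each structure after its block, sketched here under assumptions, is to combine cell2struct with the dynamic field names suggested in the first answer. The `parameters`, `startIdx`, and `endIdx` values below are hand-written, hypothetical stand-ins for the variables built in the code above, and the output name `named` is also an assumption. The sketch further assumes that every block name is a valid MATLAB field name.

```matlab
% Hypothetical stand-ins for the variables built from the text file.
parameters = {'BLOCK_START_EXTREMEWIND', ''; ...
              'velocity_v1',             '29.7'; ...
              'velocity_v50',            '44.8'; ...
              'BLOCK_END_EXTREMEWIND',   ''};
startIdx = 1;
endIdx   = 4;

named = struct();
for i = 1:numel(startIdx)
    rows = startIdx(i)+1 : endIdx(i)-1;
    % Block name = text that follows the "BLOCK_START_" prefix.
    blockName = regexprep(parameters{startIdx(i), 1}, '^BLOCK_START_', '');
    % Store the block's struct under a field named after the block.
    named.(blockName) = cell2struct(parameters(rows, 2), parameters(rows, 1), 1);
end
```

Because the attributes are read with `%s`, the stored values are character strings such as '29.7'; applying str2double to them would give numeric values if those are needed.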
