
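I have a text file of about 9000 lowercase words. For each letter, I want to find the probability that it is the last letter of a word (letter frequency / number of words).

Here is my first attempt: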

function [ prma ] = problast()
counts = zeros(1,26);
%refer to cell index here to get alphabetic number of char
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match');
f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);
%8960 is the length of the file 
for i =1:8960
 c = substr(ns(i),-1,1);
 num = find(s == c);
 counts(num) = num;
end
prma = counts / 8960;
disp(prma);

This gives me the following error:

Undefined function 'substr' for input arguments of type 'cell'.

Any ideas?


5 Answers


First, you don't need regexp for this problem. A very simple and efficient way to solve it is:

clear;
close;
clc;

counts = zeros(1,26);

f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);

for i = 1:numel(ns{1})
    c = ns{1}{i}(end);                % last character of the i-th word
    counts(c-96) = counts(c-96) + 1;  % 'a' is ASCII 97, so 'a' maps to index 1
end

prma = counts / numel(ns{1});
disp(prma);

For example, if "nouns.txt" contains

paris
london

the output will be:

  Columns 1 through 8

         0         0         0         0         0         0         0         0

  Columns 9 through 16

         0         0         0         0         0    0.5000         0         0

  Columns 17 through 24

         0         0    0.5000         0         0         0         0         0

  Columns 25 through 26

         0         0
answered 2013-04-02T08:29:20.263

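The textscan documentation states that the result is a cell array. If you are not familiar with cell arrays, I strongly recommend reading up on them, but in short, your code should look like this: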

c = substr(ns{i},-1,1);

Note the change from ( ) to { }: this is how the elements of a cell array are accessed.
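To make the difference between the two kinds of indexing concrete, here is a minimal sketch. It assumes the usual shape of a textscan(f,'%s') result, a 1x1 cell holding an N-by-1 cell array of strings, and builds a hypothetical stand-in for ns by hand so that no file is needed. Under that assumption a single word is reached with two levels of braces, ns{1}{i}.

```matlab
% Hypothetical stand-in for the output of textscan(f,'%s'):
% a 1x1 cell whose only element is an N-by-1 cell array of strings.
ns = {{'paris'; 'london'}};

outer = ns(1);        % parentheses return a 1x1 cell, not its contents
words = ns{1};        % braces return the contents: a 2x1 cell of strings
word  = ns{1}{2};     % second word as a char array: 'london'
last  = word(end);    % last character of that word: 'n'

disp(class(outer));   % cell
disp(class(word));    % char
disp(last);           % n
```

Passing ns(i) to a function that expects a string is what produces an "input arguments of type 'cell'" error like the one in the question.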

answered 2013-04-02T07:45:41.473

How about:

f = fopen('nouns.txt');
ns = textscan(f, '%s');
fclose(f);

num = cellfun(@(x)(x(end) - 'a' + 1), ns{:}); %// Convert to 1-26
counts = hist(num, 1:26);                     %// Count occurrences
prob = counts / numel(ns{:})                  %// Compute probabilities
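The same pipeline can be traced on a small in-memory list to see what each step produces. This is only a sketch: the two-word list is a hypothetical stand-in for ns{1}, and accumarray is swapped in for hist as the counting step.

```matlab
% Hypothetical stand-in for ns{1}: an N-by-1 cell array of lowercase words.
words = {'paris'; 'london'};

% Map the last letter of each word to an index in 1..26 ('a' -> 1).
num = cellfun(@(x) x(end) - 'a' + 1, words);   % [19; 14]

% Count how often each index occurs; accumarray is used here instead of hist.
counts = accumarray(num, 1, [26 1])';          % 1x26 row vector of counts

% Divide by the number of words to get probabilities.
prob = counts / numel(words);

disp(find(prob));       % 14 19  (positions of 'n' and 's')
disp(prob([14 19]));    % 0.5000 0.5000
```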
answered 2013-04-02T08:30:15.220

I'm not sure what is causing the problem, but this should do the trick, assuming ns{i} contains your string:

str = ns{i}; 
c = str(end);

If that doesn't work, it shouldn't be too hard to experiment a little and build the variable str from ns.

answered 2013-04-02T08:27:35.747

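Thanks for the suggestions, everyone. I solved the problem on my own, but I went back and tried the last answer and it works well. Here is what I came up with: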

%Keep track of counts
counts = zeros(1,26);
%Refer to this array to get alphabetic numeric value of character
s = regexp('abcdefghijklmnopqrstuvwxyz','.','match');
f = fopen('nouns.txt');
ns = textscan(f,'%s');
fclose(f);
%8960 = length of nouns.txt
for i =1:8960
    %string from ns
    str = ns{1}{i};
    %last character in that string
    c = str(length(str));
    %index in s
    temp = strfind(s,c);
    index = find(not(cellfun('isempty',temp)));
    counts(index) = counts(index)+1;
 end

%Get probabilities
prma = counts / 8960;
disp(prma);

I upvoted everyone for helping me brainstorm.

answered 2013-04-02T08:41:21.627