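Count the occurrences of each symbol and where they appear in a text, word, or line
I have a list of words from many languages, like the one below.
What I am trying to do is count how often each character occurs and where it appears in the text, i.e. its common positions. If it is also possible to count the typical number of syllables, that would help too.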
sommige
disa
بَعْض - ba'th
mi qani - մի քանի
bəzi
batzuk
nyeykі/nyeykaya/nyeykaye/nyeykіya - нейкі/нейкая/нейкае/нейкія
kisu - কিসু
afouhe - بعض
neki
alguns
njakoj - някой
一些
algú/alguns/alguna/algunes
neki
někteří
nogle
berekhey āz - برخی از
een paar
kam - كام
some
iuj
mõned
berekhey āz - برخی از
ilan
joitakin
sommige
certains
algúns
ramdenime - რამდენიმე
einige
peripou - περίπου
keṭelāk - કેટલાક
wasu
kèk
khemeh - כמה
kuch - कुछ
néhány
sumir
beberapa
roinnt
alcuni
ikutsu ka no - いくつかの
kelavu
មួយចំនួន
조금 - jo geum
هەندێک
aliquis
daži
keli
nekoi - некои
misy
beberapa
ഏതാനും
xi
yī xiē - 一些
kaahi - कांही
neki
shwiya - بعض
kehi - केही
enkelte
gari
berekhey āz - برخی از
b'eda - بعضی
kilka
ਕਈ
alguns
câţiva/câteva
некоторые - nekotorыe
some
neki - неки
samahara - සමහර
niektorí
nekaj
algunos
baadhi
några
ilan
yakchand - якчанд
konjam - கொஞ்சம்
yan
konni - కొన్ని
บาง - baang
bazı
dejakі - деякі
chened - چند
ba'zi, qandaydir
một số
rhai
עטלעכע
die
okumbalwa
Here is the current code; sehe made it work with Unicode:
//#define PREFER_BOOST
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <istream>
#include <algorithm>
#include <iterator>
#include <locale>
#include <clocale>  // std::setlocale
#include <cwctype>  // std::iswspace, std::towlower
#ifdef PREFER_BOOST
#include <boost/locale.hpp>
#endif
using namespace std;

std::map<wchar_t, int> letterCount;

struct Counter
{
    void operator()(wchar_t item)
    {
        // Use the wide-character functions here: the narrow std::isspace and
        // std::tolower are undefined for values that do not fit in unsigned char.
        if ( !std::iswspace(item) )
            ++letterCount[static_cast<wchar_t>(std::towlower(item))]; //remove towlower if you want case-sensitive solution!
    }
};

int main()
{
    std::setlocale(LC_ALL, "en_US.UTF-8");
    wifstream input("input.txt");
#ifdef PREFER_BOOST
    boost::locale::generator gen;
    std::locale loc = gen("en_US.UTF-8");
#else
    std::locale loc("en_US.UTF-8");
#endif
    input.imbue(loc);
    wcout.imbue(loc);
    istreambuf_iterator<wchar_t> start(input), end;
    std::for_each(start, end, Counter());
    for (std::map<wchar_t, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
    {
        wcout << it->first << " : " << it->second << endl;
    }
}
Here is my original code:
#include <iostream>
#include <cctype>
#include <fstream>
#include <string>
#include <map>
#include <istream>
#include <vector>
#include <list>
#include <algorithm>
#include <iterator>
#include <locale>
using namespace std;

// ctype facet that classifies everything except ASCII letters as whitespace,
// so formatted extraction (operator>>) skips all non-letters.
struct letter_only: std::ctype<char>
{
    letter_only(): std::ctype<char>(get_table()) {}
    static std::ctype_base::mask const* get_table()
    {
        static std::vector<std::ctype_base::mask>
            rc(std::ctype<char>::table_size, std::ctype_base::space);
        // Mark A-Z and a-z separately: the range 'A'..'z' also contains [ \ ] ^ _ `
        std::fill(&rc['A'], &rc['Z'+1], std::ctype_base::alpha);
        std::fill(&rc['a'], &rc['z'+1], std::ctype_base::alpha);
        return &rc[0];
    }
};

struct Counter
{
    std::map<char, int> letterCount;
    void operator()(char item)
    {
        // The letter_only facet already filters out non-letters, so no whitespace
        // check is needed (comparing a char with the ctype_base::space mask was always true).
        ++letterCount[static_cast<char>(std::tolower(static_cast<unsigned char>(item)))]; //remove tolower if you want case-sensitive solution!
    }
    operator std::map<char, int>() { return letterCount; }
};

int main()
{
    ifstream input;
    input.imbue(std::locale(std::locale(), new letter_only())); //enable reading letters only!
    input.open("T");
    istream_iterator<char> start(input);
    istream_iterator<char> end;
    std::map<char, int> letterCount = std::for_each(start, end, Counter());
    for (std::map<char, int>::iterator it = letterCount.begin(); it != letterCount.end(); ++it)
    {
        cout << it->first << " : " << it->second << endl;
    }
}
An example of what I am trying to get:
к : 10 (2,5) (1,5,8) (2,7) (1,3,5)
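That is: the letter к is found, then its number of occurrences (10), then its positions within each word it appears in (one parenthesized group per word), as described above.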
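
For illustration, here is a minimal sketch of how the per-word positions in that example could be collected. It is not taken from either program above: `decode_utf8`, `CharStats` and `count_positions` are hypothetical names, and the sketch assumes the words are already split into separate strings of well-formed UTF-8. It counts Unicode code points rather than user-perceived characters, so combining marks (for example the Arabic vowel signs in the list) would be counted as symbols of their own. Positions are 1-based.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Assumption: the input is well-formed UTF-8; no validation is done.
std::u32string decode_utf8(const std::string& bytes)
{
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size(); ) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        int extra = 0;          // number of continuation bytes that follow
        char32_t cp = lead;     // single-byte (ASCII) case
        if      (lead >= 0xF0) { extra = 3; cp = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; cp = lead & 0x1F; }
        ++i;
        for (int k = 0; k < extra && i < bytes.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Hypothetical result type for one symbol.
struct CharStats {
    int count = 0;                            // total occurrences over all words
    std::vector<std::vector<int>> positions;  // one group per word containing the symbol
};

// For every code point, record its total count and its 1-based positions
// inside each word in which it occurs.
std::map<char32_t, CharStats> count_positions(const std::vector<std::string>& words)
{
    std::map<char32_t, CharStats> stats;
    for (const std::string& word : words) {
        std::map<char32_t, std::vector<int>> in_word;  // positions within this word only
        const std::u32string cps = decode_utf8(word);
        for (std::size_t i = 0; i < cps.size(); ++i)
            in_word[cps[i]].push_back(static_cast<int>(i) + 1);
        for (const auto& entry : in_word) {
            CharStats& s = stats[entry.first];
            s.count += static_cast<int>(entry.second.size());
            s.positions.push_back(entry.second);
        }
    }
    return stats;
}
```

Calling `count_positions({"kilka", "keli"})` gives the entry for `k` a count of 3 with the groups (1,4) and (1), which has the same shape as the `к : 10 (2,5) ...` line above. Case folding and splitting the input into words are left out on purpose, since both depend on the languages involved.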