Does anyone recognize this format (see the paste at the bottom)? It comes from the Répertoire de vedettes-matière (RVM). It is neither of these:
- https://metacpan.org/pod/Catmandu::Exporter::MARC::MiJ
- http://search.cpan.org/~cfouts/MARC-File-JSON-0.003/lib/MARC/File/JSON.pm
I can program in Perl. Also posted as https://github.com/LibreCat/Catmandu-MARC/issues/88.
I could hack it with just JSON::XS, but I don't know how to handle this strange accent encoding (a few sample lines out of 325):
{grave}e
{ring}Z
{ringb}h
{ringb}s
{rlig}a
{rlig}A
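One possible reading, assuming these are MARCMaker-style mnemonics (as the addendum further down suggests): each `{name}` stands for a diacritic that is written before its base letter, whereas Unicode puts the combining mark after the base letter. The sketch below swaps the order and then composes with NFC. The `%COMBINING` table and the `maker_to_unicode` / `_attach` names are my own illustration, and the table is deliberately partial; mnemonics it does not know (such as `{rlig}`) are left untouched.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Partial, assumed mapping from mnemonic names to Unicode combining marks.
my %COMBINING = (
    grave => "\x{0300}",
    acute => "\x{0301}",
    circ  => "\x{0302}",
    tilde => "\x{0303}",
    uml   => "\x{0308}",
    ring  => "\x{030A}",
    dotb  => "\x{0323}",
    ringb => "\x{0325}",
    cedil => "\x{0327}",
);

# Move a run of mnemonics behind the base character that follows it.
sub _attach {
    my ($run, $base) = @_;
    my @names = $run =~ /\{(\w+)\}/g;
    # Leave the text unchanged if any mnemonic in the run is unknown.
    return $run . $base if grep { !exists $COMBINING{$_} } @names;
    return $base . join '', map { $COMBINING{$_} } @names;
}

sub maker_to_unicode {
    my ($text) = @_;
    $text =~ s/((?:\{\w+\})+)(.)/_attach($1, $2)/ges;
    # Compose base + mark into a single code point where one exists.
    return NFC($text);
}
```

With this, `maker_to_unicode('Alg{grave}ebres')` gives `Algèbres` as a Perl character string; encode it (for example as UTF-8) when printing.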
Here is the strange MARC JSON:
{
"rows" : [
{
"RecordNumber" : "1",
"Tag" : "LDR",
"Indicators" : "",
"Content" : "00533nz 2200205n 4500"
}
,
{
"RecordNumber" : "1",
"Tag" : "001",
"Indicators" : "\" \"",
"Content" : "201-0000001"
}
,
{
"RecordNumber" : "1",
"Tag" : "005",
"Indicators" : "\" \"",
"Content" : "20121025110000.0"
}
,
{
"RecordNumber" : "1",
"Tag" : "008",
"Indicators" : "\" \"",
"Content" : "790704\\nfanvnnbabn\\\\\\\\\\\\\\\\\\\\\\b\\ana\\\\\\\\\\\\"
}
,
{
"RecordNumber" : "1",
"Tag" : "016",
"Indicators" : "\\\\",
"Content" : "$a0509B3366"
}
,
{
"RecordNumber" : "1",
"Tag" : "035",
"Indicators" : "\\\\",
"Content" : "$a(ISM)8013850"
}
,
{
"RecordNumber" : "1",
"Tag" : "035",
"Indicators" : "9\\",
"Content" : "$a201-0000001"
}
,
{
"RecordNumber" : "1",
"Tag" : "040",
"Indicators" : "\\\\",
"Content" : "$aCaQQLa$bfre"
}
,
{
"RecordNumber" : "1",
"Tag" : "150",
"Indicators" : "\\\\",
"Content" : "$aAlg{grave}ebres de Von Neumann"
}
,
{
"RecordNumber" : "1",
"Tag" : "450",
"Indicators" : "\\\\",
"Content" : "$wnne$aVon Neumann, Alg{grave}ebres de"
}
,
{
"RecordNumber" : "1",
"Tag" : "450",
"Indicators" : "\\\\",
"Content" : "$aW*-alg{grave}ebres"
}
,
{
"RecordNumber" : "1",
"Tag" : "550",
"Indicators" : "\\\\",
"Content" : "$wg$aC*-alg{grave}ebres"
}
,
{
"RecordNumber" : "1",
"Tag" : "550",
"Indicators" : "\\\\",
"Content" : "$wg$aEspace de Hilbert"
}
,
{
"RecordNumber" : "1",
"Tag" : "697",
"Indicators" : "\\\\",
"Content" : "$amm."
}
,
{
"RecordNumber" : "1",
"Tag" : "750",
"Indicators" : "\\7",
"Content" : "$aVon Neumann, Alg{grave}ebres de$2ram"
}
,
{
"RecordNumber" : "1",
"Tag" : "750",
"Indicators" : "\\0",
"Content" : "$aVon Neumann algebras"
}
]
}
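For reference, a minimal sketch of how the flat `rows` list above could be regrouped into records. It uses the core JSON::PP module so it runs without extra installs; JSON::XS offers the same `decode_json` call. The output structure and the `rows_to_records` name are hypothetical, and the code assumes that `LDR` is the leader, that tags below 010 are control fields, that a backslash stands for a blank, and that `$` introduces a one-character subfield code.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Group the flat "rows" list into one structure per RecordNumber.
# Expects UTF-8 encoded JSON text; returns an array ref of records
# in the order their RecordNumber first appears.
sub rows_to_records {
    my ($json_text) = @_;
    my $rows = decode_json($json_text)->{rows};
    my (%by_number, @order);
    for my $row (@$rows) {
        my $num = $row->{RecordNumber};
        if (!$by_number{$num}) {
            $by_number{$num} = { leader => undef, fields => [] };
            push @order, $num;
        }
        my $rec = $by_number{$num};
        my ($tag, $content) = @{$row}{qw(Tag Content)};
        if ($tag eq 'LDR') {
            $rec->{leader} = $content;
        }
        elsif ($tag lt '010') {
            # Control field: no indicators, no subfields (assumption).
            (my $value = $content) =~ tr/\\/ /;
            push @{ $rec->{fields} }, { tag => $tag, value => $value };
        }
        else {
            # Data field: backslash indicators become blanks (assumption).
            (my $ind = $row->{Indicators}) =~ tr/\\/ /;
            my @subfields = map  { [ substr($_, 0, 1), substr($_, 1) ] }
                            grep { length } split /\$/, $content;
            push @{ $rec->{fields} },
                { tag => $tag, ind => $ind, subfields => \@subfields };
        }
    }
    return [ map { $by_number{$_} } @order ];
}
```

Each data field ends up as a tag, a two-character indicator string, and a list of `[code, value]` pairs, which should be straightforward to feed into whatever MARC library is used afterwards.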
Added: this accent encoding comes from MARCmkr. I used the following:
use MARC::File::MARCMaker; # https://metacpan.org/pod/MARC::File::MARCMaker
# for some reason can't be found by module name, so use:
# cpanm http://www.cpan.org/authors/id/E/EI/EIJABB/MARC-File-MARCMaker-0.05.tar.gz
my $marc_charset = MARC::File::MARCMaker::usmarc_default();
$content = MARC::File::MARCMaker::_maker2char ($content, $marc_charset);
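However, when I tested it on this text https://github.com/gmcharlt/marc-perl/blob/e8e0ecc92946d6dcb3c2270706041a30eff0f68d/marc-marcmaker/t/marcmaker.t#L92 it merely converted the accents/ligatures to XML entities. I tried opening the converted text in a browser: some entities were not interpreted, and none of them accented the following character. So I suppose I now need some "XML to Unicode" module to finish the conversion.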
This a test of diacritics like the uppercase Polish L in
Ł´od´z, the uppercase Scandinavia O in &Ostrok;st, the
uppercase D with crossbar in Đuro, the uppercase Icelandic
thorn in Þann, the uppercase digraph AE in Ægir, the
uppercase digraph OE in Œuvres, the soft sign in
rech&softsign;, the middle dot in col·lecci´o, the musical
flat in F♭, the patent mark in Frizbee®, the plus or minus
sign in ±54%, the uppercase O-hook in B&Ohorn;, the
uppercase U-hook in X&Uhorn;A, the alif in
mas&mlrhring;alah, the ayn in &mllhring;arab, the lowercase
Polish l in Włocław, the lowercase Scandinavian o in
K&ostrok;benhavn, the lowercase d with crossbar in đavola,
the lowercase Icelandic thorn in þann, the lowercase digraph
ae in være, the lowercase digraph oe in cœur, the lowercase
hardsign in s&hardsign;ezd, the Turkish dotless i in masalı,
the British pound sign in £5.95, the lowercase eth in
verður, the lowercase o-hook (with pseudo question mark) in
S&hooka;&ohorn;, the lowercase u-hook in T&uhorn; D&uhorn;c,
the pseudo question mark in c&hooka;ui, the grave accent in
tr`es, the acute accent in d´esir´ee, the circumflex in
cˆote, the tilde in ma˜nana, the macron in T¯okyo, the breve
in russki˘i, the dot above in ˙zaba, the dieresis (umlaut)
in L¨owenbr¨au, the caron (hachek) in ˇcrny, the circle
above (angstrom) in ˚arbok, the ligature first and second
halves in d&llig;i&rlig;ad&llig;i&rlig;a, the high comma off
center in rozdel&rcommaa;ovac, the double acute in
id˝oszaki, the candrabindu (breve with dot above) in
Ali&candra;iev, the cedilla in ¸ca va comme ¸ca, the right
hook in viet˛a, the dot below in te&dotb;da, the double dot
below in &under;k&under;hu&dbldotb;tbah, the circle below in
Sa&dotb;msk&ringb;rta, the double underscore in
&dblunder;Ghulam, the left hook in Lech Wał&commab;esa, the
right cedilla (comma below) in khŗong, the upadhmaniya (half
circle below) in &breveb;humantuˇs, double tilde, first and
second halves in &ldbltil;n&rdbltil;galan, high comma
(centered) in g&commaa;eotermika.