I'm developing an anti-spam bot, and it has trouble decoding homoglyphs.
Here is an example message:
ɪ ᴄᴀɴ'ᴛ ꜱᴛᴏᴘ ꜱʜᴀʀɪɴɢ ᴛʜᴇ ɢᴏᴏᴅ ɴᴇᴡꜱ ᴀʙᴏᴜᴛ ꜰᴏʀᴇx ᴍᴀʀᴋᴇᴛ ᴄᴏᴍᴘᴀɴʏ.
ᴡʜᴇɴ ɪ ꜰɪʀꜱᴛ ʜᴇᴀʀᴅ ɪᴛ, ɪ ᴡᴀꜱ ᴀꜰʀᴀɪᴅ ʙᴜᴛ ʟᴀᴛᴇʀ ꜱᴜᴍᴍᴏɴᴇᴅ ᴄᴏᴜʀᴀɢᴇ ᴀɴᴅ ᴍᴀᴅᴇ ᴀ ᴍᴏᴠᴇ ᴡɪᴛʜ $200
ɪ ꜱᴛɪʟʟ ᴄᴀɴ'ᴛ ʙᴇʟɪᴇᴠᴇ ᴛʜᴇ ᴘʟᴀᴛꜰᴏʀᴍ ɪꜱ ꜱo ʀᴇᴀʟ ᴜɴᴛɪʟ ɪ ʀᴇᴄᴇɪᴠᴇᴅ $3,100 IN 48HOURS of trade ᴀꜱ ᴍʏ ᴘʀᴏꜰɪᴛ
ᴛʜɪꜱ ɪꜱ ʏᴏᴜʀ ᴍᴏᴍᴇɴᴛ ᴏꜰ ʀᴇᴅᴇᴍᴘᴛɪᴏɴ ᴊᴜꜱᴛ ᴏɴᴇ ᴄʟɪᴄᴋ ᴀᴡᴀʏ ꜰʀᴏᴍ ɢʀᴇᴀᴛɴᴇꜱꜱ, ᴍᴀᴋᴇ ᴀ ᴍᴏᴠᴇ ɴᴏᴡ ʟᴇᴛ ʜɪꜱᴛᴏʀʏ ʙᴇ ᴍᴀᴅᴇ
ʜᴇʀᴇ ɪꜱ ᴛʜᴇ ʟɪɴᴋ ʙᴇʟᴏᴡ
I have tried several solutions, but none of them seems to do the job correctly. This is the code I currently have:
<?php
$text = "ɪ ᴄᴀɴ'ᴛ ꜱᴛᴏᴘ ꜱʜᴀʀɪɴɢ ᴛʜᴇ ɢᴏᴏᴅ ɴᴇᴡꜱ ᴀʙᴏᴜᴛ ꜰᴏʀᴇx ᴍᴀʀᴋᴇᴛ ᴄᴏᴍᴘᴀɴʏ.
ᴡʜᴇɴ ɪ ꜰɪʀꜱᴛ ʜᴇᴀʀᴅ ɪᴛ, ɪ ᴡᴀꜱ ᴀꜰʀᴀɪᴅ ʙᴜᴛ ʟᴀᴛᴇʀ ꜱᴜᴍᴍᴏɴᴇᴅ ᴄᴏᴜʀᴀɢᴇ ᴀɴᴅ ᴍᴀᴅᴇ ᴀ ᴍᴏᴠᴇ ᴡɪᴛʜ $200
ɪ ꜱᴛɪʟʟ ᴄᴀɴ'ᴛ ʙᴇʟɪᴇᴠᴇ ᴛʜᴇ ᴘʟᴀᴛꜰᴏʀᴍ ɪꜱ ꜱo ʀᴇᴀʟ ᴜɴᴛɪʟ ɪ ʀᴇᴄᴇɪᴠᴇᴅ $3,100 IN 48HOURS of trade ᴀꜱ ᴍʏ ᴘʀᴏꜰɪᴛ
ᴛʜɪꜱ ɪꜱ ʏᴏᴜʀ ᴍᴏᴍᴇɴᴛ ᴏꜰ ʀᴇᴅᴇᴍᴘᴛɪᴏɴ ᴊᴜꜱᴛ ᴏɴᴇ ᴄʟɪᴄᴋ ᴀᴡᴀʏ ꜰʀᴏᴍ ɢʀᴇᴀᴛɴᴇꜱꜱ, ᴍᴀᴋᴇ ᴀ ᴍᴏᴠᴇ ɴᴏᴡ ʟᴇᴛ ʜɪꜱᴛᴏʀʏ ʙᴇ ᴍᴀᴅᴇ
ʜᴇʀᴇ ɪꜱ ᴛʜᴇ ʟɪɴᴋ ʙᴇʟᴏᴡ
";
$homoglyphes = array(
" " => "\s",
"A" => "AꭺᗅꓮᎪÅÁÀᴀÂÃАAÄΑ",
"B" => "ᗷßꞴBΒвᛒꓐВᏼℬBβʙᏴ",
"C" => "ⲤCℭꓚᏟℂCⅭСϹ",
"D" => "ᗞĐᗪĎꓓDⅅⅮᴅDᎠꭰ",
"E" => "ÈĚÉᴇЕĒℰ⋿ĔΕËꭼĖEEĘꓰÊᎬⴹ",
"F" => "FꓝᖴꞘℱFϜ",
"G" => "GԍɢᏀնꮐᏻꓖԌGᏳ",
"H" => "ℍⲎꓧһнᎻℋꮋHᕼʜΗHНℌ",
"I" => "ιⅠiᛁꭵاӏΙІlᎥ˛⍳IιіꙇⅰɪīiͺɩℹⅈıI",
"J" => "ᎫᴊJͿյJꭻЈᒍꓙꞲ",
"K" => "КᛕꓗKKⲔᏦΚK",
"L" => "ιLⳐLlⳑʟⅬꓡᏞᒪℒꮮⅼ",
"M" => "ᎷℳΜϺⅯᗰМMꓟᛖⲘM",
"N" => "NℕⲚNɴꓠΝ",
"O" => "οΟoՕО0OoOо",
"P" => "ᏢꮲℙРᑭΡꓑᴩⲢᴘPP",
"Q" => "QℚႳႭⵕQ",
"R" => "ꭱRℝꮢᖇℛᚱℜƦRꓣᎡᏒʀ",
"S" => "ᏕႽЅSSꓢssᏚՏѕ",
"T" => "⟙ᎢΤтᴛⲦτꭲTT⊤Тꓔ",
"U" => "ՍUUԱ⋃uμυሀ∪ꓴᑌ",
"V" => "ꓦᏙѴⅤVꛟV۷٧ⴸᐯ",
"W" => "ԜWwꓪWwᏔᎳ",
"X" => "xꞳXꓫⅩΧ╳ᚷXⲬⵝχХ᙭",
"Y" => "ᎩʏyҮϒγᎽꓬyуYYУⲨΥ",
"Z" => "ℨℤᏃΖꓜZZ",
"a" => "ã⍺αǎɑâаaáạäàăåȧaą",
"b" => "ЬḇƅᏏᖯḅdḃlɓƄbbʙ",
"c" => "ᴄⲥꮯᏟϲсⅭcⅽc",
"d" => "ꓒԁᏧɗḏďddɖlᑯⅾḓժḑḋđcḍbⅆ",
"e" => "ꬲ℮êėⅇȩҽēḛĕɇẹℯęéeëèеěce",
"f" => "ꞙƒfẝfքꬵſϝḟꜰ",
"g" => "ɡᶃɢǧgqģgնցġℊĝǥƍğǵ",
"h" => "ħȟհᏂⱨẖһlḥḩℎɦhhĥḧḣḫ",
"i" => "ιⅠiᛁɨꭵاӏ1lȋᎥ˛⍳ιіꙇⅰɪỉīĭiͺíɩℹịǐïⅈıIì",
"j" => "jϳյɉʝјⅉj",
"k" => "ḳḵkκⱪkķᴋ",
"m" => "ᴍmmṁⅿḿṃɱrn",
"n" => "nñrռmꞑṅńņǹɴnṇňṉո",
"o" => "ᴏ",
"p" => "ƥṗᏢṕpρ⍴ƿϱⲣPpр",
"q" => "gգqʠqႭԛႳզ",
"r" => "ṛrᴦꭈɼṙṟꭇȑԻгɾŕɍȓⲅŗrřʀɽꮁ",
"s" => "ꜱႽЅṣƽŝṡSʂśSssᏚѕꮪșšՏ",
"t" => "ṫᎢțƫτţtṭtŧ",
"u" => "ůūǔùUꭎuՍUųűưꞟʉսûԱú⋃uũȗụüυμʋŭȕᴜꭒ",
"v" => "⋁ѵѴvvⱱνטⱴᴠ∨ⅴṽꮩṿᶌ",
"w" => "ẅẘɯWvwẇẁẉWwẃԝꮃաⱳᎳŵᴡѡ",
"x" => "x⤬ᕽⅩᕁ᙮х×⤫ⅹχx⨯",
"y" => "ʏɣyҮŷγƴỿℽɏꭚẏყỵүȳyýÿуYYᶌΥ",
"z" => "ꮓźzᏃʐƶżⱬẕᴢẓz"
);
foreach ($homoglyphes as $letter => $glyphes) {
    $tab = mb_str_split($glyphes);
    $text = str_replace($tab, $letter, $text);
}
echo $text;
?>
The output is wrong:
I dAN'T sToP sHARING THE GooD NEws ABouT foREx nARkET donPANy.
wHEN I fIRsT HEARD IT, I wAs AfRAID BuT LATER sunnoNED douRAGE AND nADE A nowE wITH $2OO
I sTILL dAN'T BELIEwE THE PLATfoRn Is sO REAL uNTIL I REdEIwED $3,iOO IN 48HOuRs Of tnade As ny PRofIT
THIs Is youR nonENT of REDEnPTIoN JusT oNE dLIdk AwAy fRon GREATNEss, nAkE A nowE Now LET HIsToRy BE nADE
HERE Is THE LINk BELow
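I don't know why. The only way I have found to get a correct result is Tesseract OCR (optical character recognition), but that requires rendering the text to an image first, which is not an option for a bot that processes hundreds of messages per second.
Any help would be greatly appreciated. Thank you.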
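
One possible explanation, offered as a sketch rather than a definitive fix: the loop calls str_replace once per letter, so a character produced by an earlier entry can be rewritten by a later one. For example, ᴄ becomes c through the "c" entry, and that c then becomes d because the "d" list contains a plain c. The same chain turns ᴍ into m and then n, and ᴠ into v and then w. The sketch below assumes the table is first inverted into a one-to-one map and then applied in a single pass with strtr. The helper names buildHomoglyphMap and normalizeHomoglyphs, and the reduced table, are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper (not from the original post): invert a
// "letter => look-alikes" table into a flat "glyph => letter" map.
function buildHomoglyphMap(array $table): array
{
    $map = [];
    foreach ($table as $letter => $glyphs) {
        foreach (mb_str_split($glyphs, 1, 'UTF-8') as $glyph) {
            // Skip single-byte (ASCII) entries so genuine letters are left alone,
            // and keep only the first target seen for each glyph.
            if (strlen($glyph) > 1 && !isset($map[$glyph])) {
                $map[$glyph] = (string) $letter;
            }
        }
    }
    return $map;
}

// Hypothetical helper: apply the whole map in one pass.
function normalizeHomoglyphs(string $text, array $map): string
{
    // With an array argument, strtr() does not search text it has already replaced.
    return strtr($text, $map);
}

// Reduced table for illustration; the ASCII "c", "m" and "v" mirror the original lists.
$table = [
    'a' => 'ᴀ', 'c' => 'ᴄ', 'd' => 'ᴅc', 'e' => 'ᴇ', 'i' => 'ɪ',
    'm' => 'ᴍ', 'n' => 'ɴm', 'o' => 'ᴏ', 'p' => 'ᴘ', 's' => 'ꜱ',
    't' => 'ᴛ', 'v' => 'ᴠ', 'w' => 'ᴡv',
];
$map = buildHomoglyphMap($table);

echo normalizeHomoglyphs("ɪ ᴄᴀɴ'ᴛ ꜱᴛᴏᴘ", $map), "\n";
echo normalizeHomoglyphs("ᴍᴀᴅᴇ ᴀ ᴍᴏᴠᴇ", $map), "\n";
```

Because every glyph has exactly one target and replaced text is not searched again, the order of the table entries no longer matters. Folding the result to a single case afterwards (for example with strtolower) would be a separate step, and the full table would still need to be reviewed for glyphs listed under more than one letter.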