The page I'm requesting with cURL uses the Shift_JIS charset, and its lang is set to jp.
function jp_new ($jp_text)
{
    // Begin Curl
    $session = curl_init();
    //$url1 = "http://nihongo.j-talk.com/index.php";
    $url1 = "http://www.romaji.org/index.php";
    $parameters = '&text='.urlencode($jp_text).'&save=convert+text+to+Romaji';
    $header = array(
        "Accept-Language: jp",
        "Accept-Charset: Shift_JIS");
    // $header[] = "Accept-Language: ja";
    //$parameters = 'kanji='.urlencode($jp_text).'&converter=spaced&Submit=Translate+Now';
    curl_setopt($session, CURLOPT_HTTPHEADER, $header);
    curl_setopt($session, CURLOPT_POSTFIELDS, $parameters);
    curl_setopt($session, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($session, CURLOPT_POST, true);
    curl_setopt($session, CURLOPT_URL, $url1);
    $jp_page = curl_exec($session);
    curl_close($session);
    //$pattern = "/romaji'>(.+?)</s";
    $pattern = "/color=\"red\">(.+?)</s";
    preg_match_all ($pattern, $jp_page, $result_ro);
    return $result_ro[1];
}
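For reference, this is how the extraction step behaves on its own. It runs the same pattern against a small hypothetical HTML fragment (the real romaji.org markup may differ). Note that $result_ro[1] is an array of every captured string, so jp_new() returns an array rather than a single string.

```php
<?php
// Hypothetical HTML fragment standing in for the romaji.org response;
// the real page markup may be different.
$html = "<p><font color=\"red\">inu neko</font></p>\n"
      . "<p>other text</p>";

// Same pattern as in jp_new(): capture everything after color="red">
// up to the next '<'. The /s modifier lets '.' also match newlines.
$pattern = "/color=\"red\">(.+?)</s";

$count = preg_match_all($pattern, $html, $matches);

// $matches[0] holds the full matches, $matches[1] the captured group,
// which is the array jp_new() returns.
echo $count, "\n";     // 1
print_r($matches[1]);  // Array ( [0] => inu neko )
```

If the real page has more than one color="red" element, the function returns all of them, so checking count($result_ro[1]) is a cheap way to confirm the "only one match" assumption.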
I get a result, but it's garbled and not the same result I get when I submit the form on romaji.org manually. When $jp_text = "犬猫", the result is "kou (kigou)(kigou) shin i".
I'm sure the preg_match_all call finds only one match and that it finds it in the right place, so it looks like some sort of encoding problem, but I don't really know.
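One way to test the encoding theory: urlencode() percent-encodes whatever bytes the string holds, so a UTF-8 string and its Shift_JIS equivalent go over the wire as different bytes. The sketch below assumes the mbstring extension is available and that the PHP file itself is saved as UTF-8; the variable names are illustrative only.

```php
<?php
// Assumes the mbstring extension is enabled and this file is saved as UTF-8.
$utf8 = "犬猫";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

// UTF-8 uses 3 bytes per character here, Shift_JIS uses 2.
echo strlen($utf8), " ", strlen($sjis), "\n"; // 6 4
echo bin2hex($utf8), "\n";                    // e78aace78cab
echo bin2hex($sjis), "\n";

// What urlencode() would actually send in each case.
echo urlencode($utf8), "\n";                  // %E7%8A%AC%E7%8C%AB
echo urlencode($sjis), "\n";
```

If the form on romaji.org decodes the POST body as Shift_JIS (an assumption based on the page charset, not something confirmed here), the UTF-8 bytes would be read as different characters, which would be consistent with a garbled result like the one above.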
A similar cURL request worked for "http://nihongo.j-talk.com/index.php" (the commented-out variables), but it seems they have banned me, so I need to adapt it to work with the new URL, romaji.org.
UPDATE: The charset of the romaji.org page is Shift_JIS and my page is UTF-8, so I added an Accept-Charset header via CURLOPT_HTTPHEADER, as shown in the code above. The output changed only slightly (one of the bracketed words disappeared), but the result is still garbled.
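Accept-Charset only tells the server which charsets the client would like in the response; it does not re-encode the request body, which would fit the header having little effect. When the form is submitted manually, a browser encodes the fields in the page's charset by default (Shift_JIS here). Below is a minimal sketch that mimics that by converting the text to Shift_JIS before encoding it, assuming romaji.org decodes the form as Shift_JIS. The function names build_romaji_post_body() and jp_new_sjis() are hypothetical; the URL, field names, and regex come from the original code. It also uses the language tag ja (the ISO 639-1 code for Japanese, as in the commented-out header) instead of jp.

```php
<?php
// Hypothetical helper: builds the POST body with the text converted from
// UTF-8 to Shift_JIS. Field names 'text' and 'save' come from the original.
function build_romaji_post_body(string $jp_text): string
{
    $sjis_text = mb_convert_encoding($jp_text, 'SJIS', 'UTF-8');
    // http_build_query() urlencodes the raw bytes, so the server receives
    // Shift_JIS percent-escapes instead of UTF-8 ones.
    return http_build_query([
        'text' => $sjis_text,
        'save' => 'convert text to Romaji',
    ]);
}

// Hypothetical variant of jp_new() with the encoding conversion applied.
function jp_new_sjis(string $jp_text): array
{
    $session = curl_init('http://www.romaji.org/index.php');
    curl_setopt_array($session, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => build_romaji_post_body($jp_text),
        CURLOPT_HTTPHEADER     => ['Accept-Language: ja'],
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $page = curl_exec($session);
    curl_close($session);
    if ($page === false) {
        return []; // request failed
    }
    // Convert the (assumed) Shift_JIS response to UTF-8 before matching.
    $page = mb_convert_encoding($page, 'UTF-8', 'SJIS');
    preg_match_all('/color="red">(.+?)</s', $page, $result_ro);
    return $result_ro[1] ?? [];
}
```

Usage would be print_r(jp_new_sjis("犬猫")); comparing that output with a manual form submission is the quickest way to confirm or rule out the encoding explanation.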