Alright, thanks for the input. I did "my homework" based on it and got the following results, working with a 50 MB sample of actual CSV data:
First, iterating over the file line by line in PHP:
$in = fopen("a.txt", "r");
$out = fopen("p.txt", "w+");
$start = microtime(true);
// convert line by line, so only one line is held in memory at a time
while (($line = fgets($in)) !== false) {
    $converted = iconv("UTF-8", "EUC-JP//TRANSLIT", $line);
    fwrite($out, $converted);
}
$elapsed = microtime(true) - $start;
fclose($in);
fclose($out);
echo "<br>Iconv took $elapsed seconds\r\n";
Iconv took 2.2817220687866 seconds
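As a side note, PHP can also do a single streaming pass without the manual loop by attaching an iconv stream filter to the output handle. I haven't benchmarked this variant; the output name p2.txt is just an example, and whether the //TRANSLIT suffix is honoured inside the filter name is an assumption on my part:
$in = fopen("a.txt", "r");
$out = fopen("p2.txt", "w");
// data is converted chunk by chunk as it is written to $out
// (//TRANSLIT inside the filter name is an assumption, not verified)
stream_filter_append($out, "convert.iconv.UTF-8/EUC-JP//TRANSLIT", STREAM_FILTER_WRITE);
// copy input to output in one pass, no per-line loop
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);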
That's not so bad, I guess, so I tried the same approach in bash, hoping it would iterate over each line instead of loading the whole file (which may not be exactly what happens, as I understand from Lajos Veres's answer). Indeed, this method was anything but efficient: the CPU was under heavy load the whole time. Also, the output file came out smaller than the other two; it looked the same at a quick glance, so I must have made a mistake in the bash script (most likely the unquoted echo $line mangling whitespace; the version below avoids that), but that shouldn't have such an effect on performance anyway:
#!/bin/bash
# convert line by line, spawning one iconv process per line
time while IFS= read -r line
do
    printf '%s\n' "$line" | iconv -f utf-8 -t EUC-JP//TRANSLIT
done < a.txt > b.txt
real 9m40.535s
user 2m2.191s
sys 3m18.993s
Most of that time presumably goes into spawning a separate iconv process for every single line. And then the classic approach, which I would have expected to hog the memory; checking the CPU/memory usage, though, it didn't seem to take any more memory than the other approaches (iconv apparently converts in buffered chunks rather than reading the whole file into memory), which makes it the winner:
#!/bin/bash
# single pass over the whole file with one iconv process
time iconv -f utf-8 -t EUC-JP//TRANSLIT a.txt -o b2.txt
real 0m0.256s
user 0m0.195s
sys 0m0.060s
I'll try to get a bigger sample file to test the two more efficient methods and make sure the memory usage doesn't become significant, but the result seems obvious enough to assume that a single pass over the whole file in bash is the most efficient approach (I didn't try the equivalent in PHP, as I don't believe loading an entire file into a string/array in PHP is ever a good idea).
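If the conversion has to be triggered from PHP anyway, the winning single-pass command can simply be called from there, so the data never passes through PHP at all. A minimal sketch, assuming the external iconv binary is on the PATH and reusing the file names from above:
$src = escapeshellarg("a.txt");
$dst = escapeshellarg("b2.txt");
// let the external iconv do the single pass; PHP never holds the file contents
exec("iconv -f utf-8 -t EUC-JP//TRANSLIT $src -o $dst", $output, $status);
if ($status !== 0) {
    echo "iconv failed with exit code $status\n";
}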