53

我有一堆不是 UTF-8 编码的文件,我正在将一个站点转换为 UTF-8 编码。

我正在对要以 UTF-8 保存的文件使用简单的脚本,但这些文件以旧编码保存:

header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
$fpath = "folder";
$d = dir($fpath);
while (False !== ($a = $d->read()))
{
    if ($a != '.' and $a != '..')
    {
        $npath = $fpath . '/' . $a;

        $data = file_get_contents($npath);

        file_put_contents('tempfolder/' . $a, $data);
    }
}

如何以 UTF-8 编码保存文件?

4

10 回答 10

88

添加物料清单:UTF-8

file_put_contents($myFile, "\xEF\xBB\xBF".  $content); 
于 2012-01-28T19:24:09.707 回答
52

file_get_contents()file_put_contents()不会神奇地转换编码。

您必须显式转换字符串;例如使用iconv()mb_convert_encoding()

试试这个:

$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/' . $a, $data);

或者,使用 PHP 的流过滤器:

$fd = fopen($file, 'r');
stream_filter_append($fd, 'convert.iconv.UTF-8/OLD-ENCODING');
stream_copy_to_stream($fd, fopen($output, 'w'));
于 2011-01-29T21:09:12.093 回答
27
<?php
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }
?>
于 2013-02-28T21:47:36.690 回答
5

Iconv来救援。

于 2011-01-29T21:09:21.573 回答
3

在 Unix/Linux 上,可以选择使用一个简单的 shell 命令来转换给定目录中的所有文件:

recode L1..UTF8 dir/*

它也可以通过 PHP 的exec()启动。

于 2011-01-30T06:01:48.270 回答
2
//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));

我从Cool那里得到了这条线

于 2016-01-26T16:18:34.230 回答
0

如果你想递归地使用 recode 并过滤类型,试试这个:

find . -name "*.html" -exec recode L1..UTF8 {} \;
于 2013-02-12T16:17:42.993 回答
0

这是一个非常有用的问题。我认为我在 Windows 10 PHP 7 上的解决方案对于还有一些 UTF-8 转换问题的人来说非常有用。

这是我的步骤。调用以下函数的 PHP 脚本,在utfsave.php中本身必须具有 UTF-8 编码,这可以通过UltraEdit上的转换轻松完成。

utfsave.php文件中,我们定义了一个调用 PHP fopen($filename, " wb ")的函数,即它以w写入模式打开,尤其是b二进制模式打开。

<?php
//
//  UTF-8 编码:
//
// fnc001: save string as a file in UTF-8:
// The resulting file is UTF-8 only if $strContent is,
// with French accents, Chinese ideograms, etc.
//
function entSaveAsUtf8($strContent, $filename) {
  $fp = fopen($filename, "wb");
  fwrite($fp, $strContent);
  fclose($fp);
  return True;
}

//
// 0. write UTF-8 string in fly into UTF-8 file:
//
$strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";

$filename = "utf8text.txt";

entSaveAsUtf8($strContent, $filename);


//
// 2. convert CP936 ANSI/OEM - Chinese simplified GBK file into UTF-8 file
//
//   CP936: <https://en.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)>
//   GBK:   <https://en.wikipedia.org/wiki/GBK_(character_encoding)> 
//
$strContent = file_get_contents("cp936gbktext.txt");
$strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");


$filename = "utf8text2.txt";

entSaveAsUtf8($strContent, $filename);

?>

源文件cp936gbktext.txt的内容:

>>Get-Content cp936gbktext.txt
My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)

在 Windows 10 PHP 上运行utf8save.php ,从而创建utf8text.txtutf8text2.txt文件将自动保存为 UTF-8 格式。

使用这种方法,不需要 BOM 字符。BOM 解决方案很糟糕,因为它会在我们为 MySQL 获取 SQL 文件时引起麻烦。

值得注意的是,我未能完成工作file_put_contents($filename, utf8_encode($mystring)); 以此目的。

++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++++++++++++++

如果不知道源文件的编码,可以用 PHP 列出编码:

print_r(mb_list_encodings());

这给出了一个这样的列表:

Array
(
  [0] => pass
  [1] => wchar
  [2] => byte2be
  [3] => byte2le
  [4] => byte4be
  [5] => byte4le
  [6] => BASE64
  [7] => UUENCODE
  [8] => HTML-ENTITIES
  [9] => Quoted-Printable
  [10] => 7bit
  [11] => 8bit
  [12] => UCS-4
  [13] => UCS-4BE
  [14] => UCS-4LE
  [15] => UCS-2
  [16] => UCS-2BE
  [17] => UCS-2LE
  [18] => UTF-32
  [19] => UTF-32BE
  [20] => UTF-32LE
  [21] => UTF-16
  [22] => UTF-16BE
  [23] => UTF-16LE
  [24] => UTF-8
  [25] => UTF-7
  [26] => UTF7-IMAP
  [27] => ASCII
  [28] => EUC-JP
  [29] => SJIS
  [30] => eucJP-win
  [31] => EUC-JP-2004
  [32] => SJIS-win
  [33] => SJIS-Mobile#DOCOMO
  [34] => SJIS-Mobile#KDDI
  [35] => SJIS-Mobile#SOFTBANK
  [36] => SJIS-mac
  [37] => SJIS-2004
  [38] => UTF-8-Mobile#DOCOMO
  [39] => UTF-8-Mobile#KDDI-A
  [40] => UTF-8-Mobile#KDDI-B
  [41] => UTF-8-Mobile#SOFTBANK
  [42] => CP932
  [43] => CP51932
  [44] => JIS
  [45] => ISO-2022-JP
  [46] => ISO-2022-JP-MS
  [47] => GB18030
  [48] => Windows-1252
  [49] => Windows-1254
  [50] => ISO-8859-1
  [51] => ISO-8859-2
  [52] => ISO-8859-3
  [53] => ISO-8859-4
  [54] => ISO-8859-5
  [55] => ISO-8859-6
  [56] => ISO-8859-7
  [57] => ISO-8859-8
  [58] => ISO-8859-9
  [59] => ISO-8859-10
  [60] => ISO-8859-13
  [61] => ISO-8859-14
  [62] => ISO-8859-15
  [63] => ISO-8859-16
  [64] => EUC-CN
  [65] => CP936
  [66] => HZ
  [67] => EUC-TW
  [68] => BIG-5
  [69] => CP950
  [70] => EUC-KR
  [71] => UHC
  [72] => ISO-2022-KR
  [73] => Windows-1251
  [74] => CP866
  [75] => KOI8-R
  [76] => KOI8-U
  [77] => ArmSCII-8
  [78] => CP850
  [79] => JIS-ms
  [80] => ISO-2022-JP-2004
  [81] => ISO-2022-JP-MOBILE#KDDI
  [82] => CP50220
  [83] => CP50220raw
  [84] => CP50221
  [85] => CP50222
)

如果你猜不出来,你一个一个尝试,因为mb_detect_encoding()不能轻易完成这项工作。

于 2020-03-30T10:57:57.200 回答
-1

我把所有东西放在一起,得到了将 ANSI 文本文件转换为“UTF-8 No Mark”的简单方法:

function filesToUTF8($searchdir, $convdir, $filetypes) {
  $get_files = glob($searchdir . '*{' . $filetypes . '}', GLOB_BRACE);
  foreach($get_files as $file) {
    $expl_path = explode('/', $file);
    $filename = end($expl_path);
    $get_file_content = file_get_contents($file);
    $new_file_content = iconv(mb_detect_encoding($get_file_content, mb_detect_order(), true), "UTF-8", $get_file_content);
    $put_new_file = file_put_contents($convdir.$filename, $new_file_content);
  }
}

用法:

filesToUTF8('C:/Temp/', 'C:/Temp/conv_files/', 'php,txt');
于 2017-10-05T12:26:37.167 回答
-6
  1. 在 Windows 笔记本中打开文件
  2. 将编码更改为 UTF-8 编码
  3. 保存您的文件
  4. 再试一次!:O)
于 2016-02-28T11:33:24.050 回答