oracle - Why does Oracle convert U+2002 to U+0020 when loading UTF8 data?

Question

When converting BLOB data to CLOB data with UTF8 as the character set, Oracle (11gR2) appears to convert the bytes 0xE2 0x80 0x82 to 0x20. Is this simply wrong, as I suspect? Could it even be a bug?

I need to edit CLOBs that contain UTF8-encoded data while preserving the original characters (all of them, and especially the EN SPACE).

FYI: 0xE2 0x80 0x82 is the UTF-8 encoding of "EN SPACE" (U+2002), whereas 0x20 is the normal space (U+0020).

http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_lob.htm#i1020356

http://en.wikipedia.org/wiki/Space_%28punctuation%29

http://www.utf8-chartable.de/unicode-utf8-table.pl?start=8192

declare
    v_clob                  clob;
    v_clob_content          varchar2(8);
    v_content_ok            varchar2(64);
    v_content_bad           varchar2(64);
    v_blob                  blob;
    v_raw                   raw(8);

    v_c_dst_offset          pls_integer := 1;
    v_c_src_offset          pls_integer := 1;
    v_c_ctx                 pls_integer := 0;
    v_c_warn                pls_integer;
begin

    -- Create temporary lobs and cache them
    dbms_lob.createtemporary(v_blob,true);
    dbms_lob.createtemporary(v_clob,true);

    -- Write 0xE2 0x80 0x82 to the BLOB
    v_raw := hextoraw('E28082');  -- UTF-8 bytes of U+2002 EN SPACE
    dbms_lob.write(v_blob,utl_raw.length(v_raw),1,v_raw);

    -- Convert the BLOB to CLOB using AL32UTF8 as encoding
    dbms_lob.converttoclob(
       v_clob,
       v_blob,
       dbms_lob.getlength(v_blob),
       v_c_dst_offset,
       v_c_src_offset, 
       nls_charset_id('AL32UTF8'),
       v_c_ctx,
       v_c_warn);

    -- Put the CLOB contents into a varchar2
    v_clob_content := dbms_lob.substr(v_clob,dbms_lob.getlength(v_clob));   

    -- Output the HEX value of respectively the BLOB content and the CLOB content
    select rawtohex(v_clob_content) into v_content_bad from dual;
    select rawtohex(v_raw) into v_content_ok from dual;
    dbms_output.put_line('[' || v_content_bad || ']');
    dbms_output.put_line('[' || v_content_ok || ']');

    -- Release resources
    dbms_lob.freetemporary(v_clob);
    dbms_lob.freetemporary(v_blob);
end;
/

Output:

[20]
[E28082]

PL/SQL procedure successfully completed.
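The block above never inspects v_c_warn. The DBMS_LOB documentation describes this OUT parameter as the way CONVERTTOCLOB reports an inconvertible character (DBMS_LOB.WARN_INCONVERTIBLE_CHAR), in which case a replacement character is used. The following is a minimal, self-contained sketch (not part of the original post) that repeats the conversion, prints the warning if one is raised, and shows the bytes that actually ended up in the CLOB using DUMP(..., 1016). Whether a best-fit mapping such as EN SPACE to 0x20 also raises the warning is an assumption that has not been checked here.

```sql
set serveroutput on

declare
    v_blob        blob;
    v_clob        clob;
    v_dst_offset  integer := 1;
    v_src_offset  integer := 1;
    v_lang_ctx    integer := dbms_lob.default_lang_ctx;
    v_warning     integer;
    v_text        varchar2(32);
    v_dump        varchar2(200);
begin
    dbms_lob.createtemporary(v_blob, true);
    dbms_lob.createtemporary(v_clob, true);

    -- Append the UTF-8 encoding of U+2002 EN SPACE to the BLOB
    dbms_lob.writeappend(v_blob, 3, hextoraw('E28082'));

    -- Same positional call as in the question: BLOB bytes are read as AL32UTF8
    dbms_lob.converttoclob(
        v_clob,
        v_blob,
        dbms_lob.getlength(v_blob),
        v_dst_offset,
        v_src_offset,
        nls_charset_id('AL32UTF8'),
        v_lang_ctx,
        v_warning);

    -- Report whether Oracle flagged a character it could not convert
    if v_warning = dbms_lob.warn_inconvertible_char then
        dbms_output.put_line('Warning: inconvertible character was replaced');
    end if;

    -- Show the stored bytes together with the character set they are in
    v_text := dbms_lob.substr(v_clob, dbms_lob.getlength(v_clob), 1);
    select dump(v_text, 1016) into v_dump from dual;
    dbms_output.put_line(v_dump);

    dbms_lob.freetemporary(v_clob);
    dbms_lob.freetemporary(v_blob);
end;
/
```

Checking the warning after every conversion makes silent data loss visible instead of only showing up later as a changed byte.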

score 3 · Accepted Answer

Could it be that your database is not running with the AL32UTF8 character set?

i.e.:

SQL> col value format a20
SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';

PARAMETER                      VALUE
------------------------------ --------------------
NLS_CHARACTERSET               AL32UTF8
NLS_NCHAR_CHARACTERSET         UTF8

SQL> declare
  2      v_clob                  clob;
  3      v_clob_content          varchar2(8);
  4      v_content_ok            varchar2(64);
  5      v_content_bad           varchar2(64);
  6      v_blob                  blob;
  7      v_raw                   raw(8);
  8
  9      v_c_dst_offset          pls_integer := 1;
 10      v_c_src_offset          pls_integer := 1;
 11      v_c_ctx                 pls_integer := 0;
 12      v_c_warn                pls_integer;
 ...
 46  end;
 47  /
[E28082]
[E28082]

PL/SQL procedure successfully completed.

versus running it in a non-AL32UTF8 database:

SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET%';

PARAMETER                      VALUE
------------------------------ --------------------
NLS_NCHAR_CHARACTERSET         UTF8
NLS_CHARACTERSET               WE8ISO8859P15

SQL> set serverout on
SQL> declare
  2      v_clob                  clob;
...
 47  /
[BF]
[E28082]

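The LOB was converted in my Western ISO database: the EN SPACE came back as 0xBF (¿ in ISO 8859-15), because that character set has no EN SPACE. If you need AL32UTF8 data to survive in non-binary data types, make sure your database runs under that character set.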
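To see what the database character set does to this one character without involving DBMS_LOB at all, a quick check (a sketch, not from the original answer) is to build the character with UNISTR, which returns NVARCHAR2 in the national character set, convert it to the database character set with TO_CHAR, and DUMP the resulting bytes.

```sql
-- UNISTR('\2002') builds U+2002 EN SPACE as NVARCHAR2 (national character set).
-- TO_CHAR converts it to the database character set; DUMP(..., 1016) prints
-- the resulting bytes in hex along with the character set name.
select dump(to_char(unistr('\2002')), 1016) as en_space_bytes
  from dual;
```

In an AL32UTF8 database the bytes should be e2,80,82; in a single-byte character set that lacks EN SPACE, expect a single substitute byte instead, like the [BF] in the output above.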
