
I am trying to create a UTF-8 encoded HTML page from a Blob by specifying only the Unicode value of each character I want to display on the page.

For example, I am trying to display the characters 'a' and 'b' with a non-breaking space between them:

var uint8 = new Uint8Array([97, 160, 98]); // 97 = a, 160 = non-breaking space, 98 = b

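If I only pass in code units from the ASCII range (0-127), the Blob seems to work fine, but as soon as there is a code unit greater than 127 (e.g. code unit 160, the non-breaking space), it is displayed in the HTML as an unrecognized character.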


Here is the code I am using:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <div id="container">
        <a  id="nav" target="_blank" href="#"> click to navigate </a> <br />
        <iframe src="" id="i-frame"> </iframe>
    </div>
    <script type="text/javascript">
        var uint8 = new Uint8Array([97, 160, 98]);
        var blob = new Blob([uint8], { type: "text/html;charset=UTF-8" });
        var url = URL.createObjectURL(blob);
        document.getElementById("nav").href = url;
        document.getElementById("i-frame").src= url;
    </script>
</body>
</html>

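After some digging, I found that UTF-8 uses up to 4 bytes per character, and that any code point above 127 needs more than one byte in UTF-8 (two bytes for code points 128 to 2047, which includes 160). So to make my Blob display the intended characters, I have to create it as follows: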

var uint8 = new Uint8Array([97, 194, 160, 98]);
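To make the byte-count rule above concrete, here is a minimal sketch of encoding code points to UTF-8 by hand with bit shifts, which is the kind of manual technique Question 1 asks about. The function names encodeCodePointUtf8 and encodeStringUtf8 are hypothetical, and this is illustrative only: browsers (and Node.js) already provide TextEncoder for this.

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes by hand.
// Illustrative only; TextEncoder does this for you in real code.
function encodeCodePointUtf8(cp) {
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError("Not a valid Unicode scalar value: " + cp);
  }
  if (cp <= 0x7f) {
    // 1 byte: 0xxxxxxx (plain ASCII, unchanged)
    return [cp];
  }
  if (cp <= 0x7ff) {
    // 2 bytes: 110xxxxx 10xxxxxx (160 -> 0xC2 0xA0)
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Hypothetical helper: encode a whole string. for...of iterates by code
// point, so surrogate pairs arrive as a single character here.
function encodeStringUtf8(str) {
  const bytes = [];
  for (const ch of str) {
    bytes.push(...encodeCodePointUtf8(ch.codePointAt(0)));
  }
  return new Uint8Array(bytes);
}

console.log(Array.from(encodeStringUtf8("a\u00a0b"))); // [ 97, 194, 160, 98 ]
```

The result matches the hand-written array above, and it is the same byte sequence new TextEncoder().encode("a\u00a0b") returns.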

Question 1: When we use code units above 127, do we need to apply a bit-shifting technique like the one in https://gist.github.com/lihnux/2aa4a6f5a9170974f6aa ?

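Question 2: On the other hand, if we do something similar with a Base64 string that contains binary data such as an image or a PDF, we get the expected output without any problem: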

var base64EncodedString = 'ABC== etc..';
var decoded = atob(base64EncodedString);
var uint8 = new Uint8Array(decoded.length);
for (var i = 0; i < uint8.length; i++) {
    // copy each character code (0-255) into the byte array, the same technique used to build the HTML page in question 1
    uint8[i] = decoded.charCodeAt(i);
};
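The loop above can be wrapped in a small helper to see why it works: atob() returns a "binary string" whose character codes are all between 0 and 255, one per decoded byte, so copying them into a Uint8Array reproduces the original bytes exactly. The helper name base64ToBytes and the sample string "YcKgYg==" (the Base64 form of the UTF-8 bytes of "a\u00a0b") are assumptions for illustration; atob is a global in browsers and in Node.js 16+.

```javascript
// Hypothetical helper: Base64 -> raw bytes, same loop as in the question.
function base64ToBytes(base64) {
  // atob yields one character per byte, each with a char code in 0..255,
  // so every value fits into a Uint8Array element without loss.
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// "YcKgYg==" encodes the bytes 61 C2 A0 62, i.e. "a\u00a0b" in UTF-8.
const bytes = base64ToBytes("YcKgYg==");
console.log(Array.from(bytes)); // [ 97, 194, 160, 98 ]
console.log(new TextDecoder("utf-8").decode(bytes) === "a\u00a0b"); // true
```

The Base64 route never reinterprets anything as text: whatever bytes were encoded come back out unchanged, so the result is correct as long as those bytes were already in the format the Blob's type declares.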

var blob = new Blob([uint8], { type: "image/jpeg" });
var url = URL.createObjectURL(blob);

Code for question 2:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
</head>
<body>
    <div id="container">
        <a  id="nav" target="_blank" href="#"> click to navigate </a> <br />
        <iframe src="" id="i-frame"> </iframe>
    </div>
    <script type="text/javascript">
        var imgBase64String = "/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxMSEhMSExMVFRUXFRgXGBgVFRcYFhYVFxUWFxcYFRUYHSggGRslGxUWITEiJSkrLi4uFx8zODMsNygtLisBCgoKDg0OGhAQGi8lHyUtLS0tLS0tLS8tLS0tLS0tKy0tKy8tLS0tLS0tLS0tLS0tLy0tLS0tLS0tLS0tLS0tK//AABEIAOAA4QMBIgACEQEDEQH/xAAcAAEAAQUBAQAAAAAAAAAAAAAAAgMEBQYHAQj/xAA/EAACAQIDBgMFBgMGBwAAAAAAAQIDEQQhMQUGEkFRYXGBkQcTobHwFCIywdHhQmLxFSQzUoKSFhcjQ1Nysv/EABkBAQADAQEAAAAAAAAAAAAAAAABAgMEBf/EACIRAQEAAgICAwADAQAAAAAAAAABAhEDEiExBBNBFCJRcf/aAAwDAQACEQMRAD8A7gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGK21vDQwtlUl956RjnK3VrkiMspjN0ZUGgYn2iu74KCtycpu9uTaS+Fyz/5iYi/+HSa/1fqYfyuP/Ve0dLBomz/aPBu1ai4rrB3+Dt8zb9mbTpYiPFSmpLmtJLxi80aYcuGfqpllXgANEgAAAAAAAAAAAAAAAAAAAAAAAAAAAGK3hpV6lN0qFouWTm3a0eajzv3sVyupsYjeLeyNPip0mnJZOS5PpHr4nNMTNzk5OTcm823m33Zs+J3CxcVeMoS7KTT+KNbx2zqlB8NSnKL7r5PmeXzXkyu8ozu/1GnFEZ0rZEaditHPmZzFVaypWK2HxU6UlOnJwktHF2Yk+XyIu1rFtWDoGwd/U7QxKs9OOK+Mo/mvQ3ejVjOKlFqUWrpp3TXZnBGr+JnNgbeq4V3hK8OcG3wv9H3R1cXybPGS8ydiBj9j7YpYmkqsJK2jTdnGS1T+tGjUt8N+54bErD0VRajCNSpOpOyUJOSsrc7xXryyOy5yTs0xly9N9BrWA34wdSjTquqlxRu0k3ZrJ2y0unbseS37wSy45f7Sdw63/GzA1+hvng5O3vbeKdvVXMzhcZTqK8JxkuzTJRpXAAAAAAAAAAAAAAAAAAAAACniKEakXGcVKLyaayKgFmxpm2dwqc03Qk4S/wAsm3B+eq+Jpu0t1cZRu3SlJLnT+/52WaOylvtDHU6FOVWrJQhFXbZzZ/FwvmeFeu3AMTKcX96LXimnfwZ59tb/ABGT3y9plTEt08PHhpLRySvLvY1D+1an8cYS7xTi/Pl8PM5suLV8Xbb+JyWbjOe/XJ2+Pw/cvKGKXVetvn4mEwlaFRpK93y5+Fi+VBR1us+ngU9OfKXHxYvvtPDL8UoqSz79L37/ADNO9oNBqtTm84yp5NrnGT4l5cS9TYMbWcKcnorc+fl6Gr7zYqc4UryUqcW1HK0ouaV1fnF8HPNWOjiv46/j+cL4dN2ZudBUqalKTtGPO3L8yeN3QStwNrq9f6lTZW1fexp/fjFcEW7zSd+FZcOuvYy/9opfxuTWdlaK9Xm/ga9sXV0z21p7uTpLjm8uza+RW2TRr3vSUr/5k7JeXP1MzU2p7zKcINfzXa8ypS204pxjFRS5KKt8Csym/bXLiyuOuvlndjbQxcLKtOnNd5JS9TacPXU1dfNP4o5jVx8amaeZYVtpVaUrwm4tdGzackc2XxLP12IHP93d/rtU8RzyUl+fU32lVUkpRaaejRo5MsbjdVMABUAAAAAAAAAAAAAAAB43bNnAPaPvg8fXlSpy/u1N2il/3ZJ2c326dvE6B7Y95XhsKsPTdquIvG61jSX432vdRXi+hw2kjDmz/I7vicW72q6pUll9ZmybC2Iqv1qYClBq11yuZ3d3azo1I3eTavc5cbO3l6tl6f19tkW5NN3urdLfK5Z1d15U7
8Da8GbrgtsUqsVwtN9tLdcy5tGSOrpjY8y8uW9ZOaUcHTg375Od8rSuyrhKdODX2fDR4r/ineXC+3Fe2vI3tbGpSd9cy5hs6nBZWHRH3YT1GuYDY1SUc1GN3d8MUuXUy9HYUIxV1drVvMvHi4RWq8i2q7YjyK/Xji0+/kzef2bEtsZsiPLIVNspFhjNuxXO/mV/q0n2LHE4JRvxNro1179jC1Mbf7rd7adfAudqbU4ot36mo1MRJyur9fLW5XXXy6Jlc5qs3i3bR3Tz8H+ptO4e+DpSVKpK9Nu2esWc/jjL6+aKdKtwybu+3frny6mmHL5cnPw7j6ghJNJp3TzT7HppHsz3g99SdCbvKH4X1j+xu50vMs1dAACAAAAAAAAAAAAC32jilSpVKr0hCU/9sW/yA+evaVtX7RtOvJ5wpP3MVfK0Pxes3I12jHmU3VlNucs5Tbk/GTbfxbLmMbNx53+szhzy3dvb4cOskV4Tyt3vfn4FSMijcg5GOnV20yOFx1Sm7xkzKf8AFVVpRV0+TRr0alrFLjNMcrIzzxxy9xseH3trwbTk2i9jvfK1m5eenkajGfLun6d/MuHZRbtp+5pM6zvDhfxn8RvBKS5mPW15Wsm07u+evTL19TE+/bV/gW7qGeVtaYzHH1GbrbVk1qWNTFN3zLOM7kZSKTFa8nhcKtdNNvTLx7lGnVcXdOzz07qz+Z5Vmnaytlnnq88+xTRdnaqUqtnpyaz7nnvfgU0SryXArL713d9Vyy9fUmKW+G3bg7TdLEwknpKztzTdj6CjK6TWjVz5d3ak4zT7r5n0ju5iPeYalL+W3pkduF3i8nnms2SABZiAAAAAAAAAAAaz7SsRwbMxb5uk4L/W1H8zZjT/AGsxvsyuu8P/ALRGXqr8c3nP+vn2loivSazvrbLxvz8rlOGhGDzOCvbl0rp5Hsa3De3NNadfEpspMiRa5K3K9+ehGXjfL0fQpXBOldqsJCcyB5LJ2JNpU5EJs9sQkrN9granQkr59OXXkeSkeWyXxI3BtI9uQbJWA9hUtfK+X0ytTw3G7LPJPwfRkKMLmw7Iw8UvXLkvrL0L4Y7rPPPrFHYmDcNVz+B3fcp/3WK6Nr5HJMDTvI67udG2GXeTfyOvGajzOa7rOAAlkAAAAAAAAAAAa/v7hfebPxMefu+JeMWn+RsBRxlBTpzg/wCKLj6qwvlON1dvlKT5EFJ8jL7wYD3VWUbaNmHepxXHT2Jn2m4rX5E3OycbLNp3tnlfR+fyKDqt5vw8kRlK5XS/ZUeXoEjyMu77+IjKzaa5c++jJQmnyKcos8jNp3XIlUq3zJ0be3yKbbF1w87538OWXqexS7+pERfKV8iCRUnQa75Xy5dn3EUubGkoqJJakp2ysuWfdkVC+nQnQr0GZ/ZcrtZ816GswdjZthUsr88jXBhy1ndnwzy5s7FsOjwUKcf5fmc03cwDnUiu51eEbJJclY6J6ednd1IABQAAAAAAAAAAAAAcj9pO7qVaVRJ2kr5d9ficxxeEfG+d809L2/ofSm8uzFXotW+8s1+aOG7f2Y4vSzTMuTHcdvxuT8rUbaohGLLqdJ8Tyzvp18ChGdnllrp35HM7bEYyKjtbLXn+xCydlzvqSpQTlZu3fVfAEUrnsIu1+V7FT3TbsmtMrtJc3qyNKDzXn6fTJ2jXlKC7ft3JKVjyku/79io7ZJrS92tX43IWk8PXN2S73KahfTW9rc3cmnb0sU1HqSmql76+H6Hk1Z2fLoRbPITtnfPTyJ0japBp2Vud7m7bvYa8U+X9DScLBylFRXO1+rOr7A2bdQpRWeSf5m/FHH8jLTbdy9n2TqNaZLx5m2FHB4dU4RgtEisauAAAAAAAAAAAAAAAAANC363fWdWKyetuT/Q30hWpKcXGSumrNBMurt81bXwDi76P5mEnC6fU6vvnsH3EmmrwlnGX5eJzfaGF4JduqOfPDVelxcvbHVYzi0yXQ8v0yJVPwtWWt78yhN+Oh
Xqtc9Kr1J5eVvr4lDj0fPnd6slGS6Z3+HgR1TM1eCWpGMswnz5X5aeRGM9ejel9PH1I0v2VJVLSuvIj7y6d34IhHrk/G+Z4s33fzJ0i5KvA7X6fnoRvdW0tfO2bfQQlr/UrYDDSqNLvkvEt12i5SRm909mt1OJrT5s7hufs3hj7xrsvzZom6uyfvU6MV9569lzkzrtCkoRUVolZHRjNR5nLn2yTABLMAAAAAAAAAAAAAAAAAISqWAttq7OhiKcqU1k9HzT5NHBd8tj1MJVlTqLLWMuUo9U/qx32pirGu704fD4uk6VdXWsZLKUJdYvkRZtfDK4187VZJPsWdStn2uZzebdPEYeTdP8A69LlKK+8l/NDW/hc1GdZp24Wn0s7+hn1dP2r9VcrPx+mVlMt8Pg6rV3Hh/8AbJvyE4yjlJNd9blbF8c4vKkvgiKfV2LR1vpnk6+mZGmn2Rdudly+F/Ihxr6ZaPFdPQnClUab4Wl3y+eY6q3ki+w95Oy1f6m27JjCgk8p1nolna/XuaVs3B16s1FNUk9ZO7y8sztW4WwMLhkp3dat/wCSotH/ACR5fFmmM05+Tk7Nt3H2O8PSdSr/AI1TOV/4Y8o/mzZ1UMZTxFy4hM0c69TPShCRViEJABgAAAAAAAAAAAAAHkkUKlNlwAMXXw7ZiMdsuUja7EXTQTty7ae7FWV7Nmv1tyKl78J290F0IvCx6EaT2cJqbnVf8rKE9z6jVnBtHe3g49Dz7FDoNJ7184Yn2eVW/u8UV0tc9pez2pzi343+R9HfYYdB9hh0RHU71wHDbiVF/DbwRk6G4knqmdsWCh0JLCx6E6O9cu2duYo/wmzYHY3BojbVQj0JKmug0i5MVh8K0XlOiXSR6SqhGmTSAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA//2Q==";

        var decoded = atob(imgBase64String);

        var uint8 = new Uint8Array(decoded.length);
        for (var i = 0; i < uint8.length; i++) {
            uint8[i] = decoded.charCodeAt(i);
        };

        var blob = new Blob([uint8], { type: "image/jpeg" });
        var url = URL.createObjectURL(blob);
        document.getElementById("nav").href = url;
        document.getElementById("i-frame").src= url;
    </script>
</body>
</html>

Does the second approach work because binary files such as images and PDFs have no character set?

Can anyone please explain these two cases?


1 Answer


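The values you are using are UTF-16 code units (for these characters they equal the Unicode code points). In UTF-8, the U+00A0 NO-BREAK SPACE character is represented by the two bytes 0xC2 0xA0.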
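A quick way to see the difference between these two units is to compare the UTF-16 code units a JavaScript string holds (what charCodeAt returns) with the UTF-8 bytes TextEncoder produces for the same string. The function name compareEncodings is hypothetical; this is only a sketch.

```javascript
// Hypothetical helper: list a string's UTF-16 code units next to its UTF-8 bytes.
function compareEncodings(str) {
  // str.length counts UTF-16 code units, so this visits each one.
  const utf16 = Array.from({ length: str.length }, (_, i) => str.charCodeAt(i));
  const utf8 = Array.from(new TextEncoder().encode(str));
  return { utf16, utf8 };
}

console.log(compareEncodings("a\u00a0b"));
// utf16: [97, 160, 98], utf8: [97, 194, 160, 98]
console.log(compareEncodings("\u{1F600}"));
// utf16: [55357, 56832] (a surrogate pair), utf8: [240, 159, 152, 128]
```

U+00A0 is one UTF-16 code unit but two UTF-8 bytes, which is exactly the mismatch behind the question.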

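new Blob([ <ArrayBuffer> ]) does not change the data passed to it at all, so the bytes in the ArrayBuffer end up unchanged in the resulting Blob.
So at this point you have created a text file that stores the raw code-unit values as single bytes (effectively ISO-8859-1/Latin-1), not UTF-8.
Then you tell the browser to load it as a UTF-8 encoded HTML document. In UTF-8, 0xA0 is a continuation byte that is only valid after a lead byte such as 0xC2, so when the decoder sees it on its own it cannot interpret it and replaces it with U+FFFD REPLACEMENT CHARACTER.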
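The replacement can be reproduced outside the browser with TextDecoder, which applies the same UTF-8 decoding rules. This approximates how the page's bytes are decoded; it is not the HTML parser itself, and the variable names are illustrative.

```javascript
// Approximation of what happens to the Blob's bytes: decode them as UTF-8.
const invalidBytes = new Uint8Array([97, 160, 98]); // 0xA0 with no lead byte
const validBytes = new Uint8Array([97, 0xc2, 0xa0, 98]); // 0xC2 0xA0 = U+00A0

// The default TextDecoder is non-fatal: malformed sequences become U+FFFD.
const lenient = new TextDecoder("utf-8").decode(invalidBytes);
const correct = new TextDecoder("utf-8").decode(validBytes);

console.log(lenient === "a\uFFFDb"); // true: the unrecognized character in the iframe
console.log(correct === "a\u00a0b"); // true
```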

So if you want to make a Blob from the string "a\u00A0b" encoded as UTF-8, you can pass the string directly, because new Blob([ <DOMString> ]) automatically encodes DOMStrings as UTF-8:

var data = "a\u00a0b";
var blob = new Blob([data], { type: "text/html;charset=UTF-8" });
var url = URL.createObjectURL(blob);
document.getElementById("i-frame").src= url;
<div id="container">
    <iframe src="" id="i-frame"> </iframe>
</div>
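Since the Blob encodes the string itself, its size is the UTF-8 byte length rather than the string's length. A small check, assuming an environment with a global Blob (browsers, Node.js 18+); Blob.size can be read synchronously:

```javascript
// new Blob([string]) stores the string UTF-8 encoded.
const text = "a\u00a0b";
const blob = new Blob([text], { type: "text/html;charset=UTF-8" });

console.log(text.length); // 3 (UTF-16 code units)
console.log(blob.size); // 4 (bytes 61 C2 A0 62)

// TextEncoder yields the same UTF-8 bytes, if you need them as a Uint8Array.
const utf8 = new TextEncoder().encode(text);
console.log(utf8.length === blob.size); // true
```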

Or, if you really do have an ArrayBuffer (in a Uint8Array) filled with UTF-16 code units (which seems odd, since not every UTF-16 code unit fits in a single byte), you can build a DOMString from those code units:

var uint8 = new Uint8Array([97, 160, 98]);
var data = [...uint8].map( (code) => String.fromCharCode(code) ).join( "" );
var blob = new Blob([data], { type: "text/html;charset=UTF-8" });
var url = URL.createObjectURL(blob);
document.getElementById("i-frame").src= url;
<div id="container">
    <iframe src="" id="i-frame"> </iframe>
</div>
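The snippet above works because byte values 0 to 255 map one-to-one onto U+0000 to U+00FF (the ISO-8859-1/Latin-1 range). Here is a sketch of the same conversion as a reusable function, with the hypothetical name codeUnitsToString, followed by the UTF-8 bytes that new Blob([str]) would store for the resulting string:

```javascript
// Hypothetical helper: treat each byte as a code unit in the range 0..255.
function codeUnitsToString(codeUnits) {
  // Array.from with a mapper avoids the argument-count limit that
  // String.fromCharCode(...hugeArray) can run into on large inputs.
  return Array.from(codeUnits, (c) => String.fromCharCode(c)).join("");
}

const str = codeUnitsToString(new Uint8Array([97, 160, 98]));
// The UTF-8 byte sequence a Blob built from this string would contain.
const utf8 = new TextEncoder().encode(str);

console.log(str === "a\u00a0b"); // true
console.log(Array.from(utf8)); // [ 97, 194, 160, 98 ]
```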

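Finally, if your data is correctly encoded as UTF-16, with exactly 2 bytes per code unit (i.e. in a Uint16Array), you can use a TextDecoder. The "utf-16" label means UTF-16LE, which matches the byte order a Uint16Array uses on little-endian platforms (practically all of them):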

var uint16 = new Uint16Array([97, 160, 98]);
var data = new TextDecoder("utf-16").decode( uint16 );
var blob = new Blob([data], { type: "text/html;charset=UTF-8" });
var url = URL.createObjectURL(blob);
document.getElementById("i-frame").src= url;
<div id="container">
    <iframe src="" id="i-frame"> </iframe>
</div>
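If you would rather not depend on the platform byte order of Uint16Array, here is a sketch that writes each code unit explicitly as little-endian with DataView and decodes it with the "utf-16le" label. The helper name utf16leBytes is hypothetical.

```javascript
// Hypothetical helper: serialize UTF-16 code units as little-endian bytes.
function utf16leBytes(codeUnits) {
  const view = new DataView(new ArrayBuffer(codeUnits.length * 2));
  // third argument true = little-endian, regardless of the host platform
  codeUnits.forEach((cu, i) => view.setUint16(i * 2, cu, true));
  return new Uint8Array(view.buffer);
}

const bytes = utf16leBytes([97, 160, 98]);
const str = new TextDecoder("utf-16le").decode(bytes);

console.log(Array.from(bytes)); // [ 97, 0, 160, 0, 98, 0 ]
console.log(str === "a\u00a0b"); // true
```

The decoded string can then be passed to new Blob([str], ...) exactly as in the snippet above.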

Or, if you really just want a Uint8Array of this UTF-8 text, use the correct byte values:

var uint8 = new Uint8Array([97, 0xC2, 0xA0, 98]);
var blob = new Blob([uint8], { type: "text/html;charset=UTF-8" });
var url = URL.createObjectURL(blob);
document.getElementById("i-frame").src= url;
<div id="container">
    <iframe src="" id="i-frame"> </iframe>
</div>
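When building UTF-8 bytes by hand like this, a fatal TextDecoder can serve as a quick sanity check before creating the Blob: with { fatal: true } it throws on malformed input instead of substituting U+FFFD. A minimal sketch; isValidUtf8 is a hypothetical helper name.

```javascript
// Hypothetical helper: true if the bytes are well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes the decoder throw a TypeError on invalid sequences
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8(new Uint8Array([97, 0xc2, 0xa0, 98]))); // true
console.log(isValidUtf8(new Uint8Array([97, 160, 98]))); // false
```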

answered 2020-11-27T09:33:29.833