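I am working with Delphi XE5 and using TIdHTTP to fetch XML from a server. Fetching the XML works, but one character arrives corrupted: the "•" (bullet point). Everything else is fine; only the bullet points are broken.

I create the TIdHTTP instance like this: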

idhttps := TIdHTTP.Create();
idhttps.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(nil);
idhttps.IOHandler.DefStringEncoding := IndyTextEncoding(TEncoding.UTF8);
idhttps.HandleRedirects := True;
idhttps.ConnectTimeout := 5000;
idhttps.Request.USERNAME := 'USERNAME';
idhttps.Request.PASSWORD := 'PASSWORD';
idhttps.Request.BasicAuthentication := True;
idhttps.Request.Accept := 'text/xml';

Then I fetch the XML like this:

SS := TStringStream.Create('', TEncoding.UTF8);

try
  self.GetIdHTTPForLexicomp.Get(URL, SS);
  XMLDoc := TXMLDocument.Create(nil);
  XMLDoc.LoadFromStream(SS, TXMLEncodingType.xetUTF_8Like);
finally
  SS.Free;
end;

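In the XML, the bullet point shows up like this:

? Anaphylaxis/hypersensitivity: May cause hypersensitivity reactions,

The XML header is as follows: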

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

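What should I check?

Update: I have added an XML fragment. It needs an XSL file for styling, but I do not think that is the problem here. The "?" is the broken character.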

<?xml version="1.0" standalone="yes"?>
<ns2:monogragh>
  <monograghFields>
    <field fieldId="234837" fieldTypeCode="war" created="2005-04-07T17:28:33Z" modified="2014-10-02T11:32:57Z" sectionId="0">
      <fieldName>Warnings/Precautions</fieldName>
      <content>
        <div id="war" class="block">
          <p style="text-indent:-2em;margin-left:2em;text-align:justify;">
            <b>
              <i>Concerns related to adverse effects:</i>
            </b>
          </p>
          <p style="text-indent:-2em;margin-left:4em;text-align:justify;">
            ? Anaphylaxis/hypersensitivity: May cause hypersensitivity reactions, including anaphylaxis; use with caution in patients with anaphylactic disorders.
          </p>
        </div>
      </content>
    </field>
  </monograghFields>
</ns2:monogragh>

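It seems I provided wrong information. I have attached captures of the XML fragments. The first is the result fetched with a REST client tool in the browser; the last is the result of fetching the XML through TIdHTTP.

XML fetched with a REST client tool in the browser.

XML fetched through TIdHTTP.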

1 Answer

  1. Do not set the IOHandler.DefStringEncoding property when using TIdHTTP. Let TIdHTTP handle encodings on its own.

  2. Using a TStream to receive the XML is the correct choice. However, using a TStringStream in particular is not a good choice, because it is bound to the TEncoding you specify in the constructor. If the XML is not encoded in the same charset that the TEncoding implements, the XML would not be decoded properly. Use a TMemoryStream or TBytesStream instead, to preserve the original XML bytes as-is.

  3. XML is self-describing when it comes to its encoding. Do not tell TXMLDocument which encoding to use; let the XML itself tell TXMLDocument.

Try this:

idhttps := TIdHTTP.Create();
idhttps.IOHandler := TIdSSLIOHandlerSocketOpenSSL.Create(idhttps);
idhttps.HandleRedirects := True;
idhttps.ConnectTimeout := 5000;
idhttps.Request.USERNAME := 'USERNAME';
idhttps.Request.PASSWORD := 'PASSWORD';
idhttps.Request.BasicAuthentication := True;
idhttps.Request.Accept := 'text/xml';

MS := TMemoryStream.Create;
try
  idhttps.Get(URL, MS);
  MS.Position := 0;
  XMLDoc := TXMLDocument.Create(nil); // XMLDoc must be IXMLDocument, or a memory leak occurs
  XMLDoc.LoadFromStream(MS);
finally
  MS.Free;
end;

Now, TXMLDocument should be parsing the raw bytes that the server actually sends, without any interpretation by TIdHTTP or the RTL beforehand.

If you are still having the same problem, then either the XML itself is not properly encoded to begin with, or you are not processing/displaying the XML correctly after it has been loaded into TXMLDocument. You have not shown either of those yet, so we can only guess where your actual problem lies, outside of what I mentioned above.
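
One way to tell those two cases apart is to inspect the raw bytes in the TMemoryStream before TXMLDocument sees them. The following is only a minimal sketch: CountPattern is a hypothetical helper (not part of Indy or the RTL), and it assumes MS still holds the downloaded response. A UTF-8 bullet (U+2022) is the byte sequence E2 80 A2, while a Windows-1252 bullet is the single byte 95.

```delphi
uses
  System.SysUtils, System.Classes;

{$POINTERMATH ON} // allow array-style indexing on PByte

// Hypothetical helper: counts how often a byte pattern occurs
// in the raw, undecoded response held by the stream.
function CountPattern(Stream: TMemoryStream; const Pattern: array of Byte): Integer;
var
  Data: PByte;
  I, J, Last: Integer;
  Match: Boolean;
begin
  Result := 0;
  if Length(Pattern) = 0 then
    Exit;
  Data := Stream.Memory;
  // Last valid start offset; negative if the stream is shorter than the pattern
  Last := Integer(Stream.Size) - Length(Pattern);
  for I := 0 to Last do
  begin
    Match := True;
    for J := 0 to High(Pattern) do
      if Data[I + J] <> Pattern[J] then
      begin
        Match := False;
        Break;
      end;
    if Match then
      Inc(Result);
  end;
end;

// Usage, after idhttps.Get(URL, MS) (console output assumed for illustration):
//   Writeln('UTF-8 bullets (E2 80 A2): ', CountPattern(MS, [$E2, $80, $A2]));
//   Writeln('Windows-1252 bullets (95): ', CountPattern(MS, [$95]));
```

If the first count is non-zero, the server is sending correctly encoded UTF-8 bullets and the problem is more likely in how the text is processed or displayed afterwards. If only the second count is non-zero, the body is not really UTF-8 despite its header. If both are zero and a plain "?" (byte 3F) sits where the bullet should be, the character was already lost before the response left the server.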

Answered 2015-02-06T04:34:14.613