python - Error parsing XML with em dash

Question

I'm working on a web app that pulls in a list of tweets through a Python script. When I pull in a tweet that contains an em dash, the browser can't parse the XML the script outputs.

My script is:

#! /usr/bin/python
import cgi
from peewee import *
from sql_connect import *
import sql_connect
import sys

xmlString = ""

# Create XML string
xmlString += "<TweetList>"

tweets = Tweet_Info.select()
for tweet in tweets:
    xmlString += "<Tweet>"
    xmlString += "<UserName>"
    xmlString += tweet.user
    xmlString += "</UserName>"
    xmlString += "<UserImage>"
    xmlString += tweet.user_image_url
    xmlString += "</UserImage>"
    xmlString += "<Text>"
    xmlString += tweet.text
    xmlString += "</Text>"
    xmlString += "</Tweet>"

xmlString += "</TweetList>"

# Print beginning xml stuff
print "Content-Type: text/xml"
print
print '<?xml version="1.0" encoding="UTF-8"?>' 
print xmlString

The error I get when I load the Python script in the browser is:

XML Parsing Error: no element found
Location: http://localhost/cgi-bin/GetTweets2.py
Line Number 2, Column 1:

I feel like the solution to this is probably fairly simple. I've tried using a variety of different encoding types for the xml, but with no success. Is there a specific encoding type that I should use? Or is there a simple way of filtering out a special character that I'm missing?

score 0 · Accepted Answer

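If you're generating XML, it's best to do it the right way: build a data structure that holds the data you want to serialize, and use Python's built-in functionality to convert it to XML. This approach has the added advantage that you don't have to worry about encoding errors or odd input. (Think about what your current script would do if a tweet contained the text </Text>.)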
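A minimal sketch of that approach, assuming Python 3.8+ and the standard library's xml.etree.ElementTree (the question's script is Python 2). The `Tweet` named tuple and the functions `build_tweet_xml` and `write_cgi_response` are hypothetical names for illustration. The serializer only reads the `user`, `user_image_url`, and `text` attributes, so rows from the question's `Tweet_Info.select()` should presumably work in their place. Because ElementTree escapes `<`, `>` and `&` in element text and does the UTF-8 encoding itself, neither an em dash nor a literal `</Text>` inside a tweet breaks the document.

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterable, NamedTuple


class Tweet(NamedTuple):
    """Hypothetical stand-in for one row of the question's Tweet_Info model."""
    user: str
    user_image_url: str
    text: str


def build_tweet_xml(tweets: Iterable[Tweet]) -> bytes:
    """Build <TweetList> as an element tree and serialize it as UTF-8 bytes."""
    root = ET.Element("TweetList")
    for tweet in tweets:
        tweet_el = ET.SubElement(root, "Tweet")
        # Setting .text (instead of concatenating strings) lets ElementTree
        # escape <, > and & when it serializes the tree.
        ET.SubElement(tweet_el, "UserName").text = tweet.user
        ET.SubElement(tweet_el, "UserImage").text = tweet.user_image_url
        ET.SubElement(tweet_el, "Text").text = tweet.text
    # encoding="utf-8" returns bytes; xml_declaration=True (Python 3.8+)
    # prepends <?xml version='1.0' encoding='utf-8'?>.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_cgi_response(tweets: Iterable[Tweet], out: BinaryIO) -> None:
    """Write a CGI header plus the XML body to a binary stream."""
    # A blank line ends the CGI headers, as the bare `print` did in the question.
    out.write(b"Content-Type: text/xml; charset=utf-8\n\n")
    out.write(build_tweet_xml(tweets))
```

In the CGI script itself the call would presumably be `write_cgi_response(Tweet_Info.select(), sys.stdout.buffer)`. Writing bytes to `sys.stdout.buffer` sidesteps whatever text encoding the web server gives stdout. That likely matters here: an error at line 2, column 1 is consistent with the original script dying right after printing the XML declaration, for example with a UnicodeEncodeError when Python 2 prints the em dash to an ASCII stdout, though that cause is unconfirmed.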
