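I have a standard Python 3 string that contains several Unicode escape sequences (for example "\u00c6").
I need to convert them back to the corresponding characters (the Scandinavian letters æ, ø, å).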

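I have tried googling it and using the .encode() and .decode() functions with various combinations of latin-1, utf-8 and unicode-escape.
.decode() only exists on the bytes type, so it does not work here, because the value is a str.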

The string comes from a scrape of this website, done with BeautifulSoup4:

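import json
import urllib.request as urllib2  # assumed alias: Python 3 has no urllib2 module

from bs4 import BeautifulSoup

landingPage = "https://www.kmdvalg.dk/Main/Home/KV"

def soupMe(pageLink):
    # Fetch the page and parse it, telling BeautifulSoup the bytes are UTF-8
    return BeautifulSoup(urllib2.urlopen(pageLink), "html.parser", from_encoding='utf-8')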

soup = soupMe(landingPage)

fullList = soup.find(class_="row-masonry")
letters = fullList.find_all("div", class_="masonry")
kommuner = []

for x in letters:
    for y in x.div.find_all("a"):
        kommuner.append({"label": y.string, "link": y.get("href")})

print(json.dumps(kommuner))

Which outputs:

[{"label": "Albertslund ", "link": "https://www.kmdvalg.dk/kv/2017/K84982165.htm"}, {"label": "Aller\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84982201.htm"}, {"label": "Assens ", "link": "https://www.kmdvalg.dk/kv/2017/K84733420.htm"}, {"label": "Ballerup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982151.htm"}, {"label": "Billund ", "link": "https://www.kmdvalg.dk/kv/2017/K84733530.htm"}, {"label": "Bornholm ", "link": "https://www.kmdvalg.dk/kv/2017/K84982400.htm"}, {"label": "Br\u00f8ndby ", "link": "https://www.kmdvalg.dk/kv/2017/K84982153.htm"}, {"label": "Br\u00f8nderslev ", "link": "https://www.kmdvalg.dk/kv/2017/K84712810.htm"}, {"label": "Drag\u00f8r ", "link": "https://www.kmdvalg.dk/kv/2017/K84982155.htm"}, {"label": "Egedal ", "link": "https://www.kmdvalg.dk/kv/2017/K84982240.htm"}, {"label": "Esbjerg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733561.htm"}, {"label": "Fan\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84733563.htm"}, {"label": "Favrskov ", "link": "https://www.kmdvalg.dk/kv/2017/K84713710.htm"}, {"label": "Faxe ", "link": "https://www.kmdvalg.dk/kv/2017/K84979320.htm"}, {"label": "Fredensborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84982210.htm"}, {"label": "Fredericia ", "link": "https://www.kmdvalg.dk/kv/2017/K84733607.htm"}, {"label": "Frederiksberg ", "link": "https://www.kmdvalg.dk/kv/2017/K84982147.htm"}, {"label": "Frederikshavn ", "link": "https://www.kmdvalg.dk/kv/2017/K84712813.htm"}, {"label": "Frederikssund ", "link": "https://www.kmdvalg.dk/kv/2017/K84982250.htm"}, {"label": "Fures\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84982190.htm"}, {"label": "Faaborg-Midtfyn ", "link": "https://www.kmdvalg.dk/kv/2017/K84733430.htm"}, {"label": "Gentofte ", "link": "https://www.kmdvalg.dk/kv/2017/K84982157.htm"}, {"label": "Gladsaxe ", "link": "https://www.kmdvalg.dk/kv/2017/K84982159.htm"}, {"label": "Glostrup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982161.htm"}, {"label": "Greve ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84979253.htm"}, {"label": "Gribskov ", "link": "https://www.kmdvalg.dk/kv/2017/K84982270.htm"}, {"label": "Guldborgsund", "link": "https://www.kmdvalg.dk/kv/2017/K84979376.htm"}, {"label": "Haderslev ", "link": "https://www.kmdvalg.dk/kv/2017/K84733510.htm"}, {"label": "Halsn\u00e6s ", "link": "https://www.kmdvalg.dk/kv/2017/K84982260.htm"}, {"label": "Hedensted ", "link": "https://www.kmdvalg.dk/kv/2017/K84713766.htm"}, {"label": "Helsing\u00f8r ", "link": "https://www.kmdvalg.dk/kv/2017/K84982217.htm"}, {"label": "Herlev ", "link": "https://www.kmdvalg.dk/kv/2017/K84982163.htm"}, {"label": "Herning ", "link": "https://www.kmdvalg.dk/kv/2017/K84713657.htm"}, {"label": "Hiller\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84982219.htm"}, {"label": "Hj\u00f8rring ", "link": "https://www.kmdvalg.dk/kv/2017/K84712860.htm"}, {"label": "Holb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84979316.htm"}, {"label": "Holstebro ", "link": "https://www.kmdvalg.dk/kv/2017/K84713661.htm"}, {"label": "Horsens ", "link": "https://www.kmdvalg.dk/kv/2017/K84713615.htm"}, {"label": "Hvidovre ", "link": "https://www.kmdvalg.dk/kv/2017/K84982167.htm"}, {"label": "H\u00f8je-Taastrup ", "link": "https://www.kmdvalg.dk/kv/2017/K84982169.htm"}, {"label": "H\u00f8rsholm ", "link": "https://www.kmdvalg.dk/kv/2017/K84982223.htm"}, {"label": "Ikast-Brande ", "link": "https://www.kmdvalg.dk/kv/2017/K84713756.htm"}, {"label": "Ish\u00f8j ", "link": "https://www.kmdvalg.dk/kv/2017/K84982183.htm"}, {"label": "Jammerbugt ", "link": "https://www.kmdvalg.dk/kv/2017/K84712849.htm"}, {"label": "Kalundborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84979326.htm"}, {"label": "Kerteminde ", "link": "https://www.kmdvalg.dk/kv/2017/K84733440.htm"}, {"label": "Kolding ", "link": "https://www.kmdvalg.dk/kv/2017/K84733621.htm"}, {"label": "K\u00f8benhavn ", "link": "https://www.kmdvalg.dk/kv/2017/K84982101.htm"}, {"label": "K\u00f8ge ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84979259.htm"}, {"label": "Langeland ", "link": "https://www.kmdvalg.dk/kv/2017/K84733482.htm"}, {"label": "Lejre ", "link": "https://www.kmdvalg.dk/kv/2017/K84979350.htm"}, {"label": "Lemvig ", "link": "https://www.kmdvalg.dk/kv/2017/K84713665.htm"}, {"label": "Lolland ", "link": "https://www.kmdvalg.dk/kv/2017/K84979360.htm"}, {"label": "Lyngby-Taarb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84982173.htm"}, {"label": "L\u00e6s\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84712825.htm"}, {"label": "Mariagerfjord ", "link": "https://www.kmdvalg.dk/kv/2017/K84712846.htm"}, {"label": "Middelfart ", "link": "https://www.kmdvalg.dk/kv/2017/K84733410.htm"}, {"label": "Mors\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84712773.htm"}, {"label": "Norddjurs ", "link": "https://www.kmdvalg.dk/kv/2017/K84713707.htm"}, {"label": "Nordfyns ", "link": "https://www.kmdvalg.dk/kv/2017/K84733480.htm"}, {"label": "Nyborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733450.htm"}, {"label": "N\u00e6stved ", "link": "https://www.kmdvalg.dk/kv/2017/K84979370.htm"}, {"label": "Odder ", "link": "https://www.kmdvalg.dk/kv/2017/K84713727.htm"}, {"label": "Odense ", "link": "https://www.kmdvalg.dk/kv/2017/K84733461.htm"}, {"label": "Odsherred ", "link": "https://www.kmdvalg.dk/kv/2017/K84979306.htm"}, {"label": "Randers ", "link": "https://www.kmdvalg.dk/kv/2017/K84713730.htm"}, {"label": "Rebild ", "link": "https://www.kmdvalg.dk/kv/2017/K84712840.htm"}, {"label": "Ringk\u00f8bing-Skjern", "link": "https://www.kmdvalg.dk/kv/2017/K84713760.htm"}, {"label": "Ringsted ", "link": "https://www.kmdvalg.dk/kv/2017/K84979329.htm"}, {"label": "Roskilde ", "link": "https://www.kmdvalg.dk/kv/2017/K84979265.htm"}, {"label": "Rudersdal ", "link": "https://www.kmdvalg.dk/kv/2017/K84982230.htm"}, {"label": "R\u00f8dovre ", "link": "https://www.kmdvalg.dk/kv/2017/K84982175.htm"}, {"label": "Sams\u00f8 ", "link": 
"https://www.kmdvalg.dk/kv/2017/K84713741.htm"}, {"label": "Silkeborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713740.htm"}, {"label": "Skanderborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713746.htm"}, {"label": "Skive ", "link": "https://www.kmdvalg.dk/kv/2017/K84713779.htm"}, {"label": "Slagelse ", "link": "https://www.kmdvalg.dk/kv/2017/K84979330.htm"}, {"label": "Solr\u00f8d ", "link": "https://www.kmdvalg.dk/kv/2017/K84979269.htm"}, {"label": "Sor\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84979340.htm"}, {"label": "Stevns ", "link": "https://www.kmdvalg.dk/kv/2017/K84979336.htm"}, {"label": "Struer ", "link": "https://www.kmdvalg.dk/kv/2017/K84713671.htm"}, {"label": "Svendborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733479.htm"}, {"label": "Syddjurs ", "link": "https://www.kmdvalg.dk/kv/2017/K84713706.htm"}, {"label": "S\u00f8nderborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84733540.htm"}, {"label": "Thisted ", "link": "https://www.kmdvalg.dk/kv/2017/K84712787.htm"}, {"label": "T\u00f8nder ", "link": "https://www.kmdvalg.dk/kv/2017/K84733550.htm"}, {"label": "T\u00e5rnby ", "link": "https://www.kmdvalg.dk/kv/2017/K84982185.htm"}, {"label": "Vallensb\u00e6k ", "link": "https://www.kmdvalg.dk/kv/2017/K84982187.htm"}, {"label": "Varde ", "link": "https://www.kmdvalg.dk/kv/2017/K84733573.htm"}, {"label": "Vejen ", "link": "https://www.kmdvalg.dk/kv/2017/K84733575.htm"}, {"label": "Vejle ", "link": "https://www.kmdvalg.dk/kv/2017/K84733630.htm"}, {"label": "Vesthimmerlands ", "link": "https://www.kmdvalg.dk/kv/2017/K84712820.htm"}, {"label": "Viborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84713791.htm"}, {"label": "Vordingborg ", "link": "https://www.kmdvalg.dk/kv/2017/K84979390.htm"}, {"label": "\u00c6r\u00f8 ", "link": "https://www.kmdvalg.dk/kv/2017/K84733492.htm"}, {"label": "Aabenraa", "link": "https://www.kmdvalg.dk/kv/2017/K84733580.htm"}, {"label": "Aalborg", "link": "https://www.kmdvalg.dk/kv/2017/K84712851.htm"}, 
{"label": "Aarhus", "link": "https://www.kmdvalg.dk/kv/2017/K84713751.htm"}]

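The problem is that the labels come out containing Unicode escape sequences, so

  • æ becomes \u00e6
  • Tønder becomes T\u00f8nder
  • Ærø becomes \u00c6r\u00f8

How do I take a string and replace every Unicode escape sequence with its corresponding character?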

1 Answer

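json.dumps() sets ensure_ascii=True by default, so the output string is always pure ASCII and every non-ASCII character is written as a \uXXXX escape. The strings in your list already hold the real characters; only the serialized JSON is escaped. To keep the characters themselves in the result, simply pass ensure_ascii=False to json.dumps().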
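
A minimal sketch of the difference, using a hypothetical two-item list in place of the scraped data (the labels and links are copied from the output in the question; the variable names `escaped` and `readable` are only for illustration):

```python
import json

# Hypothetical sample standing in for the scraped list from the question.
kommuner = [
    {"label": "Tønder", "link": "https://www.kmdvalg.dk/kv/2017/K84733550.htm"},
    {"label": "Ærø", "link": "https://www.kmdvalg.dk/kv/2017/K84733492.htm"},
]

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(kommuner)

# With ensure_ascii=False the characters are kept as they are.
readable = json.dumps(kommuner, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings are valid JSON and decode back to the same Python objects.
assert json.loads(escaped) == json.loads(readable) == kommuner
```

Both forms are equivalent JSON, so the escaped version is only a problem if a person or a tool needs to read the raw text. If you write the unescaped string to a file, it is safest to open the file with an explicit encoding such as utf-8.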

Answered 2017-11-30T15:44:57.287