
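I have a pandas Series containing several texts extracted from social media. I noticed that some of the texts contain garbled characters produced by mis-encoded emojis. I need to use this text for sentiment analysis (the VADER tool), so keeping the emojis is important. For that reason I am using the solution from: How can we decode emojis, special characters in Python?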

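I tested encode('latin') followed by decode('utf-8') on a single row of the pandas Series. If I convert the value to str and then apply the encode and decode, it works and the emojis are displayed. If I apply them to the pandas Series directly, it does not work, as shown below. I need the emojis so that VADER can understand them. Any suggestions would be appreciated: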

 # Encode and decode directly on the Series
 posts_texts[127:128]
 output: Tesla increases Model S Plaid prices by $10k hours ahead of first deliveries ð How do they get away with it? If a traditional automaker did this surely theyâd be crucified.
 
 posts_texts[127:128].str.encode('latin')
 posts_texts[127:128].str.decode('utf-8')
 posts_texts[127:128]
 output: Tesla increases Model S Plaid prices by $10k hours ahead of first deliveries ð How do they get away with it? If a traditional automaker did this surely theyâd be crucified

 # Convert to str first, then encode and decode
 str(posts_texts[127:128]) 
 output: Tesla increases Model S Plaid prices by $10k hours ahead of first deliveries ð\x9f\x99\x84 How do they get away with it? If a traditional automaker did this surely theyâ\x80\x99d be crucified

 str(posts_texts[127:128]).encode('latin').decode('utf-8')
 output: Tesla increases Model S Plaid prices by $10k hours ahead of first deliveries 🙄 How do they get away with it? If a traditional automaker did this surely they’d be crucified
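For reference, the round trip described above can be sketched in plain Python. This is only a minimal sketch: `fix_mojibake` is a hypothetical helper name, and it assumes the damage really is UTF-8 bytes that were decoded as Latin-1, which is what the `\x9f\x99\x84` and `\x80\x99` sequences in the output suggest.

```python
def fix_mojibake(text):
    """Undo UTF-8 bytes that were wrongly decoded as Latin-1 (assumed cause)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text is not mojibake of this kind (or is only partly damaged):
        # return it unchanged rather than raising.
        return text


# Build a damaged sample the same way the damage is assumed to happen:
# encode to UTF-8, then decode those bytes as Latin-1.
clean = "first deliveries \U0001F644 surely they\u2019d be crucified"
broken = clean.encode("utf-8").decode("latin-1")

repaired = fix_mojibake(broken)
print(repaired == clean)
```

One likely reason the Series version shows no change is that `.str.encode` returns a new Series and leaves `posts_texts` untouched, so the separate `.str.decode` call never sees the encoded bytes and neither result is stored. Chaining the two calls and assigning the result, for example `posts_texts = posts_texts.str.encode('latin-1').str.decode('utf-8')`, or applying the helper with `posts_texts.apply(fix_mojibake)`, would be the Series equivalent of the str case, assuming every row holds a string.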