我正在阅读从 CSV 文件复制的文本文件。当我在 python 中读取文件时,我得到了大量不必要的重复行,如下所示。我怎样才能去掉这三个不需要的行,包括每个文本开头和结尾的 \cf0 和 \cell\row?
还是我应该直接从 csv 文件本身读取文本?该文本仅位于 CSV 文件的一列中。
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 i have been using your product and it has been helping me a lot to solve business problem,\cell \row
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 I am very happy with your products. Very easy to use.\cell \row
\itap1\trowd \taflags1 \trgaph108\trleft-108 \trbrdrl\brdrnil \trbrdrr\brdrnil
\clvertalc \clshdrawnil \clbrdrt\brdrs\brdrw20\brdrcf2 \clbrdrl\brdrs\brdrw20\brdrcf2 \clbrdrb\brdrs\brdrw20\brdrcf2 \clbrdrr\brdrs\brdrw20\brdrcf2 \clpadl100 \clpadr100 \gaph\cellx8640
\pard\intbl\itap1\pardeftab720
\cf0 Many improvements with income tracker, and other time saving elements. Newer look, easier navigation. I believe there definitely is a time savings from past versions.\cell \row
这是 csv 文件的片段:
page_url Review_title Product_id Rating Publish_date Review_Description
www.blabla.com Great! 777777 5 01/01/14 Excellent upgrade! Was not disappointed!
我只从 Review_Description 列复制文本并将它们全部粘贴到文本文件中。
这是我的python代码来读取文件:
text_file=open("my_text.txt", "r")
lines=text_file.readlines()
print lines