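I am a Python newbie coming from the Java world.
I am trying to write a simple Python function that prints only the data rows of a CSV or "arff" file. Non-data rows start with one of these 3 patterns: @, [@, [%, and such rows should not be printed.
Sample data file fragment: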
% 1. Title: Iris Plants Database
%
% 2. Sources:
% (a) Creator: R.A. Fisher
% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
% (c) Date: July, 1988
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
Python script:
import csv

def loadCSVfile (path):
    csvData = open(path, 'rb')
    spamreader = csv.reader(csvData, delimiter=',', quotechar='|')
    for row in spamreader:
        if row.__len__ > 0:
            #search the string from index 0 to 2 and if these substrings(@ ,'[\'%' , '[\'@') are not found, than print the row
            if (str(row).find('@',0,1) & str(row).find('[\'%',0,2) & str(row).find('[\'@',0,2) != 1):
                print str(row)

loadCSVfile('C:/Users/anaim/Desktop/Data Mining/OneR/iris.arff')
Actual output:
['% 1. Title: Iris Plants Database']
['% ']
['% 2. Sources:']
['% (a) Creator: R.A. Fisher']
['% (b) Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)']
['% (c) Date: July', ' 1988']
['% ']
[]
['@RELATION iris']
[]
['@ATTRIBUTE sepallength\tREAL']
['@ATTRIBUTE sepalwidth \tREAL']
['@ATTRIBUTE petallength \tREAL']
['@ATTRIBUTE petalwidth\tREAL']
['@ATTRIBUTE class \t{Iris-setosa', 'Iris-versicolor', 'Iris-virginica}']
[]
['@DATA']
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']
Expected output:
['5.1', '3.5', '1.4', '0.2', 'Iris-setosa']
['4.9', '3.0', '1.4', '0.2', 'Iris-setosa']
['4.7', '3.2', '1.3', '0.2', 'Iris-setosa']
['4.6', '3.1', '1.5', '0.2', 'Iris-setosa']
['5.0', '3.6', '1.4', '0.2', 'Iris-setosa']
['5.4', '3.9', '1.7', '0.4', 'Iris-setosa']
['4.6', '3.4', '1.4', '0.3', 'Iris-setosa']
['5.0', '3.4', '1.5', '0.2', 'Iris-setosa']
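
For reference, here is a minimal sketch of the filtering I am after. It is written for Python 3, whereas my script above is Python 2. The helper name `iter_data_rows`, the in-memory sample, and the choice to test the raw line with `str.startswith` before it reaches `csv.reader` (instead of searching `str(row)`) are all assumptions for illustration, not part of my original script. It also leaves out the `quotechar='|'` argument.

```python
import csv
import io

def iter_data_rows(lines):
    """Return an iterator over parsed rows, skipping blank lines and
    lines whose first non-blank character is '%' or '@'."""
    # Hypothetical helper: filter the raw text lines first, then let
    # csv.reader split only the lines that survive the filter.
    data_lines = (
        line for line in lines
        if line.strip() and not line.lstrip().startswith(('%', '@'))
    )
    return csv.reader(data_lines, delimiter=',')

# Small in-memory stand-in for the .arff file (assumed sample, shortened).
sample = io.StringIO(
    "% 1. Title: Iris Plants Database\n"
    "\n"
    "@RELATION iris\n"
    "@DATA\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
)

rows = list(iter_data_rows(sample))
for row in rows:
    print(row)
```

With a real file, the same helper could presumably be fed an open file object, for example one obtained from `open(path, newline='')`, since `csv.reader` accepts any iterable of text lines.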