
Scatter plots made from x,y coordinates in Matplotlib differ from those obtained with other programs. For example, here are the results of a PCA on the two fit scores. The same graph made with R from the same data is displayed differently… I also checked with Excel and LibreOffice: they give the same display as R. Before roaring against Matplotlib or reporting a bug, I would like to get other opinions and check whether I did things correctly. What am I doing wrong?

I checked that floats were not the problem and that the coordinates are in the same order,… So, the plot with R:

mydata = read.csv("C:/Users/Anon/Desktop/data.txt")  # read csv file
summary(mydata)
attach(mydata) 
plot(mydata)

[Image: scatter plot made by R]

Same data plotted with Matplotlib:

import matplotlib.pyplot as mpl
import numpy as np
import os
# open the file with PCA results and convert it into float
file_data = os.getcwd() + "\\data.txt"
F = open(file_data, 'r')
DATA=F.readlines()
F.close()
for x in range(len(DATA)) :
    a = DATA[x]
    b = a.split(',')
    DATA[x] = b
for i in xrange(len(DATA)):
    for j in xrange(len(DATA[i])):
        DATA[i][j] = float(DATA[i][j])
print DATA[0]
X_train = np.mat(DATA)
print "X_train\n",X_train

mpl.scatter(X_train[:, 0], X_train[:, 1], c='white')
mpl.show()

[Image: scatter plot made by Matplotlib, with the printed X_train so you can verify that the data are the same]

With Excel:

[Image: scatter plot made by Excel]

data: (I cannot post all of the data; please tell me how to attach the *.txt file, ~40.5 KB)

0.02753547770433    -0.037999362802379
0.05179194064903    0.0257492713593311
-0.0272928319004863 0.0065143681863637
0.0891355504379135  -0.00801696955147688
0.0946809371499167  -0.00502202338807476
-0.0445799941736001 -0.0435759273767196
-0.333617999778119  -0.204222004815357
-0.127212025425053  -0.110264460064754
-0.0243459270896855 -0.0622273166478512
0.0497080821876597  0.0272080474151131
-0.181221703468915  -0.134945934382777
-0.0699503258694739 -0.0835239795690277

edit: I exported the PCA data (from scipy) to a text file and opened that same text file with Python/Matplotlib and with R, to rule out problems related to the PCA itself. The plots were made after that step (and the graph before PCA looks like a dome).

edit2: using numpy.loadtxt(), the plot looks like the one from R, but my custom method and numpy.loadtxt() give the same data shape, size, type and values, so what is the mechanism involved?

X_train numpy.loadtxt()
[[ 0.02753548 -0.03799936]
 [ 0.05179194  0.02574927]
 [-0.02729283  0.00651437]
 ..., 
 [ 0.02670961 -0.00696177]
 [ 0.09011859 -0.00661216]
 [-0.04406559  0.09285291]] 
shape and size
(1039L, 2L) 2078

X_train custom-method
[[ 0.02753548 -0.03799936]
 [ 0.05179194  0.02574927]
 [-0.02729283  0.00651437]
 ..., 
 [ 0.02670961 -0.00696177]
 [ 0.09011859 -0.00661216]
 [-0.04406559  0.09285291]] 
shape and size
(1039L, 2L) 2078

2 Answers


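The problem is that you are representing X_train as a matrix rather than as a 2-D array. This means that when you subset it with X_train[:, 0], you do not get a 1-D array; you get a matrix with one column (which matplotlib then tries to scatter). You can see this for yourself by printing X_train[:, 0].*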

You can fix the problem simply by changing the line:

X_train = np.mat(DATA)

to

X_train = np.array(DATA)

*For example, with the data you posted, X_train[:, 0] is:

[[ 0.02753548]
 [ 0.05179194]
 [-0.02729283]
 [ 0.08913555]
 [ 0.09468094]
 [-0.04457999]
 [-0.333618  ]
 [-0.12721203]
 [-0.02434593]
 [ 0.04970808]
 [-0.1812217 ]
 [-0.06995033]]
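
To make the difference concrete, here is a minimal sketch that compares the two slices on the first three rows of the posted data. It is an illustration only, not the asker's script: the names rows, as_matrix, as_array, col_matrix, col_array and flat are made up for the example, and np.asmatrix is used in place of np.mat because, as far as I know, np.mat was an alias of np.asmatrix in older NumPy releases and is no longer available in NumPy 2.x.

```python
import numpy as np

# First three rows of the posted data, used here only for illustration.
rows = [
    [0.02753547770433, -0.037999362802379],
    [0.05179194064903, 0.0257492713593311],
    [-0.0272928319004863, 0.0065143681863637],
]

as_matrix = np.asmatrix(rows)  # same kind of object that np.mat(DATA) produced
as_array = np.array(rows)      # plain 2-D ndarray

col_matrix = as_matrix[:, 0]   # still 2-D: a matrix with a single column
col_array = as_array[:, 0]     # 1-D array of x values

print(type(col_matrix).__name__, col_matrix.shape)
print(type(col_array).__name__, col_array.shape)

# If the matrix has to be kept for other reasons, the column can be
# flattened to 1-D before it is handed to a plotting call.
flat = np.asarray(col_matrix).ravel()
print(flat.shape)
```

Slicing a matrix always returns another 2-D matrix, which is why the shape and values of the full X_train look identical in both cases while the column slices do not.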
answered 2013-04-18T19:00:15.020

It seems to me that the problem is in the code that reads in the array. You are getting the wrong dimensions. Try using numpy.loadtxt instead. http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
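
A rough sketch of what that could look like, assuming the file is comma-separated (the question's own code splits each line on ','). An in-memory io.StringIO buffer holding the first three posted rows stands in for data.txt so the snippet runs on its own; the name sample is made up for the example.

```python
import io
import numpy as np

# Stand-in for data.txt: the first three posted rows, comma-separated
# (an assumption based on the split(',') call in the question).
sample = io.StringIO(
    "0.02753547770433,-0.037999362802379\n"
    "0.05179194064903,0.0257492713593311\n"
    "-0.0272928319004863,0.0065143681863637\n"
)

# loadtxt returns a plain ndarray, so column slices come out 1-D.
X_train = np.loadtxt(sample, delimiter=",")

x = X_train[:, 0]
y = X_train[:, 1]
print(type(X_train).__name__, X_train.shape, x.shape, y.shape)
```

With a real file, the path would be passed in place of sample. The 1-D column slices are the form a scatter call expects, which is consistent with the question's edit2, where the numpy.loadtxt() version displayed like R.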

answered 2013-04-18T18:16:29.907