A scatter plot of x,y coordinates in Matplotlib differs from the plots I get from other programs. For example, here are the results of a PCA on the two fit scores. The same graph made with R from the same data is displayed differently… I also checked with Excel and LibreOffice: they gave the same display as R. Before railing against Matplotlib or reporting a bug, I would like to get other opinions and check whether I did things correctly. What am I doing wrong?
I checked that float conversion was not the problem, checked that the coordinates are in the same order,… Here is the plot with R:
mydata = read.csv("C:/Users/Anon/Desktop/data.txt") # read csv file
summary(mydata)
attach(mydata)
plot(mydata)
scatter plot made by R
Same data plotted with Matplotlib:
import matplotlib.pyplot as mpl
import numpy as np
import os
# open the file with PCA results and convert it into float
file_data = os.getcwd() + "\\data.txt"
F = open(file_data, 'r')
DATA=F.readlines()
F.close()
for x in range(len(DATA)) :
    a = DATA[x]
    b = a.split(',')
    DATA[x] = b
for i in xrange(len(DATA)):
    for j in xrange(len(DATA[i])):
        DATA[i][j] = float(DATA[i][j])
print DATA[0]
X_train = np.mat(DATA)
print "X_train\n",X_train
mpl.scatter(X_train[:, 0], X_train[:, 1], c='white')
mpl.show()
And the results of printing X_train (so you can verify that the data are the same). With Excel:
Data (I cannot post all of it; please tell me how to attach the *.txt file, ~40.5 KB):
0.02753547770433 -0.037999362802379
0.05179194064903 0.0257492713593311
-0.0272928319004863 0.0065143681863637
0.0891355504379135 -0.00801696955147688
0.0946809371499167 -0.00502202338807476
-0.0445799941736001 -0.0435759273767196
-0.333617999778119 -0.204222004815357
-0.127212025425053 -0.110264460064754
-0.0243459270896855 -0.0622273166478512
0.0497080821876597 0.0272080474151131
-0.181221703468915 -0.134945934382777
-0.0699503258694739 -0.0835239795690277
edit: I had already exported the PCA data (from scipy) to a text file and opened this same text file with Python/Matplotlib and R, to rule out problems related to the PCA itself. The plots were made after that step (and the graph before PCA looks like a dome).
edit2: Using numpy.loadtxt(), the plot matches R's. But my custom method and numpy.loadtxt() produce data with the same shape, size, type and values, so what mechanism is involved?
X_train numpy.loadtxt()
[[ 0.02753548 -0.03799936]
[ 0.05179194 0.02574927]
[-0.02729283 0.00651437]
...,
[ 0.02670961 -0.00696177]
[ 0.09011859 -0.00661216]
[-0.04406559 0.09285291]]
shape and size
(1039L, 2L) 2078
X_train custom-method
[[ 0.02753548 -0.03799936]
[ 0.05179194 0.02574927]
[-0.02729283 0.00651437]
...,
[ 0.02670961 -0.00696177]
[ 0.09011859 -0.00661216]
[-0.04406559 0.09285291]]
shape and size
(1039L, 2L) 2078
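One thing the printed output above does not show is the container type and the shape of a single column. The following is a minimal sketch, not a confirmed diagnosis. It assumes a comma-separated file (as the custom parser's `split(',')` implies) and uses four rows copied from the data above in place of the full file. `np.asmatrix` stands in for `np.mat`, which is an alias of it in older NumPy versions.

```python
import io
import numpy as np

# Four rows taken from the data listed above; the comma delimiter is an
# assumption based on the split(',') call in the custom parser.
sample = """0.02753547770433,-0.037999362802379
0.05179194064903,0.0257492713593311
-0.0272928319004863,0.0065143681863637
0.0891355504379135,-0.00801696955147688
"""

# Custom method: split each line, convert to float, wrap in a matrix.
rows = [[float(v) for v in line.split(",")] for line in sample.splitlines()]
X_custom = np.asmatrix(rows)

# numpy.loadtxt returns a plain ndarray.
X_loadtxt = np.loadtxt(io.StringIO(sample), delimiter=",")

# Same values and same overall shape...
print(X_custom.shape, X_loadtxt.shape)
# ...but different container types,
print(type(X_custom).__name__, type(X_loadtxt).__name__)
# and a column slice keeps two dimensions only for the matrix.
print(X_custom[:, 0].shape, X_loadtxt[:, 0].shape)
```

With these inputs the matrix column has shape (4, 1) while the ndarray column has shape (4,). If the plots differ for that reason, passing `np.asarray(X_train[:, 0]).ravel()` and `np.asarray(X_train[:, 1]).ravel()` to `scatter` would hand it 1-D arrays in both cases; whether this accounts for the difference seen here is an assumption that would need checking against the full data.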