python - Trouble reading the MovieLens 1M ratings.dat file with pandas

Question

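Hobbyist here, new to Python.

Hello, I'm working through Wes McKinney's book Python for Data Analysis. I've just started on the MovieLens 1M dataset, and for the life of me I cannot get my code to work for the ratings.dat file. It works for the movies.dat and users.dat files, but I keep getting an error for ratings.dat. I've downloaded copies of ratings.dat from both GitHub and movielens.org and get the same error. I've renamed the file and still get the same error. I've moved it to a different directory and still get the same error. I'm guessing I have some configuration problem?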

Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
%guiref   -> A brief reference about the graphical user interface.

Welcome to pylab, a matplotlib-based Python environment [backend: TkAgg].
For more information, type 'help(pylab)'.

import pandas as pd

rnames = ['user_id','movie_id','rating','timestamp']

ratings = pd.read_table('e:\ratings.dat',sep='',header=None,names=rnames)

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-1-5513dd9baafa> in <module>()
      3 rnames = ['user_id','movie_id','rating','timestamp']
      4 
----> 5 ratings = pd.read_table('e:\ratings.dat',sep='',header=None,names=rnames)
      6 

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, nrows, iterator, chunksize, verbose, encoding, squeeze)
    397                     buffer_lines=buffer_lines)
    398 
--> 399         return _read(filepath_or_buffer, kwds)
    400 
    401     parser_f.__name__ = name

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    206 
    207     # Create the parser.
--> 208     parser = TextFileReader(filepath_or_buffer, **kwds)
    209 
    210     if nrows is not None:

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    505             self.options['has_index_names'] = kwds['has_index_names']
    506 
--> 507         self._make_engine(self.engine)
    508 
    509     def _get_options_with_defaults(self, engine):

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
    607     def _make_engine(self, engine='c'):
    608         if engine == 'c':
--> 609             self._engine = CParserWrapper(self.f, **self.options)
    610         else:
    611             if engine == 'python':

E:\Python27_new\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
    888         # #2442
    889         kwds['allow_leading_cols'] = self.index_col is not False
--> 890         self._reader = _parser.TextReader(src, **kwds)
    891 
    892         # XXX

E:\Python27_new\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader.__cinit__ (pandas\src\parser.c:2771)()

E:\Python27_new\lib\site-packages\pandas\_parser.pyd in pandas._parser.TextReader._setup_parser_source (pandas\src\parser.c:4810)()

atings.dat does not exist

The last line of the error always cuts off the first part of the file name. As mentioned, the same code works for movies.dat and users.dat.

score 2 · Accepted Answer


Try escaping the backslash in your path: write e:\\ratings.dat instead of e:\ratings.dat

answered 2013-02-25T06:26:40.207

score 1 · Accepted Answer

You should write the path string as a raw string (note the r prefix before the opening quote):

ratings = pd.read_table(r'e:\ratings.dat', sep='', header=None, names=rnames)

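The reason the original fails is that \r has a special meaning inside an ordinary string literal (carriage return). The string Python builds therefore does not contain the path you typed, and the file cannot be found. In a raw string, backslashes are taken literally and no escape sequences are processed.
You can see the difference here: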

In [1]: print ('\r')


In [2]: print (r'\r')
\r
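
Because a terminal renders a carriage return by moving the cursor, printing is not the clearest way to inspect the string. A minimal sketch using repr and len, which do not depend on the terminal (the variable names are only for illustration). The carriage return is probably also why the error message appears to start at "atings.dat": the text after the \r is likely drawn over the beginning of the line.

```python
# Compare an ordinary string literal with a raw string literal for the same text.
plain = 'e:\ratings.dat'   # \r is parsed as ONE carriage-return character
raw = r'e:\ratings.dat'    # the backslash and the letter r stay separate

print(repr(plain))   # 'e:\ratings.dat'   (repr shows the carriage return as \r)
print(repr(raw))     # 'e:\\ratings.dat'  (repr shows the literal backslash doubled)

print(len(plain))    # 13
print(len(raw))      # 14

# The ordinary literal contains a carriage return; the raw one does not.
has_cr_plain = '\r' in plain
has_cr_raw = '\r' in raw
print(has_cr_plain, has_cr_raw)
```

The one-character difference in length is the backslash that the ordinary literal consumed while forming the escape sequence.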

Equivalently, you can escape the backslash by doubling it (\\), as @pravin suggested.
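
As a quick check that the two suggestions produce the same string, here is a small sketch. The forward-slash spelling and the ntpath.join call are additions for illustration and are not part of the answers above; ntpath is the standard-library module that implements Windows path rules, used here so the example behaves the same on any platform.

```python
import ntpath  # Windows flavour of os.path, importable on every platform

raw = r'e:\ratings.dat'       # raw string literal
escaped = 'e:\\ratings.dat'   # doubled backslash in an ordinary literal
forward = 'e:/ratings.dat'    # forward slashes avoid the escaping issue entirely

# Both spellings yield exactly the same characters.
same = (raw == escaped)
print(same)

# Building the path from parts avoids typing the separator by hand.
built = ntpath.join('e:\\', 'ratings.dat')
print(built == raw)

# Normalising the forward-slash form gives the same Windows path.
normalised = ntpath.normpath(forward)
print(normalised == raw)
```

Any of these forms can be passed as the path argument in place of 'e:\ratings.dat'.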
