python - choosing the right type of data for machine learning

Question

I've been curious about machine learning, and I'm working through the scikit-learn stock market example (linked at the end of this question) to learn.

I was able to run the code without issue and generate the graph.

I would like to use a different data source. Currently they are using stock prices:

d1 = datetime.datetime(2003, 1, 1)
d2 = datetime.datetime(2008, 1, 1)

symbol_dict = {
        'TOT': 'Total',
        'XOM': 'Exxon',
        'CVX': 'Chevron',
        'COP': 'ConocoPhillips',
     ...
...
    }

symbols, names = np.array(symbol_dict.items()).T

quotes = [finance.quotes_historical_yahoo(symbol, d1, d2, asobject=True)
          for symbol in symbols]

open = np.array([q.open for q in quotes]).astype(np.float)
close = np.array([q.close for q in quotes]).astype(np.float)

What does quotes contain? I understand it holds the prices for each stock, but I am getting something like this:

[rec.array([ (datetime.date(2003, 1, 2), 2003, 1, 2, 731217.0, 28.12235692134198, 28.5, 28.564279672963064, 28.09825204398083, 12798800.0, 28.5), (datetime.date(2003, 1, 3), 2003, 1, 3, 731218.0, 28.329084507042257, 28.53, 28.634476056338034, 28.28890140845071, 9221900.0, 28.53), (datetime.date(2003, 1, 6), 2003, 1, 6, 731221.0, 28.482778999450247, 29.23, 29.406761957119297, 28.45064046179219, 11925100.0, 29.23), ...,

I would like to use my own dataset. Can you please give me an example of a dataset that I can use in place of quotes?

the entire code is here:

http://scikit-learn.org/dev/auto_examples/applications/plot_stock_market.html

score 2 · Accepted Answer

If you run finance.quotes_historical_yahoo? in IPython, it shows the function's documentation:

In [53]: finance.quotes_historical_yahoo?
Type:       function
String Form:<function quotes_historical_yahoo at 0x10f311d70>
File:       /Users/dvelkov/src/matplotlib/lib/matplotlib/finance.py
Definition: finance.quotes_historical_yahoo(ticker, date1, date2, asobject=False, adjusted=True, cachename=None)
Docstring:
Get historical data for ticker between date1 and date2.  date1 and
date2 are datetime instances or (year, month, day) sequences.

See :func:`parse_yahoo_historical` for explanation of output formats
and the *asobject* and *adjusted* kwargs.

...(more stuff)

So we check parse_yahoo_historical:

In [54]: finance.parse_yahoo_historical?
Type:       function
String Form:<function parse_yahoo_historical at 0x10f996ed8>
File:       /Users/dvelkov/src/matplotlib/lib/matplotlib/finance.py
Definition: finance.parse_yahoo_historical(fh, adjusted=True, asobject=False)
Docstring:
Parse the historical data in file handle fh from yahoo finance.

*adjusted*
  If True (default) replace open, close, high, and low prices with
  their adjusted values. The adjustment is by a scale factor, S =
  adjusted_close/close. Adjusted prices are actual prices
  multiplied by S.

  Volume is not adjusted as it is already backward split adjusted
  by Yahoo. If you want to compute dollars traded, multiply volume
  by the adjusted close, regardless of whether you choose adjusted
  = True|False.


*asobject*
  If False (default for compatibility with earlier versions)
  return a list of tuples containing

    d, open, close, high, low, volume

  If None (preferred alternative to False), return
  a 2-D ndarray corresponding to the list of tuples.

  Otherwise return a numpy recarray with

    date, year, month, day, d, open, close, high, low,
    volume, adjusted_close

  where d is a floating point representation of date,
  as returned by date2num, and date is a python standard
  library datetime.date instance.

  The name of this kwarg is a historical artifact.  Formerly,
  True returned a cbook Bunch
  holding 1-D ndarrays.  The behavior of a numpy recarray is
  very similar to the Bunch.

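In your case you are passing asobject=True, so each element of quotes is a numpy recarray with the fields date, year, month, day, d, open, close, high, low, volume, adjusted_close, in that order. Your code only reads the open and close fields, so a replacement dataset needs to be a list of record arrays (one per series) that expose at least those two fields.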
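
As a rough sketch of what such a replacement could look like, the snippet below builds two small record arrays by hand with np.rec.fromarrays and then extracts open and close the same way the question's code does. The names series_a, series_b and my_quotes and all of the numbers are made up for illustration, and only three of the eleven fields are included, on the assumption that the rest of your code does not need the others. Every series must have the same number of rows for the final 2-D arrays to form.

```python
import numpy as np

# Hypothetical data: three trading days shared by both series.
dates = np.array(["2003-01-02", "2003-01-03", "2003-01-06"],
                 dtype="datetime64[D]")

# One record array per series, with named fields like the Yahoo recarrays.
series_a = np.rec.fromarrays(
    [dates, np.array([28.12, 28.33, 28.48]), np.array([28.50, 28.53, 29.23])],
    names="date,open,close",
)
series_b = np.rec.fromarrays(
    [dates, np.array([10.00, 10.20, 10.10]), np.array([10.25, 10.05, 10.40])],
    names="date,open,close",
)

# Stand-in for the list returned by the quotes list comprehension.
my_quotes = [series_a, series_b]

# Same extraction as in the question: one row per series, one column per day.
open_prices = np.array([q.open for q in my_quotes]).astype(float)
close_prices = np.array([q.close for q in my_quotes]).astype(float)

print(open_prices.shape)   # (number of series, number of days)
print(close_prices[0])     # closing prices of the first series
```

If your data lives in a CSV file or a database, the same idea applies: load each series into arrays of equal length and wrap them in a record array with open and close fields.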
