docs/source/remote_data.rst (+32 -99: 32 additions & 99 deletions)
--- a/docs/source/remote_data.rst
+++ b/docs/source/remote_data.rst
@@ -1,55 +1,48 @@
 .. _remote_data:
 
-.. currentmodule:: pandas
+.. currentmodule:: pandas_datareader
 
 .. ipython:: python
    :suppress:
 
-   import os
-   import csv
    import pandas as pd
 
    import numpy as np
-   np.random.seed(123456)
-   randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
 
-   import matplotlib.pyplot as plt
-   plt.close('all')
+   pd.options.display.max_rows=15
 
-   from pandas import *
-   options.display.max_rows=15
-   import pandas.util.testing as tm
 
 ******************
 Remote Data Access
 ******************
 
 .. _remote_data.data_reader:
 
-Functions from :mod:`pandas.io.data` and :mod:`pandas.io.ga` extract data from various Internet sources into a DataFrame. Currently the following sources are supported:
+Functions from :mod:`pandas_datareader.data` and :mod:`pandas_datareader.wb`
+extract data from various Internet sources into a pandas DataFrame.
+Currently the following sources are supported:
 
 - :ref:`Yahoo! Finance<remote_data.yahoo>`
 - :ref:`Google Finance<remote_data.google>`
 - :ref:`St.Louis FED (FRED)<remote_data.fred>`
 - :ref:`Kenneth French's data library<remote_data.ff>`
 - :ref:`World Bank<remote_data.wb>`
-- :ref:`Google Analytics<remote_data.ga>`
 
 It should be noted, that various sources support different kinds of data, so not all sources implement the same methods and the data elements returned might also differ.
 
 .. _remote_data.yahoo:
 
 Yahoo! Finance
---------------
+==============
 
 .. ipython:: python
 
-   import pandas.io.data as web
+   import pandas_datareader.data as web
    import datetime
    start = datetime.datetime(2010, 1, 1)
    end = datetime.datetime(2013, 1, 27)
-   f=web.DataReader("F", 'yahoo', start, end)
+   f=web.DataReader("F", 'yahoo', start, end)
    f.ix['2010-01-04']
 
 .. _remote_data.yahoo_options:
@@ -66,7 +59,7 @@ to the specific option you want.
 
 .. ipython:: python
 
-   from pandas.io.data import Options
+   from pandas_datareader.data import Options
    aapl = Options('aapl', 'yahoo')
    data = aapl.get_all_data()
    data.iloc[0:5, 0:5]
@@ -113,69 +106,71 @@ The ``month`` and ``year`` parameters can be used to get all options data for a
Result sets are parsed into a pandas DataFrame with a shape and data types
-derived from the source table.
-
-Configuring Access to Google Analytics
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The first thing you need to do is to setup accesses to Google Analytics API. Follow the steps below:
-
-#. In the `Google Developers Console <https://console.developers.google.com>`__
-    #. enable the Analytics API
-    #. create a new project
-    #. create a new Client ID for an "Installed Application" (in the "APIs & auth / Credentials section" of the newly created project)
-    #. download it (JSON file)
-#. On your machine
-    #. rename it to ``client_secrets.json``
-    #. move it to the ``pandas/io`` module directory
-
-The first time you use the :func:`read_ga` funtion, a browser window will open to ask you to authentify to the Google API. Do proceed.
-
-Using the Google Analytics API
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The following will fetch users and pageviews (metrics) data per day of the week, for the first semester of 2014, from a particular property.
-
-.. code-block:: python
-
-   import pandas.io.ga as ga
-   ga.read_ga(
-       account_id="2360420",
-       profile_id="19462946",
-       property_id="UA-2360420-5",
-       metrics= ['users', 'pageviews'],
-       dimensions= ['dayOfWeek'],
-       start_date="2014-01-01",
-       end_date="2014-08-01",
-       index_col=0,
-       filters="pagePath=~aboutus;ga:country==France",
-   )
-
-The only mandatory arguments are ``metrics,`` ``dimensions`` and ``start_date``. We can only strongly recommend you to always specify the ``account_id``, ``profile_id`` and ``property_id`` to avoid accessing the wrong data bucket in Google Analytics.
-
-The ``index_col`` argument indicates which dimension(s) has to be taken as index.
-
-The ``filters`` argument indicates the filtering to apply to the query. In the above example, the page has URL has to contain ``aboutus`` AND the visitors country has to be France.
-
-Detailed informations in the followings:
-
-* `pandas & google analytics, by yhat <http://blog.yhathq.com/posts/pandas-google-analytics.html>`__
-* `Google Analytics integration in pandas, by Chang She <http://quantabee.wordpress.com/2012/12/17/google-analytics-pandas/>`__
-* `Google Analytics Dimensions and Metrics Reference <https://developers.google.com/analytics/devguides/reporting/core/dimsmets>`_
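
Note on the import change in this diff: code that has to run against both the old ``pandas.io.data`` location and the new ``pandas_datareader.data`` location could try each in turn. The following is only a minimal sketch under that assumption; ``first_importable`` is a hypothetical helper written for illustration and is not part of pandas or pandas-datareader.

```python
import importlib


def first_importable(candidates):
    """Return the first module in `candidates` that imports, or None.

    Hypothetical helper for illustration only; it is not part of
    pandas or pandas-datareader.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError,
            # so a missing package simply moves on to the next name.
            continue
    return None
```

A caller might then write ``web = first_importable(["pandas_datareader.data", "pandas.io.data"])`` and check for ``None`` before calling ``web.DataReader``. Listing the new location first means it is preferred whenever both are present.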