GBQ: Updated Documentation, and added method to generic.py #5179
Merged
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -8,7 +8,7 @@ enhancements along with a large number of bug fixes. | |
|
||
Highlights include support for a new index type ``Float64Index``, support for new methods of interpolation, updated ``timedelta`` operations, and a new string manipulation method ``extract``. | ||
Several experimental features are added, including new ``eval/query`` methods for expression evaluation, support for ``msgpack`` serialization, | ||
and an io interface to google's ``BigQuery``. | ||
and an io interface to Google's ``BigQuery``. | ||
|
||
.. warning:: | ||
|
||
|
@@ -648,6 +648,69 @@ Experimental | |
|
||
os.remove('foo.msg') | ||
|
||
- ``pandas.io.gbq`` provides a simple way to extract from, and load data into, | ||
Google's BigQuery Data Sets by way of pandas DataFrames. BigQuery is a high | ||
performance SQL-like database service, useful for performing ad-hoc queries | ||
against extremely large datasets. :ref:`See the docs<io.gbq>` | ||
|
||
.. code-block:: python | ||
|
||
import pandas | ||
from pandas.io import gbq | ||
|
||
# A query to select the average monthly temperatures in the | ||
# year 2000 across the USA. The dataset, | ||
# publicdata:samples.gsod, is available on all BigQuery accounts, | ||
# and is based on NOAA gsod data. | ||
|
||
query = """SELECT station_number as STATION, | ||
month as MONTH, AVG(mean_temp) as MEAN_TEMP | ||
FROM publicdata:samples.gsod | ||
WHERE YEAR = 2000 | ||
GROUP BY STATION, MONTH | ||
ORDER BY STATION, MONTH ASC""" | ||
|
||
# Fetch the result set for this query | ||
|
||
# Your Google BigQuery Project ID | ||
# To find this, see your dashboard: | ||
# https://code.google.com/apis/console/b/0/?noredirect | ||
projectid = "xxxxxxxxx" | ||
|
||
df = gbq.read_gbq(query, project_id=projectid) | ||
|
||
# Use pandas to process and reshape the dataset | ||
|
||
df2 = df.pivot(index='STATION', columns='MONTH', values='MEAN_TEMP') | ||
df3 = pandas.concat([df2.min(), df2.mean(), df2.max()], | ||
axis=1, keys=["Min Temp", "Mean Temp", "Max Temp"]) | ||
|
||
The resulting DataFrame is: | ||
|
||
``` | ||
Min Temp Mean Temp Max Temp | ||
MONTH | ||
1 -53.336667 39.827892 89.770968 | ||
2 -49.837500 43.685219 93.437932 | ||
3 -77.926087 48.708355 96.099998 | ||
4 -82.892858 55.070087 97.317240 | ||
5 -92.378261 61.428117 102.042856 | ||
6 -77.703334 65.858888 102.900000 | ||
7 -87.821428 68.169663 106.510714 | ||
8 -89.431999 68.614215 105.500000 | ||
9 -86.611112 63.436935 107.142856 | ||
10 -78.209677 56.880838 92.103333 | ||
11 -50.125000 48.861228 94.996428 | ||
12 -50.332258 42.286879 94.396774 | ||
``` | ||
.. warning:: | ||
|
||
To use this module, you will need a BigQuery account. See | ||
<https://cloud.google.com/products/big-query> for details. | ||
|
||
As of 10/10/13, there is a bug in Google's API preventing result sets | ||
Review comment: you can put this in the main docs (the warning about the result sets). |
||
from being larger than 100,000 rows. A patch is scheduled for the week of | ||
10/14/13. | ||
|
||
.. _whatsnew_0130.refactoring: | ||
|
||
Internal Refactoring | ||
|
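The reshaping step in the diff above (pivot, then a column-wise `concat`) can be tried without a BigQuery account. The following is a minimal sketch, assuming a small made-up DataFrame in place of the `read_gbq` result; the station numbers and temperatures are hypothetical and only the column names come from the query in the diff.

```python
import pandas as pd

# Hypothetical stand-in for the query result: one row per (STATION, MONTH).
# The values are invented for illustration, not taken from the gsod dataset.
df = pd.DataFrame({
    "STATION": [1, 1, 2, 2, 3, 3],
    "MONTH": [1, 2, 1, 2, 1, 2],
    "MEAN_TEMP": [30.0, 35.0, 40.0, 45.0, 50.0, 55.0],
})

# Wide table: one row per station, one column per month.
by_station = df.pivot(index="STATION", columns="MONTH", values="MEAN_TEMP")

# min/mean/max are taken over stations, giving one Series per statistic
# indexed by MONTH; concat with axis=1 lines them up as labelled columns.
summary = pd.concat(
    [by_station.min(), by_station.mean(), by_station.max()],
    axis=1,
    keys=["Min Temp", "Mean Temp", "Max Temp"],
)
print(summary)
```

Passing `keys` to `concat` is what names the three resulting columns, so the labels live in one place rather than in a separate rename step.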
When rebasing, you will have to add
.. currentmodule:: pandas
after this line, because most functions in api.rst are now documented from top-level pandas (#5208).