
Add ability to set the allowLargeResults option in BigQuery #10474 #11209


Closed
wants to merge 1 commit from the bq-allow-large-results branch

Conversation

@parthea (Contributor) commented Sep 30, 2015

  • Modify read_gbq() to allow users to redirect the query results to a destination table via the destination_table parameter
  • Modify read_gbq() to allow users to set the 'allowLargeResults' option in the BigQuery job configuration via the allow_large_results parameter (see the usage sketch below)

cc @aaront
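
For reference, a minimal sketch of how the two new parameters might be used. The query, project id, and table names below are hypothetical; only the destination_table and allow_large_results parameters come from this PR:

```python
# Usage sketch for the parameters proposed in this PR. The query, project id,
# and table names are hypothetical placeholders.
from pandas.io import gbq

df = gbq.read_gbq(
    "SELECT name, COUNT(*) AS n FROM [my_dataset.events] GROUP BY name",
    project_id="my-project",
    destination_table="my_dataset.event_counts",  # redirect results to this table
    allow_large_results=True,  # sets allowLargeResults in the BigQuery job config
)
```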

@parthea force-pushed the bq-allow-large-results branch from ac3cd4a to 19f910f on October 1, 2015 13:51
@parthea changed the title from "Add ability to set the 'allowLargeResults' option in Google BigQu…" to "Add ability to set the allowLargeResults option in BigQuery #10474" on Oct 1, 2015
@parthea force-pushed the bq-allow-large-results branch from 19f910f to ca84279 on October 2, 2015 10:21
@jreback (Contributor) commented Oct 2, 2015

this would be confusing to a user, as read_gbq should return a frame. Does it here?

@parthea (Contributor, Author) commented Oct 2, 2015

Yes, read_gbq() still returns a DataFrame with the query results regardless of whether these additional parameters are set. I've just tried the following scenarios:

  • In the first test, I set destination_table and confirmed that a destination table was created and a DataFrame was returned.
  • In the second test, I set destination_table and allow_large_results and confirmed that a destination table was created and a DataFrame was returned.

I will add unit tests now for the above-mentioned scenarios (I missed them the first time around).

All tests pass locally. Could this make it into the 0.17.0 release? I think it is a very useful feature.

```
tony@tonypc:~/pandas-parthea/pandas/io/tests$ nosetests test_gbq.py -v
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_bad_project_id (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_bad_table_name (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order_plus_index (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_download_dataset_larger_than_200k_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_index_column (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_malformed_query (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_redirect_query_results_to_destination_table_dataset_does_not_exist (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_redirect_query_results_to_destination_table_default (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_redirect_query_results_to_destination_table_if_table_exists_append (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_redirect_query_results_to_destination_table_if_table_exists_fail (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_redirect_query_results_to_destination_table_if_table_exists_replace (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_arbitrary_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_empty_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_false_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_timestamp_unix_epoch (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_true_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_unicode_string_conversion_and_normalization (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_zero_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_read_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_booleans_as_python_booleans (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_floats_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_integers_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_strings_as_python_strings (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_timestamps_as_numpy_datetime (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_that_parse_data_works_properly (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_should_fail_if_invalid_table_name_passed (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_create_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_create_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_dataset_does_not_exist (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_dataset_exists (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_delete_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_delete_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_generate_schema (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_google_upload_errors_should_raise_exception (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_dataset (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_table (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_list_table_zero_results (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_table_does_not_exist (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_append (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_fail (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
test_upload_data_if_table_exists_replace (pandas.io.tests.test_gbq.TestToGBQIntegration) ... ok
pandas.io.tests.test_gbq.test_requirements ... ok
pandas.io.tests.test_gbq.test_generate_bq_schema_deprecated ... ok

----------------------------------------------------------------------
Ran 59 tests in 379.762s

OK
```

@jreback (Contributor) commented Oct 2, 2015

this is bloating the API

if you are returning a frame, then simply use to_gbq and push it back up
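
A sketch of the round trip being suggested here: pull the frame down with read_gbq, then push it back up with to_gbq. The query and table names are hypothetical:

```python
# Round trip via the existing API: download the query results, then
# upload the frame to a BigQuery table. Names are hypothetical placeholders.
from pandas.io import gbq

df = gbq.read_gbq("SELECT * FROM [my_dataset.events]", project_id="my-project")
gbq.to_gbq(df, "my_dataset.events_copy", project_id="my-project", if_exists="fail")
```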

@parthea (Contributor, Author) commented Oct 2, 2015

I agree that it doesn't make sense to return the data in a DataFrame when a destination table is specified (since you could use to_gbq to push it back up). My preference would be to return an empty DataFrame when a destination table is specified, in order to avoid an unnecessary download and re-upload of data when users want to create smaller datasets from larger ones. The ability to run a query and send the results directly to a table (in an efficient manner) could be useful.

Regarding the allow_large_results parameter: per https://cloud.google.com/bigquery/quota-policy#queries, query results larger than 128 MB compressed require the 'allowLargeResults' option to be set in the job configuration, and one of the requirements for allowing large results is that you must specify a destination table (see the job configuration sketch below).
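
For reference, a hedged sketch of the query job configuration this corresponds to; the field names follow the BigQuery v2 REST API, while the project, dataset, and table values are hypothetical:

```python
# BigQuery v2 query job configuration with large results enabled.
# The table reference values are hypothetical placeholders.
job_data = {
    "configuration": {
        "query": {
            "query": "SELECT ...",
            "allowLargeResults": True,  # required for results > 128 MB compressed
            "destinationTable": {       # must be set when allowLargeResults is used
                "projectId": "my-project",
                "datasetId": "my_dataset",
                "tableId": "query_results",
            },
        }
    }
}
```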

@parthea (Contributor, Author) commented Oct 2, 2015

Another potential solution is to create a new function, gbq.query_to_table(), which does not return a DataFrame. gbq.query_to_table() would require a destination table to be specified and would support an allow_large_results parameter; a possible signature is sketched below.
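
A hypothetical sketch of what that signature could look like (this function does not exist in pandas; it only illustrates the proposal):

```python
# Hypothetical API sketch -- not an existing pandas function.
def query_to_table(query, project_id, destination_table,
                   allow_large_results=False, if_exists='fail'):
    """Run `query` and write its results directly to `destination_table`
    in BigQuery, without downloading them into a DataFrame."""
    ...
```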

@jreback (Contributor) commented Oct 2, 2015

@parthea I am not averse to these changes, but I would like 0.17.0 to be released and settle before considering API changes.

@jreback (Contributor) commented Oct 2, 2015

Further, crafting a nice, useful, non-duplicative API is actually tricky. You want to have a limited set of things that one could 'do' in an intuitive way. So one of the big issues is how to pass in options (e.g. allow_large_results, which is really a 'user' option).

@parthea (Contributor, Author) commented Oct 2, 2015

Do you think it would be better to close this pull request and instead request that this feature be supported in the odo project (assuming that odo will support gbq), since the odo project is aimed at data migration?

The functionality in this pull request could be similar to the following pull request in odo, which adds the ability to append query results to a table: blaze/odo#37
