BUG: Unhandled ValueError when Bigquery called through io.gbq returns zero rows #10273 #10274


Closed · wants to merge 1 commit
Conversation

ssaumitra

closes #10273

@@ -296,6 +296,12 @@ def test_download_dataset_larger_than_200k_rows(self):
df = gbq.read_gbq("SELECT id FROM [publicdata:samples.wikipedia] GROUP EACH BY id ORDER BY id ASC LIMIT 200005", project_id=PROJECT_ID)
self.assertEqual(len(df.drop_duplicates()), 200005)

def test_zero_rows(self):
df = gbq.read_gbq("SELECT * FROM [publicdata:samples.wikipedia] where timestamp=-9999999", project_id=PROJECT_ID)
Contributor

pls add the issue number here

compare the resultant dataframe with a constructed one and use
assert_frame_equal(result,expected)

@ssaumitra
Author

Issue number in the form of a comment? Also, why is

assert_frame_equal

better for checking an empty dataframe? I am asking just for my understanding.

@jreback
Contributor

jreback commented Jun 4, 2015

oh, because the frame is not empty: it has column names (or should have) and an index (which has 0 length). You are guaranteeing a certain return type and metadata to the user (e.g. a query that returns rows has these things, so an empty one should as well).
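The reviewer's suggestion can be sketched as follows; the column names here are hypothetical placeholders, not the actual `publicdata:samples.wikipedia` schema:

```python
import pandas as pd
from pandas.testing import assert_frame_equal  # pandas.util.testing in 2015-era pandas

# Hypothetical zero-row query result: no rows, but the schema's column
# names (and a zero-length index) should still be present.
result = pd.DataFrame(columns=["id", "title"])

# Construct the expected empty frame explicitly. assert_frame_equal checks
# the column names, dtypes, and index, not just that len(result) == 0.
expected = pd.DataFrame(columns=["id", "title"])
assert_frame_equal(result, expected)
```

Checking only `len(df) == 0` would pass even if the column metadata were lost, which is exactly the guarantee being discussed here.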

@ssaumitra
Author

OK. Here it is.

@jreback jreback added this to the 0.16.2 milestone Jun 4, 2015
@jreback
Contributor

jreback commented Jun 4, 2015

ok, pls add a release note in the whatsnew for 0.16.2.

pls squash as well.

@ssaumitra
Author

Can you check in the base template for doc/source/whatsnew/v0.16.2.txt please? Or should I copy it from 5ebf521?

@jreback
Contributor

jreback commented Jun 4, 2015

rebase on master. It's already there.

@ssaumitra
Author

Added release note. Integrated with master.

@jreback
Contributor

jreback commented Jun 5, 2015

See contributing docs here

pls squash.

@ssaumitra
Author

I am checking in the squashed commit in a few minutes.

What about the docs? I checked in the release note in dadf5c2. Is it missing something?
I read the documentation you mentioned above but could not identify the specific problem.

@ssaumitra
Author

Squashed commit submitted.

@@ -279,7 +279,7 @@ def _parse_data(schema, rows):
field_type)
page_array[row_num][col_num] = field_value

return DataFrame(page_array)
return DataFrame(page_array, columns=col_names)
Contributor
this should not be necessary; page_array is already a record array (this will just reindex it, and copy it).
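For the zero-row case itself, passing `columns=` is what keeps the schema visible; a minimal sketch, where `col_names` is a stand-in for the names parsed from the BigQuery schema:

```python
import pandas as pd

rows = []                      # zero rows came back from the query
col_names = ["id", "title"]    # stand-in for the parsed BigQuery schema

# Without columns=, an empty input yields a frame with no columns at all ...
assert list(pd.DataFrame(rows).columns) == []

# ... while columns= preserves the column names even with zero rows.
df = pd.DataFrame(rows, columns=col_names)
assert list(df.columns) == ["id", "title"]
assert len(df) == 0
```

(When page_array really is a NumPy record array, the dtype already carries the names, which is the reviewer's point above; the plain-list case here is only an illustration of what the empty path loses without `columns=`.)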

@jreback
Contributor

jreback commented Jun 5, 2015

cc @jacobschaer

can you test this out and lmk?

@jreback
Contributor

jreback commented Jun 9, 2015

cc @jacobschaer
cc @sean-schaefer

@jreback
Contributor

jreback commented Jun 10, 2015

@ssaumitra

can you show

nosetests pandas/io/tests/test_gbq.py -v

on your system (as this is not tested with actual credentials on travis).

@ssaumitra
Author

@jreback I am away from work. I will upload the output next week as soon as I can.

@jreback jreback modified the milestones: 0.16.2, 0.17.0 Jun 11, 2015
@ssaumitra
Author

The test output is as follows. I am also updating the documentation file to mark the change in 0.17, not in 0.16.2.

$ nosetests pandas/io/tests/test_gbq.py -v
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... ok
test_bad_project_id (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_bad_table_name (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_column_order_plus_index (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_download_dataset_larger_than_200k_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_index_column (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_malformed_query (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_arbitrary_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_empty_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_false_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_timestamp_unix_epoch (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_true_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_unicode_string_conversion_and_normalization (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_zero_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... ok
test_read_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_booleans_as_python_booleans (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_floats_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_integers_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_strings_as_python_strings (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_timestamps_as_numpy_datetime (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_that_parse_data_works_properly (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_should_fail_if_invalid_table_name_passed (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_generate_bq_schema (pandas.io.tests.test_gbq.TestToGBQIntegration) ... SKIP: Cannot run to_gbq tests without bq command line client
test_google_upload_errors_should_raise_exception (pandas.io.tests.test_gbq.TestToGBQIntegration) ... SKIP: Cannot run to_gbq tests without bq command line client
test_upload_data (pandas.io.tests.test_gbq.TestToGBQIntegration) ... SKIP: Cannot run to_gbq tests without bq command line client
pandas.io.tests.test_gbq.test_requirements ... ok

----------------------------------------------------------------------
Ran 40 tests in 51.413s

OK (SKIP=3)

@ssaumitra
Author

@jreback Any news?

@ssaumitra
Author

@jreback I have merged the latest changes. Please let me know whether any changes are needed in this commit. I am available to make changes next week, but I will be away for a few weeks after that.

@jreback
Contributor

jreback commented Aug 15, 2015

cc @jacobschaer
cc @sean-schaefer

@ssaumitra can you rebase.

@ssaumitra
Author

@jreback rebase done.

@@ -669,3 +668,4 @@ Bug Fixes
- Bug in ``PeriodIndex.order`` reset freq (:issue:`10295`)
- Bug in ``iloc`` allowing memory outside bounds of a Series to be accessed with negative integers (:issue:`10779`)
- Bug preventing access to the first index when using ``iloc`` with a list containing the appropriate negative integer (:issue:`10547`, :issue:`10779`)
- Bug where ``io.gbq`` throws ValueError when Bigquery returns zero rows (:issue:`10273`)
Contributor
use double backticks around ValueError. say pd.read_gbq instead of Bigquery

Author
IMO, following would be the better replacement

Bug where ``pd.read_gbq`` throws ``ValueError`` when Bigquery returns zero rows (:issue:`10273`)

because the exception is thrown when the Google Bigquery REST API returns zero rows, not by the pandas function pd.read_gbq.
Does that look good?

Contributor
When they use pandas, they use pd.read_gbq; "Bigquery" in a release note about this is not obvious to the casual reader. But that is fine.

Author
OK, then adding the line as per my last comment.

@ssaumitra
Author

@jreback @sean-schaefer I will be away from work from next week, so I won't be able to respond. It would be great if we could wrap this up this week. Is any more input required from my side?

@jreback
Contributor

jreback commented Aug 20, 2015

this looks fine. Ideally I'd like

cc @jacobschaer
cc @sean-schaefer

to give it a test

@jacobschaer
Contributor

Looked fine, thanks. All tests passed when I ran it:

Successfully installed numpy pandas
Cleaning up...
+ pip freeze
Cython==0.23.1
argparse==1.2.1
bigquery==2.0.17
ez-setup==0.9
google-api-python-client==1.2
google-apputils==0.4.2
httplib2==0.9.1
nose==1.3.7
numpy==1.9.2
oauth2client==1.2
-e git+https://github.com/ssaumitra/pandas.git@cf6025e6ccd2a4bf79fe0b85e852bc3fe0ef50ff#egg=pandas-origin/bugfix-bigquery
pyasn1==0.1.8
pyasn1-modules==0.0.7
python-dateutil==2.4.2
python-gflags==2.0
pytz==2015.4
rsa==3.2
simplejson==3.8.0
six==1.9.0
uritemplate==0.6
wsgiref==0.1.2
+ python pandas/io/tests/test_gbq.py

nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
test_should_be_able_to_get_a_bigquery_service (__main__.TestGBQConnectorIntegration) ... ok

test_should_be_able_to_get_results_from_query (__main__.TestGBQConnectorIntegration) ... ok

test_should_be_able_to_get_schema_from_query (__main__.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_get_valid_credentials (__main__.TestGBQConnectorIntegration) ... ok
test_should_be_able_to_make_a_connector (__main__.TestGBQConnectorIntegration) ... ok

test_bad_project_id (__main__.TestReadGBQIntegration) ... ok
test_bad_table_name (__main__.TestReadGBQIntegration) ... ok

test_column_order (__main__.TestReadGBQIntegration) ... ok

test_column_order_plus_index (__main__.TestReadGBQIntegration) ... ok

test_download_dataset_larger_than_200k_rows (__main__.TestReadGBQIntegration) ... ok

test_index_column (__main__.TestReadGBQIntegration) ... ok
test_malformed_query (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_arbitrary_timestamp (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_empty_strings (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_false_boolean (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_null_boolean (__main__.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_floats (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_null_integers (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_null_strings (__main__.TestReadGBQIntegration) ... ok
test_should_properly_handle_null_timestamp (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_timestamp_unix_epoch (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_true_boolean (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_valid_floats (__main__.TestReadGBQIntegration) ... ok
test_should_properly_handle_valid_integers (__main__.TestReadGBQIntegration) ... ok

test_should_properly_handle_valid_strings (__main__.TestReadGBQIntegration) ... ok

test_unicode_string_conversion_and_normalization (__main__.TestReadGBQIntegration) ... ok

test_zero_rows (__main__.TestReadGBQIntegration) ... ok
test_read_gbq_with_no_project_id_given_should_fail (__main__.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_booleans_as_python_booleans (__main__.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_floats_as_python_floats (__main__.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_integers_as_python_floats (__main__.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_strings_as_python_strings (__main__.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_timestamps_as_numpy_datetime (__main__.TestReadGBQUnitTests) ... ok
test_that_parse_data_works_properly (__main__.TestReadGBQUnitTests) ... ok
test_to_gbq_should_fail_if_invalid_table_name_passed (__main__.TestReadGBQUnitTests) ... ok
test_to_gbq_with_no_project_id_given_should_fail (__main__.TestReadGBQUnitTests) ... ok

Dataset 'serene-epsilon-769:pydata_pandas_bq_testing' successfully created.

Table 'serene-epsilon-769:pydata_pandas_bq_testing.new_test' successfully created.
test_generate_bq_schema (__main__.TestToGBQIntegration) ... 
ok
test_google_upload_errors_should_raise_exception (__main__.TestToGBQIntegration) ...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...
Job not yet complete...



Streaming Insert is 100% Complete
test_upload_data (__main__.TestToGBQIntegration) ... ok
__main__.test_requirements ... ok

----------------------------------------------------------------------
Ran 40 tests in 565.830s

OK
Job not yet complete...
[Output Exception BugFix] $ /bin/sh -xe /tmp/hudson8412146074501557652.sh
+ cp bigquery_credentials.dat /var/lib/jenkins/bigquery_credentials.dat
Finished: SUCCESS

@jreback
Contributor

jreback commented Aug 29, 2015

@jacobschaer gr8 thanks!

@jreback
Contributor

jreback commented Aug 29, 2015

merged via 53a6830

thanks!

@jreback jreback closed this Aug 29, 2015
@ssaumitra
Author

Thanks all :)
