BUG: Unhandled ValueError when Bigquery called through io.gbq returns zero rows #10273 #10274
Conversation
@@ -296,6 +296,12 @@ def test_download_dataset_larger_than_200k_rows(self):
        df = gbq.read_gbq("SELECT id FROM [publicdata:samples.wikipedia] GROUP EACH BY id ORDER BY id ASC LIMIT 200005", project_id=PROJECT_ID)
        self.assertEqual(len(df.drop_duplicates()), 200005)

+    def test_zero_rows(self):
+        df = gbq.read_gbq("SELECT * FROM [publicdata:samples.wikipedia] where timestamp=-9999999", project_id=PROJECT_ID)
pls add the issue number here. compare the resultant dataframe with a constructed one and use `assert_frame_equal(result, expected)`.
Issue number in the form of a comment? Also, why is `assert_frame_equal` better for checking an empty dataframe? I am asking just for my understanding.
oh, because the frame is not empty, it has column names (or should have), and an index (which has zero length). You are guaranteeing a certain return type and meta-data to the user (e.g. a query that returns rows has these things, so an empty one should as well).
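A minimal sketch of the assertion style being suggested, assuming access to the public `publicdata:samples.wikipedia` table with real credentials; the project id and the column list below are placeholders, not the table's actual schema:

```python
import pandas.util.testing as tm
from pandas import DataFrame
from pandas.io import gbq

# Sketch: build the expected empty frame explicitly and compare with
# assert_frame_equal, so column names, the zero-length index and the general
# frame shape are all checked, not just len(df) == 0.
project_id = "my-gcp-project"  # placeholder; requires real credentials
df = gbq.read_gbq("SELECT title, id FROM [publicdata:samples.wikipedia] "
                  "WHERE timestamp=-9999999", project_id=project_id)
expected = DataFrame(columns=['title', 'id'])  # illustrative column list
tm.assert_frame_equal(df, expected, check_dtype=False)
```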
OK. Here it is.
ok, pls add a release note in the whatsnew for 0.16.2. pls squash as well.
Can you check in the base template for doc/source/whatsnew/v0.16.2.txt please? Or should I copy it from 5ebf521?
rebase on master. It's already there.
Added release note. Integrated with master.
See the contributing docs here. pls squash.
I am checking in the squashed commit in a few minutes. What about the docs? I checked in the release note in dadf5c2. Is it missing something?
Squashed commit submitted.
@@ -279,7 +279,7 @@ def _parse_data(schema, rows):
                 field_type)
             page_array[row_num][col_num] = field_value

-    return DataFrame(page_array)
+    return DataFrame(page_array, columns=col_names)
this should not be necessary, `page_array` is already a record array (this will just reindex it) (and copy it).
cc @jacobschaer can you test this out and lmk?
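For reference, a small standalone sketch (with a made-up dtype, not the actual gbq schema) of the point about record arrays: a structured array already carries its column names, so passing `columns=` to `DataFrame` only reindexes the result:

```python
import numpy as np
from pandas import DataFrame

# A zero-row structured ("record") array keeps its field names in the dtype,
# so DataFrame() preserves the columns even when there are no rows.
page_array = np.zeros(0, dtype=[('id', 'i8'), ('title', 'O')])  # made-up schema
print(DataFrame(page_array).columns.tolist())                   # ['id', 'title']

# Passing columns= on top of that only reindexes the resulting frame
# (and copies it); the names are already there.
print(DataFrame(page_array, columns=['title', 'id']).columns.tolist())  # ['title', 'id']
```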
cc @jacobschaer
can you show the test output on your system (as this is not tested with actual credentials on travis).
@jreback I am away from work. I will upload the output next week as soon as I can.
The test output is as follows. I am also updating the documentation file to mark the change in 0.17, not in 0.16.2.
@jreback Any news?
@jreback I have merged the latest changes. Please let me know whether any changes are needed in this commit. I am available to make changes next week, but I will be away for a few weeks after that.
cc @jacobschaer @ssaumitra can you rebase?
@jreback rebase done.
@@ -669,3 +668,4 @@ Bug Fixes
 - Bug in ``PeriodIndex.order`` reset freq (:issue:`10295`)
 - Bug in ``iloc`` allowing memory outside bounds of a Series to be accessed with negative integers (:issue:`10779`)
 - Bug preventing access to the first index when using ``iloc`` with a list containing the appropriate negative integer (:issue:`10547`, :issue:`10779`)
+- Bug where ``io.gbq`` throws ValueError when Bigquery returns zero rows (:issue:`10273`)
use double backticks around `ValueError`. say `pd.read_gbq` instead of Bigquery.
IMO, the following would be a better replacement:
Bug where ``pd.read_gbq`` throws ``ValueError`` when Bigquery returns zero rows (:issue:`10273`)
because the exception is thrown when the Google Bigquery REST API returns zero rows, not by the pandas function `pd.read_gbq`.
Does that look good?
when they use pandas they use `pd.read_gbq`. using `Bigquery` in a release note about this is not obvious to the casual reader. but that is fine.
OK, then adding the line as per my last comment.
@jreback @sean-schaefer I will be away from work from next week, so I won't be able to respond. It would be great if we can conclude it this week. Is any more input required from my side here?
this looks fine. Ideally I'd like @jacobschaer to give it a test.
Looked fine, thanks. All tests passed when I ran it:
@jacobschaer gr8 thanks!
merged via 53a6830 thanks!
Thanks all :)
closes #10273