BUG: The 'jobComplete' key may be present but False in the BigQuery query results #8728
Conversation
cc @jacobschaer ok? @aaront pls add a release note in v0.15.1 for this |
@jreback Hope I did this right. Let me know if there's an issue (it's my first pull request) |
yep that looks good. since this is not tested by pandas directly I'd like Jacob to concur on this |
I'm surprised this change is needed - the line of code you changed came directly from Google's example. However, I always thought it was silly the way they did it... So, I guess what you encountered is that there exist cases where the job is not yet finished and either 'jobComplete' is absent, or it is False? If it is exclusively the latter, there might be a better way to write this... I haven't had a chance to run the regression suite. Thanks for the pull request though @aaront |
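The distinction Jacob raises can be shown with plain dicts: a membership check (`'jobComplete' in reply`) passes as soon as the key exists, even while its value is still `False`, whereas reading the value handles both the absent and present-but-False cases. A minimal sketch (the reply dicts below are illustrative, not real BigQuery API payloads):

```python
# Three shapes a query-results reply can take while polling
# (illustrative dicts, not actual BigQuery API responses):
still_running_no_key = {}                            # key absent
still_running_false = {'jobComplete': False}         # key present but False
finished = {'jobComplete': True, 'totalRows': '42'}  # done

def done_by_membership(reply):
    # Original check: only tests that the key exists.
    return 'jobComplete' in reply

def done_by_value(reply):
    # Robust check: treats a missing key the same as False.
    return reply.get('jobComplete', False)

# The membership check wrongly reports the second job as complete:
print(done_by_membership(still_running_false))  # True  (wrong)
print(done_by_value(still_running_false))       # False (correct)
print(done_by_value(finished))                  # True
```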
I ran a little test for this by adding a bit of code to the existing 0.15 code:

```python
while(not 'jobComplete' in query_reply):
    print('Job not yet complete...')
    query_reply = job_collection.getQueryResults(
        projectId=job_reference['projectId'],
        jobId=job_reference['jobId']).execute()

if (not 'totalRows' in query_reply):
    print(query_reply)

total_rows = int(query_reply['totalRows'])
```

And received the following output:
Here's the error stacktrace:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-078620e7215f> in <module>()
      1 query = geotab.BigQueryBuilder('PerformanceTime', FROM_DATE, TO_DATE).build()
----> 2 df = pd.read_gbq(query, project_id='XXXXXX')

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth)
    369
    370     connector = GbqConnector(project_id, reauth = reauth)
--> 371     schema, pages = connector.run_query(query)
    372     dataframe_list = []
    373     while len(pages) > 0:

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in run_query(self, query)
    193         if (not 'totalRows' in query_reply):
    194             print(query_reply)
--> 195         total_rows = int(query_reply['totalRows'])
    196         result_pages = list()
    197         seen_page_tokens = list()

KeyError: 'totalRows'
```
|
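Given that trace, the polling loop has to keep waiting while `'jobComplete'` is falsy, not merely while the key is absent. A sketch of such a loop against a stubbed job collection (the stub class and its canned replies are invented for illustration; the real code calls `getQueryResults(...).execute()` via google-api-python-client):

```python
class StubJobCollection:
    """Fake job collection that returns canned query-results replies
    in sequence (invented for illustration, not a real API client)."""
    def __init__(self, replies):
        self._replies = iter(replies)

    def get_query_results(self):
        return next(self._replies)

def poll_until_complete(jobs):
    # Keep polling while 'jobComplete' is missing OR present-but-False.
    reply = jobs.get_query_results()
    while not reply.get('jobComplete', False):
        reply = jobs.get_query_results()
    return int(reply['totalRows'])

jobs = StubJobCollection([
    {},                                          # key absent: still running
    {'jobComplete': False},                      # present but False: still running
    {'jobComplete': True, 'totalRows': '1000'},  # finished
])
print(poll_until_complete(jobs))  # 1000
```

A `while 'jobComplete' not in reply` loop would have stopped at the second reply and then raised `KeyError: 'totalRows'`, exactly as in the traceback above.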
Thanks - that's a bit frustrating they would change that. I'm still trying to get my bq credentials fixed so I can re-run the integration suite and say for sure. |
Any updates on this? |
@jacobschaer progress on this? |
@jacobschaer update on this? |
The status is that Google managed to break something in their API and now our regression suite won't pass. This doesn't have anything to do with this commit, so we were trying to decide what to do. Frankly, the test that is now failing may have been a bit excessive and we can probably get away with reducing/removing it. |
Update - the issue with the regression suite seems to have resolved itself (by magic?). I ran the test suite against a modified version of this code (fixed the '1.2.0' != '1.2' in the _test_imports method by hand). Everything looks good. Technically this is duplicated by #9141. @jreback - how do you wish to proceed? @sean-schaefer - was there an addition you wanted to make to this PR to fix the '1.2.0' issue or anything else? I thought we had talked about you modifying this code. Or should we just do a separate pull request? |
@jacobschaer I already sneaked the LooseVersion check in #8590 but that PR hasn't been accepted. @jreback Is there something more on our end needed to get that done? I remember we discussed another modification here, but cannot recall the specifics. If either of us happen to remember, we can open a different PR for it. |
can you rebase and move the release note to 0.16.0? |
…lts. Fixes an error when looking for 'totalRows' on a large dataset which is not finished in between two subsequent checks.
@jreback done and done |
@aaront ok thanks. @jacobschaer how is testing proceeding? ok with this change? |
@jacobschaer this mergable? |
The original hope was that #8590 would make it in first, since it technically is required for the integration suite to "pass". Unfortunately, there seems to be an issue with Travis that neither @sean-schaefer nor I can track down. Who would be the go-to person for that? |
@jacobschaer ok, we can 'skip' the tests for now if you can assure that:
|
@jreback I've been using this change daily for a few months and can confirm it makes pd.read_gbq() functional. @jacobschaer I'm not sure #8590 is still actually an issue. As I mentioned before, I use this daily and if the notebook process is launched from a directory that doesn't contain a credential file, running pd.read_gbq() will invoke an auth flow just fine. I can confirm this works in Safari and Chrome on the Mac, and Chrome on Windows. This is with python-gflags 2.0 and google-api-python-client 1.3.1 installed. |
merged via 3039533 thanks! |
xref #9141
Fixes an issue when looking for 'totalRows' on a large dataset that is not finished in between two subsequent checks.