Skip to content

BUG: The 'jobComplete' key may be present but False in the BigQuery query results #8728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

aaront
Copy link
Contributor

@aaront aaront commented Nov 4, 2014

xref #9141

Fixes an issue when looking for 'totalRows' on a large dataset that is not finished in between two subsequent checks.

@aaront aaront changed the title BUG: The 'jobComplete' key may be present but False in the query results BUG: The 'jobComplete' key may be present but False in the BigQuery query results Nov 4, 2014
@jreback
Copy link
Contributor

jreback commented Nov 5, 2014

cc @jacobschaer

ok?

@aaront pls add a release note in v0.15.1 for this

@jreback jreback added this to the 0.15.1 milestone Nov 5, 2014
@aaront
Copy link
Contributor Author

aaront commented Nov 5, 2014

@jreback Hope I did this right. Let me know if there's an issue (it's my first pull request)

@jreback
Copy link
Contributor

jreback commented Nov 5, 2014

yep that looks good

since this is not tested by pandad directly id like Jacob to concur on this

@jacobschaer
Copy link
Contributor

I'm surprised this change is needed - the line of code you changed came directly from Google's example. However, I always thought it was silly the way they did it... So, I guess what you encountered is that there exist cases where the job is not yet finished and either 'jobComplete' is absent, or it is False? If it is exclusively the latter, there might be a better way to write this...

I haven't had a chance to run the regression suite.

Thanks for the pull request though @aaront

@jreback jreback modified the milestones: 0.15.2, 0.15.1 Nov 6, 2014
@aaront
Copy link
Contributor Author

aaront commented Nov 10, 2014

I ran a little test for this by adding a bit of code to to the existing 0.15 code:

while(not 'jobComplete' in query_reply):
    print('Job not yet complete...')
    query_reply = job_collection.getQueryResults(
                    projectId=job_reference['projectId'],
                    jobId=job_reference['jobId']).execute()
if (not 'totalRows' in query_reply):
    print(query_reply)
total_rows = int(query_reply['totalRows'])

And received the following output:

Job not yet complete...
{u'kind': u'bigquery#getQueryResultsResponse', u'etag': u'"1R8EtuJeC1ZTqaGooyltyIvODk8/W9GOBbLTm4BKUyjNgBAQorULao0"', u'jobComplete': False}

Here's the error stacktrace:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-078620e7215f> in <module>()
      1 query = geotab.BigQueryBuilder('PerformanceTime', FROM_DATE, TO_DATE).build()
----> 2 df = pd.read_gbq(query, project_id='XXXXXX')

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth)
    369 
    370     connector = GbqConnector(project_id, reauth = reauth)
--> 371     schema, pages = connector.run_query(query)
    372     dataframe_list = []
    373     while len(pages) > 0:

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in run_query(self, query)
    193         if (not 'totalRows' in query_reply):
    194                 print(query_reply)
--> 195         total_rows = int(query_reply['totalRows'])
    196         result_pages = list()
    197         seen_page_tokens = list()

KeyError: 'totalRows'

@jacobschaer
Copy link
Contributor

Thanks - that's a bit frustrating they would change that. I'm still trying to get my bq credentials fixed so I can re-run the integration suite and say for sure.

@aaront
Copy link
Contributor Author

aaront commented Nov 25, 2014

Any updates on this?

@jreback
Copy link
Contributor

jreback commented Dec 4, 2014

@jacobschaer progress on this?

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 5, 2014
@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

@jacobschaer update on this?

@jacobschaer
Copy link
Contributor

The status is that Google managed to break something in their API and now our regression suite won't pass. This doesn't have anything to do with this commit, so we were trying to decide what to do. Frankly, the test that is now failing may have been a bit excessive and we can probably get away with reducing/removing it.

@jacobschaer
Copy link
Contributor

Update - the issue with the regression suite seems to have resolved itself (by magic?). I ran the test suite against a modified version of this code (fixed the '1.2.0' != '1.2' in the _test_imports method by hand). Everything looks good.

Technically this is duplicated by #9141. @jreback - how do you wish to proceed?

@sean-schaefer - was there an addition you wanted to make to this PR to fix the '1.2.0' issue or anything else? I thought we had talked about you modifying this code. Or should we just do a seperate pull request?

@sean-schaefer
Copy link

@jacobschaer I already sneaked the LooseVersion check in #8590 but that PR hasn't been accepted. @jreback Is there something more on our end needed to get that done?

I remember we discussed another modification here, but cannot recall the specifics. If either of us happen to remember, we can open a different PR for it.

@jreback jreback added the Bug label Jan 18, 2015
@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

can you rebase and move the release note to 0.16.0?

…lts.

Fixes an error when looking for 'totalRows' on a large dataset which is not finished in between two subsequent checks.
@aaront
Copy link
Contributor Author

aaront commented Jan 18, 2015

@jreback done and done

@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

@aaront ok thanks.

@jacobschaer how is testing proceeding? ok with this change?

@jacobschaer
Copy link
Contributor

@jreback - Testing looks good. Now to see if we can get #8590 in as well. I'm going to try it now.

@jreback
Copy link
Contributor

jreback commented Mar 3, 2015

@jacobschaer this mergable?

@jacobschaer
Copy link
Contributor

The original hope was that #8590 would make it in first, since it technically is required for the integration suite to "pass". Unfortuantely, there seems to be an issue with Travis that neither @sean-schaefer nor I can track down. Who would the be go-to person for that?

@jreback
Copy link
Contributor

jreback commented Mar 5, 2015

@jacobschaer ok, we can 'skip' the tests for now if you can assure that:

  1. this passes locally for you guys
  2. this is a rationale / correct change

@cgrin
Copy link
Contributor

cgrin commented Mar 6, 2015

@jreback I've been using this change daily for a few months and can confirm it makes pd.read_gbq() functional.

@jacobschaer I'm not sure #8590 is still actually an issue. As I mentioned before, I use this daily and if the notebook process is launched from a directory that doesn't contain a credential file, running pd.read_gbq() will invoke an auth flow just fine. I can confirm this works in Safari and Chrome on the Mac, and Chrome on Windows. This is with python-gflags 2.0 and google-api-python-client 1.3.1 installed.

@jreback
Copy link
Contributor

jreback commented Mar 6, 2015

merged via 3039533

thanks!

@jreback jreback closed this Mar 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants