BUG: The 'jobComplete' key may be present but False in the BigQuery query results #8728
Conversation
cc @jacobschaer ok? @aaront pls add a release note in v0.15.1 for this |
@jreback Hope I did this right. Let me know if there's an issue (it's my first pull request) |
yep that looks good. since this is not tested by pandas directly I'd like Jacob to concur on this |
I'm surprised this change is needed - the line of code you changed came directly from Google's example. However, I always thought it was silly the way they did it... So, I guess what you encountered is that there exist cases where the job is not yet finished and either 'jobComplete' is absent, or it is False? If it is exclusively the latter, there might be a better way to write this... I haven't had a chance to run the regression suite. Thanks for the pull request though @aaront |
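The distinction Jacob raises can be shown with plain dicts: a membership check (`'jobComplete' in reply`) passes as soon as the key exists, even while its value is still `False`, whereas reading the value handles both the absent and present-but-False cases. A minimal sketch (the reply dicts below are illustrative, not real BigQuery API payloads):

```python
# Three shapes a query-results reply can take while polling
# (illustrative dicts, not actual BigQuery API responses):
still_running_no_key = {}                            # key absent
still_running_false = {'jobComplete': False}         # key present but False
finished = {'jobComplete': True, 'totalRows': '42'}  # done

def done_by_membership(reply):
    # Original check: only tests that the key exists.
    return 'jobComplete' in reply

def done_by_value(reply):
    # Robust check: treats a missing key the same as False.
    return reply.get('jobComplete', False)

# The membership check wrongly reports the second job as complete:
print(done_by_membership(still_running_false))  # True  (wrong)
print(done_by_value(still_running_false))       # False (correct)
print(done_by_value(finished))                  # True
```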
I ran a little test for this by adding a bit of code to the existing 0.15 code:

```python
while(not 'jobComplete' in query_reply):
    print('Job not yet complete...')
    query_reply = job_collection.getQueryResults(
        projectId=job_reference['projectId'],
        jobId=job_reference['jobId']).execute()

if (not 'totalRows' in query_reply):
    print(query_reply)

total_rows = int(query_reply['totalRows'])
```

And received the following output:
Here's the error stacktrace:

```
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-078620e7215f> in <module>()
      1 query = geotab.BigQueryBuilder('PerformanceTime', FROM_DATE, TO_DATE).build()
----> 2 df = pd.read_gbq(query, project_id='XXXXXX')

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in read_gbq(query, project_id, index_col, col_order, reauth)
    369
    370     connector = GbqConnector(project_id, reauth = reauth)
--> 371     schema, pages = connector.run_query(query)
    372     dataframe_list = []
    373     while len(pages) > 0:

C:\Miniconda\envs\mygeotabenv\lib\site-packages\pandas\io\gbq.py in run_query(self, query)
    193         if (not 'totalRows' in query_reply):
    194             print(query_reply)
--> 195         total_rows = int(query_reply['totalRows'])
    196         result_pages = list()
    197         seen_page_tokens = list()

KeyError: 'totalRows'
```
|
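Given that trace, the polling loop has to keep waiting while `'jobComplete'` is falsy, not merely while the key is absent. A sketch of such a loop against a stubbed job collection (the stub class and its canned replies are invented for illustration; the real code calls `getQueryResults(...).execute()` via google-api-python-client):

```python
class StubJobCollection:
    """Fake job collection that returns canned query-results replies
    in sequence (invented for illustration, not a real API client)."""
    def __init__(self, replies):
        self._replies = iter(replies)

    def get_query_results(self):
        return next(self._replies)

def poll_until_complete(jobs):
    # Keep polling while 'jobComplete' is missing OR present-but-False.
    reply = jobs.get_query_results()
    while not reply.get('jobComplete', False):
        reply = jobs.get_query_results()
    return int(reply['totalRows'])

jobs = StubJobCollection([
    {},                                          # key absent: still running
    {'jobComplete': False},                      # present but False: still running
    {'jobComplete': True, 'totalRows': '1000'},  # finished
])
print(poll_until_complete(jobs))  # 1000
```

A `while 'jobComplete' not in reply` loop would have stopped at the second reply and then raised `KeyError: 'totalRows'`, exactly as in the traceback above.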
Thanks - that's a bit frustrating they would change that. I'm still trying to get my bq credentials fixed so I can re-run the integration suite and say for sure. |
Any updates on this? |
@jacobschaer progress on this? |
@jacobschaer update on this? |
The status is that Google managed to break something in their API and now our regression suite won't pass. This doesn't have anything to do with this commit, so we were trying to decide what to do. Frankly, the test that is now failing may have been a bit excessive and we can probably get away with reducing/removing it. |
Update - the issue with the regression suite seems to have resolved itself (by magic?). I ran the test suite against a modified version of this code (fixed the '1.2.0' != '1.2' in the _test_imports method by hand). Everything looks good. Technically this is duplicated by #9141. @jreback - how do you wish to proceed? @sean-schaefer - was there an addition you wanted to make to this PR to fix the '1.2.0' issue or anything else? I thought we had talked about you modifying this code. Or should we just do a separate pull request? |
@jacobschaer I already sneaked the LooseVersion check in #8590 but that PR hasn't been accepted. @jreback Is there something more on our end needed to get that done? I remember we discussed another modification here, but cannot recall the specifics. If either of us happen to remember, we can open a different PR for it. |
can you rebase and move the release note to 0.16.0? |
…lts. Fixes an error when looking for 'totalRows' on a large dataset which is not finished in between two subsequent checks.
@jreback done and done |
@aaront ok thanks. @jacobschaer how is testing proceeding? ok with this change? |
@jacobschaer this mergable? |
The original hope was that #8590 would make it in first, since it technically is required for the integration suite to "pass". Unfortunately, there seems to be an issue with Travis that neither @sean-schaefer nor I can track down. Who would be the go-to person for that? |
@jacobschaer ok, we can 'skip' the tests for now if you can assure that:
|
@jreback I've been using this change daily for a few months and can confirm it makes pd.read_gbq() functional. @jacobschaer I'm not sure #8590 is still actually an issue. As I mentioned before, I use this daily and if the notebook process is launched from a directory that doesn't contain a credential file, running pd.read_gbq() will invoke an auth flow just fine. I can confirm this works in Safari and Chrome on the Mac, and Chrome on Windows. This is with python-gflags 2.0 and google-api-python-client 1.3.1 installed. |
merged via 3039533 thanks! |
xref #9141
Fixes an issue when looking for 'totalRows' on a large dataset that is not finished in between two subsequent checks.