-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: improve bigquery connector to optionally allow use of gcloud credentials #8590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this is meant to close #8489 ? |
seems very hard coded to me how about allowing passing of the path as the credentials first try the regular method (storage) |
yes, it would resolve #8489 |
The gcloud auth credentials file is not exactly in the same format as what appears in |
gr8 u can just put a try except around them |
Changes made -- now the credentials file path can also be specified. oauth2 will not raise an exception when it fails; it just hangs, because it's waiting for terminal input. But folks running this from a console with a browser should still be able to use that auth mechanism first, since the gcloud auth asks for broader permissions, and is in general less desirable. |
can u attempt to get the credential first from the file then fall back to browser based? (so only have a single credentials argument)? |
The problem is that there are two different files, with two different levels of credentialed authorizations.
The two files also have slightly different formats. And it's nice to have a |
ok how about passing an argument named gcloud_credentials. if it's True then u can try the default file path; if it not None then use it as the path to the file instead of the boolean arg u have now |
gbq fix allowing credentials file path to be specified ENH: allow bigquery connector to optionally use gcloud credentials
Done. Commits squashed. |
credentials = storage.get() | ||
gcfp = self.gcloud_credentials # a bit of mangling since this is dual-typed, str or bool | ||
if self.gcloud_credentials == True: | ||
gcfp = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
u can move this into the next section no?
then just do
if gcloud_credentials is True:
gcloud_credentials = default config path
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, done
ok I think u can test both of those cases |
huh? What two cases. |
gcloud_credentials =None is tested now k think |
Sorry, I'm unfamiliar with your testing framework, and don't have bw to add anything more. |
ok well if I come back to it will be here everything has to be tested (esp this but a bit trilcky as we don't have live credentials) - otherwise bugs creep in |
it works for us in our tests with live credentials. It's going into production use in our team. |
cc @jacobschaer pls have a look |
@@ -353,6 +376,12 @@ def read_gbq(query, project_id = None, index_col=None, col_order=None, reauth=Fa | |||
reauth : boolean (default False) | |||
Force Google BigQuery to reauthenticate the user. This is useful | |||
if multiple accounts are used. | |||
gcloud_credentials: boolean or str (default None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment: you need to use a space before the colon (so arg : type
instead of arg: type
) for sphinx to render it correctly
I'm out of town at the moment, but I can try to look at it and test this coming weekend. |
Did some initial testing on my environment w/ live creds and it looks good. Exception raised given invalid creds path, works fine with True and specific valid path (creds file was generated through Google Cloud SDK upon running gcloud auth login). Would be nice to see some tests for this also. E.g. mock the creds file and test get_credentials with gcloud_credentials = True or str or None, verify returned creds object. @jacobschaer may have thoughts on that as well when he returns. |
@sean-schaefer if guys want to do a mock credentials object would be awesome! then we could add a few more validation tests! |
@sean-schaefer I can look at this more tomorrow evening, but I am up for working on this too. |
I get the following:
|
@jacobschaer status of this? |
The skipping doesn't suprise me unless your environment is already configured to run GBQ. Very few of the tests are actually practical without valid GBQ credentials/project ID's. We setup the test suite to do the best it can, and if the user does want to run a full integration test they should specify their project id and ensure their credentials files are in place. |
@jreback - Can we get this merged in soon as well? Tests passed fine. |
needs a release note (enhancement section), maybe update on the docs? |
These tests appear to all be skipped because of an import issue: https://travis-ci.org/pydata/pandas/jobs/41446098 can you have a look? |
@ichuang, or @shaun-schaefer - Are these skips expected? I don't see why they would skip for the import error - so think the CI requirements calls for all the needed libs? Also, can you rebase and fix docs? |
I double checked the Travis logs on that job and something strange happened with the install of the Google API client. The logs recognize it as a dependency and fetch the package, seems like everything goes well there. But if you look at the output of ci/print_versions.py after all the tests run, the version of the apiclient is None. I'll try and look into it more soon. |
Fixes GH9276 This now relies entirely on the result of calling ``hash`` on the argument. Note: I had to change the test for old style classes on Python 2 -- these are now considered hashable by `is_hashable`, because they don't fail `hash`. CC aevri
BUG: fix common.is_hashable for NumPy scalars on Python 3
BLD: revise setup.py to clean properly on windows platforms
Conflicts: pandas/io/gbq.py
can u rebase this? |
@jreback - Who is the Travis CI point of contact? |
in Travis itself - I just email them thru their help facility - if u have question on pandas side ask away |
The problem with the Travis skipping all those tests seems to be that the optional dependencies aren't being installed properly. This seems unique to Travis, when I build locally (pip) all the dependencies are pulled correctly. You can see that the conda install script does attempt to download and install them, but when it goes to print out the version strings they aren't there. @sean-schaefer was also running into some SSL issues with a couple of the packages when he tried to alter our test script to attempt to catch exactly what deps are failing. |
closing pls reopen if/when updated |
closes #8489
The pandas google bigquery connector can be difficult to use from an ipython notebook: when no authentication token exists, it fails silently (and can hang the notebook). This happens because the oauth2client library is trying to bring up a browser on the console, to request an auth token.
This PR provides an improved mechanism: an optional
use_gcloud_auth
argument which specifies that the user wants to employ a pre-existing authentication token, generated using gcloud auth login. An error is raised if the credentials file is missing.