Skip to content

ENH: improve bigquery connector to optionally allow use of gcloud credentials #8590

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

ichuang
Copy link

@ichuang ichuang commented Oct 21, 2014

closes #8489

The pandas google bigquery connector can be difficult to use from an ipython notebook: when no authentication token exists, it fails silently (and can hang the notebook). This happens because the oauth2client library is trying to bring up a browser on the console, to request an auth token.

This PR provides an improved mechanism: an optional use_gcloud_auth argument which specifies that the user wants to employ a pre-existing authentication token, generated using gcloud auth login. An error is raised if the credentials file is missing.

@ichuang ichuang changed the title improve bigquery connector to optionally allow use of gcloud credentials ENH: improve bigquery connector to optionally allow use of gcloud credentials Oct 21, 2014
@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

this is meant to close #8489 ?

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

seems very hard coded to me

how about allowing passing of the path as the credentials

first try the regular method (storage)
else if the path exists try this method ?

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

yes, it would resolve #8489

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

The gcloud auth credentials file is not exactly in the same format as what appears in bigquery_credentials.dat. But it would be quite reasonable to allow passing in of a different credentials path. I'll make that change.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

gr8

u can just put a try except around them
and then raise if neither are accepted

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

Changes made -- now the credentials file path can also be specified.

oauth2 will not raise an exception when it fails; it just hangs, because it's waiting for terminal input. But folks running this from a console with a browser should still be able to use that auth mechanism first, since the gcloud auth asks for broader permissions, and is in general less desirable.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

can u attempt to get the credential first from the file then fall back to browser based? (so only have a single credentials argument)?

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

The problem is that there are two different files, with two different levels of credentialed authorizations.

bigquery_credentials.dat is created with authorization only for BigQuery.

~/.config/gcloud/credentials is created with broader authorization, for all google cloud services.

The two files also have slightly different formats.

And it's nice to have a use_gcloud_auth argument, which causes the correct file path to be used by default, without the pandas user needing to know about that arcania.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

ok how about

passing an argument named gcloud_credentials. if it's True then u can try the default file path; if it not None then use it as the path to the file

instead of the boolean arg u have now

gbq fix allowing credentials file path to be specified

ENH: allow bigquery connector to optionally use gcloud credentials
@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

Done.

Commits squashed.

credentials = storage.get()
gcfp = self.gcloud_credentials # a bit of mangling since this is dual-typed, str or bool
if self.gcloud_credentials == True:
gcfp = ''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

u can move this into the next section no?
then just do

if gcloud_credentials is True:
gcloud_credentials = default config path

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, done

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

ok I think u can test both of those cases
(both should raise MissingbCredentials)
eg gcloud_credentials not None

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

huh? What two cases.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

gcloud_credentials =None is tested now k think
so True and a path (both should raise MissingCredentials) as nothing on travis

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

Sorry, I'm unfamiliar with your testing framework, and don't have bw to add anything more.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

ok

well if I come back to it will be here

everything has to be tested (esp this but a bit trilcky as we don't have live credentials) - otherwise bugs creep in

@ichuang
Copy link
Author

ichuang commented Oct 21, 2014

it works for us in our tests with live credentials. It's going into production use in our team.

@jreback
Copy link
Contributor

jreback commented Oct 21, 2014

cc @jacobschaer pls have a look

@@ -353,6 +376,12 @@ def read_gbq(query, project_id = None, index_col=None, col_order=None, reauth=Fa
reauth : boolean (default False)
Force Google BigQuery to reauthenticate the user. This is useful
if multiple accounts are used.
gcloud_credentials: boolean or str (default None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: you need to use a space before the colon (so arg : type instead of arg: type) for sphinx to render it correctly

@jacobschaer
Copy link
Contributor

I'm out of town at the moment, but I can try to look at it and test this coming weekend.
CC @sean-schaefer @azbones

@sean-schaefer
Copy link

Did some initial testing on my environment w/ live creds and it looks good. Exception raised given invalid creds path, works fine with True and specific valid path (creds file was generated through Google Cloud SDK upon running gcloud auth login).

Would be nice to see some tests for this also. E.g. mock the creds file and test get_credentials with gcloud_credentials = True or str or None, verify returned creds object. @jacobschaer may have thoughts on that as well when he returns.

@jreback jreback added this to the 0.15.1 milestone Oct 22, 2014
@jreback
Copy link
Contributor

jreback commented Oct 22, 2014

@sean-schaefer if guys want to do a mock credentials object would be awesome! then we could add a few more validation tests!

@azbones
Copy link

azbones commented Oct 22, 2014

@sean-schaefer I can look at this more tomorrow evening, but I am up for working on this too.

@jreback
Copy link
Contributor

jreback commented Nov 26, 2014

I get the following:

[jreback-~/pandas] nosetests  pandas/io/tests/test_gbq.py -v
test_should_be_able_to_get_a_bigquery_service (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_be_able_to_get_results_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_be_able_to_get_schema_from_query (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_be_able_to_get_valid_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_be_able_to_get_valid_gcloud_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_be_able_to_make_a_connector (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_fail_if_gcloud_credentials_incorrectly_formatted (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_fail_with_invalid_gcloud_credentials (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_raise_exception_with_invalid_gcloud_creds_path (pandas.io.tests.test_gbq.TestGBQConnectorIntegration) ... SKIP: Cannot run integration tests without a project id
test_bad_project_id (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_bad_table_name (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_column_order (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_column_order_plus_index (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_download_dataset_larger_than_200k_rows (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_index_column (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_malformed_query (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_arbitrary_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_empty_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_false_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_null_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_null_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_null_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_null_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_null_timestamp (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_timestamp_unix_epoch (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_true_boolean (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_valid_floats (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_valid_integers (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_should_properly_handle_valid_strings (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_unicode_string_conversion_and_normalization (pandas.io.tests.test_gbq.TestReadGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_read_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_booleans_as_python_booleans (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_floats_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_integers_as_python_floats (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_strings_as_python_strings (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_should_return_bigquery_timestamps_as_numpy_datetime (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_that_parse_data_works_properly (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_should_fail_if_invalid_table_name_passed (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_to_gbq_with_no_project_id_given_should_fail (pandas.io.tests.test_gbq.TestReadGBQUnitTests) ... ok
test_google_upload_errors_should_raise_exception (pandas.io.tests.test_gbq.TestToGBQIntegration) ... SKIP: Cannot run integration tests without a project id
test_upload_data (pandas.io.tests.test_gbq.TestToGBQIntegration) ... SKIP: Cannot run integration tests without a project id
pandas.io.tests.test_gbq.test_requirements ... ok

----------------------------------------------------------------------
Ran 42 tests in 0.072s

OK (SKIP=32)

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 5, 2014
@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

@jacobschaer status of this?

@jacobschaer
Copy link
Contributor

The skipping doesn't suprise me unless your environment is already configured to run GBQ. Very few of the tests are actually practical without valid GBQ credentials/project ID's. We setup the test suite to do the best it can, and if the user does want to run a full integration test they should specify their project id and ensure their credentials files are in place.

@jacobschaer
Copy link
Contributor

@jreback - Can we get this merged in soon as well? Tests passed fine.

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

needs a release note (enhancement section), maybe update on the docs?
pls also squash

@jreback
Copy link
Contributor

jreback commented Jan 19, 2015

These tests appear to all be skipped because of an import issue: https://travis-ci.org/pydata/pandas/jobs/41446098

can you have a look?

@jacobschaer
Copy link
Contributor

@ichuang, or @shaun-schaefer - Are these skips expected? I don't see why they would skip for the import error - so think the CI requirements calls for all the needed libs? Also, can you rebase and fix docs?

@sean-schaefer
Copy link

I double checked the Travis logs on that job and something strange happened with the install of the Google API client. The logs recognize it as a dependency and fetch the package, seems like everything goes well there. But if you look at the output of ci/print_versions.py after all the tests run, the version of the apiclient is None.

I'll try and look into it more soon.

shoyer and others added 7 commits February 11, 2015 20:41
Fixes GH9276

This now relies entirely on the result of calling ``hash`` on the argument.

Note: I had to change the test for old style classes on Python 2 -- these are
now considered hashable by `is_hashable`, because they don't fail `hash`.

CC aevri
BUG: fix common.is_hashable for NumPy scalars on Python 3
BLD: revise setup.py to clean properly on windows platforms
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@jreback
Copy link
Contributor

jreback commented Mar 3, 2015

can u rebase this?

@jacobschaer
Copy link
Contributor

@jreback - Who is the Travis CI point of contact?

@jreback
Copy link
Contributor

jreback commented Mar 3, 2015

in Travis itself - I just email them thru their help facility - if u have question on pandas side ask away

@jacobschaer
Copy link
Contributor

The problem with the Travis skipping all those tests seems to be that the optional dependencies aren't being installed properly. This seems unique to Travis, when I build locally (pip) all the dependencies are pulled correctly. You can see that the conda install script does attempt to download and install them, but when it goes to print out the version strings they aren't there. @sean-schaefer was also running into some SSL issues with a couple of the packages when he tried to alter our test script to attempt to catch exactly what deps are failing.

@jreback
Copy link
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BigQuery authentication on remote servers
7 participants