Skip to content

BUG: allow numpy.array as c values to scatterplot #8929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 3, 2014

Conversation

aevri
Copy link
Contributor

@aevri aevri commented Nov 29, 2014

Ensure that we can pass an np.array as 'c' straight through to
matplotlib, this functionality was accidentally removed previously.

Add tests.

Closes #8852

@@ -2500,6 +2500,31 @@ def is_list_like(arg):
not isinstance(arg, compat.string_and_binary_types))


def is_hashy(arg):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty good to me, though I would call this is_hashable rather than is_hashy

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep is_hashable is better

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that originally, I felt uncomfortable because of the surprising implementation.

Would is_safely_hashable be good maybe?

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I would not expect is_hashable to use collections.Hashable (if you wanted that, you could just do the isinstance check). So this implementation does not surprise me, at least. Also, I'm pretty sure the Python docs explicitly state that __hash__ is supposed to be cheap to evaluate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok cool, have rebased and renamed to is_hashable, thanks for explaining

@jreback jreback added this to the 0.15.2 milestone Nov 29, 2014
@@ -145,6 +145,8 @@ Bug Fixes
- BUG: Option context applies on __enter__ (:issue:`8514`)


- Bug where plot with kind=scatter fails when checking if an array is in the DataFrame (:issue:`8852`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

say DataFrame.plot(kind='scatter')

@jreback
Copy link
Contributor

jreback commented Nov 29, 2014

pls squash, otherwise looks ok

@TomAugspurger ?

#
df = DataFrame({'A': [1, 2], 'B': [3, 4]})
c = np.array([1, 0])
df.plot(kind='scatter', x='A', y='B', c=c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check that the colors are correct? Get back the result from plot with ax = df.plot(kind='scatter', x='A', y='B', c=c) and get the collections attribute, and check ax.collections[0].get_facecolor() against the expected array([[ 1., 1., 1., 1.], [ 0., 0., 0., 1.]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger Is that the expected color? I think the [1, 0] are interpreted as numeric values that are scaled to the default colormap, not as rgb colors? (when you do c=[[1,1,1],[0,0,0]] matplotlib does something different)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I get for fig, ax = plt.subplots(); ax.scatter([0, 0], [1, 1], c=[0, 1])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however.

from http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter

We should handle the same if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird that the order is flipped if you do

ax.scatter([0, 0], [1, 1], c=[0, 1]) vs ax.scatter([0, 0], [1, 1], c=[[0, 0, 0, 1], [1, 1, 1, 1]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, because I did the same and got colors and not black/white. You don't have a custom matplotlib defaults that sets cmap at grayscale? (maybe seaborn does that?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seaborn imported in my IPython profile.

@TomAugspurger
Copy link
Contributor

I'm fine with this being two commits since it separates the addition of is_hashable and the plotting issue.

Could you add a test or two for is_hashable too? Thanks.

@jreback
Copy link
Contributor

jreback commented Nov 29, 2014

@aevri pls add test for is_hashable in tests/test_common

@aevri
Copy link
Contributor Author

aevri commented Nov 30, 2014

I've updated with these changes:

  • added tests for is_hashable in tests/test_common
  • modified is_hashable to first reject based on collections.Hashable, this is to avoid broadening the definition of hashable to include old-style classes and anything else I can't think of
  • changed whatsnew entry to say DataFrame.plot(kind='scatter')
  • In scatter_with_c tests, supplied RGBA colors as 'c' and added test for face colors to be identical to supplied values
  • rebased to catch up with upstream

Was considering how to test the face colors of the case where we supply a colormap, don't know how to query the colormap ahead of time though.

Cheers!

@jorisvandenbossche
Copy link
Member

@aevri By the way, David (@cournape) fixed it in numpy today: numpy/numpy#5326

And the tests are failing on python 3 (and on numpy master, because of Davids fix)

@aevri aevri force-pushed the fix/8852 branch 2 times, most recently from 14fac02 to 9881d87 Compare December 1, 2014 00:58
@aevri
Copy link
Contributor Author

aevri commented Dec 1, 2014

Have updated:

  • rebased on master to fix merge conflicts
  • wrapped Python2-specific tests in sys.version_info check
  • made numpy.array test work with or without numpy fix

Cheers!

aevri added 2 commits December 2, 2014 22:53
Some types will pass a test against collections.Hashable but fail when
they are actually hashed with hash().

Introduce a new predicate to help handle these types.

Will use this in subsequent commit.

Test Plan:
In addition to running tests, run the doctest
$ nosetests pandas/core/common.py:is_hashable --with-doc -v
Ensure that we can pass an np.array as 'c' straight through to
matplotlib, this functionality was accidentally removed previously.

Add tests.

Closes pandas-dev#8852
@aevri
Copy link
Contributor Author

aevri commented Dec 2, 2014

Rebased again for new merge conflicts

Afaik all points are addressed and this is ready to land, please let me know if there's something I've missed :O)

@jorisvandenbossche
Copy link
Member

Looks OK for me! Lets wait until travis is green

@jreback jreback merged commit bd68912 into pandas-dev:master Dec 3, 2014
@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

@aevri thanks for the fix!

@aevri
Copy link
Contributor Author

aevri commented Dec 3, 2014

Yay! Thanks everyone :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: plot with kind=scatter fails when checking if an array is in the DataFrame
5 participants