Skip to content

Validate that 'name' attribute is set only if hashable #9193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed

Validate that 'name' attribute is set only if hashable #9193

wants to merge 3 commits into from

Conversation

dr-leo
Copy link
Contributor

@dr-leo dr-leo commented Jan 3, 2015

addresses part of issue #8263.

@shoyer
Copy link
Member

shoyer commented Jan 4, 2015

a few points:

  1. this needs a test to verify that it works.
  2. the cleaner way to do this rather than checking this for all attribute setting on every data structure would be to make name on a property on Series objects (the only type that currently has names in pandas). Something like:
    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        try:
            hash(value)
        except TypeError:
            raise TypeError('name must be hashable')
        self._name = name

@dr-leo
Copy link
Contributor Author

dr-leo commented Jan 4, 2015

Thanks. Makes sense to me.

What next? Shall I try and make another commit on this branch while the
PR remains open, or do you want to reject it in which case (i) I could
try on a new branch and a new PR, or (ii) you could do it yourself?

Am 04.01.2015 um 20:51 schrieb Stephan Hoyer:

a few points:

  1. this needs a test to verify that it works.

  2. the cleaner way to do this rather than checking this for /all/
    attribute setting on every data structure would be to make |name|
    on a property on |Series| objects (the only type that currently
    has names in pandas). Something like:

    @Property
    def name(self):
    return self._name

    @name.setter
    def name(self, value):
    try:
    hash(value)
    except TypeError:
    raise TypeError('name must be hashable')
    self._name = name


Reply to this email directly or view it on GitHub
#9193 (comment).

@jreback
Copy link
Contributor

jreback commented Jan 4, 2015

@dr-leo just make a new commit on this PR

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label Jan 4, 2015
@jorisvandenbossche
Copy link
Member

There was recently introduced a pd.core.common.is_hashable() in this PR: #8929. Maybe that can be used here to do the hash check.

@dr-leo
Copy link
Contributor Author

dr-leo commented Jan 12, 2015

@joris: sorry, don't know how to use code from another PR in my own.

@ALL: I've just committed what will hopefully do part of the trick. I've
added a name property in core.series.py as Stephan suggested. I've also
added a test for this in test_series.py. I couldn't find a good place
for it so I put it behind the test_constructor_map case.

The test suite produces some failures: in test_series, line 1999, e.g.,
Series.name is set to a list type. That's mean, so I've made it a
string. This does not break that test, it is about repr.

Still there are 13 failures and one error out of 8027 tests. At least
one failure relates to Series.name. Unfortunately I am fairly unfamiliar
with nose and unittest. Before attempting to fix these failures I
thought I'd show you what I've done so far.

Please let me know your views on the test results and what it would
take to accept the PR.

As this is my very first PR for a serious project, I am somewhat baffled
about how much work it takes to put together good software :-O.

Anyway, so far it has been fun to work on this and I've learnt quite a bit.

Thanks.

Leo

Am 05.01.2015 um 23:44 schrieb Joris Van den Bossche:

There was recently introduced a |pd.core.common.is_hashable()| in this
PR: #8929 #8929. Maybe that can
be used here to do the hash check.


Reply to this email directly or view it on GitHub
#9193 (comment).

def name(self, value):
try:
hash(value)
except TypeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could replace this with just if not com.is_hashable(value):.

@shoyer
Copy link
Member

shoyer commented Jan 12, 2015

@dr-leo The other PR is already merged, so you can just use the function it introduced directly.

Trust me, this gets easier with practice :)

@shoyer
Copy link
Member

shoyer commented Jan 12, 2015

I'm pretty sure the test failures you're getting is because None (the default value for name) is not hashable. We definitely want None to remain a valid name, so you'll need to add a special case for that.

@shoyer
Copy link
Member

shoyer commented Jan 12, 2015

Actually, that theory is wrong. None actually is hashable.

@shoyer
Copy link
Member

shoyer commented Jan 12, 2015

OK, it looks like the trouble here is related to some strange business pandas does with overwriting __setattr__. It looks like replacing self._name = value with object.__setattr__(self, '_name', value) should do the trick.

@jreback
Copy link
Contributor

jreback commented Jan 12, 2015

needs a test to see if this survives pickling

@dr-leo
Copy link
Contributor Author

dr-leo commented Jan 13, 2015

Got you. Should all be doable.

On 13/01/2015, jreback [email protected] wrote:

needs a test to see if this survives pickling


Reply to this email directly or view it on GitHub:
#9193 (comment)

@dr-leo
Copy link
Contributor Author

dr-leo commented Jan 15, 2015

On picling: There is already a testcase in test_series @411:
def test_pickle_preserve_name(self):
unpickled = self._pickle_roundtrip_name(self.ts)
self.assertEqual(unpickled.name, self.ts.name)

So I don't see any need for another.

On common.is_hashable: It returns False for NP.float64 (see below). This
breaks a couple of tests. I suppose this is a bug in is_hashable. If
not, we cannot use it to check Series.name for hashability. We allow
float64 for index labels after all.

In [3]: import numpy as NP

In [5]: f=NP.float64(3.14)

In [8]: hash(f)
Out[8]: 1846836513

In [9]: import pandas

In [11]: from pandas.core.common import is_hashable

In [12]: is_hashable(f)
Out[12]: False

Am 13.01.2015 um 00:55 schrieb jreback:

needs a test to see if this survives pickling


Reply to this email directly or view it on GitHub
#9193 (comment).

@shoyer
Copy link
Member

shoyer commented Jan 15, 2015

@dr-leo what version of numpy are you running? I'm seeing a different result on numpy 1.9.1

@dr-leo
Copy link
Contributor Author

dr-leo commented Jan 16, 2015

np1.9.1, py34,32bit, win7 64bit.

Am 15.01.2015 um 23:35 schrieb Stephan Hoyer:

@dr-leo https://github.com/dr-leo what version of numpy are you
running? I'm seeing a different result on numpy 1.9.1


Reply to this email directly or view it on GitHub
#9193 (comment).

@shoyer
Copy link
Member

shoyer commented Jan 16, 2015

@dr-leo It's only a problem with Python 3. Just made a new issue: #9276

@shoyer
Copy link
Member

shoyer commented Feb 17, 2015

@dr-leo can you rebase on master and give this another try? We just fixed the is_hashable bug in #9473.

@dr-leo
Copy link
Contributor Author

dr-leo commented Feb 17, 2015

Great!

However, I am unfamiliar with rebase. To make things worse, I work with
Mercurial using hg-git. It does have a rebase extension but fiddling
with history is not one of my passions.

My hope was that you could simply merge my little PR branch into
master... That said, if you give me a hint I could try to help.

Leo

Am 17.02.2015 um 03:23 schrieb Stephan Hoyer:

@dr-leo https://github.com/dr-leo can you rebase on master and give
this another try? We just fixed the is_hashable bug in #9473
#9473.


Reply to this email directly or view it on GitHub
#9193 (comment).


@name.setter
def name(self, value):
if is_hashable(value):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do if value is not None here as it's the most common case

@jreback
Copy link
Contributor

jreback commented Feb 17, 2015

needs a perf check


def test_constructor_unhashable_name(self):
def set_to_unhashable(s_):
s_.name = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a test on the construcor as well

@jreback
Copy link
Contributor

jreback commented May 9, 2015

closing pls reopen if/when updated

@jreback jreback closed this May 9, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants