Skip to content

BUG: Creating Index with the names argument #19168

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

spacesphere
Copy link
Contributor

Name to be stored in the index
tupleize_cols : bool (default: True)
When True, attempt to create a MultiIndex if possible
names : sequence of objects, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move after name

add a Note that these are mutually exclusive

name = data.name
# extract `name` from `names` in case MultiIndex cannot be created
elif names:
name = names[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an

elif names is not None:
     if name is not None:
           raise ValuError(....)
    name = names


if name is None and hasattr(data, 'name'):
name = data.name
if name is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually move all of this names stuff to a separate method

def _validate_names(self, name, names):
    .....

because we need to call this in subclasses

MultiIndex will override this (to allow names as a list), the basic one will raise if name is not a scalar

Copy link
Contributor Author

@spacesphere spacesphere Jan 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can be wrong, but isn't _validate_names used only when we try to create a copy of an object? It doesn't affect initial object creation. And if I add another method specially for these pre-checks it might be a bit confusing.
Perhaps, it'd be better to do some refactoring of existing _validate_names and move copy functionality from it. But it will demand changes in subclasses too.

@@ -305,6 +305,15 @@ def test_constructor_simple_new(self):
result = idx._simple_new(idx, 'obj')
tm.assert_index_equal(result, idx)

def test_constructor_names(self):
idx = Index([1, 2, 3], name='a')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs much more testing. but this is tricky here.

@codecov
Copy link

codecov bot commented Jan 11, 2018

Codecov Report

Merging #19168 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #19168      +/-   ##
==========================================
- Coverage   91.56%   91.53%   -0.04%     
==========================================
  Files         148      147       -1     
  Lines       48856    48834      -22     
==========================================
- Hits        44733    44698      -35     
- Misses       4123     4136      +13
Flag Coverage Δ
#multiple 89.9% <100%> (-0.04%) ⬇️
#single 41.69% <70%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.47% <100%> (+0.01%) ⬆️
pandas/core/accessor.py 93.75% <0%> (-4.96%) ⬇️
pandas/plotting/_converter.py 65.22% <0%> (-1.74%) ⬇️
pandas/core/indexes/accessors.py 89.36% <0%> (-0.64%) ⬇️
pandas/io/json/table_schema.py 98.19% <0%> (-0.1%) ⬇️
pandas/core/series.py 94.61% <0%> (ø) ⬆️
pandas/errors/__init__.py 100% <0%> (ø) ⬆️
pandas/core/frame.py 97.62% <0%> (ø) ⬆️
pandas/core/categorical.py 95.78% <0%> (ø) ⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ebdc50...071c819. Read the comment docs.

name = data.name
# extract `name` from `names` in case MultiIndex cannot be created
elif names:
name = names[0] if is_list_like(names) else names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really want this else names. Supporting names mainly makes sense for compatibility with the multi-level case, where it must be iterable anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... moreover, it would be nice to check that if we don't create a MultiIndex and names is set, len(names) == 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I totally agree with the first one.
But as for the last comment: I don't quite understand how we can determine in advance that we don't create a MultiIndex as the constructor tries to create it nearly at the end and tupleize_cols is always True by default. But for the use-case from the issue we need it at the beginning.

Copy link
Contributor Author

@spacesphere spacesphere Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it'd be better to get rid of names handling in __new__ and use set_names (that already does all the checks) after an object has been created.

Name to be stored in the index
names : sequence of objects, optional
Names for the index levels used when attempt to create a MultiIndex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct - in this version of the PR, it can be used even when a MultiIndex is not created.

Copy link
Contributor Author

@spacesphere spacesphere Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz I can see what you mean, thanks. Personally, I think it's worth clarifying that the purpose of names is for creation a MultiIndex at the end of the day.
What do you think is the best way to describe this parameter then?

@@ -360,7 +375,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
if all(isinstance(e, tuple) for e in data):
from .multi import MultiIndex
return MultiIndex.from_tuples(
data, names=name or kwargs.get('names'))
data, names=names or kwargs.get('names') or name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwargs.get('names') shouldn't be needed any more, right? Moreover, I understand name was accepted as a substitute for names (as in name=['level1', 'level2'])... but once we correctly support names thanks to this PR, we shouldn't any more allow this, I think, and this could just be names=names.

(You are right that instead testing the length of names to be 1 is really messy)

Copy link
Contributor Author

@spacesphere spacesphere Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz, thanks a lot for a note, you're right :)
One more question then: should we remove it from MultiIndex too and not accept name as a parameter for MultiIndex?

if name is not None:
names = name

It looks like a bit breaking change.

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 12, 2018
@toobaz
Copy link
Member

toobaz commented Jan 12, 2018

One more question then: should we remove it from MultiIndex too and not accept name as a parameter for MultiIndex?

Oh right. I hate that line. But you are right that removing it could break code. Maybe we should just leave the list-like behavior of names (undo my suggestion), and then open a new issue for its deprecation (both in Index and in MultiIndex).

raise TypeError("Can only provide one of `names` and `name`")

if names is not None and not is_list_like(names):
raise TypeError("`names` must be iterable.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need quotes on arg names

**kwargs):

if names is not None and name is not None:
raise TypeError("Can only provide one of `names` and `name`")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no quotes

fastpath=False, tupleize_cols=True, **kwargs):
fastpath=False, tupleize_cols=True, names=None,
**kwargs):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment that checking name / names

Copy link
Contributor Author

@spacesphere spacesphere Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback, Sorry, I didn't get it: a comment for what? Could you, maybe, rephrase your request?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't answer my comments. pls add some comments in the code for this section.

def test_constructor_names(self):
idx = Index([1, 2, 3], name='a')
assert idx.name == 'a'
assert idx.names == ('a',)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need testing for each index type & multiindex. need to hit each of those error conditions (e.g. name and/or names not None). we may already have some of these tests but should consolidate.

@spacesphere spacesphere force-pushed the fix-index-names branch 4 times, most recently from 20a0e71 to 92deb43 Compare January 15, 2018 12:49
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this orthogonal to #19246 ?

@@ -305,6 +305,30 @@ def test_constructor_simple_new(self):
result = idx._simple_new(idx, 'obj')
tm.assert_index_equal(result, idx)

def test_constructor_names(self):
# test both `name` and `names` provided
with pytest.raises(TypeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you have these first two assertions in a separate test that uses pandas.util.testing.all_index_generator, and passes the name to exhaustively check all the index types (do this in a parameterization)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, correct me if I'm wrong, but don't you think it's a bit redundant? I mean I can't use name or names for all_index_generator to check constructors of different index types. All I can do is to create a new Index with each of the generated indexes passing names and/or name. In this case it doesn't really matter what type of index is going to be created as all the checks are made at the very beginning of Index constructor (before objects creation) and they should guarantee that name and names are valid for any case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator to pass thru kwargs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of the Index subclasses (apart from MultiIndex) sets names for their objects. They don't even consider this argument. It shouldn't break anything. The only purpose for names is to create a MultiIndex instance (either via Index or MultiIndex constuctor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@toobaz , could you, please, add any remarks about the tests too?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My remarks here are not worth much as I'm not a core dev (I can only share your frustration :-) )

Theoretically, it could be that some day somebody modified some Index subclass so that it misbehaves on names. Your tests would capture that regression. Yeah, I know, that's paranoia - but also the quickest way to get your PR accepted.

This said: if we end up deprecating names (as I think we should, but I don't think there ever was any discussion, so not sure how it would end up), then all of this is makes even less sense, so you might try to open that discussion now.

Copy link
Contributor Author

@spacesphere spacesphere Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just paranoia, I think, it's something that can just mess things up. We're talking about error cases, but Index subclasses don't consider names at all, so they won't raise any errors when passing wrong names arguments as it's expected for Index and MultiIndex. So, to make these tests work correctly, I'll have to modify all the subclasses too (I don't know how to handle this case otherwise).
The more I think about it, the more I dislike the way things are going :\

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think it's time for #19295 ;-)

Copy link
Contributor

@jreback jreback Jan 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PoppyBagel here's the big pictures. We want to consider the consistency guarantees that pandas offers. We have a loose guaranteee that using the Index constructor or its sub-classes have the same signature, IM is an exception to this. But for example name is accepted.

We want to lockdown this guaranteee via tests in one place. It is somewhat tested in each subclass, but not centrally. Further this locks down the behavior of future subclasses, and prevent inadvertant adding it back. More to the point, I am considering removing names entirely (via deprecation). We need to check for error messages for this.

This is a little bit overkill, to check that we are not accepting an argument, but we have lots of code and inspection can easily fail.

I don't want you to add the names args anywhere, If its only accepted in MI and Index, then we can next consider the removal implications.


# test using `names` for a `MultiIndex`
idx = Index([('A', 1), ('A', 2)], names=('a', 'b'))
assert idx.name is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct a MI and use tm.assert_index_equal

@jreback jreback added the Bug label Jan 16, 2018
@spacesphere
Copy link
Contributor Author

@jreback In general this PR and #19246 are independent. But cleaning validation can make this one better too. So, I think #19246 is prior.

@@ -305,6 +305,30 @@ def test_constructor_simple_new(self):
result = idx._simple_new(idx, 'obj')
tm.assert_index_equal(result, idx)

def test_constructor_names(self):
# test both `name` and `names` provided
with pytest.raises(TypeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator to pass thru kwargs

fastpath=False, tupleize_cols=True, **kwargs):
fastpath=False, tupleize_cols=True, names=None,
**kwargs):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you didn't answer my comments. pls add some comments in the code for this section.

idx = Index([1, 2, 3], names=('a',))
assert idx.name == 'a'
assert idx.names == ('a',)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally replace the two blocks with a loop (over kwargs, or over idx itself), but it's a matter of taste. You could also include the case name=('a',), which should still work.

@toobaz
Copy link
Member

toobaz commented Jan 20, 2018

OK, so in light of #19295 (comment) I guess here we want pd.Index(names=) to 1) raise an error if resulting in a non-multi index, 2) raise a DeprecationWarning if resulting in a MultiIndex.

We also want (in other PRs) 1) all non-multi subclasses to raise an error if passed names, 2) pd.MultiIndex to raise a DeprecationWarning if passed names.

@jreback
Copy link
Contributor

jreback commented Jan 20, 2018

no do not change this PR for deprecation
that is orthogonal

@toobaz
Copy link
Member

toobaz commented Jan 20, 2018

no do not change this PR for deprecation
that is orthogonal

How that? This PR is enabling a feature which was never supported and which we want to deprecate in all cases in which it is supported.

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

@toobaz well if you want to do a deprecation PR first then we can close this one.

@jreback
Copy link
Contributor

jreback commented Feb 24, 2018

OK, so in light of #19295 (comment) I guess here we want pd.Index(names=) to 1) raise an error if resulting in a non-multi index, 2) raise a DeprecationWarning if resulting in a MultiIndex.

We also want (in other PRs) 1) all non-multi subclasses to raise an error if passed names, 2) pd.MultiIndex to raise a DeprecationWarning if passed names.

let's do this, so closing this PR. @PoppyBagel if you'd like to submit one to do this would be great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Creating Index name using names names argument, doesn't set index name
3 participants