-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Creating Index with the names
argument
#19168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
Name to be stored in the index | ||
tupleize_cols : bool (default: True) | ||
When True, attempt to create a MultiIndex if possible | ||
names : sequence of objects, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move after name
add a Note that these are mutually exclusive
name = data.name | ||
# extract `name` from `names` in case MultiIndex cannot be created | ||
elif names: | ||
name = names[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an
elif names is not None:
if name is not None:
raise ValuError(....)
name = names
pandas/core/indexes/base.py
Outdated
|
||
if name is None and hasattr(data, 'name'): | ||
name = data.name | ||
if name is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually move all of this names stuff to a separate method
def _validate_names(self, name, names):
.....
because we need to call this in subclasses
MultiIndex will override this (to allow names as a list), the basic one will raise if name
is not a scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can be wrong, but isn't _validate_names
used only when we try to create a copy of an object? It doesn't affect initial object creation. And if I add another method specially for these pre-checks it might be a bit confusing.
Perhaps, it'd be better to do some refactoring of existing _validate_names
and move copy functionality from it. But it will demand changes in subclasses too.
pandas/tests/indexes/test_base.py
Outdated
@@ -305,6 +305,15 @@ def test_constructor_simple_new(self): | |||
result = idx._simple_new(idx, 'obj') | |||
tm.assert_index_equal(result, idx) | |||
|
|||
def test_constructor_names(self): | |||
idx = Index([1, 2, 3], name='a') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs much more testing. but this is tricky here.
1b03087
to
91dcad8
Compare
Codecov Report
@@ Coverage Diff @@
## master #19168 +/- ##
==========================================
- Coverage 91.56% 91.53% -0.04%
==========================================
Files 148 147 -1
Lines 48856 48834 -22
==========================================
- Hits 44733 44698 -35
- Misses 4123 4136 +13
Continue to review full report at Codecov.
|
pandas/core/indexes/base.py
Outdated
name = data.name | ||
# extract `name` from `names` in case MultiIndex cannot be created | ||
elif names: | ||
name = names[0] if is_list_like(names) else names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really want this else names
. Supporting names
mainly makes sense for compatibility with the multi-level case, where it must be iterable anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... moreover, it would be nice to check that if we don't create a MultiIndex
and names
is set, len(names) == 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I totally agree with the first one.
But as for the last comment: I don't quite understand how we can determine in advance that we don't create a MultiIndex
as the constructor tries to create it nearly at the end and tupleize_cols
is always True by default. But for the use-case from the issue we need it at the beginning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe it'd be better to get rid of names
handling in __new__
and use set_names
(that already does all the checks) after an object has been created.
pandas/core/indexes/base.py
Outdated
Name to be stored in the index | ||
names : sequence of objects, optional | ||
Names for the index levels used when attempt to create a MultiIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct - in this version of the PR, it can be used even when a MultiIndex
is not created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz I can see what you mean, thanks. Personally, I think it's worth clarifying that the purpose of names
is for creation a MultiIndex
at the end of the day.
What do you think is the best way to describe this parameter then?
91dcad8
to
d699f88
Compare
pandas/core/indexes/base.py
Outdated
@@ -360,7 +375,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None, | |||
if all(isinstance(e, tuple) for e in data): | |||
from .multi import MultiIndex | |||
return MultiIndex.from_tuples( | |||
data, names=name or kwargs.get('names')) | |||
data, names=names or kwargs.get('names') or name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs.get('names')
shouldn't be needed any more, right? Moreover, I understand name
was accepted as a substitute for names
(as in name=['level1', 'level2']
)... but once we correctly support names
thanks to this PR, we shouldn't any more allow this, I think, and this could just be names=names
.
(You are right that instead testing the length of names to be 1 is really messy)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz, thanks a lot for a note, you're right :)
One more question then: should we remove it from MultiIndex
too and not accept name
as a parameter for MultiIndex
?
pandas/pandas/core/indexes/multi.py
Lines 139 to 140 in 78c3ff9
if name is not None: | |
names = name |
It looks like a bit breaking change.
d699f88
to
bc9fbb1
Compare
Oh right. I hate that line. But you are right that removing it could break code. Maybe we should just leave the list-like behavior of |
pandas/core/indexes/base.py
Outdated
raise TypeError("Can only provide one of `names` and `name`") | ||
|
||
if names is not None and not is_list_like(names): | ||
raise TypeError("`names` must be iterable.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need quotes on arg names
pandas/core/indexes/base.py
Outdated
**kwargs): | ||
|
||
if names is not None and name is not None: | ||
raise TypeError("Can only provide one of `names` and `name`") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no quotes
fastpath=False, tupleize_cols=True, **kwargs): | ||
fastpath=False, tupleize_cols=True, names=None, | ||
**kwargs): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment that checking name / names
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, Sorry, I didn't get it: a comment for what? Could you, maybe, rephrase your request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't answer my comments. pls add some comments in the code for this section.
pandas/tests/indexes/test_base.py
Outdated
def test_constructor_names(self): | ||
idx = Index([1, 2, 3], name='a') | ||
assert idx.name == 'a' | ||
assert idx.names == ('a',) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need testing for each index type & multiindex. need to hit each of those error conditions (e.g. name and/or names not None). we may already have some of these tests but should consolidate.
20a0e71
to
92deb43
Compare
92deb43
to
2f3025f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this orthogonal to #19246 ?
@@ -305,6 +305,30 @@ def test_constructor_simple_new(self): | |||
result = idx._simple_new(idx, 'obj') | |||
tm.assert_index_equal(result, idx) | |||
|
|||
def test_constructor_names(self): | |||
# test both `name` and `names` provided | |||
with pytest.raises(TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you have these first two assertions in a separate test that uses pandas.util.testing.all_index_generator
, and passes the name
to exhaustively check all the index types (do this in a parameterization)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, correct me if I'm wrong, but don't you think it's a bit redundant? I mean I can't use name
or names
for all_index_generator
to check constructors of different index types. All I can do is to create a new Index
with each of the generated indexes passing names
and/or name
. In this case it doesn't really matter what type of index is going to be created as all the checks are made at the very beginning of Index
constructor (before objects creation) and they should guarantee that name
and names
are valid for any case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator
to pass thru kwargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None of the Index
subclasses (apart from MultiIndex
) sets names
for their objects. They don't even consider this argument. It shouldn't break anything. The only purpose for names
is to create a MultiIndex
instance (either via Index
or MultiIndex
constuctor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz , could you, please, add any remarks about the tests too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My remarks here are not worth much as I'm not a core dev (I can only share your frustration :-) )
Theoretically, it could be that some day somebody modified some Index
subclass so that it misbehaves on names
. Your tests would capture that regression. Yeah, I know, that's paranoia - but also the quickest way to get your PR accepted.
This said: if we end up deprecating names
(as I think we should, but I don't think there ever was any discussion, so not sure how it would end up), then all of this is makes even less sense, so you might try to open that discussion now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just paranoia, I think, it's something that can just mess things up. We're talking about error cases, but Index
subclasses don't consider names
at all, so they won't raise any errors when passing wrong names arguments as it's expected for Index
and MultiIndex
. So, to make these tests work correctly, I'll have to modify all the subclasses too (I don't know how to handle this case otherwise).
The more I think about it, the more I dislike the way things are going :\
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think it's time for #19295 ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PoppyBagel here's the big pictures. We want to consider the consistency guarantees that pandas offers. We have a loose guaranteee that using the Index
constructor or its sub-classes have the same signature, IM is an exception to this. But for example name
is accepted.
We want to lockdown this guaranteee via tests in one place. It is somewhat tested in each subclass, but not centrally. Further this locks down the behavior of future subclasses, and prevent inadvertant adding it back. More to the point, I am considering removing names
entirely (via deprecation). We need to check for error messages for this.
This is a little bit overkill, to check that we are not accepting an argument, but we have lots of code and inspection can easily fail.
I don't want you to add the names
args anywhere, If its only accepted in MI and Index, then we can next consider the removal implications.
pandas/tests/indexes/test_base.py
Outdated
|
||
# test using `names` for a `MultiIndex` | ||
idx = Index([('A', 1), ('A', 2)], names=('a', 'b')) | ||
assert idx.name is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
construct a MI and use tm.assert_index_equal
2f3025f
to
9e44f55
Compare
@@ -305,6 +305,30 @@ def test_constructor_simple_new(self): | |||
result = idx._simple_new(idx, 'obj') | |||
tm.assert_index_equal(result, idx) | |||
|
|||
def test_constructor_names(self): | |||
# test both `name` and `names` provided | |||
with pytest.raises(TypeError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator
to pass thru kwargs
fastpath=False, tupleize_cols=True, **kwargs): | ||
fastpath=False, tupleize_cols=True, names=None, | ||
**kwargs): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you didn't answer my comments. pls add some comments in the code for this section.
9e44f55
to
071c819
Compare
idx = Index([1, 2, 3], names=('a',)) | ||
assert idx.name == 'a' | ||
assert idx.names == ('a',) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would personally replace the two blocks with a loop (over kwargs
, or over idx
itself), but it's a matter of taste. You could also include the case name=('a',)
, which should still work.
OK, so in light of #19295 (comment) I guess here we want We also want (in other PRs) 1) all non-multi subclasses to raise an error if passed |
no do not change this PR for deprecation |
How that? This PR is enabling a feature which was never supported and which we want to deprecate in all cases in which it is supported. |
@toobaz well if you want to do a deprecation PR first then we can close this one. |
let's do this, so closing this PR. @PoppyBagel if you'd like to submit one to do this would be great. |
names
names argument, doesn't set index name #19082git diff upstream/master -u -- "*.py" | flake8 --diff