BUG: Creating Index with the `names` argument #19168

spacesphere · 2018-01-10T10:11:51Z

closes BUG: Creating Index name using names names argument, doesn't set index name #19082
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff

jreback · 2018-01-10T11:51:17Z

pandas/core/indexes/base.py

        Name to be stored in the index
    tupleize_cols : bool (default: True)
        When True, attempt to create a MultiIndex if possible
+    names : sequence of objects, optional


move after name

add a Note that these are mutually exclusive

jreback · 2018-01-10T11:53:03Z

pandas/core/indexes/base.py

+                name = data.name
+            # extract `name` from `names` in case MultiIndex cannot be created
+            elif names:
+                name = names[0]


add an

    elif names is not None:
        if name is not None:
            raise ValueError(....)
        name = names

jreback · 2018-01-10T11:54:16Z

pandas/core/indexes/base.py


-        if name is None and hasattr(data, 'name'):
-            name = data.name
+        if name is None:


actually move all of this names stuff to a separate method

def _validate_names(self, name, names): .....

because we need to call this in subclasses

MultiIndex will override this (to allow names as a list), the basic one will raise if name is not a scalar
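The split proposed in this comment could look roughly like the sketch below. It is illustrative only: `BaseIndexSketch`, `MultiIndexSketch`, and `_check_name_args` are hypothetical names invented for this note, not pandas classes or methods, and the scalar and iterable tests are deliberate simplifications.

```python
class BaseIndexSketch:
    """Hypothetical stand-in for a flat index; not the pandas Index."""

    def __init__(self, data, name=None, names=None):
        self.data = list(data)
        self.names = self._check_name_args(name, names)

    def _check_name_args(self, name, names):
        # `name` and `names` are treated as mutually exclusive.
        if name is not None and names is not None:
            raise TypeError("Can only provide one of names and name")
        if names is not None:
            if isinstance(names, str) or not hasattr(names, "__iter__"):
                raise TypeError("names must be iterable.")
            names = list(names)
            if len(names) != 1:
                raise ValueError("a flat index takes exactly one name")
            return names
        # The basic check rejects a list-like `name` (simplified test).
        if isinstance(name, (list, tuple)):
            raise TypeError("name must be a scalar")
        return [name]


class MultiIndexSketch(BaseIndexSketch):
    """Hypothetical multi-level variant that accepts a list of names."""

    def _check_name_args(self, name, names):
        if name is not None and names is not None:
            raise TypeError("Can only provide one of names and name")
        chosen = names if names is not None else name
        return None if chosen is None else list(chosen)
```

Because the constructor calls the check through `self`, a subclass only has to override the one method to relax the rule.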

I may be wrong, but isn't _validate_names used only when we try to create a copy of an object? It doesn't affect initial object creation. And if I add another method specifically for these pre-checks, it might be a bit confusing.
Perhaps it'd be better to refactor the existing _validate_names and move the copy functionality out of it. But that would require changes in subclasses too.

jreback · 2018-01-10T11:56:23Z

pandas/tests/indexes/test_base.py

@@ -305,6 +305,15 @@ def test_constructor_simple_new(self):
        result = idx._simple_new(idx, 'obj')
        tm.assert_index_equal(result, idx)

+    def test_constructor_names(self):
+        idx = Index([1, 2, 3], name='a')


this needs much more testing. but this is tricky here.

codecov · 2018-01-11T06:34:30Z

Codecov Report

Merging #19168 into master will decrease coverage by 0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #19168      +/-   ##
==========================================
- Coverage   91.56%   91.53%   -0.04%     
==========================================
  Files         148      147       -1     
  Lines       48856    48834      -22     
==========================================
- Hits        44733    44698      -35     
- Misses       4123     4136      +13

Flag	Coverage Δ
#multiple	`89.9% <100%> (-0.04%)`	⬇️
#single	`41.69% <70%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/base.py	`96.47% <100%> (+0.01%)`	⬆️
pandas/core/accessor.py	`93.75% <0%> (-4.96%)`	⬇️
pandas/plotting/_converter.py	`65.22% <0%> (-1.74%)`	⬇️
pandas/core/indexes/accessors.py	`89.36% <0%> (-0.64%)`	⬇️
pandas/io/json/table_schema.py	`98.19% <0%> (-0.1%)`	⬇️
pandas/core/series.py	`94.61% <0%> (ø)`	⬆️
pandas/errors/__init__.py	`100% <0%> (ø)`	⬆️
pandas/core/frame.py	`97.62% <0%> (ø)`	⬆️
pandas/core/categorical.py	`95.78% <0%> (ø)`	⬆️
... and 6 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4ebdc50...071c819.

toobaz · 2018-01-11T07:08:45Z

pandas/core/indexes/base.py

+                name = data.name
+            # extract `name` from `names` in case MultiIndex cannot be created
+            elif names:
+                name = names[0] if is_list_like(names) else names


I don't think we really want this else names. Supporting names mainly makes sense for compatibility with the multi-level case, where it must be iterable anyway.

... moreover, it would be nice to check that if we don't create a MultiIndex and names is set, len(names) == 1.

Well, I totally agree with the first one.
But as for the last comment: I don't quite understand how we can determine in advance that we won't create a MultiIndex, since the constructor only tries to create it near the end and tupleize_cols is True by default. But for the use case from the issue we need it at the beginning.

Or maybe it'd be better to get rid of names handling in __new__ and use set_names (that already does all the checks) after an object has been created.

toobaz · 2018-01-11T07:10:21Z

pandas/core/indexes/base.py

        Name to be stored in the index
+    names : sequence of objects, optional
+        Names for the index levels used when attempt to create a MultiIndex


I don't think this is correct - in this version of the PR, it can be used even when a MultiIndex is not created.

@toobaz I can see what you mean, thanks. Personally, I think it's worth clarifying that, at the end of the day, the purpose of names is to create a MultiIndex.
What do you think is the best way to describe this parameter then?

toobaz · 2018-01-11T22:54:38Z

pandas/core/indexes/base.py

@@ -360,7 +375,7 @@ def __new__(cls, data=None, dtype=None, copy=False, name=None,
                if all(isinstance(e, tuple) for e in data):
                    from .multi import MultiIndex
                    return MultiIndex.from_tuples(
-                        data, names=name or kwargs.get('names'))
+                        data, names=names or kwargs.get('names') or name)


kwargs.get('names') shouldn't be needed any more, right? Moreover, I understand name was accepted as a substitute for names (as in name=['level1', 'level2'])... but once we correctly support names thanks to this PR, we shouldn't any more allow this, I think, and this could just be names=names.

(You are right that testing the length of names to be 1 instead is really messy)

@toobaz, thanks a lot for the note, you're right :)
One more question then: should we remove it from MultiIndex too and not accept name as a parameter for MultiIndex?

pandas/pandas/core/indexes/multi.py
Lines 139 to 140 in 78c3ff9

        if name is not None:
            names = name

It looks like a somewhat breaking change.

toobaz · 2018-01-12T08:38:46Z

One more question then: should we remove it from MultiIndex too and not accept name as a parameter for MultiIndex?

Oh right. I hate that line. But you are right that removing it could break code. Maybe we should just leave the list-like behavior of names (undo my suggestion), and then open a new issue for its deprecation (both in Index and in MultiIndex).
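The two behaviours discussed in this thread (the leftover `kwargs.get('names')` and `name` standing in for `names`) can be mimicked in plain Python. `resolve_names` and `names_lands_in_kwargs` are hypothetical helpers written for this note, not pandas code; the first only mirrors the `names or kwargs.get('names') or name` expression from the diff.

```python
def resolve_names(name=None, names=None, **kwargs):
    """Mirror the expression `names or kwargs.get('names') or name`."""
    return names or kwargs.get("names") or name


def names_lands_in_kwargs(name=None, names=None, **kwargs):
    """Report whether a `names=` argument ever ends up in **kwargs."""
    return "names" in kwargs


# Once `names` is an explicit parameter, Python binds it there, so the
# kwargs lookup in the middle of the chain can never find anything.
print(names_lands_in_kwargs(names=["level1", "level2"]))

# A list-like `name` still falls through as a substitute for `names`.
print(resolve_names(name=["level1", "level2"]))

# `or` tests truthiness, not `is None`: an empty `names` is skipped.
print(resolve_names(name="x", names=[]))
```

The last call shows a side effect of chaining with `or`: an empty sequence is treated the same as a missing argument, which an explicit `is not None` check would avoid.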

jreback · 2018-01-12T11:37:27Z

pandas/core/indexes/base.py

+            raise TypeError("Can only provide one of `names` and `name`")
+
+        if names is not None and not is_list_like(names):
+            raise TypeError("`names` must be iterable.")


don't need quotes on arg names

jreback · 2018-01-12T11:37:35Z

pandas/core/indexes/base.py

+                **kwargs):
+
+        if names is not None and name is not None:
+            raise TypeError("Can only provide one of `names` and `name`")


jreback · 2018-01-12T11:37:45Z

pandas/core/indexes/base.py

-                fastpath=False, tupleize_cols=True, **kwargs):
+                fastpath=False, tupleize_cols=True, names=None,
+                **kwargs):
+


add a comment that this is checking name / names

@jreback, Sorry, I didn't get it: a comment for what? Could you, maybe, rephrase your request?

you didn't answer my comments. pls add some comments in the code for this section.

jreback · 2018-01-12T11:39:09Z

pandas/tests/indexes/test_base.py

+    def test_constructor_names(self):
+        idx = Index([1, 2, 3], name='a')
+        assert idx.name == 'a'
+        assert idx.names == ('a',)


need testing for each index type & multiindex. need to hit each of those error conditions (e.g. name and/or names not None). we may already have some of these tests but should consolidate.

jreback

is this orthogonal to #19246 ?

jreback · 2018-01-16T00:54:40Z

pandas/tests/indexes/test_base.py

@@ -305,6 +305,30 @@ def test_constructor_simple_new(self):
        result = idx._simple_new(idx, 'obj')
        tm.assert_index_equal(result, idx)

+    def test_constructor_names(self):
+        # test both `name` and `names` provided
+        with pytest.raises(TypeError):


can you have these first two assertions in a separate test that uses pandas.util.testing.all_index_generator, and passes the name to exhaustively check all the index types (do this in a parameterization)

Please correct me if I'm wrong, but don't you think it's a bit redundant? I mean, I can't use name or names with all_index_generator to check the constructors of different index types. All I can do is create a new Index from each of the generated indexes, passing names and/or name. In that case it doesn't really matter what type of index is going to be created, as all the checks are made at the very beginning of the Index constructor (before object creation) and they should guarantee that name and names are valid in any case.

how is this redundant? we need to generally test passing name & names to each of the individual constructors. I suspect this actually does break some of the subclasses. its simple to modify all_index_generator to pass thru kwargs

None of the Index subclasses (apart from MultiIndex) sets names for their objects. They don't even consider this argument. It shouldn't break anything. The only purpose of names is to create a MultiIndex instance (either via the Index or the MultiIndex constructor).

@toobaz , could you, please, add any remarks about the tests too?

My remarks here are not worth much as I'm not a core dev (I can only share your frustration :-) )

Theoretically, it could be that some day somebody modified some Index subclass so that it misbehaves on names. Your tests would capture that regression. Yeah, I know, that's paranoia - but also the quickest way to get your PR accepted.

This said: if we end up deprecating names (as I think we should, but I don't think there ever was any discussion, so not sure how it would end up), then all of this makes even less sense, so you might try to open that discussion now.

It's not just paranoia, I think, it's something that can just mess things up. We're talking about error cases, but Index subclasses don't consider names at all, so they won't raise any errors when passing wrong names arguments as it's expected for Index and MultiIndex. So, to make these tests work correctly, I'll have to modify all the subclasses too (I don't know how to handle this case otherwise).
The more I think about it, the more I dislike the way things are going :\

OK, I think it's time for #19295 ;-)

@PoppyBagel here's the big picture. We want to consider the consistency guarantees that pandas offers. We have a loose guarantee that the Index constructor and its subclasses have the same signature; MI is an exception to this. But, for example, name is accepted.

We want to lock down this guarantee via tests in one place. It is somewhat tested in each subclass, but not centrally. Further, this locks down the behavior of future subclasses and prevents inadvertently adding it back. More to the point, I am considering removing names entirely (via deprecation). We need to check for error messages for this.

This is a little bit overkill, to check that we are not accepting an argument, but we have lots of code and inspection can easily fail.

I don't want you to add the names args anywhere. If it's only accepted in MI and Index, then we can next consider the removal implications.
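The kind of central check described in this comment might be sketched as follows. The constructors are hypothetical stand-ins defined inline so the example runs without pandas; in the real test suite the loop would presumably be a pytest parametrization over the actual index classes.

```python
def flat_index(data, name=None):
    """Hypothetical flat-index constructor: accepts name, not names."""
    return {"data": list(data), "name": name}


def multi_index(data, name=None, names=None):
    """Hypothetical multi-level constructor: accepts names as well."""
    if name is not None and names is not None:
        raise TypeError("Can only provide one of names and name")
    return {"data": list(data), "names": names if names is not None else name}


def rejects_names(constructor, data):
    """Return True if passing names= to the constructor raises TypeError."""
    try:
        constructor(data, names=["a"])
    except TypeError:
        return True
    return False


# One table, checked in one place, so that a future constructor which
# starts accepting names= by accident shows up as a mismatch.
EXPECTED_TO_REJECT = {flat_index: True, multi_index: False}
results = {ctor.__name__: rejects_names(ctor, [(1, 2)])
           for ctor in EXPECTED_TO_REJECT}
mismatches = [ctor.__name__ for ctor, expected in EXPECTED_TO_REJECT.items()
              if results[ctor.__name__] != expected]
```

Keeping the expectation table separate from the constructors is the point: the check does not rely on anyone reading each subclass by hand.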

jreback · 2018-01-16T00:55:05Z

pandas/tests/indexes/test_base.py

+
+        # test using `names` for a `MultiIndex`
+        idx = Index([('A', 1), ('A', 2)], names=('a', 'b'))
+        assert idx.name is None


construct a MI and use tm.assert_index_equal

spacesphere · 2018-01-16T07:07:19Z

@jreback In general this PR and #19246 are independent. But cleaning up validation can make this one better too. So, I think #19246 should come first.

toobaz · 2018-01-16T18:16:13Z

pandas/tests/indexes/test_base.py

+        idx = Index([1, 2, 3], names=('a',))
+        assert idx.name == 'a'
+        assert idx.names == ('a',)
+


I would personally replace the two blocks with a loop (over kwargs, or over idx itself), but it's a matter of taste. You could also include the case name=('a',), which should still work.

toobaz · 2018-01-20T08:22:59Z

OK, so in light of #19295 (comment) I guess here we want pd.Index(names=) to 1) raise an error if resulting in a non-multi index, 2) raise a DeprecationWarning if resulting in a MultiIndex.

We also want (in other PRs) 1) all non-multi subclasses to raise an error if passed names, 2) pd.MultiIndex to raise a DeprecationWarning if passed names.
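Points 1) and 2) for `pd.Index(names=)` could be sketched like this. `make_index_sketch` is a hypothetical function that returns plain tuples rather than pandas objects; the all-tuples test is borrowed from the diff above (with an added guard for empty data), and the messages are invented for illustration.

```python
import warnings


def make_index_sketch(data, name=None, names=None):
    """Hypothetical constructor: ('flat' | 'multi', data, name-or-names)."""
    data = list(data)
    # Same heuristic as the diff: all-tuple data yields a multi-level index.
    is_multi = bool(data) and all(isinstance(e, tuple) for e in data)
    if names is not None:
        if not is_multi:
            # 1) names= with a non-multi result is an error.
            raise TypeError("names is only valid when a MultiIndex results")
        # 2) names= with a multi-level result still works, but warns.
        warnings.warn("passing names= here is deprecated",
                      DeprecationWarning, stacklevel=2)
        return ("multi", data, list(names))
    if is_multi:
        return ("multi", data, None)
    return ("flat", data, name)
```

Note that the decision can only be made after inspecting `data`, which is the ordering problem raised earlier in the review.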

jreback · 2018-01-20T10:26:21Z

no do not change this PR for deprecation
that is orthogonal

toobaz · 2018-01-20T10:31:29Z

no do not change this PR for deprecation
that is orthogonal

How so? This PR is enabling a feature which was never supported and which we want to deprecate in all cases in which it is supported.

jreback · 2018-01-21T15:34:17Z

@toobaz well if you want to do a deprecation PR first then we can close this one.

jreback · 2018-02-24T17:30:42Z

OK, so in light of #19295 (comment) I guess here we want pd.Index(names=) to 1) raise an error if resulting in a non-multi index, 2) raise a DeprecationWarning if resulting in a MultiIndex.

We also want (in other PRs) 1) all non-multi subclasses to raise an error if passed names, 2) pd.MultiIndex to raise a DeprecationWarning if passed names.

let's do this, so closing this PR. @PoppyBagel if you'd like to submit one to do this would be great.

jreback requested changes Jan 10, 2018

spacesphere force-pushed the fix-index-names branch from 1b03087 to 91dcad8 Compare January 11, 2018 06:33

toobaz reviewed Jan 11, 2018

spacesphere force-pushed the fix-index-names branch from 91dcad8 to d699f88 Compare January 11, 2018 09:52

toobaz reviewed Jan 11, 2018

jreback added the Indexing label Jan 12, 2018

spacesphere force-pushed the fix-index-names branch from d699f88 to bc9fbb1 Compare January 12, 2018 07:17

jreback requested changes Jan 12, 2018

spacesphere force-pushed the fix-index-names branch 4 times, most recently from 20a0e71 to 92deb43 Compare January 15, 2018 12:49

spacesphere mentioned this pull request Jan 15, 2018

CLN: Refactor Index._validate_names() #19246

Closed

3 tasks

spacesphere force-pushed the fix-index-names branch from 92deb43 to 2f3025f Compare January 15, 2018 14:09

jreback requested changes Jan 16, 2018

jreback added the Bug label Jan 16, 2018

spacesphere force-pushed the fix-index-names branch from 2f3025f to 9e44f55 Compare January 16, 2018 08:49

jreback requested changes Jan 16, 2018

BUG: Creating Index with the names argument

071c819

spacesphere force-pushed the fix-index-names branch from 9e44f55 to 071c819 Compare January 16, 2018 13:51

toobaz reviewed Jan 16, 2018

toobaz mentioned this pull request Jan 22, 2018

Index subclasses do not check parameters #19348

Closed

jreback closed this Feb 24, 2018

arminv mentioned this pull request Apr 10, 2018

ERR: disallow non-hashables in Index/MultiIndex construction & rename #20548

Merged

4 tasks
