Skip to content

BUG: loc/ix indexers with empty list key lose names of respective axes #6552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
immerrr opened this issue Mar 5, 2014 · 13 comments · Fixed by #8460
Closed

BUG: loc/ix indexers with empty list key lose names of respective axes #6552

immerrr opened this issue Mar 5, 2014 · 13 comments · Fixed by #8460
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation
Milestone

Comments

@immerrr
Copy link
Contributor

immerrr commented Mar 5, 2014

This issue was encountered when fixing #6536. It seems that .loc indexer with empty list key seems to lose names of respective axes:

In [2]: pd.DataFrame(np.arange(9).reshape(3,3), columns=pd.Series(list('abc'), name='foobar'))
Out[2]: 
foobar  a  b  c
0       0  1  2
1       3  4  5
2       6  7  8

[3 rows x 3 columns]

In [3]: _2.iloc[:,[]]
Out[3]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2]

[3 rows x 0 columns]

In [4]: _2.iloc[:,[]].columns.names
Out[4]: FrozenList([u'foobar'])

In [5]: _2.loc[:,[]].columns.names
Out[5]: FrozenList([None])

It's not important, but currently #6551 has two tests skipped because of this bug.

@jreback jreback added this to the 0.14.0 milestone Mar 5, 2014
immerrr referenced this issue Mar 7, 2014
…merrr/pandas into immerrr-fix-fancy-indexing-empty-list

Conflicts:
	doc/source/release.rst
@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 21, 2014
@immerrr
Copy link
Contributor Author

immerrr commented Jul 6, 2014

This happens because Index.reindex discards names:

In [24]: idx = pd.Index(list('abc'), name='foobar')

In [25]: idx.names
Out[25]: FrozenList([u'foobar'])

In [27]: idx.reindex([])[0].names
Out[27]: FrozenList([None])

In [28]: idx.reindex(list('abc'))[0].names
Out[28]: FrozenList([None])

In [29]: idx.reindex(idx)[0].names
Out[29]: FrozenList([u'foobar'])

In [30]: idx.reindex(pd.Index([], name='xyzzy'))[0].names
Out[30]: FrozenList([u'xyzzy'])

I think it's more appropriate to preserve names unless the target iterable is an Index itself and in that case its names should be used.

On an unrelated topic, the same logic may be applied to other metadata, too, like timezone.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2014

@immerrr doable for 0.15.0?

@immerrr
Copy link
Contributor Author

immerrr commented Sep 10, 2014

Yes, it is.

@immerrr
Copy link
Contributor Author

immerrr commented Sep 18, 2014

On second thought, let's reiterate on that part of my message:

preserve names unless the target iterable is an Index itself and in that case its names should be used.

If the user has a proper Index with name and whatnot to be used instead of the old one, they should just assign the location to the new value, not doing reindexing. If they reindex, they probably only care about elements and their order to be changed, so the metadata should probably stay.

Also, there may be element-related metadata (e.g. tz and freq) as opposed to index-related metadata (e.g. names) and probably only the last is worth preserving.

@jreback what do you think?

@jreback
Copy link
Contributor

jreback commented Sep 18, 2014

its not hard to preserve the meta data by default (and i think it should happen)

attribs = self._get_attributes_dict()
attribs['freq'] = None
return self._simple_new(result, **attribs)

in this case null out the freq (which I think is right). addtl args are ignored in the constructor so this works for all index types

@immerrr
Copy link
Contributor Author

immerrr commented Sep 18, 2014

Ok, and what about

idx = pd.date_range(..., tz='UTC')
idx2 = idx.reindex(pd.date_range(..., tz='US/Pacific')

What about idx2.tz? I think timestamps should be accompanied by their respective tz no matter where they go, otherwise I'd expect hitting those nasty conversion and naive/aware issues.

@jreback
Copy link
Contributor

jreback commented Sep 18, 2014

I think you may need to check for that
I would raise if the tz are different

though prob should allow a naive tz to reindex a tz-aware as if they were the same

@jreback
Copy link
Contributor

jreback commented Sep 18, 2014

though most of the combination ops coerce to utc before doing anything (intersection / union)

so prob should raise is the tz are different (and not utc / naive)

@jreback
Copy link
Contributor

jreback commented Sep 23, 2014

@immerrr pr for this?

@immerrr
Copy link
Contributor Author

immerrr commented Sep 23, 2014

Will do this sometime week. I will only preserve name as there's some debate in tz-related ticket and freq should be re-inferred anyway.

@jreback
Copy link
Contributor

jreback commented Sep 23, 2014

@immerrr thanks, yep tz-compat is in that other issue

@immerrr
Copy link
Contributor Author

immerrr commented Oct 4, 2014

@jreback, just to be sure, we're changing Index.reindex(indexer) to preserve name if indexer is an Index, but we maintain old behaviour for NDFrames, where frame.reindex(indexer) changes name if indexer is an Index, right?

@immerrr
Copy link
Contributor Author

immerrr commented Oct 4, 2014

Now, that I wrote this down it looks silly to add this inconsistency. OTOH, changing NDFrame.reindex behaviour right before a release doesn't feel right either. I'm going to implement name preservation for cases when indexer is not an Index as originally intended and will create a discussion issue afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants