Skip to content

Index API proposal: unified axis label lookup #7651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
immerrr opened this issue Jul 2, 2014 · 6 comments
Closed

Index API proposal: unified axis label lookup #7651

immerrr opened this issue Jul 2, 2014 · 6 comments
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@immerrr
Copy link
Contributor

immerrr commented Jul 2, 2014

As the next step of separation-of-concerns plan (#6744) I'd like to
propose adding a method (or several, actually) to Index class that
would encapsulate the details of foo.loc[l1,l2,...] lookup.

Implementation Idea

Roughly, the idea is to make loc's getitem as simple as

def __getitem__(self, indexer):
    axes = self.obj.axes
    return self.obj.iloc[axes[0].lookup_labels_nd(indexer, axes[1:], typ='loc')]

Not quite, but hopefully you get the point. The default lookup_labels_nd implementation would then look something like this:

def lookup(self, indexer, other_axes, typ=None):
    if not isinstance(indexer, tuple):
        return self.lookup_labels(indexer, typ=typ)
    else:
        # ndim mismatch error handling is omitted intentionally
        return (self.lookup_labels(indexer[0]),) + \
               tuple(ax.lookup_labels(ix, typ=typ)
                     for ax, ix in zip(other_axes, indexer))

The result should be an object that could be fed to an underlying
BlockManager to perform the requested operation. To support adding
new rows with "setitem", it is only needed to agree that lookup_labels_nd will
never return negative indices unless they reference newly appended
items along that axis.

This would allow to hide Index-subclass-specific lookup peculiarities
in their respective overrides of lookup_labels_nd and lookup_labels (proposals for
better names are welcome), e.g.:

  • looking up str in DatetimeIndex/PeriodIndex
  • looking up int in FloatIndex
  • looking up per-level slices in MultiIndex

Benefits

  • no more confusing errors due to try .. catch block carpet-catching a
    logic error, because corner cases will be handled precisely where
    they are needed and nowhere else
  • no more relying on isinstance checks and exceptions to seek for
    alternative lookup scenarios, meaning more performance
  • the API will provide a contract that is simple to grasp, test, benchmark and,
    eventually, cythonize (as a side effect of this point I'd like to try putting
    up a wiki page with indexing API reference)
@jreback jreback modified the milestones: 0.14.1, 0.15.0 Jul 2, 2014
@immerrr
Copy link
Contributor Author

immerrr commented Jul 21, 2014

I was trying to mock up a proof of concept (while at ep2014) and decided to start with simple things, at and iat, but found something weird:

In [19]: s = pd.Series([1,2,3], index=list('abc'))

In [20]: s.at['a']
Out[20]: 1

In [21]: s.at['b']
Out[21]: 2

In [22]: s.at['c']
Out[22]: 3

In [23]: s.at[0]
Out[23]: 1

I couldn't find any reference to this feature in pandas doc, and I have a strong feeling it shouldn't be there. @jrebac, what do you think, do I have to maintain it?

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

I think that is a bug, at should be a restricted version of loc, e.g. (no slicing or lists), and certainly no fallback. Prob just not checking that the label type is a valid type for the index. pls do a PR to test/fix this if you would. thanks

@jreback
Copy link
Contributor

jreback commented Jul 21, 2014

#7814

@shoyer
Copy link
Member

shoyer commented Feb 11, 2015

Just stumbled across this -- it gets a bit +1 from me!

@jreback
Copy link
Contributor

jreback commented Jul 6, 2018

cc @toobaz

closing as stale, but certainly would take a PR (or more) to simplify internal indexing :>

@jreback jreback closed this as completed Jul 6, 2018
@jreback jreback modified the milestones: 2.0, No action Jul 6, 2018
@toobaz
Copy link
Member

toobaz commented Jul 6, 2018

Just for the records, since I was cced: I'm -1 on this, for two reasons.

The first is conceptual: I don't like Index to have a method that work on multiple indexes. Despite the mess in core/indexing.py, which would certainly benefit from some decoupling, the general idea that indexing must be done at the object level, rather than at the index level, makes sense to me.

The second is practical: there are calls to .loc which cannot be expressed in calls to .iloc, such as when you do partial indexing on a MultiIndex.

By the way, the crucial step to allow real decoupling is to enable BlockManager.set_value for mixed dtypes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

No branches or pull requests

5 participants