API: fancier fancy indexing to enable coordinate lookup? #7522

Closed
immerrr opened this issue Jun 20, 2014 · 6 comments

Comments

@immerrr
Contributor

immerrr commented Jun 20, 2014

While wading through setitem-related code I came across a reference to a GH issue (?? I'll fill this in later; the browser history is on a different computer), where a poster complained about weird numpy fancy indexing:

```
In [11]: arr = np.arange(27).reshape(3,3,3)

In [12]: arr
Out[12]: 
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [13]: arr[[0,1],:,[0,1]]
Out[13]: 
array([[ 0,  3,  6],
       [10, 13, 16]])
```
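To see where the (2, 3) shape comes from: when several index lists appear in one subscript, numpy pairs them element-wise (like `zip`) rather than crossing them, so each pair `(i, k)` selects one row `arr[i, :, k]`. A minimal check of that reading, using the same `arr` (the names `first_idx`, `last_idx` and `manual` are just illustrative):

```python
import numpy as np

arr = np.arange(27).reshape(3, 3, 3)
first_idx, last_idx = [0, 1], [0, 1]

fancy = arr[first_idx, :, last_idx]

# The index lists are paired up element-wise, not crossed:
# row n of the result is arr[first_idx[n], :, last_idx[n]]
manual = np.stack([arr[i, :, k] for i, k in zip(first_idx, last_idx)])

assert (fancy == manual).all()
print(fancy.tolist())  # [[0, 3, 6], [10, 13, 16]]
```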

which he apparently expected to be:

```
In [14]: arr[[[0,1]],:,[[0],[1]]]
Out[14]: 
array([[[ 0,  3,  6],
        [ 9, 12, 15]],

       [[ 1,  4,  7],
        [10, 13, 16]]])
```

or, arguably more readably (hint: ix and ix.T are broadcast against each other to become 2x2 arrays):

```
In [25]: ix = np.array([[0,1]])

In [26]: arr[ix, :, ix.T]
Out[26]: 
array([[[ 0,  3,  6],
        [ 9, 12, 15]],

       [[ 1,  4,  7],
        [10, 13, 16]]])
```
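The broadcasting in the hint can be made explicit with `np.broadcast_arrays`: `ix` (shape (1, 2)) and `ix.T` (shape (2, 1)) expand into two 2x2 index grids, so element `[a, b]` of the result is the row `arr[b, :, a]`. The same values can also be obtained with `np.ix_`, which builds an outer-product index but keeps the axes in their original order, so a transpose is needed to line them up. A short sketch:

```python
import numpy as np

arr = np.arange(27).reshape(3, 3, 3)
ix = np.array([[0, 1]])  # shape (1, 2)

# ix and ix.T broadcast against each other into two (2, 2) index grids
first, last = np.broadcast_arrays(ix, ix.T)
assert first.tolist() == [[0, 1], [0, 1]]
assert last.tolist() == [[0, 0], [1, 1]]

crossed = arr[ix, :, ix.T]
# The broadcast index shape (2, 2) comes first, followed by the sliced axis (3,)
assert crossed.shape == (2, 2, 3)
# Element [a, b] is the row arr[b, :, a]
assert (crossed[0, 1] == arr[1, :, 0]).all()

# np.ix_ selects the same values with axes in original order: outer[i, j, k] == arr[i, j, k]
outer = arr[np.ix_([0, 1], [0, 1, 2], [0, 1])]
assert outer.shape == (2, 3, 2)
assert (outer.transpose(2, 0, 1) == crossed).all()
```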

Figuring out what was going on made me grind through the arrays.indexing.advanced section of the numpy docs, and the dimensionality tricks numpy offers turn out to be quite neat: the "take-ish" single-dimension arr[[1,2,3]] is only the tip of the iceberg.
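One of those rules explains the layout of Out[13] and Out[14]: where the advanced (list/array) indices end up in the result depends on whether they are adjacent. If they are separated by a slice, numpy puts the broadcast index dimensions first; if they sit next to each other, the broadcast dimensions take their place. A small illustration (`idx`, `separated` and `adjacent` are illustrative names):

```python
import numpy as np

arr = np.arange(27).reshape(3, 3, 3)
idx = [0, 1]

# Advanced indices separated by a slice: the broadcast index dimension moves to the front
separated = arr[idx, :, idx]
# Adjacent advanced indices: the broadcast index dimension stays where the indices were
adjacent = arr[:, idx, idx]

print(separated.shape)  # (2, 3)
print(adjacent.shape)   # (3, 2)

# adjacent[j, n] == arr[j, idx[n], idx[n]], i.e. the first two diagonal entries of each arr[j]
assert (adjacent == np.stack([np.diagonal(m)[:2] for m in arr])).all()
```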

Now, to pandas.

That issue led pandas containers to diverge from numpy and implement the automatically broadcasting (outer-product) version, i.e. df[[0,1], [0,1]].shape == (2, 2), which is a huge convenience. And while almost all the other dimensionality-changing tricks offered by numpy fancy indexing seem out of scope because label management goes out the window, there's one use case that fits pandas perfectly: folding several dimensions into one (ironically, this is exactly the behaviour that was complained about in GH-forgot-the-number):

```
# this is how it's done in numpy
In [39]: arr = np.arange(25).reshape(5,5)

In [40]: arr
Out[40]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [41]: arr[[0,1,2], [0,1,2]]
Out[41]: array([ 0,  6, 12])

# and this is the analogous operation in pandas
In [42]: df = pd.DataFrame(arr)

In [43]: df
Out[43]: 
    0   1   2   3   4
0   0   1   2   3   4
1   5   6   7   8   9
2  10  11  12  13  14
3  15  16  17  18  19
4  20  21  22  23  24

In [44]: df.unstack()[[(0,0), (1,1), (2,2)]]
Out[44]: 
0  0     0
1  1     6
2  2    12
dtype: int64
```
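For comparison, a sketch of the same diagonal lookup written against the current pandas API (assuming a reasonably recent version, so `.iloc` and `.to_numpy()` rather than the bare `df[...]` form). `.iloc` takes the outer product of the two lists, while going through the underlying numpy array gives the coordinate-style result; the unstack route from In [44] also works, keeping in mind that `df.unstack()` keys its Series by (column, row):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5))
rows, cols = [0, 1, 2], [0, 1, 2]

# pandas positional indexing crosses the lists: a 3x3 block
block = df.iloc[rows, cols]
assert block.shape == (3, 3)

# numpy pairs the lists up: one value per (row, col) coordinate
coords = df.to_numpy()[rows, cols]
print(coords.tolist())  # [0, 6, 12]

# Label-based route via unstack(): the resulting Series is keyed by (column, row)
diag = df.unstack().loc[[(c, r) for r, c in zip(rows, cols)]]
print(diag.tolist())  # [0, 6, 12]
```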

And while the diagonal getitem is rather easy to implement as shown above, I can't see how to do the same for setitem besides "stack->setitem->unstack".

Is there a way to do that? If not, is it possible to fit it into the pandas API somehow, or has that ship already sailed?
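Absent a dedicated API, two workarounds for coordinate setitem can be sketched with existing calls (neither is a pandas coordinate-indexing feature): copy to a numpy array, assign with paired index lists, and rebuild the frame, which only makes sense for a single-dtype frame; or assign one coordinate at a time with `.iat`, which works with mixed dtypes but loops in Python. The names `df_numpy` and `df_loop` are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(25).reshape(5, 5))
rows, cols = [0, 1, 2], [0, 1, 2]

# Route 1 (single-dtype frames): paired-index assignment on a numpy copy
values = df.to_numpy(copy=True)
values[rows, cols] = -1
df_numpy = pd.DataFrame(values, index=df.index, columns=df.columns)

# Route 2 (any dtypes): one positional scalar assignment per coordinate pair
df_loop = df.copy()
for r, c in zip(rows, cols):
    df_loop.iat[r, c] = -1

assert df_numpy.equals(df_loop)
assert df.iat[1, 1] == 6  # the original frame is left untouched
```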

@jreback
Contributor

jreback commented Jun 20, 2014

#7138 is the issue. It can be done, but only if single-dtyped; the problem is how to tell the indexers that you want coordinates.

@immerrr
Contributor Author

immerrr commented Jun 20, 2014

I was actually talking about #3777, but yeah.

@immerrr
Contributor Author

immerrr commented Jun 20, 2014

Also, I don't see the need for the single-dtype restriction: the only thing you need to ensure is that the item axis of the original object ends up as the item axis of the lower-dimensional object, e.g. for a panel with axes=[[a,b,c], [x,y,z], [0,1,2]]:

  • if the item axis doesn't participate in fancy indexing:
    panel.fancy[:, [x, y], [0, 1]] -> df, where index=[(x, 0), (y, 1)] and columns=[a, b, c]
  • if the item axis does participate in fancy indexing:
    panel.fancy[[a,b], :, [0, 1]] -> df, where index=[x, y, z] and columns=[(a, 0), (b, 1)]

Of course, if the result is a series, all elements will be upcast, but then again, the same thing happens when you take a cross-section of a dataframe along the item axis, and no one forbids that.
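Panel has since been removed from pandas, but the argument can be sketched with a plain dict of DataFrames standing in for one. The helper `fancy_minor` below is hypothetical (not pandas API) and covers only the first case, `panel.fancy[:, majors, minors]`: it does a coordinate lookup inside each item and turns each item into one column, so every column keeps its own dtype, and only a cross-section across the items gets upcast:

```python
import pandas as pd

# Hypothetical stand-in for a Panel with items a, b, c; major axis x, y, z; minor axis 0, 1, 2
panel = {
    "a": pd.DataFrame([[0, 1, 2], [3, 4, 5], [6, 7, 8]], index=list("xyz")),
    "b": pd.DataFrame([[0.5, 1.5, 2.5]] * 3, index=list("xyz")),
    "c": pd.DataFrame([[True, False, True]] * 3, index=list("xyz")),
}


def fancy_minor(panel, majors, minors):
    """Hypothetical coordinate lookup along the major/minor axes; items become columns."""
    pairs = list(zip(majors, minors))
    # One column per item, filled with that item's values at the (major, minor) pairs
    data = {item: [frame.at[m, n] for m, n in pairs] for item, frame in panel.items()}
    return pd.DataFrame(data, index=pd.MultiIndex.from_tuples(pairs))


result = fancy_minor(panel, ["x", "y"], [0, 1])
print(result.dtypes)  # a: int64, b: float64, c: bool (one dtype per item)

# A cross-section across the items mixes int, float and bool, so the row is upcast to object
assert result.iloc[0].dtype == object
```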

@jreback
Contributor

jreback commented Jun 20, 2014

I meant for setting. You can use the unstack trick to set, but the problem remains how to specify that you want coordinate indexing. I don't think another indexer is the answer.

immerrr changed the title from "API: fancier fancy indexing to enable diagonal-like setitem?" to "API: fancier fancy indexing to enable coordinate lookup?" Jun 20, 2014
@jreback
Contributor

jreback commented Jul 7, 2014

Cross-referenced to that other issue; going to close this one. We can discuss syntax on #7138 (unless you think this is distinct).

jreback closed this as completed Jul 7, 2014
@immerrr
Contributor Author

immerrr commented Jul 10, 2014

Sure, let's continue there.
