
.ix strange bug for float index #780


Closed
xdong opened this issue Feb 14, 2012 · 2 comments

Comments

@xdong

xdong commented Feb 14, 2012

In [1]: import pandas
   ...: from numpy.random import randn

In [2]: index = [52195.504153, 52196.303147, 52198.369883]

In [3]: a = pandas.DataFrame(randn(3, 2), index)

In [4]: a
Out[4]:
                     0         1
52195.504153  1.367681  0.243237
52196.303147 -0.745796 -1.054106
52198.369883 -1.462461 -0.683286

In [5]: a.ix[52195.:52196.]
Out[5]:
Empty DataFrame
Columns: array([0, 1])
Index: array([], dtype=object)

In [6]: a.ix[52195.1:52196.5]
Out[6]:
Empty DataFrame
Columns: array([0, 1])
Index: array([], dtype=object)

In [7]: a.ix[52195.1:52196.6]
Out[7]:
                     0         1
52195.504153  1.367681  0.243237
52196.303147 -0.745796 -1.054106
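For reference, here is what a label-based slice of this index should contain. Label slices in pandas include both endpoints, so on a sorted index the result is every label between the two bounds; by that reading, In [5] and In [6] should not have come back empty. This is a minimal stdlib sketch under that assumption; `label_slice` is a hypothetical helper, not pandas code.

```python
from bisect import bisect_left, bisect_right


def label_slice(labels, start, stop):
    """Hypothetical helper: return every label with start <= label <= stop.

    Assumes `labels` is sorted; both endpoints are inclusive, as with
    label-based slicing in pandas.
    """
    lo = bisect_left(labels, start)   # first position with label >= start
    hi = bisect_right(labels, stop)   # one past the last label <= stop
    return labels[lo:hi]


index = [52195.504153, 52196.303147, 52198.369883]

print(label_slice(index, 52195.0, 52196.0))  # [52195.504153]
print(label_slice(index, 52195.1, 52196.5))  # [52195.504153, 52196.303147]
print(label_slice(index, 52195.1, 52196.6))  # [52195.504153, 52196.303147]
```

The third call matches Out[7]; the first two are what In [5] and In [6] would return under the same label-based reading.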

@xdong
Author

xdong commented Feb 15, 2012

Thanks for the quick fix. I was going to comment on another issue about float-based slicing, but I saw that you had already fixed it in commit bc1932f.

Float-based slicing now works as expected when the floats are whole numbers. For example, df.ix[2.0:5.0] is treated as label-based, as the documentation promises. However, if I mix integers and floats:

df.ix[2:5.0] is interpreted as integer-based;
df.ix[2:5.1] is interpreted as label-based.
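To make the concern concrete, here is a small stdlib sketch of the two possible readings of `2:5.0` on an index of whole-number floats, plus one hypothetical policy that picks the slicing mode from the index contents rather than from the types of the slice bounds. The helper names are made up for illustration; this is not how pandas implements `.ix`.

```python
from bisect import bisect_left, bisect_right


def positional_slice(labels, start, stop):
    # Integer-based: half-open range of positions, like list slicing.
    return labels[start:stop]


def label_slice(labels, start, stop):
    # Label-based: every label with start <= label <= stop (labels sorted).
    return labels[bisect_left(labels, start):bisect_right(labels, stop)]


def slice_by_index_type(labels, start, stop):
    """Hypothetical policy: decide the mode from the index, never the bounds.

    Only an index made entirely of ints slices by position; any index
    containing floats always slices by label.
    """
    if all(isinstance(label, int) for label in labels):
        return positional_slice(labels, start, stop)
    return label_slice(labels, start, stop)


float_index = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# The two readings of df.ix[2:5.0] disagree on a float index
# (the positional reading treats 5.0 as position 5):
print(positional_slice(float_index, 2, 5))   # [2.0, 3.0, 4.0]
print(label_slice(float_index, 2, 5.0))      # [2.0, 3.0, 4.0, 5.0]

# Under the index-type policy, 2:5.0 and 2:5.1 give the same answer:
print(slice_by_index_type(float_index, 2, 5.0))  # [2.0, 3.0, 4.0, 5.0]
print(slice_by_index_type(float_index, 2, 5.1))  # [2.0, 3.0, 4.0, 5.0]
```

With a rule like this, writing `5.0` or `5.1` as the upper bound no longer flips the meaning of the slice; only the index decides.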

I am worried that this may introduce subtle bugs (admittedly, it's bad practice to mix integers and floats).

@adamklein
Contributor

There is definitely still some weirdness in slicing. It's been a game of whack-a-mole.

Part of the complexity is that slicing is context-dependent on what's in the index.

I believe the slicing you point out will be consistent as long as the index type doesn't change. Here is what I mean:

In [76]: x = Index([1.5, 2, 3, 4, 5])

In [77]: df = DataFrame(rand(5,5), index=x)

In [78]: df.ix[1.5:4]
Out[78]: 
     0        1        2         3       4     
1.5  0.06102  0.25070  0.009453  0.6829  0.6631
2    0.81916  0.95604  0.397659  0.7903  0.3951
3    0.30179  0.64651  0.701975  0.3746  0.6955
4    0.13221  0.04839  0.788082  0.3093  0.1095

In [79]: df.ix[4:5]
Out[79]: 
   0       1       2        3       4      
5  0.5301  0.8182  0.05318  0.8247  0.01699

In [80]: df.ix[1.5:4].index
Out[80]: Index([1.5, 2, 3, 4], dtype=object)

In [81]: df.ix[4:5].index
Out[81]: Int64Index([5])
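Spelled out with plain lists (a stdlib sketch; the helpers are hypothetical), `4:5` has two readings on this index, and Out[79] matches the positional one. Later pandas versions avoid the guessing altogether by making the caller choose: `.loc` slices by label and `.iloc` by position.

```python
from bisect import bisect_left, bisect_right

labels = [1.5, 2, 3, 4, 5]   # the index from In [76]


def label_slice(labels, start, stop):
    # Inclusive label range on a sorted index.
    return labels[bisect_left(labels, start):bisect_right(labels, stop)]


def positional_slice(labels, start, stop):
    # Half-open range of positions, like list slicing.
    return labels[start:stop]


# In [78]: 1.5 is not an integer, so only the label reading makes sense.
print(label_slice(labels, 1.5, 4))     # [1.5, 2, 3, 4]

# In [79]: 4:5 has both readings, and they disagree.
print(label_slice(labels, 4, 5))       # [4, 5]
print(positional_slice(labels, 4, 5))  # [5]
```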

This is a bit surprising: depending on where in the index you slice, you get either integer-based or label-based slicing.

I think the index probably shouldn't change type when it is subsetted (i.e., if it is not an Int64Index, it should never become one when sliced).
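A minimal sketch of that proposal, assuming a toy index class (`SketchIndex` is hypothetical, not a pandas type) whose kind is decided once at construction and carried through slicing instead of being re-inferred from whatever labels survive:

```python
class SketchIndex:
    """Hypothetical index whose kind is fixed at construction.

    kind is "int" only when every label is an int; otherwise "object".
    Slicing returns a SketchIndex with the parent's kind instead of
    re-inferring it from the surviving labels.
    """

    def __init__(self, labels, kind=None):
        self.labels = list(labels)
        if kind is None:
            all_ints = all(type(label) is int for label in self.labels)
            kind = "int" if all_ints else "object"
        self.kind = kind

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Keep the parent's kind: a subset of an object index stays object.
            return SketchIndex(self.labels[key], kind=self.kind)
        return self.labels[key]

    def __repr__(self):
        return f"SketchIndex({self.labels}, kind={self.kind!r})"


x = SketchIndex([1.5, 2, 3, 4, 5])
print(x)       # SketchIndex([1.5, 2, 3, 4, 5], kind='object')
print(x[4:5])  # SketchIndex([5], kind='object')
```

Re-inferring instead, `SketchIndex([5])` would come out with `kind='int'`, which is the Int64Index flip seen in Out[81].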

Furthermore, from the docs: "Therefore, advanced indexing with .ix will always attempt label-based indexing, before falling back on integer-based indexing."

Judging by the last output, this doesn't seem to hold, so it may need fixing here.
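Read literally, the documented rule would look something like the sketch below: try the bounds as labels first and only fall back to positions when that fails. `ix_slice` is hypothetical, and for simplicity it requires both bounds to be existing labels, which is stricter than slicing a sorted index by arbitrary values. Under this rule `4:5` on the index from In [76] is label-based and returns the rows labelled 4 and 5, not just 5 as in Out[79].

```python
def ix_slice(labels, start, stop):
    """Hypothetical literal reading of the documented .ix rule.

    Try a label-based slice first; fall back to integer positions only
    when a bound is not an existing label and both bounds are ints.
    """
    if start in labels and stop in labels:
        # Label-based: inclusive of both endpoints.
        return labels[labels.index(start):labels.index(stop) + 1]
    if isinstance(start, int) and isinstance(stop, int):
        # Fallback: integer-based, half-open like list slicing.
        return labels[start:stop]
    raise KeyError((start, stop))


labels = [1.5, 2, 3, 4, 5]
print(ix_slice(labels, 1.5, 4))        # [1.5, 2, 3, 4]
print(ix_slice(labels, 4, 5))          # [4, 5]
print(ix_slice([10, 20, 30], 0, 2))    # [10, 20] (0 is not a label)
```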
