DEPR: deprecate .ix #14218
Comments
What is the suggested replacement for the deprecated .ix? For me
>>> df.shape
(10000, 211)
>>> df.index
CategoricalIndex(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
...
'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
categories=['A', 'B', 'C'], ordered=False, dtype='category', length=10000)
>>> df.loc[['C']].shape
(8000, 211)
>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop
>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop
BTW, passing a list into the indexer adds another 25-50% overhead:
>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop
>>> %timeit df.loc[['C']]
100 loops, best of 3: 9.97 ms per loop
>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop
>>> %timeit df.ix[['C']]
100 loops, best of 3: 7.57 ms per loop |
yes |
@jreback Having terabytes of data and processing it with the help of Dask DataFrame, which uses Pandas DataFrames as chunks, turns "milliseconds" into minutes... |
@frol doesn't matter how much data you have. you are almost certainly using indexing operations inefficiently. |
@frol the indexing code paths are going to be rewritten in C/C++ as part of the pandas 2.0 effort, so the microperformance should improve by a factor of 10 or more. Some refactoring or Cythonization may be able to give some quick perf wins in .loc or .iloc |
Question on .ix deprecation-- suppose you want to set the first row of a DataFrame in a particular column with a value (assume that the index is not an Int64Index). Then you can currently use:
In the future can you safely do:
(this seems to beg for SettingWithCopyWarning)? Or is the only proper option going to be |
Our experience has been that mixing positional and label indexing has been a significant source of problems for users. Here you might want to do |
unambiguously safe setting (may be syntactically nicer in 2.0)
or
|
@jreback Thanks, makes sense. |
@jreback I think you have a typo with square brackets used instead of parens?
should be
|
@johne13 yes that was a typo, thanks! |
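For reference, a minimal sketch of setting the first row of one column without .ix, using only positions or only labels. The frame and the column names "x" and "y" are made up for illustration. Note that `get_loc` is a method, so it is called with parentheses.

```python
import pandas as pd

# Hypothetical frame with a non-integer index; "x" and "y" are made-up names.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["a", "b", "c"])

# Purely positional: translate the column label to its integer position first.
df.iloc[0, df.columns.get_loc("y")] = -1

# Purely label-based: translate row position 0 to its index label first.
df.loc[df.index[0], "x"] = -2

print(df)
```

Both assignments go through a single indexer call on `df` itself, so there is no chained indexing involved.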
This looks like it will be really painful for me. Rather than removing ix entirely, could it be switched to a function with keyword-only args?
Then I can take a dangerous |
@DavidEscott well you are only delaying the inevitable, so you have some choices
no, converting |
@DavidEscott you're more than welcome to monkey-patch in your own function that does what you want. Since |
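A rough sketch of what such a user-side helper with keyword-only arguments could look like. `select` and its parameter names are hypothetical, not part of pandas, and it only covers reading (not assignment). It assumes list or slice selectors, so every step returns a DataFrame.

```python
import pandas as pd

def select(df, *, row_labels=None, row_positions=None,
           col_labels=None, col_positions=None):
    """Hypothetical helper (not a pandas API): pick rows and columns, each
    either by label or by position, with the choice spelled out by keyword."""
    if row_labels is not None and row_positions is not None:
        raise ValueError("pass row_labels or row_positions, not both")
    if col_labels is not None and col_positions is not None:
        raise ValueError("pass col_labels or col_positions, not both")

    out = df
    if row_labels is not None:
        out = out.loc[row_labels]           # label-based rows
    elif row_positions is not None:
        out = out.iloc[row_positions]       # position-based rows
    if col_labels is not None:
        out = out.loc[:, col_labels]        # label-based columns
    elif col_positions is not None:
        out = out.iloc[:, col_positions]    # position-based columns
    return out

# Made-up example data.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]}, index=["a", "b", "c"])
result = select(df, row_positions=slice(0, 2), col_labels=["y"])
print(result)
```

Because the caller has to name the interpretation, an integer can never be silently treated as a label in one call and as a position in another.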
closes pandas-dev#14218
closes pandas-dev#15116
Author: Jeff Reback
Closes pandas-dev#15113 from jreback/ix and squashes the following commits:
1544f50 [Jeff Reback] DEPR: deprecate .ix in favor of .loc/.iloc
* Add non-ix alternative solution to Puzzle 8
Added an alternative solution, since ix is being deprecated - pandas-dev/pandas#14218
* Fix ix change formatting issues
* Remove chained index solution for Puzzle 8
@wesm I understand that this is not an easy function to maintain, but still I find it unfortunate as it was a VERY expressive way to manipulate DataFrames... I hope someone will be able to make a code snippet to replace ix via monkey-patching? |
I just found a use case that makes Of course I can assign a temporary |
@JonathanTay To support the boolean indexing case with first-n-columns, in addition to you can use the (perhaps prettier, although same length) Yes it's 7 more characters than but generally not having to worry about subtle bugs from |
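One way to combine a boolean row mask with "first n columns" without .ix, as a sketch on made-up data. The names `mask` and `n` are illustrative. The position-based variant passes the mask as a plain array, on the assumption that `.iloc` wants a raw boolean array rather than an index-aligned Series.

```python
import pandas as pd

# Made-up frame; column and index names are for illustration only.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]},
    index=["w", "x", "y", "z"],
)
mask = df["a"] > 2   # boolean Series aligned on df's index
n = 2

# Label-based: turn the first n column positions into column labels.
by_label = df.loc[mask, df.columns[:n]]

# Position-based: hand iloc the underlying boolean array.
by_position = df.iloc[mask.to_numpy(), :n]

print(by_label)
print(by_label.equals(by_position))
```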
Can If so, why not have a wrapper around this and keep backward compatibility, and a useful method? |
@ManuelLevi The issue is, each call can be replaced with .iloc, .loc, or a combination, but there's no good way for E.g. if you provide a DataFrame with the Index([0, 2, 4, 6, 8]), and call .ix[:4] on it. Did you want .ix to implicitly use .iloc (returning the first 4 elements) or .loc (returning the first 3 elements)? |
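The ambiguity in that example can be reproduced with the explicit indexers. A small sketch; the column name "v" and its values are made up:

```python
import pandas as pd

# Integer index whose labels differ from the row positions.
df = pd.DataFrame({"v": [10, 20, 30, 40, 50]}, index=pd.Index([0, 2, 4, 6, 8]))

by_position = df.iloc[:4]   # positions 0..3, stop position excluded
by_label = df.loc[:4]       # labels up to and including 4

print(by_position.index.tolist())  # [0, 2, 4, 6]
print(by_label.index.tolist())     # [0, 2, 4]
```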
@Liam3851 I see what you mean. I usually use A quick search for Could there be a simple way to change the function behaviour instead of removing it? Maybe assume integers to always be locations, and other types to always be a label? |
This is such a great feature, would be a shame to get it lost... |
@ManuelLevi As I understand it, ix treats anything that could be a label as a label. This was a source of bugs. For example, if a Series s is indexed by integers [5,3,2,4], should s.ix[0] return the 0th element or raise KeyError? What if s.index = ['a','b','c'] or [0,1,2,3]? @Liam3851 has a point that the bugs and unexpected behaviour just keep coming once you allow the ambiguity. For example, label-based slicing (loc) includes both end points, while position-based slicing (iloc) includes the start but not the end. |
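With the explicit indexers, each of those questions has one answer. A short sketch; the Series values are made up:

```python
import pandas as pd

s = pd.Series(["p", "q", "r", "t"], index=[5, 3, 2, 4])

first = s.iloc[0]   # position 0, whatever the labels are

try:
    s.loc[0]        # label lookup: there is no label 0 in this index
    label_lookup = "found"
except KeyError:
    label_lookup = "KeyError"

# Slice end points: loc includes the stop label, iloc excludes the stop position.
t = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
loc_slice = t.loc["a":"c"].tolist()
iloc_slice = t.iloc[0:2].tolist()

print(first, label_lookup, loc_slice, iloc_slice)
```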
enough said.