DEPR: deprecate .ix #14218

Closed
jreback opened this issue Sep 14, 2016 · 22 comments
Labels
Deprecate Functionality to remove in pandas Indexing Related to indexing on series/frames, not to indexes themselves
Milestone

Comments

@jreback
Contributor

jreback commented Sep 14, 2016

enough said.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Deprecate Functionality to remove in pandas labels Sep 14, 2016
@jreback jreback added this to the 0.20.0 milestone Sep 14, 2016
@frol

frol commented Sep 15, 2016

What is the suggested replacement for the deprecated .ix? Is it .loc?

For me .ix works 5-10% faster than .loc:

>>> df.shape
(10000, 211)

>>> df.index
CategoricalIndex(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',
                  ...
                  'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
                 categories=['A', 'B', 'C'], ordered=False, dtype='category', length=10000)

>>> df.loc[['C']].shape
(8000, 211)

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

BTW, passing a list into the indexer adds another 25-50% overhead:

>>> %timeit df.loc['C']
100 loops, best of 3: 5.61 ms per loop

>>> %timeit df.loc[['C']]
100 loops, best of 3: 9.97 ms per loop

>>> %timeit df.ix['C']
100 loops, best of 3: 5.37 ms per loop

>>> %timeit df.ix[['C']]
100 loops, best of 3: 7.57 ms per loop
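For anyone wanting to re-run a comparison like this, here is a minimal sketch. The frame is synthetic (its size, values and label proportions are assumptions, not the data timed above), and it covers only .loc because .ix no longer exists in current pandas releases. No timings are quoted since they vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the frame above: a non-unique CategoricalIndex
# with 80% of the rows labelled 'C'. Sizes are illustrative assumptions.
labels = ["A"] * 1000 + ["B"] * 1000 + ["C"] * 8000
df = pd.DataFrame(
    np.random.default_rng(0).random((len(labels), 5)),
    index=pd.CategoricalIndex(labels, categories=["A", "B", "C"]),
)

scalar_rows = df.loc["C"]    # scalar label
list_rows = df.loc[["C"]]    # one-element list of labels

# Both spellings select the same rows here; only the cost may differ.
print(scalar_rows.shape, list_rows.shape)

for name, stmt in [("scalar", lambda: df.loc["C"]), ("list", lambda: df.loc[["C"]])]:
    seconds = timeit.timeit(stmt, number=20) / 20
    print(f"{name}: {seconds * 1e3:.3f} ms per call")
```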

@jreback
Contributor Author

jreback commented Sep 15, 2016

Yes, .loc and .iloc are the expected replacements. Timings are expected to become faster eventually, though a sub-millisecond difference on a single access is pretty meaningless in any real use case.

@frol

frol commented Sep 15, 2016

@jreback Having terabytes of data and processing it with the help of Dask DataFrame, which uses pandas DataFrames as chunks, turns "milliseconds" into minutes...

@jreback
Contributor Author

jreback commented Sep 15, 2016

@frol It doesn't matter how much data you have. You are almost certainly using indexing operations inefficiently.

@wesm
Member

wesm commented Sep 15, 2016

@frol the indexing code paths are going to be rewritten in C/C++ as part of the pandas 2.0 effort, so the microperformance should improve by a factor of 10 or more. Some refactoring or Cythonization may be able to give some quick perf wins in .loc or .iloc

@Liam3851
Contributor

Question on .ix deprecation-- suppose you want to set the first row of a DataFrame in a particular column with a value (assume that the index is not an Int64Index). Then you can currently use:

df.ix[0, 'colname'] = 5

In the future can you safely do:

df.iloc[0].loc['colname'] = 5

(this seems to beg for SettingWithCopyWarning)? Or is the only proper option going to be
df.loc[df.index[0], 'colname'] = 5
?
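To make the concern about the chained form concrete, here is a minimal sketch on made-up data with mixed column dtypes. In that case df.iloc[0] has to build a new Series for the row, so the chained write lands on a temporary object. Frames with a single dtype have behaved differently across pandas versions, so treat this as an illustration of the risk rather than a general rule.

```python
import warnings

import pandas as pd

# Mixed dtypes and a non-integer index: illustrative data only.
df = pd.DataFrame(
    {"colname": [1, 2, 3], "other": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

with warnings.catch_warnings():
    # Depending on the pandas version, this line emits a
    # chained-assignment warning; it is silenced for the demo.
    warnings.simplefilter("ignore")
    # df.iloc[0] returns a new Series for the row, so the write
    # modifies that temporary object, not df.
    df.iloc[0].loc["colname"] = 5

chained_value = df.loc["a", "colname"]
print(chained_value)  # df still holds its original value
```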

@wesm
Member

wesm commented Sep 15, 2016

Our experience has been that mixing positional and label indexing has been a significant source of problems for users. Here you might want to do df['colname'][0]

@jreback
Contributor Author

jreback commented Sep 15, 2016

Unambiguously safe setting (the syntax may be nicer in 2.0):

df.iloc[0, df.columns.get_loc('colname')] = 5

or

df.loc[df.index[0], 'colname'] = 5
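Both forms can be tried on a small frame. The data and the second column name below are made up for illustration, and the sketch assumes unique column and index labels: with duplicate columns get_loc does not return a single integer, and with duplicate index labels the .loc form would write to every matching row.

```python
import pandas as pd

df = pd.DataFrame(
    {"colname": [1, 2, 3], "other": [10.0, 20.0, 30.0]},
    index=["a", "b", "c"],
)

# Purely positional: translate the column label to its integer position.
col_pos = df.columns.get_loc("colname")
df.iloc[0, col_pos] = 5

# Purely label-based: translate row position 0 to its index label.
df.loc[df.index[0], "other"] = 50.0

print(df)
```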

@Liam3851
Contributor

@jreback Thanks, makes sense.

@johne13

johne13 commented Dec 25, 2016

@jreback I think you have a typo with square brackets used instead of parens?

df.iloc[0, df.columns.get_loc['colname']] = 5

should be

df.iloc[0, df.columns.get_loc('colname')] = 5

@jreback
Contributor Author

jreback commented Dec 26, 2016

@johne13 yes that was a typo, thanks!

jreback added a commit to jreback/pandas that referenced this issue Jan 11, 2017
jreback added a commit to jreback/pandas that referenced this issue Jan 12, 2017
jreback added a commit to jreback/pandas that referenced this issue Jan 17, 2017
jreback added a commit to jreback/pandas that referenced this issue Jan 18, 2017
@DavidEscott

DavidEscott commented Jan 25, 2017

This looks like it will be really painful for me. Rather than removing ix entirely, could it be switched to a function with keyword only args?

  df.ix(row_idx=[0,2], col_name=["foo", "bar"])

Then I can take a dangerous df.ix[[0,2], ["foo", "bar"]] and, in a fairly straightforward fashion, convert it into an unambiguous index without having to repeat my index name or use df.columns.get_loc?

@jreback
Contributor Author

jreback commented Jan 25, 2017

@DavidEscott well you are only delaying the inevitable, so you have some choices

  • don't upgrade
  • ignore the DeprecationWarning (note that this will eventually turn into a FutureWarning and then be removed, but that is a ways down the road)
  • change your code.

No, converting .ix to a function is not possible; it's an indexer, e.g. ix[ ], which is syntactically different.

@wesm
Member

wesm commented Jan 25, 2017

@DavidEscott you're more than welcome to monkey-patch in your own function that does what you want. Since .ix has been a significant source of bugs and user problems, we no longer wish to support it
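A user-side function along the lines of the keyword-only proposal could look like the sketch below. The name select_pos_rows_label_cols and its parameters are hypothetical and not part of pandas; it is written as a plain function rather than a monkey-patch, handles selection only (not assignment), and assumes unique column labels.

```python
import pandas as pd


def select_pos_rows_label_cols(df, *, row_idx, col_name):
    """Hypothetical helper: select rows by position and columns by label.

    Not part of pandas; assumes the column labels are unique.
    """
    # get_indexer maps labels to integer positions, using -1 for misses.
    col_pos = df.columns.get_indexer(col_name)
    if (col_pos == -1).any():
        missing = [c for c, p in zip(col_name, col_pos) if p == -1]
        raise KeyError(f"columns not found: {missing}")
    return df.iloc[row_idx, col_pos]


df = pd.DataFrame(
    {"foo": [1, 2, 3], "bar": [4, 5, 6], "baz": [7, 8, 9]},
    index=["x", "y", "z"],
)
subset = select_pos_rows_label_cols(df, row_idx=[0, 2], col_name=["foo", "bar"])
print(subset)
```

Keeping the two kinds of key in separately named arguments is what removes the guesswork: the caller states which axis is positional and which is label-based instead of leaving it to inference.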

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
closes pandas-dev#14218
closes pandas-dev#15116

Author: Jeff Reback <[email protected]>

Closes pandas-dev#15113 from jreback/ix and squashes the following commits:

1544f50 [Jeff Reback] DEPR: deprecate .ix in favor of .loc/.iloc
ajcr pushed a commit to ajcr/100-pandas-puzzles that referenced this issue May 11, 2017
* Add non-ix alternative solution to Puzzle 8

Added an alternative solution, since ix is being deprecated - pandas-dev/pandas#14218

* Fix ix change formatting issues

* Remove chained index solution for Puzzle 8
@lrq3000

lrq3000 commented Jan 27, 2018

@wesm I understand that this is not an easy function to maintain, but still I find it unfortunate as it was a VERY expressive way to manipulate DataFrames... I hope someone will be able to make a code snippet to replace ix via monkey-patching?

@JonathanTay

I just found a use case that makes ix quite valuable to me. I have a DataFrame df such that df['mask'] is a boolean mask that I'd like to filter df on. With ix, I can do df.ix[df.mask, :n] to get the first n columns, filtered by mask. Now the best way seems to be df.loc[df.mask, :].iloc[:, :n], which just reads terribly. Using df.columns.get_loc as an indexing workaround feels very kludgy, whereas the ix solution made for elegant code.

Of course I can assign a temporary df2 = df.loc[df.mask] and work from there, but that's inelegant as well.

@Liam3851
Contributor

Liam3851 commented Jun 7, 2018

@JonathanTay To support the boolean indexing case with first-n-columns, in addition to
df.loc[df.mask, :].iloc[:, :n]

you can use the (perhaps prettier, although same length)
df.iloc[df.mask.values, :n]
or
df.loc[df.mask, df.columns[:n]]

Yes it's 7 more characters than
df.ix[df.mask, :n]

but generally not having to worry about subtle bugs from .ix inference is worth the typing.
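The three spellings can be checked against each other on a toy frame (the data and n = 2 are made up for illustration). One caveat in this sketch: the mask column is read with df["mask"], because attribute access (df.mask) resolves to the DataFrame.mask method rather than to a column of that name.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4],
        "b": [5, 6, 7, 8],
        "c": [9, 10, 11, 12],
        "mask": [True, False, True, False],
    }
)
n = 2
# Bracket access, since df.mask is a DataFrame method.
keep = df["mask"]

chained = df.loc[keep, :].iloc[:, :n]      # label filter, then positional slice
positional = df.iloc[keep.values, :n]      # boolean array + positional slice
by_label = df.loc[keep, df.columns[:n]]    # boolean Series + column labels

print(chained)
```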

@ManuelLevi

ManuelLevi commented Jul 17, 2018

Can .ix be replaced by a .loc chained with an .iloc, or by a plain .loc or .iloc?

If so, why not have a wrapper around this and keep backward compatibility, and a useful method?

@Liam3851
Contributor

@ManuelLevi The issue is, each call can be replaced with .iloc, .loc, or a combination, but there's no good way for .ix to tell which to use.

E.g. suppose you have a DataFrame with the Index([0, 2, 4, 6, 8]) and call .ix[:4] on it. Did you want .ix to implicitly use .iloc (returning the first 4 elements) or .loc (returning the first 3 elements)?
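The two readings of that slice can be compared directly. This sketch uses a Series with made-up values for brevity; the index is the one from the example above.

```python
import pandas as pd

s = pd.Series(["v0", "v1", "v2", "v3", "v4"], index=pd.Index([0, 2, 4, 6, 8]))

by_position = s.iloc[:4]   # positions 0..3, end position excluded
by_label = s.loc[:4]       # every label up to and including 4

print(list(by_position.index))  # [0, 2, 4, 6]
print(list(by_label.index))     # [0, 2, 4]
```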

@ManuelLevi

@Liam3851 I see what you mean.

I usually use .iloc and .loc combined, but the impact of this goes well beyond me. I believe it affects the whole pandas community.

A quick search for df.ix on GitHub shows almost 4M results. Maybe half a million notebooks and almost 200k Python files will break after this. Many of these are open-source tutorials and libraries that people count on.

Could there be a simple way to change the function behaviour instead of removing it? Maybe assume integers to always be locations, and other types to always be a label?

@miguelcdpmarques

This is such a great feature; it would be a shame to lose it...
Please consider some of the suggestions above as a way to ease maintenance.

@JonathanTay

@ManuelLevi As I understand it, ix treats anything that could be a label as a label. This was a source of bugs. For example, if a Series s is indexed by integers [5,3,2,4], then should s.ix[0] return the 0th element or raise KeyError? What if s.index = ['a','b','c'] or [0,1,2,3]? @Liam3851 has a point that the bugs and unexpected behaviour just keep coming once you allow the ambiguity. For example, label-based slicing (loc) includes both endpoints, while position-based slicing (iloc) includes the start but not the end.
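With the explicit indexers, each of those questions has one answer. A small sketch on made-up values:

```python
import pandas as pd

s = pd.Series([50, 30, 20, 40], index=[5, 3, 2, 4])

first_by_position = s.iloc[0]      # position 0 holds the value 50

try:
    s.loc[0]                       # there is no label 0 in the index
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

t = pd.Series([1, 2, 3], index=["a", "b", "c"])
label_slice = t.loc["a":"b"]       # both endpoints included: 'a' and 'b'
position_slice = t.iloc[0:1]       # end position excluded: only 'a'

print(first_by_position, label_lookup_failed)
print(list(label_slice.index), list(position_slice.index))
```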

10 participants