Skip to content

ENH: Allow where/mask/Indexers to accept callable #12539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Mar 6, 2016

Current impl is very simple like assign. Another idea is applying some restriction to return value from callable (for example, only allow aligned DataFrame or scalar).

@sinhrks sinhrks added Indexing Related to indexing on series/frames, not to indexes themselves API Design labels Mar 6, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Mar 6, 2016
@kawochen
Copy link
Contributor

kawochen commented Mar 6, 2016

One potential problem is when you subclass DataFrame and make it callable.

@sinhrks
Copy link
Member Author

sinhrks commented Mar 6, 2016

Do you know such a package? The logic is not specific here and I don't think we have to care ATM.

@kawochen
Copy link
Contributor

kawochen commented Mar 6, 2016

I don't know anyone that uses it that way. But since the order of consideration is relevant, I think it should be documented that the args are first tested for callability.

@jreback jreback mentioned this pull request Mar 6, 2016
2 tasks
@jreback
Copy link
Contributor

jreback commented Mar 6, 2016

let's simultaneously do all of the chained reshaping ops #11485 in 0.18.1, making them consistent.

@sinhrks sinhrks force-pushed the where branch 4 times, most recently from 9b39b6f to f2430c4 Compare March 9, 2016 08:20
@sinhrks sinhrks changed the title ENH: Allow .where to accept callable as condition ENH: Allow where/mask/query/Indexers to accept callable Mar 9, 2016
@sinhrks
Copy link
Member Author

sinhrks commented Mar 9, 2016

Sure, once done and changed the title.


.. ipython:: python

# callable returns bool indexer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to have some tests if people try to pass something like:

df.loc[lambda x: ...., list_of_columns] which is a bit odd, but I suppose ok, otherwise we need to interpret this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, added some tests for mixture of lamda and scalar/list.

Method chaininng improvements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Following methods / ``Indexer`` now accept ``callable`` as input. It is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following methods / indexers now acccept a callable

@codecov-io
Copy link

codecov-io commented Mar 16, 2016

Current coverage is 83.91%

Merging #12539 into master will increase coverage by +<.01%

@@             master     #12539   diff @@
==========================================
  Files           136        136          
  Lines         49908      49933    +25   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          41876      41901    +25   
  Misses         8032       8032          
  Partials          0          0          

Powered by Codecov. Last updated by 3921b32

- ``[]`` indexing

When a ``callable`` is passed, it is evaluated on the caller
``Series/DataFrame`` prior to any other logics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?

@jreback
Copy link
Contributor

jreback commented Apr 25, 2016

ok, how about for .query() we than raise if the input is NOT a string with a nice mesage (to use .loc)? or could show a warning and just do it anyhow.

@jorisvandenbossche
Copy link
Member

You already get an error when passing a callable to query, but I agree a nicer message can be raised.

@shoyer
Copy link
Member

shoyer commented Apr 26, 2016

I would definitely raise if the input to .query() is not a string. Adding warnings and then doing something anyways is a bad idea.

@jreback
Copy link
Contributor

jreback commented Apr 26, 2016

ok @sinhrks if you can make those mods would be great. (you might need to adjust docs/example as I think you use .query, but can just change to .loc)

@sinhrks sinhrks force-pushed the where branch 5 times, most recently from 7ad04b7 to ebe5aaf Compare April 27, 2016 21:29
@sinhrks sinhrks changed the title ENH: Allow where/mask/query/Indexers to accept callable ENH: Allow where/mask/Indexers to accept callable Apr 27, 2016
@sinhrks
Copy link
Member Author

sinhrks commented Apr 28, 2016

Thanks for all the feedbacks. Updated. Lmk if I missed something.

@@ -5489,6 +5489,269 @@ def test_none_coercion_mixed_dtypes(self):
strict_nan=True)


class TestIndexingCallable(tm.TestCase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a separate file: indexing/test_callables.py

@jreback
Copy link
Contributor

jreback commented Apr 28, 2016

just minor change otherwise lgtm.

@jorisvandenbossche @shoyer

df = pd.DataFrame({'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]})
df.where(lambda x: x > 4, lambda x: x + 10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this example somewhere in the where docs as well (http://pandas-docs.github.io/pandas-docs-travis/indexing.html#the-where-method-and-masking)

@jorisvandenbossche
Copy link
Member

@sinhrks Can you also update the docstrings for loc/iloc/ix (eg for loc https://github.com/pydata/pandas/blob/65ed3afd092908130905e7d86ff1b6b9c2002d00/pandas/core/indexing.py#L1314), these are used in the API docs.

For the rest, +1!

@@ -2454,8 +2459,9 @@ def assign(self, **kwargs):
kwargs : keyword, value pairs
keywords are the column names. If the values are
callable, they are computed on the DataFrame and
assigned to the new columns. If the values are
not callable, (e.g. a Series, scalar, or array),
assigned to the new columns. The callable must not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a versionadded tag here (and other doc-strings)

Copy link
Member Author

@sinhrks sinhrks Apr 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assign impl is unchanged.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

@sinhrks some very minor doc-changes. ping when pushed.

@jreback
Copy link
Contributor

jreback commented Apr 29, 2016

thanks @sinhrks !

awesome enhancement!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow .where to accept callable as condition API: chained reshaping ops
7 participants