DOC: Add %in% operator into compare w r (GH3850) #5875

Merged
merged 4 commits into from
Jan 15, 2014
41 changes: 41 additions & 0 deletions doc/source/comparison_with_r.rst
@@ -66,6 +66,44 @@ function.
For more details and examples see :ref:`the groupby documentation
<groupby.split>`.

|match|_
~~~~~~~~~~~~

A common way to select data in R is using ``%in%`` which is defined using the
function ``match``. The operator ``%in%`` is used to return a logical vector
indicating if there is a match or not:

.. code-block:: r

   s <- 0:4
   s %in% c(2,4)

The :meth:`~pandas.DataFrame.isin` method is similar to R's ``%in%`` operator:

.. ipython:: python

   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
Member

Why the np.arange(5)[::-1]? Doesn't that just make it more complicated for the reader to understand?

Contributor Author

Yeah it would, I was thinking of something...strange.

On 8 January 2014 22:01, Joris Van den Bossche wrote:

> Why the np.arange(5)[::-1]? Doesn't that just make it more complicated for
> the reader to understand?

Chapman

   s.isin([2, 4])
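
As a minimal sketch of the ``isin`` comparison above, the example below uses a plain integer Series instead of the reversed-index one in the diff (the names ``mask``, ``selected`` and ``s_rev`` are illustrative, not part of the PR). It also suggests why the reversed index adds nothing to the example: ``isin`` tests values, so the index labels do not change the resulting mask.

```python
import pandas as pd

# A simple Series with the default 0..4 index.
s = pd.Series([0, 1, 2, 3, 4])

# Boolean mask, analogous to `s %in% c(2, 4)` in R.
mask = s.isin([2, 4])

# The mask can be used directly to select the matching rows.
selected = s[mask]

# Same values with a reversed index: the mask values are unchanged,
# because isin compares values, not index labels.
s_rev = pd.Series([0, 1, 2, 3, 4], index=[4, 3, 2, 1, 0])
mask_rev = s_rev.isin([2, 4])

print(mask.tolist())
print(selected.tolist())
print(mask_rev.tolist())
```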

The ``match`` function returns a vector of the positions of matches
of its first argument in its second:

.. code-block:: r

   s <- 0:4
   match(s, c(2,4))

The :meth:`~pandas.Series.apply` method can be used to replicate
this (note that the positions are 0-based, whereas R's ``match`` returns
1-based positions):

.. ipython:: python

   s = pd.Series(np.arange(5),index=np.arange(5)[::-1],dtype=np.float32)
   s.apply(lambda x: [2, 4].index(x) if x in [2,4] else np.nan)
Member

There is actually a pd.match(s, [2, 4]) function which does exactly this. Although, I don't know to which extent it should be advertised, as it is nowhere in the docs and is maybe also not in the best shape? @y-p @jreback ?

You're right. pd.match() ignores its na_sentinel argument, but otherwise it's a 1:1 match.

The _hashtable_algo function it uses seems to be missing at least the int32 case, btw.

Contributor

hmm...looks kind of like Index.get_indexer (and the non_unique version). Never even knew this existed. Maybe should open an issue to see use case / doc or deprecate? I don't see it being used anywhere internally

Contributor

@chappers

replace this with

Series(pd.match(s,[2,4],np.nan)) (which now works), see #5943
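
The ``Index.get_indexer`` route mentioned in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the PR: it uses a plain integer Series, and the names ``values``, ``positions`` and ``r_like`` are made up for the example. ``get_indexer`` returns ``-1`` for non-matches and 0-based positions otherwise, so the last step shifts to 1-based positions with ``NaN`` for misses to resemble R's ``match``.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

# The lookup table, analogous to the second argument of R's match().
values = pd.Index([2, 4])

# 0-based position of each element of s within values; -1 where absent.
positions = values.get_indexer(s)

# Convert to an R-like result: NaN for no match, 1-based positions otherwise.
pos = pd.Series(positions, index=s.index)
r_like = pos.where(pos >= 0) + 1

print(positions.tolist())
print(r_like.tolist())
```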


For more details and examples see :ref:`the indexing documentation
<indexing.basics.indexing_isin>`.

|tapply|_
~~~~~~~~~

@@ -372,6 +410,9 @@ For more details and examples see :ref:`the reshaping documentation
.. |aggregate| replace:: ``aggregate``
.. _aggregate: http://finzi.psych.upenn.edu/R/library/stats/html/aggregate.html

.. |match| replace:: ``match`` / ``%in%``
.. _match: http://finzi.psych.upenn.edu/R/library/base/html/match.html

.. |tapply| replace:: ``tapply``
.. _tapply: http://finzi.psych.upenn.edu/R/library/base/html/tapply.html
