Fixed #12661: more clarification in the where statement #12671

Closed
wants to merge 2 commits into from

prabhjotsumman
Contributor

Improves the documentation with respect to issue #12661.

@sinhrks sinhrks added the Docs label Mar 20, 2016
@sinhrks
Member

sinhrks commented Mar 20, 2016

Personally, I feel an explanation of .where() is inappropriate here, because the section focuses on df[df.A > 0] and df[df > 0].

Maybe the following sentence should be fixed, or please add an explanation if you add a .where() example.

A where operation for getting.

@prabhjotsumman
Contributor Author

OK sir, I will add a brief explanation of where.

@jorisvandenbossche
Member

I agree with @sinhrks, I don't think the explanation of where is needed here. It is a section about boolean indexing (which does a 'where' operation), not about the where method. And I don't think it is necessary to mention the where method in a 10min intro (there is already a section on where in the docs: http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking).

And sorry that this was not really clear from the issue #12661.

I would just remove the `` around 'where' to make it clear that it is not a method.
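As a rough illustration of the point that boolean indexing "does a 'where' operation", the sketch below compares indexing with a DataFrame-shaped condition against an explicit call to the where method. The seed and the variable names `masked_by_indexing`, `masked_by_where`, and `same` are arbitrary choices made for this example, not part of the pandas docs under discussion.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(0)
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(rng.standard_normal((6, 4)), index=dates, columns=list("ABCD"))

# Boolean indexing with a DataFrame-shaped condition keeps the frame's shape
# and puts NaN wherever the condition is False.
masked_by_indexing = df[df > 0]

# The explicit where method with the same condition.
masked_by_where = df.where(df > 0)

# DataFrame.equals treats NaNs in matching positions as equal.
same = masked_by_indexing.equals(masked_by_where)
print(same)
```

Under these assumptions the two results should match, which is why the phrase in the 10min section can describe an operation rather than name the method.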

@@ -282,7 +282,14 @@ Using a single column's values to select data.

df[df.A > 0]

A ``where`` operation for getting.
A ``where`` is an attribute of the DataFrame class which helps in getting the results
based upon the conditional statement that was passed as an argument.
Contributor

So these are not the same at all. I would expand this section and say this is implemented by where internally. Then show an example (and explain the use of where)

In [3]: dates = pd.date_range('20130101', periods=6)

In [4]: dates
Out[4]: DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06'], dtype='datetime64[ns]', freq='D')

In [5]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))

In [6]: df
Out[6]: 
                   A         B         C         D
2013-01-01 -1.508166 -0.854516  0.148661  1.348457
2013-01-02 -0.890669 -0.329699  0.991305  0.087812
2013-01-03  1.169071 -1.126267 -0.609362 -0.496550
2013-01-04  1.402877 -1.093240  0.038879 -0.042461
2013-01-05 -2.529996  0.570596 -0.556111 -1.365104
2013-01-06  0.036625 -0.241288  0.154433 -1.564450

In [7]: df[df.A>0]
Out[7]: 
                   A         B         C         D
2013-01-03  1.169071 -1.126267 -0.609362 -0.496550
2013-01-04  1.402877 -1.093240  0.038879 -0.042461
2013-01-06  0.036625 -0.241288  0.154433 -1.564450

In [8]: df.where(df>0)
Out[8]: 
                   A         B         C         D
2013-01-01       NaN       NaN  0.148661  1.348457
2013-01-02       NaN       NaN  0.991305  0.087812
2013-01-03  1.169071       NaN       NaN       NaN
2013-01-04  1.402877       NaN  0.038879       NaN
2013-01-05       NaN  0.570596       NaN       NaN
2013-01-06  0.036625       NaN  0.154433       NaN
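The session above relies on random data and on imports made in earlier prompts. A minimal self-contained sketch of the same contrast, using a small fixed frame invented for this example, might look like the following; the names `rows`, `masked`, and `nan_count` are illustrative only.

```python
import pandas as pd

# Small fixed frame (made up for this example) so the result is deterministic.
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 6.0]})

# A boolean Series selects whole rows: only rows where A > 0 are returned.
rows = df[df["A"] > 0]

# A boolean DataFrame passed to where keeps every row and column,
# replacing the entries that fail the condition with NaN.
masked = df.where(df > 0)

# Count how many entries were masked out.
nan_count = int(masked.isna().sum().sum())

print(rows.shape)    # rows dropped, columns kept
print(masked.shape)  # same shape as df
print(nan_count)
```

Here the row selection returns fewer rows than the original frame, while the where call returns a frame of the original shape with the negative entries blanked out.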

Member

"this is implemented by where internally" @jreback I don't think this is needed in the 10min section, which we should keep as a simple intro. But it can maybe be added in the boolean indexing section in the indexing docs

Contributor

ok that sounds fine. this is too much for 10min.

@prabhjotsumman
Contributor Author

I would just remove the `` around 'where' to make it clear that it is not a method.

So I will just remove the `` around where, since the method is already explained in detail in the indexing section:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#the-where-method-and-masking
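For readers who do not follow the link, here is a hedged sketch of what that section covers: the where method with and without a replacement value, and its inverse, mask. The Series values and the names `kept`, `filled`, and `inverse` are made up for this example.

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])

# where keeps values that satisfy the condition; the rest become NaN.
kept = s.where(s > 0)

# A second argument supplies the replacement value instead of NaN.
filled = s.where(s > 0, 0)

# mask is the inverse: values that satisfy the condition are replaced.
inverse = s.mask(s > 0)

print(kept.isna().tolist())
print(filled.tolist())
print(inverse.isna().tolist())
```

This level of detail fits the indexing docs better than a short introduction, which is the reason given in the review comments for leaving it out of the 10min section.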

@prabhjotsumman
Contributor Author

So, is there any further work to be done?

@jreback
Contributor

jreback commented May 13, 2016

Closing, but please reopen if updated.

Successfully merging this pull request may close these issues.

DOC: 10 minutes - 'where' not used
4 participants