Fixed #12661: more clarification in the where statement #12671
Improvement in the documentation w.r.t. issue pandas-dev#12661
Personally, I feel the explanation of … Maybe the following sentence should be fixed, or please add an explanation if you add …
OK sir, I will add a brief explanation of …
I agree with @sinhrks; I don't think the explanation of … And sorry that this was not really clear from issue #12661. I would just remove the `` around 'where' to make it clear that it is not a method.
@@ -282,7 +282,14 @@ Using a single column's values to select data.

    df[df.A > 0]

-A ``where`` operation for getting.
+A ``where`` is an attribute of the DataFrame class which helps in getting the results
+based upon the conditional statement that was passed as an argument.
So these are not the same at all. I would expand this section, say that this is implemented by where internally, and then show an example (and explain the use of where):
In [3]: dates = pd.date_range('20130101', periods=6)
In [4]: dates
Out[4]: DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', '2013-01-05', '2013-01-06'], dtype='datetime64[ns]', freq='D')
In [5]: df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))
In [6]: df
Out[6]:
A B C D
2013-01-01 -1.508166 -0.854516 0.148661 1.348457
2013-01-02 -0.890669 -0.329699 0.991305 0.087812
2013-01-03 1.169071 -1.126267 -0.609362 -0.496550
2013-01-04 1.402877 -1.093240 0.038879 -0.042461
2013-01-05 -2.529996 0.570596 -0.556111 -1.365104
2013-01-06 0.036625 -0.241288 0.154433 -1.564450
In [7]: df[df.A>0]
Out[7]:
A B C D
2013-01-03 1.169071 -1.126267 -0.609362 -0.496550
2013-01-04 1.402877 -1.093240 0.038879 -0.042461
2013-01-06 0.036625 -0.241288 0.154433 -1.564450
In [8]: df.where(df>0)
Out[8]:
A B C D
2013-01-01 NaN NaN 0.148661 1.348457
2013-01-02 NaN NaN 0.991305 0.087812
2013-01-03 1.169071 NaN NaN NaN
2013-01-04 1.402877 NaN 0.038879 NaN
2013-01-05 NaN 0.570596 NaN NaN
2013-01-06 0.036625 NaN 0.154433 NaN
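The difference shown in the session can be reproduced deterministically. This is a minimal sketch, assuming a reasonably recent pandas; the frame's values are made up for illustration and are not the random data above. Selecting with a single column's values (`df[df.A > 0]`) drops the rows that fail the condition, while `df.where(df > 0)` evaluates the condition element-wise, keeps the original shape, and puts NaN in every cell that fails.

```python
import pandas as pd

# Hypothetical, deterministic data (not the random frame from the session above)
df = pd.DataFrame(
    {"A": [-1.5, 1.2, 0.0, 2.0], "B": [0.3, -0.7, 1.1, -2.0]},
    index=pd.date_range("20130101", periods=4),
)

# Row selection: a boolean Series built from column A keeps only the rows where A > 0,
# so the result can have fewer rows than df.
rows = df[df.A > 0]

# where(): the condition is applied cell by cell; the shape is unchanged and
# cells where the condition is False become NaN.
masked = df.where(df > 0)

print(rows.shape)    # (2, 2)
print(masked.shape)  # (4, 2)
print(masked)
```

In short, the first form answers "which rows?", the second "which values?", which is why the two sentences in the diff cannot describe the same operation.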
"this is implemented by where internally": @jreback, I don't think this is needed in the 10min section, which we should keep as a simple intro. But it could perhaps be added to the boolean indexing section of the indexing docs.
OK, that sounds fine; this is too much for 10min.
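For the boolean indexing section suggested above, a short illustration could look like the following. This is a minimal sketch with hypothetical data, assuming a reasonably recent pandas: indexing with a boolean DataFrame, as in `df[df > 0]`, masks element-wise and returns the same result as `df.where(df > 0)`, rather than selecting rows. `where` also accepts a second argument, `other`, which is used in place of the cells that fail the condition.

```python
import pandas as pd

# Hypothetical 3x2 frame used only for illustration
df = pd.DataFrame({"A": [-1.0, 2.0, 3.0], "B": [4.0, -5.0, 0.5]})

cond = df > 0  # boolean DataFrame with the same shape as df

# Indexing with a boolean DataFrame masks cell by cell, like where()
via_getitem = df[cond]
via_where = df.where(cond)
print(via_getitem.equals(via_where))  # True

# where() can also substitute a value for the cells that fail the condition
filled = df.where(cond, 0.0)
print(filled)
```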
So, I just remove the `` from where.
So, is there any further work to be done?
Closing, but if updated, please reopen.
git diff upstream/master | flake8 --diff