-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
DOC: Improve reindex examples and docstring #10996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lmk when you are finsihed |
how's this coming? |
Working to finish this right now. ETA in 2 hours (depending on how fast the coffee kicks in...) |
hi @jreback , just trying to flesh out some doc for the filing methods in reindex and running into this situation Pandas version 0.17.0 # create original dataframe
index2 = pd.date_range('1/1/2010', periods=5, freq='D')
sample_df2 = pd.DataFrame({"prices":[100, 101, np.nan, np.nan, 89]}, index=index2)
print sample_df2
>>>
prices
2010-01-01 100
2010-01-02 101
2010-01-03 NaN
2010-01-04 NaN
2010-01-05 89
#create new index, reindex and use backfill
index_new = pd.date_range('12/29/2009', periods=8, freq='D')
sample_df2.reindex(index_new, method='bfill')
>>>
prices
2009-12-29 100
2009-12-30 100
2009-12-31 100
2010-01-01 100
2010-01-02 101
2010-01-03 NaN
2010-01-04 NaN
2010-01-05 89 Question: Why isn't '89' being propagated backwards to fill the NaN in 2010-01-03 and 2010-01-04? |
see the issue here: #5669 I still don't buy that these should be different. Filling WHILE reindexing is IMHO just confusing. These should be separate operations. |
Filling while reindexing does not look at the data values when deciding which should be propagated. That's a nice property, for a few reasons:
|
@shoyer well, I disagree. Reindexing is completely different that setting the index, which is the point the other issue trying to make. Yes under certainly limited circumstances this makes sense, though I completely discount point (3) (it doesn't matter mucho IMHO). Much more important is obvious semantics. This differs from |
It certainly is a source of confusion, but it is following a sensible rule: the values reindexing looks at are only in the index. It's certainly worth clarifying when feasible in the docs. I agree that efficiency does not matter here for pandas when the data is already all in memory, but it does make a difference for downstream libraries that copy the pandas API for out-of-core data. Namely, it's useful in xray when we define operations on dask arrays and also for dask.dataframe -- we never need to look at the array values to determine what the reindexing result looks like. |
@shoyer exactly my point. Reindexing is hard enough for users. The rule is sensible, but not to the average user. This should NOT be advertised as how to fill thing at all. Rather |
0ab213c
to
ea8959d
Compare
since this is long (yeh!) do we need to paginate somehow? @jorisvandenbossche @shoyer |
|
||
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror'] | ||
>>> df = pd.DataFrame({ | ||
... 'http_status':[200,200,404,404,301], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pep8: add a space after :
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, thanks! Caught a few other ones like this as well.
eec6a10
to
9a511fc
Compare
the keyword ``fill_value``. Please note: due to the fact | ||
that the index is not monotonically increasing or | ||
decreasing, we cannot use arguments to the keyword | ||
``method`` to fill the ``NaN`` values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this note should rather go in the explanation of the argument itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche agreed. Will fix this now, but I think I should leave this note in the examples as well, in case someone is just browsing the examples without studying the keyword descriptions in the API itself (I know I'm sometimes guilty of doing this one).
@Winterflower can you update according to comments |
DOC: add more reindex examples DOC: fixing some PEP8 issues DOC: added shoyer suggestions DOC: Fixes to reindex based on comments
9a511fc
to
b6e9fe2
Compare
@jreback apologies for the delay! |
DOC: Improve reindex examples and docstring
thanks @Winterflower |
Fixes #10995
Done/waiting for feedback.