Reindex docs question / clarification #21429

bzier · 2018-06-11T18:50:32Z

From the bottom of the reindex docs here; relevant docs source here:

The index entries that did not have a value in the original data frame (for example, ‘2009-12-29’) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to backpropagate the last valid value to fill the NaN values, pass bfill as an argument to the method keyword.
>>> df2.reindex(date_index2, method='bfill')
           prices
2009-12-29     100
2009-12-30     100
2009-12-31     100
2010-01-01     100
2010-01-02     101
2010-01-03     NaN
2010-01-04     100
2010-01-05      89
2010-01-06      88
2010-01-07     NaN
Please note that the NaN value present in the original dataframe
(at index value 2010-01-03) will not be filled by any of the
value propagation schemes. This is because filling while reindexing
does not look at dataframe values, but only compares the original and
desired indexes. If you do want to fill in the NaN values present
in the original dataframe, use the fillna() method.
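The thread does not show how df2 and date_index2 were built. The sketch below assumes the values shown in the output above: six daily prices starting 2010-01-01, with a NaN on 2010-01-03, reindexed onto ten days starting 2009-12-29. It reproduces the bfill result and then shows one way to follow the note's advice. The original NaN is filled first, using bfill(), which is the method form of backward fillna. Only then does the reindex run.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example data (not shown in the thread).
date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)

# Target index: starts three days earlier and ends one day later than the original.
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

# method="bfill" only decides which original label each new label borrows from.
# It never inspects the values, so the NaN already stored at 2010-01-03 survives.
reindexed = df2.reindex(date_index2, method="bfill")

# To also fill NaNs that were already in the data, fill them first, then reindex.
# 2010-01-07 is still NaN because no original label comes after it.
filled_first = df2.bfill().reindex(date_index2, method="bfill")

print(reindexed)
print(filled_first)
```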

Problem description

Couldn't find any duplicates during search (but hard to say it isn't out there somewhere).

This is a question as much as anything. It may be my ignorance, or perhaps an oversight in the docs. The last value in the output shows 2010-01-07 NaN. It was not part of the original dataframe, so based on the note, it seems that it too should be auto-filled like the first 3 values were. I understand why 2010-01-03 NaN was not populated, but it doesn't seem right for the last value. Unless there is something I'm missing.

https://pandas-docs.github.io/pandas-docs-travis/
^^
FYI, this link from the issue template is giving a 404


gfyoung · 2018-06-11T21:02:15Z

FYI, this link from the issue template is giving a 404

@jreback @jorisvandenbossche : I thought we were still pushing builds of the docs on Travis?

TomAugspurger · 2018-06-11T21:10:27Z

I thought we were still pushing builds of the docs on Travis?

Failing with https://travis-ci.org/pandas-dev/pandas/jobs/390828331#L2234 till #21397 is merged.

TomAugspurger · 2018-06-11T21:12:24Z

w.r.t. the original issue, 2010-01-07 is not filled since it's beyond the last original valid value. bfill backfills the next valid value, and there isn't a valid value past 2010-01-07, so there's nothing to backfill. @bzier is there anything in the docstring that could better explain that? bfill and ffill are concepts most pandas users will see first via fillna: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html
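To make the index-only lookup concrete, here is a simplified pure-Python model of what a backfill reindex does. It is not pandas's actual implementation. The helpers backfill_positions and reindex_bfill are hypothetical, and None stands in for NaN. For each target label, the model takes the first original label that is greater than or equal to it. A target past the last original label (2010-01-07 here) has no such label and stays empty. An exact match on 2010-01-03 copies whatever is stored there, even a missing value.

```python
from bisect import bisect_left
from datetime import date, timedelta


def backfill_positions(original, target):
    """Hypothetical helper: for each target label, return the position of the
    first original label >= it, or None if no such label exists.
    Assumes `original` is sorted ascending, as reindex(method=...) requires."""
    positions = []
    for label in target:
        i = bisect_left(original, label)
        positions.append(i if i < len(original) else None)
    return positions


def reindex_bfill(original, values, target):
    """Hypothetical helper: copy values by position only; values are never inspected."""
    return [None if p is None else values[p] for p in backfill_positions(original, target)]


# Same shape as the docs example, with None playing the role of NaN.
original_index = [date(2010, 1, 1) + timedelta(days=k) for k in range(6)]
values = [100, 101, None, 100, 89, 88]
target_index = [date(2009, 12, 29) + timedelta(days=k) for k in range(10)]

result = reindex_bfill(original_index, values, target_index)
print(result)
```

Because the lookup only uses bisect_left on the labels, the model reproduces both NaNs in the docs output. The copied NaN comes from 2010-01-03, and the unmatched one is 2010-01-07.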

bzier · 2018-06-11T23:18:17Z

@TomAugspurger Thanks for the clarification, that makes sense.

I am brand new to pandas, so hadn't been exposed to bfill or ffill yet. I wound up on the reindex docs from the very bottom of this pandas intro notebook from the Google Machine Learning Crash Course. The rest of that intro made sense, but they piqued my interest with the point about string indexes, so I followed the link straight to that reindexing page.

The reindexing docs all made sense and the examples made things clear, up to that point. I think a couple of things threw me off. The first section introduces it as

we can fill in the missing values

and

to fill the NaN values

This doesn't indicate that any NaN values wouldn't be filled. The note underneath then goes on to explain why the one original value was not filled (2010-01-03), but says nothing about the last value at the end. It says

This is because filling while reindexing does not look at dataframe values, but only compares the original and desired indexes

which almost implies that (or at least I read it as) the original values will be left alone and all the new indexes will be filled.

I think simply adding those two sentences from your response would make it clear. For those who are familiar with the fill concepts, it will seem obvious, but I think it would provide clarity for those who aren't.

2010-01-07 is not filled since it's beyond the last original valid value. bfill backfills the next valid value, and there isn't a valid value past 2010-01-07, so there's nothing to backfill.

Alternatively, perhaps just referencing the fill strategies in the earlier statement would be sufficient. Along those lines, one more clarification... the docs say

If desired, we can fill in the missing values using one of several options.

Does that mean then that if we were to specify ffill as the method rather than bfill, the results would have left the first three values as NaN and populated the 2010-01-07 result with the previous valid value of 88?

Thanks again for the help.
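The ffill question can be checked directly. The sketch below reuses the same assumed reconstruction of df2 and date_index2, which is not shown in the thread. With method="ffill" (alias "pad"), each new label borrows from the nearest original label at or before it. The three dates before 2010-01-01 therefore have nothing to borrow from and stay NaN. 2010-01-07 picks up 88 from 2010-01-06, and the original NaN on 2010-01-03 is still untouched.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example data (not shown in the thread).
date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]}, index=date_index)
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")

# Forward fill: each new label takes the value at the closest original label
# at or before it; labels earlier than the first original label get NaN.
forward = df2.reindex(date_index2, method="ffill")
print(forward)
```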

gfyoung added Docs Usage Question labels Jun 11, 2018

mroeschke removed the Usage Question label Jun 20, 2021
