
Commit b6e9fe2

DOC: Improve reindex examples
DOC: add more reindex examples
DOC: fixing some PEP8 issues
DOC: added shoyer suggestions
DOC: Fixes to reindex based on comments
1 parent ae3b228 commit b6e9fe2


1 file changed

+115
-2
lines changed

pandas/core/generic.py

+115-2
Original file line number | Diff line number | Diff line change
@@ -1775,7 +1775,9 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
17751775
New labels / index to conform to. Preferably an Index object to
17761776
avoid duplicating data
17771777
method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
1778-
Method to use for filling holes in reindexed DataFrame:
1778+
Method to use for filling holes in reindexed DataFrame.
1779+
Please note: this is only applicable to DataFrames/Series with a
1780+
monotonically increasing/decreasing index.
17791781
* default: don't fill gaps
17801782
* pad / ffill: propagate last valid observation forward to next valid
17811783
* backfill / bfill: use next valid observation to fill gap
@@ -1799,7 +1801,118 @@ def sort_index(self, axis=0, level=None, ascending=True, inplace=False,
17991801
18001802
Examples
18011803
--------
1802-
>>> df.reindex(index=[date1, date2, date3], columns=['A', 'B', 'C'])
1804+
1805+
Create a dataframe with some fictional data.
1806+
1807+
>>> index = ['Firefox', 'Chrome', 'Safari', 'IE10', 'Konqueror']
1808+
>>> df = pd.DataFrame({
1809+
... 'http_status': [200, 200, 404, 404, 301],
1810+
... 'response_time': [0.04, 0.02, 0.07, 0.08, 1.0]},
1811+
... index=index)
1812+
>>> df
1813+
http_status response_time
1814+
Firefox 200 0.04
1815+
Chrome 200 0.02
1816+
Safari 404 0.07
1817+
IE10 404 0.08
1818+
Konqueror 301 1.00
1819+
1820+
Create a new index and reindex the dataframe. By default
1821+
values in the new index that do not have corresponding
1822+
records in the dataframe are assigned ``NaN``.
1823+
1824+
>>> new_index = ['Safari', 'Iceweasel', 'Comodo Dragon', 'IE10',
1825+
... 'Chrome']
1826+
>>> df.reindex(new_index)
1827+
http_status response_time
1828+
Safari 404 0.07
1829+
Iceweasel NaN NaN
1830+
Comodo Dragon NaN NaN
1831+
IE10 404 0.08
1832+
Chrome 200 0.02
1833+
1834+
We can fill in the missing values by passing a value to
1835+
the keyword ``fill_value``. Because the index is not monotonically
1836+
increasing or decreasing, we cannot use arguments to the keyword
1837+
``method`` to fill the ``NaN`` values.
1838+
1839+
>>> df.reindex(new_index, fill_value=0)
1840+
http_status response_time
1841+
Safari 404 0.07
1842+
Iceweasel 0 0.00
1843+
Comodo Dragon 0 0.00
1844+
IE10 404 0.08
1845+
Chrome 200 0.02
1846+
1847+
>>> df.reindex(new_index, fill_value='missing')
1848+
http_status response_time
1849+
Safari 404 0.07
1850+
Iceweasel missing missing
1851+
Comodo Dragon missing missing
1852+
IE10 404 0.08
1853+
Chrome 200 0.02
1854+
1855+
To further illustrate the filling functionality in
1856+
``reindex``, we will create a dataframe with a
1857+
monotonically increasing index (for example, a sequence
1858+
of dates).
1859+
1860+
>>> date_index = pd.date_range('1/1/2010', periods=6, freq='D')
1861+
>>> df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
1862+
... index=date_index)
1863+
>>> df2
1864+
prices
1865+
2010-01-01 100
1866+
2010-01-02 101
1867+
2010-01-03 NaN
1868+
2010-01-04 100
1869+
2010-01-05 89
1870+
2010-01-06 88
1871+
1872+
Suppose we decide to expand the dataframe to cover a wider
1873+
date range.
1874+
1875+
>>> date_index2 = pd.date_range('12/29/2009', periods=10, freq='D')
1876+
>>> df2.reindex(date_index2)
1877+
prices
1878+
2009-12-29 NaN
1879+
2009-12-30 NaN
1880+
2009-12-31 NaN
1881+
2010-01-01 100
1882+
2010-01-02 101
1883+
2010-01-03 NaN
1884+
2010-01-04 100
1885+
2010-01-05 89
1886+
2010-01-06 88
1887+
2010-01-07 NaN
1888+
1889+
The index entries that did not have a value in the original dataframe
1890+
(for example, '2009-12-29') are by default filled with ``NaN``.
1891+
If desired, we can fill in the missing values using one of several
1892+
options.
1893+
1894+
For example, to propagate the next valid value backward to fill the ``NaN``
1895+
values, pass ``bfill`` as an argument to the ``method`` keyword.
1896+
1897+
>>> df2.reindex(date_index2, method='bfill')
1898+
prices
1899+
2009-12-29 100
1900+
2009-12-30 100
1901+
2009-12-31 100
1902+
2010-01-01 100
1903+
2010-01-02 101
1904+
2010-01-03 NaN
1905+
2010-01-04 100
1906+
2010-01-05 89
1907+
2010-01-06 88
1908+
2010-01-07 NaN
1909+
1910+
Please note that the ``NaN`` value present in the original dataframe
1911+
(at index value 2010-01-03) will not be filled by any of the
1912+
value propagation schemes. This is because filling while reindexing
1913+
does not look at dataframe values, but only compares the original and
1914+
desired indexes. If you do want to fill in the ``NaN`` values present
1915+
in the original dataframe, use the ``fillna()`` method.
18031916
18041917
Returns
18051918
-------
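
The two caveats in the new docstring text (``method`` needs a monotonic index, and filling during reindexing ignores ``NaN`` values already in the data) can be checked with a short script. This is a minimal sketch, not part of the commit: it reuses the data from the docstring examples, the names ``browsers``, ``still_missing``, ``filled`` and ``raised`` are made up for illustration, and it assumes a pandas version that provides ``DataFrame.bfill`` as the value-based fill.

```python
import numpy as np
import pandas as pd

# Price series from the docstring example: daily index, one NaN in the data.
date_index = pd.date_range("2010-01-01", periods=6, freq="D")
df2 = pd.DataFrame({"prices": [100, 101, np.nan, 100, 89, 88]},
                   index=date_index)

# Reindex over a wider date range, back-filling only the labels that are new.
date_index2 = pd.date_range("2009-12-29", periods=10, freq="D")
reindexed = df2.reindex(date_index2, method="bfill")

# The NaN that was already in the data is untouched by the reindex fill,
# because the label 2010-01-03 exists in the original index.
still_missing = pd.isna(reindexed.loc[pd.Timestamp("2010-01-03"), "prices"])

# A value-based fill (here DataFrame.bfill) does look at the data itself.
filled = reindexed.bfill()

# An index that is neither increasing nor decreasing rejects ``method``.
browsers = pd.DataFrame({"http_status": [200, 200, 404]},
                        index=["Firefox", "Chrome", "Safari"])
try:
    browsers.reindex(["Safari", "Iceweasel"], method="ffill")
    raised = False
except ValueError:
    raised = True
```

After the value-based fill, 2010-01-03 takes the next observed price, while 2010-01-07 stays ``NaN`` because no later observation exists to propagate backward.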
