diff --git a/doc/source/whatsnew/v0.19.0.rst b/doc/source/whatsnew/v0.19.0.rst index 1e4e7a6c80fa4..6f4e8e36cdc04 100644 --- a/doc/source/whatsnew/v0.19.0.rst +++ b/doc/source/whatsnew/v0.19.0.rst @@ -5,12 +5,6 @@ v0.19.0 (October 2, 2016) {{ header }} -.. ipython:: python - :suppress: - - from pandas import * # noqa F401, F403 - - This is a major release from 0.18.1 and includes number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. @@ -105,9 +99,8 @@ This also illustrates using the ``by`` parameter to group data before merging. '20160525 13:30:00.049', '20160525 13:30:00.072', '20160525 13:30:00.075']), - 'ticker': ['GOOG', 'MSFT', 'MSFT', - 'MSFT', 'GOOG', 'AAPL', 'GOOG', - 'MSFT'], + 'ticker': ['GOOG', 'MSFT', 'MSFT', 'MSFT', + 'GOOG', 'AAPL', 'GOOG', 'MSFT'], 'bid': [720.50, 51.95, 51.97, 51.99, 720.50, 97.99, 720.50, 52.01], 'ask': [720.93, 51.96, 51.98, 52.00, @@ -143,7 +136,8 @@ See the full documentation :ref:`here `. .. ipython:: python dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, - index=pd.date_range('20130101 09:00:00', periods=5, freq='s')) + index=pd.date_range('20130101 09:00:00', + periods=5, freq='s')) dft This is a regular frequency index. Using an integer window parameter works to roll along the window frequency. @@ -164,13 +158,13 @@ Using a non-regular, but still monotonic index, rolling with an integer window d .. 
ipython:: python - dft = DataFrame({'B': [0, 1, 2, np.nan, 4]}, - index = pd.Index([pd.Timestamp('20130101 09:00:00'), - pd.Timestamp('20130101 09:00:02'), - pd.Timestamp('20130101 09:00:03'), - pd.Timestamp('20130101 09:00:05'), - pd.Timestamp('20130101 09:00:06')], - name='foo')) + dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]}, + index=pd.Index([pd.Timestamp('20130101 09:00:00'), + pd.Timestamp('20130101 09:00:02'), + pd.Timestamp('20130101 09:00:03'), + pd.Timestamp('20130101 09:00:05'), + pd.Timestamp('20130101 09:00:06')], + name='foo')) dft dft.rolling(2).sum() @@ -277,10 +271,10 @@ Categorical Concatenation .. ipython:: python - from pandas.api.types import union_categoricals - a = pd.Categorical(["b", "c"]) - b = pd.Categorical(["a", "b"]) - union_categoricals([a, b]) + from pandas.api.types import union_categoricals + a = pd.Categorical(["b", "c"]) + b = pd.Categorical(["a", "b"]) + union_categoricals([a, b]) - ``concat`` and ``append`` now can concat ``category`` dtypes with different ``categories`` as ``object`` dtype (:issue:`13524`) @@ -289,18 +283,18 @@ Categorical Concatenation s1 = pd.Series(['a', 'b'], dtype='category') s2 = pd.Series(['b', 'c'], dtype='category') - **Previous behavior**: +**Previous behavior**: - .. code-block:: ipython +.. code-block:: ipython - In [1]: pd.concat([s1, s2]) - ValueError: incompatible categories in categorical concat + In [1]: pd.concat([s1, s2]) + ValueError: incompatible categories in categorical concat - **New behavior**: +**New behavior**: - .. ipython:: python +.. ipython:: python - pd.concat([s1, s2]) + pd.concat([s1, s2]) .. _whatsnew_0190.enhancements.semi_month_offsets: @@ -313,31 +307,31 @@ These provide date offsets anchored (by default) to the 15th and end of month, a .. ipython:: python - from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin + from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin **SemiMonthEnd**: .. 
ipython:: python - Timestamp('2016-01-01') + SemiMonthEnd() + pd.Timestamp('2016-01-01') + SemiMonthEnd() - pd.date_range('2015-01-01', freq='SM', periods=4) + pd.date_range('2015-01-01', freq='SM', periods=4) **SemiMonthBegin**: .. ipython:: python - Timestamp('2016-01-01') + SemiMonthBegin() + pd.Timestamp('2016-01-01') + SemiMonthBegin() - pd.date_range('2015-01-01', freq='SMS', periods=4) + pd.date_range('2015-01-01', freq='SMS', periods=4) Using the anchoring suffix, you can also specify the day of month to use instead of the 15th. .. ipython:: python - pd.date_range('2015-01-01', freq='SMS-16', periods=4) + pd.date_range('2015-01-01', freq='SMS-16', periods=4) - pd.date_range('2015-01-01', freq='SM-14', periods=4) + pd.date_range('2015-01-01', freq='SM-14', periods=4) .. _whatsnew_0190.enhancements.index: @@ -367,7 +361,7 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci .. ipython:: python midx = pd.MultiIndex.from_arrays([[1, 2, np.nan, 4], - [1, 2, np.nan, np.nan]]) + [1, 2, np.nan, np.nan]]) midx midx.dropna() midx.dropna(how='all') @@ -377,7 +371,7 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci .. ipython:: python idx = pd.Index(["a1a2", "b1", "c1"]) - idx.str.extractall("[ab](?P<digit>\d)") + idx.str.extractall(r"[ab](?P<digit>\d)") ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows optional copying if the requirements on dtype are satisfied (:issue:`13209`) @@ -453,7 +447,7 @@ The following are now part of this API: import pprint from pandas.api import types - funcs = [ f for f in dir(types) if not f.startswith('_') ] + funcs = [f for f in dir(types) if not f.startswith('_')] pprint.pprint(funcs) .. note:: @@ -470,9 +464,9 @@ Other enhancements .. 
ipython:: python - pd.Timestamp(2012, 1, 1) + pd.Timestamp(2012, 1, 1) - pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30) + pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30) - The ``.resample()`` function now accepts a ``on=`` or ``level=`` parameter for resampling on a datetimelike column or ``MultiIndex`` level (:issue:`13500`) @@ -480,10 +474,11 @@ Other enhancements df = pd.DataFrame({'date': pd.date_range('2015-01-01', freq='W', periods=5), 'a': np.arange(5)}, - index=pd.MultiIndex.from_arrays([ - [1,2,3,4,5], - pd.date_range('2015-01-01', freq='W', periods=5)], - names=['v','d'])) + index=pd.MultiIndex.from_arrays([[1, 2, 3, 4, 5], + pd.date_range('2015-01-01', + freq='W', + periods=5) + ], names=['v', 'd'])) df df.resample('M', on='date').sum() df.resample('M', level='d').sum() @@ -547,7 +542,7 @@ API changes .. ipython:: python - s = pd.Series([1,2,3]) + s = pd.Series([1, 2, 3]) **Previous behavior**: @@ -953,7 +948,7 @@ of integers (:issue:`13988`). In [6]: pi = pd.PeriodIndex(['2011-01', '2011-02'], freq='M') In [7]: pi.values - array([492, 493]) + Out[7]: array([492, 493]) **New behavior**: @@ -981,15 +976,15 @@ Previous behavior: .. code-block:: ipython - In [1]: pd.Index(['a', 'b']) + pd.Index(['a', 'c']) - FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union() - Out[1]: Index(['a', 'b', 'c'], dtype='object') + In [1]: pd.Index(['a', 'b']) + pd.Index(['a', 'c']) + FutureWarning: using '+' to provide set union with Indexes is deprecated, use '|' or .union() + Out[1]: Index(['a', 'b', 'c'], dtype='object') **New behavior**: the same operation will now perform element-wise addition: .. ipython:: python - pd.Index(['a', 'b']) + pd.Index(['a', 'c']) + pd.Index(['a', 'b']) + pd.Index(['a', 'c']) Note that numeric Index objects already performed element-wise operations. For example, the behavior of adding two integer Indexes is unchanged. 
@@ -997,7 +992,7 @@ The base ``Index`` is now made consistent with this behavior. .. ipython:: python - pd.Index([1, 2, 3]) + pd.Index([2, 3, 4]) + pd.Index([1, 2, 3]) + pd.Index([2, 3, 4]) Further, because of this change, it is now possible to subtract two DatetimeIndex objects resulting in a TimedeltaIndex: @@ -1006,7 +1001,8 @@ DatetimeIndex objects resulting in a TimedeltaIndex: .. code-block:: ipython - In [1]: pd.DatetimeIndex(['2016-01-01', '2016-01-02']) - pd.DatetimeIndex(['2016-01-02', '2016-01-03']) + In [1]: (pd.DatetimeIndex(['2016-01-01', '2016-01-02']) + ...: - pd.DatetimeIndex(['2016-01-02', '2016-01-03'])) FutureWarning: using '-' to provide set differences with datetimelike Indexes is deprecated, use .difference() Out[1]: DatetimeIndex(['2016-01-01'], dtype='datetime64[ns]', freq=None) @@ -1014,7 +1010,8 @@ DatetimeIndex objects resulting in a TimedeltaIndex: .. ipython:: python - pd.DatetimeIndex(['2016-01-01', '2016-01-02']) - pd.DatetimeIndex(['2016-01-02', '2016-01-03']) + (pd.DatetimeIndex(['2016-01-01', '2016-01-02']) + - pd.DatetimeIndex(['2016-01-02', '2016-01-03'])) .. _whatsnew_0190.api.difference: @@ -1063,7 +1060,8 @@ Previously, most ``Index`` classes returned ``np.ndarray``, and ``DatetimeIndex` In [1]: pd.Index([1, 2, 3]).unique() Out[1]: array([1, 2, 3]) - In [2]: pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], tz='Asia/Tokyo').unique() + In [2]: pd.DatetimeIndex(['2011-01-01', '2011-01-02', + ...: '2011-01-03'], tz='Asia/Tokyo').unique() Out[2]: DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00', '2011-01-03 00:00:00+09:00'], @@ -1074,7 +1072,8 @@ Previously, most ``Index`` classes returned ``np.ndarray``, and ``DatetimeIndex` .. ipython:: python pd.Index([1, 2, 3]).unique() - pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], tz='Asia/Tokyo').unique() + pd.DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03'], + tz='Asia/Tokyo').unique() .. 
_whatsnew_0190.api.multiindex: @@ -1236,29 +1235,29 @@ Operators now preserve dtypes - Sparse data structure now can preserve ``dtype`` after arithmetic ops (:issue:`13848`) - .. ipython:: python +.. ipython:: python - s = pd.SparseSeries([0, 2, 0, 1], fill_value=0, dtype=np.int64) - s.dtype + s = pd.SparseSeries([0, 2, 0, 1], fill_value=0, dtype=np.int64) + s.dtype - s + 1 + s + 1 - Sparse data structure now support ``astype`` to convert internal ``dtype`` (:issue:`13900`) - .. ipython:: python +.. ipython:: python - s = pd.SparseSeries([1., 0., 2., 0.], fill_value=0) - s - s.astype(np.int64) + s = pd.SparseSeries([1., 0., 2., 0.], fill_value=0) + s + s.astype(np.int64) ``astype`` fails if data contains values which cannot be converted to specified ``dtype``. Note that the limitation is applied to ``fill_value`` which default is ``np.nan``. - .. code-block:: ipython +.. code-block:: ipython - In [7]: pd.SparseSeries([1., np.nan, 2., np.nan], fill_value=np.nan).astype(np.int64) - Out[7]: - ValueError: unable to coerce current fill_value nan to int64 dtype + In [7]: pd.SparseSeries([1., np.nan, 2., np.nan], fill_value=np.nan).astype(np.int64) + Out[7]: + ValueError: unable to coerce current fill_value nan to int64 dtype Other sparse fixes """""""""""""""""" diff --git a/doc/source/whatsnew/v0.20.0.rst b/doc/source/whatsnew/v0.20.0.rst index d5a2422e456ee..2002c4bb9bc51 100644 --- a/doc/source/whatsnew/v0.20.0.rst +++ b/doc/source/whatsnew/v0.20.0.rst @@ -5,12 +5,6 @@ v0.20.1 (May 5, 2017) {{ header }} -.. ipython:: python - :suppress: - - from pandas import * # noqa F401, F403 - - This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version. @@ -71,7 +65,7 @@ Here is a sample .. 
ipython:: python df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], - index=pd.date_range('1/1/2000', periods=10)) + index=pd.date_range('1/1/2000', periods=10)) df.iloc[3:7] = np.nan df @@ -95,7 +89,7 @@ per unique function. Those functions applied to a particular column will be ``Na .. ipython:: python - df.agg({'A' : ['sum', 'min'], 'B' : ['min', 'max']}) + df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']}) The API also supports a ``.transform()`` function for broadcasting results. @@ -136,7 +130,7 @@ fixed-width text files and :func:`read_excel` for parsing Excel files, now accep data = "a b\n1 2\n3 4" pd.read_fwf(StringIO(data)).dtypes - pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes + pd.read_fwf(StringIO(data), dtype={'a': 'float64', 'b': 'object'}).dtypes .. _whatsnew_0120.enhancements.datetime_origin: @@ -194,13 +188,12 @@ Previously, only ``gzip`` compression was supported. By default, compression of URLs and paths are now inferred using their file extensions. Additionally, support for bz2 compression in the python 2 C-engine improved (:issue:`14874`). -.. code-block:: python +.. ipython:: python - url = 'https://github.com/{repo}/raw/{branch}/{path}'.format( - repo = 'pandas-dev/pandas', - branch = 'master', - path = 'pandas/tests/io/parser/data/salaries.csv.bz2', - ) + url = ('https://github.com/{repo}/raw/{branch}/{path}' + .format(repo='pandas-dev/pandas', + branch='master', + path='pandas/tests/io/parser/data/salaries.csv.bz2')) df = pd.read_table(url, compression='infer') # default, infer compression df = pd.read_table(url, compression='bz2') # explicitly specify compression df.head(2) @@ -217,10 +210,9 @@ See :ref:`the docs here. ` .. 
ipython:: python - df = pd.DataFrame({ - 'A': np.random.randn(1000), - 'B': 'foo', - 'C': pd.date_range('20130101', periods=1000, freq='s')}) + df = pd.DataFrame({'A': np.random.randn(1000), + 'B': 'foo', + 'C': pd.date_range('20130101', periods=1000, freq='s')}) Using an explicit compression type @@ -281,29 +273,29 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr .. ipython:: python - chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']] - df = pd.DataFrame({ - 'A': np.random.randint(100), - 'B': np.random.randint(100), - 'C': np.random.randint(100), - 'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100), - categories=chromosomes, - ordered=True)}) - df + chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']] + df = pd.DataFrame({ + 'A': np.random.randint(100), + 'B': np.random.randint(100), + 'C': np.random.randint(100), + 'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100), + categories=chromosomes, + ordered=True)}) + df **Previous Behavior**: .. code-block:: ipython - In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum() - --------------------------------------------------------------------------- - ValueError: items in new_categories are not the same as in old categories + In [3]: df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum() + --------------------------------------------------------------------------- + ValueError: items in new_categories are not the same as in old categories **New Behavior**: .. ipython:: python - df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum() + df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum() .. _whatsnew_0200.enhancements.table_schema: @@ -319,8 +311,8 @@ the data. 
df = pd.DataFrame( {'A': [1, 2, 3], 'B': ['a', 'b', 'c'], - 'C': pd.date_range('2016-01-01', freq='d', periods=3), - }, index=pd.Index(range(3), name='idx')) + 'C': pd.date_range('2016-01-01', freq='d', periods=3)}, + index=pd.Index(range(3), name='idx')) df df.to_json(orient='table') @@ -384,9 +376,9 @@ For example, after running the following, ``styled.xlsx`` renders as below: axis=1) df.iloc[0, 2] = np.nan df - styled = df.style.\ - applymap(lambda val: 'color: %s' % 'red' if val < 0 else 'black').\ - highlight_max() + styled = (df.style + .applymap(lambda val: 'color: %s' % 'red' if val < 0 else 'black') + .highlight_max()) styled.to_excel('styled.xlsx', engine='openpyxl') .. image:: ../_static/style-excel.png @@ -449,8 +441,8 @@ An ``IntervalIndex`` can also be used in ``Series`` and ``DataFrame`` as the ind .. ipython:: python df = pd.DataFrame({'A': range(4), - 'B': pd.cut([0, 3, 1, 1], bins=c.categories)} - ).set_index('B') + 'B': pd.cut([0, 3, 1, 1], bins=c.categories) + }).set_index('B') df Selecting via a specific interval: @@ -551,7 +543,7 @@ then write them out again after applying the procedure below. .. code-block:: ipython - In [2]: s = pd.TimeSeries([1,2,3], index=pd.date_range('20130101', periods=3)) + In [2]: s = pd.TimeSeries([1, 2, 3], index=pd.date_range('20130101', periods=3)) In [3]: s Out[3]: @@ -585,9 +577,9 @@ Map on Index types now return other Index types .. ipython:: python - idx = Index([1, 2]) + idx = pd.Index([1, 2]) idx - mi = MultiIndex.from_tuples([(1, 2), (2, 4)]) + mi = pd.MultiIndex.from_tuples([(1, 2), (2, 4)]) mi Previous Behavior: @@ -622,7 +614,8 @@ New Behavior: .. ipython:: python - s = Series(date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H').tz_localize('Asia/Tokyo')) + s = pd.Series(pd.date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H') + .tz_localize('Asia/Tokyo')) s Previous Behavior: @@ -657,17 +650,17 @@ Previous behaviour: .. 
code-block:: ipython - In [1]: idx = pd.date_range("2015-01-01", periods=5, freq='10H') + In [1]: idx = pd.date_range("2015-01-01", periods=5, freq='10H') - In [2]: idx.hour - Out[2]: array([ 0, 10, 20, 6, 16], dtype=int32) + In [2]: idx.hour + Out[2]: array([ 0, 10, 20, 6, 16], dtype=int32) New Behavior: .. ipython:: python - idx = pd.date_range("2015-01-01", periods=5, freq='10H') - idx.hour + idx = pd.date_range("2015-01-01", periods=5, freq='10H') + idx.hour This has the advantage that specific ``Index`` methods are still available on the result. On the other hand, this might have backward incompatibilities: e.g. @@ -690,20 +683,20 @@ data-types would yield different return types. These are now made consistent. (: # Series In [5]: pd.Series([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')]).unique() + ...: pd.Timestamp('20160101', tz='US/Eastern')]).unique() Out[5]: array([Timestamp('2016-01-01 00:00:00-0500', tz='US/Eastern')], dtype=object) In [6]: pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')])) + ...: pd.Timestamp('20160101', tz='US/Eastern')])) Out[6]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]') # Index In [7]: pd.Index([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')]).unique() + ...: pd.Timestamp('20160101', tz='US/Eastern')]).unique() Out[7]: DatetimeIndex(['2016-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None) In [8]: pd.unique([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')]) + ...: pd.Timestamp('20160101', tz='US/Eastern')]) Out[8]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]') New Behavior: @@ -711,10 +704,10 @@ data-types would yield different return types. These are now made consistent. (: .. 
ipython:: python # Series, returns an array of Timestamp tz-aware - pd.Series([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')]).unique() + pd.Series([pd.Timestamp('20160101', tz='US/Eastern'), + pd.Timestamp('20160101', tz='US/Eastern')]).unique() pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'), - pd.Timestamp('20160101', tz='US/Eastern')])) + pd.Timestamp('20160101', tz='US/Eastern')])) # Index, returns a DatetimeIndex pd.Index([pd.Timestamp('20160101', tz='US/Eastern'), pd.Timestamp('20160101', tz='US/Eastern')]).unique() @@ -762,33 +755,33 @@ Partial String Indexing Changes .. ipython:: python - df = DataFrame({'a': [1, 2, 3]}, DatetimeIndex(['2011-12-31 23:59:59', - '2012-01-01 00:00:00', - '2012-01-01 00:00:01'])) + df = pd.DataFrame({'a': [1, 2, 3]}, pd.DatetimeIndex(['2011-12-31 23:59:59', + '2012-01-01 00:00:00', + '2012-01-01 00:00:01'])) Previous Behavior: .. code-block:: ipython - In [4]: df['2011-12-31 23:59:59'] - Out[4]: - a - 2011-12-31 23:59:59 1 + In [4]: df['2011-12-31 23:59:59'] + Out[4]: + a + 2011-12-31 23:59:59 1 - In [5]: df['a']['2011-12-31 23:59:59'] - Out[5]: - 2011-12-31 23:59:59 1 - Name: a, dtype: int64 + In [5]: df['a']['2011-12-31 23:59:59'] + Out[5]: + 2011-12-31 23:59:59 1 + Name: a, dtype: int64 New Behavior: .. code-block:: ipython - In [4]: df['2011-12-31 23:59:59'] - KeyError: '2011-12-31 23:59:59' + In [4]: df['2011-12-31 23:59:59'] + KeyError: '2011-12-31 23:59:59' - In [5]: df['a']['2011-12-31 23:59:59'] - Out[5]: 1 + In [5]: df['a']['2011-12-31 23:59:59'] + Out[5]: 1 .. _whatsnew_0200.api_breaking.concat_dtypes: @@ -841,7 +834,7 @@ Previous Behavior: .. code-block:: ipython - In [8]: index = Index(['foo', 'bar', 'baz']) + In [8]: index = pd.Index(['foo', 'bar', 'baz']) In [9]: index.memory_usage(deep=True) Out[9]: 180 @@ -856,7 +849,7 @@ New Behavior: .. 
code-block:: ipython - In [8]: index = Index(['foo', 'bar', 'baz']) + In [8]: index = pd.Index(['foo', 'bar', 'baz']) In [9]: index.memory_usage(deep=True) Out[9]: 180 @@ -879,34 +872,34 @@ This is *unchanged* from prior versions, but shown for illustration purposes: .. ipython:: python - df = DataFrame(np.arange(6), columns=['value'], index=MultiIndex.from_product([list('BA'), range(3)])) - df + df = pd.DataFrame(np.arange(6), columns=['value'], + index=pd.MultiIndex.from_product([list('BA'), range(3)])) + df .. ipython:: python - df.index.is_lexsorted() - df.index.is_monotonic + df.index.is_lexsorted() + df.index.is_monotonic Sorting works as expected .. ipython:: python - df.sort_index() + df.sort_index() .. ipython:: python - df.sort_index().index.is_lexsorted() - df.sort_index().index.is_monotonic + df.sort_index().index.is_lexsorted() + df.sort_index().index.is_monotonic However, this example, which has a non-monotonic 2nd level, doesn't behave as desired. .. ipython:: python - df = pd.DataFrame( - {'value': [1, 2, 3, 4]}, - index=pd.MultiIndex([['a', 'b'], ['bb', 'aa']], - [[0, 0, 1, 1], [0, 1, 0, 1]])) + df = pd.DataFrame({'value': [1, 2, 3, 4]}, + index=pd.MultiIndex([['a', 'b'], ['bb', 'aa']], + [[0, 0, 1, 1], [0, 1, 0, 1]])) df Previous Behavior: @@ -1034,7 +1027,7 @@ Retrieving a correlation matrix for a cross-section df.rolling(12).corr().loc['2016-04-07'] -.. _whatsnew_0200.api_breaking.hdfstore_where: + .. _whatsnew_0200.api_breaking.hdfstore_where: HDFStore where string comparison ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ @@ -1121,7 +1114,7 @@ joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method. In [4]: left.join(right, how='inner') Out[4]: - a b + a b 1 10 100 2 20 200 @@ -1141,9 +1134,9 @@ is fixed that allowed this to return a ``Series`` under certain circumstance. (: .. 
ipython:: python - df = DataFrame({'col1': [3, 4, 5], - 'col2': ['C', 'D', 'E'], - 'col3': [1, 3, 9]}) + df = pd.DataFrame({'col1': [3, 4, 5], + 'col2': ['C', 'D', 'E'], + 'col3': [1, 3, 9]}) df Previous Behavior: @@ -1330,33 +1323,33 @@ Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some example .. ipython:: python - df = pd.DataFrame({'A': [1, 2, 3], - 'B': [4, 5, 6]}, - index=list('abc')) + df = pd.DataFrame({'A': [1, 2, 3], + 'B': [4, 5, 6]}, + index=list('abc')) - df + df Previous Behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column. .. code-block:: ipython - In [3]: df.ix[[0, 2], 'A'] - Out[3]: - a 1 - c 3 - Name: A, dtype: int64 + In [3]: df.ix[[0, 2], 'A'] + Out[3]: + a 1 + c 3 + Name: A, dtype: int64 Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing. .. ipython:: python - df.loc[df.index[[0, 2]], 'A'] + df.loc[df.index[[0, 2]], 'A'] Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things. .. ipython:: python - df.iloc[[0, 2], df.columns.get_loc('A')] + df.iloc[[0, 2], df.columns.get_loc('A')] .. _whatsnew_0200.api_breaking.deprecate_panel: @@ -1408,10 +1401,10 @@ This is an illustrative example: .. ipython:: python - df = pd.DataFrame({'A': [1, 1, 1, 2, 2], - 'B': range(5), - 'C': range(5)}) - df + df = pd.DataFrame({'A': [1, 1, 1, 2, 2], + 'B': range(5), + 'C': range(5)}) + df Here is a typical useful syntax for computing different aggregations for different columns. This is a natural, and useful syntax. We aggregate from the dict-to-list by taking the specified @@ -1448,8 +1441,8 @@ Here's an example of the second deprecation, passing a dict-of-dict to a grouped .. 
code-block:: python In [23]: (df.groupby('A') - .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}}) - ) + ...: .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}}) + ...: ) FutureWarning: using a dict with renaming is deprecated and will be removed in a future version @@ -1468,7 +1461,7 @@ You can accomplish nearly the same by: (df.groupby('A') .agg({'B': 'sum', 'C': 'min'}) .rename(columns={'B': 'foo', 'C': 'bar'}) - ) + ) @@ -1494,7 +1487,7 @@ Should be changed to: .. code-block:: python - pd.plotting.scatter_matrix(df) + pd.plotting.scatter_matrix(df) diff --git a/doc/source/whatsnew/v0.21.0.rst b/doc/source/whatsnew/v0.21.0.rst index 73bdedb3d3194..47cd17efe3f75 100644 --- a/doc/source/whatsnew/v0.21.0.rst +++ b/doc/source/whatsnew/v0.21.0.rst @@ -97,14 +97,14 @@ attribute on the ``DataFrame``: .. code-block:: ipython - In[1]: df = pd.DataFrame({'one': [1., 2., 3.]}) - In[2]: df.two = [4, 5, 6] + In [1]: df = pd.DataFrame({'one': [1., 2., 3.]}) + In [2]: df.two = [4, 5, 6] This does not raise any obvious exceptions, but also does not create a new column: .. code-block:: ipython - In[3]: df + In [3]: df Out[3]: one 0 1.0 @@ -126,7 +126,7 @@ For example: .. ipython:: python - df = pd.DataFrame(np.arange(8).reshape(2,4), + df = pd.DataFrame(np.arange(8).reshape(2, 4), columns=['A', 'B', 'C', 'D']) df df.drop(['B', 'C'], axis=1) @@ -244,8 +244,11 @@ First we set the data: import numpy as np n = 1000 df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n), - 'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n), - 'Revenue': (np.random.random(n)*50+10).round(2), + 'Product': np.random.choice(['Product_1', + 'Product_2', + 'Product_3' + ], n), + 'Revenue': (np.random.random(n) * 50 + 10).round(2), 'Quantity': np.random.randint(1, 10, size=n)}) df.head(2) @@ -254,7 +257,7 @@ Now, to find prices per store/product, we can simply do: .. 
ipython:: python (df.groupby(['Store', 'Product']) - .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum()) + .pipe(lambda grp: grp.Revenue.sum() / grp.Quantity.sum()) .unstack().round(2)) See the :ref:`documentation ` for more. @@ -393,7 +396,7 @@ Calling ``sum`` or ``prod`` on an empty or all-``NaN`` ``Series``, or columns of .. ipython:: python - s = Series([np.nan]) + s = pd.Series([np.nan]) Previously WITHOUT ``bottleneck`` installed: @@ -478,7 +481,7 @@ The idiomatic way to achieve selecting potentially not-found elements is via ``. .. ipython:: python - s.reindex([1, 2, 3]) + s.reindex([1, 2, 3]) Selection with all keys found is unchanged. @@ -531,7 +534,7 @@ Furthermore this will now correctly box the results of iteration for :func:`Data .. ipython:: python - d = {'a':[1], 'b':['b']} + d = {'a': [1], 'b': ['b']} df = pd.DataFrame(d) Previously: @@ -589,13 +592,13 @@ Previously Behavior: .. ipython:: python - s = pd.Series([1,2,3], index=['a', 'b', 'c']) - s + s = pd.Series([1, 2, 3], index=['a', 'b', 'c']) + s .. code-block:: ipython - In [39]: s.loc[pd.Index([True, False, True])] - KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]" + In [39]: s.loc[pd.Index([True, False, True])] + KeyError: "None of [Index([True, False, True], dtype='object')] are in the [index]" Current Behavior @@ -696,10 +699,10 @@ Previously, if you attempted the following expression, you would get a not very .. code-block:: ipython - In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True) - ... - IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) - and integer or boolean arrays are valid indices + In [3]: pd.eval("a = 1 + 2", target=arr, inplace=True) + ... + IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) + and integer or boolean arrays are valid indices This is a very long way of saying numpy arrays don't support string-item indexing. 
With this change, the error message is now this: @@ -714,8 +717,8 @@ It also used to be possible to evaluate expressions inplace, even if there was n .. code-block:: ipython - In [4]: pd.eval("1 + 2", target=arr, inplace=True) - Out[4]: 3 + In [4]: pd.eval("1 + 2", target=arr, inplace=True) + Out[4]: 3 However, this input does not make much sense because the output is not being assigned to the target. Now, a ``ValueError`` will be raised when such an input is passed in: @@ -736,7 +739,7 @@ Previously assignments, ``.where()`` and ``.fillna()`` with a ``bool`` assignmen .. ipython:: python - s = Series([1, 2, 3]) + s = pd.Series([1, 2, 3]) .. code-block:: python @@ -819,7 +822,7 @@ Previous Behavior .. ipython:: python - s = Series(['20130101 00:00:00'] * 3) + s = pd.Series(['20130101 00:00:00'] * 3) .. code-block:: ipython @@ -851,11 +854,11 @@ Previous Behavior: .. code-block:: ipython - In [2]: pd.interval_range(start=0, end=4, periods=6) - Out[2]: - IntervalIndex([(0, 1], (1, 2], (2, 3]] - closed='right', - dtype='interval[int64]') + In [2]: pd.interval_range(start=0, end=4, periods=6) + Out[2]: + IntervalIndex([(0, 1], (1, 2], (2, 3]] + closed='right', + dtype='interval[int64]') In [3]: pd.period_range(start='2017Q1', end='2017Q4', periods=6, freq='Q') Out[3]: PeriodIndex(['2017Q1', '2017Q2', '2017Q3', '2017Q4', '2018Q1', '2018Q2'], dtype='period[Q-DEC]', freq='Q-DEC') @@ -878,11 +881,11 @@ Previous Behavior: .. code-block:: ipython - In [4]: pd.interval_range(start=0, end=4) - Out[4]: - IntervalIndex([(0, 1], (1, 2], (2, 3]] - closed='right', - dtype='interval[int64]') + In [4]: pd.interval_range(start=0, end=4) + Out[4]: + IntervalIndex([(0, 1], (1, 2], (2, 3]] + closed='right', + dtype='interval[int64]') New Behavior: @@ -966,7 +969,7 @@ The :meth:`Series.select` and :meth:`DataFrame.select` methods are deprecated in .. 
ipython:: python - df = DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz']) + df = pd.DataFrame({'A': [1, 2, 3]}, index=['foo', 'bar', 'baz']) .. code-block:: ipython diff --git a/setup.cfg b/setup.cfg index fd258e7334ff0..7f92882317927 100644 --- a/setup.cfg +++ b/setup.cfg @@ -68,9 +68,6 @@ exclude = doc/source/whatsnew/v0.17.1.rst doc/source/whatsnew/v0.18.0.rst doc/source/whatsnew/v0.18.1.rst - doc/source/whatsnew/v0.19.0.rst - doc/source/whatsnew/v0.20.0.rst - doc/source/whatsnew/v0.21.0.rst doc/source/whatsnew/v0.23.1.rst doc/source/whatsnew/v0.23.2.rst doc/source/basics.rst