diff --git a/doc/source/whatsnew/v0.10.0.rst b/doc/source/whatsnew/v0.10.0.rst index aa2749c85a232..2f5162dcd4b67 100644 --- a/doc/source/whatsnew/v0.10.0.rst +++ b/doc/source/whatsnew/v0.10.0.rst @@ -45,8 +45,7 @@ want to broadcast, we are phasing out this special case (Zen of Python: *Special cases aren't special enough to break the rules*). Here's what I'm talking about: -.. ipython:: python - :okwarning: +.. code-block:: python import pandas as pd @@ -180,7 +179,7 @@ labeled the aggregated group with the end of the interval: the next day). DataFrame constructor with no columns specified. The v0.9.0 behavior (names ``X0``, ``X1``, ...) can be reproduced by specifying ``prefix='X'``: -.. ipython:: python +.. code-block:: python import io @@ -197,7 +196,7 @@ labeled the aggregated group with the end of the interval: the next day). though this can be controlled by new ``true_values`` and ``false_values`` arguments: -.. ipython:: python +.. code-block:: python print(data) pd.read_csv(io.StringIO(data)) @@ -210,7 +209,7 @@ labeled the aggregated group with the end of the interval: the next day). - Calling ``fillna`` on Series or DataFrame with no arguments is no longer valid code. You must either specify a fill value or an interpolation method: -.. ipython:: python +.. code-block:: python s = pd.Series([np.nan, 1.0, 2.0, np.nan, 4]) s @@ -219,7 +218,7 @@ labeled the aggregated group with the end of the interval: the next day). Convenience methods ``ffill`` and ``bfill`` have been added: -.. ipython:: python +.. code-block:: python s.ffill() @@ -228,7 +227,7 @@ Convenience methods ``ffill`` and ``bfill`` have been added: function, that is itself a series, and possibly upcast the result to a DataFrame - .. ipython:: python + .. code-block:: python def f(x): return pd.Series([x, x ** 2], index=["x", "x^2"]) @@ -249,7 +248,7 @@ Convenience methods ``ffill`` and ``bfill`` have been added: Note: ``set_printoptions``/ ``reset_printoptions`` are now deprecated (but functioning), the print options now live under "display.XYZ". For example: - .. ipython:: python + .. code-block:: python pd.get_option("display.max_rows") @@ -264,7 +263,7 @@ Wide DataFrame printing Instead of printing the summary information, pandas now splits the string representation across multiple rows by default: -.. ipython:: python +.. code-block:: python wide_frame = pd.DataFrame(np.random.randn(5, 16)) @@ -273,14 +272,13 @@ representation across multiple rows by default: The old behavior of printing out summary information can be achieved via the 'expand_frame_repr' print option: -.. ipython:: python +.. code-block:: python pd.set_option("expand_frame_repr", False) wide_frame -.. ipython:: python - :suppress: +.. code-block:: python pd.reset_option("expand_frame_repr") diff --git a/doc/source/whatsnew/v0.10.1.rst b/doc/source/whatsnew/v0.10.1.rst index 611ac2021fcec..1bddfb8a85fd8 100644 --- a/doc/source/whatsnew/v0.10.1.rst +++ b/doc/source/whatsnew/v0.10.1.rst @@ -39,9 +39,7 @@ You may need to upgrade your existing data files. Please visit the **compatibility** section in the main docs. -.. ipython:: python - :suppress: - :okexcept: +.. code-block:: python import os @@ -50,7 +48,7 @@ You may need to upgrade your existing data files. Please visit the You can designate (and index) certain columns that you want to be able to perform queries on a table, by passing a list to ``data_columns`` -.. ipython:: python +.. 
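A minimal sketch of how ``data_columns`` enables on-disk queries, assuming PyTables is installed; the file name ``example.h5`` and the string form of ``where`` (which arrived later, in 0.13) are illustrative:

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.random.randn(8, 3), columns=["A", "B", "C"])
   with pd.HDFStore("example.h5") as store:
       # columns listed in data_columns are indexed on disk and
       # can therefore appear in where clauses
       store.append("df", df, data_columns=["A", "B"])
       store.select("df", where="A > 0 & B < 0")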
code-block:: python store = pd.HDFStore("store.h5") df = pd.DataFrame( @@ -82,7 +80,7 @@ Retrieving unique values in an indexable or data column. You can now store ``datetime64`` in data columns -.. ipython:: python +.. code-block:: python df_mixed = df.copy() df_mixed["datetime64"] = pd.Timestamp("20010102") @@ -97,7 +95,7 @@ You can pass ``columns`` keyword to select to filter a list of the return columns, this is equivalent to passing a ``Term('columns',list_of_columns_to_filter)`` -.. ipython:: python +.. code-block:: python store.select("df", columns=["A", "B"]) @@ -160,7 +158,7 @@ Multi-table creation via ``append_to_multiple`` and selection via ``select_as_multiple`` can create/select from multiple tables and return a combined result, by using ``where`` on a selector table. -.. ipython:: python +.. code-block:: python df_mt = pd.DataFrame( np.random.randn(8, 6), @@ -184,8 +182,7 @@ combined result, by using ``where`` on a selector table. ["df1_mt", "df2_mt"], where=["A>0", "B>0"], selector="df1_mt" ) -.. ipython:: python - :suppress: +.. code-block:: python store.close() os.remove("store.h5") diff --git a/doc/source/whatsnew/v0.11.0.rst b/doc/source/whatsnew/v0.11.0.rst index 0fba784e36661..450ec73b411d9 100644 --- a/doc/source/whatsnew/v0.11.0.rst +++ b/doc/source/whatsnew/v0.11.0.rst @@ -72,7 +72,7 @@ Dtypes Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste. -.. ipython:: python +.. code-block:: python df1 = pd.DataFrame(np.random.randn(8, 1), columns=['A'], dtype='float32') df1 @@ -93,13 +93,13 @@ Dtype conversion This is lower-common-denominator upcasting, meaning you get the dtype which can accommodate all of the types -.. ipython:: python +.. code-block:: python df3.values.dtype Conversion -.. ipython:: python +.. code-block:: python df3.astype('float32').dtypes @@ -288,7 +288,7 @@ in addition to the traditional ``NaT``, or not-a-time. This allows convenient na Furthermore ``datetime64[ns]`` columns are created by default, when passed datetimelike objects (*this change was introduced in 0.10.1*) (:issue:`2809`, :issue:`2810`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame(np.random.randn(6, 2), pd.date_range('20010102', periods=6), columns=['A', ' B']) @@ -304,7 +304,7 @@ Furthermore ``datetime64[ns]`` columns are created by default, when passed datet Astype conversion on ``datetime64[ns]`` to ``object``, implicitly converts ``NaT`` to ``np.nan`` -.. ipython:: python +.. code-block:: python import datetime s = pd.Series([datetime.datetime(2001, 1, 2, 0, 0) for i in range(3)]) @@ -344,15 +344,13 @@ Enhancements - support ``read_hdf/to_hdf`` API similar to ``read_csv/to_csv`` - .. ipython:: python + .. code-block:: python df = pd.DataFrame({'A': range(5), 'B': range(5)}) df.to_hdf('store.h5', 'table', append=True) pd.read_hdf('store.h5', 'table', where=['index > 2']) - .. ipython:: python - :suppress: - :okexcept: + .. code-block:: python import os @@ -367,8 +365,7 @@ Enhancements - You can now select with a string from a DataFrame with a datelike index, in a similar way to a Series (:issue:`3070`) - .. ipython:: python - :okwarning: + .. 
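A minimal sketch of partial-string selection on a datelike index; the dates and frequency are illustrative:

.. code-block:: python

   import numpy as np
   import pandas as pd

   idx = pd.date_range("2001-01-01", periods=400, freq="D")
   df = pd.DataFrame(np.random.randn(len(idx), 2), index=idx, columns=["A", "B"])
   df.loc["2001-02"]                   # every row in February 2001
   df.loc["2001-02-15":"2001-03-15"]   # partial strings as slice endpoints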
code-block:: python idx = pd.date_range("2001-10-1", periods=5, freq='M') ts = pd.Series(np.random.rand(len(idx)), index=idx) diff --git a/doc/source/whatsnew/v0.12.0.rst b/doc/source/whatsnew/v0.12.0.rst index c12adb2f1334f..126d250e9718f 100644 --- a/doc/source/whatsnew/v0.12.0.rst +++ b/doc/source/whatsnew/v0.12.0.rst @@ -45,7 +45,7 @@ API changes ``np.nan`` or ``np.inf`` as appropriate (:issue:`3590`). This correct a numpy bug that treats ``integer`` and ``float`` dtypes differently. - .. ipython:: python + .. code-block:: python p = pd.DataFrame({"first": [4, 5, 8], "second": [0, 0, 3]}) p % 0 @@ -93,7 +93,7 @@ API changes This case is rarely used, and there are plenty of alternatives. This preserves the ``iloc`` API to be *purely* positional based. - .. ipython:: python + .. code-block:: python df = pd.DataFrame(range(5), index=list("ABCDE"), columns=["a"]) mask = df.a % 2 == 0 @@ -200,8 +200,7 @@ IO enhancements You can use ``pd.read_html()`` to read the output from ``DataFrame.to_html()`` like so - .. ipython:: python - :okwarning: + .. code-block:: python df = pd.DataFrame({"a": range(3), "b": list("abc")}) print(df) @@ -248,7 +247,7 @@ IO enhancements with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will be *lost*. - .. ipython:: python + .. code-block:: python from pandas._testing import makeCustomDataframe as mkdf @@ -257,8 +256,7 @@ IO enhancements print(open("mi.csv").read()) pd.read_csv("mi.csv", header=[0, 1, 2, 3], index_col=[0, 1]) - .. ipython:: python - :suppress: + .. code-block:: python import os @@ -307,7 +305,7 @@ Other enhancements For example you can do - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"a": list("ab.."), "b": [1, 2, 3, 4]}) df.replace(regex=r"\s*\.\s*", value=np.nan) @@ -317,7 +315,7 @@ Other enhancements Regular string replacement still works as expected. For example, you can do - .. ipython:: python + .. code-block:: python df.replace(".", np.nan) @@ -351,7 +349,7 @@ Other enhancements object. Suppose we want to take only elements that belong to groups with a group sum greater than 2. - .. ipython:: python + .. code-block:: python sf = pd.Series([1, 1, 2, 3, 3, 3]) sf.groupby(sf).filter(lambda x: x.sum() > 2) @@ -362,7 +360,7 @@ Other enhancements Another useful operation is filtering out elements that belong to groups with only a couple members. - .. ipython:: python + .. code-block:: python dff = pd.DataFrame({"A": np.arange(8), "B": list("aabbbbcc")}) dff.groupby("B").filter(lambda x: len(x) > 2) @@ -371,7 +369,7 @@ Other enhancements like-indexed objects where the groups that do not pass the filter are filled with NaNs. - .. ipython:: python + .. code-block:: python dff.groupby("B").filter(lambda x: len(x) > 2, dropna=False) @@ -398,7 +396,7 @@ Experimental features This uses the ``numpy.busdaycalendar`` API introduced in Numpy 1.7 and therefore requires Numpy 1.7.0 or newer. - .. ipython:: python + .. code-block:: python from pandas.tseries.offsets import CustomBusinessDay from datetime import datetime @@ -433,8 +431,7 @@ Bug fixes a ``Series`` with either a single character at each index of the original ``Series`` or ``NaN``. For example, - .. ipython:: python - :okwarning: + .. 
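A minimal sketch of the element-wise behavior described above, using the ``.str`` accessor rather than bare ``__getitem__``:

.. code-block:: python

   import pandas as pd

   ds = pd.Series(["go", "bow", "joe", "slow"])
   ds.str[1]   # second character of each element
   ds.str[5]   # positions past the end of a string yield NaN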
code-block:: python strs = "go", "bow", "joe", "slow" ds = pd.Series(strs) diff --git a/doc/source/whatsnew/v0.13.0.rst b/doc/source/whatsnew/v0.13.0.rst index 3c6b70fb21383..3dc96731cbe36 100644 --- a/doc/source/whatsnew/v0.13.0.rst +++ b/doc/source/whatsnew/v0.13.0.rst @@ -153,7 +153,7 @@ API changes Added the ``.bool()`` method to ``NDFrame`` objects to facilitate evaluating of single-element boolean Series: - .. ipython:: python + .. code-block:: python pd.Series([True]).bool() pd.Series([False]).bool() @@ -170,15 +170,14 @@ API changes - Chained assignment will now by default warn if the user is assigning to a copy. This can be changed with the option ``mode.chained_assignment``, allowed options are ``raise/warn/None``. See :ref:`the docs`. - .. ipython:: python + .. code-block:: python dfc = pd.DataFrame({'A': ['aaa', 'bbb', 'ccc'], 'B': [1, 2, 3]}) pd.set_option('chained_assignment', 'warn') The following warning / exception will show if this is attempted. - .. ipython:: python - :okwarning: + .. code-block:: python dfc.loc[0]['A'] = 1111 @@ -192,7 +191,7 @@ API changes Here is the correct method of assignment. - .. ipython:: python + .. code-block:: python dfc.loc[0, 'A'] = 11 dfc @@ -242,14 +241,14 @@ was not contained in the index of a particular axis. (:issue:`2578`). See :ref:` In the ``Series`` case this is effectively an appending operation -.. ipython:: python +.. code-block:: python s = pd.Series([1, 2, 3]) s s[5] = 5. s -.. ipython:: python +.. code-block:: python dfi = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['A', 'B']) @@ -257,14 +256,14 @@ In the ``Series`` case this is effectively an appending operation This would previously ``KeyError`` -.. ipython:: python +.. code-block:: python dfi.loc[:, 'C'] = dfi.loc[:, 'A'] dfi This is like an ``append`` operation. -.. ipython:: python +.. code-block:: python dfi.loc[3] = 5 dfi @@ -314,7 +313,7 @@ Float64Index API change Construction is by default for floating type values. - .. ipython:: python + .. code-block:: python index = pd.Index([1.5, 2, 3, 4.5, 5]) index @@ -323,14 +322,14 @@ Float64Index API change Scalar selection for ``[],.ix,.loc`` will always be label based. An integer will match an equal float index (e.g. ``3`` is equivalent to ``3.0``) - .. ipython:: python + .. code-block:: python s[3] s.loc[3] The only positional indexing is via ``iloc`` - .. ipython:: python + .. code-block:: python s.iloc[3] @@ -338,7 +337,7 @@ Float64Index API change Slicing is ALWAYS on the values of the index, for ``[],ix,loc`` and ALWAYS positional with ``iloc`` - .. ipython:: python + .. code-block:: python s[2:4] s.loc[2:4] @@ -346,7 +345,7 @@ Float64Index API change In float indexes, slicing using floats are allowed - .. ipython:: python + .. code-block:: python s[2.1:4.6] s.loc[2.1:4.6] @@ -374,7 +373,7 @@ HDFStore API changes - Query Format Changes. A much more string-like query format is now supported. See :ref:`the docs`. - .. ipython:: python + .. code-block:: python path = 'test.h5' dfq = pd.DataFrame(np.random.randn(10, 4), @@ -384,20 +383,19 @@ HDFStore API changes Use boolean expressions, with in-line function evaluation. - .. ipython:: python + .. code-block:: python pd.read_hdf(path, 'dfq', where="index>Timestamp('20130104') & columns=['A', 'B']") Use an inline column reference - .. ipython:: python + .. code-block:: python pd.read_hdf(path, 'dfq', where="A>0 or C>0") - .. ipython:: python - :suppress: + .. 
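A minimal sketch of the string query format, assuming PyTables is installed; the file name ``query.h5`` is illustrative:

.. code-block:: python

   import numpy as np
   import pandas as pd

   dfq = pd.DataFrame(
       np.random.randn(10, 4),
       columns=list("ABCD"),
       index=pd.date_range("20130101", periods=10),
   )
   dfq.to_hdf("query.h5", key="dfq", format="table")
   # boolean expressions with an inline column subset
   pd.read_hdf("query.h5", "dfq",
               where="index > Timestamp('20130104') & columns = ['A', 'B']")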
code-block:: python import os os.remove(path) @@ -406,7 +404,7 @@ HDFStore API changes the same defaults as prior < 0.13.0 remain, e.g. ``put`` implies ``fixed`` format and ``append`` implies ``table`` format. This default format can be set as an option by setting ``io.hdf.default_format``. - .. ipython:: python + .. code-block:: python path = 'test.h5' df = pd.DataFrame(np.random.randn(10, 2)) @@ -416,8 +414,7 @@ HDFStore API changes with pd.HDFStore(path) as store: print(store) - .. ipython:: python - :suppress: + .. code-block:: python import os os.remove(path) @@ -435,7 +432,7 @@ HDFStore API changes until they themselves are closed. Performing an action on a closed file will raise ``ClosedFileError`` - .. ipython:: python + .. code-block:: python path = 'test.h5' df = pd.DataFrame(np.random.randn(10, 2)) @@ -451,8 +448,7 @@ HDFStore API changes store2.close() store2 - .. ipython:: python - :suppress: + .. code-block:: python import os os.remove(path) @@ -500,7 +496,7 @@ Enhancements - ``NaN`` handing in get_dummies (:issue:`4446`) with ``dummy_na`` - .. ipython:: python + .. code-block:: python # previously, nan was erroneously counted as 2 here # now it is not counted at all @@ -519,7 +515,7 @@ Enhancements Using the new top-level ``to_timedelta``, you can convert a scalar or array from the standard timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``). - .. ipython:: python + .. code-block:: python pd.to_timedelta('1 days 06:05:01.00003') pd.to_timedelta('15.5us') @@ -531,7 +527,7 @@ Enhancements ``timedelta64[ns]`` object, or astyped to yield a ``float64`` dtyped Series. This is frequency conversion. See :ref:`the docs` for the docs. - .. ipython:: python + .. code-block:: python import datetime td = pd.Series(pd.date_range('20130101', periods=4)) - pd.Series( @@ -550,28 +546,28 @@ Enhancements Dividing or multiplying a ``timedelta64[ns]`` Series by an integer or integer Series - .. ipython:: python + .. code-block:: python td * -1 td * pd.Series([1, 2, 3, 4]) Absolute ``DateOffset`` objects can act equivalently to ``timedeltas`` - .. ipython:: python + .. code-block:: python from pandas import offsets td + offsets.Minute(5) + offsets.Milli(5) Fillna is now supported for timedeltas - .. ipython:: python + .. code-block:: python td.fillna(pd.Timedelta(0)) td.fillna(datetime.timedelta(days=1, seconds=5)) You can do numeric reduction operations on timedeltas. - .. ipython:: python + .. code-block:: python td.mean() td.quantile(.1) @@ -586,8 +582,7 @@ Enhancements - The new vectorized string method ``extract`` return regular expression matches more conveniently. - .. ipython:: python - :okwarning: + .. code-block:: python pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\\d)') @@ -595,8 +590,7 @@ Enhancements with more than one group returns a DataFrame with one column per group. - .. ipython:: python - :okwarning: + .. code-block:: python pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\\d)') @@ -607,16 +601,14 @@ Enhancements Named groups like - .. ipython:: python - :okwarning: + .. code-block:: python pd.Series(['a1', 'b2', 'c3']).str.extract( '(?P[ab])(?P\\d)') and optional groups can also be used. - .. ipython:: python - :okwarning: + .. code-block:: python pd.Series(['a1', 'b2', '3']).str.extract( '(?P[ab])?(?P\\d)') @@ -636,19 +628,19 @@ Enhancements Period conversions in the range of seconds and below were reworked and extended up to nanoseconds. Periods in the nanosecond range are now available. - .. ipython:: python + .. 
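A minimal sketch of reductions and frequency conversion on a ``timedelta64[ns]`` Series; dividing by a ``Timedelta`` scalar assumes pandas 0.15 or later:

.. code-block:: python

   import pandas as pd

   td = pd.Series(pd.to_timedelta(["1 days 02:00:00", "3 days", "4 days 12:00:00"]))
   td.mean()
   td.quantile(0.5)
   td / pd.Timedelta("1 hour")   # express each value as a float number of hours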
code-block:: python pd.date_range('2013-01-01', periods=5, freq='5N') or with frequency as offset - .. ipython:: python + .. code-block:: python pd.date_range('2013-01-01', periods=5, freq=pd.offsets.Nano(5)) Timestamps can be modified in the nanosecond range - .. ipython:: python + .. code-block:: python t = pd.Timestamp('20130101 09:01:02') t + pd.tseries.offsets.Nano(123) @@ -657,7 +649,7 @@ Enhancements To get the rows where any of the conditions are met: - .. ipython:: python + .. code-block:: python dfi = pd.DataFrame({'A': [1, 2, 3, 4], 'B': ['a', 'b', 'f', 'n']}) dfi @@ -696,7 +688,7 @@ Enhancements - DataFrame has a new ``interpolate`` method, similar to Series (:issue:`4434`, :issue:`1892`) - .. ipython:: python + .. code-block:: python df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8], 'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]}) @@ -711,14 +703,14 @@ Enhancements Interpolate now also accepts a ``limit`` keyword argument. This works similar to ``fillna``'s limit: - .. ipython:: python + .. code-block:: python ser = pd.Series([1, 3, np.nan, np.nan, np.nan, 11]) ser.interpolate(limit=2) - Added ``wide_to_long`` panel data convenience function. See :ref:`the docs`. - .. ipython:: python + .. code-block:: python np.random.seed(123) df = pd.DataFrame({"A1970" : {0 : "a", 1 : "b", 2 : "c"}, @@ -750,18 +742,18 @@ Experimental ``numexpr`` behind the scenes. This results in large speedups for complicated expressions involving large DataFrames/Series. For example, - .. ipython:: python + .. code-block:: python nrows, ncols = 20000, 100 df1, df2, df3, df4 = [pd.DataFrame(np.random.randn(nrows, ncols)) for _ in range(4)] - .. ipython:: python + .. code-block:: python # eval with NumExpr backend %timeit pd.eval('df1 + df2 + df3 + df4') - .. ipython:: python + .. code-block:: python # pure Python evaluation %timeit df1 + df2 + df3 + df4 @@ -772,8 +764,7 @@ Experimental ``DataFrame.eval`` method that evaluates an expression in the context of the ``DataFrame``. For example, - .. ipython:: python - :suppress: + .. code-block:: python try: del a # noqa: F821 @@ -785,7 +776,7 @@ Experimental except NameError: pass - .. ipython:: python + .. code-block:: python df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b']) df.eval('a + b') @@ -794,8 +785,7 @@ Experimental you to select elements of a ``DataFrame`` using a natural query syntax nearly identical to Python syntax. For example, - .. ipython:: python - :suppress: + .. code-block:: python try: del a # noqa: F821 @@ -812,7 +802,7 @@ Experimental except NameError: pass - .. ipython:: python + .. code-block:: python n = 20 df = pd.DataFrame(np.random.randint(n, size=(n, 3)), columns=['a', 'b', 'c']) @@ -845,9 +835,7 @@ Experimental for o in pd.read_msgpack('foo.msg', iterator=True): print(o) - .. ipython:: python - :suppress: - :okexcept: + .. code-block:: python os.remove('foo.msg') @@ -931,13 +919,13 @@ to unify methods and behaviors. Series formerly subclassed directly from as an argument. This seems only to affect ``np.ones_like``, ``np.empty_like``, ``np.diff`` and ``np.where``. These now return ``ndarrays``. - .. ipython:: python + .. code-block:: python s = pd.Series([1, 2, 3, 4]) Numpy Usage - .. ipython:: python + .. code-block:: python np.ones_like(s) np.diff(s) @@ -945,7 +933,7 @@ to unify methods and behaviors. Series formerly subclassed directly from Pandonic Usage - .. ipython:: python + .. code-block:: python pd.Series(1, index=s.index) s.diff() @@ -1021,7 +1009,7 @@ to unify methods and behaviors. 
Series formerly subclassed directly from - Refactor of ``_get_numeric_data/_get_bool_data`` to core/generic.py, allowing Series/Panel functionality - ``Series`` (for index) / ``Panel`` (for items) now allow attribute access to its elements (:issue:`1903`) - .. ipython:: python + .. code-block:: python s = pd.Series([1, 2, 3], index=list('abc')) s.b diff --git a/doc/source/whatsnew/v0.13.1.rst b/doc/source/whatsnew/v0.13.1.rst index 249b9555b7fd4..4ccc9318f9e3f 100644 --- a/doc/source/whatsnew/v0.13.1.rst +++ b/doc/source/whatsnew/v0.13.1.rst @@ -29,7 +29,7 @@ Highlights include: This would previously segfault: - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])}) df["A"].iloc[0] = np.nan @@ -37,7 +37,7 @@ Highlights include: The recommended way to do this type of assignment is: - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"A": np.array(["foo", "bar", "bah", "foo", "bar"])}) df.loc[0, "A"] = np.nan @@ -50,7 +50,7 @@ Output formatting enhancements - df.info() now honors the option ``max_info_rows``, to disable null counts for large frames (:issue:`5974`) - .. ipython:: python + .. code-block:: python max_info_rows = pd.get_option("max_info_rows") @@ -63,13 +63,13 @@ Output formatting enhancements ) df.iloc[3:6, [0, 2]] = np.nan - .. ipython:: python + .. code-block:: python # set to not display the null counts pd.set_option("max_info_rows", 0) df.info() - .. ipython:: python + .. code-block:: python # this is the default (same as in 0.13.0) pd.set_option("max_info_rows", max_info_rows) @@ -77,7 +77,7 @@ Output formatting enhancements - Add ``show_dimensions`` display option for the new DataFrame repr to control whether the dimensions print. - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[1, 2], [3, 4]]) pd.set_option("show_dimensions", False) @@ -99,7 +99,7 @@ Output formatting enhancements Now the output looks like: - .. ipython:: python + .. code-block:: python df = pd.DataFrame( [pd.Timestamp("20010101"), pd.Timestamp("20040601")], columns=["age"] @@ -117,7 +117,7 @@ API changes - Added ``Series.str.get_dummies`` vectorized string method (:issue:`6021`), to extract dummy/indicator variables for separated string columns: - .. ipython:: python + .. code-block:: python s = pd.Series(["a", "a|b", np.nan, "a|c"]) s.str.get_dummies(sep="|") @@ -218,7 +218,7 @@ Enhancements - ``MultiIndex.from_product`` convenience function for creating a MultiIndex from the cartesian product of a set of iterables (:issue:`6055`): - .. ipython:: python + .. code-block:: python shades = ["light", "dark"] colors = ["red", "green", "blue"] diff --git a/doc/source/whatsnew/v0.14.0.rst b/doc/source/whatsnew/v0.14.0.rst index b59938a9b9c9b..278a13f42e559 100644 --- a/doc/source/whatsnew/v0.14.0.rst +++ b/doc/source/whatsnew/v0.14.0.rst @@ -57,7 +57,7 @@ API changes values. A single indexer that is out-of-bounds and drops the dimensions of the object will still raise ``IndexError`` (:issue:`6296`, :issue:`6299`). This could result in an empty axis (e.g. an empty DataFrame being returned) - .. ipython:: python + .. code-block:: python dfl = pd.DataFrame(np.random.randn(5, 2), columns=list('AB')) dfl @@ -113,7 +113,7 @@ API changes as :meth:`Index.delete` and :meth:`Index.drop` methods will no longer change the type of the resulting index (:issue:`6440`, :issue:`7040`) - .. ipython:: python + .. 
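A minimal sketch of the type-preserving behavior just described for ``Index.delete`` and ``Index.drop``:

.. code-block:: python

   import pandas as pd

   i = pd.Index([1, 2, 3, "a", "b", "c"])
   i.delete(0)          # drop by position; the result keeps object dtype
   i.drop(["a", "b"])   # drop by label; dtype is likewise preserved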
code-block:: python i = pd.Index([1, 2, 3, 'a', 'b', 'c']) i[[0, 1, 2]] @@ -122,15 +122,14 @@ API changes Previously, the above operation would return ``Int64Index``. If you'd like to do this manually, use :meth:`Index.astype` - .. ipython:: python + .. code-block:: python i[[0, 1, 2]].astype(np.int_) - ``set_index`` no longer converts MultiIndexes to an Index of tuples. For example, the old behavior returned an Index in this case (:issue:`6459`): - .. ipython:: python - :suppress: + .. code-block:: python np.random.seed(1234) from itertools import product @@ -140,7 +139,7 @@ API changes tuple_ind = pd.Index(tuples, tupleize_cols=False) df_multi.index - .. ipython:: python + .. code-block:: python # Old behavior, casted MultiIndex to an Index tuple_ind @@ -152,7 +151,7 @@ API changes This also applies when passing multiple indices to ``set_index``: - .. ipython:: python + .. code-block:: python @suppress df_multi.index = tuple_ind @@ -272,7 +271,7 @@ Display changes The default for ``display.show_dimensions`` will now be ``truncate``. This is consistent with how Series display length. - .. ipython:: python + .. code-block:: python dfd = pd.DataFrame(np.arange(25).reshape(-1, 5), index=[0, 1, 2, 3, 4], @@ -328,7 +327,7 @@ More consistent behavior for some groupby methods: - groupby ``head`` and ``tail`` now act more like ``filter`` rather than an aggregation: - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[1, 2], [1, 4], [5, 6]], columns=['A', 'B']) g = df.groupby('A') @@ -338,7 +337,7 @@ More consistent behavior for some groupby methods: - groupby head and tail respect column selection: - .. ipython:: python + .. code-block:: python g[['B']].head(1) @@ -347,7 +346,7 @@ More consistent behavior for some groupby methods: Reducing - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6]], columns=['A', 'B']) g = df.groupby('A') @@ -361,7 +360,7 @@ More consistent behavior for some groupby methods: Filtering - .. ipython:: python + .. code-block:: python gf = df.groupby('A', as_index=False) gf.nth(0) @@ -370,7 +369,7 @@ More consistent behavior for some groupby methods: - groupby will now not return the grouped column for non-cython functions (:issue:`5610`, :issue:`5614`, :issue:`6732`), as its already the index - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6], [5, 8]], columns=['A', 'B']) g = df.groupby('A') @@ -379,7 +378,7 @@ More consistent behavior for some groupby methods: - passing ``as_index`` will leave the grouped column in-place (this is not change in 0.14.0) - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[1, np.nan], [1, 4], [5, 6], [5, 8]], columns=['A', 'B']) g = df.groupby('A', as_index=False) @@ -426,7 +425,7 @@ To connect with SQLAlchemy you use the :func:`create_engine` function to create object from database URI. You only need to create the engine once per database you are connecting to. For an in-memory sqlite database: -.. ipython:: python +.. code-block:: python from sqlalchemy import create_engine # Create your connection. @@ -434,20 +433,20 @@ connecting to. For an in-memory sqlite database: This ``engine`` can then be used to write or read data to/from this database: -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c']}) df.to_sql('db_table', engine, index=False) You can read data from a database by specifying the table name: -.. ipython:: python +.. 
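A minimal end-to-end sketch, assuming SQLAlchemy is installed; the table name ``db_table`` is illustrative:

.. code-block:: python

   import pandas as pd
   from sqlalchemy import create_engine

   engine = create_engine("sqlite:///:memory:")
   df = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
   df.to_sql("db_table", engine, index=False)
   pd.read_sql_query("SELECT A, B FROM db_table WHERE A > 1", engine)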
code-block:: python pd.read_sql_table('db_table', engine) or by specifying a sql query: -.. ipython:: python +.. code-block:: python pd.read_sql_query('SELECT * FROM db_table', engine) @@ -512,7 +511,7 @@ See also issues (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :is You will need to make sure that the selection axes are fully lexsorted! -.. ipython:: python +.. code-block:: python def mklbl(prefix, n): return ["%s%s" % (prefix, i) for i in range(n)] @@ -532,13 +531,13 @@ See also issues (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :is Basic MultiIndex slicing using slices, lists, and labels. -.. ipython:: python +.. code-block:: python df.loc[(slice('A1', 'A3'), slice(None), ['C1', 'C3']), :] You can use a ``pd.IndexSlice`` to shortcut the creation of these slices -.. ipython:: python +.. code-block:: python idx = pd.IndexSlice df.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] @@ -546,14 +545,14 @@ You can use a ``pd.IndexSlice`` to shortcut the creation of these slices It is possible to perform quite complicated selections using this method on multiple axes at the same time. -.. ipython:: python +.. code-block:: python df.loc['A1', (slice(None), 'foo')] df.loc[idx[:, :, ['C1', 'C3']], idx[:, 'foo']] Using a boolean indexer you can provide selection related to the *values*. -.. ipython:: python +.. code-block:: python mask = df[('a', 'foo')] > 200 df.loc[idx[mask, :, ['C1', 'C3']], idx[:, 'foo']] @@ -561,13 +560,13 @@ Using a boolean indexer you can provide selection related to the *values*. You can also specify the ``axis`` argument to ``.loc`` to interpret the passed slicers on a single axis. -.. ipython:: python +.. code-block:: python df.loc(axis=0)[:, :, ['C1', 'C3']] Furthermore you can *set* the values using these methods -.. ipython:: python +.. code-block:: python df2 = df.copy() df2.loc(axis=0)[:, :, ['C1', 'C3']] = -10 @@ -575,7 +574,7 @@ Furthermore you can *set* the values using these methods You can use a right-hand-side of an alignable object as well. -.. ipython:: python +.. code-block:: python df2 = df.copy() df2.loc[idx[:, :, ['C1', 'C3']], :] = df2 * 1000 @@ -744,7 +743,7 @@ Enhancements - DataFrame and Series will create a MultiIndex object if passed a tuples dict, See :ref:`the docs` (:issue:`3323`) - .. ipython:: python + .. code-block:: python pd.Series({('a', 'b'): 1, ('a', 'a'): 0, ('a', 'c'): 2, ('b', 'a'): 3, ('b', 'b'): 4}) @@ -763,7 +762,7 @@ Enhancements See :ref:`the docs`. Joining MultiIndex DataFrames on both the left and right is not yet supported ATM. - .. ipython:: python + .. code-block:: python household = pd.DataFrame({'household_id': [1, 2, 3], 'male': [0, 1, 0], @@ -822,7 +821,7 @@ Enhancements - :meth:`~DataFrame.describe` now accepts an array of percentiles to include in the summary statistics (:issue:`4196`) - ``pivot_table`` can now accept ``Grouper`` by ``index`` and ``columns`` keywords (:issue:`6913`) - .. ipython:: python + .. code-block:: python import datetime df = pd.DataFrame({ @@ -853,7 +852,7 @@ Enhancements - ``PeriodIndex`` fully supports partial string indexing like ``DatetimeIndex`` (:issue:`7043`) - .. ipython:: python + .. 
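A minimal sketch of preparing a lexsorted ``MultiIndex`` before slicing; ``sort_index`` is the usual way to guarantee the requirement noted above:

.. code-block:: python

   import numpy as np
   import pandas as pd

   mi = pd.MultiIndex.from_product([["A1", "A2"], ["B1", "B2"], ["C1", "C2"]])
   df = pd.DataFrame(np.random.randn(8, 2), index=mi, columns=["x", "y"])
   df = df.sort_index()   # ensure the index is fully lexsorted
   df.loc[pd.IndexSlice[:, "B1", ["C1", "C2"]], "x"]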
code-block:: python prng = pd.period_range('2013-01-01 09:00', periods=100, freq='H') ps = pd.Series(np.random.randn(len(prng)), index=prng) diff --git a/doc/source/whatsnew/v0.14.1.rst b/doc/source/whatsnew/v0.14.1.rst index a8f8955c3c1b9..eb686bf4bee71 100644 --- a/doc/source/whatsnew/v0.14.1.rst +++ b/doc/source/whatsnew/v0.14.1.rst @@ -64,14 +64,13 @@ API changes Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with ``normalize=True`` - .. ipython:: python - :suppress: + .. code-block:: python import pandas.tseries.offsets as offsets d = pd.Timestamp("2014-01-01 09:00") - .. ipython:: python + .. code-block:: python # new behaviour d + offsets.MonthEnd() @@ -122,7 +121,7 @@ Enhancements - Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas. (:issue:`4688`) - .. ipython:: python + .. code-block:: python rng = pd.date_range( "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London" diff --git a/doc/source/whatsnew/v0.15.0.rst b/doc/source/whatsnew/v0.15.0.rst index fc2b070df4392..2395c0cc19999 100644 --- a/doc/source/whatsnew/v0.15.0.rst +++ b/doc/source/whatsnew/v0.15.0.rst @@ -69,8 +69,7 @@ methods to manipulate. Thanks to Jan Schulz for much of this API/implementation. For full docs, see the :ref:`categorical introduction ` and the :ref:`API documentation `. -.. ipython:: python - :okwarning: +.. code-block:: python df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']}) @@ -146,7 +145,7 @@ This type is very similar to how ``Timestamp`` works for ``datetimes``. It is a Construct a scalar -.. ipython:: python +.. code-block:: python pd.Timedelta('1 days 06:05:01.00003') pd.Timedelta('15.5us') @@ -161,7 +160,7 @@ Construct a scalar Access fields for a ``Timedelta`` -.. ipython:: python +.. code-block:: python td = pd.Timedelta('1 hour 3m 15.5us') td.seconds @@ -170,12 +169,11 @@ Access fields for a ``Timedelta`` Construct a ``TimedeltaIndex`` -.. ipython:: python - :suppress: +.. code-block:: python import datetime -.. ipython:: python +.. code-block:: python pd.TimedeltaIndex(['1 days', '1 days, 00:00:05', np.timedelta64(2, 'D'), @@ -183,14 +181,14 @@ Construct a ``TimedeltaIndex`` Constructing a ``TimedeltaIndex`` with a regular range -.. ipython:: python +.. code-block:: python pd.timedelta_range('1 days', periods=5, freq='D') pd.timedelta_range(start='1 days', end='2 days', freq='30T') You can now use a ``TimedeltaIndex`` as the index of a pandas object -.. ipython:: python +.. code-block:: python s = pd.Series(np.arange(5), index=pd.timedelta_range('1 days', periods=5, freq='s')) @@ -198,14 +196,14 @@ You can now use a ``TimedeltaIndex`` as the index of a pandas object You can select with partial string selections -.. ipython:: python +.. code-block:: python s['1 day 00:00:02'] s['1 day':'1 day 00:00:02'] Finally, the combination of ``TimedeltaIndex`` with ``DatetimeIndex`` allow certain combination operations that are ``NaT`` preserving: -.. ipython:: python +.. code-block:: python tdi = pd.TimedeltaIndex(['1 days', pd.NaT, '2 days']) tdi.tolist() @@ -227,7 +225,7 @@ Implemented methods to find memory usage of a DataFrame. See the :ref:`FAQ ` -.. ipython:: python +.. code-block:: python # datetime s = pd.Series(pd.date_range('20130101 09:10:12', periods=4)) @@ -265,13 +263,13 @@ This will return a Series, indexed like the existing Series. See the :ref:`docs This enables nice expressions like this: -.. ipython:: python +.. 
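A minimal sketch of the ``.dt`` accessor on a datetime Series:

.. code-block:: python

   import pandas as pd

   s = pd.Series(pd.date_range("20130101 09:10:12", periods=4))
   s.dt.hour
   s.dt.dayofweek
   s[s.dt.hour >= 9]   # filter rows using a datetime component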
code-block:: python s[s.dt.day == 2] You can easily produce tz aware transformations: -.. ipython:: python +.. code-block:: python stz = s.dt.tz_localize('US/Eastern') stz @@ -279,13 +277,13 @@ You can easily produce tz aware transformations: You can also chain these types of operations: -.. ipython:: python +.. code-block:: python s.dt.tz_localize('UTC').dt.tz_convert('US/Eastern') The ``.dt`` accessor works for period and timedelta dtypes. -.. ipython:: python +.. code-block:: python # period s = pd.Series(pd.period_range('20130101', periods=4, freq='D')) @@ -293,7 +291,7 @@ The ``.dt`` accessor works for period and timedelta dtypes. s.dt.year s.dt.day -.. ipython:: python +.. code-block:: python # timedelta s = pd.Series(pd.timedelta_range('1 day 00:00:05', periods=4, freq='s')) @@ -311,7 +309,7 @@ Timezone handling improvements - ``tz_localize(None)`` for tz-aware ``Timestamp`` and ``DatetimeIndex`` now removes timezone holding local time, previously this resulted in ``Exception`` or ``TypeError`` (:issue:`7812`) - .. ipython:: python + .. code-block:: python ts = pd.Timestamp('2014-08-01 09:00', tz='US/Eastern') ts @@ -347,7 +345,7 @@ Rolling/expanding moments improvements Prior to 0.15.0 - .. ipython:: python + .. code-block:: python s = pd.Series([10, 11, 12, 13]) @@ -407,7 +405,7 @@ Rolling/expanding moments improvements the calculated weighted means (e.g. 'triang', 'gaussian') are distributed about the same means as those calculated without weighting (i.e. 'boxcar'). See :ref:`the note on normalization ` for further details. (:issue:`7618`) - .. ipython:: python + .. code-block:: python s = pd.Series([10.5, 8.8, 11.4, 9.7, 9.3]) @@ -456,7 +454,7 @@ Rolling/expanding moments improvements Prior behavior (note values start at index ``2``, which is ``min_periods`` after index ``0`` (the index of the first non-empty value)): - .. ipython:: python + .. code-block:: python s = pd.Series([1, None, None, None, 2, 3]) @@ -551,7 +549,7 @@ Rolling/expanding moments improvements For example, consider the following pre-0.15.0 results for ``ewmvar(..., bias=False)``, and the corresponding debiasing factors: - .. ipython:: python + .. code-block:: python s = pd.Series([1., 2., 0., 4.]) @@ -665,7 +663,7 @@ Other notable API changes: - Consistency when indexing with ``.loc`` and a list-like indexer when no values are found. - .. ipython:: python + .. code-block:: python df = pd.DataFrame([['a'], ['b']], index=[1, 2]) df @@ -727,8 +725,7 @@ Other notable API changes: Furthermore, ``.loc`` will raise If no values are found in a MultiIndex with a list-like indexer: - .. ipython:: python - :okexcept: + .. code-block:: python s = pd.Series(np.arange(3, dtype='int64'), index=pd.MultiIndex.from_product([['A'], @@ -747,7 +744,7 @@ Other notable API changes: dtype to object (or errored, depending on the call). It now uses ``NaN``: - .. ipython:: python + .. code-block:: python s = pd.Series([1, 2, 3]) s.loc[0] = None @@ -758,7 +755,7 @@ Other notable API changes: For object containers, we now preserve ``None`` values (previously these were converted to ``NaN`` values). - .. ipython:: python + .. code-block:: python s = pd.Series(["a", "b", "c"]) s.loc[0] = None @@ -768,7 +765,7 @@ Other notable API changes: - In prior versions, updating a pandas object inplace would not reflect in other python references to this object. (:issue:`8511`, :issue:`5104`) - .. ipython:: python + .. 
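A minimal sketch of the reference semantics described above, assuming pandas 0.15 or later (later copy-on-write modes may change this again):

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3])
   s2 = s     # a second reference to the same object
   s += 1.5   # dtype-changing inplace operation
   s2         # now reflects the updated float values as well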
code-block:: python s = pd.Series([1, 2, 3]) s2 = s @@ -798,7 +795,7 @@ Other notable API changes: This is now the correct behavior - .. ipython:: python + .. code-block:: python # the original object s @@ -820,7 +817,7 @@ Other notable API changes: In prior versions this would drop the timezone, now it retains the timezone, but gives a column of ``object`` dtype: - .. ipython:: python + .. code-block:: python i = pd.date_range('1/1/2011', periods=3, freq='10s', tz='US/Eastern') i @@ -841,7 +838,7 @@ Other notable API changes: - ``SettingWithCopy`` raise/warnings (according to the option ``mode.chained_assignment``) will now be issued when setting a value on a sliced mixed-dtype DataFrame using chained-assignment. (:issue:`7845`, :issue:`7950`) - .. code-block:: python + .. code-block:: ipython In [1]: df = pd.DataFrame(np.arange(0, 9), columns=['count']) @@ -859,7 +856,7 @@ Other notable API changes: - Previously an enlargement with a mixed-dtype frame would act unlike ``.append`` which will preserve dtypes (related :issue:`2578`, :issue:`8176`): - .. ipython:: python + .. code-block:: python df = pd.DataFrame([[True, 1], [False, 2]], columns=["female", "fitness"]) @@ -983,7 +980,7 @@ Other: - :func:`describe` on mixed-types DataFrames is more flexible. Type-based column filtering is now possible via the ``include``/``exclude`` arguments. See the :ref:`docs ` (:issue:`8164`). - .. ipython:: python + .. code-block:: python df = pd.DataFrame({'catA': ['foo', 'foo', 'bar'] * 8, 'catB': ['a', 'b', 'c', 'd'] * 6, @@ -994,7 +991,7 @@ Other: Requesting all columns is possible with the shorthand 'all' - .. ipython:: python + .. code-block:: python df.describe(include='all') @@ -1006,7 +1003,7 @@ Other: categorical columns are encoded as 0's and 1's, while other columns are left untouched. - .. ipython:: python + .. code-block:: python df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['c', 'c', 'b'], 'C': [1, 2, 3]}) @@ -1018,7 +1015,7 @@ Other: - ``pandas.tseries.holiday.Holiday`` now supports a days_of_week parameter (:issue:`7070`) - ``GroupBy.nth()`` now supports selecting multiple nth values (:issue:`7910`) - .. ipython:: python + .. code-block:: python business_dates = pd.date_range(start='4/1/2014', end='6/30/2014', freq='B') df = pd.DataFrame(1, index=business_dates, columns=['a', 'b']) @@ -1029,7 +1026,7 @@ Other: If ``Period`` freq is ``D``, ``H``, ``T``, ``S``, ``L``, ``U``, ``N``, ``Timedelta``-like can be added if the result can have same freq. Otherwise, only the same ``offsets`` can be added. - .. ipython:: python + .. code-block:: python idx = pd.period_range('2014-07-01 09:00', periods=5, freq='H') idx @@ -1055,7 +1052,7 @@ Other: - :func:`set_names`, :func:`set_labels`, and :func:`set_levels` methods now take an optional ``level`` keyword argument to all modification of specific level(s) of a MultiIndex. Additionally :func:`set_names` now accepts a scalar string value when operating on an ``Index`` or on a specific level of a ``MultiIndex`` (:issue:`7792`) - .. ipython:: python + .. code-block:: python idx = pd.MultiIndex.from_product([['a'], range(3), list("pqr")], names=['foo', 'bar', 'baz']) @@ -1079,7 +1076,7 @@ Other: - ``Index`` now supports ``duplicated`` and ``drop_duplicates``. (:issue:`4060`) - .. ipython:: python + .. 
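A minimal sketch of ``duplicated`` and ``drop_duplicates`` on an ``Index``; the ``keep`` keyword in the last line belongs to later releases (earlier ones used ``take_last``):

.. code-block:: python

   import pandas as pd

   idx = pd.Index(["a", "b", "a", "c", "b"])
   idx.duplicated()              # boolean mask marking repeated labels
   idx.drop_duplicates()         # unique labels, original order preserved
   idx.duplicated(keep="last")   # mark all but the last occurrence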
code-block:: python idx = pd.Index([1, 2, 3, 4, 1, 2]) idx diff --git a/doc/source/whatsnew/v0.15.1.rst b/doc/source/whatsnew/v0.15.1.rst index a1d4f9d14a905..6ff3218ce9fb7 100644 --- a/doc/source/whatsnew/v0.15.1.rst +++ b/doc/source/whatsnew/v0.15.1.rst @@ -21,7 +21,7 @@ API changes - ``s.dt.hour`` and other ``.dt`` accessors will now return ``np.nan`` for missing values (rather than previously -1), (:issue:`8689`) - .. ipython:: python + .. code-block:: python s = pd.Series(pd.date_range("20130101", periods=5, freq="D")) s.iloc[2] = np.nan @@ -42,14 +42,14 @@ API changes current behavior: - .. ipython:: python + .. code-block:: python s.dt.hour - ``groupby`` with ``as_index=False`` will not add erroneous extra columns to result (:issue:`8582`): - .. ipython:: python + .. code-block:: python np.random.seed(2718281) df = pd.DataFrame(np.random.randint(0, 100, (10, 2)), columns=["jim", "joe"]) @@ -70,14 +70,14 @@ API changes current behavior: - .. ipython:: python + .. code-block:: python df.groupby(ts, as_index=False).max() - ``groupby`` will not erroneously exclude columns if the column name conflicts with the grouper name (:issue:`8112`): - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"jim": range(5), "joe": range(5, 10)}) df @@ -96,14 +96,14 @@ API changes current behavior: - .. ipython:: python + .. code-block:: python gr.apply(sum) - Support for slicing with monotonic decreasing indexes, even if ``start`` or ``stop`` is not found in the index (:issue:`7860`): - .. ipython:: python + .. code-block:: python s = pd.Series(["a", "b", "c", "d"], [4, 3, 2, 1]) s @@ -117,7 +117,7 @@ API changes current behavior: - .. ipython:: python + .. code-block:: python s.loc[3.5:1.5] @@ -204,7 +204,7 @@ Enhancements - ``concat`` permits a wider variety of iterables of pandas objects to be passed as the first parameter (:issue:`8645`): - .. ipython:: python + .. code-block:: python from collections import deque @@ -220,13 +220,13 @@ Enhancements current behavior: - .. ipython:: python + .. code-block:: python pd.concat(deque((df1, df2))) - Represent ``MultiIndex`` labels with a dtype that utilizes memory based on the level size. In prior versions, the memory usage was a constant 8 bytes per element in each level. In addition, in prior versions, the *reported* memory usage was incorrect as it didn't show the usage for the memory occupied by the underling data array. (:issue:`8456`) - .. ipython:: python + .. code-block:: python dfi = pd.DataFrame( 1, index=pd.MultiIndex.from_product([["a"], range(1000)]), columns=["A"] @@ -246,7 +246,7 @@ Enhancements current behavior: - .. ipython:: python + .. code-block:: python dfi.memory_usage(index=True) diff --git a/doc/source/whatsnew/v0.15.2.rst b/doc/source/whatsnew/v0.15.2.rst index 2dae76dd6b461..b2f77f6e2a286 100644 --- a/doc/source/whatsnew/v0.15.2.rst +++ b/doc/source/whatsnew/v0.15.2.rst @@ -24,8 +24,7 @@ API changes - Indexing in ``MultiIndex`` beyond lex-sort depth is now supported, though a lexically sorted index will have a better performance. (:issue:`2646`) - .. ipython:: python - :okwarning: + .. code-block:: python df = pd.DataFrame({'jim':[0, 0, 1, 1], 'joe':['x', 'x', 'z', 'y'], @@ -61,7 +60,7 @@ API changes Now, only the categories that do effectively occur in the array are returned: - .. ipython:: python + .. 
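A minimal sketch of the new ``Categorical.unique`` behavior, together with the related ``remove_unused_categories``:

.. code-block:: python

   import pandas as pd

   cat = pd.Categorical(["a", "b", "a"], categories=["a", "b", "c"])
   cat.unique()                     # only categories that actually occur
   cat.remove_unused_categories()   # prune unused categories explicitly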
code-block:: python cat = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b', 'c']) cat.unique() @@ -72,7 +71,7 @@ API changes - Bug in ``NDFrame``: conflicting attribute/column names now behave consistently between getting and setting. Previously, when both a column and attribute named ``y`` existed, ``data.y`` would return the attribute, while ``data.y = z`` would update the column (:issue:`8994`) - .. ipython:: python + .. code-block:: python data = pd.DataFrame({'x': [1, 2, 3]}) data.y = 2 @@ -94,7 +93,7 @@ API changes New behavior: - .. ipython:: python + .. code-block:: python data.y data['y'].values @@ -121,7 +120,7 @@ API changes New behavior: - .. ipython:: python + .. code-block:: python s = pd.Series(np.arange(3), ['a', 'b', 'c']) s.loc['c':'a':-1] @@ -153,8 +152,7 @@ Other enhancements: - ``Series.all`` and ``Series.any`` now support the ``level`` and ``skipna`` parameters (:issue:`8302`): - .. ipython:: python - :okwarning: + .. code-block:: python s = pd.Series([False, True, False], index=[0, 0, 1]) s.any(level=0) diff --git a/doc/source/whatsnew/v0.16.0.rst b/doc/source/whatsnew/v0.16.0.rst index 8d0d6854cbf85..4c0b10cb315ec 100644 --- a/doc/source/whatsnew/v0.16.0.rst +++ b/doc/source/whatsnew/v0.16.0.rst @@ -51,7 +51,7 @@ to be inserted (for example, a ``Series`` or NumPy array), or a function of one argument to be called on the ``DataFrame``. The new values are inserted, and the entire DataFrame (with all original and new columns) is returned. -.. ipython:: python +.. code-block:: python iris = pd.read_csv('data/iris.data') iris.head() @@ -61,7 +61,7 @@ and the entire DataFrame (with all original and new columns) is returned. Above was an example of inserting a precomputed value. We can also pass in a function to be evaluated. -.. ipython:: python +.. code-block:: python iris.assign(sepal_ratio=lambda x: (x['SepalWidth'] / x['SepalLength'])).head() @@ -70,7 +70,7 @@ The power of ``assign`` comes when used in chains of operations. For example, we can limit the DataFrame to just those with a Sepal Length greater than 5, calculate the ratio, and plot -.. ipython:: python +.. code-block:: python iris = pd.read_csv('data/iris.data') (iris.query('SepalLength > 5') @@ -146,7 +146,7 @@ String methods enhancements ``find()`` ``rfind()`` ``ljust()`` ``rjust()`` ``zfill()`` ============= ============= ============= =============== =============== - .. ipython:: python + .. code-block:: python s = pd.Series(['abcd', '3456', 'EFGH']) s.str.isalpha() @@ -154,14 +154,14 @@ String methods enhancements - :meth:`Series.str.pad` and :meth:`Series.str.center` now accept ``fillchar`` option to specify filling character (:issue:`9352`) - .. ipython:: python + .. code-block:: python s = pd.Series(['12', '300', '25']) s.str.pad(5, fillchar='_') - Added :meth:`Series.str.slice_replace`, which previously raised ``NotImplementedError`` (:issue:`8888`) - .. ipython:: python + .. code-block:: python s = pd.Series(['ABCD', 'EFGH', 'IJK']) s.str.slice_replace(1, 3, 'X') @@ -175,7 +175,7 @@ Other enhancements - Reindex now supports ``method='nearest'`` for frames or series with a monotonic increasing or decreasing index (:issue:`9258`): - .. ipython:: python + .. code-block:: python df = pd.DataFrame({'x': range(5)}) df.reindex([0.2, 1.8, 3.5], method='nearest') @@ -243,7 +243,7 @@ Previous behavior New behavior -.. ipython:: python +.. code-block:: python t = pd.Timedelta('1 day, 10:11:12.100123') t.days @@ -252,7 +252,7 @@ New behavior Using ``.components`` allows the full component access -.. 
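A minimal sketch of component access on a ``Timedelta`` scalar:

.. code-block:: python

   import pandas as pd

   t = pd.Timedelta("1 day, 10:11:12.100123")
   t.components          # named tuple: days, hours, ..., nanoseconds
   t.components.hours
   t.total_seconds()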
ipython:: python +.. code-block:: python t.components t.components.seconds @@ -266,7 +266,7 @@ The behavior of a small sub-set of edge cases for using ``.loc`` have changed (: - Slicing with ``.loc`` where the start and/or stop bound is not found in the index is now allowed; this previously would raise a ``KeyError``. This makes the behavior the same as ``.ix`` in this case. This change is only for slicing, not when indexing with a single label. - .. ipython:: python + .. code-block:: python df = pd.DataFrame(np.random.randn(5, 4), columns=list('ABCD'), @@ -287,7 +287,7 @@ The behavior of a small sub-set of edge cases for using ``.loc`` have changed (: New behavior - .. ipython:: python + .. code-block:: python df.loc['2013-01-02':'2013-01-10'] s.loc[-10:3] @@ -367,7 +367,7 @@ Previous behavior New behavior -.. ipython:: python +.. code-block:: python s = pd.Series([0, 1, 2], dtype='category') s @@ -491,7 +491,7 @@ Other API changes New behavior - .. ipython:: python + .. code-block:: python p = pd.Series([0, 1]) p / 0 @@ -511,7 +511,7 @@ Other API changes Fixed behavior: - .. ipython:: python + .. code-block:: python pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02') @@ -674,7 +674,7 @@ Bug fixes The following would previously report a ``SettingWithCopy`` Warning. - .. ipython:: python + .. code-block:: python df1 = pd.DataFrame({'x': pd.Series(['a', 'b', 'c']), 'y': pd.Series(['d', 'e', 'f'])}) diff --git a/doc/source/whatsnew/v0.16.1.rst b/doc/source/whatsnew/v0.16.1.rst index 269854111373f..74fbc5da37839 100644 --- a/doc/source/whatsnew/v0.16.1.rst +++ b/doc/source/whatsnew/v0.16.1.rst @@ -181,7 +181,7 @@ total number or rows or columns. It also has options for sampling with or withou for passing in a column for weights for non-uniform sampling, and for setting seed values to facilitate replication. (:issue:`2419`) -.. ipython:: python +.. code-block:: python example_series = pd.Series([0, 1, 2, 3, 4, 5]) @@ -207,7 +207,7 @@ facilitate replication. (:issue:`2419`) When applied to a DataFrame, one may pass the name of a column to specify sampling weights when sampling from rows. -.. ipython:: python +.. code-block:: python df = pd.DataFrame({"col1": [9, 8, 7, 6], "weight_column": [0.5, 0.4, 0.1, 0]}) df.sample(n=3, weights="weight_column") @@ -226,7 +226,7 @@ enhancements make string operations easier and more consistent with standard pyt The ``.str`` accessor is now available for both ``Series`` and ``Index``. - .. ipython:: python + .. code-block:: python idx = pd.Index([" jack", "jill ", " jesse ", "frank"]) idx.str.strip() @@ -235,7 +235,7 @@ enhancements make string operations easier and more consistent with standard pyt will return a ``np.array`` instead of a boolean ``Index`` (:issue:`8875`). This enables the following expression to work naturally: - .. ipython:: python + .. code-block:: python idx = pd.Index(["a1", "a2", "b1", "b2"]) s = pd.Series(range(4), index=idx) @@ -254,7 +254,7 @@ enhancements make string operations easier and more consistent with standard pyt - ``split`` now takes ``expand`` keyword to specify whether to expand dimensionality. ``return_type`` is deprecated. (:issue:`9847`) - .. ipython:: python + .. code-block:: python s = pd.Series(["a,b", "a,c", "b,c"]) @@ -283,7 +283,7 @@ Other enhancements - ``BusinessHour`` offset is now supported, which represents business hours starting from 09:00 - 17:00 on ``BusinessDay`` by default. See :ref:`Here ` for details. (:issue:`7905`) - .. ipython:: python + .. 
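A minimal sketch of ``BusinessHour`` with custom opening hours; the start and end times are illustrative:

.. code-block:: python

   import pandas as pd
   from pandas.tseries.offsets import BusinessHour

   bh = BusinessHour(start="10:00", end="18:00")
   pd.Timestamp("2014-08-01 17:30") + bh   # spills over into the next business day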
code-block:: python pd.Timestamp("2014-08-01 09:00") + pd.tseries.offsets.BusinessHour() pd.Timestamp("2014-08-01 07:00") + pd.tseries.offsets.BusinessHour() @@ -297,7 +297,7 @@ Other enhancements - ``drop`` function can now accept ``errors`` keyword to suppress ``ValueError`` raised when any of label does not exist in the target data. (:issue:`6736`) - .. ipython:: python + .. code-block:: python df = pd.DataFrame(np.random.randn(3, 3), columns=["A", "B", "C"]) df.drop(["A", "X"], axis=1, errors="ignore") @@ -379,7 +379,7 @@ Previous behavior New behavior -.. ipython:: python +.. code-block:: python pd.set_option("display.width", 80) pd.Index(range(4), name="foo") diff --git a/doc/source/whatsnew/v0.16.2.rst b/doc/source/whatsnew/v0.16.2.rst index 37e8c64ea9ced..514be342c67c5 100644 --- a/doc/source/whatsnew/v0.16.2.rst +++ b/doc/source/whatsnew/v0.16.2.rst @@ -61,7 +61,7 @@ In the example above, the functions ``f``, ``g``, and ``h`` each expected the Da When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple of ``(function, keyword)`` indicating where the DataFrame should flow. For example: -.. ipython:: python +.. code-block:: python import statsmodels.formula.api as sm diff --git a/doc/source/whatsnew/v0.17.0.rst b/doc/source/whatsnew/v0.17.0.rst index d8f39a7d6e3c0..498663b3f4130 100644 --- a/doc/source/whatsnew/v0.17.0.rst +++ b/doc/source/whatsnew/v0.17.0.rst @@ -78,7 +78,7 @@ number rows. See the :ref:`docs ` for more details. The new implementation allows for having a single-timezone across all rows, with operations in a performant manner. -.. ipython:: python +.. code-block:: python df = pd.DataFrame( { @@ -90,14 +90,14 @@ The new implementation allows for having a single-timezone across all rows, with df df.dtypes -.. ipython:: python +.. code-block:: python df.B df.B.dt.tz_localize(None) This uses a new-dtype representation as well, that is very similar in look-and-feel to its numpy cousin ``datetime64[ns]`` -.. ipython:: python +.. code-block:: python df["B"].dtype type(df["B"].dtype) @@ -121,7 +121,7 @@ This uses a new-dtype representation as well, that is very similar in look-and-f New behavior: - .. ipython:: python + .. code-block:: python pd.date_range("20130101", periods=3, tz="US/Eastern") pd.date_range("20130101", periods=3, tz="US/Eastern").dtype @@ -161,8 +161,7 @@ The Series and DataFrame ``.plot()`` method allows for customizing :ref:`plot ty To alleviate this issue, we have added a new, optional plotting interface, which exposes each kind of plot as a method of the ``.plot`` attribute. Instead of writing ``series.plot(kind=, ...)``, you can now also use ``series.plot.(...)``: -.. ipython:: - :verbatim: +.. code-block:: ipython In [13]: df = pd.DataFrame(np.random.rand(10, 2), columns=['a', 'b']) @@ -172,8 +171,7 @@ To alleviate this issue, we have added a new, optional plotting interface, which As a result of this change, these methods are now all discoverable via tab-completion: -.. ipython:: - :verbatim: +.. code-block:: ipython In [15]: df.plot. # noqa: E225, E999 df.plot.area df.plot.barh df.plot.density df.plot.hist df.plot.line df.plot.scatter @@ -191,14 +189,14 @@ Series.dt.strftime We are now supporting a ``Series.dt.strftime`` method for datetime-likes to generate a formatted string (:issue:`10110`). Examples: -.. ipython:: python +.. code-block:: python # DatetimeIndex s = pd.Series(pd.date_range("20130101", periods=4)) s s.dt.strftime("%Y/%m/%d") -.. ipython:: python +.. 
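A minimal sketch of ``Series.dt.strftime``; any standard ``strftime`` directive is accepted:

.. code-block:: python

   import pandas as pd

   s = pd.Series(pd.date_range("2013-01-01", periods=3))
   s.dt.strftime("%Y/%m/%d")
   s.dt.strftime("%B %d, %Y")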
code-block:: python # PeriodIndex s = pd.Series(pd.period_range("20130101", periods=4)) @@ -212,7 +210,7 @@ Series.dt.total_seconds ``pd.Series`` of type ``timedelta64`` has new method ``.dt.total_seconds()`` returning the duration of the timedelta in seconds (:issue:`10817`) -.. ipython:: python +.. code-block:: python # TimedeltaIndex s = pd.Series(pd.timedelta_range("1 minutes", periods=4)) @@ -228,7 +226,7 @@ Period frequency enhancement A multiplied freq represents a span of corresponding length. The example below creates a period of 3 days. Addition and subtraction will shift the period by its span. -.. ipython:: python +.. code-block:: python p = pd.Period("2015-08-01", freq="3D") p @@ -239,7 +237,7 @@ A multiplied freq represents a span of corresponding length. The example below c You can use the multiplied freq in ``PeriodIndex`` and ``period_range``. -.. ipython:: python +.. code-block:: python idx = pd.period_range("2015-08-01", periods=4, freq="2D") idx @@ -295,7 +293,7 @@ in the ``header`` and ``index_col`` parameters (:issue:`4679`) See the :ref:`documentation ` for more details. -.. ipython:: python +.. code-block:: python df = pd.DataFrame( [[1, 2, 3, 4], [5, 6, 7, 8]], @@ -311,8 +309,7 @@ See the :ref:`documentation ` for more details. df = pd.read_excel("test.xlsx", header=[0, 1], index_col=[0, 1]) df -.. ipython:: python - :suppress: +.. code-block:: python import os @@ -360,14 +357,14 @@ Some East Asian countries use Unicode characters its width is corresponding to 2 - ``display.unicode.east_asian_width``: Whether to use the Unicode East Asian Width to calculate the display text width. (:issue:`2612`) - ``display.unicode.ambiguous_as_wide``: Whether to handle Unicode characters belong to Ambiguous as Wide. (:issue:`11102`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame({u"国籍": ["UK", u"日本"], u"名前": ["Alice", u"しのぶ"]}) df; .. image:: ../_static/option_unicode01.png -.. ipython:: python +.. code-block:: python pd.set_option("display.unicode.east_asian_width", True) df; @@ -376,8 +373,7 @@ Some East Asian countries use Unicode characters its width is corresponding to 2 For further details, see :ref:`here ` -.. ipython:: python - :suppress: +.. code-block:: python pd.set_option("display.unicode.east_asian_width", False) @@ -397,7 +393,7 @@ Other enhancements Merge key in both frames ``both`` =================================== ================ - .. ipython:: python + .. code-block:: python df1 = pd.DataFrame({"col1": [0, 1], "col_left": ["a", "b"]}) df2 = pd.DataFrame({"col1": [1, 2, 2], "col_right": [2, 2, 2]}) @@ -413,7 +409,7 @@ Other enhancements - ``pd.concat`` will now use existing Series names if provided (:issue:`10698`). - .. ipython:: python + .. code-block:: python foo = pd.Series([1, 2], name="foo") bar = pd.Series([1, 2]) @@ -431,7 +427,7 @@ Other enhancements New behavior: - .. ipython:: python + .. code-block:: python pd.concat([foo, bar, baz], 1) @@ -439,14 +435,14 @@ Other enhancements - Add a ``limit_direction`` keyword argument that works with ``limit`` to enable ``interpolate`` to fill ``NaN`` values forward, backward, or both (:issue:`9218`, :issue:`10420`, :issue:`11115`) - .. ipython:: python + .. code-block:: python ser = pd.Series([np.nan, np.nan, 5, np.nan, np.nan, np.nan, 13]) ser.interpolate(limit=1, limit_direction="both") - Added a ``DataFrame.round`` method to round the values to a variable number of decimal places (:issue:`10568`). - .. ipython:: python + .. 
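A minimal sketch of ``DataFrame.round``, including the dict form for per-column precision:

.. code-block:: python

   import numpy as np
   import pandas as pd

   df = pd.DataFrame(np.random.random((3, 3)), columns=["A", "B", "C"])
   df.round(2)                  # uniform precision
   df.round({"A": 0, "C": 3})   # per-column decimal places via a dict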
code-block:: python df = pd.DataFrame( np.random.random([3, 3]), @@ -459,7 +455,7 @@ Other enhancements - ``drop_duplicates`` and ``duplicated`` now accept a ``keep`` keyword to target first, last, and all duplicates. The ``take_last`` keyword is deprecated, see :ref:`here ` (:issue:`6511`, :issue:`8505`) - .. ipython:: python + .. code-block:: python s = pd.Series(["A", "B", "C", "A", "B", "D"]) s.drop_duplicates() @@ -468,14 +464,14 @@ Other enhancements - Reindex now has a ``tolerance`` argument that allows for finer control of :ref:`basics.limits_on_reindex_fill` (:issue:`10411`): - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"x": range(5), "t": pd.date_range("2000-01-01", periods=5)}) df.reindex([0.1, 1.9, 3.5], method="nearest", tolerance=0.2) When used on a ``DatetimeIndex``, ``TimedeltaIndex`` or ``PeriodIndex``, ``tolerance`` will coerced into a ``Timedelta`` if possible. This allows you to specify tolerance with a string: - .. ipython:: python + .. code-block:: python df = df.set_index("t") df.reindex(pd.to_datetime(["1999-12-31"]), method="nearest", tolerance="1 day") @@ -630,13 +626,13 @@ New behavior: Of course you can coerce this as well. -.. ipython:: python +.. code-block:: python pd.to_datetime(["2009-07-31", "asd"], errors="coerce") To keep the previous behavior, you can use ``errors='ignore'``: -.. ipython:: python +.. code-block:: python pd.to_datetime(["2009-07-31", "asd"], errors="ignore") @@ -670,7 +666,7 @@ v0.17.0 can parse them as below. It works on ``DatetimeIndex`` also. New behavior: -.. ipython:: python +.. code-block:: python pd.Timestamp("2012Q2") pd.Timestamp("2014") @@ -680,7 +676,7 @@ New behavior: If you want to perform calculations based on today's date, use ``Timestamp.now()`` and ``pandas.tseries.offsets``. - .. ipython:: python + .. code-block:: python import pandas.tseries.offsets as offsets @@ -724,14 +720,13 @@ New behavior: Note that this is different from the ``numpy`` behavior where a comparison can be broadcast: -.. ipython:: python +.. code-block:: python np.array([1, 2, 3]) == np.array([1]) or it can return False if broadcasting can not be done: -.. ipython:: python - :okwarning: +.. code-block:: python np.array([1, 2, 3]) == np.array([1, 2]) @@ -740,7 +735,7 @@ Changes to boolean comparisons vs. None Boolean comparisons of a ``Series`` vs ``None`` will now be equivalent to comparing with ``np.nan``, rather than raise ``TypeError``. (:issue:`1079`). -.. ipython:: python +.. code-block:: python s = pd.Series(range(3)) s.iloc[1] = None @@ -755,13 +750,13 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python s == None Usually you simply want to know which values are null. -.. ipython:: python +.. code-block:: python s.isnull() @@ -770,7 +765,7 @@ Usually you simply want to know which values are null. You generally will want to use ``isnull/notnull`` for these types of comparisons, as ``isnull/notnull`` tells you which elements are null. One has to be mindful that ``nan's`` don't compare equal, but ``None's`` do. Note that pandas/numpy uses the fact that ``np.nan != np.nan``, and treats ``None`` like ``np.nan``. - .. ipython:: python + .. code-block:: python None == None np.nan == np.nan @@ -784,7 +779,7 @@ The default behavior for HDFStore write functions with ``format='table'`` is now Previous behavior: -.. ipython:: python +.. code-block:: python df_with_missing = pd.DataFrame( {"col1": [0, np.nan, 2], "col2": [1, np.nan, np.nan]} @@ -811,14 +806,13 @@ Previous behavior: New behavior: -.. 
ipython:: python +.. code-block:: python df_with_missing.to_hdf("file.h5", "df_with_missing", format="table", mode="w") pd.read_hdf("file.h5", "df_with_missing") -.. ipython:: python - :suppress: +.. code-block:: python import os @@ -851,7 +845,7 @@ did not work for values with standard formatting. It was also out of step with h Going forward the value of ``display.precision`` will directly control the number of places after the decimal, for regular formatting as well as scientific notation, similar to how numpy's ``precision`` print option works. -.. ipython:: python +.. code-block:: python pd.set_option("display.precision", 2) pd.DataFrame({"x": [123.456789]}) @@ -859,8 +853,7 @@ regular formatting as well as scientific notation, similar to how numpy's ``prec To preserve output behavior with prior versions the default value of ``display.precision`` has been reduced to ``6`` from ``7``. -.. ipython:: python - :suppress: +.. code-block:: python pd.set_option("display.precision", 6) @@ -874,7 +867,7 @@ Changes to ``Categorical.unique`` - unordered category: values and categories are sorted by appearance order. - ordered category: values are sorted by appearance order, categories keep existing order. -.. ipython:: python +.. code-block:: python cat = pd.Categorical(["C", "A", "B", "C"], categories=["A", "B", "C"], ordered=True) cat @@ -979,7 +972,7 @@ Removal of prior version deprecations/changes - Removal of ``colSpace`` parameter from ``DataFrame.to_string()``, in favor of ``col_space``, circa 0.8.0 version. - Removal of automatic time-series broadcasting (:issue:`2304`) - .. ipython:: python + .. code-block:: python np.random.seed(1234) df = pd.DataFrame( @@ -1007,7 +1000,7 @@ Removal of prior version deprecations/changes Current - .. ipython:: python + .. code-block:: python df.add(df.A, axis="index") diff --git a/doc/source/whatsnew/v0.17.1.rst b/doc/source/whatsnew/v0.17.1.rst index 6b0a28ec47568..2db4a9c201e95 100644 --- a/doc/source/whatsnew/v0.17.1.rst +++ b/doc/source/whatsnew/v0.17.1.rst @@ -49,7 +49,7 @@ an instance of :class:`~pandas.core.style.Styler` with your data attached. Here's a quick example: - .. ipython:: python + .. code-block:: python np.random.seed(123) df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde")) @@ -78,7 +78,7 @@ Enhancements - Added ``axvlines_kwds`` to parallel coordinates plot (:issue:`10709`) - Option to ``.info()`` and ``.memory_usage()`` to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:`11595`) - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"A": ["foo"] * 1000}) # noqa: F821 df["B"] = df["A"].astype("category") @@ -91,13 +91,13 @@ Enhancements - ``Index`` now has a ``fillna`` method (:issue:`10089`) - .. ipython:: python + .. code-block:: python pd.Index([1, np.nan, 3]).fillna(2) - Series of type ``category`` now make ``.str.<...>`` and ``.dt.<...>`` accessor methods / properties available, if the categories are of that type. (:issue:`10661`) - .. ipython:: python + .. code-block:: python s = pd.Series(list("aabb")).astype("category") s diff --git a/doc/source/whatsnew/v0.18.0.rst b/doc/source/whatsnew/v0.18.0.rst index 829c04dac9f2d..07a078eee18e5 100644 --- a/doc/source/whatsnew/v0.18.0.rst +++ b/doc/source/whatsnew/v0.18.0.rst @@ -56,7 +56,7 @@ Window functions are now methods Window functions have been refactored to be methods on ``Series/DataFrame`` objects, rather than top-level functions, which are now deprecated. 
This allows these window-type functions, to have a similar API to that of ``.groupby``. See the full documentation :ref:`here ` (:issue:`11603`, :issue:`12373`) -.. ipython:: python +.. code-block:: python np.random.seed(1234) df = pd.DataFrame({'A': range(10), 'B': np.random.randn(10)}) @@ -84,15 +84,16 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python r = df.rolling(window=3) These show a descriptive repr -.. ipython:: python +.. code-block:: python r + with tab-completion of available methods and properties. .. code-block:: ipython @@ -103,19 +104,19 @@ with tab-completion of available methods and properties. The methods operate on the ``Rolling`` object itself -.. ipython:: python +.. code-block:: python r.mean() They provide getitem accessors -.. ipython:: python +.. code-block:: python r['A'].mean() And multiple aggregations -.. ipython:: python +.. code-block:: python r.agg({'A': ['mean', 'std'], 'B': ['mean', 'std']}) @@ -128,12 +129,12 @@ Changes to rename ``Series.rename`` and ``NDFrame.rename_axis`` can now take a scalar or list-like argument for altering the Series or axis *name*, in addition to their old behaviors of altering labels. (:issue:`9494`, :issue:`11965`) -.. ipython:: python +.. code-block:: python s = pd.Series(np.random.randn(5)) s.rename('newname') -.. ipython:: python +.. code-block:: python df = pd.DataFrame(np.random.randn(5, 2)) (df.rename_axis("indexname") @@ -170,7 +171,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python s = pd.Series(range(1000)) s.index @@ -209,20 +210,20 @@ Currently the default is ``expand=None`` which gives a ``FutureWarning`` and use Extracting a regular expression with one group returns a Series if ``expand=False``. -.. ipython:: python +.. code-block:: python pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=False) It returns a ``DataFrame`` with one column if ``expand=True``. -.. ipython:: python +.. code-block:: python pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=True) Calling on an ``Index`` with a regex with exactly one capture group returns an ``Index`` if ``expand=False``. -.. ipython:: python +.. code-block:: python s = pd.Series(["a1", "b2", "c3"], ["A11", "B22", "C33"]) s.index @@ -230,7 +231,7 @@ returns an ``Index`` if ``expand=False``. It returns a ``DataFrame`` with one column if ``expand=True``. -.. ipython:: python +.. code-block:: python s.index.str.extract("(?P[a-zA-Z])", expand=True) @@ -244,7 +245,7 @@ raises ``ValueError`` if ``expand=False``. It returns a ``DataFrame`` if ``expand=True``. -.. ipython:: python +.. code-block:: python s.index.str.extract("(?P[a-zA-Z])([0-9]+)", expand=True) @@ -261,7 +262,7 @@ The :ref:`.str.extractall ` method was added (:issue:`11386`). Unlike ``extract``, which returns only the first match. -.. ipython:: python +.. code-block:: python s = pd.Series(["a1a2", "b1", "c1"], ["A", "B", "C"]) s @@ -269,7 +270,7 @@ match. The ``extractall`` method returns all matches. -.. ipython:: python +.. code-block:: python s.str.extractall(r"(?P[ab])(?P\d)") @@ -282,7 +283,7 @@ The method ``.str.cat()`` concatenates the members of a ``Series``. Before, if ` A new, friendlier ``ValueError`` is added to protect against the mistake of supplying the ``sep`` as an arg, rather than as a kwarg. (:issue:`11334`). -.. ipython:: python +.. 
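code-block:: python

    # Illustrative aside, not part of the original release note: the
    # first positional argument of ``str.cat`` is ``others``, so passing
    # the separator positionally is the mistake the new ValueError
    # guards against.
    # pd.Series(['a', 'b']).str.cat(' ')  # now raises ValueError

+.. 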
code-block:: python pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ') pd.Series(['a', 'b', np.nan, 'c']).str.cat(sep=' ', na_rep='?') @@ -302,7 +303,7 @@ Datetimelike rounding Naive datetimes -.. ipython:: python +.. code-block:: python dr = pd.date_range('20130101 09:12:56.1234', periods=3) dr @@ -314,7 +315,7 @@ Naive datetimes Tz-aware are rounded, floored and ceiled in local times -.. ipython:: python +.. code-block:: python dr = dr.tz_localize('US/Eastern') dr @@ -322,7 +323,7 @@ Tz-aware are rounded, floored and ceiled in local times Timedeltas -.. ipython:: python +.. code-block:: python t = pd.timedelta_range('1 days 2 hr 13 min 45 us', periods=3, freq='d') t @@ -335,7 +336,7 @@ Timedeltas In addition, ``.round()``, ``.floor()`` and ``.ceil()`` will be available through the ``.dt`` accessor of ``Series``. -.. ipython:: python +.. code-block:: python s = pd.Series(dr) s @@ -371,7 +372,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python s = pd.Series([1, 2, 3], index=np.arange(3.)) s @@ -408,7 +409,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'a': [0, 1, 1], 'b': pd.Series([100, 200, 300], dtype='uint32')}) @@ -445,7 +446,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df = pd.DataFrame(np.array(range(1,10)).reshape(3,3), columns=list('abc'), @@ -550,7 +551,7 @@ are now also defined for ``NaT`` (:issue:`11564`). ``NaT`` now supports arithmetic operations with integers and floats. -.. ipython:: python +.. code-block:: python pd.NaT * 1 pd.NaT * 1.5 @@ -559,7 +560,7 @@ are now also defined for ``NaT`` (:issue:`11564`). ``NaT`` defines more arithmetic operations with ``datetime64[ns]`` and ``timedelta64[ns]``. -.. ipython:: python +.. code-block:: python pd.NaT / pd.NaT pd.Timedelta('1s') / pd.NaT @@ -568,7 +569,7 @@ are now also defined for ``NaT`` (:issue:`11564`). Given the ambiguity, it is treated as a ``timedelta64[ns]``, which allows more operations to succeed. -.. ipython:: python +.. code-block:: python pd.NaT + pd.NaT @@ -591,19 +592,19 @@ the ``dtype`` information is respected. TypeError: can only operate on a datetimes for subtraction, but the operator [__add__] was passed -.. ipython:: python +.. code-block:: python pd.Series([pd.NaT], dtype='`, ``.resample(...)`` is changing to have a more groupby-like API. (:issue:`11732`, :issue:`12702`, :issue:`12202`, :issue:`12332`, :issue:`12334`, :issue:`12348`, :issue:`12448`). -.. ipython:: python +.. code-block:: python np.random.seed(1234) df = pd.DataFrame(np.random.rand(10,4), @@ -761,8 +762,7 @@ You could also specify a ``how`` directly Now, you can write ``.resample(..)`` as a 2-stage operation like ``.groupby(...)``, which yields a ``Resampler``. -.. ipython:: python - :okwarning: +.. code-block:: python r = df.resample('2s') r @@ -773,29 +773,29 @@ Downsampling You can then use this object to perform operations. These are downsampling operations (going from a higher frequency to a lower one). -.. ipython:: python +.. code-block:: python r.mean() -.. ipython:: python +.. code-block:: python r.sum() Furthermore, resample now supports ``getitem`` operations to perform the resample on specific columns. -.. ipython:: python +.. code-block:: python r[['A','C']].mean() and ``.aggregate`` type operations. -.. ipython:: python +.. code-block:: python r.agg({'A' : 'mean', 'B' : 'sum'}) These accessors can of course, be combined -.. ipython:: python +.. 
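code-block:: python

    # Illustrative aside, not part of the original release note: the
    # same combination works on a single selected column, using the
    # ``r`` defined above.
    r['A'].agg(['mean', 'sum'])

+.. 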
code-block:: python r[['A','B']].agg(['mean','sum']) @@ -808,7 +808,7 @@ Upsampling operations take you from a lower frequency to a higher frequency. The performed with the ``Resampler`` objects with :meth:`~Resampler.backfill`, :meth:`~Resampler.ffill`, :meth:`~Resampler.fillna` and :meth:`~Resampler.asfreq` methods. -.. ipython:: python +.. code-block:: python s = pd.Series(np.arange(5, dtype='int64'), index=pd.date_range('2010-01-01', periods=5, freq='Q')) @@ -837,7 +837,7 @@ Previously New API -.. ipython:: python +.. code-block:: python s.resample('M').ffill() @@ -891,7 +891,7 @@ Previous API will work but with deprecations The new API will: - .. ipython:: python + .. code-block:: python df.resample('2s').min() @@ -900,7 +900,7 @@ Previous API will work but with deprecations To replicate the original operation - .. ipython:: python + .. code-block:: python df.resample('2s').mean().min() @@ -910,13 +910,12 @@ Changes to eval In prior versions, new columns assignments in an ``eval`` expression resulted in an inplace change to the ``DataFrame``. (:issue:`9297`, :issue:`8664`, :issue:`10486`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'a': np.linspace(0, 10, 5), 'b': range(5)}) df -.. ipython:: python - :suppress: +.. code-block:: python df.eval('c = a + b', inplace=True) @@ -938,7 +937,7 @@ in an inplace change to the ``DataFrame``. (:issue:`9297`, :issue:`8664`, :issue In version 0.18.0, a new ``inplace`` keyword was added to choose whether the assignment should be done inplace or return a copy. -.. ipython:: python +.. code-block:: python df df.eval('d = c - b', inplace=False) @@ -954,7 +953,7 @@ assignment should be done inplace or return a copy. The ``inplace`` keyword parameter was also added the ``query`` method. -.. ipython:: python +.. code-block:: python df.query('a > 5') df.query('a > 5', inplace=True) @@ -969,7 +968,7 @@ The ``inplace`` keyword parameter was also added the ``query`` method. assignments. These expressions will be evaluated one at a time in order. Only assignments are valid for multi-line expressions. -.. ipython:: python +.. code-block:: python df df.eval(""" @@ -985,7 +984,7 @@ Other API changes ^^^^^^^^^^^^^^^^^ - ``DataFrame.between_time`` and ``Series.between_time`` now only parse a fixed set of time strings. Parsing of date strings is no longer supported and raises a ``ValueError``. (:issue:`11818`) - .. ipython:: python + .. code-block:: python s = pd.Series(range(10), pd.date_range('2015-01-01', freq='H', periods=10)) s.between_time("7:00am", "9:00am") @@ -1067,7 +1066,7 @@ Removal of deprecated float indexers In :issue:`4892` indexing with floating point numbers on a non-``Float64Index`` was deprecated (in version 0.14.0). In 0.18.0, this deprecation warning is removed and these will now raise a ``TypeError``. (:issue:`12165`, :issue:`12333`) -.. ipython:: python +.. code-block:: python s = pd.Series([1, 2, 3], index=[4, 5, 6]) s @@ -1115,14 +1114,14 @@ For iloc, getting & setting via a float scalar will always raise. Other indexers will coerce to a like integer for both getting and setting. The ``FutureWarning`` has been dropped for ``.loc``, ``.ix`` and ``[]``. -.. ipython:: python +.. code-block:: python s[5.0] s.loc[5.0] and setting -.. ipython:: python +.. code-block:: python s_copy = s.copy() s_copy[5.0] = 10 @@ -1146,19 +1145,19 @@ Positional setting with ``.ix`` and a float indexer will ADD this value to the i Slicing will also coerce integer-like floats to integers for a non-``Float64Index``. -.. ipython:: python +.. 
code-block:: python s.loc[5.0:6] Note that for floats that are NOT coercible to ints, the label based bounds will be excluded -.. ipython:: python +.. code-block:: python s.loc[5.1:6] Float indexing on a ``Float64Index`` is unchanged. -.. ipython:: python +.. code-block:: python s = pd.Series([1, 2, 3], index=np.arange(3.)) s[1.0] diff --git a/doc/source/whatsnew/v0.18.1.rst b/doc/source/whatsnew/v0.18.1.rst index 3db00f686d62c..0d7783d24ae80 100644 --- a/doc/source/whatsnew/v0.18.1.rst +++ b/doc/source/whatsnew/v0.18.1.rst @@ -38,7 +38,7 @@ The ``CustomBusinessHour`` is a mixture of ``BusinessHour`` and ``CustomBusiness allows you to specify arbitrary holidays. For details, see :ref:`Custom Business Hour ` (:issue:`11514`) -.. ipython:: python +.. code-block:: python from pandas.tseries.offsets import CustomBusinessHour from pandas.tseries.holiday import USFederalHolidayCalendar @@ -47,7 +47,7 @@ see :ref:`Custom Business Hour ` (:issue:`11514`) Friday before MLK Day -.. ipython:: python +.. code-block:: python import datetime @@ -57,7 +57,7 @@ Friday before MLK Day Tuesday after MLK Day (Monday is skipped because it's a holiday) -.. ipython:: python +.. code-block:: python dt + bhour_us * 2 @@ -72,24 +72,24 @@ You can now use ``.rolling(..)`` and ``.expanding(..)`` as methods on groupbys. Previously you would have to do this to get a rolling window mean per-group: -.. ipython:: python +.. code-block:: python df = pd.DataFrame({"A": [1] * 20 + [2] * 12 + [3] * 8, "B": np.arange(40)}) df -.. ipython:: python +.. code-block:: python df.groupby("A").apply(lambda x: x.rolling(4).B.mean()) Now you can do: -.. ipython:: python +.. code-block:: python df.groupby("A").rolling(4).B.mean() For ``.resample(..)`` type of operations, previously you would have to: -.. ipython:: python +.. code-block:: python df = pd.DataFrame( { @@ -101,13 +101,13 @@ For ``.resample(..)`` type of operations, previously you would have to: df -.. ipython:: python +.. code-block:: python df.groupby("group").apply(lambda x: x.resample("1D").ffill()) Now you can do: -.. ipython:: python +.. code-block:: python df.groupby("group").resample("1D").ffill() @@ -130,7 +130,7 @@ Methods ``.where()`` and ``.mask()`` These can accept a callable for the condition and ``other`` arguments. -.. ipython:: python +.. code-block:: python df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}) df.where(lambda x: x > 4, lambda x: x + 10) @@ -141,7 +141,7 @@ Methods ``.loc[]``, ``.iloc[]``, ``.ix[]`` These can accept a callable, and a tuple of callable as a slicer. The callable can return a valid boolean indexer or anything which is valid for these indexer's input. -.. ipython:: python +.. code-block:: python # callable returns bool indexer df.loc[lambda x: x.A >= 2, lambda x: x.sum() > 10] @@ -156,14 +156,14 @@ Finally, you can use a callable in ``[]`` indexing of Series, DataFrame and Pane The callable must return a valid input for ``[]`` indexing depending on its class and index type. -.. ipython:: python +.. code-block:: python df[lambda x: "A"] Using these methods / indexers, you can chain data selection operations without using temporary variable. -.. ipython:: python +.. code-block:: python bb = pd.read_csv("data/baseball.csv", index_col="id") (bb.groupby(["year", "team"]).sum().loc[lambda df: df.r > 100]) @@ -175,7 +175,7 @@ Partial string indexing on ``DatetimeIndex`` when part of a ``MultiIndex`` Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiIndex`` (:issue:`10331`) -.. 
ipython:: python +.. code-block:: python dft2 = pd.DataFrame( np.random.randn(20, 1), @@ -189,7 +189,7 @@ Partial string indexing now matches on ``DateTimeIndex`` when part of a ``MultiI On other levels -.. ipython:: python +.. code-block:: python idx = pd.IndexSlice dft2 = dft2.swaplevel(0, 1).sort_index() @@ -203,7 +203,7 @@ Assembling datetimes ``pd.to_datetime()`` has gained the ability to assemble datetimes from a passed in ``DataFrame`` or a dict. (:issue:`8158`). -.. ipython:: python +.. code-block:: python df = pd.DataFrame( {"year": [2015, 2016], "month": [2, 3], "day": [4, 5], "hour": [2, 3]} @@ -212,13 +212,13 @@ Assembling datetimes Assembling using the passed frame. -.. ipython:: python +.. code-block:: python pd.to_datetime(df) You can pass only the columns that you need to assemble. -.. ipython:: python +.. code-block:: python pd.to_datetime(df[["year", "month", "day"]]) @@ -239,7 +239,7 @@ Other enhancements - ``Index.take`` now handles ``allow_fill`` and ``fill_value`` consistently (:issue:`12631`) - .. ipython:: python + .. code-block:: python idx = pd.Index([1.0, 2.0, 3.0, 4.0], dtype="float") @@ -249,7 +249,7 @@ Other enhancements - ``Index`` now supports ``.str.get_dummies()`` which returns ``MultiIndex``, see :ref:`Creating Indicator Variables ` (:issue:`10008`, :issue:`10103`) - .. ipython:: python + .. code-block:: python idx = pd.Index(["a|b", "a|c", "b|c"]) idx.str.get_dummies("|") @@ -309,7 +309,7 @@ Method ``.groupby(..).nth()`` changes The index in ``.groupby(..).nth()`` output is now more consistent when the ``as_index`` argument is passed (:issue:`11039`): -.. ipython:: python +.. code-block:: python df = pd.DataFrame({"A": ["a", "b", "a"], "B": [1, 2, 3]}) df @@ -332,14 +332,14 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df.groupby("A", as_index=True)["B"].nth(0) df.groupby("A", as_index=False)["B"].nth(0) Furthermore, previously, a ``.groupby`` would always sort, regardless if ``sort=False`` was passed with ``.nth()``. -.. ipython:: python +.. code-block:: python np.random.seed(1234) df = pd.DataFrame(np.random.randn(100, 2), columns=["a", "b"]) @@ -369,7 +369,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df.groupby("c", sort=True).nth(1) df.groupby("c", sort=False).nth(1) @@ -416,7 +416,7 @@ Using ``.apply`` on GroupBy resampling Using ``apply`` on resampling groupby operations (using a ``pd.TimeGrouper``) now has the same output types as similar ``apply`` calls on other groupby operations. (:issue:`11742`). -.. ipython:: python +.. code-block:: python df = pd.DataFrame( {"date": pd.to_datetime(["10/10/2000", "11/10/2000"]), "value": [10, 13]} diff --git a/doc/source/whatsnew/v0.19.0.rst b/doc/source/whatsnew/v0.19.0.rst index 340e1ce9ee1ef..29abc32f16bae 100644 --- a/doc/source/whatsnew/v0.19.0.rst +++ b/doc/source/whatsnew/v0.19.0.rst @@ -47,7 +47,7 @@ support asof style joining of time-series (:issue:`1870`, :issue:`13695`, :issue The :func:`merge_asof` performs an asof merge, which is similar to a left-join except that we match on nearest key rather than equal keys. -.. ipython:: python +.. code-block:: python left = pd.DataFrame({"a": [1, 5, 10], "left_val": ["a", "b", "c"]}) right = pd.DataFrame({"a": [1, 2, 3, 6, 7], "right_val": [1, 2, 3, 6, 7]}) @@ -58,13 +58,13 @@ except that we match on nearest key rather than equal keys. We typically want to match exactly when possible, and use the most recent value otherwise. -.. ipython:: python +.. 
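code-block:: python

    # Illustrative aside, not part of the original release note: a
    # ``tolerance`` argument bounds how far back an asof match may look.
    # Here, rows of ``left`` with no ``right`` key within 1 get no match.
    pd.merge_asof(left, right, on="a", tolerance=1)

+.. 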
code-block:: python pd.merge_asof(left, right, on="a") We can also match rows ONLY with prior data, and not an exact match. -.. ipython:: python +.. code-block:: python pd.merge_asof(left, right, on="a", allow_exact_matches=False) @@ -72,7 +72,7 @@ We can also match rows ONLY with prior data, and not an exact match. In a typical time-series example, we have ``trades`` and ``quotes`` and we want to ``asof-join`` them. This also illustrates using the ``by`` parameter to group data before merging. -.. ipython:: python +.. code-block:: python trades = pd.DataFrame( { @@ -113,7 +113,7 @@ This also illustrates using the ``by`` parameter to group data before merging. columns=["time", "ticker", "bid", "ask"], ) -.. ipython:: python +.. code-block:: python trades quotes @@ -122,7 +122,7 @@ An asof merge joins on the ``on``, typically a datetimelike field, which is orde in this case we are using a grouper in the ``by`` field. This is like a left-outer join, except that forward filling happens automatically taking the most recent non-NaN value. -.. ipython:: python +.. code-block:: python pd.merge_asof(trades, quotes, on="time", by="ticker") @@ -137,7 +137,7 @@ Method ``.rolling()`` is now time-series aware ``.rolling()`` objects are now time-series aware and can accept a time-series offset (or convertible) for the ``window`` argument (:issue:`13327`, :issue:`12995`). See the full documentation :ref:`here `. -.. ipython:: python +.. code-block:: python dft = pd.DataFrame( {"B": [0, 1, 2, np.nan, 4]}, @@ -147,20 +147,20 @@ See the full documentation :ref:`here `. This is a regular frequency index. Using an integer window parameter works to roll along the window frequency. -.. ipython:: python +.. code-block:: python dft.rolling(2).sum() dft.rolling(2, min_periods=1).sum() Specifying an offset allows a more intuitive specification of the rolling frequency. -.. ipython:: python +.. code-block:: python dft.rolling("2s").sum() Using a non-regular, but still monotonic index, rolling with an integer window does not impart any special calculation. -.. ipython:: python +.. code-block:: python dft = pd.DataFrame( @@ -182,14 +182,14 @@ Using a non-regular, but still monotonic index, rolling with an integer window d Using the time-specification generates variable windows for this sparse data. -.. ipython:: python +.. code-block:: python dft.rolling("2s").sum() Furthermore, we now allow an optional ``on`` parameter to specify a column (rather than the default of the index) in a DataFrame. -.. ipython:: python +.. code-block:: python dft = dft.reset_index() dft @@ -200,15 +200,14 @@ default of the index) in a DataFrame. Method ``read_csv`` has improved support for duplicate column names ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -.. ipython:: python - :suppress: +.. code-block:: python from io import StringIO :ref:`Duplicate column names ` are now supported in :func:`read_csv` whether they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :issue:`9424`) -.. ipython:: python +.. code-block:: python data = "0,1,2\n3,4,5" names = ["a", "b", "a"] @@ -228,8 +227,7 @@ contained the values ``[0, 3]``. **New behavior**: -.. ipython:: python - :okexcept: +.. code-block:: python pd.read_csv(StringIO(data), names=names) @@ -244,7 +242,7 @@ specified as a dtype (:issue:`10153`). Depending on the structure of the data, this can result in a faster parse time and lower memory usage compared to converting to ``Categorical`` after parsing. See the io :ref:`docs here `. -.. 
ipython:: python +.. code-block:: python data = """ col1,col2,col3 @@ -259,7 +257,7 @@ converting to ``Categorical`` after parsing. See the io :ref:`docs here ` (:issue:`13361`, :issue:`13763`, :issue:`13846`, :issue:`14173`) - .. ipython:: python + .. code-block:: python from pandas.api.types import union_categoricals @@ -295,7 +293,7 @@ Categorical concatenation - ``concat`` and ``append`` now can concat ``category`` dtypes with different ``categories`` as ``object`` dtype (:issue:`13524`) - .. ipython:: python + .. code-block:: python s1 = pd.Series(["a", "b"], dtype="category") s2 = pd.Series(["b", "c"], dtype="category") @@ -309,7 +307,7 @@ Categorical concatenation **New behavior**: -.. ipython:: python +.. code-block:: python pd.concat([s1, s2]) @@ -322,13 +320,13 @@ pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonth These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively. (:issue:`1543`) -.. ipython:: python +.. code-block:: python from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin **SemiMonthEnd**: -.. ipython:: python +.. code-block:: python pd.Timestamp("2016-01-01") + SemiMonthEnd() @@ -336,7 +334,7 @@ These provide date offsets anchored (by default) to the 15th and end of month, a **SemiMonthBegin**: -.. ipython:: python +.. code-block:: python pd.Timestamp("2016-01-01") + SemiMonthBegin() @@ -344,7 +342,7 @@ These provide date offsets anchored (by default) to the 15th and end of month, a Using the anchoring suffix, you can also specify the day of month to use instead of the 15th. -.. ipython:: python +.. code-block:: python pd.date_range("2015-01-01", freq="SMS-16", periods=4) @@ -359,7 +357,7 @@ The following methods and options are added to ``Index``, to be more consistent ``Index`` now supports the ``.where()`` function for same shape indexing (:issue:`13170`) -.. ipython:: python +.. code-block:: python idx = pd.Index(["a", "b", "c"]) idx.where([True, False, True]) @@ -367,7 +365,7 @@ The following methods and options are added to ``Index``, to be more consistent ``Index`` now supports ``.dropna()`` to exclude missing values (:issue:`6194`) -.. ipython:: python +.. code-block:: python idx = pd.Index([1, 2, np.nan, 4]) idx.dropna() @@ -375,7 +373,7 @@ The following methods and options are added to ``Index``, to be more consistent For ``MultiIndex``, values are dropped if any level is missing by default. Specifying ``how='all'`` only drops values where all levels are missing. -.. ipython:: python +.. code-block:: python midx = pd.MultiIndex.from_arrays([[1, 2, np.nan, 4], [1, 2, np.nan, np.nan]]) midx @@ -384,7 +382,7 @@ For ``MultiIndex``, values are dropped if any level is missing by default. Speci ``Index`` now supports ``.str.extractall()`` which returns a ``DataFrame``, see the :ref:`docs here ` (:issue:`10008`, :issue:`13156`) -.. ipython:: python +.. code-block:: python idx = pd.Index(["a1a2", "b1", "c1"]) idx.str.extractall(r"[ab](?P\d)") @@ -429,7 +427,7 @@ The ``pd.get_dummies`` function now returns dummy-encoded columns as small integ **New behavior**: -.. ipython:: python +.. code-block:: python pd.get_dummies(["a", "b", "a", "c"]).dtypes @@ -441,7 +439,7 @@ Downcast values to smallest possible dtype in ``to_numeric`` ``pd.to_numeric()`` now accepts a ``downcast`` parameter, which will downcast the data if possible to smallest specified numerical dtype (:issue:`13352`) -.. ipython:: python +.. 
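code-block:: python

    # Illustrative aside, not part of the original release note:
    # downcasting to float yields the smallest float dtype (float32).
    pd.to_numeric(["1.0", "2.5"], downcast="float").dtype

+.. 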
code-block:: python s = ["1", 2, 3] pd.to_numeric(s, downcast="unsigned") @@ -459,7 +457,7 @@ will be published in future versions of pandas (:issue:`13147`, :issue:`13634`) The following are now part of this API: -.. ipython:: python +.. code-block:: python import pprint from pandas.api import types @@ -479,7 +477,7 @@ Other enhancements - ``Timestamp`` can now accept positional and keyword parameters similar to :func:`datetime.datetime` (:issue:`10758`, :issue:`11630`) - .. ipython:: python + .. code-block:: python pd.Timestamp(2012, 1, 1) @@ -487,7 +485,7 @@ Other enhancements - The ``.resample()`` function now accepts a ``on=`` or ``level=`` parameter for resampling on a datetimelike column or ``MultiIndex`` level (:issue:`13500`) - .. ipython:: python + .. code-block:: python df = pd.DataFrame( {"date": pd.date_range("2015-01-01", freq="W", periods=5), "a": np.arange(5)}, @@ -522,7 +520,7 @@ Other enhancements - ``DataFrame`` has gained support to re-order the columns based on the values in a row using ``df.sort_values(by='...', axis=1)`` (:issue:`10806`) - .. ipython:: python + .. code-block:: python df = pd.DataFrame({"A": [2, 7], "B": [3, 5], "C": [4, 8]}, index=["row1", "row2"]) df @@ -556,7 +554,7 @@ API changes ``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behavior (:issue:`10904`) -.. ipython:: python +.. code-block:: python s = pd.Series([1, 2, 3]) @@ -570,7 +568,7 @@ API changes **New behavior**: -.. ipython:: python +.. code-block:: python type(s.tolist()[0]) @@ -597,7 +595,7 @@ Arithmetic operators Arithmetic operators align both ``index`` (no changes). -.. ipython:: python +.. code-block:: python s1 = pd.Series([1, 2, 3], index=list("ABC")) s2 = pd.Series([2, 2, 2], index=list("ABD")) @@ -637,13 +635,13 @@ Comparison operators raise ``ValueError`` when ``.index`` are different. To achieve the same result as previous versions (compare values based on locations ignoring ``.index``), compare both ``.values``. - .. ipython:: python + .. code-block:: python s1.values == s2.values If you want to compare ``Series`` aligning its ``.index``, see flexible comparison methods section below: - .. ipython:: python + .. code-block:: python s1.eq(s2) @@ -675,7 +673,7 @@ Logical operators align both ``.index`` of left and right hand side. **New behavior** (``Series``): -.. ipython:: python +.. code-block:: python s1 = pd.Series([True, False, True], index=list("ABC")) s2 = pd.Series([True, True, True], index=list("ABD")) @@ -687,13 +685,13 @@ Logical operators align both ``.index`` of left and right hand side. .. note:: To achieve the same result as previous versions (compare values based on only left hand side index), you can use ``reindex_like``: - .. ipython:: python + .. code-block:: python s1 & s2.reindex_like(s1) **Current behavior** (``DataFrame``, no change): -.. ipython:: python +.. code-block:: python df1 = pd.DataFrame([True, False, True], index=list("ABC")) df2 = pd.DataFrame([True, True, True], index=list("ABD")) @@ -705,7 +703,7 @@ Flexible comparison methods ``Series`` flexible comparison methods like ``eq``, ``ne``, ``le``, ``lt``, ``ge`` and ``gt`` now align both ``index``. Use these operators if you want to compare two ``Series`` which has the different ``index``. -.. ipython:: python +.. code-block:: python s1 = pd.Series([1, 2, 3], index=["a", "b", "c"]) s2 = pd.Series([2, 2, 2], index=["b", "c", "d"]) @@ -722,8 +720,7 @@ Previously, this worked the same as comparison operators (see above). 
A ``Series`` will now correctly promote its dtype for assignment with incompat values to the current dtype (:issue:`13234`) -.. ipython:: python - :okwarning: +.. code-block:: python s = pd.Series() @@ -738,7 +735,7 @@ A ``Series`` will now correctly promote its dtype for assignment with incompat v **New behavior**: -.. ipython:: python +.. code-block:: python s["a"] = pd.Timestamp("2016-01-01") s["b"] = 3.0 @@ -763,7 +760,7 @@ Previously if ``.to_datetime()`` encountered mixed integers/floats and strings, This will now convert integers/floats with the default unit of ``ns``. -.. ipython:: python +.. code-block:: python pd.to_datetime([1, "foo"], errors="coerce") @@ -782,7 +779,7 @@ Merging changes Merging will now preserve the dtype of the join keys (:issue:`8596`) -.. ipython:: python +.. code-block:: python df1 = pd.DataFrame({"key": [1], "v1": [10]}) df1 @@ -810,7 +807,7 @@ Merging will now preserve the dtype of the join keys (:issue:`8596`) We are able to preserve the join keys -.. ipython:: python +.. code-block:: python pd.merge(df1, df2, how="outer") pd.merge(df1, df2, how="outer").dtypes @@ -818,7 +815,7 @@ We are able to preserve the join keys Of course if you have missing values that are introduced, then the resulting dtype will be upcast, which is unchanged from previous. -.. ipython:: python +.. code-block:: python pd.merge(df1, df2, how="outer", on="key") pd.merge(df1, df2, how="outer", on="key").dtypes @@ -830,7 +827,7 @@ Method ``.describe()`` changes Percentile identifiers in the index of a ``.describe()`` output will now be rounded to the least precision that keeps them distinct (:issue:`13104`) -.. ipython:: python +.. code-block:: python s = pd.Series([0, 1, 2, 3, 4]) df = pd.DataFrame([0, 1, 2, 3, 4]) @@ -864,7 +861,7 @@ The percentiles were rounded to at most one decimal place, which could raise ``V **New behavior**: -.. ipython:: python +.. code-block:: python s.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) @@ -903,7 +900,7 @@ As a consequence of this change, ``PeriodIndex`` no longer has an integer dtype: **New behavior**: -.. ipython:: python +.. code-block:: python pi = pd.PeriodIndex(["2016-08-01"], freq="D") pi @@ -930,7 +927,7 @@ Previously, ``Period`` has its own ``Period('NaT')`` representation different fr These result in ``pd.NaT`` without providing ``freq`` option. -.. ipython:: python +.. code-block:: python pd.Period("NaT") pd.Period(None) @@ -948,7 +945,7 @@ To be compatible with ``Period`` addition and subtraction, ``pd.NaT`` now suppor **New behavior**: -.. ipython:: python +.. code-block:: python pd.NaT + 1 pd.NaT - 1 @@ -969,7 +966,7 @@ of integers (:issue:`13988`). **New behavior**: -.. ipython:: python +.. code-block:: python pi = pd.PeriodIndex(["2011-01", "2011-02"], freq="M") pi.values @@ -999,7 +996,7 @@ Previous behavior: **New behavior**: the same operation will now perform element-wise addition: -.. ipython:: python +.. code-block:: python pd.Index(["a", "b"]) + pd.Index(["a", "c"]) @@ -1007,7 +1004,7 @@ Note that numeric Index objects already performed element-wise operations. For example, the behavior of adding two integer Indexes is unchanged. The base ``Index`` is now made consistent with this behavior. -.. ipython:: python +.. code-block:: python pd.Index([1, 2, 3]) + pd.Index([2, 3, 4]) @@ -1025,7 +1022,7 @@ DatetimeIndex objects resulting in a TimedeltaIndex: **New behavior**: -.. ipython:: python +.. 
code-block:: python ( pd.DatetimeIndex(["2016-01-01", "2016-01-02"]) @@ -1040,7 +1037,7 @@ DatetimeIndex objects resulting in a TimedeltaIndex: ``Index.difference`` and ``Index.symmetric_difference`` will now, more consistently, treat ``NaN`` values as any other values. (:issue:`13514`) -.. ipython:: python +.. code-block:: python idx1 = pd.Index([1, 2, 3, np.nan]) idx2 = pd.Index([0, 1, np.nan]) @@ -1057,7 +1054,7 @@ DatetimeIndex objects resulting in a TimedeltaIndex: **New behavior**: -.. ipython:: python +.. code-block:: python idx1.difference(idx2) idx1.symmetric_difference(idx2) @@ -1088,7 +1085,7 @@ Previously, most ``Index`` classes returned ``np.ndarray``, and ``DatetimeIndex` **New behavior**: -.. ipython:: python +.. code-block:: python pd.Index([1, 2, 3]).unique() pd.DatetimeIndex( @@ -1103,7 +1100,7 @@ Previously, most ``Index`` classes returned ``np.ndarray``, and ``DatetimeIndex` ``MultiIndex.from_arrays`` and ``MultiIndex.from_product`` will now preserve categorical dtype in ``MultiIndex`` levels (:issue:`13743`, :issue:`13854`). -.. ipython:: python +.. code-block:: python cat = pd.Categorical(["a", "b"], categories=list("bac")) lvl1 = ["foo", "bar"] @@ -1122,7 +1119,7 @@ in ``MultiIndex`` levels (:issue:`13743`, :issue:`13854`). **New behavior**: the single level is now a ``CategoricalIndex``: -.. ipython:: python +.. code-block:: python midx.levels[0] midx.get_level_values(0) @@ -1130,7 +1127,7 @@ in ``MultiIndex`` levels (:issue:`13743`, :issue:`13854`). An analogous change has been made to ``MultiIndex.from_product``. As a consequence, ``groupby`` and ``set_index`` also preserve categorical dtypes in indexes -.. ipython:: python +.. code-block:: python df = pd.DataFrame({"A": [0, 1], "B": [10, 11], "C": cat}) df_grouped = df.groupby(by=["A", "C"]).first() @@ -1160,7 +1157,7 @@ As a consequence, ``groupby`` and ``set_index`` also preserve categorical dtypes **New behavior**: -.. ipython:: python +.. code-block:: python df_grouped.index.levels[1] df_grouped.reset_index().dtypes @@ -1180,7 +1177,7 @@ from ``n`` for the second, and so on, so that, when concatenated, they are ident the result of calling :func:`read_csv` without the ``chunksize=`` argument (:issue:`12185`). -.. ipython:: python +.. code-block:: python data = "A,B\n0,1\n2,3\n4,5\n6,7" @@ -1198,7 +1195,7 @@ the result of calling :func:`read_csv` without the ``chunksize=`` argument **New behavior**: -.. ipython:: python +.. code-block:: python pd.concat(pd.read_csv(StringIO(data), chunksize=2)) @@ -1243,8 +1240,7 @@ Previously, sparse data were ``float64`` dtype by default, even if all inputs we As of v0.19.0, sparse data keeps the input dtype, and uses more appropriate ``fill_value`` defaults (``0`` for ``int64`` dtype, ``False`` for ``bool`` dtype). -.. ipython:: python - :okwarning: +.. code-block:: python pd.SparseArray([1, 2, 0, 0], dtype=np.int64) pd.SparseArray([True, False, False, False]) diff --git a/doc/source/whatsnew/v0.19.1.rst b/doc/source/whatsnew/v0.19.1.rst index 6ff3fb6900a99..693819d666a12 100644 --- a/doc/source/whatsnew/v0.19.1.rst +++ b/doc/source/whatsnew/v0.19.1.rst @@ -5,8 +5,7 @@ Version 0.19.1 (November 3, 2016) {{ header }} -.. ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.19.2.rst b/doc/source/whatsnew/v0.19.2.rst index bba89d78be869..a7b80605e6241 100644 --- a/doc/source/whatsnew/v0.19.2.rst +++ b/doc/source/whatsnew/v0.19.2.rst @@ -5,8 +5,7 @@ Version 0.19.2 (December 24, 2016) {{ header }} -.. 
ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.20.0.rst b/doc/source/whatsnew/v0.20.0.rst index 733995cc718dd..f9c47fd0c83d8 100644 --- a/doc/source/whatsnew/v0.20.0.rst +++ b/doc/source/whatsnew/v0.20.0.rst @@ -57,7 +57,7 @@ is :ref:`here ` (:issue:`1623`). Here is a sample -.. ipython:: python +.. code-block:: python df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'], index=pd.date_range('1/1/2000', periods=10)) @@ -68,13 +68,13 @@ One can operate using string function names, callables, lists, or dictionaries o Using a single function is equivalent to ``.apply``. -.. ipython:: python +.. code-block:: python df.agg('sum') Multiple aggregations with a list of functions. -.. ipython:: python +.. code-block:: python df.agg(['sum', 'min']) @@ -82,21 +82,20 @@ Using a dict provides the ability to apply specific aggregations per column. You will get a matrix-like output of all of the aggregators. The output has one column per unique function. Those functions applied to a particular column will be ``NaN``: -.. ipython:: python +.. code-block:: python df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']}) The API also supports a ``.transform()`` function for broadcasting results. -.. ipython:: python - :okwarning: +.. code-block:: python df.transform(['abs', lambda x: x - x.min()]) When presented with mixed dtypes that cannot be aggregated, ``.agg()`` will only take the valid aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': [1, 2, 3], 'B': [1., 2., 3.], @@ -104,7 +103,7 @@ aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`) 'D': pd.date_range('20130101', periods=3)}) df.dtypes -.. ipython:: python +.. code-block:: python df.agg(['min', 'sum']) @@ -116,12 +115,11 @@ Keyword argument ``dtype`` for data IO The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs ` for more information. -.. ipython:: python - :suppress: +.. code-block:: python from io import StringIO -.. ipython:: python +.. code-block:: python data = "a b\n1 2\n3 4" pd.read_fwf(StringIO(data)).dtypes @@ -137,14 +135,14 @@ from where to compute the resulting timestamps when parsing numerical values wit For example, with 1960-01-01 as the starting date: -.. ipython:: python +.. code-block:: python pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01')) The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, which is commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change. -.. ipython:: python +.. code-block:: python pd.to_datetime([1, 2, 3], unit='D') @@ -156,7 +154,7 @@ GroupBy enhancements Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows to easily group by a column and index level at the same time. (:issue:`5677`) -.. ipython:: python +.. 
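code-block:: python

    # Illustrative sketch, not part of the original release note; the
    # frame ``df0`` is hypothetical. A column name ('A') and an index
    # level name ('idx') can be mixed freely in ``by``.
    df0 = pd.DataFrame({"A": [1, 1, 2]},
                       index=pd.Index(["x", "y", "x"], name="idx"))
    df0.groupby(["idx", "A"]).size()

+.. 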
code-block:: python arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'], ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']] @@ -183,7 +181,7 @@ Previously, only ``gzip`` compression was supported. By default, compression of URLs and paths are now inferred using their file extensions. Additionally, support for bz2 compression in the python 2 C-engine improved (:issue:`14874`). -.. ipython:: python +.. code-block:: python url = ('https://github.com/{repo}/raw/{branch}/{path}' .format(repo='pandas-dev/pandas', @@ -205,7 +203,7 @@ can now read from and write to compressed pickle files. Compression methods can be an explicit parameter or be inferred from the file extension. See :ref:`the docs here. ` -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': np.random.randn(1000), 'B': 'foo', @@ -213,7 +211,7 @@ See :ref:`the docs here. ` Using an explicit compression type -.. ipython:: python +.. code-block:: python df.to_pickle("data.pkl.compress", compression="gzip") rt = pd.read_pickle("data.pkl.compress", compression="gzip") @@ -221,7 +219,7 @@ Using an explicit compression type The default is to infer the compression type from the extension (``compression='infer'``): -.. ipython:: python +.. code-block:: python df.to_pickle("data.pkl.gz") rt = pd.read_pickle("data.pkl.gz") @@ -230,8 +228,7 @@ The default is to infer the compression type from the extension (``compression=' rt = pd.read_pickle("s1.pkl.bz2") rt.head() -.. ipython:: python - :suppress: +.. code-block:: python import os os.remove("data.pkl.compress") @@ -248,7 +245,7 @@ or purely non-negative, integers. Previously, handling these integers would result in improper rounding or data-type casting, leading to incorrect results. Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937`) -.. ipython:: python +.. code-block:: python idx = pd.UInt64Index([1, 2, 3]) df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx) @@ -268,7 +265,7 @@ GroupBy on categoricals In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueError`` when grouping on a categorical series with some categories not appearing in the data. (:issue:`13179`) -.. ipython:: python +.. code-block:: python chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']] df = pd.DataFrame({ @@ -290,7 +287,7 @@ In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueErr **New behavior**: -.. ipython:: python +.. code-block:: python df[df.chromosomes != '1'].groupby('chromosomes', sort=False).sum() @@ -303,7 +300,7 @@ The new orient ``'table'`` for :meth:`DataFrame.to_json` will generate a `Table Schema`_ compatible string representation of the data. -.. ipython:: python +.. code-block:: python df = pd.DataFrame( {'A': [1, 2, 3], @@ -363,8 +360,7 @@ Experimental support has been added to export ``DataFrame.style`` formats to Exc For example, after running the following, ``styled.xlsx`` renders as below: -.. ipython:: python - :okwarning: +.. code-block:: python np.random.seed(24) df = pd.DataFrame({'A': np.linspace(1, 10, 10)}) @@ -380,8 +376,7 @@ For example, after running the following, ``styled.xlsx`` renders as below: .. image:: ../_static/style-excel.png -.. ipython:: python - :suppress: +.. code-block:: python import os os.remove('styled.xlsx') @@ -420,7 +415,7 @@ The returned categories were strings, representing Intervals New behavior: -.. ipython:: python +.. 
code-block:: python c = pd.cut(range(4), bins=2) c @@ -429,13 +424,13 @@ New behavior: Furthermore, this allows one to bin *other* data with these same bins, with ``NaN`` representing a missing value similar to other dtypes. -.. ipython:: python +.. code-block:: python pd.cut([0, 3, 5, 1], bins=c.categories) An ``IntervalIndex`` can also be used in ``Series`` and ``DataFrame`` as the index. -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': range(4), 'B': pd.cut([0, 3, 1, 1], bins=c.categories) @@ -444,13 +439,13 @@ An ``IntervalIndex`` can also be used in ``Series`` and ``DataFrame`` as the ind Selecting via a specific interval: -.. ipython:: python +.. code-block:: python df.loc[pd.Interval(1.5, 3.0)] Selecting via a scalar value that is contained *in* the intervals. -.. ipython:: python +.. code-block:: python df.loc[0] @@ -572,7 +567,7 @@ Map on Index types now return other Index types ``map`` on an ``Index`` now returns an ``Index``, not a numpy array (:issue:`12766`) -.. ipython:: python +.. code-block:: python idx = pd.Index([1, 2]) idx @@ -597,7 +592,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python idx.map(lambda x: x * 2) idx.map(lambda x: (x, x * 2)) @@ -609,7 +604,7 @@ New behavior: ``map`` on a ``Series`` with ``datetime64`` values may return ``int64`` dtypes rather than ``int32`` -.. ipython:: python +.. code-block:: python s = pd.Series(pd.date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H') .tz_localize('Asia/Tokyo')) @@ -628,7 +623,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python s.map(lambda x: x.hour) @@ -654,7 +649,7 @@ Previous behaviour: New behavior: -.. ipython:: python +.. code-block:: python idx = pd.date_range("2015-01-01", periods=5, freq='10H') idx.hour @@ -698,7 +693,7 @@ data-types would yield different return types. These are now made consistent. (: New behavior: - .. ipython:: python + .. code-block:: python # Series, returns an array of Timestamp tz-aware pd.Series([pd.Timestamp(r'20160101', tz=r'US/Eastern'), @@ -728,7 +723,7 @@ data-types would yield different return types. These are now made consistent. (: New behavior: - .. ipython:: python + .. code-block:: python # returns a Categorical pd.Series(list('baabc'), dtype='category').unique() @@ -750,11 +745,12 @@ Partial string indexing changes :ref:`DatetimeIndex Partial String Indexing ` now works as an exact match, provided that string resolution coincides with index resolution, including a case when both are seconds (:issue:`14826`). See :ref:`Slice vs. Exact Match ` for details. -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'a': [1, 2, 3]}, pd.DatetimeIndex(['2011-12-31 23:59:59', '2012-01-01 00:00:00', '2012-01-01 00:00:01'])) + Previous behavior: .. code-block:: ipython @@ -788,7 +784,7 @@ Concat of different float dtypes will not automatically upcast Previously, ``concat`` of multiple objects with different ``float`` dtypes would automatically upcast results to a dtype of ``float64``. Now the smallest acceptable dtype will be used (:issue:`13247`) -.. ipython:: python +.. code-block:: python df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2)) df1.dtypes @@ -807,7 +803,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python pd.concat([df1, df2]).dtypes @@ -867,7 +863,7 @@ This would happen with a ``lexsorted``, but non-monotonic levels. (:issue:`15622 This is *unchanged* from prior versions, but shown for illustration purposes: -.. 
ipython:: python +.. code-block:: python df = pd.DataFrame(np.arange(6), columns=['value'], index=pd.MultiIndex.from_product([list('BA'), range(3)])) @@ -883,7 +879,7 @@ This is *unchanged* from prior versions, but shown for illustration purposes: Sorting works as expected -.. ipython:: python +.. code-block:: python df.sort_index() @@ -898,7 +894,7 @@ Sorting works as expected However, this example, which has a non-monotonic 2nd level, doesn't behave as desired. -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'value': [1, 2, 3, 4]}, index=pd.MultiIndex([['a', 'b'], ['bb', 'aa']], @@ -989,7 +985,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]}) @@ -1008,7 +1004,7 @@ see :ref:`here `. These are equivale but a MultiIndexed ``DataFrame`` enjoys more support in pandas. See the section on :ref:`Windowed Binary Operations ` for more information. (:issue:`15677`) -.. ipython:: python +.. code-block:: python np.random.seed(1234) df = pd.DataFrame(np.random.rand(100, 2), @@ -1031,14 +1027,14 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python res = df.rolling(12).corr() res.tail() Retrieving a correlation matrix for a cross-section -.. ipython:: python +.. code-block:: python df.rolling(12).corr().loc['2016-04-07'] @@ -1051,7 +1047,7 @@ In previous versions most types could be compared to string column in a ``HDFSto usually resulting in an invalid comparison, returning an empty result frame. These comparisons will now raise a ``TypeError`` (:issue:`15492`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']}) df.to_hdf('store.h5', 'key', format='table', data_columns=True) @@ -1077,8 +1073,7 @@ New behavior: TypeError: Cannot compare 2014-01-01 00:00:00 of type to string column -.. ipython:: python - :suppress: +.. code-block:: python import os os.remove('store.h5') @@ -1094,7 +1089,7 @@ joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method. - ``Index.intersection`` - .. ipython:: python + .. code-block:: python left = pd.Index([2, 1, 0]) left @@ -1110,13 +1105,13 @@ joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method. New behavior: - .. ipython:: python + .. code-block:: python left.intersection(right) - ``DataFrame.join`` and ``pd.merge`` - .. ipython:: python + .. code-block:: python left = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0]) left @@ -1135,7 +1130,7 @@ joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method. New behavior: - .. ipython:: python + .. code-block:: python left.join(right, how='inner') @@ -1147,7 +1142,7 @@ Pivot table always returns a DataFrame The documentation for :meth:`pivot_table` states that a ``DataFrame`` is *always* returned. Here a bug is fixed that allowed this to return a ``Series`` under certain circumstance. (:issue:`4386`) -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'col1': [3, 4, 5], 'col2': ['C', 'D', 'E'], @@ -1168,7 +1163,7 @@ Previous behavior: New behavior: -.. ipython:: python +.. code-block:: python df.pivot_table('col1', index=['col3', 'col2'], aggfunc=np.sum) @@ -1336,7 +1331,7 @@ The recommended methods of indexing are: Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some examples of how to convert code `here `__. -.. ipython:: python +.. 
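code-block:: python

    # Illustrative sketch, not part of the original release note; the
    # Series ``s0`` is hypothetical. Prefer ``.loc`` for labels and
    # ``.iloc`` for positions over the deprecated ``.ix``.
    s0 = pd.Series([10, 20, 30], index=list("abc"))
    s0.loc["b"]   # label-based
    s0.iloc[1]    # position-based

+.. 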
code-block:: python df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, @@ -1356,13 +1351,13 @@ Previous behavior, where you wish to get the 0th and the 2nd elements from the i Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing. -.. ipython:: python +.. code-block:: python df.loc[df.index[[0, 2]], 'A'] Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things. -.. ipython:: python +.. code-block:: python df.iloc[[0, 2], df.columns.get_loc('A')] @@ -1455,7 +1450,7 @@ between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functio This is an illustrative example: -.. ipython:: python +.. code-block:: python df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), @@ -1466,7 +1461,7 @@ Here is a typical useful syntax for computing different aggregations for differe is a natural, and useful syntax. We aggregate from the dict-to-list by taking the specified columns and applying the list of functions. This returns a ``MultiIndex`` for the columns (this is *not* deprecated). -.. ipython:: python +.. code-block:: python df.groupby('A').agg({'B': 'sum', 'C': 'min'}) @@ -1487,7 +1482,7 @@ is a combination aggregation & renaming: You can accomplish the same operation, more idiomatically by: -.. ipython:: python +.. code-block:: python df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'}) @@ -1512,7 +1507,7 @@ Here's an example of the second deprecation, passing a dict-of-dict to a grouped You can accomplish nearly the same by: -.. ipython:: python +.. code-block:: python (df.groupby('A') .agg({'B': 'sum', 'C': 'min'}) diff --git a/doc/source/whatsnew/v0.20.2.rst b/doc/source/whatsnew/v0.20.2.rst index 430a39d2d2e97..fd27887e3dd91 100644 --- a/doc/source/whatsnew/v0.20.2.rst +++ b/doc/source/whatsnew/v0.20.2.rst @@ -5,8 +5,7 @@ Version 0.20.2 (June 4, 2017) {{ header }} -.. ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.20.3.rst b/doc/source/whatsnew/v0.20.3.rst index ff28f6830783e..647da61051ed8 100644 --- a/doc/source/whatsnew/v0.20.3.rst +++ b/doc/source/whatsnew/v0.20.3.rst @@ -5,8 +5,7 @@ Version 0.20.3 (July 7, 2017) {{ header }} -.. ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.5.0.rst b/doc/source/whatsnew/v0.5.0.rst index 7447a10fa1d6b..7b2af577a24d5 100644 --- a/doc/source/whatsnew/v0.5.0.rst +++ b/doc/source/whatsnew/v0.5.0.rst @@ -6,8 +6,7 @@ Version 0.5.0 (October 24, 2011) {{ header }} -.. ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.6.0.rst b/doc/source/whatsnew/v0.6.0.rst index 253ca4d4188e5..a3c28a23e07b4 100644 --- a/doc/source/whatsnew/v0.6.0.rst +++ b/doc/source/whatsnew/v0.6.0.rst @@ -5,8 +5,7 @@ Version 0.6.0 (November 25, 2011) {{ header }} -.. ipython:: python - :suppress: +.. code-block:: python from pandas import * # noqa F401, F403 diff --git a/doc/source/whatsnew/v0.7.0.rst b/doc/source/whatsnew/v0.7.0.rst index 2fe686d8858a2..71aeb2070e8d4 100644 --- a/doc/source/whatsnew/v0.7.0.rst +++ b/doc/source/whatsnew/v0.7.0.rst @@ -31,7 +31,7 @@ New features - Handle differently-indexed output values in ``DataFrame.apply`` (:issue:`498`) -.. ipython:: python +.. 
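code-block:: python

    # Illustrative sketch in current pandas syntax, not part of the
    # original release note: rows may return differently-indexed Series,
    # and the results are unioned into the output columns.
    pd.DataFrame({"x": [1, 4]}).apply(
        lambda row: pd.Series({"a": row["x"]})
        if row["x"] < 2 else pd.Series({"b": row["x"]}),
        axis=1,
    )

+.. 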
@@ -116,7 +116,7 @@ One of the potentially riskiest API changes in 0.7.0, but also one of the
 most important, was a complete review of how **integer indexes** are handled
 with regard to label-based indexing. Here is an example:
 
-.. ipython:: python
+.. code-block:: python
 
    s = pd.Series(np.random.randn(10), index=range(0, 20, 2))
    s
@@ -235,7 +235,7 @@ slice to a Series when getting and setting values via ``[]`` (i.e. the
 ``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
 passing similar input to ``ix`` **except in the case of integer indexing**:
 
-.. ipython:: python
+.. code-block:: python
 
    s = pd.Series(np.random.randn(6), index=list('acegkm'))
    s
@@ -246,7 +246,7 @@ passing similar input to ``ix`` **except in the case of integer indexing**:
 In the case of integer indexes, the behavior will be exactly as before
 (shadowing ``ndarray``):
 
-.. ipython:: python
+.. code-block:: python
 
    s = pd.Series(np.random.randn(6), index=range(0, 12, 2))
    s[[4, 0, 2]]
diff --git a/doc/source/whatsnew/v0.7.3.rst b/doc/source/whatsnew/v0.7.3.rst
index 4ca31baf560bb..b52581c53d54b 100644
--- a/doc/source/whatsnew/v0.7.3.rst
+++ b/doc/source/whatsnew/v0.7.3.rst
@@ -51,7 +51,7 @@ NA boolean comparison API change
 Reverted some changes to how NA values (represented typically as ``NaN`` or
 ``None``) are handled in non-numeric Series:
 
-.. ipython:: python
+.. code-block:: python
 
    series = pd.Series(["Steve", np.nan, "Joe"])
    series == "Steve"
@@ -62,7 +62,7 @@ In comparisons, NA / NaN will always come through as ``False`` except with
 negation, in the presence of NA data. You may wish to add an explicit NA
 filter into boolean array operations if you are worried about this:
 
-.. ipython:: python
+.. code-block:: python
 
    mask = series == "Steve"
    series[mask & series.notnull()]
@@ -80,8 +80,7 @@ Other API changes
 When calling ``apply`` on a grouped Series, the return value will also be a
 Series, to be more consistent with the ``groupby`` behavior with DataFrame:
 
-.. ipython:: python
-   :okwarning:
+.. code-block:: python
 
    df = pd.DataFrame(
       {
diff --git a/doc/source/whatsnew/v0.8.0.rst b/doc/source/whatsnew/v0.8.0.rst
index 490175914cef1..0ae7347f5ded2 100644
--- a/doc/source/whatsnew/v0.8.0.rst
+++ b/doc/source/whatsnew/v0.8.0.rst
@@ -204,7 +204,7 @@ have code that converts ``DateRange`` or ``Index`` objects that used to contain
 ``datetime.datetime`` values to plain NumPy arrays, you may have bugs lurking
 with code using scalar values because you are handing control over to NumPy:
 
-.. ipython:: python
+.. code-block:: python
 
    import datetime
@@ -225,7 +225,7 @@ If you have code that requires an array of ``datetime.datetime`` objects, you
 have a couple of options. First, the ``astype(object)`` method of ``DatetimeIndex``
 produces an array of ``Timestamp`` objects:
 
-.. ipython:: python
+.. code-block:: python
 
    stamp_array = rng.astype(object)
    stamp_array
@@ -234,7 +234,7 @@ produces an array of ``Timestamp`` objects:
 To get an array of proper ``datetime.datetime`` objects, use the
 ``to_pydatetime`` method:
 
-.. ipython:: python
+.. code-block:: python
 
    dt_array = rng.to_pydatetime()
    dt_array
@@ -252,7 +252,7 @@ type. See `matplotlib documentation
   in NumPy 1.6. In particular, the string version of the array shows
   garbage values, and conversion to ``dtype=object`` is similarly broken.
 
-  .. ipython:: python
+  .. code-block:: python
 
      rng = pd.date_range("1/1/2000", periods=10)
     rng
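The two conversion options described in the v0.8.0 hunks above can be sketched as follows; this is illustrative only, run against a recent pandas:

.. code-block:: python

   import pandas as pd

   rng = pd.date_range("1/1/2000", periods=3)

   stamp_array = rng.astype(object)  # object-dtype container of Timestamp objects
   dt_array = rng.to_pydatetime()    # ndarray of plain datetime.datetime objects

   type(stamp_array[0])  # pandas Timestamp
   type(dt_array[0])     # datetime.datetime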
diff --git a/doc/source/whatsnew/v0.9.0.rst b/doc/source/whatsnew/v0.9.0.rst
index 44ded51e31fda..d7e5698f1868e 100644
--- a/doc/source/whatsnew/v0.9.0.rst
+++ b/doc/source/whatsnew/v0.9.0.rst
@@ -37,7 +37,7 @@ API changes
   functions like ``read_csv`` has changed to be more Pythonic and amenable to
   attribute access:
 
-.. ipython:: python
+.. code-block:: python
 
    import io
@@ -56,7 +56,7 @@ API changes
   "by accident" (this was never intended) will lead to all NA Series in some
   cases. To be perfectly clear:
 
-.. ipython:: python
+.. code-block:: python
 
    s1 = pd.Series([1, 2, 3])
    s1
diff --git a/doc/source/whatsnew/v0.9.1.rst b/doc/source/whatsnew/v0.9.1.rst
index 6b05e5bcded7e..03a2e3f408f56 100644
--- a/doc/source/whatsnew/v0.9.1.rst
+++ b/doc/source/whatsnew/v0.9.1.rst
@@ -38,7 +38,7 @@ New features
   ``na_option`` parameter so missing values can be assigned either the largest
   or the smallest rank (:issue:`1508`, :issue:`2159`)
 
-  .. ipython:: python
+  .. code-block:: python
 
      df = pd.DataFrame(np.random.randn(6, 3), columns=['A', 'B', 'C'])
@@ -57,7 +57,7 @@ New features
   DataFrame currently supports slicing via a boolean vector the same length as the DataFrame (inside the ``[]``).
   The returned DataFrame has the same number of columns as the original, but is sliced on its index.
 
-  .. ipython:: python
+  .. code-block:: python
 
      df = DataFrame(np.random.randn(5, 3), columns = ['A','B','C'])
@@ -70,7 +70,7 @@ New features
   elements that do not meet the boolean condition as ``NaN``. This is
   accomplished via the new method ``DataFrame.where``. In addition, ``where``
   takes an optional ``other`` argument for replacement.
 
-  .. ipython:: python
+  .. code-block:: python
 
      df[df>0]
@@ -81,7 +81,7 @@ New features
   Furthermore, ``where`` now aligns the input boolean condition (ndarray or
   DataFrame), such that partial selection with setting is possible. This is
   analogous to partial setting via ``.ix`` (but on the contents rather than the axis labels)
 
-  .. ipython:: python
+  .. code-block:: python
 
      df2 = df.copy()
     df2[ df2[1:4] > 0 ] = 3
@@ -89,13 +89,13 @@ New features
 
   ``DataFrame.mask`` is the inverse boolean operation of ``where``.
 
-  .. ipython:: python
+  .. code-block:: python
 
     df.mask(df<=0)
 
 - Enable referencing of Excel columns by their column names (:issue:`1936`)
 
-  .. ipython:: python
+  .. code-block:: python
 
     xl = pd.ExcelFile('data/test.xls')
    xl.parse('Sheet1', index_col=0, parse_dates=True,
@@ -137,7 +137,7 @@ API changes
 - Period.end_time now returns the last nanosecond in the time interval
   (:issue:`2124`, :issue:`2125`, :issue:`1764`)
 
-  .. ipython:: python
+  .. code-block:: python
 
     p = pd.Period('2012')
@@ -147,7 +147,7 @@ API changes
 - File parsers no longer coerce to float or bool for columns that have custom
   converters specified (:issue:`2184`)
 
-  .. ipython:: python
+  .. code-block:: python
 
     import io
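To illustrate the converter behavior the final hunk refers to, here is a minimal sketch; the column names and data are made up for the example:

.. code-block:: python

   import io

   import pandas as pd

   data = "A,B\n1.0,2\n3.0,4"

   # with a custom converter, the parsed values are kept exactly as the
   # converter returns them instead of being coerced to float
   df = pd.read_csv(io.StringIO(data), converters={"A": lambda x: x.strip()})
   df["A"].dtype  # object; the converter's strings are not re-coerced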