Commit 77eb47b

author
Christopher C. Aycock
committed
Merge master branch into GH13936
2 parents 89256f0 + abdfa3e commit 77eb47b

28 files changed

+1157
-349
lines changed

asv_bench/benchmarks/reshape.py

+23-1
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
 from .pandas_vb_common import *
-from pandas.core.reshape import melt
+from pandas.core.reshape import melt, wide_to_long
 
 
 class melt_dataframe(object):
@@ -74,3 +74,25 @@ def setup(self):
 
     def time_unstack_sparse_keyspace(self):
         self.idf.unstack()
+
+
+class wide_to_long_big(object):
+    goal_time = 0.2
+
+    def setup(self):
+        vars = 'ABCD'
+        nyrs = 20
+        nidvars = 20
+        N = 5000
+        yrvars = []
+        for var in vars:
+            for yr in range(1, nyrs + 1):
+                yrvars.append(var + str(yr))
+
+        self.df = pd.DataFrame(np.random.randn(N, nidvars + len(yrvars)),
+                               columns=list(range(nidvars)) + yrvars)
+        self.vars = vars
+
+    def time_wide_to_long_big(self):
+        self.df['id'] = self.df.index
+        wide_to_long(self.df, list(self.vars), i='id', j='year')
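
For readers unfamiliar with what the benchmark above exercises, here is a minimal sketch of the same call on a tiny frame. The column names and values are hypothetical, and the top-level `pd.wide_to_long` entry point is assumed (the benchmark itself imports from `pandas.core.reshape`, which may not exist under that path in later releases).

```python
import pandas as pd

# Hypothetical toy frame: two stubs (A, B) observed in "years" 1 and 2.
wide = pd.DataFrame({
    "A1": [1, 2], "A2": [3, 4],
    "B1": [5, 6], "B2": [7, 8],
})
wide["id"] = wide.index  # same id column the benchmark builds

# Stack the A*/B* columns into one row per (id, year) pair.
long = pd.wide_to_long(wide, ["A", "B"], i="id", j="year")
print(long)
```

The benchmark does the same thing with four stubs, 20 years, 20 extra id columns and 5000 rows.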

doc/source/api.rst

+1
Original file line numberDiff line numberDiff line change
@@ -157,6 +157,7 @@ Data manipulations
157157
concat
158158
get_dummies
159159
factorize
160+
wide_to_long
160161

161162
Top-level missing data
162163
~~~~~~~~~~~~~~~~~~~~~~

doc/source/basics.rst

+3-1
Original file line numberDiff line numberDiff line change
@@ -486,7 +486,9 @@ standard deviation 1), very concisely:
486486
xs_stand.std(1)
487487
488488
Note that methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod`
489-
preserve the location of NA values:
489+
preserve the location of ``NaN`` values. This is somewhat different from
490+
:meth:`~DataFrame.expanding` and :meth:`~DataFrame.rolling`.
491+
For more details please see :ref:`this note <stats.moments.expanding.note>`.
490492

491493
.. ipython:: python
492494

doc/source/computation.rst

+28-6
Original file line numberDiff line numberDiff line change
@@ -691,6 +691,8 @@ Method Summary
691691
:meth:`~Expanding.cov`, Unbiased covariance (binary)
692692
:meth:`~Expanding.corr`, Correlation (binary)
693693

694+
.. currentmodule:: pandas
695+
694696
Aside from not having a ``window`` parameter, these functions have the same
695697
interfaces as their ``.rolling`` counterparts. Like above, the parameters they
696698
all accept are:
@@ -700,18 +702,37 @@ all accept are:
700702
``min_periods`` non-null data points have been seen.
701703
- ``center``: boolean, whether to set the labels at the center (default is False)
702704

705+
.. _stats.moments.expanding.note:
703706
.. note::
704707

705708
The ``.rolling`` and ``.expanding`` methods do not return a
706709
``NaN`` if there are at least ``min_periods`` non-null values in the current
707-
window. This differs from ``cumsum``, ``cumprod``, ``cummax``, and
708-
``cummin``, which return ``NaN`` in the output wherever a ``NaN`` is
709-
encountered in the input.
710+
window. For example,
711+
712+
.. ipython:: python
713+
714+
sn = pd.Series([1, 2, np.nan, 3, np.nan, 4])
715+
sn
716+
sn.rolling(2).max()
717+
sn.rolling(2, min_periods=1).max()
718+
719+
In case of expanding functions, this differs from :meth:`~DataFrame.cumsum`,
720+
:meth:`~DataFrame.cumprod`, :meth:`~DataFrame.cummax`,
721+
and :meth:`~DataFrame.cummin`, which return ``NaN`` in the output wherever
722+
a ``NaN`` is encountered in the input. In order to match the output of ``cumsum``
723+
with ``expanding``, use :meth:`~DataFrame.fillna`:
724+
725+
.. ipython:: python
726+
727+
sn.expanding().sum()
728+
sn.cumsum()
729+
sn.cumsum().fillna(method='ffill')
730+
710731
711732
An expanding window statistic will be more stable (and less responsive) than
712733
its rolling window counterpart as the increasing window size decreases the
713734
relative impact of an individual data point. As an example, here is the
714-
:meth:`~Expanding.mean` output for the previous time series dataset:
735+
:meth:`~core.window.Expanding.mean` output for the previous time series dataset:
715736

716737
.. ipython:: python
717738
:suppress:
@@ -731,13 +752,14 @@ relative impact of an individual data point. As an example, here is the
731752
Exponentially Weighted Windows
732753
------------------------------
733754

755+
.. currentmodule:: pandas.core.window
756+
734757
A related set of functions are exponentially weighted versions of several of
735758
the above statistics. A similar interface to ``.rolling`` and ``.expanding`` is accessed
736-
thru the ``.ewm`` method to receive an :class:`~pandas.core.window.EWM` object.
759+
through the ``.ewm`` method to receive an :class:`~EWM` object.
737760
A number of expanding EW (exponentially weighted)
738761
methods are provided:
739762

740-
.. currentmodule:: pandas.core.window
741763

742764
.. csv-table::
743765
:header: "Function", "Description"
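
The `NaN` handling described in the note added to this file can be checked outside the docs build with a short standalone script. This is only a sketch: it reuses the `sn` series from the note and uses `.ffill()`, which the `generic.py` docstring in this commit describes as a synonym for `fillna(method='ffill')`.

```python
import numpy as np
import pandas as pd

sn = pd.Series([1, 2, np.nan, 3, np.nan, 4])

expanding_sum = sn.expanding().sum()  # skips NaN once min_periods is met
cumulative_sum = sn.cumsum()          # keeps NaN wherever the input has NaN
filled = cumulative_sum.ffill()       # forward-fill the gaps left by cumsum

print(pd.DataFrame({"expanding": expanding_sum,
                    "cumsum": cumulative_sum,
                    "cumsum+ffill": filled}))
```

For this series the forward-filled cumulative sum matches the expanding sum, because the series starts with a non-null value.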

doc/source/io.rst

+6
Original file line numberDiff line numberDiff line change
@@ -867,6 +867,12 @@ data columns:
867867
index_col=0) #index is the nominal column
868868
df
869869
870+
.. note::
871+
If a column or index contains an unparseable date, the entire column or
872+
index will be returned unaltered as an object data type. For non-standard
873+
datetime parsing, use :func:`to_datetime` after ``pd.read_csv``.
874+
875+
870876
.. note::
871877
read_csv has a fast_path for parsing datetime strings in iso8601 format,
872878
e.g "2000-01-01T00:01:02+00:00" and similar variations. If you can arrange
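
The new note on unparseable dates recommends calling `to_datetime` after `pd.read_csv`. A minimal sketch of that pattern follows; the CSV text is hypothetical, and `errors="coerce"` is one possible choice for handling the bad value, not something the note prescribes.

```python
import io
import pandas as pd

# Hypothetical CSV with one value that cannot be parsed as a date.
csv_text = "date,value\n2000-01-01,1\nnot a date,2\n"

df = pd.read_csv(io.StringIO(csv_text))  # 'date' is read as plain strings

# Convert explicitly; errors="coerce" turns unparseable entries into NaT
# rather than leaving the whole column unconverted.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
print(df)
```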

doc/source/whatsnew.rst

+2
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,8 @@ What's New
1818

1919
These are new features and improvements of note in each release.
2020

21+
.. include:: whatsnew/v0.20.0.txt
22+
2123
.. include:: whatsnew/v0.19.2.txt
2224

2325
.. include:: whatsnew/v0.19.1.txt

doc/source/whatsnew/v0.19.1.txt

+1-1
Original file line numberDiff line numberDiff line change
@@ -58,4 +58,4 @@ Bug Fixes
5858
- Bug in ``df.groupby`` causing an ``AttributeError`` when grouping a single index frame by a column and the index level (:issue:`14327`)
5959
- Bug in ``df.groupby`` where ``TypeError`` raised when ``pd.Grouper(key=...)`` is passed in a list (:issue:`14334`)
6060
- Bug in ``pd.pivot_table`` may raise ``TypeError`` or ``ValueError`` when ``index`` or ``columns``
61-
is not scalar and ``values`` is not specified (:issue:`14380`)
61+
is not scalar and ``values`` is not specified (:issue:`14380`)

doc/source/whatsnew/v0.19.2.txt

+10-8
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,16 @@ Performance Improvements
2323

2424
- Improved performance of ``.replace()`` (:issue:`12745`)
2525

26+
.. _whatsnew_0192.enhancements.other:
27+
28+
Other enhancements
29+
^^^^^^^^^^^^^^^^^^
30+
31+
- ``pd.merge_asof()`` gained ``left_index``/``right_index`` and ``left_by``/``right_by`` arguments (:issue:`14253`)
32+
- ``pd.merge_asof()`` can take multiple columns in ``by`` parameter and has specialized dtypes for better performance (:issue:`13936`)
33+
34+
35+
2636
.. _whatsnew_0192.bug_fixes:
2737

2838
Bug Fixes
@@ -82,11 +92,3 @@ Bug Fixes
8292
- Bug in ``unstack()`` if called with a list of column(s) as an argument, regardless of the dtypes of all columns, they get coerced to ``object`` (:issue:`11847`)
8393

8494

85-
.. _whatsnew_0192.enhancements.other:
86-
87-
Other enhancements
88-
^^^^^^^^^^^^^^^^^^
89-
90-
- ``pd.merge_asof()`` can take multiple columns in ``by`` parameter and has specialized dtypes for better performance (:issue:`13936`)
91-
92-
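
The `merge_asof` entry above mentions passing multiple columns in `by`. As a rough illustration, assuming hypothetical trade and quote frames (none of these column names come from the commit), a two-column `by` might look like this:

```python
import pandas as pd

# Hypothetical trades and quotes; both frames are sorted by the "on" key.
trades = pd.DataFrame({
    "time": [1, 2, 3],
    "ticker": ["A", "B", "A"],
    "exch": ["X", "X", "Y"],
    "qty": [10, 20, 30],
})
quotes = pd.DataFrame({
    "time": [0, 1, 2],
    "ticker": ["A", "B", "A"],
    "exch": ["X", "X", "Y"],
    "price": [1.0, 2.0, 3.0],
})

# Match each trade with the most recent quote (time <= trade time)
# that shares the same (ticker, exch) pair.
merged = pd.merge_asof(trades, quotes, on="time", by=["ticker", "exch"])
print(merged)
```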

doc/source/whatsnew/v0.20.0.txt

+6
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,9 @@ Other enhancements
5252

5353
- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
5454

55+
56+
- Multiple offset aliases with decimal points are now supported (e.g. '0.5min' is parsed as '30s') (:issue:`8419`)
57+
5558
- New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
5659
unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
5760
of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
@@ -61,6 +64,8 @@ Other enhancements
6164
- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
6265
- ``pd.DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
6366
- ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
67+
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
68+
6469

6570
.. _whatsnew_0200.api_breaking:
6671
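
To illustrate the `json_normalize` entry in the hunk above, here is a small sketch of the `errors` option. The input records are hypothetical, and the top-level `pd.json_normalize` spelling is an assumption that holds for recent pandas releases; the entry itself refers to `pandas.io.json.json_normalize()`.

```python
import pandas as pd

# Hypothetical records; the second one lacks the "name" meta key.
data = [
    {"name": "a", "items": [{"x": 1}, {"x": 2}]},
    {"items": [{"x": 3}]},
]

# errors="ignore" fills the missing meta value with NaN.
flat = pd.json_normalize(data, record_path="items", meta=["name"],
                         errors="ignore")
print(flat)

# With the default errors="raise", the missing key surfaces as a KeyError.
try:
    pd.json_normalize(data, record_path="items", meta=["name"])
    raised = False
except KeyError:
    raised = True
print("default raised KeyError:", raised)
```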

@@ -111,6 +116,7 @@ Removal of prior version deprecations/changes
111116
Performance Improvements
112117
~~~~~~~~~~~~~~~~~~~~~~~~
113118

119+
- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
114120

115121

116122
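
The offset-alias entry in this file says '0.5min' is parsed as '30s'. A quick way to see the effect, assuming a pandas version that includes the change, is to build a short `date_range` with the fractional alias:

```python
import pandas as pd

# A fractional multiple on the "min" alias; per the entry above it
# resolves to a 30-second step.
idx = pd.date_range("2000-01-01", periods=3, freq="0.5min")
print(idx)
```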

pandas/core/generic.py

+24-10
Original file line numberDiff line numberDiff line change
@@ -3354,12 +3354,16 @@ def fillna(self, value=None, method=None, axis=None, inplace=False,
33543354
return self._constructor(new_data).__finalize__(self)
33553355

33563356
def ffill(self, axis=None, inplace=False, limit=None, downcast=None):
3357-
"""Synonym for NDFrame.fillna(method='ffill')"""
3357+
"""
3358+
Synonym for :meth:`DataFrame.fillna(method='ffill') <DataFrame.fillna>`
3359+
"""
33583360
return self.fillna(method='ffill', axis=axis, inplace=inplace,
33593361
limit=limit, downcast=downcast)
33603362

33613363
def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3362-
"""Synonym for NDFrame.fillna(method='bfill')"""
3364+
"""
3365+
Synonym for :meth:`DataFrame.fillna(method='bfill') <DataFrame.fillna>`
3366+
"""
33633367
return self.fillna(method='bfill', axis=axis, inplace=inplace,
33643368
limit=limit, downcast=downcast)
33653369

@@ -5477,16 +5481,18 @@ def compound(self, axis=None, skipna=None, level=None):
54775481

54785482
cls.cummin = _make_cum_function(
54795483
cls, 'cummin', name, name2, axis_descr, "cumulative minimum",
5480-
lambda y, axis: np.minimum.accumulate(y, axis), np.inf, np.nan)
5484+
lambda y, axis: np.minimum.accumulate(y, axis), "min",
5485+
np.inf, np.nan)
54815486
cls.cumsum = _make_cum_function(
54825487
cls, 'cumsum', name, name2, axis_descr, "cumulative sum",
5483-
lambda y, axis: y.cumsum(axis), 0., np.nan)
5488+
lambda y, axis: y.cumsum(axis), "sum", 0., np.nan)
54845489
cls.cumprod = _make_cum_function(
54855490
cls, 'cumprod', name, name2, axis_descr, "cumulative product",
5486-
lambda y, axis: y.cumprod(axis), 1., np.nan)
5491+
lambda y, axis: y.cumprod(axis), "prod", 1., np.nan)
54875492
cls.cummax = _make_cum_function(
54885493
cls, 'cummax', name, name2, axis_descr, "cumulative max",
5489-
lambda y, axis: np.maximum.accumulate(y, axis), -np.inf, np.nan)
5494+
lambda y, axis: np.maximum.accumulate(y, axis), "max",
5495+
-np.inf, np.nan)
54905496

54915497
cls.sum = _make_stat_function(
54925498
cls, 'sum', name, name2, axis_descr,
@@ -5674,7 +5680,15 @@ def _doc_parms(cls):
56745680
56755681
Returns
56765682
-------
5677-
%(outname)s : %(name1)s\n"""
5683+
%(outname)s : %(name1)s\n
5684+
5685+
5686+
See also
5687+
--------
5688+
pandas.core.window.Expanding.%(accum_func_name)s : Similar functionality
5689+
but ignores ``NaN`` values.
5690+
5691+
"""
56785692

56795693

56805694
def _make_stat_function(cls, name, name1, name2, axis_descr, desc, f):
@@ -5717,10 +5731,10 @@ def stat_func(self, axis=None, skipna=None, level=None, ddof=1,
57175731
return set_function_name(stat_func, name, cls)
57185732

57195733

5720-
def _make_cum_function(cls, name, name1, name2, axis_descr, desc, accum_func,
5721-
mask_a, mask_b):
5734+
def _make_cum_function(cls, name, name1, name2, axis_descr, desc,
5735+
accum_func, accum_func_name, mask_a, mask_b):
57225736
@Substitution(outname=name, desc=desc, name1=name1, name2=name2,
5723-
axis_descr=axis_descr)
5737+
axis_descr=axis_descr, accum_func_name=accum_func_name)
57245738
@Appender("Return {0} over requested axis.".format(desc) +
57255739
_cnum_doc)
57265740
def cum_func(self, axis=None, skipna=True, *args, **kwargs):
