Commit 4ebd97b

Merge branch 'master' of https://github.com/pandas-dev/pandas into doc-decorator
2 parents 4c9dc28 + c8db9b9 commit 4ebd97b

77 files changed, +538 -359 lines changed

asv_bench/benchmarks/frame_methods.py

+1 -1
@@ -564,7 +564,7 @@ def setup(self):

     def time_frame_get_dtype_counts(self):
         with warnings.catch_warnings(record=True):
-            self.df._data.get_dtype_counts()
+            self.df.dtypes.value_counts()

     def time_info(self):
         self.df.info()
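
Review note: the benchmark now counts dtypes through the public API instead of the internal ``_data`` block manager. A minimal sketch of what the replacement expression returns, using a small hypothetical frame (the column names and values are illustrative and not taken from the benchmark):

```python
import pandas as pd

# Hypothetical frame with two int64 columns and one float64 column.
df = pd.DataFrame(
    {
        "a": pd.Series([1, 2], dtype="int64"),
        "b": pd.Series([1.0, 2.0], dtype="float64"),
        "c": pd.Series([3, 4], dtype="int64"),
    }
)

# df.dtypes is a Series of dtype objects; value_counts() tallies each dtype.
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)
```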

doc/source/getting_started/intro_tutorials/10_text_data.rst

+1 -1
@@ -199,7 +199,7 @@ names in the ``Name`` column. By using pandas string methods, the

 Next, we need to get the corresponding location, preferably the index
 label, in the table for which the name length is the largest. The
-:meth:`~Series.idxmax`` method does exactly that. It is not a string method and is
+:meth:`~Series.idxmax` method does exactly that. It is not a string method and is
 applied to integers, so no ``str`` is used.

 .. ipython:: python
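
Review note: the corrected sentence describes chaining ``str.len()`` with ``idxmax()``. A short sketch of that pattern with made-up names and index labels (not the tutorial's dataset):

```python
import pandas as pd

# Hypothetical names with non-default index labels.
names = pd.Series(["Ann", "Bartholomew", "Chris"], index=[10, 20, 30])

# str.len() yields integers, so idxmax() is called without the .str accessor.
longest_label = names.str.len().idxmax()
print(longest_label, names.loc[longest_label])
```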

doc/source/user_guide/computation.rst

+31 -4
@@ -312,15 +312,35 @@ We provide a number of common statistical functions:
     :meth:`~Rolling.median`, Arithmetic median of values
     :meth:`~Rolling.min`, Minimum
     :meth:`~Rolling.max`, Maximum
-    :meth:`~Rolling.std`, Bessel-corrected sample standard deviation
-    :meth:`~Rolling.var`, Unbiased variance
+    :meth:`~Rolling.std`, Sample standard deviation
+    :meth:`~Rolling.var`, Sample variance
     :meth:`~Rolling.skew`, Sample skewness (3rd moment)
     :meth:`~Rolling.kurt`, Sample kurtosis (4th moment)
     :meth:`~Rolling.quantile`, Sample quantile (value at %)
     :meth:`~Rolling.apply`, Generic apply
     :meth:`~Rolling.cov`, Unbiased covariance (binary)
     :meth:`~Rolling.corr`, Correlation (binary)

+.. _computation.window_variance.caveats:
+
+.. note::
+
+   Please note that :meth:`~Rolling.std` and :meth:`~Rolling.var` use the sample
+   variance formula by default, i.e. the sum of squared differences is divided by
+   ``window_size - 1`` and not by ``window_size`` during averaging. In statistics,
+   we use sample when the dataset is drawn from a larger population that we
+   don't have access to. Using it implies that the data in our window is a
+   random sample from the population, and we are interested not in the variance
+   inside the specific window but in the variance of some general window that
+   our windows represent. In this situation, using the sample variance formula
+   results in an unbiased estimator and so is preferred.
+
+   Usually, we are instead interested in the variance of each window as we slide
+   it over the data, and in this case we should specify ``ddof=0`` when calling
+   these methods to use population variance instead of sample variance. Using
+   sample variance under the circumstances would result in a biased estimator
+   of the variable we are trying to determine.
+
 .. _stats.rolling_apply:

 Rolling apply
@@ -848,15 +868,22 @@ Method summary
     :meth:`~Expanding.median`, Arithmetic median of values
     :meth:`~Expanding.min`, Minimum
     :meth:`~Expanding.max`, Maximum
-    :meth:`~Expanding.std`, Unbiased standard deviation
-    :meth:`~Expanding.var`, Unbiased variance
+    :meth:`~Expanding.std`, Sample standard deviation
+    :meth:`~Expanding.var`, Sample variance
     :meth:`~Expanding.skew`, Unbiased skewness (3rd moment)
     :meth:`~Expanding.kurt`, Unbiased kurtosis (4th moment)
     :meth:`~Expanding.quantile`, Sample quantile (value at %)
     :meth:`~Expanding.apply`, Generic apply
     :meth:`~Expanding.cov`, Unbiased covariance (binary)
     :meth:`~Expanding.corr`, Correlation (binary)

+.. note::
+
+   Using sample variance formulas for :meth:`~Expanding.std` and
+   :meth:`~Expanding.var` comes with the same caveats as using them with rolling
+   windows. See :ref:`this section <computation.window_variance.caveats>` for more
+   information.
+
 .. currentmodule:: pandas

 Aside from not having a ``window`` parameter, these functions have the same
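
Review note: the new caveat in ``computation.rst`` contrasts dividing by ``window_size - 1`` (the default, ``ddof=1``) with dividing by ``window_size`` (``ddof=0``). A minimal sketch of the difference, assuming a small made-up series and a window of 3:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 4.0, 7.0])  # illustrative data only
window_size = 3

# Default: sample variance, squared deviations divided by window_size - 1.
sample_var = s.rolling(window_size).var()

# ddof=0: population variance of each window, divided by window_size.
population_var = s.rolling(window_size).var(ddof=0)

# Cross-check the last window [2, 4, 7] directly with NumPy.
last_window = s.to_numpy()[-window_size:]
print(sample_var.iloc[-1], np.var(last_window, ddof=1))
print(population_var.iloc[-1], np.var(last_window, ddof=0))
```

For a full window the two results differ by the constant factor ``(window_size - 1) / window_size``, so the gap is largest for short windows.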

doc/source/whatsnew/v0.14.0.rst

+5 -5
@@ -1,7 +1,7 @@
 .. _whatsnew_0140:

-v0.14.0 (May 31 , 2014)
------------------------
+Version 0.14.0 (May 31 , 2014)
+------------------------------

 {{ header }}

@@ -321,7 +321,7 @@ Text parsing API changes

 .. _whatsnew_0140.groupby:

-Groupby API changes
+GroupBy API changes
 ~~~~~~~~~~~~~~~~~~~

 More consistent behavior for some groupby methods:
@@ -473,8 +473,8 @@ Some other enhancements to the sql functions include:

 .. _whatsnew_0140.slicers:

-Multiindexing using slicers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Multi-indexing using slicers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In 0.14.0 we added a new way to slice MultiIndexed objects.
 You can slice a MultiIndex by providing multiple indexers.

doc/source/whatsnew/v0.14.1.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0141:

-v0.14.1 (July 11, 2014)
------------------------
+Version 0.14.1 (July 11, 2014)
+------------------------------

 {{ header }}

doc/source/whatsnew/v0.15.0.rst

+6 -6
@@ -1,7 +1,7 @@
 .. _whatsnew_0150:

-v0.15.0 (October 18, 2014)
---------------------------
+Version 0.15.0 (October 18, 2014)
+---------------------------------

 {{ header }}

@@ -105,7 +105,7 @@ For full docs, see the :ref:`categorical introduction <categorical>` and the

 .. _whatsnew_0150.timedeltaindex:

-TimedeltaIndex/Scalar
+TimedeltaIndex/scalar
 ^^^^^^^^^^^^^^^^^^^^^

 We introduce a new scalar type ``Timedelta``, which is a subclass of ``datetime.timedelta``, and behaves in a similar manner,
@@ -247,8 +247,8 @@ Additionally :meth:`~pandas.DataFrame.memory_usage` is an available method for a

 .. _whatsnew_0150.dt:

-.dt accessor
-^^^^^^^^^^^^
+Series.dt accessor
+^^^^^^^^^^^^^^^^^^

 ``Series`` has gained an accessor to succinctly return datetime like properties for the *values* of the Series, if its a datetime/period like Series. (:issue:`7207`)
 This will return a Series, indexed like the existing Series. See the :ref:`docs <basics.dt_accessors>`
@@ -600,7 +600,7 @@ Rolling/expanding moments improvements

 .. _whatsnew_0150.sql:

-Improvements in the SQL io module
+Improvements in the SQL IO module
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 - Added support for a ``chunksize`` parameter to ``to_sql`` function. This allows DataFrame to be written in chunks and avoid packet-size overflow errors (:issue:`8062`).

doc/source/whatsnew/v0.15.1.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0151:

-v0.15.1 (November 9, 2014)
---------------------------
+Version 0.15.1 (November 9, 2014)
+---------------------------------

 {{ header }}

doc/source/whatsnew/v0.15.2.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0152:

-v0.15.2 (December 12, 2014)
----------------------------
+Version 0.15.2 (December 12, 2014)
+----------------------------------

 {{ header }}

doc/source/whatsnew/v0.16.0.rst

+3 -3
@@ -1,7 +1,7 @@
 .. _whatsnew_0160:

-v0.16.0 (March 22, 2015)
-------------------------
+Version 0.16.0 (March 22, 2015)
+-------------------------------

 {{ header }}

@@ -218,7 +218,7 @@ Backwards incompatible API changes

 .. _whatsnew_0160.api_breaking.timedelta:

-Changes in Timedelta
+Changes in timedelta
 ^^^^^^^^^^^^^^^^^^^^

 In v0.15.0 a new scalar type ``Timedelta`` was introduced, that is a

doc/source/whatsnew/v0.16.1.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0161:

-v0.16.1 (May 11, 2015)
-----------------------
+Version 0.16.1 (May 11, 2015)
+-----------------------------

 {{ header }}

doc/source/whatsnew/v0.16.2.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0162:

-v0.16.2 (June 12, 2015)
------------------------
+Version 0.16.2 (June 12, 2015)
+------------------------------

 {{ header }}

doc/source/whatsnew/v0.17.0.rst

+6 -6
@@ -1,7 +1,7 @@
 .. _whatsnew_0170:

-v0.17.0 (October 9, 2015)
--------------------------
+Version 0.17.0 (October 9, 2015)
+--------------------------------

 {{ header }}

@@ -181,8 +181,8 @@ Each method signature only includes relevant arguments. Currently, these are lim
 Additional methods for ``dt`` accessor
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-strftime
-""""""""
+Series.dt.strftime
+""""""""""""""""""

 We are now supporting a ``Series.dt.strftime`` method for datetime-likes to generate a formatted string (:issue:`10110`). Examples:

@@ -202,8 +202,8 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene

 The string format is as the python standard library and details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_

-total_seconds
-"""""""""""""
+Series.dt.total_seconds
+"""""""""""""""""""""""

 ``pd.Series`` of type ``timedelta64`` has new method ``.dt.total_seconds()`` returning the duration of the timedelta in seconds (:issue:`10817`)

doc/source/whatsnew/v0.17.1.rst

+2 -2
@@ -1,7 +1,7 @@
 .. _whatsnew_0171:

-v0.17.1 (November 21, 2015)
----------------------------
+Version 0.17.1 (November 21, 2015)
+----------------------------------

 {{ header }}

doc/source/whatsnew/v1.1.0.rst

+7 -2
@@ -174,8 +174,8 @@ Other API changes
 - Added :meth:`DataFrame.value_counts` (:issue:`5377`)
 - :meth:`Groupby.groups` now returns an abbreviated representation when called on large dataframes (:issue:`1135`)
 - ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
-- Using a :func:`pandas.api.indexers.BaseIndexer` with ``std``, ``var``, ``count``, ``skew``, ``cov``, ``corr`` will now raise a ``NotImplementedError`` (:issue:`32865`)
-- Using a :func:`pandas.api.indexers.BaseIndexer` with ``min``, ``max`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
+- Using a :func:`pandas.api.indexers.BaseIndexer` with ``skew``, ``cov``, ``corr`` will now raise a ``NotImplementedError`` (:issue:`32865`)
+- Using a :func:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
 - Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
 -

@@ -458,6 +458,7 @@ Datetimelike
 - Bug in :class:`Timestamp` arithmetic when adding or subtracting a ``np.ndarray`` with ``timedelta64`` dtype (:issue:`33296`)
 - Bug in :meth:`DatetimeIndex.to_period` not infering the frequency when called with no arguments (:issue:`33358`)
 - Bug in :meth:`DatetimeIndex.tz_localize` incorrectly retaining ``freq`` in some cases where the original freq is no longer valid (:issue:`30511`)
+- Bug in :meth:`DatetimeIndex.intersection` losing ``freq`` and timezone in some cases (:issue:`33604`)

 Timedelta
 ^^^^^^^^^
@@ -525,6 +526,7 @@ Indexing
 - Bug in `Series.__getitem__` with an integer key and a :class:`MultiIndex` with leading integer level failing to raise ``KeyError`` if the key is not present in the first level (:issue:`33355`)
 - Bug in :meth:`DataFrame.iloc` when slicing a single column-:class:`DataFrame`` with ``ExtensionDtype`` (e.g. ``df.iloc[:, :1]``) returning an invalid result (:issue:`32957`)
 - Bug in :meth:`DatetimeIndex.insert` and :meth:`TimedeltaIndex.insert` causing index ``freq`` to be lost when setting an element into an empty :class:`Series` (:issue:33573`)
+- Bug in :meth:`Series.__setitem__` with an :class:`IntervalIndex` and a list-like key of integers (:issue:`33473`)

 Missing
 ^^^^^^^
@@ -576,6 +578,8 @@ I/O
 - Bug in :meth:`read_excel` did not correctly handle multiple embedded spaces in OpenDocument text cells. (:issue:`32207`)
 - Bug in :meth:`read_json` was raising ``TypeError`` when reading a list of booleans into a Series. (:issue:`31464`)
 - Bug in :func:`pandas.io.json.json_normalize` where location specified by `record_path` doesn't point to an array. (:issue:`26284`)
+- :func:`pandas.read_hdf` has a more explicit error message when loading an
+  unsupported HDF file (:issue:`9539`)

 Plotting
 ^^^^^^^^
@@ -598,6 +602,7 @@ Groupby/resample/rolling
 - Bug in :meth:`DataFrameGroupBy.agg` with dictionary input losing ``ExtensionArray`` dtypes (:issue:`32194`)
 - Bug in :meth:`DataFrame.resample` where an ``AmbiguousTimeError`` would be raised when the resulting timezone aware :class:`DatetimeIndex` had a DST transition at midnight (:issue:`25758`)
 - Bug in :meth:`DataFrame.groupby` where a ``ValueError`` would be raised when grouping by a categorical column with read-only categories and ``sort=False`` (:issue:`33410`)
+- Bug in :meth:`GroupBy.first` and :meth:`GroupBy.last` where None is not preserved in object dtype (:issue:`32800`)

 Reshaping
 ^^^^^^^^^
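
Review note: the "Other API changes" entries in this file mention ``FixedForwardWindowIndexer`` and ``min``/``max``/``count`` support for ``BaseIndexer`` subclasses. A minimal sketch of a forward-looking window, assuming pandas 1.1 or later and an illustrative series:

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

s = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])  # illustrative data only

# Each window covers the current row and the next one (looking forward).
indexer = FixedForwardWindowIndexer(window_size=2)

# min_periods=1 lets the last, shorter window still produce a value.
forward_sum = s.rolling(window=indexer, min_periods=1).sum()
forward_max = s.rolling(window=indexer, min_periods=1).max()
print(forward_sum.tolist())
print(forward_max.tolist())
```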

pandas/_libs/__init__.py

+2
@@ -6,9 +6,11 @@
     "Timedelta",
     "Timestamp",
     "iNaT",
+    "Interval",
 ]


+from pandas._libs.interval import Interval
 from pandas._libs.tslibs import (
     NaT,
     NaTType,

pandas/_libs/groupby.pyx

+6 -2
@@ -893,7 +893,9 @@ def group_last(rank_t[:, :] out,
             for j in range(K):
                 val = values[i, j]

-                if not checknull(val):
+                # None should not be treated like other NA-like
+                # so that it won't be converted to nan
+                if not checknull(val) or val is None:
                     # NB: use _treat_as_na here once
                     #  conditional-nogil is available.
                     nobs[lab, j] += 1
@@ -986,7 +988,9 @@ def group_nth(rank_t[:, :] out,
             for j in range(K):
                 val = values[i, j]

-                if not checknull(val):
+                # None should not be treated like other NA-like
+                # so that it won't be converted to nan
+                if not checknull(val) or val is None:
                     # NB: use _treat_as_na here once
                     #  conditional-nogil is available.
                     nobs[lab, j] += 1
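
Review note: the new condition parses as ``(not checknull(val)) or (val is None)``, so ``None`` is counted as an observation while other NA-like values are still skipped. A pure-Python sketch of that predicate, using the public ``pd.isna`` as a stand-in for the Cython-level ``checknull`` (an assumption made for illustration; the helper name is hypothetical):

```python
import numpy as np
import pandas as pd


def counts_as_observation(val) -> bool:
    """Mirror the updated check: keep non-null values and also keep None."""
    # pd.isna stands in for pandas' internal checknull here (assumption).
    return (not pd.isna(val)) or (val is None)


for candidate in ["x", 1, None, np.nan, pd.NaT]:
    print(repr(candidate), counts_as_observation(candidate))
```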

pandas/_libs/parsers.pyx

+2 -3
@@ -55,9 +55,7 @@ from pandas.core.dtypes.common import (
     is_bool_dtype, is_object_dtype,
     is_datetime64_dtype,
     pandas_dtype, is_extension_array_dtype)
-from pandas.core.arrays import Categorical
 from pandas.core.dtypes.concat import union_categoricals
-import pandas.io.common as icom

 from pandas.compat import _import_lzma, _get_lzma_file
 from pandas.errors import (ParserError, DtypeWarning,
@@ -1149,7 +1147,8 @@ cdef class TextReader:

             # Method accepts list of strings, not encoded ones.
             true_values = [x.decode() for x in self.true_values]
-            cat = Categorical._from_inferred_categories(
+            array_type = dtype.construct_array_type()
+            cat = array_type._from_inferred_categories(
                 cats, codes, dtype, true_values=true_values)
             return cat, na_count
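
Review note: the parser now looks up the array class from the dtype rather than importing ``Categorical`` directly. A small sketch of that lookup through the public API; it uses ``from_codes`` for illustration instead of the private ``_from_inferred_categories`` seen in the diff, and the categories are made up:

```python
import pandas as pd

dtype = pd.CategoricalDtype(categories=["a", "b"])  # illustrative dtype

# construct_array_type() returns the array class associated with the dtype.
array_type = dtype.construct_array_type()
print(array_type)

# Build an array through the looked-up class instead of naming Categorical.
cat = array_type.from_codes([0, 1, 0], dtype=dtype)
print(list(cat))
```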

pandas/_libs/tslibs/offsets.pyx

+2 -2
@@ -157,8 +157,8 @@ cdef _wrap_timedelta_result(result):
     """
     if PyDelta_Check(result):
         # convert Timedelta back to a Tick
-        from pandas.tseries.offsets import _delta_to_tick
-        return _delta_to_tick(result)
+        from pandas.tseries.offsets import delta_to_tick
+        return delta_to_tick(result)

     return result
