Commit 6ae7a09

Merge branch 'master' into original-dtype-with-replace
2 parents 571ab8a + 77a0f19 commit 6ae7a09

195 files changed

+3025
-1944
lines changed

asv_bench/benchmarks/frame_methods.py

+1-1
@@ -564,7 +564,7 @@ def setup(self):

     def time_frame_get_dtype_counts(self):
         with warnings.catch_warnings(record=True):
-            self.df._data.get_dtype_counts()
+            self.df.dtypes.value_counts()

     def time_info(self):
         self.df.info()

asv_bench/benchmarks/groupby.py

+34
@@ -626,4 +626,38 @@ def time_first(self):
         self.df_nans.groupby("key").transform("first")


+class TransformEngine:
+    def setup(self):
+        N = 10 ** 3
+        data = DataFrame(
+            {0: [str(i) for i in range(100)] * N, 1: list(range(100)) * N},
+            columns=[0, 1],
+        )
+        self.grouper = data.groupby(0)
+
+    def time_series_numba(self):
+        def function(values, index):
+            return values * 5
+
+        self.grouper[1].transform(function, engine="numba")
+
+    def time_series_cython(self):
+        def function(values):
+            return values * 5
+
+        self.grouper[1].transform(function, engine="cython")
+
+    def time_dataframe_numba(self):
+        def function(values, index):
+            return values * 5
+
+        self.grouper.transform(function, engine="numba")
+
+    def time_dataframe_cython(self):
+        def function(values):
+            return values * 5
+
+        self.grouper.transform(function, engine="cython")
+
+
 from .pandas_vb_common import setup  # noqa: F401 isort:skip

ci/code_checks.sh

+7-1
@@ -150,7 +150,13 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     # Check for imports from pandas._testing instead of `import pandas._testing as tm`
     invgrep -R --include="*.py*" -E "from pandas._testing import" pandas/tests
     RET=$(($RET + $?)) ; echo $MSG "DONE"
-    invgrep -R --include="*.py*" -E "from pandas.util import testing as tm" pandas/tests
+    invgrep -R --include="*.py*" -E "from pandas import _testing as tm" pandas/tests
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
+    # No direct imports from conftest
+    invgrep -R --include="*.py*" -E "conftest import" pandas/tests
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+    invgrep -R --include="*.py*" -E "import conftest" pandas/tests
     RET=$(($RET + $?)) ; echo $MSG "DONE"

     MSG='Check for use of exec' ; echo $MSG

ci/deps/azure-36-minimum_versions.yaml

+1-1
@@ -21,7 +21,7 @@ dependencies:
   - numexpr=2.6.2
   - numpy=1.13.3
   - openpyxl=2.5.7
-  - pytables=3.4.2
+  - pytables=3.4.3
   - python-dateutil=2.7.3
   - pytz=2017.2
   - scipy=0.19.0

doc/source/getting_started/index.rst

+1-1
@@ -398,7 +398,7 @@ data set, a sliding window of the data or grouped by categories. The latter is a
 <div class="card-body">

 Change the structure of your data table in multiple ways. You can :func:`~pandas.melt` your data table from wide to long/tidy form or :func:`~pandas.pivot`
-from long to wide format. With aggregations built-in, a pivot table is created with a sinlge command.
+from long to wide format. With aggregations built-in, a pivot table is created with a single command.

 .. image:: ../_static/schemas/07_melt.svg
    :align: center

doc/source/getting_started/install.rst

+2-2
@@ -262,7 +262,7 @@ BeautifulSoup4 4.6.0 HTML parser for read_html (see :ref
 Jinja2 Conditional formatting with DataFrame.style
 PyQt4 Clipboard I/O
 PyQt5 Clipboard I/O
-PyTables 3.4.2 HDF5-based reading / writing
+PyTables 3.4.3 HDF5-based reading / writing
 SQLAlchemy 1.1.4 SQL support for databases other than sqlite
 SciPy 0.19.0 Miscellaneous statistical functions
 XLsxWriter 0.9.8 Excel writing
@@ -279,7 +279,7 @@ psycopg2 PostgreSQL engine for sqlalchemy
 pyarrow 0.12.0 Parquet, ORC (requires 0.13.0), and feather reading / writing
 pymysql 0.7.11 MySQL engine for sqlalchemy
 pyreadstat SPSS files (.sav) reading
-pytables 3.4.2 HDF5 reading / writing
+pytables 3.4.3 HDF5 reading / writing
 pyxlsb 1.0.6 Reading for xlsb files
 qtpy Clipboard I/O
 s3fs 0.3.0 Amazon S3 access

doc/source/getting_started/intro_tutorials/10_text_data.rst

+1-1
@@ -199,7 +199,7 @@ names in the ``Name`` column. By using pandas string methods, the

 Next, we need to get the corresponding location, preferably the index
 label, in the table for which the name length is the largest. The
-:meth:`~Series.idxmax`` method does exactly that. It is not a string method and is
+:meth:`~Series.idxmax` method does exactly that. It is not a string method and is
 applied to integers, so no ``str`` is used.

 .. ipython:: python

doc/source/user_guide/computation.rst

+31-4
@@ -312,15 +312,35 @@ We provide a number of common statistical functions:
     :meth:`~Rolling.median`, Arithmetic median of values
     :meth:`~Rolling.min`, Minimum
     :meth:`~Rolling.max`, Maximum
-    :meth:`~Rolling.std`, Bessel-corrected sample standard deviation
-    :meth:`~Rolling.var`, Unbiased variance
+    :meth:`~Rolling.std`, Sample standard deviation
+    :meth:`~Rolling.var`, Sample variance
     :meth:`~Rolling.skew`, Sample skewness (3rd moment)
     :meth:`~Rolling.kurt`, Sample kurtosis (4th moment)
     :meth:`~Rolling.quantile`, Sample quantile (value at %)
     :meth:`~Rolling.apply`, Generic apply
     :meth:`~Rolling.cov`, Unbiased covariance (binary)
     :meth:`~Rolling.corr`, Correlation (binary)

+.. _computation.window_variance.caveats:
+
+.. note::
+
+    Please note that :meth:`~Rolling.std` and :meth:`~Rolling.var` use the sample
+    variance formula by default, i.e. the sum of squared differences is divided by
+    ``window_size - 1`` and not by ``window_size`` during averaging. In statistics,
+    we use sample when the dataset is drawn from a larger population that we
+    don't have access to. Using it implies that the data in our window is a
+    random sample from the population, and we are interested not in the variance
+    inside the specific window but in the variance of some general window that
+    our windows represent. In this situation, using the sample variance formula
+    results in an unbiased estimator and so is preferred.
+
+    Usually, we are instead interested in the variance of each window as we slide
+    it over the data, and in this case we should specify ``ddof=0`` when calling
+    these methods to use population variance instead of sample variance. Using
+    sample variance under the circumstances would result in a biased estimator
+    of the variable we are trying to determine.
+
 .. _stats.rolling_apply:

 Rolling apply
@@ -848,15 +868,22 @@ Method summary
     :meth:`~Expanding.median`, Arithmetic median of values
     :meth:`~Expanding.min`, Minimum
     :meth:`~Expanding.max`, Maximum
-    :meth:`~Expanding.std`, Unbiased standard deviation
-    :meth:`~Expanding.var`, Unbiased variance
+    :meth:`~Expanding.std`, Sample standard deviation
+    :meth:`~Expanding.var`, Sample variance
     :meth:`~Expanding.skew`, Unbiased skewness (3rd moment)
     :meth:`~Expanding.kurt`, Unbiased kurtosis (4th moment)
     :meth:`~Expanding.quantile`, Sample quantile (value at %)
     :meth:`~Expanding.apply`, Generic apply
     :meth:`~Expanding.cov`, Unbiased covariance (binary)
     :meth:`~Expanding.corr`, Correlation (binary)

+.. note::
+
+    Using sample variance formulas for :meth:`~Expanding.std` and
+    :meth:`~Expanding.var` comes with the same caveats as using them with rolling
+    windows. See :ref:`this section <computation.window_variance.caveats>` for more
+    information.
+
 .. currentmodule:: pandas

 Aside from not having a ``window`` parameter, these functions have the same
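
The note added in this file contrasts the two divisors for a windowed variance. As a rough illustration only (not code from this commit), the sketch below computes both over a sliding window using the standard library's `statistics` module. The helper name `rolling_variances` and the sample data are hypothetical. In pandas the same choice is made through the `ddof` argument, for example `Series.rolling(3).var(ddof=0)` for the population form.

```python
from statistics import pvariance, variance


def rolling_variances(values, window):
    """Return (sample, population) variance pairs for each full window.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    pairs = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        # variance() divides by window - 1 (the ddof=1 default in pandas);
        # pvariance() divides by window (what ddof=0 selects).
        pairs.append((variance(chunk), pvariance(chunk)))
    return pairs


data = [1.0, 2.0, 4.0, 7.0, 11.0]  # made-up sample values
pairs = rolling_variances(data, window=3)
for sample_var, population_var in pairs:
    print(f"sample={sample_var:.4f}  population={population_var:.4f}")
```

For every window the population value equals the sample value scaled by `(window - 1) / window`, so the two differ most for short windows.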

doc/source/user_guide/cookbook.rst

-27
@@ -1333,33 +1333,6 @@ Values can be set to NaT using np.nan, similar to datetime
    y[1] = np.nan
    y

-Aliasing axis names
--------------------
-
-To globally provide aliases for axis names, one can define these 2 functions:
-
-.. ipython:: python
-
-   def set_axis_alias(cls, axis, alias):
-       if axis not in cls._AXIS_NUMBERS:
-           raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
-       cls._AXIS_ALIASES[alias] = axis
-
-.. ipython:: python
-
-   def clear_axis_alias(cls, axis, alias):
-       if axis not in cls._AXIS_NUMBERS:
-           raise Exception("invalid axis [%s] for alias [%s]" % (axis, alias))
-       cls._AXIS_ALIASES.pop(alias, None)
-
-.. ipython:: python
-
-   set_axis_alias(pd.DataFrame, 'columns', 'myaxis2')
-   df2 = pd.DataFrame(np.random.randn(3, 2), columns=['c1', 'c2'],
-                      index=['i1', 'i2', 'i3'])
-   df2.sum(axis='myaxis2')
-   clear_axis_alias(pd.DataFrame, 'columns', 'myaxis2')
-
 Creating example data
 ---------------------


doc/source/whatsnew/v0.14.0.rst

+5-5
@@ -1,7 +1,7 @@
 .. _whatsnew_0140:

-v0.14.0 (May 31 , 2014)
------------------------
+Version 0.14.0 (May 31 , 2014)
+------------------------------

 {{ header }}

@@ -321,7 +321,7 @@ Text parsing API changes

 .. _whatsnew_0140.groupby:

-Groupby API changes
+GroupBy API changes
 ~~~~~~~~~~~~~~~~~~~

 More consistent behavior for some groupby methods:
@@ -473,8 +473,8 @@ Some other enhancements to the sql functions include:

 .. _whatsnew_0140.slicers:

-Multiindexing using slicers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Multi-indexing using slicers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 In 0.14.0 we added a new way to slice MultiIndexed objects.
 You can slice a MultiIndex by providing multiple indexers.

doc/source/whatsnew/v0.14.1.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0141:

-v0.14.1 (July 11, 2014)
------------------------
+Version 0.14.1 (July 11, 2014)
+------------------------------

 {{ header }}


doc/source/whatsnew/v0.15.0.rst

+6-6
@@ -1,7 +1,7 @@
 .. _whatsnew_0150:

-v0.15.0 (October 18, 2014)
---------------------------
+Version 0.15.0 (October 18, 2014)
+---------------------------------

 {{ header }}

@@ -105,7 +105,7 @@ For full docs, see the :ref:`categorical introduction <categorical>` and the

 .. _whatsnew_0150.timedeltaindex:

-TimedeltaIndex/Scalar
+TimedeltaIndex/scalar
 ^^^^^^^^^^^^^^^^^^^^^

 We introduce a new scalar type ``Timedelta``, which is a subclass of ``datetime.timedelta``, and behaves in a similar manner,
@@ -247,8 +247,8 @@ Additionally :meth:`~pandas.DataFrame.memory_usage` is an available method for a

 .. _whatsnew_0150.dt:

-.dt accessor
-^^^^^^^^^^^^
+Series.dt accessor
+^^^^^^^^^^^^^^^^^^

 ``Series`` has gained an accessor to succinctly return datetime like properties for the *values* of the Series, if its a datetime/period like Series. (:issue:`7207`)
 This will return a Series, indexed like the existing Series. See the :ref:`docs <basics.dt_accessors>`
@@ -600,7 +600,7 @@ Rolling/expanding moments improvements

 .. _whatsnew_0150.sql:

-Improvements in the SQL io module
+Improvements in the SQL IO module
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 - Added support for a ``chunksize`` parameter to ``to_sql`` function. This allows DataFrame to be written in chunks and avoid packet-size overflow errors (:issue:`8062`).

doc/source/whatsnew/v0.15.1.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0151:

-v0.15.1 (November 9, 2014)
---------------------------
+Version 0.15.1 (November 9, 2014)
+---------------------------------

 {{ header }}


doc/source/whatsnew/v0.15.2.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0152:

-v0.15.2 (December 12, 2014)
----------------------------
+Version 0.15.2 (December 12, 2014)
+----------------------------------

 {{ header }}


doc/source/whatsnew/v0.16.0.rst

+3-3
@@ -1,7 +1,7 @@
 .. _whatsnew_0160:

-v0.16.0 (March 22, 2015)
-------------------------
+Version 0.16.0 (March 22, 2015)
+-------------------------------

 {{ header }}

@@ -218,7 +218,7 @@ Backwards incompatible API changes

 .. _whatsnew_0160.api_breaking.timedelta:

-Changes in Timedelta
+Changes in timedelta
 ^^^^^^^^^^^^^^^^^^^^

 In v0.15.0 a new scalar type ``Timedelta`` was introduced, that is a

doc/source/whatsnew/v0.16.1.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0161:

-v0.16.1 (May 11, 2015)
-----------------------
+Version 0.16.1 (May 11, 2015)
+-----------------------------

 {{ header }}


doc/source/whatsnew/v0.16.2.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0162:

-v0.16.2 (June 12, 2015)
------------------------
+Version 0.16.2 (June 12, 2015)
+------------------------------

 {{ header }}


doc/source/whatsnew/v0.17.0.rst

+6-6
@@ -1,7 +1,7 @@
 .. _whatsnew_0170:

-v0.17.0 (October 9, 2015)
--------------------------
+Version 0.17.0 (October 9, 2015)
+--------------------------------

 {{ header }}

@@ -181,8 +181,8 @@ Each method signature only includes relevant arguments. Currently, these are lim
 Additional methods for ``dt`` accessor
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-strftime
-""""""""
+Series.dt.strftime
+""""""""""""""""""

 We are now supporting a ``Series.dt.strftime`` method for datetime-likes to generate a formatted string (:issue:`10110`). Examples:

@@ -202,8 +202,8 @@ We are now supporting a ``Series.dt.strftime`` method for datetime-likes to gene

 The string format is as the python standard library and details can be found `here <https://docs.python.org/2/library/datetime.html#strftime-and-strptime-behavior>`_

-total_seconds
-"""""""""""""
+Series.dt.total_seconds
+"""""""""""""""""""""""

 ``pd.Series`` of type ``timedelta64`` has new method ``.dt.total_seconds()`` returning the duration of the timedelta in seconds (:issue:`10817`)


doc/source/whatsnew/v0.17.1.rst

+2-2
@@ -1,7 +1,7 @@
 .. _whatsnew_0171:

-v0.17.1 (November 21, 2015)
----------------------------
+Version 0.17.1 (November 21, 2015)
+----------------------------------

 {{ header }}

