0.23.1 backports 2 #21442

Merged
merged 24 commits into from
Jun 12, 2018
1b8c041
CI: revert skip of geopandas downstream test (#21217)
jorisvandenbossche Jun 4, 2018
21775b6
Improve performance of CategoricalIndex.is_unique (#21107)
topper-123 Jun 4, 2018
9b83762
DOC: whatsnew note for MultiIndex Sorting Fix (#21316)
WillAyd Jun 5, 2018
9296995
DOC: fix mistake in Series.str.cat (#21330)
h-vetinari Jun 6, 2018
3cd496b
BUG: Using DatetimeIndex.date with timezone returns incorrect date (#…
jamestran201-alt Jun 7, 2018
12e9ef6
BUG: Fixed concat warning message (#21362)
TomAugspurger Jun 7, 2018
222dff8
Revert "enable multivalues insert (#19664)" (#21355)
jorisvandenbossche Jun 7, 2018
85c0177
BUG: invalid rolling window on empty input (#21291)
uds5501 Jun 8, 2018
7688534
DOC: clean-up 0.23.1 whatsnew (#21368)
jorisvandenbossche Jun 7, 2018
17dad0d
Fix #21356: JSON nested_to_record Silently Drops Top-Level None Value…
daminisatya Jun 8, 2018
5ceba74
Revert change to comparison op with datetime.date objects (#21361)
TomAugspurger Jun 8, 2018
c92d2f9
REGR: NA-values in ctors with string dtype (#21366)
TomAugspurger Jun 8, 2018
c64d52f
BUG: Fix empty Data frames to JSON round-trippable back to data frame…
Jun 8, 2018
6eea28a
BLD: include dll in package_data on Windows (#21321)
chris-b1 Jun 8, 2018
64409d9
REGR: allow merging on object boolean columns (#21310)
jorisvandenbossche Jun 8, 2018
c5850c1
BUG: dropna incorrect with categoricals in pivot_table (#21252)
jreback Jun 7, 2018
5359aea
Fix nested_to_record with None values in nested levels (#21164)
ssikdar1 Jun 7, 2018
182a16d
Fix typo in error message in the PlanePlot class (#21350)
CianciuStyles Jun 7, 2018
9646587
Add Featuretools to Pandas Ecosystem Page (#21297)
kmax12 Jun 5, 2018
8350429
BUG: Fix encoding error in to_csv compression (#21300)
minggli Jun 5, 2018
c65c124
BUG: Allow IntervalIndex to be constructed from categorical data with…
jschendel Jun 4, 2018
8c0d56b
DOC: update whatsnew 0.23.1 (#21387)
jorisvandenbossche Jun 8, 2018
a32c4e4
Fixup whatsnew
TomAugspurger Jun 12, 2018
47dc5e8
Backport fixture
TomAugspurger Jun 12, 2018
3 changes: 3 additions & 0 deletions doc/source/ecosystem.rst
@@ -38,7 +38,10 @@ Statsmodels leverages pandas objects as the underlying data container for comput
Use pandas DataFrames in your `scikit-learn <http://scikit-learn.org/>`__
ML pipeline.

`Featuretools <https://github.com/featuretools/featuretools/>`__
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Featuretools is a Python library for automated feature engineering built on top of pandas. It excels at transforming temporal and relational datasets into feature matrices for machine learning using reusable feature engineering "primitives". Users can contribute their own primitives in Python and share them with the rest of the community.

.. _ecosystem.visualization:

8 changes: 0 additions & 8 deletions doc/source/io.rst
@@ -4719,14 +4719,6 @@ writes ``data`` to the database in batches of 1000 rows at a time:

data.to_sql('data_chunked', engine, chunksize=1000)

.. note::

The function :func:`~pandas.DataFrame.to_sql` will perform a multivalue
insert if the engine dialect ``supports_multivalues_insert``. This will
greatly speed up the insert in some cases.

SQL data types
++++++++++++++

:func:`~pandas.DataFrame.to_sql` will try to map your data to an appropriate
SQL data type based on the dtype of the data. When you have columns of dtype
142 changes: 83 additions & 59 deletions doc/source/whatsnew/v0.23.1.txt
@@ -10,102 +10,126 @@ and bug fixes. We recommend that all users upgrade to this version.
:local:
:backlinks: none

.. _whatsnew_0231.enhancements:

New features
~~~~~~~~~~~~


.. _whatsnew_0231.deprecations:

Deprecations
~~~~~~~~~~~~

-
-
.. _whatsnew_0231.fixed_regressions:

Fixed Regressions
~~~~~~~~~~~~~~~~~

**Comparing Series with datetime.date**

We've reverted a 0.23.0 change to comparing a :class:`Series` holding datetimes and a ``datetime.date`` object (:issue:`21152`).
In pandas 0.22 and earlier, comparing a Series holding datetimes and ``datetime.date`` objects would coerce the ``datetime.date`` to a datetime before comparing.
This was inconsistent with Python, NumPy, and :class:`DatetimeIndex`, which never consider a datetime and ``datetime.date`` equal.

In 0.23.0, we unified operations between DatetimeIndex and Series, and in the process changed comparisons between a Series of datetimes and ``datetime.date`` without warning.

We've temporarily restored the 0.22.0 behavior, so datetimes and dates may again compare equal, but will restore the 0.23.0 behavior in a future release.

To summarize, here's the behavior in 0.22.0, 0.23.0, 0.23.1:

.. code-block:: python

# 0.22.0... Silently coerce the datetime.date
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
0 True
1 False
dtype: bool

# 0.23.0... Do not coerce the datetime.date
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
0 False
1 False
dtype: bool

# 0.23.1... Coerce the datetime.date with a warning
>>> Series(pd.date_range('2017', periods=2)) == datetime.date(2017, 1, 1)
/bin/python:1: FutureWarning: Comparing Series of datetimes with 'datetime.date'. Currently, the
'datetime.date' is coerced to a datetime. In the future pandas will
not coerce, and the values not compare equal to the 'datetime.date'.
To retain the current behavior, convert the 'datetime.date' to a
datetime with 'pd.Timestamp'.
#!/bin/python3
0 True
1 False
dtype: bool

In addition, ordering comparisons will raise a ``TypeError`` in the future.
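The warning's suggested workaround can be exercised directly. A small sketch (not part of this diff) showing that converting the ``datetime.date`` to a ``Timestamp`` keeps the comparison stable across all three releases:

```python
import datetime

import pandas as pd

s = pd.Series(pd.date_range('2017', periods=2))

# Converting the date to a Timestamp sidesteps the coercion question
# entirely, so the result is the same in 0.22.x, 0.23.x, and later.
result = s == pd.Timestamp(datetime.date(2017, 1, 1))
print(result.tolist())  # [True, False]
```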

**Other Fixes**

- Reverted the ability of :func:`~DataFrame.to_sql` to perform multivalue
inserts as this caused regression in certain cases (:issue:`21103`).
In the future this will be made configurable.
- Fixed regression in the :attr:`DatetimeIndex.date` and :attr:`DatetimeIndex.time`
attributes in case of timezone-aware data: :attr:`DatetimeIndex.time` returned
a tz-aware time instead of tz-naive (:issue:`21267`) and :attr:`DatetimeIndex.date`
returned incorrect date when the input date has a non-UTC timezone (:issue:`21230`).
- Fixed regression in :meth:`pandas.io.json.json_normalize` when called with ``None`` values
in nested levels in JSON, and to not drop keys with value as `None` (:issue:`21158`, :issue:`21356`).
- Bug in :meth:`~DataFrame.to_csv` causes encoding error when compression and encoding are specified (:issue:`21241`, :issue:`21118`)
- Bug preventing pandas from being importable with -OO optimization (:issue:`21071`)
- Bug in :meth:`Categorical.fillna` incorrectly raising a ``TypeError`` when the individual categories are iterable and ``value`` is an iterable (:issue:`21097`, :issue:`19788`)
- Fixed regression in constructors coercing NA values like ``None`` to strings when passing ``dtype=str`` (:issue:`21083`)
- Regression in :func:`pivot_table` where an ordered ``Categorical`` with missing
values for the pivot's ``index`` would give a mis-aligned result (:issue:`21133`)
- Fixed regression in merging on boolean index/columns (:issue:`21119`).

.. _whatsnew_0231.performance:

Performance Improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`)
-
-

Documentation Changes
~~~~~~~~~~~~~~~~~~~~~
- Improved performance of :meth:`CategoricalIndex.is_unique` (:issue:`21107`)

-
-

.. _whatsnew_0231.bug_fixes:

Bug Fixes
~~~~~~~~~

Groupby/Resample/Rolling
^^^^^^^^^^^^^^^^^^^^^^^^
**Groupby/Resample/Rolling**

- Bug in :func:`DataFrame.agg` where applying multiple aggregation functions to a :class:`DataFrame` with duplicated column names would cause a stack overflow (:issue:`21063`)
- Bug in :func:`pandas.core.groupby.GroupBy.ffill` and :func:`pandas.core.groupby.GroupBy.bfill` where the fill within a grouping would not always be applied as intended due to the implementations' use of a non-stable sort (:issue:`21207`)
- Bug in :func:`pandas.core.groupby.GroupBy.rank` where results did not scale to 100% when specifying ``method='dense'`` and ``pct=True``
- Bug in :func:`pandas.DataFrame.rolling` and :func:`pandas.Series.rolling` which incorrectly accepted a 0 window size rather than raising (:issue:`21286`)

Strings
^^^^^^^
**Data-type specific**

- Bug in :meth:`Series.str.replace()` where the method throws ``TypeError`` on Python 3.5.2 (:issue:`21078`)

Timedelta
^^^^^^^^^
- Bug in :class:`Timedelta` where passing a float with a unit would prematurely round the float precision (:issue:`14156`)
- Bug in :func:`pandas.testing.assert_index_equal` which raised ``AssertionError`` incorrectly, when comparing two :class:`CategoricalIndex` objects with param ``check_categorical=False`` (:issue:`19776`)

Categorical
^^^^^^^^^^^

- Bug in :func:`pandas.util.testing.assert_index_equal` which raised ``AssertionError`` incorrectly, when comparing two :class:`CategoricalIndex` objects with param ``check_categorical=False`` (:issue:`19776`)
- Bug in :meth:`Categorical.fillna` incorrectly raising a ``TypeError`` when the individual categories are iterable and ``value`` is an iterable (:issue:`21097`, :issue:`19788`)

Sparse
^^^^^^
**Sparse**

- Bug in :attr:`SparseArray.shape` which previously only returned the shape :attr:`SparseArray.sp_values` (:issue:`21126`)

Conversion
^^^^^^^^^^

-
-

Indexing
^^^^^^^^
**Indexing**

- Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
- Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
- Bug in :meth:`MultiIndex.set_names` where error raised for a ``MultiIndex`` with ``nlevels == 1`` (:issue:`21149`)
-
- Bug in :class:`IntervalIndex` constructors where creating an ``IntervalIndex`` from categorical data was not fully supported (:issue:`21243`, :issue:`21253`)
- Bug in :meth:`MultiIndex.sort_index` which was not guaranteed to sort correctly with ``level=1``; this was also causing data misalignment in particular :meth:`DataFrame.stack` operations (:issue:`20994`, :issue:`20945`, :issue:`21052`)

I/O
^^^
**Plotting**

- Bug in IO methods specifying ``compression='zip'`` which produced uncompressed zip archives (:issue:`17778`, :issue:`21144`)
- Bug in :meth:`DataFrame.to_stata` which prevented exporting DataFrames to buffers and most file-like objects (:issue:`21041`)
-
- New keywords ``sharex`` and ``sharey`` to turn on/off sharing of x/y-axis by subplots generated with ``DataFrame.groupby().boxplot()`` (:issue:`20968`)

Plotting
^^^^^^^^
**I/O**

-
-
- Bug in IO methods specifying ``compression='zip'`` which produced uncompressed zip archives (:issue:`17778`, :issue:`21144`)
- Bug in :meth:`DataFrame.to_stata` which prevented exporting DataFrames to buffers and most file-like objects (:issue:`21041`)
- Bug in :meth:`read_stata` and :class:`StataReader` which did not correctly decode utf-8 strings on Python 3 from Stata 14 files (dta version 118) (:issue:`21244`)
- Bug in IO JSON :func:`read_json` reading empty JSON schema with ``orient='table'`` back to :class:`DataFrame` caused an error (:issue:`21287`)

Reshaping
^^^^^^^^^
**Reshaping**

- Bug in :func:`concat` where error was raised in concatenating :class:`Series` with numpy scalar and tuple names (:issue:`21015`)
-
- Bug in :func:`concat` warning message providing the wrong guidance for future behavior (:issue:`21101`)

Other
^^^^^
**Other**

- Tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`)
- Bug preventing pandas from being importable with -OO optimization (:issue:`21071`)
- Bug preventing pandas being used on Windows without C++ redistributable installed (:issue:`21106`)
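Several of the I/O fixes above are easy to check end to end. For instance, a round trip covering the ``to_csv`` compression-plus-encoding bug (:issue:`21241`) might look like this sketch (the file path and data are illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['café', 'naïve']})

# Writing with compression AND a non-default encoding used to raise an
# encoding error; after the fix the round trip preserves the data exactly.
path = os.path.join(tempfile.mkdtemp(), 'names.csv.gz')
df.to_csv(path, compression='gzip', encoding='utf-8', index=False)
restored = pd.read_csv(path, compression='gzip', encoding='utf-8')
assert restored.equals(df)
```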
2 changes: 1 addition & 1 deletion pandas/_libs/tslib.pyx
@@ -77,7 +77,7 @@ cdef inline object create_time_from_ts(
int64_t value, pandas_datetimestruct dts,
object tz, object freq):
""" convenience routine to construct a datetime.time from its parts """
return time(dts.hour, dts.min, dts.sec, dts.us, tz)
return time(dts.hour, dts.min, dts.sec, dts.us)


def ints_to_pydatetime(ndarray[int64_t] arr, tz=None, freq=None,
21 changes: 21 additions & 0 deletions pandas/conftest.py
@@ -105,6 +105,16 @@ def compression(request):
return request.param


@pytest.fixture(params=['gzip', 'bz2', 'zip',
pytest.param('xz', marks=td.skip_if_no_lzma)])
def compression_only(request):
"""
Fixture for trying common compression types in compression tests excluding
uncompressed case
"""
return request.param


@pytest.fixture(scope='module')
def datetime_tz_utc():
from datetime import timezone
@@ -149,3 +159,14 @@ def tz_aware_fixture(request):
Fixture for trying explicit timezones: {0}
"""
return request.param


@pytest.fixture(params=[str, 'str', 'U'])
def string_dtype(request):
"""Parametrized fixture for string dtypes.

* str
* 'str'
* 'U'
"""
return request.param
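A sketch of what these two new fixtures feed into tests: ``compression_only`` supplies every supported codec while deliberately excluding the uncompressed case, and ``string_dtype``'s three params all name the same NumPy unicode dtype. The round-trip loop below is illustrative, not from this diff:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# The codecs ``compression_only`` parametrizes over; a test using the
# fixture runs once per entry.
codecs = ['gzip', 'bz2', 'zip', 'xz']

df = pd.DataFrame({'a': [1, 2, 3]})
tmpdir = tempfile.mkdtemp()
for codec in codecs:
    path = os.path.join(tmpdir, 'data.csv.' + codec)
    df.to_csv(path, compression=codec, index=False)
    assert pd.read_csv(path, compression=codec).equals(df)

# The three ``string_dtype`` params are equivalent spellings of the
# NumPy unicode string dtype.
for param in [str, 'str', 'U']:
    assert np.dtype(param).kind == 'U'
```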
42 changes: 42 additions & 0 deletions pandas/core/dtypes/cast.py
@@ -1227,3 +1227,45 @@ def construct_1d_object_array_from_listlike(values):
result = np.empty(len(values), dtype='object')
result[:] = values
return result


def construct_1d_ndarray_preserving_na(values, dtype=None, copy=False):
"""
Construct a new ndarray, coercing `values` to `dtype`, preserving NA.

Parameters
----------
values : Sequence
dtype : numpy.dtype, optional
copy : bool, default False
Note that copies may still be made with ``copy=False`` if casting
is required.

Returns
-------
arr : ndarray[dtype]

Examples
--------
>>> np.array([1.0, 2.0, None], dtype='str')
array(['1.0', '2.0', 'None'], dtype='<U4')

>>> construct_1d_ndarray_preserving_na([1.0, 2.0, None], dtype='str')
array(['1.0', '2.0', None], dtype=object)

"""
subarr = np.array(values, dtype=dtype, copy=copy)

if dtype is not None and dtype.kind in ("U", "S"):
# GH-21083
# We can't just return np.array(subarr, dtype='str') since
# NumPy will convert the non-string objects into strings
# including NA values. So we have to go
# string -> object -> update NA, which requires an
# additional pass over the data.
na_values = isna(values)
subarr2 = subarr.astype(object)
subarr2[na_values] = np.asarray(values, dtype=object)[na_values]
subarr = subarr2

return subarr
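The two-pass trick can be reproduced without pandas internals. A standalone sketch of the same NA-preserving construction (``preserve_na`` is a hypothetical stand-in, not the pandas helper):

```python
import numpy as np

def preserve_na(values, dtype):
    """Coerce values to dtype but keep NA entries (None/NaN) un-stringified."""
    subarr = np.array(values, dtype=dtype)
    if np.dtype(dtype).kind in ('U', 'S'):
        # Second pass: restore the original NA objects that NumPy
        # stringified during the first conversion.
        na_mask = np.array([v is None or v != v for v in values], dtype=bool)
        out = subarr.astype(object)
        out[na_mask] = np.asarray(values, dtype=object)[na_mask]
        return out
    return subarr

result = preserve_na([1.0, 2.0, None], 'str')
print(list(result))  # ['1.0', '2.0', None]
```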
4 changes: 2 additions & 2 deletions pandas/core/indexes/api.py
@@ -24,9 +24,9 @@
Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=True'.
To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass sort=False
To retain the current behavior and silence the warning, pass 'sort=True'.
""")


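The corrected wording matters because the two keyword values were swapped: ``sort=False`` selects the future (non-sorting) behavior and ``sort=True`` the historical one. A sketch against the modern API, where the non-sorting behavior is now the default:

```python
import pandas as pd

df1 = pd.DataFrame({'b': [1], 'a': [2]})
df2 = pd.DataFrame({'a': [3], 'c': [4]})

# sort=False keeps the non-concatenation axis in order of appearance
# instead of sorting it alphabetically.
result = pd.concat([df1, df2], sort=False)
print(list(result.columns))  # ['b', 'a', 'c']
```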
2 changes: 1 addition & 1 deletion pandas/core/indexes/category.py
@@ -378,7 +378,7 @@ def _engine(self):
# introspection
@cache_readonly
def is_unique(self):
return not self.duplicated().any()
return self._engine.is_unique

@property
def is_monotonic_increasing(self):
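The engine-based rewrite changes only speed, not semantics. A quick check of what the property reports:

```python
import pandas as pd

# Uniqueness is judged on the values actually present, not on the
# declared categories (unused category 'd' does not matter).
idx = pd.CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c', 'd'])
assert idx.is_unique

dup = pd.CategoricalIndex(['a', 'a', 'b'])
assert not dup.is_unique
```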
22 changes: 20 additions & 2 deletions pandas/core/indexes/datetimes.py
@@ -2032,15 +2032,33 @@ def time(self):
"""
Returns numpy array of datetime.time. The time part of the Timestamps.
"""
return libts.ints_to_pydatetime(self.asi8, self.tz, box="time")

# If the Timestamps have a timezone that is not UTC,
# convert them into their i8 representation while
# keeping their timezone and not using UTC
if (self.tz is not None and self.tz is not utc):
timestamps = self._local_timestamps()
else:
timestamps = self.asi8

return libts.ints_to_pydatetime(timestamps, box="time")

@property
def date(self):
"""
Returns numpy array of python datetime.date objects (namely, the date
part of Timestamps without timezone information).
"""
return libts.ints_to_pydatetime(self.normalize().asi8, box="date")

# If the Timestamps have a timezone that is not UTC,
# convert them into their i8 representation while
# keeping their timezone and not using UTC
if (self.tz is not None and self.tz is not utc):
timestamps = self._local_timestamps()
else:
timestamps = self.asi8

return libts.ints_to_pydatetime(timestamps, box="date")

def normalize(self):
"""
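With the fix, both accessors report wall-clock values for tz-aware data. A sketch using a timestamp whose local and UTC calendar dates diverge:

```python
import datetime

import pandas as pd

# 23:30 Eastern is already the next day in UTC, so before the fix
# .date reported June 9 instead of the local June 8.
idx = pd.DatetimeIndex(['2018-06-08 23:30'], tz='US/Eastern')

assert idx.date[0] == datetime.date(2018, 6, 8)
assert idx.time[0] == datetime.time(23, 30)
assert idx.time[0].tzinfo is None  # .time stays tz-naive
```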
4 changes: 4 additions & 0 deletions pandas/core/indexes/interval.py
@@ -112,6 +112,10 @@ def maybe_convert_platform_interval(values):
-------
array
"""
if is_categorical_dtype(values):
# GH 21243/21253
values = np.array(values)

if isinstance(values, (list, tuple)) and len(values) == 0:
# GH 19016
# empty lists/tuples get object dtype by default, but this is not
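The new branch converts categorical input to a plain array before the interval checks run, so intervals that pass through a ``Categorical`` can be rebuilt into an ``IntervalIndex``. A minimal round-trip check:

```python
import pandas as pd

# GH 21243/21253: construct an IntervalIndex back from categorical data.
intervals = pd.IntervalIndex.from_breaks([0, 1, 2, 3])
cat = pd.Categorical(intervals)
rebuilt = pd.IntervalIndex(cat)
assert rebuilt.equals(intervals)
```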