.. warning::

   Starting with the 0.25.x series of releases, pandas only supports Python 3.5 and higher.
   See :ref:`install.dropping-27` for more details.

{{ header }}
These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog including other versions of pandas.
- Added support for ISO week year format ('%G-%V-%u') when parsing datetimes using :func:`to_datetime` (:issue:`16607`)
- Indexing of ``DataFrame`` and ``Series`` now accepts zero-dimensional ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :class:`datetime.time` objects with timezones (:issue:`24043`)
- ``Series.str`` has gained the :meth:`Series.str.casefold` method, which removes all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
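
A few of the enhancements listed above can be tried out in a short session. The following is a minimal sketch rather than an excerpt from the pandas test suite; the sample strings, column names, and variable names are illustrative assumptions.

```python
import pandas as pd

# ISO 8601 week date: ISO year 2019, week 1, weekday 1 (Monday).
# ISO week 1 of 2019 begins on Monday 2018-12-31.
ts = pd.to_datetime("2019-01-1", format="%G-%V-%u")

# casefold() is a more aggressive lower(): the German sharp s becomes "ss".
folded = pd.Series(["Straße", "STRASSE"]).str.casefold()

# errors="raise" turns a rename of a nonexistent column into a KeyError
# (the default, errors="ignore", silently skips it).
df = pd.DataFrame({"a": [1, 2]})
try:
    df.rename(columns={"missing": "b"}, errors="raise")
    rename_raised = False
except KeyError:
    rename_raised = True

# RangeIndex exposes the values it was constructed from.
idx = pd.RangeIndex(0, 10, 2)
bounds = (idx.start, idx.stop, idx.step)
```

Here ``ts`` is the Monday that opens ISO week 1, both entries of ``folded`` compare equal, and ``rename_raised`` is ``True``.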
Indexing a :class:`DataFrame` or :class:`Series` with a :class:`DatetimeIndex` with a date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset is respected in indexing. (:issue:`24076`, :issue:`16785`)
Previous Behavior:

.. code-block:: ipython

    In [1]: df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))

    In [2]: df
    Out[2]:
                               0
    2019-01-01 00:00:00-08:00  0

    In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
    Out[3]:
                               0
    2019-01-01 00:00:00-08:00  0

New Behavior:

.. ipython:: python

    df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
    df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']

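
To make the difference concrete, the sketch below compares a slice whose bounds denote the same instant as the stored row with one whose bounds do not. It uses ``.loc`` for the slicing, and the variable names ``matching`` and ``not_matching`` are illustrative choices, not part of the pandas API.

```python
import pandas as pd

df = pd.DataFrame([0], index=pd.DatetimeIndex(["2019-01-01"], tz="US/Pacific"))

# 12:00 at UTC+04:00 is the same instant as 00:00 at UTC-08:00,
# so this slice contains the single row.
matching = df.loc["2019-01-01 12:00:00+04:00":"2019-01-01 13:00:00+04:00"]

# 00:00 at UTC+04:00 is 12:00 on 2018-12-31 in US/Pacific,
# so once the offset is respected this slice is empty.
not_matching = df.loc["2019-01-01 00:00:00+04:00":"2019-01-01 01:00:00+04:00"]
```

With the offset respected, ``matching`` has one row and ``not_matching`` has none.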
When passed DataFrames whose values are sparse, :func:`concat` will now return a
:class:`Series` or :class:`DataFrame` with sparse values, rather than a ``SparseDataFrame``
(:issue:`25702`).

.. ipython:: python

    df = pd.DataFrame({"A": pd.SparseArray([0, 1])})

Previous Behavior:

.. code-block:: ipython

    In [2]: type(pd.concat([df, df]))
    pandas.core.sparse.frame.SparseDataFrame

New Behavior:

.. ipython:: python

    type(pd.concat([df, df]))

This now matches the existing behavior of :func:`concat` on ``Series``
with sparse values. :func:`concat` will continue to return a ``SparseDataFrame``
when all the values are instances of ``SparseDataFrame``.
This change also affects routines using :func:`concat` internally, like :func:`get_dummies`,
which now returns a :class:`DataFrame` in all cases (previously a ``SparseDataFrame`` was
returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).
Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.
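
The new return type can be checked programmatically. This is a minimal sketch that builds the frame with ``pd.arrays.SparseArray``, the namespaced location of the array class, instead of the top-level ``pd.SparseArray`` spelling used above; the variable names are illustrative.

```python
import pandas as pd

# A regular DataFrame whose single column holds sparse values.
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

result = pd.concat([df, df])

# The container is a plain DataFrame; sparsity lives in the column dtype.
is_plain_frame = type(result) is pd.DataFrame
is_sparse_column = isinstance(result["A"].dtype, pd.SparseDtype)
dense_values = result["A"].sparse.to_dense().tolist()
```

Checking the column dtype rather than the container type is the more reliable test for sparsity after this change.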
Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions. Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`24942`). If installed, we now require:

=============== =============== ========
Package         Minimum Version Required
=============== =============== ========
beautifulsoup4  4.4.1
openpyxl        2.2.6
pymysql         0.6.6
pytz            2015.4
sqlalchemy      1.0.8
xlsxwriter      0.7.7
xlwt            1.0.0
pytest (dev)    4.0.2
=============== =============== ========

- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Bug in :meth:`DatetimeIndex.snap` which didn't preserve the ``name`` of the input :class:`Index` (:issue:`25575`)
- Deprecated the ``M`` (months) and ``Y`` (year) ``units`` parameter of :func:`pandas.to_timedelta`, :func:`pandas.Timedelta` and :func:`pandas.TimedeltaIndex` (:issue:`16344`)
- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64`/:meth:`Timedelta.to_timedelta64`. (:issue:`24416`)
- Removed (parts of) :class:`Panel` (:issue:`25047`, :issue:`25191`, :issue:`25231`)
- Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- :meth:`DataFrame.to_stata` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`)
- Improved performance of :meth:`Series.searchsorted`. The speedup is especially large when the dtype is int8/int16/int32 and the searched key is within the integer bounds for the dtype (:issue:`22034`)
- Improved performance of :meth:`pandas.core.groupby.GroupBy.quantile` (:issue:`20405`)
- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified, instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` where ``freq`` was not propagated (:issue:`25241`)
- Bug in :func:`Series.at` where setting :class:`Timestamp` with timezone raises ``TypeError`` (:issue:`25506`)
- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`)
- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`)
- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
- Bug in :meth:`Series.divmod` and :meth:`Series.rdivmod` which would raise an (incorrect) ``ValueError`` rather than return a pair of :class:`Series` objects as result (:issue:`25557`)
- Fixed misleading exception message in :meth:`Series.interpolate` if argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
- Fixed class type displayed in exception message in :meth:`DataFrame.dropna` if invalid ``axis`` parameter passed (:issue:`25555`)
- Bug in which an incorrect exception was raised by :meth:`pd.Timedelta` when testing membership in a :class:`MultiIndex` (:issue:`24570`)
- Bug in :func:`DataFrame.to_html()` where values were truncated using display options instead of outputting the full content (:issue:`17004`)
- Fixed bug where text was missing when using :meth:`to_clipboard` to copy utf-16 characters in Python 3 on Windows (:issue:`25040`)
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data were filled in the resulting ``DataFrame`` with the string ``"nan"`` instead of ``numpy.nan`` (:issue:`25468`)
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AssertionError`` (:issue:`25608`)
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
- Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
- Bug in :meth:`pandas.core.resample.Resampler.agg` with a timezone aware index where ``OverflowError`` would raise when passing a list of functions (:issue:`22660`)
- Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.nunique` in which the names of column levels were lost (:issue:`23222`)
- Bug in :func:`pandas.core.groupby.GroupBy.agg` when applying an aggregation function to timezone aware data (:issue:`23683`)
- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` where timezone information would be dropped (:issue:`21603`)
- Ensured that ordering of outputs in ``groupby`` aggregation functions is consistent across all versions of Python (:issue:`25692`)
- Bug in :func:`pandas.merge` where the string ``None`` was appended to a column name if ``None`` was passed in ``suffixes``, instead of leaving the column name as-is (:issue:`24782`).
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
- :func:`to_records` now accepts dtypes to its ``column_dtypes`` parameter (:issue:`24895`)
- Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)
- Bug in :class:`SparseDataFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)
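
Several of the API changes and fixes listed above (scalar ``to_numpy``, the replacement for the deprecated ``box`` keyword, and the ``divmod`` fix) can be exercised together. The following is a minimal sketch with illustrative values and variable names, not a reproduction of the linked issues.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp("2019-01-01")
td = pd.Timedelta("1 day")

# Scalar to_numpy() returns the same NumPy scalars as the older methods.
ts64 = ts.to_numpy()
td64 = td.to_numpy()

# Instead of box=False, convert the parsed result explicitly.
values = pd.to_datetime(["2019-01-01", "2019-01-02"]).to_numpy()

# divmod returns a (quotient, remainder) pair of Series.
quotient, remainder = pd.Series([7, 8, 9]).divmod(2)
```

Calling ``to_numpy`` on the parsed index yields a NumPy array with a ``datetime64`` dtype, which is what ``box=False`` was typically used to obtain.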
.. contributors:: v0.24.x..HEAD