.. warning::

   Starting with the 0.25.x series of releases, pandas only supports Python 3.5 and higher.
   See :ref:`install.dropping-27` for more details.

.. warning::

   ``Panel`` has been fully removed. For N-D labeled data structures, please use xarray.

{{ header }}
These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog including other versions of pandas.
- Added support for ISO week year format (``'%G-%V-%u'``) when parsing datetimes using :meth:`to_datetime` (:issue:`16607`)
- Indexing of ``DataFrame`` and ``Series`` now accepts zero-dimensional ``np.ndarray`` (:issue:`24919`)
- :meth:`Timestamp.replace` now supports the ``fold`` argument to disambiguate DST transition times (:issue:`25017`)
- :meth:`DataFrame.at_time` and :meth:`Series.at_time` now support :class:`datetime.time` objects with timezones (:issue:`24043`)
- ``Series.str`` has gained the :meth:`Series.str.casefold` method to remove all case distinctions present in a string (:issue:`25405`)
- :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
- :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
- :meth:`RangeIndex.union` now supports the ``sort`` argument. If ``sort=False``, an unsorted ``Int64Index`` is always returned. ``sort=None`` is the default and returns a monotonically increasing ``RangeIndex`` if possible or a sorted ``Int64Index`` if not (:issue:`24471`)
- :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
- :class:`datetime.timezone` objects are now supported as arguments to timezone methods and constructors (:issue:`25065`)
- :meth:`DataFrame.query` and :meth:`DataFrame.eval` now support quoting column names with backticks to refer to names with spaces (:issue:`6508`)

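Several of these enhancements can be tried together. The following is a minimal sketch, assuming pandas 0.25.0 or later. The frame ``df``, its column names and all values are illustrative only. The expected values in the comments follow from the behaviour described in the list above.

.. code-block:: python

   import pandas as pd

   # ISO year 2019, ISO week 01, ISO weekday 1 (Monday). ISO week 1 of 2019
   # starts on Monday 31 December 2018.
   ts = pd.to_datetime("2019-01-1", format="%G-%V-%u")
   print(ts)  # 2018-12-31 00:00:00

   # casefold() removes case distinctions, e.g. German "ß" folds to "ss".
   folded = pd.Series(["Straße", "STRASSE"]).str.casefold()
   print(folded.tolist())  # ['strasse', 'strasse']

   df = pd.DataFrame({"first name": ["Ann", "Bob"], "age": [31, 42]})

   # With errors="raise", renaming a label that does not exist raises KeyError.
   try:
       df.rename(columns={"missing": "x"}, errors="raise")
       rename_raised = False
   except KeyError:
       rename_raised = True
   print(rename_raised)  # True

   # Backticks let query() refer to a column name containing a space.
   selected = df.query("age > 35 and `first name` == 'Bob'")
   print(selected["first name"].tolist())  # ['Bob']

   # RangeIndex exposes its start, stop and step.
   evens = pd.RangeIndex(0, 10, 2)
   odds = pd.RangeIndex(1, 10, 2)
   print(evens.start, evens.stop, evens.step)  # 0 10 2

   # Default sort=None: the union of the two ranges is still a single range.
   combined = evens.union(odds)
   print(list(combined) == list(range(10)))  # True

With the default ``sort=None``, combining the even and odd ranges is expected to return a ``RangeIndex`` rather than an ``Int64Index``, because the result can still be described by a single start, stop and step.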
Indexing a :class:`DataFrame` or :class:`Series` with a :class:`DatetimeIndex` with a date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset is respected in indexing. (:issue:`24076`, :issue:`16785`)
Previous Behavior:

.. code-block:: ipython

   In [1]: df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))

   In [2]: df
   Out[2]:
                              0
   2019-01-01 00:00:00-08:00  0

   In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
   Out[3]:
                              0
   2019-01-01 00:00:00-08:00  0

New Behavior:

.. ipython:: python

   df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
   df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']

The implementation of :meth:`DataFrameGroupBy.apply() <pandas.core.groupby.DataFrameGroupBy.apply>` previously always evaluated the supplied function twice on the first group to infer whether it was safe to use a fast code path. Particularly for functions with side effects, this was undesired behavior and could lead to surprises (:issue:`2936`, :issue:`2656`, :issue:`7739`, :issue:`10519`, :issue:`12155`, :issue:`20084`, :issue:`21417`).

Now every group is evaluated only a single time.

.. ipython:: python

   df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
   df

   def func(group):
       print(group.name)
       return group

Previous Behaviour:

.. code-block:: ipython

   In [3]: df.groupby('a').apply(func)
   x
   x
   y
   Out[3]:
      a  b
   0  x  1
   1  y  2

New Behaviour:

.. ipython:: python

   df.groupby("a").apply(func)

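The one-call-per-group guarantee can also be checked by recording each invocation instead of printing it. This is a minimal sketch, assuming pandas 0.25.0 or later. The ``calls`` list is a hypothetical helper. The non-grouping column is selected explicitly because newer pandas versions warn when the grouping column is passed to ``apply``.

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
   calls = []  # hypothetical helper: one entry per call of func


   def func(group):
       # group.name is the key of the group currently being processed
       calls.append(group.name)
       return group


   # Select the non-grouping column so only "b" is passed to func.
   df.groupby("a")[["b"]].apply(func)
   print(calls)  # ['x', 'y']

Before this change, the first key would have appeared twice in ``calls``.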
When passed DataFrames whose values are sparse, :func:`concat` will now return a
:class:`Series` or :class:`DataFrame` with sparse values, rather than a ``SparseDataFrame``
(:issue:`25702`).

.. ipython:: python

   df = pd.DataFrame({"A": pd.SparseArray([0, 1])})

Previous Behavior:

.. code-block:: ipython

   In [2]: type(pd.concat([df, df]))
   pandas.core.sparse.frame.SparseDataFrame

New Behavior:

.. ipython:: python

   type(pd.concat([df, df]))

This now matches the existing behavior of :func:`concat` on ``Series`` with sparse values.
:func:`concat` will continue to return a ``SparseDataFrame`` when all the values
are instances of ``SparseDataFrame``.

This change also affects routines using :func:`concat` internally, like :func:`get_dummies`,
which now returns a :class:`DataFrame` in all cases (previously a ``SparseDataFrame`` was
returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).

Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.

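The new :func:`concat` and :func:`get_dummies` results can be sketched as follows, assuming pandas 0.25.0 or later. The sketch uses ``pd.arrays.SparseArray``, the public location of ``SparseArray``, which is also exposed as ``pd.SparseArray`` in this release. The data are illustrative only.

.. code-block:: python

   import pandas as pd

   df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

   # Concatenating frames with sparse values gives a regular DataFrame whose
   # column keeps a sparse dtype.
   out = pd.concat([df, df])
   print(type(out).__name__)  # DataFrame
   print(out.dtypes["A"])

   # get_dummies is affected as well: with sparse=True it returns a
   # DataFrame whose columns have sparse dtypes.
   dummies = pd.get_dummies(pd.Series(["a", "b", "a"]), sparse=True)
   print(type(dummies).__name__)  # DataFrame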
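Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions (:issue:`25725`, :issue:`24942`, :issue:`25752`). Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`25554`). If installed, we now require:

+-----------------+-----------------+----------+
| Package         | Minimum Version | Required |
+=================+=================+==========+
| numpy           | 1.13.3          | X        |
+-----------------+-----------------+----------+
| pytz            | 2015.4          | X        |
+-----------------+-----------------+----------+
| bottleneck      | 1.2.1           |          |
+-----------------+-----------------+----------+
| numexpr         | 2.6.2           |          |
+-----------------+-----------------+----------+
| pytest (dev)    | 4.0.2           |          |
+-----------------+-----------------+----------+
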
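For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

+-----------------+-----------------+
| Package         | Minimum Version |
+=================+=================+
| fastparquet     | 0.2.1           |
+-----------------+-----------------+
| matplotlib      | 2.2.2           |
+-----------------+-----------------+
| openpyxl        | 2.4.0           |
+-----------------+-----------------+
| pyarrow         | 0.9.0           |
+-----------------+-----------------+
| pytables        | 3.4.2           |
+-----------------+-----------------+
| scipy           | 0.19.0          |
+-----------------+-----------------+
| sqlalchemy      | 1.1.4           |
+-----------------+-----------------+
| xarray          | 0.8.2           |
+-----------------+-----------------+
| xlrd            | 1.0.0           |
+-----------------+-----------------+
| xlsxwriter      | 0.7.7           |
+-----------------+-----------------+
| xlwt            | 1.0.0           |
+-----------------+-----------------+
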
- :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
- ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
- :meth:`Timestamp.strptime` will now raise a ``NotImplementedError`` (:issue:`25016`)
- Bug in :meth:`DatetimeIndex.snap` which didn't preserve the ``name`` of the input :class:`Index` (:issue:`25575`)

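The new scalar ``to_numpy`` aliases and the ``Timestamp.strptime`` change can be sketched as follows, assuming pandas 0.25.0 or later. The timestamp and timedelta values are arbitrary.

.. code-block:: python

   import numpy as np
   import pandas as pd

   ts = pd.Timestamp("2019-01-01 12:00")
   td = pd.Timedelta(hours=3)

   # On scalars, to_numpy() returns the same NumPy scalar as the older methods.
   ts_np = ts.to_numpy()  # alias of ts.to_datetime64()
   td_np = td.to_numpy()  # alias of td.to_timedelta64()
   print(isinstance(ts_np, np.datetime64), ts_np == ts.to_datetime64())  # True True
   print(isinstance(td_np, np.timedelta64), td_np == td.to_timedelta64())  # True True

   # Timestamp.strptime is not implemented; parse strings with to_datetime.
   try:
       pd.Timestamp.strptime("2019-01-01", "%Y-%m-%d")
       strptime_raised = False
   except NotImplementedError:
       strptime_raised = True
   parsed = pd.to_datetime("2019-01-01", format="%Y-%m-%d")
   print(strptime_raised, parsed)  # True 2019-01-01 00:00:00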
- Deprecated the ``M`` (months) and ``Y`` (year) units for :func:`pandas.to_timedelta`, :func:`pandas.Timedelta` and :func:`pandas.TimedeltaIndex` (:issue:`16344`)
- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64`/:meth:`Timedelta.to_timedelta64`. (:issue:`24416`)

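In place of the deprecated ``box`` keyword (where ``box=False`` was used to obtain NumPy values), the conversion can be made explicit. This is a minimal sketch, assuming pandas 0.25.0 or later. The input values are illustrative.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Array-like input: convert the resulting index explicitly.
   dt_values = pd.to_datetime(["2019-01-01", "2019-01-02"]).to_numpy()
   td_values = pd.to_timedelta(["1 day", "2 days"]).to_numpy()

   # Scalar input: use the scalar conversion methods.
   dt_scalar = pd.to_datetime("2019-01-01").to_datetime64()
   td_scalar = pd.to_timedelta("1 day").to_timedelta64()

   print(type(dt_values).__name__, dt_values.dtype.kind)  # ndarray M
   print(type(td_values).__name__, td_values.dtype.kind)  # ndarray m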
- Removed (parts of) :class:`Panel` (:issue:`25047`, :issue:`25191`, :issue:`25231`)

- Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0 (:issue:`24985`)
- :meth:`DataFrame.to_stata()` is now faster when outputting data with any string or non-native endian columns (:issue:`25045`)
- Improved performance of :meth:`Series.searchsorted`. The speedup is especially large when the dtype is int8/int16/int32 and the searched key is within the integer bounds for the dtype (:issue:`22034`)
- Improved performance of :meth:`pandas.core.groupby.GroupBy.quantile` (:issue:`20405`)
- Improved performance of :meth:`read_csv` by faster tokenizing and faster parsing of small float numbers (:issue:`25784`)
- Improved performance of :meth:`read_csv` by faster parsing of N/A and boolean values (:issue:`25804`)
- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified, instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
- Bug in :func:`DatetimeIndex.to_frame` where timezone aware data would be converted to timezone naive data (:issue:`25809`)
- Bug in :func:`to_datetime` with ``utc=True`` and datetime strings that would apply previously parsed UTC offsets to subsequent arguments (:issue:`24992`)
- Bug in :func:`Timestamp.tz_localize` and :func:`Timestamp.tz_convert` not propagating ``freq`` (:issue:`25241`)
- Bug in :func:`Series.at` where setting a :class:`Timestamp` with a timezone raised a ``TypeError`` (:issue:`25506`)
- Bug in :func:`DataFrame.update` where updating with timezone aware data would return timezone naive data (:issue:`25807`)
- Bug in :meth:`to_numeric` in which large negative numbers were being improperly handled (:issue:`24910`)
- Bug in :meth:`to_numeric` in which numbers were being coerced to float, even though ``errors`` was not ``coerce`` (:issue:`24910`)
- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
- Bug in :meth:`Series.divmod` and :meth:`Series.rdivmod` which would raise an (incorrect) ``ValueError`` rather than return a pair of :class:`Series` objects as the result (:issue:`25557`)
- Raises a helpful exception when a non-numeric index is sent to :meth:`interpolate` with methods which require a numeric index. (:issue:`21662`)
- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
- Improved exception message when calling :meth:`DataFrame.iloc` with a list of non-numeric objects (:issue:`25753`).
- Fixed misleading exception message in :meth:`Series.interpolate` if the argument ``order`` is required, but omitted (:issue:`10633`, :issue:`24014`).
- Fixed class type displayed in exception message in :meth:`DataFrame.dropna` if an invalid ``axis`` parameter is passed (:issue:`25555`)
- Bug in which an incorrect exception was raised by :meth:`pd.Timedelta` when testing the membership of :class:`MultiIndex` (:issue:`24570`)
- Bug in :func:`DataFrame.to_html()` where values were truncated using display options instead of outputting the full content (:issue:`17004`)
- Fixed bug in missing text when using :meth:`to_clipboard` if copying utf-16 characters in Python 3 on Windows (:issue:`25040`)
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to :class:`Timestamp`, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data were filled in the resulting ``DataFrame`` with the string ``"nan"`` instead of ``numpy.nan`` (:issue:`25468`)
- :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AssertionError`` (:issue:`25608`)
- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
- Bug in :func:`read_csv` not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+ (:issue:`15086`)
- Improved performance in :meth:`pandas.read_stata` and :class:`pandas.io.stata.StataReader` when converting columns that have missing values (:issue:`25772`)
- Bug in :meth:`DataFrame.to_html` where header numbers would ignore display options when rounding (:issue:`17280`)
- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
- Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
- Bug in :meth:`pandas.core.resample.Resampler.agg` with a timezone aware index where ``OverflowError`` would raise when passing a list of functions (:issue:`22660`)
- Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.nunique` in which the names of column levels were lost (:issue:`23222`)
- Bug in :func:`pandas.core.groupby.GroupBy.agg` when applying an aggregation function to timezone aware data (:issue:`23683`)
- Bug in :func:`pandas.core.groupby.GroupBy.first` and :func:`pandas.core.groupby.GroupBy.last` where timezone information would be dropped (:issue:`21603`)
- Ensured that ordering of outputs in ``groupby`` aggregation functions is consistent across all versions of Python (:issue:`25692`)
- Bug in :func:`pandas.merge` that appended the string ``None`` to column names if ``None`` was assigned in ``suffixes``, instead of leaving the column name as-is (:issue:`24782`)
- Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
- :func:`to_records` now accepts dtypes to its ``column_dtypes`` parameter (:issue:`24895`)
- Bug in :func:`concat` where the order of ``OrderedDict`` (and ``dict`` in Python 3.6+) was not respected when passed in as the ``objs`` argument (:issue:`21510`)
- Bug in :func:`concat` where the resulting ``freq`` of two :class:`DatetimeIndex` with the same ``freq`` would be dropped (:issue:`3232`).
- Bug in :class:`SparseFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)

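Two of the reshaping fixes above can be illustrated briefly: a ``None`` entry in ``suffixes`` and the ``column_dtypes`` argument of ``DataFrame.to_records``. This is a minimal sketch, assuming pandas 0.25.0 or later. The frames, column names and the requested ``int32`` dtype are illustrative only.

.. code-block:: python

   import pandas as pd

   left = pd.DataFrame({"key": [1, 2], "value": [10, 20]})
   right = pd.DataFrame({"key": [1, 2], "value": [30, 40]})

   # A None suffix keeps that side's overlapping column name unchanged,
   # instead of appending the string "None".
   merged = pd.merge(left, right, on="key", suffixes=("_left", None))
   print(merged.columns.tolist())  # ['key', 'value_left', 'value']

   # column_dtypes maps column names to the dtype used in the record array.
   df = pd.DataFrame({"A": [1, 2], "B": [0.5, 0.75]})
   records = df.to_records(index=False, column_dtypes={"A": "int32"})
   print(records.dtype["A"])  # int32

At least one entry of ``suffixes`` still needs to be a string when the frames share non-key column names. Otherwise the overlapping names could not be told apart.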
.. contributors:: v0.24.x..HEAD