These are the changes in pandas 1.2.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
Many read/write functions have acquired the ``storage_options`` optional argument,
to pass a dictionary of parameters to the storage backend. This allows, for
example, passing credentials to S3 and GCS storage. The details of what
parameters can be passed to which backends can be found in the documentation
of the individual storage backends (detailed from the fsspec docs for
builtin implementations and linked to external ones). See
Section :ref:`io.remote`.
:issue:`35655` added fsspec support (including ``storage_options``)
for reading Excel files.
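
As a minimal sketch of how ``storage_options`` reaches the backend, the snippet below forwards a dictionary to :func:`read_csv`. The helper name ``read_public_csv`` and the bucket path in the comment are hypothetical. The ``anon`` key is an option of the ``s3fs`` backend, and reading from S3 assumes ``s3fs`` is installed.

```python
import pandas as pd

# Options forwarded to the storage backend (here assumed to be s3fs).
# "anon": True requests anonymous access to a public bucket.
PUBLIC_S3_OPTIONS = {"anon": True}


def read_public_csv(url, storage_options=None):
    """Read a CSV from an fsspec-style URL such as "s3://bucket/key.csv".

    Hypothetical helper: it only merges the caller's options into the
    defaults and forwards the result to pandas.
    """
    options = dict(PUBLIC_S3_OPTIONS)
    options.update(storage_options or {})
    return pd.read_csv(url, storage_options=options)


# Not executed here, because it needs network access and s3fs:
# df = read_public_csv("s3://my-hypothetical-bucket/data.csv")
```

Keeping the options in one dictionary makes it easy to swap anonymous access for credentials without touching the call sites.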
:meth:`to_csv` supports file handles in binary mode (:issue:`19827` and :issue:`35058`)
with ``encoding`` (:issue:`13068` and :issue:`23854`) and ``compression`` (:issue:`22555`).
``mode`` has to contain a ``b`` for binary handles to be supported.
For example:

.. ipython:: python

   import io

   data = pd.DataFrame([0, 1, 2])
   buffer = io.BytesIO()
   data.to_csv(buffer, mode="w+b", encoding="utf-8", compression="gzip")

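
To check what ends up in the binary buffer, the following sketch writes a gzip-compressed CSV to an in-memory handle and reads it back. The column name ``value`` and the variable names are illustrative only, and the sketch assumes the user-provided buffer is left open after writing.

```python
import gzip
import io

import pandas as pd

data = pd.DataFrame({"value": [0, 1, 2]})

# Write gzip-compressed CSV bytes into an in-memory binary buffer.
buffer = io.BytesIO()
data.to_csv(buffer, mode="wb", encoding="utf-8", compression="gzip", index=False)

# The buffer holds a gzip stream; decompress it to recover the CSV text.
csv_text = gzip.decompress(buffer.getvalue()).decode("utf-8")

# Parse the recovered text to confirm the round trip.
restored = pd.read_csv(io.StringIO(csv_text))
```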
- :class:`Index` with object dtype supports division and multiplication (:issue:`34160`)
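
A small illustration of the object-dtype arithmetic mentioned above. The values are arbitrary, and the sketch assumes a pandas version that includes this enhancement.

```python
import pandas as pd

# An Index that stores Python integers with object dtype.
idx = pd.Index([1, 2, 3], dtype=object)

doubled = idx * 2  # element-wise multiplication
halved = idx / 2   # element-wise true division
```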
Pandas 1.2.0 supports Python 3.7.1 and higher (:issue:`35214`).
Some minimum supported versions of dependencies were updated (:issue:`35214`). If installed, we now require:
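+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy           | 1.16.5          |    X     |    X    |
+-----------------+-----------------+----------+---------+
| pytz            | 2017.3          |    X     |    X    |
+-----------------+-----------------+----------+---------+
| python-dateutil | 2.7.3           |    X     |         |
+-----------------+-----------------+----------+---------+
| bottleneck      | 1.2.1           |          |         |
+-----------------+-----------------+----------+---------+
| numexpr         | 2.6.8           |    X     |         |
+-----------------+-----------------+----------+---------+
| pytest (dev)    | 5.0.1           |    X     |         |
+-----------------+-----------------+----------+---------+
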
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
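+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| beautifulsoup4  | 4.6.0           |         |
+-----------------+-----------------+---------+
| fastparquet     | 0.3.2           |         |
+-----------------+-----------------+---------+
| fsspec          | 0.7.4           |         |
+-----------------+-----------------+---------+
| gcsfs           | 0.6.0           |         |
+-----------------+-----------------+---------+
| lxml            | 4.3.0           |    X    |
+-----------------+-----------------+---------+
| matplotlib      | 2.2.3           |    X    |
+-----------------+-----------------+---------+
| numba           | 0.46.0          |         |
+-----------------+-----------------+---------+
| openpyxl        | 2.5.7           |         |
+-----------------+-----------------+---------+
| pyarrow         | 0.15.0          |    X    |
+-----------------+-----------------+---------+
| pymysql         | 0.7.11          |    X    |
+-----------------+-----------------+---------+
| pytables        | 3.4.4           |    X    |
+-----------------+-----------------+---------+
| s3fs            | 0.4.0           |         |
+-----------------+-----------------+---------+
| scipy           | 1.2.0           |         |
+-----------------+-----------------+---------+
| sqlalchemy      | 1.2.8           |    X    |
+-----------------+-----------------+---------+
| xarray          | 0.12.0          |    X    |
+-----------------+-----------------+---------+
| xlrd            | 1.2.0           |    X    |
+-----------------+-----------------+---------+
| xlsxwriter      | 1.0.2           |    X    |
+-----------------+-----------------+---------+
| xlwt            | 1.3.0           |    X    |
+-----------------+-----------------+---------+
| pandas-gbq      | 0.12.0          |         |
+-----------------+-----------------+---------+
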
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
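
When comparing an installed version against one of the minimums listed above, plain string comparison is unreliable (``"1.9.0"`` sorts after ``"1.16.5"``). The sketch below compares numeric components instead. The function names are hypothetical, and it deliberately handles only plain dotted numbers, not suffixes such as ``rc1`` or ``.dev0``.

```python
def version_tuple(version):
    """Convert a dotted release string such as "1.16.5" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed, minimum):
    """Return True if ``installed`` is at least ``minimum``."""
    have, need = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so that "1.16" equals "1.16.0".
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


# Minimum versions copied from the first three rows of the required table.
MINIMUM_VERSIONS = {"numpy": "1.16.5", "pytz": "2017.3", "python-dateutil": "2.7.3"}
```

For real projects a dedicated version parser is the safer choice; this sketch only shows why the comparison should be numeric.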
:meth:`~DataFrame.groupby` no longer ignores ``group_keys`` for transform-like ``apply``

:meth:`~DataFrame.groupby` will no longer ignore the ``group_keys`` argument for functions passed to ``apply``
that return like-indexed outputs (:issue:`34809`, :issue:`31612`, :issue:`14927`, :issue:`13056`).
Previous versions of pandas would add the group keys only when the result from the applied function had a different index to the input.

Previous behavior:

.. code-block:: python

   >>> # pandas 1.0.4
   >>> df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
   >>> df
      A  B
   0  1  1
   1  2  2
   2  2  3
   >>> df.groupby("A").apply(lambda x: x.rename(np.exp))  # Different index
               A  B
   A
   1 1.000000  1  1
   2 2.718282  2  2
     7.389056  2  3
   >>> df.groupby("A").apply(lambda x: x)  # Same index
      A  B
   0  1  1
   1  2  2
   2  2  3

In the future, this behavior will change to always respect ``group_keys``, which defaults to ``True``.

New behavior:

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
   df.groupby("A", group_keys=True).apply(lambda x: x)
   df.groupby("A", group_keys=True).apply(lambda x: x.rename(np.exp))

A warning will be issued if the result would change from pandas 1.0.4.

.. ipython:: python
   :okwarning:

   df.groupby("A").apply(lambda x: x)

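
Code that depends on the original row index can state its intent instead of relying on the default. The sketch below passes ``group_keys=False`` explicitly to a transform-like ``apply`` on a single column. The helper ``double`` is illustrative, and the example is a hedged suggestion rather than the documented migration path.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})


def double(values):
    """Transform-like function: the result has the same index as its input."""
    return values * 2


# Opting out of group keys explicitly keeps the original row index.
doubled = df.groupby("A", group_keys=False)["B"].apply(double)
```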
- Deprecated parameter ``inplace`` in :meth:`MultiIndex.set_codes` and :meth:`MultiIndex.set_levels` (:issue:`35626`)
- Deprecated parameter ``dtype`` in :meth:`~Index.copy` on all index classes. Use the :meth:`Index.astype` method instead for changing dtype (:issue:`35853`)
- Performance improvement in :meth:`GroupBy.agg` with the ``numba`` engine (:issue:`35759`)
- Bug in :attr:`DatetimeArray.date` where a ``ValueError`` would be raised with a read-only backing array (:issue:`33530`)
- Bug in ``NaT`` comparisons failing to raise ``TypeError`` on invalid inequality comparisons (:issue:`35046`)
- Bug in :class:`DateOffset` where attributes reconstructed from pickle files differ from original objects when input values exceed normal ranges (e.g. ``months=12``) (:issue:`34511`)
- Bug in :class:`TimedeltaIndex`, :class:`Series`, and :class:`DataFrame` floor-division with ``timedelta64`` dtypes and ``NaT`` in the denominator (:issue:`35529`)
- Bug in :func:`date_range` raising ``AmbiguousTimeError`` for valid input with ``ambiguous=False`` (:issue:`35297`)
- :meth:`SeriesGroupBy.transform` now correctly handles missing values for ``dropna=False`` (:issue:`35014`)
- Bug in :meth:`DataFrame.xs` when used with :class:`IndexSlice` raising a ``TypeError`` with message ``Expected label or tuple of labels`` (:issue:`35301`)
- Bug in :meth:`to_csv` caused a ``ValueError`` when it was called with a filename in combination with ``mode`` containing a ``b`` (:issue:`35058`)
- In :meth:`read_csv`, ``float_precision='round_trip'`` now handles ``decimal`` and ``thousands`` parameters (:issue:`35365`)
- :meth:`to_pickle` and :meth:`read_pickle` were closing user-provided file objects (:issue:`35679`)
- :meth:`to_csv` always passes compression arguments for ``'gzip'`` to ``gzip.GzipFile`` (:issue:`28103`)
- Bug in :meth:`DataFrameGroupBy.count` and :meth:`SeriesGroupBy.sum` returning ``NaN`` for missing categories when grouped on multiple ``Categoricals``. Now returning ``0`` (:issue:`35028`)
- Bug in :meth:`DataFrameGroupBy.apply` that would sometimes throw an erroneous ``ValueError`` if the grouping axis had duplicate entries (:issue:`16646`)
- Bug when combining methods :meth:`DataFrame.groupby` with :meth:`DataFrame.resample` and :meth:`DataFrame.interpolate` raising a ``TypeError`` (:issue:`35325`)
- Bug in :meth:`DataFrameGroupBy.apply` where a non-nuisance grouping column would be dropped from the output columns if another groupby method was called before ``.apply()`` (:issue:`34656`)
- Bug in :meth:`DataFrameGroupBy.apply` where a :class:`CategoricalIndex` would be dropped when grouped on (:issue:`35792`)
- Bug where subsetting columns on a :class:`~pandas.core.groupby.DataFrameGroupBy` (e.g. ``df.groupby('a')[['b']]``) would reset the attributes ``axis``, ``dropna``, ``group_keys``, ``level``, ``mutated``, ``sort``, and ``squeeze`` to their default values (:issue:`9959`)
- Bug in :meth:`DataFrame.pivot_table` with ``aggfunc='count'`` or ``aggfunc='sum'`` returning ``NaN`` for missing categories when pivoted on a ``Categorical``. Now returning ``0`` (:issue:`31422`)
- Bug in :func:`union_indexes` where input index names are not preserved in some cases. Affects :func:`concat` and :class:`DataFrame` constructor (:issue:`13475`)
- Bug in :func:`crosstab` when using multiple columns with ``margins=True`` and ``normalize=True`` (:issue:`35144`)
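
For the ``inplace`` and ``dtype`` deprecations listed above, the following sketch shows one way to write code that avoids both parameters. The index contents and variable names are illustrative only.

```python
import pandas as pd

# MultiIndex.set_levels / set_codes: rebind the returned object
# instead of passing inplace=True.
mi = pd.MultiIndex.from_arrays([[1, 1, 2], ["a", "b", "a"]], names=["num", "let"])
mi = mi.set_levels(["x", "y"], level="let")

# Index.copy(dtype=...): convert with astype, which returns a new Index.
idx = pd.Index([1, 2, 3])
as_float = idx.astype("float64")
```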