Commit a5747fd

REGR: revert behaviour change for concat with empty/all-NaN data (pandas-dev#47372)
1 parent 62f05e2 commit a5747fd

8 files changed, +383 -142 lines changed

doc/source/whatsnew/v1.4.0.rst

+11 -2
@@ -271,6 +271,9 @@ the given ``dayfirst`` value when the value is a delimited date string (e.g.
 Ignoring dtypes in concat with empty or all-NA columns
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+.. note::
+    This behaviour change has been reverted in pandas 1.4.3.
+
 When using :func:`concat` to concatenate two or more :class:`DataFrame` objects,
 if one of the DataFrames was empty or had all-NA values, its dtype was
 *sometimes* ignored when finding the concatenated dtype. These are now
@@ -301,9 +304,15 @@ object, the ``np.nan`` is retained.

 *New behavior*:

-.. ipython:: python
+.. code-block:: ipython
+
+    In [4]: res
+    Out[4]:
+                       bar
+    0  2013-01-01 00:00:00
+    1                  NaN
+

-    res

 .. _whatsnew_140.notable_bug_fixes.value_counts_and_mode_do_not_coerce_to_nan:

doc/source/whatsnew/v1.4.3.rst

+11
@@ -10,6 +10,17 @@ including other versions of pandas.

 .. ---------------------------------------------------------------------------

+.. _whatsnew_143.concat:
+
+Behaviour of ``concat`` with empty or all-NA DataFrame columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The behaviour change in version 1.4.0 to stop ignoring the data type
+of empty or all-NA columns with float or object dtype in :func:`concat`
+(:ref:`whatsnew_140.notable_bug_fixes.concat_with_empty_or_all_na`) has been
+reverted (:issue:`45637`).
+
+
 .. _whatsnew_143.regressions:

 Fixed regressions
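
To see which behaviour an installed pandas has, the case described in the release note can be reproduced with a short script. This is only a sketch: the two frames (`df1`, `df2`) and the column name `bar` are assumptions taken from the `res` output in the v1.4.0 note, not from this commit. The resulting dtype depends on the pandas version, so none is asserted here, and some versions may emit a `FutureWarning` for this call.

```python
import numpy as np
import pandas as pd

# Assumed inputs: one datetime column and one all-NaN float column.
df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))

res = pd.concat([df1, df2])

# If the all-NaN column's dtype is ignored, the result keeps a datetime dtype
# and the missing entry prints as NaT; otherwise the column falls back to
# object and the missing entry stays NaN.
print(pd.__version__, res["bar"].dtype)
print(res)
```

In both behaviours the first value is the original timestamp and the second is missing; only the column dtype and the kind of missing value differ.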

pandas/core/dtypes/missing.py

+38
@@ -18,6 +18,7 @@
 import pandas._libs.missing as libmissing
 from pandas._libs.tslibs import (
     NaT,
+    Period,
     iNaT,
 )

@@ -739,3 +740,40 @@ def is_valid_na_for_dtype(obj, dtype: DtypeObj) -> bool:

     # fallback, default to allowing NaN, None, NA, NaT
     return not isinstance(obj, (np.datetime64, np.timedelta64, Decimal))
+
+
+def isna_all(arr: ArrayLike) -> bool:
+    """
+    Optimized equivalent to isna(arr).all()
+    """
+    total_len = len(arr)
+
+    # Usually it's enough to check but a small fraction of values to see if
+    # a block is NOT null, chunks should help in such cases.
+    # parameters 1000 and 40 were chosen arbitrarily
+    chunk_len = max(total_len // 40, 1000)
+
+    dtype = arr.dtype
+    if dtype.kind == "f":
+        checker = nan_checker
+
+    elif dtype.kind in ["m", "M"] or dtype.type is Period:
+        # error: Incompatible types in assignment (expression has type
+        # "Callable[[Any], Any]", variable has type "ufunc")
+        checker = lambda x: np.asarray(x.view("i8")) == iNaT  # type: ignore[assignment]
+
+    else:
+        # error: Incompatible types in assignment (expression has type "Callable[[Any],
+        # Any]", variable has type "ufunc")
+        checker = lambda x: _isna_array(  # type: ignore[assignment]
+            x, inf_as_na=INF_AS_NA
+        )
+
+    return all(
+        # error: Argument 1 to "__call__" of "ufunc" has incompatible type
+        # "Union[ExtensionArray, Any]"; expected "Union[Union[int, float, complex, str,
+        # bytes, generic], Sequence[Union[int, float, complex, str, bytes, generic]],
+        # Sequence[Sequence[Any]], _SupportsArray]"
+        checker(arr[i : i + chunk_len]).all()  # type: ignore[arg-type]
+        for i in range(0, total_len, chunk_len)
+    )
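
The chunking in `isna_all` can be illustrated with a standard-library-only sketch. It is a stand-in, not the pandas implementation: `all_nan_chunked`, `min_chunk` and `n_chunks` are hypothetical names, and `math.isnan` over a plain sequence of floats replaces the vectorized checkers used in the diff. In the real function each chunk is checked with one vectorized call, so stopping after the first non-null chunk avoids scanning the rest of the array; here the per-element check already short-circuits, so the chunking is shown only for its structure.

```python
import math
from typing import Sequence


def all_nan_chunked(
    values: Sequence[float], min_chunk: int = 1000, n_chunks: int = 40
) -> bool:
    """Return True if every value is NaN, checking one chunk at a time."""
    total_len = len(values)
    # Same sizing rule as the diff: chunks hold at least `min_chunk` items,
    # and long inputs are split into roughly `n_chunks` chunks.
    chunk_len = max(total_len // n_chunks, min_chunk)
    # all() stops at the first chunk that contains a non-NaN value,
    # so later chunks are never examined.
    return all(
        all(math.isnan(v) for v in values[i : i + chunk_len])
        for i in range(0, total_len, chunk_len)
    )
```

An empty input yields no chunks, so the function returns `True`, just as `all()` over an empty iterable does.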
