Commit ad7dc56

Backport PR #47372 on branch 1.4.x (REGR: revert behaviour change for concat with empty/all-NaN data) (#47472)
Backport PR #47372: REGR: revert behaviour change for concat with empty/all-NaN data

Co-authored-by: Joris Van den Bossche
1 parent 30a3b98 commit ad7dc56

File tree

8 files changed, +383 -140 lines changed

doc/source/whatsnew/v1.4.0.rst

+11 -2
@@ -271,6 +271,9 @@ the given ``dayfirst`` value when the value is a delimited date string (e.g.
 Ignoring dtypes in concat with empty or all-NA columns
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+.. note::
+    This behaviour change has been reverted in pandas 1.4.3.
+
 When using :func:`concat` to concatenate two or more :class:`DataFrame` objects,
 if one of the DataFrames was empty or had all-NA values, its dtype was
 *sometimes* ignored when finding the concatenated dtype. These are now
@@ -301,9 +304,15 @@ object, the ``np.nan`` is retained.

 *New behavior*:

-.. ipython:: python
+.. code-block:: ipython
+
+    In [4]: res
+    Out[4]:
+                       bar
+    0  2013-01-01 00:00:00
+    1                  NaN
+

-    res

 .. _whatsnew_140.notable_bug_fixes.value_counts_and_mode_do_not_coerce_to_nan:

doc/source/whatsnew/v1.4.3.rst

+11
@@ -10,6 +10,17 @@ including other versions of pandas.

 .. ---------------------------------------------------------------------------

+.. _whatsnew_143.concat:
+
+Behaviour of ``concat`` with empty or all-NA DataFrame columns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The behaviour change in version 1.4.0 to stop ignoring the data type
+of empty or all-NA columns with float or object dtype in :func:`concat`
+(:ref:`whatsnew_140.notable_bug_fixes.concat_with_empty_or_all_na`) has been
+reverted (:issue:`45637`).
+
+
 .. _whatsnew_143.regressions:

 Fixed regressions
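
Reviewer note: the release note above describes the revert only in words. The sketch below is one way to check the behaviour on a given installation. The two frames are an assumed reconstruction of the kind of input the 1.4.0 note discusses (a datetime column concatenated with an all-NaN float column), not code taken from this commit. The resulting dtype depends on the installed pandas version, and some versions may also emit a warning, so no expected output is shown.

```python
import numpy as np
import pandas as pd

# Assumed example input: one frame holds a real timestamp,
# the other holds only NaN in a float64 column of the same name.
df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=[0])
df2 = pd.DataFrame({"bar": [np.nan]}, index=[1])

res = pd.concat([df1, df2])

# Whether the all-NaN column's float dtype is ignored (datetime result)
# or respected (object result) is version dependent, so print both.
print("pandas", pd.__version__)
print("result dtype:", res["bar"].dtype)
print(res)
```

If the dtype of the all-NaN column is ignored, the result column keeps a datetime dtype; if it is taken into account, the column falls back to ``object`` as in the 1.4.0 output shown in ``v1.4.0.rst``.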

pandas/core/dtypes/missing.py

+38
@@ -14,6 +14,7 @@
 import pandas._libs.missing as libmissing
 from pandas._libs.tslibs import (
     NaT,
+    Period,
     iNaT,
 )
 from pandas._typing import (
@@ -668,3 +669,40 @@ def is_valid_na_for_dtype(obj, dtype: DtypeObj) -> bool:

     # fallback, default to allowing NaN, None, NA, NaT
     return not isinstance(obj, (np.datetime64, np.timedelta64, Decimal))
+
+
+def isna_all(arr: ArrayLike) -> bool:
+    """
+    Optimized equivalent to isna(arr).all()
+    """
+    total_len = len(arr)
+
+    # Usually it's enough to check but a small fraction of values to see if
+    # a block is NOT null, chunks should help in such cases.
+    # parameters 1000 and 40 were chosen arbitrarily
+    chunk_len = max(total_len // 40, 1000)
+
+    dtype = arr.dtype
+    if dtype.kind == "f":
+        checker = nan_checker
+
+    elif dtype.kind in ["m", "M"] or dtype.type is Period:
+        # error: Incompatible types in assignment (expression has type
+        # "Callable[[Any], Any]", variable has type "ufunc")
+        checker = lambda x: np.asarray(x.view("i8")) == iNaT  # type: ignore[assignment]
+
+    else:
+        # error: Incompatible types in assignment (expression has type "Callable[[Any],
+        # Any]", variable has type "ufunc")
+        checker = lambda x: _isna_array(  # type: ignore[assignment]
+            x, inf_as_na=INF_AS_NA
+        )
+
+    return all(
+        # error: Argument 1 to "__call__" of "ufunc" has incompatible type
+        # "Union[ExtensionArray, Any]"; expected "Union[Union[int, float, complex, str,
+        # bytes, generic], Sequence[Union[int, float, complex, str, bytes, generic]],
+        # Sequence[Sequence[Any]], _SupportsArray]"
+        checker(arr[i : i + chunk_len]).all()  # type: ignore[arg-type]
+        for i in range(0, total_len, chunk_len)
+    )
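
Reviewer note: the chunking idea in ``isna_all`` can be tried outside pandas. The following is a minimal sketch for plain float NumPy arrays only. The name ``all_nan_chunked`` and its parameters are hypothetical, and ``np.isnan`` stands in for the internal ``nan_checker`` used in the diff. It reuses the same sizing rule (roughly 40 chunks, each at least 1000 elements long) and relies on ``all()`` over a generator to stop at the first chunk that contains a non-NaN value.

```python
import numpy as np


def all_nan_chunked(values: np.ndarray, min_chunk: int = 1000, n_chunks: int = 40) -> bool:
    """Return True if every element of a float array is NaN (hypothetical helper)."""
    total_len = len(values)
    # Same sizing rule as in the diff: about n_chunks chunks,
    # but never shorter than min_chunk elements each.
    chunk_len = max(total_len // n_chunks, min_chunk)
    # all() consumes the generator lazily, so later chunks are never
    # inspected once one chunk is found to hold a non-NaN value.
    return all(
        np.isnan(values[i : i + chunk_len]).all()
        for i in range(0, total_len, chunk_len)
    )


all_nan = np.full(100_000, np.nan)
mostly_nan = all_nan.copy()
mostly_nan[5] = 1.0  # a single real value in the first chunk

print(all_nan_chunked(all_nan))                    # True
print(all_nan_chunked(mostly_nan))                 # False, decided after one chunk
print(all_nan_chunked(np.array([], dtype=float)))  # True, vacuously
```

The early exit pays off mainly when a non-null value appears near the start of a large array; for a genuinely all-NaN array every chunk still has to be scanned.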
