Commit 144bd32

SaturnFromTitan authored and proost committed
API: Use object dtype for empty Series (pandas-dev#29405)
1 parent a368b19 commit 144bd32

82 files changed: +444 -247 lines changed

doc/source/user_guide/missing_data.rst

+2 -2
@@ -190,15 +190,15 @@ The sum of an empty or all-NA Series or column of a DataFrame is 0.

    pd.Series([np.nan]).sum()

-   pd.Series([]).sum()
+   pd.Series([], dtype="float64").sum()

 The product of an empty or all-NA Series or column of a DataFrame is 1.

 .. ipython:: python

    pd.Series([np.nan]).prod()

-   pd.Series([]).prod()
+   pd.Series([], dtype="float64").prod()


 NA values in GroupBy

doc/source/user_guide/scale.rst

+1
@@ -358,6 +358,7 @@ results will fit in memory, so we can safely call ``compute`` without running
 out of memory. At that point it's just a regular pandas object.

 .. ipython:: python
+   :okwarning:

    @savefig dask_resample.png
    ddf[['x', 'y']].resample("1D").mean().cumsum().compute().plot()

doc/source/whatsnew/v0.19.0.rst

+1
@@ -707,6 +707,7 @@ A ``Series`` will now correctly promote its dtype for assignment with incompat v


 .. ipython:: python
+   :okwarning:

    s = pd.Series()

doc/source/whatsnew/v0.21.0.rst

+1
@@ -428,6 +428,7 @@ Note that this also changes the sum of an empty ``Series``. Previously this alwa
 but for consistency with the all-NaN case, this was changed to return NaN as well:

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum()

doc/source/whatsnew/v0.22.0.rst

+3
@@ -55,6 +55,7 @@ The default sum for empty or all-*NA* ``Series`` is now ``0``.
 *pandas 0.22.0*

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum()
    pd.Series([np.nan]).sum()
@@ -67,6 +68,7 @@ pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the ``min_count``
 keyword.

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum(min_count=1)

@@ -85,6 +87,7 @@ required for a non-NA sum or product.
 returning ``1`` instead.

 .. ipython:: python
+   :okwarning:

    pd.Series([]).prod()
    pd.Series([np.nan]).prod()
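
Note on the whatsnew examples above: the `:okwarning:` flag is needed because `pd.Series([])` now warns. A minimal sketch of the same behaviour with an explicit dtype follows, so no warning is involved. The variable names are illustrative and not part of the commit.

```python
import numpy as np
import pandas as pd

# Passing a dtype explicitly avoids the empty-Series deprecation warning.
empty = pd.Series([], dtype="float64")
all_na = pd.Series([np.nan])

# Defaults: the sum of nothing is 0, the product of nothing is 1.
empty_sum = empty.sum()
empty_prod = empty.prod()

# min_count=1 requires at least one non-NA value, otherwise the result is NaN.
strict_sum = empty.sum(min_count=1)
strict_prod = all_na.prod(min_count=1)

print(empty_sum, empty_prod, strict_sum, strict_prod)
```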

doc/source/whatsnew/v1.0.0.rst

+18 -1
@@ -366,6 +366,23 @@ When :class:`Categorical` contains ``np.nan``,

    pd.Categorical([1, 2, np.nan], ordered=True).min()

+
+Default dtype of empty :class:`pandas.Series`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Initialising an empty :class:`pandas.Series` without specifying a dtype will raise a `DeprecationWarning` now
+(:issue:`17261`). The default dtype will change from ``float64`` to ``object`` in future releases so that it is
+consistent with the behaviour of :class:`DataFrame` and :class:`Index`.
+
+*pandas 1.0.0*
+
+.. code-block:: ipython
+
+   In [1]: pd.Series()
+   Out[2]:
+   DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
+   Series([], dtype: float64)
+
 .. _whatsnew_1000.api_breaking.deps:

 Increased minimum versions for dependencies
@@ -494,7 +511,7 @@ Removal of prior version deprecations/changes

 Previously, pandas would register converters with matplotlib as a side effect of importing pandas (:issue:`18720`).
 This changed the output of plots made via matplotlib plots after pandas was imported, even if you were using
-matplotlib directly rather than rather than :meth:`~DataFrame.plot`.
+matplotlib directly rather than :meth:`~DataFrame.plot`.

 To use pandas formatters with a matplotlib plot, specify
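
Note on the new whatsnew entry: the warning text asks callers to specify a dtype explicitly. A short sketch of what that could look like in user code follows. It assumes that an explicit dtype is enough to avoid the warning, as the message states, and the variable names are illustrative.

```python
import warnings

import pandas as pd

# Record any warnings raised while constructing empty Series with explicit dtypes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_float = pd.Series(dtype="float64")  # keeps the old float64 default
    as_object = pd.Series(dtype=object)    # opts in to the future object default

print(as_float.dtype, as_object.dtype, len(caught))
```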

pandas/compat/pickle_compat.py

+1 -1
@@ -64,7 +64,7 @@ def __new__(cls) -> "Series": # type: ignore
             stacklevel=6,
         )

-        return Series()
+        return Series(dtype=object)


 class _LoadSparseFrame:

pandas/core/apply.py

+16 -3
@@ -15,6 +15,8 @@
 )
 from pandas.core.dtypes.generic import ABCMultiIndex, ABCSeries

+from pandas.core.construction import create_series_with_explicit_dtype
+
 if TYPE_CHECKING:
     from pandas import DataFrame, Series, Index

@@ -203,15 +205,15 @@ def apply_empty_result(self):

         if not should_reduce:
             try:
-                r = self.f(Series([]))
+                r = self.f(Series([], dtype=np.float64))
             except Exception:
                 pass
             else:
                 should_reduce = not isinstance(r, Series)

         if should_reduce:
             if len(self.agg_axis):
-                r = self.f(Series([]))
+                r = self.f(Series([], dtype=np.float64))
             else:
                 r = np.nan

@@ -346,14 +348,25 @@ def apply_series_generator(self) -> Tuple[ResType, "Index"]:
     def wrap_results(
         self, results: ResType, res_index: "Index"
     ) -> Union["Series", "DataFrame"]:
+        from pandas import Series

         # see if we can infer the results
         if len(results) > 0 and 0 in results and is_sequence(results[0]):

             return self.wrap_results_for_axis(results, res_index)

         # dict of scalars
-        result = self.obj._constructor_sliced(results)
+
+        # the default dtype of an empty Series will be `object`, but this
+        # code can be hit by df.mean() where the result should have dtype
+        # float64 even if it's an empty Series.
+        constructor_sliced = self.obj._constructor_sliced
+        if constructor_sliced is Series:
+            result = create_series_with_explicit_dtype(
+                results, dtype_if_empty=np.float64
+            )
+        else:
+            result = constructor_sliced(results)
         result.index = res_index

         return result

pandas/core/base.py

+8 -2
@@ -34,6 +34,7 @@
 from pandas.core.accessor import DirNamesMixin
 from pandas.core.algorithms import duplicated, unique1d, value_counts
 from pandas.core.arrays import ExtensionArray
+from pandas.core.construction import create_series_with_explicit_dtype
 import pandas.core.nanops as nanops

 _shared_docs: Dict[str, str] = dict()
@@ -1132,9 +1133,14 @@ def _map_values(self, mapper, na_action=None):
                 # convert to an Series for efficiency.
                 # we specify the keys here to handle the
                 # possibility that they are tuples
-                from pandas import Series

-                mapper = Series(mapper)
+                # The return value of mapping with an empty mapper is
+                # expected to be pd.Series(np.nan, ...). As np.nan is
+                # of dtype float64 the return value of this method should
+                # be float64 as well
+                mapper = create_series_with_explicit_dtype(
+                    mapper, dtype_if_empty=np.float64
+                )

         if isinstance(mapper, ABCSeries):
             # Since values were input this means we came from either
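
Note on the `_map_values` change: the new comment says that mapping with an empty mapper is expected to yield an all-NaN, float64 result. A small sketch of that expectation through the public `Series.map` API follows. The input values are arbitrary.

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# An empty dict has no matching keys, so every element maps to NaN.
mapped = s.map({})

print(mapped.dtype, mapped.isna().all())
```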

pandas/core/construction.py

+65 -1
@@ -4,7 +4,7 @@

 These should not depend on core.internals.
 """
-from typing import Optional, Sequence, Union, cast
+from typing import TYPE_CHECKING, Any, Optional, Sequence, Union, cast

 import numpy as np
 import numpy.ma as ma
@@ -44,8 +44,13 @@
 )
 from pandas.core.dtypes.missing import isna

+from pandas._typing import ArrayLike, Dtype
 import pandas.core.common as com

+if TYPE_CHECKING:
+    from pandas.core.series import Series  # noqa: F401
+    from pandas.core.index import Index  # noqa: F401
+

 def array(
     data: Sequence[object],
@@ -565,3 +570,62 @@ def _try_cast(
         else:
             subarr = np.array(arr, dtype=object, copy=copy)
     return subarr
+
+
+def is_empty_data(data: Any) -> bool:
+    """
+    Utility to check if a Series is instantiated with empty data,
+    which does not contain dtype information.
+
+    Parameters
+    ----------
+    data : array-like, Iterable, dict, or scalar value
+        Contains data stored in Series.
+
+    Returns
+    -------
+    bool
+    """
+    is_none = data is None
+    is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
+    is_simple_empty = is_list_like_without_dtype and not data
+    return is_none or is_simple_empty
+
+
+def create_series_with_explicit_dtype(
+    data: Any = None,
+    index: Optional[Union[ArrayLike, "Index"]] = None,
+    dtype: Optional[Dtype] = None,
+    name: Optional[str] = None,
+    copy: bool = False,
+    fastpath: bool = False,
+    dtype_if_empty: Dtype = object,
+) -> "Series":
+    """
+    Helper to pass an explicit dtype when instantiating an empty Series.
+
+    This silences a DeprecationWarning described in GitHub-17261.
+
+    Parameters
+    ----------
+    data : Mirrored from Series.__init__
+    index : Mirrored from Series.__init__
+    dtype : Mirrored from Series.__init__
+    name : Mirrored from Series.__init__
+    copy : Mirrored from Series.__init__
+    fastpath : Mirrored from Series.__init__
+    dtype_if_empty : str, numpy.dtype, or ExtensionDtype
+        This dtype will be passed explicitly if an empty Series will
+        be instantiated.
+
+    Returns
+    -------
+    Series
+    """
+    from pandas.core.series import Series
+
+    if is_empty_data(data) and dtype is None:
+        dtype = dtype_if_empty
+    return Series(
+        data=data, index=index, dtype=dtype, name=name, copy=copy, fastpath=fastpath
+    )
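
Note on the two helpers added above: they are internal to pandas, so the following is only a standalone sketch of the same idea written against the public API. The names `is_empty_data_sketch` and `series_with_fallback_dtype` are hypothetical and do not exist in pandas. The `copy` and `fastpath` parameters are left out for brevity.

```python
from typing import Any

import pandas as pd
from pandas.api.types import is_list_like


def is_empty_data_sketch(data: Any) -> bool:
    """Return True if data is None or an empty list-like that carries no dtype."""
    if data is None:
        return True
    # Objects with a .dtype (ndarray, Series, Index) already say what they hold.
    return bool(is_list_like(data) and not hasattr(data, "dtype") and not data)


def series_with_fallback_dtype(
    data: Any = None,
    index: Any = None,
    dtype: Any = None,
    name: Any = None,
    dtype_if_empty: Any = object,
) -> pd.Series:
    """Build a Series, using dtype_if_empty only when nothing else fixes the dtype."""
    if dtype is None and is_empty_data_sketch(data):
        dtype = dtype_if_empty
    return pd.Series(data=data, index=index, dtype=dtype, name=name)


empty_default = series_with_fallback_dtype([])
empty_float = series_with_fallback_dtype({}, dtype_if_empty="float64")
non_empty = series_with_fallback_dtype([1.5, 2.5], dtype_if_empty=object)

print(empty_default.dtype, empty_float.dtype, non_empty.dtype)
```

The fallback only applies to empty input without a dtype, so non-empty data keeps its normally inferred dtype.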

pandas/core/frame.py

+1 -1
@@ -7956,7 +7956,7 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, interpolation="linear"):
             cols = Index([], name=self.columns.name)
             if is_list_like(q):
                 return self._constructor([], index=q, columns=cols)
-            return self._constructor_sliced([], index=cols, name=q)
+            return self._constructor_sliced([], index=cols, name=q, dtype=np.float64)

         result = data._data.quantile(
             qs=q, axis=1, interpolation=interpolation, transposed=is_transposed

pandas/core/generic.py

+5 -4
@@ -72,6 +72,7 @@
 import pandas.core.algorithms as algos
 from pandas.core.base import PandasObject, SelectionMixin
 import pandas.core.common as com
+from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.index import (
     Index,
     InvalidIndexError,
@@ -6042,9 +6043,9 @@ def fillna(

             if self.ndim == 1:
                 if isinstance(value, (dict, ABCSeries)):
-                    from pandas import Series
-
-                    value = Series(value)
+                    value = create_series_with_explicit_dtype(
+                        value, dtype_if_empty=object
+                    )
                 elif not is_list_like(value):
                     pass
                 else:
@@ -6996,7 +6997,7 @@ def asof(self, where, subset=None):
                 if not is_series:
                     from pandas import Series

-                    return Series(index=self.columns, name=where)
+                    return Series(index=self.columns, name=where, dtype=np.float64)
                 return np.nan

             # It's always much faster to use a *while* loop here for

pandas/core/groupby/generic.py

+18 -7
@@ -51,6 +51,7 @@
 import pandas.core.algorithms as algorithms
 from pandas.core.base import DataError, SpecificationError
 import pandas.core.common as com
+from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.frame import DataFrame
 from pandas.core.generic import ABCDataFrame, ABCSeries, NDFrame, _shared_docs
 from pandas.core.groupby import base
@@ -259,7 +260,9 @@ def aggregate(self, func=None, *args, **kwargs):
                 result = self._aggregate_named(func, *args, **kwargs)

             index = Index(sorted(result), name=self.grouper.names[0])
-            ret = Series(result, index=index)
+            ret = create_series_with_explicit_dtype(
+                result, index=index, dtype_if_empty=object
+            )

         if not self.as_index:  # pragma: no cover
             print("Warning, ignoring as_index=True")
@@ -407,7 +410,7 @@ def _wrap_transformed_output(
     def _wrap_applied_output(self, keys, values, not_indexed_same=False):
         if len(keys) == 0:
             # GH #6265
-            return Series([], name=self._selection_name, index=keys)
+            return Series([], name=self._selection_name, index=keys, dtype=np.float64)

         def _get_index() -> Index:
             if self.grouper.nkeys > 1:
@@ -493,7 +496,7 @@ def _transform_general(self, func, *args, **kwargs):

             result = concat(results).sort_index()
         else:
-            result = Series()
+            result = Series(dtype=np.float64)

         # we will only try to coerce the result type if
         # we have a numeric dtype, as these are *always* user-defined funcs
@@ -1205,10 +1208,18 @@ def first_not_none(values):
             if v is None:
                 return DataFrame()
             elif isinstance(v, NDFrame):
-                values = [
-                    x if x is not None else v._constructor(**v._construct_axes_dict())
-                    for x in values
-                ]
+
+                # this is to silence a DeprecationWarning
+                # TODO: Remove when default dtype of empty Series is object
+                kwargs = v._construct_axes_dict()
+                if v._constructor is Series:
+                    backup = create_series_with_explicit_dtype(
+                        **kwargs, dtype_if_empty=object
+                    )
+                else:
+                    backup = v._constructor(**kwargs)
+
+                values = [x if (x is not None) else backup for x in values]

             v = values[0]
