Commit 9c8c67d

added FutureWarning to empty Series without dtype and adjusted the tests and docs so that no unnecessary warnings are thrown
1 parent f1117bd commit 9c8c67d

83 files changed, +403 -242 lines changed

doc/source/user_guide/missing_data.rst

+2 -2
@@ -190,15 +190,15 @@ The sum of an empty or all-NA Series or column of a DataFrame is 0.

    pd.Series([np.nan]).sum()

-   pd.Series([]).sum()
+   pd.Series([], dtype="float64").sum()

 The product of an empty or all-NA Series or column of a DataFrame is 1.

 .. ipython:: python

    pd.Series([np.nan]).prod()

-   pd.Series([]).prod()
+   pd.Series([], dtype="float64").prod()


 NA values in GroupBy
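
The documentation change above passes `dtype="float64"` so that building the empty Series does not trigger the new warning. A minimal sketch of that usage follows, assuming pandas is installed. The variable names are illustrative, and whether the no-dtype form still warns depends on the pandas version in use.

```python
import warnings

import pandas as pd

# Record any warnings raised while building the empty Series.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty = pd.Series([], dtype="float64")

# Keep only FutureWarnings; none are expected when dtype is explicit.
series_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]

total = empty.sum()     # sum of an empty Series
product = empty.prod()  # product of an empty Series

print(empty.dtype, total, product, len(series_warnings))
```

With an explicit dtype the result no longer depends on which default the constructor picks for empty input, so the sum and product shown in the user guide stay stable across the transition.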

doc/source/user_guide/scale.rst

+1
@@ -358,6 +358,7 @@ results will fit in memory, so we can safely call ``compute`` without running
 out of memory. At that point it's just a regular pandas object.

 .. ipython:: python
+   :okwarning:

    @savefig dask_resample.png
    ddf[['x', 'y']].resample("1D").mean().cumsum().compute().plot()

doc/source/whatsnew/v0.19.0.rst

+1
@@ -707,6 +707,7 @@ A ``Series`` will now correctly promote its dtype for assignment with incompat v


 .. ipython:: python
+   :okwarning:

    s = pd.Series()

doc/source/whatsnew/v0.21.0.rst

+1
@@ -428,6 +428,7 @@ Note that this also changes the sum of an empty ``Series``. Previously this alwa
 but for consistency with the all-NaN case, this was changed to return NaN as well:

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum()

doc/source/whatsnew/v0.22.0.rst

+3
@@ -55,6 +55,7 @@ The default sum for empty or all-*NA* ``Series`` is now ``0``.
 *pandas 0.22.0*

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum()
    pd.Series([np.nan]).sum()
@@ -67,6 +68,7 @@ pandas 0.20.3 without bottleneck, or pandas 0.21.x), use the ``min_count``
 keyword.

 .. ipython:: python
+   :okwarning:

    pd.Series([]).sum(min_count=1)

@@ -85,6 +87,7 @@ required for a non-NA sum or product.
 returning ``1`` instead.

 .. ipython:: python
+   :okwarning:

    pd.Series([]).prod()
    pd.Series([np.nan]).prod()

doc/source/whatsnew/v1.0.0.rst

+18 -1
@@ -356,6 +356,23 @@ When :class:`Categorical` contains ``np.nan``,

    pd.Categorical([1, 2, np.nan], ordered=True).min()

+
+Default dtype of empty :class:`pandas.core.series.Series`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Initialising an empty :class:`pandas.core.series.Series` without specifying a dtype will raise a `FutureWarning` now
+(:issue:`17261`). The default dtype will change from ``float64`` to ``object`` in future releases so that it is
+consistent with the behaviour of :class:`DataFrame` and :class:`Index`.
+
+*pandas 1.0.0*
+
+.. code-block:: ipython
+
+   In [1]: pd.Series()
+   Out[2]:
+   FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in the next version. Specify a dtype explicitly to silence this warning.
+   Series([], dtype: float64)
+
 .. _whatsnew_1000.api_breaking.deps:

 Increased minimum versions for dependencies
@@ -484,7 +501,7 @@ Removal of prior version deprecations/changes

 Previously, pandas would register converters with matplotlib as a side effect of importing pandas (:issue:`18720`).
 This changed the output of plots made via matplotlib plots after pandas was imported, even if you were using
-matplotlib directly rather than rather than :meth:`~DataFrame.plot`.
+matplotlib directly rather than :meth:`~DataFrame.plot`.

 To use pandas formatters with a matplotlib plot, specify

pandas/compat/pickle_compat.py

+1 -1
@@ -64,7 +64,7 @@ def __new__(cls) -> "Series": # type: ignore
             stacklevel=6,
         )

-        return Series()
+        return Series(dtype=object)


 class _LoadSparseFrame:

pandas/core/algorithms.py

+1 -1
@@ -601,7 +601,7 @@ def _factorize_array(
 )
 @Appender(_shared_docs["factorize"])
 def factorize(
-    values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None,
+    values, sort: bool = False, na_sentinel: int = -1, size_hint: Optional[int] = None
 ) -> Tuple[np.ndarray, Union[np.ndarray, ABCIndex]]:
     # Implementation notes: This method is responsible for 3 things
     # 1.) coercing data to array-like (ndarray, Index, extension array)

pandas/core/apply.py

+16 -3
@@ -15,6 +15,8 @@
 )
 from pandas.core.dtypes.generic import ABCMultiIndex, ABCSeries

+from pandas.core.construction import create_series_with_explicit_dtype
+
 if TYPE_CHECKING:
     from pandas import DataFrame, Series, Index

@@ -203,15 +205,15 @@ def apply_empty_result(self):

         if not should_reduce:
             try:
-                r = self.f(Series([]))
+                r = self.f(Series([], dtype=np.float64))
             except Exception:
                 pass
             else:
                 should_reduce = not isinstance(r, Series)

         if should_reduce:
             if len(self.agg_axis):
-                r = self.f(Series([]))
+                r = self.f(Series([], dtype=np.float64))
             else:
                 r = np.nan

@@ -346,14 +348,25 @@ def apply_series_generator(self) -> Tuple[ResType, "Index"]:
     def wrap_results(
         self, results: ResType, res_index: "Index"
     ) -> Union["Series", "DataFrame"]:
+        from pandas import Series

         # see if we can infer the results
         if len(results) > 0 and 0 in results and is_sequence(results[0]):

             return self.wrap_results_for_axis(results, res_index)

         # dict of scalars
-        result = self.obj._constructor_sliced(results)
+
+        # the default dtype of an empty Series will be `object`, but this
+        # code can be hit by df.mean() where the result should have dtype
+        # float64 even if it's an empty Series.
+        constructor_sliced = self.obj._constructor_sliced
+        if constructor_sliced is Series:
+            result = create_series_with_explicit_dtype(
+                results, dtype_if_empty=np.float64
+            )
+        else:
+            result = constructor_sliced(results)
         result.index = res_index

         return result

pandas/core/base.py

+8 -2
@@ -34,6 +34,7 @@
 from pandas.core.accessor import DirNamesMixin
 from pandas.core.algorithms import duplicated, unique1d, value_counts
 from pandas.core.arrays import ExtensionArray
+from pandas.core.construction import create_series_with_explicit_dtype
 import pandas.core.nanops as nanops

 _shared_docs: Dict[str, str] = dict()
@@ -1132,9 +1133,14 @@ def _map_values(self, mapper, na_action=None):
                 # convert to an Series for efficiency.
                 # we specify the keys here to handle the
                 # possibility that they are tuples
-                from pandas import Series

-                mapper = Series(mapper)
+                # The return value of mapping with an empty mapper is
+                # expected to be pd.Series(np.nan, ...). As np.nan is
+                # of dtype float64 the return value of this method should
+                # be float64 as well
+                mapper = create_series_with_explicit_dtype(
+                    mapper, dtype_if_empty=np.float64
+                )

         if isinstance(mapper, ABCSeries):
             # Since values were input this means we came from either

pandas/core/construction.py

+35
@@ -565,3 +565,38 @@ def _try_cast(
         else:
             subarr = np.array(arr, dtype=object, copy=copy)
     return subarr
+
+
+# see gh-17261
+def is_empty_data(data):
+    """
+    Utility to check if a Series is instantiated with empty data
+    """
+    is_none = data is None
+    is_simple_empty = isinstance(data, (list, tuple, dict)) and not data
+    return is_none or is_simple_empty
+
+
+def create_series_with_explicit_dtype(
+    data=None,
+    index=None,
+    dtype=None,
+    name=None,
+    copy=False,
+    fastpath=False,
+    dtype_if_empty=object,
+):
+    """
+    Helper to pass an explicit dtype when instantiating an empty Series.
+
+    The signature of this function mirrors the signature of Series.__init__
+    but adds the additional keyword argument `dtype_if_empty`.
+
+    This silences a FutureWarning described in the GitHub issue
+    mentioned above.
+    """
+    from pandas.core.series import Series
+
+    if is_empty_data(data) and dtype is None:
+        dtype = dtype_if_empty
+    return Series(data, index, dtype, name, copy, fastpath)
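
To make the two helpers above easier to reason about, here is a standalone sketch that copies the body of `is_empty_data` and isolates the dtype choice made by `create_series_with_explicit_dtype`. The function `pick_dtype` and the sample inputs are hypothetical names introduced only for illustration; pandas itself is not imported, and no Series is built.

```python
def is_empty_data(data):
    """Same check as the helper added in this commit."""
    is_none = data is None
    is_simple_empty = isinstance(data, (list, tuple, dict)) and not data
    return is_none or is_simple_empty


def pick_dtype(data, dtype=None, dtype_if_empty=object):
    """Hypothetical helper: only the dtype decision, without building a Series."""
    if is_empty_data(data) and dtype is None:
        return dtype_if_empty
    return dtype


# Which inputs count as "empty" for the purpose of the warning?
candidates = {
    "None": None,
    "empty list": [],
    "empty tuple": (),
    "empty dict": {},
    "empty set": set(),      # not a list, tuple or dict
    "empty string": "",      # not a list, tuple or dict
    "non-empty list": [1.0],
}
results = {label: is_empty_data(value) for label, value in candidates.items()}

for label, flag in results.items():
    print(f"{label:>15}: {flag}")

# The fallback dtype applies only when data is empty and no dtype was given.
print(pick_dtype([], dtype_if_empty=float))           # fallback used
print(pick_dtype([], dtype="int64", dtype_if_empty=float))  # explicit dtype wins
print(pick_dtype([1.0], dtype_if_empty=float))        # non-empty: left as None
```

The check is deliberately narrow: only `None` and empty `list`, `tuple` or `dict` instances qualify, so other empty containers pass through without a dtype being substituted.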

pandas/core/frame.py

+1 -1
@@ -7954,7 +7954,7 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, interpolation="linear"):
             cols = Index([], name=self.columns.name)
             if is_list_like(q):
                 return self._constructor([], index=q, columns=cols)
-            return self._constructor_sliced([], index=cols, name=q)
+            return self._constructor_sliced([], index=cols, name=q, dtype=np.float64)

         result = data._data.quantile(
             qs=q, axis=1, interpolation=interpolation, transposed=is_transposed

pandas/core/generic.py

+3 -4
@@ -72,6 +72,7 @@
 import pandas.core.algorithms as algos
 from pandas.core.base import PandasObject, SelectionMixin
 import pandas.core.common as com
+from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.index import (
     Index,
     InvalidIndexError,
@@ -6110,9 +6111,7 @@ def fillna(

             if self.ndim == 1:
                 if isinstance(value, (dict, ABCSeries)):
-                    from pandas import Series
-
-                    value = Series(value)
+                    value = create_series_with_explicit_dtype(value)
                 elif not is_list_like(value):
                     pass
                 else:
@@ -7064,7 +7063,7 @@ def asof(self, where, subset=None):
                 if not is_series:
                     from pandas import Series

-                    return Series(index=self.columns, name=where)
+                    return Series(index=self.columns, name=where, dtype=np.float64)
                 return np.nan

             # It's always much faster to use a *while* loop here for

pandas/core/groupby/generic.py

+14 -5
@@ -51,6 +51,7 @@
 import pandas.core.algorithms as algorithms
 from pandas.core.base import DataError, SpecificationError
 import pandas.core.common as com
+from pandas.core.construction import create_series_with_explicit_dtype
 from pandas.core.frame import DataFrame
 from pandas.core.generic import ABCDataFrame, ABCSeries, NDFrame, _shared_docs
 from pandas.core.groupby import base
@@ -259,7 +260,7 @@ def aggregate(self, func=None, *args, **kwargs):
                 result = self._aggregate_named(func, *args, **kwargs)

             index = Index(sorted(result), name=self.grouper.names[0])
-            ret = Series(result, index=index)
+            ret = create_series_with_explicit_dtype(result, index=index)

         if not self.as_index:  # pragma: no cover
             print("Warning, ignoring as_index=True")
@@ -407,7 +408,7 @@ def _wrap_transformed_output(
     def _wrap_applied_output(self, keys, values, not_indexed_same=False):
         if len(keys) == 0:
             # GH #6265
-            return Series([], name=self._selection_name, index=keys)
+            return Series([], name=self._selection_name, index=keys, dtype=np.float64)

         def _get_index() -> Index:
             if self.grouper.nkeys > 1:
@@ -493,7 +494,7 @@ def _transform_general(self, func, *args, **kwargs):

             result = concat(results).sort_index()
         else:
-            result = Series()
+            result = Series(dtype=np.float64)

         # we will only try to coerce the result type if
         # we have a numeric dtype, as these are *always* user-defined funcs
@@ -1205,9 +1206,17 @@ def first_not_none(values):
             if v is None:
                 return DataFrame()
             elif isinstance(v, NDFrame):
+
+                # this is to silence a FutureWarning
+                # TODO: Remove when default dtype of empty Series is object
+                kwargs = v._construct_axes_dict()
+                if v._constructor is Series:
+                    is_empty = "data" not in kwargs or not kwargs["data"]
+                    if "dtype" not in kwargs and is_empty:
+                        kwargs["dtype"] = object
+
                 values = [
-                    x if x is not None else v._constructor(**v._construct_axes_dict())
-                    for x in values
+                    x if (x is not None) else v._constructor(**kwargs) for x in values
                 ]

             v = values[0]

pandas/core/series.py

+21 -2
@@ -54,7 +54,12 @@
 from pandas.core.arrays.categorical import Categorical, CategoricalAccessor
 from pandas.core.arrays.sparse import SparseAccessor
 import pandas.core.common as com
-from pandas.core.construction import extract_array, sanitize_array
+from pandas.core.construction import (
+    create_series_with_explicit_dtype,
+    extract_array,
+    is_empty_data,
+    sanitize_array,
+)
 from pandas.core.index import (
     Float64Index,
     Index,
@@ -175,6 +180,18 @@ class Series(base.IndexOpsMixin, generic.NDFrame):
     def __init__(
         self, data=None, index=None, dtype=None, name=None, copy=False, fastpath=False
     ):
+        if is_empty_data(data) and dtype is None:
+            # Empty Series should have dtype object to be consistent
+            # with the behaviour of DataFrame and Index
+            warnings.warn(
+                "The default dtype for empty Series will be 'object' instead"
+                " of 'float64' in the next version. Specify a dtype explicitly"
+                " to silence this warning.",
+                FutureWarning,
+                stacklevel=2,
+            )
+            # uncomment the line below when removing the FutureWarning
+            # dtype = np.dtype(object)

         # we are called internally, so short-circuit
         if fastpath:
@@ -328,7 +345,9 @@ def _init_dict(self, data, index=None, dtype=None):
             keys, values = [], []

         # Input is now list-like, so rely on "standard" construction:
-        s = Series(values, index=keys, dtype=dtype)
+        s = create_series_with_explicit_dtype(
+            values, index=keys, dtype=dtype, dtype_if_empty=np.float64
+        )

         # Now we just make sure the order is respected, if any
         if data and index is not None:
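
The constructor change above follows a common deprecation pattern: keep the old default, but warn whenever a caller relies on it. The sketch below reproduces that pattern with a hypothetical `Box` class, which stands in for `Series` and is not part of pandas. It also shows how a caller, or a test, can confirm that passing a dtype avoids the warning. The helper `count_future_warnings` is likewise an assumption made for illustration.

```python
import warnings


class Box:
    """Hypothetical stand-in for a container with a changing default dtype."""

    def __init__(self, data=None, dtype=None):
        is_empty = data is None or (
            isinstance(data, (list, tuple, dict)) and not data
        )
        if is_empty and dtype is None:
            # stacklevel=2 attributes the warning to the caller's line,
            # not to this constructor.
            warnings.warn(
                "The default dtype for empty Box will change; "
                "specify a dtype explicitly to silence this warning.",
                FutureWarning,
                stacklevel=2,
            )
            dtype = "float64"  # old default is kept during the deprecation
        self.data = [] if data is None else data
        self.dtype = dtype  # dtype inference for non-empty data is out of scope


def count_future_warnings(factory):
    """Call factory() and count the FutureWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        factory()
    return sum(issubclass(w.category, FutureWarning) for w in caught)


implicit = count_future_warnings(lambda: Box())
explicit = count_future_warnings(lambda: Box(dtype="float64"))
nonempty = count_future_warnings(lambda: Box([1, 2]))

print(implicit, explicit, nonempty)
```

Internal call sites in this commit take the `explicit` route, either by passing `dtype` directly or by going through `create_series_with_explicit_dtype`, so that library code does not surface the warning to users who never constructed an empty Series themselves.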

pandas/core/tools/datetimes.py

+2 -1
@@ -145,7 +145,8 @@ def _maybe_cache(arg, format, cache, convert_listlike):
     """
     from pandas import Series

-    cache_array = Series()
+    cache_array = Series(dtype=object)
+
     if cache:
         # Perform a quicker unique check
         if not should_cache(arg):

pandas/io/html.py

+3 -2
@@ -14,7 +14,7 @@

 from pandas.core.dtypes.common import is_list_like

-from pandas import Series
+from pandas.core.construction import create_series_with_explicit_dtype

 from pandas.io.common import _is_url, _validate_header_arg, urlopen
 from pandas.io.formats.printing import pprint_thing
@@ -762,7 +762,8 @@ def _parse_tfoot_tr(self, table):


 def _expand_elements(body):
-    lens = Series([len(elem) for elem in body])
+    data = [len(elem) for elem in body]
+    lens = create_series_with_explicit_dtype(data)
     lens_max = lens.max()
     not_max = lens[lens != lens_max]
