Study on refactoring the series constructor #57952

Closed
Changes from 95 commits
Commits
97 commits
398477a
REF Series: add a test to check that on dictionary constructor np.nan…
aureliobarbosa Mar 8, 2024
a15fa21
REF Series: simplify Series._init_dict constructor
aureliobarbosa Mar 9, 2024
e72cff1
REF Series: add tests to ensure that series dict constructor preserve…
aureliobarbosa Mar 9, 2024
0e6f269
REF Series: Move 'data is None' below
aureliobarbosa Mar 11, 2024
b696dc2
REF Series: ensure dict is not series and Series(dict(),...) -> Serie…
aureliobarbosa Mar 11, 2024
66793c3
REF Series: Series(dict(),...) extracted from _init_dict() which is…
aureliobarbosa Mar 11, 2024
5e09cc8
REF Series: bring dict data closer and make if-else structure identical
aureliobarbosa Mar 11, 2024
f0f336d
REF Series: Decouple dictionary from all other kinds of data
aureliobarbosa Mar 11, 2024
ca87d9d
REF Series: Starting to decouple index and data is None/Not None edge…
aureliobarbosa Mar 12, 2024
6e850ea
REF Series: Further simplifications
aureliobarbosa Mar 12, 2024
94f4fe5
REF Series: More simplifications
aureliobarbosa Mar 12, 2024
b4f71ac
REF Series: Else if is joined and escalated from the most restrictive…
aureliobarbosa Mar 12, 2024
c0c4554
REF Series: Clean unused code and comments.
aureliobarbosa Mar 12, 2024
ff139db
REF Series: simplification
aureliobarbosa Mar 13, 2024
9a1843e
REF Series: Groupping common operations on index
aureliobarbosa Mar 13, 2024
161d0ff
REF Series: selecting iterator on if-else data logic was no longer ne…
aureliobarbosa Mar 13, 2024
4b208bb
REF Series: Separating a peculiar data Sized type being catched by is…
aureliobarbosa Mar 13, 2024
299169a
REF Series: Repositioning code blocks according to tasks
aureliobarbosa Mar 13, 2024
b1f1320
REF Series: Separating copy from dtype logic on final steps. Clean co…
aureliobarbosa Mar 13, 2024
d7007c0
REF Series: Separating contexts. Who needs the SingleBlockManager and…
aureliobarbosa Mar 13, 2024
5d40174
REF Series: organizing ideas for decoupling index and copy. While cre…
aureliobarbosa Mar 13, 2024
dccda08
REF Series - Constructor for Series and Manager: Making parameter dee…
aureliobarbosa Mar 13, 2024
ac8ea67
REF Series: Refrasing Comments
aureliobarbosa Mar 13, 2024
eb11e89
REF Series: Organizing TODOs for next steps.
aureliobarbosa Mar 13, 2024
5fcdefa
REF Series - TODO 1: Move error when data is MultiIndex to 'Series TA…
aureliobarbosa Mar 13, 2024
28ca344
REF Series - TODO 3: Move Manager AssertionError to Series TASK 1
aureliobarbosa Mar 14, 2024
ae97b19
REF Series TODO 3: Joined with another TODO and detailed the tasks.
aureliobarbosa Mar 14, 2024
ac74810
REF Series - TODO 2. Organizing two blocks of code with Manager (on …
aureliobarbosa Mar 14, 2024
b5a372c
REF Series - TODO 2: Organizing two blocks of code with Manager (on T…
aureliobarbosa Mar 14, 2024
29eb542
REF Series - TODO 2. Move code on TASK 2 to proper places. Step 2.2 D…
aureliobarbosa Mar 14, 2024
cf5070f
REF Series - TODO 2. Decouple warnings / data manipulation. Steps 2.3…
aureliobarbosa Mar 14, 2024
2edf3ca
REF Series - TODO 2. Decouple warnings / data manipulation. Steps 2.3…
aureliobarbosa Mar 15, 2024
83ef80f
REF Series - TODO 2. Decouple warnings / data manipulation. Step 2.5.…
aureliobarbosa Mar 15, 2024
7ea6b8e
REF Series - TODO 2. Decouple warnings / data manipulation. Organizin…
aureliobarbosa Mar 15, 2024
8dd002d
REF Series - TODO 2. Decouple warnings / data manipulation. Step 2.5.…
aureliobarbosa Mar 15, 2024
ff9b4ce
REF Series - TODO 2. Decouple warnings / data manipulation. Reorganiz…
aureliobarbosa Mar 15, 2024
b6d4636
REF Series - TODO 6. DECOUPLE MANAGER PREPARATION FROM COPYING. Step …
aureliobarbosa Mar 15, 2024
cf12b42
REF Series - TODO 4: Move Warning to Series TASK 7.
aureliobarbosa Mar 15, 2024
4b7215c
REF Series - TODO 10. Move copy code for ExtendedArrays and NDArrays…
aureliobarbosa Mar 15, 2024
6b9a792
REF Series - TODO 10. Move copy code for ExtendedArrays and NDArrays…
aureliobarbosa Mar 15, 2024
18f70de
REF Series - TODO 7 - dtype Series with arguments equivalent to empty…
aureliobarbosa Mar 15, 2024
0be3e2d
REF Series: Reorganizing TODOs
aureliobarbosa Mar 15, 2024
07091c8
REF Series - TODO 10. DECOUPLE MANAGER PREPARATION FROM COPYING. Ste…
aureliobarbosa Mar 16, 2024
a80760f
REF Series - TODO 10. DECOUPLE MANAGER PREPARATION FROM COPYING. Ste…
aureliobarbosa Mar 16, 2024
d351538
REF Series - TODO 18. Done
aureliobarbosa Mar 16, 2024
bc14bbb
REF Series - TODO 6. DECOUPLE MANAGER PREPARATION FROM COPYING. Worki…
aureliobarbosa Mar 16, 2024
1dfa4b1
REF Series - TODO 8: Decouple single element from the other data.
aureliobarbosa Mar 16, 2024
8a7f2b6
REF Series - TODO 6. DECOUPLE MANAGER PREPARATION FROM COPYING. Step …
aureliobarbosa Mar 16, 2024
8101dd2
Merge remote-tracking branch 'upstream/main' into refactor_series_con…
aureliobarbosa Mar 17, 2024
da2bdac
Revert "REF Series: add tests to ensure that series dict constructor …
aureliobarbosa Mar 17, 2024
43d5592
REF Series - TODO 6. DECOUPLE MANAGER PREPARATION FROM COPYING. Step …
aureliobarbosa Mar 17, 2024
9877d2f
REF Series - TODO 6. DECOUPLE MANAGER PREPARATION FROM COPYING. Every…
aureliobarbosa Mar 17, 2024
eea989d
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
9692594
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
d230651
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
2c3b415
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
092a918
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
5e6f9b0
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 17, 2024
08c27eb
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
a58ab1b
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
61f7908
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
316c600
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
023f4db
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
0b8eeea
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
37cfa65
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
82270db
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
9c8628d
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
87bf240
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
434c1fc
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
5d24ed4
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
a8cedb0
REF Series - TODO 14: Try capture final data type that seems scalar. …
aureliobarbosa Mar 18, 2024
92c25da
REF Series - TODO 14: Refactor data transformation on edge cases.
aureliobarbosa Mar 18, 2024
0178074
EF Series - TODO 14: Refactor data transformation on edge cases. Inde…
aureliobarbosa Mar 18, 2024
8f7cf6b
REF Series: change variable name require_manager <-> has_manager
aureliobarbosa Mar 18, 2024
07c1664
REF Series - TODO 14: Refactor data transformation on edge cases. Sim…
aureliobarbosa Mar 18, 2024
d7a2798
REF Series - TODO 14: Refactor data transformation on edge cases. Sim…
aureliobarbosa Mar 18, 2024
7e48887
REF Series: saving memory on dict input
aureliobarbosa Mar 19, 2024
d698c57
REF Series: Unifying index treatment in a single place
aureliobarbosa Mar 19, 2024
c16a0e9
REF Series: removing dead code and if (True).
aureliobarbosa Mar 19, 2024
cc9154d
REF Series: Unifying index treatment in a single place. step 2
aureliobarbosa Mar 19, 2024
f8063d2
REF Series: Unifying index treatment in a single place. step 3
aureliobarbosa Mar 19, 2024
c606464
REF Series: Unifying index treatment in a single place. step 4
aureliobarbosa Mar 19, 2024
f88bbe4
REF Series: simplify if-else structure on data manipulation. Step 1
aureliobarbosa Mar 19, 2024
a21ad6d
REF Series: simplify if-else structure on data manipulation. Step 2
aureliobarbosa Mar 19, 2024
83518c5
EF Series: simplify if-else structure on data manipulation. Done
aureliobarbosa Mar 19, 2024
3dfcbb2
REF Series: Simplify identification of default_empty_series to change…
aureliobarbosa Mar 19, 2024
09fdd03
REF Series: Erasing comments, and DONEs and TODOs
aureliobarbosa Mar 20, 2024
daaaeb8
REF Series: group two identical cases on if-else logic of index const…
aureliobarbosa Mar 20, 2024
c68c73e
REF Series: simplifying comments
aureliobarbosa Mar 20, 2024
7aee896
Merge remote-tracking branch 'upstream/main' into refactor_series_con…
aureliobarbosa Mar 20, 2024
e312146
REF Series: simplifying detection/manipulation of scalar data
aureliobarbosa Mar 21, 2024
5304d75
REF Series: simplifying detection/manipulation of scalar data. Step 2
aureliobarbosa Mar 21, 2024
32dda5b
REF Series: simplifying logic for single element becoming a list
aureliobarbosa Mar 21, 2024
f3967ca
REF Series: incorporate changes from #57889
aureliobarbosa Mar 21, 2024
7e01783
Merge remote-tracking branch 'upstream/main' into refactor_series_con…
aureliobarbosa Mar 21, 2024
e5e4426
Merge remote-tracking branch 'upstream/main' into refactor_series_con…
aureliobarbosa Mar 21, 2024
84e6293
Merge remote-tracking branch 'upstream/main' into refactor_series_con…
aureliobarbosa Mar 26, 2024
320 changes: 148 additions & 172 deletions pandas/core/series.py
@@ -9,6 +9,7 @@
Iterable,
Mapping,
Sequence,
Sized,
)
import operator
import sys
@@ -363,212 +364,187 @@ def __init__(
copy: bool | None = None,
) -> None:
allow_mgr = False
if (
isinstance(data, SingleBlockManager)
and index is None
and dtype is None
and (copy is False or copy is None)
):
if not allow_mgr:
# GH#52419
warnings.warn(
f"Passing a {type(data).__name__} to {type(self).__name__} "
"is deprecated and will raise in a future version. "
"Use public APIs instead.",
DeprecationWarning,
stacklevel=2,
)
data = data.copy(deep=False)
# GH#33357 called with just the SingleBlockManager
NDFrame.__init__(self, data)
self.name = name
return
deep = True # deep copy

is_pandas_object = isinstance(data, (Series, Index, ExtensionArray))
data_dtype = getattr(data, "dtype", None)
original_dtype = dtype
# Series TASK 1: VALIDATE BASIC TYPES.
if dtype is not None:
dtype = self._validate_dtype(dtype)

if isinstance(data, (ExtensionArray, np.ndarray)):
if copy is not False:
if dtype is None or astype_is_view(data.dtype, pandas_dtype(dtype)):
data = data.copy()
if copy is None:
copy = False
copy_arrays = copy is True or copy is None # ndarrays and ExtensionArrays
copy = copy is True # Series and Manager

if isinstance(data, SingleBlockManager) and not copy:
data = data.copy(deep=False)
# Series TASK 2: RAISE ERRORS ON KNOWN UNACCEPTED CASES.
if isinstance(data, MultiIndex):
raise NotImplementedError(
"initializing a Series from a MultiIndex is not supported"
)

if not allow_mgr:
warnings.warn(
f"Passing a {type(data).__name__} to {type(self).__name__} "
"is deprecated and will raise in a future version. "
"Use public APIs instead.",
DeprecationWarning,
stacklevel=2,
if isinstance(data, SingleBlockManager):
if not (data.index.equals(index) or index is None) or copy:
# GH #19275 SingleBlockManager input should only be called internally
raise AssertionError(
"Cannot pass both SingleBlockManager "
"`data` argument and a different "
"`index` argument. `copy` must be False."
)

if isinstance(data, np.ndarray):
if len(data.dtype):
# GH#13296 we are dealing with a compound dtype,
# which should be treated as 2D.
raise ValueError(
"Cannot construct a Series from an ndarray with "
"compound dtype. Use DataFrame instead."
)

# Series TASK 3: CAPTURE INPUT SIGNATURE.
is_array = isinstance(data, (np.ndarray, ExtensionArray))
is_pandas_object = isinstance(data, (Series, Index, ExtensionArray))
original_dtype = dtype
original_data_type = type(data)
original_data_dtype = getattr(data, "dtype", None)
refs = None
name = ibase.maybe_extract_name(name, data, type(self))
na_value = na_value_for_dtype(pandas_dtype(dtype), compat=False)

# Series TASK 4: DATA TRANSFORMATIONS.
if isinstance(data, Mapping):
# if is_dict_like(data) and not is_pandas_object and data is not None:
# Dict is a SPECIAL case, since its data holds both the values and the index keys.

# Looking for NaN in dict doesn't work ({np.nan : 1}[float('nan')]
# raises KeyError). Send it to Series for "standard" construction:

# index = tuple(data.keys()) consumes more memory (up to 25%).
if data:
data = Series(
data=list(data.values()),
index=data.keys(),
dtype=dtype,
)
else:
data = None

if index is not None:
index = ensure_index(index)
if is_list_like(data) and not isinstance(data, Sized):
data = list(data)

if dtype is not None:
dtype = self._validate_dtype(dtype)
if (
(is_scalar(data) or not isinstance(data, Sized))
and index is None
and data is not None
):
data = [data]

if data is None:
index = index if index is not None else default_index(0)
if len(index) or dtype is not None:
data = na_value_for_dtype(pandas_dtype(dtype), compat=False)
# Series TASK 5: TRANSFORMATION ON INDEX. There is always an index after this.
original_index = index
if index is None:
if data is None:
index = default_index(0)
else:
data = []
if isinstance(data, (SingleBlockManager, Series)):
index = data.index
else:
index = default_index(len(data))
else:
index = ensure_index(index)

if isinstance(data, MultiIndex):
raise NotImplementedError(
"initializing a Series from a MultiIndex is not supported"
)
# Series TASK 6: TRANSFORMATIONS ON DATA.
# REQUIREMENTS FOR COPYING AND MANAGER CREATION (WHEN NEEDED).
list_like_input = False
require_manager = True
fast_path_manager = False
if data is None and len(index):
data = na_value

refs = None
if isinstance(data, Index):
elif isinstance(data, Series):
require_manager = False
copy = True if original_index is None else False
deep = not copy

if original_index is not None:
data = data.reindex(index) # copy
index = data.index

data = data._mgr

elif isinstance(data, SingleBlockManager):
require_manager = False
fast_path_manager = original_index is None and not copy and dtype is None

elif isinstance(data, Index):
if dtype is not None:
data = data.astype(dtype)

refs = data._references
data = data._values
copy = False

elif isinstance(data, np.ndarray):
if len(data.dtype):
# GH#13296 we are dealing with a compound dtype, which
# should be treated as 2D
raise ValueError(
"Cannot construct a Series from an ndarray with "
"compound dtype. Use DataFrame instead."
)
elif isinstance(data, Series):
if index is None:
index = data.index
data = data._mgr.copy(deep=False)
else:
data = data.reindex(index)
copy = False
data = data._mgr
elif isinstance(data, Mapping):
data, index = self._init_dict(data, index, dtype)
dtype = None
copy = False
elif isinstance(data, SingleBlockManager):
if index is None:
index = data.index
elif not data.index.equals(index) or copy:
# GH#19275 SingleBlockManager input should only be called
# internally
raise AssertionError(
"Cannot pass both SingleBlockManager "
"`data` argument and a different "
"`index` argument. `copy` must be False."
)
elif is_array:
pass

if not allow_mgr:
warnings.warn(
f"Passing a {type(data).__name__} to {type(self).__name__} "
"is deprecated and will raise in a future version. "
"Use public APIs instead.",
DeprecationWarning,
stacklevel=2,
)
allow_mgr = True
elif is_list_like(data):
list_like_input = True

elif isinstance(data, ExtensionArray):
pass
else:
data = com.maybe_iterable_to_list(data)
if is_list_like(data) and not len(data) and dtype is None:
# GH 29405: Pre-2.0, this defaulted to float.
# Series TASK 7: COPYING THE MANAGER.
if require_manager:
# GH 29405: Pre-2.0, this defaulted to float.
default_empty_series = list_like_input and not len(data) and dtype is None
if default_empty_series:
dtype = np.dtype(object)

if index is None:
if not is_list_like(data):
data = [data]
index = default_index(len(data))
elif is_list_like(data):
com.require_length_match(data, index)
# Final requirements
if is_list_like(data):
com.require_length_match(data, index)

if is_array and copy_arrays:
if copy_arrays:
if dtype is None or astype_is_view(data.dtype, pandas_dtype(dtype)):
data = data.copy() # not np.ndarray.copy(deep=...)

# create/copy the manager
if isinstance(data, SingleBlockManager):
if dtype is not None:
data = data.astype(dtype=dtype, errors="ignore")
elif copy:
data = data.copy()
else:
data = sanitize_array(data, index, dtype, copy)
data = SingleBlockManager.from_array(data, index, refs=refs)

else:
deep = deep if not fast_path_manager else False
if dtype is not None:
data = data.astype(dtype=dtype, errors="ignore") # Copy the manager
copy = False

if copy or fast_path_manager:
data = data.copy(deep)

# Series TASK 8: CREATE THE SERIES
NDFrame.__init__(self, data)
self.name = name
self._set_axis(0, index)
if not fast_path_manager:
self._set_axis(0, index)

# Series TASK 9: RAISE WARNINGS
if (
original_dtype is None
and is_pandas_object
and original_data_dtype == np.object_
and self.dtype != original_data_dtype
):
warnings.warn(
"Dtype inference on a pandas object "
"(Series, Index, ExtensionArray) is deprecated. The Series "
"constructor will keep the original dtype in the future. "
"Call `infer_objects` on the result to get the old behavior.",
FutureWarning,
stacklevel=find_stack_level(),
)

if original_dtype is None and is_pandas_object and data_dtype == np.object_:
if self.dtype != data_dtype:
if original_data_type is SingleBlockManager:
if not allow_mgr:
warnings.warn(
"Dtype inference on a pandas object "
"(Series, Index, ExtensionArray) is deprecated. The Series "
"constructor will keep the original dtype in the future. "
"Call `infer_objects` on the result to get the old behavior.",
FutureWarning,
stacklevel=find_stack_level(),
f"Passing a {type(data).__name__} to {type(self).__name__} "
"is deprecated and will raise in a future version. "
"Use public APIs instead.",
DeprecationWarning,
stacklevel=2,
)

def _init_dict(
self, data: Mapping, index: Index | None = None, dtype: DtypeObj | None = None
):
"""
Derive the "_mgr" and "index" attributes of a new Series from a
dictionary input.

Parameters
----------
data : dict or dict-like
Data used to populate the new Series.
index : Index or None, default None
Index for the new Series: if None, use dict keys.
dtype : np.dtype, ExtensionDtype, or None, default None
The dtype for the new Series: if None, infer from data.

Returns
-------
_data : BlockManager for the new Series
index : index for the new Series
"""
keys: Index | tuple

# Looking for NaN in dict doesn't work ({np.nan : 1}[float('nan')]
# raises KeyError), so we iterate the entire dict, and align
if data:
# GH:34717, issue was using zip to extract key and values from data.
# using generators affects the performance.
# Below is the new way of extracting the keys and values

keys = tuple(data.keys())
values = list(data.values()) # Generating list of values- faster way
elif index is not None:
# fastpath for Series(data=None). Just use broadcasting a scalar
# instead of reindexing.
if len(index) or dtype is not None:
values = na_value_for_dtype(pandas_dtype(dtype), compat=False)
else:
values = []
keys = index
else:
keys, values = default_index(0), []

# Input is now list-like, so rely on "standard" construction:
s = Series(values, index=keys, dtype=dtype)

# Now we just make sure the order is respected, if any
if data and index is not None:
s = s.reindex(index)
return s._mgr, s.index

# ----------------------------------------------------------------------

@property
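The refactored constructor in the diff above splits the tri-state `copy` argument into two booleans: `copy_arrays` for ndarray and ExtensionArray input, and `copy` for Series and manager input. A minimal sketch of that split follows. The helper name `resolve_copy_flags` is hypothetical (it is not part of pandas or of this PR) and only mirrors the two assignments shown in the diff.

```python
from __future__ import annotations


def resolve_copy_flags(copy: bool | None) -> tuple[bool, bool]:
    """Hypothetical helper: mirror the two flag assignments from the diff.

    Returns (copy_arrays, copy_series_or_manager).
    """
    # Array-like input is copied unless the caller explicitly passes copy=False.
    copy_arrays = copy is True or copy is None
    # Series and manager input is copied only on an explicit copy=True.
    copy_series_or_manager = copy is True
    return copy_arrays, copy_series_or_manager


# Print the resulting flags for each of the three possible inputs.
for value in (None, True, False):
    print(value, resolve_copy_flags(value))
```

With `copy=None`, array input is flagged for copying while Series and manager input is not; only an explicit `copy=True` sets both flags. In the diff, the array copy is additionally gated on `dtype is None or astype_is_view(...)`, which this sketch leaves out.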
6 changes: 6 additions & 0 deletions pandas/tests/series/test_constructors.py
@@ -1388,6 +1388,12 @@ def test_constructor_dict_nan_key(self, value):
)
tm.assert_series_equal(result, expected)

def test_dict_np_nan_equals_floatnan(self):
d = {np.nan: 1}
result = Series(d, index=[float("nan")])
expected = Series(d)
tm.assert_series_equal(result, expected)

def test_constructor_dict_datetime64_index(self):
# GH 9456

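The test `test_dict_np_nan_equals_floatnan` relies on the behavior noted in the constructor comments: looking up a NaN key in a plain dict with a different NaN object raises `KeyError`. A minimal pure-Python sketch of that behavior follows. It assumes `np.nan` behaves like any other Python float NaN (it is a `float` object), so `float("nan")` stands in for it here.

```python
nan_key = float("nan")    # stands in for np.nan (assumption: an ordinary float NaN)
other_nan = float("nan")  # a second, distinct NaN object

d = {nan_key: 1}

# Lookup with the very same object succeeds, because dict lookup
# checks object identity before falling back to equality.
same_object_value = d[nan_key]

# Lookup with a different NaN object fails: the objects are not identical
# and NaN never compares equal to NaN.
try:
    d[other_nan]
    found = True
except KeyError:
    found = False

print(same_object_value, found)
```

This appears to be why the refactored code passes `list(data.values())` and `data.keys()` to the standard constructor instead of looking values up by key.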