-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: Make Series._values match Index._values #31182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLN: Make Series._values match Index._values #31182
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
pandas/core/construction.py
Outdated
obj = obj.array | ||
arr = obj._values | ||
if not extract_numpy and isinstance(arr, np.ndarray): | ||
return obj.array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was also just thinking while looking above at the .array
implementation, that we could do the same here instead of going through the ".array -> wrap in PandasArray -> extract the numpy array again" route, that will further reduce some overhead of extract_array(..., extract_numpy=True)
.
Could also do a arr = PandasArray(arr)
here for being explicit (it's not that it duplicates a lot from .array
)
rebased, this now removes array_values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You will need to update the Series._values
docstring I wrote.
Also, can you show similar timings as you showed on the other PR but now compared to latest master?
@@ -1249,6 +1258,9 @@ def unique(self): | |||
if hasattr(values, "unique"): | |||
|
|||
result = values.unique() | |||
if self.dtype.kind in ["m", "M"]: | |||
if getattr(self.dtype, "tz", None) is None: | |||
result = np.asarray(result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment here on why this is needed
Updated timings vs current master are a mixed bag:
|
Do we understand where the slowdown is coming from? From looking at the code, from the extra |
It looks like restoring array_values and Series.array does the trick. With those two changes, all of the timings become slightly better than master, with the exception of |
Restored array_values to avoid a perf hit; im ambivalent on if its worthwhile.
|
Ah, I suppose the use of |
It might be that now you added to use |
OK, updated one more time, this time just changing DatetimeBlock/TimedeltaBlock.internal_values to return We can worry about optimizing array/extract_array further separately, as that should really be orthogonal to this. |
@@ -515,8 +515,9 @@ def _values(self): | |||
----------- | ------------- | ------------- | ------------- | --------------- | | |||
Numeric | ndarray | ndarray | PandasArray | ndarray | | |||
Category | Categorical | Categorical | Categorical | ndarray[int] | | |||
dt64[ns] | ndarray[M8ns] | ndarray[M8ns] | DatetimeArray | ndarray[M8ns] | | |||
dt64[ns] | ndarray[M8ns] | DatetimeArray | DatetimeArray | ndarray[M8ns] | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit above (beginning of the docstring), the sentence "This are the values as stored in the Block" is no longer adequate I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sentence I am quoting (the second line of the docstring) still needs to be updated
|
lgtm needs a rebase |
Thanks! |
Discussed in #31037