-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
WIP/PERF: block-wise ops for frame-with-series axis=1 #32997
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: MomIsBestFriend <>
Co-authored-by: MomIsBestFriend <>
…2737) * Generate exception from the C code in the proper manner Get rid of all error printf's and produce proper Python exceptions * Declare some more exceptions from C code * Remove special case error message for c parser * Add whatsnew entry * Fix missing semicolons * Add regression test * Remove special case handling for Windows PyErr_SetFromErrnoWithFilename works for Unix and Windows * Remove call to GetLastError(), when using 0, the python error code handles this * black fixes * Fix indentation of assert statement (also in previous test, same error) * Skip the test on windows * Fix black issue * Let new_mmap fail without exception to allow fallback * Do not create a python error in new_mmap to allow the fallback to work silently * Remove the NULL pointer check for new_rd_source now that it will raise an exception * Update doc/source/whatsnew/v1.1.0.rst Co-Authored-By: gfyoung <[email protected]> Co-authored-by: Jeff Reback <[email protected]> Co-authored-by: gfyoung <[email protected]>
…dev#32687) * TST: Parametrize in pandas/tests/internals/test_internals.py * Addressed lint issues * Addressing lint issues Co-authored-by: MomIsBestFriend <>
…acter in the column name (pandas-dev#32701)
def tile(arr: ArrayLike, shape) -> ArrayLike: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this leftover debug stuff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more or less, will remove
else: | ||
values = values.reshape(-1, 1) | ||
|
||
btvalues = np.broadcast_to(values, shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the result btvalues
being returned to a user? broadcast_to
is typically (always?) a readonly view, and I don't think we want to be returning readonly results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way its used in this PR, it is not returned to a user, but used as an intermediate object in the arithmetic op
@@ -455,6 +455,12 @@ def ravel(self, *args, **kwargs): | |||
data = self._data.ravel(*args, **kwargs) | |||
return type(self)(data, dtype=self.dtype) | |||
|
|||
@property | |||
def T(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used? In an only datetimelike context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
if isinstance(arr, np.ndarray): | ||
result = btvalues | ||
else: | ||
result = type(arr)._from_factorized(btvalues, arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the result of that other discussion not to use _from_factorized
outside of pd.factorize
? Or was that just _values_for_factorize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That has more or less stalled. joris made a good point that e.g. fletcher doesnt use _values_for_factorize, so ive come around to the opinion that we shouldn't have _values_for_factorize/_from_factorized at all, since they are each only used once in EA.factorize.
Then the issue becomes whether we can use _values_for_argsort, and if we have a constructor that can round-trip those.
def __getitem__(self, key): | ||
# PeriodArray.__getitem__ returns PeriodArray for 2D lookups, | ||
# so we need to issue deprecation warning and cast here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you planning to split this into it's own PR? Don't want to have it held up by review here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hadnt planned on it, but might as well. this PR is back-burner until at least the frame-with-frame case is done
mothballing to clear the queue |
xref #32779 which does the same thing for frame-with-frame ops. This one is turning out to be the most difficult of the cases to get right.
Posting this for comments, in large part because there are still two failing tests locally, and these may just be behaviors we want to change.
The other one is harder to post an example of because it uses fixtures making it a PITA to copy/paste from the test, but the upshot is that
op(frame[mixed-float-dtypes], series[float64])
downcasts to the original dtypes ATM (since each column is effectively operating against a scalar) but when done blockwise they all come back float64.