Skip to content

WIP/PERF: block-wise ops for frame-with-series axis=1 #32997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 47 commits into from

Conversation

jbrockmendel
Copy link
Member

xref #32779 which does the same thing for frame-with-frame ops. This one is turning out to be the most difficult of the cases to get right.

Posting this for comments, in large part because there are still two failing tests locally, and these may just be behaviors we want to change.

# test_scalar_na_logical_ops_corners
s = Series([2, 3, 4, 5, 6, 7, 8, 9, datetime(2005, 1, 1)])
s[::2] = np.nan
d = DataFrame({"A": s})

with pytest.raises(TypeError):
    d.__and__(s, axis="columns")   # <-- when done block-wise, this does not raise

The other one is harder to post an example of because it uses fixtures making it a PITA to copy/paste from the test, but the upshot is that op(frame[mixed-float-dtypes], series[float64]) downcasts to the original dtypes ATM (since each column is effectively operating against a scalar) but when done blockwise they all come back float64.

jbrockmendel and others added 30 commits March 17, 2020 18:22
Co-authored-by: MomIsBestFriend <>
…2737)

* Generate exception from the C code in the proper manner

Get rid of all error printf's and produce proper Python exceptions

* Declare some more exceptions from C code

* Remove special case error message for c parser

* Add whatsnew entry

* Fix missing semicolons

* Add regression test

* Remove special case handling for Windows

PyErr_SetFromErrnoWithFilename works for Unix and Windows

* Remove call to GetLastError(), when using 0, the python error code handles this

* black fixes

* Fix indentation of assert statement (also in previous test, same error)

* Skip the test on windows

* Fix black issue

* Let new_mmap fail without exception to allow fallback

* Do not create a python error in new_mmap to allow the fallback to work silently

* Remove the NULL pointer check for new_rd_source now that it will raise an exception

* Update doc/source/whatsnew/v1.1.0.rst

Co-Authored-By: gfyoung <[email protected]>

Co-authored-by: Jeff Reback <[email protected]>
Co-authored-by: gfyoung <[email protected]>
…dev#32687)

* TST: Parametrize in pandas/tests/internals/test_internals.py

* Addressed lint issues

* Addressing lint issues

Co-authored-by: MomIsBestFriend <>
Comment on lines +9 to +10
def tile(arr: ArrayLike, shape) -> ArrayLike:
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this leftover debug stuff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more or less, will remove

else:
values = values.reshape(-1, 1)

btvalues = np.broadcast_to(values, shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the result btvalues being returned to a user? broadcast_to is typically (always?) a readonly view, and I don't think we want to be returning readonly results.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way its used in this PR, it is not returned to a user, but used as an intermediate object in the arithmetic op

@@ -455,6 +455,12 @@ def ravel(self, *args, **kwargs):
data = self._data.ravel(*args, **kwargs)
return type(self)(data, dtype=self.dtype)

@property
def T(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this used? In an only datetimelike context?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

if isinstance(arr, np.ndarray):
result = btvalues
else:
result = type(arr)._from_factorized(btvalues, arr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was the result of that other discussion not to use _from_factorized outside of pd.factorize? Or was that just _values_for_factorize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That has more or less stalled. joris made a good point that e.g. fletcher doesnt use _values_for_factorize, so ive come around to the opinion that we shouldn't have _values_for_factorize/_from_factorized at all, since they are each only used once in EA.factorize.

Then the issue becomes whether we can use _values_for_argsort, and if we have a constructor that can round-trip those.

Comment on lines +354 to +356
def __getitem__(self, key):
# PeriodArray.__getitem__ returns PeriodArray for 2D lookups,
# so we need to issue deprecation warning and cast here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning to split this into it's own PR? Don't want to have it held up by review here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hadnt planned on it, but might as well. this PR is back-burner until at least the frame-with-frame case is done

@jbrockmendel jbrockmendel added Performance Memory or execution speed performance Numeric Operations Arithmetic, Comparison, and Logical operations labels Apr 1, 2020
@jbrockmendel
Copy link
Member Author

mothballing to clear the queue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.