Skip to content

PDEP6 implementation pt 1: block.setitem, block.putmask #50626

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 106 commits into from
Jul 27, 2023

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Jan 7, 2023

This is pt 1. More changes would be needed to fully complete PDEP6 (eablock.setitem and eablock.putmask for a start), but I'd suggest that we do this in stages. This PR is already pretty big

some of this can be factored out, I'll do that first

pt2: #53405

@@ -206,7 +206,20 @@ def test_convert_dtypes(
# Test that it is a copy
copy = series.copy(deep=True)

result[result.notna()] = np.nan
if result.notna().sum() > 0 and result.dtype in [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesnt have to be for this PR, but long-term should we also warn/raise in the no-op case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as in, warn if

ser[empty_mask] = incompatible_value

?

I wouldn't think that would be necessary, but we can discuss it

@jbrockmendel
Copy link
Member

9/21 files viewed! Getting there...

# note: commented-out in the EA case too
# FIXME: don't leave commented-out
# with tm.assert_produces_warning(FutureWarning, match=msg):
# ser[:] = invalid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesnt match the commented-out code in the masked tests. any chance it can be un-commented?

or is_complex_dtype(df.dtypes.iloc[0])
or is_object_dtype(df.dtypes.iloc[0])
or is_datetime64_dtype(df.dtypes.iloc[0]) # TODO this one should warn
or is_timedelta64_dtype(df.dtypes.iloc[0]) # TODO this one should warn
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these TODOs things you want to do for this PR or follow-ups?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one addresses them: #53405

@@ -1851,3 +1870,54 @@ def test_setitem_dict_and_set_disallowed_multiindex(self, key):
)
with pytest.raises(TypeError, match="as an indexer is not supported"):
df.loc[key] = 1


class TestSetitemValidation:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a viable way to share this with the analogous copy/pasted Series tests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe, but I can't think of anything that wouldn't overcomplicate it - any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not worth worrying about

@@ -189,6 +189,8 @@ def _check_set(df, cond, check_dtypes=True):
with pytest.raises(TypeError, match=msg):
df > 0
return
if df is mixed_int_frame:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this be reached?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup: pandas/tests/frame/indexing/test_where.py::TestDataFrameIndexingWhere::test_where_set[mixed_int]

# GH#38303
ser = Series([0, 0])
ser.loc[0] = Series([42], index=[index])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelatedly, this looks sketchy since the RHS is a sequence and the LHS is a scalar

@jbrockmendel
Copy link
Member

small comments, otherwise LGTM

f"Setting an item of incompatible dtype is deprecated "
"and will raise in a future error of pandas. "
f"Value '{other}' has dtype incompatible with {self.dtype}, "
"please explicitly cast to a compatible dtype first.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud. It would be difficult to suggest which dtype to cast? I guess object would be safest though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in practice i think it will almost always be x->object or int->float

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe - but I'd suggest doing this separately if it's OK, this has already been open since January 😄 reckon we can get this in for 2.1? It should be pretty quick to rebase #53405, can do that right after this one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this would be fine as a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me

@mroeschke
Copy link
Member

Looks pretty good to me too.

Should this have a whatsnew note for 2.1? And is it okay if not all of PDEP6 changes don't make it into 2.1?

@MarcoGorelli
Copy link
Member Author

Thanks!

I think ideally this and #53405 should go together, either both in 2.1 or both in 2.2

Whatsnew - #54014 introduces a section for notable deprecations, shall I add it to that once that's merged?
Or I can add it as part of #53405

@mroeschke
Copy link
Member

Probably best if the whatsnew is added in #53405 then. Is that PR pretty close?

@MarcoGorelli
Copy link
Member Author

yeah it should be ready, I just need to rebase it after this one - but it doesn't look like there's too many conflicts to resolve fortunately

@mroeschke mroeschke added this to the 2.1 milestone Jul 27, 2023
@mroeschke mroeschke merged commit b8e14ec into pandas-dev:main Jul 27, 2023
@mroeschke
Copy link
Member

Thanks @MarcoGorelli

@jbrockmendel
Copy link
Member

nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants