Skip to content

REF: use check_setitem_lengths in DTA.__setitem__ #36339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jbrockmendel
Copy link
Member

Turning this into a two-line check will make it easy to add to PandasArray.__setitem__ and Categorical.__setitem__, at which point we'll be able to move these methods up to NDArrayBackedExtensionArray.__setitem__

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Only a comment regarding the todo

"different length than the value"
)
else:
# TODO: dont we still need lengths to match?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes?
But can you do that here, since the original code (I think?) did it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the existing code does not check for length-match in the len==0 case. ill take a look at adding that check and getting rid of this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It actually is checked right now, because of the len(key) != len(value) before checking elif not len(key): (but it's clearly not tested ..)

As example:

In [1]: arr = pd.date_range("2012", periods=3)._data                                                                                                                                                               

In [2]: arr                                                                                                                                                                                                        
Out[2]: 
<DatetimeArray>
['2012-01-01 00:00:00', '2012-01-02 00:00:00', '2012-01-03 00:00:00']
Length: 3, dtype: datetime64[ns]

In [3]: arr[[]] = [pd.Timestamp("2012")]                                                                                                                                                                           
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-d4b75e105bf5> in <module>
----> 1 arr[[]] = [pd.Timestamp("2012")]

~/scipy/pandas/pandas/core/arrays/datetimelike.py in __setitem__(self, key, value)
    620                         f"'{len(value)}'."
    621                     )
--> 622                     raise ValueError(msg)
    623                 elif not len(key):
    624                     return

ValueError: shape mismatch: value array of length '0' does not match indexing result of length '1'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch; i was comparing against the existing check_setitem_lengths

@jreback jreback added the Indexing Related to indexing on series/frames, not to indexes themselves label Sep 13, 2020
if len(value) != length_of_indexer(indexer, values):
raise ValueError(
"cannot set using a slice indexer with a "
"different length than the value"
)
if len(value) == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use not len(value) (and same as above)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I find if len(value) == 0: more readable / clearer about the intent ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not idiomatic python (though i tend to agree), either way as long as we are consistent, and unfortunately this is not consistent with the rest of the code base

@jorisvandenbossche
Copy link
Member

@jbrockmendel can you add a test for the setitem case with zero-length indexer and non-zero length value (as it is clearly untested ..)

@jbrockmendel
Copy link
Member Author

updated per suggestions

@jorisvandenbossche
Copy link
Member

Thanks!

@jorisvandenbossche jorisvandenbossche merged commit 8df0218 into pandas-dev:master Sep 14, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Sep 14, 2020
@jbrockmendel jbrockmendel deleted the check-empty-setitem branch September 14, 2020 15:58
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants