Skip to content

BUG: List indexer on PeriodIndex doesn't coerce strings #30515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 3, 2020

Conversation

proost
Copy link
Contributor

@proost proost commented Dec 27, 2019

Description:
In single object indexer case,
datetime string converted to datetime-like object. Because indexing use ".xs" which is ".get_loc" that has type casting labels code.

But in list indexer,
there is no type check and casting code. So i add them.

this bug happen not only series but also dataframe And "datetimeindex" as well as "periodIndex"

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are specific conversions for index types
this is adding a large special case

pls find a more existing code friendly way

indexing code is already very complex
we need to reduce that not add to it

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type labels Dec 27, 2019
@proost proost force-pushed the enh-loc-datetime-list branch 6 times, most recently from 0c9ee8c to 2cbd26b Compare January 3, 2020 17:34
@pep8speaks
Copy link

pep8speaks commented Jan 3, 2020

Hello @proost! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-05-03 15:17:33 UTC

@proost proost force-pushed the enh-loc-datetime-list branch from 2cbd26b to 5e86611 Compare January 3, 2020 17:38
@proost proost force-pushed the enh-loc-datetime-list branch from a414a4f to fa51eb7 Compare January 14, 2020 14:44
@@ -1006,6 +1006,16 @@ def indexer_between_time(

return mask.nonzero()[0]

def _convert_arr_indexer(self, keyarr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where exactly are these hit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these function call from "._convert_listlike_indexer" and get returned value from "common.asarray_tuple"

@@ -839,6 +839,16 @@ def memory_usage(self, deep=False):
result += self._int64index.memory_usage(deep=deep)
return result

def _convert_arr_indexer(self, keyarr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would these not share the impl from the superclass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first, I Implemented on the base But,

would it be simpler to override DatetimeIndex._convert_arr_indexer and PeriodIndex._convert_arr_indexer?

So I changed. Okay I change again.

@jreback
Copy link
Contributor

jreback commented Jan 26, 2020

@jbrockmendel if you'd have a look

@proost proost force-pushed the enh-loc-datetime-list branch 3 times, most recently from 2805c8d to 8b9d712 Compare January 28, 2020 14:55
if isinstance(values, list) and (
is_datetime64_any_dtype(dtype) or is_period_dtype(dtype)
):
return array(values, dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does timedelta64 ever go through here? could just check needs_i8_conversion(dtype)

What about other EA dtypes? isnt this kind of "anything that we dont want to pass to np.asarray below"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
Okay. I change code

proost added a commit to proost/pandas that referenced this pull request Feb 19, 2020
@proost proost force-pushed the enh-loc-datetime-list branch from 8b9d712 to 91f70fc Compare February 19, 2020 16:39
@@ -225,6 +225,10 @@ def asarray_tuplesafe(values, dtype=None):
if isinstance(values, list) and dtype in [np.object_, object]:
return construct_1d_object_array_from_listlike(values)

if isinstance(values, list) and hasattr(values, "__array__"):
# avoid converting extension array to numpy array
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we're looking at some custom list subclass?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
"asarray_tuplesafe" and "_convert_arr_indexer" both return "array-like" object. before, "asarray_tuplesafe" and "_convert_arr_indexer" return numpy array type. But i think, both can return EA type also

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't you use use is_list_like?

@proost proost force-pushed the enh-loc-datetime-list branch from cb53b55 to fa76233 Compare April 1, 2020 07:52
proost added a commit to proost/pandas that referenced this pull request Apr 12, 2020
@proost proost force-pushed the enh-loc-datetime-list branch from fa76233 to 3ae8f48 Compare April 12, 2020 16:36
proost added a commit to proost/pandas that referenced this pull request Apr 14, 2020
@proost proost force-pushed the enh-loc-datetime-list branch from 3ae8f48 to b138f94 Compare April 14, 2020 03:26
proost added a commit to proost/pandas that referenced this pull request Apr 14, 2020
@proost proost force-pushed the enh-loc-datetime-list branch from b138f94 to dbb0cdd Compare April 14, 2020 05:02
@jreback jreback added this to the 1.1 milestone Apr 19, 2020
@jreback
Copy link
Contributor

jreback commented Apr 19, 2020

@jbrockmendel over to you if any comments.

if lib.infer_dtype(keyarr) == "string":
# Weak reasoning that indexer is a list of strings
# representing datetime or timedelta or period
extension_arr = pd_array(keyarr, self.dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have strings that cant be parsed to self.dtype this raises DateParseError (at least in the Period[D] case). should we catch and re-raise as KeyError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
Okay, It makes sense. I add exception handling and test code

proost added a commit to proost/pandas that referenced this pull request Apr 20, 2020
@proost proost force-pushed the enh-loc-datetime-list branch from f9f246b to 3efb005 Compare April 20, 2020 11:09
proost added a commit to proost/pandas that referenced this pull request Apr 20, 2020
@proost proost force-pushed the enh-loc-datetime-list branch 2 times, most recently from ac86f36 to c143ddb Compare April 20, 2020 14:50
@jreback
Copy link
Contributor

jreback commented May 2, 2020

looks find. can you merge master and ping o green.

@proost proost force-pushed the enh-loc-datetime-list branch from 7e57054 to 9f2db9a Compare May 3, 2020 15:17
@proost
Copy link
Contributor Author

proost commented May 3, 2020

ping @jreback

@jreback jreback merged commit a3477c7 into pandas-dev:master May 3, 2020
@jreback
Copy link
Contributor

jreback commented May 3, 2020

thanks @proost for sticking with this

@proost proost deleted the enh-loc-datetime-list branch May 3, 2020 16:49
sthagen added a commit to sthagen/pandas-dev-pandas that referenced this pull request May 3, 2020
BUG: List indexer on PeriodIndex doesn't coerce strings (pandas-dev#30515) (#30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

List indexer on PeriodIndex doesn't coerce strings
5 participants