ENH: Expand index types in Series.struct.field method #56065
Comments
cc @tswast
This expands the set of types allowed by Series.struct.field to allow those allowed by pyarrow. Closes pandas-dev#56065
Thanks!
This is the method I was pursuing but it became a little more complex than I wanted to get into at the time. You make a good point that you can't actually recover the original names anyway since intermediate steps might actually contain |
* [ENH]: Expand types allowed in Series.struct.field
  This expands the set of types allowed by Series.struct.field to allow those allowed by pyarrow.
  Closes #56065
* fixed docstring order
* Skip on older pyarrow
* skip if no pyarrow.compute
* flip skip
* fixups
* fixed versions
* doc
* fixup
* fixup
* fixup
* fixup
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
As mentioned in #54977 (comment), pc.struct_field takes more types than we allow in Series.struct.field: https://arrow.apache.org/docs/python/generated/pyarrow.compute.struct_field.html
Feature Description
Allow more types in Series.struct.field, including lists to support nested selection.
Alternative Solutions
Repeated s.struct.field(0).struct.field(1) gives the same answer, but is slower since each intermediate object needs to be created.
Otherwise, not possible without using pyarrow directly.
Additional Context
I've started on a branch implementing this.
The one bit that needs discussion is how to handle the name when doing nested selection. For now, I think we should just return the name of the last field.
You might consider something like ".".join(names), where names is the list of intermediate fields, so my earlier example would have a name like version.minor to indicate that you sliced multiple levels. However, it'd be hard to use that reliably since any intermediate name might have a . in it, so you can't be sure that result.name.split(".") gives the right fields from the original array.
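The naming concern can be shown with plain Python. This is only a sketch: joined_name is a hypothetical helper for the dot-joining idea discussed here, not a pandas function, and the field names are made up.

```python
def joined_name(path):
    """Join intermediate field names with dots (the idea under discussion)."""
    return ".".join(path)


# Two different selections: two nested levels vs. a single field
# whose own name happens to contain a dot.
path_a = ["version", "minor"]
path_b = ["version.minor"]

# Both selections produce the same joined name ...
assert joined_name(path_a) == joined_name(path_b) == "version.minor"

# ... so splitting the name cannot tell them apart.
recovered = joined_name(path_b).split(".")
print(recovered == path_a, recovered == path_b)
```

Because the joined name is ambiguous, returning just the name of the last field avoids implying that the path can be recovered from it.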