Skip to content

Support multiplication of pd.ArrowDtype(pa.string()) and integral value where integral value is a series #56538

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 19, 2023

Conversation

rohanjain101
Copy link
Contributor

@rohanjain101 rohanjain101 changed the title allow repeat count to be a series Support multiplication of pd.ArrowDtype(pa.string()) and integral value where integral value is a series Dec 17, 2023
@mroeschke mroeschke added Strings String extension data type and string data Arrow pyarrow functionality labels Dec 18, 2023
Comment on lines 734 to 735
pa_integral = pc.if_else(pc.less(integral, 0), 0, integral)
return pc.binary_repeat(binary, pa_integral)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you inline these where needed. I don't think a whole new function is needed for it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke, can do, but there's 2 callers of it, so if we remove the function, the logic for calculating the repeat count to pass to binary_repeat would need to be repeated in both callers. Additionally, I believe the helper function simplifies the code because there's 2 scenarios:

  1. binary is lhs and int is rhs
  2. binary is rhs and int is lhs

When we call binary_repeat, we need to always ensure the binary is the left arg and the integral is the right arg, imo the helper function simplifies the code and makes it a bit more readable. Let me know if you would still prefer logic of _evaluate_binary_repeat to be inlined, or if I misunderstood your suggestion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still prefer to inline this for now. We don't really use staticmethods in the codebase

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, inlined.

@mroeschke mroeschke added this to the 2.2 milestone Dec 19, 2023
@mroeschke mroeschke merged commit 98e1d2f into pandas-dev:main Dec 19, 2023
@mroeschke
Copy link
Member

Thanks @rohanjain101

@rohanjain101 rohanjain101 deleted the arrow_str_repeat_series branch December 19, 2023 23:51
cbpygit pushed a commit to cbpygit/pandas that referenced this pull request Jan 2, 2024
…ue where integral value is a series (pandas-dev#56538)

* allow repeat count to be a series

* fix validation

* gh reference

* fix conditional logic

* Revert "fix conditional logic"

This reverts commit 15f1990.

* remove condition

* inline

---------

Co-authored-by: Rohan Jain <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Arrow pyarrow functionality Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Multiplication of pd.ArrowDtype(pa.string()) and int should support the integral type being a series
2 participants