-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: allow opt-in to inferring pyarrow strings #54430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{"a": ["a", "b"]}, dtype="object", columns=Index(["a"], dtype=dtype) | ||
) | ||
with pd.option_context("future.infer_string", True): | ||
df = DataFrame({"a": ["a", "b"]}, dtype="object") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a case with null/nan/None in it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, needed a fix...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good. Have we considered naming the option infer_pyarrow_string
instead of infer_string
?
y,2""" | ||
parser = all_parsers | ||
if parser.engine == "pyarrow": | ||
pytest.skip("TODO: Follow up") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we ever discuss pyarrow backed engines returning pyarrow types by default? I think from a user perspective it is less than ideal to have to specify both this option and the dtype backend for any pyarrow backed IO methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move this discussion to Basel, but there is an issue about this: #51846
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you link the GH issue in this skip message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#54431 already addresses this
cc @jbrockmendel thoughts about the option name? |
I think future.infer_string is reasonable |
Co-authored-by: Matthew Roeschke <[email protected]>
import pyarrow as pa | ||
|
||
pa_dtype = pa.string() | ||
dtype = ArrowDtype(pa_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there any tests that hit this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use some documentation in the whatsnew otherwise looks good
I’ll add this With the general arrow announcement |
Thanks @phofl |
* ENH: allow opt-in to inferring pyarrow strings * Remove comments and add tests * Add json tests * Update * Update pandas/_libs/lib.pyx Co-authored-by: Matthew Roeschke <[email protected]> * Update * Add test --------- Co-authored-by: Brock <[email protected]> Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This is a simple opt-in for now without warning by default. Can adjust this for 2.2
cc @jbrockmendel