ENH: Add use_nullable_dtypes for read_html #50286
pandas/tests/io/test_html.py
Outdated
@@ -132,6 +138,64 @@ def test_to_html_compat(self):
        res = self.read_html(out, attrs={"class": "dataframe"}, index_col=0)[0]
        tm.assert_frame_equal(res, df)

    @pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])
Suggested change:
-    @pytest.mark.parametrize("nullable_backend", ["pandas", "pyarrow"])
+    @pytest.mark.parametrize("dtype_backend", ["pandas", "pyarrow"])
pandas/tests/io/test_html.py
Outdated
        out = df.to_html(index=False)
        with pd.option_context("mode.string_storage", storage):
            with pd.option_context("mode.nullable_backend", nullable_backend):
Suggested change:
-            with pd.option_context("mode.nullable_backend", nullable_backend):
+            with pd.option_context("mode.dtype_backend", nullable_backend):
use_nullable_dtypes : bool = False
    Whether to use nullable dtypes as default when reading data. If
    set to True, nullable dtypes are used for all dtypes that have a nullable
    implementation, even if no nulls are present.
Could you add the additional paragraph about `mode.dtype_backend` being available, as the other docstrings have? (It should start with "The nullable dtype implementation".)
Thx, added
Thanks @phofl
Hi folks, I'm not sure this is the right venue for commenting on a patch after the fact, but I just updated my codebase from pandas 1.5.3 to the current version (2.2 at the time of this post) and noticed that 2.0 changed the default na_values by adding "None": https://pandas.pydata.org/docs/whatsnew/v2.0.0.html#:~:text=Added%20%22None%22%20to%20default%20na_values%20in%20read_csv()%20(GH%2050286

Converting "None" to NaN ended up introducing a breaking change in my script: it still ran without runtime errors, but it processed the data differently, which caused errors in the output dataset. I had a CSV file with "None" intentionally present in some columns so that the word would show on a dashboard. The issue only surfaced when that null value reached an np.where() whose condition checked for "None"; the observation then followed an undesired logic path.

I addressed this by copying the default na_values list from pandas 1.5.3 and overriding the one in pandas 2.2 (I'd noticed that a number of new values had appeared in the default list in addition to "None").

I'm not sure I can recommend a better way to introduce a change like this or to communicate it to users, though the change was mentioned pretty far down the release notes. You probably don't want to put FutureWarnings in read_csv() for everyone who uses it, as that would get pretty annoying. At any rate, I wanted to make a note of this, since adding or removing values from the default na_values list can introduce a "soft" breaking change when moving to new pandas versions.

Cheers,
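For anyone hitting the same thing, the override described in the comment above might look roughly like the sketch below. It relies on the `keep_default_na` and `na_values` parameters of `read_csv()`. The `NA_MARKERS` list is a short illustrative subset that I made up for the example, not the exact default list of pandas 1.5.3 or any other version, and `read_csv_keep_none` is a hypothetical helper name.

```python
import io

import pandas as pd

# Illustrative NA markers only (an assumption for this sketch); this is NOT
# the exact default na_values list of any particular pandas version.
NA_MARKERS = ["", "NA", "N/A", "NaN", "nan", "NULL", "null"]


def read_csv_keep_none(source):
    """Hypothetical helper: read a CSV but keep the literal string "None" as data.

    keep_default_na=False disables the built-in NA markers, so only the
    strings listed in NA_MARKERS are converted to missing values.
    """
    return pd.read_csv(source, keep_default_na=False, na_values=NA_MARKERS)


csv_text = "name,status\nalice,None\nbob,NA\ncarol,active\n"
df = read_csv_keep_none(io.StringIO(csv_text))

# "None" survives as a string, so a comparison such as
# np.where(df["status"] == "None", ...) behaves as it did before the upgrade.
print(df)
```

Pinning the list explicitly makes the parsing independent of whatever the defaults are in the installed pandas version, at the cost of having to maintain the list yourself.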