Skip to content

tocsv_interval _categories #46562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

tocsv_interval _categories #46562

wants to merge 6 commits into from

Conversation

Kyrpel
Copy link

@Kyrpel Kyrpel commented Mar 29, 2022

@rhshadrach rhshadrach added IO CSV read_csv, to_csv Bug Categorical Categorical Data Type labels Apr 22, 2022
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Comment on lines +2284 to +2289
elif isinstance(values, (PeriodArray, IntervalArray)):
# GH46297
values = np.array(values, dtype="object")
mask = isna(values)
values[mask] = na_rep
return values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very similar to the EA block below (L2321); can they be combined?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyrpel can you do this one

@@ -110,6 +111,9 @@ def take_nd(
return arr.take(
indexer, fill_value=fill_value, allow_fill=allow_fill, axis=axis
)
if arr.dtype.kind in "O" and is_interval_dtype(arr):
# # GH46297 Interval
return arr.take(indexer, allow_fill=allow_fill)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fill_value here comes from the na_rep argument of to_csv; is ignoring it the right behavior? Does that mean whatever the user specifies will be ignored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, this seems like a weird place to special-case

@simonjayhawkins simonjayhawkins added the Regression Functionality that used to work in a prior pandas version label May 16, 2022
@simonjayhawkins simonjayhawkins added this to the 1.4.3 milestone May 16, 2022
@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR. does this also fix the related issue #46812?

can you add a release note doc/source/whatsnew/v1.4.3.rst

Comment on lines +2284 to +2289
elif isinstance(values, (PeriodArray, IntervalArray)):
# GH46297
values = np.array(values, dtype="object")
mask = isna(values)
values[mask] = na_rep
return values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kyrpel can you do this one

@@ -110,6 +111,9 @@ def take_nd(
return arr.take(
indexer, fill_value=fill_value, allow_fill=allow_fill, axis=axis
)
if arr.dtype.kind in "O" and is_interval_dtype(arr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

elif

@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR.

This fix is currently adding some more code where the regression was caused by a code addition in #44930.

if isinstance(values, Categorical):
# GH#40754 Convert categorical datetimes to datetime array
values = algos.take_nd(
values.categories._values,
ensure_platform_int(values._codes),
fill_value=na_rep,
)

whereas before #44930 the conversion was done in ...

elif isinstance(values, ExtensionArray):
mask = isna(values)
new_values = np.asarray(values.astype(object))
new_values[mask] = na_rep
return new_values

Now removing the code addition in #44930 also fixes this issue.

Now since the astype(object) for categorical containing Timestamps was also fixed in #18024, i'm not sure why the special casing for Categorical was added to to_native_types.

Going back to this issue. repr already still works and according to some code comments, to_csv/repr are supposed to be sharing the same to_native_types code.

we also have the related issue #46812, and I may need to update this observation once I have looked at that one. I suspect that both issues should probably be fixed togther.

@simonjayhawkins
Copy link
Member

Now since the astype(object) for categorical containing Timestamps was also fixed in #18024, i'm not sure why the special casing for Categorical was added to to_native_types.

should have been containing Timestamps was also fixed in #18024 -> containing Timestamps #18024 was also fixed in #44930

cc @jbrockmendel @phofl (participants in #44930)

@simonjayhawkins simonjayhawkins mentioned this pull request Jun 8, 2022
@simonjayhawkins
Copy link
Member

Thanks @Kyrpel for the PR. closing as superseded by #47347.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Categorical Categorical Data Type IO CSV read_csv, to_csv Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Error writing DataFrame with categorical type column and interval data to a CSV file
5 participants