BUG: CategoricalIndex.format #35440

Merged: 5 commits, Aug 3, 2020

topper-123
Contributor

@topper-123 commented Jul 28, 2020

I've temporarily put the whatsnew entry in the v1.1.0 release notes, because there isn't a v1.1.1 file yet. I'll move it before this is merged.

@simonjayhawkins
Member

Thanks @topper-123 for the PR.

The regression was caused by #35118. Categorical types other than object were also affected, so the test may need to be parametrised with other values for cols:

>>> pd.__version__
'1.2.0.dev0+10.g3b1d4f1ee'
>>>
>>> data = [[4, 2], [3, 2], [4, 3]]
>>> cols = [1, None]
>>> res = pd.DataFrame(data, columns=cols)
>>> print(res)
   1  NaN
0  4    2
1  3    2
2  4    3
>>>
>>> res = pd.DataFrame(data, columns=pd.CategoricalIndex(cols))
>>> print(res)
   1    NaN
0    4    2
1    3    2
2    4    3
>>>
>>> pd.__version__
'1.0.5'
>>>
>>> data = [[4, 2], [3, 2], [4, 3]]
>>> cols = [1, None]
>>> res = pd.DataFrame(data, columns=cols)
>>> print(res)
   1  NaN
0  4    2
1  3    2
2  4    3
>>>
>>> res = pd.DataFrame(data, columns=pd.CategoricalIndex(cols))
>>> print(res)
   1.0  NaN
0    4    2
1    3    2
2    4    3
>>>
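
The parametrisation suggested above could be sketched roughly as follows. This is only an illustration: `headers_match` is a hypothetical helper, not part of the pandas test suite, and it assumes a pandas version that already contains the fix. The `None` label is deliberately left out, because the transcripts above show the label itself rendering differently (`1` vs `1.0`) even on 1.0.5.

```python
import pandas as pd

# Same data as in the transcripts above.
DATA = [[4, 2], [3, 2], [4, 3]]


def headers_match(cols):
    """Hypothetical helper: compare to_string() for plain and categorical columns.

    Returns True when the rendered frame is identical whether the column
    labels are held in a regular Index or in a CategoricalIndex.
    """
    plain = pd.DataFrame(DATA, columns=pd.Index(cols))
    categorical = pd.DataFrame(DATA, columns=pd.CategoricalIndex(cols))
    return plain.to_string() == categorical.to_string()


# Try column labels of more than one dtype (object and integer).
for cols in (["a", "b"], [1, 2]):
    print(cols, headers_match(cols))
```

In a pytest suite the loop would more naturally become a `pytest.mark.parametrize` over `cols`, with the comparison turned into an assertion.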

> I've temporarily put the whatsnew entry in the v1.1.0 release notes, because there isn't a v1.1.1 file yet.

doc\source\whatsnew\v1.1.1.rst is now merged to master.

@simonjayhawkins added the Categorical (Categorical Data Type) and Output-Formatting (__repr__ of pandas objects, to_string) labels Jul 30, 2020
@simonjayhawkins added this to the 1.1.1 milestone Jul 30, 2020
@topper-123 force-pushed the categorical_df_to_string branch from eb920f9 to cd8a9ee August 2, 2020 12:47
@@ -197,9 +197,6 @@ def _format_data(self, name=None):
         # we are formatting thru the attributes
         return None
 
-    def _format_with_header(self, header, na_rep="NaN") -> List[str]:
-        return header + [pprint_thing(x) for x in self._range]

Contributor Author

@topper-123 Aug 2, 2020

The added tests revealed that this method in master made the output of RangeIndex.format differ from that of Int64Index.format:

>>> pd.RangeIndex(0, 18, 2).format()
['0', '2', '4', '6', '8', '10', '12', '14', '16']
>>> pd.Int64Index(range(0, 18, 2)).format()
['0 ', '2 ', '4 ', '6 ', '8 ', '10', '12', '14', '16']

Notice the extra space for one-digit scalars in the Int64Index case. The outputs from the two methods are identical after merging this PR.
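
As a rough pure-Python illustration of the difference (not the pandas implementation; both helper names are made up for this sketch), rendering each label on its own gives the first output above, while padding every label to the width of the widest one gives the second:

```python
def format_plain(values):
    """Render each label independently, with no padding."""
    return [str(v) for v in values]


def format_left_justified(values):
    """Render labels, then pad each on the right to the widest label."""
    labels = [str(v) for v in values]
    width = max((len(label) for label in labels), default=0)
    return [label.ljust(width) for label in labels]


values = range(0, 18, 2)
print(format_plain(values))
# ['0', '2', '4', '6', '8', '10', '12', '14', '16']
print(format_left_justified(values))
# ['0 ', '2 ', '4 ', '6 ', '8 ', '10', '12', '14', '16']
```

An override that skips the padding step is enough to make two otherwise equivalent indexes format differently, which is the kind of discrepancy described in the comment above.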

@topper-123
Contributor Author

Updated.

@jreback merged commit cda8284 into pandas-dev:master Aug 3, 2020
@jreback
Contributor

jreback commented Aug 3, 2020

Thanks @topper-123, very nice!

@topper-123 deleted the categorical_df_to_string branch August 4, 2020 06:07
@simonjayhawkins
Member

@meeseeksdev backport to 1.1.x

Successfully merging this pull request may close these issues.

BUG: output of df.to_string depends on whether columns is a CategoricalIndex or not