ENH: Add NDFrame.format for easier conversion to string dtype #34941

Closed
wants to merge 3 commits into from

topper-123
Contributor

@topper-123 topper-123 commented Jun 22, 2020

This adds a format method to DataFrame and Series, which is useful for data transformation.

The method makes it easier to do more complex conversions from arbitrary dtypes to a string Series, including combining several columns of a DataFrame into a single string Series. For example, this conversion is now straightforward:

>>> df = pd.DataFrame({
...     'state_name': ['California', 'Texas', 'Florida'],
...     'state_abbreviation': ['CA', 'TX', 'FL'],
...     'population': [39_512_223, 28_995_881, 21_477_737],
...     }, index=[1, 2, 3])
>>> df
   state_name state_abbreviation  population
1  California                 CA    39512223
2       Texas                 TX    28995881
3     Florida                 FL    21477737

>>> df.format("{state_name:<10} ({state_abbreviation}): {population:,}")
1    California (CA): 39,512,223
2    Texas      (TX): 28,995,881
3    Florida    (FL): 21,477,737
dtype: string

I still need to update text.rst, but would like feedback on this first, as it differs somewhat from what was discussed in #17211. For example, that issue only discussed a format method for Series, while this PR also adds one for DataFrame. In #17211 I also floated the idea of allowing Series methods in the format string. I think that is technically quite difficult, so it is not part of this PR.
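
For comparison, the combined-column conversion in the example above can be approximated with existing pandas API. The following is a minimal sketch of a workaround, not the implementation in this PR: it formats each row with `str.format`, passing the column values as keyword arguments, and wraps the result in a `string`-dtype Series.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "state_name": ["California", "Texas", "Florida"],
        "state_abbreviation": ["CA", "TX", "FL"],
        "population": [39_512_223, 28_995_881, 21_477_737],
    },
    index=[1, 2, 3],
)

fmt = "{state_name:<10} ({state_abbreviation}): {population:,}"

# Workaround, not the PR's implementation: each row becomes a dict of
# {column name: value}, which str.format consumes as keyword arguments.
combined = pd.Series(
    [fmt.format(**record) for record in df.to_dict("records")],
    index=df.index,
    dtype="string",
)
print(combined)
```

This loops over rows in Python, so it is likely slower than a dedicated method would be on large frames.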

3 Population: 21,477,737
dtype: string

>>> df.format("{} ({}): {:,}")
Member

Just thinking out loud but would it not be more useful to apply this formatting to the columns returning a like-shaped data frame instead of a series?

Contributor Author

@topper-123 topper-123 Jun 23, 2020

My thought was that users would rarely want to convert the whole frame, but would usually select columns and do something like df.assign(x=df['x'].format(...), y=df['y'].format(...)). Maybe add a combine=True keyword argument that, if set to False, would make the method work on single columns?

@jorisvandenbossche
Member

@topper-123 can you indicate a bit what is different compared to the discussions on the issue?

@topper-123
Contributor Author

@jorisvandenbossche, I've added to the opening comment, so that the rationale is collected in one place.

@jorisvandenbossche
Member

Thanks for the update

Just thinking out loud but would it not be more useful to apply this formatting to the columns returning a like-shaped data frame instead of a series?

Although I like the example you show with combining multiple columns, on second thought I think I agree with @WillAyd that applying the formatting to each column separately might be what users would expect by default.

In any case, I think it can certainly also be useful. For example if you have a numerical DataFrame and want to format all columns with "{:,.2f}".

So a keyword to toggle between both might be a good option (detecting it from the format specification is probably a bit brittle?)
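
To make the per-column behaviour concrete, here is a minimal sketch of formatting every element of a numerical DataFrame with "{:,.2f}" using only existing pandas API. It illustrates the like-shaped result being discussed and is not the proposed method; the frame `numbers` is made-up example data.

```python
import pandas as pd

# Made-up numerical data for illustration
numbers = pd.DataFrame({"a": [1234.5, 2.0], "b": [0.5, 1_000_000.0]})

# Apply the same format spec to every element, column by column.
# The result has the same shape and labels as the input, but holds strings.
formatted = numbers.apply(lambda col: col.map("{:,.2f}".format))
print(formatted)
```

In contrast to the combined-column example, each column here is formatted independently, so the output is a DataFrame rather than a Series.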

@jreback added the API Design and Output-Formatting (__repr__ of pandas objects, to_string) labels on Jun 24, 2020
format string.
name : Label, optional
The name of the returned Series.
positional_only : bool, default False
Contributor

do we actually need this parameter? isn't positional obvious if you don't have field labels?
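
As a rough illustration of what such detection could look like, the standard library's `string.Formatter.parse` reports the field name of each replacement field in a format string. The helper `uses_positional_fields` below is hypothetical and not part of this PR; it is only a sketch of inferring "positional" from the absence of field labels.

```python
from string import Formatter


def uses_positional_fields(fmt: str) -> bool:
    """Return True if every replacement field in fmt is positional ({} or {0}).

    Hypothetical helper for illustration only.
    """
    for _literal, field_name, _spec, _conversion in Formatter().parse(fmt):
        if field_name is None:
            continue  # literal text with no replacement field after it
        # Drop attribute or index access such as "a.b" or "a[0]"
        base = field_name.split(".")[0].split("[")[0]
        if base and not base.isdigit():
            return False  # a named field such as {population}
    return True


print(uses_positional_fields("{} ({}): {:,}"))
print(uses_positional_fields("{state_name:<10} ({state_abbreviation}): {population:,}"))
```

Note that this sketch treats a format string with no fields at all as positional and does not inspect fields nested inside a format spec, which hints at why inferring the mode from the format string could be brittle.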

@simonjayhawkins
Member

@topper-123 can you resolve conflicts and address comments

@WillAyd
Member

WillAyd commented Sep 10, 2020

Closing as I think this is stale but ping @topper-123 if you'd like to pick back up

@WillAyd WillAyd closed this Sep 10, 2020
Labels
API Design Output-Formatting __repr__ of pandas objects, to_string
Development

Successfully merging this pull request may close these issues.

DISCUSSION: Add format parameter to .astype when converting to str dtype
5 participants