ENH: Add NDFrame.format for easier conversion to string dtype #34941
3 Population: 21,477,737
dtype: string

>>> df.format("{} ({}): {:,}")
Just thinking out loud but would it not be more useful to apply this formatting to the columns returning a like-shaped data frame instead of a series?
My thought was that users would rarely want to convert the whole frame; more often they would convert selected columns, like this: df.assign(x=df['x'].format(...), y=df['y'].format(...)). Maybe add a combine=True keyword argument that, if set to False, would work on single columns?
@topper-123, can you explain a bit what is different compared to the discussion on the issue?
@jorisvandenbossche, I've added it to the opening comment, so the rationale is collected in one place.
Thanks for the update.
Although I like the example you show with combining multiple columns, on second thought I think I agree with @WillAyd that applying the formatting to each column separately might be what users would expect by default. In any case, I think it can certainly also be useful. For example, if you have a numerical DataFrame and want to format all columns with … So a keyword to toggle between both might be a good option (detecting it from the format specification is probably a bit brittle?).
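The keyword toggle discussed in this thread can be approximated with APIs pandas already has, which may help compare the two behaviours. This is a minimal sketch, not the PR's implementation: format_frame and its combine parameter are hypothetical names chosen for illustration. With combine=True each row's values fill the placeholders once and a single string Series comes back; with combine=False the format string is applied to every cell and a like-shaped DataFrame of strings comes back.

```python
import pandas as pd


def format_frame(df: pd.DataFrame, fmt: str, combine: bool = True, name=None):
    """Hypothetical stand-in for the proposed DataFrame.format.

    combine=True:  fill the positional fields of ``fmt`` with each row's
                   values and return one string-dtype Series.
    combine=False: apply ``fmt`` to every cell, column by column, and return
                   a like-shaped DataFrame of string dtype.
    """
    if combine:
        # itertuples(name=None) yields plain tuples of Python scalars, one per row.
        values = [fmt.format(*row) for row in df.itertuples(index=False, name=None)]
        return pd.Series(values, index=df.index, dtype="string", name=name)
    # Per-column mode: each cell is formatted on its own.
    return df.apply(lambda col: col.map(fmt.format)).astype("string")


df = pd.DataFrame({"a": [1000, 2000], "b": [1.5, 2.25]})
combined = format_frame(df, "{} and {}")        # one string per row
per_column = format_frame(df, "{:,}", combine=False)  # same shape as df
```

The toggle keeps one entry point for both use cases instead of guessing the mode from the format string, which is the brittleness concern raised above.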
format string.
name : Label, optional
    The name of the returned Series.
positional_only : bool, default False
Do we actually need this parameter? Isn't positional obvious if you don't have field labels?
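On whether a positional_only flag is needed: Python's standard library can already report which kind of replacement fields a format string uses, through string.Formatter().parse, so an implementation could plausibly infer the mode from the string itself. A minimal sketch follows; field_kinds is a hypothetical helper, and it makes no claim about how the PR's positional_only parameter actually behaves. It distinguishes auto-numbered fields ("{}"), explicitly numbered fields ("{0}") and labeled fields ("{name}").

```python
from string import Formatter


def field_kinds(fmt: str) -> set:
    """Hypothetical helper: classify the replacement fields in ``fmt``.

    Returns a subset of {"auto", "positional", "named"}.
    """
    kinds = set()
    # parse() yields (literal_text, field_name, format_spec, conversion) tuples.
    for _literal, field_name, _spec, _conv in Formatter().parse(fmt):
        if field_name is None:  # trailing literal text, no replacement field
            continue
        # Strip attribute/index access such as "{x.real}" or "{x[0]}".
        first = field_name.split(".")[0].split("[")[0]
        if first == "":
            kinds.add("auto")        # "{}" or "{:,}"
        elif first.isdigit():
            kinds.add("positional")  # "{0}", "{1:.2f}"
        else:
            kinds.add("named")       # "{country}", "{pop:,}"
    return kinds


print(field_kinds("{} ({}): {:,}"))
```

A string that mixes kinds (for example "{} {name}") would return more than one label, so an implementation would still need a rule for that case.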
@topper-123, can you resolve conflicts and address comments?
Closing as I think this is stale, but @topper-123, ping if you'd like to pick this back up.
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
This adds a format method to DataFrame and Series. This is useful for data transformation: the method makes it easier to do more complex conversions from arbitrary dtypes to string series, including combining several columns in a DataFrame into one string series. For example, we can now do this conversion quite easily:

I still need to update text.rst, but would like feedback on this first, as this is a bit different from what was discussed in #17211. In that issue we, for example, only discussed a format method for Series, while this PR also adds it for DataFrame. In #17211 I also aired the idea of allowing Series methods in the format string. I think that is technically quite difficult, so it is not part of this PR.
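For comparison, the "combine several columns into one string series" idea can be approximated today with labeled fields that match column names. This is a minimal sketch, not the PR's code: format_rows is a hypothetical helper, the data are made up, and it assumes string column names that are valid format-field names (no dots or brackets).

```python
import pandas as pd


def format_rows(df: pd.DataFrame, fmt: str, name=None) -> pd.Series:
    """Hypothetical helper: fill ``fmt`` once per row, return a string Series.

    Labeled fields such as "{item}" are looked up by column name.
    """
    records = df.to_dict("records")  # one {column: value} dict per row
    values = [fmt.format(**rec) for rec in records]
    return pd.Series(values, index=df.index, dtype="string", name=name)


# Illustrative toy data.
df = pd.DataFrame({"item": ["apple", "pear"], "qty": [1200, 35], "price": [0.5, 1.25]})
labels = format_rows(df, "{item}: {qty:,} @ {price:.2f}", name="label")
print(labels)
```

Using labeled fields keeps the format string readable when many columns are combined, at the cost of a Python-level loop over rows.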