custom formatters for to_csv #4668

cpcloud · 2013-08-25T05:14:47Z

date formatting: Date formatting option in to_csv? #2583, PR ENH: to_csv() date formatting #4313, to_csv date format flag not behaving as expected with multiindex #6797
string spacing (justification?): BUG: DataFrame.to_string() creates extra space for string dtypes #4158
float format: Float format syntax #2502, Floating point precision in DataFrame.to_csv #2069
int format: int_format in display options #6502
timedelta format smaller than microsecond: timedelta64 Series are not saved correctly with to_csv #6783

something like

df.to_csv(format='%10.4f', sep=' ')

nehalecky · 2013-08-25T19:05:11Z

Most legacy Fortran 77-based simulators (many of which are still actively used in various scientific communities), such as TOUGH2 (which I've had the pleasure of working with extensively), often have data input subroutines with hardcoded field widths. This makes life interesting, and often as much time is spent on pre-processing as on post-processing. Scientists I worked with had folders of rigid bash scripts and even Fortran routines for creating these 'input decks'; oh, the scientist-hours lost. 😢

Input decks can contain information such as mesh geometry and physical properties, system initial conditions (i.e., the thermodynamic state of each element and the transport states between them), and simulator operation parameters. While pandas is most often used to analyze measured or resultant data, I could certainly envision using it to manipulate input files (indeed, I wrote an entire Perl library to do this before discovering Python and pandas). To manage such input decks, fixed-width output is fundamental.

Clearly there are much better ways to interact with Fortran libraries (f2py / numpy), but I can tell you that (some) scientists are simply interested in getting a simulation up and running. If pandas already has them hooked for data analysis, there could be a large benefit from such functionality. If I had had access to such a tool, my graduate student life would have been a whole lot more social 😉.

jreback · 2013-08-26T01:16:00Z

@nehalecky so you want to_csv to have a fixed width mode?

what kind of an API would you see here?

we have been toying with the idea of passing a style parameter to these output routines which could be a class (pandas would provide a base class) that could be overridden for really custom behavior

but easy to see a FixedWidthWriter

or maybe overkill and just need something straightforward?
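For reference, the "something straightforward" option could look roughly like the sketch below. It is not a pandas API: `to_fixed_width` and its `specs` argument are hypothetical names, and the column names and values are made up for illustration. It relies only on Python's built-in `format()` and `DataFrame.itertuples`.

```python
import pandas as pd


def to_fixed_width(df, specs):
    """Render each row of df as one fixed-width text line.

    specs maps column name -> Python format spec (e.g. '>10.4f').
    Hypothetical helper, not part of pandas.
    """
    lines = []
    for row in df.itertuples(index=False, name=None):
        # Format every cell with its column's spec and concatenate with no separator.
        cells = (format(value, specs[col]) for col, value in zip(df.columns, row))
        lines.append("".join(cells))
    return "\n".join(lines)


# Illustrative data only.
df = pd.DataFrame({
    "elem": ["A1", "A2"],
    "pressure": [101325.0, 2.5e5],
    "temp": [20.0, 35.25],
})
text = to_fixed_width(df, {"elem": "<5s", "pressure": ">10.1f", "temp": ">10.4f"})
print(text)
```

Every line comes out 25 characters wide here (5 + 10 + 10), which is the property a hardcoded Fortran reader would depend on. Values wider than their field are not truncated in this sketch.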

hayd · 2013-08-26T11:47:52Z

Perhaps should be float_format to match with options.display.... actually atm that requires a formatter (e.g. '{:10.4f}'.format), maybe should also accept strings like '%10.4f'...
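A minimal sketch of accepting both spellings: a callable such as '{:10.4f}'.format, or a %-style string such as '%10.4f'. The helper name `as_formatter` is hypothetical and only illustrates the normalization step, not how pandas implements it.

```python
def as_formatter(fmt):
    """Return a one-argument callable from either a callable or a '%'-style string.

    Hypothetical helper for illustration only.
    """
    if callable(fmt):
        return fmt
    if isinstance(fmt, str):
        # Old-style interpolation: '%10.4f' % value
        return lambda value: fmt % value
    raise TypeError("fmt must be a callable or a %-style format string")


new_style = as_formatter("{:10.4f}".format)
old_style = as_formatter("%10.4f")

print(repr(new_style(3.14159)))
print(repr(old_style(3.14159)))
```

Both calls produce the same 10-character string, so downstream code only ever has to deal with a callable.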

patricktokeeffe · 2013-10-30T17:44:41Z

I think per-column functionality should be added to this list, similar to how read_csv's dtype and na_values accept per-column parameters as a dict.

That would allow users, for example, to apply a different float format to the timestamp than the data columns. Or change the time formatting to military format. (Date formatting was touched in PR #4313 but not time IIUC)

jreback · 2013-10-30T18:25:27Z

this is really just waiting on a nice API that either does what you are suggesting / templates or both

and of course someone to work on this....

it would not be hard to extend float_format/date_format to accept a dict of columns to format

e.g.

date_format={'A' : '%Y%m%d', 'B' : '%y'}
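Until something like that exists, a similar effect can be had by pre-formatting the columns before calling to_csv. The sketch below assumes the dict-of-formats shape from the example above; the data is made up, and the loop uses only `Series.dt.strftime` and `DataFrame.to_csv`.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "A": [pd.Timestamp("2013-01-01"), pd.Timestamp("2013-06-15")],
    "B": [pd.Timestamp("2013-01-01"), pd.Timestamp("2014-02-03")],
})

# Per-column strftime patterns, in the shape suggested above.
date_format = {"A": "%Y%m%d", "B": "%y"}

formatted = df.copy()
for col, fmt in date_format.items():
    # Convert each datetime column to strings with its own pattern.
    formatted[col] = formatted[col].dt.strftime(fmt)

csv_text = formatted.to_csv(index=False)
print(csv_text)
```

The cost is that the formatted columns become plain strings, so this is best done on a copy right before writing.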

nehalecky · 2014-02-15T00:43:47Z

A per column template, as suggested by @jreback I think would be grand. For large/complex column arrangements, you could use a series beforehand to prescribe slices across certain columns and generate a dict. :)

cancan101 · 2014-02-16T22:22:22Z

Currently the date_format argument is a little unclear as to what it does when the value being formatted is a "date" (datetime w/o a time) as opposed to a "datetime" (datetime w/ a time). At present, it treats these alike and uses the same formatter for each. This is different from how a DatetimeIndex is formatted to CSV. In that case, the formatting code detects if all of the values in the index do not contain times in which cases it only formats the date component. See:

import datetime
from io import StringIO

import pandas as pd

df = pd.DataFrame({'a': [datetime.datetime(2013, 1, 1)]}, index=pd.to_datetime([datetime.datetime(2013, 1, 1)]))
io = StringIO()
df.to_csv(io)

In [12]: print(io.getvalue())
,a
2013-01-01,2013-01-01 00:00:00

I would suggest having some way to format datetimes without a time differently from datetimes with a time.

hayd · 2014-03-01T02:33:49Z

@cancan101 could have a flag to drop the minutes / seconds if 00:00:00 (not sure on a good name). Could do with an example of date_format in the docs, think it would make it clearer (or use the default).

Should these be in options.io ?

cancan101 · 2014-03-03T05:36:44Z

@hayd I assume you mean drop the hours, minutes, and seconds (ie the time component of the datetime)?

I think the option and its name depend on how it will work: should it be an "intelligent" format that only prints the time component if needed (i.e. if any of the datetime values have a time != midnight, see #5701) or should it work as a truncate where datetimes are truncated to just dates.
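A rough sketch of the "intelligent" variant, to make the distinction concrete: the time component is dropped only when every non-null value falls on midnight. The function name `smart_date_format` and its parameters are hypothetical; it uses `Series.dt.normalize` and `Series.dt.strftime`.

```python
import pandas as pd


def smart_date_format(values, date_fmt="%Y-%m-%d", datetime_fmt="%Y-%m-%d %H:%M:%S"):
    """Format a datetime Series, printing times only if any value needs them.

    Hypothetical helper for illustration only.
    """
    non_null = values.dropna()
    # normalize() resets the time to midnight, so equality means "no time part".
    all_midnight = (non_null == non_null.dt.normalize()).all()
    return values.dt.strftime(date_fmt if all_midnight else datetime_fmt)


dates_only = pd.Series([pd.Timestamp("2013-01-01"), pd.Timestamp("2013-01-02")])
with_times = pd.Series([pd.Timestamp("2013-01-01"), pd.Timestamp("2013-01-02 12:30:00")])

print(smart_date_format(dates_only).tolist())
print(smart_date_format(with_times).tolist())
```

The truncating variant would instead always apply `date_fmt`, silently discarding any non-midnight times.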

cpcloud · 2014-03-13T01:26:28Z

Did we ever settle on an API here?

jreback · 2014-03-13T01:36:08Z

I think need to create a Format object

Format(col or cols, format=None, default=None)

so this easily handles date_format and float_format (for back compat)

and handles ability to customize as well
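One way such a Format object might look is sketched below. Everything beyond the signature above is an assumption: the `apply` method is hypothetical, and `default` is interpreted here as the text written for missing values, which the thread does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union

import pandas as pd


@dataclass
class Format:
    """Hypothetical per-column format description (not a pandas class)."""

    cols: Union[str, Sequence[str]]
    format: Optional[Union[str, Callable]] = None
    default: Optional[str] = None  # assumption: text used for missing values

    def apply(self, df):
        """Return a copy of df with the selected columns rendered as strings."""
        cols = [self.cols] if isinstance(self.cols, str) else list(self.cols)
        fmt = self.format
        if fmt is None:
            render = str
        elif callable(fmt):
            render = fmt
        else:
            render = lambda value: fmt % value  # '%'-style string
        out = df.copy()
        for col in cols:
            out[col] = [self.default if pd.isna(v) else render(v) for v in df[col]]
        return out


# Illustrative data only.
df = pd.DataFrame({"x": [1.5, None], "y": [2.0, 3.25]})
out = Format(["x", "y"], format="%.2f", default="NA").apply(df)
csv_text = out.to_csv(index=False)
print(csv_text)
```

Several such objects could be applied in sequence to cover different column groups, which is roughly the per-column customization discussed earlier in the thread.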

drafter250 · 2016-03-25T05:28:21Z

I don't want to be a pain, but this master issue looks like it has been holding the string spacing issue hostage for almost 3 years, and I think it is mostly unrelated to more complex issues like float and datetime formatting. Just ran into it in version 0.17.1. Going to have to format the entire table manually.

jreback · 2016-03-25T11:10:33Z

@drafter250 well we have 1600 issues - which one shall be first? best way to get something in would be to put in a pull request

VelizarVESSELINOV · 2016-04-05T03:39:57Z

Suggestion: have a float32_formatter distinct from the float64_formatter. Forcing float32 and float64 types to share the same formatter, as happens today, is far from optimal.
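To illustrate why the two dtypes may want different formats, here is a small sketch that picks a %-style format per float dtype before writing. The mapping `FLOAT_FORMATS`, the chosen precisions, and the helper `format_floats_by_dtype` are all assumptions made for illustration, not pandas options.

```python
import numpy as np
import pandas as pd

# Assumed precisions: about 7 significant digits for float32, 15 for float64.
FLOAT_FORMATS = {np.dtype("float32"): "%.7g", np.dtype("float64"): "%.15g"}


def format_floats_by_dtype(df, formats=FLOAT_FORMATS):
    """Return a copy of df with float columns rendered using a per-dtype format."""
    out = df.copy()
    for col in df.columns:
        fmt = formats.get(df[col].dtype)
        if fmt is not None:
            # Bind fmt as a default argument so each column keeps its own format.
            out[col] = df[col].map(lambda value, fmt=fmt: fmt % value)
    return out


# Illustrative data only.
df = pd.DataFrame({
    "single": np.array([0.1, 1 / 3], dtype="float32"),
    "double": np.array([0.1, 1 / 3], dtype="float64"),
})
out = format_floats_by_dtype(df)
print(out)
```

With a single shared format such as '%.15g', the float32 value 0.1 would print as 0.100000001490116, exposing digits that carry no information at single precision.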

drafter250 · 2016-05-23T02:48:40Z

@jreback re: extra spaces between columns in DataFrame.to_string. Not long after my posting I dug into the pandas/core/format.py module, where to_string is implemented in the DataFrameFormatter class. I found a few calls to a method, self.adj.adjoin, whose first argument should be an integer giving the number of spaces between columns and whose remaining arguments are the columns themselves. Most of the calls to this method are hardcoded, and the col_space argument of the to_string method seems to actually go unused.

So I added these lines to DataFrameFormatter.__init__():

    # Set col_space to zero if custom formatters are provided and
    # no col_space is provided.
    self.col_space = col_space
    if formatters is not None and col_space is None:
        self.col_space = 0
    elif col_space is None:
        self.col_space = 1

Then I replaced the hard-coded values passed to self.adj.adjoin with self.col_space, and it seemed to work both with formatters and when passing a number of spaces via the col_space argument.
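For readers unfamiliar with the adjoin idea, a simplified standalone stand-in is sketched below. It is not the pandas implementation; it only shows how the first argument controls the gap between already-formatted columns, which is what the change above makes configurable.

```python
def adjoin(space, *columns):
    """Join equal-length lists of strings side by side, `space` blanks apart.

    Simplified illustration, not the pandas internal function.
    """
    # Pad every column to the width of its longest cell.
    widths = [max(len(cell) for cell in col) for col in columns]
    gap = " " * space
    rows = []
    for cells in zip(*columns):
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        rows.append(gap.join(padded).rstrip())
    return "\n".join(rows)


names = ["a", "bb"]
values = ["1", "2"]

print(adjoin(1, names, values))  # one blank between columns
print(adjoin(0, names, values))  # columns touch; only padding separates cells
```

With a gap of 0, any spacing has to come from the formatters themselves, which is the behavior wanted when custom formatters already produce fixed-width cells.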

I just got a development environment setup for pandas per the user instructions and noticed that the format module is missing from pandas/core and is now in a subpackage called formatters. This would be my first pull request and I want to do things right and write my tests first.

Q1 Where would I find any tests related to the to_string functionality as I don't see a test folder under the new formatters sub-package?

Q2. Could the extra space issue #4158 be separated out from this bigger issue so it can be referenced from the pull request. (you can wait till I actually submit the request)?

Thanks!

jreback · 2016-05-23T13:18:14Z

rebase on master and you will see the pandas/formats where things moved in 0.18.0.
tests are in tests/formats/

ideally you DO separate out issues to as narrow as possible

drafter250 · 2016-11-15T05:50:03Z

I got hung up trying to find the right spot to add/modify tests, and I think I remember running into some corner cases that I couldn't quite figure out. After looking at the formatting code, it seems very intertwined with other bits, and I can see why they wanted to revamp the API.

On Mon, Nov 14, 2016 at 1:46 PM, acosby wrote:

Was there any progress on this? Or a way to hack to_space to use commas
as sep?

simonjayhawkins · 2019-04-06T14:17:42Z

perhaps reopen #16840 in preference to this, purely for consistency with to_string etc.

jbrockmendel · 2020-09-24T01:13:00Z

all but one of the issues in the OP are now closed. is this still needed?

mroeschke · 2021-04-11T03:09:17Z

Agreed, the last issue doesn't need this master tracker anymore. Closing

davetapley · 2023-06-06T05:11:45Z

@mroeschke et al. care to weigh in on my comment here? 🙏🏻

API: consistent float_format specification #9448 (comment)

This was referenced Sep 21, 2013

BUG/API: can't pass parameters to csv module via df.to_csv #4528

Closed

Floating point precision in DataFrame.to_csv #2069

Closed

This was referenced Sep 28, 2013

BUG: DataFrame.to_string() creates extra space for string dtypes #4158

Closed

Date formatting option in to_csv? #2583

Closed

Float format syntax #2502

Closed

jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014

hayd mentioned this issue May 29, 2014

Make IntArrayFormatter and all upstream users able to specify formatting for integer types #3096

Closed

TomAugspurger mentioned this issue Sep 6, 2014

to_excel() float_format to accept this format string? #8191

Closed

jorisvandenbossche mentioned this issue Feb 9, 2015

API: consistent float_format specification #9448

Open

jreback mentioned this issue Feb 20, 2015

float_format string error #9531

Closed

jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015

jreback added the Master Tracker High level tracker for similar issues label Apr 5, 2016

jreback mentioned this issue Jul 6, 2017

Add formatters parameter to DataFrame.to_csv() #16840

Closed

datapythonista modified the milestones: Contributions Welcome, Someday Jul 8, 2018

ghisvail mentioned this issue Dec 4, 2019

Thousands separator for to_csv #30045

Open

mroeschke closed this as completed Apr 11, 2021

samuelchodur mentioned this issue Apr 16, 2023

BUG: DataFrame.to_string() creates extra space for string dtypes in pandas >=2.0 #52690

Closed
