custom formatters for to_csv #4668
Comments
Most legacy Fortran 77 based simulators (many of which are still actively used in different scientific communities), such as TOUGH2 (which I've had the pleasure of working with extensively), often have data input file subroutines with hardcoded field widths. This makes life interesting, and often as much time is spent in pre-processing as in post-processing. Scientists I worked with had folders of rigid bash scripts and even Fortran routines to attempt to manage the creation of these 'input decks' (oh, the scientist-hours lost 😢). Input decks can contain information such as mesh geometry and physical properties, system initial conditions (i.e., the thermodynamic state of each element and the transport states between them), and simulator operation parameters. While pandas is often used to analyze measured or resultant data, I could certainly envision using it to manipulate input files (indeed, I wrote an entire Perl library to do this before discovering Python and pandas). To manage such input decks, fixed-width output is fundamental. Clearly there are much better ways to interact with Fortran libraries (f2py / numpy), but I can tell you that (some) scientists are simply interested in getting a simulation up and running. If pandas already has them hooked for data analysis, such functionality could be a large benefit. If I had had access to such a tool, my graduate student life would have been a whole lot more social 😉. |
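For illustration only, here is a minimal sketch of writing fixed-width records from a DataFrame with plain Python format specs. The helper name `to_fixed_width`, the `specs` mapping, and the column names and widths are all assumptions made up for this example; none of them is a pandas API or a real TOUGH2 record layout.

```python
import pandas as pd


def to_fixed_width(df, specs):
    """Render each row as one fixed-width line.

    `specs` maps column name -> Python format spec (e.g. ">10.3E").
    Both this function and the mapping are hypothetical, not pandas API.
    """
    lines = []
    for row in df.itertuples(index=False):
        # Format every field to its own width; no separator is added.
        fields = [format(value, specs[col]) for col, value in zip(df.columns, row)]
        lines.append("".join(fields))
    return "\n".join(lines)


# Made-up layout: a 5-character name followed by two 10-character floats.
elements = pd.DataFrame(
    {"name": ["A1", "A2"], "volume": [1.0, 2.5], "porosity": [0.1, 0.25]}
)
deck = to_fixed_width(
    elements, {"name": "<5", "volume": ">10.3E", "porosity": ">10.3E"}
)
print(deck)
```

Because every field carries its own width, the line length is the sum of the widths, which is what a fixed-format Fortran reader expects.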
@nehalecky so you want to_csv to have a fixed-width mode? What kind of an API would you see here? We have been toying with the idea of passing a style parameter to these output routines, which could be a class (pandas would provide a base class) that could be overridden for really custom behavior; it is easy to imagine a FixedWidthWriter. Or maybe that is overkill and we just need something straightforward? |
Perhaps should be |
I think per-column functionality should be added to this list, similar to how That would allow users, for example, to apply a different float format to the timestamp than the data columns. Or change the time formatting to military format. (Date formatting was touched in PR #4313 but not time IIUC) |
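Until something like this exists, one workaround is to pre-format each column to strings and then call `to_csv`. The sketch below assumes a hypothetical `column_formatters` mapping (column name to callable) and made-up column names; it is not an existing `to_csv` option.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2013-08-26 13:05:00", "2013-08-26 13:10:00"]),
        "elapsed": [0.5, 1.5],
        "value": [1.23456, 2.5],
    }
)

# Hypothetical per-column mapping: column name -> callable returning a string.
column_formatters = {
    "timestamp": lambda ts: ts.strftime("%Y-%m-%d %H%M"),  # 24-hour time
    "elapsed": "{:.1f}".format,
    "value": "{:.3e}".format,
}

formatted = df.copy()
for col, fmt in column_formatters.items():
    # Replace each column by its string rendering before writing.
    formatted[col] = formatted[col].map(fmt)

csv_text = formatted.to_csv(index=False)
print(csv_text)
```

The cost of this approach is an extra copy of the frame, and the columns stop being numeric, so it only suits the final output step.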
this is really just waiting on a nice API that either does what you are suggesting / templates or both and of course someone to work on this.... it would not be hard to extend e.g.
|
A per column template, as suggested by @jreback I think would be grand. For large/complex column arrangements, you could use a series beforehand to prescribe slices across certain columns and generate a dict. :) |
Currently the
I would suggest having some way to format datetimes without a time component differently from datetimes with one. |
@cancan101 could have a flag to drop the minutes / seconds if 00:00:00 (not sure on good name). Could do with an example of date_format in doc, think it would make it clearer (or use default. Should these be in |
@hayd I assume you mean drop the hours, minutes, and seconds (ie the time component of the I think the option and its name depend on how it will work: should it be an "intelligent" format that only prints the time component if needed (i.e. if any of the datetime values have a time != midnight, see #5701) or should it work as a truncate where datetimes are truncated to just dates. |
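The two behaviours being weighed can be sketched as follows. The function name `format_datetimes` and the `mode` values are hypothetical and only meant to make the distinction concrete; they are not pandas options.

```python
import pandas as pd


def format_datetimes(values, mode="smart"):
    """Format a datetime Series as strings (hypothetical helper).

    mode="smart":    print the time only if some value is not at midnight.
    mode="truncate": always drop the time component.
    """
    non_null = values.dropna()
    # True if any non-missing value has a time other than 00:00:00.
    has_time = (non_null != non_null.dt.normalize()).any()
    if mode == "truncate" or not has_time:
        return values.dt.strftime("%Y-%m-%d")
    return values.dt.strftime("%Y-%m-%d %H:%M:%S")


dates_only = pd.Series(pd.to_datetime(["2013-01-01 00:00:00", "2013-01-02 00:00:00"]))
with_times = pd.Series(pd.to_datetime(["2013-01-01 00:00:00", "2013-01-02 06:30:00"]))

print(format_datetimes(dates_only).tolist())
print(format_datetimes(with_times).tolist())
print(format_datetimes(with_times, mode="truncate").tolist())
```

Note that the "smart" mode decides per column, not per value, so a single non-midnight value switches the whole column to the long format.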
Did we ever settle on an API here? |
I think we need to create a Format object, Format(col or cols, format=None, default=None), so that this easily handles date_format and float_format (for back compat) and handles the ability to customize as well |
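To make the proposal concrete, here is one possible reading of such a `Format` object. This is only a sketch: the class is not part of pandas, the `apply` method is invented for the example, and treating `default` as the text used for missing values is an assumption, since the comment does not define it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Union

import pandas as pd


@dataclass
class Format:
    """Hypothetical per-column format spec; not part of pandas."""

    cols: Union[str, Sequence[str]]
    format: Optional[Union[str, Callable[[object], str]]] = None
    default: Optional[str] = None  # assumption: text used for missing values

    def apply(self, df):
        """Return a copy of df with the selected columns rendered as strings."""
        cols = [self.cols] if isinstance(self.cols, str) else list(self.cols)
        out = df.copy()
        for col in cols:
            out[col] = out[col].map(self._render)
        return out

    def _render(self, value):
        if pd.isna(value):
            return self.default if self.default is not None else ""
        if callable(self.format):
            return self.format(value)
        if self.format is not None:
            # Format specs cover floats (".2f") and datetimes ("%Y-%m-%d").
            return format(value, self.format)
        return str(value)


df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2013-01-01", "2013-01-02"]),
        "x": [1.5, float("nan")],
    }
)
out = Format("x", format=".2f", default="NA").apply(df)
out = Format("when", format="%Y-%m-%d").apply(out)
print(out.to_csv(index=False))
```

A string spec handles the existing float_format and date_format cases, while a callable leaves room for fully custom rendering.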
I don't want to be a pain, but this master issue looks like it has been holding the string spacing issue hostage for almost 3 years, and I think it is mostly unrelated to more complex issues like float and datetime formatting. I just ran into it in version 17.1. Going to have to format the entire table manually. |
@drafter250 well we have 1600 issues - which one shall be first? best way to get something in would be to put in a pull request |
Suggestion: allow a float32_formatter that differs from the float64_formatter. Today float32 and float64 types are forced to share the same formatter, which is far from optimal. |
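As a workaround sketch, columns can be selected by dtype and formatted separately before writing. The `dtype_formats` mapping and the column names are hypothetical; the digit counts reflect that float32 carries roughly 7 significant decimal digits and float64 roughly 16.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "single": np.array([0.1, 2.5], dtype="float32"),
        "double": np.array([0.1, 2.5], dtype="float64"),
    }
)

# Hypothetical dtype -> formatter mapping, not a pandas option.
dtype_formats = {"float32": "{:.7g}".format, "float64": "{:.16g}".format}

formatted = df.copy()
for dtype_name, fmt in dtype_formats.items():
    # select_dtypes picks only the columns stored with this exact float width.
    for col in df.select_dtypes(include=[dtype_name]).columns:
        formatted[col] = df[col].map(fmt)

csv_text = formatted.to_csv(index=False)
print(csv_text)
```

With a single shared 16-digit format, the float32 column would print representation noise beyond its real precision, which is the motivation for a separate formatter.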
@jreback re: extra spaces between columns in DataFrame.to_string. Not long after my posting I dug into the pandas/core/format.py module, where to_string is implemented in the DataFrameFormatter class. I found a few calls to a method, "self.adj.adjoin", whose first argument should be an integer giving the number of spaces between columns and whose remaining arguments are the columns themselves. Most of the calls to this method are hardcoded, and the col_space argument to the "to_string" method seems to go unused. So I added these lines to DataFrameFormatter.__init__()
then replaced the hard-coded values passed to self.adj.adjoin with self.col_space, and it seemed to work both with formatters and when specifying a number of spaces via the col_space argument.

I just got a development environment set up for pandas per the user instructions and noticed that the format module is missing from pandas/core and is now in a subpackage called formatters. This would be my first pull request, and I want to do things right and write my tests first.

Q1. Where would I find any tests related to the to_string functionality? I don't see a test folder under the new formatters sub-package.

Q2. Could the extra space issue #4158 be separated out from this bigger issue so it can be referenced from the pull request? (You can wait until I actually submit the request.)

Thanks! |
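The role of the spacing argument can be shown with a standalone function. This is an illustration written for this thread, not the pandas implementation of `adjoin`; right-justifying each column is an assumption of the sketch.

```python
def adjoin(space, *columns):
    """Join columns of strings side by side with `space` blanks between them.

    Standalone illustration only; not the pandas implementation.
    """
    # Each column is padded to the width of its longest cell.
    widths = [max(len(cell) for cell in col) for col in columns]
    lines = []
    for row in zip(*columns):
        cells = [cell.rjust(width) for cell, width in zip(row, widths)]
        lines.append((" " * space).join(cells))
    return "\n".join(lines)


names = ["a", "bb"]
values = ["1", "22"]
print(adjoin(1, names, values))  # one blank between columns
print(adjoin(3, names, values))  # three blanks between columns
```

If the first argument is a hardcoded literal at every call site, a user-supplied spacing value never reaches the output, which matches the behaviour described above.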
rebase on master and you will see the ideally you DO separate out issues to as narrow as possible |
I got hung up trying to find the right spot to add/modify tests and think I
On Mon, Nov 14, 2016 at 1:46 PM, acosby [email protected] wrote:
|
perhaps reopen #16840 in preference to this, purely for consistency with to_string etc. |
all but one of the issues in the OP are now closed. is this still needed? |
Agreed, the last issue doesn't need this master tracker anymore. Closing |
@mroeschke et al. care to weigh in on my comment here? 🙏🏻 |
SO question
something like