-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: IO header formatting #15548
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: IO header formatting #15548
Conversation
pandas/tests/formats/test_format.py
Outdated
@@ -1125,6 +1125,17 @@ def test_to_string_no_header(self): | |||
|
|||
self.assertEqual(df_s, expected) | |||
|
|||
def test_to_latex_specified_header(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to pandas/tests/formats/test_to_latex (new)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry that was the wrong function name. It should be:
test_to_string_specified_header
so this has a bit of overlap with code in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand it, that would not work very well with the abstract structure that is used at the moment.
The _save_header
method diretly writes into the file without knowing about the values of other rows in the same column.
For a CSV
file this is not a problem, because the string formatting of each row can be independent
from the other rows.
(Justification is simply not an issue for CSV
files).
In the case of LaTeX
or string
tables however the formatting of the header row depends on the n-1
other rows.
The maximum columnwidth determines the justification of every row.
For this reason the _to_str_column
method exists.
One could rewrite the CSVFormatter
to use _to_str_column
as well.
But I assume that the person who introduced a specific CSVFormatter
did so for performance reasons.
It is unlikely that size and performance is a problem of tables exported for LaTeX
,
but it could be one for CSV
files.
I just speculate, so please correct me, if I am wrong.
What could be done also with the current logical structure is to rewrite to_html
.
pandas/formats/format.py
Outdated
minimum=(self.col_space or 0), | ||
adj=self.adj) | ||
stringified.append(fmt_values) | ||
elif has_aliases: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth noting that the only difference between the elif has_aliases:
and else:
conditional branches is the test for raising a ValueError
and cheader = [self.header[i]]
instead of cheader = str_columns[i]
.
It would be possible to write instead:
if not (has_aliases or self.header):
# same code block
pass
else:
if has_aliases:
if len(self.header) != len(self.columns):
raise ValueError(('Writing %d cols but got %d aliases'
% (len(self.columns), len(self.header))))
stringified = []
for i, c in enumerate(frame):
if has_aliases:
cheader = [self.header[i]]
else:
cheader = str_columns[i]
header_colwidth = max(self.col_space or 0,
*(self.adj.len(x) for x in cheader))
fmt_values = self._format_col(i)
fmt_values = _make_fixed_width(fmt_values, self.justify,
minimum=header_colwidth,
adj=self.adj)
max_len = max(np.max([self.adj.len(x) for x in fmt_values]),
header_colwidth)
cheader = self.adj.justify(cheader, max_len, mode=self.justify)
stringified.append(cheader + fmt_values)
What do you prefer? Redundancy vs. readability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mcocdawc readability!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aye, so I keep it as it is
fyi your travis job was cancled. master is breaking, so just wait a min while I fix (then rebase). |
@mcocdawc about combining code. I merely want you to make a helper method or function (maybe |
This function So should it just reside as function in the def f(self, x):
return self.value * x
class example1:
def __init__(self):
self.value = 5
def f(self, x):
return f(self, x)
class example2:
def __init__(self):
self.value = 10
def f(self, x):
return f(self, x) Or what is the preferred structure? |
@mcocdawc you can call it as a function directly, no? It might be more convenient as a method (e.g. in FYI you can rebase on master now and push |
Ok, but if I define it as method in Should it be similar to this construct?: class example3:
def __init__(self, value):
self.value = value
def g(self):
return self.value * 10
class example4:
def __init__(self, value):
self.g = example3(value).g |
yeah this is a dumb problem. e.g.
maybe they could be combined more. I would actually do this as a separate PR (merged before this one). To make this clean. |
|
I think just historical (should be fixed)
no problem. please come back to help on it! (you can make an issue to describe what needs to be done so won't be lost) |
you need to rebase on master and force push, you have picked up commits which are already there something like
|
Sorry I don't completely understand the problem and the output of your commands.
The output of the last command was:
Do I have to cheryypick something or can I proceed with |
hmm, maybe just try with |
Codecov Report
@@ Coverage Diff @@
## master #15548 +/- ##
==========================================
- Coverage 91.06% 91.04% -0.03%
==========================================
Files 137 137
Lines 49307 49314 +7
==========================================
- Hits 44900 44896 -4
- Misses 4407 4418 +11
Continue to review full report at Codecov.
|
can you add a whatsnew note. And update the |
|
you can use replaceable parameters in the doc-string with Substitution
doc/source/whatsnew/v0.20.0.txt in Enhancements |
pandas/formats/format.py
Outdated
@@ -53,7 +53,7 @@ | |||
col_space : int, optional | |||
the minimum width of each column | |||
header : bool, optional | |||
whether to print column labels, default True | |||
%s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a named argument e.g. %{header}s
pandas/formats/format.py
Outdated
@@ -489,30 +489,48 @@ def _to_str_columns(self): | |||
str_index = self._get_formatted_index(frame) | |||
str_columns = self._get_formatted_column_labels(frame) | |||
|
|||
if self.header: | |||
has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_list_like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to mention that the original reason why I wrote it in this "unpythonic" way was to have consistency with CSVFormatter
.
It really seems as if CSVFormatter
also could be refurbished.
But for the time being I just keep it in mind for when they are both transformed to using the Mixin class you mentioned.
pandas/formats/format.py
Outdated
@@ -489,30 +489,48 @@ def _to_str_columns(self): | |||
str_index = self._get_formatted_index(frame) | |||
str_columns = self._get_formatted_column_labels(frame) | |||
|
|||
if self.header: | |||
has_aliases = isinstance(self.header, (tuple, list, np.ndarray, Index)) | |||
if not (has_aliases or self.header): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you can change this whole if test around, something like
headers = self.columns
if is_list_like(self.header):
# validate len(self.header) == len(self.columns)
headers = self.header
# do the loop from below
Ok, I tried to make a clean rebase operation and it still does not work. To summarise the results of your review:
|
@mcocdawc it doesn't matter to make this super clean. The merge will squash it. Though I generally squash my own PR's as they are simpler (to rebase) and understand. Lots of commits are fine, people do things differently. |
the key thing is to rebase on origin/master; that will put all of your commits on the top |
Hereby also to_latex and to_string accept optional headers
closes pandas-dev#15475 Author: Ben Thayer <[email protected]> Author: bthayer2365 <[email protected]> Closes pandas-dev#15506 from bthayer2365/frozen-index and squashes the following commits: 428a1b3 [Ben Thayer] Added __iadd__ test, fixed whatsnew 84ba405 [Ben Thayer] Merge branch 'master' of github.com:pandas-dev/pandas into frozen-index 8dbde1e [Ben Thayer] Rebased to upstream/master 6f6c140 [Ben Thayer] Added docstrings, depricated __iadd__, changed __add__ to use self.union() 66b3b91 [Ben Thayer] Fixed issue number 3d6cee5 [Ben Thayer] Depricated __add__ in favor of union ccd75c7 [Ben Thayer] Changed __sub__ to difference cd7de26 [Ben Thayer] Added versionadded tag in docs and renamed test_inplace to test_inplace_add for consistency 0ea8d21 [Ben Thayer] Added __isub__ and groupby example to docs 79dd958 [Ben Thayer] Updated whatsnew to reflect changes 0fc7e19 [Ben Thayer] Removed whitespace 73564ab [Ben Thayer] Added FrozenList subtraction fee7a7d [bthayer2365] Merge branch 'master' into frozen-index 6a2b48d [Ben Thayer] Added docstrings, depricated __iadd__, changed __add__ to use self.union() 2ab85cb [Ben Thayer] Fixed issue number cb95089 [Ben Thayer] Depricated __add__ in favor of union 2e43849 [Ben Thayer] Changed __sub__ to difference fdcfbbb [Ben Thayer] Added versionadded tag in docs and renamed test_inplace to test_inplace_add for consistency 2fad2f7 [Ben Thayer] Added __isub__ and groupby example to docs cd73faa [Ben Thayer] Updated whatsnew to reflect changes f6381a8 [Ben Thayer] Removed whitespace ada7cda [Ben Thayer] Added FrozenList subtraction
@mcocdawc I fixed your rebase. if any comments left by @jorisvandenbossche |
Thank you very much. Do I have to do now anything anymore for this commit? |
@mcocdawc Thanks! |
…ng column names (pandas-dev#15548) closes pandas-dev#15536
git diff upstream/master | flake8 --diff
In this branch I changed
pd.formats.format.DataFrameFormatter._to_str_columns
whichis used by
to_latex
andto_string
to format output nicely.They support now optional headers for the column titles.
See the appended example jupyter notebook.
to_html
does not usepd.formats.format.DataFrameFormatter._to_str_columns
and remains an issue.Test cases were created.