-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index Name is not displayed with header=False in to_csv #24840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Index Name is not displayed with header=False in to_csv #24840
Conversation
Codecov Report
@@ Coverage Diff @@
## master #24840 +/- ##
===========================================
- Coverage 92.38% 42.91% -49.48%
===========================================
Files 166 166
Lines 52382 52390 +8
===========================================
- Hits 48395 22482 -25913
- Misses 3987 29908 +25921
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24840 +/- ##
==========================================
+ Coverage 91.47% 91.47% +<.01%
==========================================
Files 172 173 +1
Lines 52870 52882 +12
==========================================
+ Hits 48362 48376 +14
+ Misses 4508 4506 -2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comments & failing the CI.
@@ -188,6 +188,35 @@ def save(self): | |||
for _fh in handles: | |||
_fh.close() | |||
|
|||
def _index_label_encoder(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
pandas/io/formats/csvs.py
Outdated
if index_label is None: | ||
if isinstance(obj.index, ABCMultiIndex): | ||
index_label = [] | ||
for i, name in enumerate(obj.index.names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add comments where appropriate
pandas/io/formats/csvs.py
Outdated
@@ -201,6 +230,9 @@ def _save_header(self): | |||
has_aliases = isinstance(header, (tuple, list, np.ndarray, | |||
ABCIndexClass)) | |||
if not (has_aliases or self.header): | |||
encoded_labels = self._index_label_encoder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment here
@simonjayhawkins if you'd have a look |
oh... yeah... output will look different after |
@charlesdong1991 can you change closes #24546 to xref #24546, other output formatting methods need to be checked before the issue is closed. thanks. |
|
@simonjayhawkins : I'm personally not too concerned if we have to add this parameter to the class. It's not very visible (if at all) from a general |
@charlesdong1991 : I strongly suspect that you will not have to modify anything in |
pandas/tests/generic/test_frame.py
Outdated
(False, 'index.name,,\n0,0,0\n1,0,0\n'), | ||
(True, 'index.name,0,1\n0,0,0\n1,0,0\n') | ||
]) | ||
def test_to_csv_header_not_multi_index(self, header, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of "not_multi_index," just call it "single_index"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could potentially even squash your tests together so that in your parameterization, you pass in the index
that you want to set on df.index
.
Not sure if that makes parameterization a little more complicated, but that's also another thought, given how similar how some of the code is between both of your tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, thanks for the review! @gfyoung i thought about it, but i thought they represented two different cases although the ultimate idea is to test if index name is displayed or not... so i separated them... but i will try to squash them a bit and see if it makes tests clearer.
@gfyoung if a parameter is added to |
@@ -188,6 +188,35 @@ def save(self): | |||
for _fh in handles: | |||
_fh.close() | |||
|
|||
def _index_label_encoder(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is _get_column_name_list
in format.py, could this be moved into a base class/mixin and reused?
pandas/pandas/io/formats/format.py
Lines 840 to 848 in f57bd72
def _get_column_name_list(self): | |
names = [] | |
columns = self.frame.columns | |
if isinstance(columns, ABCMultiIndex): | |
names.extend('' if name is None else name | |
for name in columns.names) | |
else: | |
names.append('' if columns.name is None else columns.name) | |
return names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially. Let's revisit once tests are passing again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
future PR though.
@simonjayhawkins : Eventually, we should, but for now, we can keep the parameter internal (so as to fix the bug) to However, I see no reason to not patch the bug now and then expose |
pandas/core/generic.py
Outdated
@@ -2916,6 +2916,8 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None, | |||
sequence should be given if the object uses MultiIndex. If | |||
False do not print fields for index names. Use index_label=False | |||
for easier importing in R. | |||
index_name: bool, default True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this? we already have the index
and index_label
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After collecting feedbacks from @gfyoung and @simonjayhawkins, if i understand correctly, a parameter index_name
could be added into CSVFormatter
that determines if index name should be kept/displayed or not if it is not None, since header
in to_csv
should only removes columns names based on its definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what index_label already does
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2019-03-24 22:25:00 UTC |
pandas/tests/frame/test_to_csv.py
Outdated
@@ -764,7 +765,7 @@ def test_to_csv_wide_frame_formatting(self): | |||
# Issue #8621 | |||
df = DataFrame(np.random.randn(1, 100010), columns=None, index=None) | |||
with ensure_clean() as filename: | |||
df.to_csv(filename, header=False, index=False) | |||
df.to_csv(filename, header=False, index=False, index_label=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default for index_label
is None
, but it probably makes sense for the default to be False
when header=False
. This would maintain backwards compatibility and follow the Principle of Least Astonishment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are right!!! thanks for the review and feedback! @simonjayhawkins just changed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will also need to change the docstring. to make it less wordy. it may be worth allowing True
as an option, having True
as the default when header
and index
are True
and then the current default None
will just become the new default. i.e. True use the column and index names if they exist and so maintain backwards compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the feedback, I am not sure if this is worth it... since when header
and index
are True, then current default None
plays exactly same role as True
, which will print index name is it exists, so the users don't have to define it. And I haven't found other use cases for index_label
being True
. Please enlighten me if you have different thoughts. @simonjayhawkins
pandas/io/formats/csvs.py
Outdated
decimal='.'): | ||
index_label=None, mode='w', nanRep=None, | ||
encoding=None, compression='infer', quoting=None, | ||
line_terminator='\n', chunksize=None, tupleize_cols=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you don't need to change these, pls revert
pandas/core/generic.py
Outdated
Column label for index column(s) if desired. If None is given, and | ||
`header` and `index` are True, then the index names are used. A | ||
`header` and `index` are True, then the index names are used. If | ||
`header` is False or None, default index_label changes to False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is really odd to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, @jreback ! Can you say more about it? The change now is because, in previous api design, when header
is set to None or False, col names and index names will be all gone (won't be printed), and in this new change, the index name and col names are separated, and default index_label
is None
which means index names will still be printed if they exist. Therefore, following @simonjayhawkins suggestion to maintain backwards compatibility, in our internal CSVFormatter
, the index_label
will be changed to False
given header
is None
or False
. And I am glad to hear more feedbacks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry if i was not clear..
i was think along the lines
index_label : str or sequence, or bool, default { True,
False if header or index are False
this should result in an non breaking api and maintain the same behaviour for None
which does not need to be included in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! i just changed and removed None! @simonjayhawkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, wait!! would this still not be able to display index names when header=False
since index_label
is coerced to False
by default? @simonjayhawkins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if its explicitly set to True
by the caller then the default values become irrelevant.
Thanks a lot for all of your reviews! I think this time all should work!
Benefits:
I am looking forward to your feedbacks! @jreback @simonjayhawkins @gfyoung |
pandas/tests/generic/test_frame.py
Outdated
@@ -269,3 +270,40 @@ def test_deepcopy_empty(self): | |||
empty_frame_copy = deepcopy(empty_frame) | |||
|
|||
self._compare(empty_frame_copy, empty_frame) | |||
|
|||
@pytest.mark.skipif(os.name == 'nt', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you use tm.convert_rows_list_to_csv_str
this should become unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah! nice!
pandas/tests/generic/test_frame.py
Outdated
(False, None, '0,0,0\n1,0,0\n'), | ||
(True, None, 'index.name,0,1\n0,0,0\n1,0,0\n') | ||
]) | ||
def test_to_csv_header_single_index(self, header, index_label, expected): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the tests should probably go in pandas\tests\io\formats\test_to_csv.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved!
pandas/tests/generic/test_frame.py
Outdated
def test_to_csv_header_single_index(self, header, index_label, expected): | ||
# issue 24546 | ||
df = pd.DataFrame(np.zeros((2, 2), dtype=int)) | ||
df.index.name = 'index.name' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for added assurance it may be worth also assigning a name (different to the row index name) to the columns index in the test set-up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i was thinking df.columns.name
here.
pandas/tests/frame/test_to_csv.py
Outdated
@@ -120,6 +120,7 @@ def test_to_csv_from_csv3(self): | |||
|
|||
df1.to_csv(path) | |||
df2.to_csv(path, mode='a', header=False) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll need to revert changes to this module.
@charlesdong1991 can you merge master? |
cdf1a6a
to
92db032
Compare
Thanks @WillAyd , i just merged to master. |
# if index label is not explicitly called, index label is True if | ||
# header or index is not False; otherwise, index label is set to False | ||
if index_label is None: | ||
if self.header is False or self.header is None or not self.index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.header is False or self.header is None or not self.index: | |
if not (self.header or self.index): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah... thanks @WillAyd ... however, it doesn't seem equal in my test. Because by default, self.index
is True
, and with if not (self.header or self.index)
, the result will be always False
if self.index
uses default value, and it doesn't align to my purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So not self.header or not self.index
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this
self.index_label = self.header or self.index
seems equivalent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it does seem equivalent though,
header : bool or list of str, default True
Write out the column names. If a list of strings is given it is
assumed to be aliases for the column names.
.. versionchanged:: 0.24.0
for instance, since from version 0.24.0, it looks like header
could be a list, and if a list is given, we don't want to assign this list to index_label
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try bool(self.header) or self.index
then
pandas/io/formats/csvs.py
Outdated
def _index_label_encoder(self): | ||
"""Encode index label if it is not False. | ||
|
||
Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you conform docstring to the pandas standard?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, changed! and i also add several more tests! @WillAyd
pandas/io/formats/csvs.py
Outdated
# append index label based on index type | ||
if isinstance(obj.index, ABCMultiIndex): | ||
index_label = [] | ||
for i, name in enumerate(obj.index.names): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use of enumerate here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, changed!
index_label = [''] | ||
else: | ||
index_label = [index_label] | ||
elif not isinstance(index_label, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does the code go down this branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, should be if index_label is True
above. @WillAyd
I pushed some changes based on your nice reviews, @WillAyd looks like there was a connection issue during checking, i will re-push it along with fixes of your potential follow-up change requests :) |
# if index label is not explicitly called, index label is True if | ||
# header or index is not False; otherwise, index label is set to False | ||
if index_label is None: | ||
if self.header is False or self.header is None or not self.index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So not self.header or not self.index
then?
pandas/io/formats/csvs.py
Outdated
if index_label is True: | ||
# append index label based on index type | ||
if isinstance(obj.index, ABCMultiIndex): | ||
index_label = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's generally a lot of shadowing of index_label
here that I think can only cause confusion. Given this always needs to be a list on return, why not just insatiate to an empty list and then append the appropriate value within each branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i simplified a bit, would like to hear your opinion. @WillAyd
return | ||
if not (has_aliases or header): | ||
# if index_label is False, nothing will display. | ||
if index_label is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it matter if this is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it depends, if index label is not explicitly called which is None
, index label is True
if header
or index
is not False
; otherwise, index label is set to False
275d390
to
1d5237e
Compare
@charlesdong1991 can you check failures, merge master and address comments? FYI if you merge master you shouldn't need a force push, which is preferable for maintaining GH comment history |
Hi, @WillAyd last time it was failed because of master failure, and I just merged the master, and see if the tests can be passed. |
@@ -50,7 +50,17 @@ def __init__(self, obj, path_or_buf=None, sep=",", na_rep='', | |||
|
|||
self.header = header | |||
self.index = index | |||
self.index_label = index_label | |||
# if index label is not explicitly called, index label is True if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
# if index label is not explicitly called, index label is True if | ||
# header or index is not False; otherwise, index label is set to False | ||
if index_label is None: | ||
if self.header is False or self.header is None or not self.index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this
self.index_label = self.header or self.index
seems equivalent
pandas/io/formats/csvs.py
Outdated
# add empty string is name is None | ||
if name is None: | ||
name = '' | ||
index_label.append(name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name or ''
if index_label is True: | ||
index_label = [] | ||
# append index label based on index type | ||
if isinstance(obj.index, ABCMultiIndex): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not simply do
index_label = list(map(lambda name: name or '', obj.index.names))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed!
index_label.append(name) | ||
else: | ||
# if no name, use empty string | ||
if obj.index.name is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't htink you need this branch at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's needed, and looks like if the branch is removed, lots of tests will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above the cases for multiindex and index are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be a lot simpler as indicated.
# if index label is not explicitly called, index label is True if | ||
# header or index is not False; otherwise, index label is set to False | ||
if index_label is None: | ||
if self.header is False or self.header is None or not self.index: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try bool(self.header) or self.index
then
index_label.append(name) | ||
else: | ||
# if no name, use empty string | ||
if obj.index.name is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above the cases for multiindex and index are the same.
@charlesdong1991 can you merge master and address comments? |
1 similar comment
@charlesdong1991 can you merge master and address comments? |
Closing as stale but ping if you'd like to pick this back up! |
git diff upstream/master -u -- "*.py" | flake8 --diff