Skip to content

[BUG] Reading multiindex, incorrectly names columns without name. #13115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

brandys11
Copy link
Contributor

Fixed reading and writing of Multiindex.
In a situation where Multiindex looked like this:

One Two
X

it was changed to:

One Two
X X

@jreback jreback added Bug IO Excel read_excel, to_excel labels May 8, 2016
@jreback
Copy link
Contributor

jreback commented May 8, 2016

pls replicate the exact test as indicated in the issue (write it then read it). you don't need to include actual excel files unless you are trying to read something which pandas is not capable of generating (which I don't think is the case here).

@brandys11
Copy link
Contributor Author

Ok, I have joined the tests. But the problem with Unnamed_columns remains, the code now explicitly checks if the cell is empty and if yes - it renames it to Unnamed: row_level_column pandas/pandas/io/parsers.py.

Is there any suggestion how to deal with this?

@jreback
Copy link
Contributor

jreback commented May 18, 2016

so if you have something unnamed, you can do it like the read_csv

In [1]: df = DataFrame([[1, 2]],columns=pd.MultiIndex.from_tuples([('A', 1), ('B', 1)]))

In [2]: df
Out[2]: 
   A  B
   1  1
0  1  2

In [3]: df.to_csv(None)
Out[3]: ',A,B\n,1,1\n,,\n0,1,2\n'

In [4]: pd.read_csv(StringIO(df.to_csv(None)), header=[0,1])
Out[4]: 
  Unnamed: 0_level_0  A  B
  Unnamed: 0_level_1  1  1
0                  0  1  2

@brandys11 brandys11 force-pushed the excel_multiindex_empty_name branch from 700673a to f4669f1 Compare June 4, 2016 08:44
@codecov-io
Copy link

codecov-io commented Jun 4, 2016

Current coverage is 84.14%

Merging #13115 into master will decrease coverage by <.01%

@@             master     #13115   diff @@
==========================================
  Files           138        137     -1   
  Lines         50739      50288   -451   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42740      42312   -428   
+ Misses         7999       7976    -23   
  Partials          0          0          

Powered by Codecov. Last updated by 27448d9...372c81f

@brandys11
Copy link
Contributor Author

So then everything should be OK.

@jreback jreback added this to the 0.18.2 milestone Jun 4, 2016
@jreback
Copy link
Contributor

jreback commented Jun 4, 2016

can you also add a test where the currently last column ('Z','') is not at the end.

def _fill_mi_header(row):
# forward fill blanks entries
def _fill_mi_header(row, control_row):
# forward fill blanks entries, but only inside same parent index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a doc-string here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@brandys11 brandys11 force-pushed the excel_multiindex_empty_name branch 2 times, most recently from 11d59af to 41874af Compare June 4, 2016 19:58
@brandys11
Copy link
Contributor Author

The new test is there and also the docstring.

@jreback
Copy link
Contributor

jreback commented Jun 6, 2016

lgtm. @jorisvandenbossche

@@ -1277,29 +1277,27 @@ def _write_hierarchical_rows(self, fmt_values, indent):


def _get_level_lengths(levels, sentinel=''):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you have rewritten this function a bit, would you like to add a docstring at once as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@brandys11 ok, if you can update this doc-string. ping on green. ty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback ping, everything is green

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jun 6, 2016

Minor comment (not essential for merging), looks good for the rest!

@brandys11 brandys11 force-pushed the excel_multiindex_empty_name branch from 41874af to c4b793b Compare June 6, 2016 15:42
Merge tests

Fix flake8 issues.

Fix tests with python 2.7

Add test with empty cell not at the end.

Docstring for function _fill_mi_header

Docstring for function _get_level_lengths
@brandys11 brandys11 force-pushed the excel_multiindex_empty_name branch from c4b793b to 7953aee Compare June 6, 2016 18:36
@jreback jreback closed this in 67b72e3 Jun 6, 2016
@jreback
Copy link
Contributor

jreback commented Jun 6, 2016

thanks @brandys11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: excel export merge margin
4 participants