BUG: multi-index to_native_types is not passing thru parameters #11551

Closed
wants to merge 1 commit into from

nbonnotte
Contributor

closes #7791
xref #6797

return self.values
# we go through the levels and format them
levels = [_._format_native_types(**kwargs) for _ in self.levels]
mi = MultiIndex(levels=levels, labels=self.labels,

also add names=self.names
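To make the intent of the hunk above concrete, here is a minimal pure-Python sketch of the idea: format each level with the caller's keywords, then rebuild the rows from the labels while keeping the names. It is a toy model, not the pandas implementation; `format_level`, `format_multi`, and the choice to render a `-1` label as `na_rep` are assumptions made for illustration.

```python
from datetime import date


def format_level(values, date_format=None):
    """Render one level's values as strings, honouring date_format for dates."""
    return [
        v.strftime(date_format) if isinstance(v, date) and date_format else str(v)
        for v in values
    ]


def format_multi(levels, labels, names, na_rep="", **kwargs):
    """Toy stand-in for formatting a multi-level index (hypothetical helper)."""
    # Forward the formatting keywords to every level instead of dropping them.
    formatted = [format_level(level, **kwargs) for level in levels]
    # Rebuild each row from its integer labels; -1 marks a missing value
    # (rendering it as na_rep is an assumption of this sketch).
    rows = [
        tuple(
            na_rep if code == -1 else level[code]
            for level, code in zip(formatted, codes)
        )
        for codes in zip(*labels)
    ]
    # Carry the names along, mirroring the request to pass names=self.names.
    return {"names": list(names), "rows": rows}


result = format_multi(
    levels=[[date(2015, 11, 8), date(2015, 11, 10)], ["x", "y"]],
    labels=[[0, 1, -1], [0, 1, 1]],
    names=["when", "key"],
    na_rep="NA",
    date_format="%Y/%m/%d",
)
print(result["names"])
print(result["rows"])
```

The point of the sketch is only that the keywords reach every level and that the names survive the rebuild.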

@jreback
Contributor

jreback commented Nov 8, 2015

couple of comments. pls also add a whatsnew note (bug fix section)

@jreback added the Bug, Output-Formatting (__repr__ of pandas objects, to_string), and MultiIndex labels on Nov 8, 2015
@jreback added this to the 0.17.1 milestone on Nov 8, 2015
@nbonnotte
Contributor Author

Hum, na_rep has its own difficulty:

For a plain index:

In [79]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index('a').index
Out[79]: Float64Index([nan, nan], dtype='float64', name=u'a')

and that's cool, but for a multi-index:

In [80]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index(['a','b']).index
Out[80]: 
MultiIndex(levels=[[], [2, 3]],
           labels=[[-1, -1], [0, 1]],
           names=[u'a', u'b'])

so there are no nan values stored, and thus:

In [81]: pd.DataFrame({'a': np.NaN, 'b':[2,3]}).set_index(['a','b']).index.levels[0]
Out[81]: Float64Index([], dtype='float64', name=u'a')
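The outputs above follow from how a multi-index stores each level as a list of distinct values plus integer labels pointing into it. The sketch below is a pure-Python illustration of that encoding, assuming (as the outputs suggest) that missing values get the label `-1` and never occupy a slot in the level; `factorize_level` is a hypothetical helper, not a pandas function.

```python
import math


def factorize_level(values):
    """Split one index level into (distinct non-missing values, integer labels)."""
    uniques = []
    labels = []
    for v in values:
        if isinstance(v, float) and math.isnan(v):
            # Missing values get the sentinel label and no entry in uniques.
            labels.append(-1)
            continue
        if v not in uniques:
            uniques.append(v)
        labels.append(uniques.index(v))
    return uniques, labels


level_a, labels_a = factorize_level([float("nan"), float("nan")])
level_b, labels_b = factorize_level([2, 3])
print(level_a, labels_a)  # [] [-1, -1]
print(level_b, labels_b)  # [2, 3] [0, 1]
```

Because the first level ends up empty, formatting the levels alone never sees a NaN, which is why `na_rep` needs separate handling.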

@jreback
Contributor

jreback commented Nov 8, 2015

You are setting everything to the index, which collapses the frame.

In [10]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index(['a','b'])      
Out[10]: 
       c
a   b   
NaN 2  5
    3  6

having nans in an index is quite odd and only semi-supported, meaning some edge cases exist.

@nbonnotte
Contributor Author

Even when there is a third column 'c' that stays outside the index, we still get an empty Float64Index:

In [5]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index(['a','b']).index
Out[5]: 
MultiIndex(levels=[[], [2, 3]],
           labels=[[-1, -1], [0, 1]],
           names=[u'a', u'b'])

In [6]: pd.DataFrame({'a': np.NaN, 'b':[2,3], 'c' : [5,6]}).set_index('a').index
Out[6]: Float64Index([nan, nan], dtype='float64', name=u'a')

If this is an "edge case", I'm pushing what I have done so far:

  • what's new
  • a short explanation for the test for date_format
  • add names=self.names where it should be
  • test if quoting is also passed to the multi-indexes

What I have NOT done:

@jreback
Contributor

jreback commented Nov 10, 2015

ok, this looks good. pls squash and ping when green.

@@ -120,3 +120,6 @@ Bug Fixes
- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)

- Bug in ``DataFrame.to_dict()`` produces a ``np.datetime64`` object instead of ``Timestamp`` when only datetime is present in data (:issue:`11327`)


- Bug in ``DataFrame.to_csv`` with argument ``date_format`` when dealing with MultiIndexes (:issue:`7791`)

say something like: Bug in ``DataFrame.to_csv`` in passing through arguments for formatting ``MultiIndexes``, including ``date_format``
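A quick way to check the behaviour this note describes is to write a frame with a datetime level in its MultiIndex and inspect the CSV text. This is a sketch, not the test added in the PR; the column names and values are made up, and it assumes a pandas version that includes the fix.

```python
import io

import pandas as pd

# Hypothetical data: one datetime level and one string level in the index.
df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2015-11-08", "2015-11-10"]),
        "key": ["x", "y"],
        "value": [1, 2],
    }
).set_index(["when", "key"])

buffer = io.StringIO()
# date_format should apply to the datetime level of the MultiIndex as well.
df.to_csv(buffer, date_format="%Y/%m/%d")
lines = buffer.getvalue().splitlines()
print(lines)
```

With the fix in place, the first field of each data row should use the requested `%Y/%m/%d` format rather than the default ISO rendering.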

@nbonnotte
Contributor Author

All done

@jreback
Contributor

jreback commented Nov 11, 2015

merged via 9cbe8b9

thanks!

@jreback closed this on Nov 11, 2015
@nbonnotte deleted the groupby-then-to_csv branch on November 16, 2015 09:31

Successfully merging this pull request may close these issues.

MultiIndex DataFrame to_csv() ignores date_format
2 participants