Skip to content

BUG: Fix csv_read bugs when using empty input. GH10467 & GH10413 #10469

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

santegoeds
Copy link
Contributor

closes #10413
closes #10467

@@ -2301,6 +2301,18 @@ def test_empty_with_index(self):
expected = DataFrame([], columns=['y'], index=Index([], name='x'))
tm.assert_frame_equal(result, expected)

def test_emtpy_with_multiindex(self):
data = 'x,y,z'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue numbers as comments to both of the function

@jreback
Copy link
Contributor

jreback commented Jun 30, 2015

pls add a whatsnew note (you can put both issues in a single entry)

@santegoeds santegoeds force-pushed the bugfix/fix-csv_reader-multiindex-empty-data branch 3 times, most recently from 3c927c2 to 0365fd4 Compare July 3, 2015 08:44
@santegoeds
Copy link
Contributor Author

@jreback - build is green.

Let me know if you'd like me to squash the commits or if you want a whatsnew note for the last commit.

else:
index_col = list(index_col)
index = MultiIndex.from_arrays([[]] * len(index_col), names=index_names)
index_col.sort()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this need sorting? why not just pop them in order?

e.g.
columns = columns - Index(index_col)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs sorting so that the calculated column index, i.e. n-i in the pop, evaluates to the correct value even for a multiindex that's specified with decrementing index columns.

E.g. Without the sort and for index_col=[1,0], the loop would try to pop first column 1 - 0 = 0 and then 0 - 1 = -1.

I expect I'm missing something but columns = columns - Index(index_col) raises an error for me (*** TypeError: can only perform ops with scalar values)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry old syntax

try

columns.difference(Index(index_col))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

columns is a list of strings and index_col is a list of integers. I don't think this will work either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think need some more tests then
index_col can also be a list of strings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this PR we already have tests that specify a multiindex with either a list of strings or a list of ints (see test_empty_with_multiindex and test_empty_with_reversed_multiindex).

I haven't completely stepped through the code but afaics, function _clean_index_names is called prior to get_empty_meta and converts index_col as a list of strings into a list of ints.

@santegoeds
Copy link
Contributor Author

Hi @jreback - just to clarify what's required. Are you happy to merge this if I add some extra tests to explicitly verify different index_col types (i.e. str, int, list of str, list of int)?

@jreback
Copy link
Contributor

jreback commented Jul 5, 2015

@santegoeds yes, just want to try to cover the bases on tests.

@santegoeds santegoeds force-pushed the bugfix/fix-csv_reader-multiindex-empty-data branch from 0365fd4 to 8523105 Compare July 6, 2015 20:44
@santegoeds
Copy link
Contributor Author

@jreback, build passing again. Is this in line with your expectations?

@jreback
Copy link
Contributor

jreback commented Jul 7, 2015

very nice
comprehensive! thanks

jreback added a commit that referenced this pull request Jul 7, 2015
…index-empty-data

BUG: Fix csv_read bugs when using empty input. GH10467 & GH10413
@jreback jreback merged commit fbe8c0b into pandas-dev:master Jul 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO CSV read_csv, to_csv
Projects
None yet
2 participants