-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix csv_read bugs when using empty input. GH10467 & GH10413 #10469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix csv_read bugs when using empty input. GH10467 & GH10413 #10469
Conversation
@@ -2301,6 +2301,18 @@ def test_empty_with_index(self): | |||
expected = DataFrame([], columns=['y'], index=Index([], name='x')) | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
def test_emtpy_with_multiindex(self): | |||
data = 'x,y,z' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue numbers as comments to both of the function
pls add a whatsnew note (you can put both issues in a single entry) |
3c927c2
to
0365fd4
Compare
@jreback - build is green. Let me know if you'd like me to squash the commits or if you want a whatsnew note for the last commit. |
else: | ||
index_col = list(index_col) | ||
index = MultiIndex.from_arrays([[]] * len(index_col), names=index_names) | ||
index_col.sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need sorting? why not just pop them in order?
e.g.
columns = columns - Index(index_col)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs sorting so that the calculated column index, i.e. n-i
in the pop
, evaluates to the correct value even for a multiindex that's specified with decrementing index columns.
E.g. Without the sort and for index_col=[1,0]
, the loop would try to pop
first column 1 - 0 = 0
and then 0 - 1 = -1
.
I expect I'm missing something but columns = columns - Index(index_col)
raises an error for me (*** TypeError: can only perform ops with scalar values
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry old syntax
try
columns.difference(Index(index_col))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columns
is a list of strings and index_col
is a list of integers. I don't think this will work either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think need some more tests then
index_col can also be a list of strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this PR we already have tests that specify a multiindex with either a list of strings or a list of ints (see test_empty_with_multiindex
and test_empty_with_reversed_multiindex
).
I haven't completely stepped through the code but afaics, function _clean_index_names
is called prior to get_empty_meta
and converts index_col
as a list of strings into a list of ints.
Hi @jreback - just to clarify what's required. Are you happy to merge this if I add some extra tests to explicitly verify different |
@santegoeds yes, just want to try to cover the bases on tests. |
0365fd4
to
8523105
Compare
@jreback, build passing again. Is this in line with your expectations? |
very nice |
…index-empty-data BUG: Fix csv_read bugs when using empty input. GH10467 & GH10413
closes #10413
closes #10467