-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrame fail to construct when data is list and columns is nested list for MI #32202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-06 19:38:52 UTC |
Slight -1 on supporting this - what benefit does this provide over passing a MultiIndex? |
Hmm OK I see the original issue where this may or may not work depending on what is passed as the values to the DataFrame constructor, so OK... Is there not any shared code to leverage here? Wonder why this works for DataFrame construction with an ndarray but not with lists |
emm, yeah, passing MI should be the way to go, but the main reason i think this might be good to support is we do support such assignment when data is a numpy array, as the example presented in issue description. In the way, we could have kinda consistency. |
ahh, sorry, did not see your update comment @WillAyd 😅 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code change needs some work
will look later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @jbrockmendel if any comments.
|
||
|
||
def _list_of_series_to_arrays(data, columns, coerce_float=False, dtype=None): | ||
def _list_of_series_to_arrays( | ||
data, columns, coerce_float=False, dtype=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can any of the args be typed? coerce_float should be easy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added types for all input argument
def _validate_or_indexify_columns( | ||
content: List, columns: Union[Index, List, None] | ||
) -> Union[Index, List[Axis]]: | ||
"""If columns is None, make numbers as column names; If not None, validate if |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline after """
, after the ; -> "otherwise validate that columns have valid length."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed!
|
||
Parameters | ||
---------- | ||
content: list of processed data records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does "processed data records" mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for misleading, just because this is not original input data (input data being transformed a bit), so i put that word. I will just use list of data
instead to avoid confusion.
1. When content is not composed of list of lists, and if length of columns | ||
is not equal to length of content. | ||
2. When content is list of lists, but length of each sub-list is not equal | ||
3. When content is list of lists, but length of sub-list is not equal to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we say what kinds of exceptions are raised in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed!
def _convert_object_array( | ||
content: List[Scalar], coerce_float: bool = False, dtype: Optional[Dtype] = None | ||
) -> List[Scalar]: | ||
"""Internal function ot convert object array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newline after """
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed!
|
||
Returns | ||
------- | ||
arrays: list of converted arrays |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does "arrays" here mean Union[ndarray, EA]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the first arrays
or the second? i just referred to the return variable name. But it seems misleading, I will rephrase it.
with pytest.raises(ValueError, match=msg): | ||
DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]], columns=arrays) | ||
|
||
def test_constructor_inequal_length_nested_list_column(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inequal -> unequal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corrected
thanks very much for your detailed reviews @jbrockmendel appreciate a lot! I have addressed them and feel free to take another look! thanks! |
emm, seems the build is broken, @jbrockmendel doesn't seem to relate to this PR though |
thanks @charlesdong1991 very nice |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff