Skip to content

BUG: DataFrame fail to construct when data is list and columns is nested list for MI #32202

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 44 commits into from
Apr 6, 2020

Conversation

charlesdong1991
Copy link
Member

@pep8speaks
Copy link

pep8speaks commented Feb 23, 2020

Hello @charlesdong1991! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-04-06 19:38:52 UTC

@WillAyd
Copy link
Member

WillAyd commented Feb 24, 2020

Slight -1 on supporting this - what benefit does this provide over passing a MultiIndex?

@WillAyd WillAyd added Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure labels Feb 24, 2020
@WillAyd
Copy link
Member

WillAyd commented Feb 24, 2020

Hmm OK I see the original issue where this may or may not work depending on what is passed as the values to the DataFrame constructor, so OK...

Is there not any shared code to leverage here? Wonder why this works for DataFrame construction with an ndarray but not with lists

@charlesdong1991
Copy link
Member Author

emm, yeah, passing MI should be the way to go, but the main reason i think this might be good to support is we do support such assignment when data is a numpy array, as the example presented in issue description. In the way, we could have kinda consistency.

@charlesdong1991
Copy link
Member Author

ahh, sorry, did not see your update comment @WillAyd 😅

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the code change needs some work
will look later

@charlesdong1991 charlesdong1991 requested a review from jreback March 5, 2020 21:26
@charlesdong1991
Copy link
Member Author

@WillAyd @jreback I have made some improvements on type annotations. and some feedbacks will be very welcome! thanks!

@charlesdong1991 charlesdong1991 requested a review from WillAyd April 3, 2020 16:42
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jbrockmendel if any comments.



def _list_of_series_to_arrays(data, columns, coerce_float=False, dtype=None):
def _list_of_series_to_arrays(
data, columns, coerce_float=False, dtype=None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can any of the args be typed? coerce_float should be easy

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added types for all input argument

def _validate_or_indexify_columns(
content: List, columns: Union[Index, List, None]
) -> Union[Index, List[Axis]]:
"""If columns is None, make numbers as column names; If not None, validate if
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline after """, after the ; -> "otherwise validate that columns have valid length."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!


Parameters
----------
content: list of processed data records
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "processed data records" mean here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for misleading, just because this is not original input data (input data being transformed a bit), so i put that word. I will just use list of data instead to avoid confusion.

1. When content is not composed of list of lists, and if length of columns
is not equal to length of content.
2. When content is list of lists, but length of each sub-list is not equal
3. When content is list of lists, but length of sub-list is not equal to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we say what kinds of exceptions are raised in these cases?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!

def _convert_object_array(
content: List[Scalar], coerce_float: bool = False, dtype: Optional[Dtype] = None
) -> List[Scalar]:
"""Internal function ot convert object array.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newline after """

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed!


Returns
-------
arrays: list of converted arrays
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does "arrays" here mean Union[ndarray, EA]?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the first arrays or the second? i just referred to the return variable name. But it seems misleading, I will rephrase it.

with pytest.raises(ValueError, match=msg):
DataFrame([[1, 2, 3, 4], [4, 5, 6, 7]], columns=arrays)

def test_constructor_inequal_length_nested_list_column(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inequal -> unequal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

@charlesdong1991
Copy link
Member Author

thanks very much for your detailed reviews @jbrockmendel appreciate a lot!

I have addressed them and feel free to take another look! thanks!

@charlesdong1991
Copy link
Member Author

charlesdong1991 commented Apr 4, 2020

emm, seems the build is broken, @jbrockmendel doesn't seem to relate to this PR though

@charlesdong1991 charlesdong1991 requested review from jbrockmendel and removed request for jbrockmendel April 6, 2020 20:43
@jreback jreback added this to the 1.1 milestone Apr 6, 2020
@jreback jreback merged commit 2dd80c8 into pandas-dev:master Apr 6, 2020
@jreback
Copy link
Contributor

jreback commented Apr 6, 2020

thanks @charlesdong1991 very nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors DataFrame DataFrame data structure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

can't create DataFrame with multiIndex by list of list
5 participants