Skip to content

BUG: AssertionError on Series.append(DataFrame) fix #30975 #31036

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jan 24, 2020

Conversation

hvardhan20
Copy link
Contributor

@hvardhan20 hvardhan20 commented Jan 15, 2020

Series.append() does not throw AssertionError anymore. This fix performs a precheck before proceeding with concatenation. Tests have passed. Whatsnew entry added.
Thanks!

@pep8speaks
Copy link

pep8speaks commented Jan 15, 2020

Hello @hvardhan20! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-01-21 06:02:52 UTC

@hvardhan20 hvardhan20 changed the title Series append fix Series.append(DataFrame) AssertionError Fix #30975 Jan 15, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hvardhan20 for the PR.

@simonjayhawkins simonjayhawkins added Error Reporting Incorrect or improved errors from pandas Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 15, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hvardhan20 also need to run black pandas before a commit, see https://dev.pandas.io/docs/development/contributing.html#python-pep8-black

@@ -2538,6 +2538,10 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri
to_concat = [self]
to_concat.extend(to_append)
else:
if not isinstance(to_append, type(self)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should allow a Series to be appended to a subclassed Series

Suggested change
if not isinstance(to_append, type(self)):
if not isinstance(to_append, Series):

also need to move the check into above condition to check the items of a sequence

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or instead of inside the else, loop over to_concat[1:] outside the else

Copy link
Contributor Author

@hvardhan20 hvardhan20 Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the hint @simonjayhawkins.
Could you please suggest the better way to do this between the following 2 ways:
1)

Suggested change
if not isinstance(to_append, type(self)):
for x in to_concat[1:]:
if not isinstance(x, Series):
msg = (
f"to_append should be a Series or list/tuple of Series, "
f"got {type(to_append).__name__}"
f'{(" of " + type(x).__name__) if isinstance(to_append, (list, tuple)) else ""}'
)
raise TypeError(msg)
Suggested change
if not isinstance(to_append, type(self)):
if any(not isinstance(x, Series) for x in to_concat[1:]):
msg = (
f"to_append should be a Series or list/tuple of Series, "
f"got {type(to_append).__name__}"
)
raise TypeError(msg)

Both go after the else. The problem with 2nd way is we cannot get the type of the sequence element which raises the Error. If there's a better way to do this, please do share.
Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test case test_concatlike_same_dtypes is failing in pipeline, which is testing the appending of various data types and expecting to catch and raise a TypeError in Concatenator constructor.
With if not isinstance(x, Series): to_append will catch & raise TypeError for all data types.
Hence test_concatlike_same_dtypes is failing.

I think we should consider the following change:

Suggested change
if not isinstance(to_append, type(self)):
for x in to_concat[1:]:
if isinstance(x, pd.DataFrame):
raise TypeError

What are your thoughts on this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC correctly from the issue the behaviour of adding a DataFrame needs to be revisited, so I agree that maybe checking for a DataFrame maybe better than checking for not a Series.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the simple loop of step 1 (fails faster) and the simplicity of message 2.

msg = "to_append should be a Series or list/tuple of Series, got " \
"<class 'pandas.core.frame.DataFrame'>"
with pytest.raises(TypeError, match=msg):
df.A.append(df)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also need to add test to check items in a list/tuple

Copy link
Contributor Author

@hvardhan20 hvardhan20 Jan 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an acceptable test case for testing items in sequence?

Suggested change
df.A.append(df)
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
li = [df.B, df]
msg = "to_append should be a Series or list/tuple of Series, got list of DataFrame"
with pytest.raises(TypeError, match=msg):
df.A.append(li)

@hvardhan20 hvardhan20 changed the title Series.append(DataFrame) AssertionError Fix #30975 BUG: AssertionError on Series.append(DataFrame) fix #30975 Jan 17, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hvardhan20. a few more suggestions otherwise lgtm.

if isinstance(x, (pd.DataFrame,)):
msg = (
f"to_append should be a Series or list/tuple of Series, "
f"got {type(to_append).__name__}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may make the error message clearer in the case where the sequence contains a DataFame

Suggested change
f"got {type(to_append).__name__}"
f"got {type(x).__name__}"

@@ -2539,6 +2539,13 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri
to_concat.extend(to_append)
else:
to_concat = [self, to_append]
for x in to_concat[1:]:
if isinstance(x, (pd.DataFrame,)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if isinstance(x, (pd.DataFrame,)):
if isinstance(x, ABCDataFrame):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking if its a dataframe, would be better to check not an acceptable type, e.g. Series, list-like of Series.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC other incorrect types are caught by concat. This PR will be backported so just minimising the changes here to fix the regression (raises AssertionError) for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could catch the AssertionError and raise improved message instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i would rather not backport this, let's actually fix this properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i would rather not backport this, let's actually fix this properly.

and backport the proper fix?

I think we need something backported. to get back to the (dubious) 0.25.3 behaviour could just remove the self._ensure_type call added for typing purposes that is causing the AssertionError in 1.0.0rc0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need something backported. to get back to the (dubious) 0.25.3 behaviour could just remove the self._ensure_type call added for typing purposes that is causing the AssertionError in 1.0.0rc0.

ok let's do that then

df.A.append(df)
msg = "to_append should be a Series or list/tuple of Series, got list"
with pytest.raises(TypeError, match=msg):
df.A.append(li)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could use just the minimum to invoke TypeError

Suggested change
df.A.append(li)
df.A.append([df])

@simonjayhawkins simonjayhawkins added this to the 1.0.0 milestone Jan 18, 2020
@@ -2539,6 +2539,13 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri
to_concat.extend(to_append)
else:
to_concat = [self, to_append]
for x in to_concat[1:]:
if isinstance(x, (pd.DataFrame,)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of checking if its a dataframe, would be better to check not an acceptable type, e.g. Series, list-like of Series.

@jreback jreback modified the milestone: 1.0.0 Jan 18, 2020
@hvardhan20 hvardhan20 requested a review from jreback January 19, 2020 07:35
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvardhan20 the direction of this PR should now be #31036 (comment) instead of what was proposed in #30975 (comment)

@@ -2462,7 +2462,7 @@ def searchsorted(self, value, side="left", sorter=None):
# -------------------------------------------------------------------
# Combination

def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Series":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rather than deleting the annotation, can you use FrameOrSeriesUnion from pandas._typing instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins
FrameOrSeriesUnion is throwing this error during static analysis:
Performing static analysis using mypy
pandas/core/frame.py:2590: error: Incompatible types in assignment (expression has type "Union[DataFrame, Series]", variable has type "Series")

And FrameOrSeries is throwing this:
Performing static analysis using mypy
pandas/core/series.py:2505: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "FrameOrSeries")

I guess this is why there was no type hint initially for 0.25.3 version of append.
What do you think about this?

Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK perhaps we should keep this PR to just fixing the regression. so removing the return annotation is fine here.

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version and removed Error Reporting Incorrect or improved errors from pandas labels Jan 19, 2020
Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -2462,7 +2462,7 @@ def searchsorted(self, value, side="left", sorter=None):
# -------------------------------------------------------------------
# Combination

def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Series":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK perhaps we should keep this PR to just fixing the regression. so removing the return annotation is fine here.

@TomAugspurger
Copy link
Contributor

This seems fine for backport.

What's the plan for 1.0.1 / 1.1? For this to raise a TypeError? Do we have an issue for that?

@jreback
Copy link
Contributor

jreback commented Jan 24, 2020

thanks @hvardhan20

can you open an issue as @TomAugspurger suggest #31036 (comment)

jreback pushed a commit that referenced this pull request Jan 24, 2020
@hvardhan20 hvardhan20 deleted the series_append_fix branch January 24, 2020 20:46
@hvardhan20 hvardhan20 mentioned this pull request Feb 19, 2020
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AssertionError from Series.append(DataFrame) with 1.0.0rc0
5 participants