-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: AssertionError on Series.append(DataFrame) fix #30975 #31036
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @hvardhan20! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-01-21 06:02:52 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hvardhan20 for the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hvardhan20 also need to run black pandas before a commit, see https://dev.pandas.io/docs/development/contributing.html#python-pep8-black
pandas/core/series.py
Outdated
@@ -2538,6 +2538,10 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri | |||
to_concat = [self] | |||
to_concat.extend(to_append) | |||
else: | |||
if not isinstance(to_append, type(self)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think should allow a Series to be appended to a subclassed Series
if not isinstance(to_append, type(self)): | |
if not isinstance(to_append, Series): |
also need to move the check into above condition to check the items of a sequence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or instead of inside the else, loop over to_concat[1:] outside the else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the hint @simonjayhawkins.
Could you please suggest the better way to do this between the following 2 ways:
1)
if not isinstance(to_append, type(self)): | |
for x in to_concat[1:]: | |
if not isinstance(x, Series): | |
msg = ( | |
f"to_append should be a Series or list/tuple of Series, " | |
f"got {type(to_append).__name__}" | |
f'{(" of " + type(x).__name__) if isinstance(to_append, (list, tuple)) else ""}' | |
) | |
raise TypeError(msg) |
if not isinstance(to_append, type(self)): | |
if any(not isinstance(x, Series) for x in to_concat[1:]): | |
msg = ( | |
f"to_append should be a Series or list/tuple of Series, " | |
f"got {type(to_append).__name__}" | |
) | |
raise TypeError(msg) |
Both go after the else. The problem with 2nd way is we cannot get the type of the sequence element which raises the Error. If there's a better way to do this, please do share.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test case test_concatlike_same_dtypes is failing in pipeline, which is testing the appending of various data types and expecting to catch and raise a TypeError in Concatenator constructor.
With if not isinstance(x, Series):
to_append will catch & raise TypeError for all data types.
Hence test_concatlike_same_dtypes is failing.
I think we should consider the following change:
if not isinstance(to_append, type(self)): | |
for x in to_concat[1:]: | |
if isinstance(x, pd.DataFrame): | |
raise TypeError | |
What are your thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC correctly from the issue the behaviour of adding a DataFrame needs to be revisited, so I agree that maybe checking for a DataFrame maybe better than checking for not a Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the simple loop of step 1 (fails faster) and the simplicity of message 2.
msg = "to_append should be a Series or list/tuple of Series, got " \ | ||
"<class 'pandas.core.frame.DataFrame'>" | ||
with pytest.raises(TypeError, match=msg): | ||
df.A.append(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to add test to check items in a list/tuple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an acceptable test case for testing items in sequence?
df.A.append(df) | |
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}) | |
li = [df.B, df] | |
msg = "to_append should be a Series or list/tuple of Series, got list of DataFrame" | |
with pytest.raises(TypeError, match=msg): | |
df.A.append(li) |
53f1c11
to
1e5e8e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hvardhan20. a few more suggestions otherwise lgtm.
pandas/core/series.py
Outdated
if isinstance(x, (pd.DataFrame,)): | ||
msg = ( | ||
f"to_append should be a Series or list/tuple of Series, " | ||
f"got {type(to_append).__name__}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may make the error message clearer in the case where the sequence contains a DataFame
f"got {type(to_append).__name__}" | |
f"got {type(x).__name__}" |
pandas/core/series.py
Outdated
@@ -2539,6 +2539,13 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri | |||
to_concat.extend(to_append) | |||
else: | |||
to_concat = [self, to_append] | |||
for x in to_concat[1:]: | |||
if isinstance(x, (pd.DataFrame,)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(x, (pd.DataFrame,)): | |
if isinstance(x, ABCDataFrame): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of checking if its a dataframe, would be better to check not an acceptable type, e.g. Series, list-like of Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC other incorrect types are caught by concat. This PR will be backported so just minimising the changes here to fix the regression (raises AssertionError) for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could catch the AssertionError and raise improved message instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i would rather not backport this, let's actually fix this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no i would rather not backport this, let's actually fix this properly.
and backport the proper fix?
I think we need something backported. to get back to the (dubious) 0.25.3 behaviour could just remove the self._ensure_type
call added for typing purposes that is causing the AssertionError in 1.0.0rc0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need something backported. to get back to the (dubious) 0.25.3 behaviour could just remove the self._ensure_type call added for typing purposes that is causing the AssertionError in 1.0.0rc0.
ok let's do that then
df.A.append(df) | ||
msg = "to_append should be a Series or list/tuple of Series, got list" | ||
with pytest.raises(TypeError, match=msg): | ||
df.A.append(li) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could use just the minimum to invoke TypeError
df.A.append(li) | |
df.A.append([df]) |
pandas/core/series.py
Outdated
@@ -2539,6 +2539,13 @@ def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Seri | |||
to_concat.extend(to_append) | |||
else: | |||
to_concat = [self, to_append] | |||
for x in to_concat[1:]: | |||
if isinstance(x, (pd.DataFrame,)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of checking if its a dataframe, would be better to check not an acceptable type, e.g. Series, list-like of Series.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hvardhan20 the direction of this PR should now be #31036 (comment) instead of what was proposed in #30975 (comment)
@@ -2462,7 +2462,7 @@ def searchsorted(self, value, side="left", sorter=None): | |||
# ------------------------------------------------------------------- | |||
# Combination | |||
|
|||
def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Series": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than deleting the annotation, can you use FrameOrSeriesUnion from pandas._typing instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins
FrameOrSeriesUnion is throwing this error during static analysis:
Performing static analysis using mypy
pandas/core/frame.py:2590: error: Incompatible types in assignment (expression has type "Union[DataFrame, Series]", variable has type "Series")
And FrameOrSeries is throwing this:
Performing static analysis using mypy
pandas/core/series.py:2505: error: Incompatible return value type (got "Union[DataFrame, Series]", expected "FrameOrSeries")
I guess this is why there was no type hint initially for 0.25.3 version of append.
What do you think about this?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK perhaps we should keep this PR to just fixing the regression. so removing the return annotation is fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2462,7 +2462,7 @@ def searchsorted(self, value, side="left", sorter=None): | |||
# ------------------------------------------------------------------- | |||
# Combination | |||
|
|||
def append(self, to_append, ignore_index=False, verify_integrity=False) -> "Series": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK perhaps we should keep this PR to just fixing the regression. so removing the return annotation is fine here.
This seems fine for backport. What's the plan for 1.0.1 / 1.1? For this to raise a TypeError? Do we have an issue for that? |
thanks @hvardhan20 can you open an issue as @TomAugspurger suggest #31036 (comment) |
#30975 (#31267) Co-authored-by: Harshavardhan Bachina <[email protected]>
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Series.append() does not throw AssertionError anymore. This fix performs a precheck before proceeding with concatenation. Tests have passed. Whatsnew entry added.
Thanks!