-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: assert_frame_equal exception for datetime #37609 #38157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
When using assert_frame_equal on two DataFrames that are equal, besides for their dtypes, one dtype being datetime64[ns] and the other not being an extension array, the code raises an exception due to an assertion that checks whether both are extension arrays. This fix addresses issue pandas-dev#37609.
pandas/_testing.py
Outdated
is_extension_array_dtype(left.dtype) and needs_i8_conversion(right.dtype) | ||
) or (is_extension_array_dtype(right.dtype) and needs_i8_conversion(left.dtype)): | ||
# If we have an extension array and Datetimearray / TimedeltaArray, we still | ||
# want to assert_extension_array_equal even though Datetimearray and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct, the dtypes of a DTA or TDA are not EA dtypes (well except a tz-aware DTA).
>>> is_extension_type(pd.DatetimeIndex([1, 2, 3]))
False
>>> is_extension_type(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
True
the check is ok, its the text that needs updating.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @jreback !
Thanks for the quick response. Are you saying that just the comment needs to be changed?
Maybe to:
If we have an extension array dtype and Datetime dtype / Timedelta dtype..
I'm a bit confused otherwise because I say later in the comment that
Datetimearray and TimedeltaArray dtypes are not an instance of ExtensionArraydtype subclass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comment again. its not about DTA or TDA themselve, rather their dtypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback
Would this suffice?
If we have an array of dtype EA and a DTA/TDA (whose dtype is not of EA), we still perform assert_extension_array_equal
@@ -272,6 +272,12 @@ def test_assert_frame_equal_ignore_extension_dtype_mismatch(right_dtype): | |||
tm.assert_frame_equal(left, right, check_dtype=False) | |||
|
|||
|
|||
def test_assert_frame_equal_datetime_dtype_mismatch(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you parameterize over timedelta, datetime64[ns, UTC], Period[D]
pandas/_testing.py
Outdated
elif ( | ||
is_extension_array_dtype(left.dtype) and needs_i8_conversion(right.dtype) | ||
) or (is_extension_array_dtype(right.dtype) and needs_i8_conversion(left.dtype)): | ||
# If we have an extension array and Datetimearray / TimedeltaArray, we still | ||
# want to assert_extension_array_equal even though Datetimearray and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot going on here.
I suggest extracting a function checking for this condition, with a proper docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may also want to reference original issue in the comment or docstring (#37609).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay! Will do.
Also added parameterization over test case.
pandas/_testing.py
Outdated
@@ -1866,6 +1873,18 @@ def assert_copy(iter1, iter2, **eql_kwargs): | |||
assert elem1 is not elem2, msg | |||
|
|||
|
|||
def is_ExtensionArrayDtype_and_needs_i8_conversion(left_dtype, right_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you rename to
is_extension_array_dtype_and_needs_i8_conversion (e.g. all lowercase)
ideally type left_dtype, right_dtype (I think we type these in other places)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and type the return type as bool
@@ -1866,6 +1873,18 @@ def assert_copy(iter1, iter2, **eql_kwargs): | |||
assert elem1 is not elem2, msg | |||
|
|||
|
|||
def is_ExtensionArrayDtype_and_needs_i8_conversion(left_dtype, right_dtype): | |||
""" | |||
Checks that we have the combination of an ExtensionArraydtype and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would just check a single dtype and do the or condition in on L1459
also pls add a whatsnew note for 1.2, bug fixes other is fine |
also added change to whatsnew note for 1.2
thanks @ssortman |
closes #37609
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Working with @RichardWong8 on this issue.
This addresses issue #37609 where a check
needs_i8_conversion(left.dtype) or needs_i8_conversion(right.dtype)
in assert_series_equal caused an assert_extension_array on left._values and right._values when DatetimeArray or TimedeltaArray were used. This caused an exception to be raised when either the right or left was not an instance of ExtensionArray but was used with DatetimeArray or TimedeltaArray.This would have been covered in the earlier check
is_extension_array_dtype(left.dtype) and is_extension_array_dtype(right.dtype)
, however, DatetimeArray and TimedeltaArray are still experimental and their dtypes are not instances of ExtensionDtype subclass.