Skip to content

BUG: assert_frame_equal exception for datetime #37609 #38157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Dec 2, 2020

Conversation

ssortman
Copy link
Contributor

@ssortman ssortman commented Nov 29, 2020

closes #37609

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

Working with @RichardWong8 on this issue.

This addresses issue #37609 where a check needs_i8_conversion(left.dtype) or needs_i8_conversion(right.dtype) in assert_series_equal caused an assert_extension_array on left._values and right._values when DatetimeArray or TimedeltaArray were used. This caused an exception to be raised when either the right or left was not an instance of ExtensionArray but was used with DatetimeArray or TimedeltaArray.

This would have been covered in the earlier check is_extension_array_dtype(left.dtype) and is_extension_array_dtype(right.dtype), however, DatetimeArray and TimedeltaArray are still experimental and their dtypes are not instances of ExtensionDtype subclass.

When using assert_frame_equal on two DataFrames that are equal, besides for their dtypes, one dtype being datetime64[ns] and the other not being an extension array, the code raises an exception due to an assertion that checks whether both are extension arrays. This fix addresses issue pandas-dev#37609.
is_extension_array_dtype(left.dtype) and needs_i8_conversion(right.dtype)
) or (is_extension_array_dtype(right.dtype) and needs_i8_conversion(left.dtype)):
# If we have an extension array and Datetimearray / TimedeltaArray, we still
# want to assert_extension_array_equal even though Datetimearray and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not correct, the dtypes of a DTA or TDA are not EA dtypes (well except a tz-aware DTA).

    >>> is_extension_type(pd.DatetimeIndex([1, 2, 3]))
    False
    >>> is_extension_type(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
    True

the check is ok, its the text that needs updating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jreback !
Thanks for the quick response. Are you saying that just the comment needs to be changed?
Maybe to:

If we have an extension array dtype and Datetime dtype / Timedelta dtype..

I'm a bit confused otherwise because I say later in the comment that

Datetimearray and TimedeltaArray dtypes are not an instance of ExtensionArraydtype subclass

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment again. its not about DTA or TDA themselve, rather their dtypes

Copy link

@RichardWong8 RichardWong8 Nov 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
Would this suffice?

If we have an array of dtype EA and a DTA/TDA (whose dtype is not of EA), we still perform assert_extension_array_equal

@@ -272,6 +272,12 @@ def test_assert_frame_equal_ignore_extension_dtype_mismatch(right_dtype):
tm.assert_frame_equal(left, right, check_dtype=False)


def test_assert_frame_equal_datetime_dtype_mismatch():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize over timedelta, datetime64[ns, UTC], Period[D]

@jreback jreback added Testing pandas testing functions or related to the test suite Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 29, 2020
Comment on lines 1459 to 1463
elif (
is_extension_array_dtype(left.dtype) and needs_i8_conversion(right.dtype)
) or (is_extension_array_dtype(right.dtype) and needs_i8_conversion(left.dtype)):
# If we have an extension array and Datetimearray / TimedeltaArray, we still
# want to assert_extension_array_equal even though Datetimearray and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot going on here.
I suggest extracting a function checking for this condition, with a proper docstring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may also want to reference original issue in the comment or docstring (#37609).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay! Will do.

@@ -1866,6 +1873,18 @@ def assert_copy(iter1, iter2, **eql_kwargs):
assert elem1 is not elem2, msg


def is_ExtensionArrayDtype_and_needs_i8_conversion(left_dtype, right_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename to

is_extension_array_dtype_and_needs_i8_conversion (e.g. all lowercase)

ideally type left_dtype, right_dtype (I think we type these in other places)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and type the return type as bool

@@ -1866,6 +1873,18 @@ def assert_copy(iter1, iter2, **eql_kwargs):
assert elem1 is not elem2, msg


def is_ExtensionArrayDtype_and_needs_i8_conversion(left_dtype, right_dtype):
"""
Checks that we have the combination of an ExtensionArraydtype and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would just check a single dtype and do the or condition in on L1459

@jreback
Copy link
Contributor

jreback commented Dec 2, 2020

also pls add a whatsnew note for 1.2, bug fixes other is fine

@jreback jreback added this to the 1.2 milestone Dec 2, 2020
also added change to whatsnew note for 1.2
@jreback jreback merged commit bda4bc3 into pandas-dev:master Dec 2, 2020
@jreback
Copy link
Contributor

jreback commented Dec 2, 2020

thanks @ssortman

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pandas.testing.assert_frame_equal with check_dtype=False unexpectedly raises an exception
4 participants