Skip to content

Added test test_datetimeField_after_setitem for issue #6942 #28790

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from
35 changes: 35 additions & 0 deletions pandas/tests/generic/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@

import numpy as np
import pytest
from datetime import datetime
import time

import pandas.util._test_decorators as td

Expand Down Expand Up @@ -287,3 +289,36 @@ def test_deepcopy_empty(self):
empty_frame_copy = deepcopy(empty_frame)

self._compare(empty_frame_copy, empty_frame)

def test_datetimeField_after_setitem(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the use of at in the title: test_datetime_setitem_with_at

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

# This test covers the unexpected behaviour of datetimeField when using
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datetimeField -> datetime64 column ?

# setitem on another column as reported in issue #6942

start = pd.to_datetime("20140401")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use pd.Timestamp here


df = pd.DataFrame(
index=pd.date_range(start, periods=1), columns=["timenow", "Live"]
)

df.at[start, "timenow"] = datetime.today() # initial time.
time1 = df.at[start, "timenow"]

time.sleep(1) # sleep time of 1 second in between assignments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sleep calls are not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests were failing with AssertionError, so added those.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why were AssertionErrors being raised?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the same code on my computer and got the following results:
time1: 2019-10-04 23:23:31.911241
time2: 2019-10-04 23:23:31.911534
time3: 2019-10-04 23:23:31.912060
that is a time difference of 10^(-3) between the time1 and time2.Now if both the statements are executed within a time frame of <= 10^(-6) then the two will match. So that's why I thought of adding the sleep in between time1, time2 and time3 to ascertain this doesn't happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests should generally be

result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])
var = datetime.today()
result.at[start,'timenow'] = var
result.Live = True
expected = pd.DataFrame([[var, True]],index=pd.date_range(start,periods=1), columns=['timenow','Live'])
tm.assert_frame_equal(result, expected) 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I misread the problem and was writing the test to check if the timenow column changes value after setting the Live column with "at".
So, what the test should check is that the dtype of timenow column remains unchanged after setting the Live column to True with at. For that I could use the assert_series_equal and check the dtype of timenow before and after.
Is that right?

Copy link
Member

@mroeschke mroeschke Oct 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should check that the timenow column changes once df.Live = True

i.e. the example in #6942 (comment) is the fixed behavior that we want to test for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some points:

  1. For the general template of tests that is used in pandas/tests, most of them assert that either two DataFrames or Series are equal and call the predefined functions tm.assert_something_equal. However in this case we assert the column actually changes value and I couldn't find a function for such a case in tm and therefore wrote the individual assert statements for the different values in the timenow column(asserting that they are not equal).

  2. Without the sleep statements inserted, some checks were failing. When I viewed the log of the failed tests I observed that the error was because
    assert not time1 == time2
    was evaluating to
    assert not datetime.datetime(2019, 10, 4, 16, 12, 38, 392372) == datetime.datetime(2019, 10, 4, 16, 12, 38, 392372)
    So I realized that this could only occur if both the statements are run within a time-frame < 10^(-6). When I added the sleep statements the checks passed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will still use tm.assert_frame_equal. You will have to construct the expected dataframe from scratch

In [4]: result = pd.DataFrame(index=pd.date_range(start,periods=1), columns=['timenow','Live'])

In [5]: result.at[start,'timenow'] = datetime.today() # initial value


In [7]: new_date =  datetime.today() 

In [9]: result.Live = True

In [10]: result.at[start,'timenow'] = new_date 

In [11]: expected = pd.DataFrame([[new_date, True]], index=pd.date_range(start,periods=1), columns=['timenow','Live'])

In [12]: tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, couldn't come up with that! Will make the necessary changes and resubmit the PR.


df.at[
start, "timenow"
] = datetime.today() # modified time before 'Live' column is set.
time2 = df.at[start, "timenow"]

df.Live = True # setting the 'Live' column to True.

time.sleep(1) # sleep time of 1 second in between assignments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These sleep calls are not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some checks were failing with AssertionError, so I added those


df.at[
start, "timenow"
] = datetime.today() # modified time after 'Live' column is set.
time3 = df.at[start, "timenow"]

assert not time1 == time2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would like to construct 2 dataframes here and use assert_frame_equal to compare them

result = ...
expected = ...
tm.assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of the same but the whole point of the test is to make sure that the datetimeField changes with each assignment and as such would require to assert that frames are not equal, that's went with this. Couldn't find a more elegant way round this.

assert not time2 == time3
assert not time3 == time1