timedelta on datetime df, series arithmetic #59844

Closed
wants to merge 2 commits into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -58,6 +58,7 @@ Other enhancements
- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
- :meth:`str.get_dummies` now accepts a ``dtype`` parameter to specify the dtype of the resulting DataFrame (:issue:`47872`)
- Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
- Arithmetic operations between datetime64 :class:`DataFrame` and :class:`Series` objects now return :class:`Timedelta` values instead of raising a ``TypeError`` (:issue:`59529`)
- Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
- Support passing a :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
- Support reading Stata 102-format (Stata 1) dta files (:issue:`58978`)
2 changes: 2 additions & 0 deletions pandas/core/internals/managers.py
@@ -997,6 +997,8 @@ def _make_na_block(
            dtype = interleaved_dtype([blk.dtype for blk in self.blocks])
            if dtype is not None and np.issubdtype(dtype.type, np.floating):
                fill_value = dtype.type(fill_value)
            if dtype is not None and np.issubdtype(dtype.type, np.datetime64):
                fill_value = np.datetime64("NaT")
Member

Can you also pass an appropriate unit to np.datetime64?

Contributor Author

Thanks for the feedback! Could you clarify what you mean by unit?
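
For context on the question above: NumPy's `datetime64` values carry a resolution unit (such as `s`, `ms`, or `ns`), and `np.datetime64("NaT")` without a unit produces a generic, unit-less value. The sketch below shows the difference and one possible way to derive the unit from a NumPy dtype. The helper name `nat_for_dtype` is hypothetical and is not part of this PR or of pandas, and whether this matches what the reviewer had in mind is an assumption.

```python
import numpy as np

# Without a unit, NumPy creates a "generic" datetime64 NaT.
generic_nat = np.datetime64("NaT")
# With a unit, the NaT carries that resolution in its dtype.
ns_nat = np.datetime64("NaT", "ns")

print(generic_nat.dtype)  # datetime64
print(ns_nat.dtype)       # datetime64[ns]


def nat_for_dtype(dtype: np.dtype) -> np.datetime64:
    """Hypothetical helper: build a NaT whose unit matches a NumPy datetime64 dtype."""
    unit, _count = np.datetime_data(dtype)
    return np.datetime64("NaT", unit)


print(nat_for_dtype(np.dtype("datetime64[s]")).dtype)  # datetime64[s]
```

This only covers plain NumPy `datetime64` dtypes; timezone-aware pandas dtypes would need separate handling.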


        shape = (len(placement), self.shape[1])

33 changes: 30 additions & 3 deletions pandas/tests/frame/test_arithmetic.py
@@ -1038,9 +1038,9 @@ def test_frame_with_frame_reindex(self):
        [
            (1, "i8"),
            (1.0, "f8"),
-            (2**63, "f8"),
+            (2 ** 63, "f8"),
            (1j, "complex128"),
-            (2**63, "complex128"),
+            (2 ** 63, "complex128"),
            (True, "bool"),
            (np.timedelta64(20, "ns"), "<m8[ns]"),
            (np.datetime64(20, "ns"), "<M8[ns]"),
@@ -1147,6 +1147,33 @@ def test_arithmetic_midx_cols_different_dtypes_different_order(self):
        expected = DataFrame([[-1, 1], [-1, 1]], columns=midx)
        tm.assert_frame_equal(result, expected)

    @pytest.mark.xfail(reason="NaT op NaT results in datetime instead of timedelta")
Contributor Author

@jbrockmendel in this test, the comparison fails because the NaT values in the result have a datetime64 dtype, while the NaT values in the expected frame have a timedelta dtype. I would expect NaT - NaT to produce a timedelta. Is the current behavior expected?
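
As background for the question above, here is a minimal sketch of how NaT subtraction behaves at three levels: NumPy scalars, a pandas Series with a known dtype, and the dtype-less scalar `pd.NaT`. It does not settle whether the frame-level result in the failing test is intended; it only illustrates why a column of NaT values needs a dtype from somewhere.

```python
import numpy as np
import pandas as pd

# NumPy: subtracting two datetime64 values yields a timedelta64, even for NaT.
np_diff = np.datetime64("NaT", "ns") - np.datetime64("NaT", "ns")
print(np_diff.dtype)  # timedelta64[ns]

# pandas Series: the datetime64[ns] dtype is known, so the result is timedelta64[ns].
ser = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
ser_diff = ser - ser
print(ser_diff.dtype)  # timedelta64[ns]

# The scalar pd.NaT has no dtype of its own; it stands in for both a missing
# datetime and a missing timedelta, so the subtraction result is still missing.
scalar_diff = pd.NaT - pd.NaT
print(pd.isna(scalar_diff))  # True
```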

    def test_arithmetic_datetime_df_series(self):
        # GH#59529
        df_datetime = DataFrame([[1, 2], [3, 4]]).astype("datetime64[ns]")
        ser_datetime = Series([5, 6, 7]).astype("datetime64[ns]")
        result = df_datetime - ser_datetime
        expected = DataFrame(
            [
                [pd.Timedelta(-4), pd.Timedelta(-4), pd.Timedelta(pd.NaT)],
                [pd.Timedelta(-2), pd.Timedelta(-2), pd.Timedelta(pd.NaT)],
            ]
        )
        tm.assert_frame_equal(result, expected)

    @pytest.mark.xfail(reason="NaT op NaT results in datetime instead of timedelta")
    def test_arithmetic_timestamp_timedelta(self):
Member

IIUC these tests are mainly about the alignment behavior, so they belong in something like tests.frame.test_arithmetic

Contributor Author

The tests are currently in pandas/tests/frame/test_arithmetic.py. Should they be moved somewhere else?

        # GH#59529
        df_timestamp = DataFrame([pd.Timestamp(1)])
        ser_timedelta = Series([pd.Timedelta(2), pd.Timedelta(3)])
        result = df_timestamp - ser_timedelta
        expected = DataFrame(
            [
                [pd.Timestamp(-1), pd.NaT],
            ]
        )
        tm.assert_frame_equal(result, expected)
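
The alignment behavior the reviewer refers to can be sketched with plain integers, which avoids the datetime-specific question entirely. In a DataFrame-Series operation the Series index is aligned against the DataFrame columns, so a Series label with no matching column produces an extra column of missing values. This is an illustration only and is not part of the PR's test suite.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])  # columns 0 and 1
ser = pd.Series([5, 6, 7])           # index 0, 1 and 2

# The Series index (0, 1, 2) is matched against the DataFrame columns (0, 1);
# label 2 has no counterpart in df, so that column is all-missing in the result.
result = df - ser
print(list(result.columns))  # [0, 1, 2]
print(result)
```

The two tests above follow the same pattern, which is why each expected frame has one more column than its input frame, filled with NaT.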


def test_frame_with_zero_len_series_corner_cases():
    # GH#28600
@@ -1913,7 +1940,7 @@ def test_pow_with_realignment():
    left = DataFrame({"A": [0, 1, 2]})
    right = DataFrame(index=[0, 1, 2])

-    result = left**right
+    result = left ** right
    expected = DataFrame({"A": [np.nan, 1.0, np.nan]})
    tm.assert_frame_equal(result, expected)
