Skip to content

BUG: NaT instead of error for timestamp concat with None #53042

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

srkds
Copy link
Contributor

@srkds srkds commented May 2, 2023

@srkds
Copy link
Contributor Author

srkds commented May 3, 2023

I missed adding an entry in the whatsnew file. So will add that, any other suggestions?

Thanks!

@@ -2358,7 +2359,7 @@ def _preprocess_slice_or_indexer(
def make_na_array(dtype: DtypeObj, shape: Shape, fill_value) -> ArrayLike:
if isinstance(dtype, DatetimeTZDtype):
# NB: exclude e.g. pyarrow[dt64tz] dtypes
i8values = np.full(shape, fill_value._value)
i8values = np.full(shape, NaT.value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason a test is failing is because in cases where fill_value is not None you need to keep it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the reason a test is failing is because in cases where fill_value is not None you need to keep it

Suggested change
i8values = np.full(shape, NaT.value)
if fill_value is None:
i8values = np.full(shape, NaT.value)
else:
i8values = np.full(shape, fill_value._value)
  1. Do you think this change could fulfill that?
  2. Depreciation warning
    I got a deprecation warning while debugging.
FutureWarning: The behavior of DataFrame concatenation with all-NA entries is deprecated. In a future version, this will no longer exclude all-NA columns when determining the result dtypes. To retain the old behavior, cast the all-NA columns to the desired dtype before the concat operation.

So I tried re-running the example by changing None dtype to datetime64[ns, UTC]. It worked and didn't raise any errors.

>>> data = [{'A': None}]
>>> df = pd.DataFrame(data, dtype="datetime64[ns, UTC]")
>>> data_2 = [{'A': pd.to_datetime("1990-12-20 00:00:00+00:00")}]
>>> df_2 = pd.DataFrame(data_2)
>>> # df.dtypes
>>> pd.concat([df, df_2])

# o/p
	A
0	NaT
0	1990-12-20 00:00:00+00:00

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try Timestamp(fill_value)._value

@mroeschke mroeschke added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 4, 2023
@srkds
Copy link
Contributor Author

srkds commented May 8, 2023

Hi,
Any suggestions or changes from my side?

@srkds
Copy link
Contributor Author

srkds commented May 10, 2023

One check is failing. Is there anything that I'm skipping?

@srkds srkds requested a review from jbrockmendel May 12, 2023 15:24
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jun 12, 2023
@jbrockmendel
Copy link
Member

One check is failing. Is there anything that I'm skipping?

I think unrelated. can you merge main and see if that does it

@@ -2405,7 +2406,7 @@ def _preprocess_slice_or_indexer(
def make_na_array(dtype: DtypeObj, shape: Shape, fill_value) -> ArrayLike:
if isinstance(dtype, DatetimeTZDtype):
# NB: exclude e.g. pyarrow[dt64tz] dtypes
i8values = np.full(shape, fill_value._value)
i8values = np.full(shape, Timestamp(fill_value)._value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this may mess up if your dtype has a different unit than the fill_value. can you add a test where this is the case? also needs a test for the original bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also needs a test for the original bug

My bad! I forgot to add that. Now added that👍
Thanks for the reminding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this may mess up if your dtype has a different unit than the fill_value. can you add a test where this is the case?

Are you talking about something like this? where the value will be timestamp but its dtype will be different?

df = pd.DataFrame([{'A':None}])
df_2 = pd.DataFrame([{'A': pd.to_datetime("1990-12-20 00:00:00+00:00")}], dtype="int32")
cd = pd.concat([df, df_2])

sorry! if I did not get it correctly. Could you please explain it in detail?

@aph3rson
Copy link

I'm running into issues related to #52093 - it seems like this PR has stalled, and should fix the issue we're hitting.
@jbrockmendel is there anything else needed for this PR, or can this be merged?

@jbrockmendel
Copy link
Member

can you merge main and ill take another look

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: AttributeError raised with pd.concat between a None and Timestamp
4 participants