DOC: fix to_numpy explanation for tz aware data #24595

Merged
merged 1 commit into from
Jan 3, 2019
21 changes: 0 additions & 21 deletions doc/source/basics.rst
@@ -99,27 +99,6 @@ are two possibly useful representations:

Timezones may be preserved with ``dtype=object``

.. ipython:: python

   ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
   ser.to_numpy(dtype=object)

Or thrown away with ``dtype='datetime64[ns]'``

ser.to_numpy(dtype="datetime64[ns]")

:meth:`~Series.to_numpy` gives some control over the ``dtype`` of the
resulting :class:`ndarray`. For example, consider datetimes with timezones.
NumPy doesn't have a dtype to represent timezone-aware datetimes, so there
are two possibly useful representations:

1. An object-dtype :class:`ndarray` with :class:`Timestamp` objects, each
with the correct ``tz``
2. A ``datetime64[ns]`` -dtype :class:`ndarray`, where the values have
been converted to UTC and the timezone discarded

Timezones may be preserved with ``dtype=object``
Member Author

@TomAugspurger can you double check here? But it seems this section was duplicated (added each time by two related PRs, probably some merging/rebasing left-over)

Contributor

👍


.. ipython:: python

   ser = pd.Series(pd.date_range('2000', periods=2, tz="CET"))
12 changes: 8 additions & 4 deletions doc/source/timeseries.rst
@@ -2425,21 +2425,25 @@ a convert on an aware stamp.
.. note::

Using :meth:`Series.to_numpy` on a ``Series``, returns a NumPy array of the data.
These values are converted to UTC, as NumPy does not currently support timezones (even though it is *printing* in the local timezone!).
NumPy does not currently support timezones (even though it is *printing* in the local timezone!),
Contributor

I don't think the statement about printing is true anymore.

therefore an object array of Timestamps is returned for timezone aware data:

.. ipython:: python

   s_naive.to_numpy()
   s_aware.to_numpy()

Further note that once converted to a NumPy array these would lose the tz tenor.
By converting to an object array of Timestamps, it preserves the timezone
information. For example, when converting back to a Series:

.. ipython:: python

   pd.Series(s_aware.to_numpy())

However, these can be easily converted:
However, if you want an actual NumPy ``datetime64[ns]`` array (with the values
converted to UTC) instead of an array of objects, you can specify the
``dtype`` argument:

.. ipython:: python

   pd.Series(s_aware.to_numpy()).dt.tz_localize('UTC').dt.tz_convert('US/Eastern')
   s_aware.to_numpy(dtype='datetime64[ns]')
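
The behaviour described in the updated note can be checked with a short standalone sketch. The ``s_aware`` Series built here is a hypothetical stand-in (the hunk does not show how the docs define it), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ``s_aware`` Series used in timeseries.rst
s_aware = pd.Series(pd.date_range("2013-01-01", periods=3, tz="US/Eastern"))

# Default for timezone-aware data: an object array of Timestamps,
# each of which still carries its timezone
as_objects = s_aware.to_numpy()

# Explicit dtype: a real datetime64[ns] array, values converted to UTC
# and the timezone discarded
as_utc = s_aware.to_numpy(dtype="datetime64[ns]")

print(as_objects.dtype)  # object
print(as_utc.dtype)      # datetime64[ns]

# Converting the object array back to a Series keeps the timezone
round_trip = pd.Series(as_objects)
print(round_trip.dtype)
```

Midnight in ``US/Eastern`` on 2013-01-01 corresponds to 05:00 UTC, so the first element of ``as_utc`` should be ``2013-01-01T05:00``, while the first element of ``as_objects`` remains a ``Timestamp`` at local midnight.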
4 changes: 2 additions & 2 deletions pandas/core/base.py
@@ -899,7 +899,6 @@ def to_numpy(self, dtype=None, copy=False):
``to_numpy()`` will return a NumPy array and the categorical dtype
will be lost.


For NumPy dtypes, this will be a reference to the actual data stored
in this Series or Index (assuming ``copy=False``). Modifying the result
in place will modify the data stored in the Series or Index (not that
@@ -910,7 +909,7 @@ def to_numpy(self, dtype=None, copy=False):
expensive. When you need a no-copy reference to the underlying data,
:attr:`Series.array` should be used instead.

This table lays out the different dtypes and return types of
This table lays out the different dtypes and default return types of
``to_numpy()`` for various dtypes within pandas.

================== ================================
@@ -920,6 +919,7 @@ def to_numpy(self, dtype=None, copy=False):
period             ndarray[object] (Periods)
interval           ndarray[object] (Intervals)
IntegerNA          ndarray[object]
datetime64[ns]     datetime64[ns]
datetime64[ns, tz] ndarray[object] (Timestamps)
================== ================================
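
A few rows of this table can be spot-checked with the sketch below. It covers only a subset of the listed dtypes, and the ``examples`` and ``default_dtypes`` names are made up for illustration; the exact resolution of the naive datetime result may differ between pandas versions, so only its kind is inspected.

```python
import pandas as pd

# One small Series per table row being checked (illustrative data only)
examples = {
    "period": pd.Series(pd.period_range("2000-01", periods=2, freq="M")),
    "interval": pd.Series(pd.interval_range(start=0, end=2)),
    "datetime64[ns]": pd.Series(pd.date_range("2000", periods=2)),
    "datetime64[ns, tz]": pd.Series(pd.date_range("2000", periods=2, tz="CET")),
}

# Call to_numpy() with no dtype argument to see the default return dtype
default_dtypes = {name: ser.to_numpy().dtype for name, ser in examples.items()}

for name, dtype in default_dtypes.items():
    print(f"{name:<20} {dtype}")
```

Passing ``dtype=`` explicitly, as in the timeseries.rst change above, overrides these defaults.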
