Public Data Followups

leftover from https://github.com/pandas-dev/pandas/pull/23623

1. Signature for `.to_numpy()`: @jorisvandenbossche proposed `copy=True`, which I think is good. Beyond that, we may want to control the "fidelity" of the conversion. Should `Series[datetime64[ns, tz]].to_numpy()` be an ndarray of Timestamp objets or an ndarray of dateimte64[ns] normalized to UTC (by default, and should we allow that to be controlled)? Can we hope for a set of keywords appropriate for all subtypes, or do we need to allow `kwargs`? *Perhaps* `to_numpy(copy=True, dtype=None)` will suffice?

2. Make `.array` always an ExtensionArray (via @shoyer). This gives pandas a bit more freedom going forward, since the *type* of `.array` will be stable if / when we flip over to Arrow arrays by default. We'll just swap out the data backing the ExtensionArray. A generic "NumpyBackedExtensionArray" is pretty easy to write (I had one in cyberpandas). My main concern here is that it makes the statement "`.array` is the actual data stored in the Series / Index" falseish, but that's OK.

3. Revert the breaking changes to `Series.values` for `period` and `interval` dtype data (cc @jschendel)? I think we should do this.

```python
In [3]: sper = pd.Series(pd.period_range('2000', periods=4))

In [4]: sper.values  # on master this is the PeriodArray
Out[4]:
array([Period('2000-01-01', 'D'), Period('2000-01-02', 'D'),
       Period('2000-01-03', 'D'), Period('2000-01-04', 'D')], dtype=object)

In [5]: sper.array
Out[5]:
<PeriodArray>
['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04']
Length: 4, dtype: period[D]
```

In terms of LOC, it's a simple change

```diff
@@ -1984,6 +1984,16 @@ class ExtensionBlock(NonConsolidatableMixIn, Block):
         return blocks, mask


+class ObjectValuesExtensionBlock(ExtensionBlock):
+    """Block for Interval / Period data.
+
+    Only needed for backwards compatability to ensure that
+    Series[T].values is an ndarray of objects.
+    """
+    def external_values(self, dtype=None):
+        return self.values.astype(object)
+
+
 class NumericBlock(Block):
     __slots__ = ()
     is_numeric = True
@@ -3004,6 +3014,8 @@ def get_block_type(values, dtype=None):

     if is_categorical(values):
         cls = CategoricalBlock
+    elif is_interval_dtype(dtype) or is_period_dtype(dtype):
+        cls = ObjectValuesExtensionBlock
```

There are a couple other places (like `Series._ndarray_values`) that assume "extension dtype means `.values` is an ExtensionArray", which I've surfaced on my DatetimeArray branch. We'll need to update those to use `.array` anyway.

---

- [x] `Series.to_numpy()` signature
- [ ] `Series.array` is always an EA
- [x] Revert breaking changes to `Series.values` for Period / Interval (#24163)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Public Data Followups #23995

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Public Data Followups #23995

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions