Commit cd24fa9

ENH: add origin to to_datetime
closes #11276
closes #11745
superseded #11470

Author: Jeff Reback <[email protected]>
Author: Sumit Binnani <[email protected]>

Closes #15828 from jreback/datetime-unit and squashes the following commits:

ebb4acd [Jeff Reback] doc fixes & cleanup
209591a [Jeff Reback] bug fix
56663a5 [Jeff Reback] add Timedelta floordiv ops
a24e88c [Jeff Reback] rename epoch -> unix
6a8a779 [Jeff Reback] update docs / tests
ad7356e [Sumit Binnani] BUG: Series creation with datetime64 with non-ns unit as object dtype
1 parent 67cc021 commit cd24fa9

8 files changed: +317 -26 lines

doc/source/timeseries.rst

+25-1
@@ -252,7 +252,8 @@ Epoch Timestamps
 
 It's also possible to convert integer or float epoch times. The default unit
 for these is nanoseconds (since these are how ``Timestamp`` s are stored). However,
-often epochs are stored in another ``unit`` which can be specified:
+often epochs are stored in another ``unit`` which can be specified. These are computed
+from the starting point specified by the :ref:`Origin Parameter <timeseries.origin>`.
 
 Typical epoch stored units
 
@@ -276,6 +277,29 @@ These *work*, but the results may be unexpected.
 
 Epoch times will be rounded to the nearest nanosecond.
 
+.. _timeseries.origin:
+
+Using the Origin Parameter
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.20.0
+
+Using the ``origin`` parameter, one can specify an alternative starting point for creation
+of a ``DatetimeIndex``.
+
+Start with 1960-01-01 as the starting date
+
+.. ipython:: python
+
+   pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))
+
+The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``.
+Commonly called 'unix epoch' or POSIX time.
+
+.. ipython:: python
+
+   pd.to_datetime([1, 2, 3], unit='D')
+
 .. _timeseries.daterange:
 
 Generating Ranges of Timestamps
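The ``origin`` documentation in this file describes each converted value as an offset of N units from a starting point. A minimal stdlib-only sketch of that arithmetic, assuming a hypothetical helper `offsets_to_datetimes` (not part of pandas) and only the units that `datetime.timedelta` represents exactly ('D', 's', 'ms', 'us'); pandas itself works on int64 nanoseconds and also accepts 'ns':

```python
from datetime import datetime, timedelta

# Hypothetical helper, not pandas API: models
# to_datetime(values, unit=..., origin=...) as "origin + value * unit".
UNIT_STEPS = {
    "D": timedelta(days=1),
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}
UNIX_EPOCH = datetime(1970, 1, 1)  # what origin='unix' stands for


def offsets_to_datetimes(values, unit="D", origin="unix"):
    """Return one datetime per numeric offset, counted from `origin`."""
    if origin == "unix":
        start = UNIX_EPOCH
    elif isinstance(origin, datetime):
        start = origin
    else:
        # Assumes an ISO 8601 date string such as '1960-01-01'.
        start = datetime.fromisoformat(origin)
    step = UNIT_STEPS[unit]
    return [start + value * step for value in values]


# 1960-01-02, 1960-01-03, 1960-01-04
print(offsets_to_datetimes([1, 2, 3], unit="D", origin=datetime(1960, 1, 1)))
# 1970-01-02, 1970-01-03, 1970-01-04 (default 'unix' origin)
print(offsets_to_datetimes([1, 2, 3], unit="D"))
```

These are the same dates the two `ipython` examples in the diff compute: a 1960-01-01 origin shifts every value by ten years relative to the default Unix epoch.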

doc/source/whatsnew/v0.20.0.txt

+22-2
@@ -27,7 +27,6 @@ Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations
 New features
 ~~~~~~~~~~~~
 
-
 .. _whatsnew_0200.enhancements.dataio_dtype:
 
 ``dtype`` keyword for data IO
@@ -55,6 +54,27 @@ fixed-width text files, and :func:`read_excel` for parsing Excel files.
    pd.read_fwf(StringIO(data)).dtypes
    pd.read_fwf(StringIO(data), dtype={'a':'float64', 'b':'object'}).dtypes
 
+.. _whatsnew_0120.enhancements.datetime_origin:
+
+``.to_datetime()`` has gained an ``origin`` parameter
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date
+from where to compute the resulting ``DatetimeIndex``. (:issue:`11276`, :issue:`11745`)
+
+Start with 1960-01-01 as the starting date
+
+.. ipython:: python
+
+   pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))
+
+The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``.
+Commonly called 'unix epoch' or POSIX time.
+
+.. ipython:: python
+
+   pd.to_datetime([1, 2, 3], unit='D')
+
 .. _whatsnew_0200.enhancements.groupby_access:
 
 Groupby Enhancements
@@ -317,7 +337,7 @@ Other Enhancements
 - ``pd.DataFrame.to_latex`` and ``pd.DataFrame.to_string`` now allow optional header aliases. (:issue:`15536`)
 - Re-enable the ``parse_dates`` keyword of ``read_excel`` to parse string columns as dates (:issue:`14326`)
 - Added ``.empty`` property to subclasses of ``Index``. (:issue:`15270`)
-
+- Enabled floor division for ``Timedelta`` and ``TimedeltaIndex`` (:issue:`15828`)
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
 - ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
 - ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`)
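The tests in this commit also exercise `origin='julian'`, where the inputs are Julian Day numbers and only `unit='D'` is accepted. `test_to_basic` expects the same result as subtracting `pd.Timestamp(0).to_julian_date()`, the Julian Day of the Unix epoch (2440587.5), and converting with the default origin. A stdlib sketch of that conversion and its inverse; both helper names are hypothetical:

```python
from datetime import datetime, timedelta

# Julian Day number of 1970-01-01 00:00:00. Julian Days begin at noon,
# which is why the epoch falls on a half day.
UNIX_EPOCH_JULIAN_DAY = 2440587.5
UNIX_EPOCH = datetime(1970, 1, 1)


def julian_day_to_datetime(jd):
    """Hypothetical helper: Julian Day number -> naive datetime."""
    return UNIX_EPOCH + timedelta(days=jd - UNIX_EPOCH_JULIAN_DAY)


def datetime_to_julian_day(dt):
    """Hypothetical inverse of julian_day_to_datetime."""
    return (dt - UNIX_EPOCH) / timedelta(days=1) + UNIX_EPOCH_JULIAN_DAY


moment = julian_day_to_datetime(2456658)
print(moment)                          # 2013-12-31 12:00:00
print(datetime_to_julian_day(moment))  # 2456658.0
```

Converting back yields the original number, which is the round trip `test_julian_round_trip` checks. For very small Julian Days such as 1 the result lies far outside the representable range, and that test expects pandas to raise `ValueError`.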

pandas/_libs/tslib.pyx

+35-4
@@ -3073,6 +3073,7 @@ class Timedelta(_Timedelta):
         return np.timedelta64(self.value, 'ns')
 
     def _validate_ops_compat(self, other):
+
         # return True if we are compat with operating
         if _checknull_with_nat(other):
             return True
@@ -3179,11 +3180,41 @@ class Timedelta(_Timedelta):
     __div__ = __truediv__
     __rdiv__ = __rtruediv__
 
-    def _not_implemented(self, *args, **kwargs):
-        return NotImplemented
+    def __floordiv__(self, other):
+
+        if hasattr(other, 'dtype'):
+
+            # work with i8
+            other = other.astype('m8[ns]').astype('i8')
+
+            return self.value // other
 
-    __floordiv__ = _not_implemented
-    __rfloordiv__ = _not_implemented
+        # integers only
+        if is_integer_object(other):
+            return Timedelta(self.value // other, unit='ns')
+
+        if not self._validate_ops_compat(other):
+            return NotImplemented
+
+        other = Timedelta(other)
+        if other is NaT:
+            return np.nan
+        return self.value // other.value
+
+    def __rfloordiv__(self, other):
+        if hasattr(other, 'dtype'):
+
+            # work with i8
+            other = other.astype('m8[ns]').astype('i8')
+            return other // self.value
+
+        if not self._validate_ops_compat(other):
+            return NotImplemented
+
+        other = Timedelta(other)
+        if other is NaT:
+            return NaT
+        return other.value // self.value
 
     def _op_unary_method(func, name):
 
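The new `__floordiv__` and `__rfloordiv__` operate on the integer nanosecond `value`. Read from the diff: an integer divisor produces a new `Timedelta`, a timedelta-like divisor produces a plain integer count, and `NaT` produces `np.nan` (or `NaT` in the reflected case). The sketch below models those scalar rules in plain Python; `NanoTimedelta` and `NAT` are hypothetical stand-ins for `Timedelta` and `pd.NaT`, and the array (`dtype`) branch is left out:

```python
import math


class _NaTType:
    """Hypothetical stand-in for pandas.NaT."""

    def __repr__(self):
        return "NaT"


NAT = _NaTType()


class NanoTimedelta:
    """Hypothetical model of Timedelta floor division on integer nanoseconds."""

    def __init__(self, value_ns):
        self.value = value_ns  # like Timedelta.value

    def __eq__(self, other):
        return isinstance(other, NanoTimedelta) and self.value == other.value

    def __repr__(self):
        return f"NanoTimedelta({self.value})"

    def __floordiv__(self, other):
        # Integer divisor: the result is still a duration.
        if isinstance(other, int) and not isinstance(other, bool):
            return NanoTimedelta(self.value // other)
        # NaT divisor: no meaningful count, reported as NaN.
        if other is NAT:
            return math.nan
        # Timedelta-like divisor: number of whole `other`s that fit in self.
        if isinstance(other, NanoTimedelta):
            return self.value // other.value
        return NotImplemented

    def __rfloordiv__(self, other):
        # Reached for NAT // td, because _NaTType defines no __floordiv__.
        if other is NAT:
            return NAT
        return NotImplemented


DAY_NS = 86_400 * 10**9
ten_days = NanoTimedelta(10 * DAY_NS)
print(ten_days // 2)                      # NanoTimedelta(432000000000000)
print(ten_days // NanoTimedelta(DAY_NS))  # 10
print(ten_days // NAT)                    # nan
print(NAT // ten_days)                    # NaT
```

The first two lines mirror the updated tests: `td // 2` for a ten-day `Timedelta` gives five days, while dividing by another duration returns a bare integer.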

pandas/tests/indexes/datetimes/test_tools.py

+117
@@ -1515,3 +1515,120 @@ def test_normalize_date():
 
     result = normalize_date(value)
     assert (result == datetime(2012, 9, 7))
+
+
+@pytest.fixture(params=['D', 's', 'ms', 'us', 'ns'])
+def units(request):
+    return request.param
+
+
+@pytest.fixture
+def epoch_1960():
+    # for origin as 1960-01-01
+    return Timestamp('1960-01-01')
+
+
+@pytest.fixture
+def units_from_epochs():
+    return list(range(5))
+
+
+@pytest.fixture(params=[epoch_1960(), epoch_1960().to_datetime(),
+                        epoch_1960().to_datetime64(),
+                        str(epoch_1960())])
+def epochs(request):
+    return request.param
+
+
+@pytest.fixture
+def julian_dates():
+    return pd.date_range('2014-1-1', periods=10).to_julian_date().values
+
+
+class TestOrigin(object):
+
+    def test_to_basic(self, julian_dates):
+        # gh-11276, gh-11745
+        # for origin as julian
+
+        result = Series(pd.to_datetime(
+            julian_dates, unit='D', origin='julian'))
+        expected = Series(pd.to_datetime(
+            julian_dates - pd.Timestamp(0).to_julian_date(), unit='D'))
+        assert_series_equal(result, expected)
+
+        result = Series(pd.to_datetime(
+            [0, 1, 2], unit='D', origin='unix'))
+        expected = Series([Timestamp('1970-01-01'),
+                           Timestamp('1970-01-02'),
+                           Timestamp('1970-01-03')])
+        assert_series_equal(result, expected)
+
+        # default
+        result = Series(pd.to_datetime(
+            [0, 1, 2], unit='D'))
+        expected = Series([Timestamp('1970-01-01'),
+                           Timestamp('1970-01-02'),
+                           Timestamp('1970-01-03')])
+        assert_series_equal(result, expected)
+
+    def test_julian_round_trip(self):
+        result = pd.to_datetime(2456658, origin='julian', unit='D')
+        assert result.to_julian_date() == 2456658
+
+        # out-of-bounds
+        with pytest.raises(ValueError):
+            pd.to_datetime(1, origin="julian", unit='D')
+
+    def test_invalid_unit(self, units, julian_dates):
+
+        # checking for invalid combination of origin='julian' and unit != D
+        if units != 'D':
+            with pytest.raises(ValueError):
+                pd.to_datetime(julian_dates, unit=units, origin='julian')
+
+    def test_invalid_origin(self):
+
+        # need to have a numeric specified
+        with pytest.raises(ValueError):
+            pd.to_datetime("2005-01-01", origin="1960-01-01")
+
+        with pytest.raises(ValueError):
+            pd.to_datetime("2005-01-01", origin="1960-01-01", unit='D')
+
+    def test_epoch(self, units, epochs, epoch_1960, units_from_epochs):
+
+        expected = Series(
+            [pd.Timedelta(x, unit=units) +
+             epoch_1960 for x in units_from_epochs])
+
+        result = Series(pd.to_datetime(
+            units_from_epochs, unit=units, origin=epochs))
+        assert_series_equal(result, expected)
+
+    @pytest.mark.parametrize("origin, exc",
+                             [('random_string', ValueError),
+                              ('epoch', ValueError),
+                              ('13-24-1990', ValueError),
+                              (datetime(1, 1, 1), tslib.OutOfBoundsDatetime)])
+    def test_invalid_origins(self, origin, exc, units, units_from_epochs):
+
+        with pytest.raises(exc):
+            pd.to_datetime(units_from_epochs, unit=units,
+                           origin=origin)
+
+    def test_processing_order(self):
+        # make sure we handle out-of-bounds *before*
+        # constructing the dates
+
+        result = pd.to_datetime(200 * 365, unit='D')
+        expected = Timestamp('2169-11-13 00:00:00')
+        assert result == expected
+
+        result = pd.to_datetime(200 * 365, unit='D', origin='1870-01-01')
+        expected = Timestamp('2069-11-13 00:00:00')
+        assert result == expected
+
+        result = pd.to_datetime(300 * 365, unit='D', origin='1870-01-01')
+        expected = Timestamp('2169-10-20 00:00:00')
+        assert result == expected
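`test_processing_order` guards against overflow. `Timestamp` stores int64 nanoseconds relative to 1970-01-01, so 300 * 365 days counted from the Unix epoch (roughly the year 2269, past the `Timestamp` upper bound of 2262-04-11) cannot be represented, while the same count starting at 1870-01-01 lands in 2169. The sketch below illustrates why the origin's offset has to be folded into the count before any nanosecond conversion; the helpers are hypothetical and use only the standard library:

```python
from datetime import date, timedelta

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
NS_PER_DAY = 86_400 * 10**9
UNIX_EPOCH = date(1970, 1, 1)


def fits_int64_ns(days_from_unix):
    """True if this many days from 1970-01-01, in nanoseconds, fit in int64."""
    return INT64_MIN <= days_from_unix * NS_PER_DAY <= INT64_MAX


def days_to_date(days, origin=UNIX_EPOCH):
    """Hypothetical helper mirroring to_datetime(days, unit='D', origin=origin).

    The origin's distance from the Unix epoch is added to the day count first,
    so the bounds check applies to the final point on the timeline.
    """
    days_from_unix = days + (origin - UNIX_EPOCH).days
    if not fits_int64_ns(days_from_unix):
        raise OverflowError(f"{days} days after {origin} is out of bounds")
    return UNIX_EPOCH + timedelta(days=days_from_unix)


print(fits_int64_ns(300 * 365))                   # False
print(days_to_date(300 * 365, date(1870, 1, 1)))  # 2169-10-20
print(days_to_date(200 * 365, date(1870, 1, 1)))  # 2069-11-13
print(days_to_date(200 * 365))                    # 2169-11-13
```

Checking the raw count against the Unix epoch first would wrongly reject the 1870-based case, because 109,500 days alone exceed the int64 nanosecond range even though the final date does not.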

pandas/tests/indexes/timedeltas/test_ops.py

+9-4
@@ -284,6 +284,12 @@ def test_ops_compat(self):
             result = rng / offset
             tm.assert_index_equal(result, expected, exact=False)
 
+        # floor divide
+        expected = Int64Index((np.arange(10) + 1) * 12, name='foo')
+        for offset in offsets:
+            result = rng // offset
+            tm.assert_index_equal(result, expected, exact=False)
+
         # divide with nats
         rng = TimedeltaIndex(['1 days', pd.NaT, '2 days'], name='foo')
         expected = Float64Index([12, np.nan, 24], name='foo')
@@ -867,20 +873,19 @@ def test_ops(self):
         self.assertEqual(td * 2, Timedelta(20, unit='d'))
         self.assertTrue((td * pd.NaT) is pd.NaT)
         self.assertEqual(td / 2, Timedelta(5, unit='d'))
+        self.assertEqual(td // 2, Timedelta(5, unit='d'))
         self.assertEqual(abs(td), td)
         self.assertEqual(abs(-td), td)
         self.assertEqual(td / td, 1)
         self.assertTrue((td / pd.NaT) is np.nan)
+        self.assertTrue((td // pd.NaT) is np.nan)
 
         # invert
         self.assertEqual(-td, Timedelta('-10d'))
         self.assertEqual(td * -1, Timedelta('-10d'))
         self.assertEqual(-1 * td, Timedelta('-10d'))
         self.assertEqual(abs(-td), Timedelta('10d'))
 
-        # invalid
-        self.assertRaises(TypeError, lambda: Timedelta(11, unit='d') // 2)
-
         # invalid multiply with another timedelta
         self.assertRaises(TypeError, lambda: td * td)
 
@@ -991,7 +996,7 @@ class Other:
         self.assertTrue(td.__sub__(other) is NotImplemented)
         self.assertTrue(td.__truediv__(other) is NotImplemented)
         self.assertTrue(td.__mul__(other) is NotImplemented)
-        self.assertTrue(td.__floordiv__(td) is NotImplemented)
+        self.assertTrue(td.__floordiv__(other) is NotImplemented)
 
     def test_ops_error_str(self):
         # GH 13624

pandas/tests/scalar/test_timedelta.py

+10
@@ -216,6 +216,7 @@ def test_conversion(self):
 
     def test_freq_conversion(self):
 
+        # truediv
         td = Timedelta('1 days 2 hours 3 ns')
         result = td / np.timedelta64(1, 'D')
         self.assertEqual(result, td.value / float(86400 * 1e9))
@@ -224,6 +225,15 @@ def test_freq_conversion(self):
         result = td / np.timedelta64(1, 'ns')
         self.assertEqual(result, td.value)
 
+        # floordiv
+        td = Timedelta('1 days 2 hours 3 ns')
+        result = td // np.timedelta64(1, 'D')
+        self.assertEqual(result, 1)
+        result = td // np.timedelta64(1, 's')
+        self.assertEqual(result, 93600)
+        result = td // np.timedelta64(1, 'ns')
+        self.assertEqual(result, td.value)
+
     def test_fields(self):
         def check(value):
             # that we are int/long like
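The expected values in the new `test_freq_conversion` block can be checked by hand. `Timedelta('1 days 2 hours 3 ns')` holds 93,600,000,000,003 ns, and floor division by a one-unit `np.timedelta64` amounts to integer division of that count by the unit's length in nanoseconds. A plain-Python check, with ints standing in for the nanosecond values:

```python
NS_PER_S = 10**9
NS_PER_DAY = 86_400 * NS_PER_S

# Timedelta('1 days 2 hours 3 ns') as integer nanoseconds.
td_value = NS_PER_DAY + 2 * 3_600 * NS_PER_S + 3

# Floor division keeps only whole units and yields an int.
print(td_value // NS_PER_DAY)  # 1
print(td_value // NS_PER_S)    # 93600
print(td_value // 1)           # 93600000000003, i.e. td.value itself

# True division keeps the fractional part as a float (about 1.0833 days).
print(td_value / NS_PER_DAY)
```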

pandas/tseries/tdi.py

+5-2
@@ -326,15 +326,18 @@ def _add_delta(self, delta):
     def _evaluate_with_timedelta_like(self, other, op, opstr):
 
         # allow division by a timedelta
-        if opstr in ['__div__', '__truediv__']:
+        if opstr in ['__div__', '__truediv__', '__floordiv__']:
             if _is_convertible_to_td(other):
                 other = Timedelta(other)
                 if isnull(other):
                     raise NotImplementedError(
                         "division by pd.NaT not implemented")
 
                 i8 = self.asi8
-                result = i8 / float(other.value)
+                if opstr in ['__floordiv__']:
+                    result = i8 // other.value
+                else:
+                    result = op(i8, float(other.value))
                 result = self._maybe_mask_results(result, convert='float64')
                 return Index(result, name=self.name, copy=False)
 
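In `_evaluate_with_timedelta_like` the index's int64 values (`asi8`) are floor-divided directly, and `_maybe_mask_results(result, convert='float64')` then puts NaN wherever the index held `NaT`. Masking is needed because `NaT` is stored as the minimum int64 value, so dividing it yields a meaningless number. A list-based sketch of that step, with the hypothetical name `floordiv_i8`; it also assumes the float conversion happens only when there is at least one `NaT` to mask:

```python
import math

# NaT is stored in int64 form as the smallest int64 value.
INAT = -(2**63)
NS_PER_HOUR = 3_600 * 10**9
NS_PER_DAY = 24 * NS_PER_HOUR


def floordiv_i8(values, divisor_ns):
    """Hypothetical sketch: floor-divide int64 nanosecond values, masking NaT.

    Every element is divided first and NaT positions are masked afterwards;
    floats are used only when there is something to mask.
    """
    raw = [v // divisor_ns for v in values]
    if INAT not in values:
        return raw
    return [math.nan if v == INAT else float(r) for v, r in zip(values, raw)]


two_hours = 2 * NS_PER_HOUR
print(INAT // two_hours)  # meaningless if left unmasked
print(floordiv_i8([NS_PER_DAY, 2 * NS_PER_DAY], two_hours))        # [12, 24]
print(floordiv_i8([NS_PER_DAY, INAT, 2 * NS_PER_DAY], two_hours))  # [12.0, nan, 24.0]
```

The NaT case matches the `[12, nan, 24]` pattern that `test_ops_compat` uses for an index of `['1 days', NaT, '2 days']`.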
