ENH: implement timezones support for read_json(orient='table') and astype() from 'object' #35973

Merged 37 commits on Nov 4, 2020
f1d7f59
ENH: implement timeszones support for DataFrame.to_json(orient='table')
attack68 Aug 29, 2020
eeb6201
pep8
attack68 Aug 29, 2020
95b9501
minor cleanup
attack68 Aug 29, 2020
c057358
linting and type cleanup
attack68 Aug 29, 2020
70b1448
isort
attack68 Aug 29, 2020
e762ce0
static type ignore
attack68 Aug 29, 2020
61ca6a8
black and mypy fix to work together
attack68 Aug 29, 2020
f9d071a
re-write so conversion occurs in astype() as opposed to parse_json()
attack68 Aug 31, 2020
79bd2eb
removed unused imports
attack68 Aug 31, 2020
f9f413f
black fix
attack68 Aug 31, 2020
ce51e30
typing
attack68 Sep 1, 2020
37cad4f
astype conversion for objects of one tz to another tz
attack68 Sep 1, 2020
39740d8
linting isort
attack68 Sep 1, 2020
d1a9cd3
move tests
attack68 Sep 2, 2020
bb8f7b9
move tests
attack68 Sep 2, 2020
b55cced
seg fault failure fix?
attack68 Sep 13, 2020
5bc4b2c
remove raise condition
attack68 Sep 16, 2020
a6c7ec6
eliminate try-except and move tests
attack68 Sep 16, 2020
a192c66
black fix
attack68 Sep 16, 2020
6d98945
issues stamp
attack68 Sep 16, 2020
b4ac6aa
linting
attack68 Sep 16, 2020
a502a04
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Sep 17, 2020
f06a9e0
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Sep 19, 2020
5a07736
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Sep 21, 2020
bae0a30
test common terms
attack68 Sep 26, 2020
2f36826
test common terms
attack68 Sep 26, 2020
4ebe5b3
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Sep 28, 2020
0f7cedd
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Sep 29, 2020
54da03f
uncomment previous test now working.
attack68 Sep 29, 2020
4fe7f41
double quotes error
attack68 Sep 30, 2020
f0fe4e4
restart tests
attack68 Oct 1, 2020
8a82832
restart tests
attack68 Oct 1, 2020
978b4a3
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Oct 9, 2020
d44a267
fix whats new comments
attack68 Oct 9, 2020
4a1fc86
rephrased test
attack68 Oct 9, 2020
a6e8681
Merge remote-tracking branch 'upstream/master' into enh_timezones_json
attack68 Oct 19, 2020
6b58d2f
Merge branch 'master' into enh_timezones_json
jreback Oct 31, 2020
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -217,6 +217,7 @@ Other enhancements
 - ``Styler`` now allows direct CSS class name addition to individual data cells (:issue:`36159`)
 - :meth:`Rolling.mean()` and :meth:`Rolling.sum()` use Kahan summation to calculate the mean to avoid numerical problems (:issue:`10319`, :issue:`11645`, :issue:`13254`, :issue:`32761`, :issue:`36031`)
 - :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
+-
 - Added methods :meth:`IntegerArray.prod`, :meth:`IntegerArray.min`, and :meth:`IntegerArray.max` (:issue:`33790`)
 - Where possible :meth:`RangeIndex.difference` and :meth:`RangeIndex.symmetric_difference` will return :class:`RangeIndex` instead of :class:`Int64Index` (:issue:`36564`)
 - Added :meth:`Rolling.sem()` and :meth:`Expanding.sem()` to compute the standard error of mean (:issue:`26476`).
@@ -388,6 +389,8 @@ Datetimelike
 - Bug in :class:`DatetimeIndex.shift` incorrectly raising when shifting empty indexes (:issue:`14811`)
 - :class:`Timestamp` and :class:`DatetimeIndex` comparisons between timezone-aware and timezone-naive objects now follow the standard library ``datetime`` behavior, returning ``True``/``False`` for ``!=``/``==`` and raising for inequality comparisons (:issue:`28507`)
 - Bug in :meth:`DatetimeIndex.equals` and :meth:`TimedeltaIndex.equals` incorrectly considering ``int64`` indexes as equal (:issue:`36744`)
+- :meth:`to_json` and :meth:`read_json` now implements timezones parsing when orient structure is 'table'.
+- :meth:`astype` now attempts to convert to 'datetime64[ns, tz]' directly from 'object' with inferred timezone from string (:issue:`35973`).
 - Bug in :meth:`TimedeltaIndex.sum` and :meth:`Series.sum` with ``timedelta64`` dtype on an empty index or series returning ``NaT`` instead of ``Timedelta(0)`` (:issue:`31751`)

 Timedelta
8 changes: 7 additions & 1 deletion pandas/core/arrays/datetimes.py
@@ -1970,7 +1970,13 @@ def sequence_to_dt64ns(
             data, inferred_tz = objects_to_datetime64ns(
                 data, dayfirst=dayfirst, yearfirst=yearfirst
             )
-            tz = _maybe_infer_tz(tz, inferred_tz)
+            if tz and inferred_tz:
+                # two timezones: convert to intended from base UTC repr
+                data = tzconversion.tz_convert_from_utc(data.view("i8"), tz)
+                data = data.view(DT64NS_DTYPE)
+            elif inferred_tz:
+                tz = inferred_tz
+
         data_dtype = data.dtype

     # `data` may have originally been a Categorical[datetime64[ns, tz]],
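The branch added to `sequence_to_dt64ns` covers the case where the parsed strings carry their own offset and the caller also requests a timezone: the values are held as UTC instants and then converted to the requested zone, while an inferred zone on its own is simply kept. The following is a rough standard-library sketch of that idea, not pandas' implementation. `parse_to_target_tz` is a hypothetical helper, and a fixed +02:00 offset stands in for a named zone so the example does not depend on a timezone database.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def parse_to_target_tz(
    values: List[str], target: Optional[timezone] = None
) -> List[datetime]:
    """Hypothetical helper (not a pandas function) mirroring the two branches."""
    converted = []
    for text in values:
        parsed = datetime.fromisoformat(text)
        inferred = parsed.tzinfo
        if target is not None and inferred is not None:
            # Two timezones: go through the UTC instant, then to the target zone.
            parsed = parsed.astimezone(timezone.utc).astimezone(target)
        # With only an inferred zone, the parsed value keeps that zone unchanged.
        converted.append(parsed)
    return converted


# Assumption: a fixed +02:00 offset stands in for Europe/Berlin in summer.
berlin_summer = timezone(timedelta(hours=2))
result = parse_to_target_tz(["2020-08-30T00:00:00+01:00"], berlin_summer)
print(result[0].isoformat())  # 2020-08-30T01:00:00+02:00
```

The instant is preserved and only the wall-clock representation changes, which is the same property the new `astype` tests check with `tz_convert`.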
4 changes: 3 additions & 1 deletion pandas/io/json/_json.py
@@ -262,7 +262,9 @@ def __init__(

         # NotImplemented on a column MultiIndex
         if obj.ndim == 2 and isinstance(obj.columns, MultiIndex):
-            raise NotImplementedError("orient='table' is not supported for MultiIndex")
+            raise NotImplementedError(
+                "orient='table' is not supported for MultiIndex columns"
+            )

         # TODO: Do this timedelta properly in objToJSON.c See GH #15137
         if (
4 changes: 0 additions & 4 deletions pandas/io/json/_table_schema.py
@@ -323,10 +323,6 @@ def parse_table_schema(json, precise_float):
         for field in table["schema"]["fields"]
     }

-    # Cannot directly use as_type with timezone data on object; raise for now
-    if any(str(x).startswith("datetime64[ns, ") for x in dtypes.values()):
-        raise NotImplementedError('table="orient" can not yet read timezone data')
-
     # No ISO constructor for Timedelta as of yet, so need to raise
     if "timedelta64" in dtypes.values():
         raise NotImplementedError(
24 changes: 24 additions & 0 deletions pandas/tests/frame/methods/test_astype.py
@@ -587,3 +587,27 @@ def test_astype_ignores_errors_for_extension_dtypes(self, df, errors):
             msg = "(Cannot cast)|(could not convert)"
             with pytest.raises((ValueError, TypeError), match=msg):
                 df.astype(float, errors=errors)
+
+    def test_astype_tz_conversion(self):
+        # GH 35973
+        val = {"tz": date_range("2020-08-30", freq="d", periods=2, tz="Europe/London")}
+        df = DataFrame(val)
+        result = df.astype({"tz": "datetime64[ns, Europe/Berlin]"})
+
+        expected = df
+        expected["tz"] = expected["tz"].dt.tz_convert("Europe/Berlin")

Review comment (Contributor): can you move this expected change to L595 as it goes with it

+        tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("tz", ["UTC", "Europe/Berlin"])
+    def test_astype_tz_object_conversion(self, tz):
+        # GH 35973
+        val = {"tz": date_range("2020-08-30", freq="d", periods=2, tz="Europe/London")}
+        expected = DataFrame(val)
+
+        # convert expected to object dtype from other tz str (independently tested)
+        result = expected.astype({"tz": f"datetime64[ns, {tz}]"})
+        result = result.astype({"tz": "object"})
+
+        # do real test: object dtype to a specified tz, different from construction tz.
+        result = result.astype({"tz": "datetime64[ns, Europe/London]"})
+        tm.assert_frame_equal(result, expected)
54 changes: 48 additions & 6 deletions pandas/tests/io/json/test_json_table_schema.py
@@ -676,6 +676,11 @@ class TestTableOrientReader:
             {"floats": [1.0, 2.0, 3.0, 4.0]},
             {"floats": [1.1, 2.2, 3.3, 4.4]},
             {"bools": [True, False, False, True]},
+            {
+                "timezones": pd.date_range(
+                    "2016-01-01", freq="d", periods=4, tz="US/Central"
+                )  # added in # GH 35973
+            },
         ],
     )
     @pytest.mark.skipif(sys.version_info[:3] == (3, 7, 0), reason="GH-35309")
@@ -686,22 +691,59 @@ def test_read_json_table_orient(self, index_nm, vals, recwarn):
         tm.assert_frame_equal(df, result)

     @pytest.mark.parametrize("index_nm", [None, "idx", "index"])
     @pytest.mark.parametrize(
         "vals",
+        [{"timedeltas": pd.timedelta_range("1H", periods=4, freq="T")}],
+    )
+    def test_read_json_table_orient_raises(self, index_nm, vals, recwarn):
+        df = DataFrame(vals, index=pd.Index(range(4), name=index_nm))
+        out = df.to_json(orient="table")
+        with pytest.raises(NotImplementedError, match="can not yet read "):
+            pd.read_json(out, orient="table")
+
+    @pytest.mark.parametrize(
+        "idx",
+        [
+            pd.Index(range(4)),
+            pd.Index(
+                pd.date_range(
+                    "2020-08-30",
+                    freq="d",
+                    periods=4,
+                ),
+                freq=None,
+            ),
+            pd.Index(
+                pd.date_range("2020-08-30", freq="d", periods=4, tz="US/Central"),
+                freq=None,
+            ),
+            pd.MultiIndex.from_product(
+                [
+                    pd.date_range("2020-08-30", freq="d", periods=2, tz="US/Central"),
+                    ["x", "y"],
+                ],
+            ),
+        ],
+    )
+    @pytest.mark.parametrize(
+        "vals",
         [
-            {"timedeltas": pd.timedelta_range("1H", periods=4, freq="T")},
+            {"floats": [1.1, 2.2, 3.3, 4.4]},
+            {"dates": pd.date_range("2020-08-30", freq="d", periods=4)},
             {
                 "timezones": pd.date_range(
-                    "2016-01-01", freq="d", periods=4, tz="US/Central"
+                    "2020-08-30", freq="d", periods=4, tz="Europe/London"
                 )
             },
         ],
     )
-    def test_read_json_table_orient_raises(self, index_nm, vals, recwarn):
-        df = DataFrame(vals, index=pd.Index(range(4), name=index_nm))
+    @pytest.mark.skipif(sys.version_info[:3] == (3, 7, 0), reason="GH-35309")
+    def test_read_json_table_timezones_orient(self, idx, vals, recwarn):
+        # GH 35973
+        df = DataFrame(vals, index=idx)
         out = df.to_json(orient="table")
-        with pytest.raises(NotImplementedError, match="can not yet read "):
-            pd.read_json(out, orient="table")
+        result = pd.read_json(out, orient="table")
+        tm.assert_frame_equal(df, result)

     def test_comprehensive(self):
         df = DataFrame(
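The new `test_read_json_table_timezones_orient` cases amount to a round trip through `orient="table"` with timezone-aware data. Here is a minimal sketch of that round trip, assuming pandas 1.2 or later (the release this change targets). The column name `when` is illustrative, and wrapping the JSON text in `StringIO` is a choice made here because newer pandas versions deprecate passing literal JSON strings to `read_json`.

```python
from io import StringIO

import pandas as pd

# A timezone-aware column, similar to the parametrized test cases above.
df = pd.DataFrame(
    {"when": pd.date_range("2016-01-01", freq="D", periods=4, tz="US/Central")}
)

# Serialize with the Table Schema layout, then read it back.
payload = df.to_json(orient="table")
result = pd.read_json(StringIO(payload), orient="table")

# Before this change, the read step raised NotImplementedError for timezone data.
print(result["when"].dtype)
```

With the guard removed from `parse_table_schema`, the timezone recorded in the schema is applied through `astype`, so the column should come back timezone-aware instead of raising.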