Skip to content

REF: simplify json get_values #32731

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 16, 2020
Merged

Conversation

jbrockmendel
Copy link
Member

cc @WillAyd trying to get away from _internal_get_values in the json code; this is zero-logic-changing to narrow the scope of what gets passed to _internal_get_values.

AFAICT replacing the _internal_get_values call with calling __array__ doesnt break any tests, but in general the two are not always equivalent and it isnt obvious to me whether there are untested cases where it matters.

@jreback jreback added IO JSON read_json, to_json, json_normalize Refactor Internal refactoring of code labels Mar 16, 2020
@jreback jreback added this to the 1.1 milestone Mar 16, 2020
@jreback jreback merged commit b012cf1 into pandas-dev:master Mar 16, 2020
@jreback
Copy link
Contributor

jreback commented Mar 16, 2020

sure

@jbrockmendel jbrockmendel deleted the json-values branch March 16, 2020 01:54
@jbrockmendel
Copy link
Member Author

@WillAyd could use your input on how to handle the last few edge cases. The relevant cases all involve either CategoricalIndex[dt64tz] or Series[Category[dt64tz]]. Example:

dti = pd.date_range('2016-01-01', periods=2, tz='US/Pacific')
ci = pd.CategoricalIndex(dti)
ser = pd.Series(dti)
cser = pd.Series(ci)
ciser = pd.Series(range(2), index=ci)

cser.to_json() matches ser.to_json(), but they get there in different ways. get_values in objToJSON calls ser._internal_get_values() and gets a ndarray[dt64_naive], while cser._internal_get_values() gets a ndarray[object] of Timestamp objects. Is going through object-dtype intentional, or would it be OK to match the non-categorical behavior?

If matching the non-categorical behavior is OK, then we can get equivalent behavior by calling ser.__array__ instead of ser._internal_get_values.

ciser.to_json currently raises because the call to ci._internal_get_values returns a DatetimeIndex. Is raising the desired behavior, or maybe we want to match the series values behavior?

@WillAyd
Copy link
Member

WillAyd commented Mar 16, 2020

get_values in objToJSON calls ser._internal_get_values() and gets a ndarray[dt64_naive], while cser._internal_get_values() gets a ndarray[object] of Timestamp objects. Is going through object-dtype intentional, or would it be OK to match the non-categorical behavior?

Shouldn't need an object coercion. Off the top of my head there were some bugs with datetimelike values where coercion to object would have yielded different values in some combination of the index / values, but those should be squashed now (or if not, would want to identify and correct)

@jbrockmendel
Copy link
Member Author

OK. Cleanest option I've identified is to use obj.values.__array__() instead of obj._internal_get_values(). The only behavior difference is that CategoricalIndex[dt64tz] (or Series with that for its index) will no longer raise. Is that acceptable?

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO JSON read_json, to_json, json_normalize Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants