BUG: DataFrame.to_json OverflowError with np.long* dtypes #55494

Closed · wants to merge 4 commits
14 changes: 7 additions & 7 deletions doc/source/user_guide/timeseries.rst
@@ -896,9 +896,9 @@ into ``freq`` keyword arguments. The available date offsets and associated frequ
:class:`~pandas.tseries.offsets.BQuarterBegin`, ``'BQS'``, "business quarter begin"
:class:`~pandas.tseries.offsets.FY5253Quarter`, ``'REQ'``, "retail (aka 52-53 week) quarter"
:class:`~pandas.tseries.offsets.YearEnd`, ``'Y'``, "calendar year end"
-    :class:`~pandas.tseries.offsets.YearBegin`, ``'AS'`` or ``'BYS'``,"calendar year begin"
-    :class:`~pandas.tseries.offsets.BYearEnd`, ``'BA'``, "business year end"
-    :class:`~pandas.tseries.offsets.BYearBegin`, ``'BAS'``, "business year begin"
+    :class:`~pandas.tseries.offsets.YearBegin`, ``'YS'`` or ``'BYS'``,"calendar year begin"
+    :class:`~pandas.tseries.offsets.BYearEnd`, ``'BY'``, "business year end"
+    :class:`~pandas.tseries.offsets.BYearBegin`, ``'BYS'``, "business year begin"
:class:`~pandas.tseries.offsets.FY5253`, ``'RE'``, "retail (aka 52-53 week) year"
:class:`~pandas.tseries.offsets.Easter`, None, "Easter holiday"
:class:`~pandas.tseries.offsets.BusinessHour`, ``'bh'``, "business hour"
@@ -1259,9 +1259,9 @@ frequencies. We will refer to these aliases as *offset aliases*.
"QS", "quarter start frequency"
"BQS", "business quarter start frequency"
"Y", "year end frequency"
-    "BA, BY", "business year end frequency"
-    "AS, YS", "year start frequency"
-    "BAS, BYS", "business year start frequency"
+    "BY", "business year end frequency"
+    "YS", "year start frequency"
+    "BYS", "business year start frequency"
"h", "hourly frequency"
"bh", "business hour frequency"
"cbh", "custom business hour frequency"
@@ -1692,7 +1692,7 @@ the end of the interval.
.. warning::

The default values for ``label`` and ``closed`` is '**left**' for all
-    frequency offsets except for 'ME', 'Y', 'Q', 'BM', 'BA', 'BQ', and 'W'
+    frequency offsets except for 'ME', 'Y', 'Q', 'BM', 'BY', 'BQ', and 'W'
which all have a default of 'right'.

This might unintendedly lead to looking ahead, where the value for a later
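The renamed aliases documented above can be checked quickly; a sketch assuming pandas ≥ 2.2, where ``'YS'`` and ``'BY'`` replace the deprecated ``'AS'`` and ``'BA'``:

```python
import pandas as pd

# 'YS' (year start) replaces 'AS'; every generated stamp is a Jan 1.
starts = pd.date_range("2020-01-01", periods=3, freq="YS")
print(list(starts.year))  # [2020, 2021, 2022]

# 'BY' (business year end) replaces 'BA'; stamps land on the last
# business day of December.
ends = pd.date_range("2020-01-01", periods=2, freq="BY")
print(ends[0].month)  # 12
```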
20 changes: 16 additions & 4 deletions doc/source/whatsnew/v0.20.0.rst
@@ -886,11 +886,23 @@ This would happen with a ``lexsorted``, but non-monotonic levels. (:issue:`15622

This is *unchanged* from prior versions, but shown for illustration purposes:

-.. ipython:: python
+.. code-block:: python
-   df = pd.DataFrame(np.arange(6), columns=['value'],
-                     index=pd.MultiIndex.from_product([list('BA'), range(3)]))
-   df
+   In [81]: df = pd.DataFrame(np.arange(6), columns=['value'],
+      ....:                   index=pd.MultiIndex.from_product([list('BA'), range(3)]))
+      ....:
+   In [82]: df
+   Out[82]:
+        value
+   B 0      0
+     1      1
+     2      2
+   A 0      3
+     1      4
+     2      5
+   [6 rows x 1 columns]
.. code-block:: python
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v2.2.0.rst
@@ -348,6 +348,7 @@ I/O
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``usecols`` wasn't working with a csv with no headers (:issue:`54459`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
- Bug in :func:`to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)
+- Bug in :meth:`DataFrame.to_json` OverflowError with np.long* dtypes (:issue:`55403`)
- Bug in :meth:`pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)

Period
@@ -394,6 +395,7 @@ Other
^^^^^
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
+- Bug in rendering ``inf`` values inside a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
- Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
-

29 changes: 15 additions & 14 deletions pandas/_libs/include/pandas/vendored/ujson/lib/ultrajson.h
@@ -138,19 +138,20 @@ typedef int64_t JSLONG;
#endif

enum JSTYPES {
-  JT_NULL,    // NULL
-  JT_TRUE,    // boolean true
-  JT_FALSE,   // boolean false
-  JT_INT,     // (JSINT32 (signed 32-bit))
-  JT_LONG,    // (JSINT64 (signed 64-bit))
-  JT_DOUBLE,  // (double)
-  JT_BIGNUM,  // integer larger than sys.maxsize
-  JT_UTF8,    // (char 8-bit)
-  JT_ARRAY,   // Array structure
-  JT_OBJECT,  // Key/Value structure
-  JT_INVALID, // Internal, do not return nor expect
-  JT_POS_INF, // Positive infinity
-  JT_NEG_INF, // Negative infinity
+  JT_NULL,        // NULL
+  JT_TRUE,        // boolean true
+  JT_FALSE,       // boolean false
+  JT_INT,         // (JSINT32 (signed 32-bit))
+  JT_LONG,        // (JSINT64 (signed 64-bit))
+  JT_DOUBLE,      // (double)
+  JT_BIGNUM,      // integer larger than sys.maxsize
+  JT_UTF8,        // (char 8-bit)
+  JT_ARRAY,       // Array structure
+  JT_OBJECT,      // Key/Value structure
+  JT_INVALID,     // Internal, do not return nor expect
+  JT_POS_INF,     // Positive infinity
+  JT_NEG_INF,     // Negative infinity
+  JT_LONG_DOUBLE  // Long Double
};

typedef void * JSOBJ;
@@ -181,7 +182,7 @@ typedef struct __JSONObjectEncoder {
size_t *_outLen);
JSINT64 (*getLongValue)(JSOBJ obj, JSONTypeContext *tc);
JSINT32 (*getIntValue)(JSOBJ obj, JSONTypeContext *tc);
-  double (*getDoubleValue)(JSOBJ obj, JSONTypeContext *tc);
+  long double (*getLongDoubleValue)(JSOBJ obj, JSONTypeContext *tc);
const char *(*getBigNumStringValue)(JSOBJ obj, JSONTypeContext *tc,
size_t *_outLen);

1 change: 0 additions & 1 deletion pandas/_libs/missing.pyi
@@ -14,4 +14,3 @@ def isneginf_scalar(val: object) -> bool: ...
def checknull(val: object, inf_as_na: bool = ...) -> bool: ...
def isnaobj(arr: np.ndarray, inf_as_na: bool = ...) -> npt.NDArray[np.bool_]: ...
def is_numeric_na(values: np.ndarray) -> npt.NDArray[np.bool_]: ...
-def is_float_nan(values: np.ndarray) -> npt.NDArray[np.bool_]: ...
25 changes: 0 additions & 25 deletions pandas/_libs/missing.pyx
@@ -255,31 +255,6 @@ cdef bint checknull_with_nat_and_na(object obj):
return checknull_with_nat(obj) or obj is C_NA


-@cython.wraparound(False)
-@cython.boundscheck(False)
-def is_float_nan(values: ndarray) -> ndarray:
-    """
-    True for elements which correspond to a float nan
-
-    Returns
-    -------
-    ndarray[bool]
-    """
-    cdef:
-        ndarray[uint8_t] result
-        Py_ssize_t i, N
-        object val
-
-    N = len(values)
-    result = np.zeros(N, dtype=np.uint8)
-
-    for i in range(N):
-        val = values[i]
-        if util.is_nan(val):
-            result[i] = True
-    return result.view(bool)


@cython.wraparound(False)
@cython.boundscheck(False)
def is_numeric_na(values: ndarray) -> ndarray:
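The deleted `is_float_nan` helper had no remaining callers. Where similar behavior is ever needed, a plain Python-level check is enough; a sketch of an equivalent, assuming object-dtype input as the Cython version handled:

```python
import numpy as np

def is_float_nan(values: np.ndarray) -> np.ndarray:
    # Equivalent of the removed Cython helper: True only where an
    # element is a float NaN; None, NaT and strings stay False.
    return np.array([isinstance(v, float) and v != v for v in values],
                    dtype=bool)

arr = np.array([1.0, np.nan, "x", None], dtype=object)
print(is_float_nan(arr).tolist())  # [False, True, False, False]
```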
28 changes: 14 additions & 14 deletions pandas/_libs/src/vendored/ujson/lib/ultrajsonenc.c
@@ -74,7 +74,7 @@ The extra 2 bytes are for the quotes around the string
*/
#define RESERVE_STRING(_len) (2 + ((_len)*6))

-static const double g_pow10[] = {1,
+static const long double g_pow10[] = {1,
10,
100,
1000,
@@ -784,29 +784,29 @@ void Buffer_AppendLongUnchecked(JSONObjectEncoder *enc, JSINT64 value) {
enc->offset += (wstr - (enc->offset));
}

-int Buffer_AppendDoubleUnchecked(JSOBJ obj, JSONObjectEncoder *enc,
-                                 double value) {
+int Buffer_AppendLongDoubleUnchecked(JSOBJ obj, JSONObjectEncoder *enc,
+                                     long double value) {
/* if input is beyond the thresholds, revert to exponential */
-  const double thres_max = (double)1e16 - 1;
-  const double thres_min = (double)1e-15;
+  const long double thres_max = (long double)1e16 - 1;
+  const long double thres_min = (long double)1e-15;
char precision_str[20];
int count;
-  double diff = 0.0;
+  long double diff = 0.0;
char *str = enc->offset;
char *wstr = str;
unsigned long long whole;
-  double tmp;
+  long double tmp;
unsigned long long frac;
int neg;
-  double pow10;
+  long double pow10;

if (value == HUGE_VAL || value == -HUGE_VAL) {
-    SetError(obj, enc, "Invalid Inf value when encoding double");
+    SetError(obj, enc, "Invalid Inf value when encoding long double");
return FALSE;
}

if (!(value == value)) {
-    SetError(obj, enc, "Invalid Nan value when encoding double");
+    SetError(obj, enc, "Invalid Nan value when encoding long double");
return FALSE;
}

Expand Down Expand Up @@ -942,7 +942,7 @@ void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name,
This reservation must hold

length of _name as encoded worst case +
-  maxLength of double to string OR maxLength of JSLONG to string
+  maxLength of long double to string OR maxLength of JSLONG to string
*/

Buffer_Reserve(enc, 256 + RESERVE_STRING(cbName));
@@ -1076,9 +1076,9 @@ void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name,
break;
}

-    case JT_DOUBLE: {
-      if (!Buffer_AppendDoubleUnchecked(obj, enc,
-                                        enc->getDoubleValue(obj, &tc))) {
+    case JT_LONG_DOUBLE: {
+      if (!Buffer_AppendLongDoubleUnchecked(obj, enc,
+                                            enc->getLongDoubleValue(obj, &tc))) {
enc->endTypeContext(obj, &tc);
enc->level--;
return;
31 changes: 18 additions & 13 deletions pandas/_libs/src/vendored/ujson/python/objToJSON.c
@@ -105,7 +105,7 @@ typedef struct __TypeContext {
PyObject *attrList;
PyObject *iterator;

-  double doubleValue;
+  long double longDoubleValue;
JSINT64 longValue;

char *cStr;
@@ -164,7 +164,7 @@ static TypeContext *createTypeContext(void) {
pc->index = 0;
pc->size = 0;
pc->longValue = 0;
-  pc->doubleValue = 0.0;
+  pc->longDoubleValue = (long double) 0.0;
pc->cStr = NULL;
pc->npyarr = NULL;
pc->pdblock = NULL;
@@ -1494,8 +1494,8 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
if (npy_isnan(val) || npy_isinf(val)) {
tc->type = JT_NULL;
} else {
-      pc->doubleValue = val;
-      tc->type = JT_DOUBLE;
+      pc->longDoubleValue = (long double) val;
+      tc->type = JT_LONG_DOUBLE;
}
return;
} else if (PyBytes_Check(obj)) {
@@ -1507,8 +1507,8 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
tc->type = JT_UTF8;
return;
} else if (object_is_decimal_type(obj)) {
-    pc->doubleValue = PyFloat_AsDouble(obj);
-    tc->type = JT_DOUBLE;
+    pc->longDoubleValue = (long double) PyFloat_AsDouble(obj);
+    tc->type = JT_LONG_DOUBLE;
return;
} else if (PyDateTime_Check(obj) || PyDate_Check(obj)) {
if (object_is_nat_type(obj)) {
@@ -1605,10 +1605,16 @@ void Object_beginTypeContext(JSOBJ _obj, JSONTypeContext *tc) {
PyArray_DescrFromType(NPY_BOOL));
tc->type = (pc->longValue) ? JT_TRUE : JT_FALSE;
return;
-  } else if (PyArray_IsScalar(obj, Float) || PyArray_IsScalar(obj, Double)) {
-    PyArray_CastScalarToCtype(obj, &(pc->doubleValue),
+  } else if (PyArray_IsScalar(obj, Float) ||
+             PyArray_IsScalar(obj, Double)) {
+    PyArray_CastScalarToCtype(obj, &(pc->longDoubleValue),
                               PyArray_DescrFromType(NPY_DOUBLE));
-    tc->type = JT_DOUBLE;
+    tc->type = JT_LONG_DOUBLE;
     return;
+  } else if (PyArray_IsScalar(obj, LongDouble)) {
+    PyArray_CastScalarToCtype(obj, &(pc->longDoubleValue),
+                              PyArray_DescrFromType(NPY_LONGDOUBLE));
+    tc->type = JT_LONG_DOUBLE;
+    return;
} else if (PyArray_Check(obj) && PyArray_CheckScalar(obj)) {
PyErr_Format(PyExc_TypeError,
@@ -1925,8 +1931,8 @@ JSINT64 Object_getLongValue(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
return GET_TC(tc)->longValue;
}

-double Object_getDoubleValue(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
-  return GET_TC(tc)->doubleValue;
+long double Object_getLongDoubleValue(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
+  return GET_TC(tc)->longDoubleValue;
}

const char *Object_getBigNumStringValue(JSOBJ obj, JSONTypeContext *tc,
@@ -1970,7 +1976,6 @@ PyObject *objToJSON(PyObject *Py_UNUSED(self), PyObject *args,
if (PyDateTimeAPI == NULL) {
return NULL;
}

PandasDateTime_IMPORT;
if (PandasDateTimeAPI == NULL) {
return NULL;
@@ -2006,7 +2011,7 @@ PyObject *objToJSON(PyObject *Py_UNUSED(self), PyObject *args,
Object_getStringValue,
Object_getLongValue,
NULL, // getIntValue is unused
-      Object_getDoubleValue,
+      Object_getLongDoubleValue,
Object_getBigNumStringValue,
Object_iterBegin,
Object_iterNext,
44 changes: 40 additions & 4 deletions pandas/_libs/tslibs/dtypes.pyx
@@ -192,9 +192,6 @@ OFFSET_TO_PERIOD_FREQSTR: dict = {
"BQS": "Q",
"QS": "Q",
"BQ": "Q",
-    "BA": "Y",
-    "AS": "Y",
-    "BAS": "Y",
"MS": "M",
"D": "D",
"B": "B",
@@ -205,9 +202,9 @@ OFFSET_TO_PERIOD_FREQSTR: dict = {
"ns": "ns",
"h": "h",
"Q": "Q",
-    "Y": "Y",
    "W": "W",
    "ME": "M",
+    "Y": "Y",
"BY": "Y",
"YS": "Y",
"BYS": "Y",
@@ -244,6 +241,45 @@ DEPR_ABBREVS: dict[str, str]= {
"A-SEP": "Y-SEP",
"A-OCT": "Y-OCT",
"A-NOV": "Y-NOV",
+    "BA": "BY",
+    "BA-DEC": "BY-DEC",
+    "BA-JAN": "BY-JAN",
+    "BA-FEB": "BY-FEB",
+    "BA-MAR": "BY-MAR",
+    "BA-APR": "BY-APR",
+    "BA-MAY": "BY-MAY",
+    "BA-JUN": "BY-JUN",
+    "BA-JUL": "BY-JUL",
+    "BA-AUG": "BY-AUG",
+    "BA-SEP": "BY-SEP",
+    "BA-OCT": "BY-OCT",
+    "BA-NOV": "BY-NOV",
+    "AS": "YS",
+    "AS-DEC": "YS-DEC",
+    "AS-JAN": "YS-JAN",
+    "AS-FEB": "YS-FEB",
+    "AS-MAR": "YS-MAR",
+    "AS-APR": "YS-APR",
+    "AS-MAY": "YS-MAY",
+    "AS-JUN": "YS-JUN",
+    "AS-JUL": "YS-JUL",
+    "AS-AUG": "YS-AUG",
+    "AS-SEP": "YS-SEP",
+    "AS-OCT": "YS-OCT",
+    "AS-NOV": "YS-NOV",
+    "BAS": "BYS",
+    "BAS-DEC": "BYS-DEC",
+    "BAS-JAN": "BYS-JAN",
+    "BAS-FEB": "BYS-FEB",
+    "BAS-MAR": "BYS-MAR",
+    "BAS-APR": "BYS-APR",
+    "BAS-MAY": "BYS-MAY",
+    "BAS-JUN": "BYS-JUN",
+    "BAS-JUL": "BYS-JUL",
+    "BAS-AUG": "BYS-AUG",
+    "BAS-SEP": "BYS-SEP",
+    "BAS-OCT": "BYS-OCT",
+    "BAS-NOV": "BYS-NOV",
"H": "h",
"BH": "bh",
"CBH": "cbh",
4 changes: 2 additions & 2 deletions pandas/_libs/tslibs/fields.pyx
@@ -253,8 +253,8 @@ def get_start_end_field(
# month of year. Other offsets use month, startingMonth as ending
# month of year.

-    if (freqstr[0:2] in ["MS", "QS", "AS"]) or (
-            freqstr[1:3] in ["MS", "QS", "AS"]):
+    if (freqstr[0:2] in ["MS", "QS", "YS"]) or (
+            freqstr[1:3] in ["MS", "QS", "YS"]):
        end_month = 12 if month_kw == 1 else month_kw - 1
        start_month = month_kw
    else:
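`get_start_end_field` powers the `is_year_start`/`is_year_end` accessors, which is why the frequency prefix check above had to learn the renamed ``"YS"`` alias. A quick check, assuming pandas ≥ 2.2:

```python
import pandas as pd

# With a 'YS' (year start) frequency the anchored month defaults to
# January, so every generated element is flagged as a year start.
idx = pd.date_range("2021-01-01", periods=3, freq="YS")
print(list(idx.is_year_start))  # [True, True, True]
```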