
CI: MacPython failing TestPandasContainer.test_to_json_large_numbers #35184

Merged
Changes from 56 commits
62 commits
30c9b83
add values.dtype.kind==f branch to array_with_unit_datetime
arw2019 Jun 27, 2020
cf67f90
remove unnecessary styling changes
arw2019 Jun 27, 2020
a69b28c
added cast_from_unit definition for float
arw2019 Jun 27, 2020
9df9d4d
to_datetime: added astyping for floats
arw2019 Jun 29, 2020
5746581
revert changes
arw2019 Jun 29, 2020
6b9d4de
revert changes
arw2019 Jun 29, 2020
0e3a876
revert styling change
arw2019 Jun 29, 2020
f1ae8f5
_libs/tslib.pyx added comments
arw2019 Jun 29, 2020
2f25460
merge with master
arw2019 Jun 29, 2020
572363a
revert pandas/_libs/tslib.pyx
arw2019 Jun 29, 2020
a50b4fd
merge with master
arw2019 Jun 29, 2020
b891030
merge with master
arw2019 Jun 30, 2020
ecd8ce3
merge with master
arw2019 Jun 30, 2020
38bac1a
update Grouping.indices to return for nan values
arw2019 Jul 1, 2020
65a2963
updated _GroupBy._get_index to return for nan values
arw2019 Jul 1, 2020
7df44d1
revert accidental changes
arw2019 Jul 1, 2020
4eb8a17
revert accidental changes
arw2019 Jul 1, 2020
21bb8e7
revert accidental changes
arw2019 Jul 1, 2020
0daca66
styling change
arw2019 Jul 1, 2020
ee55191
merge with master
arw2019 Jul 2, 2020
0c0e289
added tests
arw2019 Jul 2, 2020
a804909
fixed groupby/groupby.py's _get_indices
arw2019 Jul 6, 2020
5e4419e
removed debug statement
arw2019 Jul 6, 2020
1e694c9
fixed naming error in test
arw2019 Jul 7, 2020
d5e1a3b
remove type coercion block
arw2019 Jul 7, 2020
91947c5
added missing values handling for _GroupBy.get_group method
arw2019 Jul 7, 2020
ce80f7c
updated indices for case dropna=True
arw2019 Jul 7, 2020
5c992d2
cleaned up syntax
arw2019 Jul 7, 2020
15215fe
cleaned up syntax
arw2019 Jul 7, 2020
30c2fb5
removed print statements
arw2019 Jul 7, 2020
0746a76
_transform_general: add a check that we don't accidentally upcast
arw2019 Jul 7, 2020
d4316cd
_transform_general: add int32, float32 to upcasting check
arw2019 Jul 7, 2020
7a43155
rewrite for loop as list comprehension
arw2019 Jul 7, 2020
2bd5885
rewrote if statement as dict comp + ternary
arw2019 Jul 7, 2020
550985f
fixed small bug in list comp in groupby/groupby.py
arw2019 Jul 7, 2020
57a8da4
deleted debug statement in groupby/groupby.py
arw2019 Jul 7, 2020
c1e7bce
rewrite _get_index using next_iter to set default value
arw2019 Jul 7, 2020
62f52b8
update expected test_groupby_nat_exclude for new missing values handling
arw2019 Jul 7, 2020
292fcdc
merge with master
arw2019 Jul 7, 2020
09b9f69
merge with master
arw2019 Jul 7, 2020
ef3c199
remove print statement
arw2019 Jul 7, 2020
9e4ac71
Merge remote-tracking branch 'upstream/master'
arw2019 Jul 8, 2020
1d0ba61
merge with master
arw2019 Jul 8, 2020
b55021d
removed xfail tests
arw2019 Jul 8, 2020
349c614
merge with master
arw2019 Jul 9, 2020
f658d5a
download from remote
arw2019 Jul 9, 2020
15c1c33
reworked solution
arw2019 Jul 9, 2020
657c13b
fixed PEP8 issue
arw2019 Jul 9, 2020
70e3a19
run pre-commit checks
arw2019 Jul 9, 2020
5ace6ac
styling fix
arw2019 Jul 9, 2020
90e9b6a
update whatsnew + styling improvements
arw2019 Jul 9, 2020
80842a1
merge with master
arw2019 Jul 11, 2020
2c80131
merge with master
arw2019 Jul 11, 2020
9127b8f
pull from arw2019:CI-MacPython-fails-to_json-large_nums
arw2019 Jul 11, 2020
be05575
add read_json tests
arw2019 Jul 11, 2020
319ae66
Fixups
TomAugspurger Jul 14, 2020
7a78da5
fixed git mistake
arw2019 Jul 14, 2020
33fd7ed
merge with master
arw2019 Jul 14, 2020
128173c
minimize diff
arw2019 Jul 14, 2020
fc5bce6
fix input to test
arw2019 Jul 14, 2020
1f2fa9e
xfail
TomAugspurger Jul 15, 2020
72a612f
fixup
TomAugspurger Jul 15, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -992,6 +992,7 @@ Missing
 - :meth:`DataFrame.interpolate` uses the correct axis convention now. Previously, interpolating along columns led to interpolation along indices and vice versa. Furthermore, interpolating with methods ``pad``, ``ffill``, ``bfill`` and ``backfill`` is identical to using these methods with :meth:`fillna` (:issue:`12918`, :issue:`29146`)
 - Bug in :meth:`DataFrame.interpolate` where calling it on a DataFrame with string column names raised a ``ValueError``. The method is now independent of the type of column names (:issue:`33956`)
 - Passing :class:`NA` into a format string using format specs will now work. For example ``"{:.1f}".format(pd.NA)`` would previously raise a ``ValueError``, but will now return the string ``"<NA>"`` (:issue:`34740`)
+- Bug in :meth:`SeriesGroupBy.transform` where missing values were not handled correctly when ``dropna=False`` (:issue:`35014`)

 MultiIndex
 ^^^^^^^^^^
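
The `"{:.1f}".format(pd.NA)` entry in the whatsnew hunk depends on Python's `__format__` protocol. A minimal sketch of one way a missing-value singleton can accept format specs, assuming it falls back to its plain repr whenever the spec is not valid for a string; `MissingValue` is a hypothetical stand-in, not pandas' own `NA` implementation:

```python
class MissingValue:
    """Hypothetical stand-in for a missing-value singleton such as pd.NA."""

    def __repr__(self) -> str:
        return "<NA>"

    def __format__(self, format_spec: str) -> str:
        # First try applying the spec to the repr string (handles width/alignment).
        try:
            return format(repr(self), format_spec)
        except ValueError:
            # Numeric specs such as ".1f" are invalid for str; fall back to the repr.
            return repr(self)


NA = MissingValue()

print("{:.1f}".format(NA))  # prints "<NA>"
print("{:>6}".format(NA))   # prints "  <NA>" (padded to width 6)
```

Delegating to `str.__format__` first keeps alignment and width specs working, while numeric specs degrade to the bare repr instead of raising.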
4 changes: 3 additions & 1 deletion pandas/core/groupby/generic.py
@@ -548,8 +548,10 @@ def _transform_general(
         # we will only try to coerce the result type if
         # we have a numeric dtype, as these are *always* user-defined funcs
         # the cython take a different path (and casting)
+        # make sure we don't accidentally upcast (GH35014)
+        types = ["bool", "int32", "int64", "float32", "float64"]
         dtype = self._selected_obj.dtype
-        if is_numeric_dtype(dtype):
+        if is_numeric_dtype(dtype) and types.index(dtype) < types.index(result.dtype):
             result = maybe_downcast_to_dtype(result, dtype)

         result.name = self._selected_obj.name
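
The new condition only casts the transformed result back to the column's dtype when that dtype ranks strictly below the result's dtype in the ordered list `types`. A standalone sketch of that comparison on dtype *names*, assuming plain strings instead of NumPy dtypes; `should_downcast` is a hypothetical helper, and it treats names missing from the list as "leave the result alone", whereas `list.index` raises `ValueError` for a value that is not in the list:

```python
# Ordered from narrowest to widest, mirroring the `types` list in the patch.
TYPES = ["bool", "int32", "int64", "float32", "float64"]


def should_downcast(orig_dtype: str, result_dtype: str) -> bool:
    """Return True only if casting the result back to orig_dtype narrows it.

    Hypothetical helper: unknown dtype names return False instead of
    letting list.index raise ValueError.
    """
    if orig_dtype not in TYPES or result_dtype not in TYPES:
        return False
    return TYPES.index(orig_dtype) < TYPES.index(result_dtype)


# A float64 column whose transform produced int64 values: no cast back to float64.
print(should_downcast("float64", "int64"))  # False
# An int64 column whose transform produced float64 values: narrowing is allowed.
print(should_downcast("int64", "float64"))  # True
```

In the added test, column `B` holds floats (it contains `None`) while `transform(len)` yields integers, so under this rule the integer result is not cast back to `float64`.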
8 changes: 6 additions & 2 deletions pandas/core/groupby/groupby.py
@@ -54,6 +54,7 @@ class providing the base-class of operations.
 )
 from pandas.core.dtypes.missing import isna, notna

+import pandas as pd
 from pandas.core import nanops
 import pandas.core.algorithms as algorithms
 from pandas.core.arrays import Categorical, DatetimeArray
@@ -624,7 +625,10 @@ def _get_index(self, name):
         """
         Safe get index, translate keys for datelike to underlying repr.
         """
-        return self._get_indices([name])[0]
+        if isna(name):
+            return self._get_indices([pd.NaT])[0]
+        else:
+            return self._get_indices([name])[0]

     @cache_readonly
     def _selected_obj(self):
@@ -802,7 +806,7 @@ def get_group(self, name, obj=None):
         if obj is None:
             obj = self._selected_obj

-        inds = self._get_index(name)
+        inds = self._get_index(pd.NaT) if pd.isna(name) else self._get_index(name)
         if not len(inds):
             raise KeyError(name)

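
Both `_get_index` and `get_group` now route any missing key to one canonical key (`pd.NaT`) before the lookup. One reason this matters: NaN is not equal to itself, so a freshly created NaN cannot be found as a dict key. A stdlib-only sketch of the idea, assuming a plain dict of group positions; `MISSING`, `is_missing` and `lookup_group` are hypothetical names, with `MISSING` playing the role `pd.NaT` plays in the patch:

```python
import math

# Hypothetical sentinel standing in for pd.NaT as the single "missing" key.
MISSING = object()


def is_missing(key) -> bool:
    """Treat None and float NaN as missing (a small subset of what pd.isna covers)."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def lookup_group(indices: dict, key) -> list:
    """Return the row positions for key, routing any missing key to MISSING."""
    canonical = MISSING if is_missing(key) else key
    positions = indices.get(canonical, [])
    if not positions:
        # Like get_group: an empty result means the group does not exist.
        raise KeyError(key)
    return positions


# Group positions for keys [0, 0, 1, NaN], with the NaN group stored under MISSING.
indices = {0: [0, 1], 1: [2], MISSING: [3]}

print(float("nan") == float("nan"))         # False: NaN never equals itself
print(lookup_group(indices, float("nan")))  # [3]
print(lookup_group(indices, None))          # [3]
print(lookup_group(indices, 0))             # [0, 1]
```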
12 changes: 11 additions & 1 deletion pandas/core/groupby/grouper.py
@@ -20,6 +20,7 @@
 )
 from pandas.core.dtypes.generic import ABCSeries

+import pandas as pd
 import pandas.core.algorithms as algorithms
 from pandas.core.arrays import Categorical, ExtensionArray
 import pandas.core.common as com
@@ -558,7 +559,16 @@ def indices(self):
             return self.grouper.indices

         values = Categorical(self.grouper)
-        return values._reverse_indexer()
+
+        # GH35014
+        reverse_indexer = values._reverse_indexer()
+        if not self.dropna and any(pd.isna(v) for v in values):
+            return {
+                **reverse_indexer,
+                pd.NaT: np.array([i for i, v in enumerate(values) if pd.isna(v)]),
+            }
+        else:
+            return reverse_indexer

     @property
     def codes(self) -> np.ndarray:
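
When `dropna=False`, the patched `Grouping.indices` returns the reverse indexer plus one extra entry, keyed by `pd.NaT`, collecting the positions of all missing values; missing values are not categories of the `Categorical`, so they would otherwise have no entry. A stdlib-only sketch of that construction, assuming plain lists instead of a `Categorical` and lists of positions instead of NumPy arrays; `build_indices` and `MISSING` are hypothetical:

```python
import math

MISSING = object()  # hypothetical stand-in for the pd.NaT key used in the patch


def is_missing(v) -> bool:
    """Treat None and float NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def build_indices(values: list, dropna: bool) -> dict:
    """Map each non-missing label to its row positions; optionally add a MISSING bucket."""
    reverse_indexer: dict = {}
    for i, v in enumerate(values):
        if not is_missing(v):
            reverse_indexer.setdefault(v, []).append(i)

    missing_positions = [i for i, v in enumerate(values) if is_missing(v)]
    if not dropna and missing_positions:
        # Same shape as the patch: copy the existing mapping and add one extra key.
        return {**reverse_indexer, MISSING: missing_positions}
    return reverse_indexer


values = [0.0, 0.0, 1.0, float("nan")]
print(build_indices(values, dropna=True))            # {0.0: [0, 1], 1.0: [2]}
print(build_indices(values, dropna=False)[MISSING])  # [3]
```

Unlike the patch, which scans `values` once for `any(...)` and again for the positions, this sketch collects the missing positions in a single pass and reuses the list as the emptiness check.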
21 changes: 21 additions & 0 deletions pandas/tests/groupby/test_groupby_dropna.py
@@ -162,6 +162,27 @@ def test_groupby_dropna_series_by(dropna, expected):
     tm.assert_series_equal(result, expected)


+def test_slice_groupby_then_transform():
+    # GH35014
+
+    df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
+    gb = df.groupby("A", dropna=False)
+
+    res = gb.transform(len)
+    expected = pd.DataFrame({"B": [2, 2, 1, 1]})
+    tm.assert_frame_equal(res, expected)
+
+    gb_slice = gb[["B"]]
+    res = gb_slice.transform(len)
+    expected = pd.DataFrame({"B": [2, 2, 1, 1]})
+    tm.assert_frame_equal(res, expected)
+
+    gb_slice = gb["B"]
+    res = gb["B"].transform(len)
+    expected = pd.Series(data=[2, 2, 1, 1], name="B")
+    tm.assert_series_equal(res, expected)
+
+
 @pytest.mark.parametrize(
     "dropna, tuples, outputs",
     [
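
The expected values in `test_slice_groupby_then_transform` follow from treating the missing key in `A` as its own group under `dropna=False`: rows 0 and 1 share key `0` (size 2), row 2 is alone under key `1`, and row 3 forms the missing-key group (size 1). A stdlib-only sketch of that `transform(len)` computation, assuming all `None`/NaN keys collapse into a single group; `transform_len` and `MISSING` are hypothetical, not pandas APIs:

```python
import math
from collections import Counter

MISSING = object()  # hypothetical label for the single missing-key group


def normalize(key):
    """Collapse None and float NaN into one group label, as dropna=False requires."""
    if key is None or (isinstance(key, float) and math.isnan(key)):
        return MISSING
    return key


def transform_len(keys: list) -> list:
    """Broadcast each row's group size back to that row, like transform(len)."""
    labels = [normalize(k) for k in keys]
    sizes = Counter(labels)
    return [sizes[label] for label in labels]


print(transform_len([0, 0, 1, None]))  # [2, 2, 1, 1]
```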
20 changes: 14 additions & 6 deletions pandas/tests/io/json/test_pandas.py
@@ -1250,23 +1250,31 @@ def test_to_json_large_numbers(self, bigNum):
         json = series.to_json()
         expected = '{"articleId":' + str(bigNum) + "}"
         assert json == expected
-        # GH 20599
+
+        df = DataFrame(bigNum, dtype=object, index=["articleId"], columns=[0])
+        json = df.to_json()
+        expected = '{"0":{"articleId":' + str(bigNum) + "}}"
+        assert json == expected
+
+    @pytest.mark.parametrize("bigNum", [2 ** 64 + 1, -(2 ** 64 + 2)])
+    def test_read_json_large_numbers(self, bigNum):
+        # GH20599
+
+        series = Series(bigNum, dtype=object, index=["articleId"])
+        json = '{"articleId":' + str(bigNum) + "}"
         with pytest.raises(ValueError):
             json = StringIO(json)
             result = read_json(json)
             tm.assert_series_equal(series, result)

         df = DataFrame(bigNum, dtype=object, index=["articleId"], columns=[0])
-        json = df.to_json()
-        expected = '{"0":{"articleId":' + str(bigNum) + "}}"
-        assert json == expected
-        # GH 20599
+        json = '{"0":{"articleId":' + str(bigNum) + "}}"
         with pytest.raises(ValueError):
             json = StringIO(json)
             result = read_json(json)
             tm.assert_frame_equal(df, result)

-    def test_read_json_large_numbers(self):
+    def test_read_json_large_numbers2(self):
         # GH18842
         json = '{"articleId": "1404366058080022500245"}'
         json = StringIO(json)
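
Both values passed to `test_read_json_large_numbers`, `2 ** 64 + 1` and `-(2 ** 64 + 2)`, fall outside the signed and the unsigned 64-bit integer ranges, so a parser limited to 64-bit integers cannot represent them. The stdlib sketch below checks that and rebuilds the compact JSON strings the tests compare against; `fits_in_64_bits` is a hypothetical helper, and Python's own `json` module, unlike the `read_json` call the test expects to raise `ValueError`, parses arbitrarily large integers exactly:

```python
import json

INT64_MIN = -(2 ** 63)
UINT64_MAX = 2 ** 64 - 1


def fits_in_64_bits(n: int) -> bool:
    """True if n is representable as int64 or uint64 (hypothetical helper)."""
    return INT64_MIN <= n <= UINT64_MAX


for big_num in (2 ** 64 + 1, -(2 ** 64 + 2)):
    # Same construction as `expected` in test_to_json_large_numbers.
    expected = '{"articleId":' + str(big_num) + "}"
    # Compact separators reproduce the no-space layout the test expects from to_json.
    assert json.dumps({"articleId": big_num}, separators=(",", ":")) == expected
    # Python's json module round-trips big integers exactly...
    assert json.loads(expected)["articleId"] == big_num
    # ...but neither value fits in 64 bits.
    print(big_num, fits_in_64_bits(big_num))
```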