
Plotting Int64 columns with nulled integers (NAType) fails #32073 #32387


Closed (wanted to merge 28 commits)

Commits:
05ab972
BUG: fixes unhandled NAType when plotting (#32073)
jeandersonbc Feb 28, 2020
c7756b1
BUG: fixes unhandled NAType when plotting (#32073)
jeandersonbc Feb 28, 2020
0e738b0
Fixed bad formatting
jeandersonbc Mar 1, 2020
26d3297
REF: collect+parametrize reorder_levels tests (#32373)
jbrockmendel Mar 2, 2020
0ff6b96
TST: Allow definition of `pd.CategoricalDtype` with a specific `categ…
rushabh-v Mar 2, 2020
64f76e9
TYP: annotations for internals, set_axis (#32376)
jbrockmendel Mar 2, 2020
70b840b
misplaced DataFrame.join test (#32375)
jbrockmendel Mar 2, 2020
8262397
DOC: Fixed ES01, PR07, SA04 error in pandas.core.groupby.DataFrameGro…
Iqrar99 Mar 2, 2020
d2d14b7
BUG: Pickle NA objects (#32104)
TomAugspurger Mar 2, 2020
ba7895e
DOC: Fix SA04 errors in docstrings #28792 (#32182)
AdrianMastronardi Mar 3, 2020
f2f8605
CLN: remove _igetitem_cache (#32319)
jbrockmendel Mar 3, 2020
6a48be6
Avoid unnecessary values_from_object (#32398)
jbrockmendel Mar 3, 2020
bc4c189
ENH: infer freq in timedelta_range (#32377)
jbrockmendel Mar 3, 2020
6023860
BUG: 2D DTA/TDA arithmetic with object-dtype (#32185)
jbrockmendel Mar 3, 2020
66422c4
TST: broken off from #32187 (#32258)
jbrockmendel Mar 3, 2020
7210810
REF: simplify PeriodIndex._shallow_copy (#32280)
jbrockmendel Mar 3, 2020
e262a71
CLN: setitem_with_indexer cleanups (#32341)
jbrockmendel Mar 3, 2020
143faa0
BUG: None / Timedelta incorrectly returning NaT (#32340)
jbrockmendel Mar 3, 2020
d2413f9
TST: Using more fixtures in of tests/base/test_ops.py (#32313)
SaturnFromTitan Mar 3, 2020
0a7ebfd
CLN: remove unused values from interpolate call (#32400)
jbrockmendel Mar 3, 2020
3f9b4e8
CLN: some code cleanups to pandas/_libs/missing.pyx (#32367)
ShaharNaveh Mar 3, 2020
0d41a23
BUG: fixes bug when using sep=None and comment keyword for read_csv (…
s-scherrer Mar 3, 2020
a57de43
Don't create _join_functions (#32336)
dsaxton Mar 3, 2020
5759ad9
API: replace() should raise an exception if invalid argument is given…
a-y-khan Mar 3, 2020
1c5b03f
BUG: Fix __ne__ comparison for Categorical (#32304)
dsaxton Mar 3, 2020
b03a910
CLN: clean-up show_versions and consistently use null for json output…
simonjayhawkins Mar 3, 2020
1973ddb
Add missing newline (#32404)
Mar 3, 2020
9f71755
Added simple test case
jeandersonbc Mar 3, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.2.rst
Original file line number Diff line number Diff line change
@@ -74,6 +74,7 @@ Bug fixes
**I/O**

- Using ``pd.NA`` with :meth:`DataFrame.to_json` now correctly outputs a null value instead of an empty object (:issue:`31615`)
- Fixed pickling of ``pandas.NA``. Previously a new object was returned, which broke computations relying on ``NA`` being a singleton (:issue:`31847`)
- Fixed bug in parquet roundtrip with nullable unsigned integer dtypes (:issue:`31896`).

**Experimental dtypes**
6 changes: 6 additions & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -66,6 +66,7 @@ Other enhancements
- :class:`Styler` may now render CSS more efficiently where multiple cells have the same styling (:issue:`30876`)
- When writing directly to a sqlite connection :func:`to_sql` now supports the ``multi`` method (:issue:`29921`)
- `OptionError` is now exposed in `pandas.errors` (:issue:`27553`)
- :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`)
-

.. ---------------------------------------------------------------------------
@@ -198,6 +199,7 @@ Categorical

- Bug where :func:`merge` was unable to join on non-unique categorical indices (:issue:`28189`)
- Bug when passing categorical data to :class:`Index` constructor along with ``dtype=object`` incorrectly returning a :class:`CategoricalIndex` instead of object-dtype :class:`Index` (:issue:`32167`)
- Bug where :class:`Categorical` comparison operator ``__ne__`` would incorrectly evaluate to ``False`` when either element was missing (:issue:`32276`)
-

Datetimelike
@@ -213,6 +215,7 @@ Timedelta
^^^^^^^^^

- Bug in constructing a :class:`Timedelta` with a high precision integer that would round the :class:`Timedelta` components (:issue:`31354`)
- Bug in dividing ``np.nan`` or ``None`` by :class:`Timedelta`` incorrectly returning ``NaT`` (:issue:`31869`)
-

Timezones
@@ -295,8 +298,10 @@ I/O
- Bug in :meth:`DataFrame.to_parquet` overwriting pyarrow's default for
``coerce_timestamps``; following pyarrow's default allows writing nanosecond
timestamps with ``version="2.0"`` (:issue:`31652`).
- Bug in :meth:`read_csv` was raising `TypeError` when `sep=None` was used in combination with `comment` keyword (:issue:`31396`)
- Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`)


Plotting
^^^^^^^^

@@ -322,6 +327,7 @@ Reshaping
- Bug in :func:`crosstab` when inputs are two Series and have tuple names, the output will keep dummy MultiIndex as columns. (:issue:`18321`)
- :meth:`DataFrame.pivot` can now take lists for ``index`` and ``columns`` arguments (:issue:`21425`)
- Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
- :meth:`DataFrame.replace` and :meth:`Series.replace` will raise a ``TypeError`` if ``to_replace`` is not an expected type. Previously the ``replace`` would fail silently (:issue:`18634`)


Sparse
4 changes: 1 addition & 3 deletions pandas/_libs/lib.pyx
@@ -100,11 +100,9 @@ def values_from_object(obj: object):
"""
func: object

if getattr(obj, '_typ', '') == 'dataframe':
return obj.values

func = getattr(obj, '_internal_get_values', None)
if func is not None:
# Includes DataFrame, for which we get frame.values
obj = func()

return obj
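The simplified Cython helper above now relies on a single duck-typed hook instead of a special `_typ == 'dataframe'` branch. In plain Python terms the dispatch reduces to the following sketch (`FakeFrame` is a hypothetical stand-in, not a pandas class):

```python
def values_from_object(obj):
    # Any object exposing _internal_get_values (DataFrame included, for
    # which the hook returns frame.values) is unwrapped; everything else
    # passes through unchanged.
    func = getattr(obj, "_internal_get_values", None)
    if func is not None:
        obj = func()
    return obj


class FakeFrame:
    """Hypothetical stand-in for an object implementing the hook."""

    def _internal_get_values(self):
        return [1, 2, 3]
```

The design point of the cleanup: one `getattr` probe replaces two type checks, because DataFrame's own hook already returns `frame.values`.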
33 changes: 15 additions & 18 deletions pandas/_libs/missing.pyx
@@ -10,10 +10,13 @@ cnp.import_array()

cimport pandas._libs.util as util

from pandas._libs.tslibs.np_datetime cimport (
get_timedelta64_value, get_datetime64_value)

from pandas._libs.tslibs.np_datetime cimport get_datetime64_value, get_timedelta64_value
from pandas._libs.tslibs.nattype cimport (
checknull_with_nat, c_NaT as NaT, is_null_datetimelike)
c_NaT as NaT,
checknull_with_nat,
is_null_datetimelike,
)
from pandas._libs.ops_dispatch import maybe_dispatch_ufunc_to_dunder_op

from pandas.compat import is_platform_32bit
@@ -44,7 +47,7 @@ cpdef bint checknull(object val):

Returns
-------
result : bool
bool

Notes
-----
@@ -223,7 +226,7 @@ def isnaobj2d_old(arr: ndarray) -> ndarray:

Returns
-------
result : ndarray (dtype=np.bool_)
ndarray (dtype=np.bool_)

Notes
-----
@@ -248,17 +251,11 @@ def isnaobj2d_old(arr: ndarray) -> ndarray:


def isposinf_scalar(val: object) -> bool:
if util.is_float_object(val) and val == INF:
return True
else:
return False
return util.is_float_object(val) and val == INF


def isneginf_scalar(val: object) -> bool:
if util.is_float_object(val) and val == NEGINF:
return True
else:
return False
return util.is_float_object(val) and val == NEGINF


cdef inline bint is_null_datetime64(v):
@@ -364,6 +361,9 @@ class NAType(C_NAType):
exponent = 31 if is_32bit else 61
return 2 ** exponent - 1

def __reduce__(self):
return "NA"

# Binary arithmetic and comparison ops -> propagate

__add__ = _create_binary_propagating_op("__add__")
@@ -423,7 +423,6 @@ class NAType(C_NAType):
return NA
elif isinstance(other, np.ndarray):
return np.where(other == 1, other, NA)

return NotImplemented

# Logical ops using Kleene logic
@@ -433,8 +432,7 @@
return False
elif other is True or other is C_NA:
return NA
else:
return NotImplemented
return NotImplemented

__rand__ = __and__

@@ -443,8 +441,7 @@
return True
elif other is False or other is C_NA:
return NA
else:
return NotImplemented
return NotImplemented

__ror__ = __or__

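The `__and__`/`__or__` branches above implement Kleene three-valued logic. They can be summarized in plain Python, with `None` standing in for `NA` (an assumption for illustration; pandas uses its own singleton):

```python
def kleene_and(a, b):
    # False wins outright; otherwise any missing operand poisons the result
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return a and b


def kleene_or(a, b):
    # True wins outright; otherwise any missing operand poisons the result
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return a or b
```

The asymmetry is the whole point: `False & NA` is definitely `False` regardless of what `NA` hides, while `True & NA` stays unknown, and dually for `|`.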
10 changes: 9 additions & 1 deletion pandas/_libs/tslibs/timedeltas.pyx
@@ -1407,15 +1407,23 @@ class Timedelta(_Timedelta):
# convert to Timedelta below
pass

elif util.is_nan(other):
# i.e. np.nan or np.float64("NaN")
raise TypeError("Cannot divide float by Timedelta")

elif hasattr(other, 'dtype'):
if other.dtype.kind == "O":
# GH#31869
return np.array([x / self for x in other])
return other / self.to_timedelta64()

elif not _validate_ops_compat(other):
return NotImplemented

other = Timedelta(other)
if other is NaT:
return NaT
# In this context we treat NaT as timedelta-like
return np.nan
return float(other.value) / self.value

def __floordiv__(self, other):
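The new branches above change what each kind of numerator produces when divided by a `Timedelta` (GH 31869): a NaN float now raises, while `None` is treated as timedelta-like `NaT` and yields a missing float. A hedged pure-Python sketch of the dispatch order, with illustrative names rather than the pandas internals:

```python
import math
from datetime import timedelta

NAT = object()  # stand-in for pandas NaT (assumption for illustration)


def rdiv_by_timedelta(other, td):
    """Divide `other` by a timedelta, mirroring the fixed dispatch order."""
    if isinstance(other, float) and math.isnan(other):
        # float("nan") / Timedelta now raises instead of returning NaT
        raise TypeError("Cannot divide float by Timedelta")
    if other is None or other is NAT:
        # In this context None/NaT is timedelta-like -> missing float result
        return math.nan
    if isinstance(other, timedelta):
        return other / td
    return NotImplemented
```

The subtlety the fix addresses: a float numerator is dimensionless, so "float / timedelta" has no sensible unit and must error, whereas a missing *timedelta* numerator legitimately produces a missing *ratio* (`nan`), not a missing timestamp (`NaT`).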
4 changes: 2 additions & 2 deletions pandas/_testing.py
@@ -742,9 +742,9 @@ def repr_class(x):
raise_assert_detail(obj, msg, repr_class(left), repr_class(right))


def assert_attr_equal(attr, left, right, obj="Attributes"):
def assert_attr_equal(attr: str, left, right, obj: str = "Attributes"):
"""
checks attributes are equal. Both objects must have attribute.
Check attributes are equal. Both objects must have attribute.

Parameters
----------
2 changes: 1 addition & 1 deletion pandas/_typing.py
@@ -64,7 +64,7 @@
Label = Optional[Hashable]
Level = Union[Label, int]
Ordered = Optional[bool]
JSONSerializable = Union[PythonScalar, List, Dict]
JSONSerializable = Optional[Union[PythonScalar, List, Dict]]
Axes = Collection

# For functions like rename that convert one label to another
24 changes: 24 additions & 0 deletions pandas/conftest.py
@@ -1047,6 +1047,16 @@ def series_with_simple_index(indices):
for dtype in _narrow_dtypes
}


@pytest.fixture(params=_narrow_series.keys())
def narrow_series(request):
"""
Fixture for Series with low precision data types
"""
# copy to avoid mutation, e.g. setting .name
return _narrow_series[request.param].copy()


_index_or_series_objs = {**indices_dict, **_series, **_narrow_series}


@@ -1057,3 +1067,17 @@ def index_or_series_obj(request):
copy to avoid mutation, e.g. setting .name
"""
return _index_or_series_objs[request.param].copy(deep=True)


@pytest.fixture
def multiindex_year_month_day_dataframe_random_data():
"""
DataFrame with 3 level MultiIndex (year, month, day) covering
first 100 business days from 2000-01-01 with random data
"""
tdf = tm.makeTimeDataFrame(100)
ymd = tdf.groupby([lambda x: x.year, lambda x: x.month, lambda x: x.day]).sum()
# use Int64Index, to make sure things work
ymd.index.set_levels([lev.astype("i8") for lev in ymd.index.levels], inplace=True)
ymd.index.set_names(["year", "month", "day"], inplace=True)
return ymd
5 changes: 4 additions & 1 deletion pandas/core/arrays/categorical.py
@@ -103,7 +103,10 @@ def func(self, other):
mask = (self._codes == -1) | (other_codes == -1)
if mask.any():
# In other series, the leads to False, so do that here too
ret[mask] = False
if opname == "__ne__":
ret[(self._codes == -1) & (other_codes == -1)] = True
else:
ret[mask] = False
return ret

if is_scalar(other):
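The `__ne__` special case above (GH 32276) can be stated directly on the integer codes, where `-1` marks a missing value: `!=` against a missing value should be `True`, matching other pandas dtypes. A simplified sketch (helper name is ours, not the pandas API):

```python
def categorical_ne(codes_a, codes_b):
    """Elementwise !=, treating code -1 as a missing category."""
    result = []
    for a, b in zip(codes_a, codes_b):
        if a == -1 or b == -1:
            # Any missing operand makes "not equal" evaluate True.
            # Before the fix, both-missing positions fell into the
            # generic ret[mask] = False branch and came out False.
            result.append(True)
        else:
            result.append(a != b)
    return result
```

Note why the diff only patches the both-missing case: when exactly one side is `-1`, the raw code comparison `a != b` is already `True`, so only positions where both codes are `-1` need the explicit override.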
21 changes: 8 additions & 13 deletions pandas/core/arrays/datetimelike.py
@@ -42,6 +42,7 @@
from pandas.core.algorithms import checked_add_with_arr, take, unique1d, value_counts
from pandas.core.arrays.base import ExtensionArray, ExtensionOpsMixin
import pandas.core.common as com
from pandas.core.construction import array, extract_array
from pandas.core.indexers import check_array_indexer
from pandas.core.ops.common import unpack_zerodim_and_defer
from pandas.core.ops.invalid import invalid_comparison, make_invalid_op
@@ -623,7 +624,7 @@ def astype(self, dtype, copy=True):
dtype = pandas_dtype(dtype)

if is_object_dtype(dtype):
return self._box_values(self.asi8)
return self._box_values(self.asi8.ravel()).reshape(self.shape)
elif is_string_dtype(dtype) and not is_categorical_dtype(dtype):
return self._format_native_types()
elif is_integer_dtype(dtype):
@@ -1256,19 +1257,13 @@ def _addsub_object_array(self, other: np.ndarray, op):
PerformanceWarning,
)

# For EA self.astype('O') returns a numpy array, not an Index
left = self.astype("O")
# Caller is responsible for broadcasting if necessary
assert self.shape == other.shape, (self.shape, other.shape)

res_values = op(left, np.array(other))
kwargs = {}
if not is_period_dtype(self):
kwargs["freq"] = "infer"
try:
res = type(self)._from_sequence(res_values, **kwargs)
except ValueError:
# e.g. we've passed a Timestamp to TimedeltaArray
res = res_values
return res
res_values = op(self.astype("O"), np.array(other))
result = array(res_values.ravel())
result = extract_array(result, extract_numpy=True).reshape(self.shape)
return result

def _time_shift(self, periods, freq=None):
"""
12 changes: 10 additions & 2 deletions pandas/core/arrays/timedeltas.py
@@ -39,6 +39,7 @@
from pandas.core.algorithms import checked_add_with_arr
from pandas.core.arrays import datetimelike as dtl
import pandas.core.common as com
from pandas.core.construction import extract_array

from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Tick
@@ -141,8 +142,7 @@ def dtype(self):
# Constructors

def __init__(self, values, dtype=_TD_DTYPE, freq=None, copy=False):
if isinstance(values, (ABCSeries, ABCIndexClass)):
values = values._values
values = extract_array(values)

inferred_freq = getattr(values, "_freq", None)

@@ -258,6 +258,10 @@ def _generate_range(cls, start, end, periods, freq, closed=None):
index = _generate_regular_range(start, end, periods, freq)
else:
index = np.linspace(start.value, end.value, periods).astype("i8")
if len(index) >= 2:
# Infer a frequency
td = Timedelta(index[1] - index[0])
freq = to_offset(td)

if not left_closed:
index = index[1:]
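The frequency-inference block above (GH 32377) works because `np.linspace` over the endpoint nanosecond values produces a constant step, so the freq is fully determined by the first difference. The arithmetic reduces to this stdlib-only sketch (helper name is ours):

```python
def inferred_step_ns(start_ns, end_ns, periods):
    # With start, end, and periods all given, the spacing is fixed, so a
    # frequency can be inferred from a single step. Assumes the span
    # divides evenly into (periods - 1) steps, as in a regular range.
    if periods < 2:
        return None  # a single point carries no spacing information
    return (end_ns - start_ns) // (periods - 1)
```

In the actual diff this step is wrapped in `Timedelta(...)` and `to_offset(...)` to turn the raw nanosecond delta into a proper freq object.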
@@ -614,6 +618,10 @@ def __floordiv__(self, other):
if self.freq is not None:
# Note: freq gets division, not floor-division
freq = self.freq / other
if freq.nanos == 0 and self.freq.nanos != 0:
# e.g. if self.freq is Nano(1) then dividing by 2
# rounds down to zero
freq = None
return type(self)(result.view("m8[ns]"), freq=freq)

if not hasattr(other, "dtype"):
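The `nanos` guard in `__floordiv__` above avoids attaching a zero-duration frequency to the result. Sketched on raw nanosecond counts (an assumption for illustration: real pandas freq objects carry more than a nanosecond count):

```python
def divided_freq_nanos(freq_nanos, divisor):
    # e.g. a 1-nanosecond freq floor-divided by 2 would round down to
    # zero, which is not a representable frequency, so the result drops
    # the freq (None) rather than claim a bogus zero-duration one.
    new = freq_nanos // divisor
    if new == 0 and freq_nanos != 0:
        return None
    return new
```

The `freq_nanos != 0` clause keeps an already-absent frequency distinct from a frequency that was rounded away.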
13 changes: 8 additions & 5 deletions pandas/core/base.py
@@ -1035,7 +1035,8 @@ def argmin(self, axis=None, skipna=True, *args, **kwargs):

See Also
--------
numpy.ndarray.argmin
numpy.ndarray.argmin : Return indices of the minimum values along
the given axis.
"""
nv.validate_minmax_axis(axis)
nv.validate_argmax_with_skipna(skipna, args, kwargs)
@@ -1055,7 +1056,8 @@ def tolist(self):

See Also
--------
numpy.ndarray.tolist
numpy.ndarray.tolist : Return the array as an a.ndim-levels deep
nested list of Python scalars.
"""
if not isinstance(self._values, np.ndarray):
# check for ndarray instead of dtype to catch DTA/TDA
@@ -1402,7 +1404,8 @@ def memory_usage(self, deep=False):

See Also
--------
numpy.ndarray.nbytes
numpy.ndarray.nbytes : Total bytes consumed by the elements of the
array.

Notes
-----
@@ -1473,8 +1476,8 @@ def factorize(self, sort=False, na_sentinel=-1):

See Also
--------
sort_values
numpy.searchsorted
sort_values : Sort by the values along either axis.
numpy.searchsorted : Similar method from NumPy.

Notes
-----
2 changes: 1 addition & 1 deletion pandas/core/common.py
@@ -122,7 +122,7 @@ def is_bool_indexer(key: Any) -> bool:
is_array_like(key) and is_extension_array_dtype(key.dtype)
):
if key.dtype == np.object_:
key = np.asarray(values_from_object(key))
key = np.asarray(key)

if not lib.is_bool_array(key):
na_msg = "Cannot mask with non-boolean array containing NA / NaN values"
6 changes: 4 additions & 2 deletions pandas/core/computation/eval.py
@@ -266,8 +266,10 @@ def eval(

See Also
--------
DataFrame.query
DataFrame.eval
DataFrame.query : Evaluates a boolean expression to query the columns
of a frame.
DataFrame.eval : Evaluate a string describing operations on
DataFrame columns.

Notes
-----