Commit 03de609

Merge pull request #1 from pandas-dev/master
Pulling latest master branch
2 parents: 998e2ab + 02ab42f

59 files changed: +1066 −1033 lines

doc/source/user_guide/computation.rst (+12)

@@ -597,6 +597,18 @@ You can view other examples of ``BaseIndexer`` subclasses `here <https://github.

 .. versionadded:: 1.1

+One subclass of note within those examples is the ``NonFixedVariableWindowIndexer`` that allows
+rolling operations over a non-fixed offset like a ``BusinessDay``.
+
+.. ipython:: python
+
+    from pandas.api.indexers import NonFixedVariableWindowIndexer
+    df = pd.DataFrame(range(10), index=pd.date_range('2020', periods=10))
+    offset = pd.offsets.BDay(1)
+    indexer = NonFixedVariableWindowIndexer(index=df.index, offset=offset)
+    df
+    df.rolling(indexer).sum()
+
 For some problems knowledge of the future is available for analysis. For example, this occurs when
 each data point is a full time series read from an experiment, and the task is to extract underlying
 conditions. In these cases it can be useful to perform forward-looking rolling window computations.
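For reference, the indexer added above uses this PR's working name; in the released pandas 1.1 the class appears to have shipped as ``VariableOffsetWindowIndexer`` (same issue, :issue:`34994`). A minimal sketch against the released API:

```python
import pandas as pd
from pandas.api.indexers import VariableOffsetWindowIndexer

# Ten daily points starting Wed 2020-01-01; BDay(1) windows skip weekends,
# so Saturday/Sunday rows fold into the same window as the preceding Friday.
df = pd.DataFrame({"x": range(10)}, index=pd.date_range("2020", periods=10))
indexer = VariableOffsetWindowIndexer(index=df.index, offset=pd.offsets.BDay(1))
result = df.rolling(indexer).sum()
```

The only difference from the diff above is the class name; the ``index=``/``offset=`` keywords are unchanged.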

doc/source/whatsnew/v1.1.0.rst (+12)

@@ -13,6 +13,15 @@ including other versions of pandas.
 Enhancements
 ~~~~~~~~~~~~

+.. _whatsnew_110.specify_missing_labels:
+
+KeyErrors raised by loc specify missing labels
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Previously, if labels were missing for a ``loc`` call, a ``KeyError`` was raised stating that this was no longer supported.
+
+Now the error message also includes a list of the missing labels (max 10 items, display width 80 characters). See :issue:`34272`.
+
+
 .. _whatsnew_110.astype_string:

 All dtypes can now be converted to ``StringDtype``
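The new ``loc`` error message can be exercised like this (a sketch; the exact wording varies across pandas versions, so only the missing labels themselves are checked):

```python
import pandas as pd

s = pd.Series(range(3), index=["a", "b", "c"])

msg = ""
try:
    # "a" exists, "d" and "e" do not -> KeyError
    s.loc[["a", "d", "e"]]
except KeyError as err:
    msg = str(err)

# The message now names the labels that were not found ("d" and "e"),
# rather than only saying that list-likes with missing labels are unsupported.
```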
@@ -318,6 +327,7 @@ Other enhancements
 - :meth:`DataFrame.cov` and :meth:`Series.cov` now support a new parameter ``ddof`` to support delta degrees of freedom as in the corresponding numpy methods (:issue:`34611`).
 - :meth:`DataFrame.to_html` and :meth:`DataFrame.to_string`'s ``col_space`` parameter now accepts a list or dict to change only some specific columns' width (:issue:`28917`).
 - :meth:`DataFrame.to_excel` can now also write OpenOffice spreadsheet (.ods) files (:issue:`27222`)
+- :meth:`~Series.explode` now accepts ``ignore_index`` to reset the index, similarly to :meth:`pd.concat` or :meth:`DataFrame.sort_values` (:issue:`34932`).

 .. ---------------------------------------------------------------------------
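The ``ignore_index`` flag added to ``explode`` behaves like its ``concat``/``sort_values`` counterparts, replacing the repeated source index with a fresh ``RangeIndex``:

```python
import pandas as pd

s = pd.Series([[1, 2], [3, 4]])

# Without ignore_index the index repeats: 0, 0, 1, 1.
out = s.explode(ignore_index=True)  # index is reset to 0..3
```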
@@ -663,6 +673,7 @@ Other API changes
 - ``loc`` lookups with an object-dtype :class:`Index` and an integer key will now raise ``KeyError`` instead of ``TypeError`` when key is missing (:issue:`31905`)
 - Using a :func:`pandas.api.indexers.BaseIndexer` with ``count``, ``min``, ``max``, ``median``, ``skew``, ``cov``, ``corr`` will now return correct results for any monotonic :func:`pandas.api.indexers.BaseIndexer` descendant (:issue:`32865`)
 - Added a :func:`pandas.api.indexers.FixedForwardWindowIndexer` class to support forward-looking windows during ``rolling`` operations.
+- Added a :func:`pandas.api.indexers.NonFixedVariableWindowIndexer` class to support ``rolling`` operations with non-fixed offsets (:issue:`34994`)
 - Added :class:`pandas.errors.InvalidIndexError` (:issue:`34570`).
 - :meth:`DataFrame.swaplevels` now raises a ``TypeError`` if the axis is not a :class:`MultiIndex`.
   Previously an ``AttributeError`` was raised (:issue:`31126`)
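The ``FixedForwardWindowIndexer`` mentioned above is the simplest of these ``BaseIndexer`` subclasses: each window starts at the current row and looks ``window_size`` rows ahead. A short example (mirroring the one in the pandas documentation):

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
indexer = FixedForwardWindowIndexer(window_size=2)

# Each window covers the current row and the next one,
# so the sums are [1+2, 2+3, 3+4, 4+5, 5].
out = df.rolling(indexer, min_periods=1).sum()
```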
@@ -1030,6 +1041,7 @@ I/O
 - Bug in :meth:`read_excel` for ODS files removes 0.0 values (:issue:`27222`)
 - Bug in :meth:`ujson.encode` was raising an ``OverflowError`` with numbers larger than sys.maxsize (:issue:`34395`)
 - Bug in :meth:`HDFStore.append_to_multiple` was raising a ``ValueError`` when the min_itemsize parameter is set (:issue:`11238`)
+- :meth:`read_json` can now read a line-delimited JSON file from a file URL when ``lines`` and ``chunksize`` are set.

 Plotting
 ^^^^^^^^
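The ``lines``/``chunksize`` combination referenced in the ``read_json`` entry returns an iterator of DataFrames rather than one frame. A sketch using an in-memory buffer in place of the file URL (the fix above concerns URL sources specifically, but the reader API is the same):

```python
import io
import pandas as pd

# Three records of line-delimited (NDJSON) data.
data = '{"a": 1}\n{"a": 2}\n{"a": 3}\n'

# With chunksize set, read_json yields DataFrames of up to 2 rows each.
reader = pd.read_json(io.StringIO(data), lines=True, chunksize=2)
chunks = list(reader)
```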

environment.yml (−1)

@@ -37,7 +37,6 @@ dependencies:
   # Dask and its dependencies (that dont install with dask)
   - dask-core
   - toolz>=0.7.3
-  - fsspec>=0.5.1
   - partd>=0.3.10
   - cloudpickle>=0.2.1

pandas/_libs/reduction.pyx (+1 −173)

@@ -1,17 +1,12 @@
 from copy import copy

 from cython import Py_ssize_t
-from cpython.ref cimport Py_INCREF

 from libc.stdlib cimport malloc, free

 import numpy as np
 cimport numpy as cnp
-from numpy cimport (ndarray,
-                    int64_t,
-                    PyArray_SETITEM,
-                    PyArray_ITER_NEXT, PyArray_ITER_DATA, PyArray_IterNew,
-                    flatiter)
+from numpy cimport ndarray, int64_t
 cnp.import_array()

 from pandas._libs cimport util

@@ -26,146 +21,6 @@ cdef _check_result_array(object obj, Py_ssize_t cnt):
         raise ValueError('Function does not reduce')


-cdef class Reducer:
-    """
-    Performs generic reduction operation on a C or Fortran-contiguous ndarray
-    while avoiding ndarray construction overhead
-    """
-    cdef:
-        Py_ssize_t increment, chunksize, nresults
-        object dummy, f, labels, typ, ityp, index
-        ndarray arr
-
-    def __init__(
-        self, ndarray arr, object f, int axis=1, object dummy=None, object labels=None
-    ):
-        cdef:
-            Py_ssize_t n, k
-
-        n, k = (<object>arr).shape
-
-        if axis == 0:
-            if not arr.flags.f_contiguous:
-                arr = arr.copy('F')
-
-            self.nresults = k
-            self.chunksize = n
-            self.increment = n * arr.dtype.itemsize
-        else:
-            if not arr.flags.c_contiguous:
-                arr = arr.copy('C')
-
-            self.nresults = n
-            self.chunksize = k
-            self.increment = k * arr.dtype.itemsize
-
-        self.f = f
-        self.arr = arr
-        self.labels = labels
-        self.dummy, self.typ, self.index, self.ityp = self._check_dummy(
-            dummy=dummy)
-
-    cdef _check_dummy(self, object dummy=None):
-        cdef:
-            object index = None, typ = None, ityp = None
-
-        if dummy is None:
-            dummy = np.empty(self.chunksize, dtype=self.arr.dtype)
-
-            # our ref is stolen later since we are creating this array
-            # in cython, so increment first
-            Py_INCREF(dummy)
-
-        else:
-
-            # we passed a Series
-            typ = type(dummy)
-            index = dummy.index
-            dummy = dummy.values
-
-            if dummy.dtype != self.arr.dtype:
-                raise ValueError('Dummy array must be same dtype')
-            if len(dummy) != self.chunksize:
-                raise ValueError(f'Dummy array must be length {self.chunksize}')
-
-        return dummy, typ, index, ityp
-
-    def get_result(self):
-        cdef:
-            char* dummy_buf
-            ndarray arr, result, chunk
-            Py_ssize_t i
-            flatiter it
-            object res, name, labels
-            object cached_typ = None
-
-        arr = self.arr
-        chunk = self.dummy
-        dummy_buf = chunk.data
-        chunk.data = arr.data
-        labels = self.labels
-
-        result = np.empty(self.nresults, dtype='O')
-        it = <flatiter>PyArray_IterNew(result)
-        reduction_success = True
-
-        try:
-            for i in range(self.nresults):
-
-                # create the cached type
-                # each time just reassign the data
-                if i == 0:
-
-                    if self.typ is not None:
-                        # In this case, we also have self.index
-                        name = labels[i]
-                        cached_typ = self.typ(
-                            chunk, index=self.index, name=name, dtype=arr.dtype)
-
-                # use the cached_typ if possible
-                if cached_typ is not None:
-                    # In this case, we also have non-None labels
-                    name = labels[i]
-
-                    object.__setattr__(
-                        cached_typ._mgr._block, 'values', chunk)
-                    object.__setattr__(cached_typ, 'name', name)
-                    res = self.f(cached_typ)
-                else:
-                    res = self.f(chunk)
-
-                # TODO: reason for not squeezing here?
-                extracted_res = _extract_result(res, squeeze=False)
-                if i == 0:
-                    # On the first pass, we check the output shape to see
-                    # if this looks like a reduction.
-                    # If it does not, return the computed value to be used by the
-                    # pure python implementation,
-                    # so the function won't be called twice on the same object,
-                    # and side effects would occur twice
-                    try:
-                        _check_result_array(extracted_res, len(self.dummy))
-                    except ValueError as err:
-                        if "Function does not reduce" not in str(err):
-                            # catch only the specific exception
-                            raise
-
-                        reduction_success = False
-                        PyArray_SETITEM(result, PyArray_ITER_DATA(it), copy(res))
-                        break
-
-                PyArray_SETITEM(result, PyArray_ITER_DATA(it), extracted_res)
-                chunk.data = chunk.data + self.increment
-                PyArray_ITER_NEXT(it)
-
-        finally:
-            # so we don't free the wrong memory
-            chunk.data = dummy_buf
-
-        result = maybe_convert_objects(result)
-        return result, reduction_success
-
-
 cdef class _BaseGrouper:
     cdef _check_dummy(self, object dummy):
         # both values and index must be an ndarray!

@@ -610,30 +465,3 @@ cdef class BlockSlider:
             # axis=1 is the frame's axis=0
             arr.data = self.base_ptrs[i]
             arr.shape[1] = 0
-
-
-def compute_reduction(arr: ndarray, f, axis: int = 0, dummy=None, labels=None):
-    """
-
-    Parameters
-    -----------
-    arr : np.ndarray
-    f : function
-    axis : integer axis
-    dummy : type of reduced output (series)
-    labels : Index or None
-    """
-
-    # We either have both dummy and labels, or neither of them
-    if (labels is None) ^ (dummy is None):
-        raise ValueError("Must pass either dummy and labels, or neither")
-
-    if labels is not None:
-        # Caller is responsible for ensuring we don't have MultiIndex
-        assert labels.nlevels == 1
-
-        # pass as an ndarray/ExtensionArray
-        labels = labels._values
-
-    reducer = Reducer(arr, f, axis=axis, dummy=dummy, labels=labels)
-    return reducer.get_result()
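For readers skimming the deletion above: the removed ``compute_reduction``/``Reducer`` pair applied a user function to each column of a contiguous 2-D array (reusing a single buffer for speed) and reported whether the function genuinely reduced every column to a scalar. A plain-NumPy sketch of that contract, without the buffer tricks (``compute_reduction_sketch`` is a hypothetical name, not a pandas API):

```python
import numpy as np

def compute_reduction_sketch(arr: np.ndarray, f):
    """Apply f to each column of a 2-D array; flag whether f reduced to scalars."""
    results = []
    for j in range(arr.shape[1]):
        res = f(arr[:, j])
        if np.ndim(res) != 0:
            # f did not reduce this column; hand the raw result back
            # so the caller can fall back to a pure-Python path.
            return res, False
        results.append(res)
    return np.asarray(results), True

# Column sums of [[0, 1], [2, 3], [4, 5]] are [6, 9].
vals, ok = compute_reduction_sketch(np.arange(6.0).reshape(3, 2), np.sum)
```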

pandas/_libs/tslibs/conversion.pyx (+1 −1)

@@ -77,7 +77,7 @@ cdef inline int64_t cast_from_unit(object ts, str unit) except? -1:
     return <int64_t>(base * m) + <int64_t>(frac * m)


-cpdef inline object precision_from_unit(str unit):
+cpdef inline (int64_t, int) precision_from_unit(str unit):
     """
     Return a casting of the unit represented to nanoseconds + the precision
     to round the fractional part.
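The new C tuple return type makes explicit that ``precision_from_unit`` yields a (nanoseconds-per-unit, rounding-precision) pair, which ``cast_from_unit`` above splits into integer and fractional parts. A pure-Python sketch of that casting step (the multiplier table follows the standard datetime64 unit definitions and is an assumption, not the exact pandas internals):

```python
# Nanoseconds per unit, as used when casting scalars like 1.5 seconds to int64 ns.
NS_PER_UNIT = {
    "D": 86_400 * 10**9, "h": 3_600 * 10**9, "m": 60 * 10**9,
    "s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1,
}

def cast_from_unit_sketch(ts: float, unit: str) -> int:
    """Cast a fractional amount of `unit` to integer nanoseconds."""
    m = NS_PER_UNIT[unit]
    base = int(ts)          # whole units
    frac = ts - base        # fractional remainder, scaled separately
    return base * m + int(round(frac * m))
```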

pandas/_libs/tslibs/fields.pyx (+4 −4)

@@ -91,7 +91,7 @@ def build_field_sarray(const int64_t[:] dtindex):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def get_date_name_field(const int64_t[:] dtindex, object field, object locale=None):
+def get_date_name_field(const int64_t[:] dtindex, str field, object locale=None):
     """
     Given a int64-based datetime index, return array of strings of date
     name based on requested field (e.g. day_name)

@@ -141,7 +141,7 @@ def get_date_name_field(const int64_t[:] dtindex, object field, object locale=No

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def get_start_end_field(const int64_t[:] dtindex, object field,
+def get_start_end_field(const int64_t[:] dtindex, str field,
                         object freqstr=None, int month_kw=12):
     """
     Given an int64-based datetime index return array of indicators

@@ -386,7 +386,7 @@ def get_start_end_field(const int64_t[:] dtindex, object field,

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def get_date_field(const int64_t[:] dtindex, object field):
+def get_date_field(const int64_t[:] dtindex, str field):
     """
     Given a int64-based datetime index, extract the year, month, etc.,
     field and return an array of these values.

@@ -548,7 +548,7 @@ def get_date_field(const int64_t[:] dtindex, object field):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def get_timedelta_field(const int64_t[:] tdindex, object field):
+def get_timedelta_field(const int64_t[:] tdindex, str field):
     """
     Given a int64-based timedelta index, extract the days, hrs, sec.,
     field and return an array of these values.
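These tightened ``str field`` signatures are internal, but the functions back the public datetime accessors. For instance, ``DatetimeIndex.day_name`` ultimately dispatches to ``get_date_name_field`` with ``field="day_name"``:

```python
import pandas as pd

# 2020-01-01 was a Wednesday.
idx = pd.date_range("2020-01-01", periods=3, freq="D")
names = idx.day_name()  # internally: get_date_name_field(..., "day_name")
```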

pandas/_libs/tslibs/nattype.pyx (+10 −10)

@@ -50,23 +50,23 @@ _nat_scalar_rules[Py_GE] = False
 # ----------------------------------------------------------------------


-def _make_nan_func(func_name, doc):
+def _make_nan_func(func_name: str, doc: str):
     def f(*args, **kwargs):
         return np.nan
     f.__name__ = func_name
     f.__doc__ = doc
     return f


-def _make_nat_func(func_name, doc):
+def _make_nat_func(func_name: str, doc: str):
     def f(*args, **kwargs):
         return c_NaT
     f.__name__ = func_name
     f.__doc__ = doc
     return f


-def _make_error_func(func_name, cls):
+def _make_error_func(func_name: str, cls):
     def f(*args, **kwargs):
         raise ValueError(f"NaTType does not support {func_name}")

@@ -282,31 +282,31 @@ cdef class _NaT(datetime):
         return NPY_NAT

     @property
-    def is_leap_year(self):
+    def is_leap_year(self) -> bool:
         return False

     @property
-    def is_month_start(self):
+    def is_month_start(self) -> bool:
         return False

     @property
-    def is_quarter_start(self):
+    def is_quarter_start(self) -> bool:
         return False

     @property
-    def is_year_start(self):
+    def is_year_start(self) -> bool:
         return False

     @property
-    def is_month_end(self):
+    def is_month_end(self) -> bool:
         return False

     @property
-    def is_quarter_end(self):
+    def is_quarter_end(self) -> bool:
         return False

     @property
-    def is_year_end(self):
+    def is_year_end(self) -> bool:
         return False

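The ``-> bool`` annotations above only document existing behavior: every calendar predicate on ``NaT`` returns plain ``False`` rather than ``NaN`` or an error, so the properties stay safe in boolean contexts:

```python
import pandas as pd

# Each predicate on NaT is an actual Python bool, always False.
flags = [
    pd.NaT.is_leap_year,
    pd.NaT.is_month_start,
    pd.NaT.is_quarter_end,
    pd.NaT.is_year_end,
]
```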
pandas/_libs/tslibs/period.pyx (+2 −1)

@@ -14,6 +14,7 @@ import cython

 from cpython.datetime cimport (
     datetime,
+    tzinfo,
     PyDate_Check,
     PyDateTime_Check,
     PyDateTime_IMPORT,

@@ -1417,7 +1418,7 @@ def extract_freq(ndarray[object] values):

 @cython.wraparound(False)
 @cython.boundscheck(False)
-def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, object tz):
+def dt64arr_to_periodarr(const int64_t[:] stamps, int freq, tzinfo tz):
     cdef:
         Py_ssize_t n = len(stamps)
         int64_t[:] result = np.empty(n, dtype=np.int64)
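``dt64arr_to_periodarr`` is the internal routine behind datetime-to-period conversion; at the Python level the same path is reached through ``DatetimeIndex.to_period``:

```python
import pandas as pd

# Daily stamps straddling a month boundary collapse into monthly periods.
idx = pd.date_range("2020-01-30", periods=3, freq="D")
periods = idx.to_period("M")
```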
