Commit b3559c3

Merge pull request #254 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 6267900 + 4f0e2e9 commit b3559c3

57 files changed: +1086 -760 lines changed

.pre-commit-config.yaml

+4 -3
@@ -9,7 +9,7 @@ repos:
 - id: absolufy-imports
 files: ^pandas/
 - repo: https://github.com/python/black
-rev: 21.6b0
+rev: 21.7b0
 hooks:
 - id: black
 - repo: https://github.com/codespell-project/codespell
@@ -44,6 +44,7 @@ repos:
 - flake8-bugbear==21.3.2
 - pandas-dev-flaker==0.2.0
 - id: flake8
+alias: flake8-cython
 name: flake8 (cython)
 types: [cython]
 args: [--append-config=flake8/cython.cfg]
@@ -53,11 +54,11 @@ repos:
 types: [text]
 args: [--append-config=flake8/cython-template.cfg]
 - repo: https://github.com/PyCQA/isort
-rev: 5.9.2
+rev: 5.9.3
 hooks:
 - id: isort
 - repo: https://github.com/asottile/pyupgrade
-rev: v2.21.0
+rev: v2.23.3
 hooks:
 - id: pyupgrade
 args: [--py38-plus]

ci/deps/actions-38-db.yaml

+1 -1
@@ -15,7 +15,7 @@ dependencies:
 - beautifulsoup4
 - botocore>=1.11
 - dask
-- fastparquet>=0.4.0, < 0.7.0
+- fastparquet>=0.4.0
 - fsspec>=0.7.4, <2021.6.0
 - gcsfs>=0.6.0
 - geopandas

ci/deps/azure-windows-38.yaml

+1 -1
@@ -15,7 +15,7 @@ dependencies:
 # pandas dependencies
 - blosc
 - bottleneck
-- fastparquet>=0.4.0, <0.7.0
+- fastparquet>=0.4.0
 - flask
 - fsspec>=0.8.0, <2021.6.0
 - matplotlib=3.3.2

doc/source/user_guide/duplicates.rst

+1
@@ -28,6 +28,7 @@ duplicates present. The output can't be determined, and so pandas raises.

 .. ipython:: python
 :okexcept:
+:okwarning:

 s1 = pd.Series([0, 1, 2], index=["a", "b", "b"])
 s1.reindex(["a", "b", "c"])
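For readers who want to see why this docs block is marked :okexcept:, the snippet from the hunk can be run on its own. This is only a sketch: the warnings filter and the `message` variable are illustrative additions, and the idea that :okwarning: is needed because of the Index.reindex deprecation noted in the v1.4.0 whatsnew is an assumption, not something the diff states.

```python
import warnings

import pandas as pd

# Same objects as in the documentation example: the label "b" appears twice.
s1 = pd.Series([0, 1, 2], index=["a", "b", "b"])

with warnings.catch_warnings():
    # Silence any deprecation warning so only the exception is observed
    # (assumption: some pandas versions warn here, others do not).
    warnings.simplefilter("ignore")
    try:
        s1.reindex(["a", "b", "c"])
    except ValueError as err:
        # pandas cannot decide which "b" row to use, so it raises.
        message = str(err)
    else:
        message = None

print(message)
```

The exact wording of the error differs between pandas versions, so the sketch only relies on the exception type.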

doc/source/whatsnew/v1.3.2.rst

+1 -1
@@ -44,7 +44,7 @@ Bug fixes

 Other
 ~~~~~
--
+- :meth:`pandas.read_parquet` now supports reading nullable dtypes with ``fastparquet`` versions above 0.7.1.
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst

+3 -1
@@ -34,7 +34,7 @@ Other enhancements
 - :meth:`Series.sample`, :meth:`DataFrame.sample`, and :meth:`.GroupBy.sample` now accept a ``np.random.Generator`` as input to ``random_state``. A generator will be more performant, especially with ``replace=False`` (:issue:`38100`)
 - Additional options added to :meth:`.Styler.bar` to control alignment and display, with keyword only arguments (:issue:`26070`, :issue:`36419`)
 - :meth:`Styler.bar` now validates the input argument ``width`` and ``height`` (:issue:`42511`)
-- Add keyword ``levels`` to :meth:`.Styler.hide_index` for optionally controlling hidden levels in a MultiIndex (:issue:`25475`)
+- Add keyword ``level`` to :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` for optionally controlling hidden levels in a MultiIndex (:issue:`25475`)
 - :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
 - Added ``sparse_index`` and ``sparse_columns`` keyword arguments to :meth:`.Styler.to_html` (:issue:`41946`)
 - Added keyword argument ``environment`` to :meth:`.Styler.to_latex` also allowing a specific "longtable" entry with a separate jinja2 template (:issue:`41866`)
@@ -162,6 +162,7 @@ Deprecations
 - Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (:issue:`42351`)
 - Creating an empty Series without a dtype will now raise a more visible ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`30017`)
 - Deprecated the 'kind' argument in :meth:`Index.get_slice_bound`, :meth:`Index.slice_indexer`, :meth:`Index.slice_locs`; in a future version passing 'kind' will raise (:issue:`42857`)
+- Deprecated :meth:`Index.reindex` with a non-unique index (:issue:`42568`)
 -

 .. ---------------------------------------------------------------------------
@@ -232,6 +233,7 @@ Interval
 Indexing
 ^^^^^^^^
 - Bug in :meth:`DataFrame.truncate` and :meth:`Series.truncate` when the object's Index has a length greater than one but only one unique value (:issue:`42365`)
+- Bug in :meth:`Series.loc` and :meth:`DataFrame.loc` with a :class:`MultiIndex` when indexing with a tuple in which one of the levels is also a tuple (:issue:`27591`)
 - Bug in :meth:`Series.loc` when with a :class:`MultiIndex` whose first level contains only ``np.nan`` values (:issue:`42055`)
 - Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` when passing a string, the return type depended on whether the index was monotonic (:issue:`24892`)
 - Bug in indexing on a :class:`MultiIndex` failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (:issue:`42476`)
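As a hedged illustration of the first context line in the "Other enhancements" hunk, the sketch below passes a ``np.random.Generator`` to ``Series.sample``. It assumes a pandas version that includes this enhancement; the data, the seed and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: ten integers with the default RangeIndex.
s = pd.Series(range(10))

# A seeded Generator is passed directly as random_state.
sample = s.sample(n=3, replace=False, random_state=np.random.default_rng(42))

# A fresh Generator with the same seed reproduces the same draw.
repeat = s.sample(n=3, replace=False, random_state=np.random.default_rng(42))

print(sample.equals(repeat))
```

Because each call receives its own freshly seeded generator, the two draws should match; reusing one generator object across calls would instead advance its state between draws.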

environment.yml

+1 -1
@@ -99,7 +99,7 @@ dependencies:
 - xlwt
 - odfpy

-- fastparquet>=0.4.0, <0.7.0 # pandas.read_parquet, DataFrame.to_parquet
+- fastparquet>=0.4.0 # pandas.read_parquet, DataFrame.to_parquet
 - pyarrow>=0.17.0 # pandas.read_parquet, DataFrame.to_parquet, pandas.read_feather, DataFrame.to_feather
 - python-snappy # required by pyarrow

flake8/cython.cfg

+15 -1
@@ -1,3 +1,17 @@
 [flake8]
 filename = *.pyx,*.pxd
-select=E501,E302,E203,E111,E114,E221,E303,E128,E231,E126,E265,E305,E301,E127,E261,E271,E129,W291,E222,E241,E123,F403,C400,C401,C402,C403,C404,C405,C406,C407,C408,C409,C410,C411
+extend_ignore=
+# whitespace before '('
+E211,
+# missing whitespace around operator
+E225,
+# missing whitespace around arithmetic operator
+E226,
+# missing whitespace around bitwise or shift operator
+E227,
+# ambiguous variable name (# FIXME maybe this one can be fixed)
+E741,
+# invalid syntax
+E999,
+# invalid escape sequence (# FIXME maybe this one can be fixed)
+W605,

pandas/_libs/algos.pyx

+12 -6
@@ -274,16 +274,22 @@ cdef inline numeric kth_smallest_c(numeric* arr, Py_ssize_t k, Py_ssize_t n) nog
 j = m

 while 1:
-while arr[i] < x: i += 1
-while x < arr[j]: j -= 1
+while arr[i] < x:
+i += 1
+while x < arr[j]:
+j -= 1
 if i <= j:
 swap(&arr[i], &arr[j])
-i += 1; j -= 1
+i += 1
+j -= 1

-if i > j: break
+if i > j:
+break

-if j < k: l = i
-if k < i: m = j
+if j < k:
+l = i
+if k < i:
+m = j
 return arr[k]

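
The loop reformatted in this hunk is a quickselect-style partition. The pure-Python sketch below mirrors the control flow visible in the diff so the effect of each branch is easier to follow. It is an illustration only, not the pandas implementation: the function name `kth_smallest`, the defensive copy, and the initial `lo`/`hi` bounds (which lie outside the hunk) are assumptions.

```python
def kth_smallest(values, k):
    """Return the k-th smallest element (0-based) of ``values``."""
    arr = list(values)  # work on a copy; the Cython code partitions in place
    lo, hi = 0, len(arr) - 1  # assumed starting bounds, not shown in the hunk

    while lo < hi:
        pivot = arr[k]
        i, j = lo, hi

        while True:
            # Advance i past elements smaller than the pivot.
            while arr[i] < pivot:
                i += 1
            # Move j back past elements larger than the pivot.
            while pivot < arr[j]:
                j -= 1
            if i <= j:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
                j -= 1

            if i > j:
                break

        # Keep only the side of the partition that still contains index k.
        if j < k:
            lo = i
        if k < i:
            hi = j
    return arr[k]


print(kth_smallest([7, 2, 9, 4, 1], 2))
```

Splitting the one-line `while ...: i += 1` statements onto separate lines, as the commit does, leaves this logic unchanged; it only satisfies the flake8 checks now applied to Cython files.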

pandas/_libs/khash.pxd

+4 -4
@@ -26,20 +26,20 @@ cdef extern from "khash_python.h":
 double imag

 bint are_equivalent_khcomplex128_t \
-"kh_complex_hash_equal" (khcomplex128_t a, khcomplex128_t b) nogil
+"kh_complex_hash_equal" (khcomplex128_t a, khcomplex128_t b) nogil

 ctypedef struct khcomplex64_t:
 float real
 float imag

 bint are_equivalent_khcomplex64_t \
-"kh_complex_hash_equal" (khcomplex64_t a, khcomplex64_t b) nogil
+"kh_complex_hash_equal" (khcomplex64_t a, khcomplex64_t b) nogil

 bint are_equivalent_float64_t \
-"kh_floats_hash_equal" (float64_t a, float64_t b) nogil
+"kh_floats_hash_equal" (float64_t a, float64_t b) nogil

 bint are_equivalent_float32_t \
-"kh_floats_hash_equal" (float32_t a, float32_t b) nogil
+"kh_floats_hash_equal" (float32_t a, float32_t b) nogil

 uint32_t kh_python_hash_func(object key)
 bint kh_python_hash_equal(object a, object b)

pandas/_libs/lib.pyx

+3 -3
@@ -2107,9 +2107,9 @@ cpdef bint is_interval_array(ndarray values):
 return False
 elif numeric:
 if not (
-util.is_float_object(val.left)
-or util.is_integer_object(val.left)
-):
+util.is_float_object(val.left)
+or util.is_integer_object(val.left)
+):
 # i.e. datetime64 or timedelta64
 return False
 elif td64:

pandas/_libs/parsers.pyx

+3 -3
@@ -356,7 +356,7 @@ cdef class TextReader:
 thousands=None, # bytes | str
 dtype=None,
 usecols=None,
-on_bad_lines = ERROR,
+on_bad_lines=ERROR,
 bint na_filter=True,
 na_values=None,
 na_fvalues=None,
@@ -1442,7 +1442,7 @@ cdef _categorical_convert(parser_t *parser, int64_t col,

 if na_filter:
 if kh_get_str_starts_item(na_hashset, word):
-# is in NA values
+# is in NA values
 na_count += 1
 codes[i] = NA
 continue
@@ -1578,7 +1578,7 @@ cdef inline int _try_double_nogil(parser_t *parser,
 strcasecmp(word, cposinfty) == 0):
 data[0] = INF
 elif (strcasecmp(word, cneginf) == 0 or
-strcasecmp(word, cneginfty) == 0 ):
+strcasecmp(word, cneginfty) == 0):
 data[0] = NEGINF
 else:
 return 1

pandas/_libs/tslibs/conversion.pyx

+2 -2
@@ -200,7 +200,7 @@ cdef inline int64_t get_datetime64_nanos(object val) except? -1:

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def ensure_datetime64ns(arr: ndarray, copy: bool=True):
+def ensure_datetime64ns(arr: ndarray, copy: bool = True):
 """
 Ensure a np.datetime64 array has dtype specifically 'datetime64[ns]'

@@ -260,7 +260,7 @@ def ensure_datetime64ns(arr: ndarray, copy: bool=True):
 return result


-def ensure_timedelta64ns(arr: ndarray, copy: bool=True):
+def ensure_timedelta64ns(arr: ndarray, copy: bool = True):
 """
 Ensure a np.timedelta64 array has dtype specifically 'timedelta64[ns]'

pandas/_libs/tslibs/np_datetime.pyx

+1 -1
@@ -38,7 +38,7 @@ cdef extern from "src/datetime/np_datetime.h":
 void pandas_timedelta_to_timedeltastruct(npy_timedelta val,
 NPY_DATETIMEUNIT fr,
 pandas_timedeltastruct *result
-) nogil
+) nogil

 npy_datetimestruct _NS_MIN_DTS, _NS_MAX_DTS

pandas/_libs/tslibs/offsets.pyx

+2 -2
@@ -1454,7 +1454,7 @@ cdef class BusinessHour(BusinessMixin):

 def __init__(
 self, n=1, normalize=False, start="09:00", end="17:00", offset=timedelta(0)
-):
+):
 BusinessMixin.__init__(self, n, normalize, offset)

 # must be validated here to equality check
@@ -3897,7 +3897,7 @@ cdef ndarray[int64_t] _shift_bdays(const int64_t[:] i8other, int periods):
 return result.base


-def shift_month(stamp: datetime, months: int, day_opt: object=None) -> datetime:
+def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetime:
 """
 Given a datetime (or Timestamp) `stamp`, an integer `months` and an
 option `day_opt`, return a new datetimelike that many months later,

pandas/_libs/tslibs/period.pyx

+1 -1
@@ -197,7 +197,7 @@ cdef freq_conv_func get_asfreq_func(int from_freq, int to_freq) nogil:
 return <freq_conv_func>asfreq_BtoW
 elif to_group == FR_BUS:
 return <freq_conv_func>no_op
-elif to_group in [FR_DAY, FR_HR, FR_MIN, FR_SEC, FR_MS, FR_US, FR_NS]:
+elif to_group in [FR_DAY, FR_HR, FR_MIN, FR_SEC, FR_MS, FR_US, FR_NS]:
 return <freq_conv_func>asfreq_BtoDT
 else:
 return <freq_conv_func>nofunc

pandas/_libs/tslibs/strptime.pyx

+4 -4
@@ -199,17 +199,17 @@ def array_strptime(ndarray[object] values, object fmt, bint exact=True, errors='
 year = int(found_dict['Y'])
 elif parse_code == 2:
 month = int(found_dict['m'])
-elif parse_code == 3:
 # elif group_key == 'B':
+elif parse_code == 3:
 month = locale_time.f_month.index(found_dict['B'].lower())
-elif parse_code == 4:
 # elif group_key == 'b':
+elif parse_code == 4:
 month = locale_time.a_month.index(found_dict['b'].lower())
-elif parse_code == 5:
 # elif group_key == 'd':
+elif parse_code == 5:
 day = int(found_dict['d'])
-elif parse_code == 6:
 # elif group_key == 'H':
+elif parse_code == 6:
 hour = int(found_dict['H'])
 elif parse_code == 7:
 hour = int(found_dict['I'])

pandas/_libs/tslibs/timedeltas.pyx

+2 -1
@@ -641,7 +641,8 @@ def _binary_op_method_timedeltalike(op, name):
 return NaT

 elif is_datetime64_object(other) or (
-PyDateTime_Check(other) and not isinstance(other, ABCTimestamp)):
+PyDateTime_Check(other) and not isinstance(other, ABCTimestamp)
+):
 # this case is for a datetime object that is specifically
 # *not* a Timestamp, as the Timestamp case will be
 # handled after `_validate_ops_compat` returns False below

pandas/_libs/tslibs/timestamps.pyx

+1 -1
@@ -1958,7 +1958,7 @@ default 'raise'
 self.second / 3600.0 +
 self.microsecond / 3600.0 / 1e+6 +
 self.nanosecond / 3600.0 / 1e+9
-) / 24.0)
+) / 24.0)


 # Aliases

pandas/_testing/__init__.py

+1 -1
@@ -219,7 +219,7 @@ def box_expected(expected, box_cls, transpose=True):
 else:
 expected = pd.array(expected)
 elif box_cls is Index:
-expected = Index(expected)
+expected = Index._with_infer(expected)
 elif box_cls is Series:
 expected = Series(expected)
 elif box_cls is DataFrame:

pandas/core/arrays/categorical.py

+4 -1
@@ -2031,7 +2031,9 @@ def _validate_listlike(self, value):
 from pandas import Index

 # tupleize_cols=False for e.g. test_fillna_iterable_category GH#41914
-to_add = Index(value, tupleize_cols=False).difference(self.categories)
+to_add = Index._with_infer(value, tupleize_cols=False).difference(
+self.categories
+)

 # no assignments of values not in categories, but it's always ok to set
 # something to np.nan
@@ -2741,6 +2743,7 @@ def factorize_from_iterable(values) -> tuple[np.ndarray, Index]:
 # as values but its codes are by def [0, ..., len(n_categories) - 1]
 cat_codes = np.arange(len(values.categories), dtype=values.codes.dtype)
 cat = Categorical.from_codes(cat_codes, dtype=values.dtype)
+
 categories = CategoricalIndex(cat)
 codes = values.codes
 else:

pandas/core/arrays/interval.py

+7 -1
@@ -16,7 +16,10 @@

 from pandas._config import get_option

-from pandas._libs import NaT
+from pandas._libs import (
+NaT,
+lib,
+)
 from pandas._libs.interval import (
 VALID_CLOSED,
 Interval,
@@ -225,6 +228,9 @@ def __new__(
 left, right, infer_closed = intervals_to_interval_bounds(
 data, validate_closed=closed is None
 )
+if left.dtype == object:
+left = lib.maybe_convert_objects(left)
+right = lib.maybe_convert_objects(right)
 closed = closed or infer_closed

 return cls._simple_new(

pandas/core/dtypes/dtypes.py

+1 -1
@@ -529,7 +529,7 @@ def validate_categories(categories, fastpath: bool = False) -> Index:
 f"Parameter 'categories' must be list-like, was {repr(categories)}"
 )
 elif not isinstance(categories, ABCIndex):
-categories = Index(categories, tupleize_cols=False)
+categories = Index._with_infer(categories, tupleize_cols=False)

 if not fastpath:

pandas/core/groupby/generic.py

+1 -1
@@ -455,7 +455,7 @@ def _get_index() -> Index:
 if self.grouper.nkeys > 1:
 index = MultiIndex.from_tuples(keys, names=self.grouper.names)
 else:
-index = Index(keys, name=self.grouper.names[0])
+index = Index._with_infer(keys, name=self.grouper.names[0])
 return index

 if isinstance(values[0], dict):

pandas/core/groupby/grouper.py

+1 -1
@@ -646,7 +646,7 @@ def group_index(self) -> Index:
 return self._group_index

 uniques = self._codes_and_uniques[1]
-return Index(uniques, name=self.name)
+return Index._with_infer(uniques, name=self.name)

 @cache_readonly
 def _codes_and_uniques(self) -> tuple[np.ndarray, ArrayLike]:
