Commit 5c61723

Merge remote-tracking branch 'upstream/master' into test-min_period
2 parents 0ad20e7 + 0a44a51

33 files changed: +1534 -1676 lines

.github/workflows/ci.yml (-4)

@@ -32,10 +32,6 @@ jobs:
       with:
         fetch-depth: 0

-    - name: Looking for unwanted patterns
-      run: ci/code_checks.sh patterns
-      if: always()
-
     - name: Cache conda
       uses: actions/cache@v2
       with:

.pre-commit-config.yaml (+22)

@@ -102,7 +102,29 @@ repos:
             # Incorrect code-block / IPython directives
             |\.\.\ code-block\ ::
             |\.\.\ ipython\ ::
+
+            # Check for deprecated messages without sphinx directive
+            |(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)
         types_or: [python, cython, rst]
+    -   id: incorrect-backticks
+        name: Check for backticks incorrectly rendering because of missing spaces
+        language: pygrep
+        entry: '[a-zA-Z0-9]\`\`?[a-zA-Z0-9]'
+        types: [rst]
+        files: ^doc/source/
+    -   id: seed-check-asv
+        name: Check for unnecessary random seeds in asv benchmarks
+        language: pygrep
+        entry: 'np\.random\.seed'
+        files: ^asv_bench/benchmarks
+        exclude: ^asv_bench/benchmarks/pandas_vb_common\.py
+    -   id: invalid-ea-testing
+        name: Check for invalid EA testing
+        language: pygrep
+        entry: 'tm\.assert_(series|frame)_equal'
+        files: ^pandas/tests/extension/base
+        types: [python]
+        exclude: ^pandas/tests/extension/base/base\.py
    -   id: pip-to-conda
        name: Generate pip dependency from conda
        description: This hook checks if the conda environment.yml and requirements-dev.txt are equal
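The three pygrep patterns added above can be sanity-checked outside pre-commit; this standalone sketch (not part of the commit) exercises them verbatim with Python's `re` module:

```python
import re

# The three patterns added above, copied verbatim from the hook entries
backticks = re.compile(r"[a-zA-Z0-9]\`\`?[a-zA-Z0-9]")
seeds = re.compile(r"np\.random\.seed")
ea_testing = re.compile(r"tm\.assert_(series|frame)_equal")

# A backtick glued to surrounding text renders incorrectly in reStructuredText
assert backticks.search("see the``DataFrame``docs")
assert not backticks.search("see the ``DataFrame`` docs")

# Seeds should not appear in asv benchmarks outside pandas_vb_common.py
assert seeds.search("np.random.seed(1234)")

# Extension-array base tests must not call tm.assert_*_equal directly
assert ea_testing.search("tm.assert_series_equal(result, expected)")
```

Because `language: pygrep` hooks fail when the pattern *matches*, each hook passes only on files where these searches come up empty.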

ci/code_checks.sh (+2 -25)

@@ -11,14 +11,13 @@
 # Usage:
 #   $ ./ci/code_checks.sh               # run all checks
 #   $ ./ci/code_checks.sh lint          # run linting only
-#   $ ./ci/code_checks.sh patterns      # check for patterns that should not exist
 #   $ ./ci/code_checks.sh code          # checks on imported code
 #   $ ./ci/code_checks.sh doctests      # run doctests
 #   $ ./ci/code_checks.sh docstrings    # validate docstring errors
 #   $ ./ci/code_checks.sh typing        # run static type analysis

-[[ -z "$1" || "$1" == "lint" || "$1" == "patterns" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
-    { echo "Unknown command $1. Usage: $0 [lint|patterns|code|doctests|docstrings|typing]"; exit 9999; }
+[[ -z "$1" || "$1" == "lint" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
+    { echo "Unknown command $1. Usage: $0 [lint|code|doctests|docstrings|typing]"; exit 9999; }

 BASE_DIR="$(dirname $0)/.."
 RET=0

@@ -58,28 +57,6 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then

 fi

-### PATTERNS ###
-if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
-
-    # Check for the following code in the extension array base tests: `tm.assert_frame_equal` and `tm.assert_series_equal`
-    MSG='Check for invalid EA testing' ; echo $MSG
-    invgrep -r -E --include '*.py' --exclude base.py 'tm.assert_(series|frame)_equal' pandas/tests/extension/base
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-    MSG='Check for deprecated messages without sphinx directive' ; echo $MSG
-    invgrep -R --include="*.py" --include="*.pyx" -E "(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)" pandas
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-    MSG='Check for backticks incorrectly rendering because of missing spaces' ; echo $MSG
-    invgrep -R --include="*.rst" -E "[a-zA-Z0-9]\`\`?[a-zA-Z0-9]" doc/source/
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-    MSG='Check for unnecessary random seeds in asv benchmarks' ; echo $MSG
-    invgrep -R --exclude pandas_vb_common.py -E 'np.random.seed' asv_bench/benchmarks/
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-fi
-
 ### CODE ###
 if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then
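The removed checks relied on `invgrep`, a pandas CI helper that inverts grep's exit status so a check passes only when the pattern is absent. A minimal Python sketch of that behaviour (a hypothetical stand-in, assuming the helper's semantics from its usage above):

```python
import re

def invgrep(pattern: str, lines) -> int:
    """Return 0 (success) if no line matches the pattern, 1 otherwise,
    printing offending lines like the shell helper does."""
    regex = re.compile(pattern)
    hits = [ln for ln in lines if regex.search(ln)]
    for ln in hits:
        print(ln)
    return 1 if hits else 0

# A check passes (exit 0) only on a clean tree
assert invgrep(r"np\.random\.seed", ["x = np.arange(10)"]) == 0
assert invgrep(r"np\.random\.seed", ["np.random.seed(42)"]) == 1
```

This is exactly the semantics the pre-commit pygrep hooks reproduce, which is why the shell section could be deleted.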

ci/deps/actions-38-db.yaml (+1 -1)

@@ -15,7 +15,7 @@ dependencies:
   - beautifulsoup4
   - botocore>=1.11
   - dask
-  - fastparquet>=0.4.0, < 0.7.0
+  - fastparquet>=0.4.0
   - fsspec>=0.7.4, <2021.6.0
   - gcsfs>=0.6.0
   - geopandas

ci/deps/azure-windows-38.yaml (+1 -1)

@@ -15,7 +15,7 @@ dependencies:
   # pandas dependencies
   - blosc
   - bottleneck
-  - fastparquet>=0.4.0, <0.7.0
+  - fastparquet>=0.4.0
   - flask
   - fsspec>=0.8.0, <2021.6.0
   - matplotlib=3.3.2

doc/source/user_guide/duplicates.rst (+1)

@@ -28,6 +28,7 @@ duplicates present. The output can't be determined, and so pandas raises.

 .. ipython:: python
     :okexcept:
+    :okwarning:

     s1 = pd.Series([0, 1, 2], index=["a", "b", "b"])
     s1.reindex(["a", "b", "c"])

doc/source/whatsnew/v1.3.2.rst (+3 -1)

@@ -32,7 +32,9 @@ Bug fixes
 ~~~~~~~~~
 - Bug in :meth:`pandas.read_excel` modifies the dtypes dictionary when reading a file with duplicate columns (:issue:`42462`)
 - 1D slices over extension types turn into N-dimensional slices over ExtensionArrays (:issue:`42430`)
+- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and ``window`` is an offset that covers all the rows (:issue:`42753`)
 - :meth:`.Styler.hide_columns` now hides the index name header row as well as column headers (:issue:`42101`)
+- :meth:`.Styler.set_sticky` has amended CSS to control the column/index names and ensure the correct sticky positions (:issue:`42537`)
 - Bug in de-serializing datetime indexes in PYTHONOPTIMIZED mode (:issue:`42866`)
 -

@@ -42,7 +44,7 @@ Bug fixes

 Other
 ~~~~~
--
+- :meth:`pandas.read_parquet` now supports reading nullable dtypes with ``fastparquet`` versions above 0.7.1.
 -

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst (+1)

@@ -162,6 +162,7 @@ Deprecations
 - Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (:issue:`42351`)
 - Creating an empty Series without a dtype will now raise a more visible ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`30017`)
 - Deprecated the 'kind' argument in :meth:`Index.get_slice_bound`, :meth:`Index.slice_indexer`, :meth:`Index.slice_locs`; in a future version passing 'kind' will raise (:issue:`42857`)
+- Deprecated :meth:`Index.reindex` with a non-unique index (:issue:`42568`)
 -

 .. ---------------------------------------------------------------------------

environment.yml (+1 -1)

@@ -99,7 +99,7 @@ dependencies:
   - xlwt
   - odfpy

-  - fastparquet>=0.4.0, <0.7.0  # pandas.read_parquet, DataFrame.to_parquet
+  - fastparquet>=0.4.0  # pandas.read_parquet, DataFrame.to_parquet
   - pyarrow>=0.17.0  # pandas.read_parquet, DataFrame.to_parquet, pandas.read_feather, DataFrame.to_feather
   - python-snappy  # required by pyarrow

pandas/_libs/window/indexers.pyx (+4 -5)

@@ -79,12 +79,11 @@ def calculate_variable_window_bounds(
     else:
         end[0] = 0
     if center:
-        for j in range(0, num_values + 1):
-            if (index[j] == index[0] + index_growth_sign * window_size / 2 and
-                    right_closed):
+        end_bound = index[0] + index_growth_sign * window_size / 2
+        for j in range(0, num_values):
+            if (index[j] < end_bound) or (index[j] == end_bound and right_closed):
                 end[0] = j + 1
-                break
-            elif index[j] >= index[0] + index_growth_sign * window_size / 2:
+            elif index[j] >= end_bound:
                 end[0] = j
                 break
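The fix is easier to see outside Cython. This pure-Python translation (a sketch; the function name and plain scalar index values are simplifications of the real routine) computes the first row's end bound for a centered variable-size window — the old code only set the bound on an exact match with the half-window boundary, so an offset window covering all rows never advanced past the first element:

```python
def first_row_end(index, window_size, right_closed=True, index_growth_sign=1):
    # End bound of the window centered on the first row
    end = 0
    end_bound = index[0] + index_growth_sign * window_size / 2
    for j in range(len(index)):
        if index[j] < end_bound or (index[j] == end_bound and right_closed):
            # This element falls inside the centered window
            end = j + 1
        elif index[j] >= end_bound:
            end = j
            break
    return end

# Offset covering all rows (GH#42753): every element is inside the window
print(first_row_end([0, 1, 2], window_size=10))  # 3
# Narrow window: only the elements up to the half-window boundary count
print(first_row_end([0, 1, 2], window_size=2))   # 2
```

With `right_closed=False` the boundary element itself is excluded, matching the `right_closed` handling in the Cython code.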

pandas/core/indexes/base.py (+9)

@@ -3915,6 +3915,15 @@ def reindex(
             )
             indexer, _ = self.get_indexer_non_unique(target)

+            if not self.is_unique:
+                # GH#42568
+                warnings.warn(
+                    "reindexing with a non-unique Index is deprecated and "
+                    "will raise in a future version",
+                    FutureWarning,
+                    stacklevel=2,
+                )
+
         target = self._wrap_reindex_result(target, indexer, preserve_names)
         return target, indexer
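The deprecation pattern added here can be exercised in isolation; `check_unique` below is a hypothetical helper that mirrors the warning logic, not pandas code:

```python
import warnings

def check_unique(labels):
    # Mirror of the GH#42568 deprecation: warn when the index being
    # reindexed contains duplicates; a future version will raise instead.
    if len(set(labels)) != len(labels):
        warnings.warn(
            "reindexing with a non-unique Index is deprecated and "
            "will raise in a future version",
            FutureWarning,
            stacklevel=2,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_unique(["a", "b", "b"])  # duplicated "b" triggers the warning
assert len(caught) == 1 and caught[0].category is FutureWarning
```

`stacklevel=2` attributes the warning to the caller of `reindex` rather than to the internals, which is why the doc example in `duplicates.rst` needed `:okwarning:`.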

pandas/core/indexes/category.py (+8)

@@ -426,6 +426,14 @@ def reindex(
             missing = np.array([], dtype=np.intp)
         else:
             indexer, missing = self.get_indexer_non_unique(target)
+            if not self.is_unique:
+                # GH#42568
+                warnings.warn(
+                    "reindexing with a non-unique Index is deprecated and will "
+                    "raise in a future version",
+                    FutureWarning,
+                    stacklevel=2,
+                )

         if len(self) and indexer is not None:
             new_target = self.take(indexer)

pandas/core/internals/concat.py (+5 -2)

@@ -94,12 +94,15 @@ def _concatenate_array_managers(
             concat_arrays([mgrs[i].arrays[j] for i in range(len(mgrs))])
             for j in range(len(mgrs[0].arrays))
         ]
-        return ArrayManager(arrays, [axes[1], axes[0]], verify_integrity=False)
     else:
         # concatting along the columns -> combine reindexed arrays in a single manager
        assert concat_axis == 0
        arrays = list(itertools.chain.from_iterable([mgr.arrays for mgr in mgrs]))
-        return ArrayManager(arrays, [axes[1], axes[0]], verify_integrity=False)
+
+    if copy:
+        arrays = [x.copy() for x in arrays]
+
+    new_mgr = ArrayManager(arrays, [axes[1], axes[0]], verify_integrity=False)
+    return new_mgr


 def concat_arrays(to_concat: list) -> ArrayLike:
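The `copy` guard added above matters because, without it, the concatenated manager shares buffers with its inputs and mutations leak across. A list-based sketch of the same guard (plain lists stand in for ndarrays):

```python
def combine(arrays, copy=True):
    # Mirrors _concatenate_array_managers: optionally copy each array before
    # handing it to the new manager, so later mutations don't leak across.
    if copy:
        arrays = [list(x) for x in arrays]  # stand-in for ndarray.copy()
    return arrays

src = [1, 2, 3]
combined = combine([src])
combined[0][0] = 99
assert src[0] == 1   # copy=True: the source array is untouched

shared = combine([src], copy=False)
shared[0][0] = 99
assert src[0] == 99  # copy=False: the buffer is shared
```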

pandas/io/formats/style.py (+59 -26)

@@ -1534,24 +1534,24 @@ def set_sticky(
         may produce strange behaviour due to CSS controls with missing elements.
         """
         if axis in [0, "index"]:
-            axis, obj, tag, pos = 0, self.data.index, "tbody", "left"
+            axis, obj = 0, self.data.index
             pixel_size = 75 if not pixel_size else pixel_size
         elif axis in [1, "columns"]:
-            axis, obj, tag, pos = 1, self.data.columns, "thead", "top"
+            axis, obj = 1, self.data.columns
             pixel_size = 25 if not pixel_size else pixel_size
         else:
             raise ValueError("`axis` must be one of {0, 1, 'index', 'columns'}")

+        props = "position:sticky; background-color:white;"
         if not isinstance(obj, pd.MultiIndex):
             # handling MultiIndexes requires different CSS
-            props = "position:sticky; background-color:white;"

             if axis == 1:
                 # stick the first <tr> of <head> and, if index names, the second <tr>
                 # if self._hide_columns then no <thead><tr> here will exist: no conflict
                 styles: CSSStyles = [
                     {
-                        "selector": "thead tr:first-child",
+                        "selector": "thead tr:nth-child(1) th",
                         "props": props + "top:0px; z-index:2;",
                     }
                 ]

@@ -1561,7 +1561,7 @@ def set_sticky(
                 )
                 styles.append(
                     {
-                        "selector": "thead tr:nth-child(2)",
+                        "selector": "thead tr:nth-child(2) th",
                         "props": props
                         + f"top:{pixel_size}px; z-index:2; height:{pixel_size}px; ",
                     }

@@ -1572,34 +1572,67 @@ def set_sticky(
                 # but <th> will exist in <thead>: conflict with initial element
                 styles = [
                     {
-                        "selector": "tr th:first-child",
+                        "selector": "thead tr th:nth-child(1)",
+                        "props": props + "left:0px; z-index:3 !important;",
+                    },
+                    {
+                        "selector": "tbody tr th:nth-child(1)",
                         "props": props + "left:0px; z-index:1;",
-                    }
+                    },
                 ]

-            return self.set_table_styles(styles, overwrite=False)
-
         else:
+            # handle the MultiIndex case
             range_idx = list(range(obj.nlevels))
+            levels = sorted(levels) if levels else range_idx

-            levels = sorted(levels) if levels else range_idx
-            for i, level in enumerate(levels):
-                self.set_table_styles(
-                    [
-                        {
-                            "selector": f"{tag} th.level{level}",
-                            "props": f"position: sticky; "
-                            f"{pos}: {i * pixel_size}px; "
-                            f"{f'height: {pixel_size}px; ' if axis == 1 else ''}"
-                            f"{f'min-width: {pixel_size}px; ' if axis == 0 else ''}"
-                            f"{f'max-width: {pixel_size}px; ' if axis == 0 else ''}"
-                            f"background-color: white;",
-                        }
-                    ],
-                    overwrite=False,
-                )
+            if axis == 1:
+                styles = []
+                for i, level in enumerate(levels):
+                    styles.append(
+                        {
+                            "selector": f"thead tr:nth-child({level+1}) th",
+                            "props": props
+                            + (
+                                f"top:{i * pixel_size}px; height:{pixel_size}px; "
+                                "z-index:2;"
+                            ),
+                        }
+                    )
+                if not all(name is None for name in self.index.names):
+                    styles.append(
+                        {
+                            "selector": f"thead tr:nth-child({obj.nlevels+1}) th",
+                            "props": props
+                            + (
+                                f"top:{(i+1) * pixel_size}px; height:{pixel_size}px; "
+                                "z-index:2;"
+                            ),
+                        }
+                    )

-            return self
+            else:
+                styles = []
+                for i, level in enumerate(levels):
+                    props_ = props + (
+                        f"left:{i * pixel_size}px; "
+                        f"min-width:{pixel_size}px; "
+                        f"max-width:{pixel_size}px; "
+                    )
+                    styles.extend(
+                        [
+                            {
+                                "selector": f"thead tr th:nth-child({level+1})",
+                                "props": props_ + "z-index:3 !important;",
+                            },
+                            {
+                                "selector": f"tbody tr th.level{level}",
+                                "props": props_ + "z-index:1;",
+                            },
+                        ]
+                    )
+
+        return self.set_table_styles(styles, overwrite=False)

     def set_table_styles(
         self,
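The MultiIndex branch above emits one CSS rule per header level, with each row's sticky `top` offset stepping by `pixel_size`. A hypothetical standalone helper (illustrative names, not pandas API) showing just that selector arithmetic for sticky column headers:

```python
def sticky_column_styles(nlevels: int, pixel_size: int = 25) -> list:
    # One rule per header row: row i sticks at top = i * pixel_size,
    # matching the thead tr:nth-child(...) scheme in set_sticky.
    props = "position:sticky; background-color:white;"
    return [
        {
            "selector": f"thead tr:nth-child({level + 1}) th",
            "props": props
            + f"top:{i * pixel_size}px; height:{pixel_size}px; z-index:2;",
        }
        for i, level in enumerate(range(nlevels))
    ]

styles = sticky_column_styles(2)
print(styles[0]["selector"])  # thead tr:nth-child(1) th
print(styles[1]["props"])     # ...top:25px; height:25px; z-index:2;
```

Targeting `th` elements directly (rather than the whole `tr`, as before the commit) is what lets the index-name header row get its own sticky position without conflicting rules.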

pandas/io/parquet.py (+19 -7)

@@ -309,14 +309,21 @@ def write(
     def read(
         self, path, columns=None, storage_options: StorageOptions = None, **kwargs
     ):
+        parquet_kwargs = {}
         use_nullable_dtypes = kwargs.pop("use_nullable_dtypes", False)
-        if use_nullable_dtypes:
-            raise ValueError(
-                "The 'use_nullable_dtypes' argument is not supported for the "
-                "fastparquet engine"
-            )
+        # Technically works with 0.7.0, but was incorrect
+        # so let's just require 0.7.1
+        if Version(self.api.__version__) >= Version("0.7.1"):
+            # Need to set even for use_nullable_dtypes = False,
+            # since our defaults differ
+            parquet_kwargs["pandas_nulls"] = use_nullable_dtypes
+        else:
+            if use_nullable_dtypes:
+                raise ValueError(
+                    "The 'use_nullable_dtypes' argument is not supported for the "
+                    "fastparquet engine for fastparquet versions less than 0.7.1"
+                )
         path = stringify_path(path)
-        parquet_kwargs = {}
         handles = None
         if is_fsspec_url(path):
             fsspec = import_optional_dependency("fsspec")

@@ -337,6 +344,7 @@ def read(
                 path, "rb", is_text=False, storage_options=storage_options
             )
             path = handles.handle
+
         parquet_file = self.api.ParquetFile(path, **parquet_kwargs)

         result = parquet_file.to_pandas(columns=columns, **kwargs)

@@ -470,14 +478,18 @@ def read_parquet(

     use_nullable_dtypes : bool, default False
         If True, use dtypes that use ``pd.NA`` as missing value indicator
-        for the resulting DataFrame (only applicable for ``engine="pyarrow"``).
+        for the resulting DataFrame.
         As new dtypes are added that support ``pd.NA`` in the future, the
         output with this option will change to use those dtypes.
         Note: this is an experimental option, and behaviour (e.g. additional
         support dtypes) may change without notice.

         .. versionadded:: 1.2.0

+        .. versionchanged:: 1.3.2
+           ``use_nullable_dtypes`` now works with the ``fastparquet`` engine
+           if ``fastparquet`` is version 0.7.1 or higher.
+
     **kwargs
         Any additional kwargs are passed to the engine.
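The version gate above reduces to a small pure-Python sketch (tuple comparison stands in for the `Version` class; `build_read_kwargs` and the tuple-typed version are illustrative, not pandas API):

```python
def build_read_kwargs(fastparquet_version: tuple, use_nullable_dtypes: bool) -> dict:
    # Mirrors FastParquetImpl.read: pass pandas_nulls on new-enough
    # fastparquet, reject use_nullable_dtypes on older versions.
    parquet_kwargs = {}
    if fastparquet_version >= (0, 7, 1):
        # pandas_nulls must be set even when False, since the defaults differ
        parquet_kwargs["pandas_nulls"] = use_nullable_dtypes
    elif use_nullable_dtypes:
        raise ValueError(
            "The 'use_nullable_dtypes' argument is not supported for the "
            "fastparquet engine for fastparquet versions less than 0.7.1"
        )
    return parquet_kwargs

print(build_read_kwargs((0, 7, 1), True))   # {'pandas_nulls': True}
print(build_read_kwargs((0, 6, 0), False))  # {}
```

Note the asymmetry: on new fastparquet the keyword is always forwarded (even as `False`), while on old fastparquet the option is simply absent from the kwargs rather than forwarded as `False`.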
