Commit 3df5cf3

Merge branch 'master' into read-excel-bug-dtype
2 parents 57c65e5 + 226876a commit 3df5cf3

101 files changed (+1822 -646 lines changed)


.github/workflows/database.yml

+1 -1
@@ -104,7 +104,7 @@ jobs:
       run: python ci/print_skipped.py

     - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
+      uses: codecov/codecov-action@v2
       with:
         flags: unittests
         name: codecov-pandas

.github/workflows/posix.yml

+1 -1
@@ -94,7 +94,7 @@ jobs:
       run: python ci/print_skipped.py

     - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
+      uses: codecov/codecov-action@v2
       with:
         flags: unittests
         name: codecov-pandas

.github/workflows/python-dev.yml

+1 -1
@@ -78,7 +78,7 @@ jobs:
         coverage report -m

     - name: Upload coverage to Codecov
-      uses: codecov/codecov-action@v1
+      uses: codecov/codecov-action@v2
       with:
         flags: unittests
         name: codecov-pandas

.pre-commit-config.yaml

+1 -1
@@ -110,7 +110,7 @@ repos:
       entry: python scripts/generate_pip_deps_from_conda.py
       files: ^(environment.yml|requirements-dev.txt)$
       pass_filenames: false
-      additional_dependencies: [pyyaml]
+      additional_dependencies: [pyyaml, toml]
     - id: sync-flake8-versions
       name: Check flake8 version is synced across flake8, yesqa, and environment.yml
       language: python

asv_bench/benchmarks/groupby.py

+12
@@ -369,6 +369,18 @@ def time_category_size(self):
         self.draws.groupby(self.cats).size()


+class Shift:
+    def setup(self):
+        N = 18
+        self.df = DataFrame({"g": ["a", "b"] * 9, "v": list(range(N))})
+
+    def time_defaults(self):
+        self.df.groupby("g").shift()
+
+    def time_fill_value(self):
+        self.df.groupby("g").shift(fill_value=99)
+
+
 class FillNA:
     def setup(self):
         N = 100
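
For readers unfamiliar with what the new `Shift` benchmark times, the following is a minimal sketch of the two calls on the same 18-row frame. It is an illustration only, not part of the commit; it selects the `"v"` column explicitly (an assumption made here) so that the result has the same shape regardless of how a given pandas version treats the grouping column.

```python
import pandas as pd

# Same data as the benchmark's setup(): two interleaved groups, 18 rows.
df = pd.DataFrame({"g": ["a", "b"] * 9, "v": list(range(18))})

# Default shift: each group's values move down one row within the group,
# so the first row of every group becomes NaN.
default_shift = df.groupby("g")["v"].shift()

# With fill_value, the vacated first row of every group gets 99 instead.
filled_shift = df.groupby("g")["v"].shift(fill_value=99)

print(default_shift.head(4).tolist())
print(filled_shift.head(4).tolist())
```

Rows 0 and 1 are the first members of groups `a` and `b`, so those are the positions that receive NaN or the fill value.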

asv_bench/benchmarks/reshape.py

+5 -1
@@ -102,6 +102,7 @@ def setup(self, dtype):
         columns = np.arange(n)
         if dtype == "int":
             values = np.arange(m * m * n).reshape(m * m, n)
+            self.df = DataFrame(values, index, columns)
         else:
             # the category branch is ~20x slower than int. So we
             # cut down the size a bit. Now it's only ~3x slower.
@@ -111,7 +112,10 @@ def setup(self, dtype):
             values = np.take(list(string.ascii_letters), indices)
             values = [pd.Categorical(v) for v in values.T]

-        self.df = DataFrame(values, index, columns)
+            self.df = DataFrame(
+                {i: cat for i, cat in enumerate(values)}, index, columns
+            )
+
         self.df2 = self.df.iloc[:-1]

     def time_full_product(self, dtype):

ci/code_checks.sh

+3
@@ -121,6 +121,9 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then
       pandas/io/parsers/ \
       pandas/io/sas/ \
       pandas/io/sql.py \
+      pandas/io/formats/format.py \
+      pandas/io/formats/style.py \
+      pandas/io/stata.py \
       pandas/tseries/
     RET=$(($RET + $?)) ; echo $MSG "DONE"

ci/deps/actions-38-locale.yaml

+1 -1
@@ -18,7 +18,7 @@ dependencies:
   - html5lib
   - ipython
   - jinja2
-  - jedi<0.18.0
+  - jedi
   - lxml
   - matplotlib<3.3.0
   - moto

ci/deps/actions-39-slow.yaml

+1
@@ -23,6 +23,7 @@ dependencies:
   - matplotlib
   - moto>=1.3.14
   - flask
+  - numba
   - numexpr
   - numpy
   - openpyxl

ci/deps/actions-39.yaml

+1
@@ -22,6 +22,7 @@ dependencies:
   - matplotlib
   - moto>=1.3.14
   - flask
+  - numba
   - numexpr
   - numpy
   - openpyxl

ci/deps/azure-windows-39.yaml

+1
@@ -23,6 +23,7 @@ dependencies:
   - matplotlib
   - moto>=1.3.14
   - flask
+  - numba
   - numexpr
   - numpy
   - openpyxl

doc/source/_static/style/df_pipe.png

8.47 KB

doc/source/development/contributing_environment.rst

+2 -5
@@ -189,11 +189,8 @@ Creating a Python environment (pip)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 If you aren't using conda for your development environment, follow these instructions.
-You'll need to have at least the :ref:`minimum Python version <install.version>` that pandas supports. If your Python version
-is 3.8.0 (or later), you might need to update your ``setuptools`` to version 42.0.0 (or later)
-in your development environment before installing the build dependencies::
-
-    pip install --upgrade setuptools
+You'll need to have at least the :ref:`minimum Python version <install.version>` that pandas supports.
+You also need to have ``setuptools`` 51.0.0 or later to build pandas.

 **Unix**/**macOS with virtualenv**
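
The updated guide states a minimum of ``setuptools`` 51.0.0 but no longer shows how to check it. One possible way to verify the installed version from Python is sketched below. The helper names (`parse_release`, `meets_minimum`) are hypothetical and not part of pandas or setuptools; only the leading numeric part of the version string is compared, which is an assumption that ignores pre-release and post-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_SETUPTOOLS = (51, 0, 0)  # minimum stated in the updated contributing guide


def parse_release(version_string):
    """Return the leading numeric components of a version as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognized version: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version_string, minimum=MIN_SETUPTOOLS):
    """Compare the numeric release against the minimum, padding with zeros."""
    release = parse_release(version_string)
    release += (0,) * (len(minimum) - len(release))
    return release >= minimum


try:
    installed = version("setuptools")
except PackageNotFoundError:
    print("setuptools is not installed")
else:
    status = "ok" if meets_minimum(installed) else "too old, upgrade needed"
    print(f"setuptools {installed}: {status}")
```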

doc/source/user_guide/visualization.rst

+54
@@ -316,6 +316,34 @@ The ``by`` keyword can be specified to plot grouped histograms:
    @savefig grouped_hist.png
    data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4));

+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+   np.random.seed(123456)
+
+In addition, the ``by`` keyword can also be specified in :meth:`DataFrame.plot.hist`.
+
+.. versionchanged:: 1.4.0
+
+.. ipython:: python
+
+   data = pd.DataFrame(
+       {
+           "a": np.random.choice(["x", "y", "z"], 1000),
+           "b": np.random.choice(["e", "f", "g"], 1000),
+           "c": np.random.randn(1000),
+           "d": np.random.randn(1000) - 1,
+       },
+   )
+
+   @savefig grouped_hist_by.png
+   data.plot.hist(by=["a", "b"], figsize=(10, 5));
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")

 .. _visualization.box:

@@ -448,6 +476,32 @@ columns:

    plt.close("all")

+You could also create groupings with :meth:`DataFrame.plot.box`, for instance:
+
+.. versionchanged:: 1.4.0
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+   np.random.seed(123456)
+
+.. ipython:: python
+   :okwarning:
+
+   df = pd.DataFrame(np.random.rand(10, 3), columns=["Col1", "Col2", "Col3"])
+   df["X"] = pd.Series(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])
+
+   plt.figure();
+
+   @savefig box_plot_ex4.png
+   bp = df.plot.box(column=["Col1", "Col2"], by="X")
+
+.. ipython:: python
+   :suppress:
+
+   plt.close("all")
+
 .. _visualization.box.return:

 In ``boxplot``, the return type can be controlled by the ``return_type``, keyword. The valid choices are ``{"axes", "dict", "both", None}``.

doc/source/whatsnew/v1.3.2.rst

+11 -3
@@ -14,8 +14,15 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
--
--
+- Performance regression in :meth:`DataFrame.isin` and :meth:`Series.isin` for nullable data types (:issue:`42714`)
+- Regression in updating values of :class:`pandas.Series` using boolean index, created by using :meth:`pandas.DataFrame.pop` (:issue:`42530`)
+- Regression in :meth:`DataFrame.from_records` with empty records (:issue:`42456`)
+- Fixed regression in :meth:`DataFrame.shift` where TypeError occurred when shifting DataFrame created by concatenation of slices and fills with values (:issue:`42719`)
+- Regression in :meth:`DataFrame.agg` when the ``func`` argument returned lists and ``axis=1`` (:issue:`42727`)
+- Regression in :meth:`DataFrame.drop` does nothing if :class:`MultiIndex` has duplicates and indexer is a tuple or list of tuples (:issue:`42771`)
+- Fixed regression where :meth:`pandas.read_csv` raised a ``ValueError`` when parameters ``names`` and ``prefix`` were both set to None (:issue:`42387`)
+- Fixed regression in comparisons between :class:`Timestamp` object and ``datetime64`` objects outside the implementation bounds for nanosecond ``datetime64`` (:issue:`42794`)
+- Fixed regression in :meth:`.Styler.highlight_min` and :meth:`.Styler.highlight_max` where ``pandas.NA`` was not successfully ignored (:issue:`42650`)

 .. ---------------------------------------------------------------------------

@@ -24,7 +31,8 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~
 - Bug in :meth:`pandas.read_excel` modifies the dtypes dictionary when reading a file with duplicate columns (:issue:`42462`)
--
+- 1D slices over extension types turn into N-dimensional slices over ExtensionArrays (:issue:`42430`)
+- :meth:`.Styler.hide_columns` now hides the index name header row as well as column headers (:issue:`42101`)

 .. ---------------------------------------------------------------------------
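
The ``read_excel`` entry above (the bug this branch is named for) describes a caller-supplied dtype mapping being modified as a side effect. The general failure mode and the usual remedy, a defensive copy, can be sketched in plain Python as follows. This is a hypothetical illustration: `dedupe_columns` is not pandas' internal function, and the `name.1` renaming scheme is assumed here only to make the example concrete.

```python
from collections import Counter
from typing import Dict, List, Tuple


def dedupe_columns(
    names: List[str], dtype: Dict[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Rename repeated column names and carry their dtype over to the new names.

    The caller's mapping is copied first, so entries added for the renamed
    duplicates never leak back into the dictionary the caller passed in.
    """
    dtype = dict(dtype)  # defensive copy: work on our own dict
    seen: Counter = Counter()
    result = []
    for name in names:
        count = seen[name]
        new_name = f"{name}.{count}" if count else name
        if count and name in dtype:
            # The duplicate inherits the dtype requested for the original name.
            dtype[new_name] = dtype[name]
        seen[name] += 1
        result.append(new_name)
    return result, dtype


user_dtype = {"a": "str"}
columns, resolved = dedupe_columns(["a", "a", "b"], user_dtype)
print(columns)     # ['a', 'a.1', 'b']
print(resolved)    # {'a': 'str', 'a.1': 'str'}
print(user_dtype)  # {'a': 'str'}  (unchanged)
```

Without the `dict(dtype)` copy, `user_dtype` would gain an `"a.1"` key after the call, which is the kind of surprise the release note describes.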

doc/source/whatsnew/v1.4.0.rst

+16 -3
@@ -35,6 +35,9 @@ Other enhancements
 - Additional options added to :meth:`.Styler.bar` to control alignment and display, with keyword only arguments (:issue:`26070`, :issue:`36419`)
 - :meth:`Styler.bar` now validates the input argument ``width`` and ``height`` (:issue:`42511`)
 - :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
+- Added ``sparse_index`` and ``sparse_columns`` keyword arguments to :meth:`.Styler.to_html` (:issue:`41946`)
+- Added keyword argument ``environment`` to :meth:`.Styler.to_latex` also allowing a specific "longtable" entry with a separate jinja2 template (:issue:`41866`)
+- :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` now support the argument ``skipna`` (:issue:`34047`)
 -

 .. ---------------------------------------------------------------------------
@@ -166,6 +169,10 @@ Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 - Performance improvement in :meth:`.GroupBy.sample`, especially when ``weights`` argument provided (:issue:`34483`)
 - Performance improvement in :meth:`.GroupBy.transform` for user-defined functions (:issue:`41598`)
+- Performance improvement in constructing :class:`DataFrame` objects (:issue:`42631`)
+- Performance improvement in :meth:`GroupBy.shift` when ``fill_value`` argument is provided (:issue:`26615`)
+- Performance improvement in :meth:`DataFrame.corr` for ``method=pearson`` on data without missing values (:issue:`40956`)
+-

 .. ---------------------------------------------------------------------------

@@ -202,7 +209,7 @@ Numeric
 ^^^^^^^
 - Bug in :meth:`DataFrame.rank` raising ``ValueError`` with ``object`` columns and ``method="first"`` (:issue:`41931`)
 - Bug in :meth:`DataFrame.rank` treating missing values and extreme values as equal (for example ``np.nan`` and ``np.inf``), causing incorrect results when ``na_option="bottom"`` or ``na_option="top`` used (:issue:`41931`)
--
+- Bug in ``numexpr`` engine still being used when the option ``compute.use_numexpr`` is set to ``False`` (:issue:`32556`)

 Conversion
 ^^^^^^^^^^
@@ -225,7 +232,8 @@ Indexing
 - Bug in :meth:`Series.loc` when with a :class:`MultiIndex` whose first level contains only ``np.nan`` values (:issue:`42055`)
 - Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` when passing a string, the return type depended on whether the index was monotonic (:issue:`24892`)
 - Bug in indexing on a :class:`MultiIndex` failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (:issue:`42476`)
--
+- Bug in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` when passing an ascending value, failed to raise or incorrectly raising ``ValueError`` (:issue:`41634`)
+- Bug in updating values of :class:`pandas.Series` using boolean index, created by using :meth:`pandas.DataFrame.pop` (:issue:`42530`)

 Missing
 ^^^^^^^
@@ -260,11 +268,15 @@ Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^
 - Fixed bug in :meth:`SeriesGroupBy.apply` where passing an unrecognized string argument failed to raise ``TypeError`` when the underlying ``Series`` is empty (:issue:`42021`)
 - Bug in :meth:`Series.rolling.apply`, :meth:`DataFrame.rolling.apply`, :meth:`Series.expanding.apply` and :meth:`DataFrame.expanding.apply` with ``engine="numba"`` where ``*args`` were being cached with the user passed function (:issue:`42287`)
--
+- Bug in :meth:`DataFrame.groupby.rolling.var` would calculate the rolling variance only on the first group (:issue:`42442`)
+- Bug in :meth:`GroupBy.shift` that would return the grouping columns if ``fill_value`` was not None (:issue:`41556`)
+- Bug in :meth:`pandas.DataFrame.ewm`, where non-float64 dtypes were silently failing (:issue:`42452`)

 Reshaping
 ^^^^^^^^^
+- Improved error message when creating a :class:`DataFrame` column from a multi-dimensional :class:`numpy.ndarray` (:issue:`42463`)
 - :func:`concat` creating :class:`MultiIndex` with duplicate level entries when concatenating a :class:`DataFrame` with duplicates in :class:`Index` and multiple keys (:issue:`42651`)
+- Bug in :meth:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :meth:`pandas.CategoricalIndex` (:issue:`42425`)
 -

 Sparse
@@ -284,6 +296,7 @@ Styler

 Other
 ^^^^^
+- Bug in :meth:`CustomBusinessMonthBegin.__add__` (:meth:`CustomBusinessMonthEnd.__add__`) not applying the extra ``offset`` parameter when beginning (end) of the target month is already a business day (:issue:`41356`)

 .. ***DO NOT USE THIS SECTION***

environment.yml

+1 -1
@@ -108,7 +108,7 @@ dependencies:
   - fsspec>=0.7.4, <2021.6.0 # for generic remote file operations
   - gcsfs>=0.6.0 # file IO when using 'gcs://...' path
   - sqlalchemy # pandas.read_sql, DataFrame.to_sql
-  - xarray # DataFrame.to_xarray
+  - xarray<0.19 # DataFrame.to_xarray
   - cftime # Needed for downstream xarray.CFTimeIndex test
   - pyreadstat # pandas.read_spss
   - tabulate>=0.8.3 # DataFrame.to_markdown
