Sync Fork from Upstream Repo #255

Merged 34 commits on Aug 14, 2021

Commits
77443dc
ERR: clarify PerformanceWarning for fragmented frame (#42942)
mzeitlin11 Aug 9, 2021
449c56a
STYLE: moving spacing conventions in cython casting to pre-commit (#4…
01-vyom Aug 10, 2021
8a4f29a
CI: Pin dateutil (#42962)
lithomas1 Aug 10, 2021
8e73c30
TST: use fixtures for sql test data (#42959)
fangchenli Aug 10, 2021
93af2f0
DEPR: dropping nuisance columns in rolling aggregations (#42834)
jbrockmendel Aug 10, 2021
a926bce
TST: move index tests to correct files (#27045) (#42952)
felixDulys Aug 10, 2021
6a909a7
Fix cross-references in docs (#42949)
albertvillanova Aug 10, 2021
14cf6e2
BUG:Can't calculate quantiles from Int64Dtype Series when results are…
debnathshoham Aug 10, 2021
45fd72a
Revert fastparquet nullable dtype support (#42954)
lithomas1 Aug 10, 2021
83fabfb
BUG: df.drop not separating missing labels with commas (#42938)
zeitlinv Aug 10, 2021
61cbb73
REGR: Series.nlargest with masked arrays (#42838)
jbrockmendel Aug 10, 2021
974b00b
PERF: avoid repeating checks in interpolation (#42963)
jbrockmendel Aug 10, 2021
ffb5549
REF: format_object_attrs -> Index._format_attrs (#42953)
jbrockmendel Aug 10, 2021
99cf794
CLN: remove unused Series._index attribute (#42955)
jorisvandenbossche Aug 10, 2021
965d289
BUG: Pass copy argument to expanddim constructor in concat. (#42823)
jmcomie Aug 11, 2021
fca1f7c
BENCH: Fix misleading Interpolate asv (#42956)
jbrockmendel Aug 11, 2021
88a43d8
BUG: Attributes skipped when serialising plain Python objects to JSON…
joelgibson Aug 11, 2021
2fad5d7
DOC/CLN: 1.3.2 release notes (#42983)
simonjayhawkins Aug 11, 2021
5bd9bac
ENH: `Styler.apply_index` and `Styler.applymap_index` for conditional…
attack68 Aug 11, 2021
df19be9
TST: test_offsets files to better homes & parameterize (#27085) (#42991)
felixDulys Aug 12, 2021
50fdd86
TST: move orphaned business hour tests biz hour file (#27085) (#42989)
felixDulys Aug 12, 2021
860ff03
TST: move easter to own file and parameterize (#27085) (#42990)
felixDulys Aug 12, 2021
ab5322c
DOC: More subtotals / margins in pivot_table (#42922)
jxb4892 Aug 12, 2021
fe8276b
ENH: `Styler.apply(map)_index` made compatible with `Styler.to_latex`…
attack68 Aug 12, 2021
d037ff6
REF: remove libreduction.apply_frame_axis0 (#42992)
jbrockmendel Aug 12, 2021
6fd0164
CI: Run python dev with numpy python 3.10 wheels (#43005)
lithomas1 Aug 13, 2021
237886a
TST: refactor iris table creation in SQL test (#42988)
fangchenli Aug 13, 2021
e75f45f
PERF: Series.mad (#43010)
jbrockmendel Aug 13, 2021
c03ee85
POC: issue form for usage questions (#43009)
mzeitlin11 Aug 13, 2021
7db129e
DOC: NumericIndex (#42706)
topper-123 Aug 13, 2021
0799773
TST: refactor iris_view table in SQL test (#43024)
fangchenli Aug 13, 2021
1bd88d7
PERF: internals.concat (#43021)
jbrockmendel Aug 13, 2021
b57ec1a
TST: Fix test related to reverting fastparquet nullable support (#42999)
lithomas1 Aug 13, 2021
6c3a3dd
Fix multiple identical check (#42828)
gcaria Aug 13, 2021
24 changes: 0 additions & 24 deletions .github/ISSUE_TEMPLATE/submit_question.md

This file was deleted.

43 changes: 43 additions & 0 deletions .github/ISSUE_TEMPLATE/submit_question.yml
@@ -0,0 +1,43 @@
+name: Submit Question
+description: Ask a general question about pandas
+title: "QST: "
+labels: [Usage Question, Needs Triage]
+
+body:
+  - type: markdown
+    attributes:
+      value: >
+        Since [StackOverflow](https://stackoverflow.com) is better suited towards answering
+        usage questions, we ask that all usage questions are first asked on StackOverflow.
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)
+            on StackOverflow for similar questions.
+          required: true
+        - label: >
+            I have asked my usage related question on [StackOverflow](https://stackoverflow.com).
+          required: true
+  - type: input
+    id: question-link
+    attributes:
+      label: Link to question on StackOverflow
+    validations:
+      required: true
+  - type: markdown
+    attributes:
+      value: ---
+  - type: textarea
+    id: question
+    attributes:
+      label: Question about pandas
+      description: >
+        **Note**: If you'd still like to submit a question, please read [this guide](
+        https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing
+        how to provide the necessary information for us to reproduce your question.
+      placeholder: |
+        ```python
+        # Your code here, if applicable
+
+        ```
4 changes: 0 additions & 4 deletions .github/workflows/ci.yml
@@ -48,10 +48,6 @@ jobs:
     - name: Build Pandas
       uses: ./.github/actions/build_pandas

-    - name: Linting
-      run: ci/code_checks.sh lint
-      if: always()
-
     - name: Checks on imported code
       run: ci/code_checks.sh code
       if: always()
2 changes: 1 addition & 1 deletion .github/workflows/python-dev.yml
@@ -41,7 +41,7 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip setuptools wheel
-        pip install git+https://github.com/numpy/numpy.git
+        pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
         pip install git+https://github.com/pytest-dev/pytest.git
         pip install git+https://github.com/nedbat/coveragepy.git
         pip install cython python-dateutil pytz hypothesis pytest-xdist pytest-cov
5 changes: 5 additions & 0 deletions .pre-commit-config.yaml
@@ -107,6 +107,11 @@ repos:
             # Check for deprecated messages without sphinx directive
             |(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)
         types_or: [python, cython, rst]
+    -   id: cython-casting
+        name: Check Cython casting is `<type>obj`, not `<type> obj`
+        language: pygrep
+        entry: '[a-zA-Z0-9*]> '
+        files: (\.pyx|\.pxi.in)$
     -   id: incorrect-backticks
         name: Check for backticks incorrectly rendering because of missing spaces
         language: pygrep
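To see what the new `cython-casting` hook flags, the sketch below runs the same regular expression from the hook's `entry` through Python's `re` module. The Cython-looking sample lines are made up for illustration and are not taken from the pandas code base; `pygrep` reports a file when the pattern matches any of its lines.

```python
import re

# Same pattern as the `entry` of the cython-casting hook.
CAST_WITH_SPACE = re.compile(r"[a-zA-Z0-9*]> ")

# Hypothetical sample lines (illustrative only), mapped to whether
# the hook is expected to flag them.
samples = {
    "x = <int64_t>values[i]": False,   # preferred style: no space after the cast
    "x = <int64_t> values[i]": True,   # flagged: space after `>`
    "p = <char*> buf": True,           # flagged: `*` is part of the character class
    "cdef int f(x) -> int:": False,    # `->` is preceded by `-`, so no match
    "if a > b:": False,                # `>` preceded by a space is not matched
}

for line, expected in samples.items():
    flagged = CAST_WITH_SPACE.search(line) is not None
    assert flagged == expected, line
```

The pattern is a heuristic rather than a parser: a comparison written without a space before `>` (for example `a> b`) would also match.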
8 changes: 6 additions & 2 deletions asv_bench/benchmarks/frame_methods.py
@@ -538,8 +538,12 @@ class Interpolate:
     def setup(self, downcast):
         N = 10000
         # this is the worst case, where every column has NaNs.
-        self.df = DataFrame(np.random.randn(N, 100))
-        self.df.values[::2] = np.nan
+        arr = np.random.randn(N, 100)
+        # NB: we need to set values in array, not in df.values, otherwise
+        # the benchmark will be misleading for ArrayManager
+        arr[::2] = np.nan
+
+        self.df = DataFrame(arr)

         self.df2 = DataFrame(
             {
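The benchmark change can be reproduced outside asv with the minimal sketch below, assuming NumPy and pandas are installed. The NaNs are written into the NumPy array before the frame is built, so the result does not depend on whether `df.values` hands back a view or a copy of the frame's data, which is the concern the comment in the diff raises.

```python
import numpy as np
import pandas as pd

N = 10_000

# Write the NaNs into the raw array first, then wrap it in a DataFrame.
arr = np.random.randn(N, 100)
arr[::2] = np.nan  # every other row becomes NaN in all 100 columns
df = pd.DataFrame(arr)

# Half of the rows are NaN, and every column contains NaNs.
nan_count = int(df.isna().sum().sum())
assert nan_count == (N // 2) * 100
assert bool(df.isna().any().all())
```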
26 changes: 3 additions & 23 deletions ci/code_checks.sh
@@ -3,21 +3,18 @@
 # Run checks related to code quality.
 #
 # This script is intended for both the CI and to check locally that code standards are
-# respected. We are currently linting (PEP-8 and similar), looking for patterns of
-# common mistakes (sphinx directives with missing blank lines, old style classes,
-# unwanted imports...), we run doctests here (currently some files only), and we
+# respected. We run doctests here (currently some files only), and we
 # validate formatting error in docstrings.
 #
 # Usage:
 # $ ./ci/code_checks.sh # run all checks
-# $ ./ci/code_checks.sh lint # run linting only
 # $ ./ci/code_checks.sh code # checks on imported code
 # $ ./ci/code_checks.sh doctests # run doctests
 # $ ./ci/code_checks.sh docstrings # validate docstring errors
 # $ ./ci/code_checks.sh typing # run static type analysis

-[[ -z "$1" || "$1" == "lint" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
-    { echo "Unknown command $1. Usage: $0 [lint|code|doctests|docstrings|typing]"; exit 9999; }
+[[ -z "$1" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
+    { echo "Unknown command $1. Usage: $0 [code|doctests|docstrings|typing]"; exit 9999; }

 BASE_DIR="$(dirname $0)/.."
 RET=0
@@ -40,23 +37,6 @@ if [[ "$GITHUB_ACTIONS" == "true" ]]; then
     INVGREP_PREPEND="##[error]"
 fi

-### LINTING ###
-if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
-
-    # Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`;
-    # it doesn't make a difference, but we want to be internally consistent.
-    # Note: this grep pattern is (intended to be) equivalent to the python
-    # regex r'(?<![ ->])> '
-    MSG='Linting .pyx code for spacing conventions in casting' ; echo $MSG
-    invgrep -r -E --include '*.pyx' --include '*.pxi.in' '[a-zA-Z0-9*]> ' pandas/_libs
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-    # readability/casting: Warnings about C casting instead of C++ casting
-    # runtime/int: Warnings about using C number types instead of C++ ones
-    # build/include_subdir: Warnings about prefacing included header files with directory
-
-fi
-
 ### CODE ###
 if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then

2 changes: 1 addition & 1 deletion ci/deps/actions-39-numpydev.yaml
@@ -11,11 +11,11 @@ dependencies:
   - hypothesis>=5.5.3

   # pandas dependencies
+  - python-dateutil
   - pytz
   - pip
   - pip:
     - cython==0.29.21 # GH#34014
-    - "git+git://github.com/dateutil/dateutil.git"
     - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
     - "--pre"
     - "numpy"
Binary file added doc/source/_static/style/appmaphead1.png
Binary file added doc/source/_static/style/appmaphead2.png
9 changes: 4 additions & 5 deletions doc/source/development/contributing_codebase.rst
@@ -23,11 +23,10 @@ contributing them to the project::

    ./ci/code_checks.sh

-The script verifies the linting of code files, it looks for common mistake patterns
-(like missing spaces around sphinx directives that make the documentation not
-being rendered properly) and it also validates the doctests. It is possible to
-run the checks independently by using the parameters ``lint``, ``patterns`` and
-``doctests`` (e.g. ``./ci/code_checks.sh lint``).
+The script validates the doctests, formatting in docstrings, static typing, and
+imported modules. It is possible to run the checks independently by using the
+parameters ``docstring``, ``code``, ``typing``, and ``doctests``
+(e.g. ``./ci/code_checks.sh doctests``).

 In addition, because a lot of people use our library, it is important that we
 do not make sudden changes to the code that could have the potential to break
4 changes: 2 additions & 2 deletions doc/source/development/extending.rst
@@ -106,7 +106,7 @@ extension array for IP Address data, this might be ``ipaddress.IPv4Address``.

 See the `extension dtype source`_ for interface definition.

-:class:`pandas.api.extension.ExtensionDtype` can be registered to pandas to allow creation via a string dtype name.
+:class:`pandas.api.extensions.ExtensionDtype` can be registered to pandas to allow creation via a string dtype name.
 This allows one to instantiate ``Series`` and ``.astype()`` with a registered string name, for
 example ``'category'`` is a registered string accessor for the ``CategoricalDtype``.

@@ -125,7 +125,7 @@ data. We do require that your array be convertible to a NumPy array, even if
 this is relatively expensive (as it is for ``Categorical``).

 They may be backed by none, one, or many NumPy arrays. For example,
-``pandas.Categorical`` is an extension array backed by two arrays,
+:class:`pandas.Categorical` is an extension array backed by two arrays,
 one for codes and one for categories. An array of IPv6 addresses may
 be backed by a NumPy structured array with two fields, one for the
 lower 64 bits and one for the upper 64 bits. Or they may be backed
1 change: 1 addition & 0 deletions doc/source/reference/indexing.rst
@@ -170,6 +170,7 @@ Numeric Index
    :toctree: api/
    :template: autosummary/class_without_autosummary.rst

+   NumericIndex
    RangeIndex
    Int64Index
    UInt64Index
2 changes: 2 additions & 0 deletions doc/source/reference/style.rst
@@ -36,6 +36,8 @@ Style application

    Styler.apply
    Styler.applymap
+   Styler.apply_index
+   Styler.applymap_index
    Styler.format
    Styler.hide_index
    Styler.hide_columns
56 changes: 51 additions & 5 deletions doc/source/user_guide/advanced.rst
@@ -7,7 +7,7 @@ MultiIndex / advanced indexing
 ******************************

 This section covers :ref:`indexing with a MultiIndex <advanced.hierarchical>`
-and :ref:`other advanced indexing features <indexing.index_types>`.
+and :ref:`other advanced indexing features <advanced.index_types>`.

 See the :ref:`Indexing and Selecting Data <indexing>` for general indexing documentation.

@@ -738,7 +738,7 @@ faster than fancy indexing.
    %timeit ser.iloc[indexer]
    %timeit ser.take(indexer)

-.. _indexing.index_types:
+.. _advanced.index_types:

 Index types
 -----------
@@ -749,7 +749,7 @@ and documentation about ``TimedeltaIndex`` is found :ref:`here <timedeltas.index

 In the following sub-sections we will highlight some other index types.

-.. _indexing.categoricalindex:
+.. _advanced.categoricalindex:

 CategoricalIndex
 ~~~~~~~~~~~~~~~~
@@ -846,22 +846,36 @@ values **not** in the categories, similarly to how you can reindex **any** panda
     In [1]: pd.concat([df4, df5])
     TypeError: categories must match existing categories when appending

-.. _indexing.rangeindex:
+.. _advanced.rangeindex:

 Int64Index and RangeIndex
 ~~~~~~~~~~~~~~~~~~~~~~~~~

+.. note::
+
+    In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+    instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+    will be removed. See :ref:`here <advanced.numericindex>` for more.
+    ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
 :class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
 implementing an ordered, sliceable set.

 :class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects.
 ``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.

-.. _indexing.float64index:
+.. _advanced.float64index:

 Float64Index
 ~~~~~~~~~~~~

+.. note::
+
+    In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+    instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+    will be removed. See :ref:`here <advanced.numericindex>` for more.
+    ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
 By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
 This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
 same.
@@ -956,6 +970,38 @@ If you need integer based selection, you should use ``iloc``:

    dfir.iloc[0:5]

+
+.. _advanced.numericindex:
+
+NumericIndex
+~~~~~~~~~~~~
+
+.. versionadded:: 1.4.0
+
+.. note::
+
+    In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+    instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+    will be removed.
+    ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
+:class:`NumericIndex` is an index type that can hold data of any numpy int/uint/float dtype. For example:
+
+.. ipython:: python
+
+    idx = pd.NumericIndex([1, 2, 4, 5], dtype="int8")
+    idx
+    ser = pd.Series(range(4), index=idx)
+    ser
+
+``NumericIndex`` works the same way as the existing ``Int64Index``, ``Float64Index`` and
+``UInt64Index`` except that it can hold any numpy int, uint or float dtype.
+
+Until Pandas 2.0, you will have to call ``NumericIndex`` explicitly in order to use it, like in the example above.
+In Pandas 2.0, ``NumericIndex`` will become the default pandas numeric index type and will automatically be used where appropriate.
+
+Please notice that ``NumericIndex`` *can not* hold Pandas numeric dtypes (:class:`Int64Dtype`, :class:`Int32Dtype` etc.).
+
 .. _advanced.intervalindex:

 IntervalIndex
2 changes: 1 addition & 1 deletion doc/source/user_guide/categorical.rst
@@ -1141,7 +1141,7 @@ Categorical index
 ``CategoricalIndex`` is a type of index that is useful for supporting
 indexing with duplicates. This is a container around a ``Categorical``
 and allows efficient indexing and storage of an index with a large number of duplicated elements.
-See the :ref:`advanced indexing docs <indexing.categoricalindex>` for a more detailed
+See the :ref:`advanced indexing docs <advanced.categoricalindex>` for a more detailed
 explanation.

 Setting the index will create a ``CategoricalIndex``:
10 changes: 9 additions & 1 deletion doc/source/user_guide/reshaping.rst
@@ -474,7 +474,15 @@ rows and columns:

 .. ipython:: python

-   df.pivot_table(index=["A", "B"], columns="C", margins=True, aggfunc=np.std)
+   table = df.pivot_table(index=["A", "B"], columns="C", margins=True, aggfunc=np.std)
+   table
+
+Additionally, you can call :meth:`DataFrame.stack` to display a pivoted DataFrame
+as having a multi-level index:
+
+.. ipython:: python
+
+   table.stack()

 .. _reshaping.crosstabulations:

2 changes: 1 addition & 1 deletion doc/source/user_guide/sparse.rst
@@ -294,7 +294,7 @@ To convert back to sparse SciPy matrix in COO format, you can use the :meth:`Dat

    sdf.sparse.to_coo()

-meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sparse values indexed by a :class:`MultiIndex` to a :class:`scipy.sparse.coo_matrix`.
+:meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sparse values indexed by a :class:`MultiIndex` to a :class:`scipy.sparse.coo_matrix`.

 The method requires a ``MultiIndex`` with two or more levels.
