Commit 214df17

Merge pull request #255 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents b3559c3 + 6c3a3dd commit 214df17

78 files changed: +1815 -1533 lines changed

.github/ISSUE_TEMPLATE/submit_question.md

-24
This file was deleted.
+43
@@ -0,0 +1,43 @@
+name: Submit Question
+description: Ask a general question about pandas
+title: "QST: "
+labels: [Usage Question, Needs Triage]
+
+body:
+  - type: markdown
+    attributes:
+      value: >
+        Since [StackOverflow](https://stackoverflow.com) is better suited towards answering
+        usage questions, we ask that all usage questions are first asked on StackOverflow.
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)
+            on StackOverflow for similar questions.
+          required: true
+        - label: >
+            I have asked my usage related question on [StackOverflow](https://stackoverflow.com).
+          required: true
+  - type: input
+    id: question-link
+    attributes:
+      label: Link to question on StackOverflow
+    validations:
+      required: true
+  - type: markdown
+    attributes:
+      value: ---
+  - type: textarea
+    id: question
+    attributes:
+      label: Question about pandas
+      description: >
+        **Note**: If you'd still like to submit a question, please read [this guide](
+        https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing
+        how to provide the necessary information for us to reproduce your question.
+      placeholder: |
+        ```python
+        # Your code here, if applicable
+
+        ```

.github/workflows/ci.yml

-4
@@ -48,10 +48,6 @@ jobs:
     - name: Build Pandas
       uses: ./.github/actions/build_pandas
 
-    - name: Linting
-      run: ci/code_checks.sh lint
-      if: always()
-
     - name: Checks on imported code
       run: ci/code_checks.sh code
       if: always()

.github/workflows/python-dev.yml

+1-1
@@ -41,7 +41,7 @@ jobs:
       - name: Install dependencies
         run: |
           python -m pip install --upgrade pip setuptools wheel
-          pip install git+https://github.com/numpy/numpy.git
+          pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
           pip install git+https://github.com/pytest-dev/pytest.git
           pip install git+https://github.com/nedbat/coveragepy.git
           pip install cython python-dateutil pytz hypothesis pytest-xdist pytest-cov

.pre-commit-config.yaml

+5
@@ -107,6 +107,11 @@ repos:
         # Check for deprecated messages without sphinx directive
         |(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)
         types_or: [python, cython, rst]
+    -   id: cython-casting
+        name: Check Cython casting is `<type>obj`, not `<type> obj`
+        language: pygrep
+        entry: '[a-zA-Z0-9*]> '
+        files: (\.pyx|\.pxi.in)$
     -   id: incorrect-backticks
         name: Check for backticks incorrectly rendering because of missing spaces
         language: pygrep
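
Reviewer note: the new `cython-casting` hook reuses the pattern that the removed shell lint step passed to `invgrep`. As a rough sketch of what that pattern flags, assuming the hook searches each line with Python's `re` module (the helper name `has_spaced_cast` and the sample lines are hypothetical, not part of the commit):

```python
import re

# Pattern copied from the hook's `entry`; everything else here is illustrative.
CAST_WITH_SPACE = re.compile(r"[a-zA-Z0-9*]> ")


def has_spaced_cast(line: str) -> bool:
    """Return True if the line contains a cast written as `<type> obj`."""
    return CAST_WITH_SPACE.search(line) is not None


samples = [
    "x = <int64_t> obj",      # flagged: space after the closing bracket
    "x = <int64_t>obj",       # accepted: no space
    "p = <object*> ptr",      # flagged: pointer cast followed by a space
    "cdef int f() -> int:",   # not flagged: '-' precedes '>'
]

for line in samples:
    print(f"{has_spaced_cast(line)!s:5} {line}")
```

Because the character class only requires an alphanumeric or `*` immediately before `> `, a comparison written without a leading space (for example `a> b`) would also be flagged.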

asv_bench/benchmarks/frame_methods.py

+6-2
@@ -538,8 +538,12 @@ class Interpolate:
     def setup(self, downcast):
         N = 10000
         # this is the worst case, where every column has NaNs.
-        self.df = DataFrame(np.random.randn(N, 100))
-        self.df.values[::2] = np.nan
+        arr = np.random.randn(N, 100)
+        # NB: we need to set values in array, not in df.values, otherwise
+        # the benchmark will be misleading for ArrayManager
+        arr[::2] = np.nan
+
+        self.df = DataFrame(arr)
 
         self.df2 = DataFrame(
             {
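
Reviewer note: a minimal standalone sketch of the setup pattern this hunk adopts, assuming NumPy and pandas are installed. The NaNs are written into the array before the frame is built, so the result does not depend on whether `df.values` returns a view of the frame's data. The seeded generator is an addition for reproducibility and is not in the benchmark.

```python
import numpy as np
import pandas as pd

N = 10_000

# Seeded generator used here only so the sketch is reproducible (assumption).
rng = np.random.default_rng(0)
arr = rng.standard_normal((N, 100))

# Set every other row to NaN on the array itself, before constructing the frame.
arr[::2] = np.nan

df = pd.DataFrame(arr)

# Every column now contains NaNs, matching the "worst case" the comment describes.
print(df.isna().any().all())
```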

ci/code_checks.sh

+3-23
@@ -3,21 +3,18 @@
 # Run checks related to code quality.
 #
 # This script is intended for both the CI and to check locally that code standards are
-# respected. We are currently linting (PEP-8 and similar), looking for patterns of
-# common mistakes (sphinx directives with missing blank lines, old style classes,
-# unwanted imports...), we run doctests here (currently some files only), and we
+# respected. We run doctests here (currently some files only), and we
 # validate formatting error in docstrings.
 #
 # Usage:
 # $ ./ci/code_checks.sh # run all checks
-# $ ./ci/code_checks.sh lint # run linting only
 # $ ./ci/code_checks.sh code # checks on imported code
 # $ ./ci/code_checks.sh doctests # run doctests
 # $ ./ci/code_checks.sh docstrings # validate docstring errors
 # $ ./ci/code_checks.sh typing # run static type analysis
 
-[[ -z "$1" || "$1" == "lint" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
-    { echo "Unknown command $1. Usage: $0 [lint|code|doctests|docstrings|typing]"; exit 9999; }
+[[ -z "$1" || "$1" == "code" || "$1" == "doctests" || "$1" == "docstrings" || "$1" == "typing" ]] || \
+    { echo "Unknown command $1. Usage: $0 [code|doctests|docstrings|typing]"; exit 9999; }
 
 BASE_DIR="$(dirname $0)/.."
 RET=0
@@ -40,23 +37,6 @@ if [[ "$GITHUB_ACTIONS" == "true" ]]; then
     INVGREP_PREPEND="##[error]"
 fi
 
-### LINTING ###
-if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
-
-    # Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`;
-    # it doesn't make a difference, but we want to be internally consistent.
-    # Note: this grep pattern is (intended to be) equivalent to the python
-    # regex r'(?<![ ->])> '
-    MSG='Linting .pyx code for spacing conventions in casting' ; echo $MSG
-    invgrep -r -E --include '*.pyx' --include '*.pxi.in' '[a-zA-Z0-9*]> ' pandas/_libs
-    RET=$(($RET + $?)) ; echo $MSG "DONE"
-
-    # readability/casting: Warnings about C casting instead of C++ casting
-    # runtime/int: Warnings about using C number types instead of C++ ones
-    # build/include_subdir: Warnings about prefacing included header files with directory
-
-fi
-
 ### CODE ###
 if [[ -z "$CHECK" || "$CHECK" == "code" ]]; then

ci/deps/actions-39-numpydev.yaml

+1-1
@@ -11,11 +11,11 @@ dependencies:
   - hypothesis>=5.5.3
 
   # pandas dependencies
+  - python-dateutil
   - pytz
   - pip
   - pip:
     - cython==0.29.21 # GH#34014
-    - "git+git://github.com/dateutil/dateutil.git"
     - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
     - "--pre"
     - "numpy"

doc/source/development/contributing_codebase.rst

+4-5
@@ -23,11 +23,10 @@ contributing them to the project::
 
     ./ci/code_checks.sh
 
-The script verifies the linting of code files, it looks for common mistake patterns
-(like missing spaces around sphinx directives that make the documentation not
-being rendered properly) and it also validates the doctests. It is possible to
-run the checks independently by using the parameters ``lint``, ``patterns`` and
-``doctests`` (e.g. ``./ci/code_checks.sh lint``).
+The script validates the doctests, formatting in docstrings, static typing, and
+imported modules. It is possible to run the checks independently by using the
+parameters ``docstring``, ``code``, ``typing``, and ``doctests``
+(e.g. ``./ci/code_checks.sh doctests``).
 
 In addition, because a lot of people use our library, it is important that we
 do not make sudden changes to the code that could have the potential to break

doc/source/development/extending.rst

+2-2
@@ -106,7 +106,7 @@ extension array for IP Address data, this might be ``ipaddress.IPv4Address``.
 
 See the `extension dtype source`_ for interface definition.
 
-:class:`pandas.api.extension.ExtensionDtype` can be registered to pandas to allow creation via a string dtype name.
+:class:`pandas.api.extensions.ExtensionDtype` can be registered to pandas to allow creation via a string dtype name.
 This allows one to instantiate ``Series`` and ``.astype()`` with a registered string name, for
 example ``'category'`` is a registered string accessor for the ``CategoricalDtype``.
 
@@ -125,7 +125,7 @@ data. We do require that your array be convertible to a NumPy array, even if
 this is relatively expensive (as it is for ``Categorical``).
 
 They may be backed by none, one, or many NumPy arrays. For example,
-``pandas.Categorical`` is an extension array backed by two arrays,
+:class:`pandas.Categorical` is an extension array backed by two arrays,
 one for codes and one for categories. An array of IPv6 addresses may
 be backed by a NumPy structured array with two fields, one for the
 lower 64 bits and one for the upper 64 bits. Or they may be backed

doc/source/reference/indexing.rst

+1
@@ -170,6 +170,7 @@ Numeric Index
    :toctree: api/
    :template: autosummary/class_without_autosummary.rst
 
+   NumericIndex
    RangeIndex
    Int64Index
    UInt64Index

doc/source/reference/style.rst

+2
@@ -36,6 +36,8 @@ Style application
 
    Styler.apply
    Styler.applymap
+   Styler.apply_index
+   Styler.applymap_index
    Styler.format
    Styler.hide_index
    Styler.hide_columns

doc/source/user_guide/advanced.rst

+51-5
@@ -7,7 +7,7 @@ MultiIndex / advanced indexing
 ******************************
 
 This section covers :ref:`indexing with a MultiIndex <advanced.hierarchical>`
-and :ref:`other advanced indexing features <indexing.index_types>`.
+and :ref:`other advanced indexing features <advanced.index_types>`.
 
 See the :ref:`Indexing and Selecting Data <indexing>` for general indexing documentation.
 
@@ -738,7 +738,7 @@ faster than fancy indexing.
    %timeit ser.iloc[indexer]
    %timeit ser.take(indexer)
 
-.. _indexing.index_types:
+.. _advanced.index_types:
 
 Index types
 -----------
@@ -749,7 +749,7 @@ and documentation about ``TimedeltaIndex`` is found :ref:`here <timedeltas.index
 
 In the following sub-sections we will highlight some other index types.
 
-.. _indexing.categoricalindex:
+.. _advanced.categoricalindex:
 
 CategoricalIndex
 ~~~~~~~~~~~~~~~~
@@ -846,22 +846,36 @@ values **not** in the categories, similarly to how you can reindex **any** panda
     In [1]: pd.concat([df4, df5])
     TypeError: categories must match existing categories when appending
 
-.. _indexing.rangeindex:
+.. _advanced.rangeindex:
 
 Int64Index and RangeIndex
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
+.. note::
+
+   In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+   instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+   will be removed. See :ref:`here <advanced.numericindex>` for more.
+   ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
 :class:`Int64Index` is a fundamental basic index in pandas. This is an immutable array
 implementing an ordered, sliceable set.
 
 :class:`RangeIndex` is a sub-class of ``Int64Index`` that provides the default index for all ``NDFrame`` objects.
 ``RangeIndex`` is an optimized version of ``Int64Index`` that can represent a monotonic ordered set. These are analogous to Python `range types <https://docs.python.org/3/library/stdtypes.html#typesseq-range>`__.
 
-.. _indexing.float64index:
+.. _advanced.float64index:
 
 Float64Index
 ~~~~~~~~~~~~
 
+.. note::
+
+   In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+   instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+   will be removed. See :ref:`here <advanced.numericindex>` for more.
+   ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
 By default a :class:`Float64Index` will be automatically created when passing floating, or mixed-integer-floating values in index creation.
 This enables a pure label-based slicing paradigm that makes ``[],ix,loc`` for scalar indexing and slicing work exactly the
 same.
@@ -956,6 +970,38 @@ If you need integer based selection, you should use ``iloc``:
 
    dfir.iloc[0:5]
 
+
+.. _advanced.numericindex:
+
+NumericIndex
+~~~~~~~~~~~~
+
+.. versionadded:: 1.4.0
+
+.. note::
+
+   In pandas 2.0, :class:`NumericIndex` will become the default index type for numeric types
+   instead of ``Int64Index``, ``Float64Index`` and ``UInt64Index`` and those index types
+   will be removed.
+   ``RangeIndex`` however, will not be removed, as it represents an optimized version of an integer index.
+
+:class:`NumericIndex` is an index type that can hold data of any numpy int/uint/float dtype. For example:
+
+.. ipython:: python
+
+   idx = pd.NumericIndex([1, 2, 4, 5], dtype="int8")
+   idx
+   ser = pd.Series(range(4), index=idx)
+   ser
+
+``NumericIndex`` works the same way as the existing ``Int64Index``, ``Float64Index`` and
+``UInt64Index`` except that it can hold any numpy int, uint or float dtype.
+
+Until Pandas 2.0, you will have to call ``NumericIndex`` explicitly in order to use it, like in the example above.
+In Pandas 2.0, ``NumericIndex`` will become the default pandas numeric index type and will automatically be used where appropriate.
+
+Please notice that ``NumericIndex`` *can not* hold Pandas numeric dtypes (:class:`Int64Dtype`, :class:`Int32Dtype` etc.).
+
 .. _advanced.intervalindex:
 
 IntervalIndex

doc/source/user_guide/categorical.rst

+1-1
@@ -1141,7 +1141,7 @@ Categorical index
 ``CategoricalIndex`` is a type of index that is useful for supporting
 indexing with duplicates. This is a container around a ``Categorical``
 and allows efficient indexing and storage of an index with a large number of duplicated elements.
-See the :ref:`advanced indexing docs <indexing.categoricalindex>` for a more detailed
+See the :ref:`advanced indexing docs <advanced.categoricalindex>` for a more detailed
 explanation.
 
 Setting the index will create a ``CategoricalIndex``:

doc/source/user_guide/reshaping.rst

+9-1
@@ -474,7 +474,15 @@ rows and columns:
 
 .. ipython:: python
 
-   df.pivot_table(index=["A", "B"], columns="C", margins=True, aggfunc=np.std)
+   table = df.pivot_table(index=["A", "B"], columns="C", margins=True, aggfunc=np.std)
+   table
+
+Additionally, you can call :meth:`DataFrame.stack` to display a pivoted DataFrame
+as having a multi-level index:
+
+.. ipython:: python
+
+   table.stack()
 
 .. _reshaping.crosstabulations:
 
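
Reviewer note: a small self-contained sketch of the pivot-then-stack pattern the new documentation paragraph describes, assuming pandas is installed. The toy frame and the `"mean"` aggregation are illustrative assumptions; the documented example uses the guide's own `df` with `aggfunc=np.std`.

```python
import pandas as pd

# Hypothetical toy data, not the frame used in the user guide.
df = pd.DataFrame(
    {
        "A": ["one", "one", "two", "two"],
        "B": ["x", "y", "x", "y"],
        "C": ["foo", "bar", "foo", "bar"],
        "D": [1.0, 2.0, 3.0, 4.0],
    }
)

# Pivot with margins, then move the column level "C" into the row index.
table = df.pivot_table(
    index=["A", "B"], columns="C", values="D", aggfunc="mean", margins=True
)
stacked = table.stack()

print(stacked.index.names)
```

After stacking, the column labels from `C` become the innermost level of the row index, so the result is indexed by `A`, `B` and `C`.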

doc/source/user_guide/sparse.rst

+1-1
@@ -294,7 +294,7 @@ To convert back to sparse SciPy matrix in COO format, you can use the :meth:`Dat
 
    sdf.sparse.to_coo()
 
-meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sparse values indexed by a :class:`MultiIndex` to a :class:`scipy.sparse.coo_matrix`.
+:meth:`Series.sparse.to_coo` is implemented for transforming a ``Series`` with sparse values indexed by a :class:`MultiIndex` to a :class:`scipy.sparse.coo_matrix`.
 
 The method requires a ``MultiIndex`` with two or more levels.
 
