
Commit bc0cd67

Author: MarcoGorelli
Merge remote-tracking branch 'upstream/main' into implementation-pdep-4
2 parents: a7c769f + f2a91a0


55 files changed: +670 -490 lines

.github/ISSUE_TEMPLATE/feature_request.yaml (+1 -1)

@@ -68,5 +68,5 @@ body:
     attributes:
       label: Additional Context
       description: >
-        Please provide any relevant Github issues, code examples or references that help describe and support
+        Please provide any relevant GitHub issues, code examples or references that help describe and support
         the feature request.

.github/PULL_REQUEST_TEMPLATE.md (+1 -1)

@@ -1,4 +1,4 @@
-- [ ] closes #xxxx (Replace xxxx with the Github issue number)
+- [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
 - [ ] [Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature
 - [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
 - [ ] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints) to new arguments/methods/functions.

.github/workflows/macos-windows.yml (+1 -1)

@@ -1,4 +1,4 @@
-name: Windows-MacOS
+name: Windows-macOS
 
 on:
   push:

.github/workflows/python-dev.yml (+1 -1)

@@ -3,7 +3,7 @@
 #
 # In general, this file will remain frozen(present, but not running) until:
 # - The next unreleased Python version has released beta 1
-# - This version should be available on Github Actions.
+# - This version should be available on GitHub Actions.
 # - Our required build/runtime dependencies(numpy, pytz, Cython, python-dateutil)
 #   support that unreleased Python version.
 # To unfreeze, comment out the ``if: false`` condition, and make sure you update

.github/workflows/wheels.yml (+1 -1)

@@ -44,7 +44,7 @@ jobs:
       # Ensure that a wheel builder finishes even if another fails
       fail-fast: false
       matrix:
-        # Github Actions doesn't support pairing matrix values together, let's improvise
+        # GitHub Actions doesn't support pairing matrix values together, let's improvise
         # https://github.com/github/feedback/discussions/7835#discussioncomment-1769026
        buildplat:
        - [ubuntu-20.04, manylinux_x86_64]

.pre-commit-config.yaml (+1 -1)

@@ -251,7 +251,7 @@ repos:
         files: ^(ci/deps/actions-.*-minimum_versions\.yaml|pandas/compat/_optional\.py)$
     -   id: validate-errors-locations
         name: Validate errors locations
-        description: Validate errors are in approriate locations.
+        description: Validate errors are in appropriate locations.
         entry: python scripts/validate_exception_location.py
         language: python
         files: ^pandas/

doc/source/development/contributing_codebase.rst (+1 -1)

@@ -651,7 +651,7 @@ Example
 ^^^^^^^
 
 Here is an example of a self-contained set of tests in a file ``pandas/tests/test_cool_feature.py``
-that illustrate multiple features that we like to use. Please remember to add the Github Issue Number
+that illustrate multiple features that we like to use. Please remember to add the GitHub Issue Number
 as a comment to a new test.
 
 .. code-block:: python
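The hunk above touches guidance on citing issue numbers in new tests. As a hedged illustration of that convention (the test body and issue number below are made up, not from the pandas test suite):

```python
# Hypothetical example of the convention described above: cite the
# GitHub issue number in a comment inside the new test.
def test_cool_feature_rounding():
    # GH 99999  (made-up issue number, for illustration only)
    values = [1.5, 2.5, 3.5]
    # Python's round() uses banker's rounding: ties go to the even integer
    assert [round(v) for v in values] == [2, 2, 4]
```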

doc/source/development/maintaining.rst (+7 -7)

@@ -210,15 +210,15 @@ pandas supports point releases (e.g. ``1.4.3``) that aim to:
 
 * e.g. If a feature worked in ``1.2`` and stopped working since ``1.3``, a fix can be applied in ``1.4.3``.
 
-Since pandas minor releases are based on Github branches (e.g. point release of ``1.4`` are based off the ``1.4.x`` branch),
+Since pandas minor releases are based on GitHub branches (e.g. point release of ``1.4`` are based off the ``1.4.x`` branch),
 "backporting" means merging a pull request fix to the ``main`` branch and correct minor branch associated with the next point release.
 
-By default, if a pull request is assigned to the next point release milestone within the Github interface,
+By default, if a pull request is assigned to the next point release milestone within the GitHub interface,
 the backporting process should happen automatically by the ``@meeseeksdev`` bot once the pull request is merged.
 A new pull request will be made backporting the pull request to the correct version branch.
 Sometimes due to merge conflicts, a manual pull request will need to be made addressing the code conflict.
 
-If the bot does not automatically start the backporting process, you can also write a Github comment in the merged pull request
+If the bot does not automatically start the backporting process, you can also write a GitHub comment in the merged pull request
 to trigger the backport::
 
     @meeseeksdev backport version-branch

@@ -271,14 +271,14 @@ being helpful on the issue tracker.
 The required steps for adding a maintainer are:
 
 1. Contact the contributor and ask their interest to join.
-2. Add the contributor to the appropriate `Github Team <https://github.com/orgs/pandas-dev/teams>`_ if accepted the invitation.
+2. Add the contributor to the appropriate `GitHub Team <https://github.com/orgs/pandas-dev/teams>`_ if accepted the invitation.
 
    * ``pandas-core`` is for core team members
   * ``pandas-triage`` is for pandas triage members
 
 3. Add the contributor to the pandas Google group.
-4. Create a pull request to add the contributor's Github handle to ``pandas-dev/pandas/web/pandas/config.yml``.
-5. Create a pull request to add the contributor's name/Github handle to the `governance document <https://github.com/pandas-dev/pandas-governance/blob/master/people.md>`_.
+4. Create a pull request to add the contributor's GitHub handle to ``pandas-dev/pandas/web/pandas/config.yml``.
+5. Create a pull request to add the contributor's name/GitHub handle to the `governance document <https://github.com/pandas-dev/pandas-governance/blob/master/people.md>`_.
 
 The current list of core-team members is at
 https://github.com/pandas-dev/pandas-governance/blob/master/people.md

@@ -328,7 +328,7 @@ The machine can be configured with the `Ansible <http://docs.ansible.com/ansible
 Publishing
 ``````````
 
-The results are published to another Github repository, https://github.com/tomaugspurger/asv-collection.
+The results are published to another GitHub repository, https://github.com/tomaugspurger/asv-collection.
 Finally, we have a cron job on our docs server to pull from https://github.com/tomaugspurger/asv-collection, to serve them from ``/speed``.
 Ask Tom or Joris for access to the webserver.

doc/source/ecosystem.rst (+2 -2)

@@ -19,7 +19,7 @@ development to remain focused around it's original requirements.
 This is an inexhaustive list of projects that build on pandas in order to provide
 tools in the PyData space. For a list of projects that depend on pandas,
 see the
-`Github network dependents for pandas <https://github.com/pandas-dev/pandas/network/dependents>`_
+`GitHub network dependents for pandas <https://github.com/pandas-dev/pandas/network/dependents>`_
 or `search pypi for pandas <https://pypi.org/search/?q=pandas>`_.
 
 We'd like to make it easier for users to find these projects, if you know of other

@@ -599,4 +599,4 @@ Install pandas-stubs to enable basic type coverage of pandas API.
 
 Learn more by reading through :issue:`14468`, :issue:`26766`, :issue:`28142`.
 
-See installation and usage instructions on the `github page <https://github.com/pandas-dev/pandas-stubs>`__.
+See installation and usage instructions on the `GitHub page <https://github.com/pandas-dev/pandas-stubs>`__.

doc/source/getting_started/overview.rst (+1 -1)

@@ -133,7 +133,7 @@ untouched. In general we like to **favor immutability** where sensible.
 Getting support
 ---------------
 
-The first stop for pandas issues and ideas is the `Github Issue Tracker
+The first stop for pandas issues and ideas is the `GitHub Issue Tracker
 <https://github.com/pandas-dev/pandas/issues>`__. If you have a general question,
 pandas community experts can answer through `Stack Overflow
 <https://stackoverflow.com/questions/tagged/pandas>`__.

doc/source/whatsnew/v1.5.0.rst (+1 -1)

@@ -654,7 +654,7 @@ Deprecations
 In the next major version release, 2.0, several larger API changes are being considered without a formal deprecation such as
 making the standard library `zoneinfo <https://docs.python.org/3/library/zoneinfo.html>`_ the default timezone implementation instead of ``pytz``,
 having the :class:`Index` support all data types instead of having multiple subclasses (:class:`CategoricalIndex`, :class:`Int64Index`, etc.), and more.
-The changes under consideration are logged in `this Github issue <https://github.com/pandas-dev/pandas/issues/44823>`_, and any
+The changes under consideration are logged in `this GitHub issue <https://github.com/pandas-dev/pandas/issues/44823>`_, and any
 feedback or concerns are welcome.
 
 .. _whatsnew_150.deprecations.int_slicing_series:

doc/source/whatsnew/v1.6.0.rst (+2 -1)

@@ -151,6 +151,7 @@ Other API changes
 ^^^^^^^^^^^^^^^^^
 - Passing ``nanoseconds`` greater than 999 or less than 0 in :class:`Timestamp` now raises a ``ValueError`` (:issue:`48538`, :issue:`48255`)
 - :func:`read_csv`: specifying an incorrect number of columns with ``index_col`` of now raises ``ParserError`` instead of ``IndexError`` when using the c parser.
+- Default value of ``dtype`` in :func:`get_dummies` is changed to ``bool`` from ``uint8`` (:issue:`45848`)
 - :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (:issue:`48928`)
 - :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (:issue:`48963`)
 -

@@ -222,7 +223,7 @@ Timezones
 
 Numeric
 ^^^^^^^
--
+- Bug in :meth:`DataFrame.add` cannot apply ufunc when inputs contain mixed DataFrame type and Series type (:issue:`39853`)
 -
 
 Conversion
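The new whatsnew entry above changes the default ``dtype`` of ``get_dummies`` from ``uint8`` to ``bool``. As a minimal pure-Python sketch of what bool-valued indicator columns mean (``one_hot_sketch`` is a hypothetical stand-in for illustration, not the pandas implementation):

```python
def one_hot_sketch(values, dtype=bool):
    """Toy stand-in for pd.get_dummies: map each category to a column
    of indicator values, with dtype (now bool by default) for the cells."""
    categories = sorted(set(values))
    return {cat: [dtype(v == cat) for v in values] for cat in categories}

cols = one_hot_sketch(["a", "b", "a"])
# cols["a"] == [True, False, True]; cols["b"] == [False, True, False]
```

With the old default one would pass an integer type instead, e.g. ``one_hot_sketch(values, dtype=int)``, yielding 0/1 columns.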

doc/sphinxext/announce.py (+2 -2)

@@ -3,7 +3,7 @@
 Script to generate contributor and pull request lists
 
 This script generates contributor and pull request lists for release
-announcements using Github v3 protocol. Use requires an authentication token in
+announcements using GitHub v3 protocol. Use requires an authentication token in
 order to have sufficient bandwidth, you can get one following the directions at
 `<https://help.github.com/articles/creating-an-access-token-for-command-line-use/>_
 Don't add any scope, as the default is read access to public information. The

@@ -112,7 +112,7 @@ def get_pull_requests(repo, revision_range):
     issues = re.findall("^.*\\(\\#(\\d+)\\)$", commits, re.M)
     prnums.extend(int(s) for s in issues)
 
-    # get PR data from github repo
+    # get PR data from GitHub repo
     prnums.sort()
     prs = [repo.get_pull(n) for n in prnums]
     return prs

pandas/_libs/tslibs/parsing.pyx (+14 -11)

@@ -976,7 +976,6 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:
         (('hour',), '%H', 2),
         (('minute',), '%M', 2),
         (('second',), '%S', 2),
-        (('microsecond',), '%f', 6),
         (('second', 'microsecond'), '%S.%f', 0),
         (('tzinfo',), '%z', 0),
         (('tzinfo',), '%Z', 0),

@@ -1048,16 +1047,7 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:
 
         parsed_formatted = parsed_datetime.strftime(attr_format)
         for i, token_format in enumerate(format_guess):
-            if '.' not in tokens[i]:
-                token_filled = tokens[i].zfill(padding)
-            else:
-                seconds, nanoseconds = tokens[i].split('.')
-                seconds = f'{int(seconds):02d}'
-                # right-pad so we get nanoseconds, then only take
-                # first 6 digits (microseconds) as stdlib datetime
-                # doesn't support nanoseconds
-                nanoseconds = nanoseconds.ljust(9, '0')[:6]
-                token_filled = f'{seconds}.{nanoseconds}'
+            token_filled = _fill_token(tokens[i], padding)
             if token_format is None and token_filled == parsed_formatted:
                 format_guess[i] = attr_format
                 tokens[i] = token_filled

@@ -1100,6 +1090,19 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:
     else:
         return None
 
+cdef str _fill_token(token: str, padding: int):
+    cdef str token_filled
+    if '.' not in token:
+        token_filled = token.zfill(padding)
+    else:
+        seconds, nanoseconds = token.split('.')
+        seconds = f'{int(seconds):02d}'
+        # right-pad so we get nanoseconds, then only take
+        # first 6 digits (microseconds) as stdlib datetime
+        # doesn't support nanoseconds
+        nanoseconds = nanoseconds.ljust(9, '0')[:6]
+        token_filled = f'{seconds}.{nanoseconds}'
+    return token_filled
 
 
 cdef void _maybe_warn_about_dayfirst(format: str, bint dayfirst):
     cdef:
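The diff above extracts the token-padding logic into a ``_fill_token`` helper. A pure-Python rendering of that same logic (the committed code is Cython; ``fill_token`` here is just an illustrative equivalent):

```python
def fill_token(token: str, padding: int) -> str:
    """Pure-Python sketch of the _fill_token helper introduced above."""
    if '.' not in token:
        # plain token: left-pad with zeros, e.g. '7' -> '07'
        return token.zfill(padding)
    # fractional-seconds token: right-pad the fraction to nanoseconds
    # (9 digits), then keep only the first 6 digits (microseconds),
    # since stdlib datetime has no nanosecond support
    seconds, nanoseconds = token.split('.')
    return f"{int(seconds):02d}.{nanoseconds.ljust(9, '0')[:6]}"

fill_token("7", 2)    # '07'
fill_token("1.5", 0)  # '01.500000'
```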

pandas/_version.py (+1 -1)

@@ -1,5 +1,5 @@
 # This file helps to compute a version number in source trees obtained from
-# git-archive tarball (such as those provided by githubs download-from-tag
+# git-archive tarball (such as those provided by GitHub's download-from-tag
 # feature). Distribution tarballs (built by setup.py sdist) and build
 # directories (produced by setup.py build) will contain a much shorter file
 # that just contains the computed version number.

pandas/core/algorithms.py (+14 -43)

@@ -762,10 +762,6 @@ def factorize(
     if not isinstance(values, ABCMultiIndex):
         values = extract_array(values, extract_numpy=True)
 
-    # GH35667, if na_sentinel=None, we will not dropna NaNs from the uniques
-    # of values, assign na_sentinel=-1 to replace code value for NaN.
-    dropna = na_sentinel is not None
-
     if (
         isinstance(values, (ABCDatetimeArray, ABCTimedeltaArray))
         and values.freq is not None

@@ -793,17 +789,8 @@ def factorize(
 
     else:
         values = np.asarray(values)  # convert DTA/TDA/MultiIndex
-        # TODO: pass na_sentinel=na_sentinel to factorize_array. When sort is True and
-        # na_sentinel is None we append NA on the end because safe_sort does not
-        # handle null values in uniques.
-        if na_sentinel is None and sort:
-            na_sentinel_arg = -1
-        elif na_sentinel is None:
-            na_sentinel_arg = None
-        else:
-            na_sentinel_arg = na_sentinel
 
-        if not dropna and not sort and is_object_dtype(values):
+        if na_sentinel is None and is_object_dtype(values):
            # factorize can now handle differentiating various types of null values.
            # These can only occur when the array has object dtype.
            # However, for backwards compatibility we only use the null for the

@@ -816,32 +803,15 @@ def factorize(
 
         codes, uniques = factorize_array(
             values,
-            na_sentinel=na_sentinel_arg,
+            na_sentinel=na_sentinel,
             size_hint=size_hint,
         )
 
     if sort and len(uniques) > 0:
-        if na_sentinel is None:
-            # TODO: Can remove when na_sentinel=na_sentinel as in TODO above
-            na_sentinel = -1
         uniques, codes = safe_sort(
             uniques, codes, na_sentinel=na_sentinel, assume_unique=True, verify=False
         )
 
-    if not dropna and sort:
-        # TODO: Can remove entire block when na_sentinel=na_sentinel as in TODO above
-        if na_sentinel is None:
-            na_sentinel_arg = -1
-        else:
-            na_sentinel_arg = na_sentinel
-        code_is_na = codes == na_sentinel_arg
-        if code_is_na.any():
-            # na_value is set based on the dtype of uniques, and compat set to False is
-            # because we do not want na_value to be 0 for integers
-            na_value = na_value_for_dtype(uniques.dtype, compat=False)
-            uniques = np.append(uniques, [na_value])
-            codes = np.where(code_is_na, len(uniques) - 1, codes)
-
     uniques = _reconstruct_data(uniques, original.dtype, original)
 
     return _re_wrap_factorize(original, uniques, codes)

@@ -1796,7 +1766,7 @@ def diff(arr, n: int, axis: AxisInt = 0):
 def safe_sort(
     values,
     codes=None,
-    na_sentinel: int = -1,
+    na_sentinel: int | None = -1,
     assume_unique: bool = False,
     verify: bool = True,
 ) -> np.ndarray | MultiIndex | tuple[np.ndarray | MultiIndex, np.ndarray]:

@@ -1813,8 +1783,8 @@ def safe_sort(
     codes : list_like, optional
         Indices to ``values``. All out of bound indices are treated as
         "not found" and will be masked with ``na_sentinel``.
-    na_sentinel : int, default -1
-        Value in ``codes`` to mark "not found".
+    na_sentinel : int or None, default -1
+        Value in ``codes`` to mark "not found", or None to encode null values as normal.
         Ignored when ``codes`` is None.
     assume_unique : bool, default False
         When True, ``values`` are assumed to be unique, which can speed up

@@ -1920,24 +1890,25 @@ def safe_sort(
         # may deal with them here without performance loss using `mode='wrap'`
         new_codes = reverse_indexer.take(codes, mode="wrap")
 
-    mask = codes == na_sentinel
-    if verify:
-        mask = mask | (codes < -len(values)) | (codes >= len(values))
+    if na_sentinel is not None:
+        mask = codes == na_sentinel
+        if verify:
+            mask = mask | (codes < -len(values)) | (codes >= len(values))
 
-    if mask is not None:
+    if na_sentinel is not None and mask is not None:
         np.putmask(new_codes, mask, na_sentinel)
 
     return ordered, ensure_platform_int(new_codes)
 
 
 def _sort_mixed(values) -> np.ndarray:
-    """order ints before strings in 1d arrays, safe in py3"""
+    """order ints before strings before nulls in 1d arrays"""
     str_pos = np.array([isinstance(x, str) for x in values], dtype=bool)
-    none_pos = np.array([x is None for x in values], dtype=bool)
-    nums = np.sort(values[~str_pos & ~none_pos])
+    null_pos = np.array([isna(x) for x in values], dtype=bool)
+    nums = np.sort(values[~str_pos & ~null_pos])
     strs = np.sort(values[str_pos])
     return np.concatenate(
-        [nums, np.asarray(strs, dtype=object), np.array(values[none_pos])]
+        [nums, np.asarray(strs, dtype=object), np.array(values[null_pos])]
     )
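The ``_sort_mixed`` hunk above broadens the null check from ``x is None`` to ``isna(x)``, so NaN sorts with the other nulls after the strings. A dependency-free sketch of the resulting ordering (using a simplified null check in place of pandas' ``isna``, and plain lists instead of NumPy arrays):

```python
def sort_mixed_sketch(values):
    """Order numbers before strings before nulls, mirroring the
    updated _sort_mixed ordering."""
    def is_null(x):
        # simplified stand-in for pandas.isna: None or float NaN
        return x is None or (isinstance(x, float) and x != x)

    nums = sorted(v for v in values if not isinstance(v, str) and not is_null(v))
    strs = sorted(v for v in values if isinstance(v, str))
    nulls = [v for v in values if is_null(v)]
    return nums + strs + nulls

sort_mixed_sketch([3, "b", None, 1, "a", float("nan")])
# numbers first, then strings, then nulls: [1, 3, 'a', 'b', None, nan]
```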

pandas/core/arraylike.py (+6 -2)

@@ -250,6 +250,10 @@ def array_ufunc(self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any)
     --------
     numpy.org/doc/stable/reference/arrays.classes.html#numpy.class.__array_ufunc__
     """
+    from pandas.core.frame import (
+        DataFrame,
+        Series,
+    )
     from pandas.core.generic import NDFrame
     from pandas.core.internals import BlockManager
 

@@ -295,8 +299,8 @@ def array_ufunc(self, ufunc: np.ufunc, method: str, *inputs: Any, **kwargs: Any)
     # At the moment, there aren't any ufuncs with more than two inputs
     # so this ends up just being x1.index | x2.index, but we write
     # it to handle *args.
-
-    if len(set(types)) > 1:
+    set_types = set(types)
+    if len(set_types) > 1 and {DataFrame, Series}.issubset(set_types):
         # We currently don't handle ufunc(DataFrame, Series)
         # well. Previously this raised an internal ValueError. We might
         # support it someday, so raise a NotImplementedError.
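The guard above now bails out only when the ufunc inputs mix DataFrame and Series specifically, rather than for any mixed-type combination. A dependency-free sketch of that set-based check, using empty stand-in classes (``should_defer`` is an illustrative name, not the pandas function):

```python
# Stand-in classes so the check can be demonstrated without pandas.
class DataFrame: ...
class Series: ...

def should_defer(types) -> bool:
    """Mirror the updated guard: only the mixed DataFrame/Series case is
    treated as unsupported; other mixed combinations proceed to the ufunc."""
    set_types = set(types)
    return len(set_types) > 1 and {DataFrame, Series}.issubset(set_types)

should_defer([DataFrame, Series])     # True  -> raise NotImplementedError
should_defer([DataFrame, DataFrame])  # False -> single type, proceed
should_defer([DataFrame, int])        # False -> mixed, but no Series
```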
