Commit 0a28719

Merge branch 'master' of https://github.com/pandas-dev/pandas into bug-iloc-ea
2 parents fb8d7ce + bed9103

File tree: 103 files changed (+2762 / -2430 lines)


doc/source/getting_started/intro_tutorials/02_read_write.rst (+1 -1)

@@ -118,7 +118,7 @@ done by requesting the pandas ``dtypes`` attribute:
     titanic.dtypes

 For each of the columns, the used data type is enlisted. The data types
-in this ``DataFrame`` are integers (``int64``), floats (``float63``) and
+in this ``DataFrame`` are integers (``int64``), floats (``float64``) and
 strings (``object``).

 .. note::

doc/source/user_guide/io.rst (+2 -2)

@@ -28,7 +28,7 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
     binary;`HDF5 Format <https://support.hdfgroup.org/HDF5/whatishdf5.html>`__;:ref:`read_hdf<io.hdf5>`;:ref:`to_hdf<io.hdf5>`
     binary;`Feather Format <https://github.com/wesm/feather>`__;:ref:`read_feather<io.feather>`;:ref:`to_feather<io.feather>`
     binary;`Parquet Format <https://parquet.apache.org/>`__;:ref:`read_parquet<io.parquet>`;:ref:`to_parquet<io.parquet>`
-    binary;`ORC Format <//https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;
+    binary;`ORC Format <https://orc.apache.org/>`__;:ref:`read_orc<io.orc>`;
     binary;`Msgpack <https://msgpack.org/index.html>`__;:ref:`read_msgpack<io.msgpack>`;:ref:`to_msgpack<io.msgpack>`
     binary;`Stata <https://en.wikipedia.org/wiki/Stata>`__;:ref:`read_stata<io.stata_reader>`;:ref:`to_stata<io.stata_writer>`
     binary;`SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__;:ref:`read_sas<io.sas_reader>`;

@@ -4817,7 +4817,7 @@ ORC

 .. versionadded:: 1.0.0

-Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <//https://orc.apache.org/>`__ is a binary columnar serialization
+Similar to the :ref:`parquet <io.parquet>` format, the `ORC Format <https://orc.apache.org/>`__ is a binary columnar serialization
 for data frames. It is designed to make reading data frames efficient. Pandas provides *only* a reader for the
 ORC format, :func:`~pandas.read_orc`. This requires the `pyarrow <https://arrow.apache.org/docs/python/>`__ library.

doc/source/user_guide/text.rst (+25 -6)

@@ -641,21 +641,40 @@ You can check whether elements contain a pattern:
 .. ipython:: python

     pattern = r'[0-9][a-z]'
-    pd.Series(['1', '2', '3a', '3b', '03c'],
+    pd.Series(['1', '2', '3a', '3b', '03c', '4dx'],
               dtype="string").str.contains(pattern)

 Or whether elements match a pattern:

 .. ipython:: python

-    pd.Series(['1', '2', '3a', '3b', '03c'],
+    pd.Series(['1', '2', '3a', '3b', '03c', '4dx'],
               dtype="string").str.match(pattern)

-The distinction between ``match`` and ``contains`` is strictness: ``match``
-relies on strict ``re.match``, while ``contains`` relies on ``re.search``.
+.. versionadded:: 1.1.0

-Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
-an extra ``na`` argument so missing values can be considered True or False:
+.. ipython:: python
+
+    pd.Series(['1', '2', '3a', '3b', '03c', '4dx'],
+              dtype="string").str.fullmatch(pattern)
+
+.. note::
+
+    The distinction between ``match``, ``fullmatch``, and ``contains`` is strictness:
+    ``fullmatch`` tests whether the entire string matches the regular expression;
+    ``match`` tests whether there is a match of the regular expression that begins
+    at the first character of the string; and ``contains`` tests whether there is
+    a match of the regular expression at any position within the string.
+
+    The corresponding functions in the ``re`` package for these three match modes are
+    `re.fullmatch <https://docs.python.org/3/library/re.html#re.fullmatch>`_,
+    `re.match <https://docs.python.org/3/library/re.html#re.match>`_, and
+    `re.search <https://docs.python.org/3/library/re.html#re.search>`_,
+    respectively.
+
+Methods like ``match``, ``fullmatch``, ``contains``, ``startswith``, and
+``endswith`` take an extra ``na`` argument so missing values can be considered
+True or False:

 .. ipython:: python
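The three match modes documented above can be reproduced with the standard-library ``re`` module alone. A minimal sketch using the same pattern and strings as the doc change (no pandas required); ``re.search``, ``re.match``, and ``re.fullmatch`` correspond to ``str.contains``, ``str.match``, and ``str.fullmatch`` respectively:

```python
import re

pattern = r'[0-9][a-z]'
strings = ['1', '2', '3a', '3b', '03c', '4dx']

# contains ~ re.search: a match anywhere in the string
contains = [re.search(pattern, s) is not None for s in strings]
# match ~ re.match: a match anchored at the start of the string
match = [re.match(pattern, s) is not None for s in strings]
# fullmatch ~ re.fullmatch: the entire string must match
fullmatch = [re.fullmatch(pattern, s) is not None for s in strings]

print(contains)   # [False, False, True, True, True, True]
print(match)      # [False, False, True, True, False, True]
print(fullmatch)  # [False, False, True, True, False, False]
```

Note how ``'03c'`` separates ``contains`` from ``match`` (the match starts at position 1, not 0), while ``'4dx'`` separates ``match`` from ``fullmatch`` (the match starts at position 0 but does not cover the whole string).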
doc/source/whatsnew/v1.1.0.rst (+2 -1)

@@ -69,6 +69,7 @@ Other enhancements
 - `OptionError` is now exposed in `pandas.errors` (:issue:`27553`)
 - :func:`timedelta_range` will now infer a frequency when passed ``start``, ``stop``, and ``periods`` (:issue:`32377`)
 - Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`)
+- :class:`Series.str` now has a `fullmatch` method that matches a regular expression against the entire string in each row of the series, similar to `re.fullmatch` (:issue:`32806`).
 - :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`)
 -

@@ -260,7 +261,7 @@ Timedelta
 - Bug in constructing a :class:`Timedelta` with a high precision integer that would round the :class:`Timedelta` components (:issue:`31354`)
 - Bug in dividing ``np.nan`` or ``None`` by :class:`Timedelta`` incorrectly returning ``NaT`` (:issue:`31869`)
--
+- Timedeltas now understand ``µs`` as identifier for microsecond (:issue:`32899`)

 Timezones
 ^^^^^^^^^

pandas/_libs/tslibs/timedeltas.pyx (+1)

@@ -82,6 +82,7 @@ cdef dict timedelta_abbrevs = {
     "us": "us",
     "microseconds": "us",
     "microsecond": "us",
+    "µs": "us",
     "micro": "us",
     "micros": "us",
     "u": "us",

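The one-line change above extends the unit-abbreviation lookup table in the Cython timedelta parser. A pure-Python sketch of what the mapping does (a simplified mirror; the real table lives in Cython and carries abbreviations for every unit, not just microseconds):

```python
# Simplified mirror of the ``timedelta_abbrevs`` dict: every accepted alias
# normalizes to the canonical unit code "us".
timedelta_abbrevs = {
    "us": "us",
    "microseconds": "us",
    "microsecond": "us",
    "µs": "us",  # the alias added by this commit (GH 32899)
    "micro": "us",
    "micros": "us",
    "u": "us",
}

# After the change, "µs" resolves to the same unit as "us".
assert timedelta_abbrevs["µs"] == timedelta_abbrevs["us"] == "us"
```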
pandas/_testing.py (+31)

@@ -2662,3 +2662,34 @@ def external_error_raised(
     import pytest

     return pytest.raises(expected_exception, match=None)
+
+
+cython_table = pd.core.base.SelectionMixin._cython_table.items()
+
+
+def get_cython_table_params(ndframe, func_names_and_expected):
+    """
+    Combine frame, functions from SelectionMixin._cython_table
+    keys and expected result.
+
+    Parameters
+    ----------
+    ndframe : DataFrame or Series
+    func_names_and_expected : Sequence of two items
+        The first item is a name of a NDFrame method ('sum', 'prod') etc.
+        The second item is the expected return value.
+
+    Returns
+    -------
+    list
+        List of three items (DataFrame, function, expected result)
+    """
+    results = []
+    for func_name, expected in func_names_and_expected:
+        results.append((ndframe, func_name, expected))
+        results += [
+            (ndframe, func, expected)
+            for func, name in cython_table
+            if name == func_name
+        ]
+    return results
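To see what ``get_cython_table_params`` produces, here is a self-contained sketch with a hypothetical stand-in for ``cython_table`` (the real one maps callables such as ``np.sum`` to cython names like ``'sum'``; the logic below is copied from the helper, only the table is faked):

```python
# Hypothetical stand-in for SelectionMixin._cython_table.items():
# pairs of (callable, cython name).
cython_table = [(sum, "sum"), (min, "min"), (max, "max")]

def get_cython_table_params(ndframe, func_names_and_expected):
    # For each (name, expected) pair, emit (frame, name, expected) plus
    # (frame, callable, expected) for every table entry with that name.
    results = []
    for func_name, expected in func_names_and_expected:
        results.append((ndframe, func_name, expected))
        results += [
            (ndframe, func, expected)
            for func, name in cython_table
            if name == func_name
        ]
    return results

frame = [1, 2, 3]  # stand-in for a Series/DataFrame
params = get_cython_table_params(frame, [("sum", 6)])
# one entry keyed by name, one keyed by the matching callable
assert params == [(frame, "sum", 6), (frame, sum, 6)]
```

Each element is then usable as a pytest parametrization tuple of (frame, aggregation, expected result).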

pandas/conftest.py (+13 -32)

@@ -368,6 +368,17 @@ def _create_multiindex():
     return mi


+def _create_mi_with_dt64tz_level():
+    """
+    MultiIndex with a level that is a tzaware DatetimeIndex.
+    """
+    # GH#8367 round trip with pickle
+    return MultiIndex.from_product(
+        [[1, 2], ["a", "b"], pd.date_range("20130101", periods=3, tz="US/Eastern")],
+        names=["one", "two", "three"],
+    )
+
+
 indices_dict = {
     "unicode": tm.makeUnicodeIndex(100),
     "string": tm.makeStringIndex(100),

@@ -384,6 +395,7 @@ def _create_multiindex():
     "interval": tm.makeIntervalIndex(100),
     "empty": Index([]),
     "tuples": MultiIndex.from_tuples(zip(["foo", "bar", "baz"], [1, 2, 3])),
+    "mi-with-dt64tz-level": _create_mi_with_dt64tz_level(),
     "multi": _create_multiindex(),
     "repeats": Index([0, 0, 1, 1, 2, 2]),
 }

@@ -1119,10 +1131,7 @@ def spmatrix(request):
     return getattr(sparse, request.param + "_matrix")


-_cython_table = pd.core.base.SelectionMixin._cython_table.items()
-
-
-@pytest.fixture(params=list(_cython_table))
+@pytest.fixture(params=list(tm.cython_table))
 def cython_table_items(request):
     """
     Yields a tuple of a function and its corresponding name. Correspond to

@@ -1131,34 +1140,6 @@ def cython_table_items(request):
     return request.param


-def _get_cython_table_params(ndframe, func_names_and_expected):
-    """
-    Combine frame, functions from SelectionMixin._cython_table
-    keys and expected result.
-
-    Parameters
-    ----------
-    ndframe : DataFrame or Series
-    func_names_and_expected : Sequence of two items
-        The first item is a name of a NDFrame method ('sum', 'prod') etc.
-        The second item is the expected return value.
-
-    Returns
-    -------
-    list
-        List of three items (DataFrame, function, expected result)
-    """
-    results = []
-    for func_name, expected in func_names_and_expected:
-        results.append((ndframe, func_name, expected))
-        results += [
-            (ndframe, func, expected)
-            for func, name in _cython_table
-            if name == func_name
-        ]
-    return results
-
-
 @pytest.fixture(
     params=[
         getattr(pd.offsets, o)
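A quick look at the shape of the index the new ``mi-with-dt64tz-level`` entry contributes, assuming pandas is installed (this mirrors the helper inline rather than importing it from conftest):

```python
import pandas as pd
from pandas import MultiIndex

# Mirror of the new conftest helper: a MultiIndex whose third level is a
# tz-aware DatetimeIndex (GH#8367 exercises pickle round-trips on this).
mi = MultiIndex.from_product(
    [[1, 2], ["a", "b"], pd.date_range("20130101", periods=3, tz="US/Eastern")],
    names=["one", "two", "three"],
)

assert mi.nlevels == 3
assert len(mi) == 2 * 2 * 3          # Cartesian product of the three levels
assert str(mi.levels[2].tz) == "US/Eastern"
```

Adding the index to ``indices_dict`` means every test parametrized over the shared ``indices`` fixture now also runs against a MultiIndex containing a tz-aware datetime level.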

pandas/core/algorithms.py (+3 -3)

@@ -700,7 +700,7 @@ def value_counts(
         result = result.sort_index()

     # if we are dropna and we have NO values
-    if dropna and (result.values == 0).all():
+    if dropna and (result._values == 0).all():
         result = result.iloc[0:0]

     # normalizing is by len of all (regardless of dropna)

@@ -713,7 +713,7 @@ def value_counts(
         # handle Categorical and sparse,
         result = Series(values)._values.value_counts(dropna=dropna)
         result.name = name
-        counts = result.values
+        counts = result._values

     else:
         keys, counts = _value_counts_arraylike(values, dropna)

@@ -823,7 +823,7 @@ def mode(values, dropna: bool = True) -> "Series":
     # categorical is a fast-path
     if is_categorical_dtype(values):
        if isinstance(values, Series):
-            return Series(values.values.mode(dropna=dropna), name=values.name)
+            return Series(values._values.mode(dropna=dropna), name=values.name)
        return values.mode(dropna=dropna)

    if dropna and needs_i8_conversion(values.dtype):
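The recurring ``.values`` → ``._values`` substitution in this commit (here and in the files below) avoids densifying extension-backed data into a plain NumPy array. A hedged illustration of the difference, assuming pandas is installed (``._values`` is a private pandas attribute, shown only to explain the change, not as public API):

```python
import pandas as pd

s = pd.Series(pd.date_range("2020-01-01", periods=2, tz="UTC"))

# .values materializes a NumPy array; for tz-aware data this is a lossy
# and potentially expensive conversion.
dense = s.values

# ._values (private) returns the backing array without that conversion,
# preserving the extension type (here, a DatetimeArray).
backing = s._values

print(type(dense).__name__)    # ndarray
print(type(backing).__name__)  # DatetimeArray
```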

pandas/core/arrays/datetimelike.py (+1 -1)

@@ -905,7 +905,7 @@ def value_counts(self, dropna=False):
         index = Index(
             cls(result.index.view("i8"), dtype=self.dtype), name=result.index.name
         )
-        return Series(result.values, index=index, name=result.name)
+        return Series(result._values, index=index, name=result.name)

     def map(self, mapper):
         # TODO(GH-23179): Add ExtensionArray.map

pandas/core/arrays/interval.py (+1 -1)

@@ -152,7 +152,7 @@ class IntervalArray(IntervalMixin, ExtensionArray):
     def __new__(cls, data, closed=None, dtype=None, copy=False, verify_integrity=True):

         if isinstance(data, ABCSeries) and is_interval_dtype(data):
-            data = data.values
+            data = data._values

         if isinstance(data, (cls, ABCIntervalIndex)):
             left = data.left

pandas/core/arrays/masked.py (+2 -2)

@@ -244,11 +244,11 @@ def value_counts(self, dropna: bool = True) -> "Series":
         # TODO(extension)
         # if we have allow Index to hold an ExtensionArray
         # this is easier
-        index = value_counts.index.values.astype(object)
+        index = value_counts.index._values.astype(object)

         # if we want nans, count the mask
         if dropna:
-            counts = value_counts.values
+            counts = value_counts._values
         else:
             counts = np.empty(len(value_counts) + 1, dtype="int64")
             counts[:-1] = value_counts

pandas/core/base.py (+6 -12)

@@ -123,15 +123,11 @@ def __setattr__(self, key: str, value):
         object.__setattr__(self, key, value)


-class GroupByError(Exception):
+class DataError(Exception):
     pass


-class DataError(GroupByError):
-    pass
-
-
-class SpecificationError(GroupByError):
+class SpecificationError(Exception):
     pass


@@ -372,7 +368,7 @@ def _agg_1dim(name, how, subset=None):
             )
             return colg.aggregate(how)

-        def _agg_2dim(name, how):
+        def _agg_2dim(how):
             """
             aggregate a 2-dim with how
             """

@@ -660,7 +656,7 @@ def item(self):
         ):
             # numpy returns ints instead of datetime64/timedelta64 objects,
             # which we need to wrap in Timestamp/Timedelta/Period regardless.
-            return self.values.item()
+            return self._values.item()

         if len(self) == 1:
             return next(iter(self))

@@ -1132,10 +1128,8 @@ def _map_values(self, mapper, na_action=None):
             # use the built in categorical series mapper which saves
             # time by mapping the categories instead of all values
             return self._values.map(mapper)
-        if is_extension_array_dtype(self.dtype):
-            values = self._values
-        else:
-            values = self.values
+
+        values = self._values

         indexer = mapper.index.get_indexer(values)
         new_values = algorithms.take_1d(mapper._values, indexer)
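The first hunk above flattens the exception hierarchy: ``DataError`` and ``SpecificationError`` previously derived from an intermediate ``GroupByError`` base, and now derive from ``Exception`` directly. A minimal standalone sketch of the compatibility consequence (the classes below are local stand-ins, not imported from pandas):

```python
# Before: an intermediate base class sat between the concrete errors
# and Exception.
class GroupByError(Exception): ...
class OldDataError(GroupByError): ...

# After: the concrete error derives from Exception directly.
class NewDataError(Exception): ...

# Code that caught the removed intermediate class no longer matches:
assert issubclass(OldDataError, GroupByError)
assert not issubclass(NewDataError, GroupByError)
assert issubclass(NewDataError, Exception)
```

So ``except Exception`` handlers are unaffected, while any ``except GroupByError`` handlers would stop catching these errors.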

pandas/core/common.py (+1 -1)

@@ -213,7 +213,7 @@ def asarray_tuplesafe(values, dtype=None):
     if not (isinstance(values, (list, tuple)) or hasattr(values, "__array__")):
         values = list(values)
     elif isinstance(values, ABCIndexClass):
-        return values.values
+        return values._values

     if isinstance(values, list) and dtype in [np.object_, object]:
         return construct_1d_object_array_from_listlike(values)

pandas/core/dtypes/cast.py (+1 -1)

@@ -888,7 +888,7 @@ def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
     elif is_timedelta64_dtype(dtype):
         from pandas import to_timedelta

-        return astype_nansafe(to_timedelta(arr).values, dtype, copy=copy)
+        return astype_nansafe(to_timedelta(arr)._values, dtype, copy=copy)

     if dtype.name in ("datetime64", "timedelta64"):
         msg = (

pandas/core/dtypes/common.py (+3 -1)

@@ -188,7 +188,9 @@ def ensure_python_int(value: Union[int, np.integer]) -> int:
     TypeError: if the value isn't an int or can't be converted to one.
     """
     if not is_scalar(value):
-        raise TypeError(f"Value needs to be a scalar value, was type {type(value)}")
+        raise TypeError(
+            f"Value needs to be a scalar value, was type {type(value).__name__}"
+        )
     try:
         new_value = int(value)
         assert new_value == value
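The error-message tweak above swaps the full class repr for the bare class name. A small stdlib sketch of the difference in the rendered message:

```python
value = [1, 2]

# Old formatting embeds the full repr of the type object:
old = f"Value needs to be a scalar value, was type {type(value)}"
# -> "Value needs to be a scalar value, was type <class 'list'>"

# New formatting uses just the class name:
new = f"Value needs to be a scalar value, was type {type(value).__name__}"
# -> "Value needs to be a scalar value, was type list"

assert "<class 'list'>" in old
assert new.endswith("was type list")
```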

pandas/core/dtypes/missing.py (+2 -2)

@@ -229,7 +229,7 @@ def _isna_ndarraylike(obj):
     if not is_extension:
         # Avoid accessing `.values` on things like
         # PeriodIndex, which may be expensive.
-        values = getattr(obj, "values", obj)
+        values = getattr(obj, "_values", obj)
     else:
         values = obj

@@ -270,7 +270,7 @@ def _isna_ndarraylike(obj):


 def _isna_ndarraylike_old(obj):
-    values = getattr(obj, "values", obj)
+    values = getattr(obj, "_values", obj)
     dtype = values.dtype

     if is_string_dtype(dtype):

pandas/core/generic.py (+3 -3)

@@ -7071,7 +7071,7 @@ def asof(self, where, subset=None):

             return Series(np.nan, index=self.columns, name=where[0])

-        locs = self.index.asof_locs(where, ~(nulls.values))
+        locs = self.index.asof_locs(where, ~(nulls._values))

         # mask the missing
         missing = locs == -1

@@ -7230,7 +7230,7 @@ def _clip_with_scalar(self, lower, upper, inplace: bool_t = False):
             raise ValueError("Cannot use an NA value as a clip threshold")

         result = self
-        mask = isna(self.values)
+        mask = isna(self._values)

         with np.errstate(all="ignore"):
             if upper is not None:

@@ -8604,7 +8604,7 @@ def _where(

         if self.ndim == 1:

-            icond = cond.values
+            icond = cond._values

             # GH 2745 / GH 4192
             # treat like a scalar
