Commit bcbc7c1

Merge pull request #92 from pandas-dev/master
Sync Fork from Upstream Repo
2 parents 5d63ed2 + 6620dc6


51 files changed: +919 −341 lines

azure-pipelines.yml (+6)

@@ -1,4 +1,10 @@
 # Adapted from https://github.com/numba/numba/blob/master/azure-pipelines.yml
+trigger:
+- master
+
+pr:
+- master
+
 jobs:
 # Mac and Linux use the same template
 - template: ci/azure/posix.yml

ci/azure/posix.yml (+5 −5)

@@ -38,11 +38,11 @@ jobs:
       LC_ALL: "it_IT.utf8"
       EXTRA_APT: "language-pack-it xsel"

-    py36_32bit:
-      ENV_FILE: ci/deps/azure-36-32bit.yaml
-      CONDA_PY: "36"
-      PATTERN: "not slow and not network and not clipboard"
-      BITS32: "yes"
+    #py36_32bit:
+    #  ENV_FILE: ci/deps/azure-36-32bit.yaml
+    #  CONDA_PY: "36"
+    #  PATTERN: "not slow and not network and not clipboard"
+    #  BITS32: "yes"

     py37_locale:
       ENV_FILE: ci/deps/azure-37-locale.yaml

doc/source/user_guide/dsintro.rst (+22)

@@ -397,6 +397,28 @@ The result will be a DataFrame with the same index as the input Series, and
 with one column whose name is the original name of the Series (only if no other
 column name provided).

+.. _basics.dataframe.from_list_dataclasses:
+
+From a list of dataclasses
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 1.1.0
+
+Data classes as introduced in `PEP 557 <https://www.python.org/dev/peps/pep-0557>`__
+can be passed into the DataFrame constructor.
+Passing a list of dataclasses is equivalent to passing a list of dictionaries.
+
+Please be aware that all values in the list should be dataclasses; mixing
+types in the list will result in a ``TypeError``.
+
+.. ipython:: python
+
+   from dataclasses import make_dataclass
+
+   Point = make_dataclass("Point", [("x", int), ("y", int)])
+
+   pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
+
 **Missing data**

 Much more will be said on this topic in the :ref:`Missing data <missing_data>`
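The documented construction path can be tried directly; a minimal sketch of the dataclass example above, runnable as a plain script:

```python
from dataclasses import make_dataclass

import pandas as pd

# Each dataclass instance becomes one row; the dataclass fields
# become the DataFrame columns.
Point = make_dataclass("Point", [("x", int), ("y", int)])
df = pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
print(df)
```

This behaves exactly like passing `[{"x": 0, "y": 0}, ...]`, as the doc addition states.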

doc/source/whatsnew/v1.1.0.rst (+40 −3)

@@ -168,14 +168,46 @@ key and type of :class:`Index`. These now consistently raise ``KeyError`` (:iss

 .. ---------------------------------------------------------------------------

+.. _whatsnew_110.api_breaking.assignment_to_multiple_columns:
+
+Assignment to multiple columns of a DataFrame when some columns do not exist
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Assignment to multiple columns of a :class:`DataFrame` when some of the columns do not exist would previously assign the values to the last column. Now, new columns are constructed with the right values (:issue:`13658`).
+
+.. ipython:: python
+
+   df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
+   df
+
+*Previous behavior*:
+
+.. code-block:: ipython
+
+   In [3]: df[['a', 'c']] = 1
+
+   In [4]: df
+   Out[4]:
+      a  b
+   0  1  1
+   1  1  1
+   2  1  1
+
+*New behavior*:
+
+.. ipython:: python
+
+   df[['a', 'c']] = 1
+   df
+
 .. _whatsnew_110.deprecations:

 Deprecations
 ~~~~~~~~~~~~
 - Lookups on a :class:`Series` with a single-item list containing a slice (e.g. ``ser[[slice(0, 4)]]``) are deprecated and will raise in a future version. Either convert the list to a tuple, or pass the slice directly instead (:issue:`31333`)
 - :meth:`DataFrame.mean` and :meth:`DataFrame.median` with ``numeric_only=None`` will include datetime64 and datetime64tz columns in a future version (:issue:`29941`)
 - Setting values with ``.loc`` using a positional slice is deprecated and will raise in a future version. Use ``.loc`` with labels or ``.iloc`` with positions instead (:issue:`31840`)
-
+- :meth:`DataFrame.to_dict` has deprecated accepting short names for ``orient``, which will raise in a future version (:issue:`32515`)
+- :meth:`Categorical.to_dense` is deprecated and will be removed in a future version; use ``np.asarray(cat)`` instead (:issue:`32639`)

 .. ---------------------------------------------------------------------------

@@ -190,7 +222,7 @@ Performance improvements
 - Performance improvement in flex arithmetic ops between :class:`DataFrame` and :class:`Series` with ``axis=0`` (:issue:`31296`)
 - The internal index method :meth:`~Index._shallow_copy` now copies cached attributes over to the new index,
   avoiding creating these again on the new index. This can speed up many operations that depend on creating copies of
-  existing indexes (:issue:`28584`, :issue:`32640`)
+  existing indexes (:issue:`28584`, :issue:`32640`, :issue:`32669`)

 .. ---------------------------------------------------------------------------

@@ -216,6 +248,7 @@ Datetimelike
 - Bug in :class:`Timestamp` where constructing :class:`Timestamp` with dateutil timezone less than 128 nanoseconds before daylight saving time switch from winter to summer would result in nonexistent time (:issue:`31043`)
 - Bug in :meth:`Period.to_timestamp`, :meth:`Period.start_time` with microsecond frequency returning a timestamp one nanosecond earlier than the correct time (:issue:`31475`)
 - :class:`Timestamp` raising confusing error message when year, month or day is missing (:issue:`31200`)
+- Bug in :class:`DatetimeIndex` constructor incorrectly accepting ``bool``-dtyped inputs (:issue:`32668`)

 Timedelta
 ^^^^^^^^^

@@ -241,7 +274,7 @@ Conversion
 ^^^^^^^^^^
 - Bug in :class:`Series` construction from NumPy array with big-endian ``datetime64`` dtype (:issue:`29684`)
 - Bug in :class:`Timedelta` construction with large nanoseconds keyword value (:issue:`32402`)
-
+- Bug in :class:`DataFrame` construction where sets would be duplicated rather than raising (:issue:`32582`)

 Strings
 ^^^^^^^

@@ -306,6 +339,7 @@ I/O
 - Bug in :meth:`read_csv` was raising `TypeError` when `sep=None` was used in combination with `comment` keyword (:issue:`31396`)
 - Bug in :class:`HDFStore` that caused it to set to ``int64`` the dtype of a ``datetime64`` column when reading a DataFrame in Python 3 from fixed format written in Python 2 (:issue:`31750`)
 - Bug in :meth:`read_excel` where a UTF-8 string with a high surrogate would cause a segmentation violation (:issue:`23809`)
+- Bug in :meth:`read_csv` was causing a file descriptor leak on an empty file (:issue:`31488`)


 Plotting

@@ -335,6 +369,8 @@ Reshaping
 - Bug in :func:`concat` where the resulting indices are not copied when ``copy=True`` (:issue:`29879`)
 - :meth:`Series.append` will now raise a ``TypeError`` when passed a DataFrame or a sequence containing Dataframe (:issue:`31413`)
 - :meth:`DataFrame.replace` and :meth:`Series.replace` will raise a ``TypeError`` if ``to_replace`` is not an expected type. Previously the ``replace`` would fail silently (:issue:`18634`)
+- Bug in :meth:`DataFrame.apply` where the callback was called with a :class:`Series` parameter even though ``raw=True`` was requested (:issue:`32423`)
+- Bug in :meth:`DataFrame.pivot_table` losing timezone information when creating a :class:`MultiIndex` level from a column with timezone-aware dtype (:issue:`32558`)


 Sparse

@@ -356,6 +392,7 @@ Other
   instead of ``TypeError: Can only append a Series if ignore_index=True or if the Series has a name`` (:issue:`30871`)
 - Set operations on an object-dtype :class:`Index` now always return object-dtype results (:issue:`31401`)
 - Bug in :meth:`AbstractHolidayCalendar.holidays` when no rules were defined (:issue:`31415`)
+- Bug in :meth:`DataFrame.to_records` incorrectly losing timezone information in timezone-aware ``datetime64`` columns (:issue:`32535`)

 .. ---------------------------------------------------------------------------
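The multi-column assignment change described in the whatsnew entry can be checked directly; a minimal sketch of the new (pandas >= 1.1) behavior:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

# Under the new behavior, every listed column is assigned: 'a' is
# overwritten and 'c' is created, while 'b' is left untouched.
# Previously the value would have been written into 'b', the last
# existing column, and 'c' never created.
df[['a', 'c']] = 1
print(df)
```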

pandas/_libs/internals.pyx (+11 −19)

@@ -1,4 +1,5 @@
 import cython
+from collections import defaultdict
 from cython import Py_ssize_t

 from cpython.slice cimport PySlice_GetIndicesEx

@@ -7,7 +8,9 @@ cdef extern from "Python.h":
     Py_ssize_t PY_SSIZE_T_MAX

 import numpy as np
-from numpy cimport int64_t
+cimport numpy as cnp
+from numpy cimport NPY_INT64, int64_t
+cnp.import_array()

 from pandas._libs.algos import ensure_int64

@@ -105,7 +108,9 @@ cdef class BlockPlacement:
         Py_ssize_t start, stop, end, _
     if not self._has_array:
         start, stop, step, _ = slice_get_indices_ex(self._as_slice)
-        self._as_array = np.arange(start, stop, step, dtype=np.int64)
+        # NOTE: this is the C-optimized equivalent of
+        # np.arange(start, stop, step, dtype=np.int64)
+        self._as_array = cnp.PyArray_Arange(start, stop, step, NPY_INT64)
        self._has_array = True
     return self._as_array

@@ -369,8 +374,7 @@ def get_blkno_indexers(int64_t[:] blknos, bint group=True):
     Py_ssize_t i, start, stop, n, diff

     object blkno
-    list group_order
-    dict group_dict
+    object group_dict = defaultdict(list)
     int64_t[:] res_view

     n = blknos.shape[0]

@@ -391,28 +395,16 @@ def get_blkno_indexers(int64_t[:] blknos, bint group=True):

         yield cur_blkno, slice(start, n)
     else:
-        group_order = []
-        group_dict = {}
-
         for i in range(1, n):
             if blknos[i] != cur_blkno:
-                if cur_blkno not in group_dict:
-                    group_order.append(cur_blkno)
-                    group_dict[cur_blkno] = [(start, i)]
-                else:
-                    group_dict[cur_blkno].append((start, i))
+                group_dict[cur_blkno].append((start, i))

                 start = i
                 cur_blkno = blknos[i]

-        if cur_blkno not in group_dict:
-            group_order.append(cur_blkno)
-            group_dict[cur_blkno] = [(start, n)]
-        else:
-            group_dict[cur_blkno].append((start, n))
+        group_dict[cur_blkno].append((start, n))

-        for blkno in group_order:
-            slices = group_dict[blkno]
+        for blkno, slices in group_dict.items():
             if len(slices) == 1:
                 yield blkno, slice(slices[0][0], slices[0][1])
             else:
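The refactor above drops the separate `group_order` list because Python dicts (and therefore `defaultdict`) preserve insertion order since 3.7, so iterating `group_dict.items()` visits block numbers in first-seen order. A pure-Python sketch of the simplified grouping logic (the function name `blkno_runs` is illustrative, not pandas API):

```python
from collections import defaultdict


def blkno_runs(blknos):
    """Collect the (start, stop) runs of each block number, in the
    order block numbers first appear, mirroring the simplified
    get_blkno_indexers loop."""
    group_dict = defaultdict(list)  # insertion-ordered on Python 3.7+
    start = 0
    cur = blknos[0]
    for i in range(1, len(blknos)):
        if blknos[i] != cur:
            group_dict[cur].append((start, i))
            start = i
            cur = blknos[i]
    group_dict[cur].append((start, len(blknos)))
    return list(group_dict.items())


print(blkno_runs([0, 0, 1, 1, 0]))  # [(0, [(0, 2), (4, 5)]), (1, [(2, 4)])]
```

Relying on dict ordering removes one list and two membership checks per run boundary without changing the yielded order.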

pandas/conftest.py (+9)

@@ -425,6 +425,15 @@ def nselect_method(request):
     return request.param


+@pytest.fixture(params=["first", "last", False])
+def keep(request):
+    """
+    Valid values for the 'keep' parameter used in
+    .duplicated or .drop_duplicates
+    """
+    return request.param
+
+
 @pytest.fixture(params=["left", "right", "both", "neither"])
 def closed(request):
     """
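The three `keep` values the fixture parametrizes over behave as follows; a minimal sketch against the public API:

```python
import pandas as pd

ser = pd.Series([1, 1, 2, 2, 3])

# keep="first" marks later occurrences as duplicates, keep="last" marks
# earlier ones, and keep=False marks every member of a duplicated group.
assert ser.duplicated(keep="first").tolist() == [False, True, False, True, False]
assert ser.duplicated(keep="last").tolist() == [True, False, True, False, False]
assert ser.duplicated(keep=False).tolist() == [True, True, True, True, False]
```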

pandas/core/algorithms.py (+12 −10)

@@ -11,6 +11,7 @@

 from pandas._libs import Timestamp, algos, hashtable as htable, lib
 from pandas._libs.tslib import iNaT
+from pandas._typing import AnyArrayLike
 from pandas.util._decorators import doc

 from pandas.core.dtypes.cast import (

@@ -45,10 +46,14 @@
     is_unsigned_integer_dtype,
     needs_i8_conversion,
 )
-from pandas.core.dtypes.generic import ABCIndex, ABCIndexClass, ABCSeries
+from pandas.core.dtypes.generic import (
+    ABCExtensionArray,
+    ABCIndex,
+    ABCIndexClass,
+    ABCSeries,
+)
 from pandas.core.dtypes.missing import isna, na_value_for_dtype

-import pandas.core.common as com
 from pandas.core.construction import array, extract_array
 from pandas.core.indexers import validate_indices

@@ -384,7 +389,7 @@ def unique(values):
 unique1d = unique


-def isin(comps, values) -> np.ndarray:
+def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
     """
     Compute the isin boolean array.

@@ -409,15 +414,14 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> np.ndarray:
         f"to isin(), you passed a [{type(values).__name__}]"
     )

-    if not isinstance(values, (ABCIndex, ABCSeries, np.ndarray)):
+    if not isinstance(values, (ABCIndex, ABCSeries, ABCExtensionArray, np.ndarray)):
         values = construct_1d_object_array_from_listlike(list(values))

+    comps = extract_array(comps, extract_numpy=True)
     if is_categorical_dtype(comps):
         # TODO(extension)
         # handle categoricals
-        return comps._values.isin(values)
-
-    comps = com.values_from_object(comps)
+        return comps.isin(values)  # type: ignore

     comps, dtype = _ensure_data(comps)
     values, _ = _ensure_data(values, dtype=dtype)

@@ -2021,9 +2025,7 @@ def sort_mixed(values):
     )
     codes = ensure_platform_int(np.asarray(codes))

-    from pandas import Index
-
-    if not assume_unique and not Index(values).is_unique:
+    if not assume_unique and not len(unique(values)) == len(values):
         raise ValueError("values should be unique if codes is not None")

     if sorter is None:
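The `isin` being refactored above is the internal routine behind `Series.isin`; its observable contract is unchanged by the `extract_array` rewrite. A minimal sketch via the public API:

```python
import pandas as pd

ser = pd.Series(["a", "b", "c", "a"])

# Series.isin dispatches to pandas.core.algorithms.isin; it returns a
# boolean array marking which elements appear in the passed values.
mask = ser.isin(["a", "c"])
print(mask.tolist())  # [True, False, True, True]
```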

pandas/core/apply.py (+1 −1)

@@ -179,7 +179,7 @@ def get_result(self):
             return self.apply_empty_result()

         # raw
-        elif self.raw and not self.obj._is_mixed_type:
+        elif self.raw:
             return self.apply_raw()

         return self.apply_standard()
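Dropping the `_is_mixed_type` guard means `raw=True` is now honored even for mixed-dtype frames (the GH 32423 fix listed in the whatsnew). A minimal sketch of the resulting behavior on pandas >= 1.1:

```python
import pandas as pd

# A frame with mixed dtypes; previously raw=True was silently ignored
# here and the callback still received a Series per column.
df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# With the guard removed, the callback receives a plain ndarray.
kinds = df.apply(lambda col: type(col).__name__, raw=True)
print(kinds.tolist())  # ['ndarray', 'ndarray']
```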

pandas/core/arrays/categorical.py (+13 −2)

@@ -1675,6 +1675,12 @@ def to_dense(self):
         -------
         dense : array
         """
+        warn(
+            "Categorical.to_dense is deprecated and will be removed in "
+            "a future version. Use np.asarray(cat) instead.",
+            FutureWarning,
+            stacklevel=2,
+        )
         return np.asarray(self)

     def fillna(self, value=None, method=None, limit=None):

@@ -1733,12 +1739,17 @@ def fillna(self, value=None, method=None, limit=None):

         # If value is a dict or a Series (a dict value has already
         # been converted to a Series)
-        if isinstance(value, ABCSeries):
-            if not value[~value.isin(self.categories)].isna().all():
+        if isinstance(value, (np.ndarray, Categorical, ABCSeries)):
+            # We get ndarray or Categorical if called via Series.fillna,
+            # where it will unwrap another aligned Series before getting here
+
+            mask = ~algorithms.isin(value, self.categories)
+            if not isna(value[mask]).all():
                 raise ValueError("fill value must be in categories")

             values_codes = _get_codes_for_values(value, self.categories)
             indexer = np.where(codes == -1)
+            codes = codes.copy()
             codes[indexer] = values_codes[indexer]

         # If value is not a dict or Series it should be a scalar
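The replacement the new deprecation message recommends is a one-liner; a minimal sketch:

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# Instead of the deprecated cat.to_dense(), materialize the categorical
# as a plain ndarray of its values.
dense = np.asarray(cat)
print(dense.tolist())  # ['a', 'b', 'a']
```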

pandas/core/arrays/datetimes.py (+10 −5)

@@ -23,6 +23,7 @@
 from pandas.core.dtypes.common import (
     _INT64_DTYPE,
     _NS_DTYPE,
+    is_bool_dtype,
     is_categorical_dtype,
     is_datetime64_any_dtype,
     is_datetime64_dtype,

@@ -1903,32 +1904,36 @@ def maybe_convert_dtype(data, copy):
     ------
     TypeError : PeriodDType data is passed
     """
-    if is_float_dtype(data):
+    if not hasattr(data, "dtype"):
+        # e.g. collections.deque
+        return data, copy
+
+    if is_float_dtype(data.dtype):
         # Note: we must cast to datetime64[ns] here in order to treat these
         # as wall-times instead of UTC timestamps.
         data = data.astype(_NS_DTYPE)
         copy = False
         # TODO: deprecate this behavior to instead treat symmetrically
         # with integer dtypes. See discussion in GH#23675

-    elif is_timedelta64_dtype(data):
+    elif is_timedelta64_dtype(data.dtype) or is_bool_dtype(data.dtype):
         # GH#29794 enforcing deprecation introduced in GH#23539
         raise TypeError(f"dtype {data.dtype} cannot be converted to datetime64[ns]")
-    elif is_period_dtype(data):
+    elif is_period_dtype(data.dtype):
         # Note: without explicitly raising here, PeriodIndex
         # test_setops.test_join_does_not_recur fails
         raise TypeError(
             "Passing PeriodDtype data is invalid. Use `data.to_timestamp()` instead"
         )

-    elif is_categorical_dtype(data):
+    elif is_categorical_dtype(data.dtype):
         # GH#18664 preserve tz in going DTI->Categorical->DTI
         # TODO: cases where we need to do another pass through this func,
         # e.g. the categories are timedelta64s
         data = data.categories.take(data.codes, fill_value=NaT)._values
         copy = False

-    elif is_extension_array_dtype(data) and not is_datetime64tz_dtype(data):
+    elif is_extension_array_dtype(data.dtype) and not is_datetime64tz_dtype(data.dtype):
         # Includes categorical
         # TODO: We have no tests for these
         data = np.array(data, dtype=np.object_)
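With `is_bool_dtype` added to the rejection branch, bool-dtyped input to the `DatetimeIndex` constructor now raises instead of being silently accepted (GH 32668). A minimal sketch of the fixed behavior on pandas >= 1.1:

```python
import numpy as np
import pandas as pd

# maybe_convert_dtype now routes bool dtypes into the same TypeError
# branch as timedelta64 input.
try:
    pd.DatetimeIndex(np.array([True, False]))
    print("accepted (pre-fix behavior)")
except TypeError:
    print("TypeError raised")
```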
