Commit b2f2d1e

Author: Giacomo Ferroni
Merge branch 'gh15077' of https://github.com/mralgos/pandas into gh15077
2 parents fcbcb5b + 9723c5d commit b2f2d1e

67 files changed, +1416 -544 lines changed

appveyor.yml

+2 -1
@@ -65,7 +65,8 @@ install:
 
   # install our build environment
   - cmd: conda config --set show_channel_urls true --set always_yes true --set changeps1 false
-  - cmd: conda update -q conda
+  #- cmd: conda update -q conda
+  - cmd: conda install conda=4.2.15
   - cmd: conda config --set ssl_verify false
 
   # add the pandas channel *before* defaults to have defaults take priority

doc/source/io.rst

+10
@@ -187,6 +187,16 @@ skipinitialspace : boolean, default ``False``
 skiprows : list-like or integer, default ``None``
   Line numbers to skip (0-indexed) or number of lines to skip (int) at the start
   of the file.
+
+  If callable, the callable function will be evaluated against the row
+  indices, returning True if the row should be skipped and False otherwise:
+
+  .. ipython:: python
+
+     data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
+     pd.read_csv(StringIO(data))
+     pd.read_csv(StringIO(data), skiprows=lambda x: x % 2 != 0)
+
 skipfooter : int, default ``0``
   Number of lines at bottom of file to skip (unsupported with engine='c').
 skip_footer : int, default ``0``
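
The documentation added above says a callable ``skiprows`` is evaluated against each 0-indexed row number and that a True result drops the row. A minimal standard-library sketch of that rule follows; ``read_rows`` is a hypothetical helper written for illustration and is not pandas' parser.

```python
import csv
from io import StringIO


def read_rows(text, skiprows=None):
    """Hypothetical helper: parse CSV text and drop every row whose
    0-based index makes ``skiprows(index)`` return True."""
    rows = []
    for index, row in enumerate(csv.reader(StringIO(text))):
        if skiprows is not None and skiprows(index):
            continue  # the callable asked for this row to be skipped
        rows.append(row)
    return rows


# Same sample data and predicate as the io.rst example in the diff.
data = 'col1,col2,col3\na,b,1\na,b,2\nc,d,3'
kept = read_rows(data, skiprows=lambda x: x % 2 != 0)
print(kept)
```

With this predicate the odd-numbered rows (1 and 3) are dropped, so only the header row and the ``a,b,2`` row remain.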

doc/source/whatsnew/v0.20.0.txt

+29 -9
@@ -91,6 +91,25 @@ support for bz2 compression in the python 2 c-engine improved (:issue:`14874`).
    df = pd.read_table(url, compression='bz2') # explicitly specify compression
    df.head(2)
 
+.. _whatsnew_0200.enhancements.uint64_support:
+
+Pandas has significantly improved support for operations involving unsigned,
+or purely non-negative, integers. Previously, handling these integers would
+result in improper rounding or data-type casting, leading to incorrect results.
+Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937`)
+
+.. ipython:: python
+
+   idx = pd.UInt64Index([1, 2, 3])
+   df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx)
+   df.index
+
+- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`, :issue:`14982`)
+- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
+- Bug in ``DataFrame`` construction in which unsigned 64-bit integer elements were being converted to objects (:issue:`14881`)
+- Bug in ``pd.read_csv()`` in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (:issue:`14983`)
+- Bug in ``pd.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14915`)
+
 .. _whatsnew_0200.enhancements.other:
 
 Other enhancements
@@ -110,6 +129,7 @@ Other enhancements
 - ``pd.qcut`` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`7751`)
 - ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
 - The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
+- The ``skiprows`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`10882`)
 - ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
 - ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
 - ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
@@ -244,10 +264,11 @@ Other API Changes
 - ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
 - ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
 - ``DataFrame.applymap()`` with an empty ``DataFrame`` will return a copy of the empty ``DataFrame`` instead of a ``Series`` (:issue:`8222`)
-
+- ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
 - ``pd.read_csv()`` will now issue a ``ParserWarning`` whenever there are conflicting values provided by the ``dialect`` parameter and the user (:issue:`14898`)
 - ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than than one byte (:issue:`11592`)
 - ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
+- ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype``
 
 .. _whatsnew_0200.deprecations:
 
@@ -296,8 +317,6 @@ Bug Fixes
 
 - Bug in ``Index`` power operations with reversed operands (:issue:`14973`)
 - Bug in ``TimedeltaIndex`` addition where overflow was being allowed without error (:issue:`14816`)
-- Bug in ``DataFrame`` construction in which unsigned 64-bit integer elements were being converted to objects (:issue:`14881`)
-- Bug in ``pd.read_csv()`` in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (:issue:`14983`)
 - Bug in ``astype()`` where ``inf`` values were incorrectly converted to integers. Now raises error now with ``astype()`` for Series and DataFrames (:issue:`14265`)
 - Bug in ``DataFrame(..).apply(to_numeric)`` when values are of type decimal.Decimal. (:issue:`14827`)
 - Bug in ``describe()`` when passing a numpy array which does not contain the median to the ``percentiles`` keyword argument (:issue:`14908`)
@@ -318,13 +337,11 @@ Bug Fixes
 - Bug in ``Series`` construction with a datetimetz (:issue:`14928`)
 
 - Bug in compat for passing long integers to ``Timestamp.replace`` (:issue:`15030`)
+- Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
 
 
 
 
-- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
-- Bug in ``pd.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14915`)
-
 
 
 
@@ -348,18 +365,21 @@ Bug Fixes
 
 
 - Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
-- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`, :issue:`14982`)
 - Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)
+- Bug in ``.to_json()`` where ``lines=True`` and contents (keys or values) contain escaped characters (:issue:`15096`)
 
+- Bug in ``DataFrame.groupby().describe()`` when grouping on ``Index`` containing tuples (:issue:`14848`)
+- Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
 
+- Bug in catching an overflow in ``Timestamp`` + ``Timedelta/Offset`` operations (:issue:`15126`)
 
 
+- Bug in ``pd.merge_asof()`` where ``left_index``/``right_index`` together caused a failure when ``tolerance`` was specified (:issue:`15135`)
 
 
 
 
 
-
-
+- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
 - Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
 - Incorrect dtyped ``Series`` was returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty ``DataFrame`` (:issue:`15077`)
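
The uint64 entry in the release notes above says that handling unsigned integers previously led to improper rounding or data-type casting. The sketch below shows, in plain Python, two general ways a value above the signed 64-bit maximum can be corrupted. It is an illustration of the arithmetic only, under the assumption that these are the kinds of failures meant; it does not reproduce pandas' internal code paths.

```python
import struct

INT64_MAX = 2**63 - 1
value = 2**63 + 1  # representable as uint64, but larger than INT64_MAX

# Casting failure: the same 8 bytes read back as a signed 64-bit integer
# wrap around to a negative number.
as_signed = struct.unpack('<q', struct.pack('<Q', value))[0]

# Rounding failure: float64 carries a 53-bit significand, so a round trip
# through float drops the low bit of this value.
via_float = int(float(value))

print(value > INT64_MAX)
print(as_signed)
print(via_float == value)
```

Keeping such values in an unsigned 64-bit container avoids both conversions, which is the motivation the notes give for ``UInt64Index``.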

pandas/api/tests/test_api.py

+1 -1
@@ -53,7 +53,7 @@ class TestPDApi(Base, tm.TestCase):
     classes = ['Categorical', 'CategoricalIndex', 'DataFrame', 'DateOffset',
                'DatetimeIndex', 'ExcelFile', 'ExcelWriter', 'Float64Index',
                'Grouper', 'HDFStore', 'Index', 'Int64Index', 'MultiIndex',
-               'Period', 'PeriodIndex', 'RangeIndex',
+               'Period', 'PeriodIndex', 'RangeIndex', 'UInt64Index',
                'Series', 'SparseArray', 'SparseDataFrame',
                'SparseSeries', 'TimeGrouper', 'Timedelta',
                'TimedeltaIndex', 'Timestamp']

pandas/computation/tests/test_compat.py

-1
@@ -1,4 +1,3 @@
-#!/usr/bin/env python
 
 # flake8: noqa
 
pandas/computation/tests/test_eval.py

-1
@@ -1,4 +1,3 @@
-#!/usr/bin/env python
 
 # flake8: noqa
 
pandas/core/api.py

+2 -1
@@ -10,7 +10,8 @@
 from pandas.core.groupby import Grouper
 from pandas.formats.format import set_eng_float_format
 from pandas.core.index import (Index, CategoricalIndex, Int64Index,
-                               RangeIndex, Float64Index, MultiIndex)
+                               UInt64Index, RangeIndex, Float64Index,
+                               MultiIndex)
 
 from pandas.core.series import Series, TimeSeries
 from pandas.core.frame import DataFrame

pandas/core/indexing.py

+85 -15
@@ -9,6 +9,7 @@
                                  is_categorical_dtype,
                                  is_list_like,
                                  is_sequence,
+                                 is_iterator,
                                  is_scalar,
                                  is_sparse,
                                  _is_unorderable_exception,
@@ -859,15 +860,20 @@ def _convert_for_reindex(self, key, axis=0):
             return labels[key]
         else:
             if isinstance(key, Index):
-                # want Index objects to pass through untouched
-                keyarr = key
+                keyarr = labels._convert_index_indexer(key)
             else:
                 # asarray can be unsafe, NumPy strings are weird
                 keyarr = _asarray_tuplesafe(key)
 
-            if is_integer_dtype(keyarr) and not labels.is_integer():
-                keyarr = _ensure_platform_int(keyarr)
-                return labels.take(keyarr)
+            if is_integer_dtype(keyarr):
+                # Cast the indexer to uint64 if possible so
+                # that the values returned from indexing are
+                # also uint64.
+                keyarr = labels._convert_arr_indexer(keyarr)
+
+                if not labels.is_integer():
+                    keyarr = _ensure_platform_int(keyarr)
+                    return labels.take(keyarr)
 
             return keyarr
 
@@ -1043,11 +1049,10 @@ def _getitem_iterable(self, key, axis=0):
             return self.obj.take(inds, axis=axis, convert=False)
         else:
             if isinstance(key, Index):
-                # want Index objects to pass through untouched
-                keyarr = key
+                keyarr = labels._convert_index_indexer(key)
             else:
-                # asarray can be unsafe, NumPy strings are weird
                 keyarr = _asarray_tuplesafe(key)
+                keyarr = labels._convert_arr_indexer(keyarr)
 
             if is_categorical_dtype(labels):
                 keyarr = labels._shallow_copy(keyarr)
@@ -1300,17 +1305,24 @@ class _LocationIndexer(_NDFrameIndexer):
     _exception = Exception
 
     def __getitem__(self, key):
-        if isinstance(key, tuple):
-            key = tuple(com._apply_if_callable(x, self.obj) for x in key)
-        else:
-            # scalar callable may return tuple
-            key = com._apply_if_callable(key, self.obj)
-
         if type(key) is tuple:
+            key = tuple(com._apply_if_callable(x, self.obj) for x in key)
+            try:
+                if self._is_scalar_access(key):
+                    return self._getitem_scalar(key)
+            except (KeyError, IndexError):
+                pass
             return self._getitem_tuple(key)
         else:
+            key = com._apply_if_callable(key, self.obj)
             return self._getitem_axis(key, axis=0)
 
+    def _is_scalar_access(self, key):
+        raise NotImplementedError()
+
+    def _getitem_scalar(self, key):
+        raise NotImplementedError()
+
     def _getitem_axis(self, key, axis=0):
         raise NotImplementedError()
 
@@ -1389,7 +1401,8 @@ def _has_valid_type(self, key, axis):
                 return True
 
             # TODO: don't check the entire key unless necessary
-            if len(key) and np.all(ax.get_indexer_for(key) < 0):
+            if (not is_iterator(key) and len(key) and
+                    np.all(ax.get_indexer_for(key) < 0)):
 
                 raise KeyError("None of [%s] are in the [%s]" %
                                (key, self.obj._get_axis_name(axis)))
@@ -1420,6 +1433,36 @@ def error():
 
         return True
 
+    def _is_scalar_access(self, key):
+        # this is a shortcut accessor to both .loc and .iloc
+        # that provide the equivalent access of .at and .iat
+        # a) avoid getting things via sections and (to minimize dtype changes)
+        # b) provide a performant path
+        if not hasattr(key, '__len__'):
+            return False
+
+        if len(key) != self.ndim:
+            return False
+
+        for i, k in enumerate(key):
+            if not is_scalar(k):
+                return False
+
+            ax = self.obj.axes[i]
+            if isinstance(ax, MultiIndex):
+                return False
+
+            if not ax.is_unique:
+                return False
+
+        return True
+
+    def _getitem_scalar(self, key):
+        # a fast-path to scalar access
+        # if not, raise
+        values = self.obj.get_value(*key)
+        return values
+
     def _get_partial_string_timestamp_match_key(self, key, labels):
         """Translate any partial string timestamp matches in key, returning the
         new key (GH 10331)"""
@@ -1536,6 +1579,33 @@ def _has_valid_type(self, key, axis):
     def _has_valid_setitem_indexer(self, indexer):
         self._has_valid_positional_setitem_indexer(indexer)
 
+    def _is_scalar_access(self, key):
+        # this is a shortcut accessor to both .loc and .iloc
+        # that provide the equivalent access of .at and .iat
+        # a) avoid getting things via sections and (to minimize dtype changes)
+        # b) provide a performant path
+        if not hasattr(key, '__len__'):
+            return False
+
+        if len(key) != self.ndim:
+            return False
+
+        for i, k in enumerate(key):
+            if not is_integer(k):
+                return False
+
+            ax = self.obj.axes[i]
+            if not ax.is_unique:
+                return False
+
+        return True
+
+    def _getitem_scalar(self, key):
+        # a fast-path to scalar access
+        # if not, raise
+        values = self.obj.get_value(*key, takeable=True)
+        return values
+
     def _is_valid_integer(self, key, axis):
         # return a boolean if we have a valid integer indexer
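
The ``_is_scalar_access`` methods added in this file decide whether a tuple key can take the fast scalar path: the key must supply one scalar per axis, and every axis must have unique labels. The sketch below restates that decision with plain lists standing in for the axes. ``is_scalar_access`` and its ``axes`` argument are hypothetical names for illustration; the scalar test is a simplification of pandas' ``is_scalar``, and the ``MultiIndex`` check from the label-based version is omitted.

```python
from numbers import Number


def is_scalar_access(key, axes):
    """Hypothetical stand-in for the check in the diff: True only when
    `key` has one scalar per axis and every axis has unique labels."""
    if not hasattr(key, '__len__'):
        return False
    if len(key) != len(axes):
        return False
    for k, labels in zip(key, axes):
        # Simplification: treat only strings and numbers as scalars.
        if not isinstance(k, (str, Number)):
            return False
        # Duplicate labels could match several rows, so no scalar result.
        if len(set(labels)) != len(labels):
            return False
    return True


axes = [[0, 1, 2], ['A', 'B']]
print(is_scalar_access((1, 'A'), axes))       # one scalar per axis
print(is_scalar_access(([0, 1], 'A'), axes))  # list on the row axis
print(is_scalar_access((1,), axes))           # fewer entries than axes
print(is_scalar_access((1, 'A'), [[0, 0, 2], ['A', 'B']]))  # duplicate row labels
```

In the diff, ``__getitem__`` only consults this check when ``type(key) is tuple``, and it falls back to ``_getitem_tuple`` if the fast path raises ``KeyError`` or ``IndexError``.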

pandas/core/series.py

+2 -1
@@ -237,7 +237,8 @@ def __init__(self, data=None, index=None, dtype=None, name=None,
             # create/copy the manager
             if isinstance(data, SingleBlockManager):
                 if dtype is not None:
-                    data = data.astype(dtype=dtype, raise_on_error=False)
+                    data = data.astype(dtype=dtype, raise_on_error=False,
+                                       copy=copy)
                 elif copy:
                     data = data.copy()
             else:

pandas/indexes/api.py

+2 -2
@@ -4,7 +4,7 @@
 from pandas.indexes.category import CategoricalIndex # noqa
 from pandas.indexes.multi import MultiIndex # noqa
 from pandas.indexes.numeric import (NumericIndex, Float64Index, # noqa
-                                    Int64Index)
+                                    Int64Index, UInt64Index)
 from pandas.indexes.range import RangeIndex # noqa
 
 import pandas.core.common as com
@@ -13,7 +13,7 @@
 # TODO: there are many places that rely on these private methods existing in
 # pandas.core.index
 __all__ = ['Index', 'MultiIndex', 'NumericIndex', 'Float64Index', 'Int64Index',
-           'CategoricalIndex', 'RangeIndex',
+           'CategoricalIndex', 'RangeIndex', 'UInt64Index',
            'InvalidIndexError',
            '_new_Index',
            '_ensure_index', '_get_na_value', '_get_combined_index',
