Commit 4095c8c

Merge branch 'master' into fix-25587

* master: (22 commits)
  Fixturize tests/frame/test_operators.py (pandas-dev#25641)
  Update ValueError message in corr (pandas-dev#25729)
  DOC: fix some grammar and inconsistency issues in the User Guide (pandas-dev#25728)
  ENH: Add public start, stop, and step attributes to RangeIndex (pandas-dev#25720)
  Make Rolling.apply documentation clearer (pandas-dev#25712)
  pandas-dev#25707 - Fixed flakiness in stata write test (pandas-dev#25714)
  Json normalize nan support (pandas-dev#25619)
  TST: resolve issues with test_constructor_dtype_datetime64 (pandas-dev#24868)
  DEPR: Deprecate box kwarg for to_timedelta and to_datetime (pandas-dev#24486)
  BUG: Preserve name in DatetimeIndex.snap (pandas-dev#25585)
  Fix concat not respecting order of OrderedDict (pandas-dev#25224)
  CLN: remove pandas.core.categorical (pandas-dev#25655)
  TST/CLN: Remove more Panel tests (pandas-dev#25675)
  Pinned pycodestyle (pandas-dev#25701)
  DOC: update date of 0.24.2 release notes (pandas-dev#25699)
  BUG: Fix error in replace with strings that are large numbers (pandas-dev#25616) (pandas-dev#25644)
  BUG: fix usage of na_sentinel with sort=True in factorize() (pandas-dev#25592)
  BUG: Fix to_string output when using header (pandas-dev#16718) (pandas-dev#25602)
  CLN: Remove unused test code (pandas-dev#25670)
  CLN: remove Panel from concat error message (pandas-dev#25676)
  ...

# Conflicts:
#	doc/source/whatsnew/v0.25.0.rst

2 parents aafd214 + 998e1de commit 4095c8c

51 files changed, +583 -631 lines changed

doc/source/user_guide/text.rst

+3 -3
@@ -46,8 +46,8 @@ Since ``df.columns`` is an Index object, we can use the ``.str`` accessor
    df.columns.str.lower()
 
 These string methods can then be used to clean up the columns as needed.
-Here we are removing leading and trailing white spaces, lower casing all names,
-and replacing any remaining white spaces with underscores:
+Here we are removing leading and trailing whitespaces, lower casing all names,
+and replacing any remaining whitespaces with underscores:
 
 .. ipython:: python
 
@@ -65,7 +65,7 @@ and replacing any remaining white spaces with underscores:
 ``Series``.
 
 Please note that a ``Series`` of type ``category`` with string ``.categories`` has
-some limitations in comparison of ``Series`` of type string (e.g. you can't add strings to
+some limitations in comparison to ``Series`` of type string (e.g. you can't add strings to
 each other: ``s + " " + s`` won't work if ``s`` is a ``Series`` of type ``category``). Also,
 ``.str`` methods which operate on elements of type ``list`` are not available on such a
 ``Series``.
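
The clean-up that the changed paragraph describes (strip outer whitespace, lower-case, replace remaining spaces with underscores) can be sketched in plain Python. This is only an illustration of the three steps; in pandas the same chain would go through the ``.str`` accessor on ``df.columns``. The column names and the helper ``clean_column_name`` are made up for this example.

```python
# Hypothetical column names with stray whitespace and mixed case.
raw_columns = [" Column A ", " Column B "]


def clean_column_name(name):
    """Strip outer whitespace, lower-case, then replace inner spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


cleaned = [clean_column_name(name) for name in raw_columns]
print(cleaned)  # ['column_a', 'column_b']
```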

doc/source/whatsnew/v0.24.2.rst

+7 -45
@@ -2,8 +2,8 @@
 
 .. _whatsnew_0242:
 
-Whats New in 0.24.2 (February XX, 2019)
----------------------------------------
+Whats New in 0.24.2 (March 12, 2019)
+------------------------------------
 
 .. warning::
 
@@ -18,7 +18,7 @@ including other versions of pandas.
 .. _whatsnew_0242.regressions:
 
 Fixed Regressions
-^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~
 
 - Fixed regression in :meth:`DataFrame.all` and :meth:`DataFrame.any` where ``bool_only=True`` was ignored (:issue:`25101`)
 - Fixed issue in ``DataFrame`` construction with passing a mixed list of mixed types could segfault. (:issue:`25075`)
@@ -31,71 +31,32 @@ Fixed Regressions
 - Fixed regression in ``IntervalDtype`` construction where passing an incorrect string with 'Interval' as a prefix could result in a ``RecursionError``. (:issue:`25338`)
 - Fixed regression in creating a period-dtype array from a read-only NumPy array of period objects. (:issue:`25403`)
 - Fixed regression in :class:`Categorical`, where constructing it from a categorical ``Series`` and an explicit ``categories=`` that differed from that in the ``Series`` created an invalid object which could trigger segfaults. (:issue:`25318`)
+- Fixed regression in :func:`to_timedelta` losing precision when converting floating data to ``Timedelta`` data (:issue:`25077`).
 - Fixed pip installing from source into an environment without NumPy (:issue:`25193`)
+- Fixed regression in :meth:`DataFrame.replace` where large strings of numbers would be coerced into ``int64``, causing an ``OverflowError`` (:issue:`25616`)
+- Fixed regression in :func:`factorize` when passing a custom ``na_sentinel`` value with ``sort=True`` (:issue:`25409`).
 - Fixed regression in :meth:`DataFrame.to_csv` writing duplicate line endings with gzip compress (:issue:`25311`)
 
-.. _whatsnew_0242.enhancements:
-
-Enhancements
-^^^^^^^^^^^^
-
--
--
-
 .. _whatsnew_0242.bug_fixes:
 
 Bug Fixes
 ~~~~~~~~~
 
-**Conversion**
-
--
--
--
-
-**Indexing**
-
--
--
--
-
 **I/O**
 
 - Better handling of terminal printing when the terminal dimensions are not known (:issue:`25080`)
 - Bug in reading a HDF5 table-format ``DataFrame`` created in Python 2, in Python 3 (:issue:`24925`)
 - Bug in reading a JSON with ``orient='table'`` generated by :meth:`DataFrame.to_json` with ``index=False`` (:issue:`25170`)
 - Bug where float indexes could have misaligned values when printing (:issue:`25061`)
--
-
-**Categorical**
-
--
--
--
-
-**Timezones**
-
--
--
--
-
-**Timedelta**
-
--
--
--
 
 **Reshaping**
 
 - Bug in :meth:`~pandas.core.groupby.GroupBy.transform` where applying a function to a timezone aware column would return a timezone naive result (:issue:`24198`)
 - Bug in :func:`DataFrame.join` when joining on a timezone aware :class:`DatetimeIndex` (:issue:`23931`)
--
 
 **Visualization**
 
 - Bug in :meth:`Series.plot` where a secondary y axis could not be set to log scale (:issue:`25545`)
--
--
 
 **Other**
 
@@ -130,6 +91,7 @@ A total of 25 people contributed patches to this release. People with a "+" by t
 * Joris Van den Bossche
 * Josh
 * Justin Zheng
+* Kendall Masse
 * Matthew Roeschke
 * Max Bolingbroke +
 * rbenes +

doc/source/whatsnew/v0.25.0.rst

+8 -3
@@ -26,6 +26,7 @@ Other Enhancements
 - :meth:`DataFrame.set_index` now works for instances of ``abc.Iterator``, provided their output is of the same length as the calling frame (:issue:`22484`, :issue:`24984`)
 - :meth:`DatetimeIndex.union` now supports the ``sort`` argument. The behaviour of the sort parameter matches that of :meth:`Index.union` (:issue:`24994`)
 - :meth:`DataFrame.rename` now supports the ``errors`` argument to raise errors when attempting to rename nonexistent keys (:issue:`13473`)
+- :class:`RangeIndex` has gained :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop`, and :attr:`~RangeIndex.step` attributes (:issue:`25710`)
 
 .. _whatsnew_0250.api_breaking:
 
@@ -86,14 +87,15 @@ Other API Changes
 - :class:`DatetimeTZDtype` will now standardize pytz timezones to a common timezone instance (:issue:`24713`)
 - ``Timestamp`` and ``Timedelta`` scalars now implement the :meth:`to_numpy` method as aliases to :meth:`Timestamp.to_datetime64` and :meth:`Timedelta.to_timedelta64`, respectively. (:issue:`24653`)
 - :meth:`Timestamp.strptime` will now rise a ``NotImplementedError`` (:issue:`25016`)
--
+- Bug in :meth:`DatetimeIndex.snap` which didn't preserving the ``name`` of the input :class:`Index` (:issue:`25575`)
 
 .. _whatsnew_0250.deprecations:
 
 Deprecations
 ~~~~~~~~~~~~
 
 - Deprecated the `M (months)` and `Y (year)` `units` parameter of :func: `pandas.to_timedelta`, :func: `pandas.Timedelta` and :func: `pandas.TimedeltaIndex` (:issue:`16344`)
+- The functions :func:`pandas.to_datetime` and :func:`pandas.to_timedelta` have deprecated the ``box`` keyword. Instead, use :meth:`to_numpy` or :meth:`Timestamp.to_datetime64`/:meth:`Timedelta.to_timedelta64`. (:issue:`24416`)
 
 .. _whatsnew_0250.prior_deprecations:
 
@@ -122,7 +124,7 @@ Bug Fixes
 ~~~~~~~~~
 - Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)
 - Bug in an error message in :meth:`DataFrame.plot`. Improved the error message if non-numerics are passed to :meth:`DataFrame.plot` (:issue:`25481`)
-- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
+- Bug in error messages in :meth:`DataFrame.corr` and :meth:`Series.corr`. Added the possibility of using a callable. (:issue:`25729`)
 
 Categorical
 ^^^^^^^^^^^
@@ -214,14 +216,16 @@ I/O
 - Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
 - Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
 - Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
+- Bug in :func:`json_normalize` for ``errors='ignore'`` where missing values in the input data, were filled in resulting ``DataFrame`` with the string "nan" instead of ``numpy.nan`` (:issue:`25468`)
 - :meth:`DataFrame.to_html` now raises ``TypeError`` when using an invalid type for the ``classes`` parameter instead of ``AsseertionError`` (:issue:`25608`)
--
+- Bug in :meth:`DataFrame.to_string` and :meth:`DataFrame.to_latex` that would lead to incorrect output when the ``header`` keyword is used (:issue:`16718`)
 -
 
 
 Plotting
 ^^^^^^^^
 
+- Fixed bug where :class:`api.extensions.ExtensionArray` could not be used in matplotlib plotting (:issue:`25587`)
 -
 -
 -
@@ -241,6 +245,7 @@ Reshaping
 - Bug in :func:`pandas.merge` adds a string of ``None`` if ``None`` is assigned in suffixes instead of remain the column name as-is (:issue:`24782`).
 - Bug in :func:`merge` when merging by index name would sometimes result in an incorrectly numbered index (:issue:`24212`)
 - :func:`to_records` now accepts dtypes to its `column_dtypes` parameter (:issue:`24895`)
+- Bug in :func:`concat` where order of ``OrderedDict`` (and ``dict`` in Python 3.6+) is not respected, when passed in as ``objs`` argument (:issue:`21510`)
 
 
 Sparse

environment.yml

+1
@@ -19,6 +19,7 @@ dependencies:
   - hypothesis>=3.82
   - isort
   - moto
+  - pycodestyle=2.4
   - pytest>=4.0.2
   - pytest-mock
   - sphinx

pandas/_libs/tslibs/timedeltas.pyx

+16 -3
@@ -246,9 +246,11 @@ def array_to_timedelta64(object[:] values, unit='ns', errors='raise'):
     return iresult.base # .base to access underlying np.ndarray
 
 
-cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
-    """ return a casting of the unit represented to nanoseconds
-        round the fractional part of a float to our precision, p """
+cpdef inline object precision_from_unit(object unit):
+    """
+    Return a casting of the unit represented to nanoseconds + the precision
+    to round the fractional part.
+    """
     cdef:
         int64_t m
         int p
@@ -285,6 +287,17 @@ cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
         p = 0
     else:
         raise ValueError("cannot cast unit {unit}".format(unit=unit))
+    return m, p
+
+
+cdef inline int64_t cast_from_unit(object ts, object unit) except? -1:
+    """ return a casting of the unit represented to nanoseconds
+        round the fractional part of a float to our precision, p """
+    cdef:
+        int64_t m
+        int p
+
+    m, p = precision_from_unit(unit)
 
     # just give me the unit back
     if ts is None:

pandas/core/algorithms.py

+13 -7
@@ -619,13 +619,19 @@ def factorize(values, sort=False, order=None, na_sentinel=-1, size_hint=None):
 
     if sort and len(uniques) > 0:
         from pandas.core.sorting import safe_sort
-        try:
-            order = uniques.argsort()
-            order2 = order.argsort()
-            labels = take_1d(order2, labels, fill_value=na_sentinel)
-            uniques = uniques.take(order)
-        except TypeError:
-            # Mixed types, where uniques.argsort fails.
+        if na_sentinel == -1:
+            # GH-25409 take_1d only works for na_sentinels of -1
+            try:
+                order = uniques.argsort()
+                order2 = order.argsort()
+                labels = take_1d(order2, labels, fill_value=na_sentinel)
+                uniques = uniques.take(order)
+            except TypeError:
+                # Mixed types, where uniques.argsort fails.
+                uniques, labels = safe_sort(uniques, labels,
+                                            na_sentinel=na_sentinel,
+                                            assume_unique=True)
+        else:
             uniques, labels = safe_sort(uniques, labels,
                                         na_sentinel=na_sentinel,
                                         assume_unique=True)
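
What the sorted branch has to achieve can be sketched without pandas: sort the unique values, then rewrite every label to point at the new position, while leaving the missing-value sentinel untouched whatever its value is. The function below is a hypothetical plain-Python illustration, not the `take_1d` or `safe_sort` implementation, and it assumes the sentinel is a negative integer so that it can never collide with a valid position.

```python
def sort_uniques_and_relabel(uniques, labels, na_sentinel=-1):
    """Sort ``uniques`` and remap ``labels`` so they still point at the same values.

    Labels equal to ``na_sentinel`` are passed through unchanged.
    """
    # order[k] is the old position of the k-th smallest unique value
    order = sorted(range(len(uniques)), key=lambda i: uniques[i])
    # new_position[old] is where the value at old position ``old`` ends up
    new_position = [0] * len(uniques)
    for new, old in enumerate(order):
        new_position[old] = new
    sorted_uniques = [uniques[i] for i in order]
    new_labels = [
        na_sentinel if label == na_sentinel else new_position[label]
        for label in labels
    ]
    return sorted_uniques, new_labels


# A custom sentinel of -99 marks the missing entry in the third position.
result = sort_uniques_and_relabel(["b", "a", "c"], [0, 1, -99, 2, 0], na_sentinel=-99)
print(result)  # (['a', 'b', 'c'], [1, 0, -99, 2, 1])
```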

pandas/core/arrays/timedeltas.py

+9 -6
@@ -11,7 +11,7 @@
 from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT
 from pandas._libs.tslibs.fields import get_timedelta_field
 from pandas._libs.tslibs.timedeltas import (
-    array_to_timedelta64, parse_timedelta_unit)
+    array_to_timedelta64, parse_timedelta_unit, precision_from_unit)
 import pandas.compat as compat
 from pandas.util._decorators import Appender
 
@@ -918,12 +918,15 @@ def sequence_to_td64ns(data, copy=False, unit="ns", errors="raise"):
         copy = copy and not copy_made
 
     elif is_float_dtype(data.dtype):
-        # treat as multiples of the given unit. If after converting to nanos,
-        # there are fractional components left, these are truncated
-        # (i.e. NOT rounded)
+        # cast the unit, multiply base/frace separately
+        # to avoid precision issues from float -> int
         mask = np.isnan(data)
-        coeff = np.timedelta64(1, unit) / np.timedelta64(1, 'ns')
-        data = (coeff * data).astype(np.int64).view('timedelta64[ns]')
+        m, p = precision_from_unit(unit)
+        base = data.astype(np.int64)
+        frac = data - base
+        if p:
+            frac = np.round(frac, p)
+        data = (base * m + (frac * m).astype(np.int64)).view('timedelta64[ns]')
         data[mask] = iNaT
         copy = False
 
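
The base/fraction split used in the new float branch can be illustrated on a single scalar in plain Python. This is a sketch of the arithmetic only, not the NumPy code above. The `UNIT_TO_NS` table and the name `float_to_ns` are assumptions made for this example: the multipliers are simply nanoseconds per unit, and the rounding precisions are illustrative values rather than a copy of what `precision_from_unit` returns.

```python
# Assumed (multiplier, precision) pairs: nanoseconds per unit, and the
# number of decimals kept in the fractional part. Illustrative only.
UNIT_TO_NS = {
    "s": (1_000_000_000, 9),
    "ms": (1_000_000, 6),
    "us": (1_000, 3),
    "ns": (1, 0),
}


def float_to_ns(value, unit="s"):
    """Convert a float count of ``unit`` to integer nanoseconds."""
    m, p = UNIT_TO_NS[unit]
    base = int(value)      # whole units, truncated toward zero
    frac = value - base    # fractional remainder, handled separately
    if p:
        frac = round(frac, p)
    # the integer part is scaled exactly; only the small remainder
    # goes through a float multiplication
    return base * m + int(frac * m)


print(float_to_ns(1.5, "s"))    # 1500000000
print(float_to_ns(2.25, "ms"))  # 2250000
```

Scaling the integer part in integer arithmetic keeps large values exact, so any floating-point error is confined to the fractional remainder.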

pandas/core/categorical.py

-9
This file was deleted.

pandas/core/dtypes/cast.py

+2 -2
@@ -794,10 +794,10 @@ def soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
         # Immediate return if coerce
         if datetime:
             from pandas import to_datetime
-            return to_datetime(values, errors='coerce', box=False)
+            return to_datetime(values, errors='coerce').to_numpy()
         elif timedelta:
             from pandas import to_timedelta
-            return to_timedelta(values, errors='coerce', box=False)
+            return to_timedelta(values, errors='coerce').to_numpy()
         elif numeric:
             from pandas import to_numeric
             return to_numeric(values, errors='coerce')

pandas/core/frame.py

+2 -2
@@ -7088,8 +7088,8 @@ def corr(self, method='pearson', min_periods=1):
                     correl[j, i] = c
         else:
             raise ValueError("method must be either 'pearson', "
-                             "'spearman', or 'kendall', '{method}' "
-                             "was supplied".format(method=method))
+                             "'spearman', 'kendall', or a callable, "
+                             "'{method}' was supplied".format(method=method))
 
         return self._constructor(correl, index=idx, columns=cols)
 
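
The updated message reflects that `method` may be a callable as well as a named method. A minimal plain-Python sketch of that kind of dispatch is shown below. It is not the `DataFrame.corr` implementation: the functions `pearson` and `corr` here are hypothetical, operate on two plain lists, and support only `'pearson'` plus callables, so the error text is shortened accordingly.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def corr(x, y, method="pearson"):
    """Dispatch on ``method``: a callable is applied directly to (x, y)."""
    if callable(method):
        return method(x, y)
    if method == "pearson":
        return pearson(x, y)
    raise ValueError("method must be either 'pearson' or a callable, "
                     "'{method}' was supplied".format(method=method))


print(corr([1, 2, 3], [2, 4, 6]))                      # 1.0
print(corr([1, 2, 3], [2, 4, 6], method=lambda a, b: 0.5))  # 0.5
```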

pandas/core/groupby/generic.py

+1 -1
@@ -822,7 +822,7 @@ def _aggregate_multiple_funcs(self, arg, _level):
                     columns.append(com.get_callable_name(f))
             arg = lzip(columns, arg)
 
-        results = {}
+        results = collections.OrderedDict()
         for name, func in arg:
             obj = self
             if name in results:

pandas/core/indexes/datetimelike.py

+2 -1
@@ -300,7 +300,8 @@ def asobject(self):
         return self.astype(object)
 
     def _convert_tolerance(self, tolerance, target):
-        tolerance = np.asarray(to_timedelta(tolerance, box=False))
+        tolerance = np.asarray(to_timedelta(tolerance).to_numpy())
+
         if target.size != tolerance.size and tolerance.size > 1:
             raise ValueError('list-like tolerance size must match '
                              'target index size')

pandas/core/indexes/datetimes.py

+2 -2
@@ -787,8 +787,8 @@ def snap(self, freq='S'):
             snapped[i] = s
 
         # we know it conforms; skip check
-        return DatetimeIndex._simple_new(snapped, freq=freq)
-        # TODO: what about self.name? tz? if so, use shallow_copy?
+        return DatetimeIndex._simple_new(snapped, name=self.name, tz=self.tz,
+                                         freq=freq)
 
     def join(self, other, how='left', level=None, return_indexers=False,
              sort=False):

pandas/core/indexes/range.py

+26 -1
@@ -48,7 +48,9 @@ class RangeIndex(Int64Index):
 
     Attributes
     ----------
-    None
+    start
+    stop
+    step
 
     Methods
     -------
@@ -209,6 +211,29 @@ def _format_data(self, name=None):
         return None
 
     # --------------------------------------------------------------------
+    @property
+    def start(self):
+        """
+        The value of the `start` parameter (or ``0`` if this was not supplied)
+        """
+        # GH 25710
+        return self._start
+
+    @property
+    def stop(self):
+        """
+        The value of the `stop` parameter
+        """
+        # GH 25710
+        return self._stop
+
+    @property
+    def step(self):
+        """
+        The value of the `step` parameter (or ``1`` if this was not supplied)
+        """
+        # GH 25710
+        return self._step
 
     @cache_readonly
     def nbytes(self):
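
The pattern added to `RangeIndex` above, read-only properties that expose privately stored constructor parameters, can be shown on a small stand-alone class. `SimpleRange` is a hypothetical stand-in written for this sketch and shares no code with pandas; the defaults of `0` and `1` follow the docstrings in the diff.

```python
class SimpleRange:
    """Hypothetical container that stores its range parameters privately."""

    def __init__(self, start=0, stop=0, step=1):
        self._start = start
        self._stop = stop
        self._step = step

    @property
    def start(self):
        """The value of the `start` parameter (``0`` if not supplied)."""
        return self._start

    @property
    def stop(self):
        """The value of the `stop` parameter."""
        return self._stop

    @property
    def step(self):
        """The value of the `step` parameter (``1`` if not supplied)."""
        return self._step


r = SimpleRange(stop=10, step=2)
print(r.start, r.stop, r.step)  # 0 10 2
```

Because the properties define no setter, assigning to `r.start` raises `AttributeError`, which keeps the public attributes read-only.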
