Commit 5049b5e

Merge pull request #11173 from bashtage/to-numeric
ENH: Restore original convert_objects and add _convert
2 parents 9fc9201 + d234d84 commit 5049b5e

22 files changed, +529 -339 lines changed

doc/source/basics.rst

+9-24
Original file line numberDiff line numberDiff line change
@@ -1711,36 +1711,26 @@ then the more *general* one will be used as the result of the operation.
17111711
object conversion
17121712
~~~~~~~~~~~~~~~~~
17131713

1714-
.. note::
1715-
1716-
The syntax of :meth:`~DataFrame.convert_objects` changed in 0.17.0. See
1717-
:ref:`API changes <whatsnew_0170.api_breaking.convert_objects>`
1718-
for more details.
1719-
1720-
:meth:`~DataFrame.convert_objects` is a method that converts columns from
1721-
the ``object`` dtype to datetimes, timedeltas or floats. For example, to
1722-
attempt conversion of object data that are *number like*, e.g. could be a
1723-
string that represents a number, pass ``numeric=True``. By default, this will
1724-
attempt a soft conversion and so will only succeed if the entire column is
1725-
convertible. To force the conversion, add the keyword argument ``coerce=True``.
1726-
This will force strings and number-like objects to be numbers if
1727-
possible, and other values will be set to ``np.nan``.
1714+
:meth:`~DataFrame.convert_objects` is a method to try to force conversion of types from the ``object`` dtype to other types.
1715+
To force conversion of specific types that are *number like*, e.g. could be a string that represents a number,
1716+
pass ``convert_numeric=True``. This will force strings and numbers alike to be numbers if possible, otherwise
1717+
they will be set to ``np.nan``.
17281718

17291719
.. ipython:: python
17301720
17311721
df3['D'] = '1.'
17321722
df3['E'] = '1'
1733-
df3.convert_objects(numeric=True).dtypes
1723+
df3.convert_objects(convert_numeric=True).dtypes
17341724
17351725
# same, but specific dtype conversion
17361726
df3['D'] = df3['D'].astype('float16')
17371727
df3['E'] = df3['E'].astype('int32')
17381728
df3.dtypes
17391729
1740-
To force conversion to ``datetime64[ns]``, pass ``datetime=True`` and ``coerce=True``.
1730+
To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
17411731
This will convert any datetime-like object to dates, forcing other values to ``NaT``.
17421732
This might be useful if you are reading in data which is mostly dates,
1743-
but occasionally contains non-dates that you wish to represent as missing.
1733+
but occasionally has non-dates intermixed that you want to represent as missing.
17441734

17451735
.. ipython:: python
17461736
@@ -1749,15 +1739,10 @@ but occasionally contains non-dates that you wish to represent as missing.
17491739
'foo', 1.0, 1, pd.Timestamp('20010104'),
17501740
'20010105'], dtype='O')
17511741
s
1752-
s.convert_objects(datetime=True, coerce=True)
1742+
s.convert_objects(convert_dates='coerce')
17531743
1754-
Without passing ``coerce=True``, :meth:`~DataFrame.convert_objects` will attempt
1755-
*soft* conversion of any *object* dtypes, meaning that if all
1744+
In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all
17561745
the objects in a Series are of the same type, the Series will have that dtype.
1757-
Note that setting ``coerce=True`` does not *convert* arbitrary types to either
1758-
``datetime64[ns]`` or ``timedelta64[ns]``. For example, a series containing string
1759-
dates will not be converted to a series of datetimes. To convert between types,
1760-
see :ref:`converting to timestamps <timeseries.converting>`.
17611746

17621747
gotchas
17631748
~~~~~~~
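Reviewer note: the coercion behaviour described in the ``basics.rst`` hunk above (non-dates forced to ``NaT``) can also be reproduced with ``pd.to_datetime(..., errors='coerce')``, the call this commit uses inside ``_soft_convert_objects``. The following is a minimal sketch rather than the documented example; the sample Series is made up for illustration and assumes a pandas version where ``Series.isna`` is available.

```python
import pandas as pd

# Illustrative object-dtype Series: mostly date strings, one non-date mixed in.
s = pd.Series(['2001-01-01', 'foo', '2001-01-05'], dtype='O')

# errors='coerce' replaces entries that cannot be parsed with NaT
# instead of raising an exception.
converted = pd.to_datetime(s, errors='coerce')

print(converted.dtype)           # a datetime64 dtype
print(converted.isna().tolist())  # marks which entries became NaT
```

Only the ``'foo'`` entry cannot be parsed, so it is the only one that ends up missing.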

doc/source/whatsnew/v0.17.0.txt

+3-65
@@ -640,71 +640,6 @@ New Behavior:
640640
Timestamp.now()
641641
Timestamp.now() + offsets.DateOffset(years=1)
642642

643-
.. _whatsnew_0170.api_breaking.convert_objects:
644-
645-
Changes to convert_objects
646-
^^^^^^^^^^^^^^^^^^^^^^^^^^
647-
648-
``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)
649-
650-
===================== =============
651-
Previous Replacement
652-
===================== =============
653-
``convert_dates`` ``datetime``
654-
``convert_numeric`` ``numeric``
655-
``convert_timedelta`` ``timedelta``
656-
===================== =============
657-
658-
Coercing types with ``DataFrame.convert_objects`` is now implemented using the
659-
keyword argument ``coerce=True``. Previously types were coerced by setting a
660-
keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.
661-
662-
.. ipython:: python
663-
664-
df = pd.DataFrame({'i': ['1','2'],
665-
'f': ['apple', '4.2'],
666-
's': ['apple','banana']})
667-
df
668-
669-
The old usage of ``DataFrame.convert_objects`` used ``'coerce'`` along with the
670-
type.
671-
672-
.. code-block:: python
673-
674-
In [2]: df.convert_objects(convert_numeric='coerce')
675-
676-
Now the ``coerce`` keyword must be explicitly used.
677-
678-
.. ipython:: python
679-
680-
df.convert_objects(numeric=True, coerce=True)
681-
682-
In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
683-
numeric types when there were no values convertible to a numeric type. This returns
684-
the original DataFrame with no conversion.
685-
686-
.. code-block:: python
687-
688-
In [1]: df = pd.DataFrame({'s': ['a','b']})
689-
In [2]: df.convert_objects(convert_numeric='coerce')
690-
Out[2]:
691-
s
692-
0 a
693-
1 b
694-
695-
The new behavior will convert all non-number-like strings to ``NaN``,
696-
when ``coerce=True`` is passed explicitly.
697-
698-
.. ipython:: python
699-
700-
df = pd.DataFrame({'s': ['a','b']})
701-
df.convert_objects(numeric=True, coerce=True)
702-
703-
In earlier versions of pandas, the default behavior was to try and convert
704-
datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
705-
to do nothing, and so it is necessary to pass at least one conversion target
706-
in the method call.
707-
708643
Changes to Index Comparisons
709644
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
710645

@@ -992,6 +927,7 @@ Deprecations
992927
- ``Series.is_time_series`` deprecated in favor of ``Series.index.is_all_dates`` (:issue:`11135`)
993928
- Legacy offsets (like ``'A@JAN'``) listed :ref:`here <timeseries.legacyaliases>` are deprecated (note that these have been aliases since 0.8.0), (:issue:`10878`)
994929
- ``WidePanel`` deprecated in favor of ``Panel``, ``LongPanel`` in favor of ``DataFrame`` (note these have been aliases since < 0.11.0), (:issue:`10892`)
930+
- ``DataFrame.convert_objects`` has been deprecated in favor of the type-specific functions ``pd.to_datetime``, ``pd.to_timedelta`` and ``pd.to_numeric`` (:issue:`11133`).
995931

996932
.. _whatsnew_0170.prior_deprecations:
997933

@@ -1188,3 +1124,5 @@ Bug Fixes
11881124
- Bug in ``DataFrame`` construction from nested ``dict`` with ``timedelta`` keys (:issue:`11129`)
11891125
- Bug in ``.fillna`` that may raise ``TypeError`` when data contains datetime dtype (:issue:`7095`, :issue:`11153`)
11901126
- Bug in ``.groupby`` when number of keys to group by is same as length of index (:issue:`11185`)
1127+
- Bug in ``convert_objects`` where converted values might not be returned if all null and ``coerce`` (:issue:`9589`)
1128+
- Bug in ``convert_objects`` where ``copy`` keyword was not respected (:issue:`9589`)
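Reviewer note: the deprecation entry in this hunk points users to the type-specific functions. A minimal migration sketch follows, reusing the example frame from the whatsnew text removed above. Applying ``pd.to_numeric`` column by column through ``DataFrame.apply`` is one possible pattern, not something this commit prescribes.

```python
import pandas as pd

# Example frame from the removed whatsnew section.
df = pd.DataFrame({'i': ['1', '2'],
                   'f': ['apple', '4.2'],
                   's': ['apple', 'banana']})

# DataFrame.apply forwards extra keyword arguments to the function,
# so each column is passed through pd.to_numeric(column, errors='coerce').
numeric = df.apply(pd.to_numeric, errors='coerce')

print(numeric.dtypes)
print(numeric)
```

With ``errors='coerce'``, strings that are not number-like become ``NaN``, so a column with no convertible values (``'s'`` here) comes back entirely ``NaN`` rather than unchanged.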

pandas/__init__.py

+1
@@ -52,6 +52,7 @@
5252
from pandas.tools.pivot import pivot_table, crosstab
5353
from pandas.tools.plotting import scatter_matrix, plot_params
5454
from pandas.tools.tile import cut, qcut
55+
from pandas.tools.util import to_numeric
5556
from pandas.core.reshape import melt
5657
from pandas.util.print_versions import show_versions
5758
import pandas.util.testing

pandas/core/common.py

-65
@@ -1858,71 +1858,6 @@ def _maybe_box_datetimelike(value):
18581858
_values_from_object = lib.values_from_object
18591859

18601860

1861-
def _possibly_convert_objects(values,
1862-
datetime=True,
1863-
numeric=True,
1864-
timedelta=True,
1865-
coerce=False,
1866-
copy=True):
1867-
""" if we have an object dtype, try to coerce dates and/or numbers """
1868-
1869-
conversion_count = sum((datetime, numeric, timedelta))
1870-
if conversion_count == 0:
1871-
import warnings
1872-
warnings.warn('Must explicitly pass type for conversion. Defaulting to '
1873-
'pre-0.17 behavior where datetime=True, numeric=True, '
1874-
'timedelta=True and coerce=False', DeprecationWarning)
1875-
datetime = numeric = timedelta = True
1876-
coerce = False
1877-
1878-
if isinstance(values, (list, tuple)):
1879-
# List or scalar
1880-
values = np.array(values, dtype=np.object_)
1881-
elif not hasattr(values, 'dtype'):
1882-
values = np.array([values], dtype=np.object_)
1883-
elif not is_object_dtype(values.dtype):
1884-
# If not object, do not attempt conversion
1885-
values = values.copy() if copy else values
1886-
return values
1887-
1888-
# If 1 flag is coerce, ensure 2 others are False
1889-
if coerce:
1890-
if conversion_count > 1:
1891-
raise ValueError("Only one of 'datetime', 'numeric' or "
1892-
"'timedelta' can be True when coerce=True.")
1893-
1894-
# Immediate return if coerce
1895-
if datetime:
1896-
return pd.to_datetime(values, errors='coerce', box=False)
1897-
elif timedelta:
1898-
return pd.to_timedelta(values, errors='coerce', box=False)
1899-
elif numeric:
1900-
return lib.maybe_convert_numeric(values, set(), coerce_numeric=True)
1901-
1902-
# Soft conversions
1903-
if datetime:
1904-
values = lib.maybe_convert_objects(values,
1905-
convert_datetime=datetime)
1906-
1907-
if timedelta and is_object_dtype(values.dtype):
1908-
# Object check to ensure only run if previous did not convert
1909-
values = lib.maybe_convert_objects(values,
1910-
convert_timedelta=timedelta)
1911-
1912-
if numeric and is_object_dtype(values.dtype):
1913-
try:
1914-
converted = lib.maybe_convert_numeric(values,
1915-
set(),
1916-
coerce_numeric=True)
1917-
# If all NaNs, then do not-alter
1918-
values = converted if not isnull(converted).all() else values
1919-
values = values.copy() if copy else values
1920-
except:
1921-
pass
1922-
1923-
return values
1924-
1925-
19261861
def _possibly_castable(arr):
19271862
# return False to force a non-fastpath
19281863

pandas/core/convert.py

+132
@@ -0,0 +1,132 @@
1+
"""
2+
Functions for converting object to other types
3+
"""
4+
5+
import numpy as np
6+
7+
import pandas as pd
8+
from pandas.core.common import (_possibly_cast_to_datetime, is_object_dtype,
9+
isnull)
10+
import pandas.lib as lib
11+
12+
# TODO: Remove in 0.18 or 2017, whichever is sooner
13+
def _possibly_convert_objects(values, convert_dates=True,
14+
convert_numeric=True,
15+
convert_timedeltas=True,
16+
copy=True):
17+
""" if we have an object dtype, try to coerce dates and/or numbers """
18+
19+
# if we have passed in a list or scalar
20+
if isinstance(values, (list, tuple)):
21+
values = np.array(values, dtype=np.object_)
22+
if not hasattr(values, 'dtype'):
23+
values = np.array([values], dtype=np.object_)
24+
25+
# convert dates
26+
if convert_dates and values.dtype == np.object_:
27+
28+
# we take an aggressive stance and convert to datetime64[ns]
29+
if convert_dates == 'coerce':
30+
new_values = _possibly_cast_to_datetime(
31+
values, 'M8[ns]', errors='coerce')
32+
33+
# if we are all nans then leave me alone
34+
if not isnull(new_values).all():
35+
values = new_values
36+
37+
else:
38+
values = lib.maybe_convert_objects(
39+
values, convert_datetime=convert_dates)
40+
41+
# convert timedeltas
42+
if convert_timedeltas and values.dtype == np.object_:
43+
44+
if convert_timedeltas == 'coerce':
45+
from pandas.tseries.timedeltas import to_timedelta
46+
new_values = to_timedelta(values, coerce=True)
47+
48+
# if we are all nans then leave me alone
49+
if not isnull(new_values).all():
50+
values = new_values
51+
52+
else:
53+
values = lib.maybe_convert_objects(
54+
values, convert_timedelta=convert_timedeltas)
55+
56+
# convert to numeric
57+
if values.dtype == np.object_:
58+
if convert_numeric:
59+
try:
60+
new_values = lib.maybe_convert_numeric(
61+
values, set(), coerce_numeric=True)
62+
63+
# if we are all nans then leave me alone
64+
if not isnull(new_values).all():
65+
values = new_values
66+
67+
except:
68+
pass
69+
else:
70+
# soft-conversion
71+
values = lib.maybe_convert_objects(values)
72+
73+
values = values.copy() if copy else values
74+
75+
return values
76+
77+
78+
def _soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
79+
coerce=False, copy=True):
80+
""" if we have an object dtype, try to coerce dates and/or numbers """
81+
82+
conversion_count = sum((datetime, numeric, timedelta))
83+
if conversion_count == 0:
84+
raise ValueError('At least one of datetime, numeric or timedelta must '
85+
'be True.')
86+
elif conversion_count > 1 and coerce:
87+
raise ValueError("Only one of 'datetime', 'numeric' or "
88+
"'timedelta' can be True when coerce=True.")
89+
90+
91+
if isinstance(values, (list, tuple)):
92+
# List or scalar
93+
values = np.array(values, dtype=np.object_)
94+
elif not hasattr(values, 'dtype'):
95+
values = np.array([values], dtype=np.object_)
96+
elif not is_object_dtype(values.dtype):
97+
# If not object, do not attempt conversion
98+
values = values.copy() if copy else values
99+
return values
100+
101+
# If 1 flag is coerce, ensure 2 others are False
102+
if coerce:
103+
# Immediate return if coerce
104+
if datetime:
105+
return pd.to_datetime(values, errors='coerce', box=False)
106+
elif timedelta:
107+
return pd.to_timedelta(values, errors='coerce', box=False)
108+
elif numeric:
109+
return pd.to_numeric(values, errors='coerce')
110+
111+
# Soft conversions
112+
if datetime:
113+
values = lib.maybe_convert_objects(values,
114+
convert_datetime=datetime)
115+
116+
if timedelta and is_object_dtype(values.dtype):
117+
# Object check to ensure only run if previous did not convert
118+
values = lib.maybe_convert_objects(values,
119+
convert_timedelta=timedelta)
120+
121+
if numeric and is_object_dtype(values.dtype):
122+
try:
123+
converted = lib.maybe_convert_numeric(values,
124+
set(),
125+
coerce_numeric=True)
126+
# If all NaNs, then do not-alter
127+
values = converted if not isnull(converted).all() else values
128+
values = values.copy() if copy else values
129+
except:
130+
pass
131+
132+
return values
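Reviewer note: the flag validation at the top of ``_soft_convert_objects`` relies on ``sum()`` counting ``True`` values. The standalone sketch below isolates that check so it can be run without pandas. The function name ``check_conversion_flags`` is hypothetical and does not exist in the commit; the messages are paraphrased from the diff.

```python
def check_conversion_flags(datetime=True, numeric=True, timedelta=True,
                           coerce=False):
    """Hypothetical helper mirroring the argument checks in the diff."""
    # bool is a subclass of int, so sum() counts how many flags are True.
    conversion_count = sum((datetime, numeric, timedelta))
    if conversion_count == 0:
        raise ValueError('At least one of datetime, numeric or timedelta '
                         'must be True.')
    elif conversion_count > 1 and coerce:
        # Coercion returns immediately for a single target type,
        # so only one target may be requested together with coerce=True.
        raise ValueError("Only one of 'datetime', 'numeric' or "
                         "'timedelta' can be True when coerce=True.")
    return conversion_count


# Soft conversion may request several targets at once.
print(check_conversion_flags(datetime=True, timedelta=True, numeric=False))

# Coercion with more than one target is rejected.
try:
    check_conversion_flags(datetime=True, numeric=True, timedelta=False,
                           coerce=True)
except ValueError as exc:
    print(exc)
```

Because all three flags default to ``True``, a caller that wants coercion has to switch the other two targets off explicitly.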

pandas/core/frame.py

+4-7
@@ -3543,9 +3543,8 @@ def combine(self, other, func, fill_value=None, overwrite=True):
35433543
# convert_objects just in case
35443544
return self._constructor(result,
35453545
index=new_index,
3546-
columns=new_columns).convert_objects(
3547-
datetime=True,
3548-
copy=False)
3546+
columns=new_columns)._convert(datetime=True,
3547+
copy=False)
35493548

35503549
def combine_first(self, other):
35513550
"""
@@ -4026,9 +4025,7 @@ def _apply_standard(self, func, axis, ignore_failures=False, reduce=True):
40264025

40274026
if axis == 1:
40284027
result = result.T
4029-
result = result.convert_objects(datetime=True,
4030-
timedelta=True,
4031-
copy=False)
4028+
result = result._convert(datetime=True, timedelta=True, copy=False)
40324029

40334030
else:
40344031

@@ -4158,7 +4155,7 @@ def append(self, other, ignore_index=False, verify_integrity=False):
41584155
other = DataFrame(other.values.reshape((1, len(other))),
41594156
index=index,
41604157
columns=combined_columns)
4161-
other = other.convert_objects(datetime=True, timedelta=True)
4158+
other = other._convert(datetime=True, timedelta=True)
41624159

41634160
if not self.columns.equals(combined_columns):
41644161
self = self.reindex(columns=combined_columns)
