Commit 7e18d1a

Sheppard, Kevin authored and committed
REF: Move conversion functions to core/convert.py
Move conversion function to core/convert.py
Restore docs for convert_objects
Add bug fix notes for restored convert_objects
Ensure copy is passed to _possibly_convert_objects when needed
1 parent aba927f commit 7e18d1a

6 files changed: +154 -218 lines changed

doc/source/basics.rst

+9 -24
@@ -1711,36 +1711,26 @@ then the more *general* one will be used as the result of the operation.
 object conversion
 ~~~~~~~~~~~~~~~~~
 
-.. note::
-
-   The syntax of :meth:`~DataFrame.convert_objects` changed in 0.17.0. See
-   :ref:`API changes <whatsnew_0170.api_breaking.convert_objects>`
-   for more details.
-
-:meth:`~DataFrame.convert_objects` is a method that converts columns from
-the ``object`` dtype to datetimes, timedeltas or floats. For example, to
-attempt conversion of object data that are *number like*, e.g. could be a
-string that represents a number, pass ``numeric=True``. By default, this will
-attempt a soft conversion and so will only succeed if the entire column is
-convertible. To force the conversion, add the keyword argument ``coerce=True``.
-This will force strings and number-like objects to be numbers if
-possible, and other values will be set to ``np.nan``.
+:meth:`~DataFrame.convert_objects` is a method to try to force conversion of types from the ``object`` dtype to other types.
+To force conversion of specific types that are *number like*, e.g. could be a string that represents a number,
+pass ``convert_numeric=True``. This will force strings and numbers alike to be numbers if possible, otherwise
+they will be set to ``np.nan``.
 
 .. ipython:: python
 
    df3['D'] = '1.'
    df3['E'] = '1'
-   df3.convert_objects(numeric=True).dtypes
+   df3.convert_objects(convert_numeric=True).dtypes
 
    # same, but specific dtype conversion
    df3['D'] = df3['D'].astype('float16')
    df3['E'] = df3['E'].astype('int32')
    df3.dtypes
 
-To force conversion to ``datetime64[ns]``, pass ``datetime=True`` and ``coerce=True``.
+To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
 This will convert any datetime-like object to dates, forcing other values to ``NaT``.
 This might be useful if you are reading in data which is mostly dates,
-but occasionally contains non-dates that you wish to represent as missing.
+but occasionally has non-dates intermixed and you want to represent as missing.
 
 .. ipython:: python
 
@@ -1749,15 +1739,10 @@ but occasionally contains non-dates that you wish to represent as missing.
                   'foo', 1.0, 1, pd.Timestamp('20010104'),
                   '20010105'], dtype='O')
    s
-   s.convert_objects(datetime=True, coerce=True)
+   s.convert_objects(convert_dates='coerce')
 
-Without passing ``coerce=True``, :meth:`~DataFrame.convert_objects` will attempt
-*soft* conversion of any *object* dtypes, meaning that if all
+In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all
 the objects in a Series are of the same type, the Series will have that dtype.
-Note that setting ``coerce=True`` does not *convert* arbitrary types to either
-``datetime64[ns]`` or ``timedelta64[ns]``. For example, a series containing string
-dates will not be converted to a series of datetimes. To convert between types,
-see :ref:`converting to timestamps <timeseries.converting>`.
 
 gotchas
 ~~~~~~~

doc/source/whatsnew/v0.17.0.txt

+3 -65
@@ -640,71 +640,6 @@ New Behavior:
    Timestamp.now()
    Timestamp.now() + offsets.DateOffset(years=1)
 
-.. _whatsnew_0170.api_breaking.convert_objects:
-
-Changes to convert_objects
-^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-``DataFrame.convert_objects`` keyword arguments have been shortened. (:issue:`10265`)
-
-===================== =============
-Previous              Replacement
-===================== =============
-``convert_dates``     ``datetime``
-``convert_numeric``   ``numeric``
-``convert_timedelta`` ``timedelta``
-===================== =============
-
-Coercing types with ``DataFrame.convert_objects`` is now implemented using the
-keyword argument ``coerce=True``. Previously types were coerced by setting a
-keyword argument to ``'coerce'`` instead of ``True``, as in ``convert_dates='coerce'``.
-
-.. ipython:: python
-
-   df = pd.DataFrame({'i': ['1','2'],
-                      'f': ['apple', '4.2'],
-                      's': ['apple','banana']})
-   df
-
-The old usage of ``DataFrame.convert_objects`` used ``'coerce'`` along with the
-type.
-
-.. code-block:: python
-
-   In [2]: df.convert_objects(convert_numeric='coerce')
-
-Now the ``coerce`` keyword must be explicitly used.
-
-.. ipython:: python
-
-   df.convert_objects(numeric=True, coerce=True)
-
-In earlier versions of pandas, ``DataFrame.convert_objects`` would not coerce
-numeric types when there were no values convertible to a numeric type. This returns
-the original DataFrame with no conversion.
-
-.. code-block:: python
-
-   In [1]: df = pd.DataFrame({'s': ['a','b']})
-   In [2]: df.convert_objects(convert_numeric='coerce')
-   Out[2]:
-      s
-   0  a
-   1  b
-
-The new behavior will convert all non-number-like strings to ``NaN``,
-when ``coerce=True`` is passed explicity.
-
-.. ipython:: python
-
-   pd.DataFrame({'s': ['a','b']})
-   df.convert_objects(numeric=True, coerce=True)
-
-In earlier versions of pandas, the default behavior was to try and convert
-datetimes and timestamps. The new default is for ``DataFrame.convert_objects``
-to do nothing, and so it is necessary to pass at least one conversion target
-in the method call.
-
 Changes to Index Comparisons
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -992,6 +927,7 @@ Deprecations
 - ``Series.is_time_series`` deprecated in favor of ``Series.index.is_all_dates`` (:issue:`11135`)
 - Legacy offsets (like ``'A@JAN'``) listed in :ref:`here <timeseries.legacyaliases>` are deprecated (note that this has been alias since 0.8.0), (:issue:`10878`)
 - ``WidePanel`` deprecated in favor of ``Panel``, ``LongPanel`` in favor of ``DataFrame`` (note these have been aliases since < 0.11.0), (:issue:`10892`)
+- ``DataFrame.convert_objects`` has been deprecated in favor of type-specific function ``pd.to_datetime``, ``pd.to_timestamp`` and ``pd.to_numeric``.
 
 .. _whatsnew_0170.prior_deprecations:
 
@@ -1187,3 +1123,5 @@ Bug Fixes
 - Bug in ``DataFrame`` construction from nested ``dict`` with ``timedelta`` keys (:issue:`11129`)
 - Bug in ``.fillna`` against may raise ``TypeError`` when data contains datetime dtype (:issue:`7095`, :issue:`11153`)
 - Bug in ``.groupby`` when number of keys to group by is same as length of index (:issue:`11185`)
+- Bug in ``convert_objects`` where converted values might not be returned if all null and ``coerce`` (:issue:`9589`)
+- Bug in ``convert_objects`` where ``copy`` keyword was not respected (:issue:`9589`)
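
Reviewer note: the deprecation entry in this file points readers to type-specific converters. Below is a minimal sketch of that replacement pattern, assuming a pandas release that ships `pd.to_numeric` and accepts `errors='coerce'` in all three functions. The entry names `pd.to_timestamp`, which as far as I know is not a top-level pandas function, so the sketch assumes `pd.to_timedelta` was intended. The sample data are made up for illustration and do not come from the commit.

```python
import pandas as pd

# Illustrative data only; not taken from the commit.
numbers = pd.Series(['1', '4.2', 'apple'], dtype='object')
dates = pd.Series(['2001-01-04', 'foo'], dtype='object')
deltas = pd.Series(['1 days', 'foo'], dtype='object')

# With errors='coerce', values that cannot be parsed become NaN / NaT
# instead of raising an exception.
as_numeric = pd.to_numeric(numbers, errors='coerce')
as_datetime = pd.to_datetime(dates, errors='coerce')
as_timedelta = pd.to_timedelta(deltas, errors='coerce')

print(as_numeric.dtype)
print(as_datetime.isna().tolist())
print(as_timedelta.isna().tolist())
```

Each call handles exactly one target type, which roughly corresponds to passing a single `'coerce'` flag to `convert_objects`.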

pandas/core/common.py

-122
@@ -1857,128 +1857,6 @@ def _maybe_box_datetimelike(value):
 
 _values_from_object = lib.values_from_object
 
-# TODO: Remove in 0.18 or 2017, which ever is sooner
-def _possibly_convert_objects(values, convert_dates=True,
-                              convert_numeric=True,
-                              convert_timedeltas=True,
-                              copy=True):
-    """ if we have an object dtype, try to coerce dates and/or numbers """
-
-    # if we have passed in a list or scalar
-    if isinstance(values, (list, tuple)):
-        values = np.array(values, dtype=np.object_)
-    if not hasattr(values, 'dtype'):
-        values = np.array([values], dtype=np.object_)
-
-    # convert dates
-    if convert_dates and values.dtype == np.object_:
-
-        # we take an aggressive stance and convert to datetime64[ns]
-        if convert_dates == 'coerce':
-            new_values = _possibly_cast_to_datetime(
-                values, 'M8[ns]', errors='coerce')
-
-            # if we are all nans then leave me alone
-            if not isnull(new_values).all():
-                values = new_values
-
-        else:
-            values = lib.maybe_convert_objects(
-                values, convert_datetime=convert_dates)
-
-    # convert timedeltas
-    if convert_timedeltas and values.dtype == np.object_:
-
-        if convert_timedeltas == 'coerce':
-            from pandas.tseries.timedeltas import to_timedelta
-            new_values = to_timedelta(values, coerce=True)
-
-            # if we are all nans then leave me alone
-            if not isnull(new_values).all():
-                values = new_values
-
-        else:
-            values = lib.maybe_convert_objects(
-                values, convert_timedelta=convert_timedeltas)
-
-    # convert to numeric
-    if values.dtype == np.object_:
-        if convert_numeric:
-            try:
-                new_values = lib.maybe_convert_numeric(
-                    values, set(), coerce_numeric=True)
-
-                # if we are all nans then leave me alone
-                if not isnull(new_values).all():
-                    values = new_values
-
-            except:
-                pass
-        else:
-            # soft-conversion
-            values = lib.maybe_convert_objects(values)
-
-    values = values.copy() if copy else values
-
-    return values
-
-
-def _soft_convert_objects(values, datetime=True, numeric=True, timedelta=True,
-                          coerce=False, copy=True):
-    """ if we have an object dtype, try to coerce dates and/or numbers """
-
-    conversion_count = sum((datetime, numeric, timedelta))
-    if conversion_count == 0:
-        raise ValueError('At least one of datetime, numeric or timedelta must '
-                         'be True.')
-    elif conversion_count > 1 and coerce:
-        raise ValueError("Only one of 'datetime', 'numeric' or "
-                         "'timedelta' can be True when when coerce=True.")
-
-
-    if isinstance(values, (list, tuple)):
-        # List or scalar
-        values = np.array(values, dtype=np.object_)
-    elif not hasattr(values, 'dtype'):
-        values = np.array([values], dtype=np.object_)
-    elif not is_object_dtype(values.dtype):
-        # If not object, do not attempt conversion
-        values = values.copy() if copy else values
-        return values
-
-    # If 1 flag is coerce, ensure 2 others are False
-    if coerce:
-        # Immediate return if coerce
-        if datetime:
-            return pd.to_datetime(values, errors='coerce', box=False)
-        elif timedelta:
-            return pd.to_timedelta(values, errors='coerce', box=False)
-        elif numeric:
-            return lib.maybe_convert_numeric(values, set(), coerce_numeric=True)
-
-    # Soft conversions
-    if datetime:
-        values = lib.maybe_convert_objects(values,
-                                           convert_datetime=datetime)
-
-    if timedelta and is_object_dtype(values.dtype):
-        # Object check to ensure only run if previous did not convert
-        values = lib.maybe_convert_objects(values,
-                                           convert_timedelta=timedelta)
-
-    if numeric and is_object_dtype(values.dtype):
-        try:
-            converted = lib.maybe_convert_numeric(values,
-                                                  set(),
-                                                  coerce_numeric=True)
-            # If all NaNs, then do not-alter
-            values = converted if not isnull(converted).all() else values
-            values = values.copy() if copy else values
-        except:
-            pass
-
-    return values
-
 
 def _possibly_castable(arr):
     # return False to force a non-fastpath
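
Reviewer note: the numeric branch of the removed `_possibly_convert_objects` keeps the original values whenever coercion produces nothing but NaN (the "if we are all nans then leave me alone" check). The pure-Python sketch below illustrates only that rule. It assumes `float()` parsing as a stand-in for the internal `lib.maybe_convert_numeric`, and the helper name `coerce_numeric_or_keep` is hypothetical, not part of pandas.

```python
import math


def coerce_numeric_or_keep(values):
    """Coerce values to floats, but keep the input if nothing converts.

    Hypothetical helper: float() stands in for pandas' internal
    lib.maybe_convert_numeric(..., coerce_numeric=True).
    """
    converted = []
    for value in values:
        try:
            converted.append(float(value))
        except (TypeError, ValueError):
            # Unparseable entries become NaN, as with coercion.
            converted.append(math.nan)

    # Mirrors the "all NaN, leave alone" check in the removed code.
    if all(math.isnan(x) for x in converted):
        return list(values)
    return converted


mixed = coerce_numeric_or_keep(['1', 'apple', 4.2])
untouched = coerce_numeric_or_keep(['a', 'b'])
```

Here `mixed` contains floats with NaN in place of `'apple'`, while `untouched` comes back unchanged. That matches the earlier behaviour the removed release notes describe for `pd.DataFrame({'s': ['a','b']})`.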
