Expected Output
# output from df.convert_objects(convert_numeric=True).dtypes
A    int64
dtype: object
# output from df.infer_objects().dtypes
A    object
dtype: object
I see that numeric coercion was specifically disabled as part of #16915 and 3670711; what is the reasoning behind coercing dates and time_deltas but not numbers? It would be handy to have a to_numeric function that operates across the whole dataframe.
@gfyoung: that's right, or something to that effect. In my mind, numbers can be inferred from columns that consist entirely of string-formatted numbers, so this would logically belong to the infer_objects function, although I understand that it may not be desirable by default.
Code Sample
In the code sample below: column "A" consists entirely of numbers formatted as strings.
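The original code sample did not survive on this page. The following is a minimal sketch of the situation described, not the reporter's actual code: the values in column "A" and the extra column "B" are assumptions added for illustration. It contrasts a column of numeric strings with an object column that already holds real integers.

```python
import pandas as pd

# Assumed data: "A" holds numbers formatted as strings (as in the report);
# "B" is an illustrative extra column holding real ints stored as object dtype.
df = pd.DataFrame({"A": ["1", "2", "3"]})
df["B"] = pd.Series([1, 2, 3], dtype=object)

inferred = df.infer_objects()

# infer_objects does not parse the strings in "A", but it does recover
# a numeric dtype for "B", whose values are already numbers.
print(inferred.dtypes)

# An explicit conversion of the string column yields a numeric dtype.
explicit = pd.to_numeric(df["A"])
print(explicit.dtype)
```

convert_objects is left out of the sketch on purpose: it is deprecated in the pandas version reported here, so the comparison uses to_numeric instead.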
Problem description
At present, I am using the convert_objects function to convert any columns which are entirely made up of numbers formatted as strings, to numeric values if possible. I note that the convert_objects function is deprecated, so I attempted to update my code to use infer_objects instead.
However, the infer_objects function works differently: it converts a column to a numeric type only if every value in that column is already a number and the column merely carries the object dtype from how it was previously set up in the dataframe (as shown in the example). It does not parse strings.
I understand that converting columns consisting entirely of string-formatted numbers to numeric types may not be desirable as the default behavior; however, it would be handy to have an argument that allows either behavior.
Alternatively, one must loop through each column and attempt conversion using the to_numeric function.
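The loop described above could look roughly like the sketch below. The helper name convert_numeric_columns and the sample data are hypothetical and not part of pandas. The helper tries pd.to_numeric only on object or string columns and leaves a column untouched when any value fails to parse.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_numeric_columns(df):
    """Hypothetical helper: return a copy of df in which every column made up
    entirely of number-like strings has been converted to a numeric dtype."""
    out = df.copy()
    for name in out.columns:
        col = out[name]
        # Only touch object/string columns, so that datetimes and
        # already-numeric columns are left alone.
        if not (is_object_dtype(col) or is_string_dtype(col)):
            continue
        try:
            out[name] = pd.to_numeric(col)
        except (ValueError, TypeError):
            # At least one value is not a number: keep the column as it is.
            pass
    return out


# Assumed sample data for illustration.
df = pd.DataFrame({
    "A": ["1", "2", "3"],    # all integers as strings
    "B": ["x", "2", "3"],    # mixed, must stay unconverted
    "C": ["1.5", "2", "3"],  # all numbers, one with a fraction
})
converted = convert_numeric_columns(df)
print(converted.dtypes)
```

Working on a copy keeps the input frame unchanged, which makes it easy to compare dtypes before and after.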
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None
pandas: 0.23.1
pytest: 3.2.1
pip: 18.0
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None