unexpected behavior when using infer_object function #22212

rora002 opened this issue Aug 6, 2018 · 4 comments

rora002 commented Aug 6, 2018

Code Sample

In the code sample below: column "A" consists entirely of numbers formatted as strings.

import pandas as pd

df = pd.DataFrame({"A": ["1", "2", "3"]})
df.convert_objects(convert_numeric=True).dtypes  # deprecated API
df.infer_objects().dtypes

Problem description

At present, I am using the convert_objects function to convert any columns which are entirely made up of numbers formatted as strings, to numeric values if possible. I note that the convert_objects function is deprecated, so I attempted to update my code to use infer_objects instead.

However, infer_objects appears to work differently: it only converts a column to a numeric type if every value in the column is already a number that happens to be stored in an object-dtype series. It does not parse string-formatted numbers (as shown in the example).
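To make the difference concrete, here is a minimal sketch contrasting an object-dtype column of real integers with an object-dtype column of numeric strings. The column names are made up for illustration, and the exact dtype labels printed may vary between pandas versions.

```python
import pandas as pd

# Two object-dtype columns: one holds real Python ints, the other holds strings.
# The column names are hypothetical, chosen only for this illustration.
df = pd.DataFrame({
    "ints_as_objects": pd.Series([1, 2, 3], dtype=object),
    "ints_as_strings": pd.Series(["1", "2", "3"], dtype=object),
})

inferred = df.infer_objects()

# The column of real ints is soft-converted to an integer dtype;
# the column of strings is not parsed and stays non-numeric.
print(inferred.dtypes)
```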

I understand that converting columns consisting entirely of string-formatted numbers to numeric types may not be desirable as the default behavior; however, it would be handy to have an argument that enables either behavior.

Alternatively, one must loop through each column and attempt conversion using the to_numeric function.
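A rough sketch of that column-by-column workaround, assuming that columns which fail to parse should simply be left untouched. The helper name convert_numeric_columns is hypothetical and not part of pandas.

```python
import pandas as pd
from pandas.api.types import is_object_dtype, is_string_dtype


def convert_numeric_columns(df):
    """Hypothetical helper: try pd.to_numeric on each text-like column.

    Columns that cannot be parsed in full are left unchanged.
    """
    out = df.copy()
    for col in out.columns:
        # Only attempt object/string columns, so datetimes etc. are not touched.
        if not (is_object_dtype(out[col]) or is_string_dtype(out[col])):
            continue
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            # At least one value is not a number: keep the original column.
            pass
    return out


df = pd.DataFrame({"A": ["1", "2", "3"], "B": ["x", "y", "z"]})
converted = convert_numeric_columns(df)
print(converted.dtypes)
```

Catching the exception per column keeps the all-or-nothing behavior that convert_objects(convert_numeric=True) showed for column "A", without silently turning bad values into NaN.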

Expected Output

# output from df.convert_objects(convert_numeric=True).dtypes
A    int64
dtype: object

# output from df.infer_objects().dtypes
A    object
dtype: object

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.23.1
pytest: 3.2.1
pip: 18.0
setuptools: 36.5.0.post20170921
Cython: 0.26.1
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.2.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.999999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

rora002 changed the title from "infer_objects functions differently to convert_objects" to "unexpected behavior when using infer_object function" on Aug 6, 2018

rora002 commented Aug 6, 2018

I see that numeric coercion was specifically disabled as part of #16915 and 3670711. What is the reasoning behind coercing dates and timedeltas but not numbers? It would be handy to have a to_numeric function that operates across the whole dataframe.

gfyoung added the "Numeric Operations" label (Arithmetic, Comparison, and Logical operations) on Aug 6, 2018

gfyoung commented Aug 6, 2018

@rora002: IIUC, you're proposing that pd.to_numeric be extended to accept a DataFrame?


rora002 commented Aug 6, 2018

@gfyoung: that's right, or something to that effect. In my mind, numbers can be inferred from columns that consist entirely of string-formatted numbers, so this would logically belong with the infer_objects function, although I understand that it may not be desirable by default.
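For reference, something close to a DataFrame-wide to_numeric can be approximated today by applying pd.to_numeric to every column. This is only a sketch: it assumes every column is meant to be numeric, and with errors="coerce" any value that cannot be parsed becomes NaN rather than raising.

```python
import pandas as pd

df = pd.DataFrame({"A": ["1", "2", "3"], "B": ["4", "x", "6"]})

# DataFrame.apply calls pd.to_numeric once per column and forwards the
# errors="coerce" keyword, so unparseable entries such as "x" become NaN.
result = df.apply(pd.to_numeric, errors="coerce")
print(result)
print(result.dtypes)
```

Note that coercion changes the data: column "B" ends up as a float column with a missing value, which is a different trade-off from leaving non-numeric columns as they are.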

@Youssefares

Any progress/different thoughts on this?

mroeschke added the "Bug" label and removed the "Numeric Operations" label (Arithmetic, Comparison, and Logical operations) on Jun 21, 2021