Skip to content

DEPR: Deprecate convert_float #41176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
May 26, 2021
9 changes: 0 additions & 9 deletions doc/source/user_guide/io.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3648,15 +3648,6 @@ one can pass an :class:`~pandas.io.excel.ExcelWriter`.
df1.to_excel(writer, sheet_name="Sheet1")
df2.to_excel(writer, sheet_name="Sheet2")

.. note::

Wringing a little more performance out of ``read_excel``
Internally, Excel stores all numeric data as floats. Because this can
produce unexpected behavior when reading in data, pandas defaults to trying
to convert integers to floats if it doesn't lose information (``1.0 -->
1``). You can pass ``convert_float=False`` to disable this behavior, which
may give a slight performance improvement.

.. _io.excel_writing_buffer:

Writing Excel files to memory
Expand Down
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -611,6 +611,7 @@ Deprecations
- Deprecated the ``level`` keyword for :class:`DataFrame` and :class:`Series` aggregations; use groupby instead (:issue:`39983`)
- The ``inplace`` parameter of :meth:`Categorical.remove_categories`, :meth:`Categorical.add_categories`, :meth:`Categorical.reorder_categories` is deprecated and will be removed in a future version (:issue:`37643`)
- Deprecated :func:`merge` producing duplicated columns through the ``suffixes`` keyword and already existing columns (:issue:`22818`)
- Deprecated the ``convert_float`` optional argument in :func:`read_excel` and :meth:`ExcelFile.parse` (:issue:`41127`)

.. ---------------------------------------------------------------------------

Expand Down
29 changes: 26 additions & 3 deletions pandas/io/excel/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,10 @@
Convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
data will be read in as floats: Excel stores all numbers as floats
internally.

.. deprecated:: 1.3.0
convert_float will be removed in a future version

mangle_dupe_cols : bool, default True
Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
'X'...'X'. Passing in False will cause data to be overwritten if there
Expand Down Expand Up @@ -355,7 +359,7 @@ def read_excel(
thousands=None,
comment=None,
skipfooter=0,
convert_float=True,
convert_float=None,
mangle_dupe_cols=True,
storage_options: StorageOptions = None,
):
Expand Down Expand Up @@ -489,11 +493,30 @@ def parse(
thousands=None,
comment=None,
skipfooter=0,
convert_float=True,
convert_float=None,
mangle_dupe_cols=True,
**kwds,
):

if convert_float is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is slightly magical. can we instead have a function that shows the deprecation and just call it where appropriate so can set the stacklevel w/o this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how this would get around needing logic to set the stacklevel. Maybe you're asking to move the logic to the new function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback @rhshadrach I've consolidated both stack inspections from _base.py into a single function. It seems like an improvement to me, but let me know if you see other adjustments I should make.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After rereading @jreback's comment, it now makes sense to me. The request is to move the deprecation itself to a function (not the determination of stacklevel), and call this further up in the call stack where we don't need to have any logic to determine the stacklevel. If this is doable, I agree it's a better approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would work: move the warnings.warn to a function, call in both read_excel and ExcelFile.parse when appropriate, but in read_excel after emitting the warning change the value of convert_float. Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might be able to use this: from pandas.util._exceptions import find_stack_level

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice. I looked at pandas.util._exceptions.find_stack_level when I made my last edit, but at the time it seemed specific to "astype" et al. so I tried making something similar. The edits from #41560 two days ago look great. I'll give it a try right away. (I couldn't see a clean way without stack inspection to raise a single warning from both read_excel and ExcelFile.parse. The latter isn't really advertised in the documentation, but it should still raise the correct warnings.)

convert_float = True
else:
caller = inspect.stack()[2]
if (
caller.filename.endswith(
os.path.join("pandas", "io", "excel", "_base.py")
)
and caller.function == "read_excel"
):
stacklevel = 5
else:
stacklevel = 3
warnings.warn(
"convert_float is deprecated and will be removed in a future version",
FutureWarning,
stacklevel=stacklevel,
)

validate_header_arg(header)

ret_dict = False
Expand Down Expand Up @@ -1225,7 +1248,7 @@ def parse(
thousands=None,
comment=None,
skipfooter=0,
convert_float=True,
convert_float=None,
mangle_dupe_cols=True,
**kwds,
):
Expand Down
34 changes: 25 additions & 9 deletions pandas/tests/io/excel/test_readers.py
Original file line number Diff line number Diff line change
Expand Up @@ -433,9 +433,17 @@ def test_reader_special_dtypes(self, request, read_ext):
float_expected = expected.copy()
float_expected["IntCol"] = float_expected["IntCol"].astype(float)
float_expected.loc[float_expected.index[1], "Str2Col"] = 3.0
actual = pd.read_excel(
basename + read_ext, sheet_name="Sheet1", convert_float=False
)
with tm.assert_produces_warning(
FutureWarning,
match="convert_float is deprecated",
raise_on_extra_warnings=False,
):
# raise_on_extra_warnings because xlrd raises a PendingDeprecationWarning
# on database job Linux_py37_IO (ci/deps/actions-37-db.yaml)
# See GH#41176
actual = pd.read_excel(
basename + read_ext, sheet_name="Sheet1", convert_float=False
)
tm.assert_frame_equal(actual, float_expected)

# check setting Index (assuming xls and xlsx are the same here)
Expand All @@ -455,12 +463,20 @@ def test_reader_special_dtypes(self, request, read_ext):

no_convert_float = float_expected.copy()
no_convert_float["StrCol"] = no_convert_float["StrCol"].apply(str)
actual = pd.read_excel(
basename + read_ext,
sheet_name="Sheet1",
convert_float=False,
converters={"StrCol": str},
)
with tm.assert_produces_warning(
FutureWarning,
match="convert_float is deprecated",
raise_on_extra_warnings=False,
):
# raise_on_extra_warnings because xlrd raises a PendingDeprecationWarning
# on database job Linux_py37_IO (ci/deps/actions-37-db.yaml)
# See GH#41176
actual = pd.read_excel(
basename + read_ext,
sheet_name="Sheet1",
convert_float=False,
converters={"StrCol": str},
)
tm.assert_frame_equal(actual, no_convert_float)

# GH8212 - support for converters and missing values
Expand Down
16 changes: 12 additions & 4 deletions pandas/tests/io/excel/test_writers.py
Original file line number Diff line number Diff line change
Expand Up @@ -474,9 +474,12 @@ def test_int_types(self, np_type, path):
float_frame = df.astype(float)
float_frame.columns = float_frame.columns.astype(float)
float_frame.index = float_frame.index.astype(float)
recons = pd.read_excel(
path, sheet_name="test1", convert_float=False, index_col=0
)
with tm.assert_produces_warning(
FutureWarning, match="convert_float is deprecated"
):
recons = pd.read_excel(
path, sheet_name="test1", convert_float=False, index_col=0
)
tm.assert_frame_equal(recons, float_frame)

@pytest.mark.parametrize("np_type", [np.float16, np.float32, np.float64])
Expand Down Expand Up @@ -1293,7 +1296,12 @@ def test_merged_cell_custom_objects(self, merge_cells, path):
)
expected = DataFrame(np.ones((2, 2)), columns=mi)
expected.to_excel(path)
result = pd.read_excel(path, header=[0, 1], index_col=0, convert_float=False)
with tm.assert_produces_warning(
FutureWarning, match="convert_float is deprecated"
):
result = pd.read_excel(
path, header=[0, 1], index_col=0, convert_float=False
)
# need to convert PeriodIndexes to standard Indexes for assert equal
expected.columns = expected.columns.set_levels(
[[str(i) for i in mi.levels[0]], [str(i) for i in mi.levels[1]]],
Expand Down