ENH: set __module__ for objects in pandas pd.DataFrame API #55171

Merged 10 commits on Mar 23, 2024

2 changes: 1 addition & 1 deletion doc/source/development/contributing_docstring.rst
@@ -940,7 +940,7 @@ Finally, docstrings can also be appended to with the ``doc`` decorator.

In this example, we'll create a parent docstring normally (this is like
``pandas.core.generic.NDFrame``). Then we'll have two children (like
-``pandas.core.series.Series`` and ``pandas.core.frame.DataFrame``). We'll
+``pandas.core.series.Series`` and ``pandas.DataFrame``). We'll
substitute the class names in this docstring.

.. code-block:: python
2 changes: 1 addition & 1 deletion doc/source/user_guide/enhancingperf.rst
@@ -453,7 +453,7 @@ by evaluate arithmetic and boolean expression all at once for large :class:`~pan
:func:`~pandas.eval` is many orders of magnitude slower for
smaller expressions or objects than plain Python. A good rule of thumb is
to only use :func:`~pandas.eval` when you have a
-:class:`.DataFrame` with more than 10,000 rows.
+:class:`~pandas.core.frame.DataFrame` with more than 10,000 rows.

Supported syntax
~~~~~~~~~~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/user_guide/io.rst
@@ -6400,7 +6400,7 @@ ignored.
In [2]: df = pd.DataFrame({'A': np.random.randn(sz), 'B': [1] * sz})

In [3]: df.info()
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 2 columns):
A 1000000 non-null float64
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.rst
@@ -840,7 +840,7 @@ then all the columns are dummy-encoded, and a :class:`SparseDataFrame` was retur
In [2]: df = pd.DataFrame({"A": [1, 2], "B": ['a', 'b'], "C": ['a', 'a']})

In [3]: type(pd.get_dummies(df, sparse=True))
-Out[3]: pandas.core.frame.DataFrame
+Out[3]: pandas.DataFrame

In [4]: type(pd.get_dummies(df[['B', 'C']], sparse=True))
Out[4]: pandas.core.sparse.frame.SparseDataFrame
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.0.rst
@@ -414,7 +414,7 @@ Extended verbose info output for :class:`~pandas.DataFrame`
... "text_col": ["a", "b", "c"],
... "float_col": [0.0, 0.1, 0.2]})
In [2]: df.info(verbose=True)
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
int_col 3 non-null int64
2 changes: 1 addition & 1 deletion pandas/conftest.py
@@ -125,7 +125,7 @@ def ignore_doctest_warning(item: pytest.Item, path: str, message: str) -> None:
item : pytest.Item
pytest test item.
path : str
-Module path to Python object, e.g. "pandas.core.frame.DataFrame.append". A
+Module path to Python object, e.g. "pandas.DataFrame.append". A
warning will be filtered when item.name ends with in given path. So it is
sufficient to specify e.g. "DataFrame.append".
message : str
2 changes: 2 additions & 0 deletions pandas/core/frame.py
@@ -65,6 +65,7 @@
     Appender,
     Substitution,
     doc,
+    set_module,
 )
 from pandas.util._exceptions import (
     find_stack_level,
@@ -498,6 +499,7 @@
 # DataFrame class


+@set_module("pandas")
 class DataFrame(NDFrame, OpsMixin):
     """
     Two-dimensional, size-mutable, potentially heterogeneous tabular data.
2 changes: 1 addition & 1 deletion pandas/core/indexing.py
@@ -227,7 +227,7 @@ def iloc(self) -> _iLocIndexer:
a b c d
0 1 2 3 4
>>> type(df.iloc[[0]])
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>

>>> df.iloc[[0, 1]]
a b c d
2 changes: 1 addition & 1 deletion pandas/io/formats/format.py
@@ -855,7 +855,7 @@ class DataFrameRenderer:
- to_csv
- to_latex

-Called in pandas.core.frame.DataFrame:
+Called in pandas.DataFrame:
- to_html
- to_string

8 changes: 4 additions & 4 deletions pandas/io/formats/info.py
@@ -72,7 +72,7 @@
Prints information of all columns:

>>> df.info(verbose=True)
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
@@ -87,7 +87,7 @@
information:

>>> df.info(verbose=False)
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
@@ -115,7 +115,7 @@
... 'column_3': np.random.choice(['a', 'b', 'c'], 10 ** 6)
... })
>>> df.info()
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
@@ -127,7 +127,7 @@
memory usage: 22.9+ MB

>>> df.info(memory_usage='deep')
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 3 columns):
# Column Non-Null Count Dtype
4 changes: 4 additions & 0 deletions pandas/tests/api/test_api.py
@@ -401,3 +401,7 @@ def test_pandas_array_alias():
     res = pd.arrays.PandasArray

     assert res is pd.arrays.NumpyExtensionArray
+
+
+def test_set_module():
+    assert pd.DataFrame.__module__ == "pandas"
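
A class's `__module__` feeds directly into its `repr`, which is why the docstring and test expectations elsewhere in this diff change from `pandas.core.frame.DataFrame` to `pandas.DataFrame`. A minimal sketch of that effect, using a hypothetical stand-in class `Frame` and a made-up package name `mypkg` rather than pandas itself:

```python
class Frame:
    """Hypothetical stand-in for a class defined deep inside a package."""


# By default, __module__ records the module in which the class statement ran.
original_module = Frame.__module__

# Overriding it changes how the class presents itself, not where it is defined.
Frame.__module__ = "mypkg"

print(Frame.__module__)  # mypkg
print(repr(Frame))       # <class 'mypkg.Frame'> when Frame is defined at module top level
```

The same mechanism applies whether the attribute is set by a decorator, in the class body, or by assignment after the class is created.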
6 changes: 3 additions & 3 deletions pandas/tests/frame/methods/test_info.py
@@ -40,7 +40,7 @@ def test_info_empty():
result = buf.getvalue()
expected = textwrap.dedent(
"""\
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 0 entries
Empty DataFrame\n"""
)
@@ -208,7 +208,7 @@ def test_info_memory():
bytes = float(df.memory_usage().sum())
expected = textwrap.dedent(
f"""\
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 1 columns):
# Column Non-Null Count Dtype
@@ -501,7 +501,7 @@ def test_info_int_columns():
result = buf.getvalue()
expected = textwrap.dedent(
"""\
-<class 'pandas.core.frame.DataFrame'>
+<class 'pandas.DataFrame'>
Index: 2 entries, A to B
Data columns (total 2 columns):
# Column Non-Null Count Dtype
2 changes: 1 addition & 1 deletion pandas/tests/groupby/test_grouping.py
@@ -509,7 +509,7 @@ def test_groupby_with_datetime_key(self):
         assert len(gb.groups.keys()) == 4

     def test_grouping_error_on_multidim_input(self, df):
-        msg = "Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional"
+        msg = "Grouper for '<class 'pandas.DataFrame'>' not 1-dimensional"
         with pytest.raises(ValueError, match=msg):
             Grouping(df.index, df[["A", "A"]])

21 changes: 21 additions & 0 deletions pandas/util/_decorators.py
@@ -503,3 +503,24 @@ def indent(text: str | None, indents: int = 1) -> str:
"future_version_msg",
"Substitution",
]
+
+
+def set_module(module):
@rhshadrach (Member), Nov 9, 2024:

Just came across this today; I do not see the benefit of having this decorator. We either have:

class DataFrame:
    __module__ = "pandas"

or

@set_module("pandas")
class DataFrame:
    ...

There is a cost: when reading the code, you have to chase down the definition of the decorator. I'm also seeing 6.5% longer runtime to instantiate an empty class with this decorator, which is a small impact on import time for pandas.

These are certainly small costs, so if there is a gain to be had then great. But I'm just not seeing what the gain is.

@jorisvandenbossche (Member), Nov 9, 2024:

To be honest, I don't think I had thought about that option for classes; we just copied the approach used in numpy. (Setting it afterwards, as in DataFrame.__module__ = "pandas", is also an option, but it would move the assignment further away from the class definition.)

If it has overhead, that is certainly another reason to do it the other way (and I agree that too many decorators make the code harder to read).

Member:

I assume that, as the decorator is untyped, we also lose type definitions when it is applied to functions, but I can add this to the to-do list (along with a whatsnew entry) in the issue itself.

I can also investigate the testing so that no public API classes or functions get missed.

Rather than holding up the merging of the open PRs, this discussion should probably be moved to the open issue also.

I'll go ahead and get the open ones merged so that I can update the issue with what's left to close out the issue.

The task was more about checking whether any other changes were needed and it was only the Series one that needed other changes. So if we want to not use the decorator this could be an easy followup rather than a blocker.

+    """Private decorator for overriding __module__ on a function or class.
+
+    Example usage::
+
+        @set_module("pandas")
+        def example():
+            pass
+
+
+        assert example.__module__ == "pandas"
+    """
+
+    def decorator(func):
+        if module is not None:
+            func.__module__ = module
+        return func
+
+    return decorator
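
The two spellings debated in the review thread can be compared side by side. The sketch below is not part of the PR: `set_module_typed` is a hypothetical typed variant of the decorator (one possible answer to the concern that an untyped decorator loses type information), and `make_via_decorator`, `make_via_attribute`, `ViaDecorator` and `ViaAttribute` are illustrative names. It does not import pandas; the string `"pandas"` is used only as the target module name.

```python
import timeit
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def set_module_typed(module: Optional[str]) -> Callable[[T], T]:
    """Hypothetical typed variant of set_module (an assumption, not pandas API)."""

    def decorator(obj: T) -> T:
        # Same behaviour as the decorator in the diff: override only if given.
        if module is not None:
            obj.__module__ = module
        return obj

    return decorator


def make_via_decorator() -> type:
    # Spelling 1: the decorator sets __module__ after the class is created.
    @set_module_typed("pandas")
    class ViaDecorator:
        pass

    return ViaDecorator


def make_via_attribute() -> type:
    # Spelling 2: the class body assigns __module__ directly.
    class ViaAttribute:
        __module__ = "pandas"

    return ViaAttribute


# Both spellings end up with the same __module__ value.
assert make_via_decorator().__module__ == "pandas"
assert make_via_attribute().__module__ == "pandas"

# Rough timing of class creation under each spelling; absolute numbers and
# the relative gap depend on the machine and the Python version.
t_decorator = timeit.timeit(make_via_decorator, number=2_000)
t_attribute = timeit.timeit(make_via_attribute, number=2_000)
print(f"decorator: {t_decorator:.4f}s  attribute: {t_attribute:.4f}s")
```

Either way the work happens once per decorated object at import time, so any difference is a one-off cost rather than a per-call one.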