-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: set __module__
on top-level public objects
#55178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Wow: https://github.com/search?q=%22from+pandas.core.frame+import+DataFrame%22&type=code I agree with the trade off proposed here. I think not exposing the internal locations in such a visible way to users is more valuable than aiding developers in navigating the pandas codebase in this fashion. |
FWIW, for |
@jorisvandenbossche can you clarify the benefits for the top level functions. I'm able to explain what the user will see differently when we set the module for the objects, but not clear when it comes to the functions. |
eg. for a function
after setting
|
That's a good question. Based on what you show above, it might be good to just focus on the classes. However, it might depend on the console you are using. Or how you are looking at it. It's strange that for
So here the actual repr this shows it as Now, it's certainly less important for functions (and also less common to look at them that way), but I think there is still some value is hiding the full path in the output above. |
No problem. some functions have been done too. |
This is IPython logic. |
the discussion above lists the top level objects exposed in the In addition, following @rhshadrach comment #60268 (comment), the |
Currently the repr of the DataFrame class (and any other class or method in the main namespace) shows the "full code path" of where the object is actually defined:
while we could also make it show the code path of how it is publicly exposed (and expected to be imported and used):
The above can be achieved by setting the
__module__
attribute on the classes and methods. In numpy they already do this for several years, and so the repr of top-level functions or objects shows "numpy.<..>", and not things like "numpy.core.multiarray..". The main PR in numpy that implemented this: numpy/numpy#12382I think the main benefits are:
from pandas.core.frame import DataFrame
, a pattern that we often see in downstream packages). I think this would also help for makingpandas.core
private (and potentially renaming it, xref DEPR: pandas/core #27522, cc @rhshadrach)The main disadvantage is that we thus mask where an object lives, which makes it harder for contributors to figure that out. On the draft PR, @jbrockmendel also commented:
This does not change any
*
imports (it only changes the visual repr), but that aside, it certainly hides a bit more where something is defined, making it harder to find the location (the file) in the source code. But this masking is the purpose of the proposal, with the idea that this is better for users (see bullet points above). I certainly comes with the drawback for contributors, but in making the trade-off, there are much more users, so I would personally go with prioritizing that use case (and for contributors, there are still many other ways to find where something is defined: looking at our imports in the codebase, searching for "class DataFrame", ...).Overview of objects:
__module__
onSeries
#60263read_..
functionsconcat
,isna
,merge
, etcdate_range
,timedelta_range
etcNamedAgg
,IndexSlice
The text was updated successfully, but these errors were encountered: