Skip to content

DOC: move info docs to DataFrameInfo #38062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Nov 29, 2021
Merged

Conversation

ivanovmg
Copy link
Member

@ivanovmg ivanovmg commented Nov 25, 2020

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Precursor for #37320
Move info docstring from pandas/core/frame.py to class DataFrameInfo using doc decorator.

@simonjayhawkins need your help here as discussed previously.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

@ivanovmg can you merge master

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 10, 2021
@jreback
Copy link
Contributor

jreback commented Feb 11, 2021

@ivanovmg if you can merge master

@ivanovmg
Copy link
Member Author

Merged.
@simonjayhawkins I need help on creating a shared docstring for DataFrame.info() and Series.info().
Currently this implementation fails as apparently I do not use doc properly.
Can you please help?

@mroeschke mroeschke mentioned this pull request Mar 10, 2021
5 tasks
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logs are expired and I can't see them. If the comments don't fix the issue you mention, I'll have a look after the CI reruns.

)


frame_subs = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you do this? This could be simply specified in the decorator, doesn't seem to add value.

So, instead of:

    @doc(BaseInfo.render, **frame_subs)

You can use:

    @doc(BaseInfo.render, klass='DataFrame', type_sub='and colmns', ...)

@@ -174,26 +302,26 @@ def render(
show_counts: Optional[bool],
) -> None:
"""
Print a concise summary of a %(klass)s.
Print a concise summary of a {klass}.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have variables in this docstring, this render should also have the @doc decorator, with the values of the variables here, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment!
I tried that, but the problem is that in future I would like to share the docstring with Series.info().
If we put @doc with params pertaining to dataframes onto BaseInfo.render, then how would we handle the same for series?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @doc decorator does some magic. Basically it renders the docstring (template) with the provided variables, and it stores it as the __doc__, so that's the docstring of the function/class. But also, it saves the original docstring before rendering the variables in something like __doc_original__ (forgot the actual name), so if you reuse that docstring somewhere else with different variables, it'll work as "expected".

@ivanovmg ivanovmg marked this pull request as ready for review June 3, 2021 11:25
@ivanovmg
Copy link
Member Author

ivanovmg commented Jun 3, 2021

@datapythonista can you please review the workaround I have just commited?

@ivanovmg ivanovmg requested a review from datapythonista June 3, 2021 11:27
@datapythonista
Copy link
Member

My understanding here is that we have a docstring in an object, and we want to reuse in another object too. With some small changes. This is done by:

@doc(name='render1')
def render1():
    """
    Docstring or {name}.
    """

@doc(render1, name='render2')
def render2():
    pass

With Appender and Supstitution, it made more sense to have docstrings in variables. here you write it normally in one object, and then pass that object name for doc for the other object to know where to get the base docstring for.

Then, if it makes things simpler, change what's being reused. In many cases it's preferrable to duplicate a bit of the docstring, or make a part of it not so generic, that have huge blocks of codes in the variables.

The important part is that once you finished the changes, you run validate_docstrings.py for your objects, check that things look good, that the variables are rendering well and with the values corresponding to the right object. And ideally, you can also build the html for the docs locally, and check that things look good.

@ivanovmg
Copy link
Member Author

ivanovmg commented Jun 4, 2021

My original intention was to do the following.

from abc import ABC, abstractmethod

from pandas.util._decorators import doc


class BaseInfo(ABC):
    @doc(klass='Base')
    @abstractmethod
    def render(self):
        """Info for {klass}"""


class DataFrameInfo(BaseInfo):
    @doc(BaseInfo.render, klass='DataFrame')
    def render(self):
        pass


class SeriesInfo(BaseInfo):
    @doc(BaseInfo.render, klass='Series')
    def render(self):
        pass


class DataFrame:
    @doc(DataFrameInfo.render)
    def info(self):
        return DataFrameInfo().render()


if __name__ == '__main__':
    print(f'BaseInfo.render(): {BaseInfo.render.__doc__}')
    print(f'DataFrameInfo.render(): {DataFrameInfo.render.__doc__}')
    print(f'SeriesInfo.render(): {SeriesInfo.render.__doc__}')
    print(f'DataFrame.info(): {DataFrame.info.__doc__}')

However, the problem is that when executing this script we get the following error:

Traceback (most recent call last):
  File "pandas\temp_doc.py", line 26, in <module>
    class DataFrame:
  File "pandas\temp_doc.py", line 28, in DataFrame
    def info(self):
  File "c:\users\mivanov4\code\pandas\pandas\util\_decorators.py", line 387, in decorator
    decorated.__doc__ = "".join(
  File "c:\users\mivanov4\code\pandas\pandas\util\_decorators.py", line 388, in <genexpr>
    component.format(**params)
KeyError: 'klass'

I guess the problem is that doc decorator does not support such multiple redirections.

Therefore, I extracted a constant INFO_DOCSTRING and formatted it.

======================
In comparison:

from abc import ABC, abstractmethod

from pandas.util._decorators import doc

INFO_DOCSTRING = """Info for {klass} in the constant"""

class BaseInfo(ABC):
    @doc(klass='Base')
    @abstractmethod
    def render(self):
        """Info for {klass} in the base class"""


class DataFrameInfo(BaseInfo):
    @doc(BaseInfo.render, klass='DataFrame')
    def render(self):
        pass


class SeriesInfo(BaseInfo):
    @doc(BaseInfo.render, klass='Series')
    def render(self):
        pass


class DataFrame:
    @doc(INFO_DOCSTRING, klass='DataFrame')
    def info(self):
        return DataFrameInfo().render()


if __name__ == '__main__':
    print(f'BaseInfo.render(): {BaseInfo.render.__doc__}')
    print(f'DataFrameInfo.render(): {DataFrameInfo.render.__doc__}')
    print(f'SeriesInfo.render(): {SeriesInfo.render.__doc__}')
    print(f'DataFrame.info(): {DataFrame.info.__doc__}')

Gives output:

BaseInfo.render(): Info for Base in the base class
DataFrameInfo.render(): Info for DataFrame in the base class
SeriesInfo.render(): Info for Series in the base class
DataFrame.info(): Info for DataFrame in the constant

So, docstrings for DataFrameInfo and SeriesInfo work fine, as expected.
But additional redirection from DataFrame.info() was not possible due to originally empty __doc__ for DataFrameInfo and SeriesInfo, I suppose.

Linking previous comment on the docstrings: #37320 (review) (the reason the docstrings are to be generic for both DataFrame and Series and done via doc decorator).

@datapythonista
Copy link
Member

This is cumbersome, better to avoid even if it was supported. If it's the same docstring make all them be based in the same base one. Also, are all these public? If they are not rendered in the docs, we probably don't care much on having tge docstring.

@ivanovmg
Copy link
Member Author

ivanovmg commented Jun 4, 2021

This is cumbersome, better to avoid even if it was supported. If it's the same docstring make all them be based in the same base one. Also, are all these public? If they are not rendered in the docs, we probably don't care much on having tge docstring.

Only DataFrame.info() is public.
Just to clarify, do you consider the current approach (committed) as cumbersome?

Another option, as you already mentioned above, is to simply have different docstrings for DataFrame.info and Series.info, but looks like @jreback preferred a more generic approach.

@datapythonista
Copy link
Member

Sorry, what I meant that is cumbersome is your example, where doc is being used to copy one docstring, and then used again to copy the copy.

Not so familiar with those DataFrameInfo class, but if the mehods you are documenting are not public, having those docstrings it's not very useful. Docstrings are useful for two things, for developers reading the code, and for generating the docs as html. With @doc, we ruin the former, so we don't need to repeat long docstrings rendered in the documentation. So, if a method is not public, in my opinion is as useful to have a doctring """Look this other object, does the same.""" as using @doc.

@ivanovmg
Copy link
Member Author

ivanovmg commented Jun 4, 2021

Right, there is no need to have docstrings inside the internal classes.
But the idea was to generate docstrings for DataFrame.info() and Series.info() (to be added) from within pandas/io/formats/info.py.

@@ -284,6 +425,17 @@ def memory_usage_bytes(self) -> int:
deep = False
return self.data.memory_usage(index=True, deep=deep).sum()

@doc(
INFO_DOCSTRING,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you keep INFO_DOCSTRING where it is and reference it here as BaseInfo.render.__doc__?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just tried that, but it messes with the indentation for some reason

image

@jreback
Copy link
Contributor

jreback commented Oct 4, 2021

@ivanovmg in theory this is a good change, if you want to merge master. ideally small changes in a series of PRs are better.

@jreback
Copy link
Contributor

jreback commented Nov 28, 2021

@ivanovmg can you merge master again pls. and ping when green.

@ivanovmg
Copy link
Member Author

@jreback, I merged master.
It is green.

@jreback jreback added this to the 1.4 milestone Nov 29, 2021
@jreback jreback merged commit c448ec7 into pandas-dev:master Nov 29, 2021
@jreback
Copy link
Contributor

jreback commented Nov 29, 2021

thanks @ivanovmg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants