DOC: DataFrame() not included in API reference #4790

Closed
jorisvandenbossche opened this issue Sep 9, 2013 · 10 comments · Fixed by #5160

@jorisvandenbossche
Member

The docstring of DataFrame() is not present in the documentation (so no 'generated/pandas.DataFrame.html'), because it is not included in the API reference (http://pandas.pydata.org/pandas-docs/dev/api.html).

But there is one for DataFrame.__init__() (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.__init__.html#pandas.DataFrame.__init__), which does not seem that useful.
Do I replace the init version with DataFrame itself, or should I just add a line for DataFrame in api.rst?

The same applies to Series and Panel.

@jorisvandenbossche
Member Author

I went ahead and tried to include pandas.DataFrame in the API, and it seems that a lot of methods of DataFrame are missing in the documentation.
(This is because the docstring page of DataFrame lists all methods, since it is a class, with links to the docstring pages of those methods, but Sphinx was complaining that it could not find a lot of them.)

It is a huge list:

DataFrame.all
DataFrame.as_blocks
DataFrame.at_time
DataFrame.between_time
DataFrame.bfill
DataFrame.consolidate
DataFrame.divide
DataFrame.dot
DataFrame.eq
DataFrame.ffill
DataFrame.ge
DataFrame.get
DataFrame.get_ftype_counts
DataFrame.get_value
DataFrame.get_values
DataFrame.gt
DataFrame.icol
DataFrame.iget_value
DataFrame.interpolate
DataFrame.irow
DataFrame.iterkv
DataFrame.keys
DataFrame.le
DataFrame.load
DataFrame.lt
DataFrame.mask
DataFrame.mod
DataFrame.ne
DataFrame.pivot_table
DataFrame.pow
DataFrame.product
DataFrame.rename_axis
DataFrame.rmod
DataFrame.rpow
DataFrame.save
DataFrame.set_value
DataFrame.squeeze
DataFrame.swapaxes
DataFrame.to_dense
DataFrame.to_latex
DataFrame.to_sql
DataFrame.to_wide
DataFrame.tshift
DataFrame.where

These are only the missing methods for DataFrame. You can see the full lists for DataFrame, Series and Panel (methods and attributes) here:
jorisvandenbossche@6363d21
(I included it as a comment in api.rst, so that the pages are generated, but it does not appear in the output of api.rst itself)

I guess that for things like pandas.Timestamp, pandas.Index and maybe also for pandas itself, some functions will be missing from the documentation as well.
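
A list like the one above can be approximated without running the doc build, by comparing the public callables on a class against the names already referenced in api.rst. The following is only a minimal sketch: the `Frame` class and the `documented` set are hypothetical stand-ins (for DataFrame and for the names parsed out of api.rst), and the helper names are not part of pandas or Sphinx.

```python
import inspect


def public_callables(cls):
    """Return the sorted names of public callables available on cls."""
    return sorted(
        name
        for name, _member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )


def missing_from_docs(cls, documented):
    """Return 'Class.method' names that are public but not in `documented`."""
    prefix = cls.__name__
    return [
        f"{prefix}.{name}"
        for name in public_callables(cls)
        if f"{prefix}.{name}" not in documented
    ]


# Hypothetical stand-in for a class such as DataFrame.
class Frame:
    def head(self): ...
    def where(self, cond): ...
    def to_latex(self): ...
    def _consolidate(self): ...  # private, so it is ignored


# Hypothetical set of names already referenced in api.rst.
documented = {"Frame.head"}

print(missing_from_docs(Frame, documented))
```

With a real class, `documented` would have to be collected from the autosummary entries in api.rst; how to parse those is left out here.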

@jreback
Contributor

jreback commented Sep 28, 2013

@jorisvandenbossche what are we doing with this?

@cpcloud
Member

cpcloud commented Sep 28, 2013

would be nice to get this in ... i'm constantly referring to things with sphinx but they never link because this isn't defined

@jreback
Contributor

jreback commented Sep 28, 2013

The index stuff is included here: #4706, which is almost ready to merge

@jreback
Contributor

jreback commented Sep 28, 2013

FYI, I wouldn't include ALL methods; some are pseudo-internal, e.g. get_values, get_value, set_value, consolidate, and some are older or due to be replaced, e.g. icol, to_wide, iterkv, comparators ne, lt, ...

@cpcloud
Member

cpcloud commented Sep 28, 2013

yes ... i just meant the constructors

@jorisvandenbossche
Member Author

To split the discussion into parts:

Document all methods or not

@jreback Personally, I think that all methods that are available to users should be documented (because a user can stumble on them). Even if some are pseudo-internal, I think we should make the distinction clear:

  • is it available to a user: document it in the api docs (even though it is maybe not the most used function for most users, or not used in the narrative docs)
  • is it a real internal function: then it shouldn't be available to users, and maybe remove it from the public api and make it available in another way for those who need it (but I have to be honest, I don't know if this is possible).

Apart from this, there are also a lot of functions in the above list of methods missing in the docs that are not internal at all (like to_latex, interpolate, pivot_table, ...).

How to include it

At the moment I looked at two ways to include all constructors/methods:

  • Add the DataFrame, Series, etc. classes to the API docs (as in the commit jorisvandenbossche/pandas@6363d21). Because Sphinx autodoc includes a list of all methods/attributes on the class's docstring page, you get a bunch of warnings for links to methods that do not exist in the docs.
    Therefore, in the linked commit I added all methods I got a warning for in a comment for the time being (so the docstring pages get built and the links to them work). This can be temporary, and a reminder to us of which functions/methods we should still add manually to one of the sections in the API docs.
  • Another approach, which I found in numpy, is to automatically generate this kind of comment on the docstring page of the class itself by providing a class template for the Sphinx autosummary extension:
    https://raw.github.com/numpy/numpy/master/doc/source/_templates/autosummary/class.rst
    http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html#numpy.ndarray -> 'Show source'
    This way, all methods listed on the class docstring page are guaranteed to have a working link to a docstring page.
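
To make the "comment" trick in both approaches concrete, here is a rough sketch of the kind of RST that ends up in the source: an autosummary directive nested inside an RST comment, so that the stub pages are generated but nothing is rendered. It is written as a plain Python string builder rather than as the actual template; the function name `hidden_autosummary` is hypothetical, and the exact layout of the numpy template may differ.

```python
def hidden_autosummary(class_name, method_names):
    """Build an RST comment that wraps an autosummary directive.

    Assumption (taken from the discussion above): autosummary still
    generates pages for entries listed inside a comment, while the
    comment itself does not appear in the rendered output.
    """
    lines = [
        "..",                      # start of an RST comment
        "   .. autosummary::",     # directive nested inside the comment
        "      :toctree:",
        "",
    ]
    for name in sorted(method_names):
        if name.startswith("_"):
            continue  # leave out private and dunder names
        lines.append(f"      {class_name}.{name}")
    return "\n".join(lines) + "\n"


print(hidden_autosummary("DataFrame", ["where", "all", "_private"]))
```

In the first approach this text would be pasted into api.rst by hand; in the second, the template produces the equivalent for every class page.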

If the conclusion of the first discussion is that not all methods should be documented, then the automatic documenting/listing of all methods by Sphinx autodoc can be turned off, or specific methods can be listed in the exclude-members option of autoclass.

@jreback
Contributor

jreback commented Oct 4, 2013

@jorisvandenbossche

either approach looks fine to me (is first simpler though?)

@cpcloud ?

@jorisvandenbossche
Member Author

@jreback I will try some things out, but at the moment I still have a problem with the methods listed twice in the class rendered html page.
As such, the second approach is simpler, because the pages are added automatically (but only listed in the list of methods of the class). But the first (adding manually in api.rst) has the advantage that it is clear which methods/functions are not yet referenced in api.rst (because otherwise this will give a warning during the doc build).

@jorisvandenbossche
Member Author

I submitted a PR. I went for the second approach (automatically adding method pages), as I think the approach of @JanSchulz (https://github.com/pydata/pandas/wiki/Undocumented-public-functions) is a better way to keep track of the methods/functions that are not yet included in the docs.
