
Adding sample datasets to be used in the documentation #19933


Closed
wants to merge 1 commit into from


datapythonista
Member

@jreback
Contributor

jreback commented Mar 1, 2018

I thought we had an issue about this already, IIRC @mrocklin either made it or commented (was for a slightly different purpose though).

@jreback
Contributor

jreback commented Mar 1, 2018

We already have lots of data constructors in pandas.util.testing, though the ones here are 'nicer'. These would need testing (e.g. do they run), and can be de-privatized (no leading _).

@jorisvandenbossche
Member

jorisvandenbossche commented Mar 1, 2018

> I thought we had an issue about this already, IIRC @mrocklin either made it or commented (was for a slightly different purpose though).

I don't know if there is an older issue, but this is related to what was recently discussed in #19710, which @datapythonista linked to.

> We already have lots of data constructors in pandas.util.testing, though these are 'nicer' ones.

Indeed, I don't think the ones in util.testing are suitable for this, as the whole purpose of the datasets in this PR is to be 'nicer', relatable small DataFrames.

@datapythonista
Member Author

Any more thoughts on this? Knowing which data to use for the examples in the docstrings is the main blocker for the sprint. Any feedback on how you think we can improve this first draft is highly appreciated. Thanks!

@jorisvandenbossche
Member

This needs to be imported in some __init__ files, as otherwise you cannot do pd.io.samples.... Or what would be the intended use in the docs?
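The point about `__init__` files is ordinary Python import behaviour: importing a package does not import its submodules unless some `__init__.py` does so explicitly. The sketch below shows this with a throwaway package named `demo_pkg` (a hypothetical stand-in for `pandas`, with `demo_pkg/io/samples.py` playing the role of the proposed `pandas/io/samples.py`); it is an illustration of the mechanism, not the layout the PR used.

```python
import importlib
import pathlib
import sys
import tempfile


def _purge(prefix: str) -> None:
    """Drop cached modules so each run imports the freshly written files."""
    for name in [m for m in sys.modules if m == prefix or m.startswith(prefix + ".")]:
        del sys.modules[name]


def samples_reachable(import_in_init: bool) -> bool:
    """Build a throwaway package demo_pkg/io/samples.py (hypothetical) and
    report whether `demo_pkg.io.samples` is reachable after `import demo_pkg`."""
    with tempfile.TemporaryDirectory() as tmp:
        io_dir = pathlib.Path(tmp) / "demo_pkg" / "io"
        io_dir.mkdir(parents=True)
        # The top-level package imports its `io` subpackage.
        (io_dir.parent / "__init__.py").write_text("from . import io\n")
        # The only difference between the two cases: does io/__init__.py
        # import the `samples` submodule?
        (io_dir / "__init__.py").write_text(
            "from . import samples\n" if import_in_init else ""
        )
        (io_dir / "samples.py").write_text(
            "def countries():\n    return ['Angola']\n"
        )
        sys.path.insert(0, tmp)
        _purge("demo_pkg")
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            return hasattr(pkg.io, "samples")
        finally:
            sys.path.remove(tmp)
            _purge("demo_pkg")


print(samples_reachable(import_in_init=False))  # False
print(samples_reachable(import_in_init=True))   # True
```

Without the import in `io/__init__.py`, attribute access like `pd.io.samples` fails after a plain `import pandas`; users would otherwise have to write `import pandas.io.samples` themselves.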

import pandas


def _countries_with_penguins():
Contributor

these need to be de-privatized (no leading _)
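By Python convention (PEP 8), a leading underscore marks a name as internal, so a constructor meant for users and documentation examples should not carry one. One mechanical consequence can be shown directly: when a module defines no `__all__`, `from module import *` skips underscore-prefixed names. The sketch uses an in-memory module with the hypothetical name `samples_demo`; the public function name is an assumed de-privatized spelling, not one confirmed by the PR.

```python
import sys
import types


def star_import_names(module_source: str) -> list:
    """Load source into a throwaway module (hypothetical name `samples_demo`)
    and return the names that `from samples_demo import *` would bind."""
    module = types.ModuleType("samples_demo")
    exec(module_source, module.__dict__)
    sys.modules["samples_demo"] = module
    try:
        namespace = {}
        exec("from samples_demo import *", namespace)
        # exec() adds __builtins__ to the namespace; ignore dunder names.
        return sorted(name for name in namespace if not name.startswith("__"))
    finally:
        del sys.modules["samples_demo"]


source = (
    "def _countries_with_penguins():\n"
    "    pass\n"
    "def countries_with_penguins():\n"
    "    pass\n"
)
print(star_import_names(source))  # ['countries_with_penguins']
```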

    columns = ('Code', 'Name', 'Capital', 'Continent',
               'Penguin species', 'Avg. temperature')
    data = [
        ('AO', 'Angola', 'Luanda', 'AF', 1, 21.55),
Contributor

these need tests
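A minimal sketch of what the two review comments ask for, assuming a hypothetical public name `countries_with_penguins` (the de-privatized form of `_countries_with_penguins`). Only the columns and the first row visible in the review excerpt are used; the rest of the original `data` list is not shown, so it is left out rather than guessed. The test is a smoke test in the sense jreback describes ("do they run"), plus basic shape and dtype checks.

```python
import pandas as pd


def countries_with_penguins() -> pd.DataFrame:
    """Public (no leading underscore) sample-data constructor.

    Hypothetical sketch: only the first row from the PR excerpt is included.
    """
    columns = ['Code', 'Name', 'Capital', 'Continent',
               'Penguin species', 'Avg. temperature']
    data = [
        ('AO', 'Angola', 'Luanda', 'AF', 1, 21.55),
    ]
    return pd.DataFrame(data, columns=columns)


def test_countries_with_penguins():
    # Smoke test: the constructor runs and returns a non-empty DataFrame.
    df = countries_with_penguins()
    assert isinstance(df, pd.DataFrame)
    assert not df.empty
    # The columns match the ones declared in the constructor.
    assert list(df.columns) == ['Code', 'Name', 'Capital', 'Continent',
                                'Penguin species', 'Avg. temperature']
    # Numeric columns are inferred as integer and float, respectively.
    assert df['Penguin species'].dtype.kind == 'i'
    assert df['Avg. temperature'].dtype.kind == 'f'


test_countries_with_penguins()
```

In a real test suite, the final call would be dropped and the test collected by pytest instead.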

@datapythonista
Member Author

Given what has been discussed in #19710, it probably makes more sense to simply agree on some general ideas about which data to use, and then build custom datasets, as simple and illustrative as possible, for each case. So I'm closing this.
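A sketch of the per-case approach: the docstring example builds a tiny DataFrame inline with only the columns the function needs, rather than loading a shared sample dataset. `count_countries` is a hypothetical function invented for illustration, and the small runner only checks that the example executes and matches its expected output.

```python
import doctest

import pandas as pd


def count_countries(df: pd.DataFrame) -> int:
    """Return the number of distinct countries in ``df``.

    Hypothetical function, for illustration only. The example builds a tiny
    DataFrame inline with just the columns this function needs:

    >>> df = pd.DataFrame({'country': ['Angola', 'Chile', 'Chile'],
    ...                    'year': [2017, 2017, 2018]})
    >>> count_countries(df)
    2
    """
    return int(df['country'].nunique())


def run_examples():
    """Run the docstring examples of count_countries; return TestResults."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner()
    globs = {'pd': pd, 'count_countries': count_countries}
    for test in finder.find(count_countries, globs=globs):
        runner.run(test)
    return runner.summarize(verbose=False)


print(run_examples())
```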


Issue this pull request would have closed if merged:
DOC: develop a set of standard example DataFrames for use in docstring examples
3 participants