Clarify how to cite pandas in scientific papers #24036
Comments
The paper linked from |
@TomAugspurger thanks for the fast response. So the correct BibTeX entry is:
@InProceedings{ mckinney-proc-scipy-2010,
  author = { Wes McKinney },
  title = { Data Structures for Statistical Computing in Python },
  booktitle = { Proceedings of the 9th Python in Science Conference },
  pages = { 51 - 56 },
  year = { 2010 },
  editor = { St\'efan van der Walt and Jarrod Millman }
}
I guess the only issue then is/was the dead link on https://pandas.pydata.org/talks.html to http://jarrodmillman.com/scipy2010/pdfs/mckinney.pdf, when it should link to http://conference.scipy.org/proceedings/scipy2010/pdfs/mckinney.pdf. An editor asked for a publisher, but then I'll just reply that there is no one with that role. |
We should eventually include a citation reference in the documentation, probably in |
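@mroeschke yes, I think it's helpful to include it anywhere an author would look for it. Over in the R world, most packages come with their own citation command, which makes it very convenient (and thus likely) for them to be cited correctly. Not sure whether such a discussion belongs here, but it seems to me that the work since 2010 could also be included in an "official" reference recommendation? |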
Some package (astropy?) was promoting a `<package>.__citation__` variable
for this:
http://docs.astropy.org/en/stable/changelog.html?highlight=__citation__#id58
…On Mon, Dec 3, 2018 at 4:00 AM Johannes Keyser ***@***.***> wrote:
@mroeschke <https://github.com/mroeschke> yes, I think it's helpful to
include it anywhere an author would look after it. Over in the R world,
most packages come with their citation command, which makes it very
convenient (and thus likely) to be cited correctly.
Not sure whether such a discussion belongs here, but it seems to me that
also the work since 2010 could be included into an "official" reference
recommendation?
|
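A rough sketch of the `__citation__` idea mentioned above: a package could expose a module-level string, and a small helper could read it when present. The module name `mypackage` and the helper `get_citation` are hypothetical and only for illustration; this is not a description of how astropy or pandas implements it.

```python
import types


def get_citation(module):
    """Return the module's ``__citation__`` string, or None if it has none."""
    return getattr(module, "__citation__", None)


# Hypothetical stand-in for a real package that defines ``__citation__``.
mypackage = types.ModuleType("mypackage")
mypackage.__citation__ = (
    "@InProceedings{mckinney-proc-scipy-2010,\n"
    "  author = {Wes McKinney},\n"
    "  title  = {Data Structures for Statistical Computing in Python},\n"
    "  year   = {2010}\n"
    "}"
)

print(get_citation(mypackage))
```

Returning `None` for packages without the attribute lets a caller fall back to whatever citation page the project documents.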
Does pandas have a DOI for native software citation, like matplotlib? See here, where I can get a bibtex entry for native software citation, depending on the version I use or a concept DOI. |
Pandas could also get a Research Resource Identifier (RRID), see e.g. matplotlib (RRID:SCR_008624). The RRIDs of the software I used were requested in a review for the European Journal of Neuroscience. |
I don't think pandas has either of those. |
Would the pandas team consider obtaining a DOI? Trying to establish citation principles that will give everyone due credit, as well as create traceable records of a work unit like a piece of software, the FORCE11 Software Citation Working Group published a paper presenting software citation principles. This includes the need to cite the software product directly, in addition to software papers (see section 6.2):
A DOI makes native software citation possible. Since |
Sure. What all does that involve? With
https://guides.github.com/activities/citable-code/ it seems like we can
just use Zenodo and each release gets its own DOI?
…On Tue, Sep 3, 2019 at 1:27 PM Iva Laginja ***@***.***> wrote:
Would the pandas team consider obtaining a DOI?
Trying to establish citation principles that will give everyone due
credit, as well as create traceable records of a work unit like a piece of
software, the FORCE11 Software Citation Working Group
<https://www.force11.org/about> published a paper presenting software
citation principles <https://www.force11.org/software-citation-principles>.
This includes the need to cite the software product directly,
*additionally* to software papers (see section 6.2):
6.2 *Software papers.* Currently, and for the foreseeable future,
software papers are being published and cited, in addition to software
itself being published and cited, as many community norms and practices are
oriented towards citation of papers. [...] *the software itself should be
cited on the same basis as any other research product; authors should cite
the appropriate set of software products*. If a software paper exists and
it contains results (performance, validation, etc.) that are important to
the work, then the software paper should also be cited. We believe that a
request from the software authors to cite a paper should typically be
respected, and the paper cited *in addition* to the software.
A DOI makes native software citation possible.
Since pandas is one of the more frequently used packages in many
sciences, giving people the opportunity to stick to this citation
recommendation might spread this best practice in the community. It would
be great to see this included and move towards this as a general standard
in citing software. It sure adds to the maintenance overhead of the
software, but I would recommend considering it.
|
Yup, that's exactly it. While Zenodo is far from the only solution, it's definitely one of the most widely used ones, and the hook with GitHub is super simple. Yes, each release will get its own DOI so that people can cite the software specifically in the state it was in when they used it. On top of that, each software package also gets a so-called "Concept DOI", which will always refer to the latest release, and the user can choose which one to cite. The last thing would then be to add an encouragement to cite the software DOI to the citation instructions given for the software. I just realized that this repo doesn't have a |
Cool, thanks. I'll see about adding the Zenodo GitHub integration. One final question (since you seem to be familiar with this 😄): if we add a CITATION.txt or a badge to the README, does it have to be updated with each release? Or is there some kind of living "master" DOI that refers to pandas, in addition to the per-release DOIs? |
Yes, that would be the "concept DOI"! You can choose to have that in the CITATION.txt and/or in the badge, and then you don't have to worry about updating it :) You can find more on the DOI business here: At some point you can also expect users to know, or at least to research, what they're doing, so sticking in the "overall" DOI (i.e. the Concept DOI) should be more than fine. |
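To make the per-release versus concept DOI choice concrete, here is a minimal sketch of a helper that formats a software citation as a BibTeX `@misc` entry. The function `software_bibtex`, the entry key, the author string, the version, and the DOI `10.5281/zenodo.0000000` are all placeholders assumed for illustration; the real DOI should be taken from the project's Zenodo record.

```python
def software_bibtex(key, author, title, version, year, doi):
    """Build a BibTeX @misc entry that cites a software release by DOI."""
    fields = [
        ("author", author),
        ("title", f"{title} (version {version})"),
        ("year", str(year)),
        ("doi", doi),
        ("url", f"https://doi.org/{doi}"),
    ]
    # One "name = {value}" line per field, joined with commas.
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


# Placeholder values only: substitute the DOI of the release actually used,
# or the concept DOI if no specific version needs to be pinned.
entry = software_bibtex(
    key="pandas-software",
    author="The pandas development team",
    title="pandas",
    version="X.Y.Z",
    year=2020,
    doi="10.5281/zenodo.0000000",
)
print(entry)
```

Passing a per-release DOI pins the citation to the exact version used, while passing the concept DOI keeps the entry valid across releases at the cost of that precision.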
Perfect, thanks. @jreback @jorisvandenbossche I've requested access to add the Zenodo GitHub application to pandas-dev. |
all sgtm |
Has pandas obtained a DOI or RRID yet? I would like to properly cite, in addition to the paper citation |
ah, I have found the DOI here: https://zenodo.org/record/3630805#.XjI9CyOIYdU |
Cool, so the only thing missing is citation instructions in the repository, or anywhere really. Maybe time to make a citation.txt file in the repo? @TomAugspurger |
I thought we had one somewhere on the website.
…On Wed, Jan 29, 2020 at 8:58 PM Iva Laginja ***@***.***> wrote:
Cool, so the only thing they're missing are citation instructions in their
repository, or anywhere really. Maybe time to make a citation.txt file in
the repo? @TomAugspurger <https://github.com/TomAugspurger>
|
Here? I browsed through all the pages but couldn't find any, maybe I was searching in the wrong place? |
https://www.scipy.org/citing.html#pandas has a citation, and link out to the SciPy2010 conference proceedings with Bibtex entry |
So I guess now that pandas has a way to cite the native software itself, is there a way to get that updated to include that as well? |
It seems to me, now that the DOI exists, this is a more proper way to cite than the link at https://www.scipy.org/citing.html#pandas. If this is correct, the latter link should be updated. |
Absolutely, the Scipy link should be updated. However, I would be hesitant to delete the pandas paper from any citation instructions, since especially scientists usually get more credit for papers than software. That's a decision that the pandas team will have to make internally. And yes, I would update the "cite" section on https://pandas.pydata.org as well. |
Happy to take a PR to add a .txt file for citation purposes (numpy / scipy likely have an example of how this could look). |
Interestingly, none of them do, as the scientific community often relies more on paper citations than on actually giving credit to the software and its writers themselves (find the problem there :P). Full disclosure, I also talked to them about including the native software citation in addition to the bib entries for the papers, which is currently in this PR. |
Note our citation page is here: https://pandas.io/about/citing.html Still could update with the DOI and possibly add a |
I submitted a PR with #32388 to get things going. |
The link is dead. This is the up-to-date url: https://pandas.pydata.org/about/citing.html |
Given that we have https://pandas.pydata.org/about/citing.html, I think that's sufficient for now. Going to close, but if we want to reference this in other ways we can open a new issue for it |
Keep in mind that this citation has two parts: McKinney's paper is "just" one part of the proceedings, so the complete entry is a bit more complex. In German we say "Sammelwerk" (collected volume) and "Sammelwerkbeitrag" (contribution to a collected volume). It means there is a book (with an editor) where each chapter has a different author.
|
Dear developers,
first thank you for your great work.
I'd like to reference my use of pandas in my scientific work, but have been unable to find a recommendation on how to cite it.
I'm aware of https://pandas.pydata.org/talks.html, but the "papers" seem to refer to conference proceedings (slides) that lack a publisher, and seem outdated?
Maybe something similar to SciPy's recommendations could be added?