Update citation webpage #33311

Merged
merged 4 commits into from
Apr 10, 2020

Conversation

ivalaginja
Contributor

@ivalaginja ivalaginja commented Apr 6, 2020

Follow-up of #32388, addressing #24036

I will leave it to the pandas team to decide whether to put in a BibTeX entry with the concept DOI or with a specific version; some options for dealing with this are described in this comment.

Note how this comment thread on the previous PR asked to replace the author list with "The pandas development team". However, if users go to Zenodo to get the correct BibTeX entry for the version they're actually using, their citation will contain the full author list provided on Zenodo.

@datapythonista
Member

I'm not in academia, so I'm not an expert in citations. But I'd say users arriving at this page would like to see how to cite pandas (a single way) or, if there is more than one way, when to use each.

@wesm I guess you're the right person to ask. How should pandas be cited?

@ivalaginja
Contributor Author

ivalaginja commented Apr 6, 2020

There's a failing check on the docs that I don't fully understand. If someone can help me figure it out, I can go in and fix it (or feel free to take that over).

@datapythonista
Member

The error is unrelated to this PR; it's being addressed in #33309.

@wesm
Member

wesm commented Apr 6, 2020

Here's the canonical citation:

https://conference.scipy.org/proceedings/scipy2010/mckinney.html

DOI 10.25080/Majora-92bf1922-00a

BibTeX

@InProceedings{ mckinney-proc-scipy-2010,
  author    = { {W}es {M}c{K}inney },
  title     = { {D}ata {S}tructures for {S}tatistical {C}omputing in {P}ython },
  booktitle = { {P}roceedings of the 9th {P}ython in {S}cience {C}onference },
  pages     = { 56 - 61 },
  year      = { 2010 },
  editor    = { {S}t\'efan van der {W}alt and {J}arrod {M}illman },
  doi       = { 10.25080/Majora-92bf1922-00a }
}

@datapythonista
Member

Thanks @wesm. @ivalaginja, can you please update the page to contain only the citation mentioned by Wes? Thanks!

@datapythonista
Member

I'm not a researcher, so maybe I'm missing something obvious. But are users expected to cite both the paper and the Zenodo reference? Or will they cite one or the other depending on the context? It feels like Wes's paper should be enough; I'm not sure what I'm missing.

@ivalaginja
Contributor Author

The whole point of this PR and the issue it addresses is that the correct way to cite software is to cite the software directly. From the perspective of "proper" citation, this is enough. However, most merit in the academic world is gained through citations to papers, so it is usual to cite both when both are available and the authors ask for it.

So the bare minimum is to cite the software alone; the usual case that satisfies this minimum is to cite both, in order to also provide academic credit. The case where only a paper is cited and the software product itself is not, is a faulty citation, unless the software never got published properly and hence does not have an archived identifier to be cited with.

@wesm
Member

wesm commented Apr 7, 2020

I don't have a strong position. I have seen academic papers that have failed to cite pandas beyond a link to the project website so it would be good for it to be clear when people go looking for a citation (they may not, though)

@ivalaginja
Contributor Author

ivalaginja commented Apr 7, 2020

I have seen that a lot too and, unfortunately, it keeps happening. Part of it comes from the fact that writing software is not valued as highly in academia as writing and publishing a paper. For licensed tools that users pay for, that is usually not an issue, as the license holders get money and mostly don't care about academic credit.

In the past years though there has been a larger and larger shift to free and open-source tools, especially with Python. Most people simply don't know about the difference between making software publicly available (e.g. put it on GitHub) and publishing their software in an archive that creates a permanent record (e.g. Zenodo). The interplay between people not knowing how to cite software because they haven't seen it enough yet, but also not knowing how to publish their own software products in order to make them citable leads to an embarrassingly high number of papers that "cite" software by providing URLs (non-permanent by definition).

It's a very interesting topic if you're into software publication and a great showcase of the rigidity of the academic world :)

@ivalaginja
Contributor Author

ivalaginja commented Apr 7, 2020

This is why it certainly helps to make citation instructions clearer. I didn't change the wording on the pandas citation page because my intent was to get the software citation in there, not to change what the pandas team expects from its users, even if there is a strong recommendation. If you want, I can change the part "we would appreciate citations to the published software and the following paper" to "we would kindly ask you to cite both the software and the following paper".

@datapythonista
Member

I'm not in academia, so I'm not an expert, but I see that almost every project in the ecosystem presents a paper to cite. A couple of them offer a citation to the website, and only matplotlib has some Zenodo badges.

@TomAugspurger I saw your comments in the issue. Are we happy to have a Zenodo badge that has to be updated at every release? Is it worth the effort? If we want it, do we want it in the README, on the citing page, or in both?

@jorisvandenbossche I think you've been in academia, any opinion here?

@TomAugspurger
Contributor

TomAugspurger commented Apr 7, 2020 via email

@ivalaginja
Contributor Author

Yes there is. To quote from the link in the PR description:
"If you wanted to give some example bibtex you could replace the authorship with "The Sherpa Developers", simplify the title, remove the version key, and use the concept DOI. Then you could tell your users to "follow the concept DOI to find the that version/author/DOI information relevant to the code version they used." This way if they are lazy they are at least citing the lazy thing (a concept DOI sans authors or version) and not some blended thing."

Do you want me to drop in the general Zenodo citation instead? That would mean that you don't need to update this with every release.
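
For illustration only, the recipe in that quote (team name as author, simplified title, no version key, concept DOI) could be sketched roughly as below. The `@misc` entry type, the citation key, the helper name `concept_bibtex`, and the DOI value are all placeholders I am assuming here, not what Zenodo exports or what the pandas page should necessarily contain.

```python
def concept_bibtex(key, team, title, year, concept_doi):
    """Build a version-independent BibTeX entry (hypothetical helper).

    Deliberately has no 'version' field, so the entry stays valid
    across releases; the DOI is expected to be the concept DOI.
    """
    fields = {
        # Extra braces stop BibTeX from parsing the team name as a person.
        "author": "{" + team + "}",
        "title": title,
        "year": str(year),
        "publisher": "Zenodo",
        "doi": concept_doi,
        "url": "https://doi.org/" + concept_doi,
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"


entry = concept_bibtex(
    key="pandas-software",               # assumed citation key
    team="The pandas development team",
    title="pandas",
    year=2020,
    concept_doi="10.5281/zenodo.XXXXXXX",  # placeholder, not a real DOI
)
print(entry)
```

Users who want the exact version they ran would still follow the concept DOI to Zenodo and export the version-specific entry there.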

@datapythonista
Member

I would have the version-independent Zenodo badge in the README, and also a link to the version-independent record after the BibTeX entry that Wes sent. Does that make sense?

@ivalaginja
Contributor Author

It does! And I will also leave a note in there to encourage people to go find the version-specific BibTeX entry, but if they use the general one, nothing is lost. Sounds like a good plan!

@datapythonista
Member

Sorry, I think there has been some misunderstanding. What I understood was that we were going to provide only the BibTeX reference for the pandas paper, like all the other projects in the community, and then add the SVG badge both in the README and after the paper's BibTeX.

The current proposal in this PR asks people to add two references to pandas, which seems unnecessary, and I don't expect people to do it. I wouldn't be creative; I'd follow what the rest of the community does and just give the paper to cite. And if some people find the Zenodo thing useful, we can surely have the badge. But I'd avoid confusion about how we encourage people to cite pandas.

@ivalaginja
Contributor Author

In that case, I will retract my PR, as this is completely missing the point of it, and let you take over on how to deal with your citations. Just to reiterate, my suggestion was to move away from the poor standard practice of never mentioning the software directly. I am confident that people who are capable of writing a paper are also capable of following simple instructions on what they need to cite if they find the authors' request. The submitted PR is not about being creative - I did not come up with this myself (see sources below). And the "zenodo thing" does have a purpose beyond providing an additional pretty badge.

I don't want to waste your time further if you don't see the benefit in this. Please do reach out in the future if you'd like to continue the discussion. Note that I was trying to motivate the team to consider following the recommendations put forward by the FORCE11 Software Citation Working Group, which they published, among other resources, in their paper presenting software citation principles. The particular section I am referring to is about citing the software product directly, in addition to software papers (see section 6.2):

6.2 Software papers. Currently, and for the foreseeable future, software papers are being published and cited, in addition to software itself being published and cited, as many community norms and practices are oriented towards citation of papers. [...] the software itself should be cited on the same basis as any other research product; authors should cite the appropriate set of software products. If a software paper exists and it contains results (performance, validation, etc.) that are important to the work, then the software paper should also be cited. We believe that a request from the software authors to cite a paper should typically be respected, and the paper cited in addition to the software.

I will point out that I am aware that these are a set of recommendations, but I also highly recommend diving a little deeper into the matter to see how all of these things work together. In the end, it is you (the team) who decide whether you would like to be cited properly or not, and whatever you put forward, an author should adhere to.

In my personal experience, the confusion usually arises when people try to actually follow these principles but packages do not provide a native software citation they could use.

@jreback jreback added this to the 1.1 milestone Apr 10, 2020
@jreback jreback merged commit 40fd73a into pandas-dev:master Apr 10, 2020
@jreback
Contributor

jreback commented Apr 10, 2020

merging this, thanks @ivalaginja

this is a net improvement, especially if it reduces friction for citers.

@ivalaginja ivalaginja deleted the update-citation-page branch May 7, 2020 10:10

6 participants