-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: factor out multiple instances of dtype_backend
parameter descriptions into _shared_docs
#53881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like code checks are failing (also ensure you have the pre-commit checks configured) |
Thanks @mroeschke I need to have a look at it later |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your PR
has this been discussed anywhere? to be honest I'm not really keen on adding such complexity to docstrings.
I'm all for making reusable functions in source code, but in docs and tests, repetition isn't too bad
@MarcoGorelli, thanks for the response! I don't think it was discussed specifically anywhere (at least that I know of). Mainly, I was just noticing that some parameter descriptions had been defined in That got me thinking that maybe the Actually the original reason I ended up going down this path was that I meant to suggest some small edits to the wording of the All that being said, I am totally fine to not go this direction with the docstring and I definitely agree that too much string manipulation and preprocessing to create each docstring might just obfuscate the code's inherent documentation and I think over-coupling may definitely be a concern here, especially in cases where functions evolve in their own way and maybe parameters, whose implementation details were initially the same across each function they are associated with, will eventually come to mean slightly different things. In those cases, the shared docstring would have to be unshared again so that the docstring description is specific to one particular function and not the set of all functions it appears in. The counterpoint to that might be that because these are all IO functions, maybe the parameter is expected to produce identical, or at least analogous, behaviour across the whole suite of IO functions, and would lend itself well to being defined in a consistent way in each function, such that "sharing" that piece of the docstring makes sense. But again like you say I'm not sure if it's worth the trouble or the complexity of defining it in that way. The middle ground might be having an expectation that any PR-with a docstring edit could require that the change be made such that all similar parameters in related functions get the same treatment, i.e. in this case I'd just submit a PR where any formatting changes or edits to the The bottom line is I'm happy to help with whatever your group would prefer! Just glad to contribute little bits wherever it helps. |
Thanks for explaining OK yeah, if it's repeated this number of times, it's probably fine to share the docstrings. I get a bit annoyed when I see complex decorators which save something from being written twice, but if it's 16 times, then you're right, that's a different story 😄 will take a closer look later, but should be good |
@@ -352,13 +354,13 @@ def _parse_tfoot_tr(self, table): | |||
""" | |||
raise AbstractMethodError(self) | |||
|
|||
def _parse_tables(self, doc, match, attrs): | |||
def _parse_tables(self, dom_doc, match, attrs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems irrelevant (and I don't know if it affects anyone who might be subclassing) - can we revert this please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I made this change because I thought there was some conflict between the doc
parameter in the _parse_tables
function and the @doc
decorator being applied, but now that you've suggested to just go in and change each instance anyway, this change will certainly be reverted along with the others so no worries there.
Might be simpler to just do this to be honest, docstring substitutions are a common source of confusion (not just for new contributors, but for anyone) It's also harder to review Could you just make the changes to each place the description appears please? |
Keeping the documentation consistent is certainly an issue. It doesn't seem to me Python has sufficient tooling (including third party libraries, but I've only searched a little) to keep documentation DRY. Another way one can enforce consistency is via tests / linters. I've been wondering how difficult it would be to parse the docstrings into sections and detect when an argument is reused with different descriptions. Would need to have mechanisms to ignore cases and probably be opt-in rather than opt-out. |
Thank you, @MarcoGorelli and @rhshadrach for your feedback on this. It definitely seems like a tricky aspect of code maintenance, especially when many functions share parameters and have similar functionality. Based on my understanding of @MarcoGorelli's suggestion, I have abandoned the approach used in this pull request and have submitted a fresh PR where I have edited each instance of the This relates to the suggestion from @rhshadrach about maintaining consistency for parameters across multiple functions, which I was thinking about some more: the idea of handling it via linters and checks definitely feels like the right way to go if the desire is to have the docstrings written directly within each function in a readable way (i.e., without much or any format string substitution to obfuscate meaning). I would be interested in maybe tackling the issue of parsing and splitting up any Sphinx docstring via some side-project tool maybe that could be used for developing a linter but wanted to see if there are already similar tools available. An alternative I thought about would be taking this in the opposite direction, where instead of having plain-text docstrings in each function and using linters and checks to enforce consistency for shared/duplicated descriptions across functions whenever changes are made, the docstrings for public functions could instead be extracted out and maintained within a set of files dedicated to composing each docstring from a set of descriptions and snippets, many of which would be shared. The docstrings would then be attached to their respective functions via some decorator like Both have their pros and cons and I am not sure which would be ideal. Interested to hear more thoughts and I also want to do a bit more research on potential solutions to this. It is interesting as @rhshadrach mentioned that Python does not seem to have great tooling for this problem that has to be addressed by any code base. Curious to hear thoughts on this. |
superseded by #54104, so closing - thanks again! |
I've wondered if this is a good path too, but haven't pursued it enough to know the answer. My experience with pandas docstrings makes me think it is quite hard to have a mix of hard coded vs composed parts, but perhaps with an advanced enough solution it's possible to make this ergonomic. Another (quite minor) consideration is the impact of users with IDEs. For example, here is the docstring of DataFrameGroupBy.sum as reported by PyCharm. For pandas, the bar is quite high to be added as a hard dependency. I would imagine such a mechanism would need to be vendored by pandas rather than installed as a third party library. Still, perhaps other packages could benefit from such a third party library. |
dtype_backend
parameter description is repeated manually in 16 functions #53878