Fixed issue #59670. DOC #59714
StaticAccess commented Sep 5, 2024
- closes DOC: Document that DataFrame.from_records()'s columns argument also acts as "include" #59670
@@ -2126,7 +2126,8 @@ def from_records(
 associated with them, this argument provides names for the
 columns. Otherwise this argument indicates the order of the columns
 in the result (any names not found in the data will become all-NA
-columns).
+columns). Additionally, specifying `columns` will limit the DataFrame to only
+include the specified columns, similar to an "include" or "usecols" functionality.
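A quick way to see the behavior this docstring change is about (a sketch, assuming a recent pandas release; the record values are made up for illustration): when the records are dicts, so every field already has a name, `columns` orders the result, drops any field that is not listed, and adds an all-NA column for a listed name that is absent from the data.

```python
import pandas as pd

# Illustrative records whose fields already carry names (the dict keys).
records = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 10, "b": 20, "c": 30},
]

# `columns` sets the order ("c" before "a"), leaves out the unlisted
# field "b", and turns "z", which no record contains, into an all-NA column.
df = pd.DataFrame.from_records(records, columns=["c", "a", "z"])

print(df.columns.tolist())  # ['c', 'a', 'z']
```

The dropped `"b"` field is the "include" effect that issue #59670 asks to have documented.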
I would suggest simplifying this language as follows:
Otherwise this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns) and limits the data to these columns if not all column names are provided.
I don't think it's worth mentioning "include" or "usecols" because it's better to keep the description brief.
Current:
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise, this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns).
Proposed 1:
Column names to use. If the passed data do not have names associated with them, this argument provides names for the columns. Otherwise, this argument indicates the order of the columns in the result (any names not found in the data will become all-NA columns) and limits the data to these columns if not all column names are provided.
Proposed 2:
The columns argument specifies the column names for the DataFrame. If the data does not have column names, this argument assigns them. If the data already includes column names, this argument determines the order of the columns and limits the DataFrame to include only the columns listed. Any columns not specified will be excluded.
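For contrast, the first case in these wordings (data without names) looks like this; again a sketch with made-up rows, assuming a recent pandas release. With plain tuples, `columns` only names the positional fields, one name per position, so the selecting behavior discussed here applies when the records already have names, such as dict keys.

```python
import pandas as pd

# Illustrative rows: plain tuples carry no field names.
rows = [(1, "x"), (2, "y")]

# Here `columns` simply supplies a name for each position.
df = pd.DataFrame.from_records(rows, columns=["num", "letter"])

print(df.columns.tolist())  # ['num', 'letter']
```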
Would this revision work, or do you think there's a better way to phrase it? I’d love to hear your thoughts. Thanks so much for your time!
I recommend leaving the first sentences alone, since they're not part of this issue, and not limiting the scope of the argument to DataFrames. If it's not working for other types, that's something to fix rather than to document as a bug / limitation (see e.g. this issue that was just filed: #59717).
Here's a polite reply to that message:
Thank you for your feedback! I agree, keeping the first sentences unchanged makes sense, and addressing the broader scope beyond just DataFrames is the right approach. If this affects other types, fixing the issue rather than documenting a limitation would indeed be the best course of action. Thanks for pointing that out!
Looks like this issue has already been addressed. Thanks for the PR, but closing.