-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
added f strings and typing to frame.py #30021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR - can you run flake8 that will fix the github actions code checks CI failure: https://github.com/pandas-dev/pandas/pull/30021/checks?check_run_id=332186140#step:6:41 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR - generally looks good - few minor comments
pandas/core/frame.py
Outdated
"({cols:d} != {counts:d})".format( | ||
cols=len(cols), counts=len(counts) | ||
) | ||
"Columns must equal counts " f"({len(cols)} != {len(counts)})" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put all of this line inside the f string for readability
pandas/core/frame.py
Outdated
raise TypeError( | ||
err_msg + " Received column of type {}".format(type(col)) | ||
) | ||
raise TypeError(err_msg + f" Received column of type {type(col)}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could just include the err_msg inside f string
…ame_typing_fstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good - some minor edits
Co-Authored-By: William Ayd <[email protected]>
Co-Authored-By: William Ayd <[email protected]>
Co-Authored-By: William Ayd <[email protected]>
Co-Authored-By: William Ayd <[email protected]>
Co-Authored-By: William Ayd <[email protected]>
pandas/core/frame.py
Outdated
f"{err_msg} Received column of type {type(col)}" | ||
"array, or a list containing only valid column keys and " | ||
f"one-dimensional arrays. Received column of type {type(col)}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why's this changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake, I misinterpreted a previous comment
pandas/core/frame.py
Outdated
@@ -4423,7 +4413,7 @@ def _maybe_casted_values(index, labels=None): | |||
raise ValueError( | |||
"col_fill=None is incompatible " | |||
"with incomplete column name " | |||
"{}".format(name) | |||
f"{name}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks odd and redundant as an f-string. perhaps combine with string above
"Generating numeric_only data with filter_type {f}" | ||
"not supported.".format(f=filter_type) | ||
f"Generating numeric_only data with filter_type {filter_type}" | ||
"not supported." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this appears to have a missing space in the message. do we have a test for this message?
pandas/core/frame.py
Outdated
@@ -8170,4 +8168,4 @@ def _from_nested_dict(data): | |||
|
|||
|
|||
def _put_str(s, space): | |||
return "{s}".format(s=s)[:space].ljust(space) | |||
return f"{s}"[:space].ljust(space) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm not a fan of f-strings used like this. could just use str().
def drop_duplicates(self, subset=None, keep="first", inplace=False): | ||
def drop_duplicates( | ||
self, | ||
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is None a valid value in a sequence of labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, check this out:
df = pd.DataFrame({None:[1,2,2], 'col1':[1,2,3]})
df.drop_duplicates(subset=[None])
outputs:
NaN col1
0 1 1
1 2 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None, | |
subset: Optional[Union[Hashable, Sequence[Optional[Hashable]]]] = None, |
should probably be Sequence[Optional[Hashable]] in that case.
Co-Authored-By: Simon Hawkins <[email protected]>
… into frame_typing_fstring
Co-Authored-By: Simon Hawkins <[email protected]>
… into frame_typing_fstring
@mck619 looks like some tricky mypy failures. Flag me down at PyData and I'll help you out in person |
could try this as minimum to make mypy pass... diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 6afd64f64..491c86999 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -15,6 +15,7 @@ import itertools
import sys
from textwrap import dedent
from typing import (
+ Any,
FrozenSet,
Hashable,
Iterable,
@@ -25,6 +26,7 @@ from typing import (
Tuple,
Type,
Union,
+ cast,
)
import warnings
@@ -4192,7 +4194,7 @@ class DataFrame(NDFrame):
inplace: bool = False,
col_level: Hashable = 0,
col_fill: Optional[Hashable] = "",
- ) -> "DataFrame":
+ ) -> Optional["DataFrame"]:
"""
Reset the index, or a level of it.
@@ -4220,8 +4222,8 @@ class DataFrame(NDFrame):
Returns
-------
- DataFrame
- DataFrame with the new index.
+ DataFrame or None
+ DataFrame with the new index or None if ``inplace=True``.
See Also
--------
@@ -4386,6 +4388,7 @@ class DataFrame(NDFrame):
new_index = self.index.droplevel(level)
if not drop:
+ to_insert: Iterable[Tuple[Any, Optional[Any]]]
if isinstance(self.index, ABCMultiIndex):
names = [
(n if n is not None else f"level_{i}")
@@ -4425,6 +4428,8 @@ class DataFrame(NDFrame):
if not inplace:
return new_obj
+ return None
+
# ----------------------------------------------------------------------
# Reindex-based selection methods
@@ -4590,7 +4595,7 @@ class DataFrame(NDFrame):
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
keep: Union[str, bool] = "first",
inplace: bool = False,
- ) -> "DataFrame":
+ ) -> Optional["DataFrame"]:
"""
Return DataFrame with duplicate rows removed.
@@ -4612,7 +4617,7 @@ class DataFrame(NDFrame):
Returns
-------
- DataFrame
+ DataFrame or None
"""
if self.empty:
return self.copy()
@@ -4624,6 +4629,7 @@ class DataFrame(NDFrame):
(inds,) = (-duplicated)._ndarray_values.nonzero()
new_data = self._data.take(inds)
self._update_inplace(new_data)
+ return None
else:
return self[-duplicated]
@@ -4675,6 +4681,9 @@ class DataFrame(NDFrame):
):
subset = (subset,)
+ # needed for mypy since can't narrow types using np.iterable
+ subset = cast(Iterable, subset)
+
# Verify all columns in subset exist in the queried dataframe
# Otherwise, raise a KeyError, same as if you try to __getitem__ with a
# key that doesn't exist.
@@ -6024,6 +6033,8 @@ class DataFrame(NDFrame):
raise ValueError("columns must be unique")
df = self.reset_index(drop=True)
+ # TODO: use overload to refine return type of reset_index
+ assert df is not None # needed for mypy
result = df[column].explode()
result = df.drop([column], axis=1).join(result)
result.index = self.index.take(result.index)
diff --git a/pandas/core/reshape/merge.py b/pandas/core/reshape/merge.py
index d671fff56..726a59ca8 100644
--- a/pandas/core/reshape/merge.py
+++ b/pandas/core/reshape/merge.py
@@ -126,7 +126,10 @@ def _groupby_and_merge(
on = [on]
if right.duplicated(by + on).any():
- right = right.drop_duplicates(by + on, keep="last")
+ _right = right.drop_duplicates(by + on, keep="last")
+ # TODO: use overload to refine return type of drop_duplicates
+ assert _right is not None # needed for mypy
+ right = _right
rby = right.groupby(by, sort=False)
except KeyError:
rby = None May want to use overloads with May also want to be more precise with declaration of Ideally, also find a way of avoiding the cast. |
Thanks @mck619 for the PR ! (and @simonjayhawkins for detailed comments) |
Thank you @WillAyd and @simonjayhawkins for all your help with my first PR!! Can't wait to contribute in a more helpful and meaningful manner in the future. |
replaced old formatting with fstrings where possible
added typing hints to duplicated, drop_duplicates, reset_index