Skip to content

added f strings and typing to frame.py #30021

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Dec 6, 2019

Conversation

mck619
Copy link
Contributor

@mck619 mck619 commented Dec 3, 2019

replaced old formatting with fstrings where possible

added typing hints to duplicated, drop_duplicates, reset_index

@alimcmaster1 alimcmaster1 added Clean Code Style Code style, linting, code_checks Typing type annotations, mypy/pyright type checking labels Dec 4, 2019
@alimcmaster1
Copy link
Member

alimcmaster1 commented Dec 4, 2019

Thanks for the PR - can you run flake8 that will fix the github actions code checks CI failure: https://github.com/pandas-dev/pandas/pull/30021/checks?check_run_id=332186140#step:6:41

Copy link
Member

@alimcmaster1 alimcmaster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR - generally looks good - few minor comments

"({cols:d} != {counts:d})".format(
cols=len(cols), counts=len(counts)
)
"Columns must equal counts " f"({len(cols)} != {len(counts)})"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put all of this line inside the f string for readability

raise TypeError(
err_msg + " Received column of type {}".format(type(col))
)
raise TypeError(err_msg + f" Received column of type {type(col)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just include the err_msg inside f string

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - some minor edits

@WillAyd WillAyd added this to the 1.0 milestone Dec 4, 2019
mck619 and others added 2 commits December 3, 2019 20:46
@pep8speaks
Copy link

pep8speaks commented Dec 4, 2019

Hello @mck619! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-06 00:05:07 UTC

Comment on lines 4117 to 4119
f"{err_msg} Received column of type {type(col)}"
"array, or a list containing only valid column keys and "
f"one-dimensional arrays. Received column of type {type(col)}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why's this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My mistake, I misinterpreted a previous comment

@@ -4423,7 +4413,7 @@ def _maybe_casted_values(index, labels=None):
raise ValueError(
"col_fill=None is incompatible "
"with incomplete column name "
"{}".format(name)
f"{name}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks odd and redundant as an f-string. perhaps combine with string above

"Generating numeric_only data with filter_type {f}"
"not supported.".format(f=filter_type)
f"Generating numeric_only data with filter_type {filter_type}"
"not supported."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this appears to have a missing space in the message. do we have a test for this message?

@@ -8170,4 +8168,4 @@ def _from_nested_dict(data):


def _put_str(s, space):
return "{s}".format(s=s)[:space].ljust(space)
return f"{s}"[:space].ljust(space)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not a fan of f-strings used like this. could just use str().

def drop_duplicates(self, subset=None, keep="first", inplace=False):
def drop_duplicates(
self,
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is None a valid value in a sequence of labels?

Copy link
Contributor Author

@mck619 mck619 Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, check this out:

df = pd.DataFrame({None:[1,2,2], 'col1':[1,2,3]})
df.drop_duplicates(subset=[None])

outputs:

   NaN  col1
0    1     1
1    2     2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
subset: Optional[Union[Hashable, Sequence[Optional[Hashable]]]] = None,

should probably be Sequence[Optional[Hashable]] in that case.

@WillAyd
Copy link
Member

WillAyd commented Dec 4, 2019

@mck619 looks like some tricky mypy failures. Flag me down at PyData and I'll help you out in person

@simonjayhawkins
Copy link
Member

could try this as minimum to make mypy pass...

diff --git a/pandas/core/frame.py b/pandas/core/frame.py
index 6afd64f64..491c86999 100644
--- a/pandas/core/frame.py
+++ b/pandas/core/frame.py
@@ -15,6 +15,7 @@ import itertools
 import sys
 from textwrap import dedent
 from typing import (
+    Any,
     FrozenSet,
     Hashable,
     Iterable,
@@ -25,6 +26,7 @@ from typing import (
     Tuple,
     Type,
     Union,
+    cast,
 )
 import warnings
 
@@ -4192,7 +4194,7 @@ class DataFrame(NDFrame):
         inplace: bool = False,
         col_level: Hashable = 0,
         col_fill: Optional[Hashable] = "",
-    ) -> "DataFrame":
+    ) -> Optional["DataFrame"]:
         """
         Reset the index, or a level of it.
 
@@ -4220,8 +4222,8 @@ class DataFrame(NDFrame):
 
         Returns
         -------
-        DataFrame
-            DataFrame with the new index.
+        DataFrame or None
+            DataFrame with the new index or None if ``inplace=True``.
 
         See Also
         --------
@@ -4386,6 +4388,7 @@ class DataFrame(NDFrame):
                 new_index = self.index.droplevel(level)
 
         if not drop:
+            to_insert: Iterable[Tuple[Any, Optional[Any]]]
             if isinstance(self.index, ABCMultiIndex):
                 names = [
                     (n if n is not None else f"level_{i}")
@@ -4425,6 +4428,8 @@ class DataFrame(NDFrame):
         if not inplace:
             return new_obj
 
+        return None
+
     # ----------------------------------------------------------------------
     # Reindex-based selection methods
 
@@ -4590,7 +4595,7 @@ class DataFrame(NDFrame):
         subset: Optional[Union[Hashable, Sequence[Hashable]]] = None,
         keep: Union[str, bool] = "first",
         inplace: bool = False,
-    ) -> "DataFrame":
+    ) -> Optional["DataFrame"]:
         """
         Return DataFrame with duplicate rows removed.
 
@@ -4612,7 +4617,7 @@ class DataFrame(NDFrame):
 
         Returns
         -------
-        DataFrame
+        DataFrame or None
         """
         if self.empty:
             return self.copy()
@@ -4624,6 +4629,7 @@ class DataFrame(NDFrame):
             (inds,) = (-duplicated)._ndarray_values.nonzero()
             new_data = self._data.take(inds)
             self._update_inplace(new_data)
+            return None
         else:
             return self[-duplicated]
 
@@ -4675,6 +4681,9 @@ class DataFrame(NDFrame):
         ):
             subset = (subset,)
 
+        # needed for mypy since can't narrow types using np.iterable
+        subset = cast(Iterable, subset)
+
         # Verify all columns in subset exist in the queried dataframe
         # Otherwise, raise a KeyError, same as if you try to __getitem__ with a
         # key that doesn't exist.
@@ -6024,6 +6033,8 @@ class DataFrame(NDFrame):
             raise ValueError("columns must be unique")
 
         df = self.reset_index(drop=True)
+        # TODO: use overload to refine return type of reset_index
+        assert df is not None  # needed for mypy
         result = df[column].explode()
         result = df.drop([column], axis=1).join(result)
         result.index = self.index.take(result.index)
diff --git a/pandas/core/reshape/merge.py b/pandas/core/reshape/merge.py
index d671fff56..726a59ca8 100644
--- a/pandas/core/reshape/merge.py
+++ b/pandas/core/reshape/merge.py
@@ -126,7 +126,10 @@ def _groupby_and_merge(
                 on = [on]
 
             if right.duplicated(by + on).any():
-                right = right.drop_duplicates(by + on, keep="last")
+                _right = right.drop_duplicates(by + on, keep="last")
+                # TODO: use overload to refine return type of drop_duplicates
+                assert _right is not None  # needed for mypy
+                right = _right
         rby = right.groupby(by, sort=False)
     except KeyError:
         rby = None

May want to use overloads with drop_duplicates and reset_index to reduce additional asserts and code changes needed elsewhere.

May also want to be more precise with declaration of to_insert

Ideally, also find a way of avoiding the cast.

@WillAyd WillAyd merged commit 282a0e4 into pandas-dev:master Dec 6, 2019
@WillAyd
Copy link
Member

WillAyd commented Dec 6, 2019

Thanks @mck619 for the PR ! (and @simonjayhawkins for detailed comments)

@mck619
Copy link
Contributor Author

mck619 commented Dec 6, 2019

Thank you @WillAyd and @simonjayhawkins for all your help with my first PR!! Can't wait to contribute in a more helpful and meaningful manner in the future.

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Code Style Code style, linting, code_checks Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants