Skip to content

CLN refactor core dtypes #37584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jan 3, 2021
Merged

Conversation

MarcoGorelli
Copy link
Member

@MarcoGorelli MarcoGorelli commented Nov 2, 2020

Some refactorings found by Sourcery https://sourcery.ai/

I've removed the ones of the kind

- if param:
-     var = a
- else:
-     var = b
+ var = a if param else b

@MarcoGorelli MarcoGorelli marked this pull request as draft November 2, 2020 12:55
@jreback jreback added the Code Style Code style, linting, code_checks label Nov 2, 2020
@jreback jreback added this to the 1.2 milestone Nov 2, 2020
@jreback
Copy link
Contributor

jreback commented Nov 2, 2020

look ok. these may have broken things.

@MarcoGorelli
Copy link
Member Author

MarcoGorelli commented Nov 4, 2020

Indeed - the refactoring

-    if len(types) == 0:
+    if not types:

was wrong, as types could be a Series here (as in the test pandas/tests/arrays/sparse/test_accessor.py::TestFrameAccessor::test_to_coo), in which was not types raises

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Have fixed up and changed the annotation

@@ -1535,7 +1539,7 @@ def maybe_cast_to_datetime(value, dtype: DtypeObj, errors: str = "raise"):
return value


def find_common_type(types: List[DtypeObj]) -> DtypeObj:
def find_common_type(types: Union[List[DtypeObj], Series]) -> DtypeObj:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are we passing a Series? we shouldn't do this

Copy link
Member Author

@MarcoGorelli MarcoGorelli Nov 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in pandas/tests/arrays/sparse/test_accessor.py::TestFrameAccessor::test_to_coo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll dig deeper at the weekend

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback the problem is that when we do

        df = pd.DataFrame({"A": [0, 1, 0], "B": [1, 0, 0]}, dtype="Sparse[int64, 0]")
        result = df.sparse.to_coo()

then, in to_coo from pandas/core/arrays/sparse/accessor.py, we have

        dtype = find_common_type(self._parent.dtypes)

and so this is how we pass a Series (self._parent.dtypes is a Series).

Is there a better way to fix this than

        dtype = find_common_type(self._parent.dtypes.to_list())

?

@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 6, 2020 07:10
@MarcoGorelli MarcoGorelli marked this pull request as draft November 6, 2020 07:10
@MarcoGorelli MarcoGorelli marked this pull request as ready for review November 6, 2020 07:24
@jreback
Copy link
Contributor

jreback commented Nov 18, 2020

can you merge master ping on green

@MarcoGorelli MarcoGorelli mentioned this pull request Nov 18, 2020
1 task
@MarcoGorelli
Copy link
Member Author

can you merge master ping on green

Green, barring (probably unrelated?) travis error:

____________________________ test_deprecated_kwargs ____________________________

[gw0] linux -- Python 3.7.8 /home/travis/miniconda3/envs/pandas-dev/bin/python

    def test_deprecated_kwargs():

        df = pd.DataFrame({"A": [2, 4, 6], "B": [3, 6, 9]}, index=[0, 1, 2])

        buf = df.to_json(orient="split")

        with tm.assert_produces_warning(FutureWarning):

>           tm.assert_frame_equal(df, read_json(buf, "split"))

pandas/tests/io/json/test_deprecated_kwargs.py:15: 

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <contextlib._GeneratorContextManager object at 0x7fc4d8ce5110>

type = None, value = None, traceback = None

    def __exit__(self, type, value, traceback):

        if type is None:

            try:

>               next(self.gen)

E               AssertionError: Caused unexpected warning(s): [('ResourceWarning', ResourceWarning("unclosed <ssl.SSLSocket fd=12, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('10.20.0.66', 55192), raddr=('52.216.144.227', 443)>"), '/home/travis/build/pandas-dev/pandas/pandas/core/dtypes/base.py', 435), ('ResourceWarning', ResourceWarning("unclosed file <_io.BufferedWriter name='/tmp/tmpnm7c5epi.xls'>"), '/home/travis/build/pandas-dev/pandas/pandas/core/dtypes/base.py', 435), ('ResourceWarning', ResourceWarning("unclosed file <_io.BufferedWriter name='/tmp/tmplgig23ur.xlsx'>"), '/home/travis/build/pandas-dev/pandas/pandas/core/dtypes/base.py', 435)]

../../../miniconda3/envs/pandas-dev/lib/python3.7/contextlib.py:119: AssertionError

@@ -329,7 +329,7 @@ def to_coo(self):
import_optional_dependency("scipy")
from scipy.sparse import coo_matrix

dtype = find_common_type(self._parent.dtypes)
dtype = find_common_type(self._parent.dtypes.to_list())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is because find_commont_type from pandas/core/dtypes/cast.py requires a list

@jreback jreback removed this from the 1.2 milestone Nov 24, 2020
@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

moving off 1.2 until ready

@MarcoGorelli
Copy link
Member Author

Sure, no problem (though as far as I can tell, this is ready - anything else need changing?)

@jreback
Copy link
Contributor

jreback commented Nov 24, 2020

Sure, no problem (though as far as I can tell, this is ready - anything else need changing?)

I think it was fine, just want to make sure after merge

@MarcoGorelli
Copy link
Member Author

sure, have merged

# tzawareness compat failure, see GH#28507
return False
elif "boolean value of NA is ambiguous" in str(err):
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this is for coverage

@jreback jreback added this to the 1.3 milestone Jan 3, 2021
@jreback jreback merged commit dd865aa into pandas-dev:master Jan 3, 2021
@jreback
Copy link
Contributor

jreback commented Jan 3, 2021

thanks @MarcoGorelli

@MarcoGorelli MarcoGorelli deleted the refactor-core-dtypes branch January 3, 2021 18:06
luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Code Style Code style, linting, code_checks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants