-
-
Notifications
You must be signed in to change notification settings - Fork 141
gh-372 : Fixing Series.astype() #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
You need to build a table first before doing a PR. Test locally as well before pushing. You have syntax errors. |
Work is still under progress I'll keep commiting I created this draft PR so that you could be updated with the progress |
OK, but I think you should build the table first before pushing any more code, so I can review the table. Once we have the table looking right, then we can move forward with code. Also, you should test locally before pushing to avoid CI failures. E.g., |
I'll post a picture of the table in a 10 mins |
Can you post the table as a Markdown table, so then it becomes easier to collaborate via my edits? You need to add numpy types (np.int, np.float, etc.). You need to add the string representations, e.g., I things like I think it is the case that we need to remove some of the values from >>> import pandas as pd
>>> import datetime as dt
>>> a=[dt.datetime(2023, 1, 25), dt.datetime(2023, 1, 26)]
>>> a
[datetime.datetime(2023, 1, 25, 0, 0), datetime.datetime(2023, 1, 26, 0, 0)]
>>> s=pd.Series(a)
>>> s
0 2023-01-25
1 2023-01-26
dtype: datetime64[ns]
>>> s.dtype
dtype('<M8[ns]')
>>> type(s.dtype)
<class 'numpy.dtype[datetime64]'> So in the above, I created a Getting this right is not an easy task and you have to experiment to make sure that the types in |
Yeah i'll create a markdown table and do the required changes i'll do these changes tommorow as it's night here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure there are entries in the left column for each of the types currently in S1
Once we get this table right, then I think we should split this into 2 PR's. One would remove elements from S1
in _typing.pyi
. Once that is done, then we can work on astype()
.
I have updated the table can you please check it
|
The above ones can be deleted because they get converted to numpy types.
We can't delete those because of how the typing works. But we can delete many of the references to |
I think the table is right |
ok then i'll start with it |
Will we have to remove args like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For any of the tests such as:
check(assert_type(s.astype("bool"), "Series[bool]"), pd.Series, np.bool_)
you can remove the need to import Series
, by changing it to this form:
check(assert_type(s.astype("bool"), "pd.Series[bool]"), pd.Series, np.bool_)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting closer, and thanks for all the hard work.
Seeing the issue that DataFrame.astype()
has no type for the dtype
argument, we need to do a bit more to support that as well. So, can you do the following:
- Take all of the type combinations that you have in
series.pyi
and make them into new types in_typing.pyi
using the following pattern:
BooleanDtypeArg: TypeAlias = type[bool] | type[np.bool_] | pd.BooleanDtype | Literal["bool"]
IntDtypeArg: TypeAlias = type[_str] | pd.StringDtype | Literal["str"]
# etc for each set of arguments
- Then modify
series.pyi
to import those aliases and use them inseries.pyi
, which should shorten up that piece of code. - In
_typing.pyi
, changeAsTypeArg
to be the union of all of the types created in step 1 PLUSExtensionDtype
- Create a new overload that should be last in
series.pyi
fordtype: ExtensionDtype
that returnsSeries
. Note - you may need to add some# type: ignore[misc]
in different places because I'm pretty sure mypy will complain about this. - To test the argument of
dtype: ExtensionDtype
, do the following:
from decimal import Decimal
from pandas.tests.extension.decimal import DecimalDtype
check(assert_type(pd.Series([Decimal(x) for x in [1,2,3]]).astype(DecimalDtype()), pd.Series), Decimal)
NOTE: If for some reason either type checker complains about importing DecimalDtype
(because it will type check the test code), then you can skip this test for now, but just leave it in commented out with a note that the type checkers aren't liking that import.
- Add tests in
tests/test_frame.py
that testDataFrame.astype()
. Only need to test for a few of the arguments, but should also check using thedict
argument as well.
thanks sir, i'll do the changes and get back |
Actually I get this when creating a |
Follow the pattern in |
…type in test_series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to move all the XXXDtypeArg
declarations to right above AstypeArg
, in addition to the other comments below
It appears the CI is failing because we test without pandas being installed. So go back to commenting out the tests for |
Done sir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some unresolved comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address these comments:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to move all the XXXDtypeArg declarations from the bottom of _typing.pyi
to right above AstypeArg, in addition to the other comments below
pandas-stubs/core/frame.pyi
Outdated
@@ -1441,8 +1441,7 @@ class DataFrame(NDFrame, OpsMixin): | |||
) -> DataFrame: ... | |||
def astype( | |||
self, | |||
dtype: AstypeArg | Mapping[Hashable, AstypeArg] | Series, | |||
# dtype: AstypeArg | dict[Any, AstypeArg], | |||
dtype: AstypeArg | Mapping[Hashable, AstypeArg] | Series | Any, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any
is unacceptable here. Please remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done with all the changes sir
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ramvikrams . This was not an easy one to get right, and I anticipate that we may get some feedback when this is released.
yes if any further change is required i'll do it and thanks sir |
* adding overloads to astype * Created the table for astype * Update table.rst * Updated the table and added numpy dtypes * Update table.rst * updated np.datetime64 * Update table.rst * added types in Timedelta * removed not required args in Dtype * removed np.timedelta64 in Timedelta * Removed timedelta64 * expanding series astype * Added type in args in dtype * corrected the args * adding a overload for 'category' and normal changes * added tests * removed unused args * corrected tests * Delete table.rst * added the bool overload to top and done the required test changes * added type_checker * added types for check and did requested changes * updated the check types * added astype in dataframe and other changes * Update test_series.py * Update test_series.py * added dict test for astype in datatest_frame and tests for ExtensionDtype in test_series * commented out the decimal tests * Update test_series.py * updated dtype args in astype * added any to list of args for astype * changed dtype args
assert_type()
to assert the type of any return valueI have created this draft PR and I'll keep commiting to it, so you can also check whether the changes are acceptable or not.