Skip to content

tm.makeObjectSeries returns a Series with dtype="datetime64[ns]" #28378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WillAyd opened this issue Sep 10, 2019 · 14 comments · Fixed by #28444
Closed

tm.makeObjectSeries returns a Series with dtype="datetime64[ns]" #28378

WillAyd opened this issue Sep 10, 2019 · 14 comments · Fixed by #28444
Labels
good first issue Testing pandas testing functions or related to the test suite
Milestone

Comments

@WillAyd
Copy link
Member

WillAyd commented Sep 10, 2019

>>> import pandas.util.testing as tm
>>> tm.makeObjectSeries()
...
t1Q4jcW3nW   2000-02-09
ElZCr3WTz2   2000-02-10
p43Nqg7u4D   2000-02-11
dtype: datetime64[ns]

@jbrockmendel and I were both surprised that calling makeObjectSeries returns a Series with a datetime dtype. I think it would be more logical for this function to actually return an object dtype Series

@WillAyd WillAyd added the Testing pandas testing functions or related to the test suite label Sep 10, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Sep 10, 2019
@WillAyd
Copy link
Member Author

WillAyd commented Sep 10, 2019

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 12, 2019

Hi, I want to contribute to this issue.

Is it okay just add dtype=object at Series generation for this issue?
Or is there more deep modification about this issue? (I'm not meaning about modification of test code related to makeObjectSeries)

@WillAyd
Copy link
Member Author

WillAyd commented Sep 12, 2019

Yea I think OK to start with that and see what breaks. Ideally may want to have it contain more than date time objects but not sure what effort that would take; just getting it to object dtype in the first place will be a good start

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 12, 2019

I modified it to object. But performing add at Series, it modifies dates and dtype as datetime64.

How can I forcibly prevent add/sub operator from modifying data and dtype?
Is it natural that automatically changing the dtype when some modification occurs?
I think data modification should not be happend if it modifies dtype as datetime64. (In case of datetime64, it raises exception)

@jbrockmendel
Copy link
Member

How can I forcibly prevent add/sub operator from modifying data and dtype?

Can you be more specific about what you mean by "performing add at Series"?

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 12, 2019

I mean, at tests/arithmetic/test_object.py test_object_arr_invalid, it performs add operation.

I simulated it manually, it manipulates Series with dtype=object to dtype=datetime64[ns] and also, it also manipulates data also.

>>> obj_ser = tm.makeObjectSeries()
>>> obj_ser 
YipVeLYwc8   2000-01-03
...
wHtkS7TlO2    2000-02-08 00:00:00
tTYkAjpXUN    2000-02-09 00:00:00
5OGw3PNMfM    2000-02-10 00:00:00
14tcE5tlIl    2000-02-11 00:00:00
dtype: object

>>> operator.add(obj_ser, 1)
YipVeLYwc8   2000-01-04
...
tTYkAjpXUN   2000-02-10
5OGw3PNMfM   2000-02-11
14tcE5tlIl   2000-02-14
dtype: datetime64[ns]

@jbrockmendel
Copy link
Member

what happens if you forget the datetimes altogether and fill it with strings?

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 13, 2019

It passes tests.

If I change 1 to "1", it appends "1" to all the data.
And it doesn't convert dtype.

>>> tm.makeObjectSeries() + "1"

sb7X5CxFoh    Zn6Ur8Sd6Z1
...
5ixyCT4dWC    ECQrisl5Ln1
vAvu2yKqsG    HN5LTCpT0V1
Dor4uHXigH    Mx1u5ukW141
dtype: object

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 13, 2019

Is it okay to change filling data as datetime64[ns] to String with dtype: object?

@jbrockmendel
Copy link
Member

Is it okay to change filling data as datetime64[ns] to String with dtype: object?

Probably, yes. The one thing I want to double-check is that we aren't losing coverage for cases where we actually do want a datetime64 Series, which I think might be the case in the arithmetic test.

@WillAyd thoughts on how to track this down more generally? grepping shows only 11 usages of makeObjectSeries, but some of those are in fixtures.

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 13, 2019

As I searched, there is no usecase that objectSeries used just for datetime excluding arithmetic test.

I searched it including object_series fixture and instance of TestData(TestData.objSeries).
I hope I found everything :)

@WillAyd
Copy link
Member Author

WillAyd commented Sep 13, 2019

@WillAyd thoughts on how to track this down more generally? grepping shows only 11 usages of makeObjectSeries, but some of those are in fixtures.

Yea I think @jmg7173 if you can search for the fixture where this is used and check that as well would be helpful.

Mostly could just change and see what breaks though. The naming of this obviously means the intent is to deal with object data, so I don't think should break too much actually making this return an object dtype

@jmg7173
Copy link
Contributor

jmg7173 commented Sep 14, 2019

Yea I think @jmg7173 if you can search for the fixture where this is used and check that as well would be helpful.

Here's list where this is used:

  • As function makeObjectSeries itself:
pandas/tests/arithmetic/test_object.py L144 test_objarr_add_invalid
pandas/tests/dtypes/test_missing.py L94-101 test_isna_isnull
pandas/tests/dtypes/test_missing.py L51-59 test_notna_notnull
pandas/tests/generic/test_generic.py L703-704 test_squeeze
pandas/tests/generic/test_generic.py L746 test_transpose
pandas/tests/generic/test_generic.py L768 test_take
pandas/tests/io/test_packers.py L459 test_basic at TestSeries (defined as self.d["object"])
pandas/tests/io/json/test_pandas.py L885-890 test_series_from_json_to_json at TestPandasContainer (defined as self.objSeries)
  • As return of method objSeries at class TestData:
Defined:
pandas/tests/series/common.py  L23 as method objSeries of class TestData
(TestData fixture defined at pandas/tests/series/indexing/conftest.py)

Used:
pandas/pandas/tests/series/test_repr.py L74 (in test_repr)
pandas/tests/series/indexing/test_indexing.py L92 test_getitem_get
pandas/tests/series/indexing/test_indexing.py L121 test_getitem_fancy
pandas/tests/series/indexing/test_indexing.py L559 test_slice
  • As fixture object_series:
Defined:
pandas/tests/series/conftest.py L26 as object_series

Used:
pandas/tests/series/test_combine_concat.py L14 test_append as object_series fixture

Mostly could just change and see what breaks though. The naming of this obviously means the intent is to deal with object data, so I don't think should break too much actually making this return an object dtype

As you say, there is no breaks at test.

Even though tests are passed, I think test_objarr_add_invalid should be checked whether it used as right way.

I'll make PR about this issue!

@WillAyd
Copy link
Member Author

WillAyd commented Sep 14, 2019

Awesome - thanks for digging into this

@jreback jreback modified the milestones: Contributions Welcome, 1.0 Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants