.loc lookup with missing values #11428

Closed
jennolsen84 opened this issue Oct 25, 2015 · 5 comments · Fixed by #27362
Labels
Datetime (Datetime data dtype), good first issue, Indexing (Related to indexing on series/frames, not to indexes themselves), Needs Tests (Unit test(s) needed to prevent regressions), Regression (Functionality that used to work in a prior pandas version)

@jennolsen84
Contributor

Hi everyone, I recently upgraded from 0.16.2 to 0.17.0, and I am wondering if this change was intentional:

0.16.2
(numpy version 1.10)

In [15]: s
Out[15]: 
             0
2001-01-01   2
2001-01-02   5
2001-01-03   8
2001-01-04  11

In [16]: ddd
Out[16]: array(['2001-01-04', '2001-01-02', '2001-01-04', '2001-01-14'], dtype='datetime64[D]')

In [17]: s.loc[ddd]
Out[17]: 
             0
2001-01-04  11
2001-01-02   5
2001-01-04  11
2001-01-14 NaN

pd version 0.17.0
np version 1.10.1

The traceback is long, but this is how it ends:
In [12]: ddd = np.array(['2001-01-04','2001-01-02','2001-01-04','2001-01-14'], dtype='datetime64')

In [13]: s = pd.DataFrame([2, 5, 8, 11], pd.date_range('2001-01-01', freq='D', periods=4))

In [14]: s.loc[ddd]
ValueError: Inferred frequency None from passed dates does not conform to passed frequency D

I realize that 2001-01-14 is not in the index of s, and I was hoping that .loc would fill it in as NaN, as it did in 0.16.2. I could understand a KeyError, but I am now getting a ValueError. Was this change intentional? I checked the docs but did not see it mentioned.

@jorisvandenbossche
Member

I can confirm this, and I don't think it is intentional.

Using reindex still works as expected:

In [6]:  s.reindex(ddd)
Out[6]:
             0
2001-01-04  11
2001-01-02   5
2001-01-04  11
2001-01-14 NaN

@jorisvandenbossche added the Bug, Datetime, Indexing, and Regression labels on Oct 25, 2015
@jorisvandenbossche added this to the 0.17.1 milestone on Oct 25, 2015
@jreback modified the milestones: Next Major Release, 0.17.1 on Nov 13, 2015
@jennolsen84
Contributor Author

I'm not sure whether this is related, but the following also looks off. The same lookups work as expected when timezones are not used.

In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: d = np.datetime64('2000-01-01T00Z')
In [4]: s = pd.Series([17], index=pd.DatetimeIndex(['2000-01-01'], tz='UTC'))
In [5]: s.ix[d]
Out[5]: 17
In [6]: s.reindex([d])
Out[6]: 
2000-01-01   NaN
dtype: float64
In [7]: s.ix[[d]]
Out[7]: 
2000-01-01   NaN
dtype: float64
In [8]: pd.__version__
Out[8]: '0.17.1'
In [9]: np.__version__
Out[9]: '1.10.2'
In [10]: s
Out[10]: 
2000-01-01 00:00:00+00:00    17
dtype: int64
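
One way to sidestep the mismatch, sketched here on the assumption that the lookup key can be built with pandas instead of NumPy: `np.datetime64` carries no timezone information (since NumPy 1.11), whereas `pd.Timestamp` accepts an explicit `tz`. The names `key`, `scalar`, `listed` and `reindexed` are illustrative and do not come from the session above.

```python
import pandas as pd

# Same Series as in the session above: one value on a UTC-aware index.
s = pd.Series([17], index=pd.DatetimeIndex(["2000-01-01"], tz="UTC"))

# Build the key as a tz-aware Timestamp so it matches the index's timezone.
key = pd.Timestamp("2000-01-01", tz="UTC")

scalar = s.loc[key]           # scalar lookup
listed = s.loc[[key]]         # list-like lookup
reindexed = s.reindex([key])  # reindex with the same key
```

With a tz-aware key, all three lookups should find the same stored value, so the scalar and list-like paths no longer disagree.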

@mroeschke
Member

This looks fixed on master, but as the warning notes, this behavior will be deprecated in the future.

In [1]: ddd = np.array(['2001-01-04','2001-01-02','2001-01-04','2001-01-14'], dtype='datetime64')
   ...: s = pd.DataFrame([2, 5, 8, 11], pd.date_range('2001-01-01', freq='D', periods=4))
   ...: s.loc[ddd]
/anaconda3/envs/pandas-dev/bin/ipython:5: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  import sys
Out[1]:
               0
2001-01-04  11.0
2001-01-02   5.0
2001-01-04  11.0
2001-01-14   NaN

In [2]: pd.__version__
Out[2]: '0.25.0.dev0+334.g7721f7009'
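
For readers on later releases, here is a minimal sketch of the two alternatives the warning points toward. It assumes pandas 1.0 or newer, where a list-like `.loc` lookup with a missing label raises `KeyError` instead of warning. The names `labels`, `with_nan`, `present`, `found` and `raised` are illustrative and not part of the thread, and the array uses `datetime64[ns]` so that it matches the resolution of the index.

```python
import numpy as np
import pandas as pd

s = pd.DataFrame([2, 5, 8, 11], index=pd.date_range("2001-01-01", freq="D", periods=4))
labels = pd.DatetimeIndex(
    np.array(["2001-01-04", "2001-01-02", "2001-01-04", "2001-01-14"], dtype="datetime64[ns]")
)

# Option 1: reindex keeps every requested label and fills missing rows with NaN.
with_nan = s.reindex(labels)

# Option 2: keep only the labels that exist in the index, then use .loc as usual.
present = labels[labels.isin(s.index)]
found = s.loc[present]

# Assumption: on pandas >= 1.0 the unfiltered lookup raises KeyError.
try:
    s.loc[labels]
    raised = False
except KeyError:
    raised = True
```

`reindex` is the closer match to the 0.16.2 behavior reported at the top of the thread, since it preserves the missing label as a NaN row. Filtering with `isin` silently drops the missing label.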

@anordin95

Does this still need unit testing? I'm new to working on pandas and I'm not sure if I'd be doing redundant work here.

@mroeschke
Member

Sure. Feel free to add a unit test for this.
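
A regression test along these lines might look like the sketch below. It is not the test that was eventually merged; the function name is hypothetical, and it checks only the `reindex` path, whose result is shown earlier in the thread. It uses the public `pandas.testing` helpers.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def test_reindex_with_missing_datetime_label():
    # Hypothetical test name; the data mirrors the example from the original report.
    df = pd.DataFrame([2, 5, 8, 11], index=pd.date_range("2001-01-01", freq="D", periods=4))
    labels = np.array(
        ["2001-01-04", "2001-01-02", "2001-01-04", "2001-01-14"], dtype="datetime64[ns]"
    )

    result = df.reindex(labels)

    # The missing label becomes a NaN row, which upcasts the column to float.
    expected = pd.DataFrame([11.0, 5.0, 11.0, np.nan], index=pd.DatetimeIndex(labels))
    tm.assert_frame_equal(result, expected)
```

A companion test for `.loc` would depend on the pandas version under test, since the behavior changed from returning NaN, to warning, to raising.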

5 participants