Fix bug in contains when looking up a string in a non-monotonic datet… #13574
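The commit list below does not include a repro, so here is a minimal sketch of the behavior the title describes (the index values and exact failure mode are assumptions, since the PR description is not shown):

```python
import pandas as pd

# Hypothetical repro: membership lookup of a date string in a
# DatetimeIndex whose values are not sorted (non-monotonic).
idx = pd.DatetimeIndex(['2016-01-02', '2016-01-01', '2016-01-03'])

# With the fix, a string that parses to a timestamp present in the
# index is found even though the index is unsorted.
print('2016-01-01' in idx)
print('2016-01-04' in idx)
```

With the fix applied, the first lookup returns ``True`` and the second ``False``.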


Closed
wants to merge 45 commits into from
Changes from all commits
f3b114f
Fix bug in contains when looking up a string in a non-monotonic datet…
tjader Jul 6, 2016
cc0a188
BUG: Groupby.nth includes group key inconsistently #12839
adneu Jul 6, 2016
2655dae
In gbq, use googleapiclient instead of apiclient #13454 (#13458)
parthea Jul 7, 2016
f11b9c1
RLS: switch master from 0.18.2 to 0.19.0 (#13586)
jorisvandenbossche Jul 8, 2016
ba82b51
BUG: Datetime64Formatter not respecting ``formatter``
haleemur Jul 8, 2016
f95576b
BUG: Fix TimeDelta to Timedelta (#13600)
yui-knk Jul 9, 2016
5701c69
COMPAT: 32-bit compat fixes mainly in testing
jreback Jul 7, 2016
3c202b1
Added more exhaustive tests for __contains__.
tjader Jul 9, 2016
713eaa6
BUG: DatetimeIndex - Period shows ununderstandable error
sinhrks Jul 10, 2016
675a6e3
ENH: add downcast to pd.to_numeric
gfyoung Jul 10, 2016
1edc1df
CLN: remove radd workaround in ops.py
sinhrks Jul 10, 2016
2a96ab7
DEPR: rename Timestamp.offset to .freq
sinhrks Jul 10, 2016
c989570
CLN: Remove the engine parameter in CSVFormatter and to_csv
gfyoung Jun 10, 2016
c2cc68d
BUG: Block/DTI doesnt handle tzlocal properly
sinhrks Jul 10, 2016
2e8c993
BUG: Series contains NaT with object dtype comparison incorrect (#13592)
sinhrks Jul 11, 2016
5605f99
CLN/TST: Add tests for nan/nat mixed input (#13477)
sinhrks Jul 11, 2016
2f7fdd0
BUG: groupby apply on selected columns yielding scalar (GH13568) (#13…
jorisvandenbossche Jul 11, 2016
65849d3
TST: Clean up tests of DataFrame.sort_{index,values} (#13496)
IamJeffG Jul 11, 2016
8dbc0f4
DOC: asfreq clarify original NaNs are not filled (GH9963) (#13617)
jorisvandenbossche Jul 12, 2016
93b7d13
BUG: Invalid Timedelta op may raise ValueError
sinhrks Jul 12, 2016
dbd5330
CLN: Cleanup ops.py
sinhrks Jul 12, 2016
7c357d2
CLN: Removed outtype in DataFrame.to_dict (#13627)
gfyoung Jul 12, 2016
27d2915
CLN: Fix compile time warnings
yui-knk Jul 13, 2016
06103dd
Pin IPython for doc build to 4.x (see #13639)
jorisvandenbossche Jul 13, 2016
7dd4091
CLN: reorg type inference & introspection
jreback Jul 13, 2016
20de266
BLD: included pandas.api.* in setup.py (#13640)
gfyoung Jul 13, 2016
44f3229
DOC/BLD: pin IPython version to 4.2.0 (#13639) (#13647)
jorisvandenbossche Jul 14, 2016
6f0a020
TST: reorganize tools.tests (#13619)
sinhrks Jul 14, 2016
a711b42
BF(TST): allow AttributeError being raised (in addition to TypeError)…
yarikoptic Jul 14, 2016
084ceae
API, DEPR: Raise and Deprecate Reshape for Pandas Objects
gfyoung Jul 14, 2016
3f6d4bd
CLN: Fix compile time warnings
yui-knk Jul 14, 2016
c9a27ed
CLN: fix some issues in asv benchmark suite (#13630)
jorisvandenbossche Jul 14, 2016
05b976c
TST: add tests for Timestamp.toordinal/fromordinal
sinhrks Jul 15, 2016
71a0675
CLN: Initialization coincides with mapping, hence with uniqueness check
toobaz Jul 15, 2016
0a70b5f
API: Change Period('NAT') to return NaT
sinhrks Jul 15, 2016
1bee56e
BUG: construction of Series with integers on windows not default to i…
jreback Jul 15, 2016
d7c028d
CLN: Removed levels attribute from Categorical
gfyoung Jul 15, 2016
401b0ed
Fix bug in contains when looking up a string in a non-monotonic datet…
tjader Jul 6, 2016
1a86b3a
Added more exhaustive tests for __contains__.
tjader Jul 9, 2016
3bf7cce
Fix bug in contains when looking up a string in a non-monotonic datet…
tjader Jul 6, 2016
0f5a4e0
Added more exhaustive tests for __contains__.
tjader Jul 9, 2016
783ea6d
Fix bug in contains when looking up a string in a non-monotonic datet…
tjader Jul 6, 2016
592a09d
Added more exhaustive tests for __contains__.
tjader Jul 9, 2016
690e034
Fix bug in contains when looking up a string in a non-monotonic datet…
tjader Jul 6, 2016
d4348d3
Merge remote-tracking branch 'origin/bugfixes' into bugfixes
tjader Jul 16, 2016
6 changes: 3 additions & 3 deletions asv_bench/asv.conf.json
@@ -77,11 +77,11 @@
// On conda install pytables, otherwise tables
{"environment_type": "conda", "tables": ""},
{"environment_type": "conda", "pytables": null},
{"environment_type": "virtualenv", "tables": null},
{"environment_type": "virtualenv", "pytables": ""},
{"environment_type": "(?!conda).*", "tables": null},
{"environment_type": "(?!conda).*", "pytables": ""},
// On conda&win32, install libpython
{"sys_platform": "(?!win32).*", "libpython": ""},
{"sys_platform": "win32", "libpython": null},
{"environment_type": "conda", "sys_platform": "win32", "libpython": null},
{"environment_type": "(?!conda).*", "libpython": ""}
],
"include": [],
20 changes: 0 additions & 20 deletions asv_bench/benchmarks/indexing.py
@@ -19,24 +19,6 @@ def time_dataframe_getitem_scalar(self):
self.df[self.col][self.idx]


class datamatrix_getitem_scalar(object):
goal_time = 0.2

def setup(self):
try:
self.klass = DataMatrix
except:
self.klass = DataFrame
self.index = tm.makeStringIndex(1000)
self.columns = tm.makeStringIndex(30)
self.df = self.klass(np.random.rand(1000, 30), index=self.index, columns=self.columns)
self.idx = self.index[100]
self.col = self.columns[10]

def time_datamatrix_getitem_scalar(self):
self.df[self.col][self.idx]


class series_get_value(object):
goal_time = 0.2

@@ -498,5 +480,3 @@ def setup(self):

def time_float_loc(self):
self.ind.get_loc(0)


21 changes: 20 additions & 1 deletion asv_bench/benchmarks/inference.py
@@ -135,4 +135,23 @@ def setup(self):
self.df_timedelta64 = DataFrame(dict(A=(self.df_datetime64['A'] - self.df_datetime64['B']), B=self.df_datetime64['B']))

def time_dtype_infer_uint32(self):
(self.df_uint32['A'] + self.df_uint32['B'])
(self.df_uint32['A'] + self.df_uint32['B'])


class to_numeric(object):
N = 500000

param_names = ['data', 'downcast']
params = [
[(['1'] * (N // 2)) + ([2] * (N // 2)),
(['-1'] * (N // 2)) + ([2] * (N // 2)),
np.repeat(np.array(['1970-01-01', '1970-01-02'],
dtype='datetime64[D]'), N),
(['1.1'] * (N // 2)) + ([2] * (N // 2)),
([1] * (N // 2)) + ([2] * (N // 2)),
np.repeat(np.int32(1), N)],
[None, 'integer', 'signed', 'unsigned', 'float'],
]

def time_to_numeric(self, data, downcast):
pd.to_numeric(data, downcast=downcast)
16 changes: 0 additions & 16 deletions asv_bench/benchmarks/join_merge.py
@@ -179,10 +179,6 @@ def setup(self):
self.df_multi = DataFrame(np.random.randn(len(self.index2), 4), index=self.index2, columns=['A', 'B', 'C', 'D'])
except:
pass
try:
self.DataFrame = DataMatrix
except:
pass
self.df = pd.DataFrame({'data1': np.random.randn(100000), 'data2': np.random.randn(100000), 'key1': self.key1, 'key2': self.key2, })
self.df_key1 = pd.DataFrame(np.random.randn(len(self.level1), 4), index=self.level1, columns=['A', 'B', 'C', 'D'])
self.df_key2 = pd.DataFrame(np.random.randn(len(self.level2), 4), index=self.level2, columns=['A', 'B', 'C', 'D'])
@@ -210,10 +206,6 @@ def setup(self):
self.df_multi = DataFrame(np.random.randn(len(self.index2), 4), index=self.index2, columns=['A', 'B', 'C', 'D'])
except:
pass
try:
self.DataFrame = DataMatrix
except:
pass
self.df = pd.DataFrame({'data1': np.random.randn(100000), 'data2': np.random.randn(100000), 'key1': self.key1, 'key2': self.key2, })
self.df_key1 = pd.DataFrame(np.random.randn(len(self.level1), 4), index=self.level1, columns=['A', 'B', 'C', 'D'])
self.df_key2 = pd.DataFrame(np.random.randn(len(self.level2), 4), index=self.level2, columns=['A', 'B', 'C', 'D'])
@@ -241,10 +233,6 @@ def setup(self):
self.df_multi = DataFrame(np.random.randn(len(self.index2), 4), index=self.index2, columns=['A', 'B', 'C', 'D'])
except:
pass
try:
self.DataFrame = DataMatrix
except:
pass
self.df = pd.DataFrame({'data1': np.random.randn(100000), 'data2': np.random.randn(100000), 'key1': self.key1, 'key2': self.key2, })
self.df_key1 = pd.DataFrame(np.random.randn(len(self.level1), 4), index=self.level1, columns=['A', 'B', 'C', 'D'])
self.df_key2 = pd.DataFrame(np.random.randn(len(self.level2), 4), index=self.level2, columns=['A', 'B', 'C', 'D'])
@@ -272,10 +260,6 @@ def setup(self):
self.df_multi = DataFrame(np.random.randn(len(self.index2), 4), index=self.index2, columns=['A', 'B', 'C', 'D'])
except:
pass
try:
self.DataFrame = DataMatrix
except:
pass
self.df = pd.DataFrame({'data1': np.random.randn(100000), 'data2': np.random.randn(100000), 'key1': self.key1, 'key2': self.key2, })
self.df_key1 = pd.DataFrame(np.random.randn(len(self.level1), 4), index=self.level1, columns=['A', 'B', 'C', 'D'])
self.df_key2 = pd.DataFrame(np.random.randn(len(self.level2), 4), index=self.level2, columns=['A', 'B', 'C', 'D'])
2 changes: 1 addition & 1 deletion ci/lint.sh
@@ -8,7 +8,7 @@ RET=0

if [ "$LINT" ]; then
echo "Linting"
for path in 'core' 'indexes' 'types' 'formats' 'io' 'stats' 'compat' 'sparse' 'tools' 'tseries' 'tests' 'computation' 'util'
for path in 'api' 'core' 'indexes' 'types' 'formats' 'io' 'stats' 'compat' 'sparse' 'tools' 'tseries' 'tests' 'computation' 'util'
do
echo "linting -> pandas/$path"
flake8 pandas/$path --filename '*.py'
2 changes: 1 addition & 1 deletion ci/requirements-2.7_DOC_BUILD.run
@@ -1,4 +1,4 @@
ipython
ipython=4.2.0
ipykernel
sphinx
nbconvert
102 changes: 78 additions & 24 deletions doc/source/basics.rst
@@ -1754,39 +1754,93 @@ Convert a subset of columns to a specified type using :meth:`~DataFrame.astype`
object conversion
~~~~~~~~~~~~~~~~~

:meth:`~DataFrame.convert_objects` is a method to try to force conversion of types from the ``object`` dtype to other types.
To force conversion of specific types that are *number like*, e.g. could be a string that represents a number,
pass ``convert_numeric=True``. This will force strings and numbers alike to be numbers if possible, otherwise
they will be set to ``np.nan``.
pandas offers various functions to try to force conversion of types from the ``object`` dtype to other types.
The following functions are available for one dimensional object arrays or scalars:

- :meth:`~pandas.to_numeric` (conversion to numeric dtypes)

.. ipython:: python

m = ['1.1', 2, 3]
pd.to_numeric(m)

- :meth:`~pandas.to_datetime` (conversion to datetime objects)

.. ipython:: python

import datetime
m = ['2016-07-09', datetime.datetime(2016, 3, 2)]
pd.to_datetime(m)

- :meth:`~pandas.to_timedelta` (conversion to timedelta objects)

.. ipython:: python

m = ['5us', pd.Timedelta('1day')]
pd.to_timedelta(m)

To force a conversion, we can pass in an ``errors`` argument, which specifies how pandas should deal with elements
that cannot be converted to the desired dtype or object. By default, ``errors='raise'``, meaning that any errors encountered
will be raised during the conversion process. However, if ``errors='coerce'``, these errors will be ignored and pandas
will convert problematic elements to ``pd.NaT`` (for datetime and timedelta) or ``np.nan`` (for numeric). This might be
useful if you are reading in data which is mostly of the desired dtype (e.g. numeric, datetime) but occasionally has
non-conforming elements intermixed that you want to represent as missing:

.. ipython:: python
:okwarning:

df3['D'] = '1.'
df3['E'] = '1'
df3.convert_objects(convert_numeric=True).dtypes
import datetime
m = ['apple', datetime.datetime(2016, 3, 2)]
pd.to_datetime(m, errors='coerce')

# same, but specific dtype conversion
df3['D'] = df3['D'].astype('float16')
df3['E'] = df3['E'].astype('int32')
df3.dtypes
m = ['apple', 2, 3]
pd.to_numeric(m, errors='coerce')

m = ['apple', pd.Timedelta('1day')]
pd.to_timedelta(m, errors='coerce')

To force conversion to ``datetime64[ns]``, pass ``convert_dates='coerce'``.
This will convert any datetime-like object to dates, forcing other values to ``NaT``.
This might be useful if you are reading in data which is mostly dates,
but occasionally has non-dates intermixed and you want to represent as missing.
The ``errors`` parameter has a third option of ``errors='ignore'``, which will simply return the passed in data if it
encounters any errors with the conversion to a desired data type:

.. ipython:: python

import datetime
s = pd.Series([datetime.datetime(2001,1,1,0,0),
'foo', 1.0, 1, pd.Timestamp('20010104'),
'20010105'], dtype='O')
s
pd.to_datetime(s, errors='coerce')
import datetime
m = ['apple', datetime.datetime(2016, 3, 2)]
pd.to_datetime(m, errors='ignore')

m = ['apple', 2, 3]
pd.to_numeric(m, errors='ignore')

m = ['apple', pd.Timedelta('1day')]
pd.to_timedelta(m, errors='ignore')

In addition to object conversion, :meth:`~pandas.to_numeric` provides another argument ``downcast``, which gives the
option of downcasting the newly (or already) numeric data to a smaller dtype, which can conserve memory:

.. ipython:: python

m = ['1', 2, 3]
pd.to_numeric(m, downcast='integer') # smallest signed int dtype
pd.to_numeric(m, downcast='signed') # same as 'integer'
pd.to_numeric(m, downcast='unsigned') # smallest unsigned int dtype
pd.to_numeric(m, downcast='float') # smallest float dtype

As these methods apply only to one-dimensional arrays, lists, or scalars, they cannot be used directly on multi-dimensional objects such
as DataFrames. However, with :meth:`~pandas.DataFrame.apply`, we can "apply" the function over each column efficiently:

In addition, :meth:`~DataFrame.convert_objects` will attempt the *soft* conversion of any *object* dtypes, meaning that if all
the objects in a Series are of the same type, the Series will have that dtype.
.. ipython:: python

import datetime
df = pd.DataFrame([['2016-07-09', datetime.datetime(2016, 3, 2)]] * 2, dtype='O')
df
df.apply(pd.to_datetime)

df = pd.DataFrame([['1.1', 2, 3]] * 2, dtype='O')
df
df.apply(pd.to_numeric)

df = pd.DataFrame([['5us', pd.Timedelta('1day')]] * 2, dtype='O')
df
df.apply(pd.to_timedelta)

gotchas
~~~~~~~
2 changes: 1 addition & 1 deletion doc/source/categorical.rst
@@ -653,7 +653,7 @@ The same applies to ``df.append(df_different)``.
Unioning
~~~~~~~~

.. versionadded:: 0.18.2
.. versionadded:: 0.19.0

If you want to combine categoricals that do not necessarily have
the same categories, the `union_categoricals` function will
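The paragraph above is cut off by the diff viewer; a short sketch of what it introduces (the import path shown is the one in current pandas — the 0.19-era location differed — so treat it as an assumption for this tree):

```python
import pandas as pd
from pandas.api.types import union_categoricals

a = pd.Categorical(["b", "c"])
b = pd.Categorical(["a", "b"])

# Categories that differ between the inputs are combined, ordered by
# first appearance across the inputs.
combined = union_categoricals([a, b])
print(list(combined.categories))  # ['b', 'c', 'a']
print(list(combined))             # ['b', 'c', 'a', 'b']
```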
2 changes: 1 addition & 1 deletion doc/source/merging.rst
@@ -1133,7 +1133,7 @@ fill/interpolate missing data:
Merging AsOf
~~~~~~~~~~~~

.. versionadded:: 0.18.2
.. versionadded:: 0.19.0

A :func:`merge_asof` is similar to an ordered left-join except that we match on nearest key rather than equal keys. For each row in the ``left`` DataFrame, we select the last row in the ``right`` DataFrame whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key.
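The matching rule described above can be sketched with a small example (the column names and values are illustrative, not from the PR):

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 5, 10], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'a': [1, 2, 3, 6, 7],
                      'right_val': [1, 2, 3, 6, 7]})

# For each row of `left`, match the last row of `right` whose 'a' key
# is less than or equal to the left key; both frames are sorted on 'a'.
result = pd.merge_asof(left, right, on='a')
print(result['right_val'].tolist())  # [1, 3, 7]
```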

2 changes: 1 addition & 1 deletion doc/source/text.rst
@@ -316,7 +316,7 @@ then ``extractall(pat).xs(0, level='match')`` gives the same result as
``Index`` also supports ``.str.extractall``. It returns a ``DataFrame`` which has the
same result as a ``Series.str.extractall`` with a default index (starts from 0).

.. versionadded:: 0.18.2
.. versionadded:: 0.19.0

.. ipython:: python

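The ipython block above is truncated by the diff viewer; a sketch of ``.str.extractall`` on an ``Index`` (the input strings are assumed for illustration):

```python
import pandas as pd

idx = pd.Index(["a1a2", "b1", "c1"])

# On an Index, extractall returns a DataFrame whose rows are indexed by
# (subject position, match number); "c1" has no match and is dropped.
result = idx.str.extractall(r"[ab](\d)")
print(result[0].tolist())  # ['1', '2', '1']
```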
2 changes: 1 addition & 1 deletion doc/source/whatsnew.rst
@@ -18,7 +18,7 @@ What's New

These are new features and improvements of note in each release.

.. include:: whatsnew/v0.18.2.txt
.. include:: whatsnew/v0.19.0.txt

.. include:: whatsnew/v0.18.1.txt
