ENH: accept dict of column:dtype as dtype argument in DataFrame.astype #12086
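What this adds, in brief: DataFrame.astype accepts a dict mapping column names to dtypes, so individual columns can be cast without touching the rest. A minimal sketch of the intended usage (the frame and dtypes below are illustrative, not taken from the PR itself):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3],
                       'b': [4.0, 5.0, 6.0],
                       'c': ['x', 'y', 'z']})

    # Cast only column 'a'; 'b' and 'c' keep their existing dtypes.
    df = df.astype({'a': np.uint8})
    df.dtypes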

Closed
Changes from all commits

97 commits

0aeee8d
ENH: inplace dtype changes, df per-column dtype changes; GH7271
StephenKappel May 8, 2016
58dd71b
ENH: NDFrame astype() now accepts inplace arg and dtype arg can be a …
StephenKappel May 10, 2016
43989fd
DOC: xref #13112, add back lexsorting example
jreback May 10, 2016
f0e47a9
COMPAT: boto import issues
jreback May 11, 2016
d0734ba
BUG: Added checks for NaN in __call__ of EngFormatter
yaduart May 11, 2016
2a99394
TST: fix assert_categorical_equal message
sinhrks May 11, 2016
4aa6323
BUG: Series ops with object dtype may incorrectly fail
sinhrks May 3, 2016
4de83d2
PERF: quantile now operates per block boosting perf / fix quantile wi…
jreback May 12, 2016
c9ffd78
DOC: Fix delim_whitespace regex typo.
dsm054 May 13, 2016
e5c18b4
BUG: Correct KeyError from matplotlib when processing Series yerr
gliptak May 13, 2016
00d4ec3
BUG: Misc fixes for SparseSeries indexing with MI
sinhrks May 13, 2016
82f54bd
ENH/BUG: str.extractall doesn't support index
sinhrks May 13, 2016
01dd111
DOC: Fix additional join examples in "10 Minutes to pandas" #13029
Xndr7 May 13, 2016
feee089
BUG: Bug in .groupby(..).resample(..) when the same object is called …
jreback May 14, 2016
b385799
DOC: Clarify Categorical Crosstab Behaviour
gfyoung May 14, 2016
2de2884
BUG: GH12896 where extra elements are returned in MultiIndex slicing
kawochen May 14, 2016
f637aa3
TST: Use compatible time zones
neirbowj May 15, 2016
62bed0e
COMPAT: Add Pathlib, py.path support for read_hdf
quintusdias May 16, 2016
4e4a7d9
COMPAT/TST: sparse formatting test for platform, xref #13163
jreback May 16, 2016
62fc481
CLN: no return on init
max-sixty May 17, 2016
20ea406
BUG: fix to_records conflict with unicode_literals #13172
starplanet May 17, 2016
00e0f3e
BUG: Period and Series/Index comparison raises TypeError
sinhrks May 17, 2016
2429ec5
TST: change test comparison to work on older numpies, #13178
jreback May 17, 2016
009d1df
PERF: DataFrame transform
chris-b1 May 18, 2016
86f68e6
BUG: Sparse creation with object dtype may raise TypeError
sinhrks May 18, 2016
4b50149
TST: Test resampling with NaT
May 18, 2016
eeccd05
BUG: Fix #13213 json_normalize() and non-ascii characters in keys
May 19, 2016
070e877
BUG: Fix argument order in call to super
eddiejessup May 19, 2016
2a120cf
DOC: add v0.19.0 whatsnew doc
jreback May 19, 2016
fecb2ca
COMPAT: Further Expand Compatibility with fromnumeric.py
gfyoung May 20, 2016
123f2ee
BUG: Bug in .to_datetime() when passing integers or floats, no unit a…
jreback May 20, 2016
cc25040
BUG: GH12824 fixed apply() returns different result depending on whet…
adneu May 20, 2016
72164a8
API/COMPAT: add pydatetime-style positional args to Timestamp constru…
thejohnfreeman May 20, 2016
9d44e63
BUG: mpl fix to AutoDateFormatter to fix second/us-second formatters
tacaswell May 10, 2016
8e2f70b
TST: xref #13183, for windows compat
jreback May 20, 2016
f5c24d2
Reverse numpy compat changes to tslib.pyx
gfyoung May 21, 2016
d2b5819
BUG: Empty PeriodIndex issues
max-sixty May 21, 2016
6f90340
API: Use np.random's RandomState when seed is None in .sample
May 21, 2016
82bdc1d
TST: check internal Categorical
sinhrks May 21, 2016
b88eb35
TST/ERR: Add Period ops tests / fix error message
sinhrks May 22, 2016
19ebee5
ENH: support decimal option in PythonParser #12933
May 22, 2016
f8a11dd
ERR: Correct ValueError invalid type promotion exception
gliptak May 23, 2016
afde718
BUG: Fix #13149 and ENH: 'copy' param in Index.astype()
pijucha May 23, 2016
9a6ce07
BUG, ENH: Add support for parsing duplicate columns
gfyoung May 23, 2016
8662cb9
TST: assert_dict_equal to check input type
sinhrks May 24, 2016
75714de
BUG: remove_unused_categories dtype coerces to int64
sinhrks May 24, 2016
69ad08b
BUG: Bug in selection from a HDFStore with a fixed format and start a…
jreback May 24, 2016
e0a2e3b
DOC: fixed typos in GroupBy document
mortada May 24, 2016
b638f18
BUG: Properly validate and parse nrows in read_csv
gfyoung May 25, 2016
8749273
BUG: Fix for resampler for grouping kwarg bug
roycoding May 25, 2016
da5fc17
BUG, ENH: Improve infinity parsing for read_csv
gfyoung May 25, 2016
b4e2d34
TST: Remove imp and just use importlib to avoid memory error when sho…
nparley May 25, 2016
f2ce0ac
ERR: error in datetime conversion with non-convertibles
gliptak May 26, 2016
57ea76f
DOC: Improved documentation for DataFrame.join
edublancas May 26, 2016
9662d91
TST/CLN: remove np.assert_equal
sinhrks May 26, 2016
a67ac2a
COMPAT: extension dtypes (DatetimeTZ, Categorical) are now Singleton …
jreback May 25, 2016
5d67720
DOC: Added an example of pitfalls when using astype
pfrcks May 26, 2016
456dcae
TST: skip Fred / YahooOptions tests
jreback May 26, 2016
db43824
TST: split up test_merge
jreback May 26, 2016
40b4bb4
TST: reorg datetime with tz tests a bit
jreback May 26, 2016
4b05055
DOC: low_memory in read_csv
chris-b1 May 26, 2016
0f1666d
ENH: support decimal argument in read_html #12907
ccronca May 27, 2016
e8d9e79
BUG: preserve join keys dtype
jreback May 27, 2016
ae2ca83
COMPAT: windows test compat for merge, xref #13170
jreback May 27, 2016
c2ea8fb
TST: Make numpy_array test strict
sinhrks May 28, 2016
af4ed0f
DOC: remove references to deprecated numpy negation method
mortada May 28, 2016
70be8a9
DOC: Fix read_stata docstring
sinhrks May 29, 2016
721be62
BUG: Check for NaN after data conversion to numeric
gfyoung May 30, 2016
ed4cd3a
TST: Parser tests refactoring
gfyoung May 30, 2016
cc1025a
COMPAT: do not upcast results to float64 when float32 scalar *+/- flo…
jennolsen84 May 30, 2016
d6f814c
TST: remove tests_tseries.py and distribute to other tests files
jreback May 30, 2016
9e7bfdd
BLD: increase clone depth
jreback May 30, 2016
c0850ea
ENH: add support for na_filter in Python engine
gfyoung May 31, 2016
352ae44
TST: more strict testing in lint.sh
jreback May 31, 2016
132c1c5
BUG: Fix describe(): percentiles (#13104), col index (#13288)
pijucha May 31, 2016
d191640
ENH: Respect Key Ordering for OrderedDict List in DataFrame Init
gfyoung May 31, 2016
f3d7c18
BUG: Fix maybe_convert_numeric for unhashable objects
May 31, 2016
8bbd2bc
ENH: Series has gained the properties .is_monotonic*
jreback May 31, 2016
2e3c82e
TST: computation/test_eval.py tests (slow)
jreback May 31, 2016
45bab82
BUG: Parse trailing NaN values for the Python parser
gfyoung Jun 1, 2016
fcd73ad
BUG: GH13219 Fixed. Allow unicode values in usecols
hassanshamim May 19, 2016
99e78da
DOC: fix comment on previous versions cythonmagic
jorisvandenbossche Jun 2, 2016
ce56542
Fix #13306: Hour overflow in tz-aware datetime conversions.
uwedeportivo Jun 2, 2016
0c6226c
ENH: Add support for compact_ints and use_unsigned in Python engine
gfyoung Jun 2, 2016
2061e9e
BUG: Fix series comparison operators when dealing with zero rank nump…
gliptak Jun 3, 2016
103f7d3
DOC: Add example usage to DataFrame.filter
cswarth Jun 3, 2016
faf9b7d
DOC: Fixed a minor typo
Jun 5, 2016
eca7891
DOC: document doublequote in read_csv
gfyoung Jun 5, 2016
863cbc5
DEPR, DOC: Deprecate buffer_lines in read_csv
gfyoung Jun 5, 2016
5a9b498
BUG: Make pd.read_hdf('data.h5') work when pandas object stored conta…
chrish42 Jun 5, 2016
e90d411
DOC: remove obsolete cron job script (#13369)
Jun 5, 2016
b722222
CLN: remove old skiplist code
jreback Jun 5, 2016
3600bca
ENH: incorporate PR feedback; GH7271
StephenKappel Jun 5, 2016
29ecec0
ENH: inplace dtype changes, df per-column dtype changes; GH7271
StephenKappel May 8, 2016
95a029b
ENH: NDFrame astype() now accepts inplace arg and dtype arg can be a …
StephenKappel May 10, 2016
9d8e1b5
ENH: incorporate PR feedback; GH7271
StephenKappel Jun 5, 2016
c960523
resolve merge conflict in rebasing of 7271-df-astype-dict
StephenKappel Jun 5, 2016
2 changes: 1 addition & 1 deletion .travis.yml
@@ -14,7 +14,7 @@ env:

git:
  # for cloning
-  depth: 300
+  depth: 500

matrix:
  fast_finish: true
13 changes: 12 additions & 1 deletion asv_bench/benchmarks/frame_methods.py
@@ -423,7 +423,7 @@ class frame_get_dtype_counts(object):
    goal_time = 0.2

    def setup(self):
-        self.df = pandas.DataFrame(np.random.randn(10, 10000))
+        self.df = DataFrame(np.random.randn(10, 10000))

    def time_frame_get_dtype_counts(self):
        self.df.get_dtype_counts()
@@ -985,3 +985,14 @@ def setup(self):

    def time_series_string_vector_slice(self):
        self.s.str[:5]
+
+
+class frame_quantile_axis1(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.df = DataFrame(np.random.randn(1000, 3),
+                            columns=list('ABC'))
+
+    def time_frame_quantile_axis1(self):
+        self.df.quantile([0.1, 0.5], axis=1)
15 changes: 15 additions & 0 deletions asv_bench/benchmarks/groupby.py
@@ -773,6 +773,21 @@ def setup(self):
    def time_groupby_transform_series2(self):
        self.df.groupby('id')['val'].transform(np.mean)
+
+
+class groupby_transform_dataframe(object):
+    # GH 12737
+    goal_time = 0.2
+
+    def setup(self):
+        self.df = pd.DataFrame({'group': np.repeat(np.arange(1000), 10),
+                                'B': np.nan,
+                                'C': np.nan})
+        self.df.ix[4::10, 'B':'C'] = 5
+
+    def time_groupby_transform_dataframe(self):
+        self.df.groupby('group').transform('first')


class groupby_transform_cythonized(object):
    goal_time = 0.2
60 changes: 56 additions & 4 deletions asv_bench/benchmarks/parser_vb.py
@@ -23,18 +23,42 @@ class read_csv_default_converter(object):
    goal_time = 0.2

    def setup(self):
-        self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
+        self.data = """0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n
+0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n
+0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n
+0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n
+0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n"""
        self.data = (self.data * 200)

    def time_read_csv_default_converter(self):
        read_csv(StringIO(self.data), sep=',', header=None, float_precision=None)
+
+
+class read_csv_default_converter_with_decimal(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.data = """0,1213700904466425978256438611;0,0525708283766902484401839501;0,4174092731488769913994474336\n
+0,4096341697147408700274695547;0,1587830198973579909349496119;0,1292545832485494372576795285\n
+0,8323255650024565799327547210;0,9694902427379478160318626578;0,6295047811546814475747169126\n
+0,4679375305798131323697930383;0,2963942381834381301075609371;0,5268936082160610157032465394\n
+0,6685382761849776311890991564;0,6721207066140679753374342908;0,6519975277021627935170045020\n"""
+        self.data = (self.data * 200)
+
+    def time_read_csv_default_converter_with_decimal(self):
+        read_csv(StringIO(self.data), sep=';', header=None,
+                 float_precision=None, decimal=',')


class read_csv_precise_converter(object):
    goal_time = 0.2

    def setup(self):
-        self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
+        self.data = """0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n
+0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n
+0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n
+0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n
+0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n"""
        self.data = (self.data * 200)

    def time_read_csv_precise_converter(self):
@@ -45,7 +69,11 @@ class read_csv_roundtrip_converter(object):
    goal_time = 0.2

    def setup(self):
-        self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
+        self.data = """0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n
+0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n
+0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n
+0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n
+0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n"""
        self.data = (self.data * 200)

    def time_read_csv_roundtrip_converter(self):
@@ -109,4 +137,28 @@ def setup(self):
        self.data = (self.data * 200)

    def time_read_table_multiple_date_baseline(self):
-        read_table(StringIO(self.data), sep=',', header=None, parse_dates=[1])
+        read_table(StringIO(self.data), sep=',', header=None, parse_dates=[1])
+
+
+class read_csv_default_converter_python_engine(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.data = '0.1213700904466425978256438611,0.0525708283766902484401839501,0.4174092731488769913994474336\n 0.4096341697147408700274695547,0.1587830198973579909349496119,0.1292545832485494372576795285\n 0.8323255650024565799327547210,0.9694902427379478160318626578,0.6295047811546814475747169126\n 0.4679375305798131323697930383,0.2963942381834381301075609371,0.5268936082160610157032465394\n 0.6685382761849776311890991564,0.6721207066140679753374342908,0.6519975277021627935170045020\n '
+        self.data = (self.data * 200)
+
+    def time_read_csv_default_converter(self):
+        read_csv(StringIO(self.data), sep=',', header=None,
+                 float_precision=None, engine='python')
+
+
+class read_csv_default_converter_with_decimal_python_engine(object):
+    goal_time = 0.2
+
+    def setup(self):
+        self.data = '0,1213700904466425978256438611;0,0525708283766902484401839501;0,4174092731488769913994474336\n 0,4096341697147408700274695547;0,1587830198973579909349496119;0,1292545832485494372576795285\n 0,8323255650024565799327547210;0,9694902427379478160318626578;0,6295047811546814475747169126\n 0,4679375305798131323697930383;0,2963942381834381301075609371;0,5268936082160610157032465394\n 0,6685382761849776311890991564;0,6721207066140679753374342908;0,6519975277021627935170045020\n '
+        self.data = (self.data * 200)
+
+    def time_read_csv_default_converter_with_decimal(self):
+        read_csv(StringIO(self.data), sep=';', header=None,
+                 float_precision=None, decimal=',', engine='python')
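
For context on the option these benchmarks exercise: read_csv's decimal keyword tells the parser to treat a character other than '.' as the decimal point, and engine='python' routes through the Python parser this commit series extends. A small self-contained sketch under those assumptions (the data values are illustrative):

    from io import StringIO
    import pandas as pd

    # ';'-separated fields that use ',' as the decimal mark
    data = '0,1213;0,0525\n0,4096;0,1587\n'
    df = pd.read_csv(StringIO(data), sep=';', header=None,
                     decimal=',', engine='python')
    df  # both columns parse as float64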
99 changes: 0 additions & 99 deletions ci/cron/go_doc.sh

This file was deleted.

10 changes: 10 additions & 0 deletions ci/lint.sh
@@ -15,7 +15,17 @@ if [ "$LINT" ]; then
        if [ $? -ne "0" ]; then
            RET=1
        fi
+
    done
    echo "Linting DONE"
+
+    echo "Check for invalid testing"
+    grep -r -E --include '*.py' --exclude nosetester.py --exclude testing.py '(numpy|np)\.testing' pandas
+    if [ $? = "0" ]; then
+        RET=1
+    fi
+    echo "Check for invalid testing DONE"
+
else
    echo "NOT Linting"
fi
2 changes: 1 addition & 1 deletion ci/requirements-3.4.run
@@ -1,4 +1,4 @@
-pytz
+pytz=2015.7
numpy=1.8.1
openpyxl
xlsxwriter
3 changes: 0 additions & 3 deletions codecov.yml
@@ -7,6 +7,3 @@ coverage:
    default:
      target: '50'
  branches: null
-changes:
-  default:
-    branches: null
2 changes: 1 addition & 1 deletion doc/README.rst
@@ -160,7 +160,7 @@ and `Good as first PR
<https://github.com/pydata/pandas/issues?labels=Good+as+first+PR&sort=updated&state=open>`_
where you could start out.

-Or maybe you have an idea of you own, by using pandas, looking for something
+Or maybe you have an idea of your own, by using pandas, looking for something
in the documentation and thinking 'this can be improved', let's do something
about that!
11 changes: 11 additions & 0 deletions doc/source/10min.rst
@@ -483,6 +483,17 @@ SQL style merges. See the :ref:`Database style joining <merging.join>`
   right
   pd.merge(left, right, on='key')

+Another example that can be given is:
+
+.. ipython:: python
+
+   left = pd.DataFrame({'key': ['foo', 'bar'], 'lval': [1, 2]})
+   right = pd.DataFrame({'key': ['foo', 'bar'], 'rval': [4, 5]})
+   left
+   right
+   pd.merge(left, right, on='key')
+
+
Append
~~~~~~
13 changes: 13 additions & 0 deletions doc/source/advanced.rst
@@ -528,6 +528,13 @@ return a copy of the data rather than a view:
      jim joe
   1  z    0.64094

+Furthermore, if you try to index something that is not fully lexsorted, this can raise:
+
+.. code-block:: ipython
+
+   In [5]: dfm.loc[(0,'y'):(1, 'z')]
+   KeyError: 'Key length (2) was greater than MultiIndex lexsort depth (1)'
+
The ``is_lexsorted()`` method on an ``Index`` shows if the index is sorted, and the ``lexsort_depth`` property returns the sort depth:

.. ipython:: python
@@ -542,6 +549,12 @@
   dfm.index.is_lexsorted()
   dfm.index.lexsort_depth

+And now selection works as expected.
+
+.. ipython:: python
+
+   dfm.loc[(0,'y'):(1, 'z')]
+
Take Methods
------------
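
The second hunk elides the lines between the failing slice and the one that works; a hypothetical reconstruction of that step (assuming, per the surrounding text, that sorting the index is what restores full lexsorting):

    import numpy as np
    import pandas as pd

    # Hypothetical setup mirroring the doc's dfm example
    dfm = pd.DataFrame({'jim': [0, 0, 1, 1],
                        'joe': ['x', 'x', 'z', 'y'],
                        'jolie': np.random.rand(4)}).set_index(['jim', 'joe'])

    dfm = dfm.sort_index()        # lexsort_depth goes from 1 to 2
    dfm.loc[(0, 'y'):(1, 'z')]    # the slice no longer raises KeyError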
4 changes: 4 additions & 0 deletions doc/source/api.rst
@@ -354,6 +354,9 @@ Computations / Descriptive Stats
   Series.unique
   Series.nunique
   Series.is_unique
+   Series.is_monotonic
+   Series.is_monotonic_increasing
+   Series.is_monotonic_decreasing
   Series.value_counts

Reindexing / Selection / Label manipulation
@@ -1333,6 +1336,7 @@ Modifying and Computations
   Index.max
   Index.reindex
   Index.repeat
+   Index.where
   Index.take
   Index.putmask
   Index.set_names
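
Two of the listed additions are new in this release line; a short sketch of their behavior (assuming the 0.19-era semantics: the monotonic checks are non-strict, and Index.where fills non-matching positions with NaN by default):

    import pandas as pd

    s = pd.Series([1, 2, 2, 3])
    s.is_monotonic_increasing                      # True: never decreases (ties allowed)
    pd.Series([3, 2, 1]).is_monotonic_decreasing   # True

    idx = pd.Index(['a', 'b', 'c'])
    idx.where(idx != 'b')                          # the 'b' position becomes NaN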
22 changes: 22 additions & 0 deletions doc/source/basics.rst
@@ -1726,6 +1726,28 @@ then the more *general* one will be used as the result of the operation.
   # conversion of dtypes
   df3.astype('float32').dtypes

+Convert a subset of columns to a specified type using :meth:`~DataFrame.astype`
+
+.. ipython:: python
+
+   dft = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [7, 8, 9]})
+   dft[['a','b']] = dft[['a','b']].astype(np.uint8)
+   dft
+   dft.dtypes
+
+.. note::
+
+   When trying to convert a subset of columns to a specified type using :meth:`~DataFrame.astype` and :meth:`~DataFrame.loc`, upcasting occurs.
+
+   :meth:`~DataFrame.loc` tries to fit in what we are assigning to the current dtypes, while ``[]`` will overwrite them taking the dtype from the right hand side. Therefore the following piece of code produces the unintended result.
+
+   .. ipython:: python
+
+      dft = pd.DataFrame({'a': [1,2,3], 'b': [4,5,6], 'c': [7, 8, 9]})
+      dft.loc[:, ['a', 'b']].astype(np.uint8).dtypes
+      dft.loc[:, ['a', 'b']] = dft.loc[:, ['a', 'b']].astype(np.uint8)
+      dft.dtypes
+
object conversion
~~~~~~~~~~~~~~~~~
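
This is also where the dict form added by this PR naturally applies; a hedged sketch of the same subset cast written with it (the placement and wording here are illustrative, not part of this diff):

    import numpy as np
    import pandas as pd

    dft = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9]})
    # Only the keyed columns are cast; 'c' keeps its original dtype.
    dft = dft.astype({'a': np.uint8, 'b': np.uint8})
    dft.dtypes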
2 changes: 1 addition & 1 deletion doc/source/contributing.rst
@@ -21,7 +21,7 @@ and `Difficulty Novice
<https://github.com/pydata/pandas/issues?q=is%3Aopen+is%3Aissue+label%3A%22Difficulty+Novice%22>`_
where you could start out.

-Or maybe through using *pandas* you have an idea of you own or are looking for something
+Or maybe through using *pandas* you have an idea of your own or are looking for something
in the documentation and thinking 'this can be improved'...you can do something
about it!
2 changes: 1 addition & 1 deletion doc/source/enhancingperf.rst
@@ -95,7 +95,7 @@ Plain cython
~~~~~~~~~~~~

First we're going to need to import the cython magic function to ipython (for
-cython versions >=0.21 you can use ``%load_ext Cython``):
+cython versions < 0.21 you can use ``%load_ext cythonmagic``):

.. ipython:: python
   :okwarning: