BUG: should astype only TRY to convert string columns? #2718

Closed
jreback opened this issue Jan 21, 2013 · 4 comments
Comments

@jreback
Contributor

jreback commented Jan 21, 2013

Should astype TRY to convert string columns (e.g. when the values in the string column look like '5') and, if a column raises an exception, skip it and return it unchanged, rather than raising an exception for the whole operation?

I can fix for 0.10.2

In [4]: df = pd.DataFrame({ 'a' : 'foo', 'b' : 1. },index=np.arange(10))
In [6]: df.dtypes
Out[6]: 
a     object
b    float64

In [8]: df.astype('float64')
/mnt/home/jreback/pandas/pandas/core/internals.pyc in astype(self, dtype)
    613         new_blocks = []
    614         for block in self.blocks:
--> 615             newb = make_block(com._astype_nansafe(block.values, dtype),
    616                               block.items, block.ref_items)
    617             new_blocks.append(newb)

/mnt/home/jreback/pandas/pandas/core/common.pyc in _astype_nansafe(arr, dtype)
   1058         return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
   1059 
-> 1060     return arr.astype(dtype)
   1061 
   1062 

ValueError: could not convert string to float: foo
@changhiskhan
Contributor

I think the default behavior should stay as is, but I see nothing wrong with adding a raise keyword that is True by default and can be set to False, so that errors are passed silently and whatever can be converted is converted.
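
For illustration, here is a minimal sketch of the "convert what can be converted" behavior discussed above. The helper name astype_skip_errors is hypothetical and not part of pandas; it assumes unique column names and simply tries each column on its own, keeping the original column when the conversion raises.

```python
import numpy as np
import pandas as pd


def astype_skip_errors(df, dtype):
    """Hypothetical helper: convert each column to dtype, keeping it unchanged on failure."""
    out = {}
    for name in df.columns:
        try:
            out[name] = df[name].astype(dtype)
        except (ValueError, TypeError):
            # Conversion failed (e.g. 'foo' -> float), so keep the column as it was.
            out[name] = df[name]
    return pd.DataFrame(out, index=df.index)


# Column 'a' cannot be converted, 'c' holds numeric-looking strings.
df = pd.DataFrame({'a': 'foo', 'b': 1., 'c': '5'}, index=np.arange(10))
result = astype_skip_errors(df, 'float64')
print(result.dtypes)
```

With this sketch, 'b' and 'c' end up as float64 while 'a' is returned as it was, instead of the whole call failing with a ValueError.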

@jreback
Contributor Author

jreback commented Jan 21, 2013

OK, next question: what if you astype to a dtype that loses information? I'm not sure what the 'correct' behavior is here.
(This is using the dtypes branch.)

(Pdb) mn['big_float'] = 1234566789.
(Pdb) mn
   a  b  float32  int32   big_float
0  1  2        1      1  1234566789
1  1  2        1      1  1234566789
2  1  2        1      1  1234566789
3  1  2        1      1  1234566789
4  1  2        1      1  1234566789
5  1  2        1      1  1234566789
6  1  2        1      1  1234566789
7  1  2        1      1  1234566789
8  1  2        1      1  1234566789
9  1  2        1      1  1234566789
(Pdb) mn.dtypes
a            float64
b              int64
float32      float32
int32          int32
big_float    float64
(Pdb) mn.astype('uint8')
   a  b  float32  int32  big_float
0  1  2        1      1        133
1  1  2        1      1        133
2  1  2        1      1        133
3  1  2        1      1        133
4  1  2        1      1        133
5  1  2        1      1        133
6  1  2        1      1        133
7  1  2        1      1        133
8  1  2        1      1        133
9  1  2        1      1        133
(Pdb) mn.astype('uint8').dtypes
a            uint8
b            uint8
float32      uint8
int32        uint8
big_float    uint8
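
The 133 in the output above is consistent with 1234566789 modulo 256. As a rough sketch of how such a lossy cast could be detected, the snippet below uses a round-trip comparison. The helper cast_is_lossless is hypothetical (it is not the check that was added to pandas), and the example uses int64 input because the result of casting an out-of-range float to uint8 is platform-dependent in numpy.

```python
import numpy as np


def cast_is_lossless(values, dtype):
    """Hypothetical check: True if casting to dtype and back reproduces the values."""
    arr = np.asarray(values)
    converted = arr.astype(dtype)
    return bool(np.array_equal(converted.astype(arr.dtype), arr))


big = np.array([1234566789], dtype='int64')
small = np.array([1, 2], dtype='int64')

# Integer-to-integer casts keep only the low-order bits, so this wraps modulo 256.
print(big.astype('uint8'))             # [133]
print(cast_is_lossless(big, 'uint8'))    # False
print(cast_is_lossless(small, 'uint8'))  # True
```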

@jreback
Contributor Author

jreback commented Jan 21, 2013

raise_on_error in astype added in #2708

@jreback
Contributor Author

jreback commented Jan 22, 2013

I think I fixed this. astype will now raise on dtypes that change the shape of the block or its itemsize (numpy basically tells us).
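
As a rough illustration of the kind of information numpy exposes about a conversion (not necessarily the check used in the fix), the sketch below compares itemsize between dtypes and asks np.can_cast whether a cast is safe. The variable names are only for the example.

```python
import numpy as np

src = np.dtype('float64')
dst = np.dtype('uint8')

# itemsize is the number of bytes per element; a smaller target cannot hold every source value.
print(src.itemsize, dst.itemsize)  # 8 1
shrinks = dst.itemsize < src.itemsize

# can_cast uses 'safe' casting rules by default: True only if no value can be lost.
safe_narrow = np.can_cast(src, dst)
safe_widen = np.can_cast(np.dtype('int32'), np.dtype('int64'))
print(shrinks, safe_narrow, safe_widen)  # True False True
```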

@jreback jreback closed this as completed Jan 22, 2013