DOC: Add info on dtype strings #30590
Comments
3rd party libraries can register dtype aliases as well, so we can’t provide a comprehensive list. Though we can document the ones we provide. Wrt adding Boolean, I’d maybe prefer going the other way. Adding integer, integer32, etc. that makes us consistently the lowercase spelled out form. |
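As a minimal sketch of what registering an alias looks like from a third-party library's side, the snippet below uses pandas' public `register_extension_dtype` decorator. `DemoDtype` and its alias `'demo_unit'` are hypothetical names invented for illustration, and the dtype is deliberately not backed by a working array.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype, register_extension_dtype


@register_extension_dtype
class DemoDtype(ExtensionDtype):
    """Hypothetical third-party dtype; only the string alias matters here."""

    name = "demo_unit"  # the string alias pandas will recognise
    type = object       # scalar type of the elements (placeholder)

    @classmethod
    def construct_array_type(cls):
        # A real library would return its ExtensionArray subclass here.
        raise NotImplementedError("illustration only")


# Once registered, the alias resolves like any built-in one.
resolved = pd.api.types.pandas_dtype("demo_unit")
print(type(resolved).__name__)
```

Because any installed library can do this at import time, the set of valid strings depends on what has been imported, which is why only the aliases pandas itself provides can be listed.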
Does Sparse[int, 1] actually work? I thought we didn’t parse the fill value. |
Yes, that is my proposal.
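Only issue there is that int, int32, etc., are the numpy dtypes, so that could create confusion. That's why I suggested using the upper case ones. |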
There is also bool -> boolean, str -> string.
We can consistently use the spelled out version, or the capitalized
version. My preference is for spelled out.
…On Wed, Jan 1, 2020 at 11:17 AM Irv Lustig ***@***.***> wrote:
3rd party libraries can register dtype aliases as well, so we can’t provide
a comprehensive list. Though we can document the ones we provide.
Yes, that is my proposal.
Wrt adding Boolean, I’d maybe prefer going the other way. Adding integer,
integer32, etc. that makes us consistently the lowercase spelled out form.
Only issue there is that int, int32, etc., are the numpy dtypes, so that
could create confusion. That's why I suggested using the upper case ones.
|
Seems like @jorisvandenbossche wrote that docstring. Will wait for him to see if the internal docs there should be updated. |
So does that mean we change from Or allow both but only document the spelled out lowercase versions? |
Both I would think. The ones I'm not sure about is UInt. Is |
Back to the original issue, adding an alias column to that table is a great idea. We also have https://dev.pandas.io/docs/reference/arrays.html.
We don't allow specifying the fill value for SparseDtype, just the dtype:
In [53]: pd.api.types.pandas_dtype("Sparse[int]")
Out[53]: Sparse[int64, 0]
In [54]: pd.api.types.pandas_dtype("Sparse[int, 1]")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-54-8824dda6d2f2> in <module>
----> 1 pd.api.types.pandas_dtype("Sparse[int, 1]")
~/sandbox/pandas/pandas/core/dtypes/common.py in pandas_dtype(dtype)
1870 # raise a consistent TypeError if failed
1871 try:
-> 1872 npdtype = np.dtype(dtype)
1873 except SyntaxError:
1874 # np.dtype uses `eval` which can raise SyntaxError
TypeError: data type "Sparse[int, 1]" not understood |
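The session above can be reproduced as a small script. This is a sketch assuming a pandas version that behaves like the one in the traceback (the ones I am aware of do); the variable names are illustrative. It also shows the explicit `pd.SparseDtype` constructor as the way to get a non-default fill value.

```python
import numpy as np
import pandas as pd

# The subtype is parsed from the string; the fill value defaults to 0 for ints.
default_fill = pd.api.types.pandas_dtype("Sparse[int]")
print(default_fill)

# A non-default fill value in the string is expected to be rejected.
try:
    pd.api.types.pandas_dtype("Sparse[int, 1]")
    parsed_fill_value = True
except TypeError as err:
    parsed_fill_value = False
    print("rejected:", err)

# To get a non-default fill value, build the dtype explicitly instead.
explicit = pd.SparseDtype(np.int64, fill_value=1)
print(explicit, explicit.fill_value)
```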
So I will add the docs using the current implementation and postpone making changes with respect to the names.
Yes, but
|
Hmm I wouldn't document that. I don't know if it's deliberate. |
Problem description
I've been studying the new string, boolean and Intxx dtypes and think it would be worthwhile to add something about the strings that you are allowed to use with extension arrays in specifying the dtypes. It could be an additional column in the dtypes table here: https://dev.pandas.io/docs/getting_started/basics.html#dtypes
I think the following table is correct:
| Data type | Array | String aliases |
|---|---|---|
| DatetimeTZDtype | DatetimeArray | 'datetime64[ns, <tz>]' |
| CategoricalDtype | Categorical | 'category' |
| PeriodDtype | PeriodArray | 'period[<freq>]' or 'Period[<freq>]' |
| SparseDtype | SparseArray | 'Sparse', 'Sparse[int]', 'Sparse[int32, 0]', 'Sparse[int64, 0]', 'Sparse[float64, nan]', 'Sparse[float32, nan]' |
| IntervalDtype | IntervalArray | 'interval', 'Interval', 'Interval[<np.numeric>]', 'Interval[datetime64[ns, <tz>]]', 'Interval[timedelta64[<freq>]]' |
| Int64Dtype (and others) | IntegerArray | 'Int8', 'Int16', 'Int32', 'Int64', 'UInt8', 'UInt16', 'UInt32', 'UInt64' |
| StringDtype | StringArray | 'string' |
| BooleanDtype | BooleanArray | 'boolean' |
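One way to sanity-check a table like this is to resolve each alias with `pd.api.types.pandas_dtype` and confirm which dtype class comes back. The sketch below covers one concrete alias per row; the `<tz>` and `<freq>` placeholders are filled with `UTC` and `D` purely as example values.

```python
import pandas as pd

# One concrete alias per dtype class from the table above.
aliases = {
    "datetime64[ns, UTC]": pd.DatetimeTZDtype,
    "category": pd.CategoricalDtype,
    "period[D]": pd.PeriodDtype,
    "Sparse[float64, nan]": pd.SparseDtype,
    "interval": pd.IntervalDtype,
    "Int64": pd.Int64Dtype,
    "UInt8": pd.UInt8Dtype,
    "string": pd.StringDtype,
    "boolean": pd.BooleanDtype,
}

# Resolve every alias string to a dtype instance.
resolved = {alias: pd.api.types.pandas_dtype(alias) for alias in aliases}

for alias, dtype in resolved.items():
    print(f"{alias!r:>24} -> {type(dtype).__name__}")
```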
I also think we may want to make it clear that if you specify a string not in that table, it needs to be a string acceptable as a numpy dtype.
If people like @TomAugspurger and @jorisvandenbossche think this is useful, I'll add a column to that table in the docs (or maybe have to use a separate table because of the length of the last column above).
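The numpy fallback mentioned above can be seen by comparing two strings that differ only in case. A small sketch (the string `"not-a-dtype"` is just an arbitrary invalid example): `"int32"` is not a pandas extension alias, so it ends up as a numpy dtype, while `"Int32"` resolves to the nullable extension dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# Lowercase: not an extension alias, so it is handed to numpy.
numpy_backed = pd.api.types.pandas_dtype("int32")
# Capitalized: matches the pandas nullable integer extension dtype.
nullable = pd.api.types.pandas_dtype("Int32")
print(repr(numpy_backed), repr(nullable))

# A string that neither pandas nor numpy understands raises TypeError.
try:
    pd.api.types.pandas_dtype("not-a-dtype")
    unknown_accepted = True
except TypeError:
    unknown_accepted = False
print("unknown string accepted:", unknown_accepted)
```

This is also the source of the confusion raised in the comments: the lowercase integer names are already taken by numpy.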
Also, should we consider allowing 'Boolean' and 'String' and 'Category', i.e. type names with a leading capital letter? We're inconsistent in terms of what case is allowed in different places for the strings representing dtypes (see period/Period and interval/Interval).
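To see the inconsistency on a given installation, a quick probe like the one below reports which spellings are accepted. `accepts` is a hypothetical helper written for this sketch, and the results for the capitalized forms of boolean, string and category may differ between pandas and numpy versions, so no output is asserted for them here.

```python
import pandas as pd


def accepts(alias):
    """Return True if pandas can resolve the string to a dtype."""
    try:
        pd.api.types.pandas_dtype(alias)
    except (TypeError, ValueError):
        return False
    return True


candidates = [
    "period[D]", "Period[D]",
    "interval", "Interval",
    "boolean", "Boolean",
    "string", "String",
    "category", "Category",
]

results = {alias: accepts(alias) for alias in candidates}
for alias, ok in results.items():
    print(f"{alias!r:>12}: {'accepted' if ok else 'rejected'}")
```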