Index with dtype int32 #16404

Closed
Stanpol opened this issue May 20, 2017 · 5 comments · Fixed by #41153
Labels: 32bit, Enhancement, Index, Performance
@Stanpol

Stanpol commented May 20, 2017

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
pd.Index(np.array([0, 1], dtype=np.int32), dtype=np.int32)

Out:
Int64Index([0, 1], dtype='int64')

Problem description

I want to create a DataFrame whose Index has dtype int32, but pd.Index upcasts the data to int64 even when dtype=np.int32 is passed explicitly.
A discussion here: https://stackoverflow.com/questions/44090944/how-to-change-index-dtype-of-pandas-dataframe-to-int32

Expected Output

Index with dtype int32. It will use 4 bytes instead of 8 bytes.
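The 4-byte versus 8-byte difference can be checked on plain NumPy arrays, which is a rough stand-in for the data an integer index holds. This is a minimal sketch; the variable names and the element count are illustrative assumptions, and it measures only the raw buffer, not any extra structures pandas may build on top of an index.

```python
import numpy as np

n = 1000000  # illustrative element count, not taken from the issue

small = np.arange(n, dtype=np.int32)  # 4 bytes per element
large = np.arange(n, dtype=np.int64)  # 8 bytes per element

print(small.nbytes)  # raw buffer size in bytes for int32
print(large.nbytes)  # raw buffer size in bytes for int64
print(large.nbytes // small.nbytes)  # ratio between the two buffers
```

For the same number of labels, the int64 buffer is twice the size of the int32 one.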

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.20.1
pytest: 2.9.2
pip: 8.1.2
setuptools: 34.3.2
Cython: 0.24.1
numpy: 1.12.0
scipy: 0.18.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
feather: None
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented May 20, 2017

This is possible, but would require some non-trivial work. RangeIndex obviates the need for much of this anyway. Furthermore, most indexing requires int64, so you end up upcasting at times. Careful performance testing would be required.

So this would need a community contribution.
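The upcasting concern can be illustrated at the NumPy level. This is a sketch under the assumption that pandas operations end up combining int32 labels with int64 data at some point; the array names are hypothetical and no pandas internals are shown.

```python
import numpy as np

labels32 = np.array([0, 1, 2], dtype=np.int32)      # hypothetical int32 index labels
positions64 = np.array([2, 0, 1], dtype=np.int64)   # hypothetical int64 values

# Combining the two promotes the result to the wider type.
combined = np.concatenate([labels32, positions64])
print(combined.dtype)

# NumPy's promotion rule for the pair of dtypes.
print(np.result_type(np.int32, np.int64))
```

Once an int32 array meets int64 data, the result is int64, so the memory saving is lost for that intermediate.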

@jreback added the Difficulty Advanced, Enhancement, Indexing, and Performance labels on May 20, 2017
@jreback added this to the Someday milestone on May 20, 2017
@toobaz
Member

toobaz commented May 22, 2017

By the way: I think the following should behave as expected:

import numpy as np
import pandas as pd

class Int32Index(pd.Int64Index):
    _default_dtype = np.int32


i = Int32Index(np.array([...], dtype='int32'))

... except that, as suggested by @jreback , unexpected upcastings may happen when doing any non-trivial operation.

@allComputableThings

allComputableThings commented Jan 9, 2018

I don't recommend this. At least in pandas 0.22.0 this doesn't work as expected.
i.sort_values will cut the index in (exactly) half. No idea why.

i = np.arange(0, 600002, dtype=np.int32)
arr = Int32Index(i, name="i")
arr2 = arr.sort_values()
print(arr.shape, arr2.shape)  # reported lengths: 600002 and 300001
assert arr.shape == arr2.shape  # fails: the sorted index is half as long

Seems like sort_values is missing some internal validation (the output len should be the same as the input len)
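The kind of validation suggested here could look like the following. This is a minimal sketch on NumPy arrays, not pandas code: `take_with_length_check` is a hypothetical helper, and the too-short `order` array is a hand-made stand-in for whatever produced the truncated result.

```python
import numpy as np

def take_with_length_check(values, order):
    """Reorder `values` by `order`, rejecting an order of the wrong length."""
    values = np.asarray(values)
    order = np.asarray(order)
    if order.shape[0] != values.shape[0]:
        raise ValueError(
            "sort order has %d entries but input has %d"
            % (order.shape[0], values.shape[0])
        )
    return values.take(order)

values = np.array([3, 1, 2, 0], dtype=np.int32)

# A full-length order passes the check and sorts the data.
good = take_with_length_check(values, np.argsort(values))
print(good)

# A truncated order is refused instead of silently shortening the result.
try:
    take_with_length_check(values, np.array([1, 0]))
except ValueError as exc:
    print(exc)
```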

@jreback
Contributor

jreback commented Jan 9, 2018

@stuz5000 this is totally unsupported.
It would require quite a bit of testing, which is why this is still an open issue.
If you would like to contribute, great!

@toobaz
Member

toobaz commented Jan 11, 2018

> I don't recommend this.

Me neither ;-)

But in case you want to keep experimenting, try

class Int32Index(pd.Int64Index):
    _default_dtype = np.int32

    @property
    def asi8(self):
        return self.values

which fixes the problem you report.
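One plausible reading of why overriding asi8 helps, assuming the inherited asi8 reinterprets the underlying buffer as int64 (an assumption about pandas internals at the time, not something stated in this thread): viewing int32 data as int64 pairs up adjacent elements and halves the length. The sketch below reproduces that effect with NumPy alone.

```python
import numpy as np

values = np.arange(0, 600002, dtype=np.int32)

# Reinterpret the same bytes as int64: every two int32 values become one int64.
reinterpreted = values.view(np.int64)
print(values.shape, reinterpreted.shape)  # (600002,) (300001,)

# An argsort computed on the reinterpreted data has only half as many
# positions, so taking by it yields a result half as long as the input.
order = reinterpreted.argsort()
print(values.take(order).shape)  # (300001,)
```

Returning self.values from asi8, as in the snippet above, would avoid the reinterpretation and keep the length intact.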

@toobaz added the Index label and removed the Indexing label on Jun 28, 2019
@mroeschke added the 32bit label on May 5, 2020
@jreback changed the milestone from Someday to 1.4 on Jul 25, 2021
6 participants