Make pd.Series(index=values) equivalent to pd.Series(index=pd.Index(values))? #18484

Open
toobaz opened this issue Nov 25, 2017 · 14 comments
Labels
Bug Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses

Comments

@toobaz
Member

toobaz commented Nov 25, 2017

Code Sample, a copy-pastable example if possible

In [2]: d = {(0, 1) : 2, (3, 4) : 5}

In [3]: pd.Index(d)
Out[3]: Index([(0, 1), (3, 4)], dtype='object')

In [4]: pd.Index(list(d))
Out[4]: 
MultiIndex(levels=[[0, 3], [1, 4]],
           labels=[[0, 1], [0, 1]])

In [5]: pd.Series(d).index
Out[5]: 
MultiIndex(levels=[[0, 3], [1, 4]],
           labels=[[0, 1], [0, 1]])

In [6]: pd.Series(index=list(d)).index
Out[6]: Index([(0, 1), (3, 4)], dtype='object')

Problem description

I guess Out[3]: and Out[6]: are wrong.

Expected Output

Out[4]:

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.22.0.dev0+202.g97bd66ea8
pytest: 3.0.6
pip: 9.0.1
setuptools: 33.1.1
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.18.1
pyarrow: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: 0.4.1+dev
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: None
lxml: 3.7.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.8
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@toobaz
Member Author

toobaz commented Nov 25, 2017

This suggests the above is partly intended... but there should be a better fix.

@jreback
Contributor

jreback commented Nov 25, 2017

In [16]: pd.Index(list(d), tupleize_cols=True)
Out[16]: 
MultiIndex(levels=[[0, 3], [1, 4]],
           labels=[[0, 1], [0, 1]])

In [17]: pd.Index(list(d), tupleize_cols=False)
Out[17]: Index([(0, 1), (3, 4)], dtype='object')
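
(Editorial sketch, not part of the original comment.) The two results above hold the same labels; only the container differs. A minimal check, assuming a pandas version in which `tupleize_cols` still defaults to `True`:

```python
import pandas as pd

labels = [(0, 1), (3, 4)]

# Default behaviour (tupleize_cols=True): the tuples become MultiIndex levels.
as_multi = pd.Index(labels)

# tupleize_cols=False: each tuple is kept as a single object label.
as_flat = pd.Index(labels, tupleize_cols=False)

print(type(as_multi).__name__, as_multi.nlevels)
print(type(as_flat).__name__, as_flat.nlevels)

# Iterating either index yields the same tuples.
same_labels = list(as_multi) == list(as_flat)
print(same_labels)
```

So the keyword does not change which labels exist, only whether they are split into levels.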

@jreback
Contributor

jreback commented Nov 25, 2017

I hate this kw.

@jreback jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Nov 25, 2017
@jreback
Contributor

jreback commented Nov 25, 2017

@TomAugspurger

@toobaz
Member Author

toobaz commented Nov 28, 2017

It gets better:

In [2]: pd.Index([[0, 1, 2], [3, 4, 5]])
Out[2]: Index([[0, 1, 2], [3, 4, 5]], dtype='object')

In [3]: pd.Series(index=[[0, 1, 2], [3, 4, 5]])
Out[3]: 
0  3   NaN
1  4   NaN
2  5   NaN
dtype: float64

... which is not "tupleization" (notice the orientation)!

The interpretation of lists as index levels in Series (and DataFrame) is totally unexpected to me... but since it is there, maybe we want to support it in Index too?
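
(Editorial sketch, not part of the original comment.) The orientation difference can be made explicit with the two `MultiIndex` constructors, which avoids relying on any inference. The variable names are illustrative only:

```python
import pandas as pd

nested = [[0, 1, 2], [3, 4, 5]]

# "Tupleization": each inner sequence is one label with three parts.
by_rows = pd.MultiIndex.from_tuples([tuple(row) for row in nested])

# "Lists as levels": each inner list is one level, giving three labels.
by_levels = pd.MultiIndex.from_arrays(nested)

print(by_rows.nlevels, len(by_rows))
print(by_levels.nlevels, len(by_levels))

# The two readings are transposes of each other.
transposed = pd.MultiIndex.from_tuples(list(zip(*nested)))
print(by_levels.equals(transposed))
```

The `Series(index=...)` output above corresponds to the second reading, `from_arrays`.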

@jreback
Contributor

jreback commented Nov 28, 2017

[2] is invalid but prob not checked

these are not hashable sub elements

absolutely we do not want to add complexity like this

@jorisvandenbossche
Member

Yes, and I didn't know this worked like that in Series. I would rather disallow it there as well.

@toobaz
Member Author

toobaz commented Nov 28, 2017

I'm totally fine with disallowing this feature, but it is precisely the main task of lib.clean_index_list, and it is tested, so it's not accidental.

(and assuming we didn't want to drop it, integrating it into Index would actually simplify the code, which is why I was suggesting it... but again, if you want to drop it, good)

@jorisvandenbossche
Member

> and it is tested, so it's not accidental.

It's not accidental, that's true, but I think that test is mainly checking that the Period is preserved and not converted into an int, rather than testing the "index levels as list" feature. There may be other, more explicit tests, though.

@toobaz
Member Author

toobaz commented Nov 29, 2017

OK, what about this related behavior? (Just for reference: it is used in ~20 places in the tests.)

In [2]: s = pd.Series([1, 2, 3])

In [3]: s.index = [['a', 'b', 'b'], ['d', 'd', 'e']]

In [4]: s
Out[4]: 
a  d    1
b  d    2
   e    3
dtype: int64

Another variation:

In [5]: s.index = [pd.Index(['a', 'b', 'b']), pd.Index(['d', 'd', 'e'])]

Do we like them?
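
(Editorial sketch, not part of the original comment.) For comparison, the same result can be spelled without relying on list inference, by building the `MultiIndex` explicitly before assigning it. `MultiIndex.from_arrays` takes one array-like per level:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Explicit spelling: one array per level, no inference involved.
s.index = pd.MultiIndex.from_arrays([["a", "b", "b"], ["d", "d", "e"]])

print(s.index.nlevels)
print(s.loc[("b", "e")])
```

This form behaves the same whether or not the implicit list-of-lists assignment is kept.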

@toobaz
Member Author

toobaz commented Nov 29, 2017

By the way: In [3]: was actually fixed by #18514 (not In [6]). Retitling accordingly.

@toobaz toobaz changed the title Incoherent behavior in initialization from dict with tuple keys Make pd.Series(index=values) equivalent to pd.Series(index=pd.Index(values))? Nov 29, 2017
@jreback
Contributor

jreback commented Nov 29, 2017

[3] is pretty clear. We coerce this to a MultiIndex. [5] should work as well (to a MI). I see why you think #18484 (comment) is inconsistent.

The question is whether these are the same:

a) pd.Index([[....], [....]])

b) infer_to_index([[...], [....]])

where infer_to_index is what happens upon assignment (as in [3]). I could actually buy that these should infer to the same thing, namely a MultiIndex (an Index with nested elements is completely unsupported, however).

@toobaz
Member Author

toobaz commented Nov 29, 2017

OK. So, if I understand correctly, the plan is to:

  • make pd.Index([[0, 1, 2], [3, 4, 5]]) return a MultiIndex (without any argument in the vein of tupleize_cols=False, since anyway we don't want lists as labels)
  • suppress _ensure_index, which becomes redundant (since it seems to me it isn't catching any other case)
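
(Editorial sketch, not part of the original comment.) The inference rule being proposed could look roughly like the following. `infer_index` is a hypothetical helper written for illustration; it is not pandas' `_ensure_index` and not an existing pandas API:

```python
import pandas as pd


def infer_index(values):
    """Hypothetical helper, not part of pandas: treat a non-empty list made
    up entirely of lists as one array per level, else defer to pd.Index."""
    if isinstance(values, list) and values and all(
        isinstance(v, list) for v in values
    ):
        return pd.MultiIndex.from_arrays(values)
    return pd.Index(values)


print(type(infer_index([[0, 1, 2], [3, 4, 5]])).__name__)
print(type(infer_index([(0, 1), (3, 4)])).__name__)
print(type(infer_index([0, 1, 2])).__name__)
```

Under this rule a list of lists and a list of tuples both give a `MultiIndex`, but with opposite orientations, which is the distinction discussed earlier in the thread.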

@jreback
Contributor

jreback commented Nov 29, 2017

> suppress _ensure_index, which becomes redundant (since it seems to me it isn't catching any other case)

the original intent of _ensure_index was I think to infer whether things should be an Index or a MultiIndex. If you can remove code great.

@jbrockmendel jbrockmendel added the Constructors Series/DataFrame/Index/pd.array Constructors label Jul 23, 2019
@simonjayhawkins simonjayhawkins added Index Related to the Index class or subclasses and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 17, 2020
@mroeschke mroeschke added Bug and removed API Design labels Jun 12, 2021

6 participants