str.extractall gives ValueError: Length of names must match number of levels in MultiIndex. #14248

Closed
lmudge opened this issue Sep 19, 2016 · 2 comments
Labels: Bug, Duplicate Report, MultiIndex, Strings

Comments

lmudge commented Sep 19, 2016

Code Sample, a copy-pastable example if possible

import pandas as pd
s = pd.Series(["HCPSS_000_123", "HCPSS_100_1234", "HCPSS_200_1234"], index=["CM_1", "CM_2", "CM_3"])
module = s.str.extractall(r"HCPSS_(\d{3})")

Gives the following error

ValueError Traceback (most recent call last)
in ()
----> 1 module = s.str.extractall("HCPSS_(\d{3})")

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\strings.pyc in extractall(self, pat, flags)
1619 @copy(str_extractall)
1620 def extractall(self, pat, flags=0):
-> 1621 return str_extractall(self._orig, pat, flags=flags)
1622
1623 _shared_docs['find'] = ("""

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\strings.pyc in str_extractall(arr, pat, flags)
711 if 0 < len(index_list):
712 index = MultiIndex.from_tuples(
--> 713 index_list, names=arr.index.names + ["match"])
714 else:
715 index = None

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\indexes\multi.pyc in from_tuples(cls, tuples, sortorder, names)
893 arrays = lzip(*tuples)
894
--> 895 return MultiIndex.from_arrays(arrays, sortorder=sortorder, names=names)
896
897 @classmethod

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\indexes\multi.pyc in from_arrays(cls, arrays, sortorder, names)
848
849 return MultiIndex(levels=levels, labels=labels, sortorder=sortorder,
--> 850 names=names, verify_integrity=False)
851
852 @classmethod

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\indexes\multi.pyc in __new__(cls, levels, labels, sortorder, names, copy, verify_integrity, _set_identity, name, **kwargs)
95 if names is not None:
96 # handles name validation
---> 97 result._set_names(names)
98
99 if sortorder is not None:

C:\Users\lmudge\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\indexes\multi.pyc in _set_names(self, names, level, validate)
450 raise ValueError('Length of names must match length of level.')
451 if validate and level is None and len(names) != self.nlevels:
--> 452 raise ValueError('Length of names must match number of levels in '
453 'MultiIndex.')
454

ValueError: Length of names must match number of levels in MultiIndex.

Expected Output

No error

output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: 0.24
numpy: 1.11.1
scipy: 0.17.1
statsmodels: 0.6.1
xarray: None
IPython: 4.2.0
sphinx: 1.4.1
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: 1.1.0
tables: 3.2.2
numexpr: 2.6.0
matplotlib: 1.5.1
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.2
lxml: 3.6.0
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.40.0
pandas_datareader: None

Contributor

jreback commented Sep 19, 2016

Looks like this was already solved in #13156:

In [9]: module
Out[9]: 
              0
     match     
CM_1 0      000
CM_2 0      100
CM_3 0      200

In [10]: s
Out[10]: 
CM_1     HCPSS_000_123
CM_2    HCPSS_100_1234
CM_3    HCPSS_200_1234
dtype: object

In [11]: pd.__version__
Out[11]: '0.19.0rc1+23.gdb9dc65'
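
For anyone checking their own installation, the session above can be reproduced with a short script. This is a minimal sketch, assuming a pandas release that includes the fix referenced above; the `print` calls are illustrative additions, and the raw-string pattern is used only to avoid escape-sequence warnings on newer Python versions.

```python
import pandas as pd

s = pd.Series(
    ["HCPSS_000_123", "HCPSS_100_1234", "HCPSS_200_1234"],
    index=["CM_1", "CM_2", "CM_3"],
)

# On a fixed version this returns a DataFrame instead of raising ValueError.
module = s.str.extractall(r"HCPSS_(\d{3})")

# The result index has two levels: the original labels plus a "match" counter.
print(module.index.nlevels)
print(list(module.index.names))

# Column 0 holds the text captured by the single group.
print(module[0].tolist())
```

If the call still raises the ValueError from the report, the installed pandas most likely predates the fix.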

jreback closed this as completed Sep 19, 2016
jreback added the Bug, Duplicate Report, Strings, and MultiIndex labels Sep 19, 2016
jreback added this to the No action milestone Sep 19, 2016
@jorisvandenbossche
Member

@lmudge BTW, I think in this case you would rather use extract instead of extractall.
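
To illustrate the difference between the two methods, here is a rough sketch on the same Series, assuming a pandas version where `extractall` works. The looser pattern `_(\d{3})` and the variable names `first` and `every` are assumptions made for illustration and are not part of the original report.

```python
import pandas as pd

s = pd.Series(
    ["HCPSS_000_123", "HCPSS_100_1234", "HCPSS_200_1234"],
    index=["CM_1", "CM_2", "CM_3"],
)

# extract: only the first match per string; with expand=False and a single
# group, the result is a Series that keeps the original index.
first = s.str.extract(r"HCPSS_(\d{3})", expand=False)

# extractall: every match per string, indexed by (original label, match number).
# The looser pattern here is hypothetical, chosen so each string matches twice.
every = s.str.extractall(r"_(\d{3})")

print(first)
print(every)
```

When each string is expected to contain exactly one match, as in the report, `extract` avoids the extra `match` index level altogether.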
