BUG: Calling sample() on an empty GroupbyDataFrame returns ValueError instead of an empty DF #48459

Open
3 tasks done
ntachukwu opened this issue Sep 8, 2022 · 0 comments
@ntachukwu (Contributor)

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
empty_df = pd.DataFrame({'a': [], 'b': []})
empty_df.groupby('a').sample()

Issue Description

Calling sample() on the GroupBy object of an empty DataFrame raises a ValueError instead of returning an empty DataFrame:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [1], in <cell line: 3>()
      1 empty_df = pd.DataFrame({'a': [], 'b': []})
----> 2 empty_df.groupby('a').sample()

File ~.../pandas/core/groupby/groupby.py:4284, in sample(self, n, frac, replace, weights, random_state)
   4275         assert frac is not None
   4276         sample_size = round(frac * group_size)
   4278     grp_sample = sample.sample(
   4279         group_size,
   4280         size=sample_size,
   4281         replace=replace,
   4282         weights=None if weights is None else weights_arr[grp_indices],
   4283         random_state=random_state,
-> 4284     )
   4285     sampled_indices.append(grp_indices[grp_sample])
   4287 sampled_indices = np.concatenate(sampled_indices)

File <__array_function__ internals>:180, in concatenate(*args, **kwargs)

ValueError: need at least one array to concatenate
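
The last frame of the traceback suggests a likely cause: with no groups, nothing is ever appended to sampled_indices, so np.concatenate receives an empty list. Below is a minimal sketch of that failure in isolation, assuming this reading of the traceback is correct. The variable name is borrowed from the traceback; this is not the pandas source.

```python
import numpy as np

# Assumption: with zero groups, the per-group loop never runs,
# so the list of sampled index arrays stays empty.
sampled_indices = []

try:
    np.concatenate(sampled_indices)
except ValueError as exc:
    # NumPy refuses to concatenate an empty sequence of arrays.
    message = str(exc)
    print(message)
```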

Expected Behavior

The call should return an empty DataFrame with the original columns:

Empty DataFrame
Columns: [a, b]
Index: []
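
Until this is fixed, a possible user-side workaround is to skip the groupby when the frame has no rows. This is only a sketch: sample_per_group is a hypothetical helper name, not a pandas API, and it assumes an empty slice of the input is an acceptable stand-in for the expected result.

```python
import pandas as pd


def sample_per_group(df, by, **sample_kwargs):
    """Hypothetical helper: sample within groups, tolerating an empty frame."""
    if len(df) == 0:
        # No rows means no groups; return an empty frame with the same columns.
        return df.iloc[:0]
    return df.groupby(by).sample(**sample_kwargs)


empty_df = pd.DataFrame({'a': [], 'b': []})
result = sample_per_group(empty_df, 'a')
print(result)
```

For non-empty input the helper simply forwards its keyword arguments (n, frac, replace, weights, random_state) to groupby(...).sample().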

Installed Versions

python : 3.9.10.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_ARM64_T8101
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.0.dev0+1364.g201cbf6bc1.dirty
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
setuptools : 60.9.3
pip : 22.1.1
Cython : 0.29.32
pytest : 7.1.2
hypothesis : 6.52.3
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.3.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@ntachukwu added the Bug and Needs Triage labels Sep 8, 2022
@jorisvandenbossche added the Groupby label and removed the Needs Triage label Sep 8, 2022
@rhshadrach added this to the 1.6 milestone Sep 10, 2022
@mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
@mroeschke removed this from the 2.0 milestone Feb 8, 2023