
DataFrame() returns an empty DataFrame if a timezone-aware DatetimeIndex and a column label are passed #19157


Closed
JQGoh opened this issue Jan 9, 2018 · 2 comments · Fixed by #19330
Labels
Bug Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype

Comments

@JQGoh
Contributor

JQGoh commented Jan 9, 2018

Problem description

Converting a DatetimeIndex to a DataFrame behaves differently depending on whether the DatetimeIndex is timezone-naive or timezone-aware. The problem only occurs when a column label is passed through the columns argument.

Input of Example 1: timezone-aware DatetimeIndex, passed with columns=['timestamps'].
import pandas as pd
start_dt2 = pd.to_datetime('20180101T10:00:00')
start_dt2 = start_dt2.tz_localize('Asia/Singapore')
end_dt2 = pd.to_datetime('20180101T10:03:00')
end_dt2 = end_dt2.tz_localize('Asia/Singapore')
date_range2 = pd.date_range(start_dt2, end_dt2, freq='T')
print(date_range2, '\n')

df2 = pd.DataFrame(date_range2, columns=['timestamps'])
print(df2)
Output of Example 1: an empty DataFrame.
DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
               '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
              dtype='datetime64[ns, Asia/Singapore]', freq='T') 

Empty DataFrame
Columns: [timestamps]
Index: [] 
Input of Example 2: timezone-naive DatetimeIndex, passed with columns=['timestamps'].
start_dt = pd.to_datetime('20180101T10:00:00')
end_dt = pd.to_datetime('20180101T10:03:00')
date_range1 = pd.date_range(start_dt, end_dt, freq='T')
print(date_range1, '\n')

df1 = pd.DataFrame(date_range1, columns=['timestamps'])
print(df1)
Output of Example 2: a populated DataFrame.
DatetimeIndex(['2018-01-01 10:00:00', '2018-01-01 10:01:00',
               '2018-01-01 10:02:00', '2018-01-01 10:03:00'],
              dtype='datetime64[ns]', freq='T') 

           timestamps
0 2018-01-01 10:00:00
1 2018-01-01 10:01:00
2 2018-01-01 10:02:00
3 2018-01-01 10:03:00 

Both examples above pass columns=['timestamps']. However, if the timezone-aware DatetimeIndex is instead passed as the value of a dictionary, DataFrame() returns a populated DataFrame, just as in Example 2.

Input of Example 3: timezone-aware DatetimeIndex, passed as a dictionary value.
start_dt3 = pd.to_datetime('20180101T10:00:00')
start_dt3 = start_dt3.tz_localize('Asia/Singapore')
end_dt3 = pd.to_datetime('20180101T10:03:00')
end_dt3 = end_dt3.tz_localize('Asia/Singapore')
date_range3 = pd.date_range(start_dt3, end_dt3, freq='T')
print(date_range3, '\n')

df3 = pd.DataFrame({"timestamps": date_range3})
print(df3)
Output of Example 3: a populated DataFrame, with the timezone preserved.
DatetimeIndex(['2018-01-01 10:00:00+08:00', '2018-01-01 10:01:00+08:00',
               '2018-01-01 10:02:00+08:00', '2018-01-01 10:03:00+08:00'],
              dtype='datetime64[ns, Asia/Singapore]', freq='T') 

                 timestamps
0 2018-01-01 10:00:00+08:00
1 2018-01-01 10:01:00+08:00
2 2018-01-01 10:02:00+08:00
3 2018-01-01 10:03:00+08:00
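On pandas versions affected by this bug, the dict form of Example 3 avoids the empty result. The sketch below shows that route plus a second one, wrapping the index in a named Series and calling to_frame(); the Series route is an assumption on my part and has not been verified on 0.22 specifically. The sketch builds the index with date_range(..., tz=...) instead of tz_localize, and uses freq='min', the minute alias that newer pandas prefers over the deprecated 'T'.

```python
import pandas as pd

# Same four timestamps as the report, localized to Asia/Singapore at creation time
date_range3 = pd.date_range("2018-01-01 10:00", periods=4, freq="min", tz="Asia/Singapore")

# Route 1 (Example 3): map the column label to the index in a dict
df_dict = pd.DataFrame({"timestamps": date_range3})

# Route 2 (assumed to be unaffected): wrap the index in a named Series,
# then turn it into a one-column frame whose column label is the Series name
df_series = pd.Series(date_range3, name="timestamps").to_frame()

print(df_dict)
print(df_series.dtypes)
```

Both routes should produce a four-row frame whose single column keeps the datetime64[ns, Asia/Singapore] dtype.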

Question

Shouldn't Example 1 return a non-empty DataFrame that matches the output of Example 3? I would expect consistent output regardless of whether the DatetimeIndex is timezone-aware or timezone-naive. Thank you.
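The expected behaviour can be checked on any given pandas version by building the same index with and without a timezone and comparing the columns= construction against the dict construction. The helper name check_columns_constructor is hypothetical and made up for this sketch. On a version that includes the fix referenced above (#19330), both comparisons would be expected to pass; on an affected version, the timezone-aware case should raise an AssertionError because the columns= frame comes back empty.

```python
import pandas as pd
import pandas.testing as tm


def check_columns_constructor(tz):
    """Hypothetical helper: compare DataFrame(index, columns=[...]) with the dict form."""
    dti = pd.date_range("2018-01-01 10:00", periods=4, freq="min", tz=tz)
    via_columns = pd.DataFrame(dti, columns=["timestamps"])  # as in Examples 1 and 2
    via_dict = pd.DataFrame({"timestamps": dti})  # as in Example 3
    # Raises AssertionError if the two frames disagree, e.g. when via_columns is empty
    tm.assert_frame_equal(via_columns, via_dict)
    return via_columns


for tz in [None, "Asia/Singapore"]:  # timezone-naive first, then timezone-aware
    df = check_columns_constructor(tz)
    print(tz, df.shape, df["timestamps"].dtype)
```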

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.0.5
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: 1.5.1
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.6
lxml: 3.6.4
bs4: 4.5.1
html5lib: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.9.5
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1
None

@chris-b1
Contributor

chris-b1 commented Jan 9, 2018

Yep, this looks buggy; #13407 may be related. A PR would be welcome! I'd suggest walking through the construction path, starting around here:

elif isinstance(data, (np.ndarray, Series, Index)):
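One property worth keeping in mind when walking that path: a timezone-aware DatetimeIndex keeps its timezone in the pandas dtype, but its .values attribute is a plain NumPy datetime64 array holding the UTC instants, with no timezone attached. The sketch below only demonstrates that property of the index; it does not show that this is the cause of the bug.

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2018-01-01 10:00", periods=4, freq="min", tz="Asia/Singapore")

# The timezone is part of the pandas dtype of the index
print(dti.dtype)

# .values is a NumPy datetime64 array; NumPy dtypes carry no timezone,
# so the stored values are the UTC instants
raw = dti.values
print(raw.dtype, raw[0])

# 10:00 at +08:00 (Singapore) corresponds to 02:00 UTC
assert raw[0] == np.datetime64("2018-01-01T02:00")
```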

@chris-b1 chris-b1 added Bug Timezones Timezone data dtype labels Jan 9, 2018
@chris-b1 chris-b1 added this to the Next Major Release milestone Jan 9, 2018
@JQGoh
Contributor Author

JQGoh commented Jan 21, 2018

@chris-b1 Could you help review the changes? I am new to contributing to pandas and would appreciate your feedback on my approach.

@jreback jreback modified the milestones: Next Major Release, 0.23.0 Jan 21, 2018
@jreback jreback added Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jan 21, 2018