Determine datetime_format from more than one element #15075
Comments
xref #3341. You need to be explicit with ambiguous dayfirst dates.
|
I understand that |
See this issue: #12585. We have explicitly discussed this; it is just waiting for someone to come along and do it. It's not very difficult actually, pull requests are welcome. |
@jreback It is not completely related to issue #12585. But here, there are also other issues:
|
Yeah, I suppose the format could be guessed better here (maybe take up to three elements and see if they match). BUT you can still contrive an example where this fails, so I would not be in favor of changing this; I would rather have the parser raise if the day/year first are violated, which is #12585 |
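To make the point above concrete, here is a minimal sketch (plain Python, not pandas internals) of guessing a format from up to three elements. The function name `guess_format_from_sample` and the candidate list are hypothetical, chosen only for illustration. It also shows a contrived input where the guess still fails.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order (month-first, then day-first).
CANDIDATES = ["%m/%d/%Y", "%d/%m/%Y"]


def guess_format_from_sample(values, candidates, sample_size=3):
    """Return the first candidate that parses the first `sample_size` non-null values."""
    sample = [v for v in values if v is not None][:sample_size]
    for fmt in candidates:
        try:
            for v in sample:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # this candidate fails on the sample, try the next one
        return fmt
    return None


# The first three values are ambiguous, so month-first is picked ...
dates = ["01/02/2017", "03/04/2017", "05/06/2017", "13/01/2017"]
fmt = guess_format_from_sample(dates, CANDIDATES)

# ... but the fourth value cannot be parsed with that format.
try:
    datetime.strptime(dates[3], fmt)
    fourth_parses = True
except ValueError:
    fourth_parses = False
```

A larger sample only moves the problem further down the Series, which is why raising on a violated day/year-first assumption may be the more robust option.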
That is correct, I tried checking with If we agree on this point, then I guess my issue is indeed basically the same as #12585. Sorry for mixing up two different problems, and thanks for your comments. |
Code Sample, a copy-pastable example if possible
Problem description
One would hope to obtain the same result when applying `pd.to_datetime` to the same series twice, but shuffled.
Expected Output
My understanding is that the format determined when setting `infer_datetime_format=True` is obtained from the first non-null value of the Series (see function `_guess_datetime_format_for_array` in `tseries.tools`), which explains the result above. I understand this logic in terms of optimizing the operation, and it does work as expected most of the time.
However, I feel like the example provided above is fairly generic. Ideally, the function would find the best `datetime_format` for the entire Series. Any ideas on how to implement this?
Output of `pd.show_versions()`
pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 28.8.0
Cython: None
numpy: 1.11.3
scipy: 0.18.0
statsmodels: None
xarray: None
IPython: 2.4.1
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.5.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None
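The idea raised in the problem description, finding a format that fits the entire Series rather than only its first non-null value, could be sketched roughly as follows. This is an illustration in plain Python, not how pandas works: `formats_matching_all`, `parse_strict`, and the candidate list are hypothetical names and assumptions.

```python
from datetime import datetime

# Hypothetical candidate formats; a real implementation would need many more.
CANDIDATES = ["%m/%d/%Y", "%d/%m/%Y"]


def formats_matching_all(values, candidates):
    """Return every candidate format that parses all non-null values."""
    non_null = [v for v in values if v is not None]
    matching = []
    for fmt in candidates:
        try:
            for v in non_null:
                datetime.strptime(v, fmt)
        except ValueError:
            continue  # at least one value does not fit this format
        matching.append(fmt)
    return matching


def parse_strict(values, candidates):
    """Parse with the single consistent format, or raise if there is none or several."""
    matching = formats_matching_all(values, candidates)
    if len(matching) != 1:
        raise ValueError(f"expected exactly one consistent format, found {matching}")
    fmt = matching[0]
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


dates = ["01/02/2017", None, "13/01/2017"]
shuffled = list(reversed(dates))

# Element order no longer matters: both orderings agree on day-first.
parsed = parse_strict(dates, CANDIDATES)
parsed_shuffled = parse_strict(shuffled, CANDIDATES)
```

The cost is a full pass over the data per candidate format, and a fully ambiguous Series (for example only `"01/02/2017"`) raises instead of guessing, so the caller still has to be explicit in that case.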