numpy error using read_csv with parse_dates=[...] and index_col=[...] #10245
You need to show your parsing function.
```python
import datetime as dt

import numpy as np

# correct_TOW_rollover() and leapSecondsSinceGPSepoch() are the poster's own
# helpers; they are not shown in this thread.


def GPStime2datetime(GPSweek, GPS_TOW, correctLeapSeconds=True,
                     correctRollover=True, pyDatetime=False):
    '''Converts integer GPS week (no 1024 rollover!) and sequence time of week
    (seconds) to a datetime object.

    Parameters
    ==========
    GPSweek : int
        GPS week for the whole sequence
    GPS_TOW : array_like
        seconds of GPS week
    correctLeapSeconds : bool
        Correct for leap seconds based on the first element in `GPS_TOW`
    correctRollover : bool
        correct for GPS week rollover in `GPS_TOW`
        (see :func:`correct_TOW_rollover`)
    pyDatetime : bool
        Force output to python's builtin :class:`~dt.datetime` instead of
        numpy's :class:`~np.datetime64`. Incurs a big performance hit for large
        arrays.
    '''
    # make sure we have a numpy array
    GPS_TOW = np.asarray(GPS_TOW)
    # convert to float in case a list of strings is passed
    GPS_TOW = GPS_TOW.astype(np.float64)
    # correct rollover
    if correctRollover:
        GPS_TOW = correct_TOW_rollover(GPS_TOW)
    msAfterEpoch = GPS_TOW*1000 + np.int64(GPSweek)*604800*1000
    # correct for leap seconds
    if correctLeapSeconds:
        firstDate = np.datetime64('1980-01-06') + (msAfterEpoch[0]).astype('timedelta64[ms]')
        secondsToSubtract = leapSecondsSinceGPSepoch(firstDate)
        np.subtract(msAfterEpoch, secondsToSubtract*1000, out=msAfterEpoch)
    # build an array of datetime64[ms] values (or datetime objects) and return
    dates = np.datetime64('1980-01-06') + msAfterEpoch.astype('timedelta64[ms]')
    if pyDatetime:
        dates = dates.astype(dt.datetime)
    return dates
```
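The core arithmetic in `GPStime2datetime` (GPS epoch 1980-01-06, plus week × 604800 s, plus seconds of week) can be checked in isolation. Below is a minimal sketch under stated assumptions: `gps_to_datetime64` is a hypothetical name, and it deliberately skips the leap-second and rollover corrections, because `correct_TOW_rollover` and `leapSecondsSinceGPSepoch` are not shown in the thread.

```python
import numpy as np

# GPS time starts at 1980-01-06 00:00:00; one GPS week is 604800 seconds.
GPS_EPOCH = np.datetime64('1980-01-06T00:00:00', 'ms')
SECONDS_PER_WEEK = 604800


def gps_to_datetime64(gps_week, gps_tow):
    """Hypothetical helper: GPS week + seconds of week -> datetime64[ms].

    Sketch only: no leap-second or week-rollover correction is applied.
    """
    tow = np.asarray(gps_tow, dtype=np.float64)
    # Round to whole milliseconds in integer space before building the
    # timedelta, so a value like 0.1 s does not truncate to 99 ms.
    ms_after_epoch = (np.round(tow * 1000).astype(np.int64)
                      + np.int64(gps_week) * SECONDS_PER_WEEK * 1000)
    return GPS_EPOCH + ms_after_epoch.astype('timedelta64[ms]')


print(gps_to_datetime64(1765, [68400.0, 68401.5]))
```

With week 1765 and 68400 s (19 h), this lands on 2013-11-03 19:00:00, the timestamp used in the self-contained example later in the thread (before any leap-second correction). Doing the millisecond conversion in `int64` rather than casting floats straight to `timedelta64` keeps the rounding explicit.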
And what does this produce when you don't specify the index columns? Show a sample of the frame and …
I run this code:

```python
df = pd.read_csv(FILE, date_parser=GPStime2datetime,
                 parse_dates={'datetime': ['week', 'sow']})
```
Can you show a simple example that doesn't involve this function? I cannot run it.
This should be an entirely self-contained example. It seems that when the date parser returns numpy's …

```python
import numpy as np
import pandas as pd
import datetime as dt
from StringIO import StringIO  # Python 2; on Python 3 use io.StringIO

contents = r'''week,sow,prn,rxstatus,az,elv,l1_cno,s4,s4_cor,secsigma1,secsigma3,secsigma10,secsigma30,secsigma60,code_carrier,c_cstdev,tec45,tecrate45,tec30,tecrate30,tec15,tecrate15,tec00,tecrate00,l1_loctime,chanstatus,l2_locktime,l2_cno
2013-11-03,19:00:00,126,00E80000,0.00,0.00,39.38,0.118447,0.107595,0.252663,0.532384,0.600540,0.603073,0.603309,-13.255543,0.114,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1692.182,8C023D84,0.000,0.00
2013-11-03,19:00:00,23,00E80000,0.00,0.00,53.48,0.034255,0.021177,0.035187,0.042985,0.061142,0.061738,0.061801,-22.760003,0.015,24.955111,0.112239,25.115330,-0.119774,25.146603,-0.065852,24.747576,-0.243804,10426.426,08109CC4,10409.660,44.52
2013-11-03,19:00:00,13,00E80000,0.00,0.00,54.28,0.046218,0.019314,0.037818,0.056421,0.060602,0.060698,0.060735,-20.679035,0.090,25.670250,-0.070761,25.752224,-0.055089,26.045048,-0.180056,25.360369,-0.062119,7553.020,18109CA4,7202.660,47.27'''


def parse_np_datetime64(date, time):
    datetime = np.array([date + 'T' + time + 'Z'], dtype='datetime64[s]')
    return datetime


def parse_py_datetime(date, time):
    datetime = parse_np_datetime64(date, time).astype(dt.datetime).ravel()
    return datetime


# this will run
pd.read_csv(StringIO(contents), date_parser=parse_py_datetime,
            parse_dates={'datetime': ['week', 'sow']},
            index_col=['datetime', 'prn'])

# this will fail
pd.read_csv(StringIO(contents), date_parser=parse_np_datetime64,
            parse_dates={'datetime': ['week', 'sow']},
            index_col=['datetime', 'prn'])
```

EDIT after #10245 (comment):

```python
def parse_np_datetime64(date, time):
    datetime = np.array([date + 'T' + time + 'Z'], dtype='datetime64[s]')
    return datetime
```
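One plausible contributor to the "array of arrays" problem discussed further down is the list wrapped around the expression in `parse_np_datetime64`: `np.array([x], ...)` adds a leading axis, so a 1-D column becomes a `(1, n)` array, which `parse_py_datetime` hides with `.ravel()`. A small numpy-only sketch; the two sample rows stand in for the `week` and `sow` columns, and the `'Z'` suffix is omitted because newer numpy versions warn about timezone markers in datetime strings.

```python
import numpy as np

# Stand-ins for the 'week' and 'sow' columns: 1-D object arrays of strings.
dates = np.array(['2013-11-03', '2013-11-03'], dtype=object)
times = np.array(['19:00:00', '19:00:01'], dtype=object)

# Elementwise string concatenation on object arrays.
stamps = dates + 'T' + times

# Wrapping in a list adds a leading axis: shape (1, 2).
wrapped = np.array([stamps]).astype('datetime64[s]')

# Converting the 1-D array directly keeps shape (2,).
flat = stamps.astype('datetime64[s]')

print(wrapped.shape, flat.shape)
```

Either dropping the brackets or calling `.ravel()` on the result gives the 1-D array a parser is expected to return.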
@cmeeren OK, thanks. The basic issue is that some of the inference in … Pull requests are welcome!
I've tried to find … I can't really test anything, because I'm on Windows and I've never gotten compiling to work reliably, which means …
See the docs here for creating a development environment. This is where the code goes: … I think this could just call …
Re: the build environment, I still get the same errors, something about `query_vcvarsall`. I've seen it before and followed some other instructions I found, and when I run it in the Visual Studio 2008 command prompt, I get a fatal error from `python27.lib` about 32- vs. 64-bit (I have a 64-bit Python installation).
You need to use conda; VS is not required. Just install …
Oh, I just forgot to …
Yes, this will in general give you a good environment for building C extensions on Windows.
Great, thanks. I've had a look at the problem, and I've tried adding … This packs my datetime64 array from … After that I have no idea what happens, because that's a C library and I don't know any C.
This works just fine (and is the point of …).
Yes, I tried that myself and can confirm that it works. I forgot to mention that when I ran the test script #10245 (comment) with the one-line edit I mentioned above, I got an error I have no idea what to make of: …
Yeah, you will have to step through the code and see.
Sure, but again, this happens inside a C extension and I don't know how to deal with that. As for the full traceback and whether the object should end up there in the first place, I have absolutely no idea; the codebase is vast and the style is rather opaque to me. Perhaps it's best if someone else tries to crack this nut. 😞
No, the error is in the input to the extension. You can easily just look at the Cython code, in …
I'll assume you were talking about …
Yes, I am talking about …
It is indeed a numpy array of objects (the …).
It seems strange to me that …
An array of arrays is not correct. I'm not sure what is happening; you'll have to step through and see where it is generated.
I found two separate causes for the error I experienced, and I have a suggestion for a solution. First, my example was wrong, and that was the reason for the "array of arrays" problem. My … and not …
Second, everything works fine when I use … How do you propose we continue from here? Is there any reason why …

```python
_DATELIKE_DTYPES = set([np.dtype(t + r)
                        for t in ['M8', '<M8', '>M8', 'm8', '<m8', '>m8']
                        for r in ['[ns]', '[ms]', '[s]']])
```

Which resolutions should be allowed? Should I submit a pull request?
Thanks for digging in here. Pandas should certainly accept any datetime64 array from numpy, but internally pandas only uses …
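Pandas versions before 2.0 store datetime64 values at nanosecond resolution only, so one way to sidestep resolution issues is to normalise in the parser itself: flatten the result to 1-D and convert it to `datetime64[ns]` before returning. A minimal sketch; `to_ns` is a hypothetical helper name, and the `prn` values are taken from the example data.

```python
import numpy as np
import pandas as pd


def to_ns(values):
    """Hypothetical helper: flatten a datetime64 array of any resolution
    and convert it to datetime64[ns]."""
    arr = np.asarray(values)
    if arr.dtype.kind != 'M':
        raise TypeError('expected datetime64 values, got %s' % arr.dtype)
    return arr.ravel().astype('datetime64[ns]')


# A (1, 2) array at seconds resolution, like the wrapped parser output.
secs = np.array([['2013-11-03T19:00:00', '2013-11-03T19:00:00']],
                dtype='datetime64[s]')
idx = pd.MultiIndex.from_arrays([to_ns(secs), [126, 23]],
                                names=['datetime', 'prn'])
print(idx)
```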
Would replacing https://github.com/pydata/pandas/blob/4fde9462bd53f5f6b446bdcc6f222199a3f11ca5/pandas/io/parsers.py#L2157 with …
@cmeeren, you can set up Travis CI to run the pandas test suite on your fork: http://docs.travis-ci.com/user/getting-started/ Without running the test suite, I honestly have no idea :)
@cmeeren, you need to make the change that I suggested above, run the result of …
See the contributing guidelines here.
@jreback, I have, however, wrapped all three cases in pydata:08d60e6...cmeeren:7d6f7c4. I'll make a pull request for whichever you want (if the Travis tests pass for the one in progress).
Best to do a pull request. You need to add your example as a test.
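A sketch of what such a test could check, written against current pandas rather than 0.16: `date_parser` and dict-style `parse_dates` are deprecated in recent pandas, so this version reads the columns as strings and combines them afterwards. The CSV is a trimmed copy of `contents` from the example above; `parse_np_datetime64_flat`, `read_with_multiindex` and the test name are hypothetical, not part of the pandas test suite.

```python
import io

import numpy as np
import pandas as pd

# Trimmed copy of the example data (first columns only).
CSV = """week,sow,prn,l1_cno
2013-11-03,19:00:00,126,39.38
2013-11-03,19:00:00,23,53.48
2013-11-03,19:00:00,13,54.28
"""


def parse_np_datetime64_flat(date, time):
    # 1-D result (no extra list wrapping), seconds resolution.
    return np.asarray(date + 'T' + time, dtype=object).astype('datetime64[s]')


def read_with_multiindex(text):
    df = pd.read_csv(io.StringIO(text))
    stamps = parse_np_datetime64_flat(df['week'], df['sow'])
    df = df.drop(columns=['week', 'sow'])
    # Normalise to nanoseconds before building the index.
    df['datetime'] = stamps.astype('datetime64[ns]')
    return df.set_index(['datetime', 'prn'])


def test_multiindex_with_numpy_datetime_parser():
    df = read_with_multiindex(CSV)
    assert list(df.index.names) == ['datetime', 'prn']
    assert len(df) == 3
    assert str(df.index.get_level_values('datetime').dtype) == 'datetime64[ns]'


test_multiindex_with_numpy_datetime_parser()
```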
Consider a file of the following format: …

I try to read that with the following code: …

Here I'm parsing `week` and `sow` into a `datetime` column using a custom function (this works properly) and using `datetime` and the `prn` column as a `MultiIndex`. The file is read successfully when `index_col='datetime'`, but not when trying to create the `MultiIndex` using `index_col=['datetime', 'prn']` (or when using column numbers instead of names). I get the following traceback: …

I am using Python 2.7, pandas 0.16.1 and numpy 1.9.2.