This section probably has at least one typo, but more generally, doesn't seem to be documenting current behaviour.
I'll quickly run through the example here, but with a bit of cleaning so we don't have to run the entire page.
import pandas as pd
import numpy as np

df_for_r = pd.DataFrame({"first": np.random.rand(100),
                         "second": np.random.rand(100),
                         "class": np.random.randint(0, 2, (100, ))},
                        index=range(100))
store_export = pd.HDFStore('export.h5')
# In the documentation, this is written with 'data_columns=df_dc.columns', which I'm assuming is a mistake
store_export.append('df_for_r', df_for_r, data_columns=df_for_r.columns)
store_export
Next, there is an R function for reading in this data. Just from comparing the given function to the written file I think we can see there is a mismatch:
library(rhdf5)

loadhdf5data <- function(h5File) {
  listing <- h5ls(h5File)
  # Find all data nodes, values are stored in *_values and corresponding column
  # titles in *_items
  data_nodes <- grep("_values", listing$name)
  name_nodes <- grep("_items", listing$name)
  data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
  name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
  columns = list()
  for (idx in seq(data_paths)) {
    # NOTE: matrices returned by h5read have to be transposed to obtain
    # required Fortran order!
    data <- data.frame(t(h5read(h5File, data_paths[idx])))
    names <- t(h5read(h5File, name_paths[idx]))
    entry <- data.frame(data)
    colnames(entry) <- names
    columns <- append(columns, entry)
  }

  data <- data.frame(columns)

  return(data)
}
For example, there are no entries in export.h5 which have _values or _items in the names.
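The same check can be sketched from the Python side by listing the node names and applying the two patterns the R function greps for. This is only a minimal sketch, assuming PyTables is importable as `tables`; the helper `list_node_names` and the temporary file name `export_table.h5` are hypothetical names introduced for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables  # PyTables, the library pandas' HDFStore is built on


def list_node_names(h5_path):
    """Return the path of every visible node in an HDF5 file (hypothetical helper)."""
    with tables.open_file(h5_path, mode="r") as h5:
        return [node._v_pathname for node in h5.walk_nodes("/")]


df_for_r = pd.DataFrame(
    {
        "first": np.random.rand(100),
        "second": np.random.rand(100),
        "class": np.random.randint(0, 2, (100,)),
    },
    index=range(100),
)

# Write to a temporary directory so the sketch does not touch export.h5.
table_path = os.path.join(tempfile.mkdtemp(), "export_table.h5")
with pd.HDFStore(table_path, mode="w") as store:
    # append() always writes the "table" format.
    store.append("df_for_r", df_for_r, data_columns=list(df_for_r.columns))

table_nodes = list_node_names(table_path)
# Mirror the two grep() calls in the R function.
table_matches = [p for p in table_nodes if "_values" in p or "_items" in p]

print(table_nodes)
print(table_matches)
```

If `table_matches` comes back empty, the R function has nothing to iterate over, which would be consistent with the empty data frame described here.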
If we actually call this function, we get an empty dataframe back:
This is a bit contrary to the prose for this section which reads:
HDFStore writes table format objects in specific formats suitable for producing loss-less round trips to pandas objects. For external compatibility, HDFStore can read native PyTables format tables.
It is possible to write an HDFStore object that can easily be imported into R using the rhdf5 library (Package website). Create a table format store like this:
Suggested fix for documentation
This should probably specify that the "table" format doesn't work here. In addition, since external compatibility relies on the user writing code to read this format, maybe a specification for the format should be documented here?
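As a starting point for such a specification, the layout an external reader actually sees can be inspected directly with PyTables. The following is a rough sketch rather than a description of guaranteed behaviour: it assumes PyTables is importable as `tables`, that the frame is stored under the key `df_for_r` with every column declared as a data column, and it uses a hypothetical temporary file name.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables  # PyTables, the library pandas' HDFStore is built on

df_for_r = pd.DataFrame(
    {
        "first": np.random.rand(100),
        "second": np.random.rand(100),
        "class": np.random.randint(0, 2, (100,)),
    },
    index=range(100),
)

# Hypothetical temporary file, so export.h5 is left untouched.
table_path = os.path.join(tempfile.mkdtemp(), "export_table.h5")
with pd.HDFStore(table_path, mode="w") as store:
    store.append("df_for_r", df_for_r, data_columns=list(df_for_r.columns))

# Open the file without pandas, as an external reader would.
with tables.open_file(table_path, mode="r") as h5:
    table_node = h5.get_node("/df_for_r/table")
    field_names = list(table_node.colnames)  # top-level fields of the table
    n_rows = table_node.nrows

print(field_names)
print(n_rows)
```

The point of the sketch is that a "table" format store appears to be a single compound-typed dataset rather than paired `*_values` and `*_items` arrays, so a reader written for one layout cannot be expected to handle the other.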
Location of the documentation
https://pandas.pydata.org/docs/user_guide/io.html#external-compatibility.
Documentation problem
We can take a look at what's in this file:
Output
The R function does seem to work if the file is written using the "fixed" format.
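For comparison, here is a hedged sketch of writing the same kind of frame in the "fixed" format and listing the resulting node names. It assumes PyTables is importable as `tables`; the helper `list_node_names` and the file name `export_fixed.h5` are hypothetical, and the sketch only checks node names, it does not run the R function.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import tables  # PyTables, the library pandas' HDFStore is built on


def list_node_names(h5_path):
    """Return the path of every visible node in an HDF5 file (hypothetical helper)."""
    with tables.open_file(h5_path, mode="r") as h5:
        return [node._v_pathname for node in h5.walk_nodes("/")]


df_for_r = pd.DataFrame(
    {
        "first": np.random.rand(100),
        "second": np.random.rand(100),
        "class": np.random.randint(0, 2, (100,)),
    },
    index=range(100),
)

# Hypothetical temporary file, so export.h5 is left untouched.
fixed_path = os.path.join(tempfile.mkdtemp(), "export_fixed.h5")
df_for_r.to_hdf(fixed_path, key="df_for_r", mode="w", format="fixed")

fixed_nodes = list_node_names(fixed_path)
# These are the names the R function's grep() calls look for.
value_nodes = [p for p in fixed_nodes if "_values" in p]
item_nodes = [p for p in fixed_nodes if "_items" in p]

print(value_nodes)
print(item_nodes)
```

Node names containing `_values` and `_items` showing up here, and not in a "table" format file, would explain why the R function only works for the "fixed" format.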