
provide rebuild option in appending to an HDFStore #824

Closed

jreback opened this issue Feb 24, 2012 · 1 comment

Comments

@jreback
Contributor

jreback commented Feb 24, 2012

Currently, if you try to append to an existing table in an HDFStore and the fields differ from the existing fields,
pandas raises Exception("append items do not match existing").

The reason is the current structure of an HDFStore table: index (a time column), column (a string), values (a float array).
The values are stored in the order of the fields property in _v_attrs, so appending with a different column order (or different columns) would invalidate the read-back mechanism, which is why we don't allow it.
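A minimal sketch of the failure mode described above (the file path and key here are made up for illustration, and the exact exception text depends on the pandas version):

    import pandas as pd

    store = pd.HDFStore('demo.h5')
    store.append('df', pd.DataFrame({'A': [1.0], 'B': [2.0]}))

    # appending a frame whose columns differ from the existing table raises
    try:
        store.append('df', pd.DataFrame({'B': [3.0], 'C': [4.0]}))
    except Exception as detail:
        print("append rejected: %s" % detail)
    finally:
        store.close()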

rebuilding is simple: read in the existing data, concatenate with the new data (which automatically reindexes), rename the existing file, and write the combined data back (you could also write to a new file first, then rename the existing file and rename the new file to the original name - the swap is more atomic that way); a bare-bones sketch of that variant follows
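Before the class-based version below, here is a bare-bones sketch of the more atomic variant of that recipe; the path, key, and function name are just placeholders:

    import os
    import pandas as pd

    def rebuild_append(path, key, new_data):
        # read the existing data; pd.concat reindexes to the union of columns
        store = pd.HDFStore(path)
        combined = pd.concat([store.select(key), new_data])
        store.close()

        # write the combined data to a new file first
        tmp_path = path + '.rebuild_tmp'
        new_store = pd.HDFStore(tmp_path)
        new_store.append(key, combined)
        new_store.close()

        # keep the original as a backup, then move the new file into place
        os.rename(path, path + '.hdf_append_bak')
        os.rename(tmp_path, path)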

here is code that does this externally to HDFStore:

self.store is the HDFStore object (the open/close methods are straightforward, so they aren't shown)

    def append(self, data, is_verbose=False, is_rebuild=True):
        """ store a data object to a file """
        if data is not None:
            if not self.open(is_writer=True):
                return None
            try:
                self.store.append(self.label, data)
            except Exception as detail:
                # on a column mismatch, optionally rebuild instead of failing
                if is_rebuild and str(detail).startswith("appended items do not match existing items"):
                    self.rebuild(data)
                else:
                    raise
            finally:
                self.close()
                if self.is_verbose or is_verbose:
                    self.show_pretty('append', data=data)

    def rebuild(self, data):
        """ append to existing data with a rebuild """
        self.lo("rebuilding datafile -> [file->%s]" % self.file)

        try:
            # read in the existing data and concatenate with the new data
            self.close()
            new_data = apandas.Panel.append_many([self.select(), data])

            # rename the existing file as a backup
            import os
            os.rename(self.file, self.file + '.hdf_append_bak')

            # write the combined data back out
            self.open(is_writer=True)
            self.store.append(self.label, new_data)

            self.lo("successful in append adjustment -> [file->%s]" % self.file)

        except Exception as detail:
            self.lo("error in rebuild -> [file->%s] %s -> bailing out!" % (self.file, detail))
            raise

as you can see, right now I have to catch and match a specific error message, so ideally this would be done internally in HDFStore._write_table
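For reference, the same catch-and-match pattern as a plain function with no class state, reusing the rebuild_append sketch from above; the message check is an assumption and may need adjusting per pandas version:

    import pandas as pd

    def append_or_rebuild(path, key, data):
        store = pd.HDFStore(path)
        try:
            store.append(key, data)
            store.close()
        except Exception as detail:
            store.close()
            # only rebuild on a column-mismatch error; anything else re-raises
            if 'do not match existing' not in str(detail):
                raise
            rebuild_append(path, key, data)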

one thing I don't handle is recompressing the rewritten file with the compression filters that are present on the original file
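One possible way to handle that (a hedged sketch, not part of the code above) is to open the replacement store with explicit compression settings so the rewritten file carries the desired filters:

    import pandas as pd

    def rewrite_compressed(src_path, dst_path, key, complevel=9, complib='blosc'):
        # read the existing table
        src = pd.HDFStore(src_path)
        data = src.select(key)
        src.close()

        # write it back with explicit compression filters
        out = pd.HDFStore(dst_path, complevel=complevel, complib=complib)
        out.append(key, data)
        out.close()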

Jeff

@jreback
Contributor Author

jreback commented Nov 24, 2012

not really necessary - may provide an option for this in the PyTables update (see GH #2346)

@jreback jreback closed this as completed Nov 24, 2012