
Uploading multiple import files from the same source without executing the pipeline in between will overwrite data #363

Closed
sposerina opened this issue Jun 13, 2021 · 5 comments

Comments

@sposerina
Collaborator

Each time a new import file is uploaded, the previous file is archived. Since import files are only loaded into the database when the pipeline executes, subsequent uploads of import files from the same data source cause earlier files to be archived without ever making it into the database.

Since this is a capability exposed to the end user, a more user-friendly and intuitive behavior is necessary.
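
To make the failure mode concrete, here is a minimal in-memory sketch of the behavior described above. The class, method names, source name, and file names are all hypothetical and are not taken from the project's code; it only models "one pending file per source, archived on the next upload".

```python
# Hypothetical model of the reported behavior; names are illustrative only.
class ImportStore:
    def __init__(self):
        self.pending = {}   # data source -> the single file awaiting import
        self.archive = []   # files moved aside, whether or not they were loaded
        self.database = []  # files actually loaded by a pipeline execution

    def upload(self, source, filename):
        # A new upload archives whatever was still pending for the same source.
        if source in self.pending:
            self.archive.append(self.pending[source])
        self.pending[source] = filename

    def execute_pipeline(self):
        # Only files still pending at execution time reach the database.
        self.database.extend(self.pending.values())
        self.archive.extend(self.pending.values())
        self.pending.clear()


store = ImportStore()
store.upload("source_a", "week_1.csv")
store.upload("source_a", "week_2.csv")  # week_1.csv is archived before being loaded
store.execute_pipeline()
print(store.database)  # ['week_2.csv']
```

In this model `week_1.csv` ends up in the archive but never in the database, which is the data loss this issue is about.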

@sposerina sposerina changed the title from "Uploading multiple import files from the same source without exeuting the pipeline imbetween will ovewrite data" to "Uploading multiple import files from the same source without exeuting the pipeline imbetween will overwrite data" Jun 13, 2021
@c-simpson c-simpson self-assigned this Jun 14, 2021
@c-simpson
Collaborator

This looks like a good time to move away from storing the import files in the filesystem.
I propose we store the files as blobs in the DB, keep a queue of not-yet-imported files, and also store the hash so that we can skip duplicates.
I also think the current practice of renaming the files as 'upload_type' + import timestamp makes it easy to miss files if you have multiple files of the same type you need to upload.
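
A rough sketch of what the proposal above could look like, assuming SQLite purely for illustration (the project's actual database, table, and column names are not stated in this thread, so `import_queue` and every function name here are made up). Each upload is stored as a blob under its original filename, a unique SHA-256 hash lets duplicate uploads be skipped, and an `imported` flag acts as the queue of not-yet-imported files.

```python
import hashlib
import sqlite3


def open_queue(path=":memory:"):
    """Create the (hypothetical) queue table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS import_queue (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            source   TEXT    NOT NULL,
            filename TEXT    NOT NULL,           -- original name, not renamed
            sha256   TEXT    NOT NULL UNIQUE,    -- used to skip duplicates
            content  BLOB    NOT NULL,
            imported INTEGER NOT NULL DEFAULT 0  -- 0 = still waiting in the queue
        )
        """
    )
    return conn


def enqueue(conn, source, filename, content):
    """Store an upload; return False if identical bytes were already stored."""
    digest = hashlib.sha256(content).hexdigest()
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO import_queue (source, filename, sha256, content) "
            "VALUES (?, ?, ?, ?)",
            (source, filename, digest, content),
        )
    return cur.rowcount == 1


def pending(conn):
    """Return (id, source, filename) for every file not yet imported, oldest first."""
    return conn.execute(
        "SELECT id, source, filename FROM import_queue WHERE imported = 0 ORDER BY id"
    ).fetchall()


def mark_imported(conn, row_id):
    """Flag one queued file as loaded by the pipeline."""
    with conn:
        conn.execute("UPDATE import_queue SET imported = 1 WHERE id = ?", (row_id,))


conn = open_queue()
first = enqueue(conn, "source_a", "week_1.csv", b"a,b\n1,2\n")
second = enqueue(conn, "source_a", "week_2.csv", b"a,b\n3,4\n")
duplicate = enqueue(conn, "source_a", "week_1_copy.csv", b"a,b\n1,2\n")
print(first, second, duplicate)  # True True False
print(pending(conn))             # both distinct files are still queued
```

With this shape, a second upload from the same source adds a row instead of displacing the first one, and the pipeline would walk `pending()` and call `mark_imported()` per file.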

@c-simpson c-simpson added this to the Lauren-2 milestone Jun 15, 2021
@jwtruver
Collaborator

We should control the upload step so that the user cannot make any errors.

@jwtruver jwtruver removed this from the Lauren-2 milestone Jun 22, 2021
@jwtruver
Collaborator

Pursue getting rid of the execute button and switch to "automated" execution when new files are uploaded.
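
As a sketch of the "automated" idea above: the upload handler itself triggers the import, so there is no window in which an uploaded file sits unprocessed waiting for a button press. `AutoImporter` and `run_pipeline` are hypothetical names, and the call is synchronous here only to keep the example short; a real application would more likely hand the work to a background job.

```python
class AutoImporter:
    """Runs the import for each file as soon as it is uploaded."""

    def __init__(self, run_pipeline):
        # run_pipeline is a hypothetical callable: (filename, content) -> None
        self.run_pipeline = run_pipeline
        self.loaded = []  # filenames that have been through the pipeline

    def upload(self, filename, content):
        # No separate "execute" step: uploading is what triggers the import.
        self.run_pipeline(filename, content)
        self.loaded.append(filename)


rows = []  # stands in for the database in this example
importer = AutoImporter(lambda name, content: rows.append((name, len(content))))
importer.upload("week_1.csv", b"a,b\n1,2\n")
importer.upload("week_2.csv", b"a,b\n3,4\n")
print(importer.loaded)  # ['week_1.csv', 'week_2.csv']
```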

@jwtruver
Collaborator

@c-simpson will test and confirm

@carlos-dominguez
Collaborator

This would be fixed by #480, though not exactly in the way Cris recommends: my PR does some normalization on upload rather than storing a JSON blob.
