Skip to content

Add COPY FROM support #183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Globegitter opened this issue Oct 18, 2019 · 15 comments
Closed

Add COPY FROM support #183

Globegitter opened this issue Oct 18, 2019 · 15 comments
Labels
status: ideal-for-contribution An issue that a contributor can help us with type: enhancement A general enhancement

Comments

@Globegitter
Copy link

I have not been able to find anything in the documentation or through the github code-search, but does this driver support the COPY FROM command? Here is the official documentation of the command that I am talking about: https://www.postgresql.org/docs/current/sql-copy.html

@mp911de
Copy link
Collaborator

mp911de commented Oct 18, 2019

What exactly should the driver do? It's just an SQL command, right? You might want to consider pgjdbc/pgjdbc#1299 for the stdin case. There's no CopyManager API that allows copy in/copy out operations. We're happy to review a pull request to make these things happen.

@mp911de mp911de added the type: enhancement A general enhancement label Oct 18, 2019
@Globegitter
Copy link
Author

Globegitter commented Oct 28, 2019

Thanks for the pointer - I am not sure I can really answer your question, if it should just be working then it might just be worth adding a note about in that in the documentation or adding a test for it. Part of the problem is I was comparing a few different drivers and so far the ones that I have tried have not supported it. After I found this one I could not find anything looking at the website / searching through the repository here. So based on my experience so far I assumed it was not supported.

And yes, while a CopyManager API would indeed be nice, just basic support is all that I need.

@mp911de
Copy link
Collaborator

mp911de commented Oct 28, 2019

You'd help us by outlining your use-case and what you're trying to achieve to get a better impression of what's needed to make COPY FROM work, ideally with a bit of SQL.

@mp911de mp911de added the status: ideal-for-contribution An issue that a contributor can help us with label Mar 6, 2020
@igor-susic
Copy link
Contributor

@mp911de If no one is working on this one, I'm more than willing to submit PR.

@mp911de
Copy link
Collaborator

mp911de commented Oct 25, 2020

Thanks for your support. Feel free to come up with a design proposal so we can discuss the overall direction before submitting a pull request.

@mp911de mp911de added the status: in-progress An issue that is currently being worked on label Oct 25, 2020
@igor-susic
Copy link
Contributor

Sure, no problem, just give me a little to get to know the codebase better and figure out the proposal :)

@ArjanSchouten
Copy link
Contributor

@igor-susic is there any progress on this issue?

@mp911de
Copy link
Collaborator

mp911de commented Jan 3, 2022

As per #183 (comment), we're happy to review proposals if you have a use case for a reactive COPY FROM.

@ArjanSchouten
Copy link
Contributor

ArjanSchouten commented Jan 3, 2022

The use case is a lot of data from a REST interface that has to be inserted into a Greenplum database. Greenplum and lots of insert statements is slow and COPY FROM is the best performant way for us to insert a lot of data.

I don't know where to start to implement this and don't have much time to figure out how the JDBC driver is currently doing it.

@mp911de
Copy link
Collaborator

mp911de commented Jan 3, 2022

It sounds as if the API would need to consume a Publisher<ByteBuffer> as a data source that is then piped into the database.

@ArjanSchouten
Copy link
Contributor

Ok so I currently use this:

COPY myschema.my_table (my_column1, my_column2) FROM STDIN WITH  DELIMITER '|'  NULL '<<null>>'  ESCAPE '\'

Then with the JDBC api I do something like:

PgConnection pgConnection = (PgConnection) connection;
var copyIn = pgConnection.getCopyAPI().copyIn(sql);
copyIn.writeToCopy(someBytes, 0, someBytes.length);
copyIn.endCopy();            

This worked when we where not in the Reactive world 😄 . We switched to R2DBC some time ago and still used the PgConnection / CopyManager stuff. This works well when you accept at least some blocking code and not to many calls to the blocking endpoints.

We use spring-data-r2dbc so I don't know the r2dbc-postgresql api very well. But looking at the code in this repo and how the postgresql driver implemented it I would think this works for the use cases described so far in this issue:

  1. A method on the PostgresConnection api: Publisher<Long> copyIn(String sql, Publisher<ByteBuffer> stdin) that returns the amount of rows that are inserted. The postgresql jdbc driver returns a CopyIn object with the finishCopy method but I guess we know that the Publisher is finished and that the copyIn operation should be finished accordingly?
    You give that copyIn method the sql that I shared at the top of this comment and the STDIN via the publisher.
  2. The actual implementation for the API 😄 ...

@mp911de
Copy link
Collaborator

mp911de commented Jan 4, 2022

Thanks a lot. That helps to get an idea how this could work.

@mp911de mp911de removed the status: in-progress An issue that is currently being worked on label Jan 4, 2022
@mp911de
Copy link
Collaborator

mp911de commented Jan 4, 2022

Cleared the assignee/progress label as there hasn't been a movement. This ticket is up for grabs.

@ArjanSchouten
Copy link
Contributor

Discussed with our team, this ticket is on our backlog scheduled for upcoming weeks.

@ArjanSchouten
Copy link
Contributor

To make sure we're moving in the right direction for the implementation direction:

  1. The PostgresqlConnection api will look like this:
/**
 * Copy bulk data from client into a PostgreSQL table very fast.
 */
Mono<Integer> copyIn(String sql, Publisher<ByteBuffer> stdin);
  1. The PostgresqlConnection implementation:
@Override
public Mono<Integer> copyIn(String sql, Publisher<ByteBuffer> stdin) {
    return new PostgresqlCopyIn(resources).copy(sql, stdin);
}
  1. The PostgresqlCopyIn:
/**
 * An implementation for {@link CopyData} PostgreSQL queries.
 */
final class PostgresqlCopyIn {

    private final ConnectionResources context;

    PostgresqlCopyIn(ConnectionResources context) {
        this.context = Assert.requireNonNull(context, "context must not be null");
    }

    // TODO -as- 20220105 document
    public Mono<Integer> copy(String sql, Publisher<ByteBuffer> stdin) {
        Client client = context.getClient();

        Flux<FrontendMessage> copyData = Flux.from(stdin)
            .map(Unpooled::wrappedBuffer)
            .<FrontendMessage>map(CopyData::new)
            .concatWith(Mono.just(CopyDone.INSTANCE));

        return client.exchange(
                backendMessage -> backendMessage instanceof CopyInResponse,
                Mono.just(new Query(sql))
            )
            .thenMany(client.exchange(ReadyForQuery.class::isInstance, copyData))
            .onErrorResume((e) -> client.exchange(ReadyForQuery.class::isInstance, Mono.just(new CopyFail("Copy operation failed: " + e.getMessage()))))
            .as(messages -> toResult(context, messages, ExceptionFactory.INSTANCE).getRowsUpdated());

        // TODO -as- 20220104 handle on error and on cancel etc (send CopyFail)..
    }

    @Override
    public String toString() {
        return "PostgresqlCopyIn{" +
            "context=" + this.context +
            '}';
    }
}

Work in progress can be found here: RWS-NL@aa4625a.

@mp911de mp911de linked a pull request Apr 4, 2022 that will close this issue
@mp911de mp911de changed the title COPY FROM support Add COPY FROM support May 27, 2022
mp911de added a commit that referenced this issue May 27, 2022
Extract builder API to enable flexibility in providing copy data. Add safeguards to terminate copy data in cases of cancellation or errors. Reorder methods.

[resolves #500][#183]

Signed-off-by: Mark Paluch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ideal-for-contribution An issue that a contributor can help us with type: enhancement A general enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants