Provide a legal way to compose result of query when handling rows manually #110

Closed
dimsmol opened this issue Mar 21, 2012 · 15 comments

Comments

@dimsmol

dimsmol commented Mar 21, 2012

When handling rows manually, I often end up with something like:

var processedRows = [];
query
    .on('row', function (row) {
        var processedRow = processRow(row); // create an object from row or whatever
        processedRows.push(processedRow);
    })
    .on('end', function () {
        doSomething(processedRows);
    });

The code could be clearer if we had legal access to the result object from the row event:

query
    .on('row', function (row, result) {
        var processedRow = processRow(row); // create an object from row or whatever
        result.addRow(processedRow);
    })
    .on('end', function (result) {
        doSomething(result.rows);
    });
@brianc
Owner

brianc commented Mar 21, 2012

Supplying rows to the end event would require buffering all the rows into memory. Doing that completely removes the evented model's ability to stream large result sets without buffering them.
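
For contrast, a minimal sketch of the streaming usage that unconditional buffering would defeat; handleRow here is a hypothetical per-row consumer, not part of the library:

query
    .on('row', function (row) {
        handleRow(row); // each row is consumed as it arrives and never retained
    })
    .on('end', function () {
        // nothing was buffered, so memory use stays flat regardless of result size
    });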

@dimsmol
Author

dimsmol commented Mar 21, 2012

Totally agree, but it's a very convenient way when the query is sure to return only a few rows (e.g. it has an explicit LIMIT clause).

Another example: processing that produces very few results even over a large number of rows. With control over what gets added, you can selectively store only the data you really want to buffer.

The code already has this kind of buffering mechanism for callback-style calls, so providing this feature would require only very minor changes.

@dimsmol
Author

dimsmol commented Mar 21, 2012

It would also be useful to reserve some property of the result object for storing non-array results of row processing. It could be data, for example. This would allow code like this:

query
    .on('row', function (row, result) {
        result.data = result.data || {};
        var kind = veryComplexFuncToDetermineKind(row);
        var count = result.data[kind];
        result.data[kind] = (count == null ? 1 : count + 1);
    })
    .on('end', function (result) {
        done(result.data); // calculated counts for different kinds
    });

The main idea is to allow composing the result object while staying encapsulated within a particular query's processing, without introducing additional variables that are not query-bound and that get mixed up when several queries are processed within a single scope.
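
For illustration, this is the kind of scope-level bookkeeping meant here (queryA and queryB are hypothetical queries sharing one scope); each query needs its own accumulator because neither is bound to its query:

var countsA = {};
var countsB = {};
queryA.on('row', function (row) {
    var kind = veryComplexFuncToDetermineKind(row);
    countsA[kind] = (countsA[kind] || 0) + 1;
});
queryB.on('row', function (row) {
    var kind = veryComplexFuncToDetermineKind(row);
    countsB[kind] = (countsB[kind] || 0) + 1;
});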

@dimsmol
Author

dimsmol commented Mar 21, 2012

Just to be clear.

I'm not suggesting that results of row processing should always be added to the result object.
I'm suggesting we provide that ability, with the programmer in full control of what to add (or not add).

Of course, nothing would be added by default.
If your code doesn't use the proposed second argument of the row event, it will behave exactly the same way as before, without any buffering.
Existing code will also behave exactly as before, without any buffering.

@brianc
Owner

brianc commented Mar 21, 2012

I think some of the use cases you've outlined could be good in certain scenarios; however, the value/complexity ratio to me is unjustifiable for a database client. A database client (or any low-level infrastructure library) should strive for the minimal interface possible to "get the job done" and work towards complete stability and database feature support. Sugar, as suggested by this issue and issue #109, can be added in your application code with very little to no monkey-patching. I'm not aiming to provide the jQuery of database interfaces. I'm aiming to provide a battle-tested, low level, bug free way to run queries against postgres. I know it's nice to have some extra features, but from a library standpoint what can be added in your own code should be added in your own code.

Speaking from a bit of experience and going off completely on a tangent, I want to give an example so you know "I feel your pain." .NET has probably the world's most verbose database client. I make no exaggeration when I provide the following example of the steps involved in running a simple, parameterized query against a database (postgres included):


using(var connection = new SqlConnection(/*connection string*/))
{
  connection.Open();
  using(var cmd = connection.CreateCommand())
  {
    cmd.CommandText = "SELECT name FROM people WHERE id = @id";
    var param = cmd.CreateParameter();
    param.ParameterName = "@id";
    param.Value = 1;
    cmd.Parameters.Add(param);
    using(var reader = cmd.ExecuteReader())
    {
      while(reader.Read())
      {
        return reader["name"];
      }
    }
  }
}

While that's an extreme and verbose amount of code, you can wrap most of it up in such a way as to provide a single-line interface for running queries. Does this mean the .NET guys should have made a one-line way to run queries? Probably not, because the low-level interface their client provides allows you to set types on parameters, stream row results, execute queries asynchronously, try/catch various parts of the query execution, use transactions, log steps of query execution, etc.

tl;dr: node-postgres strives to be sugar-free, bug-free, and overhead-free. Anything that can be handled in the consuming code should be handled there.

@dimsmol
Author

dimsmol commented Mar 21, 2012

I agree that things should be kept as simple as possible.
It can certainly be done in client code, but that requires subclassing Query, which relies on its internal organization, and rewriting the query method of Client (to use the new subclass), which relies on internal organization even more.

On the other hand, this issue requires changing only one line of code, replacing self.emit('row', row); with self.emit('row', row, _result);, which does not look like it adds much complexity.

And it requires only one additional line of code to reserve the data field of Result:

var Result = function() {
  this.rows = [];
  this.data = null; // reserved for client's code usage
};

However, data can be added to Result in client code even without this change; the change only guarantees that doing so is safe and helps a little with execution optimization. So this step is entirely optional if it looks too obscure.

These very small changes would greatly simplify extending the results handling mechanics while still keeping things simple.
They don't try to add any new features to the library, they just provide a simpler way to do it in client code.

@dimsmol
Author

dimsmol commented Mar 21, 2012

I created a little fetchAll helper function that looks like a better (and more JS-style) solution than trying to tune the library internals.
I think this issue can be closed with "wontfix" :)
Sorry for all the buzz.

@dimsmol
Author

dimsmol commented Mar 21, 2012

Here is the code, if someone is interested:

var Query = require('pg/lib/query');
var fetchAll = function (config, cb) {
    if (config instanceof Query) {
        config = {
            query: config
        };
    }

    var query = config.query;
    var data = config.initialData || [];
    var collect = config.collect;

    var doCollect;

    if (collect) {
        doCollect = function (row) {
            collect(row, data);
        };
    } else {
        doCollect = function (row) {
            data.push(row);
        };
    }

    query
        .on('row', doCollect)
        .on('error', cb)
        .on('end', function (result) {
            // NOTE: ensure we are not in the error case; the error has already been handled at this point
            if (result != null) {
                cb(null, data);
            }
        });
};

and usage:

// collect all query rows
fetchAll(query, cb);

// collect ids extracted from rows
fetchAll({
    query: query,
    collect: function (row, data) {
        data.push(row.id);
    }
}, cb);

// create dict of rows using id as a key
fetchAll({
    query: query,
    initialData: {},
    collect: function (row, data) {
        data[row.id] = row;
    }
}, cb);

@chowey
Contributor

chowey commented Mar 24, 2012

For the record, although it is bad form, you can do this:

query
    .on('row', function (row) {
        var processedRow = processRow(row); // create an object from row or whatever
        this._result.addRow(processedRow);
    })
    .on('end', function (result) {
        doSomething(result.rows);
    });

@dimsmol
Author

dimsmol commented Mar 26, 2012

Very bad form, I agree :)

@chowey
Contributor

chowey commented Mar 26, 2012

Having used this in code now, I see that there is a very strong case for providing the result object to the row event.

It allows you to buffer all the rows into memory, but it doesn't do it automatically.

The result object would just need to be documented.

So +1 from me for this addition.

@dimsmol
Author

dimsmol commented Mar 27, 2012

I think it's better to remove the rows collection code from Result, and maybe remove the class altogether, using a simple dictionary on end. But this cannot be done without breaking backwards compatibility, so first document it as deprecated along with the callback style and remove it completely in the future, keeping support for the event style only. That would simplify things and remove the confusing dualism from the library.
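
One possible reading of the "simple dictionary" idea, purely as a hypothetical sketch (the property names are illustrative, not an existing API):

query.on('end', function (info) {
    // info would be a plain object such as { command: 'SELECT', rowCount: 5 },
    // with no built-in rows array; all accumulation is left to the caller
    console.log(info.rowCount + ' rows were processed');
});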

@dimsmol
Author

dimsmol commented Mar 27, 2012

It's also very confusing that the readme mentions the callback style as "preferred" when the event style looks more mature and appropriate by now.

@chowey
Contributor

chowey commented Mar 27, 2012

There's nothing wrong with result.addRow() as an interface. It would help people who are just getting started. The callback style basically just does this (although it is implemented differently in code):

query.on('row', function (row, result) {
    result.addRow(row);
});

This could be documented so that a beginner knows how to accumulate rows.

Since result is just an object, you can add any old property you want on the fly. So you can already do custom dictionary stuff if you want:

query.on('row', function (row, result) {
    if (isInterestingRow(row))
        result.interestingRow = row;
});
query.on('end', function (result) {
    console.log('Here is some interesting data:');
    console.log(result.interestingRow);
});

Again, this just needs to be documented so that people understand its use.

@brianc
Owner

brianc commented Mar 27, 2012

Thanks for the pull request & good discussion.

brianc closed this as completed Mar 27, 2012