Gotta batch them all #3248

Closed
wants to merge 36 commits into from

Conversation

TravisGesslein

merge pls

Talisca and others added 30 commits November 30, 2015 12:02
…(greatly helps with accuracy, also 1024-1024 is very arbitrary
…ist file, the texturename should have a # in front (like with sprites) to indicate a spriteframe texture. can also set manually with .setTexture (also with # in front of texturename)
…p. inlining copy() manually sped up the function by 20%
@pandamicro
Contributor

Thanks for your contribution. I saw some very interesting things in your PR, but because it's so big, it's very hard to review. Can you explain your batching solution in detail?

By the way, you have removed all Canvas render commands. That disables our Canvas compatibility, which we can't accept. Can you explain why you removed them?

@TravisGesslein
Author

Hi!

Sorry, this pull request arrived in your inbox only accidentally, it was intended for our internal cocos2d-html5 fork :D That's why the canvas rendering stuff is removed: we don't need it internally, and it reduces the size of the executable quite a bit once all is removed. Needless to say this is not intended for the main cocos2d-html5 branch!

However, I can explain our batching solution a bit. It does a bunch of stuff that is hardcoded because it fits the specifics of our project, but the general concept should transfer to regular cocos2d-html5 without problems. Here's roughly what happens every frame:

Some things that already happen in standard cocos2d-html5:

  • All renderCmds are gathered into a linear list (by visit()ing all nodes or by reusing from last frame).
  • The renderCmds are called in order (the order in the list reflects the order resulting from the node hierarchy and relative localZOrders) and draw whatever they want to draw. This is important, because in principle all the batching information can be gathered from this linear sorted list.

What we roughly do for batching (we only implemented cc.Sprite.SpriteWebGLRenderCmd batching as a proof of concept, but the concept can be implemented for other node types):

  • Before drawing:
    • iterate through the renderCmd list, and "ask" each renderCmd if it can batch together with other nodes in the list. We do this by:
      • Calling a renderCmds[i].configureBatch(renderCmdList, myIndexInList) function. This function is supposed to "look ahead" into the renderCmd list and see if it can combine nodes in that list to a single draw call.
      • When it does find a node that it can batch together with, it tags that node by setting a "_batched" flag (important later), and it also sets a "_batching" flag on itself, to indicate that the renderCmd is not just drawing itself, but is batching multiple other renderCmds.
      • This configureBatch currently also takes care of uploading the required GPU data correctly (uploading all quads of the different sprites to a single buffer, etc.) in order to make sure we can draw everything in 1 draw call. Each gl.bufferSubData (or worse, gl.bufferData) call is really slow, so we first gather the data on the CPU side in a Uint32Array upload buffer, which is then "transferred to GPU" in a single gl.bufferSubData call.
      • This step also pools and reuses WebGL buffers so we don't constantly have to create new ones.
    • Iterate through all renderCmds again, and draw them. Specifically:
      • If the cmd's _batching is set to true, call its batchedRendering() function. Otherwise, call its usual rendering() function.
      • If the cmd's _batched flag is set to true, do not draw it (it's already drawn by another renderCmd's batchedRendering())
    • Iterate through all renderCmds again, and set their _batched and _batching flags to false ("reset" for next frame)
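
The three passes described above can be sketched roughly as follows. This is only an illustration with a mock render command: MockSpriteCmd, renderFrame and the log array are hypothetical names, and only the flag and method names (_batched, _batching, configureBatch, batchedRendering, rendering) are taken from the description.

```javascript
// Hypothetical stand-in for a sprite render command. It "draws" by writing to a log.
function MockSpriteCmd(texture, log) {
    this.texture = texture;
    this._batched = false;   // set by another command that draws this one
    this._batching = false;  // set when this command draws others as well
    this._batchedNodes = 1;
    this._log = log;
}

MockSpriteCmd.prototype.configureBatch = function (cmds, myIndex) {
    var i = myIndex + 1;
    // Look ahead: claim consecutive sprite commands that share this texture.
    while (i < cmds.length && cmds[i] instanceof MockSpriteCmd && cmds[i].texture === this.texture) {
        cmds[i]._batched = true;
        i++;
    }
    this._batchedNodes = i - myIndex;
    this._batching = this._batchedNodes > 1;
    return this._batchedNodes;
};

MockSpriteCmd.prototype.rendering = function () {
    this._log.push('single:' + this.texture);
};

MockSpriteCmd.prototype.batchedRendering = function () {
    this._log.push('batch:' + this.texture + 'x' + this._batchedNodes);
};

function renderFrame(cmds) {
    var i, cmd;
    // Pass 1: every command that was not already claimed may set up a batch.
    for (i = 0; i < cmds.length; ++i) {
        cmd = cmds[i];
        if (!cmd._batched && cmd.configureBatch) {
            cmd.configureBatch(cmds, i);
        }
    }
    // Pass 2: draw. Claimed commands are skipped, batching commands draw their whole batch.
    for (i = 0; i < cmds.length; ++i) {
        cmd = cmds[i];
        if (cmd._batched) continue;
        if (cmd._batching) cmd.batchedRendering();
        else cmd.rendering();
    }
    // Pass 3: reset the flags for the next frame.
    for (i = 0; i < cmds.length; ++i) {
        cmds[i]._batched = false;
        cmds[i]._batching = false;
    }
}
```

With five sprites using textures A, A, A, B, A, this sketch issues three "draws": one batch of three, then two single draws, because the last A is not consecutive with the first three.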

As an example, batching SpriteWebGLRenderCmd works by (just simple and proof of concept, can be improved a lot):

  • When configureBatch is called, the current renderCmd just scans the list ahead of its own index, and marks for batching all renderCmds that
    • are also a SpriteWebGLRenderCmd
    • appear consecutively (so no other node types in between)
    • have the same texture
  • configureBatch also uploads all the quads of the different sprites to a single GPU buffer, and uploads an index buffer (gl.ELEMENT_ARRAY_BUFFER) because we want to draw everything in one draw call
  • then later, batching SpriteWebGLRenderCmds call their batchedRendering() function, which just draws the previously uploaded buffer.

And bam! Batching works. Some things to note:

  • All of the above can obviously be improved a lot. But it works!
    • Currently all of the above happens every frame, regardless of whether anything changes. So every SpriteWebGLRenderCmd that wants to batch something uploads an entire frame's worth of buffer data to GL. This sounds like a lot of work, but it DOES result in a significant speedup over non-batched sprites in terms of CPU work done. Drawing 1000 non-batched sprites distributed over the screen brings my small notebook to a very low framerate, while doing it with automatic batching keeps it easily below 5ms (aka... 200 fps?) each frame. This should be no surprise: even though we transfer a lot of data, the overall number of WebGL calls is significantly reduced, and for the most part we don't really create or allocate much new data, we just move a lot of stuff around.
    • I haven't tested it, but it would not surprise me if using a SpriteBatchNode is faster than the current batching implementation. But that's not the point! Automatic batching doesn't suffer from the same limitations as SpriteBatchNodes. Also, this automatic batching can be improved a lot!
    • it should be noted that due to WebGL limitations, when batching Sprites I upload the transformation matrix of the renderCmd 4 times for each quad! Yes that sounds like a lot, but it's fast enough. The only way around it is to transform all your quad positions by the transformation matrix on the CPU side, and then upload the "final" vertex data. But we don't do that since the idea is to save CPU time as much as possible (the GPU falls asleep in any modern 2D-load anyways)
  • The above batching concept for Sprites only works with consecutive sprite draws, but with some more extensions we could batch even more:
    • We could write a correct global Z value into the depth buffer of WebGL by computing it from the node hierarchy Z orders and somehow handing it to the shaders (we already do this in this branch).
    • Once you do that, you can even batch non-consecutive sprites (when other node types are in between), because with z-buffering, out of order draws are possible and the buffer will take care of having only the top most sprite appear on the screen.
    • But of course, out-of-order only works for non-transparent sprites. I guess this can be implemented by giving cc.Nodes a flag that indicates whether it is important to the user that the sprite is drawn in the correct blending order. If we don't care about color-correct blending (only care that some parts of a sprite texture are "seethrough") we can draw it out of order still.
  • There's currently the problem that you would need to implement different batching methods for different node types, if you want them to be batched. Even if they are similar logically and in terms of code (many of them use the same shaders, and vertex data types, etc.). But this can be solved also, and modern 3D engines already do it:
    • The basic concept is outlined here for example (although that article misses a lot of technical details): http://realtimecollisiondetection.net/blog/?p=86 . The basic idea is to create "draw information" for everything you want to draw, that contains info like "which buffer do i need to draw?", "which shader do i need to draw","what blending state do i need to draw?", then encode that into an integer and sort it (using something like radix sort). From those sorted integer codes, you can immediately read a perfect batching order.
    • Applying this to cocos2d-html5 renderer, it would mean that instead of actually using any gl.drawArrays, gl.bufferData etc. calls, the renderCmds only somehow submit this drawing information and then there is some central piece of code that can render it all (because everything you need to draw will be available from those encoded integer keys). So you don't need custom drawing code for anything anymore. The only thing that will be custom per node type will be what kind of render information it submits.
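
To make the sort-key idea a bit more concrete, here is a minimal sketch. The bit layout (8 bits shader, 4 bits blend mode, 16 bits texture), the field names and both functions are assumptions for illustration, not part of cocos2d-html5, and a plain Array.prototype.sort stands in for the radix sort mentioned above.

```javascript
// Hypothetical layout: [ shader: 8 bits | blend: 4 bits | texture: 16 bits ] = 28 bits,
// which stays a positive number under JavaScript's 32-bit bitwise operators.
function makeSortKey(shaderId, blendId, textureId) {
    return ((shaderId & 0xff) << 20) | ((blendId & 0xf) << 16) | (textureId & 0xffff);
}

// Takes "draw information" records and returns groups that share all render state.
function buildBatches(draws) {
    var keyed = draws.map(function (d) {
        return { key: makeSortKey(d.shader, d.blend, d.texture), draw: d };
    });
    keyed.sort(function (a, b) { return a.key - b.key; });

    var batches = [];
    for (var i = 0; i < keyed.length; ++i) {
        var last = batches[batches.length - 1];
        if (last && last.key === keyed[i].key) {
            last.draws.push(keyed[i].draw);       // same state: extend the current batch
        } else {
            batches.push({ key: keyed[i].key, draws: [keyed[i].draw] });
        }
    }
    return batches;
}
```

Note that sorting purely by state throws away the original draw order, so this only gives correct results in combination with the depth-buffer approach (or for draws whose blending order does not matter), as discussed in the points above.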

That's basically it. It's a lot of text, I hope it's not too convoluted.

@pandamicro
Contributor

Thanks a lot for the explanation, we are also very interested in batched rendering. I will read it when I get home. By the way, do you have a performance comparison from before and after batching?

@jareguo
Contributor

jareguo commented Mar 29, 2016

The basic concept is outlined here for example (although that article misses a lot of technical details): http://realtimecollisiondetection.net/blog/?p=86 . The basic idea is to create "draw information" for everything you want to draw, that contains info like "which buffer do i need to draw?", "which shader do i need to draw","what blending state do i need to draw?", then encode that into an integer and sort it (using something like radix sort). From those sorted integer codes, you can immediately read a perfect batching order.

Wow, that's a really good idea.

var buf = pool[i];
if(buf.size >= numSprites)
{
pool.removeByLastSwap(i);
Contributor

Where have you defined this function removeByLastSwap ?

Contributor

I found that you haven't reset the buffers here, so it will append data to the buffer each frame. Am I missing anything here?

Now I see bufferSubData is replacing the buffer contents from offset 0, but somehow my version doesn't work very well: the buffer keeps growing.

Author

@pandamicro You're right, it is actually a bug (fixed on current version of our branch as well). Those three lines need to be

pool.removeByLastSwap(minBufIndex);
this.initBatchBuffers(minBuf.arrayBuffer,minBuf.elementBuffer,numSprites);
minBuf.size = numSprites;

Contributor

That means bufferSubData has a bug ? It can't actually replace the data from requested offset ?

Author

Oh no, it can. The bug in this piece of code is just that the code keeps making new buffers (unnecessarily) because it thinks that the pooled buffer is too small (since .size is never rewritten). That shouldn't cause the rendering to break, but it's wasteful for performance (but only maybe) and memory (until the old ones are garbage collected).

In case you're using bufferSubData differently in your code, note that it can only replace data as long as everything you write stays within the size that you gave the buffer when you initialized it with bufferData(). bufferSubData can't resize the buffer.
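
A minimal sketch of that rule, assuming a hypothetical pool record of the form { buffer, size } where size is the capacity in bytes (the pool in this PR tracks size in sprites instead). uploadToPooledBuffer is an illustrative helper, not an existing cocos2d-html5 function; the gl calls are the standard WebGL ones.

```javascript
// entry: hypothetical pool record { buffer: WebGLBuffer, size: capacity in bytes }
// data:  typed array holding everything to upload for this batch
function uploadToPooledBuffer(gl, entry, data) {
    gl.bindBuffer(gl.ARRAY_BUFFER, entry.buffer);
    if (data.byteLength > entry.size) {
        // Too small: bufferData reallocates the storage and uploads in one call.
        gl.bufferData(gl.ARRAY_BUFFER, data, gl.DYNAMIC_DRAW);
        // Record the new capacity. Forgetting this is the kind of bug discussed above:
        // the buffer would look "too small" again on every following frame.
        entry.size = data.byteLength;
    } else {
        // Fits: overwrite from offset 0 without reallocating.
        gl.bufferSubData(gl.ARRAY_BUFFER, 0, data);
    }
}
```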

@TravisGesslein
Author

Oh, sorry, forgot about that! It's a custom array extension we use internally. It's defined like this:

//removes the element at index i1 in the array by swapping it with the last element and reducing array length by 1
if (!Array.prototype.removeByLastSwap) {
    Array.prototype.removeByLastSwap = function (i1) {
        //guard: decrementing the length of an empty array would throw a RangeError
        if (this.length === 0) return;
        this[i1] = this[this.length - 1];
        this.length--;
    };
}

For testing, you can just define it anywhere in your code. The main use case is when you need to remove something from an array, and don't care about the order of what is inside the array. In this case, this is obviously constant time and very fast, while generic array removal needs to move up everything behind the removed index.
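
If you'd rather not patch Array.prototype while testing, the same idea works as a standalone function. This is just a sketch; the free-function form and the bounds check are my additions, not what we use internally.

```javascript
// Removes arr[index] in constant time by overwriting it with the last element.
// The order of the remaining elements is not preserved.
function removeByLastSwap(arr, index) {
    if (index < 0 || index >= arr.length) return; // nothing to remove
    arr[index] = arr[arr.length - 1];
    arr.length--;
}

var pool = ['a', 'b', 'c', 'd'];
removeByLastSwap(pool, 1);
// pool is now ['a', 'd', 'c']: 'b' is gone and 'd' took its place
```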

Also regarding the requested performance comparison: Will try to get it to work asap. I'm extremely busy right now so please forgive me! :D

@TravisGesslein
Author

Also, just for completeness' sake, one more thing on optimizing this further: It currently uploads everything every frame (because I haven't had more time to spend on it), but that's obviously not necessary. The batch order only changes when the children order changes, and even then not necessarily (basically, if the renderCmd order stays the same, you don't need to do anything). And even then, most sub-batches are likely to stay the same (for example, if you render a text in between 50 previously batched sprites, you now have 2 batches of 25), and you can probably account for that somehow. Also, when something changes, not everything changes (e.g. the quad data of a sprite mostly stays the same unless you change its color, its underlying texture rect, etc., but transform matrices often change every frame).
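
One possible starting point for that optimization, as a rough sketch: remember last frame's renderCmd list and only redo the batch configuration when the order differs. createOrderTracker and orderChanged are hypothetical names, and none of this is implemented in the branch.

```javascript
// Returns a function that reports whether the given command list differs
// (in length, content or order) from the list seen on the previous call.
function createOrderTracker() {
    var previous = [];
    return function orderChanged(cmds) {
        var changed = cmds.length !== previous.length;
        for (var i = 0; !changed && i < cmds.length; ++i) {
            changed = cmds[i] !== previous[i];
        }
        if (changed) {
            previous = cmds.slice(); // snapshot for the next frame
        }
        return changed;
    };
}
```

Even when the order is unchanged, per-frame data such as the transform matrices would still need to be re-uploaded; only the "who batches with whom" decision and the static quad data could be reused.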

@pandamicro
Contributor

The main use case is when you need to remove something from an array, and don't care about the order of what is inside the array. In this case, this is obviously constant time and very fast, while generic array removal needs to move up everything behind the removed index.

Great idea, thanks for the clarification

I like your implementation here and really appreciate that you shared all this information with us. I will try to make a working version myself, and I will credit the origin of the idea inline for the parts inspired by your implementation. If that's a problem, please tell me how you want us to integrate parts of your implementation into our open-source code base.

@pandamicro
Contributor

The basic idea is to create "draw information" for everything you want to draw, that contains info like "which buffer do i need to draw?", "which shader do i need to draw","what blending state do i need to draw?", then encode that into an integer and sort it (using something like radix sort). From those sorted integer codes, you can immediately read a perfect batching order.

I'd also like to try this solution. I will keep you informed when I submit my pull request, if that doesn't bother you.

var cmd = locCmds[i];
if(!cmd._batched && cmd.configureBatch) //may be set to true by processed cmds during this loop
{
cmd.configureBatch(locCmds,i);
Contributor

I didn't see the implementation of this function, I assume it should be implemented in cc.Sprite.WebGLRenderCmd. Normally it should set commands' _batching and _batched property by comparing their type and rendering data.

@TravisGesslein
Author

Ah, sorry, another thing that is missing from this pull request.

The purpose of the configureBatch function is to:

  • check which sprites can be batched (and write the batched sprite count to this._batchedNodes, because the count is probably needed for a WebGL draw call later on)
  • mark the batched ones as batched for the currently processed frame by setting _batched = true
  • mark the batching sprite (where configureBatch is called) as _batching = true.
  • upload the actual data to WebGL buffers and whatever else you need

As for the last point, it could be split off (uploading is kind of a separate issue from determining what can be batched), but I just put it in there for simplicity.

The _batched and _batching flags are there so rendererWebGL later knows to call a _batching renderCmd's .batchedRendering() function instead of rendering(), and also to skip .rendering() for the sprites which were already batched.
_batched and _batching will later be reset to false in .rendering of RendererWebGL

As for the actual function implementation that is used in this pull request: I subclassed cc.Sprite.WebGLRenderCmd because I wanted to play with different implementations, but I added the subclasses to the wrong repository (our internal game project)... if you look at cc.Sprite._createRenderCmd, it returns a cc.Sprite.BasicWebGLRenderCmd. Here's that class:

(function() {
    cc.Sprite.BasicWebGLRenderCmd = function (renderable) {
        cc.Sprite.WebGLRenderCmd.call(this, renderable);
        this._needDraw = true;

        if(!proto.vertexDataPerSprite)
        {
            proto.vertexDataPerSprite = cc.V3F_C4B_T2F_Quad.BYTES_PER_ELEMENT;
            proto.matrixByteSize =  4*4*4; //4 rows of 4 floats, 4 bytes each
            proto.byteSizePerSprite = proto.vertexDataPerSprite + proto.matrixByteSize*4;
            proto.indicesPerSprite = 6;
        }
    };

    var proto = cc.Sprite.BasicWebGLRenderCmd.prototype = Object.create(cc.Sprite.WebGLRenderCmd.prototype);

    proto.constructor = cc.Sprite.BasicWebGLRenderCmd;

    proto.configureBatch = function (renderCmds, myIndex) {
        var node = this._node;
        var texture = node.getTexture();

        for (var i = myIndex + 1, len = renderCmds.length; i < len; ++i) {
            var cmd = renderCmds[i];

            //only consider other sprites for now
            if (!(cmd instanceof cc.Sprite.WebGLRenderCmd)) {
                break;
            }

            var otherNode = cmd._node;
            if (texture !== otherNode.getTexture()) {
                break;
            }

            cmd._batched = true;
        }

        var count = this._batchedNodes = i - myIndex;

        if (count > 1) {
            this._batching = true;
        }
        else {
            return 0;
        }

        var buf = this.pooledBuffer = this.getBatchBuffer(count);
        this._batchBuffer = buf.arrayBuffer;
        this._batchElementBuffer = buf.elementBuffer;

        //all of the divisions by 4 are just because we work with uint32arrays instead of uint8 arrays so all indexes need to be shortened by the factor of 4
        var totalSpriteVertexData = this.vertexDataPerSprite * count / 4;
        var matrixData = this.matrixByteSize / 4;
        var vertexDataPerSprite = this.vertexDataPerSprite / 4;
        var vertexDataOffset = 0;
        var matrixDataOffset = 0;

        var totalBufferSize = count * this.byteSizePerSprite;
        var uploadBuffer = new Uint32Array(totalBufferSize / 4);

        gl.bindBuffer(gl.ARRAY_BUFFER, this._batchBuffer);

        for (var j = myIndex; j < i; ++j) {
            var cmd = renderCmds[j];
            //copy(uploadBuffer, cmd._quadBufferView, vertexDataOffset);

            var source = cmd._quadBufferView;
            var len = source.length;
            for (var k = 0; k < len; ++k) {
                uploadBuffer[vertexDataOffset + k] = source[k];
            }

            var matData = new Uint32Array(cmd._stackMatrix.mat.buffer);

            source = matData;
            len = source.length;

            var base = totalSpriteVertexData + matrixDataOffset;
            var offset0 = base + matrixData * 0;
            var offset1 = base + matrixData * 1;
            var offset2 = base + matrixData * 2;
            var offset3 = base + matrixData * 3;

            for (var k = 0; k < len; ++k) {
                var val = source[k];
                uploadBuffer[offset0 + k] = val;
                uploadBuffer[offset1 + k] = val;
                uploadBuffer[offset2 + k] = val;
                uploadBuffer[offset3 + k] = val;
            }

            vertexDataOffset += vertexDataPerSprite;
            matrixDataOffset += matrixData * 4;
        }

        gl.bufferSubData(gl.ARRAY_BUFFER, 0, uploadBuffer);

        //create element buffer
        gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this._batchElementBuffer);

        var indices = new Uint16Array(count * 6);

        var currentQuad = 0;
        for (var i = 0; i < count * 6; i += 6) {
            indices[i] = currentQuad + 0;
            indices[i + 1] = currentQuad + 1;
            indices[i + 2] = currentQuad + 2;
            indices[i + 3] = currentQuad + 1;
            indices[i + 4] = currentQuad + 2;
            indices[i + 5] = currentQuad + 3;

            currentQuad += 4;
        }

        gl.bufferSubData(gl.ELEMENT_ARRAY_BUFFER, 0, indices);
        return count;
    }

    proto.batchedRendering = function (ctx) {
        var node = this._node;
        var locTexture = node._texture;
        var count = this._batchedNodes;

        var bytesPerRow = 16; //4 floats with 4 bytes each
        var matrixData = this.matrixByteSize;
        var totalSpriteVertexData = this.vertexDataPerSprite * count;

        this._batchShader.use();
        this._batchShader._updateProjectionUniform();

        cc.glBlendFunc(node._blendFunc.src, node._blendFunc.dst);
        cc.glBindTexture2DN(0, locTexture);                   // = cc.glBindTexture2D(locTexture);

        gl.bindBuffer(gl.ARRAY_BUFFER, this._batchBuffer);

        cc.glEnableVertexAttribs(cc.VERTEX_ATTRIB_FLAG_POS_COLOR_TEX);

        gl.vertexAttribPointer(0, 3, gl.FLOAT, false, 24, 0);                   //cc.VERTEX_ATTRIB_POSITION
        gl.vertexAttribPointer(1, 4, gl.UNSIGNED_BYTE, true, 24, 12);           //cc.VERTEX_ATTRIB_COLOR
        gl.vertexAttribPointer(2, 2, gl.FLOAT, false, 24, 16);                  //cc.VERTEX_ATTRIB_TEX_COORDS
        //enable matrix vertex attribs
        for (var i = 0; i < 4; ++i) {
            gl.enableVertexAttribArray(cc.VERTEX_ATTRIB_MVMAT0 + i);
            gl.vertexAttribPointer(cc.VERTEX_ATTRIB_MVMAT0 + i, 4, gl.FLOAT, false, bytesPerRow * 4, totalSpriteVertexData + bytesPerRow * i); //stride is one full matrix (4 rows), the offset selects row i
        }

        gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, this._batchElementBuffer);
        //gl.drawArrays(gl.TRIANGLE_STRIP, 0, count*4);
        gl.drawElements(gl.TRIANGLES, count * 6, gl.UNSIGNED_SHORT, 0);

        for (var i = 0; i < 4; ++i) {
            gl.disableVertexAttribArray(cc.VERTEX_ATTRIB_MVMAT0 + i);
        }

        this.storeBatchBuffer(this.pooledBuffer);

        cc.g_NumberOfDraws++;
    }
})();
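
To make the buffer layout used by the class above easier to follow: all quads come first, followed by one matrix copy per vertex. A small sketch of the byte offsets, assuming a quad is 96 bytes (4 vertices with the 24-byte stride used in batchedRendering); batchLayout and its field names are only for illustration.

```javascript
var QUAD_BYTES = 96;           // assumed: 4 vertices x 24 bytes (position, color, tex coords)
var MATRIX_BYTES = 4 * 4 * 4;  // 64 bytes, same as proto.matrixByteSize

function batchLayout(count) {
    var bytesPerSprite = QUAD_BYTES + MATRIX_BYTES * 4; // one matrix copy per vertex
    return {
        bytesPerSprite: bytesPerSprite,
        totalBytes: count * bytesPerSprite,
        // the matrix block starts right after the quads of ALL sprites
        matrixBlockOffset: count * QUAD_BYTES,
        quadOffset: function (sprite) {
            return sprite * QUAD_BYTES;
        },
        matrixOffset: function (sprite, vertex) {
            return count * QUAD_BYTES + (sprite * 4 + vertex) * MATRIX_BYTES;
        }
    };
}
```

Under that assumption each sprite costs 352 bytes, of which 256 are matrix copies. The matrix block offset is also what batchedRendering passes (as totalSpriteVertexData) as the base offset for the four matrix row attributes.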

Regarding attribution:

I like your implementation here and really appreciate that you shared all this information with us. I will try to make a working version myself, and I will credit the origin of the idea inline for the parts inspired by your implementation.

Yep, that is no problem!

@pandamicro
Contributor

Thanks again, you are awesome ~ 👍

@pandamicro
Contributor

@heishe ok, I found the problem may be that I haven't enabled and set cc.ATTRIBUTE_NAME_MVMAT (a_mvMatrix). You use cc.VERTEX_ATTRIB_MVMAT0 (3) to bind with matrix data, is it a bug ? Or am I using anything wrong ?

EDIT: I tried to bind MVMAT, but still no luck

0abde17

Here is what I captured for one batched buffer, the uploadBuffer looks strange, the values are all very big:

gl.bufferSubData(gl.ARRAY_BUFFER, 0, uploadBuffer);

Another question: in batchedRendering, vertexAttribPointer is invoked for position, color and tex coordinates (per vertex), then it's invoked four times for the matrix data. That's a little bit odd, it should only need to be invoked once for the matrix data. Why is that?

@TravisGesslein
Author

No guarantee, but I think the buffer values might be fine. WebGLInspector just interprets a lot of uploaded packed floating point values (position, texcoord, matrix values, etc.) and packed 8 bit integers (colors) as integers (because the uploadBuffer is a typed integer array) resulting in these huge looking numbers.
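
You can reproduce the effect in a few lines. This just views the bytes of a Float32Array through a Uint32Array, which is essentially what happens when float data sits in an integer upload buffer:

```javascript
var floats = new Float32Array([1.0, 0.5]);
// Same underlying bytes, interpreted as unsigned 32-bit integers.
var asInts = new Uint32Array(floats.buffer);
// asInts[0] is 1065353216 (0x3F800000), the IEEE 754 bit pattern of 1.0
// asInts[1] is 1056964608 (0x3F000000), the bit pattern of 0.5

// Viewing the integers as floats again gives back the original values.
var roundTrip = new Float32Array(asInts.buffer);
```

So huge integers in the captured buffer are expected for perfectly ordinary float values.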

Regarding the vertex attribute: Correct, the relevant shader is added in CCShaders.js as cc.SHADER_POSITION_TEXTURE_COLOR_VERT_BATCHED.

Note that the javascript code activates 4 consecutive vertex attributes (need 1 for each row of the matrix), starting with cc.VERTEX_ATTRIB_MVMAT0 through cc.VERTEX_ATTRIB_MVMAT3. So the program needs to add ...MVMAT0,...MVMAT1,...MVMAT2 and ...MVMAT3

@pandamicro
Contributor

Note that the javascript code activates 4 consecutive vertex attributes (need 1 for each row of the matrix), starting with cc.VERTEX_ATTRIB_MVMAT0 through cc.VERTEX_ATTRIB_MVMAT3

That's the part I don't understand: why do we need 4 identical copies of the matrix data for one vertex? Also, I haven't seen anywhere that they are bound to attribute names.

@TravisGesslein
Author

Well there are two separate concepts there:

First, for a single matrix we need 4 vec4 attributes in the vertex shader, because WebGL (and desktop OpenGL for that matter) does not have vertexAttribPointer calls that set up an entire matrix worth of data. So a 4x4 matrix is just treated like it was 4 separate vec4 attributes, but inside the actual shader code you only bind one mat4, but it takes up 4 "attribute slots".

edit: To make this clearer, the vertex attributes are activated here in batchedRendering:

//enable matrix vertex attribs
        for (var i = 0; i < 4; ++i) {
            gl.enableVertexAttribArray(cc.VERTEX_ATTRIB_MVMAT0 + i);
            gl.vertexAttribPointer(cc.VERTEX_ATTRIB_MVMAT0 + i, 4, gl.FLOAT, false, bytesPerRow * 4, totalSpriteVertexData + bytesPerRow * i); //stride is one full matrix (4 rows), the offset selects row i
        }

So it activates 4 vertex attributes where each has 4 elements and is of type gl.FLOAT, aka a vec4.

Second (and a separate issue), the code also uploads the same matrix 4 times (once for each vertex). This is because you need to be able to access the same matrix from 4 different vertices (one for each point of the quad). Desktop OpenGL makes this easier by having functions that set the "advancement rate" of vertex attributes, so you can say "use this mat4 for 4 vertices, and only then advance to the next matrix", but WebGL does not have this kind of feature. So you have to upload the same matrix 4 times.

Theoretically you could upload the matrices once each into a texture, then only upload 4 indices into the actual vertex buffer for the batched draw call, and use that index to index this texture that contains the matrices. The problem is that vertex texture fetches (using textures in a vertex shader) are not universally supported and some browsers and machines (especially on mobile) don't support it. You also can't really use uniform arrays, because their size has to be static and known when the shader is compiled and besides, they are not designed for this kind of thing (accessing different data with every vertex) so there might be performance implications.

You could, of course, also transform the batched quads on the CPU and only upload the transformed vertex data. I chose not to do that because my aim was to minimize CPU work done at any cost.
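
For reference, the CPU-side alternative would look roughly like this. It's a sketch that assumes the matrix is a flat, column-major array of 16 numbers (OpenGL convention) and that w = 1; transformPoint and transformQuad are illustrative names, not functions from this branch.

```javascript
// Multiplies the point (x, y, z, 1) by a column-major 4x4 matrix.
function transformPoint(m, x, y, z) {
    return [
        m[0] * x + m[4] * y + m[8] * z + m[12],
        m[1] * x + m[5] * y + m[9] * z + m[13],
        m[2] * x + m[6] * y + m[10] * z + m[14]
    ];
}

// corners: array of four [x, y, z] positions of one quad.
// Returns the transformed positions, ready to be written into the vertex buffer.
function transformQuad(m, corners) {
    return corners.map(function (c) {
        return transformPoint(m, c[0], c[1], c[2]);
    });
}
```

That is 4 of these multiplications per sprite per frame on the CPU, in exchange for not uploading the 256 bytes of matrix copies per sprite.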

@pandamicro
Contributor

First, for a single matrix we need 4 vec4 attributes in the vertex shader, because WebGL (and desktop OpenGL for that matter) does not have vertexAttribPointer calls that set up an entire matrix worth of data. So a 4x4 matrix is just treated like it was 4 separate vec4 attributes, but inside the actual shader code you only bind one mat4, but it takes up 4 "attribute slots".

Got it, that's the reason I'm looking for, so what I did is to addAttribute for MVMAT in CCShaderCache.js which isn't shown in your PR:

case this.TYPE_POSITION_TEXTURECOLOR_ALPHATEST_BATCHED:
    program.initWithVertexShaderByteArray(cc.SHADER_POSITION_TEXTURE_COLOR_VERT_BATCHED, cc.SHADER_POSITION_TEXTURE_COLOR_ALPHATEST_FRAG);
    program.addAttribute(cc.ATTRIBUTE_NAME_POSITION, cc.VERTEX_ATTRIB_POSITION);
    program.addAttribute(cc.ATTRIBUTE_NAME_COLOR, cc.VERTEX_ATTRIB_COLOR);
    program.addAttribute(cc.ATTRIBUTE_NAME_TEX_COORD, cc.VERTEX_ATTRIB_TEX_COORDS);
    program.addAttribute(cc.ATTRIBUTE_NAME_MVMAT, cc.VERTEX_ATTRIB_MVMAT0);

Then I corrected the code to set vertex attribute pointer for matrix data row by row.

And finally I got something visible now, not correct yet, but great, thanks for the help

@pandamicro
Contributor

@heishe It's working now ! Can you take some time to review my PR when you are available ?

#3265

@pandamicro
Contributor

Closing this PR and please go to #3265 for more discussion

@pandamicro pandamicro closed this Apr 13, 2016