-
Notifications
You must be signed in to change notification settings - Fork 904
Gotta batch them all #3248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gotta batch them all #3248
Conversation
…(greatly helps with accuracy, also 1024-1024 is very arbitrary
…ist file, the texturename should have a # in front (like with sprites) to indicate a spriteframe texture. can also set manually with .setTexture (also with # in front of texturename)
…p. inlining copy() manually sped up the function by 20%
…of uint8, since all data sizes involved should be multiples of 4
Thanks for your contribution, I saw very interest things in your PR, but as it's a big PR, it's very hard to review. Can you explain in detail your batch solution ? By the way, you have removed all canvas render commands, this will disable our Canvas compatibility which can not be accepted, can you explain why you remove them ? |
Hi! Sorry, this pull request arrived in your inbox only accidentally, it was intended for our internal cocos2d-html5 fork :D That's why the canvas rendering stuff is removed: we don't need it internally, and it reduces the size of the executable quite a bit once all is removed. Needless to say this is not intended for the main cocos2d-html5 branch! However I can explain our batching solution a bit, it does a bunch of stuff that is hardcoded because it fits the specific of our project, but the general concept should transfer to usual cocos2d-html5 without problems. Here's roughlyl what happens every frame: Some things that already happen in standard cocos2d-html5:
What we roughly do for batching (we only implemented cc.Sprite.SpriteWebGLRenderCmd batching as a proof of concept, but the concept can be implemented for other node types):
As an example, batching SpriteWebGLRenderCmd works by (just simple and proof of concept, can be improved a lot):
And bam! Batching works. Some things to note:
That's basically it. It's a lot of text, I hope it's not too convoluted. |
Thanks a lot for the explanation, we are also very interested in batched rendering. I will read it when I got home, by the way, do you have performance comparison before and after batch ? |
Wow, that's a really good idea. |
var buf = pool[i]; | ||
if(buf.size >= numSprites) | ||
{ | ||
pool.removeByLastSwap(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where have you defined this function removeByLastSwap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that you haven't reset the buffers here, then it will append data into the buffer each frame, am I miss anything here ?
Now I see bufferSubData is replacing the buffer from offset 0, but somehow my version doesn't work very well, the buffer is keeping growing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pandamicro You're right, it is actually a bug (fixed on current version of our branch as well). Those three lines need to be
pool.removeByLastSwap(minBufIndex);
this.initBatchBuffers(minBuf.arrayBuffer,minBuf.elementBuffer,numSprites);
minBuf.size = numSprites;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means bufferSubData
has a bug ? It can't actually replace the data from requested offset ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, it can. The bug in this piece of code is just that the code keeps making new buffers (unnecessarily) because it thinks that the pooled buffer is too small (since .size is never rewritten). That shouldn't cause the rendering to break, but it's wasteful for performance (but only maybe) and memory (until the old ones are garbage collected).
In case you're using bufferSubData differently in your code, note that it can only replace data if the total stuff you write into it remains in the size that you gave the buffer when you initialized it with bufferData(). bufferSubData can't resize the buffer.
Oh, sorry, forgot about that! It's a custom array extension we use internally. It's defined like this:
For testing, you can just define it anywhere in your code. The main use case is when you need to remove something from an array, and don't care about the order of what is inside the array. In this case, this is obviously constant time and very fast, while generic array removal needs to move up everything behind the removed index. Also regarding the requested performance comparison: Will try to get it to work asap. I'm extremely busy right now so please forgive me! :D |
Also, just for completeness sake, one more thing on optimizing this further: It currently uploads everything every frame (because I haven't had more time to spend on it), but that's obviously not necessary. The batch order only changes when the children order changes, and even then not necessarily (basically if the renderCmd order stays the same, you don't need to do anything). And even then most likely sub-batches are going to stay the same (for example if you render a text in between 50 previously batched sprites, now you have 2x 25 batches), and you can probably take account for that somehow. Also, when something changes, not everything changes (e.g. the quad data of sprites mostly stays the same unless you change its color, change the underlying texture rect, etc., but transform matrices often change every frame). |
Great idea, thanks for the clarification I like your implementation here and really appreciate that you share all these informations with us. I will try to make one working version by myself, and I will mark the origin of the idea inline for the parts inspired from your implementation. If it's a problem please tell me how you want us to integrate part of your implementation into our open sourced code base. |
I'd also like to try this solution, I will keep you informed while I submit my pull request if that don't bother you. |
var cmd = locCmds[i]; | ||
if(!cmd._batched && cmd.configureBatch) //may be set to true by processed cmds during this loop | ||
{ | ||
cmd.configureBatch(locCmds,i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't see the implementation of this function, I assume it should be implemented in cc.Sprite.WebGLRenderCmd. Normally it should set commands' _batching and _batched property by comparing their type and rendering data.
Ah, sorry, another thing that is missing from this pull request. The purpose of the configureBatch function is to:
As for the last point, it could be split off (uploading is kind of a seperate issue from determining what can be batched), but I just put it in there for simplicity The _batched and _batching flags are there so rendererWebGL later knows to call a _batching renderCmd's .batchedRendering() function instead of rendering(), and also to skip .rendering() for the sprites which were already batched. As for the actual function implementation that is used in this pull request: I subclassed cc.SpriteWebGLRenderCmd because I wanted to play with different implementations, but I added the subclasses to the wrong repository (our internal game project)... if you look at cc.Sprite._createRenderCmd, it returns a SpriteWebGLBasicRenderCmd. Here's that class:
Regarding attribution:
Yep, that is no problem! |
Thanks again, you are awesome ~ 👍 |
@heishe ok, I found the problem may be that I haven't enabled and set EDIT: I tried to bind MVMAT, but still no luck Here is what I captured for one batched buffer, the
Another question, in |
No guarantee, but I think the buffer values might be fine. WebGLInspector just interprets a lot of uploaded packed floating point values (position, texcoord, matrix values, etc.) and packed 8 bit integers (colors) as integers (because the uploadBuffer is a typed integer array) resulting in these huge looking numbers. Regarding the vertex attribute: Correct, the relevant shader is added in CCShaders.js as cc.SHADER_POSITION_TEXTURE_COLOR_VERT_BATCHED. Note that the javascript code activates 4 consecutive vertex attributes (need 1 for each row of the matrix), starting with cc.VERTEX_ATTRIB_MVMAT0 through cc.VERTEX_ATTRIB_MVMAT3. So the program needs to add ...MVMAT0,...MVMAT1,...MVMAT2 and ...MVMAT3 |
That's the part I don't understand, why we need 4 identical matrix data for one vertex ? and I haven't seen anywhere they are bound to attribute names |
Well there are two separate concepts there: First, for a single matrix we need 4 vec4 attributes in the vertex shader, because WebGL (and desktop OpenGL for that matter) does not have vertexAttribPointer calls that set up an entire matrix worth of data. So a 4x4 matrix is just treated like it was 4 separate vec4 attributes, but inside the actual shader code you only bind one mat4, but it takes up 4 "attribute slots". edit: To make this clearer, the vertex attributes are activated here in batchedRendering:
So it activates 4 vertex attributes where each has 4 elements and is of type gl.FLOAT, aka a vec4. Second (and a separate issue), the code also uploads the same matrix 4 times (once for each vertex). This is because you need to be able to access the same matrix from 4 different vertices (one for each point of the quad). Desktop OpenGL makes this easier by having functions that sets the "advancement rate" of vertex attributes, so you can say "use this mat4 for 4 vertices, and only then advance to the next matrix", but WebGL does not have this kind of feature. So you have to upload the same matrix 4 times. Theoretically you could upload the matrices once each into a texture, then only upload 4 indices into the actual vertex buffer for the batched draw call, and use that index to index this texture that contains the matrices. The problem is that vertex texture fetches (using textures in a vertex shader) are not universally supported and some browsers and machines (especially on mobile) don't support it. You also can't really use uniform arrays, because their size has to be static and known when the shader is compiled and besides, they are not designed for this kind of thing (accessing different data with every vertex) so there might be performance implications. You could, of course, also transform the batched quads on the CPU and only upload the transformed vertex data. I chose not to do that because my aim was to minimize CPU work done at any cost. |
Got it, that's the reason I'm looking for, so what I did is to addAttribute for MVMAT in CCShaderCache.js which isn't shown in your PR:
Then I corrected the code to set vertex attribute pointer for matrix data row by row. And finally I got something visible now, not correct yet, but great, thanks for the help |
Closing this PR and please go to #3265 for more discussion |
merge pls