This week I implemented a basic OpenGL framework for the future planet renderer. It can be found on GitHub, and the repository will be updated frequently to keep track of the progress of the rendering tech.
I reused a bunch of (modified) code from the OpenGL framework I wrote last spring, without actually building the project inside that framework, in order to avoid making the tech demo unnecessarily complex. The goal, after all, is to implement a tech demo that has the code necessary for rendering planets and nothing more, so that it is easy to identify which code is needed. It will always be possible to transfer the project into a more game-friendly engine later.
So for now, the framework contains the following components:
- A shader class that combines all shader types and handles loading and simple preprocessing of .glsl files
- An input manager
- A transform class that contains basic model-to-world matrix transformations and information about the position, rotation and scale of an object
- A camera class that handles view and projection matrices
- OpenGL and SDL initialisation, and a main loop
- Window, time and settings information
- A scene handling all important objects
By necessity I will probably also add sprite rendering for post-processing with framebuffers, as well as bitmap text rendering.
I also took the opportunity to test the generation of icospheres directly in the framework, the process of which I outlined in my last post. In a first step, the 12 vertices of an icosahedron are generated using the golden ratio. The triangles are then subdivided recursively, with each newly generated vertex pushed out to the radius of the ideal sphere.
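The two steps above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the code from the framework: the function names (`icosahedron`, `subdivide`, `icosphere`) are made up for this example, and the subdivision runs level by level with a shared midpoint cache instead of recursing per triangle.

```python
import math

def normalize(v, radius=1.0):
    """Scale v so that it lies on a sphere of the given radius."""
    length = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
    return (v[0] / length * radius, v[1] / length * radius, v[2] / length * radius)

def icosahedron(radius=1.0):
    """Return the 12 vertices and 20 faces of an icosahedron."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio
    raw = [
        (-1, phi, 0), (1, phi, 0), (-1, -phi, 0), (1, -phi, 0),
        (0, -1, phi), (0, 1, phi), (0, -1, -phi), (0, 1, -phi),
        (phi, 0, -1), (phi, 0, 1), (-phi, 0, -1), (-phi, 0, 1),
    ]
    faces = [
        (0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
        (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
        (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
        (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1),
    ]
    return [normalize(v, radius) for v in raw], faces

def subdivide(vertices, faces, radius=1.0):
    """Split every triangle into four, pushing new vertices onto the sphere."""
    vertices = list(vertices)
    cache = {}  # edge (i, j) -> index of its midpoint, so shared edges reuse it

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in cache:
            a, b = vertices[i], vertices[j]
            mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)
            cache[key] = len(vertices)
            vertices.append(normalize(mid, radius))
        return cache[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return vertices, new_faces

def icosphere(levels, radius=1.0):
    vertices, faces = icosahedron(radius)
    for _ in range(levels):
        vertices, faces = subdivide(vertices, faces, radius)
    return vertices, faces
```

Each level multiplies the triangle count by four, so `icosphere(2)` already has 320 triangles.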
The nice thing about this approach is that it allows stopping the recursion based on a heuristic other than reaching the maximum level. As an example, the distance from the camera and the visibility in the viewport could be used. 😉 For actual planet rendering, the entire thing should be placed in some sort of tree hierarchy, but for now this works.
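To make the idea of a different stopping heuristic concrete, here is a rough sketch of a distance-based one. Everything in it is an assumption for illustration: the name `subdivide_adaptive`, the `detail` factor and the rule "stop once the edge length times `detail` is smaller than the distance to the camera" are placeholders, and the viewport visibility check is left out.

```python
import math

def midpoint_on_sphere(a, b, radius):
    """Midpoint of a and b, pushed out to the sphere surface."""
    m = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2, (a[2] + b[2]) / 2)
    length = math.sqrt(m[0] ** 2 + m[1] ** 2 + m[2] ** 2)
    return (m[0] / length * radius, m[1] / length * radius, m[2] / length * radius)

def subdivide_adaptive(tri, camera, radius, max_level, detail=2.0, level=0, out=None):
    """Collect the leaf triangles for one base triangle, refining near the camera."""
    if out is None:
        out = []
    a, b, c = tri
    center = tuple((a[i] + b[i] + c[i]) / 3 for i in range(3))
    edge = math.dist(a, b)
    distance = math.dist(center, camera)
    # Hypothetical heuristic: stop when the triangle is small relative to its
    # distance from the camera, or when the maximum level is reached.
    if level >= max_level or edge * detail < distance:
        out.append(tri)
        return out
    ab = midpoint_on_sphere(a, b, radius)
    bc = midpoint_on_sphere(b, c, radius)
    ca = midpoint_on_sphere(c, a, radius)
    for child in ((a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)):
        subdivide_adaptive(child, camera, radius, max_level, detail, level + 1, out)
    return out
```

With a camera far away the base triangle is returned unchanged, while a camera close to the surface produces more and smaller triangles around it.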
I also tested real-time updating of the vertex buffer with the simplest possible approach (simply rebinding it). When changing the buffer, recursively generating the geometry takes the longest time. When taking out this step (while still rebinding the vertex buffer), I was able to send nearly a million triangles to the GPU (NVIDIA Quadro K3100M, Intel i7-4700MQ) at around 180 fps. Without rebinding the buffer, those triangles ran at around 550 fps. I was quite impressed with the amount of data I was able to push to the GPU. Extrapolating from these measurements, I could probably send one triangle per pixel to the GPU every frame and stay above 60 fps.
Of course, spending all of the performance on streaming vertices is less than ideal, since in a real-world application a lot more calculations need to be done apart from terrain generation (not to mention actually generating the vertex data, which was not done every frame here). It is definitely not necessary to have one triangle represent every pixel though. If every triangle covers a 5 to 10 pixel square, it should still be enough to represent curves rather nicely, and for areas that do not form silhouettes from the camera's perspective even less vertex detail is necessary, since a lot can be done with nice shaders and textures.
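The extrapolation can be written out as a quick back-of-the-envelope calculation. It assumes that throughput scales linearly with the triangle count, rounds "nearly a million" up to a million, and picks 1920x1080 as the screen resolution. All three are my assumptions for the estimate, not measurements.

```python
def triangle_budget(triangles, measured_fps, target_fps):
    """Triangles per frame at target_fps, assuming throughput scales linearly."""
    triangles_per_second = triangles * measured_fps
    return triangles_per_second // target_fps

# Measured: roughly one million triangles at about 180 fps while rebinding.
budget_at_60fps = triangle_budget(1_000_000, 180, 60)

# Assumed resolution for the comparison: 1920 x 1080.
pixels_1080p = 1920 * 1080

print(budget_at_60fps, pixels_1080p, budget_at_60fps >= pixels_1080p)
```

Under these assumptions the budget is about three million triangles per frame at 60 fps, against roughly two million pixels.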
On top of that, most modern terrain rendering algorithms I have been researching use a hybrid approach that combines LOD calculations with patches of precalculated data in order to minimize the bandwidth used. This is done because, as I demonstrated earlier, the main performance killer is not having a lot of triangles on screen, but sending a lot of them from the CPU to the GPU, and having predefined patches is an effective way of reducing the amount of data that needs to be sent (for instance with an index list). It is not inconceivable to even generate low-level details with a geometry or tessellation shader on the GPU, which minimizes that cost at scales where the detail no longer affects things like collision.
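As a small illustration of the index list idea: if every patch is a triangle subdivided into the same regular layout, the index list only has to be built and uploaded once, and each patch then only needs its own vertex positions. The sketch below is my own example rather than something taken from a specific algorithm. The names `patch_indices` and `patch_vertex_count` are made up, and vertices are assumed to be stored row by row, starting at the apex of the triangle.

```python
def patch_vertex_count(n):
    """Number of vertices in a triangular patch with n subdivisions per edge."""
    return (n + 1) * (n + 2) // 2

def patch_indices(n):
    """Index list for a triangular patch, shared by all patches of size n."""
    def index(row, col):
        # Row r holds r + 1 vertices, so it starts at offset r * (r + 1) / 2.
        return row * (row + 1) // 2 + col

    indices = []
    for row in range(n):
        for col in range(row + 1):
            # Triangle pointing towards the apex
            indices += [index(row, col), index(row + 1, col), index(row + 1, col + 1)]
            if col < row:
                # Triangle pointing away from the apex, filling the gap
                indices += [index(row, col), index(row + 1, col + 1), index(row, col + 1)]
    return indices

shared = patch_indices(4)  # built once, reused for every patch with n = 4
```

For `n = 4` this gives 16 triangles over 15 vertices, so a patch update only has to transfer 15 positions instead of 48 unshared vertices.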
I have noticed that there are two ways of dealing with close-range (high-resolution) details. The first is putting a lot of terrain data into paged files on the hard drive and streaming in the necessary data when the resolution is needed, and the second is procedural generation. Since I am researching planet rendering for use in games, holding detailed information about entire planets is probably an impractical solution to pursue: a lot of game applications will require moving between a multitude of planets, and storing all that data is likely to be inconvenient for both users and developers. Therefore it seems more appropriate to either generate the planets completely procedurally or have some low-resolution data and use procedural refinement at lower altitudes. This doesn't throw stored terrain data completely out of the window, since it might still be useful to be able to add detailed modifications in specific positions (though a lot of that could be achieved by loading specific meshes at certain distances).
On the side of generating the geometry in real time, a lot can be done to minimize the time the CPU spends on it. The most important measure is keeping all the generated data in a tree structure and only updating the tree when the camera moves. In addition, all triangles that are not in the viewport can be culled (including their children). This can be checked by transforming their positions with the viewProjection matrix into the screen coordinate system. A second form of culling checks whether a triangle is backfacing from the camera's perspective; a simple dot product suffices for this.
Lastly, an interesting trick used in the ROAM algorithm is queueing pointers to triangles that are to be split in a list sorted by priority, and triangle children that are to be merged in another queue. This way, the subdivision level of a triangle changes by at most one step per frame, which limits the amount of calculation done and the changes pushed. As a result, it prevents spikes in calculation time during rapid camera movement.
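Both culling checks can be sketched roughly as follows. This is an illustration under a few assumptions: matrices are row-major nested lists, the clip volume follows the OpenGL convention (-w to w on every axis), triangles are wound counter-clockwise when seen from outside, and the helper names (`perspective`, `outside_frustum`, `is_backfacing`) are invented for this example. The frustum test is conservative: it only rejects a triangle when all three vertices lie beyond the same clip plane.

```python
import math

def perspective(fovy_deg, aspect, near, far):
    """OpenGL-style perspective matrix (camera looks down the negative z axis)."""
    f = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

def to_clip(m, p):
    """Multiply the 4x4 matrix m with the point p (w = 1)."""
    return tuple(m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3] for r in range(4))

def outside_frustum(tri, view_proj):
    """True if all three vertices lie beyond the same clip plane."""
    clip = [to_clip(view_proj, v) for v in tri]
    for axis in range(3):
        if all(c[axis] > c[3] for c in clip) or all(c[axis] < -c[3] for c in clip):
            return True
    return False

def is_backfacing(tri, camera):
    """True if the triangle normal points away from the camera."""
    a, b, c = tri
    u = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    v = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
    normal = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    to_camera = (camera[0] - a[0], camera[1] - a[1], camera[2] - a[2])
    dot = normal[0] * to_camera[0] + normal[1] * to_camera[1] + normal[2] * to_camera[2]
    return dot <= 0.0
```

In a tree of triangles, a parent that fails either test can be skipped together with all of its children, which is where most of the savings would come from.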
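A minimal sketch of the two queues, using Python's `heapq` module. This is not ROAM itself, only the queueing part as I understand it. The class name `LodQueues`, the priority values and the per-frame budget `max_ops` are assumptions made for the example.

```python
import heapq

class LodQueues:
    """Two priority queues: triangles to split and triangles to merge."""

    def __init__(self):
        self._split = []
        self._merge = []
        self._counter = 0  # tie-breaker so triangles themselves are never compared

    def _push(self, heap, priority, triangle):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        heapq.heappush(heap, (-priority, self._counter, triangle))
        self._counter += 1

    def request_split(self, priority, triangle):
        self._push(self._split, priority, triangle)

    def request_merge(self, priority, triangle):
        self._push(self._merge, priority, triangle)

    def process(self, max_ops):
        """Pop up to max_ops splits and max_ops merges for this frame."""
        splits = [heapq.heappop(self._split)[2]
                  for _ in range(min(max_ops, len(self._split)))]
        merges = [heapq.heappop(self._merge)[2]
                  for _ in range(min(max_ops, len(self._merge)))]
        return splits, merges

queues = LodQueues()
queues.request_split(0.2, "far triangle")
queues.request_split(0.9, "near triangle")
queues.request_split(0.5, "mid triangle")
queues.request_merge(0.7, "hidden parent")
splits, merges = queues.process(max_ops=2)
```

With a budget of two operations per frame, the two most urgent splits are handled now and the remaining one waits for the next frame, which is what keeps the per-frame cost bounded.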
The research I have done mainly served as a reminder of how complex this topic can get, but I have found some general rules for my approach:
- Limit the amount of geometry to recalculate by caching data in a tree hierarchy and only allowing one subdivision change per frame
- Cull invisible triangles at the parent position in their tree structure
- The maximum triangle resolution needed is a square of 5 px
- Stop using LOD at a higher resolution and patch the in-between space with non-LOD-dependent methods
- Don't page an entire planet in memory; rather, refine the terrain procedurally
- Possibly only resend changed parts of the vertex information to the GPU
- Additional quality improvement is still possible with shaders and meshes on top afterwards
I hope this post summarizes the progress I made since last week nicely.