[Arm-netbook] [review] SoC proposal

Deano Calver deano at rattie.demon.co.uk
Wed Feb 29 11:42:47 GMT 2012


I'm new here but have been reading for a bit. I've had a lot of experience
with low-level 3D on the software side for a good many (too many ;) ) years
and have worked with a fair few GPU HW guys, so I have some idea of the
issues involved in getting good GPU performance.


I'm reading about the Tensilica architecture and its possible floating point
performance in detail at the moment, but I'm doubtful of it achieving good
3D performance without some specific modifications.
Today most GPUs are sold on how many TFLOPs they do, but that's purely a
recent thing to sell into the GPGPU market. Back in the day, the thing they
sold on was the fixed function stuff they do (triangle setup, rasterisation,
hierarchical depth, Porter-Duff, MSAA, tessellation, perspective correct
interpolators, texture samplers, etc.). This still accounts for much of the
speed and is why they can support high FLOP counts. The shader cores (the
ALUs with the FMACs) achieve their high throughput because the fixed function
operations and schedulers feed them the data when they need it and in the
format they want. A CPU has to perform all those operations itself before it
can even get to running the shader programs (and has to do the texturing
during the shader too), which often reduces the comparable FLOP throughput
massively. Just doing Porter-Duff per pixel (which is basically free on a
GPU) can consume massive CPU performance.
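
To make the per-pixel cost concrete, here is a minimal sketch of the
Porter-Duff "over" operator as a software path would have to run it. It
assumes premultiplied RGBA pixels with 8-bit channels; the function names
are made up for illustration and are not from any particular library.

```python
def over_premultiplied(src, dst):
    """Porter-Duff 'over' for one premultiplied RGBA pixel (8-bit channels)."""
    inv_alpha = 255 - src[3]
    # Per channel: result = src + dst * (1 - src_alpha), rounded to nearest.
    return tuple(s + (d * inv_alpha + 127) // 255 for s, d in zip(src, dst))


def blend_row(src_row, dst_row):
    """Blend one row; a software path repeats this for every covered pixel."""
    return [over_premultiplied(s, d) for s, d in zip(src_row, dst_row)]
```

That is a multiply, an add and a divide (or equivalent shift trick) on each
of four channels, for every pixel of every blended layer, every frame.
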
Memory bandwidth is an even scarier thing to compare. A GPU HW guy once told
me he thinks of GPUs as memory with a bit of ALU logic stuck on top. A 1080p
display at 30Hz can eat available memory bandwidth very quickly!
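
A rough back-of-envelope sketch of that bandwidth figure. The 32-bit pixel
format and the count of three memory accesses per blended pixel (read
source, read destination, write destination) are my assumptions for
illustration, and caches, overdraw and display scan-out are ignored.

```python
def composition_bandwidth(width, height, bytes_per_pixel, fps, accesses_per_pixel):
    """Bytes per second moved if every pixel is touched N times per frame."""
    return width * height * bytes_per_pixel * fps * accesses_per_pixel


# One full-screen blended layer at 1080p, 30Hz, assuming 4 bytes per pixel
# and 3 accesses per pixel (read source, read destination, write destination).
one_layer = composition_bandwidth(1920, 1080, 4, 30, 3)
print(one_layer)  # 746496000 bytes/s, roughly 0.75 GB/s per blended layer
```

Each additional full-screen layer adds the same again under these
assumptions, before any 3D texturing or depth traffic is counted.
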
The last significant 3D-on-CPU attempt was Larrabee, and even there they
used fixed function texture units, wide SIMD units and lots of simple x86
cores with custom ISA extensions for 3D graphics operations. And it didn't
beat GPUs in performance or power.
Without knowing what DSP operations a Tensilica chip would have, it's hard
to say how performant it would be, but my guess is that several generations
behind current mobile GPUs is more probable than anything comparable to
current desktop GPUs (even low end ones).
However, if the main use is desktop composition, then a number of short cuts
can be used, giving much higher performance. Desktop composition is
generally driven by 3D chips because they are there, but the operations are
mostly 2D (sprite) Porter-Duff blends and not general 3D. For a software
engine it's likely 100x faster to detect and optimise for the Porter-Duff
sprites than to run the full 3D pipeline.
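
As a sketch of the kind of short cut meant here: treat each window as an
axis-aligned sprite, and skip the blend entirely when the sprite is known to
be opaque. Everything below is hypothetical (the names, the list-of-rows
framebuffer, the caller-supplied `opaque` flag, premultiplied 8-bit RGBA),
and it does no clipping.

```python
def over(src, dst):
    """Porter-Duff 'over' for one premultiplied RGBA pixel (8-bit channels)."""
    inv_alpha = 255 - src[3]
    return tuple(s + (d * inv_alpha + 127) // 255 for s, d in zip(src, dst))


def composite_sprite(framebuffer, sprite, x, y, opaque):
    """Draw a sprite at (x, y); the caller must keep it inside the framebuffer."""
    for row_index, sprite_row in enumerate(sprite):
        dst_row = framebuffer[y + row_index]
        if opaque:
            # Fast path: for opaque pixels 'over' degenerates to a plain copy.
            dst_row[x:x + len(sprite_row)] = sprite_row
        else:
            # General path: blend every pixel against what is already there.
            for col, src in enumerate(sprite_row):
                dst_row[x + col] = over(src, dst_row[x + col])
```

No transforms, no rasteriser, no texture sampling: just row copies and, where
needed, a blend loop.
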
The other easy to forget item is that even if you manage to get a CPU
architecture that can perform the 3D operations fast enough, using 100% CPU
time on display means 0% for everything else.
Sorry for being a bit negative, but I think it is important to note quite
how complex high performance 3D is and why, almost without exception,
systems have a GPU where they are expected to display a modern desktop.

