[Arm-netbook] Engineers Boost Computer Processor Performance By Over 20 Percent

Tue Feb 14 03:37:27 GMT 2012

On Tue, Feb 14, 2012 at 12:00 AM, Alexander Ross <abushcrafter at gmail.com> wrote:
> http://news.ncsu.edu/releases/wmszhougpucpu/

 ... by combining CPU and GPU to share and collaborate on tasks -
interesting, and thank you for raising this, it's kinda relevant given
that i have a rather important decision coming up on the internal
architecture of the planned SoC.

this dual approach looks very familiar, given that i had to work with
Aspex Semiconductor's "ASP" technology, where you had to *manually*
schedule data loading and I/O processing.

 i can tell you right now that it was a complete f*****g bitch.

 we had to write spreadsheets for god's sake to calculate the optimum
division of labour, hand-crafting the algorithm a few lines of code at
a time.  the reason was that the ASP is a 1-bit (ok, 2-bit pipelined
in newer versions) processor, connected in "strings".  so you could
either have 32 processors doing one 32-bit add, all in one clock
cycle, or you could have 32 processors doing 32 *sequential* 1-bit
adds, producing 32 results but of course taking 32 clock cycles to do
it... or anything in between!
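
to put rough numbers on that trade-off, here is a minimal sketch in
python.  it is a deliberately simplified model, not how the ASP is
actually programmed: the function name and parameters are made up for
illustration, and it assumes the work of one add divides evenly across
the ganged processors, ignoring carry handling and pipelining.

```python
def add_schedule(num_pes=32, word_bits=32, pes_per_add=32):
    """Toy model of a string of 1-bit processors (hypothetical names).

    pes_per_add is how many processors are ganged together on one add.
    Returns (results_produced, clock_cycles_taken).
    """
    if num_pes % pes_per_add or word_bits % pes_per_add:
        raise ValueError("pes_per_add must divide num_pes and word_bits")
    results = num_pes // pes_per_add   # independent adds running side by side
    cycles = word_bits // pes_per_add  # each processor handles this many bits in turn
    return results, cycles


if __name__ == "__main__":
    for width in (32, 8, 1):
        results, cycles = add_schedule(pes_per_add=width)
        print(f"{width:2d} PEs per add: {results:2d} results in {cycles:2d} cycles")
```

in this model the two extremes from the text come out as one result in
one cycle (all 32 ganged) and 32 results in 32 cycles (fully
sequential), with the in-between groupings sitting on the same line:
the grouping changes how soon a result is ready, not the adds per cycle.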

 it just depended on how fast you wanted the data to be calculated
[multiplications had to be done as a series of additions], so if you
did faster parallel processing you risked ending up being I/O bound,
but if you did the calculations too slowly (using sequential
processing) you could end up being CPU-bound.

the spreadsheets helped assess which hard-coded assembly instructions
would be optimal.
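
the kind of calculation those spreadsheets did can be sketched roughly
as below.  every number here is a made-up assumption (adds per data
item, I/O setup cost, I/O cycles per item), as is the idea that I/O
fully overlaps with computation; the real spreadsheets are not
reproduced here.  the point is only to show how one grouping ends up
I/O-bound and another compute-bound.

```python
def balance_table(num_pes=32, word_bits=32, adds_per_item=4,
                  io_setup_cycles=16, io_cycles_per_item=2):
    """One row per grouping: (pes_per_add, items, compute, io, verdict).

    All cost parameters are hypothetical, chosen for illustration only.
    """
    rows = []
    for pes_per_add in range(num_pes, 0, -1):
        if num_pes % pes_per_add or word_bits % pes_per_add:
            continue  # only groupings that divide evenly
        items = num_pes // pes_per_add                        # items per batch
        compute = adds_per_item * (word_bits // pes_per_add)  # cycles to process a batch
        io = io_setup_cycles + items * io_cycles_per_item     # cycles to load a batch
        if compute > io:
            verdict = "compute-bound"
        elif compute < io:
            verdict = "I/O-bound"
        else:
            verdict = "balanced"
        rows.append((pes_per_add, items, compute, io, verdict))
    return rows


if __name__ == "__main__":
    for row in balance_table():
        print("%2d PEs/add  %2d items  compute=%3d  io=%3d  %s" % row)
```

with these assumed costs the fully parallel grouping spends its time
waiting for data, the fully sequential one spends it calculating, and
the balance point falls somewhere in between.  change the assumed costs
and the balance point moves, which is why it had to be recalculated for
each algorithm.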

 the bottom line is that the ASP experience is just a rough indicator
of what's ahead for this mixed CPU-GPU stuff, and it's why i'm going to
be much happier with an architecture that revolves around beefing up a
standard CPU architecture to have GPU capabilities, and then having
several of them, so that using some of the CPUs for 3D tasks doesn't
leave you with no spare processing capacity for standard CPU tasks.

... there are enough headaches in having to choose between SIMD and
VLIW without having _separate_ CPUs and GPUs as well :)

l.