[Arm-netbook] [review] SoC proposal

Thu Feb 9 15:13:40 GMT 2012

lkcl luke wrote:
> On Thu, Feb 9, 2012 at 11:41 AM, Gordan Bobic <gordan at bobich.net> wrote:
>> lkcl luke wrote:
>>> On Thu, Feb 9, 2012 at 7:41 AM, Vladimir Pantelic <vladoman at gmail.com> wrote:
>>>
>>>>>   we evaluated the possibility of coping with 1080p30 video decode, and
>>>>> worked out that after one of the cores has been forced to deal with
>>>>> CABAC decode all on its own, the cores could then carry out the
>>>>> remaining parts of 1080p30 decode in parallel, at about 1ghz, quantity
>>>>> 4.
>>>> I would not recommend fully loading the cpu while decoding video, HD
>>>> video is becoming a commodity and people might soon use it as an "animated"
>>>> wallpaper while doing other CPU intensive stuff
>>>  last year the target speed was 1.5ghz, 4 cores.  this time we
>>> envisage 8 cores at over 1.2ghz, and the cores can be made to support
>>> VLIW which can result in 3x the effective clock-rate.  so i don't
>>> think that CPU horsepower is something to worry about.  the only thing
>>> that's of concern is to not put too _much_ horsepower down so that it
>>> goes beyond the gate-count budget.
>> I think you need to look at this from the practical standpoint.
>> Specifically:
>>
>> 1) Is there GCC support for this SoC's instruction set,
> 
>  yes.
> 
>> including VLIW,
> 
>  don't know.  tensilica have a proprietary "pre-processor" compiler
> that turns c and c++ into VLIW-capable c and c++.

That sounds like another one of those hare-brained solutions like the 
JZ4760. If it's going to be mainstream, it needs to be done cleanly, 
i.e. a proper GCC back-end - preferably one that delivers decent 
performance unlike the current vectorization efforts.

>> SSE,
> 
>  what is SSE?

Sorry, s/SSE/SIMD/

>> and all the relevant CPU extensions?
> 
>  yes.
> 
>> How many man-hours will it
>> take to add reasonably well optimized support for this, and how long
>> will it take to stabilize it for production use (i.e. good enough to
>> rebuild the entire Linux distro including kernel, glibc, and other
>> packages that resort to assembly in places)?
> 
>  already done. http://linux-xtensa.org/

Good to see. Is there a dev kit available? Rebuilding a distro in qemu is 
going to take years.

>> How many man-hours will that take before it is
>> sufficiently tested and stable for an actual product that the end
>> consumers can use?
> 
>  don't know.  tensilica have proprietary software libraries for audio
> and video CODECs already (obviously)

Proprietary as in closed-source?

>> 3) How long will it take to add support for this SoC to all important
>> packages that use assembly in places?
> 
>  don't know, but to do so would not be smart unless it's _really_
> critical, simple and has a big pay-off.

The problem is that some packages won't build/work without it, and you 
probably know just how hairy some of the dependencies can get in a 
modern Linux distro.

>> You might have the hardware out in 18 months' time, but I would be
>> pretty amazed if you managed to get the OSS community enthusiastic
>> enough about this to get the whole software stack ported in an amount of
>> time that is less than years - by which time the SoC will be thoroughly
>> deprecated.
> 
>  98% of what's needed is already in place.
> 
>  the goal is: to design the architecture in such a way that the
> remaining 2% can be done quickly and simply.
> 
>  there's enough examples of how _not_ to do this out there such that i
> think it would be possible to find a fast route to market in amongst
> them.

OK, OK, I'm convinced. You have pushed it into the realm of plausible in 
my mind. :)

>> Look at the rate of progress Linaro is making, and they have a
>> multi-million $ budget to pay people to push things along, and an OSS
>> community that already has ARM well boot-strapped and supported.
> 
>  yes.  that just adds to the cost of the CPUs, which is unacceptable.
> 
>  i'd like to apply a leeetle bit more intelligence to the task, namely
> for example to add instruction extensions that will help optimise 3D,
> and to say... find out how much effort it would take to port llvm and
> to evaluate whether gallium3d on llvmpipe would be "good enough" if
> optimised to use the 3D-accelerating instructions etc. etc.

Don't put too much stock in that yet. Writing a good optimizing compiler 
that produces beneficial SIMD/VLIW binary code is _hard_. So hard that 
only Intel have so far managed to do a decent job of SIMD (SSE).

See the difference between AMD and Nvidia GPUs, for example. Radeon has 
always had much, much higher theoretical throughput, but has always 
lagged behind GeForce in practice, because reaching that throughput 
depends on the compiler packing the instructions well. GeForce's unified 
shaders are generic, so even a relatively crap compiler can do something 
sensible with them.
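
As a small illustration of why this is hard for a compiler, here is a 
rough sketch in plain C. The function names are made up for the example 
and have nothing to do with the Tensilica toolchain. Both loops do the 
same work, but in the first one the compiler cannot prove that dst and 
src do not overlap, so it has to either add a runtime overlap check or 
stay with scalar code. In the second, the C99 restrict qualifier gives it 
the guarantee it needs to vectorize.

```c
#include <stddef.h>

/* Hypothetical example; names are illustrative only. */

/* dst and src may overlap, so a store to dst[i] could change a later
 * src[j]. The compiler must guard against that (runtime check) or
 * give up on SIMD for this loop. */
void scale_may_alias(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* restrict is the programmer's promise that dst and src never overlap,
 * so the iterations are independent and can be processed in SIMD lanes. */
void scale_no_alias(float *restrict dst, const float *restrict src,
                    float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}
```

Comparing the assembly for the two functions (for example from gcc -O3 -S) 
on a target with SIMD support shows what the compiler actually managed to 
do. And this is the trivial case; real codec inner loops are far messier.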

Gordan