<br><br><div class="gmail_quote">2012/2/11 lkcl luke <span dir="ltr"><<a href="mailto:luke.leighton@gmail.com">luke.leighton@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On Sat, Feb 11, 2012 at 8:51 AM, Iliya Georgiev <<a href="mailto:ikgeorgiev@gmail.com">ikgeorgiev@gmail.com</a>> wrote:<br>
<br>
> Hi,<br>
> According to my very rough estimation 100 Giga-MACs performance is at least<br>
> equivalent to the performance of AMD Radeon HD 6250 GPU.<br>
<br>
</div> but... but... that's just one processor, and i was naively thinking<br>
it'd be great to put down 8! it can't _possibly_ be right that the<br>
power consumption is 6mW to do 100 Giga-MACs in 28nm @ 1ghz, and only<br>
0.02 <a href="http://sq.mm" target="_blank">sq.mm</a> surely??<br>
<div class="im"><br>
> That is the GPU<br>
> used in AMD G-Series SoC - the one you have chosen for future EOMA-68 SoCs.<br>
> (I suggest you have to ask Xtensa about what stays behind "100 Giga-MACs<br>
> performance" - 1-, 2-, 4- or 8-core configuration?)<br>
<br>
</div> yeah the list of questions is getting considerable :)<br>
<br>
thanks iliya.<br>
<div class="HOEnZb"><div class="h5"><br>
l.<br>
<br>
_______________________________________________<br>
arm-netbook mailing list <a href="mailto:arm-netbook@lists.phcomp.co.uk">arm-netbook@lists.phcomp.co.uk</a><br>
<a href="http://lists.phcomp.co.uk/mailman/listinfo/arm-netbook" target="_blank">http://lists.phcomp.co.uk/mailman/listinfo/arm-netbook</a><br>
Send large attachments to <a href="mailto:arm-netbook@files.phcomp.co.uk">arm-netbook@files.phcomp.co.uk</a><br>
</div></div></blockquote></div><div><br></div>Luke,<div>I have sat through many presentations where the numbers were used to show whatever the presenter wanted. Even the worst financial results can be presented as a success.</div>
<div>So, as you correctly pointed out, we have to find the right measure. Back to the die size measurement: most of the die area in CPUs is occupied by caches. In the Xtensa LX4 specifications data caches are optional, so the reported die size of 0.02 mm^2 for the 28 nm part (0.05 mm^2 for the 45 nm part) should be checked against the features it includes, such as caches. For certain workloads like graphics acceleration, bigger caches are not necessary, whereas high-speed memory access is. But for general-purpose CPUs, it is better to have a bigger cache. </div>
<div><br></div><div>A. Here are the questions that I would ask Tensilica application engineers if I had to deal with them. First of all, I would have a clear idea of the workloads I need to support. For example:</div><div><br></div><div>1. For general processing - a workload equivalent to Tegra 2 CPU performance. How does Xtensa compare to Tegra 2 CPU performance? What Xtensa configuration is needed to reach that level of performance? </div>
<div>2. For video decoding - a workload equivalent to a fixed-function video decoding unit like the one in the Allwinner A10. What Xtensa configuration is needed to reach that level of performance?</div><div>3. For 3D graphics - a workload equivalent to the Radeon HD 6250 GPU. What Xtensa configuration is needed to reach that level of performance?</div>
<div>4. For baseband...</div><div>Then, after a lot of discussions, tests and simulations, I suppose I would have to decide on the right balance of features: number of cores, core clocks, caches, memory access speed, etc.</div>
<div><br></div><div>B. On Tensilica's key selling point, programmability, I would ask the following groups of questions:</div><div><br></div><div>1. Can all of the optional pre-defined execution units co-exist simultaneously in one configuration? What is the price of the optional pre-defined execution units?</div>
<div><br></div><div>The optional pre-defined execution units, as listed on page 2 of the Xtensa LX4 specifications, are:</div><div><div>- 32-bit multiplier and/or 16-bit multiplier and MAC </div><div>- Single-precision floating point unit</div>
<div>- Double-precision floating point acceleration</div><div>- 3-way 64-bit VLIW (VLIW3)</div><div>- Pre-defined 32-bit GPIO and FIFO-like Queue interfaces</div></div><div><br></div><div>2. Can all of the optional execution units that require additional licensing co-exist simultaneously with the optional pre-defined execution units in one configuration? Or must they replace one or more of the pre-defined units? </div>
<div>For example, if only one additionally licensed unit can exist per configuration and it occupies the 3-way 64-bit VLIW (VLIW3) unit, then if we want audio using VLIW we will have difficulty implementing 3D acceleration in the same core at the same time.</div>
<div>And what is the price of the licenses?</div><div><br></div><div>The optional execution units that require additional licensing, according to the Xtensa LX4 specifications, are:</div><div><div>- ConnX Vectra LX DSP engine</div></div><div>
<div>- ConnX Vectra VMB for baseband acceleration </div></div><div><div>- ConnX D2 DSP engine </div></div><div><div>- ConnX BBE16 Baseband engines </div></div><div><div>- HiFi 2 and HiFi EP Audio engines</div></div><div>
<br></div><div>3. Are the software/hardware development kits for programming and configuring Xtensa freely available to developers? If not, what is the price?</div><div><br></div><div>4. What is the difference between Xtensa and Xilinx Zynq/<font class="Apple-style-span" face="arial, helvetica, sans-serif">Altera Cyclone, which have a 28nm dual-core Cortex A9 with on-board FPGA? Both solutions have a fixed part and a programmable part. Both solutions use SystemC as a description language. Tensilica claims that its solution is closer to software programmability than the pure hardware </font><span class="Apple-style-span" style="font-family:arial,helvetica,sans-serif">programmability of FPGAs. But is this true?</span></div>
<div><br></div><div><br></div><div>Iliya</div>