Innovate SoC Processors ™
An Efficient Software/Hardware Development Platform for Internet Multimedia Applications
By Minghau Lee, Ph.D., Director of Marketing & Service, Andes Technology
WWW.ANDESTECH.COM
Overview of Andes Technology
Andes Highlights
• Founded in March 2005
• More than 200 man-years of embedded expertise
• First-tier investors and partners (Government VC, MediaTek, and Faraday)
• USD 20M capital for financial stability
Andes’ Mission
• Provide the best processor-based SoC solution
Andes’ Strength
• Processor-based SoC solution
• Self-owned SW development tools
• Strong and professional technical support
• Flexible business model
Andes’ Main Lines of Business
• AndeStar™: Andes 16/32-bit mixable ISA
• AndesCore™: CPU core family
• AndESLive™: Andes ESL integrated virtual environment
• AndeShape™: Embedded SoC + EVB + ICE
• AndeSight™: Integrated development environment
• AndeSoft™: Optimized target SW such as Linux/RTOS, middleware, and application software
Approaching the Zettabyte Era, by Cisco
Annual global IP traffic will exceed half a zettabyte in four years:
• 27 exabytes per month will be generated in 2012 (nearly 7 billion DVDs each month)
• Annual IP traffic in late 2012 will be 522 exabytes per year
• P2P is growing in volume: 600 petabytes per month (150 million DVDs)
• The sum of all forms of video (TV, VoD, Internet, and P2P) will account for close to 90% of consumer traffic by 2012
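The DVD equivalents above can be sanity-checked with simple unit arithmetic. The sketch below assumes one DVD is counted as 4 GB (decimal units); that conversion factor is an assumption, not a figure from the slide. With it, 27 exabytes comes out to about 6.75 billion DVDs and 600 petabytes to exactly 150 million, consistent with the numbers quoted. Quantities are kept in gigabytes so they fit in a 64-bit integer; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: one "DVD" is counted as 4 GB (decimal units). */
#define DVD_GB 4ULL

/* Decimal unit conversions, expressed in gigabytes to avoid 64-bit overflow
   (27 EB expressed in bytes would exceed UINT64_MAX). */
#define GB_PER_PB 1000000ULL      /* 1 PB = 10^6 GB */
#define GB_PER_EB 1000000000ULL   /* 1 EB = 10^9 GB */

/* How many DVDs' worth of data `gigabytes` represents, rounded down. */
uint64_t dvd_equivalents(uint64_t gigabytes) {
    return gigabytes / DVD_GB;
}
```

For example, dvd_equivalents(27 * GB_PER_EB) gives 6,750,000,000, which rounds to the "nearly 7 billion DVDs" on the slide.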
IP Traffic in 2012
Cisco forecasts 44 exabytes per month of IP traffic in 2012
Source: Cisco, 2008
Global Consumer Internet Traffic Forecast
Video Is Causing IP Traffic Growth
~50% (2.6 billion DVDs)
Primarily the Video Traffic
All forms of video (TV, VoD, Internet, and P2P) will account for close to 90 percent of consumer traffic in 2012
Source: Cisco, 2008
Three Waves of Consumer Internet Traffic Growth
• YouTube Is Just the Beginning
• Home Theater (Internet STBs): the Internet Is the Best Video Platform (entertainment…)
• Video Communications (social interaction…): video conference
Source: Cisco, 2008
Internet Video Traffic Benchmarks
• Google (YouTube), worldwide (Cisco estimate for May 2008): 100,000 terabytes per month
• P2P video streaming in China (January 2008): 33,000 terabytes per month
• Google (YouTube), United States (May 2008): 30,500 terabytes per month
• U.S. Internet backbone at year-end 2000: 25,000 terabytes per month
Trends: Online Video
Opportunities in China
• China's planners are cranking up their initiative to seed as many as 30 fabless semiconductor startups.
• Government planners have earmarked a slice of the estimated $586 billion China economic stimulus package as a source of funds for grants, loans, and equipment for fabless startups.
• The next generation needs to be made up of platform companies with multiple products and [design targets ranging] from cellphones to set-top boxes, digital cameras, and netbooks.
The Demand: Internet Multimedia Platform
Media Player, Browser, E-book, Adobe Flash, Remote Desktop, Other APs
Desktop Environment
GTK+, Qt
Multimedia Framework, KVM/CVM/JVM & Dalvik, TCP/IP Stack, USB Stack, File System
Multimedia Library, Security Library, Graphics Library, WebKit, libc
Linux/Android/Chrome platform (kernels, drivers, and power managers)
32-bit CPUs, Co-processors, Memory controllers, Hardware engines, Peripherals
SoC Reference Platform
Video Resolution
• YouTube: 320x240 and 480x360 dominate now; 720p (1280x720) is just beginning. Codecs: H.263, H.264, and Sorenson Spark now; Google has acquired On2, so VP6 is expected to dominate.
• Adobe Flash: up to 720p now; H.264, Sorenson Spark, VP6
• Internet TV: below 640x480; H.264, VP6, VC-1
Obstacles to HD on the Web
Size
• Less than 4% of the Internet population has a computer with a screen resolution of 1920x1200 or higher. This share is expected to rise to 40% by 2012.
• Instead, content will be streamed directly to the TV.
• HD video is going to be a very important part of the digital living room, where the content will come from online sources.
Bandwidth
• Watching a YouTube video needs a 256 kbit/s DSL connection, but streaming Full HD video needs a 9 Mbit/s connection.
• 100 HD-quality videos are close to the physical limit that most servers can handle.
• This is why the few companies that already publish HD content use dedicated content delivery networks such as Akamai to deliver this much data reliably.
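The bandwidth figures can be turned into data volumes and stream counts with a little arithmetic. In this sketch the 1 Gbit/s server uplink is a hypothetical figure chosen for illustration, protocol overhead is ignored, and the function names are made up. Under those assumptions one uplink carries about 111 constant-rate 9 Mbit/s streams, the same order of magnitude as the "100 HD videos" limit quoted above.

```c
#include <stdint.h>

/* Bytes transferred in `seconds` at a constant `bits_per_second`. */
uint64_t bytes_for_duration(uint64_t bits_per_second, uint64_t seconds) {
    return bits_per_second * seconds / 8;
}

/* How many constant-bitrate streams fit in a link, ignoring protocol
   overhead and burstiness (a simplifying assumption). */
uint64_t max_streams(uint64_t link_bps, uint64_t stream_bps) {
    return link_bps / stream_bps;
}
```

At 9 Mbit/s, bytes_for_duration(9000000, 3600) gives about 4.05 GB for one hour of Full HD video, versus about 115 MB per hour at 256 kbit/s.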
Video Content Sources: Broadcast
• DVB-T: MPEG-2, H.264
• DVB-H: H.264
• ISDB-T: MPEG-2, MPEG-4, H.264
• S/T-DMB: H.264
• DMB-T/H: MPEG-2, H.264, AVS
• CMMB: H.264, RealVideo, AVS
• MediaFLO: H.264
• IPTV: H.264, VC-1, AVS
• 3GP: MPEG-4 SP (H.263)
Video Content Sources: Internet
Messenger codecs:
• Skype: On2 VP6/VP7
• MSN: WMV (VC-1)
• Yahoo: H.264
• AIM: WMV (VC-1)
• iChat (Apple): H.264 BP
Video conferencing SW:
• WebEx: H.264
Adobe Flash: H.263 (Sorenson Spark), H.264, On2 VP6
P2P download: RM, RMVB, MPEG-2, MPEG-4 SP/ASP (DivX/Xvid), H.264, WMV
Video Content Sources: Storage
• VCD: MPEG-1
• SVCD: MPEG-2
• DVD: MPEG-2, MPEG-4 (DivX)
• Blu-ray: MPEG-2, H.264, VC-1
• HD-DVD: MPEG-2, H.264, VC-1
Market Requirements: Video Codecs
Video codecs: H.264, MPEG-2, MPEG-4, H.263, RV, VC-1, DivX, AVS, VP6
Target applications: mobile TV; multimedia phone; smart phone/MID; PMP/DMP; camcorder; HDTV/IPTV; UMPC/netbook; Blu-ray; DVD; digital signage; surveillance
Market Requirements: Audio Codecs
Applications: audio player; cell phone, PMP, mobile TV; STB, IPTV; home theater
Audio codecs: Ogg Vorbis, MPEG-2 Layer I/II, MP3, AAC-LC, aacPlus v1/v2, AC-3 5.1ch, WMA, DD+ 5.1ch, DD+ 7.1ch, Dolby TrueHD, DTS, DTS Surround, AMR-NB, AMR-WB, SBC, G.7xx
Video Processor Architecture
Media Processors
• Programmable
  - Special purpose: DSP, video processors
  - General purpose: CISC or RISC, each with or without extended ISA
• Dedicated: modular or monolithic
• Reconfigurable: fine grained or coarse grained
Various Ways of AP Acceleration
• In-pipe instructions: short latency (1~5 cycles); allow (and need) tight mixing with core instructions and state; use core resources such as the register file; light semantics, such as SIMD instructions (32/64/128-bit data).
• Co-processor instructions: longer latency (10~100 cycles); loosely coupled with core instructions and state; operate mostly on coprocessor state (registers/memory); heavier semantics, such as macroblock DCT/VLC or a crypto engine.
• "Hardwired" engines: high setup overhead (going through the shared bus), so they pay off only for even larger chunks of data processing (say, a frame or slice).
Comparison: operation granularity determines programmability and efficiency. Choose the right combination to gain what is needed at the right place.
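As an illustration of the "light semantics" of an in-pipe SIMD instruction, the sketch below spells out in portable C what a hypothetical 4-lane, 8-bit saturating add on a 32-bit register would compute as a single instruction. It is not an AndeStar instruction definition, and the function names are made up. The second function is the plain-C (SWAR) way to get a wrapping lane-wise add: it already takes several ordinary ALU operations, and saturation would take more, which is the cost a short-latency in-pipe instruction avoids.

```c
#include <stdint.h>

/* Reference semantics of a hypothetical packed add: four unsigned 8-bit
   lanes, each clamped to 0xFF instead of wrapping (typical for pixels). */
uint32_t add_u8x4_sat(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int lane = 0; lane < 4; ++lane) {
        unsigned shift = 8u * (unsigned)lane;
        uint32_t sum = ((a >> shift) & 0xFFu) + ((b >> shift) & 0xFFu);
        if (sum > 0xFFu) sum = 0xFFu;   /* saturate */
        r |= sum << shift;
    }
    return r;
}

/* Wrapping (mod 256) lane-wise add using only ordinary 32-bit operations. */
uint32_t add_u8x4_wrap(uint32_t a, uint32_t b) {
    /* Add the low 7 bits of every lane; the maximum 0x7F + 0x7F = 0xFE,
       so no carry crosses into the neighbouring lane. */
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
    /* Restore each lane's top bit: carry-in XOR a7 XOR b7; the carry out
       of bit 7 is discarded, giving modulo-256 behaviour per lane. */
    return low7 ^ ((a ^ b) & 0x80808080u);
}
```

With a = 0x10F08001 and b = 0x20201080, the saturating version yields 0x30FF9081 (lane 0xF0 + 0x20 clamps to 0xFF), while the wrapping version yields 0x30109081.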
AndeStar Instruction Extensions
In-pipe and Coprocessor Instruction Set Extension
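Diagram: the AndesCore integer data path (register file with 32-bit ra/rb operands and rt write-back, ALU, shifter, Mul/Div) sits beside an in-core data path of custom execution engines and custom state with bypass, for in-pipe extensions. A coprocessor interface passes instructions to the coprocessor pipeline (Cop IQ, Cop decode, Cop load/store, Cop execute, state write-back, Cop complete), with data movement and exception control between the core pipeline (instruction decode, execute) and the coprocessor.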
Design Considerations of Video IP
Architecture
• Programmable? Is it really the same hardware for different codecs?
• Roadmap to HD: the same architecture from SD to HD?
Benchmark
• MHz vs. bitrate
• MHz vs. memory latency
Area
• SRAM size and type of SRAM (1-port, 2-port, dual-port)
Fmax for a certain process
DRAM bandwidth
AV sync
System layer
• Multimedia framework API layer
• OS layer
Design Considerations of Video IP
Required product features
• Complete format support
• HD 1080p
• Platform based, especially SW
Design priorities
• 1st: minimal gate count/memory size
• 2nd: the frequency required to decode a worst-case Allegro sequence is 30% under Fmax in TSMC 90nm G with 30 cycles of memory latency
• 3rd: memory bandwidth no greater than that needed just for reference frame reads
• 4th: power in the 300-500 mW range at 250 MHz
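Priorities 2 and 3 reduce to simple arithmetic. The sketch assumes 8-bit 4:2:0 video (1.5 bytes per pixel) and that each decoded frame reads exactly one reference frame's worth of pixels once; real motion compensation with sub-pixel interpolation and multiple references fetches more, so this is a lower bound, and H.264 actually codes 1080p as 1920x1088. Under those assumptions 1080p at 30 frames/s needs about 93.3 MB/s of reference reads. The 30% margin is checked as required ≤ 0.7 × Fmax in integer math; the function names and example Fmax values are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes per second needed to read one reference frame per decoded frame,
   assuming 8-bit 4:2:0 sampling (Y plane + two quarter-size chroma planes). */
uint64_t ref_read_bytes_per_sec(uint32_t width, uint32_t height, uint32_t fps) {
    uint64_t luma  = (uint64_t)width * height;  /* 1 byte per luma sample */
    uint64_t frame = luma + luma / 2;           /* Cb + Cr add half again */
    return frame * fps;
}

/* True if the required decode clock is at least 30% below Fmax,
   i.e. required <= 0.7 * Fmax, computed without floating point. */
bool meets_fmax_margin(uint32_t required_mhz, uint32_t fmax_mhz) {
    return (uint64_t)required_mhz * 10 <= (uint64_t)fmax_mhz * 7;
}
```

For a hypothetical Fmax of 250 MHz, a worst-case decode clock of up to 175 MHz satisfies the margin; 176 MHz does not.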
Ideal Solution Roadmap
Frame size (resolution): example applications; solution
• QCIF (176x144/120): toy LCD displays; software on baseline ISA
• QVGA (320x240): smartphone, PMP; software on extended video ISA
• CIF (352x288/240): VHS, VCD, smartphone, PMP; software on extended video ISA
• VGA (640x480): smartphone, PMP, computer; software on DSP or video co-processor
• D1 (720x576/480): PVR, TV, STB, DVD player, DVD recorder; software on DSP or video co-processor
• High Definition (720p, 1080i, 1080p): HD recorder, HD PVR, HDTV, IPTV; hardwired engine
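The roadmap can be read as a lookup from frame size to solution tier. This sketch classifies by pixel count, using the largest resolution in each tier (QCIF, CIF, D1) as the boundary; the enum and function names are hypothetical, and a real choice would also depend on codec, frame rate, and bitrate rather than resolution alone.

```c
#include <stdint.h>

/* Solution tiers from the roadmap, smallest to largest frame size. */
typedef enum {
    SW_BASELINE_ISA,        /* software on baseline ISA */
    SW_EXTENDED_VIDEO_ISA,  /* software on extended video ISA */
    SW_DSP_OR_VIDEO_COP,    /* software on DSP or video co-processor */
    HARDWIRED_ENGINE        /* hardwired engine for HD */
} video_solution;

/* Pick a tier by pixel count; tier boundaries are QCIF, CIF, and D1 (625-line). */
video_solution pick_solution(uint32_t width, uint32_t height) {
    uint64_t pixels = (uint64_t)width * height;
    if (pixels <= 176u * 144u) return SW_BASELINE_ISA;        /* up to QCIF */
    if (pixels <= 352u * 288u) return SW_EXTENDED_VIDEO_ISA;  /* QVGA, CIF */
    if (pixels <= 720u * 576u) return SW_DSP_OR_VIDEO_COP;    /* VGA, D1 */
    return HARDWIRED_ENGINE;                                  /* 720p and up */
}
```

For example, pick_solution(640, 480) falls in the DSP/co-processor tier, while pick_solution(1280, 720) maps to the hardwired engine.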
Audio Processor: Converging GPP and DSP Architectures
DSP DSP/GPP GPP
GPP with application-specific ISA
Andes Audio Extensions
Configurable instruction set
Basic multiply operations
• 32x32 signed/unsigned multiplication
• Multiply-and-add, multiply-and-subtract
• 64/56-bit data accumulate register
Multiply with memory operations
• Read two 32-bit source data
• Perform 32x32 signed/unsigned multiply, multiply-and-add, or multiply-and-subtract
• Write the result to the 64/56-bit data accumulate register
• Load 2 registers from memory
• Update the load addresses
Memory operations
• Support 32-bit and 24-bit load/store
• Load/store address updated concurrently
Data processing operations
• Bit stream packing and extraction
• Register move
• Add/subtract with memory load
Special addressing modes
• Linear, modulo, and bit reverse
• Post- or pre-increment
Support 32-bit or 24-bit audio data word
Support zero-overhead looping
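To show the kind of code these extensions target, here is a plain-C FIR filter and a bit-reverse index helper written the way MAC-oriented hardware would execute them: each tap is a 32x32 multiply accumulated into a wide register, the delay line uses modulo (circular) addressing, and the inner loop is the kind a zero-overhead loop would run. This is a generic sketch, not Andes intrinsics; the type and function names and the 4-tap size are hypothetical, and it uses a full 64-bit accumulator, so the saturation a 56-bit accumulator would need is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 4  /* hypothetical filter length */

typedef struct {
    int32_t delay[FIR_TAPS];  /* circular delay line of past samples */
    size_t  head;             /* index of the newest sample */
} fir_state;

/* Push one input sample and return the 64-bit filter output. */
int64_t fir_step(fir_state *s, const int32_t coeff[FIR_TAPS], int32_t x) {
    s->head = (s->head + 1) % FIR_TAPS;   /* modulo addressing: wrap around */
    s->delay[s->head] = x;

    int64_t acc = 0;                      /* wide accumulate register */
    size_t idx = s->head;
    for (size_t k = 0; k < FIR_TAPS; ++k) {          /* zero-overhead loop candidate */
        acc += (int64_t)coeff[k] * s->delay[idx];    /* 32x32 -> 64 multiply-and-add */
        idx = (idx + FIR_TAPS - 1) % FIR_TAPS;       /* step back one sample, modulo */
    }
    return acc;
}

/* Reverse the low `bits` bits of i, as bit-reverse addressing does for FFTs. */
unsigned bit_reverse(unsigned i, unsigned bits) {
    unsigned r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);
        i >>= 1;
    }
    return r;
}
```

With coefficients {1, 2, 3, 4} and inputs 10, 20, 30, 40, 50, the outputs are 10, 40, 100, 200, 300; bit_reverse(i, 3) maps indices 0..7 to 0, 4, 2, 6, 1, 5, 3, 7, the reordering used by radix-2 FFTs.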
N1233F: High Performance Application Processor with Multi-Core Support
Features:
• Harvard architecture, 8-stage pipeline
• Up to 4 processors in an SMP system
• L2 cache support
• Dynamic branch prediction
• Fully clock-gated pipeline
• Co-processor interface
• Floating point unit
• MMU with HW page table walker
• AHB or HSMP (AXI-like) bus
• Power management instructions
• Embedded program tracer
Applications:
• Mobile internet device (MID)
• Netbook
• Smart phone
• Gateway/Router
• Home entertainment
Block diagram: N12 execution core with JTAG/EDM, EPT I/F, FPU, COP I/F, ITLB/DTLB, MMU/MPU, instruction and data caches with local memory (LM) and LM interfaces, DMA, and an external bus interface (AHB or HSMP).
Cost-benefit of Multi-Processor Systems
• Multiprocessors offer a performance boost for threaded applications.
• Typical embedded workload is now around 2000 DMIPS.
  - A dual N1233 implementation can achieve this in a 90nm process.
  - Performance/Area and Power/Area increase by <10%.
  - Assumes a fixed clock frequency target.
• For applications that need higher absolute performance, a dual processor is a good trade-off.
• Other ways to get the same performance:
  - A superscalar processor capable of running at up to 1 GHz
  - A simultaneous multithreaded (SMT) processor capable of a 60% performance improvement
• However, each of these alternative approaches comes with drawbacks:
  - The superscalar processor is enormous; although its maximum clock frequency is higher, its performance is about the same as a dual processor. Performance/Area and Performance/Power suffer greatly.
  - The SMT processor barely reaches the same performance, and its larger size and increased power lead to poor Performance/Area and Performance/Power.
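How much a second core helps a threaded application can be estimated with Amdahl's law, a standard model that is not part of the original slide: if a fraction p of the work runs in parallel across n cores, speedup = 1 / ((1 - p) + p / n). The sketch below computes it; the function name and the parallel fractions in the examples are hypothetical.

```c
/* Amdahl's law: ideal speedup on `cores` cores when a fraction
   `parallel_fraction` (0.0 to 1.0) of the work can run in parallel.
   Assumes cores >= 1 and ignores synchronization and memory contention. */
double amdahl_speedup(double parallel_fraction, unsigned cores) {
    double serial = 1.0 - parallel_fraction;
    return 1.0 / (serial + parallel_fraction / (double)cores);
}
```

For example, p = 0.9 on two cores gives about 1.82x and p = 0.5 gives about 1.33x, so a dual N1233 approaches twice the throughput only when the workload is almost fully threaded.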
Andes Linux MID Software Stack
Applications: VNC Server, WebKit, Firefox, Home, Games, Google Talk, Remote Desktop, Music Player, Media Player, Image Viewer, OpenOffice, Skype, Mail Client, PIM, Adobe Flash, PDF, E-book Reader, MSN
Desktop App and Update Manager
Application, Media Framework, and Middleware
JBlend (JVM), Window Manager, VoIP, WebKit, GStreamer
Middleware: Qt/Embedded, SSL, SDL, Codecs, GTK+
X11, Cairo, Pango, Matchbox, D-Bus, etc.; Ethernet, WiFi, DHCP, DNS, Samba, TFTP
Linux 2.6: I/O and Network Drivers, Linux Kernel & Libraries, Power Managers
Andes OS Roadmap
Roadmap chart, 2008 to 2011 (current state and future state): Middleware, Thin Client, Mobile Internet Device, Android, Chrome
Platform SoC
Block diagram: AndesCore dual N1233F processors with a 256KB L2 cache, flash interface, memory interface, video codec engine, 2D/3D graphics engine, connectivity, high-speed and low-speed peripherals, audio interface, and HD display output.