US009870301B2

(12) United States Patent
Kurts et al.

(10) Patent No.: US 9,870,301 B2
(45) Date of Patent: Jan. 16, 2018

(54) HIGH-SPEED DEBUG PORT USING STANDARD PLATFORM CONNECTIVITY

(71) Applicant: Intel Corporation, Santa Clara, CA (US)

(72) Inventors: Tsvika Kurts, Haifa (IL); Eilon Hazan, Hogla (IL); Sean T. Baartmans, Portland, OR (US); Marcus R. Winston, Aloha, OR (US); Rony Ghattas, Hillsboro, OR (US); Arie Bernstein, Kfar Saba (IL); Todd M. Witter, Orangevale, CA (US); Marcelo Yuffe, Binyamina (IL)

(73) Assignee: Intel Corporation, Santa Clara, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 124 days.

(21) Appl. No.: 14/231,240

(22) Filed: Mar. 31, 2014

(65) Prior Publication Data
US 2015/0278058 A1, Oct. 1, 2015

(51) Int. Cl.
G06F 11/00 (2006.01)
G06F 11/34 (2006.01)
G06F 11/30 (2006.01)
G06F 11/32 (2006.01)
G06F 11/36 (2006.01)

(52) U.S. Cl.
CPC: G06F 11/3476 (2013.01); G06F 11/3024 (2013.01); G06F 11/323 (2013.01); G06F 11/3409 (2013.01); G06F 11/3636 (2013.01); G06F 11/3656 (2013.01); G06F 11/3466 (2013.01)

(58) Field of Classification Search
CPC: G06F 11/2236; G06F 11/3656; G06F 11/3648; G06F 11/3476; G06F 11/3024; G06F 11/323; G06F 11/3409; G06F 11/3636; G06F 11/3466; H05K 999/99; G01R 31/318555
USPC: 714/30
See application file for complete search history.

(56) References Cited
U.S. PATENT DOCUMENTS
6,041,406 A * 3/2000 Mann, G06F 11/348, 712/227
8,826,081 B2 * 9/2014 Hopkins, G06F 11/3636, 714/27
2010/0318848 A1 * 12/2010 Yang, G06F 11/3656, 714/30
(Continued)

Primary Examiner: Matt Kim
Assistant Examiner: Katherine Lin
(74) Attorney, Agent, or Firm: Lowenstein Sandler LLP

(57) ABSTRACT
A processing device comprises a debug port controller to monitor operations of the processing device to determine whether the processing device is operating in a first mode or a second mode and to collect trace information comprising operating characteristics of the processing device. The processing device further comprises a display engine logic to process display data for output to a display device. In addition, the processing device comprises a display engine interface to provide, to a plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first primary mode and the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

20 Claims, 10 Drawing Sheets


US 9,870,301 B2
Page 2

(56) References Cited
U.S. PATENT DOCUMENTS
2012/0110353 A1 * 5/2012 Ehrlich, G06F 1/3203, 713/300
2013/0007537 A1 * 1/2013 Kanematsu, G06F 11/3055, 714/49
2013/0048372 A1 * 2/2013 Overby, G09G 3/006, 174/70 R
* cited by examiner

U.S. Patent, Jan. 16, 2018, Sheet 1 of 10, US 9,870,301 B2

Fig. 1: Computing environment 100. Computing platform 102 contains processing device 110 (execution unit 112, debug port controller 120, display engine 130, display I/O 140), display connectors 150 and port 170. Trace box 160 and special debug tool 180 are external to the platform.

Fig. 2: Platform 102 with processing device 110. Debug port controller 120 (trace outputs L0-3 and L4-7, control signal CTRL, clock input CLK) and display engine 130 (outputs DE1 to DE4) feed PHY ports 241-244 (DDI0 to DDI3). DE1 drives DDI0 directly; multiplexers 245, 246 and 247 select between display outputs and trace lanes for DDI1 to DDI3. The PHY ports connect to display connectors 251-254.
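The routing shown in Fig. 2 can be sketched in software terms. The following is a minimal Python model rather than a description of the actual hardware: the function name `route_outputs` and the dictionary layout are hypothetical, and the lane assignment (L0-3 on multiplexer 245, L4-7 on multiplexers 246 and 247) is only one illustrative choice among those the figure permits.

```python
# Hypothetical software model of the Fig. 2 routing; all names are illustrative.

# Display engine output wired to each multiplexed digital display interface.
DISPLAY_INPUT = {"DDI1": "DE2", "DDI2": "DE3", "DDI3": "DE4"}

# One possible trace-lane assignment (an assumption made for this sketch).
TRACE_INPUT = {"DDI1": "L0-3", "DDI2": "L4-7", "DDI3": "L4-7"}


def route_outputs(debug_select):
    """Return which signal drives each PHY port.

    debug_select is the set of DDI names whose multiplexer is told,
    via CTRL, to pass trace lanes instead of display data.
    """
    # DDI0 has no multiplexer in its path, so it always carries DE1.
    routing = {"DDI0": "DE1"}
    for ddi in ("DDI1", "DDI2", "DDI3"):
        if ddi in debug_select:
            routing[ddi] = TRACE_INPUT[ddi]
        else:
            routing[ddi] = DISPLAY_INPUT[ddi]
    return routing


if __name__ == "__main__":
    print(route_outputs(set()))             # normal operation
    print(route_outputs({"DDI1", "DDI2"}))  # eight trace lanes over two connectors
```

Selecting two interfaces at once mirrors the case where a single connector lacks the bandwidth for all eight trace lanes.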

Fig. 3: Debug port controller 120 (processor monitoring module 310, debug information module 320, debug status register interface module 330, multiplexer control module 340) in processing device 110, with data store 350 holding debug status register 352.

Fig. 4A: Debug status register 352, drawn as a row of 16 bits (bit 15 down to bit 0), each shown with the value 0x0.

Fig. 4B: Bit definitions of debug status register 352

Bit    Description
0      HDPORT_EN
1      DDI0_used
2      HDMI_DP0
3      DDI1_used
4      HDMI_DP1
5      DDI2_used
6      HDMI_DP2
7      DDI3_used
8      HDMI_DP3
9      RSRV
10     RSRV
11     RSRV
12     DPLL0_used
13     DPLL1_used
14     DPLL2_used
15     DPLL3_used
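The bit layout of Fig. 4B lends itself to a small decoding example. The sketch below is illustrative only: the function name `decode_debug_status` is hypothetical, and it assumes that bit 0 is the least significant bit of the 16-bit value, which the figure does not state explicitly.

```python
# Hypothetical decoder for the 16-bit debug status value of Fig. 4B.
# Assumption: bit 0 is the least significant bit.

FIELDS = {
    0: "HDPORT_EN",
    1: "DDI0_used", 2: "HDMI_DP0",
    3: "DDI1_used", 4: "HDMI_DP1",
    5: "DDI2_used", 6: "HDMI_DP2",
    7: "DDI3_used", 8: "HDMI_DP3",
    # bits 9-11 are reserved (RSRV) and are not decoded
    12: "DPLL0_used", 13: "DPLL1_used",
    14: "DPLL2_used", 15: "DPLL3_used",
}


def decode_debug_status(value):
    """Map a 16-bit debug status value to {field name: bool}."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("debug status value must fit in 16 bits")
    return {name: bool((value >> bit) & 1) for bit, name in FIELDS.items()}


if __name__ == "__main__":
    status = decode_debug_status(0b0000_0000_0000_1001)  # bits 0 and 3 set
    print([name for name, is_set in status.items() if is_set])
```

Keeping the reserved bits out of the table means a nonzero value in bits 9-11 is silently ignored, which matches their "not otherwise used" status.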

Fig. 5: Processor Debugging Mode 500 (flow diagram)
510: Monitor operations of processing device
520: Collect trace information for processing device
Process display data for output to display
540: Is display device connected to existing platform connectors?
  YES, 550: Provide display data to existing platform connectors
  NO, 560: Provide trace information to existing platform connectors
Finish

Fig. 6: Processor Debugging 600 (flow diagram)
610: Receive debug status value from debug tool
620: Store debug status value in debug status register
630: Access debug status register to read debug status value
640: Is processing device operating in first mode of operation?
  YES, 650: Provide control signal to multiplexers indicating normal operation
       660: Output display data to existing platform connectors
  NO, 670: Provide control signal to multiplexers indicating debug operation
      680: Output trace information to existing platform connectors
Finish
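The flow of Fig. 6 can be walked through with a toy software model. This is a sketch under stated assumptions, not the hardware implementation: the class `DebugPortModel` and its method names are hypothetical, and because the figure does not say which register values denote the first mode, the mode test is passed in by the caller.

```python
class DebugPortModel:
    """Toy model of the Fig. 6 flow; names and structure are hypothetical."""

    def __init__(self):
        # 16-bit debug status register, cleared at start (an assumption).
        self.debug_status_register = 0

    def receive_status(self, value):
        # Blocks 610 and 620: receive a value from the debug tool and store it.
        self.debug_status_register = value & 0xFFFF

    def select_output(self, display_data, trace_info, is_primary_mode):
        # Block 630: read the stored debug status value.
        value = self.debug_status_register
        # Block 640: decide the mode using the caller-supplied predicate.
        if is_primary_mode(value):
            # Blocks 650 and 660: normal control signal, display data goes out.
            return "normal", display_data
        # Blocks 670 and 680: debug control signal, trace information goes out.
        return "debug", trace_info


if __name__ == "__main__":
    # Illustration only: treat an all-zero register as the primary mode.
    primary_when_zero = lambda value: value == 0

    model = DebugPortModel()
    print(model.select_output("frame", "trace", primary_when_zero))
    model.receive_status(0x0009)
    print(model.select_output("frame", "trace", primary_when_zero))
```

Passing the predicate in keeps the sketch neutral about how a real register value maps to a mode.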

Fig. 7: Block diagram of computer system 700. Legible labels: two processors, each with an integrated memory controller (IMC) and attached memory; chipset; high-performance graphics; bus bridge; keyboard/mouse; communication devices; data storage holding code and data; audio I/O; I/O devices.

Fig. 8: Block diagram of system on chip 800. Legible labels: application processor with cores 802A and cache unit(s) 804, shared cache unit(s), system agent unit 810, interconnect unit(s) 812, integrated memory controller 814, bus controller unit(s) 816, media processor(s) 818 (integrated graphics 808, image processor 824, audio processor 826, video processor 828), SRAM unit 830, DMA unit 832, display unit 840.

Fig. 9: Block diagram of system-on-chip design 900. Legible labels: cores, L2 cache control, bus interface unit, L2 cache, interconnect, GPU, video codec, LCD video interface (LCD, DP, HDMI), SIM, boot ROM, SDRAM controller, DRAM, flash controller, flash, power control, Bluetooth, 3G modem, GPS, 802.11 WiFi.

Fig. 10: Micro-architecture of processor 1000. Front end 1001 (instruction prefetcher 1026, instruction decoder 1028, trace cache 1030, microcode ROM 1032, uop queue 1034); out-of-order engine 1003 (allocator/register renamer, memory uop queue, integer/floating point uop queue, memory scheduler, fast scheduler 1002, slow/general FP scheduler 1004, simple FP scheduler 1006); integer register file/bypass network 1008; FP register file/bypass network 1010; execution block 1011 (AGU 1012, AGU 1014, fast ALU 1016, fast ALU 1018, slow ALU 1020, FP 1022, FP move 1024), with paths to the level 1 cache.

HIGH-SPEED DEBUG PORT USING STANDARD PLATFORM CONNECTIVITY

TECHNICAL FIELD

This disclosure relates to the field of digital processing devices and, in particular, to a high-speed debug port using standard platform connectors.

BACKGROUND

Debugging is a methodical process of finding and reducing the number of defects (i.e., "bugs") in a piece of electronic equipment or a computer program running thereon. Various debug techniques can be used to detect anomalies, assess their impact, and schedule hardware changes, software patches or full updates to a system. The goals of debugging include identifying and fixing bugs in the system (e.g., logical or synchronization problems in the code, or a design error in the hardware) and collecting information about the operation of the system that may then be used to analyze the system to find ways to boost its performance or to optimize other important characteristics.

BRIEF DESCRIPTION OF THE DRAWINGS

The present disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.

FIG. 1 is a block diagram illustrating a computing platform with an existing high-speed port connector that is also used for transporting debug and performance data, according to an embodiment.

FIG. 2 is a block diagram illustrating additional details of a computing platform with a processing device and an existing high-speed port connector used to export trace data, according to an embodiment.

FIG. 3 is a block diagram illustrating a debug port controller in a processing device, according to an embodiment.

FIG. 4A is a diagram illustrating a debug status register in a debug port controller, according to an embodiment.

FIG. 4B is a diagram illustrating a table defining the bits of a debug status register, according to an embodiment.

FIG. 5 is a flow diagram illustrating a method for processor debugging using standard high-speed port connectors, according to an embodiment.

FIG. 6 is a flow diagram illustrating a method for processor debugging using standard high-speed port platform connectors, according to an embodiment.

FIG. 7 is a block diagram of a computer system, according to an embodiment.

FIG. 8 is a block diagram of a system on chip (SoC) in accordance with an embodiment of the present disclosure.

FIG. 9 is a block diagram of an embodiment of a system on-chip (SoC) design in accordance with an embodiment of the present disclosure.

FIG. 10 illustrates a block diagram of the micro-architecture for a processor in accordance with one embodiment of the present disclosure.

DETAILED DESCRIPTION

The following description sets forth numerous specific details such as examples of specific systems, components, methods, and so forth, in order to provide a good understanding of several embodiments of the present invention. It will be apparent to one skilled in the art, however, that at least some embodiments of the present invention may be practiced without these specific details. In other instances, well-known components or methods are not described in detail or are presented in simple block diagram format in order to avoid unnecessarily obscuring the present invention. Thus, the specific details set forth are merely exemplary. Particular implementations may vary from these exemplary details and still be contemplated to be within the scope of the present invention.

Described herein is a method and system for processor debugging using existing high-speed port connectors. Conventional microprocessors generally do not have a closed chassis debug capability (i.e., the ability to perform a debugging operation without physically opening the computing platform chassis to access the processor). The ability to debug without opening the system, however, has become a strong time-to-market demand from original equipment manufacturers (OEMs), and the silicon companies that support the OEMs, since they spend large amounts of effort and time trying to probe processor signals in very condensed systems (e.g., a tablet or ultrabook). In previous digital system designs, including processor architectures, a dedicated processor port was used to access debug information in an open chassis debug operation. In an effort to reduce die size, future processor architectures may no longer include the dedicated port, which will also save cost and power. This may result, however, in a decrease in debug capability.

In order to enable a closed chassis debug, save time-to-market in the processor development stage, and provide realistic power management analysis, in one embodiment, a processing device outputs internal trace information through existing platform connectors, such as High-Definition Multimedia Interface (HDMI) or Display Port (DP) connectors. The trace information or "trace data" can include debug, performance, and/or other information about operation of the system. In one embodiment, any kind of data related to system performance or operation can be sent across this link and will fit within an intended use of the existing high-speed port to get data off the chip (or out of the system) without opening the chassis. This is a unique method for running a debug protocol (e.g., Aurora) on top of the existing HDMI electrical infrastructure. The debug techniques described herein support debug operations at extreme conditions where other conventional debug hooks fail to deliver, such as during a reset sequence, at a low power state, etc. In addition, this is a non-intrusive solution that provides debug capabilities on all processor dies, including those without a dedicated debug port. In other embodiments, other high-speed ports that are accessible in a closed-chassis system, such as eSATA, USB, and other ports, are all potential candidates for conveying trace information as described herein. The remainder of this document will refer to an embodiment using display ports (e.g., HDMI or DP); however, it should be understood that similar principles may apply when using other ports.

In one embodiment, the debug solution described herein makes intelligent reuse of HDMI/DP interfaces, but not the Display Engine logic. The platform display connectors can function normally (i.e., to provide display data) in one mode of operation, but in case of a system anomaly, can be configured to provide trace information in a second mode. For example, the user may unplug the regular display cable from the platform display connector and connect instead a trace box or other collection or storage device. This allows for the export of trace information and storage in the trace box memory for later extraction to a remote host for post processing and analysis.

FIG. 1 is a block diagram illustrating a computing platform with an existing high-speed port connector that is also used for transporting debug and performance data, according to an embodiment. In one embodiment, the computing environment 100 includes a computing platform 102 with a processing device 110. Computing platform 102 may be a personal computer (PC), either laptop or desktop, a subnotebook or ultraportable PC, a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a server, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single computing platform is illustrated, the term "computing platform" or "computing device" shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

Processing device 110 may be, for example, a multi-core processor including multiple cores. These cores may be physical processors, and may include various components such as front end units, execution units and back end units. Processing device 110 may represent one or more general-purpose processing devices such as a microprocessor, central processing unit, or the like. Processing device 110 may implement a complex instruction set computing (CISC) architecture, a reduced instruction set computer (RISC) architecture, a very long instruction word (VLIW) architecture, or other instruction sets, or a combination of instruction sets, through translation of binary codes in the above mentioned instruction sets by a compiler.

Processing device 110 may employ execution units including logic to perform algorithms for processing data, such as in the embodiments described herein. Processing device 110 is representative of processing systems based on the PENTIUM III™, PENTIUM 4™, Xeon™, XScale™ and/or StrongARM™ microprocessors available from Intel Corporation of Santa Clara, Calif., although other systems (including PCs having other microprocessors, engineering workstations, set-top boxes and the like) may also be used. In one embodiment, computing platform 102 executes a version of the WINDOWS™ operating system available from Microsoft Corporation of Redmond, Wash., although other operating systems (UNIX and Linux for example), embedded software, and/or graphical user interfaces, may also be used. Thus, embodiments of the present invention are not limited to any specific combination of hardware circuitry and software.

In this illustrated embodiment, processing device 110 includes one or more execution units 112 to implement an algorithm that is to perform at least one instruction. One embodiment may be described in the context of a single processor desktop or server system, but alternative embodiments may be included in a multiprocessor system. The processing device 110 may be coupled to a processor bus that transmits data signals between the processing device 110 and other components in the platform 102, such as a memory, etc.

Execution unit 112, including logic to perform integer and floating point operations, also resides in the processing device 110. The processing device 110, in one embodiment, includes a microcode (ucode) ROM to store microcode, which when executed, is to perform algorithms for certain macroinstructions or handle complex scenarios. Here, microcode is potentially updateable to handle logic bugs/fixes for processing device 110.

In one embodiment, processing device 110 includes debug port controller 120, display engine 130, and display engine interface 140. Debug port controller 120 may be designed to perform operations related to debugging processing device 110. As used herein, the term debugging refers to a process of finding and reducing the number of defects (i.e., "bugs") in processing device 110 or in a computer program running on processing device 110. The debugging process aims to identify and potentially remedy bugs in the operation of processing device 110 and includes collecting information about the operating states of processing device 110 that may be used to analyze processing device 110 to find ways to boost its performance or to optimize other operational characteristics. In one embodiment, debug port controller 120 monitors operations of processing device 110 and collects trace information from processing device 110. In one embodiment, debug port controller 120 determines, based on the monitoring of operations, whether processing device 110 is operating in a first, primary, operation mode or in a second, debug, operation mode. In one embodiment, debug port controller 120 includes a debug status register storing a value indicating the mode of operation. In one embodiment, the value in the debug status register is received from an external special debug tool 180 connected to computing platform 102 through communications port 170 (e.g., a universal serial bus (USB) port) or other connector. Depending on the mode of operation, debug port controller 120 may cause display engine interface 140 to output either display data from display engine 130 or trace information through existing display connectors 150 on computing platform 102. In one embodiment, the debug status register in debug port controller 120 is mirrored to a read-only copy of the register in display engine 130. Each time a change is made to the register in debug port controller 120, the copy in display engine 130 may be updated as well. This allows a coexistence between the display engine 130 and the debug port controller 120 and allows display engine 130 to output display data to the DDI(s) not being used by debug port controller 120.

In one embodiment, display connectors 150 are existing connectors on computing platform 102. In conventional systems, display connectors 150 may be used only to output display information from the computing platform to an attached display device. In one embodiment, debug port controller 120 repurposes display connectors 150 as a means to export trace information pertaining to processing device 110. In one embodiment, a trace box 160 is connected to computing platform 102 through display connectors 150. Trace box 160 may be any type of computing device or storage device that can temporarily (or permanently) store the trace information received through display connectors 150. In one embodiment, analysis of the trace information may be performed by trace box 160. In another embodiment, the trace information is later transferred from trace box 160 to another computing system for analysis of the trace information. Trace box 160 can capture hardware and software trace information and can also perform protocol conversion. For example, trace box 160 can capture data in the Aurora protocol or in a display protocol and perform encoding for data storage or for transfer to a remote host. In one embodiment, display connectors 150 may include digital display interface (DDI) connectors, such as high-definition multimedia interface (HDMI) connectors, display port (DP) connectors, or a combination of these and/or other connectors.

FIG. 2 is a block diagram illustrating additional details of a computing platform with a processing device and an existing high-speed port connector used to export trace data, according to an embodiment. As described above, computing platform 102 includes processing device 110 and multiple display connectors 251-254. In one embodiment, processing device 110 includes debug port controller 120, display engine 130 and display interface 140. In one embodiment, display interface 140 includes physical layer (PHY) interface ports 241-244. Each of PHY ports 241-244 may correspond to a different one of display connectors 251-254 on platform 102. In one embodiment, display interface 140 further includes multiplexers 245, 246 and 247. In one embodiment, multiplexer 245 falls in the signal path of PHY port 242 and display connector 252, multiplexer 246 falls in the signal path of PHY port 243 and display connector 253, and multiplexer 247 falls in the signal path of PHY port 244 and display connector 254. In one embodiment, PHY port 241 and display connector 251 do not include a multiplexer in the signal path as they may be reserved for display data output, regardless of the mode of operation. In one embodiment, PHY port 241 is reserved as an embedded display port (DP). In one embodiment, multiplexers 245, 246 and 247 are controlled by a control signal (CTRL) received from debug port controller 120. In one embodiment, a phase-locked loop (PLL) in display engine 130 generates a clock signal for the functional logic. This clock signal CLK may be provided to debug port controller 120 to control the logic there as well.

In one embodiment, each of multiplexers 245, 246 and 247 receives display data from display engine 130 and trace information from debug port controller 120 as inputs. In one embodiment, display engine 130 processes display data for output to a display device and includes outputs DE1, DE2, DE3 and DE4. In one embodiment, output DE1 is applied directly to PHY port 241, output DE2 is applied to multiplexer 245, output DE3 is applied to multiplexer 246 and output DE4 is applied to multiplexer 247. In one embodiment, debug port controller 120 includes two output buses, each to carry four lanes of trace information. The first four lanes L0-3 may be applied to each of multiplexers 245, 246 and 247, while the second four lanes L4-7 may be applied to multiplexers 246 and 247. The bandwidth allowed by any one of display connectors 252, 253, 254 may be insufficient to transfer trace information from all eight output lanes L0-7 of debug port controller 120 at once. Thus, in one embodiment, two display connectors may be used at once to transfer trace information to trace box 160 (e.g., display connector 252 and one of either display connector 253 or 254). The number of digital display interfaces and corresponding multiplexers used can vary depending on the embodiment and/or on bandwidth requirements of the system. For example, certain platforms may include two, three, or more multiplexers.

In one embodiment, debug port controller 120 determines whether the processing device 110 is operating in a first, primary, operational mode or a second, debug, operational mode by accessing a debug status register storing a value to indicate the mode of operation. If the value in the debug status register indicates the primary mode, debug port controller 120 may provide a control signal CTRL to multiplexers 245, 246 and 247 to cause them to pass the display data from outputs DE2, DE3 and DE4 of display engine 130 to PHY ports 242, 243, 244 and existing platform display connectors 252, 253, 254. If the value in the debug status register indicates the debug mode, debug port controller 120 may provide a control signal CTRL to multiplexers 245, 246 and 247 to cause them to pass the trace information from outputs L0-3 and L4-7 of debug port controller 120 to PHY ports 242, 243, 244 and existing platform display connectors 252, 253, 254.

FIG. 3 is a block diagram illustrating a debug port controller in a processing device, according to an embodiment. In one embodiment, debug port controller 120 includes processor monitoring module 310, debug information module 320, debug status register interface module 330 and multiplexer control module 340. This arrangement of modules and components may be a logical separation, and in other embodiments, these modules or other components can be combined together or separated in further components, according to a particular embodiment. In one embodiment, data store 350 is connected to debug port controller 120 and includes debug status register 352. In one embodiment, processing device 110 may include both debug port controller 120 and data store 350. In another embodiment, data store 350 may be external to processing device 110 and may be connected to debug port controller 120 over a network or other connection. In other embodiments, processing device 110 may include different and/or additional components which are not shown to simplify the description. Data store 350 may include a main memory, such as read-only memory (ROM), flash memory, dynamic random access memory (DRAM) (such as synchronous DRAM (SDRAM) or Rambus DRAM (RDRAM), etc.), or a static memory, such as flash memory, static random access memory (SRAM), etc. In other embodiments, data store 350 may include some other type of storage device for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The data store 350 may include a machine-readable medium including, but not limited to, magnetic storage medium (e.g., floppy diskette), optical storage medium (e.g., CD-ROM), magneto-optical storage medium, erasable programmable memory (e.g., EPROM and EEPROM), flash memory, or another type of medium suitable for storing electronic instructions.

In one embodiment, processor monitoring module 310 monitors the operations of processing device 110. Processor monitoring module 310 may observe and log actions performed by execution unit 112, display engine 130, display interface 140 and other components of processing device 110. In one embodiment, this monitoring includes tracking the values stored in various processor registers, including, for example, debug status register 352. This data may be gathered during the various operating stages of processing device 110, such as during boot-time, during normal operation, or during shut-down.

In one embodiment, debug information module 320 generates trace information that can be used during a debugging process. For example, debug information module 320 may collect, arrange, organize or summarize the data gathered by processor monitoring module 310. Debug information module 320 may create a series of reports to be output to trace box 160 over display connectors 150 that identify anomalies, discrepancies, defects or errors that occur in processing device 110. In another embodiment, debug information module 320 may provide just the raw data from processor monitoring module 310 for analysis by another external computing system. In one embodiment, the trace information comprises at least one of power consumption data, protection ring data, memory access data, processing device interconnect data, or other data. The power consumption data can include, for example, Pcode power commands, voltage regulator responses, thermal information, package and core C state residency, etc. The protection ring data can include, for example, ring transactions, core and thread identifiers, graphics transactions, cache attributes, secure enclave range transactions, etc. The memory access data can include memory reads/writes at a signal level which can replace a logic analyzer, such as RAS, CAS, CMD, etc. The processing device interconnect data can include data sent over an on die package interconnect which replaces the direct media interface (DMI). Other trace information can include, for example, architectural event trace data, such as WRMSR, RDMSR, core power events, interrupts, etc., real time instruction trace data, or other data.

In one embodiment, debug status register interface module 330 manages debug status register 352. Debug status register interface module 330 may receive a debug status value to be stored in debug status register 352 from special debug tool 180 connected to computing platform 102 through communications port 170. Upon receiving a debug status value, debug status register interface module 330 may write the debug status value into debug status register 352. Either periodically or in response to a request for trace information, debug status register interface module 330 may access debug status register 352 to determine the mode of operation of processing device 110 (i.e., primary mode or debug mode). Based on the value stored in debug status register 352, debug status register interface module 330 may determine the operational mode and provide an indication of such mode to multiplexer control module 340. Additional details regarding debug status register 352 are provided below with respect to FIGS. 4A-4B.

In one embodiment, multiplexer control module 340 generates a control signal CTRL to be applied to multiplexers 245, 246 and 247. If debug status register interface module 330 determines that the processing device is in a primary mode of operation, the control signal CTRL may cause multiplexers 245, 246 and 247 to pass the display data from outputs DE2, DE3 and DE4 of display engine 130. If debug status register interface module 330 determines that the processing device is in a debug mode of operation, the control signal CTRL may cause multiplexers 245, 246 and 247 to pass the trace information from outputs L0-3 and L4-7 of debug port controller 120.

FIG. 4A is a diagram illustrating a debug status register in a debug port controller, according to an embodiment. In one embodiment, debug status register 352 can be used to store a debug status value received from special debug tool 180 through communications port 170. In one embodiment, debug status register 352 includes a horizontal row of 16 bits which is overwritten each time a new debug status value is received. Each bit in the debug status value represents a different element of the mode of operation of processing device 110. Descriptions of each bit are provided below with respect to FIG. 4B. In other embodiments, debug status register 352 may be arranged in some other fashion. In other embodiments, debug status register 352 may be some other form of data structure.

... i.e., PHY port 241 and display connector 251) is being used to output trace information. In one embodiment, however, bit 1 is constantly not set (e.g., logic 0) since digital display interface 0 (DDI0, i.e., PHY port 241 and display connector 251) is reserved for use by output DE1 of display engine 130.

In one embodiment, bit 2 is labeled HDMI_DP0. If bit 2 is set, this indicates that DDI0 is being used in HDMI mode. If bit 2 is not set, this indicates that DDI0 is being used in DP mode. In one embodiment, however, bit 2 is constantly not set since DDI0 is reserved for use by output DE1 of display engine 130.

In one embodiment, bit 3 is labeled DDI1_used. If bit 3 is set, this indicates that digital display interface 1 (DDI1, i.e., PHY port 242 and display connector 252) is being used to output trace information and thus should not be used by output DE2 of display engine 130. Accordingly, the control signal CTRL applied to multiplexer 245 should pass lanes L0-3 or L4-7 from debug port controller 120.

In one embodiment, bit 4 is labeled HDMI_DP1. If bit 4 is set, this indicates that DDI1 is being used in HDMI mode. If bit 4 is not set, this indicates that DDI1 is being used in DP mode. HDMI mode and DP mode are selected depending on whether the corresponding connector 252 is an HDMI connector or a DP connector. Depending on the mode, the debug data may be packed differently to accommodate the protocols used and the speed of the connector (e.g., 2.97 GHz for HDMI or 5.4 GHz for DP).

In one embodiment, bit 5 is labeled DDI2_used. If bit 5 is set, this indicates that digital display interface 2 (DDI2, i.e., PHY port 243 and display connector 253) is being used to output trace information and thus should not be used by output DE3 of display engine 130. Accordingly, the control signal CTRL applied to multiplexer 246 should pass lanes L0-3 or L4-7 from debug port controller 120.

In one embodiment, bit 6 is labeled HDMI_DP2. If bit 6 is set, this indicates that DDI2 is being used in HDMI mode. If bit 6 is not set, this indicates that DDI2 is being used in DP mode.

In one embodiment, bit 7 is labeled DDI3_used. If bit 7 is set, this indicates that digital display interface 3 is being used to output trace information and thus should not be used by output DE4 of display engine 130. Accordingly, the control signal CTRL applied to multiplexer 247 should pass lanes L0-3 or L4-7 from debug port controller 120.

In one embodiment, bit 8 is labeled HDMI_DP3. If bit 8 is set, this indicates that DDI3 is being used in HDMI mode. If bit 8 is not set, this indicates that DDI3 is being used in DP mode.

In one embodiment, bits 9-11 are labeled RSRV and are reserved and not otherwise used. Bits 9-11 may have a null value, a value of logic 0, or some other value.

In one embodiment, bit 12 is labeled DPLL0_used, bit 13 is labeled DPLL1_used, bit 14 is labeled DPLL2_used, and bit 15 is labeled DPLL3_used. In one embodiment, display FIG.
4B is a diagram illustrating a table defining the bits engine 130 includes four PLLs that are used to clock signals of a debug status register, according to an embodiment. In sent from debug port controller 120 . Bits 12 - 15 indicated one embodiment, bit 0 is labeled HDPORT _ EN . If bit 0 is which of those PLLs is currently being used . set ( e . g . , logic 1 ) , this indicates that the processing device is FIG . 5 is a flow diagram illustrating a method for pro in the second , debug, mode of operation and trace informa- 60 ?essor debugging using standard high - speed port connec tion is to be output through existing display connectors 150 . tors , according to an embodiment. The method 500 may be If bit 0 is not set ( e . g ., logic 0 ), this indicates that the performed by processing logic that may comprise hardware processing devices is in the first, primary , mode of operation ( e . g ., circuitry , dedicated logic , programmable logic , micro and display data is to be output through existing display code , etc . ) , software ( e . g ., instructions run on a processing connectors 150 . 65 device to perform hardware simulation ) , or a combination In one embodiment, bit 1 is labeled DDIO _ used . If bit 1 thereof . The processing logic may output trace information is set, this indicates that digital display interface 0 ( DDIO — over existing platform connectors allowing for high - speed US 9 ,870 ,301 B2 10 debug operations in a closed -chassis (or open -chassis ) sys - debug operations in a closed - chassis system . In one embodi tem . In one embodiment, the method 500 is performed by ment, the method 600 is performed by debug port controller debug port controller 120, as shown in FIGS. 1 -3 . 120 , as shown in FIGS. 1 - 3 . Referring to FIG . 5 , at block 510 , method 500 monitors Referring to FIG . 6 , at block 610 , method 600 receives a the operations of processing device 110 . In one embodiment, 5 debug status value . 
In one embodiment, debug status register processor monitoring module 310 monitors the operations of interface module 330 may receive a debug status value to be stored in debug status register 352 from special debug tool processing device 110 . Processor monitoring module 310 180 connected to computing platform 102 through commu may observe and log actions performed by execution unit nications port 170 . 112, display engine 130 , display interface 140 and other 10 Atblock 620 ,method 600 stores the debug status value in components of processing device 110 . debug status register 352 . Upon receiving a debug status At block 520 , method 500 collects trace information for value, debug status register interface module 330 may write the processing device 110 . In one embodiment, trace infor the debug status value into debug status register 352. In one mation module 320 generates trace information that can be embodiment, debug status register interface module 330 used during a debugging process . For example , debug 15 may overwrite the previously value in debug status register information module 320 may collect, arrange , organize or 352 with the newly received value. summarize the data gathered by processor monitoring mod Atblock 630 , method 600 accesses debug status register ule 310 . The trace information may be used to identify 352 to read the debug status value . Either periodically or in anomalies , discrepancies , defects or errors that occur in response to a request for trace information , debug status processing device 110 . 20 register interface module 330 may access debug status At block 530, method 500 processes display data for register 352 to determine the mode of operation of process output to a display . In one embodiment, display engine 130 ing device 110 ( i. e . , primary mode or debug mode ). 
may couple an image memory or other image source data to At block 640 , method 600 determines if the processing a display device such that video or image data is processed device 110 is operating in a first , primary ,mode of operation . and properly formatted for the particular display device . 25 Based on the value stored in debug status register 352 , debug Display engine 130 may convert image data that is retrieved status register interface module 330 may determine the from image memory into digital video or graphic display operationalmode and provide an indication of such mode to data that can ultimately be provided to a display device such multiplexer control module 340 . as a television , CRT device , LCD display panel , LED I f processing device 110 is operating in the first , primary , display panel , mobile device display screen , consumer prod - 30 mode of operation , at block 650 , method 600 provides a uct display screen , OLED display, projection display, laser control signal CTRL indicating normal operation to multi projection display or 3 - D display device . plexers 245 , 246 , 247 . In one embodiment, multiplexer At block 540 , method 500 determines if a display device control module 340 generates a control signal CTRL to is connected to the existing platform connectors. When a cause multiplexers 245 , 246 and 247 to pass the display data display device is connected , the processing device 110 is 35 from outputs DE2, DE3 and DE4 of display engine 130 . At operating in a first , primary , mode of operation . In another block 660 , method 600 outputs the display data to existing embodiment, debug status register interface module 330 platform connectors 150 . may access debug status register 352 to determine the mode I f processing device 110 is not operating in the first mode of operation of processing device 110 ( i . e . , primary mode or (i . e ., processing device 110 is operating in a second , debug , debug mode ) . 
Based on the value stored in debug status 40 mode ) , at block 670 , method 600 provides a control signal register 352 , debug status register interface module 330 may CTRL indicating debug operation to multiplexers 245 , 246 , determine the operational mode and provide an indication of 247 . In one embodiment, multiplexer control module 340 such mode to multiplexer control module 340 . generates a control signal CTRL to cause multiplexers 245 , If processing device 110 is operating in the first , primary , 246 and 247 to pass the trace information from outputs L0- 3 mode of operation , at block 550 , method 500 provides 45 and L4 - 7 of debug port controller 120 . At block 680 , method display data to existing platform connectors 150 . In one 600 outputs the trace information to existing platform con embodiment , multiplexer control module 340 generates a nectors 150 . control signal CTRL to cause multiplexers 245 , 246 and 247 For closed chassis debug systems, the value of this to pass the display data from outputs DE2, DE3 and DE4 of solution may depend on HDMI/ DP port availability . The display engine 130 . 50 solution is very flexible , as it allowsmultiple topologies with If processing device 110 is not operating in the first mode different bandwidths . In addition , sending debug traffic in ( i. e ., processing device 110 is operating in a second , debug , real time over the OPI, to the PCH and USB , might be very mode ) , at block 560 , method 500 provides trace information intrusive to the system . In one embodiment, debug port to existing platform connectors 150 . In one embodiment, controller 120 is inserted as a set of three intelligent multi multiplexer control module 340 generates a control signal 55 plexers or other switches between display engine 130 and CTRL to cause multiplexers 245 , 246 and 247 to pass the display interface 140 . 
At normal operation , the default is to trace information from outputs LO - 3 and L4 - 7 of debug port pass the display engine output. The intelligent multiplexers controller 120 . 245 , 246 , 247 can also be configured to select trace infor FIG . 6 is a flow diagram illustrating a method for pro - mation from debug port controller 120 . Since debug port cessor debugging using standard high - speed port connec - 60 controller 120 may be working during a reset sequence , tors, according to an embodiment. The method 600 may be debug port controller 120 may include all control and clock performed by processing logic that may comprise hardware (PLL ) setting , allowing it to work independently from ( e. g ., circuitry, dedicated logic , programmable logic , micro - display engine 130 . code , etc . ) , software ( e . g . , instructions run on a processing In one embodiment, the output of debug port controller device to perform hardware simulation ) , or a combination 65 120 is converted to the Aurora protocol or another industry thereof. The processing logic may output trace information standard protocol for debug tools . This may allow easy over existing platform connectors allowing for high - speed adoption of the debug solution described herein by the US 9 ,870 ,301 B2 industry dominant debug tools. The Aurora protocol is very more cores 802A - N and shared cache unit ( s ) 806 ; a system similar to the DP protocol at the physical layer , and thus, can agent unit 810 ; a bus controller unit ( s ) 816 ; an integrated pass through the display interface multiplexers 245 , 246 , memory controller unit ( s ) 814 ; a set or one or more media 247 and output through its PHY ports 242 - 244 . DP /HDMI processors 818 which may include integrated graphics logic cables carry the trace information to the trace box 160 . 
Thus, 5 808 , an image processor 824 for providing still and/ or video at the physical layer the solution uses the DP /HDMI elec - camera functionality , an audio processor 826 for providing trical but on the link layer the solution uses the Aurora hardware audio acceleration , and a video processor 828 for protocol. The solution described herein also applies also to providing video encode /decode acceleration ; an static ran other logical layers like PCIE STYLE . dom access memory (SRAM ) unit 830 ; a direct memory Referring now to FIG . 7 , shown is a block diagram of a 10 access (DMA ) unit 832 ; and a display unit 840 for coupling system 700 in accordance with an embodiment. As shown in to one or more external displays . In one embodiment, a FIG . 7 , multiprocessor system 700 is a point - to - point inter - memory module may be included in the integrated memory connect system , and includes a first processor 770 and a controller unit ( s ) 814 . In another embodiment, the memory second processor 780 coupled via a point- to -point intercon - module may be included in one or more other components nect 750 . Each of processors 770 and 780 may be some 15 of the SoC 800 that may be used to access and / or control a version of the processing device 110 , as shown in FIG . 1 . memory. The application processor 820 may include a While shown with only two processors 770 , 780 , it is to microcode context and aliased parameter passing logic as be understood that the scope of the present disclosure is not described in embodiments herein . so limited . In other embodiments, one or more additional The memory hierarchy includes one or more levels of processors may be present in a given processor. 20 cache within the cores, a set or one or more shared cache Processors 770 and 780 are shown including integrated units 806 , and external memory (not shown ) coupled to the memory controller units 772 and 782 , respectively . Proces set of integrated memory controller units 814 . 
The set of sor 770 also includes as part of its bus controller units shared cache units 806 may include one or more mid -level point- to -point (PPP ) interfaces 776 and 778 ; similarly, sec caches , such as level 2 (L2 ) , level 3 (L3 ) , level 4 (L4 ) , or ond processor 780 includes P - P interfaces 786 and 788 . 25 other levels of cache , a last level cache (LLC ) , and / or Processors 770 , 780 may exchange information via a point - combinations thereof. In some embodiments , one or more of to -point ( PPP ) interface 750 using P - P interface circuits 778 , the cores 802A - N are capable of multi - threading . 788 . As shown in FIG . 7 , IMCs 772 and 782 couple the The system agent 810 includes those components coor processors to respective memories, namely a memory 732 dinating and operating cores 802A - N . The system agent unit and a memory 734 , which may be portions ofmain memory 30 810 may include for example a power control unit ( PCU ) locally attached to the respective processors . and a display unit . The PCU may be or include logic and Processors 770 and 780 may each exchange information components needed for regulating the power state of the with a 790 via individual P - P interfaces 752 , 754 cores 802A - N and the integrated graphics logic 808 . The using point to point interface circuits 776 , 794 , 786 , 798 . display unit is for driving one or more externally connected Chipset 790 may also exchange information with a high - 35 displays . performance graphics circuit 738 via a high - performance The cores 802A - N may be homogenous or heterogeneous graphics interface 739 . in terms of architecture and /or instruction set. For example , A shared cache (not shown ) may be included in either some of the cores 802A - N may be in order while others are processor or outside of both processors , yet connected with o ut -of - order . 
As another example , two or more of the cores the processors via P - P interconnect, such that either or both 40 802A - N may be capable of execution the same instruction processors ’ local cache information may be stored in the set , while others may be capable of executing only a subset shared cache if a processor is placed into a low power mode . of that instruction set or a different instruction set . Chipset 790 may be coupled to a first bus 716 via an The application processor 820 may be a general- purpose interface 796 . In one embodiment, first bus 716 may be a processor, such as a CoreTM i3 , i5 , i7 , 2 Duo and Quad , Peripheral Component Interconnect (PCI ) bus , or a bus such 45 XeonTM , ItaniumTM , XScaleTM or Strong ARMTM processor, as a PCI Express bus or another third generation I / O which are available from IntelTM Corporation , of Santa interconnect bus , although the scope of the present disclo - Clara , Calif . Alternatively , the application processor 820 sure is not so limited . may be from another company , such as ARM HoldingsTM , As shown in FIG . 7 , various I / O devices 714 may be Ltd , MIPSTM , etc . The application processor 820 may be a coupled to first bus 716 , along with a bus bridge 718 which 50 special- purpose processor, such as , for example , a network couples first bus 716 to a second bus 720 . In one embodi- or communication processor, compression engine, graphics ment, second bus 720 may be a ( LPC ) bus. processor , co -processor , embedded processor, or the like. Various devices may be coupled to second bus 720 includ - The application processor 820 may be implemented on one ing, for example , a keyboard and /or mouse 722 , communi or more chips . 
The application processor 820 may be a part cation devices 727 and a storage unit 728 such as a disk drive 55 of and / or may be implemented on one or more substrates or other mass storage device which may include instruc - using any of a number of process technologies, such as, for tions /code and data 730 , in one embodiment. Further , an example , BiCMOS, CMOS , or NMOS . audio I / O 724 may be coupled to second bus 720 . Note that FIG . 9 is a block diagram of an embodiment of a system other architectures are possible . For example, instead of the on - chip (SoC ) design in accordance with the present disclo point- to -point architecture of FIG . 7 , a system may imple - 60 sure . As a specific illustrative example , SoC 900 is included ment a multi - drop bus or other such architecture . in user equipment (UE ) . In one embodiment, UE refers to Embodiments may be implemented in many different any device to be used by an end -user to communicate , such system types . FIG . 8 is a block diagram of a SoC 800 in as a hand -held phone , smartphone , tablet, ultra - thin note accordance with an embodiment of the present disclosure . book , notebook with broadband adapter, or any other similar Dashed lined boxes are optional features on more advanced 65 communication device . Often a UE connects to a base SoCs. In FIG . 8 , an interconnect unit ( s ) 812 is coupled to : station or node , which potentially corresponds in nature to a an application processor 820 which includes a set of one or mobile station (MS ) in a GSM network . US 9 ,870 ,301 B2 13 14 Here, SOC 900 includes two cores — 906 and 907 . Cores number of micro ops for processing at the instruction 906 and 907 may conform to an Instruction Set Architecture , decoder 1028 . In another embodiment, an instruction can be such as an Intel® Architecture CoreTM -based processor, an stored within the microcode ROM 1032 should a number of Advanced Micro Devices, Inc . 
( AMD ) processor, a MIPS micro -ops be needed to accomplish the operation . The trace based processor, an ARM - based processor design , or a 5 cache 1030 refers to an entry point programmable logic customer thereof, as well as their licensees or adopters. array ( PLA ) to determine a correct micro - instruction pointer Cores 906 and 907 are coupled to cache control 908 that is for reading the micro - code sequences to complete one or associated with bus interface unit 909 and L2 cache 910 to more instructions in accordance with one embodiment from communicate with other parts of system 900 . Interconnect the micro -code ROM 1032 . After the microcode ROM 1032 910 includes an on - chip interconnect, such as an IOSF, 10 finishes sequencing micro - ops for an instruction , the front AMBA , or other interconnect discussed above, which poten - end 1001 of the machine resumes fetching micro -ops from tially implements one or more aspects of the described the trace cache 1030 . disclosure . In one embodiment, a microcode context and The out -of - order execution engine 1003 is where the aliased parameter passing logic may be included in cores instructions are prepared for execution . The out -of - order 906 , 907 . 15 execution logic has a number of buffers to smooth out and Interconnect 910 provides communication channels to the re - order the flow of instructions to optimize performance as other components , such as a Subscriber Identity Module they go down the pipeline and get scheduled for execution . ( SIM ) 930 to interface with a SIM card , a bootROM 935 to The allocator logic allocates the machine buffers and hold boot code for execution by cores 906 and 907 to resources that each uop needs in order to execute . The initialize and boot SoC 900 , a SDRAM controller 940 to 20 register renaming logic renames logic registers onto entries interface with external memory ( e . g . DRAM 960 ) , a flash in a register file . 
The allocator also allocates an entry for controller 945 to interface with non - volatile memory ( e . g . each uop in one of the two uop queues, one for memory Flash 965 ), a peripheral control 950 ( e. g . Serial Peripheral operations and one for non -memory operations, in front of Interface ) to interface with peripherals , video codecs 920 the instruction schedulers :memory scheduler, fast scheduler and Video interface 925 to display ( e . g ., via HDMI or DP 25 1002 , slow / general floating point scheduler 1004 , and connectors ) and receive input ( e . g . touch enabled input) , simple floating point scheduler 1006 . The uop schedulers GPU 915 to perform graphics related computations , etc . Any 1002 , 1004 , 1006 , determine when a uop is ready to execute of these interfaces may incorporate aspects of the disclosure based on the readiness of their dependent input register described herein . In addition , the system 900 illustrates operand sources and the availability of the execution peripherals for communication , such as a Bluetooth module 30 resources the uops need to complete their operation . The fast 970 , 3G modem 975 , GPS 980 , and Wi- Fi 985 . scheduler 1002 of one embodiment can schedule on each FIG . 10 is a block diagram of the micro - architecture for half of the main clock cycle while the other schedulers can a processor 1000 that includes logic circuits to perform only schedule once per main processor clock cycle . The instructions in accordance with one embodiment. The pro schedulers arbitrate for the dispatch ports to schedule uops cessor 1000 may be one example of the processing device 35 for execution . 110 , described above with respect to FIG . 1 . In some Register files 1008 , 1010 , sit between the schedulers embodiments , an instruction in accordance with one 1002 , 1004 , 1006 , and the execution units 1012 , 1014 , 1016 , embodiment can be implemented to operate on data ele - 1018 , 1020 , 1022 , 1024 in the execution block 1011 . 
There ments having sizes of byte , word , doubleword , quadword , is a separate register file 1008 , 1010 , for integer and floating etc ., as well as datatypes , such as single and double precision 40 point operations, respectively . Each register file 1008 , 1010 , integer and floating point datatypes . In one embodiment, the of one embodiment also includes a bypass network that can in - order front end 1001 is the part of the processor 1000 that bypass or forward just completed results that have not yet fetches instructions to be executed and prepares them to be been written into the register file to new dependent uops . The used later in the processor pipeline . The front end 1001 may integer register file 1008 and the floating point register file include several units . In one embodiment, the instruction 45 1010 are also capable of communicating data with the other. prefetcher 1026 fetches instructions from memory and feeds For one embodiment, the integer register file 1008 is split them to an instruction decoder 1028 which in turn decodes into two separate register files , one register file for the low or interprets them . For example , in one embodiment, the order 32 bits of data and a second register file for the high decoder decodes a received instruction into one or more order 32 bits of data . The floating point register file 1010 of operations called “ micro - instructions ” or “ micro - opera - 50 one embodiment has 128 bit wide entries because floating tions” ( also called micro op or uops ) that the machine can point instructions typically have operands from 64 to 128 execute . In other embodiments , the decoder parses the bits in width . instruction into an opcode and corresponding data and The execution block 1011 contains the execution units control fields that are used by the micro - architecture to 1012 , 1014 , 1016 , 1018 , 1020 , 1022 , 1024 , where the perform operations in accordance with one embodiment. 
In 55 instructions are actually executed . This section includes the one embodiment, the trace cache 1030 takes decoded uops register files 1008 , 1010 , that store the integer and floating and assembles them into program ordered sequences or point data operand values that the micro - instructions need to traces in the uop queue 1034 for execution . When the trace execute . The processor 1000 of one embodiment is com cache 1030 encounters a complex instruction , the microcode prised of a number of execution units : address generation ROM 1032 provides the uops needed to complete the 60 unit ( AGU ) 1012 , AGU 1014 , fast ALU 1016 , fast ALU operation . 1018 , slow ALU 1020 , floating point ALU 1022 , floating Some instructions are converted into a single micro - op , point move unit 1024 . For one embodiment, the floating whereas others need several micro - ops to complete the full point execution blocks 1022 , 1024 , execute floating point, operation . In one embodiment, if more than four micro -ops MMX , SIMD , and SSE , or other operations . The floating are needed to complete an instruction , the decoder 1028 65 point ALU 1022 of one embodiment includes a 64 bit by 64 accesses the microcode ROM 1032 to do the instruction . For b it floating point divider to execute divide , square root, and one embodiment, an instruction can be decoded into a small remainder micro - ops. For some embodiments , instructions US 9 ,870 ,301 B2 15 16 involving a floating point value may be handled with the one embodiment , floating point and integer data may be floating point hardware . In one embodiment , the ALU stored in different registers or the same registers . operations go to the high -speed ALU execution units 1016 , The following examples pertain to further embodiments . 1018 . 
The fast ALUs 1016, 1018 of one embodiment can execute fast operations with an effective latency of half a clock cycle. For one embodiment, most complex integer operations go to the slow ALU 1020 as the slow ALU 1020 includes integer execution hardware for long latency type of operations, such as a multiplier, shifts, flag logic, and branch processing. Memory load/store operations are executed by the AGUs 1012, 1014. For one embodiment, the integer ALUs 1016, 1018, 1020 are described in the context of performing integer operations on 64 bit data operands. In alternative embodiments, the ALUs 1016, 1018, 1020 can be implemented to support a variety of data bits including 16, 32, 128, 256, etc. Similarly, the floating point units 1022, 1024 can be implemented to support a range of operands having bits of various widths. For one embodiment, the floating point units 1022, 1024 can operate on 128 bits wide packed data operands in conjunction with SIMD and multimedia instructions.

In one embodiment, the uops schedulers 1002, 1004, 1006 dispatch dependent operations before the parent load has finished executing. As uops are speculatively scheduled and executed in processor 1000, the processor 1000 also includes logic to handle memory misses. If a data load misses in the data cache, there can be dependent operations in flight in the pipeline that have left the scheduler with temporarily incorrect data. A replay mechanism tracks and re-executes instructions that use incorrect data. Only the dependent operations need to be replayed and the independent ones are allowed to complete. The schedulers and replay mechanism of one embodiment of a processor are also designed to catch instruction sequences for text string comparison operations.

The term “registers” may refer to the on-board processor storage locations that are used as part of instructions to identify operands. In other words, registers may be those that are usable from the outside of the processor (from a programmer's perspective). However, the registers of an embodiment should not be limited in meaning to a particular type of circuit. Rather, a register of an embodiment is capable of storing and providing data, and performing the functions described herein. The registers described herein can be implemented by circuitry within a processor using any number of different techniques, such as dedicated physical registers, dynamically allocated physical registers using register renaming, combinations of dedicated and dynamically allocated physical registers, etc. In one embodiment, integer registers store thirty-two bit integer data. A register file of one embodiment also contains eight multimedia SIMD registers for packed data. For the discussions below, the registers are understood to be data registers designed to hold packed data, such as 64 bits wide MMX™ registers (also referred to as 'mm' registers in some instances) in microprocessors enabled with MMX technology from Intel Corporation of Santa Clara, Calif. These MMX registers, available in both integer and floating point forms, can operate with packed data elements that accompany SIMD and SSE instructions. Similarly, 128 bits wide XMM registers relating to SSE2, SSE3, SSE4, or beyond (referred to generically as “SSEx”) technology can also be used to hold such packed data operands. In one embodiment, in storing packed data and integer data, the registers do not need to differentiate between the two data types. In one embodiment, integer and floating point are either contained in the same register file or different register files. Furthermore, in

Example 1 is a processing device comprising 1) a debug port controller to monitor operations of the processing device to determine whether the processing device is operating in a first mode or a second mode and to collect trace information comprising operating characteristics of the processing device; 2) a display engine logic to process display data for output to a display device; and 3) a display engine interface, the display engine interface to provide, to a plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller and the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

In Example 2, the processing device of Example 1 can optionally include the first mode comprising a primary operation mode.

In Example 3, the processing device of Example 1 can optionally include the second mode comprising a debug operation mode.

In Example 4, the processing device of Example 1 can optionally include the plurality of existing platform connectors comprising digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

In Example 5, the processing device of Example 1 can optionally include the debug port controller accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value stored in the debug status register is received from a special debug tool coupled to the debug port controller through a platform communication port.

In Example 6, the processing device of Example 1 can optionally include the display interface comprising a multiplexer associated with each of the plurality of existing platform connectors, each multiplexer to receive at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and to select one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

In Example 7, the processing device of Example 1 can optionally include the trace information comprising at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

Example 8 is a method comprising: 1) monitoring, by a debug port controller, operations of a processing device to determine whether the processing device is operating in a first mode or a second mode; 2) collecting, by the debug port controller, trace information comprising operating characteristics of the processing device; 3) processing, by a display engine logic, display data for output to a display device; 4) providing, by a display engine interface to a plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller; and 5) providing, by the display engine interface to the plurality of existing platform connectors, the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

In Example 9, the method of Example 8 can optionally include the first mode comprising a primary operation mode.
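For illustration only, the mode-dependent routing described in Examples 1, 5, 6 and 8 can be sketched in software. This is a minimal behavioral model and not the patented hardware: the names `Mode`, `DebugStatusRegister`, `DebugPortController`, `connector_mux` and `drive_connectors` are hypothetical, and the encoding of the register value (0 for the first mode, 1 for the second mode) is an assumption that the text does not specify.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    PRIMARY = 0  # first mode: connectors carry display data (assumed encoding)
    DEBUG = 1    # second mode: connectors carry trace information (assumed encoding)


@dataclass
class DebugStatusRegister:
    """Hypothetical register; an external debug tool writes the mode value."""
    value: int = Mode.PRIMARY.value


@dataclass
class DebugPortController:
    """Hypothetical model of the debug port controller."""
    status: DebugStatusRegister
    trace_buffer: list = field(default_factory=list)

    def current_mode(self) -> Mode:
        # Determine the operating mode by reading the debug status register.
        return Mode(self.status.value)

    def collect_trace(self, record: str) -> None:
        # Collect trace information describing device operation.
        self.trace_buffer.append(record)

    def mux_select(self) -> int:
        # Control signal for the multiplexers: 0 = display data, 1 = trace.
        return self.current_mode().value


def connector_mux(select: int, display_in: str, trace_in: str) -> str:
    """One 2:1 multiplexer associated with one existing platform connector."""
    return trace_in if select == 1 else display_in


def drive_connectors(ctrl: DebugPortController, display_lanes, trace_lanes):
    """Apply the same control signal to the multiplexer of every connector."""
    select = ctrl.mux_select()
    return [connector_mux(select, d, t) for d, t in zip(display_lanes, trace_lanes)]
```

In this sketch, writing `Mode.DEBUG.value` into the register plays the role of the external debug tool, after which every connector outputs its trace input instead of its display input.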
In Example 10, the method of Example 8 can optionally include the second mode comprising a debug operation mode.

In Example 11, the method of Example 8 can optionally include the plurality of existing platform connectors comprising digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

In Example 12, the method of Example 8 can optionally include accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value stored in the debug status register is received from a special debug tool coupled to the debug port controller through a platform communication port.

In Example 13, the method of Example 8 can optionally include receiving, at a multiplexer associated with each of the plurality of existing platform connectors, at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and selecting one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

In Example 14, the method of Example 8 can optionally include the trace information comprising at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

Example 15 is a system comprising: 1) a computing platform comprising a plurality of existing platform connectors and a processing device, the processing device comprising: a) a debug port controller to monitor operations of the processing device to determine whether the processing device is operating in a first mode or a second mode and to collect trace information comprising operating characteristics of the processing device; b) a display engine logic to process display data for output to a display device; and c) a display engine interface, the display engine interface to provide, to the plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller and the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller; and 2) a trace box, coupled to the computing platform through the plurality of existing platform connectors, to receive and store the trace information from the debug port controller.

In Example 16, the system of Example 15 can optionally include the first mode comprising a primary operation mode, and the second mode comprising a debug operation mode.

In Example 17, the system of Example 15 can optionally include the plurality of existing platform connectors comprises digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

In Example 18, the system of Example 15 can optionally include a special debug tool coupled to the computing platform through a platform communication port, wherein to determine whether the processing device is operating in the first mode or the second mode, the debug port controller to access a debug status register storing a value to indicate one of the first mode or the second mode, and wherein the value stored in the debug status register is received from the special debug tool.

In Example 19, the system of Example 15 can optionally include the display interface comprising a multiplexer associated with each of the plurality of existing platform connectors, each multiplexer to receive at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and to select one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

In Example 20, the system of Example 15 can optionally include the trace information comprising at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

Example 21 is an apparatus comprising: 1) means for monitoring, by a debug port controller, operations of a processing device to determine whether the processing device is operating in a first mode or a second mode; 2) means for collecting, by the debug port controller, trace information comprising operating characteristics of the processing device; 3) means for processing, by a display engine logic, display data for output to a display device; 4) means for providing, by a display engine interface to a plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller; and 5) means for providing, by the display engine interface to the plurality of existing platform connectors, the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

In Example 22, the apparatus of Example 21 can optionally include the first mode comprising a primary operation mode.

In Example 23, the apparatus of Example 21 can optionally include the second mode comprising a debug operation mode.

In Example 24, the apparatus of Example 21 can optionally include the plurality of existing platform connectors comprising digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

In Example 25, the apparatus of Example 21 can optionally include means for accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value stored in the debug status register is received from a special debug tool coupled to the debug port controller through a platform communication port.

In Example 26, the apparatus of Example 21 can optionally include means for receiving, at a multiplexer associated with each of the plurality of existing platform connectors, at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and selecting one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

In Example 27, the apparatus of Example 21 can optionally include the trace information comprising at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

Example 28 is an apparatus comprising: 1) a memory; and 2) a computing system coupled to the memory, wherein the computing system is configured to perform the method of at least one of the claims 8-14.

In Example 29, the apparatus of Example 28 can optionally include the computing system comprising a processing device.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the operations. The required structure for a variety of these systems will appear from the description below. In addition, the present embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein.

The above description sets forth numerous specific details such as examples of specific systems, components, methods and so forth, in order to provide a good understanding of several embodiments. It will be apparent to one skilled in the art, however, that at least some embodiments may be practiced without these specific details. In other instances, well known components or methods are not described in detail or are presented in simple block diagram format in order to avoid unnecessarily obscuring the present embodiments. Thus, the specific details set forth above are merely exemplary. Particular embodiments may vary from these exemplary details and still be contemplated to be within the scope of the present embodiments.

It is to be understood that the above description is intended to be illustrative and not restrictive. Many other embodiments will be apparent to those of skill in the art upon reading and understanding the above description. The scope of the present embodiments should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

What is claimed is:

1. A processing device comprising:
    a debug port controller to monitor operations of the processing device to determine whether the processing device is operating in a first mode or a second mode by accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value is received from an external tool coupled to a platform communication port, and the debug port controller further to collect trace information comprising operating characteristics of the processing device;
    a display engine logic to process display data for output to a display device; and
    a display engine interface, the display engine interface to provide, to a plurality of existing platform connectors not including the platform communication port, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller and to provide, to the plurality of existing platform connectors not including the platform communication port, the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

2. The processing device of claim 1, wherein the first mode comprises a primary operation mode.

3. The processing device of claim 1, wherein the second mode comprises a debug operation mode.

4. The processing device of claim 1, wherein the plurality of existing platform connectors comprises digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

5. The processing device of claim 1, wherein the value stored in the debug status register is received from a debug tool coupled to the debug port controller through a platform communication port.

6. The processing device of claim 1, wherein the display interface comprises a multiplexer associated with each of the plurality of existing platform connectors, each multiplexer to receive at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and to select one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

7. The processing device of claim 1, wherein the trace information comprises at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

8. A method comprising:
    monitoring, by a debug port controller, operations of a processing device to determine whether the processing device is operating in a first mode or a second mode by accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value is received from an external tool coupled to a platform communication port;
    collecting, by the debug port controller, trace information comprising operating characteristics of the processing device;
    processing, by a display engine logic, display data for output to a display device;
    providing, by a display engine interface to a plurality of existing platform connectors, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller; and
    providing, by the display engine interface to the plurality of existing platform connectors not including the platform communication port, the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller.

9. The method of claim 8, wherein the first mode comprises a primary operation mode.

10. The method of claim 8, wherein the second mode comprises a debug operation mode.

11. The method of claim 8, wherein the plurality of existing platform connectors comprises digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

12. The method of claim 8, wherein the value stored in the debug status register is received from a debug tool coupled to the debug port controller through a platform communication port.

13. The method of claim 8, further comprising:
    receiving, at a multiplexer associated with each of the plurality of existing platform connectors, at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs; and
    selecting one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

14. The method of claim 8, wherein the trace information comprises at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

15. A system comprising:
    a computing platform comprising a plurality of existing platform connectors and a processing device, the processing device comprising:
        a debug port controller to monitor operations of the processing device to determine whether the processing device is operating in a first mode or a second mode by accessing a debug status register storing a value to indicate one of the first mode or the second mode, wherein the value is received from an external tool coupled to a platform communication port, and to collect trace information comprising operating characteristics of the processing device;
        a display engine logic to process display data for output to a display device; and
        a display engine interface, the display engine interface to provide, to the plurality of existing platform connectors not including the platform communication port, the display data from the display engine logic when the processing device is operating in the first mode as determined by the debug port controller and the trace information from the debug port controller when the processing device is operating in the second mode as determined by the debug port controller; and
    a trace box, coupled to the computing platform through the plurality of existing platform connectors, to receive and store the trace information from the debug port controller.

16. The system of claim 15, wherein the first mode comprises a primary operation mode, and wherein the second mode comprises a debug operation mode.

17. The system of claim 15, wherein the plurality of existing platform connectors comprises digital display interface connectors comprising at least one of a high-definition multimedia interface (HDMI) connector or a display port (DP) connector.

18. The system of claim 15, further comprising:
    a debug tool coupled to the computing platform through a platform communication port, wherein the value stored in the debug status register is received from the debug tool.

19. The system of claim 15, wherein the display interface comprises a multiplexer associated with each of the plurality of existing platform connectors, each multiplexer to receive at least a portion of the display data from the display engine logic and at least a portion of the trace information from the debug port controller as inputs and to select one of the inputs to output to the existing platform connector in response to a control signal received from the debug port controller.

20. The system of claim 15, wherein the trace information comprises at least one of power consumption data, protection ring data, memory access data, or processing device interconnect data.

* * * * *
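For illustration only, the trace box of the system claim, together with the four categories of trace information listed in the dependent claims, can be modeled as a simple store keyed by category. This is a sketch under stated assumptions, not the claimed apparatus: `TraceKind`, `TraceRecord` and `TraceBox` are hypothetical names, and the record layout (a category plus an opaque payload, tagged with the connector it arrived on) is invented here because the text does not define a trace format.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class TraceKind(Enum):
    # The four categories named in the text; the string values are arbitrary.
    POWER_CONSUMPTION = "power"
    PROTECTION_RING = "ring"
    MEMORY_ACCESS = "memory"
    INTERCONNECT = "interconnect"


@dataclass(frozen=True)
class TraceRecord:
    """Hypothetical trace record: a category and an opaque payload."""
    kind: TraceKind
    payload: bytes


class TraceBox:
    """Hypothetical model of a trace box that receives and stores trace data."""

    def __init__(self) -> None:
        self._store = defaultdict(list)

    def receive(self, connector_id: int, record: TraceRecord) -> None:
        # Store the record together with the connector it arrived on.
        self._store[record.kind].append((connector_id, record.payload))

    def records(self, kind: TraceKind) -> list:
        # Return a copy so callers cannot mutate the stored trace.
        return list(self._store.get(kind, []))
```

A caller would invoke `receive` once per record arriving on a platform connector while the processing device is in the second mode, and later query `records` by category.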