US 20130262910A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2013/0262910 A1 NATHAN (43) Pub. Date: Oct. 3, 2013

(54) TIME KEEPING IN UNKNOWN AND (52) US. Cl. UNSTABLE CLOCK ARCHITECTURE USPC ...... 713/502

(75) Inventor: Ofer NATHAN, Kiryat Yam (IL) (57) ABSTRACT

(73) Assignee: CORPORATION’ Santa Clara’ The present invention may provide a system With a ?xed CA (Us) clock to provide a ?xed clock signal, and a variable clock to _ provide a variable clock signal. The system may also include (21) Appl' No" 13/435992 a chipset With a chipset time stamp counter (TSC) based on (22) Filed. Man 30, 2012 the ?xed clock signal. A processor may include a fast counter that may be based on the variable clock signal and generate a Publication Classi?cation fast count value. A sloW counter may doWnload a time stamp value based on the chipset TSC at Wakeup. The sloW counter (51) Int. Cl. may be based on the ?xed clock signal and may generate a G06F 1/14 (2006.01) sloW count value. A central TSC may combine the fast count G06F 1/04 (2006.01) and sloW count value to generate a central TSC value.

CHIPSET PCHTS‘C VARIABLE FIXED CLOCK m 512 CLOCK (CRY CLK)

UNCORE @

RST

FAST COUNTER SLOW COUNTER w 2.7.2 Patent Application Publication Oct. 3, 2013 Sheet 1 0f 6 US 2013/0262910 A1

PROCESSOR ExEcuTIoN UNIT m m Pl-YCKED CACHE REGISTER FILE INSTRUCTION “24 m SET 109

| + PROCESSOR BUS ] 110i 3 E MEMORY GRAPHICS/ /\_J\ MEMORY /1_[\ INsTRuCTION VIDEO CARD 1_1& CONTROLLER HUB 1_18_ g; H L15 W DATA m E; LEGACY DATA STORAGE I/O CONTROLLER 24' USER INPUT INTERFACE

WIRELESS I/O TRANSCEIVER CI) CONTROLLER HUB 1_26 l) /\\‘:> SERIAL EXPANSION PORT

FLASH BIDS 4 h AUDIO ii \I—/ CONTROLLER @ NETWORK CONTROLLER 100 54‘ FIG. 1 Patent Application Publication Oct. 3, 2013 Sheet 2 0f 6 US 2013/0262910 A1

CHIPSET 210 VARIABLE FIXED CLOCK _ ~ ‘ I CRY CLOCK (CRY CLK) PCéLI1T2sC <_ CLK _2_3& w T _ ( / (P~ 4' CPU m RST l \ CLK FAST COUNTER SLOW COUNTER Fgi; F 2% L52 L NORM 2_5§ E V TO MAIN , _|_ 4 CORE \5 TSCm FIG. 2 E

”””” “ FAST CTR

SLOW CTR

-— TSC Patent Application Publication Oct. 3, 2013 Sheet 3 0f 6 US 2013/0262910 A1

WAKE FROM SLEEP Q

OPERATE FAST COUNTER DOWNLOAD PCHTSC £13 £4.

OPERATE SLOW COUNTER M

ADD SLOW AND FAST COUNTER 4_19

FIG. 4 Patent Application Publication Oct. 3, 2013 Sheet 4 0f 6 US 2013/0262910 A1

CHIPSET ‘ CRY VARIABLE FIXED CLOCK 519 PC5H1T2SC ‘—CLK CLOCK (CRY CLK) — E 5_40_ Q ~ +1 — /

¢ CPU@ UNCORE UNCORE CRY @ ) TSC ‘—C|_K Q62

+1 —

CORE 5_7Q

@E"5L6

RST l r H FAST COUNTER SLOWCOUNTER FEE; 5B 2 NORM # +1 J QOH’K v 58-2 57-4

TO MAIN ' + 1 CORE V TSC@ FIG. 5 @Q Patent Application Publication Oct. 3, 2013 Sheet 5 0f 6 US 2013/0262910 A1

FIG. 6

------" FAST CTR

.. “ SLOW CTR

TSC

N <—>-, Patent Application Publication Oct. 3, 2013 Sheet 6 0f 6 US 2013/0262910 A1

WAKE UNCORE FROM SLEEP E

DOWNLOAD PCH-TSC AND OPERATE UNCORE TSC E4

WAKE CORE FROM SLEEP LEE

MULTIPLY VARIABLE CLOCK WITH N FACTOR DOWNLOAD URT 708 m _

\

OPERATE FAST COUNTER OPERATE SLOW COUNTER 115 m

MULTIPLY SLOW COUNT WITH P FACTOR m

ADD SLOW AND FAST COUNTER H5

FIG. 7 US 2013/0262910 A1 Oct. 3, 2013

TIME KEEPING IN UNKNOWN AND time stamp value based on the chipset TSC at Wakeup. The UNSTABLE CLOCK ARCHITECTURE sloW counter may be based on the ?xed clock signal and may generate a sloW count value. A central TSC may combine the FIELD OF THE INVENTION fast count and sloW count value to generate a central TSC [0001] The present disclosure relates to processor time value. keeping and in particular to time-stamp counters. [0014] In the folloWing description, numerous speci?c details such as processing logic, processor types, micro-ar BACKGROUND chitectural conditions, events, enablement mechanisms, and the like are set forth in order to provide a more thorough [0002] Contemporary processor architectures de?ne a understanding of embodiments of the present invention. It time-stamp counter (TSC) mechanism to monitor and iden Will be appreciated, hoWever, by one skilled in the art that the tify relative time occurrence of processor events. A TSC invention may be practiced Without such speci?c details. typically counts the maximum number of guaranteed clock Additionally, some Well knoWn structures, circuits, and the ticks since the most recent (CPU) like have not been shoWn in detail to avoid unnecessarily reset. Hence, a TSC is advantageous for system timekeeping obscuring embodiments of the present invention. and for applications that check time frequently in their opera [0015] Although the folloWing embodiments are described tions such as operating systems. With reference to a processor, other embodiments are appli [0003] With the increasing use of TSCs, consistent and cable to other types of integrated circuits and logic devices. accurate TSC operations are becoming paramount. To Similar techniques and teachings of embodiments of the improve consistency and accuracy, some contemporary pro present invention can be applied to other types of circuits or cessors include a TSC in the chipset, from Which a counter in semiconductor devices that can bene?t from higher pipeline a coupled CPU core doWnloads a value at poWer transition throughput and improved performance. The teachings of times. For example, the CPU core can doWnload the chipset embodiments of the present invention are applicable to any TSC count value at Wake up times. processor or machine that performs data manipulations. [0004] HoWever, CPU counters generally operate on a HoWever, the present invention is not limited to processors or spread, variable (unstable) clock While chipsets generally machines that perform 512 bit, 256 bit, 128 bit, 64 bit, 32 bit, operate on a non-spread, ?xed (stable) clock. This difference leads to counting inconsistencies especially betWeen CPU or 16 bit data operations and can be applied to any processor and machine in Which manipulation or management of data is counters and other counters. For mo st applications, especially performed. In addition, the following description provides operating systems, this discrepancy and run-time changes of examples, and the accompanying draWings shoW various the TSC are not acceptable because the applications use the TSC to compute their respective operating frequencies. examples for the purposes of illustration. HoWever, these examples should not be construed in a limiting sense as they [0005] Therefore, the inventor recogniZed a need in the art are merely intended to provide examples of embodiments of for a more reliable time keeping technique in an unstable the present invention rather than to provide an exhaustive list clock environment. of all possible implementations of embodiments of the DESCRIPTION OF THE FIGURES present invention. [0016] Although the beloW examples describe instruction [0006] FIG. 1 illustrates a simpli?ed block diagram of a handling and distribution in the context of execution units and system according to an embodiment of the present invention. logic circuits, other embodiments of the present invention can [0007] FIG. 2 illustrates a simpli?ed block diagram of a be accomplished by Way of a data or instructions stored on a processor according to an embodiment of the present inven machine-readable, tangible medium, Which When performed tion. by a machine cause the machine to perform functions consis [0008] FIG. 3 illustrates exemplary counter signals accord tent With at least one embodiment of the invention. In one ing to an embodiment of the present invention. embodiment, functions associated With embodiments of the [0009] FIG. 4 is a simpli?ed process How of a time keeping present invention are embodied in machine-executable operation according to an embodiment of the present inven instructions. The instructions can be used to cause a general tion. purpose or special-purpose processor that is programmed [0010] FIG. 5 illustrates a simpli?ed block diagram of a With the instructions to perform the steps of the present inven processor according to an embodiment of the present inven tion. Embodiments of the present invention may be provided tion. as a computer program product or softWare Which may [0011] FIG. 6 illustrates exemplary counter signals accord include a machine or computer-readable medium having ing to an embodiment of the present invention. stored thereon instructions Which may be used to program a [0012] FIG. 7 is a simpli?ed process How of a time keeping computer (or other electronic devices) to perform one or more operation according to an embodiment of the present inven operations according to embodiments of the present inven tion. tion. Alternatively, steps of embodiments of the present invention might be performed by speci?c hardWare compo DETAILED DESCRIPTION nents that contain ?xed-function logic for performing the [0013] The present invention may provide a system With a steps, or by any combination of programmed computer com ?xed clock to provide a ?xed clock signal, and a variable ponents and ?xed-function hardWare components. clock to provide a variable clock signal. The system may also [0017] Instructions used to program logic to perform include a chipset With a chipset time stamp counter (TSC) embodiments of the invention can be stored Within a memory based on the ?xed clock signal.Aprocessor may include a fast in the system, such as DRAM, cache, ?ash memory, or other counter that may be based on the variable clock signal and storage. Furthermore, the instructions can be distributed via a generate a fast count value. A sloW counter may doWnload a netWork or by Way of other computer readable media. Thus a US 2013/0262910 A1 Oct. 3, 2013

machine-readable medium may include any mechanism for architecture, addressing modes, memory architecture, inter storing or transmitting information in a form readable by a rupt and exception handling, and external input and output machine (e.g., a computer), but is not limited to, ?oppy dis (I/O). kettes, optical disks, Compact Disc, Read-Only Memory [0021] In one embodiment, the instruction set architecture (CD-ROMs), and magneto-optical disks, Read-Only (ISA) may be implemented by one or more micro-architec Memory (ROMs), Random Access Memory (RAM), Eras tures, Which includes processor logic and circuits used to able Programmable Read-Only Memory (EPROM), Electri implement one or more instruction sets. Accordingly, proces cally Erasable Programmable Read-Only Memory (EE sors With different micro-architectures can share at least a PROM), magnetic or optical cards, ?ash memory, or a portion of a common instruction set. For example, Intel® tangible, machine-readable storage used in the transmission Pentium 4 processors, Intel® CoreTM processors, and proces of information over the Internet via electrical, optical, acous sors from Advanced Micro Devices, Inc. of Sunnyvale Calif. tical or other forms of propagated signals (e.g., carrier Waves, implement nearly identical versions of the x86 instruction set infrared signals, digital signals, etc.). Accordingly, the com (With some extensions that have been added With neWer ver puter-readable medium includes any type of tangible sions), but have different internal designs. Similarly, proces sors designed by other processor development companies, machine-readable medium suitable for storing or transmitting such as ARM Holdings, Ltd., MIPS, or their licensees or electronic instructions or information in a form readable by a adopters, may share at least a portion a common instruction machine (e.g., a computer). set, but may include different processor designs. For example, [0018] A design may go through various stages, from cre the same register architecture of the ISA may be implemented ation to simulation to fabrication. Data representing a design in different Ways in different micro-architectures using neW may represent the design in a number of manners. First, as is or Well-knoWn techniques, including dedicated physical reg useful in simulations, the hardWare may be represented using isters, one or more dynamically allocated physical registers a hardWare description language or another functional using a register renaming mechanism (e.g., the use of a Reg description language. Additionally, a circuit level model With ister Alias Table (RAT), a Reorder Buffer (ROB) and a retire logic and/ or transistor gates may be produced at some stages ment register ?le. In one embodiment, registers may include of the design process. Furthermore, most designs, at some one or more registers, register architectures, register ?les, or stage, reach a level of data representing the physical place other register sets that may or may not be addressable by a ment of various devices in the hardWare model. In the case softWare programmer. Where conventional semiconductor fabrication techniques [0022] In one embodiment, an instruction may include one are used, the data representing the hardWare model may be the or more instruction formats. In one embodiment, an instruc data specifying the presence or absence of various features on tion format may indicate various ?elds (number of bits, loca different mask layers for masks used to produce the integrated tion of bits, etc.) to specify, among other things, the operation circuit. In any representation of the design, the data may be to be performed and the operand(s) on Which that operation is stored in any form of a machine readable medium. A memory to be performed. Some instruction formats may be further or a magnetic or optical storage such as a disc may be the broken de?ned by instruction templates (or sub formats). For machine readable medium to store information transmitted example, the instruction templates of a given instruction for via optical or electrical Wave modulated or otherWise gener mat may be de?ned to have different sub sets of the instruction ated to transmit such information. When an electrical carrier format’s ?elds and/or de?ned to have a given ?eld interpreted Wave indicating or carrying the code or design is transmitted, differently. In one embodiment, an instruction is expressed to the extent that copying, buffering, or re-transmission of the using an instruction format (and, if de?ned, in a given one of electrical signal is performed, a neW copy is made. Thus, a the instruction templates of that instruction format) and speci communication provider or a netWork provider may store on ?es or indicates the operation and the operands upon Which a tangible, machine-readable medium, at least temporarily, an the operation Will operate. article, such as information encoded into a carrier Wave, [0023] Scienti?c, ?nancial, auto-vectoriZed general pur embodying techniques of embodiments of the present inven pose, RMS (recognition, mining, and synthesis), and visual tion. and multimedia applications (e.g., 2D/3D graphics, image processing, video compression/decompression, voice recog [0019] In modern processors, a number of different execu nition algorithms and audio manipulation) may require the tion units are used to process and execute a variety of code and same operation to be performed on a large number of data instructions. Not all instructions are created equal as some are items. In one embodiment, Single Instruction Multiple Data quicker to complete While others can take a number of clock cycles to complete. The faster the throughput of instructions, (SIMD) refers to a type of instruction that causes a processor to perform an operation on multiple data elements. SIMD the better the overall performance of the processor. Thus it technology may be used in processors that can logically Would be advantageous to have as many instructions execute divide the bits in a register into a number of ?xed-siZed or as fast as possible. HoWever, there are certain instructions that variable-siZed data elements, each of Which represents a sepa have greater complexity and require more in terms of execu rate value. For example, in one embodiment, the bits in a tion time and processor resources. For example, there are 64-bit register may be organiZed as a source operand contain ?oating point instructions, load/ store operations, data moves, ing four separate 16-bit data elements, each of Which repre etc. sents a separate l6-bit value. This type of data may be referred [0020] As more computer systems are used in intemet, text, to as ‘packed’ data type or ‘vector’ data type, and operands of and multimedia applications, additional processor support this data type are referred to as packed data operands or vector has been introduced over time. In one embodiment, an operands. In one embodiment, a packed data item or vector instruction set may be associated With one or more computer may be a sequence of packed data elements stored Within a architectures, including data types, instructions, register single register, and a packed data operand or a vector operand US 2013/0262910 A1 Oct. 3, 2013

may a source or destination operand of a SIMD instruction (or personal digital assistants (PDAs), and handheld PCs. ‘packed data instruction’ or a ‘vector instruction’). In one Embedded applications can include a micro controller, a digi embodiment, a SIMD instruction speci?es a single vector tal signal processor (DSP), system on a chip, netWork com operation to be performed on tWo source vector operands to puters (NetPC), set-top boxes, netWork hubs, Wide area net generate a destination vector operand (also referred to as a Work (WAN) sWitches, or any other system that can perform result vector operand) of the same or different siZe, With the one or more instructions in accordance With at least one same or different number of data elements, and in the same or embodiment. different data element order. [0028] FIG. 1A is a block diagram of a computer system [0024] SIMD technology, such as that employed by the 100 formed With a processor 102 that includes one or more Intel® CoreTM processors having an instruction set including execution units 108 to perform an algorithm to perform at x86, MMXTM, Streaming SIMD Extensions (SSE), SSE2, least one instruction in accordance With one embodiment of SSE3, SSE4.l, and SSE4.2 instructions, ARM processors, the present invention. One embodiment may be described in such as the ARM Cortex® family of processors having an the context of a single processor desktop or server system, but instruction set including the Vector Floating Point (V PP) alternative embodiments can be included in a multiprocessor and/ or NEON instructions, and MIPS processors, such as the system. System 100 is an example of a ‘hub’ system archi Loongson family of processors developed by the Institute of tecture. The computer system 100 includes a processor 102 to Computing Technology (ICT) of the Chinese Academy of process data signals. The processor 102 can be a complex Sciences, has enabled a signi?cant improvement in applica instruction set computer (CISC) , a reduced tion performance (CoreTM and MMXTM are registered trade instruction set computing (RISC) microprocessor, a very long marks or trademarks of Intel Corporation of Santa Clara, instruction Word (V LIW) microprocessor, a processor imple Calif.). menting a combination of instruction sets, or any other pro [0025] In one embodiment, destination and source regis cessor device, such as a digital signal processor, for example. ters/ data are generic terms to represent the source and desti The processor 102 is coupled to a processor bus 110 that can nation of the corresponding data or operation. In some transmit data signals betWeen the processor 102 and other embodiments, they may be implemented by registers, components in the system 100. The elements of system 100 memory, or other storage areas having other names or func perform their conventional functions that are Well knoWn to tions than those depicted. For example, in one embodiment, those familiar With the art. “DESTl” may be a temporary storage register or other stor [0029] In one embodiment, the processor 102 includes a age area, Whereas “SRC l” and “SRCZ” may be a ?rst and Level 1 (L1) internal cache memory 104. Depending on the second source storage register or other storage area, and so architecture, the processor 102 can have a single internal forth. In other embodiments, tWo or more of the SRC and cache or multiple levels of internal cache. Alternatively, in DEST storage areas may correspond to different data storage another embodiment, the cache memory can reside external elements Within the same storage area (e.g., a SIMD register). to the processor 102. Other embodiments can also include a In one embodiment, one of the source registers may also act combination of both internal and external caches depending as a destination register by, for example, Writing back the on the particular implementation and needs. Register ?le 106 result of an operation performed on the ?rst and second can store different types of data in various registers including source data to one of the tWo source registers serving as a integer registers, ?oating point registers, status registers, and destination registers. instruction pointer register. [0026] FIG. 1A is a block diagram of an exemplary com puter system formed With a processor that includes execution [0030] Execution unit 108, including logic to perform inte units to execute an instruction in accordance With one ger and ?oating point operations, also resides in the processor embodiment of the present invention. System 100 includes a 102. The processor 102 also includes a microcode (ucode) component, such as a processor 102 to employ execution ROM that stores microcode for certain macroinstructions. units including logic to perform algorithms for process data, For one embodiment, execution unit 108 includes logic to in accordance With the present invention, such as in the handle a packed instruction set 109. By including the packed embodiment described herein. System 100 is representative instruction set 109 in the instruction set of a general-purpose of processing systems based on the PENTIUM® III, PEN processor 102, along With associated circuitry to execute the TIUM® 4, XeonTM, Itanium®, XScaleTM and/or Stron instructions, the operations used by many multimedia appli gARMTM available from Intel Corporation cations may be performed using packed data in a general of Santa Clara, Calif., although other systems (including PCs purpose processor 102. Thus, many multimedia applications having other microprocessors, engineering Workstations, set can be accelerated and executed more ef?ciently by using the top boxes and the like) may also be used. In one embodiment, full Width of a processor’s data bus for performing operations sample system 100 may execute a version of the WIN on packed data. This can eliminate the need to transfer smaller DOWSTM operating system available from Microsoft Corpo units of data across the processor’ s data bus to perform one or ration of Redmond, Wash., although other operating systems more operations one data element at a time. (UNIX and Linux for example), embedded softWare, and/or [0031] Alternate embodiments of an execution unit 108 can graphical user interfaces, may also be used. Thus, embodi also be used in micro controllers, embedded processors, ments of the present invention are not limited to any speci?c graphics devices, DSPs, and other types of logic circuits. combination of hardWare circuitry and softWare. System 100 includes a memory 120. Memory 120 can be a [0027] Embodiments are not limited to computer systems. dynamic random access memory (DRAM) device, a static Alternative embodiments of the present invention can be used random access memory (SRAM) device, ?ash memory in other devices such as handheld devices and embedded device, or other memory device. Memory 120 can store applications. Some examples of handheld devices include instructions and/or data represented by data signals that can cellular phones, Internet Protocol devices, digital cameras, be executed by the processor 102. US 2013/0262910 A1 Oct. 3, 2013

[0032] A system logic chip 116 is coupled to the processor shoWn), or the like. The variable clock 230 may typically be bus 110 and memory 120. The system logic chip 116 in the operated at a faster frequency than the ?xed clock 240. For illustrated embodiment is a hub (MCH). example, the variable clock 230 may operate at 100 MHZ The processor 102 can communicate to the MCH 116 via a While the ?xed clock 240 may operate at 24 MHZ. The vari processor bus 110. The MCH 116 provides a high bandwidth able clock 230 may generate a variable clock signal. memory path 118 to memory 120 for instruction and data [0038] The PCH-TSC 212 in the chipset 210 may be based storage and for storage of graphics commands, data and tex on the ?xed clock 240 (i.e., crystal clock). The PCH-TSC 212 tures. The MCH 116 is to direct data signals betWeen the may provide time-stamp count values for the chipset and processor 102, memory 120, and other components in the provide a reference count value for the CPU 250. For system 100 and to bridge the data signals betWeen processor example, the PCH-TSC may be a 64 bit counter. The chipset bus 110, memory 120, and system I/O 122. In some embodi 210 may also include various other knoWn components (not ments, the system logic chip 116 can provide a graphics port shoWn) to perform algorithms to process data, in accordance for coupling to a graphics controller 112. The MCH 116 is With the present invention. coupled to memory 120 through a memory interface 118. The [0039] The CPU 250 may include core and uncore compo graphics card 112 is coupled to the MCH 116 through an nents. The CPU 550 may also include various other knoWn Accelerated Graphics Port (AGP) interconnect 114. components (not shoWn) to perform algorithms to process [0033] System 100 uses a proprietary hub interface bus 122 data, in accordance With the present invention. The CPU 250 to couple the MCH 116 to the I/O controller hub (ICH) 130. may receive the variable clock signal, the ?xed clock (crystal The ICH 130 provides direct connections to some I/ O devices clock) signal, and the PCH-TSC count value. In an embodi via a local I/O bus. The local I/O bus is a high-speed I/O bus ment, the CPU 250 may operate in a variety of different for connecting peripherals to the memory 120, chipset, and operational states such as P-state, C-state, and T-state. For processor 102. Some examples are the audio controller, ?rm example, the CPU 250 may operate in normal operational Ware hub (?ash BIOS) 128, Wireless transceiver 126, data states such as P-state, inpoWer savings operational states such storage 124, legacy I/O controller containing user input and as C-state, or in condition dependent state such as T-state for keyboard interfaces, a serial expansion port such as Universal temperature dependent states. C-state operations may be at Serial Bus (USB), and a netWork controller 134. The data loWer clock speeds for sleep mode and the like. T-state opera storage device 124 can comprise a hard disk drive, a ?oppy tions may be at loWer clock speeds for thermal reasons such as disk drive, a CD-ROM device, a ?ash memory device, or overheating and the like. The variable clock 230 may provide other mass storage device. the different clock speeds for the CPU based on the CPU’s [0034] For another embodiment of a system, an instruction operational state because the variable clock 230 may be in accordance With one embodiment can be used With a sys changed on the ?y as described above. tem on a chip. One embodiment of a system on a chip com [0040] The sloW counter 252 may be based on the crystal prises of a processor and a memory. The memory for one such clock and may doWnload the PCH-TSC value from the PCH system is a ?ash memory. The ?ash memory can be located on TSC 212 at operational state change times. For example, upon the same die as the processor and other system components. the CPU 250 Waking up from sleep mode, the sloW counter Additionally, other logic blocks such as a memory controller 252 may doWnload the PCH-TSC value since the chipset 210 or graphics controller can also be located on a system on a presumably Was in operation for at least a portion of the time chip. the CPU 250 Was in sleep mode. The PCH-TSC value may be [0035] FIG. 2 is a simpli?ed block diagram ofa processor a reference value for the sloW counter 252. 200 in accordance With an embodiment of the present inven [0041] The fast counter 252 may be based on the variable tion. The processor 200 may include a chipset 210, a variable clock 230. As described above, the variable clock 230 may be clock 230, a ?xed clock 240, and a CPU 250. The chipset 210 changing over time, and thus the fast counter’s operations may include a platform control hub time-stamp counter may be based on the variable clock’ s 230 changing frequency. (PCH-TSC) 212. The CPU 250 may include a sloW counter The fast counter 252 may also include a reset input. In an 252, a fast counter 254, a normalizing unit 256, an adder 258, embodiment, the reset input may be controlled by the crystal and a TSC 260. clock. For example, the fast counter 252 may reset at every [0036] The ?xed clock 240 may be provided as a crystal complete cycle of the ?xed clock 240. The normaliZing unit oscillator or may be provided as a direct or indirect connec 256 may normaliZe the fast counter 254 operations to ensure tion to a crystal oscillator. Hence, the ?xed clock 240 may be ?xed factor counting by the ?xed counter 254. In an embodi classi?ed as a stable, un-spread clock. The ?xed clock 240 ment, the fast counter 254, for example, may be an eight bit signal may be provided to various processor components in counter. the chipset 210 and CPU 250. In a client-side implementa [0042] The adder 258 may combine the sloW counter 252 tion, the ?xed clock 240, for example, may operate at 24 output and the fast counter 254 output. The combined value MHZ, and in a server-side implementation, the ?xed clock may be stored in the TSC 260. Of course, in an embodiment, 240, for example, may operate at 25 MHZ. The ?xed clock the combining of the sloW counter 252 and the fast counter 240 may generate a ?xed clock signal. 254 may be integrated into the TSC 260. The TSC 260 may [0037] The variable clock 230 may be provided as a spread store the CPU TSC value (the combination of the fast counter clock Where the clock frequency may be adjusted “on the ?y.” and sloW counter outputs). The CPU 250 may read the TSC Hence, the variable clock 230 may be classi?ed as an unstable 260 via a read instruction for various applications being clock. The variable clock 230 signal may be provided to the executed by the processor 200. CPU 250 to provide different frequencies based on opera [0043] FIG. 3 illustrates a graph of the sloW counter 252 tional states of the CPU 250. In an embodiment, the variable output, the fast counter 254 output, and the TSC 260 value in clock 230 may be derived from the ?xed clock 240 via fre accordance With an embodiment of the present invention. The quency multipliers (not shoWn), frequency dividers (not sloW counter output may correspond to clock cycles of the US 2013/0262910 A1 Oct. 3, 2013

crystal clock, Which may be a stable clock. The fast counter classi?ed as a stable, un-spread clock. The ?xed clock 540 output may correspond to clock cycles of the variable clock, signal may be provided to various processor components in Which may be an unstable clock, and the fast counter output the chipset 510 and CPU 550. In a client-side implementa may be reset at every clock cycle of the crystal clock. As tion, the ?xed clock 540, for example, may operate at 24 shoWn and described above, the TSC value may correspond to MHz, and in a server-side implementation, the ?xed clock the sum of the fast and sloW counter outputs. Thus, accuracy 540, for example, may operate at 25 MHz. The ?xed clock of the TSC 260 may be greatly increased as compared to 240 may generate a ?xed clock signal. conventional systems and may provide improved coherent [0050] The variable clock 530 may be provided as a spread counting betWeen components such as the chipset and the clock Where the clock frequency may be adjusted on the ?y. CPU. Moreover, TSC reads of the TSC 260 may provide more Hence, the variable clock 530 may be classi?ed as an unstable accurate monotonically increasing unique value Whenever clock. The variable clock 530 signal may be provided to the reads are executed. CPU 550 to provide different frequencies based on opera [0044] FIG. 4 illustrates a simpli?ed process How diagram tional states of the CPU 550. In an embodiment, the variable of a timekeeping operation as executed by the processor 200 clock 530 may be derived from the ?xed clock 540 via fre in accordance With an embodiment of the present invention. quency multipliers (not shoWn), frequency dividers (not In step 402, the CPU 250 may Wake from sleep mode or may shoWn), or the like. The variable clock 530 may typically transition from one operational state to another operational operate at a faster frequency than the ?xed clock 540. For state. If the CPU 250 Was previously in sleep mode, the sloW example, the variable clock 530 may operate at 100 MHz and fast counters 252, 254 presumably Were non-operational While the ?xed clock 540 may operate at 24 MHz. The vari during sleep mode. able clock 230 may generate a variable clock signal. [0045] In step 404, the sloW counter 252 may doWnload the [0051] The PCH-TSC 512 in the chipset 510 may be based PCH-TSC value from the PCH-TSC 212 since the PCH-TSC on the ?xed clock 240 (i.e., crystal clock). The PCH-TSC 512 212 Was presumably operating for at least a portion of time may provide time-stamp count values for the chipset and the CPU 250 Was in sleep mode. The doWnloaded PCH-TSC provide a reference count value for the CPU 550. For value may be a reference value for the sloW counter 252 to example, the PCH-TSC may be a 64 bit counter. The chipset correlate the CPU 250 and the chipset 210. With the doWn 510 may also include various other knoWn components (not loaded PCH-TSC value, the sloW counter 252 may start its shoWn) to perform algorithms to process data, in accordance counter based on the crystal clock in step 406. The sloW With the present invention. counter 252 may increment its count value by one at each [0052] The CPU 550 may include core and uncore sections cycle of the crystal clock. 560, 570. The CPU 550 may also include various other knoWn [0046] In step 408, Which may be in parallel With the sloW components (not shoWn) to perform algorithms to process counter operations, the fast counter 254 may begin its opera data, in accordance With the present invention. The CPU 550 tion upon CPU 250 Waking up too. The fast counter 252 may may receive the variable clock signal, the ?xed clock (crystal be based on the variable clock 230 and, thus, may increment clock) signal, and the PCH-TSC count value. In an embodi its count value (by 0, l, or more depending on the normalizing ment, the CPU 550 may operate in a variety of different unit 256) at each cycle of the variable clock 230. HoWever, at operational states such as P-state, C-state, and T-state. For every crystal clock cycle, the fast counter 254 may be reset. example, the CPU 550 may operate in normal operational Moreover, the fast counter value may be normalized by the states such as P-state, inpoWer savings operational states such normalizing unit 256. Normalizing may correspond to adjust as C-state, or in condition dependent state such as T-state for ing the fast count value to account for changes in frequency of temperature dependent states. C-state operations may be at the variable clock 230. For example, although the variable loWer clock speeds for sleep mode and the like. T-state opera clock frequency may be changed on the ?y, the fast count tions may be at loWer clock speeds for thermal reasons such as value may be normalized to reach a de?ned value at every overheating and the like. The variable clock 530 may provide crystal clock cycle. Thus, the fast count value may approach the different clock speeds for the CPU based on the CPU’s the de?ned value before the fast counter is reset at every operational state because the variable clock 530 may be crystal clock cycle. changed on the ?y as described above. Furthermore, the [0047] In step 410, the sloW count and fast count values uncore and core sections 560, 570 may operate in different may be combined to generate the TSC value. The TSC value operational states simultaneously. For example, the uncore may then be available for use by various components such as section 560 may Wake up from sleep mode ?rst While the core operating systems and applications. section 570 may still be in sleep mode. [0048] FIG. 5 is a simpli?ed block diagram ofa processor [0053] The uncore TSC 562 may be based on the crystal 500 in accordance With an embodiment of the present inven clock and may doWnload the PCH-TSC value from the PCH tion. The processor 500 may include a chipset 510, a variable TSC 512 at operational state change times. For example, upon clock 530, a ?xed clock 540, and a CPU 550. The chipset 510 the uncore section 560 of the CPU 550 Waking up from sleep may include a platform control hub time-stamp counter mode, the uncore TSC 562 may doWnload the PCH-TSC (PCH-TSC) 512. The CPU 550 may have an uncore section value since the chipset 510 presumably Was in operation for at 560 and a core section 570. The uncore section 560 may least a portion of the time the CPU 550 (uncore section 560) include an uncore TSC 562. The core section 570 may include Was in sleep mode. The PCH-TSC value may be a reference a sloW counter 572, a P multiplier 574, a N multiplier 576, a value for the sloW counter 252. The uncore TSC 562 may fast counter 578, a normalizing unit 580, an adder 582, and a generate an uncore reference timing (URT) value based on the TSC 590. PCH-TSC value and the crystal clock. [0049] The ?xed clock 540 may be provided as a crystal [0054] The sloW counter 572 may be based on the crystal oscillator or may be provided as a direct or indirect connec clock and may doWnload the URT value from the uncore TSC tion to a crystal oscillator. Hence, the ?xed clock 540 may be 562 at operational state change times. For example, upon the US 2013/0262910 A1 Oct. 3, 2013

core section 570 of the CPU 550 Waking up from sleep mode, the uncore section 560 Was in sleep mode. The doWnloaded the sloW counter 572 may download the URT value since the PCH-TSC value may be a reference value for the uncore TSC uncore section 560 presumably Woke from sleep mode before 562 to correlate the CPU 550 and the chipset 510. With the the core section 570. The URT value may be a reference value doWnloaded PCH-TSC value, the uncore TSC 562 may start for the sloW counter 572. The sloW counter 572 may generate its counter based on the crystal clock in. The uncore TSC 562 an core timing counter (CTC) value based on the URT value may increment its count value by one at each cycle of the and the crystal clock. The sloW counter 572 output may be crystal clock thereby generating the URT value. coupled to a multiplier 574, Which may multiply the sloW [0060] In step 706, the core section 570 of the CPU 550 count value by a P factor. The P factor may be dynamically may Wake from sleep mode or may transition from one opera programmable. tional state to another operational state. If the core section 570 [0055] The variable clock signal may be spread to the core Was previously in sleep mode, the sloW and fast counters 572, components via multiplier 576, Which may multiply the vari 578 presumably may have been non-operational during sleep able clock signal With a N factor. The N factor may be mode. dynamically programmable. The fast counter 578 may be [0061] In step 708, the sloW counter 572 may doWnload the based on the N* variable clock signal. As described above, the URT value from the uncore TSC 562 since the uncore TSC variable clock 530 may be changing over time, and thus the 562 presumably Woke up before the core section 570. With fast counter’s operations may be based on the variable clock’ s the doWnloaded URT value, the sloW counter 572 may start its changing frequency. The fast counter 578 may also include a counter based on the crystal clock in step 710. The sloW reset input. In an embodiment, the reset input may be con counter 252 may increment its count value by one at each trolled by the crystal clock. For example, the fast counter 578 cycle of the crystal clock thereby generating the CTC value. may reset at every complete cycle of the ?xed clock 540. The In step 712, the CTC value (sloW count value) may be multi normalizing unit 580 may normalize the fast counter 578 plied by the P factor via multiplier 574. The P factor may be operations to ensure ?xed factor counting by the ?xed counter programmed to a ?xed factor to ensure that the CTC value 578. In an embodiment, the fast counter 578, for example, step size is consistent (i.e., counting 1/24 for a 24 MHZ crystal may be an eight bit counter. clock). [0056] The adder 582 may combine the N multiplied sloW [0062] In step 714, Which may be in parallel With the sloW counter 572 output and the fast counter 578 output. The counter operations (steps 708-712), the variable clock signal combined value may be stored in the TSC 260. Of course, in may be multiplied by the N factor via multiplier 576. The N an embodiment, the combining of the N multiplied sloW factor may programmed based on the operational state of the counter 572 and the fast counter 574 may be integrated into core section and its desired operational frequency. the TSC 590. The TSC 590 may store the CPU TSC value (the [0063] In step 716, the fast counter 578 may begin its opera combination of the fast counter and sloW counter outputs). tion upon the core section 570 Waking up too. The fast counter The CPU 550 may read the TSC 590 via a read instruction for 578 may be based on the variable clock 530 (N multiplied various applications being executed by the processor 500. variable clock signal) and, thus, may increment its count [0057] FIG. 6 illustrates a graph of the sloW counter 572 value (by 0, l, or more depending on the normalizing unit output, the fast counter 574 output, and the TSC 590 value in 580) at each cycle of the variable clock 530. HoWever, at accordance With an embodiment of the present invention. The every crystal clock cycle, the fast counter 578 may be reset. sloW counter output may correspond to clock cycles of the Moreover, the fast counter value may be normalized by the crystal clock, Which may be a stable clock. The sloW counter normalizing unit 580. Normalizing may correspond to adjust output may also be scaled by P. The fast counter output may ing the fast count value to account for changes in frequency of correspond to clock cycles of the variable clock, Which may the variable clock 530. For example, although the variable be an unstable clock, and the fast counter output may be reset clock frequency may be changed on the ?y, the fast count at every clock cycle of the crystal clock. Further, the fast value may be normalized to reach a de?ned value at every counter clock may be scaled by factor N. As shoWn and crystal clock cycle. Thus, the fast count value may approach described above, the TSC value may correspond to the sum of the de?ne value before the fast counter is reset at every crystal the fast and sloW counter outputs. Thus, accuracy of the TSC clock cycle. 590 may be greatly increased as compared to conventional [0064] In step 718, the sloW count (P multiplied) and fast systems and may provide improved coherent counting count values may be combined to generate the TSC value. The betWeen components such as the chipset and the CPU. More TSC value may then be available for use by various compo over, TSC reads of the TSC 590 may provide more accurate nents such as operating systems and applications. monotonically increasing unique value Whenever reads are [0065] Embodiments of the present invention may be executed. implemented in a computer system. Embodiments of the [0058] FIG. 7 illustrates a simpli?ed process How diagram present invention may also be implemented in other devices of a timekeeping operation as executed by the processor 500 such as handheld devices and embedded applications. Some in accordance With an embodiment of the present invention. examples of handheld devices include cellular phones, Inter In step 702, the uncore section 560 of the CPU 550 may Wake net Protocol devices, digital cameras, personal digital assis from sleep mode or may transition from one operational state tants (PDAs), and handheld PCs. Embedded applications can to another operational state. If the entire CPU 550 Was pre include a micro controller, a digital signal processor (DSP), a viously in sleep mode, the uncore TSC 562 and the sloW and system on a chip, netWork computers (NetPC), set-top boxes, fast counters 572, 578 presumably Were non-operational dur netWork hubs, Wide area netWork (WAN) sWitches, or any ing sleep mode. other suitable system. [0059] In step 704, the uncore TSC 562 may doWnload the [0066] Embodiments may be implemented in code and may PCH-TSC value from the PCH-TSC 512 since the PCH-TSC be stored on a storage medium having stored thereon instruc 512 Was presumably operating for at least a portion of time tions Which can be used to program a system to perform the US 2013/0262910 A1 Oct. 3, 2013

instructions. The storage medium may include, but is not a variable clock signal input to receive a variable clock limited to, any type of disk including ?oppy disks, optical signal; disks, optical disks, solid state drives (SSDs), compact disk a fast counter con?gured to be based on the variable clock read-only memories (CD-ROMs), compact disk reWritables signal and to generate a fast count value; (CD-RWs), and magneto-optical disks, semiconductor a sloW counter con?gured to be based on the ?xed clock devices such as read-only memories (ROMs), random access signal, to doWnload a reference value based on a chipset memories (RAMs) such as dynamic random access memories time stamp counter (TSC) value, and to generate a sloW (DRAMs), static random access memories (SRAMs), eras count value; able programmable read-only memories (EPROMs), ?ash a TSC con?gured to combine the fast count value and the memories, electrically erasable programmable read-only sloW count value to generate a central TSC value. memories (EEPROMs), magnetic or optical cards, or any 8. The processor of claim 7, Wherein the fast counter is other type of media suitable for storing electronic instruc further con?gured to receive a reset signal based on the ?xed tions. clock signal. [0067] While the present invention has been described With 9. The processor of claim 7, further comprises a normal respect to a limited number of embodiments, those skilled in iZation unit coupled to the fast counter. the art Will appreciate numerous modi?cations and variations 10. The processor of claim 7, Wherein the fast counter, sloW therefrom. It is intended that the appended claims cover all counter, and central TSC are provided in a core section of the such modi?cations and variations as fall Within the true spirit CPU. and scope of this present invention. 11. The processor of claim 10, further comprises an uncore What is claimed is: TSC con?gured to provide the reference value to the sloW 1. A system, comprising: counter. a ?xed clock to provide a ?xed clock signal; 12. The processor of claim 1, further comprising at least a variable clock to provide a variable clock signal; one multiplier. a chipset to including a chipset time stamp counter (TSC) 13. A method, comprising: based on the ?xed clock signal; Waking a CPU from sleep mode; a processor, comprising doWnloading a chipset time stamp counter (TSC) value; a fast counter to be based on the variable clock signal and operating a sloW counter based on a ?xed clock signal and to generate a fast count value; also based on the chipset TSC value; a sloW counter to doWnload a time stamp value based on in parallel With the sloW counter operations, operating a the chipset TSC at Wakeup and based on the ?xed fast counter based on a variable clock signal; clock signal and to generate a sloW count value; and combining results of the sloW counter and the fast counter a central TSC to combine the fast count value and sloW to generate a TSC value. count value to generate a central TSC value. 14. The method of claim 13, Wherein operating the fast 2. The system of claim 1, Wherein the fast counter is to counter includes resetting the fast counter based on the ?xed receive a reset signal based on the ?xed clock signal. clock signal. 3. The system of claim 1, further comprises a normalization 15. The method of claim 13, Wherein operating the fast unit coupled to the fast counter. counter includes normalizing the fast counter result. 4. The system of claim 1, Wherein the fast counter, sloW 16. The method of claim 13, further comprising operating counter, and central TSC are provided in a central processing an uncore TSC based on the chipset TSC value. unit (CPU) core. 5. The system of claim 4, further comprises an uncore TSC 17. The method of claim 13, further comprising multiply coupled to the chipset TSC and con?gured to provide the time ing the variable clock signal by a factor and providing the factor multiplied variable clock signal to core components of stamp value to the sloW counter. 6. The system of claim 1, further comprising at least one the CPU. multiplier. 18. The method of claim 13, further comprising multiply 7. A processor, comprising: ing the sloW counter result by a factor. a ?xed clock signal input to receive a ?xed clock signal; * * * * *