(12) United States Patent (10) Patent No.: US 9,612,835 B2 Palanca Et Al

USOO961.2835B2 (12) United States Patent (10) Patent No.: US 9,612,835 B2 Palanca et al. (45) Date of Patent: *Apr. 4, 2017 (54) MFENCE AND LFENCE (58) Field of Classification Search MICRO-ARCHITECTURAL None IMPLEMENTATION METHOD AND SYSTEM See application file for complete search history. (75) Inventors: Salvador Palanca, Folsom, CA (US); (56) References Cited Stephen A. Fischer, Gold River, CA (US); Subramaniam Maiyuran, Gold U.S. PATENT DOCUMENTS River, CA (US); Shekoufeh Qawami, El Dorado Hills, CA (US) 5,265,233 A 11/1993 Frailong 5,636,374. A 6/1997 Rodgers et al. (73) Assignee: Intel Corporation, Santa Clara, CA (Continued) (US) FOREIGN PATENT DOCUMENTS (*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 WO WO-9708608 3, 1997 U.S.C. 154(b) by 0 days. This patent is Subject to a terminal dis OTHER PUBLICATIONS claimer. Alpha 21164 Microprocessor Hardware Reference Manual (Dec. (21) Appl. No.: 13/619,919 1998, pp. 1-435).* (Continued) (22) Filed: Sep. 14, 2012 Primary Examiner — George Giroux (65) Prior Publication Data (74) Attorney, Agent, or Firm — Vecchia Patent Agent, US 2013 FOO672OO A1 Mar. 14, 2013 LLC Related U.S. Application Data (57) ABSTRACT (63) Continuation of application No. 13/440,096, filed on A system and method for fencing memory accesses. Apr. 5, 2012, now Pat. No. 9,383,998, which is a Memory loads can be fenced, or all memory access can be (Continued) fenced. The system receives a fencing instruction that sepa (51) Int. Cl. rates memory access instructions into older accesses and G06F 5/00 (2006.01) newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer G06F 9/30 (2006.01) than the fencing instruction are stalled. The older access (Continued) instructions are gradually retired. When all older memory (52) U.S. Cl. accesses are retired, the fencing instruction is dispatched CPC ...... G06F 9/30145 (2013.01); G06F 9/30043 from the buffer. (2013.01); G06F 9/30047 (2013.01); (Continued) 24 Claims, 5 Drawing Sheets FUFETCHES ID DECOCES INTO ETRY IS STORE BUFFERS FENCE STORE DATA ALOCATED INT ACATED IN ACR0 STORE ADDRESSMFENCE RESERVATION EORY INSTRUCTION MICRO-OPERATIONS STATION ORDERING UNI 302 38 34 304 306 310 308 A-RETREVENT AL PREWOS SAL DISPATCHES DISPACHTOL INSERUCTIONS IN PROGRA FENCE OFALNSRUCTIONS CACHE CONTROLER ORDER RETIRED FROM THE READY TO FOONG EORY SBSYSTE SPATCH FENCEN ONSOREPOR PROGRA. ORDER ESPATCH AND RETIRE PREVIOUSNSTRUCTIONS OTSANDING ANY WRE BUFFERS (READ & WRITE COENING BUFFERSN IN CACHE CONTROLLER CACHE CONTROLLERON NOTEE GEOBAY ECTION PROCESS 326 EWCA 524 DEALLOCATEFENCE FROSTORE BUFFER IN MEMORY OUTSANDING WRE COMBINING ORDERING UNIT. CACHE CONTROLLERTREATSAFENCE ASA NOP. RETRENFENCE FROL CACHE CONTROLER. BUFFERS US 9,612,835 B2 Page 2 Related U.S. Application Data Bernstein et al. (Solutions and Debugging for Data Consistency in Multiprocessors with Noncoherent Caches, Feb. 1995, pp. continuation of application No. 10/654,573, filed on 83-103).* Sep. 2, 2003, now Pat. No. 8,171,261, which is a Alpha 21264 Microprocessor Hardware Reference Manual, continuation of application No. 10/194,531, filed on COMPAQ Computer Corporation, Order No. EC-RJRZA-TE, (Jul. Jul. 12, 2002, now Pat. No. 6,651,151, which is a 1999). continuation of application No. 09/475.363, filed on Goodman, "Cache Consistency and Sequential Consistency'. Dec. 30, 1999, now Pat. No. 6,678,810. (1989), pp. 1-4. Thakkar, Shreekant, “The Internet Streaming SIMD Extensions'. (51) Int. Cl. Intel Technology Journal, Retrieved on Nov. 11, 1999, Legal G06F 9/40 (2006.01) Information (C) 1999, pp. 1-8, from http://support.intel.com/technol G06F 9/38 (2006.01) ogy/it q21999/articles/art 1a.htm. (52) U.S. Cl. Advanced Micro Devices, Inc., “AMD-3D Technology Manual”. CPC ........ G06F 9/30087 (2013.01); G06F 9/3834 (Feb. 1998), pp. i-x, 1-58. Barad, Haim, et al., “Intel's Multimedia Architecture Extension', (2013.01); G06F 9/3836 (2013.01); G06F Nineteenth Convention of Electrical and Electronics Engineers in 9/3855 (2013.01); G06F 9/3857 (2013.01) Israel. (1996), pp. 148-151. Control Data Corporation, “Control Data 6400/6500/6600 Com (56) References Cited puter Systems Reference Manual”. Publication No. 601 00000, (1967), 159 pages. U.S. PATENT DOCUMENTS Convex Computer Corporation, “C4/XA Architecture Overview”. 5,675,724. A 10, 1997 Beal et al. Convex Technical Marketing, (Feb. 1994), 279 pages. 5,694,553 A * 12/1997 Abramson et al. ............. 71.2/23 Intel Corporation, “i860 Microprocessor Family Programmer's Ref 5,694,574 A 12/1997 Abramson et al. erence Manual”. (1992), Ch. 1, 3, 8 & 12. 5,724,536 A 3, 1998 Abramson et al. Intel Corporation, “Intel 80386 Programmer's Reference Manual”. 5,751,996 A 5, 1998 Glew et al. (1986), 421 pages. 5,778,245 A 7/1998 Papworth et al. Intel Corporation, "Pentium Processor User's Manual”, vol. 3: 5,790,398 A 8, 1998 Horie Architecture and Programming Manual. (1993), Ch. 1, 3-4, 6, 8 & 5,802,575 A 9/1998 Greenley et al. 18. 5,802,757 A 9, 1998 Duval et al. Kohn, L., et al., “The Visual Instruction Set (VIS) in UltraSPARC, 5,826, 109 A 10, 1998 Abramson et al. SPARC Technology Business-Sun Microsystems, Inc.(1995), pp. 5,860,126 A 1, 1999 Mittal 462-469. 5,881,262 A 3, 1999 Abramson et al. 5,898,854 A 4, 1999 Abramson et al. Lawrence Livermore Laboratory, "S-1 Uniprocessor Architecture'. 5,903,740 A 5, 1999 Walker et al. Apr. 21, 1983, 386 pages. 6,006,325 A * 12/1999 Burger et al. ................. T12/214 Lawrence Livermore Laboratory, “vol. I: Architecture-The 1979 6,038,646 A 3/2000 Sproull Annual Report-The S-1 Project', (1979), 443 pages. 6,047,334 A 4/2000 Langendorf et al. Lawrence Livermore Laboratory, “vol. II: Hardware-The 1979 6,073,210 A 6, 2000 Palanca et al. Annual Report-The S-1 Project', (1979), 366 pages. 6,088,771 A 7/2000 Steely, Jr. et al. Motorola, Inc., “MC88110 Second Generation RISC Microproces 6,088,772 A 7/2000 Harrimon et al. sor User's Manual”, MC811OUM/AD, (1991), 619 pages. 6,148,394 A 11/2000 Tung et al. Philips Electronics, “TriMedia TM1000 Preliminary Data Book”, 6,189,089 B1 2/2001 Walker et al. (1997), 496 pages. 6,216.215 B1 4/2001 Palanca et al. Samsung Electronics, “21164 Alpha Microprocessor Data Sheet”, 6,223,258 B1 4/2001 Palanca et al. 6,233,657 B1 5/2001 Ramagopal et al. (1997), 121 pages. 6,266,767 B1 7/2001 Feiste et al. Shipnes, J., “Graphics Processing with the 88.110 RISC Micropro 6,286,095 B1 9, 2001 Morris et al. cessor", IEEE, (1992), pp. 169-174. 6,356,270 B2 3, 2002 Pentkovski et al. Sun Microsystems, Inc., “VIS Visual Instruction Set User's 6,546,462 B1 4/2003 Palanca et al. Manual”. Part #805-1394-01, (Jul 1997), pp. i-xii. pp. 1-136. 6,636,950 B1 10/2003 Mithal et al. Sun Microsystems, Inc., “Visual Instruction Set (VIS) User's 6,651,151 B2 11/2003 Palanca et al. Guide'. Version 1.1. (Mar. 1997), pp. ixii, pp. 1-127. 6,678,810 B1 1/2004 Palanca et al. Texas Instruments, “TMS320C2X User's Guide', (1993), pp. 3:2- 6,708,269 B1 3, 2004 Tiruvallur et al. 3:11; 3:28-3:34; 4:1-4:22; 4:41; 4:103; 4:119-4:120; 4:122; 4:150 6,754,751 B1 6, 2004 Wilke 4:151. 6,862,679 B2 3/2005 Hacking et al. Texas Instruments, “TMS320C80 (MVP) Master Processor User's Guide”. (1995), 595 pages. OTHER PUBLICATIONS Texas Instruments, “TMS320C80 (MVP) Parallel Processor User's Guide”. (1995), 705 pages. Weaver et al. (The SPARC Architecture Manual: Version 9, Pub lished 1994, pp. 1-369).* * cited by examiner U.S. Patent Apr. 4, 2017 Sheet 1 of 5 US 9,612,835 B2 102 MACRO-INSTRUCTIONS: E. MFENCE, FENCE, 104 MASKMOVO, PREFETCH. NSTRUCTION FETCH UNIT NSTRUCTION DECODER UNIT MICRO-OPERATIONS:LE, STORE ADDRESSMFENCE UOP & STORE DATA MFENCE UOP, LFENCE UOP, MASKED WRITE ADDRESS UOP MASKED WRITE DATA UOP, LOAD HINT UOP. 108 106 RESERVATION STATION STORE PORT 10 REORDER BUFFER LOAD PORT: FENCE UOP, MFENCEUOPS. REGISTER FLE LOAD HINTUOP, - MEMORY ORDERING CACHEABLE 112 LOAD UNIT LOAD HIT (TO BUFFER RECISTER FILE) UNCACHEABLE LOADS CACHEABLE LOADS AND STORES CACHEABLE CACHEABLE STORE LOAD MISS LOAD HTS CACHEABLE UNCACHEABLE STORE MISS STORES L1 CACHE ? RITE COMBINING WCSTORES BUFFERS (WEAKLY WRITE ORDERED) BUFFERS UNCACHEABLE REQUESTS AND CACHEABLE MISSES UNCACHEABLE CACHEABLE TO IN-ORDER Y REQUESTS/ OUT OF ORDERY REQUESTS L2 CACHE FSB QUEUE o QUEUE11a. 150 ( Bus CONTROLER CCHAELEMSSES ) 142 144 146 140 FIG.1 U.S. Patent US 9,612,835 B2 U.S. Patent Apr. 4, 2017 Sheet 3 of 5 US 9,612,835 B2 909) SEHOIHHfliji U.S. Patent Apr. 4, 2017 Sheet 4 of 5 US 9,612,835 B2 807 ZZ; U.S. Patent Apr. 4, 2017 Sheet 5 of 5 US 9,612,835 B2 FU fetches E) decodes into "fence", Entries are LFENCE and and "store data" & allocated into MFENCE in acro "store address infence" reservation instructions micro-operations station 502 MFENCE infrediately after FENCE FENCE stals dispatch of all Load and store FENCE stalls dispatch buffer entries are ready to subsequent Ticao-operations, of MFENCE in program order allocated in memory dispatch ordering unit SO A picvious loads in program order retired from the memory subsyster, and load buffer tail pointer points to FENCE? Dispatch and retire YES previous pads “At-retirement" 518 dispatch to Ll cache controller A read buffers in L.

Load more