Enabling Retargetable Optimizing Compilers for Quantum Accelerators Via a Multi-Level Intermediate Representation

Home , Optimizing compiler

arXiv:2109.00506v1 [quant-ph] 1 Sep 2021 o oeigadsmlto ak nfilssc snuclear [ as learning such machine fields and in scalabilit tasks chemistry, computational simulation physics, enhance and to modeling potential for the has flows ffdrlysosrdrsac nacrac ihteDEP DOE the with State accordance United in (http://energy.gov/downloads/doe-public-access for research Plan. so, sponsored acce do public federally provide to will of Energy others of allow Department The or purposes. manuscript, reproduc this or publish of to license no world-wide a irrevocable, retains Government up, arti States the United accepting the that by acknowledges publisher, the U and The Energy. retains of Government Department U.S. the with DE-AC05-00OR22725 nbigmr uisa oe ro ae n a expect can one — rates — error improve lower and at scale qubits to more continue enabling architectures hardware As savhcefrrpdqatmcmie rttpn enablin interoperabili prototyping approaches. and compiler compilation work optimizations, quantum classical this rapid integration, see operation for we language entangling vehicle Ultimately, in noise. a reduction program as 10x of a source in common result p than programming that standard faster terns provi via 1000x optimizations compiler resource Our are quantum compilers. that language compar quantum times than faster standalone 5-10x compile and approaches, in Pythonic standard results availabl work efforts compiler our compari and language In of standalone cases. other test demon- execution to sub and and benchmarks and standard rewriting passes, compilation, via approach programmability, pattern and optimization the LLVM quantum MLIR strate standard both the and provide via system We optimizations expressions IR. language classical machine-level progressi quantum LLVM unique map Represen the its to Intermediate leverages capabilities Multi-level and lowering framework the (MLIR) upon tation syntax. builds improved gate- work and entire for patterns Our extensions programming the custom quantum provide support and common language We 3 ou OpenQASM OpenQASM the specification. based leverage of 3 language version we for compiler programmingquantum architecture, a enable our to quantum IR proposed demonstrate available To for languages. compiler reta optimizing, an ahead-of-time enables that (IR) representation diate unu iuain rgamn languages programming simulation, quantum hsmnsrp a enatoe yU-atle L under LLC UT-Battelle, by authored been has manuscript This unu ceeaino xsigsiniccmuigwork computing scientific existing of acceleration Quantum Terms Index Abstract nbigRtreal piiigCmiesfor Compilers Optimizing Retargetable Enabling W rsn ut-ee unu-lsia interme- quantum-classical multi-level a present —We unu ceeaosvaaMulti-Level a via Accelerators Quantum qatmcmuig unu programming, quantum computing, —quantum a ig ainlLaboratory National Ridge Oak a ig,T,Uie States United TN, Ridge, Oak .I I. unu cec Center Science Quantum NTRODUCTION [email protected] he Nguyen Thien nemdaeRepresentation Intermediate -plan). h ulse form published the e 10 l o publication, for cle -xlsv,paid- n-exclusive, st hs results these to ss ,[ ], Government s bi Access ublic otatNo. Contract 12 ie States nited rgetable, ,[ ], ywith ty today, e ative 22 ,a s, our son des at- ve to ]. y g r - - - 1 uha #[ bridge Q# langua to stand-alone as begun with such community, have research the approaches in language gap this such of Re languages. number and a libraries, runtimes, integration classical tight existing enable that approaches language tow formant etc.) (Python, i languages embedded classical frameworks application-level and structures vi data construction vendor-provided circuit manual from community programming nerto fCUadQUrsucs[ tighter resources QPU toward and move CPU to of integration models machine quantum-classical pcfial,teMI rmwr [ framework multi-l MLIR classical the Specifically, state-of-the-art opport these leverage an to represents nity languages as expressions specific program domain quantum stand-alone Treating LLVM. the machine- like to IRs language-lev lowering automated and retain transformations, optimizations, that languages for specific development main compiler robust representation executa enables intermediate This low-level of abstraction. hierarchy to a via representations code itself object high-level workflows language lowering take source progressive associated the that with to tandem abstract close in of those — variety including a — at levels representations compiler enabling no of classi- of introduction improve result the to One is done techniques. work and frameworks of compilation wealth cal a been has there ment, unu sebylnugswt otu oto o struc- flow control true no with languages assembly quantum an IR language-level capabilities. extensible IR-lowering its progressive quant via of prototyping compilers rapid language the for resource unique a acceler provide classically for [ today workflows use heterogeneous in IR multi-level popular [ [ optimization lang niques compiler classical classical and com- high-level quantum and enable integration, language that robust approaches from pilation benefit to stand architectures is[ gies 15 nprle oqatmpormigrsac n develop- and research programming quantum to parallel In ercnl eosrtdteuiiyo LRfrsimple for MLIR of utility the demonstrated recently We ,adcascllnug xesoslk CR[ QCOR like extensions language classical and ], 29 ,[ ], 25 a ig ainlLaboratory National Ridge Oak a ig,T,Uie States United TN, Ridge, Oak ,adcmie-uoae ici ytei strate- synthesis circuit compiler-automated and ], 32 unu cec Center Science Quantum 31 lxne McCaskey Alexander .Teei rtclne omv h quantum the move to need critical a is There ]. [email protected] ,OeQS . [ 3.0 OpenQASM ], ut-ee nemdaerepresentations intermediate multi-level 11 ,[ ], 16 ,adi elpstoe to well-positioned is and ], 3 9 ,[ ], [ ], 17 13 6 sa xml fa of example an is ] ,Sl [ Silq ], ,[ ], 8 ,[ ], 7 18 ,Scaffold ], 20 vlIRs. evel .These ]. r per- ard ,[ ], cently, tech- level uage with (IR) ated 23 do- ges um ion ble u- te el ]. n d a OpenQASM 2 OpenQASM 3 ... 1 %Array = type opaque 2 %Qubit = type opaque

OpenQASM 2 Parser OpenQASM 3 Parser ... 3 4 declare void @__quantum__qis__cnot(%Qubit* %0, →֒ MLIR Target %Qubit* %1) 5 declare void @__quantum__qis__h(%Qubit* %0) Quantum Affine SCF Std 6 7 declare void (QIR Specification LLVM Dialect - LLVM IR ֒→ @__quantum__rt__qubit_release_array(%Array* %0 8 declare i8* )Binary Executable ֒→ @__quantum__rt__array_get_element_ptr_1d (QCOR Runtime Linkage: Classical Compute Codegen: ֒→ %Array %0, i64 %1 NISQ / FTQC x86, ARM, PowerPC, etc. * 9 declare %Array Accelerator Host * (quantum__rt__qubit_allocate_array(i64 %0__@ →֒ 10 Fig. 1: QCOR’s MLIR-based compilation stack for CPU- 11 define i32 @ghz() { QPU heterogeneous computing. Each quantum programming 12 %4 = call %Array* (language is processed by a dedicated parser that produces ֒→ @__quantum__rt__qubit_allocate_array(i64 3 13 %5 = call i8* )the AST of the input source code. AST of different source ֒→ @__quantum__rt__array_get_element_ptr_1d (languages is all mapped to an MLIR representation expressed ֒→ %Array* %4, i64 0 in the QCOR quantum dialect and built-in Standard, Affine, 14 %6 = bitcast i8* %5 to %Qubit** 15 %7 = load %Qubit , %Qubit %6, align 8 and SCF dialects. This MLIR representation is progressively * ** 16 call void @__quantum__qis__h(%Qubit* %7) (multi-stage) transformed (optimization and dialect conver- 17 %8 = call i8* )sion/lowering) until only operations in the LLVM dialect ֒→ @__quantum__rt__array_get_element_ptr_1d (Array %4, i64 1% →֒ remain, i.e., quantum operations are converted to LLVM * 18 %9 = bitcast i8* %8 to %Qubit** function calls adhering to the QIR specification. This guar- 19 %10 = load %Qubit*, %Qubit** %9, align 8 antees that the final binary executable is compatible with 20 call void @__quantum__qis__cnot(%Qubit* %7, (Qubit %10% →֒ any QIR-conformed runtime implementations provided at link * 21 %11 = call i8* )time, such as qcor runtime supporting both remotely hosted ֒→ @__quantum__rt__array_get_element_ptr_1d (NISQ) and tightly coupled (FTQC) execution models. ֒→ %Array* %4, i64 2) 22 %12 = bitcast i8* %11 to %Qubit** 23 %13 = load %Qubit*, %Qubit** %12, align 8 24 call void @__quantum__qis__cnot(%Qubit* %10, (tures [19] — a low level MLIR dialect for quantum computing. ֒→ %Qubit* %13 In this work, we leverage and extend that simple quantum 25 call void *quantum__rt__qubit_release_array(%Array__@ →֒ (MLIR extension for a more complex quantum language — ֒→ %4 OpenQASM version 3.0 (henceforth referred to as Open- 26 call void @__quantum__rt__finalize() QASM, unless stated otherwise), which provides robust con- 27 ret i32 0 28 } trol flow structures, variable declaration and assignment, and novel syntax for quantum circuit generation and synthesis [1]. Fig. 2: A quantum program generating the Greenberger-Horne- Our approach enables an optimizing compiler for OpenQASM Zeilinger (GHZ) state on 3 qubits represented in the LLVM that compiles to the LLVM machine-level IR adherent to IR adherent to the QIR specification. the recently introduced Quantum Intermediate Representation (QIR) specification [2],[5]. Moreover, this approach need not be limited to OpenQASM — the implementation pattern execution. This work extends the qcor executable to stand- shown in this work can serve as a robust mechanism for further alone OpenQASM source files, and enables their compilation quantum language compiler prototyping and deployment. The and execution in a retargetable (quantum hardware-agnostic) work presented here puts forward the requisite infrastructure fashion. for quantum language expression and LLVM IR generation, leaving future language compiler implementations as a matter II. BACKGROUND of providing the mapping of a language abstract syntax tree (AST) to our quantum MLIR extension (via ANTLR [28], Figure 1 demonstrates the hierarchy of layers underlying our for instance). Language lowering to executable code is then compiler architecture. Ultimately, we take quantum-classical readily available. languages down to an MLIR representation before emitting We integrate our approach with the qcor compiler platform standard object code via lowering to the LLVM IR (we [4],[20] (note we use QCOR to denote the language extension leverage the LLVM ecosystem of executables to map an specification [27] and qcor for the compiler implementation). IR bitcode representation to assembly and executable code). qcor enables single-source quantum-classical programming Here we seek to provide necessary background on the QIR in both C++ and Python, promoting an ahead-of-time C++ specification, the MLIR framework, and the qcor compiler compiler executable and just-in-time compilation infrastruc- frontend and quantum runtime library to set up the presentation ture for performant quantum-classical code generation and of the rest of the compiler architecture.

2 A. Quantum Intermediate Representation translations from operations in language-level dialects to op- Recent work has resulted in the development of a for- erations in a machine-level dialects like the LLVM IR. This mal specification for a Quantum Intermediate Representation infrastructure and corresponding workflow provide a flexible (QIR) embedded in the LLVM IR [5]. This specification does architecture for the development of compilation pathways not extend the LLVM IR with new instructions pertinent to taking language-level syntax trees down to a machine-level quantum computing, instead, it expresses quantum specific op- IR, like the LLVM. erations as declared function calls on opaque data types. Func- C. qcor tion declarations and corresponding signatures are defined for quantum memory allocation and deallocation, individual qubit qcor provides C++ [20] and Python [26] language exten- addressing, quantum instruction invocation on allocated qubits, sions for heterogeneous quantum-classical computing in an and utility functions for array management, tuple creation, and effort to promote native quantum kernel programming in a callable invocation. By describing qubits, measurement results, single-source context. Critically, qcor puts forward a com- and arrays as opaque types, and promoting function decla- piler runtime library that enables quantum program execution rations over concrete implementations, the QIR specification in a multi-modal, retargetable fashion. The execution model promotes a flexible approach to quantum-classical compiler enables two mode types of quantum instruction invocation architecture and integration. The approach represents a novel — nisq and ftqc modes (see QCOR Runtime Linkage target for language compilers — all of the LLVM toolchain in Figure 1). nisq mode supports runtime-level quantum becomes available, including runtime linking, compile time instruction queueing and flushing upon exit of a quantum classical optimizations, and external language interoperability. kernel function, implying a full quantum circuit submission Figure 2 demonstrates a LLVM IR code snippet adherent to on a remotely hosted quantum computer. ftqc mode models the QIR specification for the generation of a Greenberger- a tightly-coupled CPU-QPU integration model, and quantum Horne-Zeilinger (GHZ) maximally entangle state. Note the instructions are instead streamed as they are invoked, enabling declaration of opaque Array and Qubit types (lines 1,2) features like fast-feeback on qubit measurement results. The and the externally declared QIR runtime functions (lines 4-10). qcor runtime ultimately delegates to the XACC framework These are left for implementation by appropriate QIR runtime [21], and supports remote execution on IBM, Rigetti, IonQ, libraries, affecting actual execution of quantum instructions, and Honeywell quantum processors, and has support for array handling (including arrays of Qubit instances), and various simulators for the ftqc mode of execution. It is this data type actualization. The body of the code consists of runtime that plays a critical role in this work, as it providesa standard LLVM instructions (bitcast, call, etc.) and calls target for our QIR runtime library implementation. Moreover, to the declared QIR runtime functions. These functions are we have provided an entrypoint to OpenQASM compilation concretely provided at link time via the runtime library. natively as part of the qcor command line executable. B. MLIR III. EXTENDING OPENQASM 3 Moving up the IR abstraction hierarchy in Figure 1, recent OpenQASM version 3 has recently been put forward as a developments in classical compilation research and develop- formal specification [13], and a extended Bachus-Naur form ment has resulted in the MLIR [17]. The MLIR represents a description of the language has been made public as an modular and extensible approach to defining custom compiler ANTLR grammar file. This language departs from the previous IRs that can express a spectrum of language abstraction version (version 2.0) in the introduction of classical control (language-level IR down to the machine-level LLVM IR). At flow and variable declarations, making version 3 much more its core, the framework puts forward the concept of a language friendly to hybrid quantum-classical programming. The lan- dialect which is composed of language-specific operations. guage provides standard quantum instruction calls, but enables These operations are the core abstractional unit in the MLIR, more complex quantum circuit synthesis via ctrl, adj, and and they model a unique mapping of operands to return values, pow quantum gate modifiers. Standard while and range- and can optionally carry a dictionary of compile-time metadata based for loops are also allowed, as well as the conditional (attributes). Operands and return values are modeled as an if-else block. mlir::Value type, which describes a computable, typed While our work seeks to ensure that our compiler implemen- value and its corresponding set of users (enabling one to tation is fully compatible with the base grammar specification construct standard use-define chains). The creation of dialects for OpenQASM, we also are in a unique position to enhance it provides a mechanism for mapping language ASTs to a corre- with features pertinent to the qcor compiler platform and its sponding MLIR operation tree composed of language-dialect- user base. We envision the language and compiler presented in specific operations alongside other utility dialect operations this work as a novel language extension for the qcor quantum (standard function calls, memory references and allocation, for kernel programming model, i.e. enabling users of qcor to loops, conditional statements, etc.). Moreover, the framework program quantum kernels using our extended OpenQASM puts forward a general progressive lowering capability — language. We seek extensions that (while remaining backwards incremental translation of higher dialect operations to lower compatible) enable a more C-like syntax, C-like primitive type level dialect operations. This enables one to define custom declarations, and common quantum programming patterns

3 (a) Compute-Action ANTLR Grammar Addition. architecture lowers the MLIR down to the LLVM IR. At this

1 compute_action_stmt level, the generation of executable code and linking with a 2 : 'compute' compute_block=programBlock valid QIR runtime implementation is readily accomplished 3 'action' action_block=programBlock ; with standard assemblers and linkers. (b) OpenQASM code leveraging the compute-action statement. A. Symbol Table 1 qubit q[4]; 2 let bottom_three = q[1:3]; The symbol table is the data structure used by the com- 3 compute { piler to cache information about each observed symbol (e.g., 4 rx(1.57) q[0]; variable name, its type, its constness, etc.). Since OpenQASM 5 h bottom_three; 6 for i in [0:3] { allows for local variables, the symbol table becomes critical for 7 cnot q[i], q[i + 1]; tracking metadata about the variable symbol and its subsequent 8 } 9 } action { use. In other words, when processing each statement, the 10 rz(2.2) q[4]; compiler, via the symbol table, is aware of the context of all 11 } visible symbols, i.e., those from this scope and those above, to perform proper name lookup. Therefore the symbol table Fig. 3: OpenQASM language extension for compute- provides a mechanism to validate various semantic errors, such action-uncompute pattern. as illegal operations for a specific variable type or referring to out-of-scope variables, which could not be detected by syntactic considerations. already present in the qcor language. To start, we have We have implemented a symbol table that is composed extended the grammar to provide familiar typedefs for 32 of an array of scope-indexed hash maps. Each map is a and 64 bit integers and floats, Specifically, we parse int as lookup table from variable name to the corresponding MLIR int[32], int64_t as int[64], float as float[32], mlir::Value instance representing the variable. Name and double as float[64]. We have also updated the lookup is performed from the current scope upward (to parent grammar and implemented the parser to handle both range- scopes) to find the first match, i.e., the one in the nearest parent based C-like for statements as well as the usual for state- scope. Another utility of the symbol table is the compile- ment with with initializer, conditional expression, and iteration time evaluation of constant expressions. OpenQASM supports expression. constant integer and floating-point variable declarations (via Finally, we see an opportunity to enable syntax and seman- the const keyword). The symbol table tracks these constant tics for specific compile-time optimizations. We have updated values and provides a utility to evaluate simple math expres- the OpenQASM grammar with support for the ubiquitous sions1 involving these constants at compile time, if possible. compute-action-uncompute pattern, first demonstrated Critically, the compiler relies on the symbol table to track in [30]. Given the common pattern W = UVU †, the new qubit use-define chains for quantum instruction operations syntax enables one to express U and V as the code in the (typical quantum gate invocations). We have designed our compute and action scopes, respectively, and the compiler quantum instruction operation in the quantum MLIR dialect auto-generates the U † code after application of the compute, extension to adhere to the value semantics representation first action segments. This is demonstrated in Figure 3a (addition described in [14], whereby quantum operations consume one to OpenQASM grammar) and 3b (example usage), and is or many qubit mlir::Value instances and produce one not only useful for readability, but it also gives the compiler or many new mlir::Value instances as operation return implementation the opportunity to generate optimal quantum types. Using the underlying pointer to the MLIR variable code in the situation where the programmer wants to program (mlir::Value) as the lookup key, the symbol table re- W controlled on the state of another qubit. In this case, the places input qubit operands with the newly-created output compiler synthesizes U ctrl-VU † as opposed to the naive ctrl- mlir::Value. Therefore, the OpenQASM source code is U ctrl-V ctrl-U †, thereby leading to less multi-qubit operations compiled into the MLIR representation with explicit use-define and shorter depth quantum programs. chains for qubits amendable to compile-time optimization techniques similar to the DAG representation of quantum IV. COMPILER ARCHITECTURE circuits. The architecture we put forward starts with the definition of a frontend parser for the OpenQASM language. This parser B. ANTLR Parser produces an AST and we provide custom tree walkers that Our compiler implementation leverages the ANTLR [28] traverse the tree and construct a corresponding MLIR repre- (ANother Tool for Language Recognition) toolchain to gen- sentation. To handle variable data and scoping, we introduce erate the compiler frontend as depicted in Figure 4. With our a custom symbol table, enabling the tracking of variable extended OpenQASM grammar as input, ANTLR generates use-define chains as well as scope visibility. Our MLIR the corresponding lexer and parser utilities capable of scanning representation is amenable to general transformation, and we leverage this for quantum-level optimizations. Ultimately, our 1using the C++ Mathematical Expression Toolkit (exprtk) Library

4 Source.qasm numbers, and arrays) are mapped to QIR types (Qubit and Array) or memory-referenced (memref) MLIR Standard di- ANTLR grammar i1 i8 qasm3Lexer alect types (e.g., for boolean bits, for 8-bit integers, etc.) OpenQASM.g4 Classical math operations are converted to the corresponding instructions from the MLIR Standard dialect, such as addi Auto-generated Token Stream or cmpi for integer addition or comparison, respectively. Importantly, OpenQASM for loops are transformed into an qasm3Parser Error Listener AffineForOp [24] (MLIR affine dialect) amenable to future classical optimization passes, such as loop unrolling. Intrinsic quantum gates are converted to value-semantics ANTLR AST Syntax Errors! quantum operations (static single assignment form) of the quantum dialect, as shown in Table I. As described in qasm3Visitor Sec. IV-A, we use the symbol table to track the qubit operands (as opaque mlir::Value pointers) and replace them with MLIR Op Tree the new values created by each value-semantics quantum gate operation. In other words, each qubit SSA variable (shown Fig. 4: ANTLR-based compiler front-end: lexer and parser as %k in Table I) will only be assigned and used once, thus are auto-generated from the extended OpenQASM grammar allowing us to trace gate operations on each qubit line. This (ANTLR grammar file). An error listener is attached to the is to explicitly define the use-define chains, which we can parser to capture and report any syntax violations encountered leverage in downstream quantum optimizations. during parsing. The qasm3Visitor visits each node of Another key feature of OpenQASM is the ability to express the ANTLR AST constructing the MLIR syntax tree in the quantum gate modifiers (e.g., controlled or adjoint) for both standard, affine, and quantum dialects. intrinsic gates and subroutines. Our compiler implementation takes a pragmatic approach by rewriting modifiers into scoped regions with dedicated MLIR marker operations as shown in and parsing source strings according to the provided grammar Table I. These operations are effectively no-ops, but indicate rules. The compiler frontend produces an AST representing to the runtime that the following region of quantum operations the input source code against the set of syntactic rules in the is to be handled differently. For example, for the ctrl marker grammar. For instance, a valid OpenQASM loop (matching a (q.ctrl_region), the operations within that region should syntax rule named loopStatement) will be parsed into a be processed to synthesize the controlled version of that LoopStatementContext AST node along with all nested composite operation. We preserve the high-level semantics sub-nodes, e.g., the loop termination conditions and the loop of these modifiers at both the MLIR and latter LLVM IR body. ANTLR also generates a base AST visitor interface for levels (see IV-D2) rather than trying to perform compile- each grammar file, which includes all possible AST node types time gate synthesis. Ultimately, the evaluation and synthesis that the parser may produce. The AST visitor is the mechanism of these modifier-enclosed blocks will be performed by QIR- we leverage to transform the raw OpenQASM syntax tree compatible runtime implementation. into the MLIR representation as we will discuss in the next To handle the nested, recursive nature of the Open- subsection. QASM syntax tree, the compiler uses a multi-layer visit- While processing the input source, the parser may throw ex- ing strategy whereby a standalone visitor-like utility, named ceptions indicating syntactic or semantic errors. The compiler qasm3_expression_generator, is provided to traverse implements the ANTLR BaseErrorListener interface and process sub-expression nodes in-place, if necessary. To (see Figure 4) to catch these potential issues and report them give an example, when visiting a for loop with a math to users with detailed information, such as the location of expression as its upper bound, the main AST visitor would use offending characters. this qasm3_expression_generator to handle this sub- C. Visitor Handlers expression (converting the math expression to a MLIR equivalent) and then take the resulting value (as a mlir::Value) Once the valid OpenQASM source has been transformed to construct the current MLIR for loop. into the ANTLR AST, the compiler traverses each node in the AST in a depth-first manner producing the equivalent D. Progressive Lowering MLIR tree using the standard (operations for classical control flow and memory references), affine (operations for looping), After visiting all the ANTLR AST nodes representing the and quantum dialects (our contribution modeling quantum input OpenQASM program, the compiler has constructed a operations). Table I summarized OpenQASM-MLIR rewrite MLIR code in the quantum, affine, standard, and built-in patterns for important OpenQASM constructs. dialects. As depicted in Figure 5, this is the first stage of In particular, quantum types (qubit and qreg) and clas- a progressive, multi-stage IR transformation and lowering sical types (boolean, variable-width integer or floating-point pipeline that produces an optimized executable.

5 TABLE I: OpenQASM to MLIR Constructs OpenQASM MLIR Quantum Types Qubit Register qubit qubit_array[20]; %0 = q.qalloc(20) { name = qubit_array } : !quantum.Array Classical Types Bits bit[20] bit_array; %0 = alloca() : memref<20xi1> Integers int[16] short_int; %0 = alloca() : memref Floating point numbers float[32] sp_float; %0 = alloca() : memref Global Constants const shots = 1024; global_memref "private" constant @shots : memref = dense<1024> Quantum Instructions Gates ry(theta) q; %4 = qvs.ry(%2, %3) : !quantum.Qubit Note: %2: !quantum.Qubit; %3: f64 Measurements b = measure q; %7 = q.mz(%6) : !quantum.Result %9 = q.resultCast(%7) : i1 store %9, %4[] : memref Note: %6: !quantum.Qubit; %4: memref Classical Operations a += 4; %c4_i64 = constant 4 : i64 %1 = load %0[] : memref %2 = addi %1, %c4_i64 : i64 store %2, %0[] : memref Note: %0: memref (represents a variable) Branching if (i == 5) {...} %c5_i64 = constant 5 : i64 %1 = load %0[] : memref %2 = cmpi "eq", %1, %c5_i64 : i64 cond_br %2, ˆbb1, ˆbb2 ˆbb1: // pred: ˆbb0 br ˆbb2 ˆbb2: // 2 preds: ˆbb0, ˆbb1 Note: %0: memref (represents i variable) Looping for i in [0:10] {...} affine.for %arg0 = affine_map<(d0) -> (d0)>(%1) to affine_map<(d0) {...} (d0)>(%0) <- →֒ Note: %0: index and %1: index represent the constant values of 10 and 0, respectively. Subroutines def foo(float[64]:theta) func @foo(%arg0: f64, %arg1: !quantum.Array) { qubit[2]:q {...} ... return }

foo(theta) qq; call @foo(%2, %0) : (f64, !quantum.Array) -> () Note: %2: f64 and %0: !quantum.Array represent the theta variable and the q qubit array, respectively. Extern functions extern foo(float[64]) -> func private @foo(f64) -> f64 float[64]; Modiﬁers Adjoint inv @ phase(pi) q; q.adj_region { %2 = qvs.phase(%1, %cst) : !quantum.Qubit }

Controlled ctrl @ oracle q[0], q[1]; q.ctrl_region { %3 = call @oracle(%2) : (!quantum.Qubit) -> !quantum.Qubit } (ctrl_bit = %1)

Power pow(8) @ foo q; q.pow_u_region { %c8_i64 = constant 8 : i64 %2 = call @foo(%1) : (!quantum.Qubit) -> !quantum.Qubit } (pow = %c8_i64)

Aliasing Slicing let slice = reg[0:2:12]; %1 = q.qarray_slice(%0, %c0_i64, %c2_i64, %c12_i64) : quantum.Array! →֒ Concatenation let concat = reg1 || reg2; %2 = q.qarray_concat (%0, %1) : !quantum.Array

Next, we perform a set of optimization passes at the related operations have been lowered to QIR functions. The MLIR level whereby control flow constructs (e.g., those from final lowering to LLVM IR and any built-in LLVM optimiza- the affine dialect) and the static single assignment (SSA) tions (e.g., -O3 optimization) are provided by the MLIR- form of quantum instructions in the quantum dialect are LLVM infrastructure, producing binary executables targeting suitable for static optimization procedures. The optimized the QIR runtime along with the classical compute ISA (e.g., MLIR code is lowered to the LLVM dialect via the MLIR x86/Arm/OpenPC depending on the target platform). ConversionPattern utility. At this point, all quantum-

6 Optimization TABLE II: List of MLIR Passes Pass Name Descriptions Identity Pair Simplify or remove redundant quantum instructions. Lower MLIR -O3 LLVM Binary OpenQASM MLIR Removal For example, this pass removes any gates immediately LLVM IR Exe followed by their adjoints, such as pairs of X-X, T - Quantum Dialect LLVM Dialect QIR + LLVM T †, or CNOT -CNOT gates on the same qubits. Affine Dialect Repeated qubit reset instructions are also simplified. ANTLR Standard Dialect Rotation Merg- Combine consecutive mergable quantum instructions, AST ing e.g., Z and Rz(θ), Rx(θ1) and Rx(θ2), etc. Gate Sequence Find a sequence of consecutive compile-time constant Fig. 5: Compilation pipeline: The ANTLR-based frontend Simplification gates and simplify if possible, i.e., resynthesize to parses the OpenQASM source string into an Abstract Syntax fewer gates. For example, H-T -H gate sequence can Tree (AST) data structure. By processing (visiting) the AST, be simplified to Rx(π/4). Qubit Extract Merge duplicate qubit extracts from registers with we generate a MLIR representation using a couple of dialects, Lifting compile-time constant indices. This pass also unifies most importantly, our quantum dialect. A set of optimization the SSA use-define chain after loop unrolling (loop passes can be applied at this stage to simplify the MLIR tree induction variable as qubit array index) and function inlining. before it is lowered to the LLVM dialect. At this stage, the Gate Permuta- Permute gates that are commutative, e.g., Rz on the IR tree only contains valid LLVM instructions including QIR- tion control qubit of a CNOT gate. Despite no immediate adherent function calls and types. Standard LLVM optimiza- benefit (no gate count reduction), this pass might produce optimization opportunities for others, such as tion can be applied when the LLVM dialect is lowered to rotation merging. bitcode, e.g., the -O3 LLVM optimization flag. Lastly, the Constant Prop- Propagate global constants, e.g., constant integer val- LLVM IR bitcode is compiled to binary executable by linking agation ues as loop counts or constant floating points as rotation angles. in a compatible QIR runtime implementation. Dead Code Eliminate unused operations (dead code). For example, Elimination qreg allocation whose result is never used can be (DCE) eliminated. These dead values may emerge as a result 1) Optimization Passes: Figure 6 illustrates the MLIR- of other passes. level optimization pipeline that we have implemented in the OpenQASM compiler. Specifically, we combine optimization techniques from both classical and quantum programming, circuit optimization, are applied multiple times in a loop to such as function inlining, loop unrolling, and various quantum make sure that we can pick up new optimizing patterns that circuit optimization procedures. Table II lists the quantum emerged thanks to the code rewrite of previous passes. optimization passes that we have implemented for the MLIR In Figure 7, we demonstrate the MLIR transformation along operations in our quantum dialect. Inlining and loop unrolling the optimization pipeline for a simple OpenQASM source code are built-in MLIR passes for the standard and affine dialects, (Figure 7a). This code contains a subroutine definition and respectively. later invocation, which is compiled to the MLIR CallOp (line The pseudocode in Algorithm 1 illustrates a typical MLIR 2 in Figure 7b). This call is then inlined (Figure 7c, line 4-6), optimization pass based on dataflow analysis. Specifically, resulting in a CNOT-CNOT identity pair which is removed by we show the procedure to perform quantum gate merging the identity pair removal pass (Figure 7d). What is left after on the MLIR AST tree. By adhering to value-semantics for this step is a sequence of unused operations, such as extracting the quantum operations, we are able to follow the use-define qubit addresses and the qreg allocation itself. These are all chain of each quantum instruction (shown as the User map dead code, hence removed by the final DCE pass as shown in in Algorithm 1). With the SSA dataflow information, we Figure 7e. can query the next quantum operation on the qubit line and 2) Dialect Conversion and Lowering: After simplifying the check whether a gate-merge opportunity exists. For illustration MLIR tree with optimization passes such as those listed in purposes, we depict the gate merging procedure as two black- Table II, the compiler will lower the MLIR representation box functions, CanMerge and Merge, implementing check- to LLVM progressively, as depicted in Figure 5. This low- ing and gate generation procedures. The new injected gate ering procedure is similar to the one described in [19] for operation will have its input and output SSA values bridging OpenQASM 2 compilation. We implemented a collection of those two original instructions (line 10 and 11 in Algorithm 1). mlir::ConversionPatterns to perform the conversion Each optimization pass operating on the MLIR operation tree from quantum dialect to the LLVM dialect targeting the QIR maintains the SSA value chain as they transform the IR. specification. We also want to note that this optimization procedure, as As compared to the work in [19], the lowering pipeline well as others listed in Table II, is most effective when the of the qcor compiler has been enhanced with (1) dialect AST is a flat linear region whereby the use-define chain is conversion from affine to LLVM branch-based CFG (control uninterrupted (e.g., due to subroutine calls or loops). There- flow graph) representation and (2) conversion pattern im- fore, it is crucial to have loop unrolling and function inlining plementations for new quantum dialect operations. The first passes applied beforehand as shown in Figure 6. In this pass one stems from the fact that we utilize operations from the pipeline, some passes, especially those performing quantum affine dialect to handle control flows (e.g., for loops) in

7 Constant Identity Pair Gate Inliner Loop Unroll Gate Merging Others DCE Propagation Removal Permutation

Fig. 6: Optimization pass pipeline. Optimization passes simplifying quantum gate operations are repeated a set number of times.

ALGORITHM 1 Gate Merging Optimization __quantum__qis__INSTNAME(Qubit*,...)) during Vars: the lowering stage, they are compatible with the existing • ops : [VSOp] (Sequence of value semantics ops) QIR intrinsic quantum gates in the runtime. Key extensions • Users: VSOp 7→ [VSOp] (use-define trace mechanism) to the QIR runtime to support OpenQASM are the region • CanMerge: (VSOp, VSOp) 7→ B (Mergeable check) marker functions to implement gate modifier concepts, such • Merge: (VSOp, VSOp) 7→ VSOp (Create merge op) as controlled (ctrl) or adjoint (inv). Specifically, when Gate Merging: a quantum gate or subroutine is subjected to a modifier directive, the compiler injects the corresponding runtime 1: dead_ops: [VSOp] ← [] functions before and after the modified operation. For ex- 2: for op ∈ ops do ample, __quantum__rt__start_adj_u_region and 3: if op ∈ dead_ops then __quantum__rt__end_adj_u_region are functions to 4: Skip to next op denote a region of code whereby the inverse (adjoint) of 5: end if the collected quantum sequence generated within should be 6: if Length(Users(op)) == 1 then applied. Once again, we take a pragmatic approach in imple- 7: next_op ← Users(op)[0] menting the gate modifier feature of the OpenQASM language 8: if CanMerge(op, next_op) then by delegating the modified circuit realization to the runtime. 9: merged_op ← Merge(op, next_op) By invoking the wrapped code region at runtime in a special 10: merged_op.input ← op.input instruction collection mode, we can retrieve the flattened se- 11: merged_op.output ← next_op.output quence of gates and thus construct the corresponding modified 12: Add merged_op to IR tree circuit (e.g., adjoint or controlled) for backend execution. 13: Append op and next_op to dead_ops Finally, the QIR runtime environment can be further special- 14: end if ized via compilation flags or executable invocation arguments. 15: end if For instance, specific hardware or simulator backend (qpu) 16: end for can be selected, and the quantum runtime can be configured 17: for op ∈ dead_ops do to run in a tightly-coupled execution mode (simulator-only) 18: Erase op from IR tree whereby the dynamical measurement-controlled branching is 19: end for fully supported.

V. DEMONSTRATION OpenQASM. The latter involves lowering procedures for new quantum dialect operations for gate modifiers (see Table I) as In this section, we demonstrate the utility and performance well as the new quantum value semantics instruction. MLIR of an MLIR-based compiler. We present a typical OpenQASM modifier-marked regions (ctrl, inv, or pow) are converted programming, compilation, and execution workflow using the to the calls to corresponding quantum runtime functions at the qcor infrastructure. The compilation speed is benchmarked beginning and the end of the scoped block. To lower the value- against a variety of comparable quantum compilers. Lastly, we semantics quantum operations (MLIR quantum dialect) to provide an example showing extensions to the OpenQASM memory-semantics LLVM QIR calls, the qubit SSA variables language provided by qcor. are mapped back to their root variable (in the use-define chain) A. Compilation and execution workflow by propagating the mlir::Value of input qubit operands to the corresponding outputs. Figure 8 shows an OpenQASM source code to compute the expectation value of Pauli operators (e.g., σX σX in this case) E. QIR Implementation and Linking in an arbitrary state prepared by a variational ansatz. The state- The last stage of the compilation workflow, as shown in preparation is expressed as a quantum subroutine (ansatz, Figure 5, involves the compilation of LLVM IR bytecode into lines 6-10). This is a prototypical procedure commonly used a binary object containing QIR function calls, that needs to be in the variational quantum eigensolver (VQE) algorithm. linked to a valid QIR runtime implementation to form an ex- A feature that we want to highlight is the fact that Pauli ecutable. As described in [19], qcor provides a QIR runtime expectation accumulation is explicitly expressed as a for loop library implementation backed by the XACC framework [21]. (lines 16-28). Note the h quantum instruction broadcasts Since quantum value-semantics operations are con- across all qubits in the register. In each iteration, we count the verted to memory-semantics function calls (e.g., void parity of measured bits to compute the average result across all

8 (a) OpenQASM source 1 2 OPENQASM 3; 1 def foo qubit[2]:qq { 3 include "stdgates.inc"; 2 cx qq[0], qq[1]; 4 3 } 5 const shots = 1024; 4 6 // State-preparation: 5 qubit q[2]; 7 def ansatz(float[64]:theta) qubit[2]:q { 6 foo q; 8 x q[0]; 7 cx q[0], q[1]; 9 ry(theta) q[1]; 10 cx q[1], q[0]; (b) Unoptimized MLIR 11 } 1 %0 = q.qalloc(2) { name = q } : !quantum.Array 12 2 call @foo(%0) : (!quantum.Array) -> () 13 def compute(float[64]:theta) qubit[2]:q -> } [c0_i64 = constant 0 : i64 ֒→ float[64% 3 4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit 14 bit first, second; 5 %c1_i64 = constant 1 : i64 15 float[64] num_parity_ones = 0.0; 6 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit 16 float[64] result; 7 %3:2 = qvs.cx(%1, %2) : !quantum.Qubit, 17 for i in [0:shots] { ;quantum.Qubit 18 ansatz(theta) q! →֒ 8 q.dealloc(%0) 19 // Change measurement basis 9 return 20 h q; 21 // Measure (c) MLIR after inlining pass 22 first = measure q[0]; 23 second = measure q[1]; 1 %c0_i64 = constant 0 : i64 24 if (first != second) { 2 %c1_i64 = constant 1 : i64 25 num_parity_ones += 1.0; 3 %0 = q.qalloc(2) { name = q } : !quantum.Array 26 } 4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit 27 // Reset 5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit 28 reset q; 6 %3:2 = qvs.cx(%1, %2) : !quantum.Qubit, 29 } quantum.Qubit 30! →֒ 7 %6:2 = qvs.cx(%3#0, %3#1) : !quantum.Qubit, 31 // Compute expectation value - quantum.Qubit 32 result = (shots - num_parity_ones) / shots! →֒ ;q.dealloc(%0) ֒→ num_parity_ones / shots 8 9 return 33 return result; 34 } (d) MLIR after identity pair removal pass 35 36 float[64] theta, exp_val; 1 %c0_i64 = constant 0 : i64 37 qubit qq[2]; 2 %c1_i64 = constant 1 : i64 38 // Try a theta value: 3 %0 = q.qalloc(2) { name = q } : !quantum.Array 39 theta = 0.123; 4 %1 = q.extract(%0, %c0_i64) : !quantum.Qubit 40 exp_val = compute(theta) qq; 5 %2 = q.extract(%0, %c1_i64) : !quantum.Qubit 41 print("Avg = ", exp_val); 6 q.dealloc(%0) 7 return Fig. 8: OpenQASM example: Compute Pauli expectation (e) MLIR after DCE (σX σX ) after a state-preparation circuit (ansatz).

1 return

Fig. 7: MLIR optimization example. The input OpenQASM source code (a) is first compiled into the MLIR representation (b), which is then processed by a sequence of optimization passes. After the call to foo (b, line 2) is inlined (c), the back- translated into the MLIR representation consisting of opera- to-back CNOT pattern emerges; thus, both gates are removed tions from our quantum dialect (e.g., quantum gates) and the by the identity pair removal pass (d). Finally, the DCE pass standard dialect (e.g., branching and arithmetic instructions). eliminates the redundant qubit array allocation, constant value We can now transform this high-level IR to executable code declarations, and qubit extract calls (e) since they have no adhering to the QIR specification (see Figure 5). At the time of further use. writing, the execution of this type of tightly coupled quantum- classical program, whereby measurement feedforward is required, is only applicable to simulator backends of qcor’s runs (shots) in line 31. An abbreviated MLIR representation of ftqc runtime [20]. We anticipate that quantum hardware the compute subroutine in Figure 8 is depicted in Figure 9 providers will support this dynamical runtime model in the after all MLIR-level optimization passes have been applied. near future as OpenQASM becomes more mature and widely- For the sake of presentation, we only keep the high-level adopted. Importantly, in our workflow, the runtime implemen- structure of the MLIR printout (omitted regions are presented tation is linked in at the final phase of the compilation pipeline, as ellipses). thus could be provided interchangeably and dynamically to The semantics of the OpenQASM source code is faithfully target different accelerator targets.

9 1 func @compute(%arg0: f64, %arg1: !quantum.Array) -> Qiskit t|keti Q#-compile f64 { OpenQASM-compile OpenQASM-total →֒ 2 ... 3 br ˆbb1 3 4 ˆbb1: // 2 preds: ˆbb0, ˆbb5 10 5 %5 = load %4[] : memref 6 %6 = cmpi "slt", %5, %c1024_i64 : i64 7 cond_br %6, ˆbb2, ˆbb3 8 ˆbb2: // pred: ˆbb1 102 9 %7 = q.extract(%arg1, %c0_i64) : !quantum.Qubit 10 %8 = qvs.x(%7) : !quantum.Qubit 11 %9 = q.extract(%arg1, %c1_i64) : !quantum.Qubit 12 %10 = qvs.ry(%9, %arg0) : !quantum.Qubit Runtime [secs] 1 13 %11 = q.extract(%arg1, %c1_i64) : 10 quantum.Qubit! →֒ 14 %12:2 = qvs.cx(%11, %8) : !quantum.Qubit, quantum.Qubit! →֒ 15 .... 0 16 %23 = cmpi "ne", %21, %22 : i1 10 17 cond_br %23, ˆbb4, ˆbb5 18 ˆbb3: // pred: ˆbb1 10 20 30 40 50 19 ... 20 %28 = subf %27, %26 : f64 N qubits 21 ... 22 %37 = divf %34, %36 : f64 Fig. 10: Processing time of Trotter circuits for Heisenberg 23 ... Hamiltonian models (eq. 1) with a variable number of qubits. 24 return %39 : f64 Transpiler optimization level 3 and full peephole optimization 25 ˆbb4: // pred: ˆbb2 26 %40 = load %2[] : memref are applied for Qiskit and t|keti, respectively. For OpenQASM 27 %41 = addf %40, %cst_0 : f64 compilation, all MLIR-level optimization passes (see IV-D1) 28 store %41, %2[] : memref and LLVM -O3 optimization are applied. For Q#, QIR 29 br ˆbb5 30 ˆbb5: // 2 preds: ˆbb2, ˆbb4 generation mode is used to generate LLVM IR, which is 31 %42 = qvs.reset(%14) : !quantum.Qubit then optimized (-O3) by LLVM’s clang. Link time to 32 %43 = qvs.reset(%16) : !quantum.Qubit qcor’s QIR runtime is included for both OpenQASM and 33 %44 = load %4[] : memref 34 %45 = addi %44, %c1_i64 : i64 Q# compilation time. Each data point is the average of 10 35 store %45, %4[] : memref runs with standard deviation also plotted. OpenQASM total 36 br ˆbb1 time includes both compilation time and resources estimation Fig. 9: Truncated MLIR representation of the OpenQASM runtime (counting all executed gates). program in Figure 8. Here, we only show a simplified MLIR printout of the core deuteron function whereby most of the required to generate and execute binary executables in the code has been omitted for clarity. High-level classical control resource estimation mode, i.e., counting flattened quantum flow constructs such as the for loop and the if statement in gates, because it provides a mechanism to compare statically- Figure 8 are converted into LLVM-style CFG constructs and compiled executables against Python-based interpreted scripts operations such as blocks and branches. Thanks to MLIR- constructing the equivalent circuits. level optimization (sec. IV-D1), the ansatz subroutine in In Figure 10, we plot the compile data for Trotter circuits Figure 8 has been inlined into the deuteron body as shown simulating the generic Heisenberg Hamiltonian model of the as qvs.x, qvs.ry, and qvs.cx operations. Arithmetic op- form erations are translated to MLIR operations from the Standard H = −h Xi − Jz ZiZi+1. (1) X X dialect, such as addf, subf, divf (floating-point numbers) i i or addi, cmpi (integers). The circuit is constructed by ‘for’ loops over a fixed number of Trotter steps (steps = 100, step size (dt) = 0.01) with a variable number of qubits from 5 to 50. In other words, we B. Compiler performance construct the circuit representing the unitary 2 In this section, we benchmark the compilation and resource 100 estimation runtime of the OpenQASM compiler against a set U = exp(−hdtXi) exp(−JzdtZiZi+1) , (2) YY Y of different quantum programming languages and frameworks, k=1 i i 3 4 5 such Q# , Qiskit , andt|keti . We benchmark the total runtime where each Pauli exponential term is converted to an equivalent gate-based sub-circuit. These sub-circuits are repeated 2Set-up: Intel Xeon CPU E5-2698 v4 @ 2.20GHz running Linux Debian at each time step to simulate the Trotterized evolution of the 10 distribution. 3Microsoft Quantum Development Kit 0.18.2107153439 Heisenberg Hamiltonian. 4qiskit-terra 0.18.1 Quantum programming languages, such as Q# and Open- 5pytket 0.13.0 QASM, preserve the loop construct in their IR representation,

10 resulting in almost constant compilation time. We note that the 1 const nb_qubits = 5; number of qubits is a compile-time constant for both the Q# 2 def heisenberg_U() qubit[nb_qubits]:r { 3 and OpenQASM cases. The compiler may choose to unroll 4 // Extension-provided C-like data types these loops. 5 int nb_steps = 100; The compilation time of OpenQASM includes: (1) frontend 6 double step_size = .01; 7 double Jz = 1.0; parsing (ANTLR), (2) MLIR generation and optimization, 8 double h = 1.0; (3) lowering to LLVM IR, (4) LLVM optimization, and (5) 9 object code generation and linking. For Q#, we leverage 10 // -h*sigma_x layers 11 for step in [0:nb_steps] { the Microsoft Quantum Development Kit (QDK) to perform 12 // -h*sigma_x layers QIR LLVM generation, i.e., equivalent to steps (1)-(3) in our 13 rx(-h * step_size) r; OpenQASM workflow. The QDK-generated LLVM IR is then 14 15 // -Jz sigma_z sigma_z layers qcor * * optimized and linked with the runtime similar to steps 16 for i in [0:nb_qubits-1] { 4 and 5 using the standard LLVM toolchain. 17 compute { The results in Figure 10 highlight the need for statically- 18 cx r[i], r[i+1]; 19 } action { compiled quantum programming languages in order to de- 20 rz(-Jz * step_size) r[i + 1]; scribe large-scale programs. Imperative gate-by-gate construc- 21 } tion of quantum circuits using scripting languages, despite its 22 // Could be written manually like this 23 // No optimizations picked up flexibility and ease of use, does come with a significant perfor- 24 // cx r[i], r[i+1]; mance overhead. Importantly, our MLIR-based compiler for 25 // rz(-Jz * step_size) r[i + 1]; OpenQASM demonstrates improved performance compared 26 // cx r[i], r[i+1]; 27 } to other compilers, such as Q#. It is worth noting that the 28 } Q# language is much more feature-rich than OpenQASM, 29 } therefore requiring a more elaborated frontend and build 30 31 // Allocate the qubits system. In particular, initial Q# to QIR generation accounts 32 qubit r[nb_qubits], c; for the majority (80-90%) of the total Q# compilation time 33 in Figure 10. As of this writing, there are no other publicly 34 // Perform ctrl-U 35 ctrl @ heisenberg_U c, r; available OpenQASM compilers that we can compare our implementation with. Fig. 11: OpenQASM code defining a subroutine describing the C. Extensions for Optimal Code Generation unitary operation in Eq. 2. The use of compute-action from our grammar extension enables the compiler imple- Here we demonstrate the utility of our proposed extensions mentation to optimally synthesis controlled versions of this to the OpenQASM grammar specification with regards to subroutine. This code snippet also demonstrates the C-like optimal quantum code generation. Specifically, we show how extensions for primitive data types and for loops. the compute-action syntax enables the compiler implementation to generate optimal quantum instruction sequences in the presence of a ctrl gate modifier. As stated in Sec. III, and the adjoint or reverse of the compute instructions to † controlled operations on the W = UVU pattern only require the MLIR tree. Moreover, for instructions that are not in the controls applied to the operations in V . Via programmer intent action block, the compiler marks the added instructions — i.e. leveraging the custom compute {...} action with a flag to indicate that they are part of the compute {...} syntax — the compiler can optimally synthesize quan- or uncompute block. At runtime, this information is used to tum instructions adherent to this pattern. optimally synthesize controlled versions of this block of code. Take the Heisenberg Hamiltonian and corresponding time We benchmark the usage of compute-action vs manual evolution operator U in Eqs. 1 and 2. In the context of the programming of W and present the results in Fig 12. The re- quantum phase estimation algorithm, if we seek a correspond- sults show the number of controlled operations (CRZ, CNOT) ing eigenvalue of U with respect to some eigenstate |ψi, we present in the compiled quantum program for the manual will require the application of a series of controlled versions (commented lines 22-26) and compute-action (lines 17-21) of U. cases. One can clearly see that via programmer intent, the Figure 11 shows an OpenQASM subroutine describing compiler can optimally synthesize instruction sequences and the trotter evolution in Eq. 2. The second nested for loop improve on the resource utility of the compiled program. With (lines 18-29) could be written manually as the sequence this simple programming extension, programmers can pick up W = CX ⊗ RZ(Θ) ⊗ CX†, but by replacing it with a an order of magnitude in gate count reductions. compute-action block, we give the compiler an opportunity for optimal instruction synthesis under application of a VI. CONCLUSION ctrl modifier. Specifically, we have implemented an ANTLR We have presented an optimizing ahead-of-time Open- visitor handler that processes the compute-action source QASM compiler built on the MLIR framework. Our approach and adds the compute instructions, the action instructions, lowers OpenQASM codes to the LLVM IR in a manner that

11 Manual Compute / Action ACKNOWLEDGMENT This material is based upon work supported by the U.S. 105 Department of Energy, Ofﬁce of Science, National Quantum Information Science Research Centers. ORNL is managed by UT-Battelle, LLC, for the US Department of Energy under contract no. DE-AC05-00OR22725.

REFERENCES [1] Circuit synthesis via quantum gate modiﬁers. 104 https://qiskit.github.io/openqasm/language/gates.html#quantum-gate-modiﬁers. Accessed: 2021-08-30. [2] Introducing quantum intermediate representation (qir). https://devblogs.microsoft.com/qsharp/introducing-quantum-intermediate-representation-qir/. Accessed: 2021-01-15. [3] Mlir documentation. https://mlir.llvm.org. Accessed: 2021-01-15. [4] Qcor github. https://github.com/ornl-qci/qcor. Accessed: 2021-01-15.

N Controlled Operations (CX, CRZ) [5] Quantum intermediate representation (qir) specification. https://github.com/microsoft/qsharp-language/tree/main/Specifications/QIR. Accessed: 2021-01-15. 10 20 30 40 50 [6] Koen Bertels, J.W. Hogaboam, and Carmen G. Almudever. Quantum N Qubits computer architecture: a full-stack overview. Microprocessors and Microsystems, 66:21–66, 2019. Fig. 12: The number of controlled operations (CX and CRZ) [7] Benjamin Bichsel, Maximilian Baader, Timon Gehr, and Martin Vechev. Silq: A high-level quantum language with safe uncomputation and intu- present in the compiled representation of Figure 11 for a given itive semantics. In Proceedings of the 41st ACM SIGPLAN Conference number of qubits. The Manual plot represents a program that on Programming Language Design and Implementation, PLDI 2020, sequentially lists the instruction for the pattern W = UVU †, page 286–300, New York, NY, USA, 2020. Association for Computing Machinery. while the other plot represents the case whereby a programmer [8] Keith A Britt and Travis S Humble. High-performance computing with expresses this pattern via the compute-action syntax from quantum processing units. ACM Journal on Emerging Technologies in our grammar extension. Computing Systems (JETC), 13(3):1–13, 2017. [9] Andrew W. Cross, Ali Javadi-Abhari, Thomas Alexander, Niel de Beau- drap, Lev S. Bishop, Steven Heidel, Colm A. Ryan, John Smolin, Jay M. Gambetta, and Blake R. Johnson. Openqasm 3: A broader and deeper quantum assembly language, 2021. is adherent to the QIR specification. We provide quantum [10] E. F. Dumitrescu, A. J. McCaskey, G. Hagen, G. R. Jansen, T. D. Morris, T. Papenbrock, R. C. Pooser, D. J. Dean, and P. Lougovski. Cloud circuit optimization passes at the MLIR level and leverage quantum computing of an atomic nucleus. Phys. Rev. Lett., 120:210501, existing classical optimization passes present in both MLIR May 2018. and LLVM. Our work extends the OpenQASM grammar [11] Tobias Gysi, Christoph Müller, Oleksandr Zinenko, Stephan Herhut, Eddie Davis, Tobias Wicky, Oliver Fuhrer, Torsten Hoefler, and Tobias with support for C-like constructs and the compute-action- Grosser. Domain-specific multi-level ir rewriting for gpu, 2020. uncompute pattern for efficient programming and compile- [12] Kathleen E. Hamilton, Eugene F. Dumitrescu, and Raphael C. Pooser. time optimizations. Targeting the QIR enables one to swap Generative model benchmarks for superconducting qubits. Phys. Rev. A, 99:062323, Jun 2019. runtime library implementations enabling a write once run [13] IBM. Openqasm 3.x live specification, 2021. anywhere characteristic. We have provided a runtime library https://qiskit.github.io/openqasm/. implementation of the QIR specification that is backed by the [14] David Ittah, Thomas Häner, Vadym Kliuchnikov, and Torsten Hoefler. Enabling dataflow optimization for quantum programs, 2021. XACC quantum programming framework, thereby enabling [15] Ali JavadiAbhari, Shruti Patil, Daniel Kudrow, Jeff Heckey, Alexey OpenQASM compilation that targets quantum computers from Lvov, Frederic T. Chong, and Margaret Martonosi. Scaffcc: Scalable IBM, Rigetti, Honeywell, IonQ, as well as simulators that scale compilation and analysis of quantum programs. Parallel Computing, 45:2 – 17, 2015. Computing Frontiers 2014: Best Papers. from laptops to large-scale heterogeneous high performance [16] Tian Jin, Gheorghe-Teodor Bercea, Tung D. Le, Tong Chen, Gong computers, like Summit. Moving forward, our approach opens Su, Haruki Imai, Yasushi Negishi, Anh Leu, Kevin O’Brien, Kiyokuni up the possibility of true language integration at the LLVM IR Kawachiya, and Alexandre E. Eichenberger. Compiling onnx neural network models using mlir, 2020. level. We envision a number of language approaches that map [17] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy to the LLVM IR adherent to the QIR specification, and via Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasi- simple runtime linking, enabling the integration of quantum lache, and Oleksandr Zinenko. Mlir: A compiler infrastructure for the end of moore’s law, 2020. language A with code from quantum language B. As of this [18] Alexander McCaskey, Eugene Dumitrescu, Dmitry Liakh, and Travis writing, the integration of Q#, qcor C++, and OpenQASM Humble. Hybrid programming for near-term quantum computing sys- is now possible. We also envision this work as an alternative tems. In 2018 IEEE International Conference on Rebooting Computing (ICRC), pages 1–12. IEEE, 2018. mechanism for embedded C++ quantum kernels in qcor, [19] Alexander McCaskey and Thien Nguyen. A mlir dialect for quantum departing from the existing Clang Syntax Handler source assembly languages, 2021. preprocessing. Future work will investigate true quantum [20] Alexander Mccaskey, Thien Nguyen, Anthony Santana, Daniel Claudino, Tyler Kharazi, and Hal Finkel. Extending c++ for heterogeneous language integration and compile-time embedding of MLIR- quantum-classical computing. ACM Transactions on Quantum Com- to-LLVM processing in the qcor C++ language extension. puting, 2(2), July 2021.

12 [21] Alexander J McCaskey, Dmitry I Lyakh, Eugene F Dumitrescu, Sarah S 19 Powers, and Travis S Humble. XACC: a system-level software in- 20 def Z_op(idxs, n_qubits): frastructure for heterogeneous quantum–classical computing. Quantum 21 op = None Science and Technology, 5(2):024002, feb 2020. 22 if 0 in idxs: [22] Alexander J. McCaskey, Zachary P. Parks, Jacek Jakowski, Shirley V. 23 op = Z Moore, Titus D. Morris, Travis S. Humble, and Raphael C. Pooser. 24 else: Quantum chemistry as a benchmark for near-term quantum computers. 25 op = I npj Quantum Information, 5(1):99, 2019. 26 for i in range(1, n_qubits): [23] Tiffany M Mintz, Alexander J Mccaskey, Eugene F Dumitrescu, 27 if (i in idxs): Shirley V Moore, Sarah Powers, and Pavel Lougovski. Qcor: A language 28 op ˆ= Z extension specification for the heterogeneous quantum-classical model 29 else: of computation. arXiv preprint arXiv:1909.02457, 2019. 30 op ˆ= I [24] MLIR. ’affine’ dialect. https://mlir.llvm.org/docs/Dialects/Affine/. 31 return op [25] Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs, and Dmitri 32 Maslov. Automated optimization of large quantum circuits with con- 33 def heisenberg_ham(n_qubits): tinuous parameters. npj Quantum Information, 4(1):1–12, May 2018. 34 Jz = 1.0 Bandiera abtest: a Cc license type: cc by Cg type: Nature Research 35 h = 1.0 Journals Number: 1 Primary atype: Research Publisher: Nature Pub- 36 H = -h * X_op([0], n_qubits) lishing Group Subject term: Computer science;Quantum information 37 for i in range(1, n_qubits): Subject term id: computer-science;quantum-information. 38 H=H-h * X_op([i], n_qubits) [26] Thien Nguyen and Alexander J McCaskey. Extending python for 39 for i in range(n_qubits - 1): quantum-classical computing via quantum just-in-time compilation. 40 H = H- Jz * (Z_op([i, i + 1], n_qubits)) arXiv preprint arXiv:2105.04671, 2021. 41 return H [27] Thien Nguyen, Anthony Santana, Tyler Kharazi, Daniel Claudino, Hal 42 Finkel, and Alexander McCaskey. Extending c++ for heterogeneous 43 n_qubits = [10, 20, 50, 100] quantum-classical computing, 2020. 44 nbSteps = 100 [28] T. J. Parr and R. W. Quong. Antlr: A predicated-ll(k) parser generator. 45 n_runs = 10 Software: Practice and Experience, 25(7):789–810, 1995. 46 [29] Ethan Smith, Marc G. Davis, Jeffrey M. Larson, Ed Younis, Costin 47 def trotter_circ(q, exp_args, n_steps): Iancu, and Wim Lavrijsen. Leap: Scaling numerical optimization based 48 qc = QuantumCircuit(q) synthesis using an incremental approach, 2021. 49 for i in range(n_steps): [30] Damian S. Steiger, Thomas Häner, and Matthias Troyer. Projectq: an 50 for sub_op in exp_args: open source software framework for quantum computing. Quantum, 51 qc += PauliTrotterEvolution() 2:49, Jan 2018. 52 .convert(EvolvedOp(sub_op)) [31] Krysta Svore, Alan Geller, Matthias Troyer, John Azariah, Christopher 53 .to_circuit() Granade, Bettina Heim, Vadym Kliuchnikov, Mariia Mykhailova, Andres 54 return qc Paz, and Martin Roetteler. Q#: Enabling scalable quantum computing 55 and development with a high-level dsl. In Proceedings of the Real World 56 for nbQubits in n_qubits: Domain Specific Languages Workshop 2018, RWDSL2018, New York, 57 data = [] NY, USA, 2018. Association for Computing Machinery. 58 for run_id in range(n_runs): [32] Ed Younis, Koushik Sen, Katherine Yelick, and Costin Iancu. Qfast: 59 ham_op = heisenberg_ham(nbQubits) Quantum synthesis using a hierarchical continuous circuit space, 2020. 60 q = QuantumRegister(nbQubits, 'q') 61 start = time.time() APPENDIX A 62 comp = trotter_circ(q, ham_op.oplist, nbSteps) 63 comp = transpile(comp, optimization_level=3) BENCHMARK EXPERIMENTAL SETUP 64 end = time.time() Here, we list all the source codes used for the Trotter circuit 65 data.append(end - start) 66 print('n_qubits =', nbQubits, '; Elapsed time =', ('[benchmarking (Figure 10). The number of qubits in the Q# and ֒→ mean(data), '+/-', stdev(data), '[secs OpenQASM source codes represent a particular data point. We modify the number of qubits and recompile the source code B. t|keti script for the benchmark. 1 import time A. Qiskit script 2 from pytket.circuit import Circuit, PauliExpBox 3 from pytket.pauli import Pauli 1 from qiskit.aqua.operators import (X, Z, I, 4 from pytket.extensions.qiskit import EvolvedOp, PauliTrotterEvolution) ֒→ AerStateBackend →֒ 2 from qiskit import QuantumRegister, QuantumCircuit 5 from pytket.passes import FullPeepholeOptimise 3 from qiskit.compiler import transpile 6 from statistics import mean, stdev 4 from statistics import mean, stdev 7 nb_steps = 100 5 import time 8 step_size = 0.01 6 9 n_qubits = [5, 10, 20, 30, 40, 50] 7 def X_op(idxs, n_qubits): 10 n_runs = 10 8 op = None 11 for nb_qubits in n_qubits: 9 if 0 in idxs: 12 data = [] 10 op = X 13 for run_id in range(n_runs): 11 else: 14 # Start timer 12 op = I 15 start = time.time() 13 for i in range(1, n_qubits): 16 circ = Circuit(nb_qubits) 14 if (i in idxs): 17 h = 1.0 15 op ˆ= X 18 Jz = 1.0 16 else: 19 for i in range(nb_steps): 17 op ˆ= I 20 # Using Heisenberg Hamiltonian: 18 return op 21 for q in range(nb_qubits):

13 22 circ.add_pauliexpbox(PauliExpBox( 37 // Entry point: we allow the Q# program to have Pauli.X], -h * step_size), [q]) ֒→ full information (compile-time) about the] →֒ .for q in range(nb_qubits - 1): ֒→ number of qubits, steps, etc 23 24 circ.add_pauliexpbox(PauliExpBox( 38 @EntryPoint() } Pauli.Z, Pauli.Z], -Jz * 39 operation CircuitGen() : Unit] →֒ ;(step_size), [q, q + 1]) 40 HeisenbergTrotterEvolve(50, 1.0, 0.01 →֒ 25 41 } 26 # Compile to gates 42 } 27 backend = AerStateBackend() 28 circ = backend.get_compiled_circuit(circ) 29 D. OpenQASM source code

30 # Apply optimization 1 OPENQASM 3; 31 FullPeepholeOptimise().apply(circ) 2 32 3 const nb_steps = 100; 33 end = time.time() 4 const nb_qubits = 50; 34 data.append(end - start) 5 const step_size = 0.01; 35 6 const Jz = 1.0; 36 print('n_qubits =', nb_qubits, '; Elapsed time 7 const h = 1.0; →֒ =', mean(data), '+/-', stdev(data), 8 →֒ '[secs]') 9 qubit r[nb_qubits]; 10 for step in [0:nb_steps] { 11 // -h*sigma_x layers C. Q# source code 12 for i in [0:nb_qubits] { 13 rx(-h * step_size) r[i]; 1 namespace Benchmark.Heisenberg { 14 } 2 open Microsoft.Quantum.Intrinsic; 15 3 open Microsoft.Quantum.Canon; 16 // -Jz*sigma_z*sigma_z layers 4 open Microsoft.Quantum.Math; 17 for i in [0:nb_qubits - 1] { 5 open Microsoft.Quantum.Convert; 18 cx r[i], r[i+1]; 6 open Microsoft.Quantum.Arrays; 19 rz(-Jz * step_size) r[i + 1]; 7 open Microsoft.Quantum.Measurement; 20 cx r[i], r[i+1]; 8 // In this example, we will show how to 21 } { simulate the time evolution of 22 →֒ 9 // an Heisenberg model: 10 operation HeisenbergTrotterEvolve(nSites : Int, : simulationTime : Double, trotterStepSize →֒ } Double) : Unit →֒ 11 // We pick arbitrary values for the X and J couplings →֒ 12 let hXCoupling = 1.0; 13 let jCoupling = 1.0; 14 15 // This determines the number of Trotter steps →֒ 16 let steps = Ceiling(simulationTime / ;(trotterStepSize →֒ 17 18 // This resizes the Trotter step so that time evolution over the →֒ 19 // duration is accomplished. 20 let trotterStepSizeResized = simulationTime ;(IntAsDouble(steps / →֒ 21 22 // Let us initialize nSites clean qubits.

14 This figure "CompilationFlow.png" is available in "png" format from:

http://arxiv.org/ps/2109.00506v1 This figure "pass_uml.png" is available in "png" format from:

http://arxiv.org/ps/2109.00506v1