Evolution of standards over the few A brief overview of this course decades Advanced Fortran Programming concentrates on aspects The 1st version became a standard by John Backus related to the most recent versions of the Fortran (IBM/1957) standard (F2003/2008/2015) Mid-60’s the most significant version was labeled as an – some Fortran 95 less-known features are covered, too ANSI standard, called FORTRAN66 Our major F2003/F2008 subjects in this course are In 1978 the former “de-facto” FORTRAN77 standard was – Language interoperability approved and quickly became popular, contained e.g. – Object-oriented Fortran (OOF) – IMPLICIT statement – Parallel programming with Fortran coarrays – IF – THEN – ELSE IF – ELSE – ENDIF – CHARACTER data type – PARAMETER statement for specifying constants

1 2

Evolution of Fortran standards… Evolution of Fortran standards…

An important extension to FORTRAN77 was release of A major step in keeping Fortran programming alive was “military standard” Fortran enhancements (also in 1978) introduction of the Fortran 90 standard in ’91 (ISO), ’92 by US DoD, and adopted by most FORTRAN77 compilers (ANSI) – IMPLICIT NONE – Free format source input (up to 132 characters per line) – DO – END DO – Dynamic memory handling – INCLUDE statement Marked a number of features obsolescent (but did not – Bit manipulation functions delete any features), e.g. All these were eventually incorporated to the next major – Arithmetic IF and computed GOTO statements official standard release – Fortran 90 – Alternate RETURN, and use of H in FORMAT statements

3 4 Evolution of Fortran standards… Evolution of Fortran standards…

A minor revision in 1997 (ISO) some features were taken Fortran 2003 was a significant revision of the Fortran 95 from High Performance Fortran (HPF) specification, e.g. standard (introduced in 2004) – FORALL and nested WHERE clauses to improve vectorization – Object-oriented programming (OOP), e.g. – User-defined PURE and ELEMENTAL procedures . Type extension, (single) inheritance, polymorphism, … – Automatic DEALLOCATE when ALLOCATABLE out of scope – Parameterized derived types – Procedure POINTERs – POINTER and TYPE components default initializations – Interoperability with C (and other) languages Deleted some features previously marked as obsolescent – OS interfacing: command line arguments & env. variables – Use of non-INTEGERs as DO loop parameters – ALLOCATABLE improvements – H edit descriptor in FORMATs – POINTER improvements – ASSIGN and assigned GOTO – I0 edit descriptor for INTEGERs in FORMAT statements 5 6

Evolution of Fortran standards… Fortran 2008: A minor revision of Fortran 2003 (approved 2010) New features include – Coarrays – Submodules – CONTIGUOUS attribute – BLOCK construct – newunit= in OPEN statement – DO CONCURRENT construct – G0 edit descriptor for REALs in FORMAT statements

7 Outline

Interaction with the operating system – Accessing command line arguments and environment variables – Running operating system commands from Fortran Less known features of allocatable variables contiguous attribute Useful features since Fortran 90 Further scope control mechanisms, associate and block constructs, submodules

8 9

Command line arguments

Parameters to a program are very often given to programs as command line arguments – Input file(s), modify program behavior, etc. INTERACTION WITH THE OPERATING SYSTEM Besides command line arguments, environment variables are a common way to influence program behavior Fortran 2003 introduced a standardized method for – reading command line arguments – accessing values of environment variables

10 11 Command line arguments Command line arguments …

Access separate command line arguments Access the whole command line get_command_argument(number[,value][,length][,stat call get_command(command[,length][,status]) us]) – command is of type character string and contains the value – number denotes which argument to get of the command line on return. – value is a character string that contains the value of the – length is of type integer and contains the length of the requested argument on return (optional) command line on return (optional) – length contains the length of the requested argument on return (optional) Get the number of command line arguments integer :: command_argument_count()

12 13

Command line input Environment variables

Access a value of an environment variable Example: reading in subroutine read_command_line(height, width) integer, intent(out) :: height, width call get_environment_variable(name,value[,length] two integer values character(len=10) :: args(2) [,status][,trim_name]) from the command integer :: n_args, i n_args = command_argument_count() – name is a character string that contains the name of the line, e.g. if ( n_args /= 2 ) then write(*,*) ' Usage : ./exe height width' requested variable % ./a.out 100 100 stop end if – value is a character string that contains the value of the do i = 1, 2 requested variable call get_command_argument(i,args(i)) args(i) = trim(adjustl(args(i))) – length is the length of the requested variable on return end do (optional) read(args(1),*) height read(args(2),*) width – trim_name is of type logical and sets if trailing blanks are end subroutine read_command_line allowed in variable names or not (optional)

14 15 Environment variables: example Executing commands program environment Invoking external programs from within a program is implicit none character(len=256) :: enval sometimes useful integer:: len,stat – No source nor library API available for a useful program ! extract hostname – /python/etc parsing scripts call get_environment_variable('hostname',enval,len,stat) if (stat == 0) write (*,'(a,a)') 'host=', enval(1:len) Fortran 2008 has a standardized method for invoking an external command ! extract user call get_environment_variable('user',enval,len,stat) if (stat == 0) write (*,'(a,a)') 'user=', enval(1:len) end program environment

16 17

Executing commands Executing commands: example

Execute a command line program execcommand call execute_command_line(command[,wait] implicit none [,exitstat][,cmdstat][,cmdmsg]) integer :: estat, cstat call execute_command_line('ls -al', .true., estat, cstat) – command is a character string containing the command if (estat==0) write (*,'(a)') 'command completed successfully’ – wait is logical value indicating if command termination is to be end program execcommand waited (optional) – exitstat is an integer value containing the return value of the command if wait=.true. (optional) – cmdstat is zero if command executed successfully (optional) – cmdmsg is a character string containing explanatory message for positive values of cmdstat (optional)

18 19 Automatic deallocate

subroutine sub1 An allocatable variable, integer, allocatable :: a(:) allocate(a(1000)) which is declared as a local call use_array( a ) variable in procedure and LESS KNOWN FEATURES OF ALLOCATE ! deallocate(a) ! not needed gets allocated, will be end subroutine sub1 automatically deallocated subroutine sub2 upon returning back from integer, allocatable, save :: a(:) routine allocate(a(1000)) call use_array( a ) This is “out of scope” rule deallocate(a) ! is needed holds as long as variables do end subroutine sub2 not have save attribute as well

20 21

Allocatable derived data type component Allocatable as a dummy argument

program test module my_mod An allocatable variable can The Fortran 90 standard integer :: n type my_type introduced user-defined real, allocatable :: a(:) appear in the procedure real, allocatable :: array(:) call read_input(a, n) argument list end type my_type datatypes, but dynamic call use_array (a, n) end module my_mod allocation was restricted to deallocate(a) Enables dynamic memory end program test allocation in the called subroutine sub(n) pointers only routine based on sizing use my_mod The restriction was removed subroutine read_input(a, n) implicit none real, allocatable, intent(out) :: a(:) information integer, intent(in) :: n in Fortran 2003 integer, intent(out) :: n – Allocated (and perhaps type(my_type) :: t Allocation remains intact read *, n initialized) data is returned : even after going out of allocate(a(n)) allocate(t % array (n) ) read *, a to the caller : scope, if the type was ! no automatic deallocate here The caller must remember end subroutine sub defined in a module end subroutine sub to deallocate 22 23 Function return value as allocatable Assignment to an allocatable variable program pack A function return value can Assigning values to an real :: unpacked(1000) program test real, allocatable :: a(:) have the allocatable real, allocatable :: a(:), b(:) allocatable array via data or a = pack_data(unpacked) attribute real :: input(3) from another array triggers call use_packed_data(a) read *,input ! read 3 values from stdin deallocate(a) Works similarly as when a = input ! automatic allocate automatic allocation contains allocatable is a dummy ! a(:) = input(:) would be illegal ! – If array was already allocated, function pack_data(x) result (pk) print *,size(a) ! gives 3 but its size was not sufficient, real, allocatable :: pk(:) argument b = (/ 1,2,4,8,16 /) ! automatic alloc. real, intent(in) :: x(:) a = [ a, b ] ! automatic extent ok the extent size will be allocated integer :: npk print *,size(a) ! size now = 8 automatically npk = num_distinct_values(x) ! deallocate legal, but not necessary allocate(pk(npk)) May require additional deallocate(a, b) call get_distinct_values(x,...,pk,npk) end program test compiler flags, like –assume end function pack_data end program pack realloc_lhs (Intel) and –ew (Cray) 24 25

Allocatable scalars Transferring allocation : move_alloc

program test These are especially useful program test Provides allocatable character(:), allocatable :: ch, s with Fortran character strings, integer, allocatable :: a(:), b(:) equivalent of pointer integer :: ch_len whose length is decided at run allocate (a(1:5)) ; a(:) = 0 assignment: moves allocation ! explicit allocation time a(3) = 3 to another memory location ch_len = len('hello world!') print *, 'a=',a ! output: 0 0 3 0 0 allocate(character(ch_len) :: ch) Allocation can either explicit call move_alloc(a,b) As result previous allocation ch(:) = 'hello world!' ! a is now deallocated & b allocated ! implicit allocation or automatic (see previous becomes unallocated and new if (allocated(a)) print *, '>a=',a ! (s(:) = ... fails) slide) allocation holds previous data if (allocated(b)) print *, '>b=',b s = 'hello world!' print *,len(ch),len(s) ! prints 12 12 May require additional ! no need for explicit deallocates Arrays to (=A) and from (=B) ! deallocate legal, but not required compiler flags due to implicit deallocate(b) must have the same type and deallocate(ch, s) assignments of allocatables ! would be illegal: deallocate(a) rank end program test end program test

26 27 Improvements to the pointer assignments Improvements to the pointer assignments

It is possible to set the desired lower bounds of a pointer The target of a multi-dimensional array pointer can be a to any value, e.g. : one-dimensional array real, target :: state_budget(1917:2016) real, pointer :: memory(:), matrix(:,:), diagonal(:) real, pointer, dimension(:) :: a, b, c allocate(memory(n*n)) ! space for n-by-n matrix a => state_budget matrix (1:n, 1:n) => memory ! new stuff b => state_budget (1939:1945) diagonal => memory(::n+1) c(1939:) => state_budget (1939:1945) ! new After this the pointer array matrix operates across all the memory region that has been allocated, and the pointer diagonal refers to the diagonal elements

28 29

CONTIGUOUS attribute

Unit stride data access (contiguous) is significantly faster than a constant non-unit stride (non-contiguous) Typical optimization: use structure of arrays instead of CONTIGUOUS ATTRIBUTE arrays of structures In Fortran, having non-contiguous data is possible due to non-unit-stride array indexing, for example: vector(::3) ! Every third element

30 31 CONTIGUOUS attribute CONTIGUOUS attribute

In Fortran 2008, an assumed shape or a pointer variable Potential performance benefits can be declared contiguous – Array traversal and element address computation is simplified real, pointer, contiguous :: x(:) – Improved vectorization properties real, contiguous :: y(:,:) – No need to generate auto-dispatch code for contiguous data Testing contiguity separately logical :: is_contiguous(arr) – arr is an array of any type Simply contiguous: no need to declare the array as contiguous – the function returns .true. if arr is contiguous and .false. otherwise – A whole array of that is not of assumed shape or pointer – Continuous section of a contiguous array

32 33

ASSOCIATE construct

Deeply nested types have associate(x => some % nested % type %x, y => some % nested % type %y) a drawback of making the y = x + 1 code more difficult to SOME ADDITIONAL SCOPING FEATURES end associate understand ! Compare with ! some % nested % type % y = & – Not just a code ! some % nested % type % x + 1 readability problem: the compiler may not be able ! It is possible also to associate to perform certain ! with expressions, like associate (z => exp(-(x**2+y**2)) optimizations when ... nested types are used end associate Associate constructs can be nested (and named) 34 35 BLOCK construct Submodules subroutine swap(i, j) With the block construct, it is possible Submodules allow a module procedure to have its integer :: i, j to define variables within executable interface defined in a module while having the body of if (i < j) then code, variables being fully local to the the procedure defined in a separate unit block scope integer :: temp A submodule can itself be further extended by other temp = i – Variables defined within the construct “descendant” submodules i = j will not affect anything outside of it – j = temp Sibling submodules of a module cannot access the entities end block Entities without a save attribute will defined local to each other end if cease to exist at the end of the May allow a programmer to reduce the size end subroutine swap construct of the module proper block constructs can be nested Use of submodules also allows information hiding within the scope of a module 36 37

Submodules Summary module points type :: point real :: x, y It is now possible to obtain command line arguments, end type point interface access environment variables and run operating system module function point_dist(a, b) result(distance) type(point), intent(in) :: a, b commands in a standardized way real :: distance Dynamic memory allocation has also been improved by end function point_dist end interface means of enhancements to allocatable variables and end module points pointers submodule (points) points_a Contiguous attribute for arrays enables more contains module procedure point_dist optimization distance = sqrt((a%x - b%x)**2 + (a%y - b%y)**2) end procedure point_dist Associate and block constructs as well as submodules end submodule points_a provide more scoping control

38 39 Outline

I0 and G0 descriptors for dynamic output formatting Newunit for open Asynchronous I/O Fortran namelist Character encodings

Advanced topics in Fortran I/O

40 41

I0 and G0 edit descriptors

Dynamic sizing of real and integer valued output – Correspond to C language %d and %g formats Output fields become left justified with all the DYNAMIC FORMAT DESCRIPTORS unnecessary leading blanks (and precision for reals) removed integer :: i = 12345 real(kind=4) :: sp = 1.23e0 real(kind=8) :: dp = 1.234567890d0 write(*,fmt='("")') i,sp,dp ! output is

42 43 Newunit specifier

With the NEWUNIT specifier a file on an unused unit number (automatically chosen) is opened – It also returns the unit number that was chosen NEWUNIT IN OPEN integer :: unit ... open(newunit=unit,file='test') write(unit) some_data ...

44 45

Asynchronous I/O

Both input and output I/O input ts + 1 can be asynchronous i.e. – Other statements may be compute f(ts) ASYNCHRONOUS I/O executed whilst I/O is in progress in the background output ts

Caveat: It is an wait input implementation dependent factor whether I/O actually prepare f(ts+1) is performed asynchronously wait output

46 47 Asynchronous I/O Asynchronous I/O

program async_io program async_io Use the asynchronous integer :: id, ia(1000000) If you open your file for integer :: id attribute in open, read call initialize(ia,size(ia)) asynchronous I/O, but real :: a(1000000), b(1000) and/or write open(10, ..., asynchronous='yes') perform actual I/O call async_write(10,id,ia,size(ia)) open(10, ..., asynchronous='yes') operations in other read(10, id=id, asynchronous='yes') a – When all I/O statements call do_something_else_here( ) call do_something_with(b) are performed in the same wait(10, id=id) procedure, then also the wait(10, id=id) ! blocks until a read in routine, then the variables end program async_io arrays involved in the close(10) need not to be declared subroutine async_write(iu, id, ia, n) I/O must also be integer, intent(in) :: iu, n call do_something_with(a,size(a)) asynchronous declared with the end program async_io integer, intent(in), asynchronous :: ia(n) Variables cannot be integer, intent(out) :: id asynchronous attribute accessed whilst being write(iu, id=id, asynchronous='yes') ia processed by async I/O end subroutine async_write

48 49

Fortran namelist

The namelist construct allows for convenient parsing of input files integer :: iparam = 1 NAMELIST I/O real :: rparam = 1.0 logical :: lparam = .false. namelist /aparams/ iparam, rparam, lparam namelist /bparams/ ...... open (10, file=’params.in’, status=’old’, recl=80) params.in: read (10, nml=aparams) & params read (10, nml=bparams) iparam = 2, ... rparam = 2.0, bparam = T / & bparams ... / 50 51 Character encodings

use iso_fortran_env Fortran supports integer, parameter :: ascii = selected_char_kind for the ASCII and ('ascii') integer, parameter :: ucs4 = selected_char_kind UCS-4 (aka CHARACTER ENCODINGS ('ISO_10646') character(kind=ascii, len=26) :: alphabet Unicode, UTF-8) character(kind=ucs4, len=30) :: hello_world character

alphabet = "abcdefghijklmnopqrstuvwxyz" encoding hello_world = 'Hello World and Ni Hao -- ' & – Some compilers // char (int (z'4F60'), ucs4) & // char (int (z'597D'), ucs4) may have other sets supported open (output_unit, encoding='UTF-8') as well

52 53

Summary

Edit descriptors enable dynamic sizing of output The newunit specifier for open avoids the probing of an available I/O unit Asynchronous I/O enables overlap with computation Perhaps a less-known old feature of namelists enable neat parsing of input files Unicode characters are enabled

54 Choosing a compiler

Many different options – GNU, PGI, Intel, Cray, etc. Compatibility – Different proprietary intrinsics – Different rounding rules Performance Know your compiler – There is no universally fastest compiler – Depends on the application or even input

55 56

Documentation Compiler message system

Cray: man crayftn With Cray compiler, the explain command displays an Intel: ifort --help explanation of any message issued by the compiler b = fun() % a GNU: man gcc, man gfortran ^ ftn-1725 crayftn: ERROR FUNCTION_DT, File = funtest.f90, Line = 10, Column = 13 Unexpected syntax while parsing the assignment statement : "operator or EOS" was expected but found "%". http://docs.cray.com/PDF/Cray_Fortran_Reference_Man ... ual_85.pdf pmannin@sisu-login4:~/misc> explain ftn-1725 https://software.intel.com/en-us/intel-fortran-compiler- Error : Unexpected syntax while parsing the %s statement : "%s" was expected 17.0-user-and-reference-guide but found "%s". The syntax of this statement is incorrect. During parsing, the compiler was https://gcc.gnu.org/onlinedocs/gfortran/ looking for one thing, but found another. This is a general message used throughout the parser. The statement type that the compiler believes it is ... 57 58 Optimal porting Compiler optimization techniques

”Improving application performance Architecture-specific tuning without touching the source code” Theoretical peak – Tunes all applicable parameters to the defined Potential to get significant microarchitecture performance improvements with Vectorization little effort Performance Effort – Exploiting the vector units of the CPU (AVX etc.) Should be revisited routinely • Compilers – Improves performance in most cases – Hardware, OS, compiler and library • Compiler flags Loop transformations upgrades • Numerical libraries • Intranode placement – Fusing, splitting, interchanging, unrolling etc. – Can be automated • Internode placement – Effectiveness varies • Filesystem parameters

59 60

Compiler flag examples Doesn't the compiler do everything?

Feature Cray Intel GNU You can make a big difference to code performance with Compiler output -hlist=a -qopt-report=3 -fopt-info-vec how you express things Balanced (default) -O2 -O3 – Helping the compiler spot optimisation opportunities Optimization Aggressive -O3,fp3 -Ofast -Ofast –funroll- – Using the insight of your application Optimization loops – Architecture -h cpu= -x -march= Removing obscure (and obsolescent) “optimizations” in specific tuning older code Fast math -h fp3 -fp-model fast=2 -ffast-math . Simple code is the best, until otherwise proven

Diagnostics -Rbp -check bounds -fcheck=all This is a dark art, mostly: optimize on case-by-case basis (runtime bounds -check pointers etc checking) – First, check what the compiler is already doing

61 62 SIMD vectorization SIMD vectorization

SIMD instructions operate on multiple elements at one The compiler will only vectorize loops cycle Constant (unit) strides are best AVX/AVX2: 256 bits = 4 DP values, 8 SP values Indirect addressing will not vectorize (efficiently) – AVX2 brings FMA Can vectorize across inlined functions but not if a AVX512: 512 bits = 8 DP values, 16 SP values procedure call is not inlined Needs to know loop tripcount (but only at runtime) Scalar + = – i.e. DO WHILE style loops will not vectorize AVX + = AVX512 + = No recursion allowed

63 64

SIMD vectorization

See the compiler output on whether a hot-spot loop was vectorized or not Does a non-vectorized loop have true dependencies (i.e. do two iterations depend on each other)? – No: add the directive ivdep on top of the loop, alternatively use the OpenMP simd directive – Yes: Accept it, or try to rewrite the loop . Move if statements out of the loop If you cannot vectorize the entire loop, consider splitting it so that as much of the loop is vectorized as possible

65 Outline

Language interoperability issues Module iso_c_binding – Mapping of C intrinsic data types in Fortran – Mapping of C derived data types in Fortran – Utility functions Calling C routines from Fortran Language interoperability Mapping and accessing data from Fortran I/O interoperability

66 67

Language interoperability issues Before Fortran 2003

The problem of language interoperability has been Compilers often, but not always present for a long time – Passed all procedure arguments by-reference (i.e. by-address) – Before Fortran 2003 the standard did not specify the – Referred to functions and subroutines in lowercase letters and calling convention in detail by adding an additional underscore after, e.g. Func becomes Interoperability is becoming even more important than func_ to be found by the linker before – Passed function return values via stack – Steering of computation with scripting languages – Passed character strings by-reference, with an additional – Utilizing libraries that are written using other programming hidden length argument, passed by value. languages . Note: Fortran character strings are not null-character terminated

68 69 Portability Interoperability with Fortran 2003: example

program F_calls_C The traditional way to have interoperability with C use, intrinsic :: iso_c_binding implicit none requires a priori knowledge of lowercase and underscore integer(kind=C_INT), PARAMETER :: n = 8 policy used by the compiler real(kind=C_DOUBLE), dimension(n) :: plm integer :: err interface Complex cases, such as passing character strings or function gsl_plm_array(lmax, m, x, res) & & bind(c,name=‘gsl_sf_legendre_Plm_array') result(rval) passing arguments by value, are generally very error use, intrinsic :: ISO_C_BINDING prone and may lead to catastrophic errors at runtime integer(kind=C_INT) :: rval integer(kind=C_INT), value :: lmax, m ! Pass by value real(kind=C_DOUBLE), value, intent(in) :: x ! intent(in)=const float Often a separate layer of C functions was needed for real(kind=C_DOUBLE) :: res(*) interoperability end function gsl_plm_array end interface

err = gsl_plm_array(n+1, 2, 0.234_C_DOUBLE, plm) print *, plm end program F_calls_C

70 71

The iso_c_binding module Calling C routines

Fortran 2003 intrinsic module iso_c_binding is used A Fortran subroutine maps to a C function with void with result use, intrinsic :: iso_c_binding A Fortran function maps to a C function returning a Module contains value – Access to named constants that represent kind type Binding label in bind(c, name=

72 73 Mapping of C intrinsic data types VALUE attribute

Interoperable mappings for the most commonly used C For scalar dummy arguments, the value attribute can be intrinsic data types used to pass the value to the procedure

Traditional “old” Fortran Fortran declaration C data type – Actual argument is copied integer*2 integer(c_short) short int – Dummy argument can be altered but result is not visible to integer*4 integer(c_int) int caller integer*8 integer(c_long) long int – Argument must not be a pointer, allocatable, have real*4 real(c_float) float intent(in) or intent(inout), be a procedure or have real*8 real(c_double) double the volatile attribute character*1 character(1,c_char) char value attribute is not limited to procedures with the Note that Fortran does not support unsigned integers bind attribute

74 75

Exercise Answer program FrobProg Find 6 mistakes program FrobProg use, intrinsic :: iso_c_binding implicit none implicit none C prototype, assumed real :: a, b real(kind=c_double) :: a, b integer*2 :: c integer(kind=c_int) :: c correct: interface interface function c_Frobnicate(a,b) & function c_Frobnicate(a,b) & int c_frob(double a, result(c) & result(c) & double* b); bind(c, name='c_Frob') bind(c, name='c_frob') real(kind=c_double) :: a import real(kind=c_double) :: b real(kind=c_double), value :: a integer(kind=c_int) :: c real(kind=c_double) :: b end function integer(kind=c_int) :: c end interface end function a = 2.0**10 end interface a = 2.0**10 b = 3.0**5 b = 3.0**5 c = C_Frobnicate(a,b) c = C_Frobnicate(a,b) end program FrobProg end program FrobProg 76 77 Mapping of derived data types Mapping of derived data types

In many cases it is possible to describe Fortran derived For a derived type to be interoperable between C and data types in terms of C data structures (and vice versa) Fortran, variable ordering, data types and array sizes To be interoperable, Fortran derived type must have the must be identical bind(c) attribute – Variable names do not need to be identical – sequence or extends keywords are forbidden Typical usage comes through function calls, for instance a Individual Fortran components in the data type must be Fortran routine extracting information from a C routine of an interoperable type – zero-sized array components are forbidden – allocatable and pointer components are forbidden C types cannot be unions nor structures with bit-fields

78 79

Mapping of derived data types: example Mapping of array data

/* C data structure */ program typedeftest ! Fortran array declarations Array indexing typedef struct { use, intrinsic :: ISO_C_BINDING ! compatible with C int count; implicit none REAL (c_double) :: z1(5), z2(3:5,17) – type, bind(c) :: C_type_in_Fortran C: starts from zero (0) double d[100]; INTEGER (c_int) :: ivec(-4:7) } C_type; integer(kind=c_int) :: count – Fortran: by default, starts real(kind=c_double) :: d(100) end type C_type_in_Fortran from one (1) /* C-function example */ interface void C_func(C_type *p) { subroutine testf(P) & Multidimensional ordering p->count = 1; bind(c, name='C_func') p->d[0] = 1.23; import – C: “row-major”, grow /* Corresponding C-declarations */ } type(C_type_in_Fortran) :: P fastest along the last end subroutine testf end interface double z1[5], z2[17][3]; dimension type(C_type_in_Fortran) :: X int ivec[12]; call testf(X) – Fortran: “column-major”, write (*,*) X % count, X % d(1) grow fastest along the end program typedeftest first dimension

80 81 Mapping of character data Mapping of character data: example

program F_atoi C procedures expect character strings to be null use, intrinsic :: iso_c_binding terminated implicit none interface – A null character has to be appended to a string before ! C-prototype: int atoi(const char *num); calling the C procedure function atoi(num) bind(c, name='atoi') result(x) use, intrinsic :: iso_c_binding Module iso_c_binding contains many character character(kind=c_char), intent(in) :: num(*) integer(kind=c_int) :: x constants: c_alert, c_backspace, c_form_feed, end function atoi end interface c_carriage_return, c_horizontal_tab, character*(*), parameter :: year = '2013' c_null_char, c_new_line, c_vertical_tab character(:,kind=c_char), allocatable :: number allocate(character(len=len(YEAR)+1)::number) ! Space for c_null_char number = trim(year) // c_null_char write (*,'(A,I0)') 'atoi('//year//')=',atoi(number) end program F_atoi

82 83

Accessing global data Accessing global data: example

Global data: /* Global C data */ ! Fortran global data must be ! defined in a module – Fortran: is declared in modules or in common blocks int number; module something float Array[8]; use, intrinsic :: ISO_C_BINDING – C: is declared outside any local scope and referenced via double slice[3][7]; implicit none extern elsewhere struct coord { integer(c_int), bind(c) :: number float x, y, z; real(c_float) :: my_array(8) C global data is accessed in Fortran with }; bind(c, name=‘Array’) :: my_array bind(c, name=

84 85 Interoperability with C pointers Interoperability with C pointers

Pointer concept is completely different in Fortran and C iso_c_binding contains inquiry and manipulation – Fortran: pointer is a storage descriptor which holds an functions for type(c_ptr) and type(c_funptr) address of the target object and, for arrays, the related When calling C routines with pointer arguments, C bounds and strides storage locations are sometimes needed. These can be – C: pointer is an address to a memory location acquired with c_loc and c_funloc: Module iso_c_binding contains Fortran type type(c_ptr) function c_loc(x) definitions for C pointers and C function pointers type(c_funptr) function c_funloc(x) – type(c_ptr) abstracts C pointers where x in interoperable. – type(c_funptr) abstracts C function pointers – For an interoperable Fortran entity x, return type(c_ptr) (or type(c_funptr)) containing the C address of x 86 87

Interoperability with C pointers Interoperability with C pointers

Checking association status of C pointer can be done with When access to the actual storage elements is needed, c_associated function: the contents of C pointer must be first transferred to logical function c_associated(ptr1[, ptr2]) Fortran form. This can be done with c_f_pointer where ptr1 and ptr2 are TYPE(C_PTR) or routine: type(c_funptr) subroutine c_f_pointer(c_p, f_p [, shape]) – Return .false. if ptr1 is a NULL C pointer or if ptr2 is where c_p is scalar type(c_ptr), f_p is a Fortran present and has a different value. Otherwise return pointer. If present shape is a rank-1 integer array with size .true. equal to rank of f_p – Set f_p to point to point to c_p. If f_p is an array, shape must be present. When f_p is an array, its lower bounds are set to 1 and upper bounds according to shape

88 89 Interoperability with C pointers: example Interoperability with C pointers: example

/* C data structure */ module my_typedef program mapcpointer use my_typedef typedef struct { use, intrinsic :: iso_c_binding type(simpletype) :: x int count; implicit none integer(c_int), parameter :: n = 10 double *d; /* dynamic */ type, bind(c) :: simpletype real(c_double), pointer :: xd(:) } SimpleType; integer(kind=c_int) :: count type(c_ptr) :: d call my_func(x, n) end type simpletype call c_f_pointer(x % d, xd, (/ n /)) ! map data to fortran pointer /* C-function example */ interface print *,x % count, xd(1) #include // for malloc subroutine my_func(p,n) & end program mapcpointer void my_func(SimpleType *p, int n) { p->count = n; bind(c,name='my_func') p->d = malloc(n * sizeof(*p->d)); import if (n > 0) p->d[0] = 1.23; type(simpletype) :: p } integer(c_int), value :: n end subroutine my_func end interface end module my_typedef

90 91

Interoperating with C binary I/O Interoperating with C binary I/O: example

Interoperability problems extend to binary (unformatted) integer(kind = 4) :: array(100) Data written from Fortran open(11, file='file.bin', & I/O form='unformatted', & as unformatted sequential access='sequential', & file – Fortran: unformatted I/O files contain data and the status='unknown') associated record delimiters (4 or 8 bytes long) Contains record ! write 400 bytes + 8 – C: binary I/O files contain data write (11) array(1:100) delimiters, usually 4 or 8 ! write 200 bytes + 8 Writing and reading files with STREAM I/O in Fortran bytes at the beginning and write (11) array(1:50) end of each record solves data interoperability problems close (11) Corresponding C code has – Stream I/O files do not have the facility to backspace or to ”decipher” these skip over records File ‘file.bin’ in a binary format: delimiters 400 | ... ARRAY(1:100) ...... | 400 200 | ... ARRAY(1:50) ... | 200

92 93 Interoperating with C binary I/O: example Summary use, intrinsic :: iso_c_binding When using With iso_c_binding, Fortran has standardized integer(kind = c_int) :: array(100) open (11, file='file.bin', & access='stream', no mechanisms to access source code and libraries written form='unformatted', & additional record in C, as well as define Fortran accessible from C access='stream', & status='unknown') delimiters are output Interoperability should be used with well-defined ! write 400 bytes interfaces write (11) array(1:100) /* The corresponding C-reader is now ! write 200 bytes easy to implement */ – Complicated structures or calling sequences are not write (11) array(1:50) int array[100]; recommended FILE *fp = fopen (”file.bin”, ””); close (11) Use of stream I/O is recommended if data is to be fread (array, sizeof(*array), 100, fp); File ‘file.bin’ in a binary format: fread (array, sizeof(*array), 50, fp); interoperated from C ... ARRAY(1:100) ...... ARRAY(1:50) ... fclose (fp)

94 95 Fortran coarrays?

Adds parallel processing as a part of Fortran language – Only small changes required to convert existing Fortran code to support a robust and potentially efficient parallelism A Partitioned Global Address Space (PGAS) approach – Unlike OpenMP, which operates over shared memory, the coarrays implement parallelism over “distributed shared Parallel programming with Fortran coarrays memory” – This approach is potentially massively parallel Have been integrated into Fortran 2008 standard

96 97

Some history Partitioned global address-space model

“Coarray Fortran” (formerly known as F--) was created by Partitioned global address-space (PGAS) programming Bob Numrich and John Reid in the 1990s model: each PE can access its local data and (a portion – Reached the current form in 1998 as a simple extension to of) remote i.e. other PE’s data with single-sided put/get Fortran 95 for parallel processing (=write/read) operations Coarrays have been in production for many years but mainly on Cray hardware Global address space

A set of core features are now part of the Fortran 2008 Local memory standard, development continues in the upcoming Fortran 2015 standard CPU CPU CPU – Not another language or extension, hence “Fortran coarrays” rather than “Coarray Fortran” 98 99 Why bother? Word of warning about the compiler support

Real-world experiences Still today, compiler support for the coarray syntax is a bit of improved limited performance when Cray compiler implements coarrays properly replacing MPI – automatic recognition; no flags needed; production-level communication kernels performance with PGAS Intel compiler supports coarrays but on top of MPI communication (compile with -coarray) For example: partial – Full support coming in v18 compiler suite inclusion of coarrays to ECMWF’s IFS G. Mozdzynski et al.

100 101

Word of warning about the compiler support

GNU compiler (5.1 and newer) supports building a coarray code upon an external library (OpenCoarrays on top of either MPI or GASNet library) – compile with -fcoarray=lib and link with the needed FORTRAN COARRAYS: BASIC CONCEPTS libraries G95 compiler features also partial support Not supported by the PGI compiler

102 103 Execution model Execution model

Upon startup a coarrays program gets replicated into a Image’s data are visible number of copies called images (i.e. processes) within the image only – integer :: a – The number of images is usually decided at the execution except for data declared as integer :: b[*], c[*] time special arrays: coarrays b b b Each replica (image) runs asynchronously in a non- Coarrays One-sided memory c c c (shared) coupled way until program-controlled synchronization accesses enable movement a a a of coarray data across Private different images of a data Image 1 Image 2 Image 3 program

104 105

Execution model Time for “Hello World” t program hello_world num_images() returns the program foo a=1 a=1 integer :: a[*] implicit none number of images in use for this ... Image #1 program foo a=2 a=1 write(*,*) ‘Hello world from ‘, & integer :: a[*] Image #2 run (usually set outside the programcall mpi_initfoo (rc) a=3 a=3 this_image() , ‘of’, & ...integer :: a[*] Image #3 program, by the environment) call mpi_init(rc) a=4 a=4 num_images() program foo Image #4 ...integer :: a[*] this_image() ... end program hello_world returns the image number in concern –

me = this_image() if (me == 2) a = a[1] numbered from 1 to a = me num_images() This program is a trivially parallel i.e. each image does not explicitly share any data and runs seemingly independently 106 107 Declaring coarrays Declaring coarrays

Images become meaningful in parallel programming The square brackets [*] denote allocation of coarrays context when their data are remotely accessible through over images coarrays The round brackets “( )” mean local array accesses, and integer :: scalar[*] the “[ ]” are meant for remote data array accesses only – Declares a scalar with a local instance on every image integer, dimension(64) :: vector[*] integer :: global(3)[*], local(3) – Declares a vector with a local instance consisting of 64 global(:) = this_image() * (/ 1, 2, 3 /) ! local initialization elements on every image local(:) = global(:)[1] ! copy from image number 1 to every ! image

108 109

Synchronization Synchronization: corrected remote copy

We need to be careful when updating coarrays We need to add barrier synchronization of all images – Is the remote data we are copying valid i.e. up to date? before the copying takes place to be absolutely sure we – Could another image overwrite our data without notice? are getting the most up to date copy of global(:)[1] – How do we know if the remote update (fetch) is complete? global(:) = this_image() * (/ 1, 2, 3 /) Fortran provides synchronization statements, e.g. adds a sync all local(:) = global(:)[1] barrier for synchronization of all images sync all To be absolutely sure we are getting correct result, we need to modify our previous copying example a little …

110 111 Synchronization Case study: parallel sum of array elements

In this particular case – since only the image #1 is in a Array originally on image #1 (I1) critical position, we could use an alternative, pairwise form of synchronization (recommended) I1 I2 Parallel algorithm – Scatter global(:) = this_image() * (/ 1, 2, 3 /) . Half of the array is read to image #2 (I2) sync images(1) local(:) = global(:)[1] – Compute . I1 & I2 sum independently their segments sync images can also take a vector of image ranks with – Reduction which to syncronize . Partial sum on I2 written to I1 . I1 sums the partial sums sync images( (/1, 2/) )

112 113

Case study: parallel sum Case study: parallel sum

Step 1: scattering - I2 fetches a lower part of Step 2: Both images compute local the array sums I1 I2 I1 I2 integer, dimension(n) :: array[*] psum = sum(array(1:n/2)) integer :: psum[*] ... if (this_image() == 2) then array(1:n/2) = array(n/2+1:n)[1] end if ∑=

∑=

114 115 Case study: parallel sum Interim summary: basic concepts

Concept of images and some related functions Step 3: Image 1 reads the partial sum from Image 2 and computes the total sum How to declare “codimensional” arrays (coarrays) and I1 I2 access their data Image synchronization sync all if (this_image() == 1) then t_sum = psum + psum[2] ∑= end if

∑=

116 117

Allocatable coarrays

It is possible to define dynamic coarrays integer, allocatable :: a(:)[:] ... DYNAMIC MEMORY ALLOCATION FOR COARRAYS allocate(a(1000)[*]) – allocate and deallocate imply implicit synchronization – all images must participate, i.e., an implicit sync all occurs – The local size must be the same on each image or otherwise a runtime error will occur – The last codimension must have an asterisk “*”

118 119 About pointers with coarrays Variable length coarrays via allocatable

A coarray declaration cannot have the pointer attribute Create a new Fortran data type with allocatable Thus the following definition is illegal: component in it – and place it in a module type mytype real, pointer :: ptr[:] ! this is invalid Fortran real, allocatable :: data(:) However, we can define a new type, where type end type mytype component(s) have pointer (or allocatable) attributes Then declare a coarray of that type, e.g. – And then define a coarray being of that type type(mytype) :: co[*] Used in dynamic dimensioning of coarrays Allocate on each image, but different size allocate(co % data(10 * this_image())) – In this way, the local sizes on every image can be different Refer to the data of another image as, e.g., element = co[1] % data(1)

120 121

Coarrays in procedures

When declared in a subroutine or function, a coarray must be one the following – Declared as a dummy argument to the procedure COARRAY VARIABLES AND PROCEDURES – Have allocatable and/or save attribute Re-mapping of corank is also allowed A coarray in procedure cannot be an automatic array

122 123 Coarrays in procedures subroutine some_routine(n, array_1, co_array_1, array_2, co_array_2, co_array_3) ! Procedure arguments integer :: n integer :: array_1(n), array_2(:) ! explicit and assumed shape integer :: co_array_1(n)[*], co_array_2(:)[*] ! explicit and assumed shape integer :: co_array_3[:] ! illegal: assumed co-shape COARRAYS AND (FILE) I/O ! Procedure variable declarations integer :: local_array_2(1000) ! local array integer, save :: local_co_array_3(1000)[*] ! coarray with save-attribute integer :: local_co_array_2(1000)[*] ! illegal: coarray without save integer, allocatable :: local_co_array_4(:)[:] ! coarray with allocatable integer :: local_co_array_1(n)[*] ! illegal: automatic co-arrays integer, pointer :: local_co_array_5(:)[:] ! illegal: coarray with pointer end subroutine some_routine

124 125

Coarrays and I/O I/O conventions

Each image has its own, independent set of Fortran input Do stdin with the image integer :: param[*] = 0 and output units ... #1 only and broadcast the The default input (“stdin”, i.e. read(*,…) etc) is pre- if ( this_image() == 1 ) then data read *, param connected to the master image (image #1) only do i = 2, num_images() – Do stdin with the image #1 only and broadcast the data param[i] = param end do – The same applies to command-line input end if sync all The default output (“stdout”) is connected to all images – Output is merged (in no particular order) into one output stream – The standard error (“stderr”) is redirected to the “stdout”

126 127 Parallel I/O with coarrays Parallel I/O with coarrays

Spokesman strategy Every man for himself – One image takes care of all I/O using – Each image writes its local normal read/write routines results to a separate file . Data gathered explicitly to the spokesman – Good bandwidth image – Difficult to handle a huge number of – Does not scale, single writer is a bottleneck files in later analysis . Single I/O client not able to fully utilize a – Can overwhelm filesystem (for parallel filesystem example Lustre metadata) – Can be good option when the amount of data is small (e.g. input files, log files)

128 129

Sketchy examples Parallel I/O with coarrays

Spokesman strategy if (this_image() == 0) then Subset of writers/readers open(unit=10, file=”data.dat”, form=’unformatted’, access=”stream”) – Good compromise but more complex do i = 1, num_images() write(10, position=”append”) data[i] – Number of readers/writers could be end do close(10) e.g. sqrt(num_images()) end if – If readers/writer images are dedicated, Every man for himself some computational resources will be wasted write(filename,’(A5,I0,A4)’) ’file.’, this_image(), ’.dat’ funit = 10+this_image() open(funit, file=filename, form=’unformatted’, access=’stream’) write(funit) data close(funit)

130 131 Summary of the second part

How to define coarrays with varying-sized local parts Using coarrays in procedures I/O and Fortran coarrays FURTHER TOPICS: MULTIPLE CODIMENSIONS

132 133

Multiple codimensions Multiple codimensions and this_image

Multi-dimensional coarrays are possible also in terms of In addition to returning the local image number, codimension this_image can also return the cosubscripts of a coarray – The last codimension must always be declared with An alternative form of the function is asterisk “*” this_image(coarray[, dim]) – The bounds of codimensions start from 1 by default but where coarray is a coarray of any type. can be adjusted – If dim is absent, the result is an array of rank one with the size equal to the corank of the coarray holding the integer, codimension[2,*] :: scalar cosubscripts of the image holding the coarray real, dimension(64,64), codimension[4,*] :: matrix real, dimension(128,128,128), codimension[0:1,0:1,0:*] :: grid – If dim is present, integer cosubscript of the coarray corresponding to dim is returned.

134 135 this_image: example Multiple codimensions and image_index real :: a(7,7)[0:3,3:*] The set of images fills the Occasionally it is useful to perform an opposite ! Prints 2 4 on image 7 out of 10 print *, this_image(a) coarray contiguously conversion of this_image, i.e., to extract image index ! Prints 2 on image 7 out of 10 starting from the first based on some given cosubscripts print *, this_image(a,1) ! Prints 4 on image 7 out of 10 codimension To get an image index based on the cosubscripts of a print *, this_image(a,2) See the table for an coarray, image_index intrinsic function can be used example case where 10 image_index(coarray, sub) images in total are used – If sub holds valid cosubscripts for coarray, the return value

Image 1 2 3 4 5 6 7 8 9 10 is an integer describing the corresponding image index this_image(a) (0,3) (1,3) (2,3) (3,3) (0,4) (1,4) (2,4) (3,4) (0,5) (1,5) – Otherwise the result is zero

Cosubscripts of array a when running with 10 images in total

136 137

image_index: example Obtaining lower and upper bounds for coarrays real :: a(7,7)[0:3,3:*] If the given cosubscripts There are query functions for obtaining upper/lower print *, image_index([2,4]) ! 7 print *, image_index([2,5]) ! 10 are valid, index of the cobounds image holding the data is lcobound(coarray [, dim]) returned. Zero return ucobound(coarray [, dim]) value indicates invalid They return an integer array, or a scalar in case of a single [x,y] 3 4 5 cosubscripts dimensional query (with optional “dim”) 0 1 5 9 See the table for an 1 2 6 10 example case where 10 For the array a from the previous page 2 3 7 0 images in total are used lcobound(a) is 0 3 3 4 8 0 ucobound(a, dim=1) is 3 Image indexes returned by image_index([x,y]) for array a when running with 10 images in total

138 139 Case study: parallel matrix-matrix multiply Case study: parallel matrix-matrix multiply

Consider a blockwise matrix-matrix multiply Data distribution: image 푖, 푗 holds data for blocks 퐴푖푗,

푁 } 푛 퐵푖푗 and 퐶푖푗 푋11 ⋯ 푋1푁 퐶 = 퐴퐵 ⟺ 퐶푖푗 = 퐴푖푘퐵푘푗 , 푋 = ⋮ ⋱ ⋮ ⋮ To compute 퐶푖푗, each image needs to remotely read 퐴푖푘 푋 ⋯ 푋 } 푘 푘=1 } } 푁1 푁푁 and 퐵푘푗, for 푘 = 1, ⋯ , 푁, 푘 ≠ 푗 from other images 푛 … 푘 Remotely 퐵21 read by image (3,2) 퐵22 = = 퐶32 퐴32 퐵23 퐶32 퐴31 퐴32 퐴33 퐴34 퐵23

퐵24 Held by image (3,2)

140 141

Case study: parallel matrix-matrix multiply Case study: parallel matrix-matrix multiply

subroutine parmatmul(n, a, b, c) type :: blockmatrix Size of a block can differ The block matrix-matrix implicit none real(kind=dp), allocatable :: mtr(:,:) from one image to integer, intent(in) :: n products 퐴 퐵 , for 푘 = end type blockmatrix 푖푘 푘푗 type(blockmatrix) :: a[n,*],b[n,*] type(blockmatrix), allocatable :: & another, i.e., variable 1, ⋯ , 푁 readily computed a[:,:], b[:,:], c[:,:] type(blockmatrix) :: c[n,*] integer :: i, j, in, jn, n, ntot, bsize length coarrays are used integer :: i, j, k with matmul intrinsic i = this_image(a,1) function ! allocate coarrays (global) For simplicity, assume that j = this_image(a,2) allocate(a[n,*],b[n,*],c[n,*]) the number of images is a c % mtr(:,:) = 0.0_dp do k = 1, n ! allocate block (local) square number c % mtr(:,:) = c % mtr(:,:)+ & i = this_image(a,1) matmul(a[i,k] % mtr(:,:), & j = this_image(a,2) jn = min(j*bsize, ntot)-(j-1)*bsize b[k,j] % mtr(:,:)) in = min(i*bsize, ntot)-(i-1)*bsize end do allocate(a % mtr(in,jn), & end subroutine parmatmul b % mtr(in,jn), c % mtr(in,jn))

142 143 Critical sections

integer :: mysum[*] = 0 At times it is necessary to integer :: ans ! the reference result integer :: me, npes, p execute a block of code by p = 1 ! fixed image# [p] one image at a time to avoid ADVANCED SYNCHRONIZATION me = this_image() race conditions whilst npes = num_images() ans = (npes * (npes + 1))/2 updating coarrays at a fixed critical image location mysum[p] = mysum[p] + me end critical critical – end critical sync all is the construct to use Remember to synchronize especially after the section

144 145

Lock variables sync memory type (lock_type) :: lockvar[*] Use of lock variables ... Enforces any pending if (this_image() == p) then resembles entering to a integer :: a[*] coarray writes to finish lock(lockvar[q]) critical section, except that before proceeding p[q] = p[q] + 1 it is more flexible, since you if (this_image() == 1) then unlock(lockvar[q]) a[2] = 123 ! put 123 to image#2 Waits until any preceding sync images (q) can have many distinct locks ! image#2 may not yet see the update writes are visible to the end if active simultaneously sync memory images in concern – unlike just one with a ! image#2 now sees the update Flushes also any cached critical section end if reads on image’s coarray(s) Not actual synchronization at all, however sync all / sync images also imply sync memory

146 147 Technical specification

Fortran, coarrays included, continue to evolve Technical specification (TS) outlines the future developments (now part of the Fortran 2018 draft) UPCOMING COARRAY FEATURES In the pipeline are the following enhancements – Image teams – Failed images – Events – Collective intrinsic functions for coarrays – Improvements in atomic updates

148 149

TS: Image teams TS: Failed images

Learning lessons from MPI and its communicators, it is An image may have died during the run – consequently a useful to have subsets (teams) of images sync images would fail (or stall) if it contains the failed image Define a new team at runtime sync images (image_list, stat = st) – For example divide images into two teams if (st == stat_failed_image) then ... use iso_fortran_env failed_images() function returns the count of failed type(team_type) odd_even images at the given moment ... – Another possibility is to use num_images(failed = id = mod(this_image(),2) + 2 .true.) form team (id, odd_even) Together with the new team capability, fault-tolerant Synchronize within a team only programming becomes possible (still non-trivial) sync team (odd_even) 150 151 TS: Collectives Summary of the last part

Collective operations are essential part of parallel Use of multiple codimensions sometimes handy programs – Using specific intrinsic functions lcobound, ucobound, TS contain definitions for this_image and image_index usually clarify the – co_broadcast situation – co_sum, co_min, co_max More advanced synchronization across images – co_reduce Fortran 2015 features: image teams, fault-tolerant Faster than corresponding explicit communication utilities, collectives patterns (with Cray compiler at least) – Many of these already supported by GNU and Cray compilers

152 153 DERIVED TYPES AND BEYOND

Types and procedure pointers

154 155

Derived types Derived types: examples of declaration type person integer :: age For large applications, type base_type character(len=20) :: name using only intrinsic types integer :: field = 1 end type person is often insufficient end type base_type type(person_type) :: p1 type :: array_type integer :: p1_age Use derived types to character(len=20) :: p1_name integer, pointer :: field1(:) group data ! Equivalently integer, dimension(:), pointer :: field1 p1_age = 20 – Code becomes easier integer :: field2(10) p1_name = 'John' end type array_type p1 = person(p1_age, p1_name) to read and maintain call print_person_bad(p1_age,p1_name) – Cleaner interfaces type recursive_type call print_person_good(p1) type(recursive_type), pointer :: next – Encapsulation of data end type recursive_type

156 157 Type initialization Derived types: initialization example module mymod Fortran 2003 allows the ! Type definitions ! Type allocations type person type pointer_type type(body) :: data1, data2 integer :: age = 0 use of keywords and also type(body) :: data type(pointer_type) :: r1 character(len=20) :: name optional arguments in type(pointer_type), pointer :: next type(pointer_type), target :: r2 end type person end type pointer_type type(alloc_type) :: a1, a2 end module mymod structure constructors type alloc_type ! Pointer components program test character(len=3) :: type r2 = pointer_type(data2, null()) use mymod real, allocatable :: data(:) r1 = pointer_type(data1, r2) type(person) :: p1, p2, p3 end type alloc_type p1 = person(50, ‘Peter’) ! Allocatable components p2 = person(age=43, name=‘Cindy’) a1 = alloc_type('nul',null()) ! age field has default value, a2 = alloc_type('vec',[1.2,2.3,4.5]) ! can omit keyword p3 = person(name=‘John’) end program test

158 159

Derived types: assignment semantics Derived types: assignment semantics

Assignment semantics differ by component type type(pointer_type) :: r1, r2 type(alloc_type) :: a1, a2 – Pointer components: shallow copy, i.e., pointer assignment is performed r1 = r2 ! Equivalent to ! r1 % data = r2 % data – Allocatable components: deep copy, i.e., the required ! r1 % next => r2 % next storage is automatically allocated and data is copied a1 = a2 ! Equivalent to Assignment can be overloaded ! a1 % type = a2 % type ! if (allocated(a1 % data)) deallocate(a1 % data) ! allocate(a1 % data(size(a2 % data))) ! a1 % data = a2 % data

160 161 Generic procedures

Procedures which perform similar actions but for different data types can be defined as generic procedures GENERIC PROCEDURES Procedures are called using the generic name and compiler uses the correct procedure based on the argument number, type and dimensions – Compare with ”overloading” in C++ Generic name is defined in INTERFACE section

162 163

Generic procedures example Operator overloading

interface operator(+) MODULE swapmod PROGRAM switch Instead of procedure module procedure evil_sum IMPLICIT NONE USE swapmod end interface operator INTERFACE swap IMPLICIT NONE name, use operator or MODULE PROCEDURE swap_real, swap_char CHARACTER :: n, s assignment keyword interface assignment(=) END INTERFACE REAL :: x, y module procedure evil_assign CONTAINS n = 'J' – Operator: 1 or 2 input end interface assignment SUBROUTINE swap_real(a, b) s = 'S' REAL, INTENT(INOUT) :: a, b x = 10 arguments with function evil_sum(x,y) result(z) REAL :: temp y = 20 intent(in) type(base), intent(in) :: x,y temp = a; a = b; b = temp PRINT *, x, y type(base) :: z END SUBROUTINE PRINT *, n, s – Assignment: 2 input z % val = x % val + y % val + 1 SUBROUTINE swap_char(a, b) CALL swap(n, s) end function CHARACTER, INTENT(INOUT) :: a, b CALL swap(x, y) arguments with CHARACTER :: temp PRINT *, x, y intent(inout) or subroutine evil_assign(x,y) temp = a; a = b; b = temp PRINT *, n, s type(base), intent(inout) :: x END SUBROUTINE END PROGRAM intent(out) and type(base), intent(in) :: y END MODULE swapmod x % val = x % val + y % val + 1 intent(in) end subroutine

164 165 Procedure variables

module test Pass subroutine as an contains function apply(f, x) result(y) actual argument to procedure(add_one) :: f another subroutine integer, intent(in) :: x PROCEDURE VARIABLES AND ABSTRACT integer :: y Usually called ”callback” y = f(x) INTERFACES end function function function add_one(x) result(y) Here add_one is used to integer, intent(in) :: x integer :: y type the dummy y = x + 1 argument f strongly end function end module test

166 167

Interfaces Interfaces interface Programs may have to call module test subroutine not_dangerous(a, b, c) interface Instead of explicit integer :: a, b, c routines outside the function int_fun(x) result(y) module function one end subroutine not_dangerous program itself (external integer, intent(in) :: x end interface integer :: y can provide an libraries) or provide end function interface to type f in integer :: x, y, z services to others APIs end interface x = 1; y = 1; z = 1 apply function ! Call external subroutine without Wrong calling arguments contains ! an interface call dangerous(x,y,z) may lead to errors during function apply(f, x) result(y) ! Call external subroutine with the executable linking procedure(int_fun) :: f ! an interface integer, intent(in) :: x call not_dangerous(x,y,z) phase or even when the integer :: y executable is being run y = f(x) end function end module test

168 169 PROCEDURE statement Procedure pointers

Procedure pointer variable can be defined as follows: Procedure statement procedure(int_fun), pointer :: fptr

procedure([iface]), attrs :: decl-list allows definition of fptr => add_one where pointers to procedures print *, fptr(1) fptr => substract_one – attrs is one of: public, private, bind, intent, (explicit or implicit print *, fptr(1) optional, pointer, save interface) function substract_one(x) result(y) integer, intent(in) :: x – and list of declarations decl: name [=> null-init] A derived type can have integer :: y procedure pointers as y = x-1 null-init references intrinsic function null() with end function no arguments and may only appear with pointer components function add_one(x) result(y) integer, intent(in) :: x Implicit interfaces similarly to external statement integer :: y y = x+1 end function

170 171

Abstract interfaces Abstract interface example

Use abstract interface to stress that there is no external ! Abstract interface definition ! Procedure definition abstract interface procedure(sub_with_no_args) :: sub1 procedure of the same name procedure(trig) :: mysin, mycos subroutine sub_with_no_args() Declaration of a generic interface: end subroutine sub_with_no_args ! Implicit interfaces abstract interface ! No known return value nor function trig(x) result(y) ! arguments abstract-interface-body real, intent(in) :: x procedure() :: x real :: y ! Real return value, arguments end interface end function trig ! unknown procedure(real) :: y Routines declared in abstract-interface-body end interface must not have an actual implementation in the current ! Compiler error: scope print *, trig(1.0)

172 173 Summary

Group data/variables using derived types Interfaces – with procedure statement provide type-safe access to external routines, – provide generic procedure semantics: multiple subroutines, one name – provide operator overloading semantics Procedure pointers provide functionality similar to C function pointers and can be bound to derived types

174 EXTENDING TYPES

Type-bound procedures and extensions

175 176

Type extension Type extension type :: base Type extensions are To be eligible for type extension, parent type must be integer :: field1 end type base used to extend the extensible, i.e., be a derived type with neither sequence functionality of an or bind(C) attributes type, extends(base) :: extended integer :: field2 existing type Extensions can have zero additional components end type extended Type extensions are All components as well as type parameters of the parent backwards compatible, type are inherited by extended type because every instance Extended type has an additional component of the of extended is also an parent type with the name parent type instance of base

177 178 Type extension example

! Type describing a person type :: person Person character(len=10) :: name name: character integer :: age age: integer end type person … POLYMORPHIC OBJECTS ! Employee has fields: ! name, age, salary, person Employee type, extends(person) :: employee integer :: salary salary: integer end type employee …

! Staff has fields: ! name, age, salary, employee Staff type, extends(employee) :: staff … end type staff …

179 180

Polymorphism Polymorphism: CLASS type :: base A polymorphic variable: ”a The type of a polymorphic variable may vary at run time integer :: field1 end type base variable whose data type Declaration of a polymorphic variable: may vary at run time.” type, extends(base) :: extended integer :: field2 Here p1 may point to all class(type_name)[, class_attr] :: name end type extended extensions of base type(extended), target :: t1 Object polymorphism: a class(base), pointer :: p1 where class_attr is either pointer or allocatable variable can have – Unless variable is used as a dummy argument t1 = extended(1,2) instances from different ! p1 is of class base Polymorphic variables are type-compatible with p1 => t1 classes as long as they are related by a common base type_name and all extensions of type_name class

181 182 Polymorphism example Base type argument subroutine write_name(this) type :: base class(person) :: this integer :: field1 Procedures with base write (*,*) this % name end type base class dummy arguments end subroutine write_name type, extends(base) :: extended accept instances of type(person) :: p1 integer :: field2 extended class as actual type(staff), target :: p2 end type extended class(employee), pointer :: p3 arguments. function basefun(x) result(y) p1 = person(name='Joe',age=20) class(base) :: x,y Polymorphic variable p2 = staff(name='Mike',age=42,salary=2000) end p3 => p2 definition with class type(extended) :: z keyword call write_name(p1) ! Prints 'Joe' type(base) :: y call write_name(p3) ! Prints 'Mike' y = basefun(z)

183 184

Procedures as type components

In Fortran 2003, procedures can be tied to – objects (object-bound) – types (type-bound) PROCEDURES AS TYPE COMPONENTS The use of type-bound procedures is encouraged, as they enforce the object encapsulation and can be overridden Overriding type-bound procedures in extended types is allowed

185 186 Object-bound procedures Object-bound procedures example

Object-bound procedures can be added as procedure type :: person subroutine print_per(this) character(len=10) :: name class(person) :: this pointer type components integer :: age write (*,*) this % name, & procedure(print_per),pointer ::& this % age procedure([iface]), pointer :: decl-list print_person => null() end subroutine end type person where iface defines the procedure interface to be used subroutine print_emp(this) type, extends(person) :: employee class(employee) :: this The object itself is passed to the procedure as the first integer :: salary write (*,*) this % name, & procedure(print_emp),pointer ::& this % age, & actual argument, unless the nopass attribute is given print_employee => null() this % salary procedure([iface]), pointer, nopass :: decl-list end type employee end subroutine

187 188

Object-bound procedures example cont. Object-bound procedures: pass attribute

type base type(person) :: per procedure(sub), pointer, pass(x) :: p Use the pass attribute type(employee) :: emp end type base to decide which ! Initialize per and emp abstract interface subroutine sub(a, b, x) per = person(name='Joe',age=10) Person argument should refer to ! bring ”base” type to current scope: emp = employee(name='Mike',age=42,& name : character print_per import base the base class in the salary=2000) age : integer real :: a, b … class(base) :: x argument list per % print_person => print_per end subroutine sub emp % print_person => print_per end interface First argument is default emp % print_employee => print_emp Employee type(base) :: t1 if pass is not given salary : integer print_emp t1 % p => mysub call per % print_person() ! Equivalent to call mysub(1.0,2.0,t1) Also attribute: call emp % print_employee() … call t1 % p(1.0,2.0) nopass do not pass the container object

189 190 IMPORT statement Type-bound procedures

Interface does not access its environment via host type :: base Type-bound procedures are integer :: field1 association, i.e., named constants and derived types from contains dynamically determined for enclosing module are not available procedure :: & polymorphic variables write_output => write_base – The use statement requires breaking of encapsulation end type base during runtime and have a smaller performance Fortran 2003 addresses this with the import statement type, extends(base) :: extended integer :: field2 penalty than a full type import [ [::] import-name-list] contains determination where import-name is an entity to be accessed from procedure :: & write_output => write_extended Supports the pass the host end type extended attribute in the same way import without arguments makes all host entities ! Implementations of write_base and as object-bound procedure accessible ! write_extended omitted

191 192

Procedure overriding Type-bound procedures example

A way to extend also the functionality of an extended type :: person subroutine print_per(this) character(len=10) :: name class(person) :: this type is to override the procedures defined by the base integer :: age write (*,*) this % name, & contains this % age type procedure, non_overridable :: & end subroutine print_person => print_per When extending a type, a procedure can be overridden procedure :: & subroutine print_emp(this) print_info => print_per class(employee) :: this by simply defining a new procedure with the same name end type person call this % person % print_info() write (*,*) ‘And salary is:’, & Overriding procedure must have exactly the same type, extends(person) :: employee this % salary integer :: salary end subroutine interface as the overridden procedure contains procedure, non_overridable :: & Notice the parent object call in – Apart from the type of the passed-object dummy print_employee => print_emp procedure :: & print_emp argument which must be of the extended type print_info => print_emp end type employee Quiz: Why does ”this” have to be The non_overridable attribute prevents overriding class?

193 194 Type-bound procedures example cont. Type-bound generic procedures type(person) :: per module test Similar to generic type(employee) :: emp Person type :: base ! Initialize per and emp name : character integer :: field1 interfaces per = person(name='Joe',age=10) age : integer contains emp = employee(name='Mike',age=42,& – Supports operator salary=2000) print_person procedure :: add_i, add_r print_info generic :: add => add_i, add_r overloading ! Calls print_per end type base call per % print_info() contains Call procedure determined ! Calls print_emp subroutine add_i(this, x) call emp % print_info() by actual argument types Employee class(base) :: this integer :: x salary : integer end subroutine type(base) :: x subroutine add_r(this, x) x = base(1) print_employee class(base) :: this call x % add(1) ! calls add_i print_info real :: x call x % add(1.0) ! calls add_r end subroutine end module

195 196

Summary Type extensions (extends) allow for better code reuse Polymorphic objects (class()) allows base class pointers to point to an extended class target Type-bound procedures (after contains inside type) – Type-bound generics are similar to generic interfaces, support operator overloading – A type-bound procedure can be overwritten with extended types Procedures bound to objects can be changed run-time

197 Outline

Closure functions Memoization Linked lists and binary trees with and without OOP features

Complex data structures with Fortran

198 199

Closures in Fortran A closure is a function that inherits some data from the scope where it is being called In Fortran one can write closures using contains CLOSURES semantics together with procedure pointers and procedure dummy arguments

200 201 Closure by example program closuretest integer :: z The add_z function looks do z=1,10 up the value of z from the print *, apply(add_z, 1) end do parenting scope MEMOIZATION contains The add_z function is function apply(fun,x) result(y) passed on as a function procedure(intfun) :: fun integer :: x, y parametrized with z to y = fun(x) end function apply function add_z(x) result(y) integer :: x, y The apply function could y = x + z be in an external module end function end program as well

202 203

Memoization Memoization with object bound procedures

A technique where output value of function is stored in type data_t Access input using integer, private :: x, y memory after it is called the first time integer, private :: val initdata interface procedure(calcdata), pointer, public :: & - initdata resets Usually used if the function is evaluated irregularly, calculate => calcdata often, and the evaluation is costly compared to fetching end type calculate pointer Access output result from memory interface data_t ! “override” type initializer module procedure initdata values with – No point in memoizing simple floating point functions such end interface as exp, sin, cos, ... calculate type(data_t) :: data procedure Can be achieved with conditional structures or using integer :: value - Sets calculate data = data_t(1,1) object-bound procedure pointers value = data % calculate() pointer to function print *, ‘value is’, value that only returns val

204 205 Overwriting initializer module datamodule type(data_t) :: foo type data_t integer :: x, y foo = data_t(1,2) end type This will cause object interface data_t ! “override” type initializer LINKED LISTS module procedure initdata foo hold end interface foo % x = 0 foo % y = 1 contains function initdata(x,y) result(obj) Useful when user is not type(data_t) :: obj allowed to modify the integer :: x,y type contents directly obj % x = x-1 obj % y = y-1 Benefit: multiple end function initializers, choice by end module argument types

206 207

Linked list Simplified Fortran implementation

type :: node_t type(data_t) :: val Replace arrows with type(key_t) :: key pointers a b c d type(node_t), pointer :: right => NULL() end type Encapsulate key and data type :: list_t in node_t type Root Last type(node_t), pointer :: root => NULL() type(node_t), pointer :: last => NULL() Root and last pointers end type allow O(1) append and get subroutine append_node(list, key, data) type(list_t) :: list A data structure that can implement a first-in-first-out type(key_t) :: key (FIFO) structure type(data_t) :: data ! ... a b c d – O(n) search specific key end subroutine function get_first(list) result(node) Root Last – O(1) append, get ! ... 208 209 Modernized Fortran features Derived type finalization type :: node_t ! Type describing a person The subroutine type(data_t) :: val Use type-bound type :: person type(key_t) :: key procedures for cleaner integer, pointer :: bank_accounts(:) finalize_person is type(node_t), pointer :: right => NULL() integer :: id called automatically when end type code contains final :: finalize_person instance of person type :: list_t The final statement? type(node_t), pointer :: root => NULL() end type person 1. Falls out of scope type(node_t), pointer :: last => NULL() – list_t_finalize is contains called automatically contains 2. Is deallocated (for non- procedure :: append => list_t_append pointer variables) procedure :: get => list_t_get when a list_t type subroutine finalize_person(this) final :: list_t_finalize type(person) :: this 3. Passed to intent(out) end type object goes out of scope deallocate(this % bank_accounts) argument – Deallocate nodes here end subroutine 4. Used as left hand side of intrinsic assignment

210 211

Final components cont.

An object is finalizable if it has a final subroutine which matches the object A non-pointer object will be finalized when it is deallocated, goes out of scope, is passed to as an intent BINARY TREES out argument or used as a left-hand side of an intrinsic assignment statement Derived type containing finalizable components will be finalized before the individual components Termination of a program does not invoke any final subroutines

212 213 Binary search tree (BST) Simplified Fortran implementation

Root type :: BTree type(BNode), pointer :: root => NULL() Also store pointer to end type parent type :: BNode Other operations might type(data_t) :: val type(key_t) :: key include: type(BNode), pointer :: left=>NULL(), & right=>NULL(), parent=>NULL() – Get first node end type – Get one not smaller node function node_less_than(a, b) result(lt) Useful data structure for ordered keys, if balanced end function Root then insert and search are O(log n) operations subroutine insert(tree, key, value) end subroutine

BST-condition: Key on left is always smaller than on right subroutine get_node(tree, key) Usually terminated with the null pointer (black box) end subroutine 214 215

Modernized BST Modernized BST type :: BTree type :: BTree Access control type(BNode), pointer, private :: root => Insertion and search type(BNode), pointer, private :: root => NULL() algorithms depend on NULL() – protected: read only contains contains procedure :: insert node_less_than procedure :: insert outside module procedure :: get_node procedure :: get_node – procedure :: get_first ! and so on method procedure :: get_first ! and so on private: not accessible end type end type outside module type :: BNode – type bound procedures type :: BNode – public: accessible outside type(data_t) :: val – type bound generic for type(data_t) :: val type(key_t), protected :: key type(key_t), protected :: key module type(BNode), pointer :: left=>NULL(), & comparison yields type(BNode), pointer, private :: right=>NULL(), parent=>NULL() left=>NULL(), & Access control also usable contains cleaner code right=>NULL(), parent=>NULL()

procedure, private :: node_less_than Root contains for modules generic, public :: & procedure, private :: node_less_than operator(<) => node_less_than generic, public :: & – Import only limited procedure :: next_node operator(<) => node_less_than symbols with the use end type procedure :: next_node end type statement 216 217 Module access control by example Summary module access_module private ! Module default access is private Fortran 2003/2008 features allow cleaner interfaces to integer, protected :: v1 ! v1 is protected complex structures integer :: v2 ! v2 is private integer, public :: v3 ! v3 is public – Finalization, type-bound generics and procedures public :: sub1 contains – if-less memoization with object-bound procedures subroutine sub1(…) ! sub1 is public end subroutine sub1 – Using public and private statements usually results in subroutine sub2(…) ! sub2 is private a cleaner interface to a module end subroutine sub2 subroutine sub3(…) ! sub3 is private Not everything has to be an object! end subroutine sub3 end module access_module – Start with non-object-oriented design of a module, and if it looks like an object, then refactor it into one

218 219