APPLICATION NOTE XIP Linux for RZ/A1
RZ/A1, EU_00181 Rev.1.10, Jun 13, 2016

Introduction

Target Device

Contents
1. Frame of Reference
2. What is an XIP Linux Kernel
3. RZ/A1 XIP SPI Flash Hardware Support
4. Updating the kernel image
5. Kernel RAM usage
6. Simple Benchmarks
7. Kernel vs Userland
8. File Systems and Storage
9. u-boot Modifications

1. Frame of Reference

Since the Linux kernel and the open source community are constantly changing, please keep in mind that this document was written in August of 2014, and the kernel references are to the Linux-3.14 code base.

2. What is an XIP Linux Kernel

When any executable program is compiled and linked, the different portions of the program are combined together in the resulting binary image.
For the Linux kernel, the order is basically: text (i.e., code), read-only data, initialized data variables, uninitialized BSS variables. You can see this by examining the System.map file.

For a traditional Linux kernel, this entire memory map image is placed in RAM. The reason is that systems that generally use Linux are either PCs or high-end embedded MPU designs where code is intended to run from high-speed RAM (DDR memory).

A couple of years ago, the source code and linker scripts within the kernel were modified so that ROM and RAM sections could be explicitly defined, as opposed to just letting the RAM follow the ROM. The main target platform was the PowerPC with a parallel NOR flash. The main purpose was to allow a faster boot time, since the kernel would not have to be decompressed and copied into RAM before execution could begin. Instead, the kernel code could begin executing immediately. The tradeoff, however, was that NOR execution was slower than DDR execution, and NOR flash cost more than DDR.

Later, some patches were submitted to the mainline kernel for a TI OMAP device (ARM based). Again, the assumption was execution from parallel NOR flash.

It should be noted that while traditional kernel utilities like mkimage were modified to provide some level of support for creating XIP kernel images that could be launched using the ‘bootm’ command in u-boot, that support was specific to the original PowerPC experiment (and a bit of a hack). TI did release an app note on getting around the PowerPC-specific booting nuances, but again, it was somewhat of a workaround.

It is also worth mentioning that there is a section defined in the kernel called ‘init’ whose scope is only during boot time. This means that any functions or data structures that are only needed for the boot process, and can be assumed to be used only once, can be assigned to this unique section.
The benefit is that one of the final operations the Linux kernel performs during its boot process, before handing control off to the application space, is to ‘free’ any init sections, thus reclaiming valuable RAM that would otherwise be wasted on code that will never run again. Therefore, one modification for the XIP kernel build was to specify that while the init section RAM data could be freed, the init section code could not (since it is located in ROM).

For a RAM based kernel for the ARM architecture, the kernel is generally located at a virtual base address of 0xC0000000. Virtual memory mapping is used to remap the RAM’s physical location, say external SDRAM at 0x08000000 (CS2) or internal RAM at 0x20000000 for the RZ/A1.

For an XIP kernel for ARM, the kernel’s ROM sections (code and constants) are mapped 16MB below the virtual base address of RAM, i.e., at 0xBF000000. This is the same area that is used for loaded modules. See the definition of the macro XIP_VIRT_ADDR in arch/arm/include/asm/memory.h. If you examine the System.map file of an XIP kernel build, you can easily identify which portions are accessed directly from flash ROM (0xBFxxxxxx) and which portions will be in RAM (0xC0xxxxxx).

One more thing to mention is that when driver modules are loaded at run-time, they are loaded into RAM and accessed using addresses 0xBFxxxxxx. This may be a little confusing because we just mentioned that 0xBFxxxxxx was the location of the XIP kernel ROM, but that is the beauty of virtual address mapping. Also, if you have driver code that needs to run as fast as possible, making it a driver module ensures that all of its code executes out of RAM, which will give you better performance than a static driver that is part of an XIP kernel executing out of flash ROM.

3. RZ/A1 XIP SPI Flash Hardware Support

The RZ/A1 can memory map the contents of SPI flash into linearly accessible/executable memory using a peripheral block called the “SPI Multi I/O Bus Controller”. Basically, this means that when the CPU attempts to read data or fetch code from a specific address range, hardware automatically uses the SPI channel to read the corresponding data from the SPI flash.

Additionally, the hardware has 16 cache lines (8 bytes each) that can be used to prefetch data from flash in order to reduce latency. There is also the option to automatically fill more than 1 cache line with contiguous flash data on a cache miss in order to anticipate future reads. In experiments with the XIP Linux kernel, filling 2 cache lines automatically (i.e., always reading 16 bytes of SPI flash) yielded the best performance results.

Other features of the XIP interface include support for both 2-bit and 4-bit address/data interfaces to the SPI flash. This greatly increases the speed at which you can retrieve data. Additionally, there are Double Data Rate (DDR) capabilities so that data is transferred on every clock edge, meaning you effectively send the address and read the data at twice the clock rate.

Lastly, the 2 channels of this specialized SPI flash interface can be used in conjunction with each other, meaning you can retrieve data twice as fast since both SPI flash devices respond to the same flash address. Channel 0 holds all the odd addressed bytes and channel 1 holds all the even addressed bytes. Therefore, in theory it is possible to use 2 SPI flash devices with the 4-bit wide interface and the DDR option to retrieve 16 bits of flash data (2 devices x 4 data lines x 2 clock edges) for each cycle of a 50MHz SPI clock (50MHz being the maximum SPI clock frequency of the RZ/A1).

4. Updating the kernel image

The use of an XIP kernel does not restrict the file system or media device you would like to use.
The only exception is that you cannot reprogram the flash device that you are currently executing from. For example, if the RZ/A1 is running in XIP mode using the Quad SPI interface, you cannot modify (erase/write) that SPI flash device, since that would require taking the SPI peripheral out of XIP mode and putting it back into SPI mode, which would in turn crash your system. Instead, to update your kernel you would first have to save the new image someplace else and then reboot into u-boot or some other custom bootloader that executes out of RAM.

It might be possible, however, to create a loadable kernel module where you first load the data you want to program into a memory buffer (in kernel space) and then disable all interrupts. Since we know the entire module will be loaded into system RAM at runtime, we can then change the SPI peripheral from XIP mode to SPI mode and erase/program our data. Of course, during that time we need to make sure that no functions outside of that module are used, including any kernel utility functions, since we will have completely disabled access to any kernel code.

5. Kernel RAM usage

Since the main motivation for moving to an XIP kernel is saving RAM, here are some numbers on how much RAM an XIP kernel uses.

For a traditional kernel, all code and data are kept in RAM. Therefore, the majority of the kernel image that gets loaded into RAM is static code, which obviously does not need to reside in a read/write medium.

Here is a comparison of compiling the same kernel as XIP vs standard. Of course, the kernel has many options and drivers that can be turned on and off. For this build, only a small number of drivers were included, the largest being the Ethernet driver and TCP/IP stack. The System.map file was used to determine the amount of RAM used, basically by looking at the address of the very last symbol, ‘_end’.

SDRAM kernel: 3,803 KB
XIP kernel: 332 KB

6. Simple Benchmarks

The following simple benchmarks were performed to understand the performance differences between an XIP kernel running out of quad SPI flash and a kernel running out of SDRAM.

6.1 Boot Time

Boot time was measured from the point in u-boot at which the kernel boot process is started to the point at which the log message "Freeing unused kernel memory" is displayed, because at that point the file system is mounted and the rest of the time depends on what apps you want to start.
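
As a rough illustration of this kind of measurement, the following Python sketch computes the elapsed time between two markers in a captured serial console log. It is not the tool used for this app note. It assumes a hypothetical log format in which the capturing host prefixes every line with a timestamp in seconds, and it assumes that u-boot's "Starting kernel" message marks the start point. The sample log lines and their timestamps are invented for the example and are not measurements.

```python
import re

# Assumed markers: the start marker is a guess at "the point in u-boot that the
# kernel boot process is started"; the end marker is the message named above.
START_MARKER = "Starting kernel"
END_MARKER = "Freeing unused kernel memory"

# Hypothetical log format: "<host timestamp in seconds> <console text>"
LINE_RE = re.compile(r"^(\d+(?:\.\d+)?)\s+(.*)$")


def boot_time_seconds(log_lines, start_marker=START_MARKER, end_marker=END_MARKER):
    """Return seconds elapsed between the start marker and the end marker."""
    start = None
    for raw in log_lines:
        match = LINE_RE.match(raw.rstrip("\n"))
        if not match:
            continue  # skip lines that carry no host timestamp
        stamp, text = float(match.group(1)), match.group(2)
        if start is None:
            if start_marker in text:
                start = stamp
        elif end_marker in text:
            return stamp - start
    raise ValueError("start or end marker not found in log")


# Invented sample data, for illustration only.
sample_log = [
    "10.250 Starting kernel ...",
    "10.300 Booting Linux on physical CPU 0x0",
    "11.000 Freeing unused kernel memory: 24K",
]

print(f"{boot_time_seconds(sample_log):.3f} s")  # prints 0.750 s
```

Taking the timestamps on the host side keeps the XIP and SDRAM runs comparable, since the kernel's own printk timestamps only start once the kernel is already running.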