FEM simulations: effects of improvements in information technologies on the computational time with large full vehicle FEM models

A. Ghelardini, G. Mancini, C. Goracci, A. Cera, A. Corbizi, D. Russo
Trenitalia S.p.A., Florence, Italy

Abstract

New, well-designed computing platforms for FEM simulators, together with new hardware and software computing technologies, have made it possible to address FEM-related scalability constraints. The latest 64-bit computer technologies are tremendously improving the Technical and Research Department's capability to deal with more complex and realistic simulation models, while simultaneously reducing the department's FEM-related response times.

Introduction

Trenitalia is the main train operating company in Italy. It is responsible for managing the development, construction and maintenance of the rail transportation system (Image 1) in the country. In this capacity, the Trenitalia Technical and Research Department uses ANSYS Mechanical for the following activities:
• Design optimization for the installation of new equipment on existing locomotives, coaches and wagons of the Trenitalia fleet.
• Structural strength checks to comply with transportation safety rules for new vehicles.
• Maintenance engineering planning for bogies and/or body frames deteriorated by fatigue and corrosion.

Looking for FEM optimized computing platforms

The need for larger analysis models and shorter computer response times led Trenitalia to evaluate new calculation solutions. In looking for more computational power to improve mechanical stress simulation capability, the Trenitalia Technical and Research Department, in cooperation with the Information Technology Department, investigated how finite element programs interact with modern operating systems. This research activity demonstrated the following main constraints in dealing with production-size models:
• Model size is limited by the amount of real memory that 32-bit FEM programs can use (especially for their internal databases) in a 32-bit computing solution.
• Solution times are reduced by using multiprocessor platforms.
• Hardware architecture bottlenecks in the memory and storage sub-systems increase elapsed times.

To address the first issue, Trenitalia started investigating 64-bit technology, from the older Alpha-based solutions to the current Itanium-based ones, and even the latest AMD64/EM64T platforms. Both shared-memory and massively parallel architectures were evaluated, with efficiency requirements and economic constraints suggesting a scalable SMP architecture. Economic, technical and maintenance needs led Trenitalia to carefully evaluate the new x86-64 technology of the latest PC-based AMD64/EM64T systems, to understand whether the new systems were ready for reliable high-performance computing solutions.

Image 1: The ETR 500 Italian high-speed train

The architectural core improvements in x86-64 capable processors allowed us to obtain very good integer and floating-point results, and the full-duplex star topology of modern PCI-Express workstations gave sufficient bandwidth to move gigabytes of data between memory and the storage sub-system. These systems were less expensive than traditional solutions and were going to become mainstream in the market, but would they be able to accomplish their tasks?

Benchmarking New Systems

In December 2004, two 64-bit operating systems were tested on the same hardware (an ordinary monoprocessor AMD64 personal computer with 2 GB of memory) with the same 0.35 million degree of freedom test model:
• The most recent version of Linux (kernel 2.6.9) with a native 64-bit ANSYS 9.0 gave an elapsed time of 270 seconds.
• The Release Candidate of Microsoft Windows x64 Edition with the 32-bit version of ANSYS 9.0 gave an elapsed time of 180 seconds.
The same phenomenon was observed on a dual Xeon 32-bit SMP test platform.
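The gap between the two December 2004 results can be expressed as a speedup ratio. The following is a minimal Python sketch of that arithmetic; the dictionary labels and the `speedup` helper are illustrative names, not part of any ANSYS tooling.

```python
# Elapsed times in seconds, as reported for the December 2004 test
# (same hardware, same 0.35 MDOF model).
elapsed_s = {
    "Linux 2.6.9 + native 64-bit ANSYS 9.0": 270.0,
    "Windows x64 RC + 32-bit ANSYS 9.0": 180.0,
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate run is than the baseline."""
    if baseline_s <= 0 or candidate_s <= 0:
        raise ValueError("elapsed times must be positive")
    return baseline_s / candidate_s


ratio = speedup(
    elapsed_s["Linux 2.6.9 + native 64-bit ANSYS 9.0"],
    elapsed_s["Windows x64 RC + 32-bit ANSYS 9.0"],
)
saving = 1.0 - 1.0 / ratio  # fraction of elapsed time saved

print(f"speedup: {ratio:.2f}x, elapsed time reduced by {saving:.0%}")
# prints: speedup: 1.50x, elapsed time reduced by 33%
```

A single model on a single machine is a narrow basis for comparison, so the ratio is best read as an indication rather than a general result.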

Better SMP scalability, greater efficiency in thread and memory management, and maintenance constraints led us to select the new Microsoft Windows XP x64 Edition. Several field tests were planned in 2005 to evaluate reliability, even while the x64 operating system was still in beta, using the then-current versions of Win 32 ANSYS on a 4 GB two-way AMD64 platform.

Trenitalia found that the x64 operating system was indeed the best operating system for “/3GB compliant” (i.e. large-address-aware) Win 32 professional programs (e.g. ANSYS 9.0A1 Win 32). ANSYS 9.0A1 (the code version identifier that ANSYS 9.0 SP1 writes in its output logs) was finally able to manage up to 3.7 GB of memory. Nevertheless, what Trenitalia still needed was a true, native 64-bit version of ANSYS for Win x64 able to address more than 4 GB of memory. At last, Trenitalia could define its ideal reference platform: a two-way Win x64 SMP workstation able to support at least 16 GB of memory with a low-latency memory and storage sub-system. Due to platform and driver constraints with more than 4 GB of memory installed, it took a long time (until summer 2005) and considerable tuning effort to obtain the first fully working prototype of the reference platform.
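The 4 GB ceiling mentioned above follows directly from pointer width. The sketch below, in Python, shows the arithmetic under the assumption that the text's "GB" means 2**30 bytes; the function names are illustrative only.

```python
GIB = 2**30  # bytes per gibibyte; the text's "GB" is read as 2**30 bytes (assumption)


def address_space_bytes(pointer_bits: int) -> int:
    """Number of distinct byte addresses a pointer of this width can express."""
    return 2**pointer_bits


def fits_in_address_space(required_gb: float, pointer_bits: int) -> bool:
    """True if a memory requirement fits in the raw address space.

    This is only the theoretical ceiling: operating systems reserve part of
    the range and impose their own, lower limits.
    """
    return required_gb * GIB <= address_space_bytes(pointer_bits)


print(address_space_bytes(32) // GIB)   # 4
print(fits_in_address_space(3.7, 32))   # True: the 3.7 GB reached by ANSYS 9.0A1
print(fits_in_address_space(16, 32))    # False: the 16 GB reference platform
print(fits_in_address_space(16, 64))    # True
```

This is why a 32-bit build could approach but never exceed 4 GB, whatever the amount of memory installed in the workstation.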

Real production models (Image 2 and Image 3) were used for side-by-side benchmark trials on the latest Intel- and AMD-based platforms. Trenitalia selected the AMD64 solution (Table 1) due to its greater efficiency in executing 64-bit programs, improved scalability on SMP systems and better performance when solving very complex models. The latest dual-core CPUs were also tested in a real production environment with the same production test models.

Image 2: Static strength results from stress analysis of a bogie frame for a high-speed car

The widespread availability of the new high-performance 64-bit operating system, the rapid improvement in the quality of 64-bit development tools, the investments in the ANSYS parallel high-performance sparse solver (an implicit SMP solver with large memory capability), the huge amount of memory addressable by 64-bit technologies and ANSYS management's investment in emerging 64-bit technologies allowed ANSYS to quietly publish a native 64-bit ANSYS 10 Win x64 beta product at the end of summer 2005. Trenitalia jumped on the new program files and began to test the beta product.

Image 3: First natural frequency from modal analysis of railway car body frame

ANSYS 10 Win x64 beta was able to use considerably more than 4 GB of memory, and Trenitalia immediately started to develop production models with several million degrees of freedom. While solving them with the beta build (the ANSYS Win 32 version was not even able to open them), Trenitalia identified some important issues in the beta product, especially in the efficiency of the sparse solver (in 10.0 x64 beta, the sparse solver was slower than the older 9.0 sp1 32-bit one when solving large models). After we reported this limitation to our local ANSYS office, we were allowed to send feedback directly to the appropriate manager at ANSYS. Even though ANSYS had only a few weeks to improve the code before the scheduled official release, they quickly identified the problems and fixed them in time for the official ANSYS 10.0 sp1 Win x64 product. This effort enabled the sparse solver to gain 30% in efficiency over the beta version.

Deploying 64 bit workstations

In February 2006, as soon as ANSYS 10.0 sp1 for Windows XP Professional x64 Edition was officially released, Trenitalia deployed several 64-bit workstations, and handling FEM models of several MDOF became a routine capability of the Technical and Research Department. In December 2006, while testing the new ANSYS 11.0 version with a brand-new production model (modal analysis of the full vehicle body frame of the latest Alstom ETR 600) (Image 4), Trenitalia identified a critical bug related to the sparse solver. Trenitalia reproduced the problem with ANSYS 10.0 sp1 and ANSYS 11.0 when solving on multiprocessor systems on 32-bit and 64-bit Windows SMP platforms; monoprocessor systems worked fine with the same model. In January 2007 Trenitalia contacted ANSYS Support again and was quickly put in direct contact with the ANSYS sparse solver development department. Trenitalia sent ANSYS a model (for internal debugging only) and explained how to reproduce the bug. ANSYS confirmed the bug on SMP platforms even with the Itanium-64, Linux-32 and Linux-EM64T binaries; the Linux-AMD64 binaries worked fine. This pointed to an error induced by a cross-platform compiler family rather than a source code bug, so Trenitalia suggested a careful investigation of the high-performance mathematical libraries used to implement the ANSYS sparse solver. While waiting for a fix, Trenitalia production platforms were restricted to monoprocessor solver runs.

Image 4: The new ETR 600 high-speed tilting train

In February 2007 ANSYS reported that they had identified the problem, had developed a pair of workarounds and had started working with the compiler manufacturer to fix it completely. It took more than six months of hard work and product quality testing to upgrade the development tools (the Fortran compiler and its high-performance mathematical libraries) to their latest versions. In October 2007 ANSYS finally published the long-awaited ANSYS 11.0 sp1 executables with the updated mathematical libraries. Trenitalia successfully tested the binaries on its SMP platforms and deployed them on production workstations, removing the monoprocessor constraint for the sparse solver. Last but not least, the compiler and library upgrade enabled full code optimization of the ANSYS 11.0 sp1 binaries on SMP systems equipped with the latest Intel 65 nm "Conroe-based" processors. Finally, the latest-generation Intel multicore Xeon processors were able to solve large FEM models efficiently. Trenitalia's early trials were so impressive that it was decided to plan a new side-by-side 64-bit ANSYS trial with the latest AMD "Barcelona-based" platforms and the latest Intel 45 nm "Penryn-based" platforms for future purchases.

Compressing Analysis Time

Presently, Trenitalia uses solid elements to mesh (with 64-bit Altair HyperMesh 8.0 sr1) 3-D CAD models imported directly from the CAD system (64-bit Autodesk Inventor 2009), thereby significantly reducing the engineering time needed to prepare and debug models for the solver. Trenitalia is also able to solve large models, typically using the highly efficient and memory-hungry in-core feature of the sparse solver on SMP multiprocessor platforms. We are also finally able to browse through large post-processed result files in real time and to efficiently study larger models made with expensive cluster solutions.

64-bit ANSYS for Win x64 allows Trenitalia to work with finite element models five times larger than was previously possible, with response times eight times faster. The new Workbench meshing functions are also significantly improving Trenitalia's pre-processing work.

New computing platforms and information technologies are tremendously improving our department’s capability to deal with more complex and realistic simulation models, simultaneously reducing the department analysis-related response times.

Changing the Way Engineers Work

The new 64-bit ANSYS for Win x64 has changed the way Trenitalia engineers work. To quickly check the structural strength of a particular class of vehicles, we formerly used manually optimized and simplified vehicle models that took about an hour to solve on the old 32-bit platforms. Thanks to the new 64-bit computing systems, the same models take only about two minutes to solve, so a full day's activity shrinks to about two hours of work for modeling, solving and post-processing. When validating simulation results, Trenitalia FEM engineers are now used to browsing in real time the post-processed data generated by several MDOF models, effectively reducing the time needed for results validation and improving the overall quality of the simulations. These improvements in overall efficiency, performance, quality and reliability of simulation results can be obtained only through long-term, strong co-operation between end users, application software developers, operating system manufacturers and computing platform producers.

Trenitalia has already planned the move to Win x64 platforms for its heavy-duty technical users now using Win 32 and Linux 32 platforms. WB-ANSYS for Win x64 enables users here to deal with multi-million DOF models with reliable, inexpensive (less than $10,000) and simple computing solutions.

Table 1: Trenitalia Field Test Results (Spring 2006)
ANSYS 9.0A1 Win 32 and ANSYS 10.0A1 Win x64 with Windows XP x64 Edition on dual-processor platforms

Notes:
Operating system: Windows XP Pro x64 Edition, fully patched by Windows Update
16 GB of memory (PC3200 or PC2-3200), 16 GB swap file
The ANSYS working directory is on a RAID 0 volume of 4 SATA disks
Mechanical stress models: telcond—meshfinale (0.35 MDOF) and telaio4 (1.5 MDOF)
Nonlinear mechanical stress model: TelCond-668 (1 MDOF)
Modal analysis models: cassa_modif_01 (0.8 MDOF) and caso_12a (1.7 MDOF)
NUM_PROC = 2, SIZE_BIO = 65536 (only for 9.0A1)
Memory requested (MB): -m = 3072 for 9.0A1 and -m = 8192 for 10.0A1
Database size requested (MB): -db = 2040 for 9.0A1 and -db = 2048 for 10.0A1
Times are in seconds, in the format: [CP Time] Elapsed Time

Stress Analysis

ANSYS batch run | Version | telcond—meshfinale (0.35 MDOF) | Telaio4 (1.5 MDOF) | TelCond-668 (1 MDOF)
Supermicro H8DCE, AMD Opteron 250-E4, 2 x 2.4 GHz | 9.0A1 | [143] 95 | [521] 396 | [8366] 5251
Supermicro H8DCE, AMD Opteron 250-E4, 2 x 2.4 GHz | 10.0A1 | [118] 71 | [390] 287 | [6566] 3995
Supermicro H8DCE, AMD Opteron 280-E6, 2 x 2.4 GHz, Num_Proc=4 | 9.0A1 | [174] 83 | [693] 336 | [10520] 4124
Supermicro H8DCE, AMD Opteron 280-E6, 2 x 2.4 GHz, Num_Proc=4 | 10.0A1 | [152] 54 | [516] 239 | [8445] 3089
Supermicro X6DA8-G2, Intel Xeon64 2 MB L2, 2 x 3.6 GHz, no/HT | 9.0A1 | [150] 156 | [608] 651 | N/A

Modal Analysis

ANSYS batch run | Version | cassa_modif_01 (0.8 MDOF) | caso_12a (1.7 MDOF)
Supermicro H8DCE, AMD Opteron 250-E4, 2 x 2.4 GHz | 9.0A1 | [330] 272 | [2383] 3532
Supermicro H8DCE, AMD Opteron 250-E4, 2 x 2.4 GHz | 10.0A1 | [249] 262 | [1897] 2091
Supermicro H8DCE, AMD Opteron 280-E6, 2 x 2.4 GHz, Num_Proc=4 | 9.0A1 | [410] 245 | [2307] 2572
Supermicro H8DCE, AMD Opteron 280-E6, 2 x 2.4 GHz, Num_Proc=4 | 10.0A1 | [309] 241 | [2274] 2064
Supermicro X6DA8-G2, Intel Xeon64 2 MB L2, 2 x 3.6 GHz, no/HT | 9.0A1 | N/A | [2303] 5620
Note from the table header: -m = 2984 for 9.0A1
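The elapsed times in Table 1 can be turned into per-model speedups of ANSYS 10.0A1 Win x64 over ANSYS 9.0A1 Win 32. The Python sketch below does this for the two AMD Opteron platforms only, since the Intel platform has no 10.0A1 timings. The `ELAPSED` list and `elapsed_speedups` helper are illustrative names. Because the two versions were also run with different -m settings, the ratios compare complete configurations rather than 32-bit versus 64-bit code alone.

```python
# Elapsed times in seconds copied from Table 1 (the second number of each
# "[CP Time] Elapsed Time" pair), for the two AMD Opteron platforms.
# Each row: (platform, model, elapsed with 9.0A1 Win 32, elapsed with 10.0A1 Win x64)
ELAPSED = [
    ("Opteron 250-E4", "telcond—meshfinale", 95, 71),
    ("Opteron 250-E4", "Telaio4", 396, 287),
    ("Opteron 250-E4", "TelCond-668", 5251, 3995),
    ("Opteron 250-E4", "cassa_modif_01", 272, 262),
    ("Opteron 250-E4", "caso_12a", 3532, 2091),
    ("Opteron 280-E6", "telcond—meshfinale", 83, 54),
    ("Opteron 280-E6", "Telaio4", 336, 239),
    ("Opteron 280-E6", "TelCond-668", 4124, 3089),
    ("Opteron 280-E6", "cassa_modif_01", 245, 241),
    ("Opteron 280-E6", "caso_12a", 2572, 2064),
]


def elapsed_speedups(rows):
    """Map (platform, model) to elapsed(9.0A1) / elapsed(10.0A1)."""
    return {(platform, model): old / new for platform, model, old, new in rows}


speedups = elapsed_speedups(ELAPSED)
for (platform, model), ratio in speedups.items():
    print(f"{platform:15s} {model:20s} {ratio:.2f}x")
```

Every ratio computed this way is above 1, ranging from about 1.02 (cassa_modif_01 on the Opteron 280-E6) to about 1.69 (caso_12a on the Opteron 250-E4).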

Workbench and ANSYS for Windows XP Professional x64
By Raymond Browell, Product Manager, ANSYS Inc.

A powerful new wave of computer, operating system and software performance is about to be available at affordable prices. Although beneficial to the general public, this technology is especially helpful for ANSYS customers to perform amazing analyses on the desktop.

The term x64 is used to describe the 64-bit architecture developed by Advanced Micro Devices (AMD) and Intel to provide processors that are highly compatible with the x86 processors that have been the mainstay of personal computing for decades. The biggest difference between x64 and other 64-bit processors is that x64 processors are compatible at the hardware level with 32-bit x86 processors. Standard 32-bit x86 operating systems (Windows and Linux) can therefore be installed on x64-capable platforms and can execute every 32-bit program. x64 systems (with more than 1 GB of physical memory) really shine when a 64-bit operating system is installed on them: native 64-bit programs may address more than 4 GB of (virtual) memory and the “old” legacy 32-bit programs may be executed as well. So the x64 architecture, when combined with Windows XP Professional x64 Edition, can run the thousands of 32-bit programs available today.

Windows XP Professional x64 Edition is a relatively new operating system, and although computer accessory vendors, such as graphic board and printer manufacturers, are actively adding drivers for the Windows XP Professional x64 Edition, you should check with your IT professionals on specific makes and models.

There are currently two basic x64 processor families:
• AMD’s AMD64
• Intel’s EM64T
Both share the same x86-64 architecture and instruction set.

Windows XP Professional x64 Edition is a near feature-complete version of Windows XP Professional that runs on x64 processors. Windows XP Professional x64 Edition supports up to 128 GB of RAM and 8 terabytes of address space for a 64-bit process, as compared to 4 GB of both physical RAM and virtual memory address space for 32-bit Windows XP Professional. Additionally, Windows XP Professional x64 Edition supports 1 Terabyte of system cache.

How much memory will be addressable by Workbench or ANSYS under Windows XP Professional x64?

In Windows XP Professional x64, the physical RAM limit is 128 GB, while the virtual memory limit is 8 TB. Workbench and ANSYS are built as native 64-bit applications and can access the full virtual memory address range. In practice, the physical RAM you have installed will matter the most. On Windows XP Professional x64, the version 10.0 sparse solver in ANSYS is limited to a contiguous block of 16 GB (or 32 GB for complex solvers). The iterative solver (PCG) can grow memory as large as the virtual space will allow, although it is not practical to grow larger than the physical memory available on the system.
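The limits quoted above can be combined into a simple pre-flight check for a planned sparse solver run. The following Python sketch is a hypothetical helper written for illustration; it is not an ANSYS function, and it encodes only the figures stated in this section (128 GB of physical RAM, and a 16 GB or 32 GB sparse solver block).

```python
from typing import List

# Limits as stated in the text for Windows XP Professional x64 and ANSYS 10.0.
PHYSICAL_RAM_LIMIT_GB = 128
SPARSE_BLOCK_LIMIT_GB = 16
SPARSE_BLOCK_LIMIT_COMPLEX_GB = 32


def check_sparse_request(request_gb: float, installed_ram_gb: float,
                         complex_solver: bool = False) -> List[str]:
    """Return a list of warnings for a planned sparse solver memory request.

    Hypothetical helper for illustration only.
    """
    warnings: List[str] = []
    block_limit = (SPARSE_BLOCK_LIMIT_COMPLEX_GB if complex_solver
                   else SPARSE_BLOCK_LIMIT_GB)
    if request_gb > block_limit:
        warnings.append(
            f"request exceeds the {block_limit} GB sparse solver block limit")
    if installed_ram_gb > PHYSICAL_RAM_LIMIT_GB:
        warnings.append(
            "installed RAM exceeds the 128 GB supported by the operating system")
    if request_gb > min(installed_ram_gb, PHYSICAL_RAM_LIMIT_GB):
        warnings.append(
            "request exceeds usable physical RAM, so the run may page to disk")
    return warnings


# The Table 1 configuration: -m = 8192 MB on a 16 GB workstation.
print(check_sparse_request(8, 16))   # []
print(check_sparse_request(24, 32))  # one warning about the 16 GB block limit
```

An empty list means the request sits inside every limit named in the text; it says nothing about whether a particular model will actually solve in-core.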

For those old enough to remember the UNIX migration from 32-bit to 64-bit, you might ask if there is a speed penalty for running 32-bit applications on Windows XP Professional x64. Unlike the transition from 32-bit UNIX to 64-bit UNIX, where the solver was typically about 10% to 20% slower due to the increased I/O (double the amount), x64-capable microprocessors have doubled the number of internal registers and improved the DMA engine for I/O to alleviate this problem. In fact, a large number of scenarios run from slightly faster to much faster, due to more memory being allocated to the operating system file cache and to the improved processor efficiency that comes with the additional registers.

When will Workbench and ANSYS be available for Windows XP Professional x64?

Currently both are available with 10.0 SP1. Distributed ANSYS for Windows XP Professional x64 will be available at Release 11.0.


Antonio Ghelardini
Born in 1964, he earned a degree in Electronic Engineering in 1993 at the “Università degli Studi di Firenze”, Italy. After a postgraduate course on railway engineering, in 1993 he started working in the Italian Rolling Stock Engineering Department in the field of information technology for railway vehicles. He is the author of several papers published at FEM conferences.

Giampaolo Mancini
Born in 1967, he earned a degree “cum laude” in Aeronautical Engineering at the “Università di Pisa”, Italy, in 1992. After a postgraduate course on railway engineering, in 1993 he started working at the FS Rolling Stock Engineering Department in the field of mechanical and aerodynamic development and testing of railway vehicles. Among his activities from 1993 to 2002, he was in charge of:
− the extensive test campaign on ETR460 Pendolinos for the approval of the tilting system and homologation with respect to dynamic behaviour;
− the large full-scale aerodynamic test campaign on the first-series ETR500 in open air and in tunnels, within the EU-funded TRANSAERO project;
− the tests for the approval, with respect to dynamic behaviour, of the new ETR500 high-speed train on the Italian high-speed and conventional lines.
In 2002 he moved from testing to rolling stock design within the FS Trenitalia Engineering Department and became responsible for mechanics. In 2004 he joined the AEIF group in charge of drafting the revision of the High Speed Rolling Stock TSI. In 2005 he was appointed CER speaker, representing CER within the ERA group in charge of drafting the Conventional Rolling Stock TSI. At present he is responsible for mechanical, electrical and telecommunications systems within the FS Trenitalia Engineering Department. He is the author of several papers published in Italian and international journals and conferences. Among his publications, the paper “Cross-wind aerodynamic forces on rail vehicles: wind tunnel experimental tests and numerical dynamic analysis” received the award for best paper addressing railway vehicles at WCRR 2003.

Claudio Goracci
Born in 1955, he earned a degree in Mechanical Engineering in 1983 at the “Università degli Studi di Firenze”, Italy. In 1988 he started working in the Italian Rolling Stock Engineering Department in the field of calculation and design of vehicle and locomotive frames. He is the author of several papers dealing with the “art” of modeling and simulating the mechanical stress strength of railway vehicle and locomotive frames.

Alessandro Cera
Born in 1975 in Italy, he earned the Laurea in Mechanical Engineering in 2000 at the “Università di Perugia”, Italy. He started working for FS Trenitalia in 2001 at the FS Rolling Stock Engineering Department in the field of mechanical components. Since 2003 he has been a technical expert for bogies; among his activities from 2003 to 2007, he has been responsible for:
− the design and calculation activities on bogie components for the modification of existing rolling stock;
− the editing of technical specifications for the bogie systems of new rolling stock;
− the participation, as technical expert for bogies, in the European project HTE for the drafting of a common technical specification for high-speed trains among the most important European operators;
− the participation in the major European railway project ModTrain for the standardization of technical, functional and interface requirements of components for high-speed trains and locomotives.
At present he is system engineer for the new orders of Vivalto double-deck coaches and E403 locomotives. In 2005 he joined CEN WG35 for the drafting of the new standard EN 15437 "Railway applications - Axlebox condition monitoring - Performance requirements - Part 1: Track side equipment". He is the author of several papers published at conferences.

Alessandro Corbizi Fattori
He graduated in Engineering in 1996 with an experimental degree thesis on the Italian high-speed train ETR500, and is now working at the Engineering Head Office of Trenitalia, the Technical Department of the company, as an expert for wheelsets and sub-components. He is currently participating in several international working groups, both for standardization and for research projects. He has recently published papers on these topics, mainly regarding axle and wheel design, taking into account the normative aspects of the process (see also the WCRR 2003 and 2006 proceedings).

David Russo
Born in 1971, he earned a degree in Electronic Engineering (Telecommunications) at the University of Florence, Italy, in 1998. In 1999 he started working at Marconi Communications as Project Quality Manager for different product divisions, such as TETRA vehicular and handheld, or analog airport transmitters. He was in charge of quality and process control and reliability analysis (MIL-HDBK-217 and Bellcore). Meanwhile he also worked as an external consultant on FMECA and RCM analysis. From 2000 to 2001 he worked at Motorola as an electronic designer on mobile phones, in charge of proto-certification, electrical testing and accelerated life testing. In 2001 he started working at the FS Rolling Stock Engineering Department on the development of the new railway signalling systems SCMT (Italy) and ERTMS (Europe), juridical recording systems and radio systems. At present he is responsible for the telecommunication and control systems installed on rolling stock within the FS Trenitalia Engineering Department.