RechenZentrum Garching of the Max Planck Society
High Performance AFS
Hartmut Reuter [email protected]
• Supercomputing environment at RZG
• Why AFS is slow compared to NFS and SAN filesystems
• Direct I/O from the client to the fileserver partition
• Implementation in MR-AFS and OpenAFS
• Performance measurements and results
• "fs import"
• MR-AFS and Castor
Geneva, February 5, 2004 Hartmut Reuter RechenZentrum Garching of the Max Planck Society
RZG
RZG is the supercomputing center of the Max Planck Society in Germany
It also acts as the local computing center for a number of Max Planck institutes located at Garching, especially for IPP (Institut für Plasmaphysik)
The local AFS-cell therefore historically has the name ipp-garching.mpg.de
Using MR-AFS, this AFS cell also provides archival space for the MPG
MPI for Polymer Research, Mainz
Multiscale model of bisphenol-A-polycarbonate (BPA-PC) on nickel
(a) The coarse grained representation of a BPA-PC segment
(b) Coarse grained model of a N=20 BPA-PC molecule
(c) Phenol adsorbed on the bridge site of a (111) nickel surface
Code: CPMD
MPI for Astrophysics Garching
Core-collapse supernova simulation: snapshots of the hydrodynamic evolution of a rotating massive star, 0.25 s after the start of the explosion
Code: Rady/2D
MPI for Metals Research, Stuttgart
Large-scale atomistic study of the inertia properties of mode I cracks.
A crack propagating at several kilometers per second is suddenly brought to rest.
MPI for Plasma Physics, Garching and Greifswald
Simulation of the time development of the turbulent radial heat flux
Code: TORB
4 Decades of Supercomputing Tradition at RZG
1962: IBM 7090 (0.1 Mflop/s, 128 kB RAM)
1969: IBM 360/91 (15 Mflop/s, 2 MB RAM)
1979: Cray-1 (80 Mflop/s, 8 MB RAM)
1998: Cray T3E/816 (0.47 TFlop/s, 104 GB RAM)
2002/2003: IBM p690 (4 TFlop/s, 2 TB RAM)
IBM p690: 4 TFlop/s, 2 TB RAM
24 compute nodes and 2 I/O nodes; each node has 32 Power4 processors. Nodes with 64 GB or 256 GB memory are connected by the Federation switch.
22 TB FC disks and 5 TB SSA disks attached to the I/O nodes.
Federation switch: measured throughput 4.4 GB/s bidirectional between 2 nodes, measured latency 12 µs.
AFS is too slow on the Regatta cluster
For large files AFS is much slower than GPFS on the Regatta cluster
+ GPFS stripes data over multiple nodes.
- AFS exchanges data with a single fileserver.
- With AFS all data go through the AFS cache.
- AFS is also slower than NFS for protocol reasons.
Why AFS is slow compared to NFS
• Disk caches on local disks are slower than the network.
• write() sleeps while data are transferred to the server.
• Unnecessary read RPCs before a chunk is written.
• Memory mapping of cache files breaks large I/O down into hundreds of requests.
• The Rx protocol is considered sub-optimal.
How to make AFS faster for large files
• Use the fastest filesystem for the /vicep-partition on the server
– On Regatta cluster use GPFS
• Bypass AFS caching on the client by direct I/O to the fileserver's /vicep-partitions.
– helps on all fileserver machines for files in volumes stored there
– helps in clusters if /vicep-partitions are mounted cluster-wide.
– requires modifications in the client and server code.
– Should be done only on trusted hosts
Writing a new file to AFS
1) create_file RPC
2) write chunks into the cache
This process is interrupted and followed by store_data RPCs, each one doing:
3) read from the cache
4) transfer over the network
5) write to /vicepa
(Diagram: client with cache on the left, fileserver with /vicepa on the right.)
Writing a file directly to the AFS server partition
1) create file
2) check meta-data, permissions, and quota, and return the file's path in /vicepa
3) write the file into /vicepa
4) update meta-data on the server
(Diagram: the client writes directly into the fileserver's /vicepa partition.)
Design of direct I/O to /vicep-partitions
• Fileservers owning /vicep-partitions are identified by a sysid file in the partition.
• afsd with option "-vicepaccess" informs the AFS kernel extension (new subcall).
• Volumes with instances on fileservers with visible partitions are flagged.
• Open of files in these volumes first tries the new RPC to get path information from the fileserver.
  – If that or the open of the vnode/dentry fails, open resumes in the old way.
• I/O is done directly using the opened vnode/dentry.
• Close for write informs the fileserver about the new file length.
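The decision logic in this design can be illustrated with a small userspace sketch. The names get_server_path and afs_open are invented for illustration; the real implementation lives in the cache manager (src/afs/VNOPS/afs_vnop_open.c) and uses an RPC to the fileserver. A minimal sketch, assuming the choice reduces to "the RPC succeeded and the vnode/dentry could be opened":

```c
#include <stdio.h>

/* Hypothetical sketch of the client-side open decision described
 * above.  Names are invented for illustration only. */

enum open_mode { OPEN_DIRECT, OPEN_VIA_CACHE };

/* Stub for the new RPC: succeeds only if the volume's server
 * partition is visible on this client. */
static int get_server_path(int partition_visible, char *path, size_t len)
{
    if (!partition_visible)
        return -1;                               /* RPC refused or failed */
    snprintf(path, len, "/vicepa/AFSIDat/x/y");  /* example path only */
    return 0;
}

/* Try the direct path first; fall back to the normal cache-based
 * open on any failure, exactly as the bullet list above describes. */
enum open_mode afs_open(int partition_visible, int vnode_lookup_ok)
{
    char path[128];
    if (get_server_path(partition_visible, path, sizeof(path)) == 0
        && vnode_lookup_ok)     /* open of the vnode/dentry worked */
        return OPEN_DIRECT;     /* dentry pointer saved in vcache  */
    return OPEN_VIA_CACHE;      /* resume the old way              */
}
```

The important property is that every failure path degrades to the unmodified AFS behaviour, so direct access is purely an optimization.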
Implementation of direct I/O in MR-AFS and OpenAFS
• Why MR-AFS?
  – Because RZG runs only MR-AFS fileservers.
  – Because the existing ResidencyCmd RPC could be used without changing afsint.xg.
  – Because MR-AFS has large file support.
• Which version of OpenAFS?
  – The CVS version from July 2003, where my last patches regarding the AIX 5.2 port had been committed.
MR-AFS server modifications
• partition.c copies /usr/afs/local/sysid into all active /vicep-partitions.
• In src/viced/afsfileprocs.c:
  – For direct read and write, new RPC subcalls of SAFS_ResidencyCmd were implemented which return the path in the /vicep-partition as a string.
  – They need the same checks as all flavours of SAFS_StoreData and SAFS_FetchData.
  – Therefore the common code was put into generic routines StoreData() and FetchData().
  – In the long run, new RPCs SAFS_DirectStore and SAFS_DirectFetch should be implemented, also in the OpenAFS fileserver.
Open() on the client
Flow in src/afs/VNOPS/afs_vnop_open.c, after everything else is done:
• Is the volume's server partition visible? If not, done.
• If yes, GetServerPath RPC to the fileserver.
• On success, open the file in the vicep-partition.
• On success, save the dentry pointer in the vcache.
• On any failure, continue without direct access.
OpenAFS client modification for open()
• /vicep-partitions are scanned for sysid files. Uuids found there are handed over to the kernel (afsd.c, afs_call.c).
• Some additional flag bits in some structs identify the AFS files which might be read or written directly in a visible /vicep-partition.
• If these flag bits are set, the open vnode operation tries to get the path information from the fileserver using the new RPC (afs_vnop_open.c).
  – If the RPC succeeds, the file's vnode/dentry is looked up and the pointer is stored in the vcache struct.
  – If the RPC or the lookup of the file's path fails, the old way of open is resumed.
write() on the client
Before anything else is done in src/afs/LINUX/osi_vnodeops.c:
• Whenever a pointer to a vnode/dentry in struct vcache is available, it is used to do the I/O directly, bypassing the AFS cache and the RPCs to the fileserver: the dentry pointer in struct file is exchanged, generic_file_write() is called, and the dentry pointer in struct file is restored.
• If this fails, the old way is used instead.
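As a hypothetical userspace analogue of this dispatch (a plain file descriptor stands in for the saved vnode/dentry pointer; the real kernel code swaps the dentry pointer in struct file and calls generic_file_write()):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Userspace sketch only: if a direct handle into the /vicep-partition
 * is available, write there; otherwise fall back to the (stubbed)
 * cache-based path.  Names are invented for illustration. */
ssize_t afs_write(int direct_fd, const void *buf, size_t n,
                  int *used_direct)
{
    if (direct_fd >= 0) {                 /* dentry pointer available */
        *used_direct = 1;
        return write(direct_fd, buf, n);  /* bypass cache and RPCs   */
    }
    *used_direct = 0;                     /* do it the old way       */
    return (ssize_t)n;                    /* cache path stubbed out  */
}

/* Demo: write through the direct path and read the data back. */
int demo_direct_write(void)
{
    char path[] = "/tmp/vicepfileXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return 0;
    int used = 0;
    const char msg[] = "direct";
    if (afs_write(fd, msg, sizeof(msg), &used) != (ssize_t)sizeof(msg))
        return 0;
    char back[16] = {0};
    lseek(fd, 0, SEEK_SET);
    ssize_t r = read(fd, back, sizeof(back));
    close(fd); unlink(path);
    return used == 1 && r == (ssize_t)sizeof(msg)
        && strcmp(back, msg) == 0;
}

/* Demo: without a handle, the fallback path is taken. */
int demo_fallback(void)
{
    int used = 9;
    return afs_write(-1, "x", 1, &used) == 1 && used == 0;
}
```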
read() on the client
Before anything else is done in src/afs/LINUX/osi_vnodeops.c:
• Whenever a pointer to a vnode/dentry in struct vcache is available, it is used to do the I/O directly, bypassing the AFS cache and the RPCs to the fileserver: the dentry pointer in struct file is exchanged, generic_file_read() is called, and the dentry pointer in struct file is restored.
• If this fails, the old way is used instead.
close() on the client
After everything else is done (src/afs/VNOPS/afs_vnop_write.c):
• Close for write triggers a dummy SAFS_StoreData RPC to update the meta-data (file size, modification time): storemini() does the StoreData RPC which updates the file length in the AFS vnode of the file (src/afs/afs_segments.c).
• Any close() does dput(dentry pointer), releasing the vnode/dentry and clearing the field in vcache.
How we started
• 1st successful implementation on my laptop for Linux 2.4.
  – /vicep-partition was reiserfs.
• 2nd successful implementation on the Regatta system for AIX 5.1:
  – /vicepm is a GPFS with TSM-HSM support visible on all Regattas on the switch.
  – Only possible with MR-AFS as shared residency because special precautions are necessary for the delayed open of migrated files.
• 3rd try on GPFS in a Linux cluster was not successful (incomplete VFS implementation).
• 4th try on StorNext filesystem at CASPUR in Rome was not successful (incomplete VFS implementation).
• 5th try on NFS-mounted filesystem was successful.
Problems
To open, read, or write files I copied the technique used for the cache files. But this leads to some problems: at least StoreDirect, but probably also GPFS on Linux, does not have properly filled operation pointers. To verify this I added some debugging code in afs_vnop_open.c:

  afs_Trace2(afs_iclSetp, CM_TRACE_POINTER, ICL_TYPE_STRING,
             "tvc->nameivp->d_inode->i_mapping->a_ops",
             ICL_TYPE_POINTER, tvc->nameivp->d_inode->i_mapping->a_ops);
  if (tvc->nameivp->d_inode->i_mapping->a_ops) {
      afs_Trace2(afs_iclSetp, CM_TRACE_POINTER, ICL_TYPE_STRING,
                 "tvc->nameivp->d_inode->i_mapping->a_ops->prepare_write",
                 ICL_TYPE_POINTER,
                 tvc->nameivp->d_inode->i_mapping->a_ops->prepare_write);
      if (tvc->nameivp->d_inode->i_mapping->a_ops->prepare_write) {
          afs_Trace2(afs_iclSetp, CM_TRACE_POINTER, ICL_TYPE_STRING,
                     "tvc->nameivp->d_inode->i_mapping->a_ops->commit_write",
                     ICL_TYPE_POINTER,
                     tvc->nameivp->d_inode->i_mapping->a_ops->commit_write);
          if (tvc->nameivp->d_inode->i_mapping->a_ops->commit_write)
              found = 1;
      }
  }
Output from "fstrace dump":

  time 151.581058, pid 8590: Pointer tvc->nameivp->d_inode->i_mapping == 0xf7983634
  time 151.581058, pid 8590: Pointer tvc->nameivp->d_inode->i_mapping->a_ops == 0xf8a9db80
  time 151.581058, pid 8590: Pointer tvc->nameivp->d_inode->i_mapping->a_ops->prepare_write == 0x0
  time 151.581058, pid 8590: Pointer tvc->nameivp->d_inode->i_mapping->a_ops->commit_write == 0x0
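The lesson of this trace, that function pointers in an operations struct must be checked before the technique is enabled, can be shown with a small userspace analogue. The struct and field names only mirror the 2.4 address_space_operations; this is an illustration, not kernel code:

```c
#include <stddef.h>

/* Userspace analogue of the problem above: an operations struct whose
 * function pointers may be NULL must be verified before use. */
struct a_ops {
    int (*prepare_write)(void);
    int (*commit_write)(void);
};

/* Direct I/O may only be enabled if both operations are present,
 * which is exactly what the debugging code above checks. */
int ops_usable(const struct a_ops *ops)
{
    return ops != NULL
        && ops->prepare_write != NULL
        && ops->commit_write != NULL;
}

static int dummy(void) { return 0; }

/* Filesystem with a complete operations table. */
int demo_good(void)
{
    struct a_ops o = { dummy, dummy };
    return ops_usable(&o);
}

/* Filesystem like StoreDirect in the trace: commit_write is NULL. */
int demo_bad(void)
{
    struct a_ops o = { dummy, NULL };
    return ops_usable(&o);
}
```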
Performance measurement: write_test, read_test
• write_test writes length bytes into file filename at offset offset
  – Usage: write_test filename offset length
  – Buffer size is 1 MB. The buffer is filled with offset information at each 4 KB.
  – After each 100 MB the time needed and the current data rate are printed.
  – At the end the total time and data rate are printed.
• read_test reads a file produced by write_test and checks for correct contents
  – Usage: read_test filename offset
• The offset parameter was used to test the large file support in AFS without having to wait for the writing of the first 2 GB!
Example for write_test output

~/test/r: ~hwr/afs/@sys/write_test 1GB 0 1000000000
1 writing of 104857600 bytes took 0.703 sec. (145603 Kbytes/sec)
2 writing of 104857600 bytes took 0.616 sec. (166260 Kbytes/sec)
3 writing of 104857600 bytes took 0.924 sec. (110830 Kbytes/sec)
4 writing of 104857600 bytes took 1.054 sec. (97117 Kbytes/sec)
5 writing of 104857600 bytes took 0.958 sec. (106873 Kbytes/sec)
6 writing of 104857600 bytes took 0.989 sec. (103571 Kbytes/sec)
7 writing of 104857600 bytes took 0.985 sec. (104005 Kbytes/sec)
8 writing of 104857600 bytes took 0.961 sec. (106508 Kbytes/sec)
9 writing of 104857600 bytes took 0.891 sec. (114942 Kbytes/sec)
write of 1000000000 bytes took 8.676 sec.
close took 0.000 sec.
Total data rate = 112557 Kbytes/sec. for write
~/test/r: pwd
/afs/ipp-garching.mpg.de/home/h/hwr/test/r
~/test/r: df -k /vicepm
Filesystem 1024-blocks Free %Used Iused %Iused Mounted on
/dev/hsmgpfs 3292001280 3274890496 1% 36 1% /vicepm
~/test/r:
Performance measurement: raid_test
• raid_test is a combination of write_test and read_test to get aggregate throughput numbers
• Usage: raid_test filename streams [length]
  – It forks one process per stream to write or read the files in parallel.
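The forking described above might be sketched as follows; the per-stream work (running the write_test or read_test loop on its own raidfile) is stubbed out here:

```c
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of how raid_test could drive its parallel streams: fork one
 * child per stream, then wait for all of them and count the ones
 * that finished cleanly.  The real work in each child is stubbed. */
int run_streams(int nstreams)
{
    int ok = 0;
    for (int i = 0; i < nstreams; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* child: would write or read raidfile.<i> here */
            _exit(0);
        }
    }
    for (int i = 0; i < nstreams; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status)
            && WEXITSTATUS(status) == 0)
            ok++;               /* stream finished cleanly */
    }
    return ok;
}
```

Because the streams are separate processes, their wall-clock times overlap, which is why the aggregate rates below are computed as (write rate per stream + read rate per stream) * number of pairs.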
raid_test output

running on machine Linux videamus2 2.4.21-144-smp4G-kgdb #2 SMP Mon Dec 8 14:01:55 CET 2003 i686 i686 i386 GNU/Linux
at Wed Jan 28 09:35:12 CET 2004
in directory /vicepz/tmp
Using files with option and size 1073741824

1st test: write raidfile.0                                     real 0m11.487s user 0m0.260s sys 0m4.940s
2nd test: read raidfile.0                                      real 0m6.072s  user 0m0.280s sys 0m2.720s
3rd test: write raidfile.1 and read raidfile.0                 real 0m17.169s user 0m0.580s sys 0m10.380s
4th test: write raidfile.2 and raidfile.3 and read the others  real 0m34.344s user 0m1.950s sys 0m38.540s
5th test: read 4 files in parallel                             real 0m26.271s user 0m1.910s sys 0m20.680s
6th test: write raidfile.4 to raidfile.7 and read the others   real 1m41.067s user 0m3.760s sys 1m35.880s
7th test: read 8 files in parallel                             real 1m25.868s user 0m3.360s sys 0m45.190s

Average values:
write 1 stream  94420 KB/s   read 1 stream  182830 KB/s
write 2 streams 63760 KB/s   read 2 streams  65629 KB/s
write 4 streams 30896 KB/s   read 4 streams  44548 KB/s
write 8 streams 10598 KB/s   read 8 streams  13282 KB/s
The test environment on Linux

Triplestor MASSCOPE 3.0 TB IDE-FC RAID system: 256 MB cache, 12 Hitachi ATA 100 disks of 250 GB each, RAID 5 over 11 disks + 1 hot spare, FC interface 2 Gb/s
xSeries 335: 2 Intel Xeon 2.8 GHz processors, 1.5 GB main memory, Linux SuSE 9.0 kernel 2.4.21-144-smp4G-kgdb
/vicepz: reiserfs 3.6 partition, 200 GB
AFS client on the fileserver without -vicepaccess

running on machine Linux videamus2 2.4.21-144-smp4G-kgdb #2 SMP Mon Dec 8 14:01:55 CET 2003 i686 i686 i386 GNU/Linux
at Wed Jan 28 10:43:11 CET 2004
in directory /afs/ipp/tests/fileserver/videamus2.rzg.mpg.de/perftests
Using files with option and size 1073741824
/vicep-partition is on an IDE-RAID on the same machine.
****snip*****

Aggregate data rates:
1) write 1 file                                   29110 KB/s
2) read 1st file                                  49155 KB/s
3) read 1st file and write 2nd file               35956 KB/s
4) read 1st and 2nd file, write 3rd and 4th file  39390 KB/s
5) read 4 files                                   43148 KB/s
6) read 1st to 4th file, write 5th to 8th file    38016 KB/s
7) read all files                                 41912 KB/s

Average values:
write 1 stream  29110 KB/s   read 1 stream  49155 KB/s
write 2 streams 17261 KB/s   read 2 streams 18695 KB/s
write 4 streams  8908 KB/s   read 4 streams 10787 KB/s
write 8 streams  4265 KB/s   read 8 streams  5239 KB/s

Calculation example: the aggregate data rate for test 6 is (4265 + 5239) * 4 = 38016.
The tests with > 4 streams are slowed down by the limitation of 4 rx-calls per connection.
AFS client on the fileserver, afsd -vicepaccess

running on machine Linux videamus2 2.4.21-144-smp4G-kgdb #2 SMP Mon Dec 8 14:01:55 CET 2003 i686 i686 i386 GNU/Linux
at Wed Jan 28 09:56:19 CET 2004
in directory /afs/ipp/tests/fileserver/videamus2.rzg.mpg.de/perftests
Using files with option -g and size 8
/vicep-partition is on an IDE-RAID on the same machine.
****snip*****

Aggregate data rates:
1) write 1 file                                    97541 KB/s
2) read 1st file                                  183940 KB/s
3) read 1st file and write 2nd file               127590 KB/s
4) read 1st and 2nd file, write 3rd and 4th file  138868 KB/s
5) read 4 files                                   159596 KB/s
6) read 1st to 4th file, write 5th to 8th file     89288 KB/s
7) read all files                                  91728 KB/s

Average values:
write 1 stream  97541 KB/s   read 1 stream  183940 KB/s
write 2 streams 65965 KB/s   read 2 streams  61625 KB/s
write 4 streams 29735 KB/s   read 4 streams  39899 KB/s
write 8 streams 10856 KB/s   read 8 streams  11466 KB/s

Calculation example: the aggregate data rate for test 6 is (10856 + 11466) * 4 = 89288.
This run used a file size of 8 GB.
I/O to AFS, AFS with -vicepaccess, and directly to the partition
(Bar chart "I/O Performance": aggregate data rate in KB/s, scale 0 to 200000, for the tests: write 1st file; read 1st file; write 2nd file / read 1st file; write 3rd and 4th file / read 1st and 2nd file; read 4 files; write 5th to 8th file / read 1st to 4th file; read 8 files. Three bars per test: AFS normal, AFS direct, and direct I/O to /vicepm/tmp.)
How to exploit this new feature
• On each AFS fileserver the AFS client benefits from this technique. • On Regatta systems:
– All nodes which can mount a vicep-partition in GPFS benefit
  – This is expected to be possible in the future, also remotely and also for Linux.
• The implementation should also be tested for other SAN filesystems such as
  – StorageTank, CXFS, QFS, StoreDirect, and others.
  – Still some work to be done.
• Even where NFS is faster than AFS this technique can be used.
Open Questions
• Presently the fileserver doesn't keep information about files opened on the client.
  – What happens when the volume is moved to another server?
  – What happens when the file is going to be wiped (MR-AFS)?
• We need something similar to the callback mechanism.
  – How to synchronize after a server restart?
  – How to synchronize after a client reboot?
• Still some work to be done before we can use this in a production environment!
Limitations and Chances
• Export of vicep-partitions is limited to trusted hosts because
– the root user can access all data in the vicep-partition bypassing AFS.
• AFS can be used as access control mechanism to data in globally shared filesystems because
– the local uid of a user as defined in /etc/passwd doesn't matter
– Data access is strongly protected by Kerberos authentication
– Data are accessible from any AFS client world wide.
Import of existing files into AFS
Expensive, long-running batch jobs should better be independent of AFS.
On the Regattas we allow users to write files into a special subdirectory of the /vicep-partition: /r is a symbolic link to /vicepm/r.
Files written there can later be imported into MR-AFS by
fs import
This creates a vnode in AFS and renames the file from /vicepm/r/... to /vicepm/AFSIDat/... where the namei algorithm expects it. It also works with files migrated by TSM-HSM.
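The key point, that the import is a rename within the /vicep-partition rather than a copy, can be sketched in userspace. The paths and all vnode bookkeeping are invented for the demo:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sketch of the rename step behind "fs import": the file moves from
 * the staging area (e.g. /vicepm/r/...) to the location the namei
 * layout expects (e.g. /vicepm/AFSIDat/...).  The vnode creation is
 * omitted; this only shows that the import never copies the data. */
int import_file(const char *staging_path, const char *namei_path)
{
    /* rename(2) fails with EXDEV if the paths are on different
     * filesystems, which is why the staging directory must live
     * inside the /vicep-partition itself. */
    return rename(staging_path, namei_path);
}

/* Self-contained demo in a temp directory standing in for /vicepm. */
int demo_import(void)
{
    char dir[] = "/tmp/vicepXXXXXX";
    if (mkdtemp(dir) == NULL) return 0;
    char src[256], dst[256];
    snprintf(src, sizeof(src), "%s/staged_file", dir);
    snprintf(dst, sizeof(dst), "%s/namei_file", dir);
    FILE *f = fopen(src, "w");
    if (!f) return 0;
    fputs("payload", f);
    fclose(f);
    int rc = import_file(src, dst);
    struct stat st;
    int moved = (rc == 0 && stat(dst, &st) == 0 && stat(src, &st) != 0);
    unlink(dst);
    rmdir(dir);
    return moved;
}
```

Since no data is copied, an import is cheap even for files already migrated to tape by TSM-HSM, as noted above.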
This could be implemented for the OpenAFS fileserver as well because it doesn't depend on special MR-AFS features.
Data Migration / HSM-Systems at RZG
• HSM systems developed at IPP with data migration to tapes have been in use since the early days.
– AMOS 1971
– HADES 1981
  – AMOS2 1984
• Unix-based HSM systems since the nineties
– DMF on Cray 1992
– DMF on SGI 1993
– TSM-HSM 2002 • HSM functionality in AFS (MR-AFS) since mid nineties
– Support of CASTOR under work
Supercomputers, HSM-Servers, and HSM-software at RZG

Supercomputers: IBM 7090, IBM 360/91, Cray 1, Cray XMP/24, Cray YMP, Cray T3D/128, Cray T3E, IBM Regatta
HSM servers: IBM 370/145, Amdahl 470 V6, Siemens 7870, IBM 4381 (B), IBM 4381 (C), IBM 3090/15E, Cray EL, Cray Jedi, SGI Origin 2000, IBM Regatta
HSM software: AMOS, HADES, AMOS2, Cray DMF (YMP), Cray DMF (EL), Cray DMF (Jedi), SGI DMF, TSM-HSM, MR-AFS
(Timeline 1962 to 2003; the year axis of the original chart is not reproduced here.)
Multiple-Resident-AFS (MR-AFS)
• Developed at the Pittsburgh Supercomputing Center (psc.edu) by Jonathan Goldick, Chris Kirby, Bill Zumach, et al.
• Fileserver extensions to Transarc's AFS
• Since 1995 development and maintenance at RZG.
• Since 2001 based on OpenAFS code and libraries.
• Client extensions integrated in OpenAFS (large file support, commands, etc.)
• Used in production only at RZG.
Main Features of MR-AFS
• Files may be stored outside the volume’s partition.
• Fileserver can do I/O remotely (remioserver).
• Fileservers can share HSM resources and disks.
• Files from any fileserver partition can be migrated into the HSM system (AFS internal data migration).
• Volumes can be moved between fileservers without moving the files stored in the HSM system or other shared disks.
• Intelligent queuing for HSM recall requests.
AFS Cell “ipp-garching.mpg.de”
• ~20 fileservers all with MR-AFS binaries
– 3 of them using data migration
– all others behave like OpenAFS fileservers.
• ~36 TB of files, 6 TB on disk, the rest on tape.
• File-based backup done by TSM allows users to restore old file versions.
• RO-volumes in the RW-partition and on a separate server:
– Each night all RW-volumes which don't have up-to-date ROs are released.
– If a partition is lost, the RO clones on the separate server can be converted to RW volumes within a few minutes. (This has already happened!)
Future: MR-AFS and Castor
(Diagram: clients in n clusters access the fileserver. Large files are kept in a shared filesystem used as shared residency; Castor tape movers perform migration and staging between this filesystem and the tape drives, coordinated by the Castor meta-data server, which also serves a shared archival residency. Meta-data, directories, and small files stay on the fileserver's local disk.)
Conclusions
• AFS can make use of the speed of other shared filesystems such as GPFS
  – if the fileserver's partitions can be exported to trusted clients.
  – Results show native filesystem speed also through the AFS client.
  – This technique can also be used to add secure access control to globally shared filesystems.
• High Performance AFS is presently available only in combination with MR-AFS
  – but it should be easy to port it to OpenAFS fileservers as well.
• MR-AFS has some other interesting features which could make it worth using.