IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 144

Directory Compaction Techniques for Space Optimizations in ExFAT and FAT Systems for Embedded Storage Devices

Keshava Munegowda1, Dr. G T Raju2 and Veera Manikandan Raju3

1 Advanced Storage Division, EMC Corporation Bangalore, Karnataka, India

2 Department of Computer Science and Engineering, R N S Institute of Technology Bangalore, Karnataka, India

3 Visual Technology Centre, Texas Instruments Bangalore, Karnataka, India

In FAT , the file or directory is the linked list of Abstract the clusters. A Cluster is a group of blocks or sectors of The Extend (ExFAT) is the future file storage device. The File Allocation Table contains the system for embedded storage devices. The File Allocation Table linked list of clusters of files/directories. The FAT (FAT) file system de-facto file system for embedded storage devices such as Multimedia Cards (MMC) / Secure Digital (SD) / specification limits the maximum supported storage size to Micro SD cards, NOR, NAND flash memories. The MMC and 32 GB. But, the maximum size supported by FAT32 SD card associations classify the ExFAT as the standard file implementation is 128 GB. But, today the flash storage system for storage flash cards of than 32 Giga Bytes (GB) cards of more than 32 GB are available in market. of size. This paper implements the directory compaction The ExFAT [2] [3] file system is developed, by Microsoft techniques for both FAT and ExFAT file system to improve the Corporation, as successor of FAT32 file system. This file availability of the space in the file systems. system is optimally designed to support large size flash Keywords: Cluster, Contiguous, Compaction, Directory, storage cards with higher and performance. The ExFAT, FAT, File system, Flash memories, MMC, Micro SD, NOR, NAND, Storage. maximum storage size supported by ExFAT file system in 128 peta bytes (PB).The ExFAT file system is not compatible with FAT File system. But, it uses the clusters 1. Introduction concept to store file/directory data and File Allocation Table to store the cluster chain of file/directory. The The FAT [1] is widely used file system in tablet personal ExFAT file system Contiguous clusters read algorithm [4] computers, mobile phones, digital cameras and other for file read performance improvements. The ExFAT file embedded devices for and multi-media system optimizes the free cluster search by using “cluster applications such as video imaging, audio/video playback heap”. The cluster heap is group of bits to indicate the and recording. The initial version of FAT file system was status of all clusters of file system. Every bit in the cluster FAT12 by Microsoft Corporation, later it was extended as heap specifies the status of the data cluster, thus every byte FAT16 and further as FAT32 to support higher storage indicates the status of 8 clusters. The binary value “0” capacity. The FAT file system was initially developed to indicates that cluster is free and value 1 indicates that use on floppy disks and hard-drives. Since most of the cluster is allocated. The cluster heap is also referred as Personal Computer (PC) s implements the FAT file system, “Allocation Bitmap”. The cluster heap is similar to block this file system has become a default and world-wide bitmap and bit map structures used in [5] and compatible storage format for embedded devices. Usually Ext3 [6] [7] file systems. Figure 3 shows the logical the device with the implementation of FAT is recognized organization of ExFAT file system and the structure of as removable storage media in a PC. The Flash memories cluster heap. Usually Cluster Heap exists in the 2nd cluster are default choice of any embedded device as they are low- of ExFAT file system. The ExFAT file system is available priced, smaller size and higher storage capacity. Even in Windows 7 and higher operating systems. The FUSE [8] though FAT file system does not define flash management based ExFAT file system implementation [9] is available techniques such as Wear-Levelling and Bad Block for kernel based Ubuntu operating systems. management, the embedded devices implements this file This paper defines and implements the directory system along with the dedicated flash block management compaction technique for both file systems without algorithms. breaking the compatibility of the existing FAT and ExFAT

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 145

File system implementations in windows and Linux heap. The “NoFatChain” [2] [3] bit field of generic operating systems. secondary flags of the stream extension directory entry is set to value “1” to indicate contiguous cluster allocation to a directory. The figure 3 illustrates the directory containing 2. Directory compaction in ExFAT and FAT the deleted entries in the first cluster of directory in ExFAT File systems file system. Note that File allocation table does not indicate that clusters 4, 5 and 6 are allocated whereas the cluster The continious usage of file system by creating, writing, heap shows that these clusters are allocated. The figure 4 deleting file and directories , the FAT file system can shows that modified Directory entry, File allocation table contain the directories having clusters with all deleted and cluster heap in which first cluster is made as free entries of files/directories. This paper implements the cluster. If the clusters allocation to directory is not removal of such clusters from the clusters list of the contigious, then the status of all allcoated clusters are directory and marks the clusters status in File allocation updatd in the File allocation table. Upon deletion of first table as a free clusters.This process is referred as cluster of directory, the cluster chain of directory is “Directory Compaction”. This improves the number of modifed as show in figure 1. data clusters and hence optimizes the user space of the file

system. The directory compaction improves the File Allocation Table Directory entry performance of the file/directory operation by DIR1 … 2 of DIR1 reducing the number of directory entries to be referred Index Content while searching for the file/directory name in compacted 2 3 0xE5 ...... Cluster 2 directory’s clusters list. The directory compaction 3 4 0xE5 ...... technique is first identified and implemented for FAT file 4 5 This cluster has all system [10] and in late 2012, it’s been identified that ...... deleted 5 ...... directory compaction is applicable to ExFAT file system EOC file/directory 6 .. entries also. The Directory compaction can be performed at the 0xE5 ...... following levels of given directory. .. 00 In FAT: Value 0xE5 2.1 First cluster in the clusters chain of the directory .. .. indicates the .. .. FILE1 TXT Cluster 3 .. .. deleted If the first cluster of the given directory has the all the FILE2 TXT file/directories

deleted entries, then Meta-data i.e., 32 byte directory entry ...... should be updated to point to 2nd cluster of the clusters N 00 ...... In ExFAT: Values 0x05, 0x40 chain of the directory and the first cluster should be made FILE8 TXT and 0x41 indicates as free cluster. The figure 1 illustrates the directory deleted containing the deleted entries in the first cluster. The figure file/directories

2 shows that modified Directory entry and File allocation FILE9 TXT Cluster 4 table in which first cluster is made as free cluster. Note DIR1 that, According to FAT file system specification whenever ...... file/directory is deleted the first byte of the 32-byte ...... directory entry i.e., meta-data of the file/directory will be replaced by hexadecimal value 0xE5. Hence any 32 byte 0xE5 ...... directory starting hexadecimal value 0xE5 is referred as a deleted file/directory entry. Note that, In case of ExFAT DIR2 Cluster 5 file system, the directory entry starting with hexadecimal value 0x05 indicates the deleted file/directory entry, the FILE10 TXT directory entry starting with hexadecimal value 0x40 ...... indicates the deleted stream extension directory entry and the directory entry starting with hexadecimal value 0x41 DIR10 indicates the deleted file name extension directory entry. In ExFAT file system, if the contiguous clusters are allocated to directory, then File allocation table contains EOC the status of first cluster of the directory and status of the Fig 1: Directory containing deleted entries in the first cluster rest of the clusters of directory are updated in the cluster in FAT file system

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 146

File Allocation Table 32 byte directory File Allocation Table entry of Index Content DIR1 … 3 Directory entry of DIR1 DIR1 … 3 DIR1 “NoFatChain” flag 2 00 Index Content Cluster 3 is set 2 3 4 EOF Cluster 2 is 0x05 ...... 3 EOF 4 5 made as 0x40 ...... This cluster has all free cluster 4 0 5 EOF Cluster 3 is 0x41 ...... deleted 5 0 ...... file/directory 6 first cluster .. entries 6 0 .. 00 0x41 ...... FILE1 TXT Cluster 3 .. 00 In ExFAT: .. .. FILE2 TXT Cluster 4 Values 0x05, 0x40 ...... and 0x41 indicates ...... FILE1 TXT deleted ...... FILE2 TXT file/directories FILE8 TXT N 00 ...... N 00 ......

FILE8 TXT

Cluster 5 FILE9 TXT Cluster 4 FILE9 TXT Cluster Heap DIR1 DIR1 ...... 00011111 Cluster 2 ...... 0xE5 ...... 0xE5 ......

Cluster 6 00000000

DIR2 Cluster 5 DIR2

FILE10 TXT FILE10 TXT

......

DIR10 DIR10

EOC EOC Fig 2: Compacted directory with new first cluster Fig 3: Directory containing deleted entries in the first cluster in FAT file system in ExFAT file system

2.1 An arbitrary cluster in between the first and last In case of ExFAT file system, if there are contiguous cluster of the clusters chain of the directory cluster allocated as shown in figure 7, and if cluster 4 has If a cluster which is in between the first and last cluster of deleted entries, then removing the cluster 4 from the directory makes the cluster chain as non-contiguous. the cluster chain of the directory has all deleted Hence, After removing the cluster 4, the cluster chain need file/directory entries, then that cluster is will be marked as to updated in the File allocation table and The a free. The index of the previous cluster in the FAT will “NoFatChain” [2] [3] bit field of generic secondary flags contain the cluster number available at the index of deleted of the stream extension directory entry is reset to value “0” cluster. In figure 5 it is shown that cluster 3 has the deleted to indicate non-contiguous cluster allocation to a directory file/directory entries. Figure 6 shows that the directory is as shown in figure 8. If the clusters allocation to directory compacted by freeing the cluster 3 and cluster 2 is is not contigious, then the status of all allcoated clusters directory directly pointing to cluster 4 in the File are updatd in the File allocation table. Upon deletion of Allocation Table.

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 147

cluster containing the deleted entris , the cluster chain of File Allocation Table directory is modifed as show in figure 8. DIR1 .. 2 32 byte Inde Content directory x entry of File Allocation Table 2 3 DIR1 DIR1 … 4 Directory entry FILE1 TXT of DIR1 “NoFatChain” 3 4 Index Content ...... flag is set Cluster 2 2 EOF 4 EOF ...... 3 0 5 -- FILE8 TXT 4 EOF Cluster 3 is 6 .. Cluster 3 5 0 made as free 0xE5 ...... 6 0 cluster .. 00 This cluster has all .. 00 ...... deleted Cluster 4 file/directory ...... 0xE5 ...... entries .. .. FILE1 TXT Cluster 4 is ...... FILE2 TXT first cluster FILE9 TXT ...... Cluster 4 N 00 ...... N 00 ...... FILE8 TXT 0xE5 ......

Cluster 5

FILE9 TXT Cluster Heap EOC DIR1 00011101 Cluster 2 Fig 5: Directory containing deleted file/directory entries ...... in the middle cluster chain in FAT file system ...... 0xE5 ...... File Allocation Table DIR1 .. 2 32 byte Inde Content directory Cluster 6 00000000 x entry of 2 4 DIR2 FILE1 TXT DIR1

FILE10 TXT 3 EOF ...... Cluster 2 ...... 4 EOF ...... 5 -- FILE8 TXT DIR10

6 .. Cluster 3 is made as EOC .. 00 free cluster

Fig 4: Compacted directory with new first cluster in ExFAT .. .. file system .. .. Cluster 2 index .. .. pointes to cluster 4 2.3 Last cluster of the clusters chain of the directory FILE9 TXT When the last cluster of the directory has deleted entries, N 00 Cluster 4 then that cluster is will be marked free cluster and the ...... previous cluster to the last cluster will be marked as last 0xE5 ...... cluster. The figure 9 demonstrates that last cluster 5 has all deleted entries. Figure 10 shows that last cluster 5 is made as free cluster and previous cluster 4 is marked as last EOC cluster in the File Allocation Table. Fig 6: Compacted directory in which cluster 3 is removed from the cluster chain in FAT file system

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 148

File Allocation Table File Allocation Table DIR1 .. 3 Directory DIR1 .. 2 32 byte Inde Content entry of Inde Content directory x DIR1 x entry of 2 2 EOF 3 DIR1 FILE1 TXT FILE1 TXT

3 EOF 3 4 ...... Cluster 2 4 0 ...... 4 EOF ...... Cluster 3 5 0 FILE8 TXT 5 -- FILE8 TXT

6 .. Cluster 4 6 .. Cluster 3 0x05 ...... FILE9 TXT .. 00 This cluster .. 00

...... has all ...... deleted .. .. 0x41 ...... file/directory .. .. FILE19 TXT entries N 00 .. ..

Cluster 4 FILE9 TXT 0xE5 ...... Cluster 5 N 00 Cluster Heap / cluster 2 This cluster ...... has all 00011111 0x41 ...... 0xE5 ...... deleted file/directory .. .. entries

00000000 EOF EOC Fig 7: Directory containing deleted file/directory entries Fig 9: Directory containing deleted file/directory entries in the middle cluster of cluster chain of ExFAT file in the last cluster of cluster chain in FAT file system

system

File Allocation Table File Allocation Table DIR1 .. 3 Directory DIR1 .. 2 32 byte Inde Content Inde Content entry of directory x x entry of DIR1 2 2 EOF 3 DIR1 FILE1 TXT FILE1 TXT 3 5 3 EOF ...... Cluster 2 Cluster 3 ...... 4 0 ...... 4 0

5 EOF FILE8 TXT 5 -- FILE8 TXT

6 .. 6 .. Cluster 3 The cluster 4 FILE9 TXT is made has .. 00 .. 00 free cluster ...... FILE19 TXT N 00 .. ..

FILE9 TXT EOC Cluster 5 N 00 Cluster Heap / cluster 2 ...... The cluster 4 made as free cluster 00011011 0x41 ...... Fig 10: Compacted directory in which last cluster is 00000000 made as free in FAT file system EOF

Fig 8: Compacted directory in which cluster 4 is removed from the cluster chain in ExFAT file system

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 149

File Allocation Table DIR1 .. 3 Directory In case of ExFAT file system, upon removing the last Inde Content entry of x DIR1 cluster of cluster chain of directory, only the cluster heap is 2 EOF updated the status as a free cluster. In figure 11, it shown FILE1 TXT that last cluster 5 has all deleted entries and upon removing 3 EOF ...... this cluster from cluster chain the cluster heap is updated 4 0 ...... the status and File allocation table is not modified. If there Cluster 3 are non-contiguous clusters allocation to directory and last 5 0 FILE8 TXT cluster is removed then the cluster chain in updated in file 6 .. Cluster 4 allocation table. FILE9 TXT .. 00

...... 4. Conclusions .. .. 0x41 ...... The Directory Compaction truncates the directory by N 00 removing the clusters, having deleted entries, from the 0x05… directory clusters chain. The directory compaction Cluster Heap / cluster 2 Cluster 5 generates the free clusters and hence increases the ...... availability of the storage space for user data. The 00011111 This cluster has all directory compaction improves the performance of the 0x41 ...... deleted file/directory open operation by avoiding the comparison file/directory 00000000 entries of deleted directory entries with input file/directory name EOF in compacted directory’s clusters list. Fig 11: Directory containing deleted file/directory entries in the last cluster of cluster chain of ExFAT file Acknowledgments system

It is a pleasure to acknowledge Mr. Robert Shullich, Enterprise Security Architect, Tower Group Companies, File Allocation Table DIR1 .. 3 Directory USA, for resolving technical queries to understand the Inde Content entry of ExFAT file system. Mr. Andrew Nayenko deserves special x DIR1 2 EOF thanks for open souring the FUSE based ExFAT file FILE1 TXT system implementation to Ubuntu and other Linux 3 EOF ...... distributions. 4 0 ...... Cluster 3 References 5 0 FILE8 TXT [1] FAT32 File System Specification, FAT: General Overview of 6 .. Cluster 4 On-Disk Format, Microsoft, 2000. [2] Ravishankar V.Puipeddi, Vishal V.Ghotge, Ravinder S.Thind, FILE9 TXT .. 00 “Quick file name look up using name hash”, USPTO Patent:

...... 8321439, granted on November 27, 2012. .. .. [3] Keshava Munegowda, Venkatraman S, Dr. G T Raju, “The 0x41 ...... Extend FAT file system: Differentiating with FAT32 file N 00 system”, Linux Conference, Prague, Czech Republic, Europe, October 2011. EOF [4] Ravishankar V.Puipeddi, Vishal V.Ghotge, Ravinder S.Thind, “Contiguous File Allocation in Extensible File system”, Cluster Heap / cluster 2 The cluster5 is USPTO Patent: 8,606,830, granted on November 20, 2013. made as free cluster [5] Remy Card, Theodore Ts’o, Stephen Tweedie, “Design and 00001111 implementation of the Second Extended File system”, First .. .. Dutch International Symposium on Linux [6] Stephen C. Tweedie (May 1998). "Journaling the Linux 00000000 ext2fs File System, Proceedings of the 4th Annual Fig 12: Compacted directory in which last cluster is LinuxExpo, Durham, NC. Retrieved 2007-06-23 made as free in ExFAT file system

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved. IJCSI International Journal of Computer Science Issues, Vol. 11, Issue 1, No 2, January 2014 ISSN (Print): 1694-0814 | ISSN (Online): 1694-0784 www.IJCSI.org 150

[7] Dr. Stephen C. Tweedie , “Ext3 , journaling file system”, Ottawa Linux Symposium, Ottawa Congress Centre, Ottawa, Ontario, Canada on the 20th of July, 2000 [8] FUSE (File System in User Space) : http://fuse.sourceforge.net/ [9] Fuse based ExFAT implementation for Linux : http://code.google.com/p/exfat/ [10] Keshava Munegowda, Veera Manikandan Raju, Rohit Joshi, “Storage optimizations by directory compaction in a fat file system”, USPTO patent: 8429331, granted on April 23, 2013.

Keshava Munegowda is working as Principal software engineer in EMC Corporation, Bangalore, India. He received Bachelor’s Degree in Computer science and Engineering in 2001 from Bangalore University, India, and M-Tech degree in computer Science and Engineering in 2004 from Visvesvaraya Technological University (VTU), Belgaum, Karnataka, and currently pursing PhD in field of computer science in VTU. He is working in the field of storage drivers, file systems, USB and Network Protocols since 10 years. He has obtained 2 USPTO patents in the area of FAT file system. He has published and presented papers in journals, international level Conferences and published a book in field of WLAN secuirty. He is one of the open source contributors for USB host device driver of OMAP SOC in Linux kernel.

Dr. G T Raju is a professor and Head of Computer Science and Engineering Department of R N S Institute of Technolgy , Bangalore, India. He received his Bachelor’s Degree in Computer Science and Engineering from Kalpataru Institute Of technology, Tiptur, Karnataka, in 1992 and M. E in Computer Science and Engineering from B.M.S College of Engineering, Bangalore, India in 1995 and Doctorate of Philosophy Ph.D.in 2008 in Computer Science and Engineering from Visveswaraya Technological University, Belgaum, Karnataka; He has 18 years of Experience in teaching and research in Data Mining, Data Warehousing, Image Processing, Databases, Artificial Intelligence and Computer Graphics. He has published and presented papers in journals, international and national level Conferences and published a text book.

Veera Manikandan Raju is a Senior Member Technical Staff in Texas Instruments, Bangalore, India. He received Bachelor’s Degree in Electronics and Communication Engineering from National Institute of Technology, Trichy, Tamilnadu, India, in 1996. He as 17 years of experience in field of storage drivers, file systems, USB, Network protocols, Video Processing and Gesture Recognition Technologies. He has obtained 2 USPTO patents in the area of FAT file system and he has published several journal and conference papers.

Copyright (c) 2014 International Journal of Computer Science Issues. All Rights Reserved.