NSTL White Paper

System Performance and File Fragmentation

In Windows NT

October, 1999

Table of Contents

Executive Summary

I. Introduction

File Fragmentation and Data Fragmentation are Different
Fragmentation Can Impede Performance
NTFS is Very Different from FAT
NTFS Does Get Fragmented
Performance Degradations Can Impede Productivity
Keeping a Disk Defragmented Can Prevent These Problems

II. How NTFS Works

NTFS Capabilities in Functional Terms
Master File Table
Directories
Compression
Software RAID
Dynamic Bad-Cluster Remapping
Disk Caching
Volume Sets
Paging Files

III. How NTFS Gets Fragmented

Normal Creation and Deletion of Extents
The Impact of Unusual Events
Checkpoints
Increased Head Movement From Disparity of Extents
Cluster Size Issues, Trade-offs with Capacity and Performance
System Files (Principally, but Not Exclusively, the Paging File)
Fragmentation of Directories
Fragmentation of the MFT Itself
Workstation-Specific Issues
Server-Specific Issues

IV. The Implications of Fragmentation

Fragmentation is Difficult to Test
NT Performance is Impeded by Disk Fragmentation
Enterprise Systems are More Susceptible to These Problems
RAID Systems are Susceptible to Fragmentation
Disk Caching Mitigates, Doesn’t Eliminate These Problems
Some User Scenarios are Performance Limited, and Productivity is Therefore Impeded by Fragmentation
"Optimization" is Not a Solution

V. Conclusions

Regular Defragmentation Can Mitigate Performance Problems
Both Workstations and Servers Can Benefit

Glossary

 

This report was prepared by NSTL under contract for Diskeeper Corporation. NSTL does not guarantee the accuracy, adequacy or completeness of the services provided. NSTL MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, AS TO RESULTS TO BE OBTAINED BY ANY PERSON OR ENTITY FROM USE OF THE CONTENTS OF THIS REPORT. NSTL MAKES NO EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OF ANY PRODUCT MENTIONED IN THIS REPORT.

 

Executive Summary

Contrary to early conventional wisdom about Windows NT, its file systems do become fragmented. This fragmentation occurs in the normal course of using the operating system. Theoretical analysis and real-world performance testing demonstrate that fragmentation has an adverse impact on system performance. Special characteristics of the NTFS file system, such as the paging file, directories, and the Master File Table, are especially vulnerable to fragmentation, and allowing them to become fragmented is a guarantee of a decrease in overall system performance. Other NTFS features, such as file system compression, inherently create fragmentation.

The best way to avoid these worst-case fragmentation problems, and to keep the system running at optimal performance, is to run a defragmentation utility on a regularly scheduled basis. Both Windows NT Workstations and NT Servers are subject to these problems, and both can improve system performance through regular defragmentation.

I. Introduction

All computer system design involves trade-offs, and file systems are no exception. One of the major detrimental effects of these trade-offs is fragmentation of files and the file system. Files usually become fragmented when the volume begins to run out of large contiguous stretches of free space. Rather than deny a file the ability to grow beyond the size of the largest free block on disk, file systems allow different parts of the file to exist in different non-contiguous locations, and the file system software presents the file to programs running on the computer as one logical unit. File systems can also become fragmented when files become scattered across the disk, even when the individual files themselves are not fragmented into multiple sections. In the long term, this can happen for the same reasons as those that cause internal file fragmentation, and can occur in the normal course of computer use.

Normal computer use involves the creation and deletion of files, some of them permanent, some of them transient. Many typical computer processes, such as desktop publishing or software development, involve the creation of large numbers of temporary files, the presence of which the user is normally unaware. During the user task, the program reads source files and may create temporary files to store data used in a later portion of the task. In the end, the application may write result files and delete original source files and temporary files.

The end result of this process is that small runs of free space appear amidst the allocated space on the hard disk. This, in and of itself, is a form of fragmentation that decreases performance even if individual files are not internally fragmented. Over time, as the larger runs of free space on the hard disk are lessened in this way, individual files become fragmented because the file system will lack the space to contiguously allocate a file. The term used for space such as this, which is unallocated to any file but unavailable to some degree because it is split into multiple sections, is external fragmentation.
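As a rough illustration of the process just described, the following Python sketch models a tiny disk as a list of clusters. It uses a deliberately naive "take the first free clusters" allocator, which is an assumption made for illustration and not NTFS's actual allocation policy. The file names and the 12-cluster disk size are likewise hypothetical.

```python
def first_fit_allocate(disk, name, size):
    """Give file `name` the first `size` free clusters found on `disk`.

    `disk` is a list in which None marks a free cluster.
    Returns the list of cluster indexes assigned to the file.
    """
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < size:
        raise RuntimeError("disk full")
    used = free[:size]
    for i in used:
        disk[i] = name
    return used


def delete(disk, name):
    """Mark every cluster owned by `name` as free again."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def count_fragments(clusters):
    """Count contiguous runs in an ascending list of cluster indexes."""
    return sum(
        1 for k, c in enumerate(clusters) if k == 0 or c != clusters[k - 1] + 1
    )


disk = [None] * 12                       # hypothetical 12-cluster disk
first_fit_allocate(disk, "source", 3)    # clusters 0-2
first_fit_allocate(disk, "temp", 2)      # clusters 3-4
first_fit_allocate(disk, "result", 3)    # clusters 5-7
delete(disk, "temp")                     # leaves a 2-cluster hole at 3-4

report = first_fit_allocate(disk, "report", 4)
print(report, count_fragments(report))   # [3, 4, 8, 9] 2
```

Deleting the temporary file leaves a hole too small for the next file, so the new file is split into two fragments even though the disk still has plenty of free space.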

More importantly, as individual files grow, there will not be sufficient free adjacent space for them, and the file system will need to allocate a non-contiguous or non-adjacent block of space for new data.

Windows NT also supports the FAT and HPFS file systems, which have fragmentation issues of their own. But these file systems are provided for compatibility with legacy systems, such as DOS and OS/2, and do not support the full gamut of Windows NT features, such as integrated security. Many of the issues explored in this paper apply to those file systems as well as to fragmentation generically on any operating system, but the focus of this paper will be on the NTFS file system under Windows NT 4.0.

File Fragmentation and Data Fragmentation are Different

It’s important to note the distinctions between fragmentation at different levels of data storage. Individual applications, such as Microsoft Office programs and database servers like Oracle, have their own issues of fragmentation in their data storage. These issues are generic to all file systems and operating systems; such data fragmentation would exist regardless of which file system or operating system is in use.

The file system, NTFS in the case of NT, is not aware of the logical organization of your data. Wherever the file may exist on the disk, and whether or not the file is fragmented, the file system presents it to the application as a single contiguous area of storage. But the application’s view of the data in that file has a logical structure. To a mailing list program, a file may be a group of first names, last names, addresses, and so on. To the file system it is still just a group of clusters of data. A cluster is the smallest unit of storage that can be allocated by the operating system on a disk. A cluster may consist of one or more sectors of the disk.

The application may, in its own internal organization of the data in the file, create gaps in the data, i.e. it may fragment it. Much like a file system, when you delete data in an application it may not actually remove the data, but only mark it as deleted. The resulting gaps in the logical storage of data are known as internal fragmentation.

Data files may also have allocated but unused space for other reasons. Programs may allocate space in a file in chunks of space analogous to file system clusters, for their own organizational or performance reasons. They may also use external facilities, such as Windows’ OLE structured storage, to manage the structure of the data in their files, and these facilities may have their own wasted space.

Over time, the growth of such areas will cause the total size of the file to grow and may slow the performance of the application as head movement on the disk increases, even if the logical amount of live data remains constant. This problem occurs even if the file itself is not fragmented at the file system level, although data fragmentation increases the likelihood of file fragmentation simply because the file itself grows.

To combat internal data fragmentation, some applications, such as Microsoft Access, provide utilities to defragment (or "compact") the data in the file. Ironically, these utilities themselves run a substantial risk of increasing fragmentation at the file system level because they usually create an entirely new copy of the file, consuming large amounts of disk space in the process. Thus, regular defragmentation of your data files may exacerbate fragmentation of your file system.

Lastly, the individual files associated with an application can, over time, become physically dispersed across a disk. This type of fragmentation, known as usage fragmentation, is an especially difficult problem for a defragmentation program, because normal methods of fragmentation analysis may not identify it. Instead, some knowledge of the application’s behavior may be necessary in order to rectify this problem. In the future, this problem could, in theory, be managed either by applications providing information about their files to the defragmenter or by sophisticated analysis of the file system journal.

Fragmentation Can Impede Performance

Almost all hard disks have the same basic design: a stack of circular platters with a series of heads that move across the disk to read concentric circular tracks.

In most cases, the heads in the disk move in lock step, so all the heads are always physically located over the same track at once. This group of tracks is called a cylinder.

Hard disks operate at their fastest when they are reading physically sequential data, one track at a time, switching from one head to another within a single cylinder, and moving on to the next physically adjacent cylinder. Under these circumstances the disk can read or write data and pass it back to the interface and to the computer with a minimum amount of head movement. If the next data to read or write were stored elsewhere on the disk, the process would have to wait for the heads to move to the correct cylinder and settle over the appropriate sector within that cylinder. Head movement is expensive in terms of computer performance and, in order to maximize performance, head movement should be minimized.
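The cost of head movement can be made concrete with a very simple model. The sketch below, written in Python, just adds up how many cylinders the heads cross while servicing a sequence of requests. The cylinder numbers are made up for illustration, and the model ignores settle time and rotational latency, so it is only a rough proxy for real seek cost.

```python
def head_travel(cylinders, start=0):
    """Total number of cylinders crossed when visiting `cylinders` in order."""
    travel, position = 0, start
    for c in cylinders:
        travel += abs(c - position)
        position = c
    return travel


# Four reads of a file stored on two adjacent cylinders (hypothetical layout).
contiguous = [10, 10, 11, 11]
# The same four reads when the file's pieces are scattered across the disk.
fragmented = [10, 400, 11, 950]

print(head_travel(contiguous))  # 11
print(head_travel(fragmented))  # 1728
```

Under this model the same amount of data costs more than a hundred times as much head travel when it is scattered.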

Modern hard disks usually read one track of information at a time, so keeping files and free space defragmented also takes maximum advantage of the hard disk’s ability to read your data in anticipation of your using it, as well as to cache that data in hardware. The more contiguous your data is on the disk, the more likely it is to be read in a single hard disk read operation. One implication of this is that fragmentation (either internal or external) of a file that lies within a single track on a disk is irrelevant, or at least less relevant, to performance, because head movement will be constant.

All file system designers are faced with a trade-off between several factors, including performance, efficient use of space, and tendency to fragmentation. File systems allocate disk space in units called clusters.

If a file consumes less than an exact multiple of cluster size, the remaining space, often called cluster slack, is technically wasted. But as disks and average file size become larger, it makes sense to use larger clusters, and risk larger amounts of cluster slack. In a well-designed file system, even if cluster size increases, the overall percentage of space wasted as cluster slack remains small, and as the average size of a file increases, the waste in cluster slack also loses its importance. As we will see below, NTFS has special design features that lessen the impact of cluster slack in small files.
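The slack arithmetic is simple enough to show directly. This Python sketch computes the unused bytes in a file's final cluster for a few cluster sizes. The 10,000-byte file size and the helper name `cluster_slack` are chosen only for illustration.

```python
def cluster_slack(file_size, cluster_size):
    """Bytes allocated to the file's last cluster but not used by the file."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


for cluster_size in (512, 4096, 65536):
    print(cluster_size, cluster_slack(10_000, cluster_size))
# 512 240
# 4096 2288
# 65536 55536
```

For a small file, a large cluster can waste several times the size of the file itself, which is why the trade-off matters most when the average file is small.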

Real world experience and research indicate that, while some files have gotten large over time, average files remain small enough that smaller cluster sizes, 4K or less, are optimal.

NTFS is Very Different from FAT

Windows NT is much smarter than its predecessor operating systems in allocating disk space to files. As a result, it is less prone to fragment files. But as a side effect of preventing file fragmentation, NTFS creates fragmentation in the file system’s free space. Still, NTFS is not immune to the forces that fragment individual files, and over time, files on an NTFS volume will become fragmented.

Starting in version 4.0, Windows NT provides operating system calls designed to facilitate defragmentation, and defragmentation software for Windows NT usually uses these calls. But the design of NTFS and practical implications of how these APIs (application programming interfaces) operate, mean that it is important not only to defragment your disks, but also to do so on a regular basis.

NTFS Does Get Fragmented

The Windows NTFS File System Driver uses a special file called the Master File Table (MFT) to track all files on that volume. The MFT starts out with some free space to allow new files to be tracked, but on a very busy system it too can run out of space. At this point NTFS extends the MFT itself, creating new stretches of it for new allocations. This situation is precipitated most often by fragmentation in the file system itself, as file system fragments consume entries in the MFT. If these new stretches are not contiguous, the MFT itself becomes fragmented.

There are other files, such as the paging file used by Windows NT’s virtual memory subsystem, which can also become fragmented with unpleasant implications for performance. The solution to these problems, as we will see, is to prevent them from happening by keeping your system defragmented.

Lastly, directories in NTFS are allocated similarly to files, but defragmentation of them can be difficult.

Performance Degradations Can Impede Productivity

Windows NT does a good job of allowing the system to continue operation even as programs wait for disk I/O, but some inefficiency cannot be hidden forever. Especially on a mission-critical server, on which many users rely, inefficiencies in the file system can lead to performance degradation that impedes user productivity.

These problems are not always apparent, and are frequently cavalierly blamed on other sources; perhaps the computer’s just too slow, needs more memory, or some program being run needs an upgrade. Overall system performance is a complex phenomenon, and even experienced system administrators may not recognize fragmentation in a file system. After all, it can occur with large amounts of free space on the disk. But the main reason users don’t recognize fragmentation is that Windows NT comes with no tools to identify it.

Heavily used systems, which are by definition mission-critical systems for an organization, will become fragmented over time under normal usage in Windows NT. As performance decreases in such systems and users are forced to wait, productivity is thereby impeded.

Keeping a Disk Defragmented Can Prevent These Problems

Regular defragmentation of the file system improves overall system performance and, as a result, allows the rest of the system to operate at its optimal speed under normal circumstances.

Heavily fragmented systems can become difficult to defragment, so it is important, in order to maintain optimal performance, to defragment on a regular basis to prevent especially problematic circumstances, such as a fragmented paging file or MFT, from arising. Windows NT’s scheduling service and performance monitoring tools provide an efficient solution to this problem by allowing defragmentation to be scheduled for off hours and/or when other load on the system is light.

II. How NTFS Works

NTFS Capabilities in Functional Terms

NTFS is a modern, robust file system designed to support both single-user workstations and multi-user servers. Microsoft designed NTFS to overcome the most serious limitations of its predecessor file systems, FAT and HPFS, as well as to support planned features in Windows NT, such as integrated security and support for the POSIX standard.

NTFS has very high limits on storage capacity. It uses 64 bits to number clusters, each of which can occupy up to 64K, meaning that a disk volume in NTFS can be up to 2^64 (roughly 18 billion billion) clusters or 2^80 bytes, and each file can be up to 2^64 bytes. Both FAT and HPFS had much smaller limits. While NTFS is internally capable of managing this much storage, the disk partitioning scheme or hardware addressing may limit the partition size to a smaller number.
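The theoretical volume limit follows from multiplying the number of addressable clusters by the largest cluster size. A short Python check of that arithmetic, with constant names chosen only for this example:

```python
CLUSTER_NUMBER_BITS = 64          # width of a cluster number
MAX_CLUSTER_BYTES = 64 * 1024     # largest cluster size, 64K = 2**16 bytes

max_clusters = 2 ** CLUSTER_NUMBER_BITS
max_volume_bytes = max_clusters * MAX_CLUSTER_BYTES

print(max_clusters)                 # 18446744073709551616
print(max_volume_bytes == 2 ** 80)  # True
```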

NTFS is a recoverable file system. This means that operations in NTFS are transactions, as in a database. Either the entire operation completes or the operating system has the capability to roll back the unfinished portion, safeguarding the integrity of the existing data. NTFS also stores redundant copies of critical file system structures in the unlikely event that physical damage makes one copy of them inaccessible.

Security is integrated directly into the NTFS system and derived from the Windows NT object model. Security objects, known as ACLs (Access Control Lists), are stored in the MFT as part of the file. These are the actual security objects used by Windows NT to restrict access to the file object.

Files in NTFS have attributes: a name, a creation date, an archive bit, and so on. In fact, the data in the file is just another attribute. This characteristic of NTFS is how Windows NT implements many of its sophisticated features, such as complex access controls and support for Apple Macintosh clients. Macintosh files, for example, have two sections, a resource fork and a data fork. NTFS manages the association between these sections by storing them in different attributes of the same file.

In some ways, the organizational system of file attributes combats fragmentation, because programmers might otherwise have used additional files to store attribute data. But heavy use of attributes can cause fragmentation within the MFT itself.

Because Windows NT is fully Unicode-enabled, so is NTFS. All file names in NTFS file systems are stored in the 16-bit Unicode encoding scheme, with each character of the name occupying 16 bits in the file’s name attribute. Filenames can take up to 255 characters, including multiple periods and embedded spaces.

Master File Table

The heart of the NTFS file system is the Master File Table or MFT. The MFT is itself a file, an array of records constituting a database of all files on the system.

Each record in the MFT is usually fixed, by definition, at 1K. The format of the first 16 records is defined to contain certain volume-specific information, and these records are known collectively as the NTFS metadata files. Metadata is the name given to these overhead structures in the file system, which are used to track the real data. The first four records are duplicated in a file at or near the physical center of the disk for recoverability purposes.

Normally, each record in the MFT corresponds to one file or directory in the file system. The MFT record contains the file’s attributes. Other standard attribute information in a file record includes the read-only and archive flags; creation and last-accessed dates; the file name, of which there are likely at least two (a "long" file name and a short "8.3" DOS-compatible name); a security descriptor; and the file data, or pointers to where the file data resides on the disk.

Yes, the data in a file is just another attribute of NTFS. For this reason, small files (about 750 bytes, depending on the number of other attributes in the file) can fit entirely within their MFT entry, giving Windows NT and NTFS excellent performance with such files. Such files also exhibit zero fragmentation.

There is at least one entry in the MFT for each file on the NTFS volume, including the MFT itself and other "metadata" files. These are the files, such as the log file, the bad cluster map, and the root directory, which contain the structure of the rest of the volume as seen by NTFS. Users don’t see these files, which all have names beginning with ‘$’ (for example, the MFT is in $MFT). Most of the remaining entries in the MFT are for user files and directories.

In a perfect world, that would be it for the MFT. Of course, many files are not so small that their data fits within their MFT entry, so the MFT stores their data in one or more areas of the disk. NTFS allocates files in units of clusters. The clusters within a file are referenced by NTFS in two ways: first, with Virtual Cluster Numbers (VCNs), from 0 through n-1 where there are n clusters in the file; second, with Logical Cluster Numbers (LCNs), which correspond to the number of the cluster on the NTFS volume.

Because LCNs are simply an index to the clusters on a volume, NTFS uses an LCN to calculate an address on the disk to read or write by simply multiplying the LCN by the number of sectors per cluster and reading or writing sectors starting at that address on the disk.

VCNs are the analog for file offsets requested by applications running under Windows NT. The application knows the format of the data it uses in the file and uses it to calculate a byte offset within the logical format of the file. When the application requests a read or write at that address of the file, NTFS can divide that number by cluster size to determine a VCN to read or write.

By associating each VCN with its LCN, NTFS ties the logical addressing within a file to the physical locations on disk. This mapping of VCN to LCN is what the file’s data attributes provide.
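The two calculations described above, dividing a byte offset by the cluster size to get a VCN and multiplying an LCN by the sectors per cluster to get a disk address, can be sketched in Python as follows. The 512-byte sector size, the 4K cluster size, and the function names are assumptions made for the example.

```python
SECTOR_SIZE = 512  # assumed sector size in bytes


def byte_offset_to_vcn(offset, cluster_size):
    """Return (vcn, offset_within_cluster) for a byte offset in a file."""
    return offset // cluster_size, offset % cluster_size


def lcn_to_sector(lcn, sectors_per_cluster):
    """Return the first sector number of logical cluster `lcn` on the volume."""
    return lcn * sectors_per_cluster


cluster_size = 4096
sectors_per_cluster = cluster_size // SECTOR_SIZE  # 8

vcn, within = byte_offset_to_vcn(10_000, cluster_size)
print(vcn, within)                                # 2 1808
print(lcn_to_sector(1_000, sectors_per_cluster))  # 8000
```

What is still missing is the step in between: finding which LCN holds a given VCN. That is the job of the mapping stored in the data attribute.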

All files have at least one data attribute, known as the "unnamed data attribute." There can be other named data attributes, which correspond to the multiple streams of data referred to above. Directories do not have unnamed data attributes, but they can have named ones.

If any attribute, most likely the file data attribute, does not fit in the MFT record, NTFS stores it in a new, separate set of clusters on the disk, called a run or an extent. In fact, other attributes besides the data can become large enough to force new extents. For example, long filenames in Windows NT can be up to 255 characters that, because they are stored in Unicode, consume 2 bytes apiece. When an attribute is stored within the MFT entry, it is called a resident attribute. When one is forced out to an extent, it is called a non-resident attribute.

It may come to pass that the extent will need to grow, for instance, if the user appends data to a file. In this case, NTFS will attempt to allocate physically contiguous clusters to the same extent. If there is no more contiguous space available, NTFS will need to allocate a new extent elsewhere on the disk; in other words, it will separate the file into two fragments. The data attribute header, still stored within the MFT record, stores the information in the form of LCNs, and run lengths that NTFS uses to locate the extents.
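A minimal sketch of how such a list of extents might be consulted is shown below in Python. The `Run` tuple and the `vcn_to_lcn` function are hypothetical names, and the in-memory layout is a simplification; the on-disk encoding NTFS uses for this information is not described here.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    """One extent: `length` clusters starting at `start_vcn` / `start_lcn`."""
    start_vcn: int
    start_lcn: int
    length: int


def vcn_to_lcn(runs: List[Run], vcn: int) -> Optional[int]:
    """Translate a file-relative cluster number to a volume cluster number."""
    for run in runs:
        if run.start_vcn <= vcn < run.start_vcn + run.length:
            return run.start_lcn + (vcn - run.start_vcn)
    return None  # VCN is not backed by any extent


# A file that grew after its neighbors were taken: two fragments.
runs = [
    Run(start_vcn=0, start_lcn=1000, length=8),  # original extent
    Run(start_vcn=8, start_lcn=5200, length=4),  # appended data, elsewhere on disk
]

print(vcn_to_lcn(runs, 3))   # 1003
print(vcn_to_lcn(runs, 9))   # 5201
print(vcn_to_lcn(runs, 12))  # None
```

Each additional fragment adds another entry to the list, which is one way fragmentation increases the amount of metadata NTFS has to keep for a file.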

In rare cases, usually when the number of attributes is large enough, NTFS may be forced to allocate an additional MFT entry for the file. In this case, NTFS creates an attribute called an attribute list, which acts as an index to all the attributes for the file or directory. This is an unusual situation which should occur only with files that are extremely large and fragmented, and can greatly slow the performance of operations on that file.

Directories

Directories are very much like files in NTFS. If the directory is small enough, the index to the files to which it points can fit in the MFT record in an attribute called the Index Root attribute. If enough entries are present, NTFS will create a new extent with a non-resident attribute called an index buffer.

In such directories, the index buffers contain what is called a "B+ tree," which is a data structure designed to minimize the number of comparisons needed in order to find a particular file entry. A B+ tree stores information (or indexes to that information) in a sorted order. At points in the directory, NTFS stores sorted groups of entries and pointers to entries that fall below those entries in the sort. This has many advantages over storing entries in whatever order they happen to fall. For example, if you want a sorted list of the entries in the directory, your request is satisfied quickly because that is the order of storage in the index buffer. If you want to look up a particular entry, the lookup is quick because the trees tend to get wide, rather than deep, which minimizes the number of accesses necessary to reach a particular point in the tree.
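The idea of a wide, shallow sorted index can be sketched with a two-level structure in Python. This is only an illustration of the lookup principle: the file names are invented, plain string comparison stands in for whatever collation NTFS applies to names, and a real tree would have more levels and would rebalance as entries are added.

```python
from bisect import bisect_left, bisect_right

# Leaf groups, each kept in sorted order (analogous to index buffers).
leaves = [
    ["alpha.txt", "bravo.txt"],
    ["charlie.txt", "delta.txt"],
    ["echo.txt", "foxtrot.txt"],
]
# separators[i] is the smallest name stored in leaves[i + 1].
separators = ["charlie.txt", "echo.txt"]


def lookup(name):
    """Return True if `name` is in the directory, touching only one leaf."""
    leaf = leaves[bisect_right(separators, name)]
    i = bisect_left(leaf, name)
    return i < len(leaf) and leaf[i] == name


def sorted_listing():
    """A sorted listing is just the leaves read in order."""
    return [name for leaf in leaves for name in leaf]


print(lookup("delta.txt"))  # True
print(lookup("zulu.txt"))   # False
print(sorted_listing()[:3]) # ['alpha.txt', 'bravo.txt', 'charlie.txt']
```

A lookup consults the small list of separators and then a single leaf, regardless of how many leaves exist, which is the property that keeps directory searches fast.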

Compression

NTFS supports compression of file data as a native function of the file system. One of the side effects of compression is that it can create fragmentation of files and of free space.

You can instruct NTFS to compress data on an entire volume, in a specific directory, or even in a particular file. There are Win32 calls for programs to use to determine the impact of compression, in particular the compressed and uncompressed file sizes. If you get a file’s properties in Windows NT Explorer, you will see both sizes.

It is in this compression scheme that you begin to see the flexibility created by NTFS’s use of both VCNs and LCNs, as well as the potential for problems. In a normal file that has data stored in non-resident attributes or extents, the data attribute will contain mappings of the starting VCN and starting LCN in the extent as well as the length in clusters.

NTFS plays games with these cluster numbers to achieve compression, using two basic approaches. Because some large files have large blocks of nulls (bytes of value 0), NTFS uses a sparse storage for such files, meaning that it only stores the non-zero data.

Imagine a 100 cluster file in which only the first 5 and last 5 clusters contain data, and the middle 90 are all zeroes. NTFS can store two extents for this file, each 5 clusters long. The first will have VCNs 0 through 4 and the second will have VCNs 95 through 99. NTFS can infer that VCNs 5 through 94 are null, and do not need physical storage. If a program requests data in this space, NTFS can simply fill the requesting program’s buffer with nulls. If the program allocates non-zero data to this space, NTFS can create a new extent with the appropriate VCNs. This method is very fast for sparse files.
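The 100-cluster example can be modeled in a few lines of Python. The `SparseFile` class below is a hypothetical toy: it tracks only which VCNs have stored data, uses a 4-byte "cluster" so the output stays readable, and does not model where the data lands on disk.

```python
CLUSTER_SIZE = 4  # tiny, unrealistic cluster size chosen for readability


class SparseFile:
    def __init__(self, total_clusters):
        self.total_clusters = total_clusters
        self.clusters = {}  # vcn -> bytes, only for clusters holding data

    def write_cluster(self, vcn, data):
        """Store a cluster, or drop it if the data is all zeroes."""
        if data == bytes(CLUSTER_SIZE):
            self.clusters.pop(vcn, None)
        else:
            self.clusters[vcn] = data

    def read_cluster(self, vcn):
        """Missing VCNs are inferred to be null and read back as zeroes."""
        return self.clusters.get(vcn, bytes(CLUSTER_SIZE))

    def extents(self):
        """Return (start_vcn, length) for each run of stored clusters."""
        runs = []
        for vcn in sorted(self.clusters):
            if runs and runs[-1][0] + runs[-1][1] == vcn:
                runs[-1] = (runs[-1][0], runs[-1][1] + 1)
            else:
                runs.append((vcn, 1))
        return runs


f = SparseFile(100)
for vcn in list(range(5)) + list(range(95, 100)):
    f.write_cluster(vcn, b"data")

print(f.extents())         # [(0, 5), (95, 5)]
print(f.read_cluster(50))  # b'\x00\x00\x00\x00'

f.write_cluster(50, b"more")  # non-zero data written into the hole
print(f.extents())         # [(0, 5), (50, 1), (95, 5)]
```

Writing into the hole creates a third extent, which shows how sparse storage, while fast and compact, tends to add fragments as the file is modified.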

If a file is not predominately null, NTFS uses a different compression method. Instead of trying to write the file data in one extent, NTFS will divide the data up into runs of 16 clusters apiece. In any particular extent, if compressing the data will save at least 1 cluster, NTFS will store the compressed data, meaning 15 or fewer clusters. If the data cannot be effectively compressed (random data, for example, is generally not compressible), NTFS will simply store the entire extent as it normally would without compression. Back in the MFT record for this file, NTFS can see that there are missing VCNs in the runs for a file and can infer that the file is compressed.

Because the data is stored in a compressed form, it is not possible to look up a specific byte by calculating the cluster in which it is stored. Instead, NTFS calculates in which 16 cluster run the address is located, decompresses the run back to 16 uncompressed clusters, and then calculates the offset into the file using valid virtual cluster numbers. NTFS ensures that all these runs begin with a virtual cluster number divisible by 16 so that this addressing remains possible without having to decompress the entire file.
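The following Python sketch illustrates the 16-cluster scheme described above. Python's `zlib` is used purely as a stand-in compressor and is not the algorithm NTFS itself uses; the 4K cluster size and the function names are also assumptions for the example.

```python
import os
import zlib

CLUSTER_SIZE = 4096                       # assumed cluster size
UNIT_CLUSTERS = 16                        # clusters per compressed run
UNIT_BYTES = CLUSTER_SIZE * UNIT_CLUSTERS


def clusters_needed(nbytes):
    """Whole clusters required to hold `nbytes` (rounded up)."""
    return -(-nbytes // CLUSTER_SIZE)


def store_unit(unit):
    """Compress a 16-cluster unit only if that saves at least one cluster."""
    packed = zlib.compress(unit)
    if clusters_needed(len(packed)) <= UNIT_CLUSTERS - 1:
        return packed, True
    return unit, False


def read_byte(units, offset):
    """Find the unit holding `offset`, decompress just that unit, index it."""
    index, within = divmod(offset, UNIT_BYTES)
    stored, is_compressed = units[index]
    raw = zlib.decompress(stored) if is_compressed else stored
    return raw[within]


text_unit = b"NTFS " * (UNIT_BYTES // 5) + b"N"   # repetitive, compresses well
random_unit = os.urandom(UNIT_BYTES)              # random, will not compress

units = [store_unit(text_unit), store_unit(random_unit)]
for stored, is_compressed in units:
    print(is_compressed, clusters_needed(len(stored)))

print(read_byte(units, 5) == text_unit[5])                        # True
print(read_byte(units, UNIT_BYTES + 10) == random_unit[10])       # True
```

Because every unit starts at a fixed multiple of 16 clusters in the file's logical address space, a read needs to decompress only the one unit that contains the requested byte.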

NTFS tries to write runs of this type into a single contiguous space because the I/O system is already encountering enough added processing and management burden using compressed files without having to fragment individual extents. This is part of the reason NTFS’ designers chose 16 clusters as the size of a compressed run; it cannot be more than 64K, because the file system buffers are 64K each. It is also very likely to be read in a single I/O operation.

NTFS also tries to keep all the separate runs of the file contiguous, but this is a harder job. Compressed files are more likely than non-compressed files to be fragmented.

NTFS only compresses the file’s data attribute, not the metadata. Compression only works on volumes with 4K clusters or smaller.

Software RAID

NTFS also supports fault tolerance in disk subsystems by dynamically mirroring or striping data across multiple disk volumes. NTFS supports RAID levels 1 and 5. In level 1, known as mirroring, data written to a volume is written in parallel to a second volume; data read from a volume is also read from the second volume and compared to it for correctness. In level 5, known as striping with parity, data streams ("stripes") are divided among three or more disks, using some of the space to store parity information. If one of the disks registers a physical error, NTFS can calculate the missing data by applying the logical exclusive-OR (XOR) operation to the remaining data and the parity information.
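The XOR reconstruction is easy to demonstrate. In this Python sketch, two data blocks and one parity block stand in for a three-disk stripe; the block contents and the helper name `xor_blocks` are invented for the example.

```python
def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)


# One stripe spread over three disks: two data blocks plus parity.
d0 = b"disk"
d1 = b"data"
parity = xor_blocks(d0, d1)

# Suppose the disk holding d1 fails: rebuild it from what remains.
recovered = xor_blocks(d0, parity)
print(recovered)  # b'data'
```

The same operation works for any number of data disks, since XOR-ing all surviving blocks with the parity block cancels everything except the missing block.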

Dynamic Bad-Cluster Remapping

NTFS is able to dynamically detect the presence of a physically bad cluster and map around it. If, on a disk which has been formatted as an NTFS fault tolerant volume, the NTFS driver attempts to read a cluster and the read operation fails due to a physical read error, the NTFS fault tolerance driver dynamically retrieves a good copy of the data that had been stored in the bad cluster using a striped or mirrored volume. NTFS then maps a new cluster to replace the bad one and writes the data to it, and then marks the bad cluster so that it is no longer used. On a non-fault tolerant volume, NTFS can still detect bad clusters and mark them as such, but it cannot necessarily retrieve the data.

Remapping the bad cluster almost certainly fragments the file into at least three fragments. Today’s hardware is usually reliable and it is good that NT has the capability to maintain the integrity of files in this way, but the potential for sudden fragmentation in critical files is another reason to defragment file systems on a regular basis.
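Why a single remapped cluster produces three fragments can be seen by working through the extent arithmetic. The Python sketch below represents a file as a list of `(start_lcn, length)` pairs; the function name and the cluster numbers are hypothetical.

```python
def remap_bad_cluster(runs, bad_lcn, spare_lcn):
    """Replace `bad_lcn` with `spare_lcn` in a list of (start_lcn, length) runs."""
    result = []
    for start, length in runs:
        if start <= bad_lcn < start + length:
            before = bad_lcn - start        # clusters ahead of the bad one
            after = length - before - 1     # clusters following the bad one
            if before:
                result.append((start, before))
            result.append((spare_lcn, 1))   # the replacement cluster
            if after:
                result.append((bad_lcn + 1, after))
        else:
            result.append((start, length))
    return result


# A contiguous 10-cluster file whose fifth cluster goes bad.
print(remap_bad_cluster([(100, 10)], bad_lcn=104, spare_lcn=9000))
# [(100, 4), (9000, 1), (105, 5)]
```

Unless the bad cluster happens to be the first or last in its extent, the file is split into the part before it, the relocated cluster, and the part after it.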

Disk Caching

Windows NT’s I/O Manager integrates a Cache Manager that is involved in all disk I/O. When an application attempts to read data that has not been loaded into the cache, the Cache Manager interacts with the Windows NT Virtual Memory Manager, which calls the NTFS file system driver to load the data into the cache. Similarly, the Cache Manager uses the memory manager to perform all disk writes using background threads.

Unless instructed otherwise, NT’s Cache Manager caches all reads and writes on all secondary media. Cache Manager uses a number of aggressive techniques to improve performance. For example, it will attempt to read ahead in a file in anticipation of a program requesting the following data. It will also delay writes to the disk, so that if reads or writes of the same data occur quickly, they will be satisfied out of the cache rather than a physical disk operation.
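To see why read-ahead helps sequential access far more than scattered access, consider the toy model below in Python. It is not a description of how the Cache Manager is implemented: the `ReadAheadCache` class, the four-cluster read-ahead window, and the unbounded cache are all simplifying assumptions.

```python
class ReadAheadCache:
    """Toy cache that loads the next few clusters on every miss."""

    def __init__(self, read_ahead=4):
        self.read_ahead = read_ahead
        self.cached = set()
        self.disk_reads = 0

    def read(self, cluster):
        if cluster not in self.cached:
            # Miss: one physical read brings in this cluster and its successors.
            self.disk_reads += 1
            self.cached.update(range(cluster, cluster + self.read_ahead))


sequential = ReadAheadCache()
for cluster in range(8):                 # clusters 0..7, laid out contiguously
    sequential.read(cluster)

scattered = ReadAheadCache()
for cluster in range(0, 800, 100):       # eight clusters spread across the disk
    scattered.read(cluster)

print(sequential.disk_reads)  # 2
print(scattered.disk_reads)   # 8
```

In this model the contiguous file needs a quarter of the physical reads, because each miss pre-loads data the program is about to request; with scattered data, the read-ahead fetches clusters that are never used.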

Aggressive disk caching can mitigate the effects of disk fragmentation to the extent that data that is read by applications is read from the cache rather than from the disk itself. In fact, adding memory to a heavily fragmented system can improve its performance, although this is an expensive solution to a problem that can be fixed at little cost through software and good practices.

Volume Sets

The NT fault-tolerance driver also provides some functions unrelated to fault-tolerance, including Volume Sets. A volume set is a single logical volume composed of areas of free space on one or more disks. Using the NT Disk Administrator utility, you can combine two 100MB free areas on different disks into a single logical 200MB volume. These volume sets can be formatted with any NT-supported file system, although there are advantages to using NTFS.

Volume sets are useful for combining smaller disks or free space on larger disks, into a single, more useful area that can be treated as a logical unit. If the volume is formatted with NTFS, the administrator can add new stretches of free space to the volume set while maintaining data on the existing volume. This can be a low-impact way for network administrators to add storage to an existing network drive without impacting users’ view of the network.

The problem with volume sets, from a fragmentation standpoint, is that they have the capacity to exacerbate normal fragmentation into even more performance-limiting fragmentation across physical volumes or physically separate free stretches of a single volume. Windows NT file systems don’t see the fact that they are working with multiple volumes and therefore treat volume sets as they would any single physical device.
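To illustrate the point, the following minimal Python sketch models a volume set as a simple concatenation of member areas, with two 100-cluster areas standing in for the two 100MB areas in the example above. The names `areas`, `locate`, and `disks_touched` are hypothetical, and the concatenation layout is an assumption made for illustration, not a description of the internals of the NT fault-tolerance driver.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical volume set: (disk name, size in clusters) for each member area.
areas = [("disk0", 100), ("disk1", 100)]
# Logical cluster number at which each area ends: [100, 200].
ends = list(accumulate(size for _, size in areas))

def locate(logical_cluster):
    """Map a logical cluster number to (disk name, offset within that area)."""
    if not 0 <= logical_cluster < ends[-1]:
        raise ValueError("cluster outside the volume set")
    idx = bisect_right(ends, logical_cluster)
    start = ends[idx - 1] if idx else 0
    return areas[idx][0], logical_cluster - start

def disks_touched(first_cluster, length):
    """List the physical disks holding a logically contiguous run of clusters."""
    run = range(first_cluster, first_cluster + length)
    return sorted({locate(c)[0] for c in run})

# A ten-cluster run that the file system sees as a single extent.
print(disks_touched(95, 10))
```

In this model, a run of ten logically adjacent clusters starting at cluster 95 lands on both disks, even though the file system sees one contiguous extent.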

Paging Files

Paging files present a special problem for fragmentation under Windows NT. NT supports up to 16 paging files on a system. These files are used for virtual memory; as Windows NT and its applications use memory in excess of the physical RAM, the Virtual Memory Manager writes the least-recently used areas of memory to the paging files to free RAM. If a program accesses these areas of memory, the Virtual Memory Manager reads them from the paging file back to RAM where the program can use them.

Once the system starts up, these files are always open and cannot be moved or deleted. At startup, the Windows NT System process duplicates the file handles for the paging file so that the files will always be open and the operating system will prevent any other process from deleting or moving them.

For this reason, paging files are a problem for defragmentation software. To defragment paging files safely, a defragmenter must do so at system boot time, before the Virtual Memory Manager gets a chance to lock them down. While boot-time defragmentation is a desirable feature, regularly rebooting a system to defragment it is not a desirable situation, so the best solution is to keep the rest of the file system defragmented to mitigate any fragmentation problems caused by the existence of paging files.

III. How NTFS Gets Fragmented

Normal Creation and Deletion of Extents

In the normal course of computing, on any operating system, files are created and deleted, visibly and invisibly. This process leads to the creation of gaps in the used portions of physical storage. As a disk becomes more full, and use of it becomes heavier, it is likely that the large areas of free space that are present early in the system’s life will break down into smaller free areas throughout the system.

Many programs will explicitly retain the last version, or several versions, of the file the user is working on. Eventually the backup versions are deleted, and their space is freed up. The result is probably a gap in free space on the disk. Or consider the case of downloading the latest version of Netscape Communicator. You might download a 20MB executable program and run it, creating another 20MB or more of files in the Program Files directory. Then you will likely delete the 20MB file you downloaded. The result is that you have a 20MB gap, possibly in one place, possibly split up, and the newly installed program is likely stored after the gap on the disk. The operating system has fewer large free areas to work with.

But programs and the operating system also create files on their own without telling the user. Consider the print spooler. Ever since the early versions of DOS, when you print a file, your program and the operating system actually perform at least two steps. First, they create a file containing the data printed by the application. In the case of Windows applications, this data is in an intermediate format called Windows Metafile Format (WMF). The printer driver for your printer then converts this data to a separate file in the native format for the printer, and then the spooler sends that file to the printer. All this data consumes space on the disk temporarily and is then deleted. Printing a large document consumes a correspondingly large amount of disk space.
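The way ordinary creation and deletion breaks up free space can be seen in a toy model. The Python sketch below treats a disk as a list of clusters and always fills the lowest-numbered free clusters first. That policy, and the names `allocate`, `delete`, and `fragments`, are assumptions made for illustration; this is not NTFS's actual allocation algorithm.

```python
# Toy model: a disk is a list of cluster owners (None means free).

def allocate(disk, name, size):
    """Give `name` the lowest-numbered free clusters; return their indexes."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < size:
        raise RuntimeError("not enough free clusters")
    for i in free[:size]:
        disk[i] = name
    return free[:size]

def delete(disk, name):
    """Free every cluster owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None

def fragments(disk, name):
    """Count the contiguous runs (extents) that belong to `name`."""
    count, prev = 0, None
    for owner in disk:
        if owner == name and prev != name:
            count += 1
        prev = owner
    return count

disk = [None] * 20
allocate(disk, "download", 4)    # the downloaded installer
allocate(disk, "installed", 4)   # the files it installs
allocate(disk, "document", 2)
delete(disk, "download")         # leaves a 4-cluster gap at the front
allocate(disk, "spool", 2)       # a temporary spool file takes half the gap
allocate(disk, "report", 6)      # too big for what is left of the gap
delete(disk, "spool")

print(fragments(disk, "report"))
```

In this run the "report" file ends up in two extents, one in the remains of the gap and one after the other files, while the files written to an empty disk stay contiguous.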

The Impact of Unusual Events

Such normal events can cause fragmentation, but it would take a long time and a lot of use. But fragmentation in NTFS is easy to create using unusual, but not unreasonable, techniques. The example above of downloading a large file and installing it is a minor example of this.

To date, there have been 5 service packs for Windows NT 4.0. Each of them has involved a multi-megabyte download, and each makes changes in a large number of NT system programs likely to be stored at the front of the disk. Installing a service pack is therefore likely to push NT system programs further out on the disk, creating gaps. Large service packs may cause fragmentation within NT files themselves, and certainly make fragmentation of other files more likely. Consider also that Microsoft SQL Server, BackOffice, Office 97, and many other common NT programs have their own service packs, and that installing them brings all the same implications of installing an NT service pack. Application upgrades have all the same implications as well.

You’d think that installing a new Windows NT Workstation would start the system out in a clean, unfragmented state, but even this is not necessarily true. Even a clean install will likely end up fragmented, because the installation process creates numerous files and directories that it then deletes. The subsequent application of service packs exacerbates the situation. It is not unusual for a user to install NT on a system with an existing FAT-formatted drive. NT has the capability to convert the drive to NTFS, but doing so requires moving files around in ways that will fragment free space.

Checkpoints

Aggravating the problem is the fact that NTFS doesn’t immediately make deallocated clusters available for other programs. Instead, they become available after the next time NT "checkpoints" the disk.

Checkpointing is part of NTFS’ facility for recovering from errors. As we stated above, I/O operations in NTFS are transactions. As it performs I/O operations, such as appending data to a file, NTFS logs undo and redo data for that operation. At some point between transactions, when the disk is known to be in a good state, NTFS writes a checkpoint record to its log. If NT detects a disk error while performing an operation, it enters a recovery procedure consisting of three passes: the analysis pass, the redo pass and the undo pass.

In the analysis pass, NTFS determines which parts of the operation failed and which clusters it must update in order to undo the transaction. In the redo pass, NTFS performs all other operations that were logged since the last checkpoint. Then in the undo pass, it rolls back any uncommitted operations in the offending transaction.

Because NTFS cannot be certain of the disposition of data in a cluster until a checkpoint, it cannot allow other data to be written to that cluster. Note that no errors need occur for this to happen. It is unlikely to affect a large amount of disk, but it happens every time a cluster is freed, and will tend in the long term to push data further out in the disk, and thus to diminish the average size of a free area of disk.
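The effect of deferring reuse until a checkpoint can be sketched with a small model. In the Python below, freed clusters sit in a pending set and only return to the free pool when `checkpoint()` is called. The class `ToyVolume`, its methods, and the lowest-cluster-first policy are all hypothetical simplifications, not the NTFS implementation.

```python
class ToyVolume:
    """Toy volume in which freed clusters are reusable only after a checkpoint."""

    def __init__(self, clusters):
        self.free = set(range(clusters))
        self.pending = set()  # freed, but not yet available for reuse

    def allocate(self, count):
        """Take the lowest-numbered available clusters (an assumed policy)."""
        if count > len(self.free):
            raise RuntimeError("not enough available clusters")
        chosen = sorted(self.free)[:count]
        self.free.difference_update(chosen)
        return chosen

    def release(self, clusters):
        """Deallocate clusters; they stay unavailable until the next checkpoint."""
        self.pending.update(clusters)

    def checkpoint(self):
        """Make all pending clusters available again."""
        self.free |= self.pending
        self.pending.clear()

vol = ToyVolume(8)
a = vol.allocate(3)   # clusters 0-2
vol.release(a)        # freed, but no checkpoint yet
b = vol.allocate(3)   # cannot reuse 0-2, so the data is pushed outward
vol.checkpoint()
c = vol.allocate(3)   # now the front of the volume is usable again
print(a, b, c)
```

The second allocation lands further out on the volume even though the first three clusters are no longer in use, which is the drift described above.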

Increased Head Movement from Disparity of Extents

As stated above, an I/O subsystem operates at maximum speed when the disk transfers data to or from adjacent sectors on the disk. This is because the heads on the disks have to move at a minimum under such circumstances. Head movement is the enemy of I/O performance.

It is a rare event indeed when the disk gets to read or write contiguously for a long time. It is normal for the heads to move around as Windows NT reads and writes to different files in the normal course of its business.

For example, consider the Checkpointing system described above which allows Windows NT to recover the file system to a correct state even in the event of a power failure or physical disk error. The undo, redo and checkpoint information that makes recoverability possible is stored in a log file that the NT Log File Service (LFS) maintains. Periodically, in the course of writing to some other part of the disk, NTFS writes log entries about the disk operations it is performing to the log file.

Head movement is also inevitable when the operating system pages memory out to disk. The Virtual Memory Manager will begin to page memory out to disk even before unallocated memory is exhausted. This is a reasonable policy, but it may negatively impact the performance of disk-intensive applications. In a heavily trafficked system, paging to and from disk is not uncommon, and consumes both CPU and disk time.

Even with the normal amount of head movement that occurs in a system, an application can perform at full or near-full speed. But fragmentation in data or program files can significantly increase the amount of time it takes to perform disk operations.

Cluster Size Issues, Trade-offs with Capacity and Performance

When you format a volume using NTFS you have a choice of cluster size to use. Windows NT has different default cluster sizes for different size volumes. This is a simple association, and knowledge of how the volume is to be used could be used to choose a cluster size more optimal than the default.

Depending on your priorities, you might want to choose a different cluster size than the default, but be careful. Choosing a smaller cluster size will waste less space but is more likely to cause fragmentation. Larger cluster sizes are less likely to cause fragmentation but will waste more space.

512 byte clusters in particular are problematic, especially since the MFT consists of records that are always 1024 bytes. It is possible on a system with 512 byte clusters to have individual MFT entries fragmented. MFT record fragmentation of this type is not possible with larger cluster sizes, which can hold one or more complete MFT Records.
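The trade-off can be made concrete with a little arithmetic. The Python sketch below computes the space wasted in the last cluster of a file and the number of clusters a 1024-byte MFT record occupies. The 1500-byte file and the function names are hypothetical, chosen only for illustration.

```python
import math

MFT_RECORD_BYTES = 1024  # record size stated in the text

def clusters_needed(file_bytes, cluster_bytes):
    """Clusters a file occupies; also the most extents it could be split into."""
    return math.ceil(file_bytes / cluster_bytes)

def slack_bytes(file_bytes, cluster_bytes):
    """Bytes wasted in the file's last, partly filled cluster."""
    return clusters_needed(file_bytes, cluster_bytes) * cluster_bytes - file_bytes

def clusters_per_mft_record(cluster_bytes):
    """Clusters spanned by one MFT record; more than 1 means it can be split."""
    return math.ceil(MFT_RECORD_BYTES / cluster_bytes)

for cluster_bytes in (512, 1024, 4096):
    print(cluster_bytes,
          clusters_needed(1500, cluster_bytes),
          slack_bytes(1500, cluster_bytes),
          clusters_per_mft_record(cluster_bytes))
```

For the 1500-byte file, 512-byte clusters waste 36 bytes but allow up to three fragments, while 4096-byte clusters waste 2596 bytes and keep the file in a single cluster. Only the 512-byte case spreads an MFT record across two clusters.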

If a file or directory is contiguous, the cluster size doesn’t matter, except to the extent that it wastes a small amount of space. It is therefore wise to choose a cluster size large enough to discourage any more fragmentation than you are likely to encounter on NT anyway.

But if you know that you have a very large number of small files, or if you know that you have very few small files, you have information that you can use for a better cluster decision. Also, a very large absolute number of files (on the order of 100,000) will make fragmentation of the MFT more likely. In this case, a larger cluster size will limit the fragmentation in the MFT as it grows to accommodate.

Note that it is possible to create an NTFS volume with a cluster size greater than 4K. If you do, however, you cannot use NTFS compression, nor can you defragment the volume through the built-in, Microsoft-supported defragmentation interface.

System Files (Principally, but Not Exclusively, the Paging File)

DOS and Windows have a small number of files that are known as System files, a designation that makes them invisible and unmovable. Windows NT makes far greater use of these files. These files consume a non-trivial portion of the disk space, especially on a boot volume.

Windows NT has two kinds of system files. The first kind are the files which constitute the structure and overhead of the NTFS file system. Call them "NTFS System Files." The MFT is one such file (named $Mft), with special implications, which we deal with below.

First, there is a copy of the first four records of the MFT named $Mftmirr, stored near the physical middle of the disk. There is also the Log File ($Logfile), the Volume file ($Volume), the Attribute Definition Table ($Attrdef), the Root Directory File ($.), the Cluster Bitmap ($Bitmap), the Partition Boot Sector ($Boot), the Bad Cluster File ($Badclus), the Quota Table ($Quota), and the Upcase Table ($Upcase). ($Quota is not used in NT 4.0, but Windows 2000 uses it to implement user storage quotas.)

These files are always present on an NTFS volume. The APIs that Windows NT provides to support defragmentation do not move these files, so they cannot be defragmented while Windows NT is running.

But there are many other such files, and as with DOS, they present problems for defragmenters. Call them "Windows NT System Files." For example, NTDETECT.COM, the hardware detection program, and ntldr, the Windows NT loader program, are Windows NT System Files. Some notebooks, with proper support, have large hibernation files the size of physical memory. Most important to everyday use and performance is the paging file.

Disk I/O to the paging file (\pagefile.sys) is almost always heavily fragmented, because the data for one process being read from or written to the paging file is not guaranteed to be adjacent to the data for the next process accessed. This is one of the most crucial files for Windows NT’s overall performance, because access to it usually occurs at a point where performance is already being constrained by memory. A large number of fragments in the paging file brings with it a severe performance penalty.

Fragmentation of Directories

NTFS treats directories almost exactly as it treats files. In fact, directories are just another type of file, although they have special types of attributes in their MFT records. Normally applications manage the contents of the data in their files; in the case of directories, it is NTFS that manages the contents, which are B+ trees that provide indexed access to files in the directories.

Some directories, such as most application program file directories, aren’t likely to grow or shrink much over their lifetimes. But some directories, such as the TEMP directory or user document directories, are likely to grow and shrink considerably. As the number of files in a directory grows, NTFS can grow the directory storage to accommodate it. In the right circumstances, if the content of the directory shrinks, NTFS can also free up the unused space in the directory, but this doesn’t happen very often.

The directories that are likely to grow and shrink are also the type that is likely to have been created early in the system’s life, such as My Documents and TEMP. Therefore it is likely that, as they grow, their growth will be non-contiguous. These are also likely to be heavily used directories, so this fragmentation is likely to have a real impact on system usage.

Users should also be aware that deeply nested directories may present an organizational convenience, but there is a performance penalty for them. When NTFS searches its B+ trees for data, it does so once for each level in the directory subtree. Therefore performance may be better with flatter trees that have larger numbers of files in them than with deeper trees that have fewer files in each. Very deep subtrees can also create problems for applications that have limits to the number of characters in a complete file path. Many applications limit such a name to 255 characters.
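As a rough illustration of the cost of nesting, the Python sketch below counts one index search per level of a path and checks the path against the 255-character limit mentioned above. The function names and the sample paths are hypothetical, and counting one search per path component is a simplification of what NTFS actually does.

```python
from pathlib import PureWindowsPath

MAX_PATH_CHARS = 255  # limit cited in the text for many applications

def directory_lookups(path):
    """Count name lookups needed to resolve `path`, one per level below the root."""
    p = PureWindowsPath(path)
    # The drive/root anchor (for example "C:\") is not itself looked up.
    return len(p.parts) - 1 if p.anchor else len(p.parts)

def fits_path_limit(path):
    """True if the complete path stays within the assumed application limit."""
    return len(path) <= MAX_PATH_CHARS

flat = r"C:\TEMP\a.tmp"
deep = r"C:\Users\me\My Documents\1999\Q4\report.doc"
print(directory_lookups(flat), directory_lookups(deep))
```

The flat path needs two lookups and the deep one six, so the same file costs three times as many directory searches to open when it sits at the bottom of the deeper tree.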

Fragmentation of the MFT Itself

Normally the MFT uses one entry per file or directory. The area on the disk reserved for the MFT begins life at the time the volume is formatted with about 12.5% of the total volume space reserved for the MFT. This reserved space (the "MFT zone") and the MFT itself are not movable. If everything goes well, the MFT as pre-allocated will be more than up to the task of tracking file and directory metadata.

But when a file becomes very fragmented, it increases the amount of data NTFS needs to store in the MFT record in order to track the various fragments or extents. Eventually the MFT record is not large enough to store the data, and NTFS must allocate another record. Because of this, keeping the disk generally defragmented helps to prevent the MFT from becoming fragmented.

Part of the problem with the MFT is that it will grow if necessary, but will never contract. In a system with a large number of files, or one that is heavily fragmented, the MFT may run out of available entries. In this case, NTFS will expand the MFT in 32-record chunks.
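The scale of this growth is easy to estimate. The Python sketch below computes how many 32-record expansions are needed to hold a given number of extra MFT entries, using the 1024-byte record size given earlier. The function names and the figure of 10,000 extra records are hypothetical.

```python
import math

CHUNK_RECORDS = 32        # expansion granularity stated in the text
MFT_RECORD_BYTES = 1024   # record size stated earlier in the paper

def mft_expansions(extra_records):
    """Number of 32-record expansions needed to hold `extra_records` new entries."""
    return math.ceil(extra_records / CHUNK_RECORDS)

def expansion_bytes(extra_records):
    """Disk space claimed by those expansions."""
    return mft_expansions(extra_records) * CHUNK_RECORDS * MFT_RECORD_BYTES

print(mft_expansions(10_000), expansion_bytes(10_000))
```

Ten thousand extra records require 313 separate expansions. In the worst case, where no expansion can be placed next to the previous one, each expansion becomes another fragment of the MFT.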

Because files created after the volume is formatted are placed physically after the MFT zone, expansions of the MFT can be made contiguously as long as no other files are in the MFT zone. If other files have moved into the zone, the expansion is not contiguous and the MFT fragments. The new entries will contain metadata describing recently created files that are likely to be used, and performance in using them will suffer greatly. As noted above, if the MFT begins to fragment, it is better to have a larger cluster size on the volume, as this will limit the number of fragments.

Temporary files are one of the principal ways in which large numbers of files can be created, and the effect is insidious. Users aren’t usually aware of the number of temporary files created during operations like compiling, word processing, and even using the Internet; Microsoft’s Internet Explorer creates a particularly large number of temporary files. Heavy use of such files and failure to clean them up can fragment not just files and free space, but the MFT itself. Users should run utilities, included with recent versions of Windows and available from third parties, on a regular basis to clean up unused temporary files, shortcuts that point to nowhere, and other "Windows droppings" that accumulate over time.

Workstation – Specific Issues

Even though the typical server has much more I/O than the typical workstation – that’s what it’s there for after all – workstations are still subject to much fragmentation. They need to be defragmented on a regular basis just as servers do.

Workstations share with servers the issues of service pack installation and its corresponding fragmentation. They have the same system files and paging issues as servers. And even in an environment where workstation users run all their programs off a server, and store all their data files on a server, workstation users still likely have their temp directories stored on local storage. In fact, they may have considerably more temporary files than servers because they are likely to have browser cache files. Since they probably have less memory than servers, they have a lesser ability to cache I/O data, making them more likely to perceive the performance implications of fragmentation.

But in the real world, workstations usually have applications and local data storage too. Because they get less attention from experienced network administrators than servers, they may not be as efficiently managed.

For these reasons it is important not only that workstations be defragmented on a regular basis, but that an automated system be set for doing so. Regular end-users are less likely to monitor the state of their file systems than computing professionals.

Server – Specific Issues

Fast disk I/O is a priority for almost any server. Whether that system is serving data from a database, running an accounting package, or simply serving files requested by clients, disk I/O is a crucial part of the work performed.

In fact, the trend in the computer industry is to put more and more on the server and to manage it there. This is the basis of all the major trends in software from Microsoft and others in the NT market, where logic is moving away from clients and onto middle and back-end "tiers," a.k.a. servers, where the data can be more efficiently managed.

These servers are expected to interact with numerous clients and other servers on the network, and in the process of doing so they usually interact with the file system. Web/Intranet servers are especially likely to have large numbers of files to manage.

Most web servers, both on the Internet and corporate intranets, serve a combination of static and dynamic web pages. Both types of web page involve, at a minimum, reading from the file system. A static web page server is completely analogous to a conventional file server. A dynamic web server takes a combination of script files, template files and user input, and constructs a response page to send to the client program. The dynamic server is not likely to write that file out to disk before sending it, but the scripting engine, database server, and other mid-level and back-end components involved in the operation almost certainly use temporary file storage. More sophisticated setups, such as transaction processing systems using systems like Microsoft Transaction Server, frequently write temporary storage.

It’s important not to confuse file system defragmentation with the "compact" or "optimize" utilities that come with many server applications to defragment their own data sets. Storage at the database level could be completely defragmented in the view of the database server software, but badly fragmented in the file system, and vice-versa. Either type of fragmentation is bad for performance, but file system fragmentation is probably worse, because a fragmented database in a contiguous file is still unlikely to need much disk head movement to find any particular record, whereas an internally defragmented database stored in multiple file system fragments is likely to be slow. The exact impact of internal fragmentation depends on specifics of the application and data; in the case of databases, the data access method is critical. If usage is dominated by random access or small data items, internal fragmentation may not affect performance much. If access is sequential, internal fragmentation could cripple performance.

Fast disk performance is essential to all these systems. The difference with servers is that their performance is important to the entire group using them, and potentially to the entire enterprise. Some problems on servers can be mitigated by separating the system and data onto separate volumes, which is advisable on any operating system for reasons other than avoiding fragmentation.

IV. The Implications of Fragmentation

Now that we have established the reasons why Windows NT systems are subject to file system fragmentation, we will examine some performance tests that analyze the effect and discuss solutions to the problem.

Fragmentation is Difficult to Test

The complexity of modern file systems and the variety of programs and data found in the real world make it difficult to arrive at test numbers that are applicable to all potential users. Even if two systems have the exact same data and programs, they will quickly diverge in file layout because no two users will do exactly the same thing with them. This is as it is with many computer-testing issues.

And yet, in order to test disk fragmentation in a way that is repeatable and reliable, it is first necessary to create multiple test systems that are fragmented in the same way. There are two ways to do this.

The first is to obtain a fragmented disk – which isn’t hard, just take one that has been in service for a long time – and make a tape image copy of it. You can restore the image onto any system on which you wish to test.

While this approach has the advantages of being easy to set up and creating a real world configuration, there are several problems with it. First, it is tied to a particular size of hard disk. Over the years the average size of a disk is likely to increase, and the test will not be transferable. Second, the files on the disk will likely be associated with older versions of programs. Testing, in subsequent years, using old versions of applications is largely a legitimate exercise, but makes the test seem more distant from real world circumstances, and may even miss relevant changes in behavior by newer versions of software. Note that if the disk is a boot disk, it will probably be using an older version of the operating system, which compounds this problem further.

For these reasons, NSTL chose the alternative approach to deterministic fragmentation of a disk volume. NSTL wrote an application, named Fragger, which fragments the files on a hard disk in a controlled and repeatable fashion. Using this application, the same data set can be fragmented repeatedly on any number of differently sized disks. Different data sets, such as different versions of an application, can be fragmented in the same manner to test that effect as well.

NT Performance is Impeded by Disk Fragmentation

The theoretical analysis above demonstrates the way NTFS operates and the reasons why files and free space on NTFS volumes become fragmented. Testing by NSTL for Diskeeper Corporation using Fragger indicates that such fragmentation has a negative effect on system performance.

Detailed results of NSTL testing of fragmented NTFS volumes for Diskeeper Corporation, and of the benefits of defragmentation using Diskeeper Corporation’s Diskeeper 4.5, are available as a separate paper. We include some highlight results here for illustrative purposes.

In a benchmark using Microsoft Excel, NSTL tested two configurations at three levels of fragmentation each. At the first level the disk was completely defragmented, at the second the application data and program files were 13% fragmented, and at the third the application and NT paging file were fragmented a total of 38%. Fragmentation levels were somewhat lower in the second configuration. This test is illustrative of the effects of file system fragmentation on a typical Windows NT workstation.

The interesting comparison in these tests is between the defragmented configuration and the configuration where the application was 13% fragmented. Performance, measured in the amount of time it took for certain Excel macros to complete, degraded substantially; more specifically, it took almost twice as long for the tests to complete.

NSTL also performed similar tests on servers running Microsoft SQL Server and on combinations of clients and servers running Microsoft Outlook and Microsoft Exchange respectively. These tests demonstrate the effects of fragmentation on clients and servers in a busy Windows NT corporate network.

In the Outlook/Exchange tests, defragmentation made the workstation from 5.9% to 55.6% faster, depending on the test. Defragmentation made the server as much as 80.8% faster. In the SQL Server tests, some tests ran over 100% faster when the server was defragmented.

In all these tests, the hardware being constant, the tests being identical, the only difference in configuration being the amount of fragmentation in the file system, we must conclude that fragmentation in the file system has a harmful effect on system performance, and potentially a severe one.

Enterprise Systems are More Susceptible to These Problems

The NSTL results for the Exchange and SQL Server are especially relevant to corporate administrators and users, because enterprise systems are especially vulnerable to fragmentation.

NT Servers, depending on the specific configuration, can manage a very large number of files. Furthermore, by their nature, servers in an enterprise environment are critical to large numbers of users. What slows one down slows all the users, and therefore the enterprise as a whole.

A typical network file server may have numerous home document directories for a large number of users. In the course of a busy day, many different users will write data to the hard disk. Keeping files contiguous in a heavy-use environment is a tall order for a file system.

Without careful management by an administrator, the files and free space on a server will eventually fragment. As we have seen, the operating systems and server applications that run on these systems have upgrades, official security patches and service packs, and application of these can cause fragmentation in the system, so even a properly maintained and managed enterprise server can become badly fragmented.

RAID Systems are Susceptible to Fragmentation

RAID systems, using both hardware RAID and Windows NT Server’s support for software RAID, are also susceptible to file fragmentation and need defragmentation.

Designers use RAID both for reasons of performance and robustness. By performing logically contiguous disk operations on multiple disks in parallel, instead of one longer operation on a single disk, raw I/O throughput is improved. Robustness comes in certain RAID configurations from redundancy of storage or use of parity information to enable recovery of data in the event of a physical error.

There is a small chance that, in a RAID configuration, a fragmented file would not incur an I/O cost, if the file fragments happened to be in the same data stripe. But in the usual case, the effect of fragmentation on a RAID system is the same as on a non-RAID system: additional head movement and I/O will be necessary in order to perform file operations.
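The "same data stripe" case can be pictured with a simple striping model. The Python sketch below assumes plain striping without parity, a stripe unit of 16 clusters, and 4 disks. These figures and the function names are hypothetical and do not describe any particular RAID product or the NT software RAID layout.

```python
def stripe_location(cluster, unit_clusters=16, disks=4):
    """Return (stripe row, disk index) for a logical cluster number."""
    unit = cluster // unit_clusters
    return unit // disks, unit % disks

def same_stripe(fragment_starts, unit_clusters=16, disks=4):
    """True if every fragment starts in the same stripe row."""
    rows = {stripe_location(c, unit_clusters, disks)[0] for c in fragment_starts}
    return len(rows) == 1

# Two fragments that fall in one stripe row, on different disks.
print(stripe_location(5), stripe_location(40), same_stripe([5, 40]))
# Two fragments in different stripe rows, both on the first disk.
print(stripe_location(5), stripe_location(70), same_stripe([5, 70]))
```

In the first case the two fragments sit in the same row on different disks and can be fetched in parallel. In the second they sit in different rows on the same disk, so the head on that disk must move between them, just as it would on a single drive.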

A disk defragmenter, such as Diskeeper, views file order in the same way that NT does, whatever the physical organization of files on one or more disks, and will therefore optimize them properly.

Disk Caching Mitigates, Doesn’t Eliminate These Problems

Because disk caching lowers the amount of necessary physical disk I/O, it improves performance in general, even on defragmented systems. On a heavily fragmented system, it especially helps because the cost of I/O on an average file basis is so much higher.

Disk cache memory competes for space with general application memory. As the memory requirements of applications and operating systems have increased over the years (Windows 2000 Professional will require 128MB RAM and Windows 2000 Server will require 256MB), memory has become cheaper, but not all servers have kept up.

And disk caching can only delay the inevitable writing of data to disk; NT must periodically flush its cache for safety reasons and to checkpoint the disk. With luck and good strategy, this writing of data can be done asynchronously and at a time when it will not delay any other running tasks, but sometimes it can’t. Consider that the workstation in the NSTL Excel tests had 128MB of RAM, a healthy amount for a workstation, which should allow NT ample room for caching, and yet fragmentation still slowed the system.

 

Run #

Defragmented