
NSTL White Paper
System Performance and File Fragmentation
In Windows NT
October, 1999
Table of Contents
Executive Summary
I. Introduction
File Fragmentation and Data Fragmentation are Different
Fragmentation Can Impede Performance
NTFS is Very Different from FAT
NTFS Does Get Fragmented
Performance Degradations Can Impede Productivity
Keeping a Disk Defragmented Can Prevent These Problems
II. How NTFS Works
NTFS Capabilities in Functional Terms
Master File Table
Directories
Compression
Software RAID
Dynamic Bad-Cluster Remapping
Disk Caching
Volume Sets
Paging Files
III. How NTFS Gets Fragmented
Normal Creation and Deletion of Extents
The Impact of Unusual Events
Checkpoints
Increased Head Movement From Disparity of Extents
Cluster Size Issues, Trade-offs with Capacity and Performance
System Files (Principally, but Not Exclusively, the Paging File)
Fragmentation of Directories
Fragmentation of the MFT Itself
Workstation Specific Issues
Server Specific Issues
IV. The Implications of Fragmentation
Fragmentation is Difficult to Test
NT Performance is Impeded by Disk Fragmentation
Enterprise Systems are More Susceptible to These Problems
RAID Systems are Susceptible to Fragmentation
Disk Caching Mitigates, Doesn't Eliminate These Problems
Some User Scenarios are Performance Limited, and Productivity is Therefore Impeded by Fragmentation
"Optimization" is Not a Solution
V. Conclusions
Regular Defragmentation Can Mitigate Performance Problems
Both Workstations and Servers Can Benefit
Glossary
This report was prepared by NSTL under contract for Diskeeper Corporation.
NSTL does not guarantee the accuracy, adequacy or completeness of the services provided.
NSTL MAKES NO WARRANTIES, EXPRESSED OR IMPLIED, AS TO RESULTS TO BE OBTAINED BY ANY PERSON
OR ENTITY FROM USE OF THE CONTENTS OF THIS REPORT.
NSTL MAKES NO EXPRESS OR IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE OF ANY PRODUCT MENTIONED IN THIS REPORT.
Executive Summary
Contrary to early conventional wisdom about Windows NT, its file systems do become
fragmented. This fragmentation occurs in the normal course of using the operating
system. Theoretical analysis and real-world performance testing demonstrate that
fragmentation has an adverse impact on system performance. Certain files and structures
on an NTFS volume, such as the paging file, directories, and the Master File
Table, are especially vulnerable to fragmentation, and allowing them to become fragmented
guarantees a decrease in overall system performance. Other NTFS features,
such as file system compression, inherently create fragmentation.
The best way to avoid these worst-case fragmentation problems, and to keep the system
running at optimal performance, is to run a defragmentation program on a regularly
scheduled basis. Both Windows NT Workstations and NT Servers are subject to these
problems, and both can improve system performance through regular defragmentation.
I. Introduction
All computer system design involves trade-offs, and file systems are no exception.
One of the major detrimental effects of these trade-offs is fragmentation of files
and the file system. Files usually become fragmented when the file system
begins to run out of large contiguous stretches of free space. Rather than deny a file
the ability to grow beyond the size of the largest free block on disk, file systems
allow different parts of the file to exist in different non-contiguous locations,
and the file system software presents the file to programs running on the computer
as one logical unit. File systems can also become fragmented when files become scattered
across the disk, even when the individual files themselves are not fragmented into
multiple sections. In the long term, this can happen for the same reasons as those
that cause internal file fragmentation, and can occur in the normal course of computer
use.
Normal computer use involves the creation and deletion of files, some of them permanent,
some of them transient. Many typical computer processes, such as desktop publishing
or software development, involve the creation of large numbers of temporary files,
the presence of which the user is normally unaware. During the user task, the program
reads source files and may create temporary files to store data used in a later
portion of the task. In the end, the application may write result files and delete
original source files and temporary files.
The end result of this process is that small runs of free space appear amidst the
allocated space on the hard disk. This, in and of itself, is a form of fragmentation
that decreases performance even if individual files are not internally fragmented.
Over time, as the larger runs of free space on the hard disk are lessened in this
way, individual files become fragmented because the file system will lack the space
to contiguously allocate a file. The term used for space such as this, which is
unallocated to any file but unavailable to some degree because it is split into
multiple sections, is external fragmentation.
More importantly, as individual files grow, there will not be sufficient free adjacent
space for them, and the file system will need to allocate a non-contiguous or non-adjacent
block of space for new data.
Windows NT also supports the FAT and HPFS file systems, which have fragmentation
issues of their own. But these file systems are provided for compatibility with
legacy systems, such as DOS and OS/2, and do not support the full gamut of Windows
NT features, such as integrated security. Many of the issues explored in this paper
apply to those file systems as well as to fragmentation generically on any operating
system, but the focus of this paper will be on the NTFS file system under Windows
NT 4.0.
File Fragmentation and Data Fragmentation are Different
It's important to note the distinctions between fragmentation at different
levels of data storage. Individual applications, such as Microsoft Office programs
and database servers like Oracle, have their own issues of fragmentation in their
data storage. These issues are generic: such data fragmentation would exist
regardless of the file system or operating system.
The file system, NTFS in the case of NT, is not aware of the logical organization
of your data. Wherever the file may exist on the disk, and whether or not the file
is fragmented, the file system presents it to the application as a single contiguous
area of storage. But the application's view of the data in that file has a
logical structure. To a mailing list program, a file may be a group of first names,
last names, addresses, and so on. To the file system it is still just a group of
clusters of data. A cluster is the smallest unit of storage that can be allocated
by the operating system on a disk. A cluster may consist of one or more sectors
of the disk.
The application may, in its own internal organization of the data in the file, create
gaps in the data, i.e. it may fragment it. Much like a file system, when you delete
data in an application it may not actually remove the data, but only mark it as
deleted. The resulting gaps in the logical storage of data are known as internal
fragmentation.
Data files may also have allocated but unused space for other reasons. Programs
may allocate space in a file in chunks of space analogous to file system clusters,
for their own organizational or performance reasons. They may also use external
facilities, such as Windows OLE structured storage, to manage the structure
of the data in their files, and these facilities may have their own wasted space.
Over time, the growth of such areas will cause the total size of the file to grow
and may slow the performance of the application as head movement on the disk increases,
even if the logical amount of live data remains constant. This problem occurs even
if the file itself is not fragmented at the file system level, although data fragmentation
increases the likelihood of file fragmentation simply because the file itself grows.
To combat internal data fragmentation, some applications, such as Microsoft Access,
provide utilities to defragment (or "compact") the data in the file. Ironically,
these utilities themselves run a substantial risk of increasing fragmentation at
the file system level because they usually create an entirely new copy of the file,
consuming large amounts of disk space in the process. Thus, regular defragmentation
of your data files may exacerbate fragmentation of your file system.
Lastly, the individual files associated with an application can, over time, become
physically dispersed across a disk. This type of fragmentation, known as usage fragmentation,
is an especially difficult problem for a defragmentation program, because normal
methods of fragmentation analysis may not identify it. Instead, some knowledge of
the application's behavior may be necessary in order to rectify this problem.
In the future, this problem could, in theory, be managed either by applications
providing information about their files to the defragmenter or by sophisticated
analysis of the file system journal.
Fragmentation Can Impede Performance
Almost all hard disks have the same basic design: a stack of circular platters with a series of heads that move across the disk to read concentric circular tracks.
In most cases, the heads in the disk move in lock step: all the heads are always located over the same track on their respective surfaces at once, and this group of tracks is called a cylinder.
Hard disks operate at their fastest when they are reading physically sequential data, one track at a time, switching from one head to another within a single cylinder, and moving on to the next physically adjacent cylinder. Under these circumstances the disk can read or write data and pass it back to the interface and to the computer with a minimum amount of head movement. If the next data to read or write were stored elsewhere on the disk, the process would have to wait for the heads to move to the correct cylinder and settle over the appropriate sector within that cylinder. Head movement is expensive in terms of computer performance and, in order to maximize performance, head movement should be minimized.
Modern hard disks usually read one track of information at a time, so keeping files and free space defragmented also takes maximum advantage of the hard disk's ability to read your data in anticipation of your using it, as well as to cache that data in hardware. The more contiguous your data is on the disk, the more likely it is to be read in a single hard disk read operation. One implication of this is that fragmentation (either internal or external) of a file that lies within a single track on a disk is irrelevant, or at least less relevant, to performance, because the head movement required is the same either way.
All file system designers are faced with a trade-off between several factors, including performance, efficient use of space, and tendency to fragmentation. File systems allocate disk space in units called clusters.
If a file consumes less than an exact multiple of cluster size, the remaining space, often called cluster slack, is technically wasted. But as disks and average file size become larger, it makes sense to use larger clusters, and risk larger amounts of cluster slack. In a well-designed file system, even if cluster size increases, the overall percentage of space wasted as cluster slack remains small, and as the average size of a file increases, the waste in cluster slack also loses its importance. As we will see below, NTFS has special design features that lessen the impact of cluster slack in small files.
Real world experience and research indicate that, while some files have gotten large over time, average files remain small enough that smaller cluster sizes, 4K or less, are optimal.
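As a rough illustration of the cluster slack trade-off described above, the following
Python sketch totals the slack for a handful of files at several cluster sizes. The
file sizes and the function name are invented for the example; they are not measurements
from this report, and the sketch ignores the NTFS handling of very small files.

```python
def cluster_slack(file_size, cluster_size):
    """Bytes allocated to a file but left unused in its last cluster."""
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


# Hypothetical file sizes in bytes, chosen only for illustration.
sample_files = [700, 3_000, 10_000, 150_000]

for cluster_size in (512, 4096, 65536):
    wasted = sum(cluster_slack(size, cluster_size) for size in sample_files)
    used = sum(sample_files)
    print(f"{cluster_size:>6}-byte clusters: {wasted} bytes of slack "
          f"for {used} bytes of data")
```

With mostly small files, the slack grows quickly as the cluster size rises, which is
consistent with the observation that smaller clusters tend to be the better choice.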
NTFS is Very Different from FAT
Windows NT is much smarter than its predecessor operating systems in allocating
disk space to files. As a result, it is less prone to fragment files. But as a side
effect of preventing file fragmentation, NTFS creates fragmentation in the file
system's free space. Still, NTFS is not immune to the forces that fragment
individual files, and over time, files on an NTFS volume will become fragmented.
Starting in version 4.0, Windows NT provides operating system calls designed to
facilitate defragmentation, and defragmentation software for Windows NT usually
uses these calls. But the design of NTFS and the practical implications of how these
APIs (application programming interfaces) operate mean that it is important not
only to defragment your disks, but also to do so on a regular basis.
NTFS Does Get Fragmented
The Windows NTFS File System Driver uses a special file called the Master File Table
(MFT) to track all files on a volume. The MFT starts out with some free space
to allow new files to be tracked, but on a very busy system it too can run out of
space. At this point NTFS extends the MFT itself, creating new stretches of it for
new allocations. This situation is precipitated most often by fragmentation in the
file system itself, as file system fragments consume entries in the MFT. If these
new stretches are not contiguous, the MFT itself becomes fragmented.
There are other files, such as the paging file used by Windows NT's virtual
memory subsystem, which can also become fragmented with unpleasant implications
for performance. The solution to these problems, as we will see, is to prevent them
from happening by keeping your system defragmented.
Lastly, directories in NTFS are allocated similarly to files, but defragmentation
of them can be difficult.
Performance Degradations Can Impede Productivity
Windows NT does a good job of allowing the system to continue operation even as
programs wait for disk I/O, but some inefficiency cannot be hidden forever. Especially
on a mission-critical server, on which many users rely, inefficiencies in the file
system can lead to performance degradation that impedes user productivity.
These problems are not always apparent, and are often casually blamed on
other sources; perhaps the computer's just too slow, needs more memory, or
some program being run needs an upgrade. Overall system performance is a complex
phenomenon, and even experienced system administrators may not recognize fragmentation
in a file system. After all, it can occur with large amounts of free space on the
disk. But the main reason users don't recognize fragmentation is that Windows
NT comes with no tools to identify it.
Heavily used systems, which are by definition mission-critical systems for an organization,
will become fragmented over time under normal usage in Windows NT. As performance
decreases in such systems and users are forced to wait, productivity is thereby
impeded.
Keeping a Disk Defragmented Can Prevent These Problems
Regular defragmentation of the file system improves overall system performance and,
as a result, allows the rest of the system to operate at its optimal speed
under normal circumstances.
Heavily fragmented systems can become difficult to defragment, so it is important,
in order to maintain optimal performance, to defragment on a regular basis to prevent
especially problematic circumstances, such as a fragmented paging file or MFT, from
arising. Windows NT's scheduling service and performance monitoring tools provide
an efficient solution to this problem by allowing defragmentation to be scheduled
for off hours and/or when other load on the system is light.
II. How NTFS Works
NTFS Capabilities in Functional Terms
NTFS is a modern, robust file system designed to support both single user workstations
and multi-user servers. Microsoft designed NTFS to overcome the most serious limitations
of its predecessor file systems, FAT and HPFS, as well as to support planned features
in Windows NT, such as integrated security and support for the POSIX standard.
NTFS has very high limits on storage capacity. It uses 64 bits to number clusters,
each of which can occupy up to 64K, meaning that a disk volume in NTFS can be up to 2^64
(16 billion billion) clusters or 2^80 bytes, and each file can be up to
2^64 bytes. Both FAT and HPFS had much smaller limits. While NTFS is internally
capable of managing this much storage, the disk partitioning scheme or hardware
addressing may limit the partition size to a smaller number.
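The capacity figures above follow from simple arithmetic, which this short Python sketch
reproduces. The constant names are ours; the values are the ones stated in the text.

```python
CLUSTER_NUMBER_BITS = 64        # width of a cluster number, per the text
MAX_CLUSTER_BYTES = 64 * 1024   # largest cluster size, 64K = 2**16 bytes

max_clusters = 2 ** CLUSTER_NUMBER_BITS
max_volume_bytes = max_clusters * MAX_CLUSTER_BYTES

# "16 billion billion" here means 16 x 2**30 x 2**30 in binary units.
print(max_clusters == 16 * (2 ** 30) ** 2)
print(max_volume_bytes == 2 ** 80)
```

Both comparisons hold, since 2^64 clusters of 2^16 bytes each give 2^80 bytes.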
NTFS is a recoverable file system. This means that operations in NTFS are transactions,
as in a database. Either the entire operation completes or the operating system
has the capability to roll back the unfinished portion, safeguarding the integrity
of the existing data. NTFS also stores redundant copies of critical file system
structures in the unlikely event that physical damage makes one copy of them inaccessible.
Security is integrated directly into the NTFS system and derived from the Windows
NT object model. Security objects, known as ACLs (Access Control Lists), are stored
in the MFT as part of the file. These are the actual security objects used by Windows
NT to restrict access to the file object.
Files in NTFS have attributes: a name, a creation date, an archive bit, and so on.
In fact, the data in the file is just another attribute. This characteristic of
NTFS is how Windows NT implements many of its sophisticated features, such as complex
access controls and support for Apple Macintosh clients. Macintosh files, for example,
have two sections, a resource fork and a data fork. NTFS manages the association
between these sections by storing them in different attributes of the same file.
In some ways, the organizational system of file attributes combats fragmentation,
because programmers might otherwise have used additional files to store attribute
data. But heavy use of attributes can cause fragmentation within the MFT itself.
Because Windows NT is fully Unicode-enabled, so is NTFS. All file names in NTFS
are stored in the 16-bit Unicode encoding scheme, where each character in the file
name is stored in 16 bits in the file's name attribute.
File names can be up to 255 characters long, including multiple periods
and embedded spaces.
Master File Table
The heart of the NTFS file system is the Master File Table or MFT. The MFT is itself
a file, an array of records constituting a database of all files on the system.
Each record in the MFT is usually fixed, by definition, at 1K. The format of
the first 16 records is defined to contain certain volume-specific information,
and these records are known collectively as the NTFS metadata files. Metadata is the name given
to these overhead structures in the file system, which are used to track the real
data. The first four records are duplicated in a file at or near the physical center
of the disk for recoverability purposes.
Normally, each record in the MFT corresponds to one file or directory in the file
system. The MFT record contains the file's attributes. Standard attribute
information in a file record includes the read-only and archive flags; creation
and last-accessed dates; the file name, of which there are likely at least two (a
"long" file name and a short "8.3" DOS-compatible name); a security
descriptor; and the file data, or pointers to where the file data resides on the
disk.
Yes, the data in a file is just another attribute of NTFS. For this reason, small
files (about 750 bytes, depending on the number of other attributes in the file)
can fit entirely within their MFT entry, giving Windows NT and NTFS excellent performance
with such files. Such files also exhibit zero fragmentation.
There is at least one entry in the MFT for each file on the NTFS volume, including
the MFT itself and other "metadata" files. These are the files, such as
the log file, the bad cluster map, and the root directory, which contain the structure
of the rest of the volume as seen by NTFS. Users don't see these files, which
all have names beginning with $ (for example, the MFT is in $MFT). Most
of the remaining entries in the MFT are for user files and directories.
In a perfect world, that would be it for the MFT. Of course, many files are not
so small that their data fits within their MFT entry, so NTFS stores their data
in one or more areas of the disk outside the MFT. NTFS allocates files in units of clusters. The
clusters within a file are referenced by NTFS in two ways: first, with Virtual Cluster
Numbers (VCNs), from 0 through n-1 where there are n clusters in the
file; second, with Logical Cluster Numbers (LCNs), which correspond to the number
of the cluster on the NTFS volume.
Because LCNs are simply an index to the clusters on a volume, NTFS uses an LCN to
calculate an address on the disk to read or write by simply multiplying the LCN
by the number of sectors per cluster and reading or writing sectors starting at
that address on the disk.
VCNs are the analog for file offsets requested by applications running under Windows
NT. The application knows the format of the data it uses in the file and uses it
to calculate a byte offset within the logical format of the file. When the application
requests a read or write at that address of the file, NTFS can divide that number
by cluster size to determine a VCN to read or write.
By associating VCNs with their LCNs, NTFS ties the logical addressing
within a file to the physical locations on disk. This mapping of VCN to LCN
is what the file's data attributes provide.
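The chain of calculations described above, from byte offset to VCN to LCN to disk sector,
can be sketched in Python as follows. This is only a simplified model: the Run tuple, the
function names, and the example cluster numbers are assumptions made for illustration and
do not reflect the on-disk format NTFS actually uses.

```python
from typing import List, NamedTuple, Optional


class Run(NamedTuple):
    start_vcn: int  # first virtual cluster number covered by this run
    start_lcn: int  # logical cluster number where the run begins on the volume
    length: int     # run length in clusters


def offset_to_vcn(byte_offset: int, cluster_size: int) -> int:
    """Divide the file offset by the cluster size to find the VCN."""
    return byte_offset // cluster_size


def vcn_to_lcn(runs: List[Run], vcn: int) -> Optional[int]:
    """Find the run that covers the VCN and translate it to an LCN."""
    for run in runs:
        if run.start_vcn <= vcn < run.start_vcn + run.length:
            return run.start_lcn + (vcn - run.start_vcn)
    return None  # the VCN is not mapped by any run


def lcn_to_sector(lcn: int, sectors_per_cluster: int) -> int:
    """Multiply the LCN by sectors per cluster to find the starting sector."""
    return lcn * sectors_per_cluster


# Hypothetical volume geometry and a file stored in two fragments.
CLUSTER_SIZE = 4096
SECTORS_PER_CLUSTER = 8
runs = [Run(0, 1000, 4), Run(4, 5200, 3)]

vcn = offset_to_vcn(20_000, CLUSTER_SIZE)
lcn = vcn_to_lcn(runs, vcn)
sector = lcn_to_sector(lcn, SECTORS_PER_CLUSTER)
print(vcn, lcn, sector)
```

In this example, byte 20,000 falls in the file's fifth cluster, which lives in the second
run, so reading it after the first four clusters requires a jump to a different part of
the volume.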
All files have at least one data attribute, known as the "unnamed data attribute."
There can be other named data attributes, which correspond to the multiple streams
of data referred to above. Directories do not have unnamed data attributes, but
they can have named ones.
If any attribute, most likely the file data attribute, does not fit in the MFT record,
NTFS stores it in a new, separate set of clusters on the disk, called a run or an
extent. In fact, other attributes besides the data can become large enough to force
new extents. For example, long filenames in Windows NT can be up to 255 characters
that, because they are stored in Unicode, consume 2 bytes apiece. When an attribute
is stored within the MFT entry, it is called a resident attribute. When one is forced
out to an extent, it is called a non-resident attribute.
It may come to pass that the extent will need to grow, for instance, if the user
appends data to a file. In this case, NTFS will attempt to allocate physically contiguous
clusters to the same extent. If there is no more contiguous space available, NTFS
will need to allocate a new extent elsewhere on the disk; in other words, it will
separate the file into two fragments. The data attribute header, still stored within
the MFT record, stores the information, in the form of LCNs and run lengths, that
NTFS uses to locate the extents.
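The growth behavior just described can be modeled with a toy allocator. The Python sketch
below is not the NTFS allocation algorithm; the function name, the free-space list, and
the cluster numbers are all hypothetical, and for brevity it never splits one request
across several free areas. It extends the last extent when the adjacent clusters are free
and otherwise starts a new extent elsewhere.

```python
def append_clusters(runs, free_extents, count):
    """Grow a file by `count` clusters.

    runs:         list of (start_lcn, length) tuples, in file order
    free_extents: list of [start_lcn, length] lists, modified in place
    """
    if runs:
        last_start, last_len = runs[-1]
        next_lcn = last_start + last_len
        # First choice: free space that directly follows the last extent.
        for free in free_extents:
            if free[0] == next_lcn and free[1] >= count:
                runs[-1] = (last_start, last_len + count)
                free[0] += count
                free[1] -= count
                return runs
    # Otherwise: take the first free area that is large enough (a new fragment).
    for free in free_extents:
        if free[1] >= count:
            runs.append((free[0], count))
            free[0] += count
            free[1] -= count
            return runs
    raise RuntimeError("no single free extent is large enough")


# Hypothetical file with one 10-cluster extent and two free areas on the volume.
runs = [(100, 10)]
free_extents = [[110, 5], [500, 50]]

append_clusters(runs, free_extents, 5)   # adjacent space is free: extent grows
append_clusters(runs, free_extents, 8)   # adjacent space is used up: new extent
print(runs, "fragments:", len(runs))
```

The second append finds no room next to the existing extent, so the file ends up in two
fragments even though the volume still has plenty of free space overall.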
In rare cases, usually when the number of attributes is large enough, NTFS may be
forced to allocate an additional MFT entry for the file. In this case, NTFS creates
an attribute called an attribute list, which acts as an index to all the attributes
for the file or directory. This is an unusual situation which should occur only
with files that are extremely large and fragmented, and can greatly slow the performance
of operations on that file.
Directories
Directories are very much like files in NTFS. If the directory is small enough,
the index to the files to which it points can fit in the MFT record in an attribute
called the Index Root attribute. If enough entries are present, NTFS will create
a new extent with a non-resident attribute called an index buffer.
In such directories, the index buffers contain what is called a "B+ tree,"
which is a data structure designed to minimize the number of comparisons needed
in order to find a particular file entry. A B+ tree stores information (or indexes
to that information) in a sorted order. At points in the directory, NTFS stores
sorted groups of entries and pointers to entries that fall below those entries in
the sort. This has many advantages over storing entries in whatever order they happen
to fall. For example, if you want a sorted list of the entries in the directory,
your request is satisfied quickly because that is the order of storage in the index
buffer. If you want to look up a particular entry, the lookup is quick because the
trees tend to get wide, rather than deep, which minimizes the number of accesses
necessary to reach a particular point in the tree.
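To see why a wide tree needs fewer accesses than a deep one, the following Python sketch
counts the levels required to index a given number of entries. The fan-out values are
hypothetical and are not the actual capacity of an NTFS index buffer.

```python
def levels_needed(entries, fanout):
    """Smallest number of tree levels that can index `entries` items,
    assuming every node holds up to `fanout` entries."""
    levels = 1
    capacity = fanout
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical directory of 10,000 entries.
print(levels_needed(10_000, 2))    # narrow tree, two entries per node
print(levels_needed(10_000, 100))  # wide tree, one hundred entries per node
```

Each level corresponds to at most one node access during a lookup, so the wide tree
reaches any entry in far fewer accesses than the narrow one.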
Compression
NTFS supports compression of file data as a native function of the file system.
One of the side effects of compression is that it can create fragmentation of files
and of free space.
You can instruct NTFS to compress data on an entire volume, in a specific directory,
or even in a particular file. There are Win32 calls for programs to use to determine
the impact of compression, in particular the compressed and uncompressed file sizes.
If you view a file's properties in Windows NT Explorer, you will see both sizes.
It is in this compression scheme that you begin to see the flexibility created by
NTFS's use of both VCNs and LCNs, as well as the potential for problems. In
a normal file that has data stored in non-resident attributes or extents, the data
attribute will contain mappings of the starting VCN and starting LCN in the extent
as well as the length in clusters.
NTFS plays games with these cluster numbers to achieve compression, using two basic
approaches. Because some large files have large blocks of nulls (bytes of value
0), NTFS uses a sparse storage for such files, meaning that it only stores the non-zero
data.
Imagine a 100 cluster file in which only the first 5 and last 5 clusters contain
data, and the middle 90 are all zeroes. NTFS can store two extents for this file,
each 5 clusters long. The first will have VCNs 0 through 4 and the second will have
VCNs 95 through 99. NTFS can infer that VCNs 5 through 94 are null, and do not need
physical storage. If a program requests data in this space, NTFS can simply fill
the requesting program's buffer with nulls. If the program writes non-zero
data to this space, NTFS can create a new extent with the appropriate VCNs. This
method is very fast for sparse files.
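The 100-cluster example above can be modeled in a few lines of Python. This is a sketch
only: the extent tuples, the LCN values, and the stand-in disk function are assumptions
made for illustration.

```python
CLUSTER_SIZE = 4096  # hypothetical cluster size in bytes


def read_cluster(extents, vcn, read_from_disk):
    """Return one cluster of file data; unmapped VCNs read back as nulls."""
    for start_vcn, start_lcn, length in extents:
        if start_vcn <= vcn < start_vcn + length:
            return read_from_disk(start_lcn + (vcn - start_vcn))
    return bytes(CLUSTER_SIZE)  # a hole: no physical storage, all zeroes


def fake_disk(lcn):
    """Stand-in for a physical read; fills the cluster with a marker byte."""
    return bytes([lcn % 256]) * CLUSTER_SIZE


# Two extents of 5 clusters each: VCNs 0-4 and 95-99. The LCNs are invented.
extents = [(0, 2000, 5), (95, 7300, 5)]

hole = read_cluster(extents, 50, fake_disk)   # inside the null region
data = read_cluster(extents, 96, fake_disk)   # inside the second extent
allocated = sum(length for _, _, length in extents)
print(allocated, "clusters stored for a 100-cluster file")
```

Note that the two stored extents need not be near each other on the volume, which is
one way sparse storage can contribute to fragmentation.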
If a file is not predominately null, NTFS uses a different compression method. Instead
of trying to write the file data in one extent, NTFS will divide the data up into
runs of 16 clusters apiece. In any particular extent, if compressing the data will
save at least 1 cluster, NTFS will store the compressed data, meaning 15 or fewer
clusters. If the data cannot be effectively compressed (random data, for example,
is generally not compressible), NTFS will simply store the entire extent as it normally
would without compression. Back in the MFT record for this file, NTFS can see that
there are missing VCNs in the runs for a file and can infer that the file is compressed.
Because the data is stored in a compressed form, it is not possible to look up a
specific byte by calculating the cluster in which it is stored. Instead, NTFS calculates
in which 16 cluster run the address is located, decompresses the run back to 16
uncompressed clusters, and then calculates the offset into the file using valid
virtual cluster numbers. NTFS ensures that all these runs begin with a virtual cluster
number divisible by 16 so that this addressing remains possible without having to
decompress the entire file.
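The addressing rule described above can be sketched in Python. The function names are
ours, and the 4K cluster size in the example is an assumption; the 16-cluster run length
and the "save at least one cluster" rule come from the text.

```python
CLUSTERS_PER_RUN = 16  # size of a compressed run, per the text


def locate(byte_offset, cluster_size):
    """Return (starting VCN of the 16-cluster run, offset within that run
    once it has been decompressed)."""
    vcn = byte_offset // cluster_size
    run_start_vcn = (vcn // CLUSTERS_PER_RUN) * CLUSTERS_PER_RUN
    offset_in_run = byte_offset - run_start_vcn * cluster_size
    return run_start_vcn, offset_in_run


def clusters_stored(compressed_bytes, cluster_size):
    """Clusters written for one run: the compressed form if it saves at
    least one cluster, otherwise the full 16 uncompressed clusters."""
    needed = -(-compressed_bytes // cluster_size)  # ceiling division
    return needed if needed <= CLUSTERS_PER_RUN - 1 else CLUSTERS_PER_RUN


print(locate(200_000, 4096))
print(clusters_stored(30_000, 4096), clusters_stored(64_000, 4096))
```

Because every run starts at a VCN divisible by 16, only the one run containing the
requested byte has to be decompressed.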
NTFS tries to write runs of this type into a single contiguous space because the
I/O system is already encountering enough added processing and management burden
using compressed files without having to fragment individual extents. This is part
of the reason NTFS designers chose 16 clusters as the size of a compressed
run; it cannot be more than 64K, because the file system buffers are 64K each. It
is also very likely to be read in a single I/O operation.
NTFS also tries to keep all the separate runs of the file contiguous, but this is
a harder job. Compressed files are more likely than non-compressed files to be fragmented.
NTFS only compresses the file's data attribute, not the metadata. Compression
only works on volumes with 4K clusters or smaller.
Software RAID
NTFS also supports fault tolerance in disk subsystems by dynamically mirroring or
striping data across multiple disk volumes. NTFS supports RAID levels 1 and 5. In
level 1, known as mirroring, data written to a volume is written in parallel to
a second volume; data read from a volume is also read from the second volume and
compared to it for correctness. In level 5, known as striping with parity, data streams ("stripes")
are divided among three or more disks, using some of the space to store parity information.
If one of the disks registers a physical error, NTFS can calculate the missing data
by applying the logical exclusive-OR (XOR) operation to the remaining data and the
parity information.
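The parity calculation can be demonstrated with a few bytes of data. The Python sketch
below illustrates the XOR principle only; the block contents and the function name are
invented, and real stripe sizes and layouts are not modeled.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks, each on a different disk.
data_blocks = [b"NTFS", b"disk", b"data"]
parity = xor_blocks(data_blocks)          # stored on a further disk

# Suppose the disk holding the second block fails.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

XOR-ing the surviving blocks with the parity block yields the missing block, because
every byte that appears twice in the calculation cancels itself out.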
Dynamic Bad-Cluster Remapping
NTFS is able to dynamically detect the presence of a physically bad cluster and
map around it. If, on a disk which has been formatted as an NTFS fault tolerant
volume, the NTFS driver attempts to read a cluster and the read operation fails
due to a physical read error, the NTFS fault tolerance driver dynamically retrieves
a good copy of the data that had been stored in the bad sector using a striped or
mirrored volume. NTFS then maps a new cluster to replace the bad one and writes
the data to it, and then marks the bad cluster so that it is no longer used. On
a non-fault tolerant volume, NTFS can still detect bad clusters and mark them as
such, but it cannot necessarily retrieve the data.
Remapping the bad cluster almost certainly fragments the file into at least three
fragments. Today's hardware is usually reliable and it is good that NT has
the capability to maintain the integrity of files in this way, but the potential
for sudden fragmentation in critical files is another reason to defragment file
systems on a regular basis.
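To see why remapping tends to leave three fragments, consider this Python sketch, which
replaces one cluster in the middle of a contiguous extent. The function name and cluster
numbers are hypothetical, and the sketch models only the resulting layout, not the
recovery of the data.

```python
def remap_cluster(runs, bad_lcn, replacement_lcn):
    """Return the file's runs after one bad cluster is replaced.

    runs: list of (start_lcn, length) tuples, in file order
    """
    result = []
    for start, length in runs:
        if start <= bad_lcn < start + length:
            before = bad_lcn - start
            after = start + length - bad_lcn - 1
            if before:
                result.append((start, before))
            result.append((replacement_lcn, 1))   # the relocated cluster
            if after:
                result.append((bad_lcn + 1, after))
        else:
            result.append((start, length))
    return result


# Hypothetical contiguous 10-cluster file; cluster 104 goes bad.
print(remap_cluster([(100, 10)], 104, 9000))
```

The single extent becomes three: the clusters before the bad one, the relocated cluster,
and the clusters after it. Only a bad cluster at the very start or end of an extent
yields two fragments instead.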
Disk Caching
Windows NT's I/O Manager integrates a Cache Manager that is involved in all
disk I/O. When an application attempts to read data that has not been loaded into
the cache, the Cache Manager interacts with the Windows NT Virtual Memory Manager,
which calls the NTFS file system driver to load the data into the cache. Similarly,
the Cache Manager uses the memory manager to perform all disk writes using background
threads.
Unless instructed otherwise, NT's Cache Manager caches all reads and writes
on all secondary media. Cache Manager uses a number of aggressive techniques to
improve performance. For example, it will attempt to read ahead in a file in anticipation
of a program requesting the following data. It will also delay writes to the disk,
so that if reads or writes of the same data occur quickly, they will be satisfied
out of the cache rather than a physical disk operation.
Aggressive disk caching can mitigate the effects of disk fragmentation to the extent
that data that is read by applications is read from the cache rather than from the
disk itself. In fact, adding memory to a heavily fragmented system can improve its
performance, although this is an expensive solution to a problem that
can be fixed at little cost through software and good practices.
Volume Sets
The NT fault-tolerance driver also provides some functions unrelated to fault-tolerance,
including Volume Sets. A volume set is a single logical volume composed of areas
of free space on one or more disks. Using the NT Disk Administrator utility, you
can combine two 100MB free areas on different disks into a single logical 200MB
volume. These volume sets can be formatted with any NT-supported file system, although
there are advantages to using NTFS.
Volume sets are useful for combining smaller disks or free space on larger disks,
into a single, more useful area that can be treated as a logical unit. If the volume
is formatted with NTFS, the administrator can add new stretches of free space to
the volume set while maintaining data on the existing volume. This can be a low-impact
way for network administrators to add storage to an existing network drive without
impacting users' view of the network.
The problem with volume sets, from a fragmentation standpoint, is that they have
the capacity to exacerbate normal fragmentation into even more performance-limiting
fragmentation across physical volumes or physically separate free stretches of a
single volume. Windows NT file systems don't see the fact that they are working
with multiple volumes and therefore treat volume sets as they would any single physical
device.
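To see why a volume set can spread logically adjacent data across physically separate areas, consider a sketch of the address mapping. The area list, disk names, and cluster numbers below are invented for illustration, and the function locate is hypothetical; it models simple concatenation of free areas, which is how the text describes a volume set.

```python
# Each area is (disk name, starting cluster on that disk, length in clusters).
# All values are illustrative only.
AREAS = [("disk0", 5000, 100), ("disk1", 200, 100)]

def locate(logical_cluster, areas=AREAS):
    """Map a logical cluster of the volume set to (disk, physical cluster)."""
    base = 0
    for disk, start, length in areas:
        if logical_cluster < base + length:
            return disk, start + (logical_cluster - base)
        base += length
    raise ValueError("cluster is beyond the end of the volume set")

print(locate(99))    # ('disk0', 5099): last cluster of the first area
print(locate(100))   # ('disk1', 200): first cluster of the second area
```

Logical clusters 99 and 100 look adjacent to the file system, yet they sit on different disks. A file that spans them is contiguous from the file system's point of view and split from the hardware's.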
Paging Files
Paging files present a special problem for fragmentation under Windows NT. NT supports
up to 16 paging files on a system. These files are used for virtual memory; as Windows
NT and its applications use memory in excess of the physical RAM, the Virtual Memory
Manager writes the least-recently used areas of memory to the paging files to free
RAM. If a program accesses these areas of memory, the Virtual Memory Manager reads
them from the paging file back to RAM where the program can use them.
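The paging behavior described above can be sketched as a least-recently-used policy over a fixed number of RAM pages. This is a simplified model under stated assumptions: the class TinyPager and its fields are hypothetical, and the real Virtual Memory Manager's replacement policy is more elaborate than strict LRU.

```python
from collections import OrderedDict

class TinyPager:
    """Toy model: a fixed number of RAM pages backed by a paging file."""

    def __init__(self, ram_pages):
        self.ram_pages = ram_pages
        self.ram = OrderedDict()   # page -> contents, least recently used first
        self.pagefile = {}         # page -> contents written out of RAM

    def touch(self, page):
        if page in self.ram:
            self.ram.move_to_end(page)        # now the most recently used page
            return
        # Page in from the paging file, or create the page on first use.
        contents = self.pagefile.pop(page, f"page {page}")
        if len(self.ram) >= self.ram_pages:
            victim, data = self.ram.popitem(last=False)   # least recently used
            self.pagefile[victim] = data                  # write it out to free RAM
        self.ram[page] = contents

pager = TinyPager(ram_pages=3)
for p in [1, 2, 3, 1, 4]:
    pager.touch(p)
print(sorted(pager.ram), sorted(pager.pagefile))   # [1, 3, 4] [2]
```

Page 2 is written out because page 1 was touched again before page 4 arrived. Every such eviction and later reload is disk I/O against the paging file.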
Once the system starts up, these files are always open and cannot be moved or deleted.
At startup, the Windows NT System process duplicates the file handles for the paging
files so that they will always be open and the operating system will prevent
any other process from deleting or moving them.
For this reason, paging files are a problem for defragmentation software. In order to safely defragment the paging files, defragmenters must process them at system boot time, before the Virtual Memory Manager gets a chance to lock them down. While this is a desirable feature, regularly rebooting a system to defragment it is not desirable, so the best solution is to keep the rest of the file system defragmented to mitigate any fragmentation problems caused by the existence of paging files.
III. How NTFS Gets Fragmented
Normal Creation and Deletion of Extents
In the normal course of computing, on any operating system, files are created and
deleted, visibly and invisibly. This process leads to the creation of gaps in the
used portions of physical storage. As a disk becomes more full, and use of it becomes
heavier, it is likely that the large areas of free space that are present early
in the system's life will break down into smaller free areas throughout the
system.
Many programs will explicitly retain the last version, or several versions, of the
file the user is working on. Eventually the backup versions are deleted, and their
space is freed up. The result is probably a gap in free space on the disk. Or consider
the case of downloading the latest version of Netscape Communicator. You might download
a 20MB executable program and run it, creating another 20MB or more of files in
the Program Files directory. Then you will likely delete the 20MB file you downloaded.
The result is that you have a 20MB gap, possibly in one place, possibly split up,
and the newly installed program is likely stored after the gap on the disk. The
operating system has fewer large free areas to work with.
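The download-and-install scenario can be traced on a toy disk. The sketch below assumes a simple first-fit allocator and 1MB allocation units purely for illustration; NTFS's actual allocation strategy is not described in this paper, and the function names are hypothetical.

```python
def allocate(disk, name, size):
    """First-fit (an assumption): place `size` units of `name` in the lowest free slots."""
    placed = 0
    for i, owner in enumerate(disk):
        if owner is None:
            disk[i] = name
            placed += 1
            if placed == size:
                return
    raise RuntimeError("disk full")

def delete(disk, name):
    """Free every unit owned by `name`."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None

def free_runs(disk):
    """Lengths of each contiguous run of free units, in disk order."""
    runs, current = [], 0
    for owner in disk:
        if owner is None:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

disk = [None] * 100                  # 100 units of 1MB each
allocate(disk, "existing", 30)       # files already on the disk
allocate(disk, "download", 20)       # the 20MB downloaded installer
allocate(disk, "installed", 25)      # files created by running it
delete(disk, "download")             # the installer is deleted afterwards
print(free_runs(disk))               # [20, 25]
```

The 45MB of free space is now split into a 20MB gap and a 25MB tail. Under this model, any new file larger than 25MB must be stored in at least two pieces.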
But programs and the operating system create files on their own without telling
the user. Consider the print spooler. Ever since the early versions of DOS, when
you print a file, your program and the operating system actually perform at least
two steps. First, they create a file containing the data printed by the application.
In the case of Windows applications, these are in an intermediate format called
Windows Metafile Format (WMF). The printer driver for your printer then converts
this data to a separate file in the native format for the printer, and then the
spooler sends that file to the printer. All this data consumes space on the disk
temporarily and is then deleted. Printing a large document consumes a correspondingly
large amount of disk space.
The Impact of Unusual Events
Such normal events can cause fragmentation, but only over a long time and with a
lot of use. Fragmentation in NTFS is easy to create, however, using unusual, but not unreasonable,
techniques. The example above of downloading a large file and installing it is a
minor example of this.
To date, there have been 5 service packs for Windows NT 4.0. Each of them has involved
a multi-megabyte download, and each makes changes in a large number of NT system
programs likely to be stored at the front of the disk. Installing a service pack
is therefore likely to push NT system programs further out on the disk, creating
gaps. Large service packs may cause fragmentation within NT files themselves, and
certainly make fragmentation of other files more likely. Consider also that Microsoft
SQL Server, BackOffice, Office 97, and many other common NT programs have their
own service packs, and that installing them brings all the same implications of
installing an NT service pack. Application upgrades have all the same implications
as well.
You'd think that installing a new Windows NT Workstation would start the system
out in a clean, unfragmented state, but even this is not necessarily true. Even
a clean install will likely end up fragmented, because the installation process
creates numerous files and directories that it then deletes. The subsequent application
of service packs exacerbates the situation. It is not unusual for a user to install
NT on a system with an existing FAT-formatted drive. NT has the capability to convert
the drive to NTFS, but doing so requires moving files around in ways that will fragment
free space.
Checkpoints
Aggravating the problem is the fact that NTFS doesn't immediately make deallocated
clusters available for other programs. Instead, they become available after the
next time NT "checkpoints" the disk.
Checkpointing is part of NTFS's facility for recovering from errors. As we stated
above, I/O operations in NTFS are transactions. As it performs I/O operations, such
as appending data to a file, NTFS logs undo and redo data for that operation. At
some point between transactions, when the disk is known to be in a good state, NTFS
writes a checkpoint record to its log. If NT detects a disk error while performing
an operation, it enters a recovery procedure consisting of three passes: the analysis
pass, the redo pass and the undo pass.
In the analysis pass, NTFS determines which parts of the operation failed and which
clusters it must update in order to undo the transaction. In the redo pass, NTFS
performs all other operations that were logged since the last checkpoint. Then in
the undo pass, it rolls back any uncommitted operations in the offending transaction.
Because NTFS cannot be certain of the disposition of data in a cluster until a checkpoint,
it cannot allow other data to be written to that cluster. Note that no errors need
occur for this to happen. It is unlikely to affect a large amount of disk, but it
happens every time a cluster is freed, and will tend in the long term to push data
further out in the disk, and thus to diminish the average size of a free area of
disk.
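The effect of holding freed clusters until the next checkpoint can be sketched as follows. The Volume class, its pending set, and the lowest-cluster-first allocation rule are assumptions chosen to make the effect visible; they are not a description of NTFS internals.

```python
class Volume:
    """Toy volume in which freed clusters become reusable only at a checkpoint."""

    def __init__(self, clusters):
        self.free = set(range(clusters))   # clusters available for allocation
        self.pending = set()               # freed, but held until the next checkpoint

    def allocate(self):
        cluster = min(self.free)           # lowest-numbered free cluster (an assumption)
        self.free.remove(cluster)
        return cluster

    def release(self, cluster):
        self.pending.add(cluster)          # not yet safe to hand out again

    def checkpoint(self):
        self.free |= self.pending          # pending clusters become allocatable
        self.pending.clear()

vol = Volume(10)
a = vol.allocate()     # cluster 0
b = vol.allocate()     # cluster 1
vol.release(a)
c = vol.allocate()     # cluster 0 is still pending, so this is cluster 2
vol.checkpoint()
d = vol.allocate()     # cluster 0 is reusable again
print(a, b, c, d)      # 0 1 2 0
```

The allocation made between the release and the checkpoint lands further out on the disk than it otherwise would, which is the drift the text describes.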
Increased Head Movement from Disparity of Extents
As stated above, an I/O subsystem operates at maximum speed when the disk transfers
data to or from adjacent sectors on the disk. This is because the heads on the disks
have to move at a minimum under such circumstances. Head movement is the enemy of
I/O performance.
It is a rare event indeed when the disk gets to read or write contiguously for a
long time. It is normal for the heads to move around as Windows NT reads and writes
to different files in the normal course of its business.
For example, consider the Checkpointing system described above which allows Windows
NT to recover the file system to a correct state even in the event of a power failure
or physical disk error. The undo, redo and checkpoint information that makes recoverability
possible is stored in a log file that the NT Log File Service (LFS) maintains. Periodically,
in the course of writing to some other part of the disk, NTFS writes log entries
about the disk operations it is performing to the log file.
Head movement is also inevitable when the operating system pages memory out to disk.
The Virtual Memory Manager will begin to page memory out to disk even before
unallocated memory is exhausted. This is a reasonable policy, but it may negatively impact
the performance of disk-intensive applications. In a heavily trafficked system,
paging to and from disk is not uncommon, and consumes both CPU and disk time.
Even with the normal amount of head movement that occurs in a system, an application
can perform at full or near-full speed. But fragmentation in data or program files
can significantly increase the amount of time it takes to perform disk operations.
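A rough way to compare layouts is to add up how far the heads must travel between the end of one extent and the start of the next. The sketch below is a deliberately crude model: it measures distance in cluster numbers and ignores rotational latency, caching, and competing I/O. The function name and the sample extents are invented for illustration.

```python
def head_travel(extents):
    """Sum of the gaps between the end of each extent and the start of the next.

    `extents` is a list of (start cluster, length) pairs in file order.
    """
    travel = 0
    for (start, length), (next_start, _) in zip(extents, extents[1:]):
        travel += abs(next_start - (start + length))
    return travel

contiguous = [(1000, 300)]
fragmented = [(1000, 100), (7000, 100), (2500, 100)]

print(head_travel(contiguous))   # 0
print(head_travel(fragmented))   # 10500
```

Both layouts hold the same 300 clusters of data, but reading the fragmented one in order forces two long seeks that the contiguous one avoids.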
Cluster Size Issues, Trade-offs with Capacity and Performance
When you format a volume using NTFS you have a choice of cluster size to use. Windows
NT has different default cluster sizes for different size volumes. The default depends
only on the size of the volume, so knowledge of how the volume is to be used can help you choose
a cluster size better suited to it than the default.
Depending on your priorities, you might want to choose a different cluster size
than the default, but be careful. Choosing a smaller cluster size will waste less
space but is more likely to cause fragmentation. Larger cluster sizes are less likely
to cause fragmentation but will waste more space.
512 byte clusters in particular are problematic, especially since the MFT consists
of records that are always 1024 bytes. It is possible on a system with 512 byte
clusters to have individual MFT entries fragmented. MFT record fragmentation of
this type is not possible with larger cluster sizes, which can hold one or more
complete MFT Records.
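The trade-off can be made concrete with a little arithmetic. The sketch below uses the 1024-byte MFT record size given in the text; the helper names and the 1,100-byte sample file are illustrative assumptions, and the calculation ignores small files whose data NTFS stores inside the MFT record itself.

```python
import math

MFT_RECORD_BYTES = 1024   # record size stated in the text

def slack_bytes(file_size, cluster_size):
    """Bytes allocated to a file but not used by its data."""
    clusters = math.ceil(file_size / cluster_size)
    return clusters * cluster_size - file_size

def clusters_per_mft_record(cluster_size):
    """How many clusters one MFT record occupies at this cluster size."""
    return math.ceil(MFT_RECORD_BYTES / cluster_size)

for size in (512, 1024, 4096):
    print(size, slack_bytes(1_100, size), clusters_per_mft_record(size))
# 512 436 2
# 1024 948 1
# 4096 2996 1
```

Larger clusters waste more space on the sample file, but only at 512 bytes does a single MFT record need two clusters, which is what makes it possible for an individual record to be split.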
If a file or directory is contiguous, the cluster size doesn't matter, except
to the extent that it wastes a small amount of space. It is therefore wise to choose
a cluster size large enough to discourage any more fragmentation than you are likely
to encounter on NT anyway.
But if you know that you have a very large number of small files, or if you know
that you have very few small files, you have information that you can use for a
better cluster decision. Also, a very large absolute number of files (on the order
of 100,000) will make fragmentation of the MFT more likely. In this case, a larger
cluster size will limit the fragmentation in the MFT as it grows to accommodate.
Note that it is possible to create an NTFS volume with a cluster size greater than
4K. If you do, however, you cannot use NTFS compression, nor can you defragment the volume
using the built-in defragmentation interface that Microsoft supports.
System Files (Principally, but Not Exclusively, the Paging File)
DOS and Windows have a small number of files that are marked as System files, which
makes them invisible and unmovable. Windows NT makes far greater use of these files.
These files consume a non-trivial portion of the disk space, especially on a boot
volume.
Windows NT has two kinds of system files. The first kind are the files which constitute
the structure and overhead of the NTFS file system. Call them "NTFS System
Files." The MFT is one such file (named $Mft), with special implications, which
we deal with below.
First, there is a copy of the first four records of the MFT named $Mftmirr, stored
near the physical middle of the disk. There is also the Log File ($Logfile), the
Volume file ($Volume), the Attribute Definition Table ($Attrdef), the Root Directory
File ($.), the Cluster Bitmap ($Bitmap), the Partition Boot Sector ($Boot), the
Bad Cluster File ($Badclus), the Quota Table ($Quota), and the Upcase Table ($Upcase).
($Quota is not used in NT 4.0, but Windows 2000 uses it to implement user storage
quotas.)
These files are always present on an NTFS volume. The APIs that Windows NT provides
to support defragmentation do not move these files, so they cannot be defragmented
while Windows NT is running.
But there are many other such files, and as with DOS, they present problems for
defragmenters. Call them Windows NT System Files. For example, NTDETECT.COM, the
hardware detection program, and ntldr, the Windows NT loader program, are Windows NT System
Files. Some notebooks, with proper support, will have large hibernation files, the
size of physical memory. Most important to everyday use and performance is
the paging file.
Disk I/O to the paging file (\pagefile.sys) is almost always heavily fragmented,
because the memory of one process being read from or written to the paging file is not guaranteed
to be adjacent to that of the next process being accessed in the paging file. This is one
of the most crucial files for Windows NT's overall performance, because access
to it usually occurs at a point where performance is already being constrained by
memory. A large number of fragments in the paging file brings with it a severe
performance penalty.
Fragmentation of Directories
NTFS treats directories almost exactly as it treats files. In fact, directories
are just another type of file, although they have special types of attributes in
their MFT records. Normally applications manage the contents of the data in their
files; in the case of directories, it is NTFS that manages the contents, which are
B+ trees that provide indexed access to files in the directories.
Some directories, such as most application program file directories, aren't
likely to grow or shrink much over their lifetimes. But some directories, such as
the TEMP directory or user document directories, are likely to grow and shrink considerably.
As the number of files in a directory grows, NTFS can grow the directory storage
to accommodate it. In the right circumstances, if the content of the directory shrinks,
NTFS can also free up the unused space in the directory, but this doesn't happen
very often.
The directories that are likely to grow and shrink are also the type likely
to have been created early in the system's life, such as My Documents and TEMP.
Therefore it is likely that, as they grow, their growth will be non-contiguous.
These are also likely to be heavily used directories, so this fragmentation is likely
to have a real impact on system usage.
Users should also be aware that deeply nested directories may present an organizational
convenience, but there is a performance penalty for them. When NTFS searches its
B+ trees for data, it does so once for each level in the directory subtree. Therefore
performance may be better with flatter trees that have larger numbers of files in
them than with deeper trees that have fewer files in each. Very deep subtrees can
also create problems for applications that have limits to the number of characters
in a complete file path. Many applications limit such a name to 255 characters.
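The cost of nesting can be estimated by counting one directory search per path component, as the text describes. This sketch assumes exactly one search per level and uses the 255-character limit quoted above; the function names and sample paths are illustrative only.

```python
from pathlib import PureWindowsPath

MAX_PATH_CHARS = 255   # limit the text attributes to many applications

def lookups(path):
    """Directory searches needed to resolve a path, one per component below the root."""
    parts = PureWindowsPath(path).parts
    return len(parts) - 1          # the first part is the drive and root

def too_long(path):
    """True if the complete path exceeds the assumed application limit."""
    return len(path) > MAX_PATH_CHARS

flat = r"C:\Docs\report.doc"
deep = r"C:\Docs\1999\Q4\Drafts\Final\report.doc"

print(lookups(flat), lookups(deep))   # 2 6
print(too_long(deep))                 # False
```

Under this model the deeper path costs three times as many directory searches as the flat one for the same file.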
Fragmentation of the MFT Itself
Normally the MFT uses one entry per file or directory. The area on the disk reserved
for the MFT begins life at the time the volume is formatted with about 12.5% of
the total volume space reserved for the MFT. This reserved space (the "MFT
zone") and the MFT itself are not movable. If everything goes well, the MFT
as pre-allocated will be more than up to the task of tracking file and directory
metadata.
But when a file becomes very fragmented, it increases the amount of data NTFS needs
to store in the MFT record in order to track the various fragments or extents. Eventually
the MFT record is not large enough to store the data, and NTFS must allocate another
record. Because of this, keeping the disk generally defragmented helps to prevent
the MFT from becoming fragmented.
Part of the problem with the MFT is that it will grow if necessary, but will never
contract. In a system with a large number of files, or one that is heavily fragmented,
the MFT may run out of available entries. In this case, NTFS will expand the MFT
in 32 record chunks.
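The figures in this section lend themselves to a quick calculation. The sketch below takes the 12.5% reservation and the 32-record expansion from the text, and the 1024-byte record size from the earlier discussion of cluster sizes. It assumes one MFT record per file, which the text notes is the normal case but not true of heavily fragmented files; the function names are hypothetical.

```python
MFT_RECORD_BYTES = 1024      # record size given earlier in the paper
EXPANSION_RECORDS = 32       # records added per MFT expansion
MFT_ZONE_FRACTION = 0.125    # share of the volume reserved at format time

def mft_zone_bytes(volume_bytes):
    """Approximate size of the MFT zone on a freshly formatted volume."""
    return int(volume_bytes * MFT_ZONE_FRACTION)

def expansions_needed(extra_records):
    """Number of 32-record expansions needed to add `extra_records` entries."""
    return -(-extra_records // EXPANSION_RECORDS)   # ceiling division

MB = 1024 ** 2
GB = 1024 ** 3

print(mft_zone_bytes(4 * GB) // MB)            # 512 (MB reserved on a 4GB volume)
print(EXPANSION_RECORDS * MFT_RECORD_BYTES)    # 32768 (bytes per expansion)
print(expansions_needed(100_000))              # 3125
```

Each expansion adds only 32KB, so growing the MFT by 100,000 records takes thousands of separate expansions, each of which is a chance for a new fragment if the space after the MFT is already occupied.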
Because use of the volume after it is formatted creates files physically following
the MFT zone, expansions of the MFT can be made contiguously only if no other files are
in the MFT zone. If other files have already been written into the zone, the new MFT
entries are placed in a separate fragment. These new entries will contain metadata describing
recently created files that are likely to be used, and performance in using them will suffer greatly.
As noted above, if the MFT begins to fragment, it is better to have a larger cluster
size on the volume, as this will limit the number of fragments.
Temporary files are one of the principal ways in which large numbers of files can be created, and the effect is insidious. Users aren't usually aware of the number of temporary files created during operations like compiling, word processing, and even using the Internet; Microsoft's Internet Explorer creates a particularly large number of temporary files. Heavy use of such files and failure to clean them up can fragment not just files and free space, but the MFT itself. Users should regularly run utilities, included with recent versions of Windows and available from third parties, to clean up unused temporary files, shortcuts that point to nowhere, and other "Windows droppings" that accumulate over time.
Workstation Specific Issues
Even though the typical server has much more I/O than the typical workstation
(that's what it's there for, after all), workstations are still subject
to much fragmentation. They need to be defragmented on a regular basis just as servers
do.
Workstations share with servers the issues of service pack installation and its
corresponding fragmentation. They have the same system files and paging issues as
servers. And even in an environment where workstation users run all their programs
off a server, and store all their data files on a server, workstation users still
likely have their temp directories stored on local storage. In fact, they may have
considerably more temporary files than servers because they are likely to have browser
cache files. Since they probably have less memory than servers, they have a lesser
ability to cache I/O data, making them more likely to perceive the performance implications
of fragmentation.
But in the real world, workstations usually have applications and local data storage
too. Because they get less attention from experienced network administrators than
servers, they may not be as efficiently managed.
For these reasons it is important not only that workstations be defragmented on
a regular basis, but that an automated system be set for doing so. Regular end-users
are less likely to monitor the state of their file systems than computing professionals.
Server Specific Issues
Fast disk I/O is a priority for almost any server. Whether that system is serving
data from a database, running an accounting package, or simply serving files requested
by clients, disk I/O is a crucial part of the work performed.
In fact, the trend in the computer industry is to put more and more on the server
and to manage it there. This is the basis of all the major trends in software from
Microsoft and others in the NT market, where logic is moving away from clients and
onto middle and back-end "tiers," a.k.a. servers, where the data can be
more efficiently managed.
These servers are expected to interact with numerous clients and other servers on
the network, and in the process of doing so they usually interact with the file
system. Web/Intranet servers are especially likely to have large numbers of files
to manage.
Most web servers, both on the Internet and corporate intranets, serve a combination
of static and dynamic web pages. Both types of web page involve, at a minimum, reading
from the file system. A static web page server is completely analogous to a conventional
file server. A dynamic web server takes a combination of script files, template
files and user input, and constructs a response page to send to the client program.
The dynamic server is not likely to write that file out to disk before sending it,
but the scripting engine, database server, and other mid-level and back-end components
involved in the operation almost certainly use temporary file storage. More sophisticated
setups, such as transaction processing systems using systems like Microsoft Transaction
Server, frequently write temporary storage.
It's important not to confuse the "compact" or "optimize"
utilities that come with many server applications to defragment their data sets
with file system defragmentation. Storage at the database level could be completely
defragmented in the view of the database server software, but badly fragmented in
the file system, and vice-versa. Either type of fragmentation is bad for performance,
but file system fragmentation is probably worse, because a fragmented database in
a contiguous file will still not likely need much disk movement to find any particular
record. But an internally defragmented database stored in multiple file system fragments
is likely to be slow. The exact impact of internal fragmentation depends on specifics
of the application and data; in the case of databases, the data access method is
critical. If usage is dominated by random access or small data items, internal fragmentation
may not affect performance much. If access is sequential, internal fragmentation
could cripple performance.
Fast disk performance is essential to all these systems. The difference with servers
is that their performance is important to the entire group using them, and potentially
to the entire enterprise. Some problems on servers can be mitigated by separating
the system and data onto separate volumes, which is advisable on any operating system
for reasons other than avoiding fragmentation.
IV. The Implications of Fragmentation
Now that we have established the reasons why Windows NT systems are subject to file
system fragmentation, we will examine some performance tests that analyze the effect
and discuss solutions to the problem.
Fragmentation is Difficult to Test
The complexity of modern file systems and the variety of programs and data found
in the real world make it difficult to arrive at test numbers that are applicable
to all potential users. Even if two systems have the exact same data and programs,
they will quickly diverge in file layout because no two users will do exactly the
same thing with them. This is as it is with many computer-testing issues.
And yet, in order to test disk fragmentation in a way that is repeatable and reliable,
it is first necessary to create multiple test systems that are fragmented in the
same way. There are two ways to do this.
The first is to obtain a fragmented disk (which isn't hard: just take
one that has been in service for a long time) and make a tape image copy of
it. You can restore the image onto any system on which you wish to test.
While this approach has the advantages of being easy to set up and creating a real
world configuration, there are several problems with it. First, it is tied to a
particular size of hard disk. Over the years the average size of a disk is likely
to increase, and the test will not be transferable. Second, the files on the disk
will likely be associated with older versions of programs. Testing, in subsequent
years, using old versions of applications is largely a legitimate exercise, but
makes the test seem more distant from real world circumstances, and may even miss
relevant changes in behavior by newer versions of software. Note that if the disk
is a boot disk, it will probably be using an older version of the operating system,
which compounds this problem further.
For these reasons, NSTL chose the alternative approach to deterministic fragmentation
of a disk volume. NSTL wrote an application, named Fragger, which fragments
the files on a hard disk in a controlled and repeatable fashion. Using this application,
the same data set can be fragmented repeatedly on any number of differently sized
disks. Different data sets, such as different versions of an application, can be
fragmented in the same manner to test that effect as well.
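The idea of controlled, repeatable fragmentation can be illustrated with a small sketch. This is not NSTL's Fragger, whose internals are not described in this paper; it is a hypothetical model in which each file is split into a fixed number of pieces and the pieces are interleaved using a seeded random generator, so the same inputs always produce the same layout. It assumes every file is at least as large as the number of pieces.

```python
import random

def fragment_layout(file_sizes, pieces, seed=1999):
    """Return {file: [(start, length), ...]}, each file split into `pieces` extents.

    The only source of randomness is a generator seeded with `seed`, so the
    same arguments always yield the same layout.
    """
    rng = random.Random(seed)
    chunks = []
    for name, size in sorted(file_sizes.items()):
        base, extra = divmod(size, pieces)
        for i in range(pieces):
            # Spread any remainder over the first `extra` pieces.
            chunks.append((name, i, base + (1 if i < extra else 0)))
    rng.shuffle(chunks)                       # interleave pieces of different files
    layout = {name: [None] * pieces for name in file_sizes}
    cursor = 0
    for name, i, length in chunks:
        layout[name][i] = (cursor, length)    # keep each file's pieces in file order
        cursor += length
    return layout

files = {"a.doc": 10, "b.xls": 7}
first = fragment_layout(files, pieces=3)
second = fragment_layout(files, pieces=3)
print(first == second)   # True
```

Because the layout is expressed as offsets from the start of the data area, the same call describes the same fragmentation pattern on any disk large enough to hold the files.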
NT Performance is Impeded by Disk Fragmentation
The theoretical analysis above demonstrates the way NTFS operates and the reasons
why files and free space on NTFS volumes become fragmented. Testing by NSTL for
Diskeeper Corporation using Fragger indicates that such fragmentation has
a negative effect on system performance.
Detailed results of NSTL testing of fragmented NTFS volumes for Diskeeper Corporation, and of the benefits of defragmentation using Diskeeper Corporation's Diskeeper 4.5, are available as a separate paper. We include some highlight results here for illustrative purposes.
In a benchmark using Microsoft Excel, NSTL tested two configurations with three levels of fragmentation each. At the first level the disk was completely defragmented; at the second, the application data and program files were 13% fragmented; and at the third, the application and NT paging file were fragmented a total of 38%. Fragmentation levels were somewhat lower in the second configuration. This test is illustrative of the effects of file system fragmentation on a typical Windows NT workstation.
The interesting comparison in these tests is between the defragmented level
and the level where the application was 13% fragmented. Performance, measured
by the time it took for certain Excel macros to complete, degraded substantially:
the tests took almost twice as long to complete.
NSTL also performed similar tests on servers running Microsoft SQL Server and on
combinations of clients and servers running Microsoft Outlook and Microsoft Exchange
respectively. These tests demonstrate the effects of fragmentation on clients and
servers in a busy Windows NT corporate network.
In the Outlook/Exchange tests, defragmentation made the workstation
5.9% to 55.6% faster, depending on the test. Defragmentation on the server
improved performance by as much as 80.8%. In the SQL Server tests, some tests
were over 100% faster when the server was defragmented.
In all these tests, the hardware being constant, the tests being identical, the
only difference in configuration being the amount of fragmentation in the file system,
we must conclude that fragmentation in the file system has a harmful effect on system
performance, and potentially a severe one.
Enterprise Systems are More Susceptible to These Problems
The NSTL results for Exchange and SQL Server are especially relevant to corporate
administrators and users, because enterprise systems are especially vulnerable to
fragmentation.
NT Servers, depending on the specific configuration, can manage a very large number
of files. Furthermore, by their nature, servers in an enterprise environment are
critical to large numbers of users. What slows one down slows all the users, and
therefore the enterprise as a whole.
A typical network file server may have numerous home document directories for a
large number of users. In the course of a busy day, many different users will write
data to the hard disk. Keeping files contiguous in a heavy-use environment is a
tall order for a file system.
Without careful management by an administrator, the files and free space on a server
will eventually fragment. As we have seen, the operating systems and server applications
that run on these systems have upgrades, official security patches and service packs,
and application of these can cause fragmentation in the system, so even a properly
maintained and managed enterprise server can become badly fragmented.
RAID Systems are Susceptible to Fragmentation
RAID systems, using both hardware RAID and Windows NT Server's support for
software RAID, are also susceptible to file fragmentation and need defragmentation.
Designers use RAID both for reasons of performance and robustness. By performing
logically contiguous disk operations on multiple disks in parallel, instead of one
longer operation on a single disk, raw I/O throughput is improved. Robustness comes
in certain RAID configurations from redundancy of storage or use of parity information
to enable recovery of data in the event of a physical error.
There is a small chance that, in a RAID configuration, a fragmented file would not
incur an I/O cost, if the file fragments happened to be in the same data stripe.
But in the usual case, the effect of fragmentation on a RAID system is the same
as on a non-RAID system: additional head movement and I/O will be necessary in order
to perform file operations.
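Whether two fragments fall in the same stripe can be checked with a simple mapping from logical block to stripe and disk. The sketch below assumes a plain striped layout without parity, four disks, and 16 blocks per stripe unit; these parameters and the function name are illustrative, not taken from any particular RAID product.

```python
def stripe_location(block, disks=4, blocks_per_unit=16):
    """Map a logical block to (stripe number, disk index) in a simple striped layout."""
    unit = block // blocks_per_unit    # which stripe unit the block falls in
    return unit // disks, unit % disks

print(stripe_location(10))     # (0, 0)
print(stripe_location(50))     # (0, 3): same stripe, different disk
print(stripe_location(5000))   # (78, 0): a distant stripe
```

Fragments at blocks 10 and 50 share a stripe and can be fetched by different disks in parallel. A fragment at block 5000 is 78 stripes away, so the disk holding it must seek just as a single disk would.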
A disk defragmenter, such as Diskeeper, views file order in the same way that NT
does, whatever the physical organization of files on one or more disks, and will therefore optimize them properly.
Disk Caching Mitigates, Doesn't Eliminate These Problems
Because disk caching lowers the amount of necessary physical disk I/O, it improves
performance in general, even on defragmented systems. On a heavily fragmented system,
it especially helps because the cost of I/O on an average file basis is so much
higher.
Disk cache memory competes for space with general application memory. As memory
requirements of applications and operating systems have increased over the years
(Windows 2000 Professional will require 128MB RAM and Windows 2000 Server will require
256MB), memory has become cheaper, but not all servers have kept up.
And disk caching can only delay the inevitable writing of data to disk; NT must
periodically flush its cache for safety reasons and to checkpoint the disk. With
luck and good strategy, this writing of data can be done asynchronously and at a
time when it will not delay any other running tasks, but sometimes it can't.
Consider that the workstation in the NSTL Excel tests had 128MB of RAM, a healthy
amount for a workstation, which should allow NT ample room for caching, and yet
fragmentation still slowed the system.