File Fragmentation: Why Scattered Files Are Hard to Recover

A file’s data clusters can sit anywhere on the disk. The file system tracks the order, but the storage itself doesn’t enforce it. When clusters are scattered (instead of stored in a single contiguous block), the file is fragmented. Modern file systems handle this transparently for normal access, but fragmentation creates major complications when recovery has to work without the file system metadata. Most file carving tools assume contiguous storage and fail on fragmented files.

File fragmentation is the condition where a file’s data clusters are stored in non-contiguous locations on a storage medium rather than as a single sequential block. Fragmentation arises naturally during normal file system use as files are created, deleted, modified, and grow over time; the file system allocates clusters wherever space is available. While modern file systems handle fragmentation transparently for normal file access (using metadata to track which clusters belong to which file), fragmentation creates significant complications for data recovery: signature-based recovery techniques like file carving traditionally assume files occupy contiguous clusters and fail when files are fragmented.

What File Fragmentation Actually Is

The file system divides each file into one or more storage clusters. For a contiguous file, the clusters are sequential (cluster 100, then 101, then 102, then 103, etc.). For a fragmented file, the clusters are scattered: cluster 100, then 247, then 891, then 1456, then 2103. Both arrangements are valid; both store the same file data; both are accessed transparently by normal file operations.1

A concrete example

Consider a 20 KB JPEG image saved to an NTFS partition with 4 KB clusters. The image needs five clusters to store its data. Two scenarios:

  • Contiguous: the image lives in clusters 1000, 1001, 1002, 1003, 1004. The file system records “JPEG starts at cluster 1000, length 5 clusters.”
  • Fragmented: the image lives in clusters 1000, 1547, 1548, 4203, 4204. The file system records the cluster list explicitly: “JPEG occupies clusters 1000, 1547, 1548, 4203, 4204.”

Reading the image from the contiguous arrangement requires one read operation that spans five sequential clusters. Reading the fragmented version requires three read operations (one for cluster 1000, one for clusters 1547-1548, one for clusters 4203-4204) and seeks between them. The user sees an identical JPEG; the difference is invisible at the application level.
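
To make the difference concrete, here is a minimal Python sketch (the cluster numbers are the hypothetical ones from the example above) that groups a file's ordered cluster list into contiguous (start, length) runs; the number of runs is the number of separate read operations needed:

    def cluster_runs(clusters):
        """Group an ordered cluster list into (start, length) runs
        of contiguous clusters."""
        runs = []
        for c in clusters:
            if runs and c == runs[-1][0] + runs[-1][1]:
                start, length = runs[-1]
                runs[-1] = (start, length + 1)   # extends the current run
            else:
                runs.append((c, 1))              # starts a new run (a new fragment)
        return runs

    print(cluster_runs([1000, 1001, 1002, 1003, 1004]))  # [(1000, 5)]: one read
    print(cluster_runs([1000, 1547, 1548, 4203, 4204]))  # three runs: three reads

The run list in the fragmented case is, in effect, what NTFS data runs encode, which leads into the next section.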

External vs internal fragmentation

The fragmentation discussed here is “external fragmentation”: a file’s clusters are scattered across the disk. There’s a separate concept of “internal fragmentation,” which is the wasted space within an allocated cluster when the file’s size doesn’t fill it. Internal fragmentation is what data recovery literature usually calls slack space. The two concepts are unrelated despite the shared “fragmentation” word; this entry covers external fragmentation, which is the recovery-significant phenomenon.

How file systems track fragmented files

Each file system has its own mechanism for recording where a file’s clusters live:

  • NTFS uses data runs in the MFT record. A data run encodes a sequence of contiguous clusters as (start, length) pairs; a fragmented file has multiple data runs in its MFT record.
  • FAT32 uses cluster chains in the File Allocation Table. Each cluster’s FAT entry contains the number of the next cluster; fragmented files have non-sequential cluster numbers in their chains.
  • ext4 uses extents: contiguous ranges of clusters represented compactly. A fragmented file has multiple extents.
  • APFS uses extents similar to ext4, with copy-on-write semantics that further complicate fragmentation behavior.

For normal file access, the file system reads its tracking metadata, follows the cluster references, and presents the file to the application as a contiguous stream of bytes. The fragmentation is invisible above the file system layer.
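
As a rough illustration of the FAT mechanism described above, the sketch below models the File Allocation Table as a plain dictionary mapping each cluster to the next one in the chain. The table contents and the end-of-chain constant are simplified stand-ins for illustration, not the on-disk FAT32 format:

    EOC = 0x0FFFFFF8  # simplified: real FAT32 treats values >= this as end-of-chain

    # Toy FAT for the fragmented JPEG above: cluster -> next cluster in the chain.
    fat = {1000: 1547, 1547: 1548, 1548: 4203, 4203: 4204, 4204: EOC}

    def walk_chain(fat, first_cluster):
        """Follow a cluster chain from the directory entry's first
        cluster until the end-of-chain marker."""
        chain, cluster = [], first_cluster
        while cluster < EOC:
            chain.append(cluster)
            cluster = fat[cluster]
        return chain

    print(walk_chain(fat, 1000))  # [1000, 1547, 1548, 4203, 4204]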

Why Fragmentation Happens

Fragmentation isn’t a defect; it’s a natural consequence of how file systems allocate space over time. Several patterns produce fragmentation in normal use.2

File growth after creation

When a file is first written, the file system allocates clusters for the file’s initial size. If the user later appends to the file (saving more data to a document, adding records to a database), the file system needs to allocate additional clusters. If contiguous clusters aren’t available adjacent to the file’s existing clusters, the file system allocates from elsewhere on the disk, fragmenting the file. Database files, virtual machine disk images, log files that grow over time, and email mailboxes are particularly prone to this pattern.

Fill-and-delete cycles

Normal file system use creates and deletes files continuously. Each delete leaves a gap of available space; each new file gets allocated from the available pool. Over time, the available space becomes a patchwork of small gaps rather than large contiguous regions. New files have to fragment to fit into the available patchwork; the longer the drive has been in use, the worse the fragmentation pattern.

Allocation algorithm choices

File systems make different allocation decisions to balance fragmentation against performance; a small simulation contrasting the first two strategies follows the list:

  • First-fit: allocates from the first available region large enough. Fast but produces heavy fragmentation over time.
  • Best-fit: allocates from the smallest available region that fits. Slower but reduces overall fragmentation.
  • Delayed allocation (used in ext4): defers cluster assignment until the file is actually written, allowing the file system to allocate larger contiguous regions when possible. Reduces fragmentation significantly.
  • Extent-based allocation (used in ext4 and APFS): allocates large contiguous ranges as a single unit, naturally producing less fragmentation than per-cluster allocation.
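
A toy allocator (not any real file system's code) makes the first two strategies concrete: given the same patchwork of free gaps, first-fit splinters the first large-enough gap, while best-fit consumes an exact-size gap and preserves the large ones:

    def allocate(free_runs, size, strategy):
        """Choose a free (start, length) run for `size` clusters.
        first-fit: first run big enough; best-fit: smallest run that fits."""
        candidates = [r for r in free_runs if r[1] >= size]
        if not candidates:
            return None  # no single gap fits: the file would have to fragment
        chosen = candidates[0] if strategy == "first" else min(candidates, key=lambda r: r[1])
        free_runs.remove(chosen)
        start, length = chosen
        if length > size:
            free_runs.append((start + size, length - size))  # leftover splinter
            free_runs.sort()
        return (start, size)

    # Free gaps left behind by earlier deletes, as (start, length) in clusters.
    gaps = [(100, 8), (300, 3), (900, 20)]
    print(allocate(list(gaps), 3, "first"))  # (100, 3): splits the 8-cluster gap
    print(allocate(list(gaps), 3, "best"))   # (300, 3): uses the exact-size gap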

File types prone to heavy fragmentation

Certain file types fragment more heavily than others due to how applications use them:

  • Outlook PST files: Garfinkel’s research found up to 58% of Outlook PST files in real-world drive samples were fragmented. The pattern reflects how Outlook continuously appends new mail to the PST file.
  • Database files: SQL Server MDF files, MySQL InnoDB files, SQLite databases all grow incrementally as records are added.
  • Virtual machine disks: VMDK, VHD, VHDX files grow as the VM uses storage; thin-provisioned disks are particularly fragmentation-prone.
  • Video files: particularly when recorded directly to disk (security camera DVR systems, screen recordings); the streaming write pattern can produce highly fragmented files.
  • Log files: system logs, application logs, web server access logs all append continuously.
  • Email databases: Mbox files, Maildir files, modern email client databases all grow over time.

Garfinkel’s fragmentation statistics

Simson Garfinkel’s foundational 2007 research on file fragmentation produced widely-cited rates from real-world drive samples:

  • Up to 58% of Outlook PST files were fragmented
  • 17% of JPEG image files were fragmented
  • 16% of Microsoft Word documents were fragmented

These rates established that fragmentation is not an edge case: it affects a substantial fraction of the files on real drives. The numbers also motivated subsequent research into fragmented-file carving algorithms (SmartCarving, bifragment gap carving) that specifically target the fragmentation problem.

Why Fragmented Files Are Hard to Recover

Recovery from fragmented files is dramatically harder than recovery from contiguous files, because recovery techniques fall into two categories that respond very differently to fragmentation.3

File-system-based recovery handles fragmentation gracefully

When file system metadata is intact (recently deleted files where the MFT record or directory entry still exists), recovery software simply reads the metadata, follows the cluster references, and reads the file’s clusters in their recorded order. The fragmentation pattern is encoded in the metadata; recovery is no harder than normal file access. This is the primary recovery path for most consumer recovery scenarios and works equally well for contiguous and fragmented files.
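
In code, this recovery path is almost trivial. A minimal sketch, assuming the file's (start, length) cluster runs have already been decoded from surviving metadata (the image path and run list are hypothetical):

    def read_file(device, runs, file_size, cluster_size=4096):
        """Metadata-based recovery: read the recorded cluster runs in
        order, then trim the result to the file's true byte length."""
        data = bytearray()
        for start, length in runs:
            device.seek(start * cluster_size)
            data += device.read(length * cluster_size)
        return bytes(data[:file_size])  # drop the slack in the final cluster

    # with open("disk.img", "rb") as img:
    #     jpeg = read_file(img, [(1000, 1), (1547, 2), (4203, 2)], 20 * 1024)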

Signature-based recovery breaks on fragmented files

When file system metadata is gone (formatted volumes, corrupted MFT records, severely damaged file systems), recovery falls back to file carving: scanning the disk for known file signatures and reconstructing files from raw cluster content. The Forensics Wiki documentation captures the central problem: “The majority of file carving programs will only recover files that are contiguous on the media (in other words files that are not fragmented).”

The standard carving approach reads forward from a recognized signature until it hits a footer, a known file size, or invalid content. For contiguous files, this works well. For fragmented files, the cluster after the first fragment belongs to a different file (or is unallocated), and the carving tool either:

  • Stops too early (the next cluster looks invalid; the tool truncates the file to the first fragment, producing a partial file).
  • Reads through (the next cluster’s content gets included in the output, producing a corrupted file with foreign data inserted).
  • Stops at the wrong boundary (the tool finds something that looks like a footer somewhere in unrelated data).

Why most carvers don’t handle fragmentation

Adding fragmentation handling to a carver is computationally expensive. To find the rest of a fragmented file, the carver has to search the entire unallocated space for clusters that might continue the file, validate each candidate, and order the candidates correctly. For files with multiple fragments, the search space grows combinatorially; brute-force approaches are infeasible on real-world drives. Specialized algorithms (covered in the next section) reduce the search space using heuristics, but most basic carving tools (Foremost, Scalpel) don’t implement them and produce poor results on fragmented files.

The DFRWS Carving Challenge

The Digital Forensic Research Workshop introduced the DFRWS Carving Challenge in 2006, which established fragmentation as the central unsolved problem in file carving. The challenge dataset included deliberately fragmented files; competitors had to recover them as completely as possible. The challenge motivated subsequent algorithmic work on fragmented-file recovery and remains a reference benchmark for evaluating carving tools.

Fragmentation Recovery Techniques

Several specialized techniques attempt to recover fragmented files where standard carving fails. Each has its own trade-offs and applicability range.4

Header / Footer carving (the baseline)

The simplest approach: find a header signature, read forward until finding a footer signature, treat everything between as the file. The SalvationDATA reference confirms this as the foundational method. Works only for contiguous files with both header and footer signatures (JPEG, PDF, ZIP). Produces broken or wrong files for fragmented inputs. Despite the limitation, it’s the fastest method and the default mode for tools like Scalpel and Foremost.
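
A minimal Python version of this baseline shows how little the method assumes, and why it breaks on fragmented input. The marker constants are real JPEG start-of-image and end-of-image signatures; everything else is a simplified sketch of what tools like Foremost and Scalpel do with configurable signature tables:

    JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker prefix
    JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker

    def carve_jpegs(image):
        """Header/footer carving over a raw image. Assumes contiguity:
        a fragmented JPEG comes out truncated or with foreign data."""
        files, pos = [], 0
        while (start := image.find(JPEG_HEADER, pos)) != -1:
            end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
            if end == -1:
                break                 # header with no footer: give up
            files.append(image[start:end + len(JPEG_FOOTER)])
            pos = end + len(JPEG_FOOTER)
        return files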

Header-Size carving

Some file types embed their total size in the header (for example, PNG includes total chunk lengths; ZIP includes uncompressed-size and compressed-size fields). The carver reads the header, extracts the size, then reads exactly that many bytes after the header. Works only for contiguous files; the read-forward assumption still applies.
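
The sketch below illustrates the idea using PNG's chunk structure: every chunk records its own length, so walking chunks from the signature to the IEND chunk yields the file's total size without needing a footer scan. It's a simplified illustration; production carvers also validate chunk CRCs along the way:

    import struct

    PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

    def png_total_size(data, offset):
        """Walk PNG chunks from a signature hit at `offset` and return
        the file's total byte length. Assumes the data is contiguous."""
        pos = offset + len(PNG_SIGNATURE)
        while pos + 8 <= len(data):
            (length,) = struct.unpack(">I", data[pos:pos + 4])  # big-endian chunk length
            ctype = data[pos + 4:pos + 8]
            pos += 8 + length + 4          # length/type header + payload + CRC
            if ctype == b"IEND":
                return pos - offset        # total size of the carved PNG
        return None                        # ran off the end: truncated or fragmented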

Bifragment gap carving

The Klennet Carver documentation describes bifragment gap carving as a specialized technique for files split into exactly two fragments with a gap of unrelated data between them. The technique works because for a file with known total size and known boundaries (header at one end, footer at the other), the size of the gap can be computed: gap size equals the total distance between header and footer minus the known file size.

The carving tool then tests possible gap positions, validating each candidate by checking whether the resulting file is structurally valid. The major advantage is that bifragment gap carving works with a validation function alone, without requiring a proximity function. For each possible gap position, the tool runs validation; the position that produces a valid file is the correct gap location. The limitation is that the technique only handles files split into exactly two fragments; files with three or more fragments require different approaches.
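
The core loop is simple enough to sketch. The version below assumes block-aligned fragments and takes the validation function as a parameter (for a JPEG, roughly "does this decode without errors?"); it illustrates the technique in general, not Klennet's actual implementation:

    def bifragment_carve(image, header_pos, footer_end, file_size, block, validate):
        """Slide a computed-size gap between a header and footer, splicing
        out each candidate gap until the validator accepts the result."""
        gap = (footer_end - header_pos) - file_size  # foreign bytes inside the span
        if gap <= 0 or gap % block:
            return None
        for gap_start in range(header_pos + block, footer_end - gap, block):
            candidate = image[header_pos:gap_start] + image[gap_start + gap:footer_end]
            if validate(candidate):  # structurally valid file => correct gap position
                return candidate
        return None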

SmartCarving (Pal, Memon, Sencar, Shanmugasundaram)

The Wikipedia file carving documentation describes SmartCarving as a more general algorithm. Unlike bifragment gap carving, SmartCarving handles files split into many fragments. The algorithm has three phases:

  1. Preprocessing: blocks are decompressed or decrypted if necessary, normalizing them for analysis.
  2. Collation: blocks are sorted according to their file type using content-based heuristics about the fragmentation behavior of known file systems.
  3. Reassembly: blocks are placed in sequence to reproduce the deleted file using statistical methods to determine the most likely fragment ordering.

SmartCarving is the basis for the Adroit Photo Forensics and Adroit Photo Recovery applications from Digital Assembly. The approach is more computationally expensive than traditional carving but recovers fragmented files that traditional methods would miss entirely.
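
To give a flavor of the collation phase (the published algorithm uses far richer per-file-type models than this), the toy classifier below buckets raw blocks by coarse content statistics: printable bytes suggest text, near-maximal entropy suggests compressed media such as JPEG data:

    import math
    from collections import Counter

    def entropy(block):
        """Shannon entropy in bits per byte: JPEG data sits near 8, text far lower."""
        n = len(block)
        return -sum(c / n * math.log2(c / n) for c in Counter(block).values())

    def collate(block):
        """Toy collation heuristic: bucket a block by coarse content type."""
        if all(32 <= b < 127 or b in (9, 10, 13) for b in block):
            return "text"
        return "compressed-media" if entropy(block) > 7.5 else "structured"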

Sequential hypothesis testing

Wikipedia notes that “state-of-the-art file carving algorithms use statistical techniques like sequential hypothesis testing for determining fragmentation points.” The approach analyzes the data stream for statistical anomalies that indicate where one file ends and another begins, allowing the carver to identify fragment boundaries even when no clear footer signature exists.
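
As a crude stand-in for the idea, the sketch below compares the byte distribution of adjacent blocks and flags boundaries where it shifts sharply. A real sequential hypothesis test accumulates evidence over several blocks rather than thresholding one comparison, and the block size and threshold here are arbitrary choices:

    from collections import Counter

    def byte_histogram(block):
        counts = Counter(block)
        return [counts.get(i, 0) / len(block) for i in range(256)]

    def fragmentation_points(data, block=4096, threshold=0.5):
        """Flag block boundaries where the byte distribution jumps,
        suggesting one file ended and unrelated data began."""
        blocks = [data[i:i + block] for i in range(0, len(data) - block + 1, block)]
        points = []
        for i in range(1, len(blocks)):
            h1, h2 = byte_histogram(blocks[i - 1]), byte_histogram(blocks[i])
            distance = sum(abs(a - b) for a, b in zip(h1, h2)) / 2  # total variation
            if distance > threshold:
                points.append(i * block)  # byte offset of the suspected boundary
        return points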

Fragment Recovery Carving (Garfinkel’s “Split Carving”)

The Forensics Wiki describes Fragment Recovery Carving as a carving method in which two or more fragments are reassembled to form the original file or object. Garfinkel previously called this approach “Split Carving.” This is the general category that bifragment gap carving and SmartCarving both fall under; the term covers any technique that explicitly handles fragmentation rather than assuming contiguity.

Tool ecosystem

Tools that implement fragmentation-aware carving:

  • Klennet Carver: implements bifragment gap carving as a primary recovery technique; published real-world data on bifragment recovery applicability.
  • Adroit Photo Forensics / Adroit Photo Recovery: implements SmartCarving for image recovery.
  • R-Studio: includes fragmentation-aware carving for many file types as part of its scanning pipeline.
  • X-Ways Forensics: supports manual fragment reassembly through its hex editor and structured analysis features.
  • PhotoRec: the open-source standard for signature-based recovery; primarily contiguous-file oriented but handles some fragmented file types with heuristics.
  • Foremost / Scalpel: open-source carvers; primarily contiguous-only.

HDD vs SSD Fragmentation Differences

The performance impact and the recovery impact of fragmentation differ significantly between hard drives and solid-state drives. Understanding the difference is important both for general system maintenance and for setting recovery expectations.

HDD physical impact: seek penalty

On a hard drive, reading non-contiguous clusters requires the read head to physically move between tracks. Each seek operation takes several milliseconds; a heavily fragmented file can require many seeks to read sequentially, dramatically slowing access. Defragmentation tools rearrange files into contiguous clusters, eliminating the seeks and improving sequential read performance. Windows includes a built-in Disk Defragmenter; third-party tools (Auslogics Disk Defrag, Defraggler) provide additional control and visibility.

SSD physical reality: no seek penalty

On a solid-state drive, there’s no read head and no physical movement. SSDs are random-access at the cell level, and reading any cell takes the same time regardless of which cells came before or after. The wear-leveling translation layer means that even logically contiguous files are physically scattered across NAND cells; the SSD’s controller distributes writes for wear-leveling reasons rather than for performance reasons. Defragmenting an SSD provides zero performance benefit while adding significant write wear to the cells. Modern Windows recognizes SSDs and skips defragmentation accordingly; manual defragmentation of SSDs is actively counterproductive.

Recovery impact is identical

From a recovery perspective, the carving complications are the same on both storage types. Logical fragmentation (the file system’s view of which clusters belong to which file) creates the carving problem; the underlying physical layout doesn’t change the carving problem. A fragmented file on an HDD is just as hard to carve as the same fragmented file would be on an SSD. The TRIM-on-SSD issue (covered in the unallocated space entry) is a separate problem that’s specific to SSDs but unrelated to fragmentation per se.

Per-file-system fragmentation rates

Different file systems produce different fragmentation rates under similar workloads:

  • NTFS: heavy fragmentation under typical Windows usage; defragmentation tools commonly used. NTFS uses an MFT record per file with data runs encoding fragmentation.
  • FAT32 / exFAT: heavy fragmentation; the FAT structure inherently encodes per-cluster pointers that fragment naturally.
  • ext4: reduced fragmentation due to delayed allocation and extent-based storage; defragmentation rarely needed.
  • APFS: low fragmentation due to extent-based copy-on-write architecture; defragmentation isn’t a normal maintenance operation.
  • Btrfs / ZFS: copy-on-write file systems can produce high fragmentation under certain workloads (large files updated in place); both have specific tools to handle the issue.

File fragmentation is the recovery scenario that distinguishes professional-grade tools from consumer-grade ones. For files where the file system metadata is intact, fragmentation is invisible: any decent recovery tool reads the metadata and reconstructs the file regardless of how scattered its clusters are. For files where metadata is gone, fragmentation becomes the dominant factor in whether recovery succeeds. Standard file carving works on contiguous files but produces broken output on fragmented ones; fragmentation-aware techniques (SmartCarving, bifragment gap carving) produce dramatically better results but require specialized tools that consumer recovery products don’t typically include.5

For consumers attempting recovery, the practical implication is that recovery success is much more variable for fragmentation-prone file types. JPEG photos, PDFs, and small documents recover reliably because they tend to be contiguous. Outlook PST files, virtual machine disks, video files, and database files have lower expected recovery rates because they’re often heavily fragmented. The recovery prospect for a deleted file depends not just on whether its content has been overwritten (the standard recovery boundary) but also on whether the file is contiguous (recoverable through standard carving) or fragmented (requires fragmentation-aware tools or surviving file system metadata). Recovery software varies in fragmentation handling capability; tools that explicitly market fragmented-file recovery (R-Studio, Klennet Carver, Adroit Photo) handle these cases significantly better than basic file recovery utilities.

For the question of whether to defragment a drive proactively, the modern answer depends on the storage type. For HDDs, periodic defragmentation improves performance and incidentally produces contiguous files that would recover better if they were ever deleted (although intentionally relying on this for backup purposes isn’t a sound strategy). For SSDs, defragmentation is harmful: it adds write wear without performance benefit, and it doesn’t help recovery prospects because the carving problem is logical-layer rather than physical-layer. Modern operating systems handle this correctly automatically; users shouldn’t manually defragment SSDs even when third-party tools offer the option.

File Fragmentation FAQ

What is file fragmentation?

File fragmentation is the condition where a file’s data clusters are stored in non-contiguous locations on a storage medium rather than as a single sequential block. Instead of clusters 100, 101, 102, 103, 104 (contiguous), a fragmented file’s clusters might be at 100, 247, 891, 1456, 2103 (scattered). The file system tracks which clusters belong to which file via internal metadata (the MFT on NTFS, FAT chains on FAT32, extent records on ext4 and APFS), so normal file access works transparently. From a data recovery perspective, fragmentation matters because it complicates recovery techniques that work without file system metadata: signature-based recovery and file carving traditionally assume files occupy contiguous clusters, and fragmented files break this assumption.

Why does file fragmentation make recovery harder?

Most file carving tools assume that a file’s data is stored in contiguous clusters. They find a file signature (the JPEG header, the PDF marker, the ZIP signature), then read forward from that point until they hit either a footer, a known file size, or a cluster boundary that doesn’t continue the file pattern. When the file is fragmented, the contiguous-read approach breaks: after the first fragment, the next cluster may belong to a different file or be unallocated, and the carving tool either stops too early (producing a truncated file) or reads through unrelated data (producing a corrupted file). Without file system metadata to indicate where the next fragment lives, the carving tool has no reliable way to find it. This is why standard file carving works well for contiguous files but produces poor results for fragmented ones.

How common is file fragmentation in practice?

Garfinkel’s 2007 research reported significant fragmentation rates in real-world drive samples: up to 58% of Outlook PST files, 17% of JPEG images, and 16% of Microsoft Word documents were fragmented. The exact numbers vary with file type and file system; large files written over time (mailboxes, databases, virtual machine disks, video files) tend to fragment heavily, while smaller files written once and never modified tend to remain contiguous. Modern file systems with delayed allocation strategies (ext4) or extent-based storage (APFS) reduce fragmentation rates compared to FAT32 or older NTFS volumes, but fragmentation remains a meaningful percentage of files on typical drives. The recovery implication is that any drive containing many large or frequently-modified files will have a substantial subset of files where fragmentation affects recovery prospects.

Does fragmentation affect SSDs the same way as HDDs?

From a logical perspective (which clusters the file system assigns to a file), fragmentation works the same way on SSDs and HDDs. From a physical perspective (where the actual data lives on the storage medium), it’s completely different. On HDDs, fragmentation causes seek penalties because the read head has to physically move between non-contiguous tracks; defragmentation tools rearrange files into contiguous clusters to improve performance. On SSDs, the controller’s wear leveling means that even logically contiguous files are physically scattered across NAND cells, and there’s no seek penalty because access is random-access at the cell level. Defragmenting an SSD provides no performance benefit and adds wear to the cells. From a recovery perspective, the carving complications from logical fragmentation apply equally to SSDs and HDDs; the underlying physical layout doesn’t change the carving problem.

What is bifragment gap carving?

Bifragment gap carving is a specialized technique for recovering files that have been split into exactly two fragments with a gap of unrelated data between them. The technique works because for a file with known total size and known boundaries (header at one end, footer at the other), the size of the gap can be computed: gap size = total distance between header and footer, minus known file size. The carving tool then tests possible gap positions, validating each candidate by checking whether the resulting file is structurally valid (a parseable JPEG, a valid PDF, etc.). The approach has limited applicability (only files split into exactly two fragments) but the major advantage that it works with a validation function alone, without requiring a proximity function. Klennet Carver implements bifragment gap carving as one of its recovery techniques.

What is SmartCarving?

SmartCarving is a fragmented-file recovery algorithm developed by Pal, Memon, Sencar, and Shanmugasundaram. Unlike bifragment gap carving, which only handles files split into two fragments, SmartCarving can reassemble files split into many fragments. The algorithm has three phases: preprocessing (blocks are decompressed or decrypted if necessary), collation (blocks are sorted by file type using content-based heuristics), and reassembly (blocks are placed in sequence to reproduce the original file using statistical methods to determine the most likely fragment ordering). SmartCarving is the basis for the Adroit Photo Forensics and Adroit Photo Recovery applications from Digital Assembly. The approach is more computationally expensive than traditional carving but recovers fragmented files that traditional methods would miss.

Related glossary entries

  • File Carving: the recovery technique most affected by fragmentation; standard carving fails on fragmented files.
  • Signature-Based Recovery: the underlying technique that breaks when fragmentation is present.
  • Unallocated Space: where deleted file fragments live; fragmentation complicates recovery from this territory.
  • Slack Space: the “internal fragmentation” concept; unrelated to external fragmentation despite the shared word.
  • Deleted File: deleted fragmented files are harder to recover than deleted contiguous files.
  • NTFS: uses data runs to track fragmentation; favorable for metadata-based recovery.
  • Data Recovery: the broader discipline; fragmentation handling distinguishes professional from consumer tools.

Sources

  1. Forensics Wiki: File carving (accessed May 2026)
  2. Aesonlabs: Recovering Data After Fragmentation. File Carving
  3. ScienceDirect (Garfinkel 2007): Carving contiguous and fragmented files with fast object validation
  4. Klennet Carver: File carving methods in data recovery
  5. Wikipedia: File carving

About the Authors

Researched & Reviewed By
Rachel Dawson
Technical Approver · Data Recovery Engineer

Rachel brings over twelve years of data recovery engineering experience, including substantial work on fragmented-file recovery cases. The pattern in fragmented-file work is that file system metadata recovery (when possible) is dramatically more productive than carving-based recovery; the practical priority during forensic acquisition is preserving and extracting whatever metadata survives, since fragmentation-aware carving is a fallback rather than a primary technique. The “metadata first, carving second” workflow reflects how successful fragmented-file recoveries typically work in practice.

12+ years data recovery engineering · Fragmented-file recovery · Forensic methodology
Editorial Independence & Affiliate Disclosure

Data Recovery Fix earns revenue through affiliate links on some product recommendations. This does not influence our reference content. Glossary entries are written and reviewed independently based on documented research, vendor documentation, independent testing, and recovery-engineer review. If anything on this page looks inaccurate, outdated, or worth revisiting, please reach out at contact@datarecoveryfix.com and we’ll review it promptly.
