Signature-Based Recovery: How File Carving Tools Find Files

Signature-Based Recovery

Signature-based recovery finds files by what’s inside them rather than where the file system says they live. Even when the file system is corrupted or gone, every JPEG, PDF, and PNG still begins with its format’s characteristic byte pattern. Recovery tools scan the raw drive for those patterns and extract the file content directly. The tradeoff: file names, paths, and timestamps are gone.


Signature-based recovery is a data recovery technique that locates files by scanning storage for distinctive byte patterns called file signatures or magic numbers, rather than relying on file system metadata. Most file formats begin with a specific sequence of bytes that identifies their type. Signature-based recovery tools scan a drive sector by sector looking for these signatures, then extract the data between the signature and either an explicit footer marker or a calculated file length. Because the technique ignores the file system entirely, it works on drives where the file system is corrupted, missing, or has been overwritten.

What Signature-Based Recovery Actually Is

Most data recovery techniques work by reading the file system. The Master File Table on NTFS, the FAT tables on FAT32, the inode tables on ext4, and the catalog on APFS all serve the same purpose: they list every file on the drive along with where its data is stored. Recovery tools that work with the file system can preserve file names, paths, timestamps, and folder structure because all that information is in the metadata. Signature-based recovery does something fundamentally different: it ignores the file system entirely and finds files by scanning the raw storage for the distinctive byte patterns inside the files themselves.1

The technique in one sentence

Most file formats begin with a specific sequence of bytes (the file signature, also called a magic number) that identifies what type of file it is. Signature-based recovery scans for those byte patterns across the entire storage device, and when it finds one, extracts the data that follows as a file of that type.

When it’s the right approach

Signature-based recovery is the technique to use when:

  • The file system is missing or corrupted: if the partition has been formatted, deleted, or had its metadata damaged, file-system-based recovery can’t work but signature-based recovery still can.
  • The drive shows as RAW: a RAW partition has no working file system. Signature-based recovery doesn’t need one.
  • You need to recover from unallocated space: deleted files in unallocated space have no file system entries pointing to them, but their content is still there until overwritten.
  • You’re working with a partial or damaged image: a disk image that’s missing the file system area but contains the data area can still yield files via signature recovery.
  • File-system recovery has failed or recovered too few files: a second pass with signature-based recovery often finds files the file-system approach missed.

Relationship to file carving

The terms “signature-based recovery” and “file carving” overlap heavily. File carving is the broader umbrella for any technique that recovers files from raw data without file system metadata. Signature-based recovery is the most common form of file carving, where the carver uses byte-pattern signatures to locate files. In practice, most modern file carving is signature-based; the terms are used interchangeably in much of the recovery industry. This entry focuses on the signature-recognition aspect specifically.

Historical origin

Magic numbers as a file-identification mechanism date back to early Unix executables and were established practice by Seventh Edition Unix (V7) in 1979; they have been part of standard computing practice ever since.2 The first widely used file carving tool was Foremost, originally developed by the U.S. Air Force Office of Special Investigations in 2001. Scalpel, an improved version of Foremost, was introduced by Richard and Roussev in 2005. PhotoRec, by Christophe Grenier (the same developer behind TestDisk), became the dominant cross-platform tool over time and is still actively maintained.

How Magic Numbers and File Signatures Work

The core idea is simple: file formats need to be self-identifying. When you double-click a file, the operating system needs to know what application to open it with. Looking at the file extension is one method, but extensions can be wrong or missing. Looking at the actual bytes inside the file is more reliable, which is why file formats include identifying byte patterns at known locations.3

Common file signatures

Each file format has its own signature. A small sample of widely-used formats:

Format | Header (hex) | Footer (if any)
JPEG | FF D8 FF | FF D9
PNG | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82
GIF | 47 49 46 38 37 61 or 47 49 46 38 39 61 | 00 3B
PDF | 25 50 44 46 | 25 25 45 4F 46
ZIP | 50 4B 03 04 | Variable; uses central directory
MP3 | FF FB or 49 44 33 (ID3) | None reliable
MP4 | 00 00 00 ?? 66 74 79 70 | None reliable
DOCX (and other Office) | 50 4B 03 04 (ZIP-based) | Same as ZIP
EXE / DLL | 4D 5A (MZ) | None reliable
RAR | 52 61 72 21 1A 07 00 | None reliable

Why headers are reliable but footers aren’t always

The header is at a known location: the very start of the file. Every JPEG begins with FF D8 FF; this is part of the format specification. Footers are less reliable because not every format has a defined end marker. JPEG and PNG do (FF D9 and the IEND chunk respectively), and recovery tools can use those to determine where the file ends. PDF has %%EOF. But many formats (executables, audio, video) have no footer; the file simply ends when its declared length runs out. For formats without footers, signature-based recovery has to either read a calculated length from the header or guess based on what comes next.
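As a rough illustration of header-based identification, the sketch below checks a buffer against a handful of the header signatures listed above. The names SIGNATURES and identify are hypothetical, and the list is a small sample rather than any tool’s real database.

```python
# A small sample of header signatures: (format name, offset in file, magic bytes).
# This list is illustrative only; real carvers ship far larger databases.
SIGNATURES = [
    ("jpeg", 0, bytes.fromhex("FFD8FF")),
    ("png", 0, bytes.fromhex("89504E470D0A1A0A")),
    ("gif", 0, b"GIF87a"),
    ("gif", 0, b"GIF89a"),
    ("pdf", 0, b"%PDF"),
    ("zip", 0, b"PK\x03\x04"),
    ("mp4", 4, b"ftyp"),  # the first 4 bytes hold a box size, so the marker sits at offset 4
    ("rar", 0, bytes.fromhex("526172211A0700")),
    ("exe", 0, b"MZ"),
]


def identify(buf: bytes):
    """Return the name of the first signature that matches buf, or None."""
    for name, offset, magic in SIGNATURES:
        if buf[offset:offset + len(magic)] == magic:
            return name
    return None


print(identify(bytes.fromhex("FFD8FFE000104A464946")))  # a JFIF-style JPEG start
print(identify(b"plain text has no magic number"))
```

Short signatures such as the two-byte MZ match random data fairly often, which is one reason a header hit is treated only as a candidate.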

Files without recognizable signatures

Some content can’t be located by signature scanning:

  • Plain text files have no magic number; ASCII or Unicode text doesn’t have an identifying signature.
  • Encrypted files appear as random bytes after encryption; if the magic number is encrypted along with the rest, it’s no longer recognizable.
  • Custom application formats may not have signatures registered with carving tools’ databases.
  • Compressed files where the compression destroys the signature pattern can be missed unless the carver knows about the wrapper format.

The Carving Process: Header to Footer

Signature-based recovery follows a consistent algorithm regardless of which tool implements it. Understanding the steps clarifies what the technique can and can’t do.4

Step 1: Sequential sector scan

The tool reads the storage device (or disk image) sequentially, sector by sector, comparing the contents of each sector against its database of known file signatures. The scan operates at the raw byte level, not at the file system level. It doesn’t care whether sectors are marked as allocated or free; it scans everything. This is why the technique works against unallocated space, deleted partitions, and damaged file systems.
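A minimal sketch of the sequential scan, assuming 512-byte sectors and a disk image exposed as a binary stream. The function names are hypothetical, and a real tool would test every known signature per sector rather than one.

```python
import io

SECTOR_SIZE = 512  # assumed; many modern drives use 4096-byte sectors


def iter_sectors(stream, sector_size=SECTOR_SIZE):
    """Yield (sector_number, data) for each sector of a raw image stream."""
    number = 0
    while True:
        data = stream.read(sector_size)
        if not data:
            break
        yield number, data
        number += 1


def find_headers(stream, magic, sector_size=SECTOR_SIZE):
    """Return the sector numbers whose first bytes equal the given signature."""
    return [n for n, data in iter_sectors(stream, sector_size)
            if data.startswith(magic)]


# Build a tiny 4-sector "image" with a JPEG header at the start of sector 1.
image = bytearray(SECTOR_SIZE * 4)
image[SECTOR_SIZE:SECTOR_SIZE + 3] = b"\xff\xd8\xff"
print(find_headers(io.BytesIO(bytes(image)), b"\xff\xd8\xff"))
```

The scan never consults allocation information, which is why the same loop covers allocated and unallocated sectors alike.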

Step 2: Signature matching

When a sector contains a byte sequence matching a known signature, the tool flags that location as a potential file start. For many formats (JPEG, PNG), the signature is at offset 0 within the file; for others, it sits a few bytes in (the MP4 ftyp marker, for example, follows a 4-byte box size). Modern carving tools read these offsets from the format definitions in their configuration databases.
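Signature definitions often need wildcard bytes and offsets, as the ?? in the MP4 entry of the signature table suggests. The sketch below parses a pattern written in that notation and matches it at a chosen offset; parse_pattern and matches are hypothetical helpers, not the configuration syntax of any specific tool.

```python
def parse_pattern(text):
    """Turn '00 00 00 ?? 66 74 79 70' into a list of ints, with None for '??'."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]


def matches(data, pattern, offset=0):
    """True if data matches pattern at offset; None entries match any byte."""
    window = data[offset:offset + len(pattern)]
    if len(window) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, window))


MP4 = parse_pattern("00 00 00 ?? 66 74 79 70")
print(matches(b"\x00\x00\x00\x20ftypisom", MP4))  # wildcard absorbs the size byte
print(matches(b"\x00\x00\x00\x20moov....", MP4))  # wrong box type
```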

Step 3: Validation

A signature match alone isn’t enough to confirm a real file; the byte sequence might appear coincidentally in other content. Sophisticated carvers perform validation checks: parsing additional bytes after the signature to confirm the structure looks like a real file of that type.5 A JPEG signature followed by valid JPEG markers is much more likely to be a real JPEG than a random match.
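One way to picture validation for JPEG is to walk the marker segments that follow the header and accept the candidate only if they chain correctly up to the start-of-scan marker. This is a simplified sketch: looks_like_jpeg is a hypothetical name, and the check ignores rarer cases such as fill bytes between segments.

```python
import struct


def looks_like_jpeg(buf: bytes, max_segments: int = 32) -> bool:
    """Walk marker segments after the FF D8 start marker.

    Accept the candidate once a start-of-scan (FF DA) segment is reached;
    reject it as soon as the segment chain stops making sense.
    """
    if not buf.startswith(b"\xff\xd8"):
        return False
    pos = 2
    for _ in range(max_segments):
        if pos + 4 > len(buf) or buf[pos] != 0xFF:
            return False
        marker = buf[pos + 1]
        (length,) = struct.unpack(">H", buf[pos + 2:pos + 4])
        if length < 2:  # the length field counts its own two bytes
            return False
        if marker == 0xDA:  # start of scan: compressed image data follows
            return True
        pos += 2 + length
    return False


good = b"\xff\xd8" + b"\xff\xe0\x00\x04ab" + b"\xff\xda\x00\x02"
bad = b"\xff\xd8\xff" + b"random text here"
print(looks_like_jpeg(good), looks_like_jpeg(bad))
```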

Step 4: Determining file length

Three approaches:

  • Footer search: for formats with reliable footers (JPEG, PNG, PDF), scan forward from the header until the matching footer is found. Everything between is the file.
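  • Length from header: some formats record their size in the file itself. BMP stores the total file size in its header, and MP4 stores the size of each top-level box, so the sizes can be added up. The tool reads the length and extracts that many bytes.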
  • Maximum size limit: for formats without footers or length fields, extract up to a configured maximum size and assume the file ends there or at the next signature.
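The three approaches above can be sketched as small helpers. The function names and the 10 MB cap are assumptions for illustration; the BMP size field (a little-endian 32-bit value at offset 2) is part of the BMP file header.

```python
import struct

MAX_SIZE = 10 * 1024 * 1024  # assumed cap, not taken from any particular tool


def length_by_footer(data, start, footer, max_size=MAX_SIZE):
    """Length from start through the first footer found, or None if absent."""
    end = data.find(footer, start, start + max_size)
    return None if end == -1 else end + len(footer) - start


def length_from_bmp_header(data, start):
    """Read the file size that a BMP stores at offset 2 of its header."""
    if data[start:start + 2] != b"BM" or len(data) < start + 6:
        return None
    (size,) = struct.unpack_from("<I", data, start + 2)
    return size


def carve_length(data, start, footer=None, max_size=MAX_SIZE):
    """Prefer a footer match; otherwise fall back to the size cap."""
    if footer is not None:
        found = length_by_footer(data, start, footer, max_size)
        if found is not None:
            return found
    return min(max_size, len(data) - start)


sample = b"xx" + b"\xff\xd8\xffAAAA\xff\xd9" + b"tail"
print(length_by_footer(sample, 2, b"\xff\xd9"))
```

A first-match footer search is deliberately naive: a JPEG with an embedded thumbnail can contain an earlier FF D9, so real tools combine the footer search with structural validation.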

Step 5: Output

Each recovered file is written to a destination (an output directory or another drive) with a sequential generated name (file001.jpg, file002.jpg) and grouped by file type into subfolders. The tool produces a log of what was recovered, where on the source it came from, and any validation warnings.
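A sketch of the output step, assuming the three-digit sequential naming used in the example above and one subfolder per file type. The helper names and the log record fields are hypothetical; each tool has its own naming and log format.

```python
from pathlib import Path


def output_path(ext: str, index: int) -> str:
    """Relative path for the index-th recovered file of a type."""
    return f"{ext}/file{index:03d}.{ext}"


def write_carved(out_dir, ext, index, payload, source_offset, log):
    """Write one carved file under out_dir and append a log record for it."""
    target = Path(out_dir) / output_path(ext, index)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    log.append({
        "file": str(target),
        "source_offset": source_offset,  # where on the source the file began
        "size": len(payload),
    })
    return target


print(output_path("jpg", 1))
```

The output directory should sit on a different device from the one being scanned, so that writing recovered files cannot overwrite data that has not been carved yet.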

Common signature-based recovery tools

Tool | Platform | Strengths
PhotoRec | Windows, macOS, Linux | Most widely used; supports hundreds of formats; user-friendly menu interface
Foremost | Linux primarily | The original (2001); fast; configurable signature definitions
Scalpel | Linux primarily | Improved Foremost (2005); better performance and flexibility
Bulk_extractor | Cross-platform | Forensic-focused; finds emails, credit cards, URLs alongside files
R-Studio / EaseUS / DiskGenius | Cross-platform | Commercial tools with signature recovery as one mode
Recoverit / Disk Drill | Cross-platform | User-friendly commercial tools combining file system and signature recovery

Why It Loses File Names and Metadata

The single most-asked question about signature-based recovery: where did all the original file names go? The answer reveals an important fact about how file systems actually store files.6

File names live in the file system, not in the file

When you create a file called “vacation_2024.jpg”, the file’s content is the JPEG image data starting with FF D8 FF. The name “vacation_2024.jpg” isn’t part of that content at all. It’s stored separately in the file system: in an MFT record on NTFS, in a directory entry on FAT32, in a directory entry that points to the file’s inode on ext4. The file system links the name to the location on disk where the JPEG content actually lives.

Signature-based recovery scans the data regions where JPEG content lives. It never reads the MFT or directory entries because those are separate. The recovered JPEG content is intact, but the link from name to content has been lost.

What metadata gets lost

Signature recovery loses everything that lives in the file system rather than the file content:

  • Original filename (“vacation_2024.jpg” → recovered as “file00001.jpg”).
  • Folder path (“Pictures/Vacations/2024/Hawaii/” → recovered to a flat output directory by file type).
  • Creation timestamp, modification timestamp, access timestamp.
  • File system attributes (read-only, hidden, system).
  • Owner and permission information on systems that track those.
  • Alternate data streams on NTFS or extended attributes on macOS.

What does survive

Information embedded inside the file content survives because it’s part of what gets recovered:

  • EXIF metadata in photos: camera model, GPS coordinates, date the photo was taken (this is inside the JPEG, not in the file system).
  • ID3 tags in MP3 files: artist, album, track name, embedded album art.
  • PDF document metadata: title, author, creation date as recorded by the PDF creator (inside the PDF, not the file system).
  • Office document metadata embedded in the file format itself.
  • Image content, audio content, video content intact in their entirety.
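Because embedded metadata travels with the content, a carved file can be checked for it directly. The sketch below only tests whether a JPEG carries an Exif segment near its start; has_exif is a hypothetical helper, and actually decoding the tags would need a dedicated EXIF parser.

```python
def has_exif(jpeg: bytes) -> bool:
    """True if an APP1 segment with the 'Exif' identifier appears near the start.

    Layout assumed: FF E1 marker, 2-byte segment length, then 'Exif' and two NULs.
    """
    pos = jpeg.find(b"\xff\xe1", 0, 65536)
    return pos != -1 and jpeg[pos + 4:pos + 10] == b"Exif\x00\x00"


with_exif = b"\xff\xd8\xff\xe1\x00\x10Exif\x00\x00MM"
without = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
print(has_exif(with_exif), has_exif(without))
```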

Why this matters for users

Users approaching signature-based recovery should understand the result format in advance. Recovery of 50,000 photos with names like file00001.jpg through file50000.jpg, grouped only by file type, can be overwhelming. Plan for the post-recovery work of identifying which files actually matter; tools like image deduplication software, EXIF-based date sorting, and visual review can help organize the recovered output. For files where the original name was the only way to identify content (named documents, project files), signature recovery may be only partially useful.
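Exact duplicates are a common source of clutter in carved output, and grouping files by a content hash is one simple way to find them. This is a sketch of that idea only: group_duplicates is a hypothetical helper that takes file contents in memory, and it will not catch resized or re-encoded copies of the same image.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files):
    """Group file names whose contents are byte-for-byte identical.

    files maps a name to its bytes; returns one sorted list per duplicate set.
    """
    by_digest = defaultdict(list)
    for name, payload in files.items():
        by_digest[hashlib.sha256(payload).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


recovered = {
    "file00001.jpg": b"photo-a",
    "file00002.jpg": b"photo-b",
    "file00003.jpg": b"photo-a",
}
print(group_duplicates(recovered))
```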

The Fragmentation Problem

The biggest technical limitation of signature-based recovery is its difficulty with fragmented files. Understanding why reveals an important limitation that no tool fully solves.7

The contiguity assumption

Signature-based recovery’s basic algorithm assumes file data is stored contiguously: find the header, read forward until the footer or calculated length, that’s the file. This works well when files are stored in unbroken sequences of sectors. It breaks when files are fragmented across multiple non-contiguous regions.

Imagine a 10 MB JPEG stored in two fragments on a drive with 512-byte sectors (taking 1 MB as 1,048,576 bytes). The first 6 MB occupies sectors 1000-13287 (12,288 sectors), and the remaining 4 MB starts at sector 50000. Sectors 13288 onward belong to a different file (a Word document, say), followed by yet other files. Signature recovery finds the JPEG header at sector 1000 and reads forward, but at sector 13288 it starts reading the Word document content as if it were JPEG data. The result: a recovered file that’s 6 MB of valid JPEG followed by 4+ MB of corrupted content, or a partial JPEG that ends prematurely.
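The failure can be reproduced at toy scale. The sketch below builds a tiny in-memory "disk" where a fake JPEG is split around an unrelated file, then carves it header to footer. Sizes, fill bytes, and names are all invented for the illustration.

```python
SECTOR = 512  # assumed sector size

# A fake 3-sector "JPEG": header, filler, footer.
jpeg = b"\xff\xd8\xff" + b"J" * (3 * SECTOR - 5) + b"\xff\xd9"
other = b"W" * (2 * SECTOR)  # an unrelated 2-sector file

# The JPEG is stored as two fragments with the other file in between.
disk = jpeg[:2 * SECTOR] + other + jpeg[2 * SECTOR:]


def naive_carve(image, header, footer):
    """Header-to-footer carving that assumes the file is contiguous."""
    start = image.find(header)
    end = image.find(footer, start)
    return image[start:end + len(footer)]


carved = naive_carve(disk, b"\xff\xd8\xff", b"\xff\xd9")
print(len(jpeg), len(carved), carved == jpeg)
```

The carved result has the right header and footer but carries the foreign file’s sectors in the middle, which is the kind of output that opens as a partially corrupted image.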

Why fragmentation happens

File systems fragment files when there isn’t enough contiguous free space for the entire file. As drives fill up with files of various sizes that get created, deleted, and modified, contiguous free regions become scarce. A new large file ends up split across whatever free fragments are available.

  • Heavily-used drives fragment more than fresh drives.
  • SSDs handle fragmentation differently from HDDs at the physical level (no seek penalty), but the file system still records fragmented allocations.
  • Some file systems (ext4, for example) use allocation strategies that limit fragmentation; FAT32 notoriously doesn’t.
  • Large files are more likely to be fragmented than small files.

What types of files survive fragmentation best

Files that recover well via signature-based methods share characteristics:

  • Photos: usually small enough (under 10 MB) to fit in contiguous free space.
  • Audio files: typically stored contiguously by media applications.
  • Short videos: if they fit in contiguous space.
  • Individual documents: small documents tend to be contiguous.

Files that recover poorly:

  • Large videos that span multiple fragments.
  • Large databases with extensive fragmentation.
  • Virtual machine disk images that are typically multi-GB and fragmented.
  • Frequently-edited large files that have grown over time.

State-of-the-art fragmentation handling

Research and modern tools have improved fragmentation handling over basic header-to-footer carving. Sequential hypothesis testing (Pal, Sencar, Memon) detects fragmentation points statistically. Some tools attempt fragment reassembly: identifying multiple fragments that might belong together and reassembling them in the correct order. None of these techniques fully solves the fragmentation problem; for heavily fragmented files, file-system-based recovery remains the only path that reliably preserves file integrity.
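The sketch below is not sequential hypothesis testing. It shows a much simpler structure-validation heuristic under the same goal: for a format with per-chunk checksums such as PNG, the first chunk whose CRC fails marks roughly where foreign data begins. The helper names are hypothetical, and the test image is synthetic rather than a decodable picture.

```python
import struct
import zlib

PNG_SIG = b"\x89PNG\r\n\x1a\n"


def make_chunk(kind: bytes, data: bytes) -> bytes:
    """Build a PNG chunk: length, type, data, CRC over type plus data."""
    crc = zlib.crc32(kind + data)
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def first_bad_chunk(buf: bytes):
    """Offset of the first chunk that fails its CRC, or None if all verify through IEND."""
    pos = len(PNG_SIG)
    while pos + 12 <= len(buf):
        (length,) = struct.unpack_from(">I", buf, pos)
        kind = buf[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(buf):
            return pos  # chunk claims more data than is available
        (crc,) = struct.unpack_from(">I", buf, end)
        if zlib.crc32(buf[pos + 4:end]) != crc:
            return pos
        if kind == b"IEND":
            return None
        pos = end + 4
    return pos  # ran out of data before reaching IEND


png = (PNG_SIG + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IDAT", b"a" * 100)
       + make_chunk(b"IDAT", b"b" * 100) + make_chunk(b"IEND", b""))
damaged = png[:160] + b"\x00" + png[161:]  # corrupt one byte inside the second IDAT
print(first_bad_chunk(png), first_bad_chunk(damaged))
```

A carver could use the returned offset as the point to stop trusting contiguous data and start searching elsewhere for the fragment that continues the file.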

Signature-based recovery is the recovery technique of last resort: when nothing else works, this often does. That makes it both invaluable and frequently misunderstood. Users approach signature recovery expecting it to be like file-system recovery with extra capability, then are surprised when the recovered files have generic names, are grouped by type rather than original folder structure, and include many partially-corrupted fragments alongside successful recoveries. Setting expectations correctly before running the scan saves significant frustration.8

The right mental model for signature-based recovery is: it’s archaeological. The technique reconstructs files from the byte-level evidence remaining on the drive, without reference to the original organization that’s been destroyed. A recovered photo collection from a heavily-used drive will include the user’s photos alongside cached thumbnails from web browsers, profile pictures from messaging apps, image attachments from emails, and screenshots from years past. Sorting through the result is part of the technique. The alternative (getting nothing at all because the file system is too damaged for traditional recovery) is much worse.

For practical use, signature-based recovery works best as the second-pass technique. Recovery software typically tries file-system-based recovery first and falls back to signature-based recovery for files the first pass missed. The two-pass approach gets the best of both: file system metadata where it exists, raw signature scanning where it doesn’t. Pair this with sector-by-sector cloning for failing drives and the recovery workflow becomes resilient: image the failing source once, run file system recovery against the image, then run signature-based recovery against the same image to capture anything the first pass missed. The image stays unchanged through both passes, so neither attempt risks the other’s results.
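One way to confirm that the image really did stay unchanged is to hash it before the first pass and again after the last. The sketch below assumes the image is available as a seekable binary stream; image_digest is a hypothetical helper, and SHA-256 is simply a common choice.

```python
import hashlib
import io


def image_digest(stream, chunk_size=1024 * 1024):
    """SHA-256 of a seekable image stream, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    stream.seek(0)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


image = io.BytesIO(bytes(4096))   # stand-in for a disk image opened read-only
before = image_digest(image)
image.seek(0)
image.read()                      # stand-in for the two recovery passes reading the image
print(image_digest(image) == before)
```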

Signature-Based Recovery FAQ

What is signature-based recovery?

Signature-based recovery is a data recovery technique that finds files by scanning storage for distinctive byte patterns called file signatures or magic numbers, rather than using file system metadata. Most file formats start with a specific sequence of bytes that identifies what type of file it is: JPEG images start with FF D8 FF, PNG images start with 89 50 4E 47 0D 0A 1A 0A, PDF documents start with 25 50 44 46. Signature-based recovery tools scan storage looking for these patterns, then extract the file data that follows. The technique works even when the file system is completely destroyed, because it doesn’t depend on the file system at all.

What’s the difference between signature-based recovery and file system recovery?

File system recovery uses metadata structures (the Master File Table on NTFS, the FAT tables on FAT32, inodes on ext4, the catalog on APFS) to find files. This recovers files with their original names, paths, timestamps, and folder structure intact. Signature-based recovery ignores file system metadata entirely and finds files by scanning for their internal byte signatures. This works when the file system is damaged or missing, but recovers files without their original names, paths, or timestamps. The two techniques are complementary: file system recovery is preferred when possible because it preserves more information, while signature-based recovery is the fallback when file system recovery doesn’t work.

What tools use signature-based recovery?

Several tools specialize in signature-based recovery. PhotoRec is the most user-friendly and widely used, supporting hundreds of file formats and running on Windows, macOS, and Linux. Foremost is the original signature-based carver from 2001, command-line, with configurable signature definitions. Scalpel is an improved version of Foremost with better performance, also command-line. Bulk_extractor scans for specific data types like email addresses, credit card numbers, and URLs in addition to files. Most full-featured commercial recovery tools (R-Studio, EaseUS, DiskGenius) include signature-based recovery as one mode alongside file system recovery.

Why do signature-recovered files lose their names?

File names aren’t part of file content; they live in the file system metadata. The Master File Table on NTFS stores each file’s name in a record alongside the pointer to where the file’s data lives on disk. Signature-based recovery finds files by scanning the data regions directly without looking at the file system, so it never sees the names. Recovered files are typically named with sequential numbers (file001.jpg, file002.jpg) or grouped into folders by file type. The original filename, the original folder path, the creation and modification timestamps, and any file system attributes are all lost. If you need names preserved, file system recovery is the right approach; signature-based recovery is the fallback when names can’t be preserved at all.

Does signature-based recovery work on fragmented files?

Mostly no, and this is the technique’s biggest limitation. Signature-based recovery assumes the file’s data is contiguous in storage; it finds the header, then reads forward expecting the rest of the file to follow. When a file is fragmented (stored in multiple non-contiguous chunks), reading forward from the header eventually hits data from a different file, producing a recovered file that’s corrupted or partial. Modern signature recovery tools include some fragmentation handling, including statistical techniques to detect fragmentation points, but reliable recovery of heavily fragmented files generally requires file system metadata that signature-based recovery doesn’t use. Files that work best with signature-based recovery are those that tend to be stored contiguously: photos and short videos, individual documents, audio files.

What can signature-based recovery not recover?

Several scenarios are outside the technique’s reach. Encrypted files appear as random bytes with no recognizable signatures, so they can’t be located. Files where the first sector containing the header is in a bad sector are typically lost because the signature can’t be matched. Very small files that don’t start on a cluster boundary (for example, files stored inside an NTFS MFT record, or files embedded in other files) may be missed if the tool checks only cluster-aligned offsets rather than every sector. Heavily fragmented files often come back broken or partial. Compressed file content where the magic number doesn’t survive the compression won’t be found. And any file format whose signature isn’t in the tool’s database won’t be recognized at all; signature-based recovery can only find file types it knows how to identify.

Related glossary entries

  • File Carving: the broader umbrella concept; signature-based recovery is its most common form.
  • Sector-by-Sector Clone: the technique that produces the image signature recovery runs against.
  • RAW Partition: the canonical scenario where signature recovery is needed.
  • Unallocated Space: the region signature recovery often scans to find deleted files.
  • File Fragmentation: the limitation that defeats signature recovery on large files.
  • Data Recovery: the umbrella concept; signature recovery is one strategy.
  • Disk Image: the artifact signature recovery typically runs against.

About the Authors

👥 Researched & Reviewed By
Rachel Dawson
Technical Approver · Data Recovery Engineer

Rachel brings over twelve years of cleanroom data recovery experience and routinely deploys signature-based recovery in cases where the file system is unrecoverable. The hardest conversations in her intake work are explaining to customers that their 10,000 recovered photos won’t have names, dates, or folder structure preserved; users frequently arrive expecting metadata recovery that simply isn’t possible from the byte-level data alone. The post-recovery organization phase often takes longer than the technical recovery itself.

12+ years data recovery engineering · PhotoRec specialist · Forensic carving
