Signature-Based Recovery
Signature-based recovery finds files by what’s inside them rather than where the file system says they live. Even when the file system is corrupted or gone, every JPEG, PDF, and PNG on the drive still starts with its format’s fixed byte pattern. Recovery tools scan the raw drive for those patterns and extract the file content directly. The tradeoff: file names, paths, and timestamps are gone.
Signature-based recovery is a data recovery technique that locates files by scanning storage for distinctive byte patterns called file signatures or magic numbers, rather than relying on file system metadata. Most file formats begin with a specific sequence of bytes that identifies the type. Signature-based recovery tools scan a drive sector by sector looking for these signatures, then extract the data between the signature and either an explicit footer marker or a calculated file length. Because the technique ignores the file system entirely, it works on drives where the file system is corrupted, missing, or has been overwritten.
What Signature-Based Recovery Actually Is
Most data recovery techniques work by reading the file system. The Master File Table on NTFS, the FAT tables on FAT32, the inode tables on ext4, and the catalog on APFS all serve the same purpose: they list every file on the drive along with where its data is stored. Recovery tools that work with the file system can preserve file names, paths, timestamps, and folder structure because all that information is in the metadata. Signature-based recovery does something fundamentally different: it ignores the file system entirely and finds files by scanning the raw storage for the distinctive byte patterns inside the files themselves.1
The technique in one sentence
Most file formats begin with a specific sequence of bytes (the file signature, also called a magic number) that identifies what type of file it is. Signature-based recovery scans for those byte patterns across the entire storage device, and when it finds one, extracts the data that follows as a file of that type.
When it’s the right approach
Signature-based recovery is the technique to use when:
- The file system is missing or corrupted: if the partition has been formatted, deleted, or had its metadata damaged, file-system-based recovery can’t work but signature-based recovery still can.
- The drive shows as RAW: a RAW partition has no working file system. Signature-based recovery doesn’t need one.
- You need to recover from unallocated space: deleted files in unallocated space have no file system entries pointing to them, but their content is still there until overwritten.
- You’re working with a partial or damaged image: a disk image that’s missing the file system area but contains the data area can still yield files via signature recovery.
- File-system recovery has failed or recovered too few files: a second pass with signature-based recovery often finds files the file-system approach missed.
Relationship to file carving
The terms “signature-based recovery” and “file carving” overlap heavily. File carving is the broader umbrella for any technique that recovers files from raw data without file system metadata. Signature-based recovery is the most common form of file carving, where the carver uses byte-pattern signatures to locate files. In practice, most modern file carving is signature-based; the terms are used interchangeably in much of the recovery industry. This entry focuses on the signature-recognition aspect specifically.
Historical origin
Magic numbers as a file-identification mechanism originated in 1979 with Seventh Edition Unix (V7) and have been standard computing practice ever since.2 The first widely used file carving tool was Foremost, originally developed by the U.S. Air Force Office of Special Investigations in 2001. Scalpel, an improved version of Foremost, was introduced by Richard and Roussev in 2005. PhotoRec, by Christophe Grenier (the same developer behind TestDisk), became the dominant cross-platform tool over time and is still actively maintained.
How Magic Numbers and File Signatures Work
The core idea is simple: file formats need to be self-identifying. When you double-click a file, the operating system needs to know what application to open it with. Looking at the file extension is one method, but extensions can be wrong or missing. Looking at the actual bytes inside the file is more reliable, which is why file formats include identifying byte patterns at known locations.3
Common file signatures
Each file format has its own signature. A small sample of widely-used formats:
| Format | Header (hex) | Footer (if any) |
|---|---|---|
| JPEG | FF D8 FF | FF D9 |
| PNG | 89 50 4E 47 0D 0A 1A 0A | 49 45 4E 44 AE 42 60 82 |
| GIF | 47 49 46 38 37 61 or 47 49 46 38 39 61 | 00 3B |
| PDF | 25 50 44 46 | 25 25 45 4F 46 |
| ZIP | 50 4B 03 04 | Variable; uses central directory |
| MP3 | FF FB or 49 44 33 (ID3) | None reliable |
| MP4 | 00 00 00 ?? 66 74 79 70 | None reliable |
| DOCX (and other Office) | 50 4B 03 04 (ZIP-based) | Same as ZIP |
| EXE / DLL | 4D 5A (MZ) | None reliable |
| RAR | 52 61 72 21 1A 07 00 | None reliable |
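A table like this maps naturally onto a lookup structure. The sketch below is a minimal illustration, not any tool’s actual database: `SIGNATURES` and `identify` are hypothetical names, only a subset of the rows is included, and the MP3 frame-sync pattern is left out because a two-byte match is too weak to use without further checks.

```python
from typing import Optional

# Illustrative subset of the table above; real carvers ship far larger databases.
# Each entry: (format name, offset of the signature within the file, signature bytes)
SIGNATURES = [
    ("PNG", 0, bytes.fromhex("89504E470D0A1A0A")),
    ("RAR", 0, bytes.fromhex("526172211A0700")),
    ("GIF", 0, b"GIF87a"),
    ("GIF", 0, b"GIF89a"),
    ("PDF", 0, b"%PDF"),
    ("ZIP", 0, b"PK\x03\x04"),   # also matches DOCX and other ZIP-based formats
    ("JPEG", 0, bytes.fromhex("FFD8FF")),
    ("MP4", 4, b"ftyp"),         # bytes 0-3 hold a box size, so they vary per file
    ("MP3", 0, b"ID3"),
    ("EXE", 0, b"MZ"),
]

def identify(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None if nothing does."""
    for name, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return name
    return None

print(identify(bytes.fromhex("FFD8FFE000104A464946")))  # JPEG
print(identify(b"plain text has no magic number"))      # None
```

Longer signatures are listed first so that short ones such as `MZ` only get a chance after the more specific patterns have failed.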
Why headers are reliable but footers aren’t always
The header is at a known location: the very start of the file. Every JPEG begins with FF D8 FF; this is part of the format specification. Footers are less reliable because not every format has a defined end marker. JPEG and PNG do (FF D9 and the IEND chunk respectively), and recovery tools can use those to determine where the file ends. PDF has %%EOF. But many formats (executables, audio, video) have no footer; the file simply ends when its declared length runs out. For formats without footers, signature-based recovery has to either read a calculated length from the header or guess based on what comes next.
Files without recognizable signatures
Some content can’t be located by signature scanning:
- Plain text files have no magic number; ASCII or Unicode text doesn’t have an identifying signature.
- Encrypted files appear as random bytes after encryption; if the magic number is encrypted along with the rest, it’s no longer recognizable.
- Custom application formats may not have signatures registered with carving tools’ databases.
- Compressed files where the compression destroys the signature pattern can be missed unless the carver knows about the wrapper format.
The Carving Process: Header to Footer
Signature-based recovery follows a consistent algorithm regardless of which tool implements it. Understanding the steps clarifies what the technique can and can’t do.4
Step 1: Sequential sector scan
The tool reads the storage device (or disk image) sequentially, sector by sector, comparing the contents of each sector against its database of known file signatures. The scan operates at the raw byte level, not at the file system level. It doesn’t care whether sectors are marked as allocated or free; it scans everything. This is why the technique works against unallocated space, deleted partitions, and damaged file systems.
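As a rough sketch of this step, the loop below walks an in-memory image one sector at a time and reports every sector that begins with a given signature. The 512-byte sector size and the name `scan_sectors` are assumptions for illustration; a real tool would read from a device or image file in large chunks and test many signatures per sector.

```python
SECTOR_SIZE = 512  # assumed; many modern drives use 4096-byte sectors
JPEG_MAGIC = b"\xff\xd8\xff"

def scan_sectors(image: bytes, magic: bytes, sector_size: int = SECTOR_SIZE):
    """Yield the byte offset of every sector that starts with `magic`."""
    for offset in range(0, len(image), sector_size):
        if image[offset:offset + len(magic)] == magic:
            yield offset

# Toy 8-sector image with JPEG headers at the start of sectors 2 and 5,
# plus one in the middle of sector 3 that an aligned scan does not report.
image = bytearray(8 * SECTOR_SIZE)
image[2 * SECTOR_SIZE:2 * SECTOR_SIZE + 3] = JPEG_MAGIC
image[5 * SECTOR_SIZE:5 * SECTOR_SIZE + 3] = JPEG_MAGIC
image[3 * SECTOR_SIZE + 100:3 * SECTOR_SIZE + 103] = JPEG_MAGIC

hits = list(scan_sectors(bytes(image), JPEG_MAGIC))
print(hits)  # [1024, 2560]
```

Checking only sector starts is a deliberate choice here: file systems typically place the first byte of a file on a sector boundary, so an aligned scan is fast and avoids many coincidental matches inside other files.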
Step 2: Signature matching
When a sector contains a byte sequence matching a known signature, the tool flags that location as a potential file start. For some formats (JPEG, PNG), the signature is at offset 0 within the file; for others it sits a few bytes in: MP4’s ftyp tag follows a four-byte size field, and some PDFs carry a few bytes of leading data before %PDF. Modern carving tools handle these offset variations using the format definitions in their configuration databases.
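Signatures with variable bytes, such as the MP4 pattern 00 00 00 ?? 66 74 79 70 from the table earlier, need a matcher that can skip wildcard positions. The following is one possible sketch; `parse_pattern` and `matches` are hypothetical helpers, and the `??` notation is simply the one used in this entry, not a standard shared by all tools.

```python
def parse_pattern(hex_pattern: str) -> list:
    """Turn '00 00 00 ?? 66 74 79 70' into [0, 0, 0, None, 0x66, 0x74, 0x79, 0x70]."""
    return [None if tok == "??" else int(tok, 16) for tok in hex_pattern.split()]

def matches(data: bytes, pattern: list, offset: int = 0) -> bool:
    """True if `data` matches `pattern` at `offset`; None entries match any byte."""
    window = data[offset:offset + len(pattern)]
    if len(window) < len(pattern):
        return False  # not enough bytes left to hold the whole signature
    return all(p is None or p == b for p, b in zip(pattern, window))

MP4_PATTERN = parse_pattern("00 00 00 ?? 66 74 79 70")

print(matches(b"\x00\x00\x00\x20ftypisom", MP4_PATTERN))  # True
print(matches(b"\x00\x00\x01\x20ftypisom", MP4_PATTERN))  # False
```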
Step 3: Validation
A signature match alone isn’t enough to confirm a real file; the byte sequence might appear coincidentally in other content. Sophisticated carvers perform validation checks: parsing additional bytes after the signature to confirm the structure looks like a real file of that type.5 A JPEG signature followed by valid JPEG markers is much more likely to be a real JPEG than a random match.
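PNG makes a convenient example of structural validation because the format requires the 8-byte signature to be followed by an IHDR chunk with a 13-byte body and a CRC-32 over the chunk type and body. The check below is a minimal sketch of that idea (`looks_like_png` is a hypothetical name), and real carvers typically validate far more than the first chunk.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")

def looks_like_png(data: bytes) -> bool:
    """Check the signature, then confirm a well-formed IHDR chunk follows it."""
    if len(data) < 33 or not data.startswith(PNG_SIGNATURE):
        return False
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if length != 13 or chunk_type != b"IHDR":
        return False
    body = data[16:29]
    (stored_crc,) = struct.unpack(">I", data[29:33])
    # The PNG CRC covers the chunk type and the chunk body, not the length field.
    return zlib.crc32(chunk_type + body) == stored_crc

# A structurally valid start of a 1x1 PNG, and a coincidental signature match.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
good = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + ihdr
        + struct.pack(">I", zlib.crc32(b"IHDR" + ihdr)))
coincidence = PNG_SIGNATURE + b"\x00" * 25

print(looks_like_png(good))         # True
print(looks_like_png(coincidence))  # False
```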
Step 4: Determining file length
Three approaches:
- Footer search: for formats with reliable footers (JPEG, PNG, PDF), scan forward from the header until the matching footer is found. Everything between is the file.
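- Length from header: some formats record their size near the start of the file. BMP stores the total file size in its header, and MP4 stores a size in each top-level box, so the tool can walk the boxes and add up their sizes. The tool reads the length and extracts that many bytes.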
- Maximum size limit: for formats without footers or length fields, extract up to a configured maximum size and assume the file ends there or at the next signature.
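The three approaches can be sketched as three small functions. All names here are hypothetical, and the footer search is deliberately naive: a JPEG with an embedded EXIF thumbnail contains an inner FF D9, so stopping at the first footer can truncate the file, which is one reason real tools pair this step with validation.

```python
import struct
from typing import Optional

def length_by_footer(data: bytes, start: int, footer: bytes, max_size: int) -> Optional[int]:
    """Footer search: length from `start` through the first footer, or None."""
    end = data.find(footer, start, start + max_size)
    return None if end == -1 else end + len(footer) - start

def length_from_bmp_header(data: bytes, start: int) -> Optional[int]:
    """Length from header: BMP keeps the file size as a little-endian uint32 at offset 2."""
    if len(data) < start + 6 or data[start:start + 2] != b"BM":
        return None
    (size,) = struct.unpack_from("<I", data, start + 2)
    return size

def length_by_cap(data: bytes, start: int, max_size: int) -> int:
    """Maximum size limit: take up to `max_size` bytes or whatever remains."""
    return min(max_size, len(data) - start)

jpeg = b"\xff\xd8\xff\xe0" + b"abc" + b"\xff\xd9"
disk = b"\x00" * 10 + jpeg + b"\x00" * 10
bmp = b"BM" + struct.pack("<I", 70) + b"\x00" * 64

print(length_by_footer(disk, 10, b"\xff\xd9", 1000))  # 9
print(length_from_bmp_header(bmp, 0))                 # 70
print(length_by_cap(b"\x00" * 100, 40, 1000))         # 60
```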
Step 5: Output
Each recovered file is written to a destination (an output directory or another drive) with a sequential generated name (file001.jpg, file002.jpg) and grouped by file type into subfolders. The tool produces a log of what was recovered, where on the source it came from, and any validation warnings.
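A minimal sketch of this output step follows. The folder layout, the `carve.log` file name, and the log format are assumptions made for illustration; each tool has its own conventions.

```python
import tempfile
from pathlib import Path

def write_carved(out_dir: Path, ext: str, index: int, payload: bytes, source_offset: int) -> Path:
    """Write one recovered file into a per-type subfolder and append a log line."""
    folder = out_dir / ext
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"file{index:03d}.{ext}"
    path.write_bytes(payload)
    with (out_dir / "carve.log").open("a", encoding="utf-8") as log:
        log.write(f"{path.name}\toffset={source_offset}\tbytes={len(payload)}\n")
    return path

# Demonstration in a temporary directory, so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    written = write_carved(Path(tmp), "jpg", 1, b"\xff\xd8\xff\xd9", source_offset=512000)
    relative = written.relative_to(tmp).as_posix()
    log_text = (Path(tmp) / "carve.log").read_text(encoding="utf-8")

print(relative)  # jpg/file001.jpg
```

The output directory should sit on a different drive from the one being scanned, as the paragraph above notes, so that recovered files never overwrite data still waiting to be found.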
Common signature-based recovery tools
| Tool | Platform | Strengths |
|---|---|---|
| PhotoRec | Windows, macOS, Linux | Most widely used; supports hundreds of formats; user-friendly menu interface |
| Foremost | Linux primarily | The original (2001); fast; configurable signature definitions |
| Scalpel | Linux primarily | Improved Foremost (2005); better performance and flexibility |
| Bulk_extractor | Cross-platform | Forensic-focused; finds emails, credit cards, URLs alongside files |
| R-Studio / EaseUS / DiskGenius | Cross-platform | Commercial tools with signature recovery as one mode |
| Recoverit / Disk Drill | Cross-platform | User-friendly commercial tools combining file system and signature recovery |
Why It Loses File Names and Metadata
The single most-asked question about signature-based recovery: where did all the original file names go? The answer reveals an important fact about how file systems actually store files.6
File names live in the file system, not in the file
When you create a file called “vacation_2024.jpg”, the file’s content is the JPEG image data starting with FF D8 FF. The name “vacation_2024.jpg” isn’t part of that content at all. It’s stored separately in the file system: in an MFT record on NTFS, in a directory entry on FAT32, in a directory entry on ext4 that points to the file’s inode. The file system links the name to the location on disk where the JPEG content actually lives.
Signature-based recovery scans the data regions where JPEG content lives. It never reads the MFT or directory entries because those are separate. The recovered JPEG content is intact, but the link from name to content has been lost.
What metadata gets lost
Signature recovery loses everything that lives in the file system rather than the file content:
- Original filename (“vacation_2024.jpg” → recovered as “file00001.jpg”).
- Folder path (“Pictures/Vacations/2024/Hawaii/” → recovered to a flat output directory by file type).
- Creation timestamp, modification timestamp, access timestamp.
- File system attributes (read-only, hidden, system).
- Owner and permission information on systems that track those.
- Alternate data streams on NTFS or extended attributes on macOS.
What does survive
Information embedded inside the file content survives because it’s part of what gets recovered:
- EXIF metadata in photos: camera model, GPS coordinates, date the photo was taken (this is inside the JPEG, not in the file system).
- ID3 tags in MP3 files: artist, album, track name, embedded album art.
- PDF document metadata: title, author, creation date as recorded by the PDF creator (inside the PDF, not the file system).
- Office document metadata embedded in the file format itself.
- Image content, audio content, video content intact in their entirety.
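To make the point about embedded metadata concrete, the sketch below checks whether a recovered JPEG still carries an EXIF block by walking its marker segments and looking for an APP1 segment that begins with the `Exif` identifier. `has_exif` is a hypothetical helper, it only detects the block rather than decoding it, and it assumes a simple segment layout ahead of the image data.

```python
import struct

def has_exif(jpeg: bytes) -> bool:
    """Walk JPEG segments before the image data, looking for an APP1/Exif block."""
    if not jpeg.startswith(b"\xff\xd8"):
        return False
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker bytes.
        (seg_len,) = struct.unpack_from(">H", jpeg, pos + 2)
        if marker == 0xE1 and jpeg[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + seg_len
    return False

app0 = b"\xff\xe0\x00\x07JFIF\x00"
app1 = b"\xff\xe1\x00\x0aExif\x00\x00II"
with_exif = b"\xff\xd8" + app0 + app1 + b"\xff\xda"
without_exif = b"\xff\xd8" + app0 + b"\xff\xda"

print(has_exif(with_exif))     # True
print(has_exif(without_exif))  # False
```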
Why this matters for users
Users approaching signature-based recovery should understand the result format in advance. Recovery of 50,000 photos with names like img00001.jpg through img50000.jpg, grouped only by file type, can be overwhelming. Plan for the post-recovery work of identifying which files actually matter; tools like image deduplication software, EXIF-based date sorting, and visual review can help organize the recovered output. For files where the original name was the only way to identify content (named documents, project files), signature recovery may be only partially useful.
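One simple piece of that post-recovery work, finding byte-for-byte duplicates, can be sketched with content hashing. This is an assumption-level illustration rather than how any particular deduplication product works: `group_exact_duplicates` is a hypothetical helper, and it only catches identical files, not resized or re-encoded copies of the same image.

```python
import hashlib
from collections import defaultdict

def group_exact_duplicates(files):
    """files: iterable of (name, content bytes). Return lists of names with identical content."""
    by_digest = defaultdict(list)
    for name, content in files:
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]

recovered = [
    ("file001.jpg", b"\xff\xd8\xff photo A \xff\xd9"),
    ("file002.jpg", b"\xff\xd8\xff photo B \xff\xd9"),
    ("file003.jpg", b"\xff\xd8\xff photo A \xff\xd9"),
]
print(group_exact_duplicates(recovered))  # [['file001.jpg', 'file003.jpg']]
```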
The Fragmentation Problem
The biggest technical limitation of signature-based recovery is its difficulty with fragmented files. Understanding why reveals an important limitation that no tool fully solves.7
The contiguity assumption
Signature-based recovery’s basic algorithm assumes file data is stored contiguously: find the header, read forward until the footer or calculated length, that’s the file. This works well when files are stored in unbroken sequences of sectors. It breaks when files are fragmented across multiple non-contiguous regions.
Imagine a 10 MB JPEG stored in two fragments: 6 MB starting at sector 1000 and 4 MB starting at sector 50000. With 512-byte sectors, the first fragment spans roughly sectors 1000 to 13300, and the sectors immediately after it belong to a different file (a Word document, say). Signature recovery finds the JPEG header at sector 1000 and reads forward, but once it runs past the end of the first fragment it starts reading the Word document content as if it were JPEG data. The result: a recovered file that’s 6 MB of valid JPEG followed by 4+ MB of corrupted content, or a partial JPEG that ends prematurely.
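The failure mode can be reproduced on a toy scale. In the sketch below, a small stand-in “JPEG” is split into two fragments with another file’s sectors between them; all sizes and names are made up for illustration, and the carve is the basic header-to-footer approach described above.

```python
SECTOR = 512
HEADER, FOOTER = b"\xff\xd8\xff", b"\xff\xd9"

# A 3-sector stand-in for a JPEG: header, filler, footer.
original = b"\xff\xd8\xff\xe0" + b"A" * 1020 + b"B" * 510 + FOOTER
frag1, frag2 = original[:2 * SECTOR], original[2 * SECTOR:]

# Two sectors of another file sit between the fragments on "disk".
foreign = b"W" * (2 * SECTOR)
image = frag1 + foreign + frag2

# Naive header-to-footer carve: read forward from the header to the first footer.
start = image.find(HEADER)
end = image.find(FOOTER, start) + len(FOOTER)
carved = image[start:end]

print(len(original), len(carved))  # 1536 2560
print(carved == original)          # False: the foreign sectors are spliced in
# Only knowledge of the fragment layout (file system metadata) gives the right answer.
print(image[:2 * SECTOR] + image[4 * SECTOR:] == original)  # True
```

The carved file opens with valid header bytes and ends with a valid footer, which is exactly why this kind of corruption is hard to spot without decoding the content.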
Why fragmentation happens
File systems fragment files when there isn’t enough contiguous free space for the entire file. As drives fill up with files of various sizes that get created, deleted, and modified, contiguous free regions become scarce. A new large file ends up split across whatever free fragments are available.
- Heavily-used drives fragment more than fresh drives.
- SSDs handle fragmentation differently from HDDs at the physical level (no seek penalty), but the file system still records fragmented allocations.
- Some file systems (APFS, ext4) actively fight fragmentation; FAT32 notoriously doesn’t.
- Large files are more likely to be fragmented than small files.
What types of files survive fragmentation best
Files that recover well via signature-based methods share characteristics:
- Photos: usually small enough (under 10 MB) to fit in contiguous free space.
- Audio files: typically stored contiguously by media applications.
- Short videos: if they fit in contiguous space.
- Individual documents: small documents tend to be contiguous.
Files that recover poorly:
- Large videos that span multiple fragments.
- Large databases with extensive fragmentation.
- Virtual machine disk images that are typically multi-GB and fragmented.
- Frequently-edited large files that have grown over time.
State-of-the-art fragmentation handling
Research and modern tools have improved fragmentation handling over basic header-to-footer carving. Sequential hypothesis testing (Pal, Sencar, Memon) detects fragmentation points statistically. Some tools attempt fragment reassembly: identifying multiple fragments that might belong together and reassembling them in the correct order. None of these techniques fully solves the fragmentation problem; for heavily fragmented files, file-system-based recovery remains the only path that reliably preserves file integrity.
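As a much simpler stand-in for the statistical methods mentioned above (this is not the Pal, Sencar, and Memon technique), a carver can flag sectors whose byte entropy looks out of place for the format being carved. Compressed image data tends to have high entropy, so a sudden low-entropy sector hints that another file has begun. The 512-byte sector size, the threshold of 6.0 bits, and both function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Optional

def sector_entropy(sector: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    total = len(sector)
    return -sum(c / total * math.log2(c / total) for c in Counter(sector).values())

def first_suspect_sector(data: bytes, sector_size: int = 512,
                         threshold: float = 6.0) -> Optional[int]:
    """Index of the first whole sector whose entropy falls below `threshold`, else None."""
    for index in range(len(data) // sector_size):
        chunk = data[index * sector_size:(index + 1) * sector_size]
        if sector_entropy(chunk) < threshold:
            return index
    return None

high = bytes(range(256)) * 2   # one sector with a perfectly flat byte distribution
low = b"W" * 512               # one sector of a single repeated byte
stream = high + high + low + high

print(first_suspect_sector(stream))  # 2
```

The heuristic fails in both directions: JPEG header sectors hold tables with lower entropy than the image data, and a foreign fragment that is itself compressed looks just as random as the file being carved.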
Signature-based recovery is the recovery technique of last resort: when nothing else works, this often does. That makes it both invaluable and frequently misunderstood. Users approach signature recovery expecting it to be like file-system recovery with extra capability, then are surprised when the recovered files have generic names, are grouped by type rather than original folder structure, and include many partially-corrupted fragments alongside successful recoveries. Setting expectations correctly before running the scan saves significant frustration.8
The right mental model for signature-based recovery is: it’s archaeological. The technique reconstructs files from the byte-level evidence remaining on the drive, without reference to the original organization that’s been destroyed. A recovered photo collection from a heavily-used drive will include the user’s photos alongside cached thumbnails from web browsers, profile pictures from messaging apps, image attachments from emails, and screenshots from years past. Sorting through the result is part of the technique. The alternative (getting nothing at all because the file system is too damaged for traditional recovery) is much worse.
For practical use, signature-based recovery works best as the second-pass technique. Recovery software typically tries file-system-based recovery first and falls back to signature-based recovery for files the first pass missed. The two-pass approach gets the best of both: file system metadata where it exists, raw signature scanning where it doesn’t. Pair this with sector-by-sector cloning for failing drives and the recovery workflow becomes resilient: image the failing source once, run file system recovery against the image, then run signature-based recovery against the same image to capture anything the first pass missed. The image stays unchanged through both passes, so neither attempt risks the other’s results.
Signature-Based Recovery FAQ
What is signature-based recovery?
Signature-based recovery is a data recovery technique that finds files by scanning storage for distinctive byte patterns called file signatures or magic numbers, rather than using file system metadata. Most file formats start with a specific sequence of bytes that identifies what type of file it is: JPEG images start with FF D8 FF, PNG images start with 89 50 4E 47 0D 0A 1A 0A, PDF documents start with 25 50 44 46. Signature-based recovery tools scan storage looking for these patterns, then extract the file data that follows. The technique works even when the file system is completely destroyed, because it doesn’t depend on the file system at all.
How is it different from file system recovery?
File system recovery uses metadata structures (the Master File Table on NTFS, the FAT tables on FAT32, inodes on ext4, the catalog on APFS) to find files. This recovers files with their original names, paths, timestamps, and folder structure intact. Signature-based recovery ignores file system metadata entirely and finds files by scanning for their internal byte signatures. This works when the file system is damaged or missing, but recovers files without their original names, paths, or timestamps. The two techniques are complementary: file system recovery is preferred when possible because it preserves more information, while signature-based recovery is the fallback when file system recovery doesn’t work.
Which tools perform signature-based recovery?
Several tools specialize in signature-based recovery. PhotoRec is the most user-friendly and widely used, supporting hundreds of file formats and running on Windows, macOS, and Linux. Foremost is the original signature-based carver from 2001, command-line, with configurable signature definitions. Scalpel is an improved version of Foremost with better performance, also command-line. Bulk_extractor scans for specific data types like email addresses, credit card numbers, and URLs in addition to files. Most full-featured commercial recovery tools (R-Studio, EaseUS, DiskGenius) include signature-based recovery as one mode alongside file system recovery.
Why does signature-based recovery lose file names?
File names aren’t part of file content; they live in the file system metadata. The Master File Table on NTFS stores each file’s name in a record alongside the pointer to where the file’s data lives on disk. Signature-based recovery finds files by scanning the data regions directly without looking at the file system, so it never sees the names. Recovered files are typically named with sequential numbers (file001.jpg, file002.jpg) and grouped into folders by file type. The original filename, the original folder path, the creation and modification timestamps, and any file system attributes are all lost. If you need names preserved, file system recovery is the right approach; signature-based recovery is the fallback when names can’t be preserved at all.
Can signature-based recovery handle fragmented files?
Mostly no, and this is the technique’s biggest limitation. Signature-based recovery assumes the file’s data is contiguous in storage; it finds the header, then reads forward expecting the rest of the file to follow. When a file is fragmented (stored in multiple non-contiguous chunks), reading forward from the header eventually hits data from a different file, producing a recovered file that’s corrupted or partial. Modern signature recovery tools include some fragmentation handling, including statistical techniques to detect fragmentation points, but reliable recovery of heavily fragmented files generally requires file system metadata that signature-based recovery doesn’t use. Files that work best with signature-based recovery are those that tend to be stored contiguously: photos and short videos, individual documents, audio files.
What can’t signature-based recovery recover?
Several scenarios are outside the technique’s reach. Encrypted files appear as random bytes with no recognizable signatures, so they can’t be located. Files whose header falls in a bad sector are typically lost because the signature can’t be read and matched. Files smaller than the cluster size may be missed if the tool scans at cluster boundaries rather than sector boundaries. Heavily fragmented files often come back broken or partial. Compressed file content where the magic number doesn’t survive the compression won’t be found. And any file format whose signature isn’t in the tool’s database won’t be recognized at all; signature-based recovery can only find file types it knows how to identify.
Related glossary entries
- File Carving: the broader umbrella concept; signature-based recovery is its most common form.
- Sector-by-Sector Clone: the technique that produces the image signature recovery runs against.
- RAW Partition: the canonical scenario where signature recovery is needed.
- Unallocated Space: the region signature recovery often scans to find deleted files.
- File Fragmentation: the limitation that defeats signature recovery on large files.
- Data Recovery: the umbrella concept; signature recovery is one strategy.
- Disk Image: the artifact signature recovery typically runs against.
Sources
- Wikipedia: File carving (accessed May 2026)
- LSoft: What are File Signatures and why should you care when recovering?
- Handy Recovery: Understanding File Signatures and Signature-Based Data Recovery
- Apriorit: How to Recover Lost or Deleted Files with Data Carving
- Starus Recovery: Data Carving: Signature-Based Data Recovery
- Verdicraft: Advanced File Carving Techniques for Digital Forensics
- Andrea Fortuna: Some thoughts about file carving
- Inventive HQ: File Magic Number Checker – File Type Detection
