Various Oddities Graphics, Game Dev, Emulators, and other geeky stuff

2 Feb 08

Vista Thumbnail Cache

I had a thought while messing with fast thumbnailing: why generate my own thumbnails when Explorer has already generated them for me? After an hour or so, I was able to code up a thumbs.db reader for XP (and kin, although I never tested it). Vista, on the other hand, has taken a bit longer...

Note: the structures listed in this file are in pseudo-C syntax - I use 010 Editor as my hex editor, and it has some cool templating features - the bits below are just copied out of the templates I made.

If you find this information useful, please link back here!

Thumbs.db

Older versions of Explorer (from 9x through 2003) would drop a 'thumbs.db' file into any directory you browsed that it generated thumbnails for. These files are OLE structured storage files, similar to (pre-2007) Office documents, containing a special 'Catalog' of filenames and a list of thumbnail entries. It's pretty trivial to find the thumbnail for a given file: look through the catalog for the filename and read out the corresponding entry.

Resources

Vinetto thumbnail dumper, Pete's ThumbDBLib (the comments have useful stuff)

thumbcache_*.db

Instead of dirtying up every directory with a thumbs.db, Vista is a bit smarter - it keeps a single set of files under AppData\Local\Microsoft\Windows\Explorer. There's the thumbcache_idx.db index, one content file for each size (thumbcache_32.db, thumbcache_96.db, thumbcache_256.db, and thumbcache_1024.db), and finally one other file (thumbcache_sr.db) that I haven't figured out yet - maybe string resources. When trying to find a thumbnail, Explorer first looks in the idx file, finds the entry it wants, and then uses that information to find the data in the content file for the size it's looking for.
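As a small sketch of the layout described above, the following Python builds the expected file paths from a given AppData\Local directory. The helper name thumbcache_paths is made up for illustration, and the sr file is left out since its purpose is unknown.

```python
from pathlib import PureWindowsPath

# The four thumbnail sizes that have their own content file.
SIZES = ("32", "96", "256", "1024")


def thumbcache_paths(local_appdata):
    """Return the expected cache file paths under an AppData\\Local directory.

    Hypothetical helper: it only builds paths, it does not check that
    the files exist.
    """
    base = PureWindowsPath(local_appdata) / "Microsoft" / "Windows" / "Explorer"
    paths = {"idx": base / "thumbcache_idx.db"}
    for size in SIZES:
        paths[size] = base / f"thumbcache_{size}.db"
    return paths
```

On a real Vista machine the directory to pass in would normally come from the LOCALAPPDATA environment variable, e.g. thumbcache_paths(os.environ["LOCALAPPDATA"]).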

thumbcache_idx.db (IMMM)

There are two basic structures in this file: the header, and the entries.

typedef struct {
    CHAR magic[4]; // IMMM
    DWORD unk1;
    DWORD unk2;
    DWORD headerSize;
    DWORD entryCount;
    DWORD unk4;
} IMMMH;


This header is found at the top of the file; immediately following it are IMMMH.entryCount IMMM entries:

typedef struct {
    UQUAD secret<format=hex>;
    FILETIME lastModified;
    UINT unk2;
    UINT offset32<format=hex>;
    UINT offset96<format=hex>;
    UINT offset256<format=hex>;
    UINT offset1024<format=hex>;
    UINT offsetsr<format=hex>;
} IMMM;


The 'secret' is a 64-bit identity for the entry - it seems to be based on the file name, data, and maybe modification time. unk2 may be a kind of type - it always seems to be either 0 or 1. Alternatively, it could be the color of the node, if the idx file contains a serialized red-black tree (which is what the thumbs.db file is). Finally, the structure has the offsets of the entry in the four size files and the sr file. An offset is set to -1 (0xFFFFFFFF) if the entry does not exist in that file.
Interestingly, a lot of entries in the idx file are zeroed out - this makes me think that the file is some serialized tree with spaces for expansion. In my experiments, the IMMMH.entryCount is the number of entries total, not the number of valid ones. If a secret is 0, I just skip it.
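
To make the two structures concrete, here is a minimal Python sketch that reads the header and entries out of the idx file's bytes. It assumes little-endian fields, assumes the entries really do start right after the 24-byte header (I don't rely on headerSize), and skips zeroed entries as described above. The names parse_idx and filetime_to_datetime are my own; the FILETIME conversion is the standard one (100 ns ticks since 1601-01-01 UTC).

```python
import struct
from datetime import datetime, timedelta, timezone

IMMMH = struct.Struct("<4s5I")  # magic, unk1, unk2, headerSize, entryCount, unk4
IMMM = struct.Struct("<QQI5I")  # secret, lastModified, unk2, five offsets
OFFSET_NAMES = ("32", "96", "256", "1024", "sr")
NO_ENTRY = 0xFFFFFFFF  # the -1 marker for "not in this file"


def filetime_to_datetime(ft):
    """Convert a raw FILETIME value to an aware UTC datetime."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=ft // 10)


def parse_idx(buf):
    """Return {secret: entry} for every non-zero IMMM entry in buf."""
    magic, _, _, _, entry_count, _ = IMMMH.unpack_from(buf, 0)
    if magic != b"IMMM":
        raise ValueError("not a thumbcache_idx.db file")
    entries = {}
    for i in range(entry_count):
        pos = IMMMH.size + i * IMMM.size
        if pos + IMMM.size > len(buf):
            break  # entryCount ran past the end of the data we were given
        secret, modified, unk2, *offsets = IMMM.unpack_from(buf, pos)
        if secret == 0:
            continue  # zeroed-out slot
        entries[secret] = {
            "last_modified": modified,  # raw FILETIME, convert on demand
            "unk2": unk2,
            # keep only the files this entry actually appears in
            "offsets": {name: off
                        for name, off in zip(OFFSET_NAMES, offsets)
                        if off != NO_ENTRY},
        }
    return entries
```

The raw FILETIME is kept as-is because a garbage value could overflow the datetime range; call filetime_to_datetime only on entries you care about.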

thumbcache_*.db (CMMM)

The other files seem to be content databases, containing a small header followed by a list of entries like the idx file. These files can be scanned to dump all thumbnails, but for lookup it's obvious the idx file is used. One interesting thing about this file format is that they seem to allocate the file in large chunks (probably to prevent fragmentation), and include a placeholder entry at the end of the file. When a new thumbnail needs to be added, it's placed at the end and the placeholder moves down.

typedef struct {
    CHAR magic[4]; // CMMM
    DWORD unk1;
    DWORD unk2;
    DWORD headerSize;
    DWORD offsetLastEntry;
    DWORD entryCount;
} CMMMH;


offsetLastEntry is the offset in the file of the last CMMM entry, and is used for quickly appending entries to the file. Immediately following this header, the entries start:

typedef struct {
    CHAR magic[4]; // CMMM
    DWORD sizeHeaderAndData;
    UQUAD secret<format=hex>;
    CHAR ext[8]; // Unicode - sometimes .txt, .jpg, etc
    DWORD huh1;
    DWORD type; // 0 or 1?
    DWORD dataSize;
    DWORD unk1;
    DWORD unk2;
    DWORD unk3;
    DWORD unk4;
    DWORD unk5;
    CHAR name[32]; // Unicode of some 16 character hex encoding of a string
    if( sizeHeaderAndData - dataSize > 88 )
        CHAR padding[ sizeHeaderAndData - dataSize - 88 ]; // anything beyond the 88 fixed header bytes
    if( dataSize > 0 )
        CHAR data[ dataSize ];
} CMMM;


sizeHeaderAndData is the size of the header (usually 88 or 90 bytes) + dataSize; anything beyond the 88 fixed header bytes is padding that sits before the data. Note that some entries are empty (dataSize=0). The secret here is the same secret as in the IMMM file entry. The name field is weird, as it's a Unicode string of the hex of some 8 byte number (not the secret).
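
The size arithmetic is easier to see in code. This is a rough Python sketch of reading one CMMM entry at a known offset, assuming little-endian fields and that "Unicode" means UTF-16LE. read_cmmm_entry and the keys of the returned dict are names I made up for illustration.

```python
import struct

# magic, sizeHeaderAndData, secret, ext, huh1, type, dataSize, unk1..unk5, name
CMMM = struct.Struct("<4sIQ8sIII5I32s")  # 88 bytes of fixed header


def read_cmmm_entry(buf, offset):
    """Parse the CMMM entry that starts at offset in buf."""
    (magic, total_size, secret, ext, _huh1, kind, data_size,
     *_unks, name) = CMMM.unpack_from(buf, offset)
    if magic != b"CMMM":
        raise ValueError("no CMMM entry at offset %#x" % offset)
    # Whatever is left after the fixed header and the data is padding.
    padding = total_size - data_size - CMMM.size
    if padding < 0:
        raise ValueError("inconsistent sizes in entry")
    start = offset + CMMM.size + padding
    return {
        "secret": secret,
        "ext": ext.decode("utf-16-le", errors="replace").rstrip("\x00"),
        "name": name.decode("utf-16-le", errors="replace").rstrip("\x00"),
        "type": kind,
        "data": bytes(buf[start:start + data_size]),
        "next_offset": offset + total_size,  # where the following entry starts
    }
```

Following next_offset from the first entry (right after the file header) is one way to scan a whole content file, stopping when the magic no longer matches.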

The CMMM file can contain a lot of different things; BMPs/JPEGs/etc for thumbnails, pre-rendered folder icons (that include the child thumbnails), and file icons (e.g., the icon for .txt files). The ext field is sometimes populated with the extension, if available. Folder icons and such always seem to be PNG, while small thumbnails (those in the thumbcache_32.db file) seem to be BMPs - probably to save on decode time. The larger sizes seem to be JPGs.
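
Since the ext field isn't always populated, one way to tell what an entry's data actually holds is to look at its first few bytes. The sketch below checks the standard BMP, JPEG, and PNG file signatures; sniff_image_type is a hypothetical helper, not something taken from Explorer.

```python
def sniff_image_type(data):
    """Guess the image format of a thumbnail from its leading bytes.

    Returns "png", "jpg", "bmp", or None if nothing matches.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpg"
    if data.startswith(b"BM"):
        return "bmp"
    return None
```

The returned string can double as the file extension when dumping entries to disk.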

Lookup and the Secret

The actual lookup is fairly easy:

  1. Load the thumbcache_idx.db file
  2. Find the IMMM entry with the secret ID you are looking for
  3. If the offset in the content db you are looking for is not -1, open that content db file
  4. Seek to the offset given in the IMMM entry
  5. Read the CMMM entry header found there, skip any padding, and read dataSize bytes
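
Put together, the steps above might look like the following Python sketch. It assumes you already have the secret from somewhere, that all fields are little-endian, and that the idx entries start right after the 24-byte header. find_offset, read_thumbnail, and lookup are names I made up for illustration.

```python
import struct

IMMMH = struct.Struct("<4s5I")            # idx header
IMMM = struct.Struct("<QQI5I")            # idx entry
CMMM = struct.Struct("<4sIQ8sIII5I32s")   # content entry header (88 bytes)
SIZE_COLUMN = {"32": 0, "96": 1, "256": 2, "1024": 3}
NO_ENTRY = 0xFFFFFFFF


def find_offset(idx, secret, size):
    """Steps 1-3: return the content-file offset for secret, or None."""
    magic, _, _, _, entry_count, _ = IMMMH.unpack_from(idx, 0)
    if magic != b"IMMM":
        raise ValueError("not a thumbcache_idx.db file")
    for i in range(entry_count):
        pos = IMMMH.size + i * IMMM.size
        if pos + IMMM.size > len(idx):
            break
        fields = IMMM.unpack_from(idx, pos)
        if fields[0] == secret:
            offset = fields[3 + SIZE_COLUMN[size]]  # offsets start at field 3
            return None if offset == NO_ENTRY else offset
    return None


def read_thumbnail(content, offset):
    """Steps 4-5: read the image bytes of the CMMM entry at offset."""
    fields = CMMM.unpack_from(content, offset)
    magic, total_size, data_size = fields[0], fields[1], fields[6]
    if magic != b"CMMM" or total_size < CMMM.size + data_size:
        raise ValueError("no valid CMMM entry at offset %#x" % offset)
    start = offset + total_size - data_size  # data sits at the end of the entry
    return bytes(content[start:start + data_size])


def lookup(idx_path, content_path, secret, size):
    """Run the whole lookup against files on disk."""
    with open(idx_path, "rb") as f:
        offset = find_offset(f.read(), secret, size)
    if offset is None:
        return None
    with open(content_path, "rb") as f:
        return read_thumbnail(f.read(), offset)
```

The linear scan over the idx entries is only there to keep the sketch short; if the idx file really is a serialized tree, Explorer presumably does something smarter.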

The only complicated part is generating the secret ID. I currently have no way of doing this - it has to be something containing both the filename and some sort of hash/checksum of the file data, but I don't know exactly what. A few simple tests show that the name matters: renaming a file gets it a new entry in the db, even though the bytes did not change, and saving two identical files under different names also results in two different IDs.

Call for Help

If you know anything about the secret ID - what it could be, how to generate it, etc. - please let me know or post a comment! It's possible to run Explorer through IDA and catch what it does, but my x86 disassembler skills are not advanced enough for that :)

Templates

You can download the 010 Editor templates here: ThumbCache Templates

To use them, open 010 Editor, open the idx and content files, then go to Templates->Open Template. Browse to the .bt file, open it, click into the corresponding document (e.g., the idx.db if you opened the immm.bt template), and hit F10. You should see a small pane appear below the hex that lets you look at all the structures.