Various Oddities Graphics, Game Dev, Emulators, and other geeky stuff

2 Feb 2008

Vista Thumbnail Cache

I had a thought while messing with fast thumbnailing: why generate my own thumbnails when Explorer has already generated them for me? After an hour or so, I was able to code up a thumbs.db reader for XP (and kin, although I never tested it). Vista, on the other hand, has taken a bit longer…

Note: the structures listed in this file are in pseudo-C syntax – I use 010 Editor as my hex editor, and it has some cool templating features – the bits below are just copied out of the templates I made.

If you find this information useful, please link back here!

Thumbs.db

Older versions of Explorer (those in 9x to 2003) would drop a ‘thumbs.db’ file into any directory you browsed that it generated thumbnails for. These files are serialized OLE databases, similar to (pre-2007) Office documents, containing a special ‘Catalog’ of filenames and a list of thumbnail entries. It’s pretty trivial to find the thumbnail for a given file: look through the catalog for the filename and read out the corresponding entry.

Resources

Vinetto thumbnail dumper, Pete’s ThumbDBLib (the comments have useful stuff)

thumbcache_*.db

Instead of dirtying up every directory with a thumbs.db, Vista is a bit smarter – it has a set of files under AppData\Local\Microsoft\Windows\Explorer. There’s the thumbcache_idx.db index, one content file for each size (thumbcache_32.db, thumbcache_96.db, thumbcache_256.db, and thumbcache_1024.db), and finally one other file (thumbcache_sr.db) that I haven’t figured out yet – maybe string resources. When trying to find a thumbnail, Explorer first looks in the idx file, finds the entry it wants, and then uses that information to find the data in the container with the size it’s looking for.

thumbcache_idx.db (IMMM)

There are two basic structures in this file: the header, and the entries.

typedef struct {
    CHAR magic[4]; // IMMM
    DWORD unk1;
    DWORD unk2;
    DWORD headerSize;
    DWORD entryCount;
    DWORD unk4;
} IMMMH;


This header is found at the top of the file. Immediately following it there are IMMMH.entryCount IMMM entries:

typedef struct {
    UQUAD secret<format=hex>;
    FILETIME lastModified;
    UINT unk2;
    UINT offset32<format=hex>;
    UINT offset96<format=hex>;
    UINT offset256<format=hex>;
    UINT offset1024<format=hex>;
    UINT offsetsr<format=hex>;
} IMMM;


The ‘secret’ is a 64-bit identity for the entry – it seems to be based on the file name, data, and maybe modification time. unk2 may be a kind of type – it either seems to be 0 or 1. Alternatively, it could be the color of the node, if the idx file contains a serialized red-black tree (which is what the thumbs.db file is). Finally, the structure has the offsets into the 4 size files and sr file of the entry. These may be set to -1 (0xFFFFFFFF) if the entry does not exist in the files.
Interestingly, a lot of entries in the idx file are zeroed out – this makes me think that the file is some serialized tree with spaces for expansion. In my experiments, the IMMMH.entryCount is the number of entries total, not the number of valid ones. If a secret is 0, I just skip it.
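
Here is a minimal Python sketch of reading the idx file as laid out above. It is not taken from the templates: the names `parse_index`, `IndexEntry`, and `SIZE_NAMES` are made up for illustration, and it assumes the file is little-endian and that entries start right after the 24-byte header. Zeroed entries are skipped, and offsets of 0xFFFFFFFF are reported as `None`.

```python
import struct
from typing import NamedTuple

# Layouts copied from the IMMMH / IMMM templates (assumed little-endian).
IMMM_HEADER = struct.Struct("<4sIIIII")  # magic, unk1, unk2, headerSize, entryCount, unk4
IMMM_ENTRY = struct.Struct("<QQIIIIII")  # secret, lastModified, unk2, five offsets
NO_OFFSET = 0xFFFFFFFF                   # "-1": entry is absent from that file
SIZE_NAMES = ("32", "96", "256", "1024", "sr")


class IndexEntry(NamedTuple):
    secret: int
    last_modified: int  # raw FILETIME value
    unk2: int
    offsets: dict       # size name -> offset, or None when missing


def parse_index(data: bytes) -> list:
    """Return the non-empty IMMM entries found in a thumbcache_idx.db image."""
    magic, _unk1, _unk2, _header_size, entry_count, _unk4 = IMMM_HEADER.unpack_from(data, 0)
    if magic != b"IMMM":
        raise ValueError("not a thumbcache_idx.db file")

    entries = []
    pos = IMMM_HEADER.size
    for _ in range(entry_count):
        if pos + IMMM_ENTRY.size > len(data):
            break  # entryCount ran past the end of the data
        secret, modified, unk2, *raw_offsets = IMMM_ENTRY.unpack_from(data, pos)
        pos += IMMM_ENTRY.size
        if secret == 0:
            continue  # zeroed-out slot, skip it
        offsets = {
            name: (None if off == NO_OFFSET else off)
            for name, off in zip(SIZE_NAMES, raw_offsets)
        }
        entries.append(IndexEntry(secret, modified, unk2, offsets))
    return entries
```

To try it on a real cache, pass in the bytes of thumbcache_idx.db, e.g. `parse_index(open(path, "rb").read())`.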

thumbcache_*.db (CMMM)

The other files seem to be content databases, containing a small header followed by a list of entries like the idx file. These files can be scanned to dump all thumbnails, but for lookup it’s obvious the idx file is used. One interesting thing about this file format is that they seem to allocate the file in large chunks (probably to prevent fragmentation), and include a placeholder entry at the end of the file. When a new thumbnail needs to be added, it’s placed at the end and the placeholder moves down.

typedef struct {
    CHAR magic[4]; // CMMM
    DWORD unk1;
    DWORD unk2;
    DWORD headerSize;
    DWORD offsetLastEntry;
    DWORD entryCount;
} CMMMH;


offsetLastEntry is the offset in the file of the last CMMM entry, and is used for quickly appending entries to the file. Immediately following this header, the entries start:

typedef struct {
    CHAR magic[4]; // CMMM
    DWORD sizeHeaderAndData;
    UQUAD secret<format=hex>;
    CHAR ext[8]; // Unicode - sometimes .txt, .jpg, etc
    DWORD huh1;
    DWORD type; // 0 or 1?
    DWORD dataSize;
    DWORD unk1;
    DWORD unk2;
    DWORD unk3;
    DWORD unk4;
    DWORD unk5;
    CHAR name[32]; // Unicode of some 16 character hex encoding of a string
    if( sizeHeaderAndData - dataSize > 88 )
        CHAR padding[ sizeHeaderAndData - dataSize - 88 ];
    if( dataSize > 0 )
        CHAR data[ dataSize ];
} CMMM;


sizeHeaderAndData is the size of the header (usually 88 or 90 bytes) + dataSize. Note that some entries are empty (dataSize=0). The secret here is the same secret as in the matching IMMM entry of the idx file. The name field is weird, as it’s a Unicode string of the hex of some 8 byte number (not the secret).
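
As a rough illustration, a single CMMM entry could be read like this in Python. This is only a sketch under assumptions: the helper names `parse_cmmm_entry`, `CacheEntry`, and `_decode16` are invented here, the layout is taken as little-endian, and the name field is read as a fixed 32 bytes as in the template. The data is located by counting back from the end of the entry (sizeHeaderAndData - dataSize), so any padding is skipped without knowing its exact length.

```python
import struct
from typing import NamedTuple

# Fixed 88-byte part of a CMMM entry, per the template (assumed little-endian):
# magic, sizeHeaderAndData, secret, ext, huh1, type, dataSize, unk1..unk5, name
CMMM_FIXED = struct.Struct("<4sIQ8sIII5I32s")


class CacheEntry(NamedTuple):
    secret: int
    ext: str
    name: str
    data: bytes
    total_size: int  # sizeHeaderAndData; add to the offset to reach the next entry


def _decode16(raw: bytes) -> str:
    """Decode a UTF-16LE field and cut it at the first NUL character."""
    return raw.decode("utf-16-le", errors="replace").split("\x00", 1)[0]


def parse_cmmm_entry(buf: bytes, pos: int) -> CacheEntry:
    """Parse the CMMM entry that starts at byte offset `pos` of `buf`."""
    (magic, total_size, secret, ext, _huh1, _type, data_size,
     *_unknowns, name) = CMMM_FIXED.unpack_from(buf, pos)
    if magic != b"CMMM":
        raise ValueError(f"no CMMM entry at offset {pos:#x}")
    # Header plus padding occupy everything in the entry that is not data.
    data_start = pos + total_size - data_size
    data = buf[data_start:data_start + data_size]
    return CacheEntry(secret, _decode16(ext), _decode16(name), data, total_size)
```

Starting at the first entry after the 24-byte CMMMH header and repeatedly advancing by `total_size` is one way to walk a whole content file and dump every thumbnail.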

The CMMM file can contain a lot of different things; BMPs/JPEGs/etc for thumbnails, pre-rendered folder icons (that include the child thumbnails), and file icons (e.g., the icon for .txt files). The ext field is sometimes populated with the extension, if available. Folder icons and such always seem to be PNG, while small thumbnails (those in the thumbcache_32.db file) seem to be BMPs – probably to save on decode time. The larger sizes seem to be JPGs.
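
Since the ext field is not always filled in, one way to tell what an entry holds is to look at the first bytes of its data. The sketch below checks the standard BMP, JPEG, and PNG file signatures; the function name `sniff_image_format` is just an illustrative choice, and anything else is reported as unknown.

```python
def sniff_image_format(data: bytes) -> str:
    """Guess the image type of a thumbnail blob from its leading signature bytes."""
    if data.startswith(b"BM"):
        return "bmp"
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"
```

The result can be used to pick a file extension when dumping entries to disk.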

Lookup and the Secret

The actual lookup is fairly easy:

  1. Load the thumbcache_idx.db file
  2. Find the IMMM entry with the secret ID you are looking for
  3. If the offset in the content db you are looking for is not -1, open that content db file
  4. Seek to the position given in the IMMM
  5. Read dataSize bytes
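
The steps above can be sketched in Python as follows. The function name `find_thumbnail` and its parameters are hypothetical, the layouts are assumed little-endian, and the sketch assumes the offset stored in the IMMM entry points at the start of the CMMM entry (its magic), so the entry header is read first to learn dataSize before the data itself is read.

```python
import struct
from pathlib import Path
from typing import Optional

IMMM_HEADER = struct.Struct("<4sIIIII")    # magic, unk1, unk2, headerSize, entryCount, unk4
IMMM_ENTRY = struct.Struct("<QQIIIIII")    # secret, lastModified, unk2, five offsets
CMMM_PREFIX = struct.Struct("<4sIQ8sIII")  # CMMM fields up to and including dataSize
SIZES = ("32", "96", "256", "1024")
NO_OFFSET = 0xFFFFFFFF


def find_thumbnail(cache_dir: Path, secret: int, size: str = "96") -> Optional[bytes]:
    """Return the raw thumbnail bytes for `secret`, or None if it is not cached."""
    # 1. Load the thumbcache_idx.db file.
    idx = (cache_dir / "thumbcache_idx.db").read_bytes()
    magic, _, _, _, entry_count, _ = IMMM_HEADER.unpack_from(idx, 0)
    if magic != b"IMMM":
        raise ValueError("unexpected index magic")
    column = 3 + SIZES.index(size)  # offset32 is the 4th field of an IMMM entry

    # 2. Find the IMMM entry with the secret we are looking for.
    for i in range(entry_count):
        pos = IMMM_HEADER.size + i * IMMM_ENTRY.size
        if pos + IMMM_ENTRY.size > len(idx):
            break
        fields = IMMM_ENTRY.unpack_from(idx, pos)
        if fields[0] != secret:
            continue

        # 3. An offset of -1 means the entry is not in that content db.
        offset = fields[column]
        if offset == NO_OFFSET:
            return None

        # 4. Open the content db and seek to the position given in the IMMM.
        with open(cache_dir / f"thumbcache_{size}.db", "rb") as db:
            db.seek(offset)
            head = db.read(CMMM_PREFIX.size)
            magic, total, found, _ext, _huh1, _type, data_size = CMMM_PREFIX.unpack(head)
            if magic != b"CMMM" or found != secret:
                raise ValueError("index offset does not point at the expected entry")
            # 5. Skip the rest of the header and padding, then read dataSize bytes.
            db.seek(offset + total - data_size)
            return db.read(data_size)
    return None
```

A call might look like `find_thumbnail(Path(cache_dir), secret, "256")`, where `secret` still has to come from somewhere else, such as an entry already seen in the index.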

The only complicated part is generating the secret ID. I currently have no way of doing this – it has to be something containing both the filename and some sort of hash/checksum of the file data, but I don’t know. A few simple tests show this, one being renaming a file – it’ll get a new entry in the db, even though the bytes did not change. Saving two identical files under different names will also result in two different IDs.

Call for Help

If you know anything about the secret ID – what it could be, how it is generated, etc. – please let me know or post a comment! It’s possible to run Explorer through IDA and catch what it does, but my x86 disassembler skills are not advanced enough for that :)

Templates

You can download the 010 Editor templates here: ThumbCache Templates

To use, open 010 Editor, open the idx and content files, then go Templates->Open Template. Browse to the .bt file, open it, click into the corresponding document (e.g., the idx.db if you opened the immm.bt template), and hit F10. You should see a small pane appear below the hex that lets you look at all the structures.

Comments (17) Trackbacks (1)
  1. Ben, you have some good info here.
    I might be able to help out a bit. I’m not a programmer, but I’m in the computer forensics field.
    If you’re interested, shoot me an email.

  2. It appears that in thumbcache_idx.db, when the unk2 field of an entry is 0, the entry is valid in one of the thumbcache*.db files. If the value is 1, there is no entry in any of the thumbcache*.db files. If someone else can confirm this, that would be great.

  3. Also, I found that in the thumbcache_*.db (CMMM) entries, DWORD huh1 is actually the length of the Unicode name field. I found this when I ran across an entry where it happened to be 30 instead of 32. So CHAR name[32]; should be CHAR name[huh1]; Once I made that change, the template was able to find all the entries.

  4. This doesn’t look like it’s been updated in a while, but this info might be useful to others:

    In thumbcache_*.db (CMMM) files, DWORD unk1 in the header info is the length of the header in bytes, after the “CMMM” magic bytes/file signature. This is pretty much set at 0x14 for every CMMM file I’ve found.

    Also in CMMM files, DWORD unk2 in the header specifies what size of thumbnails the cache stores. The possible values are these:

    0 – 32 x 32
    1 – 96 x 96
    2 – 256 x 256
    3 – 1024 x 768

    I can also confirm what Mark has found out, the length of the Char array differs between 30 and 32 for some reason. I can’t see why this happens, as you would assume whatever hashing algorithm they use to generate the ID string would produce a result of constant length.

  5. Take “name[32]” (my example: 3700610062003700350064006400320063003300380035003400360063003400) and convert its value as Unicode to HEX (my example: 7ab75dd2c38546c4). Then find this value in Windows.edb (see C:\ProgramData\Microsoft\Search\Data\Applications\Windows). Have a nice day! ;)

  6. I have done a few simple tests. After I move, touch, or rename the image, it always seems to keep the same entry. The secret ID may have nothing to do with the path, size, or time.
    Is there any unique file identifier in Vista, like a GUID or inode number? I don’t know much about it. Perhaps the ID is a hash of it.

  7. Hey, the size of the header should be either 86h or 88h. The length of name[32] varies: it is either 30 or 32 bytes. The huh1 in CMMM should in fact indicate this value. The huh1 is either 0x20 or 0x1E in most cases. What do you think? What is huh1?

    The unknowns in CMMM should be a checksum. When I modify the image data, the entry is rebuilt.

    btw, is there any idea on the secretID so far?

  8. I am just thinking…
    Will the secretID be the hashed value of the name[32]? It is convenient and easy to implement. The secretID has to be very random, with avalanche effect. It needs not to be two way. It should not be a cipher with key. Perhaps it uses common hash like SHA / MD5 and work on name[32].
    is there anyone suggesting other directions?

    Thx

  9. dm thumbs said that unk2,3 and unk4,5 in CMMM are the CRC64 checksums of the image data and header data respectively. However, after several tests, I found that I cannot reproduce the correct values.
    Is anyone working on them?
    Thx

  10. FUF – you’re right that the Windows.edb file does contain a field for the thumbnail – it’s called “System_ThumbnailCacheId” and is an eight byte value. Until now I hadn’t tried figuring out a way to reverse engineer the metadata for a thumbnail. The only potential fly in the ointment is if Windows Desktop Search is switched off on the machine, the edb file will be empty, which will prevent gathering the original files metadata from the edb file related to the thumbnail.

    This page is a very nice bit of work, well done!

  11. This is an excellent document on Vista thumbcache files, one of the few available on the net.
    Thanks for sharing..

  12. Check out this Thumbs.db Viewer (www.janusware.com) disassembled and displaying headers of Vista thumbcache files (IMMMH, CMMMH), headers of entries (IMMM, CMMM),
    and headers of files, and entries for Thumbs.db files even in its free version. Very nice tool.

  13. In Windows 7, the structures have changed a little -
    1) IMMM entry does not contain the 8 byte lastModified field.
    2) CMMM entry does not contain the 8 byte extension field.
    Additionally -
    1) It seems like the unk1 (IMMMH) as mentioned before in Vista as being the header length is possibly a descriptor for the OS type. x14 being Vista and x15 being Win 7.
    2) IMMM unk2 has changed. Used to be 0 or 1 depending on if the entry was present or not (and subsequently just a filename representation of sorts). Now it’s something like x02800008, x02500008, x82054800. There are quite a few more.

    Can somebody verify or disprove and reply? Would be most helpful to get this thread up and running again.

  14. Jopaque, my researches have led me to the same results, and I completely confirm your comment.

  15. Thanks Dec! Any possible idea what the IMMM unk2 fields pertain to? It seemed to follow a pattern geared toward the filetype, but I didn’t need this information so I stopped short of looking at it in depth. Still curious and would appreciate some additional comments on others’ findings.

  16. It’s been a while since the last comment.

    First of all Ben and others great work. I’ve started with combining your effort and personal additions in a format specification document that can be found on https://sourceforge.net/projects/libwtcdb/.

    It currently only contains info about the cache (database) file format, but an updated version with my findings on the index file will follow.

    Some of my findings, some new some that confirm your earlier additions.

    > dm thumbs said that the unk2,3 and unk4,5 in CMMM are the CRC64 checksum
    > of the image data and header data respectively

    I’ve not validated unk2,3 yet, but unk4,5 are a CRC-64 of the first 48 (Vista) or 40 (Win7) bytes in the CMMM. The initial value of the CRC is -1. I do not know what the polynomial is, but you can find the 256 x 8 byte CRC look-up table in thumbcache.dll.

    CMMMH.unk1 seems to be the format version 20 (Vista), 21 (Win7)
    The Win7 format differs from the Vista one.

    CMMMH.entryCount does not hold for all cache files. E.g. I have a cache file where this value is 377 although it only contains 364 CMMM entries.

    CMMM.type looks more like it is size of the padding.
    For now I’ve only spotted values of 0 and 2.

    CMMM.name is not 32 bytes of size. CMMM.huh1 contains its size.

