The "Go" tools

The RSDS pdb format

by Jeremy Gordon

This file describes the format of the pdb (Program Database) files of the "RSDS" or "DS" type which are emitted by Microsoft's link.exe version 7 and above.

For a description of the earlier "JG" format see Sven B. Schreiber's excellent book, Undocumented Windows 2000 Secrets.

If anyone wishes to add to or correct the information in this file let me know and I'll include your contribution with an appropriate acknowledgement.

What are PDB files?

The latest Microsoft linkers create a Program Database (pdb) file when linking if the /DEBUG or /DEBUG:FULL option is chosen. The pdb file contains information about the creation of the executable, and also the symbol information in the latest CodeView format. The executable contains the path and filename of the pdb file on the local machine, together with an identification code, so that the correct pdb file can be located. Neither the format of the pdb file itself nor the latest CodeView format is documented. To my knowledge, the format has changed twice already and it is likely to change again. Microsoft provides APIs to analyse and report the contents of pdb files in its Debug Information Access (DIA) SDK, but unfortunately this is available only if you subscribe to the Enterprise edition of MSDN or purchase Visual Studio .NET.

PDB file information in the executable

The linker puts the filename of the pdb file made at link-time, and its path on the local machine, in the "CODEVIEW" debug directory in the executable. If this is missing, it's most likely because a dbg file was made instead. This might occur for example if the REBASE program was run after linking. In that case the path and filename of the pdb file will be contained in the dbg file. The filename of the dbg file will then appear in the "MISC" debug directory of the executable.
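Whether the executable has a "CODEVIEW" entry, a "MISC" entry, or both can be checked by reading the entries of the DEBUG directory. The following is a minimal Python sketch, assuming you have already read the raw bytes of the DEBUG directory (its RVA and size come from the DEBUG entry in IMAGE_OPTIONAL_HEADER). Each entry is a 28-byte IMAGE_DEBUG_DIRECTORY structure as laid out in the PE/COFF specification, where type 2 is CODEVIEW and type 4 is MISC; `list_debug_entries` and `DEBUG_TYPE_NAMES` are hypothetical names used only for illustration.

```python
import struct

# One IMAGE_DEBUG_DIRECTORY entry (28 bytes, little-endian):
# Characteristics, TimeDateStamp, MajorVersion, MinorVersion,
# Type, SizeOfData, AddressOfRawData, PointerToRawData
DEBUG_ENTRY = struct.Struct("<IIHHIIII")

# Type codes from the PE/COFF specification (only the common ones here)
DEBUG_TYPE_NAMES = {1: "COFF", 2: "CODEVIEW", 3: "FPO", 4: "MISC"}

def list_debug_entries(raw: bytes):
    """Return (type_name, size_of_data, rva, file_offset) for each entry
    found in the raw bytes of the DEBUG directory."""
    entries = []
    for pos in range(0, len(raw) - DEBUG_ENTRY.size + 1, DEBUG_ENTRY.size):
        (_chars, _stamp, _major, _minor, dbg_type,
         size, rva, file_offset) = DEBUG_ENTRY.unpack_from(raw, pos)
        name = DEBUG_TYPE_NAMES.get(dbg_type, f"type {dbg_type}")
        entries.append((name, size, rva, file_offset))
    return entries
```

If the list contains a MISC entry but no CODEVIEW entry, that points to the dbg-file case described above. For a CODEVIEW entry, the last value (PointerToRawData) is the file offset of the record described below.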

Where the "CODEVIEW" debug directory does contain the pdb file information it will be in the following format:-

+0h   dword        "RSDS" signature
+4h   GUID         16-byte Globally Unique Identifier
+14h  dword        "age"
+18h  byte string  zero terminated UTF8 path and file name
Here the "RSDS" signature identifies the format. The Globally Unique Identifier is a machine specific unique value. It is written both into the executable and into the pdb file so that the two can be identified as a matched pair. The "age" is a value which is incremented each time the executable and its associated pdb file are remade by the linker.
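The record layout above can be read with a few lines of Python. This is a minimal sketch, assuming you already have the bytes of the record (for example from the file offset given by the CODEVIEW entry's PointerToRawData). `parse_rsds` is a hypothetical helper; it returns the GUID as the 16 raw bytes in file order.

```python
import struct

def parse_rsds(record: bytes):
    """Split an RSDS "CODEVIEW" record into (guid_bytes, age, pdb_path)."""
    if record[:4] != b"RSDS":                          # +0h signature
        raise ValueError("not an RSDS record")
    guid = record[4:20]                                # +4h, 16 bytes
    (age,) = struct.unpack_from("<I", record, 0x14)    # +14h, dword
    end = record.find(b"\x00", 0x18)                   # path is zero terminated
    if end == -1:
        end = len(record)
    path = record[0x18:end].decode("utf-8")            # +18h, UTF-8 string
    return guid, age, path
```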

Viewing the "CODEVIEW" debug directory

One of the easiest ways to do this is to use a tool which is capable of displaying the contents of the executable visually. One such tool is Wayne J. Radburn's PEView.
With this tool, open the executable and expand IMAGE_NT_HEADERS in the left pane. Click on IMAGE_OPTIONAL_HEADER and scroll down to the DEBUG directory entry. This gives the RVA (Relative Virtual Address) of the information you are interested in. Make sure the toolbar is switched to show RVA values before going further (an RVA is an address measured from the base address at which the executable is loaded into memory ready to run). The DEBUG directory itself will most likely be buried inside the data of one of the sections, usually the ".rdata" section. Click on that section in the left pane and check that the DEBUG directory now appears. If it does not, try the other sections, looking for the RVA given in IMAGE_OPTIONAL_HEADER. The DEBUG directory contains pointers to the debug information. If the file has a "CODEVIEW" debug directory, it will be listed in the DEBUG directory and will also appear in PEView's left pane. Click on it there to view its contents.
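Trying each section until one contains the RVA is the usual RVA-to-file-offset translation: find the section whose virtual address range contains the RVA, then add the difference to that section's raw data offset in the file. A minimal Python sketch, assuming the section headers have already been read into the hypothetical `Section` tuple, whose fields mirror IMAGE_SECTION_HEADER:

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int   # IMAGE_SECTION_HEADER.VirtualAddress (an RVA)
    virtual_size: int      # IMAGE_SECTION_HEADER.Misc.VirtualSize
    raw_offset: int        # IMAGE_SECTION_HEADER.PointerToRawData
    raw_size: int          # IMAGE_SECTION_HEADER.SizeOfRawData

def rva_to_file_offset(rva: int, sections: list[Section]) -> Optional[int]:
    """Return the file offset of `rva`, or None if no section holds it."""
    for s in sections:
        span = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + span:
            delta = rva - s.virtual_address
            if delta >= s.raw_size:
                return None    # uninitialised data: not stored in the file
            return s.raw_offset + delta
    return None
```

The same translation turns the DEBUG directory's RVA into the file offset at which its bytes can be read.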
[Figure: a typical example of the contents of the "CODEVIEW" debug directory, as displayed by PEView]

Here the GUID is "9122DBB2-E88F-0245-A20556A28496D442". "Age" is 7. Then the path and filename of the pdb file follows. Note that this is supposedly in the UTF-8 format, which means that filenames in non-Roman characters can be used.
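The GUID above is written with its 16 bytes in file order. Most Windows tools print a GUID differently: the first three fields (a dword and two words) are stored little-endian, so they appear byte-reversed. The following Python sketch shows both forms for the same bytes; the function names are hypothetical, and the second form relies on the standard `uuid.UUID(bytes_le=...)` constructor.

```python
import uuid

def guid_as_dumped(raw: bytes) -> str:
    """Group the 16 raw GUID bytes 4-2-2-8 in file order, as in the text."""
    h = raw.hex().upper()
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:32]}"

def guid_as_registry(raw: bytes) -> str:
    """Conventional GUID text, with the first three fields read little-endian."""
    return "{" + str(uuid.UUID(bytes_le=raw)).upper() + "}"

raw = bytes.fromhex("9122DBB2E88F0245A20556A28496D442")
print(guid_as_dumped(raw))    # 9122DBB2-E88F-0245-A20556A28496D442
print(guid_as_registry(raw))  # {B2DB2291-8FE8-4502-A205-56A28496D442}
```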

Viewing the PDB file

Since the PDB format keeps changing, you can't expect visual tools to keep up with it. The files are best viewed using a hex editor such as Paws, or dumped to a file or printed using a hex file-dump program such as Borland's tdump.
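If neither tool is to hand, a few lines of Python produce dumps in the same layout as those in this file: a six-digit hex offset, sixteen bytes with an extra space after the eighth, then the printable ASCII characters with everything else shown as ".". This `hexdump` function is my own sketch and does not claim to reproduce every detail of tdump's output.

```python
def hexdump(data: bytes, base: int = 0) -> str:
    """Format bytes as offset, 16 hex bytes (split 8 + 8), then ASCII."""
    lines = []
    for pos in range(0, len(data), 16):
        chunk = data[pos:pos + 16]
        left = " ".join(f"{b:02X}" for b in chunk[:8])
        right = " ".join(f"{b:02X}" for b in chunk[8:])
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{base + pos:06X}: {left:<23}  {right:<23} {text}")
    return "\n".join(lines)
```

For example, reading the first 50h bytes of TESTGOBUG.PDB with `open(..., "rb").read(0x50)` and passing them to `hexdump` gives the same lines as the header dump shown below; pass `base` when dumping from the middle of a file.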

Nature of the PDB file

As Sven B. Schreiber worked out, the pdb file format is similar to that used by a disk file system. A disk file system is divided into fixed-size blocks of data called "sectors". When a file is written to disk, its data is placed in whichever sectors are free at the time, so those sectors are not necessarily contiguous on the disk. A file directory keeps track of where each file's data is on the disk. In pdb files, it might be more appropriate to call the blocks of data "pages", the data from a file a "stream", and the file directory the "stream directory".

Inside the PDB file - header

At the top of the pdb file is the header which appears in this dump:-
Turbo Dump  Version 4.2.16.1 Copyright (c) 1988, 1996 Borland International
                   Display of File TESTGOBUG.PDB

000000: 4D 69 63 72 6F 73 6F 66  74 20 43 2F 43 2B 2B 20 Microsoft C/C++ 
000010: 4D 53 46 20 37 2E 30 30  0D 0A 1A 44 53 00 00 00 MSF 7.00...DS...
000020: 00 04 00 00 02 00 00 00  E3 00 00 00 B4 04 00 00 ................
000030: 00 00 00 00 E1 00 00 00  00 00 00 00 00 00 00 00 ................
000040: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
The character to look for here is 1Ah, which (in ASCII terms) is an "end-of-file" character. In this case it appears (coincidentally) at offset +1Ah in the file, and it marks the end of the string "Microsoft C/C++ MSF 7.00" and the carriage-return/line-feed pair (0Dh 0Ah) which follows it. Note that the length of this string differs between pdb versions. The end-of-file character is immediately followed by the signature, which in this case is "DS", then a null terminator, and then enough padding to bring the header to the next dword boundary, which in this case is at +20h.
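Because the length of the banner varies, it is safer to search for the 1Ah byte than to assume a fixed offset. A Python sketch of that search, using a hypothetical `read_magic` helper, which also rounds up to the next dword to find where the header fields begin:

```python
def read_magic(data: bytes):
    """Return (banner, signature, fields_offset) from the start of a pdb file.

    banner        - text before the 1Ah end-of-file character, CR/LF removed
    signature     - the null-terminated signature after 1Ah, e.g. "DS"
    fields_offset - first dword boundary after the signature's null byte
    """
    eof = data.index(b"\x1a")
    banner = data[:eof].rstrip(b"\r\n").decode("ascii")
    nul = data.index(b"\x00", eof + 1)
    signature = data[eof + 1:nul].decode("ascii")
    fields_offset = (nul + 1 + 3) & ~3      # round up to a multiple of 4
    return banner, signature, fields_offset
```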

At +20h we find the dword value 400h. This is the size of each block of data which we might call the "page size". In other words the pdb file is divided into blocks of 400h bytes (1,024 bytes in decimal).
At +24h there is the value 2h. I am not yet sure what this represents.
At +28h there is the value 0E3h. This indicates how many pages there are in the whole file. If this is multiplied by the page size of 400h it produces 38C00h or 232,448 which is the size of the pdb file in bytes.
At +2Ch there is the value 4B4h (1,204 decimal). This is the total size of the stream directory in bytes. Since each page is 1,024 bytes, the stream directory covers one complete page plus 180 bytes, so it occupies two pages. This is important because, as we shall see, the stream directory is not necessarily contiguous in the file either.
At +30h there is the value zero. I have not yet discovered what this represents.
At +34h is the value 0E1h. This is the page number of the page holding the stream directory pointers. Multiplied by the page size of 400h, the value 0E1h becomes 38400h, so at 38400h in the file we would expect to find the stream directory pointers.
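The header arithmetic above can be checked with a short Python sketch. It assumes, as in this example, that the six dwords start at +20h. `parse_msf_header` is a hypothetical name, and `unknown_24` and `unknown_30` are my own labels for the two fields whose meaning is not yet known.

```python
import struct

def parse_msf_header(data: bytes, offset: int = 0x20) -> dict:
    """Read the six header dwords and derive the values discussed above."""
    (page_size, unknown_24, page_count, dir_size,
     unknown_30, dir_ptr_page) = struct.unpack_from("<6I", data, offset)
    return {
        "page_size": page_size,
        "page_count": page_count,
        "file_size": page_count * page_size,          # should equal the file length
        "dir_size": dir_size,
        "dir_pages": -(-dir_size // page_size),       # pages needed, rounded up
        "dir_ptr_offset": dir_ptr_page * page_size,   # where the pointers live
        "unknown_24": unknown_24,
        "unknown_30": unknown_30,
    }
```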

Inside the PDB file - stream directory pointers

Here is a dump of the file at 38400h holding the stream directory pointers:-
038400: DF 00 00 00 E0 00 00 00  00 00 00 00 00 00 00 00 ................
038410: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
038420: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
The stream directory pointers are in a very simple structure. We know from the pdb header that the stream directory is in two pages so we would expect two pointers. And we can see that the pointers are 0DFh and 0E0h. Pointers are needed because the stream directory is not necessarily contiguous in the file. To get the correct address each pointer needs to be multiplied by the page size of 400h. So we can see that the first page of the stream directory is at 0DFh * 400h = 37C00h, and then it continues at 0E0h * 400h = 38000h.
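Following these pointers is a matter of reading as many dwords as the directory has pages, multiplying each by the page size, and joining the pages. A Python sketch with hypothetical helper names, taking the values read from the header as arguments:

```python
import struct

def directory_page_offsets(data: bytes, dir_ptr_offset: int,
                           dir_size: int, page_size: int) -> list:
    """Return the file offsets of the pages holding the stream directory."""
    n = -(-dir_size // page_size)                  # pages needed, rounded up
    pages = struct.unpack_from(f"<{n}I", data, dir_ptr_offset)
    return [p * page_size for p in pages]

def read_stream_directory(data: bytes, dir_ptr_offset: int,
                          dir_size: int, page_size: int) -> bytes:
    """Join the (possibly non-contiguous) directory pages, trimmed to size."""
    offsets = directory_page_offsets(data, dir_ptr_offset, dir_size, page_size)
    return b"".join(data[o:o + page_size] for o in offsets)[:dir_size]
```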

Inside the PDB file - stream directory

The stream directory is a structure in the following form:-
+0h dword  number of streams
+4h a dword for each stream giving the size in bytes of the stream
     0=no stream
    -1=no stream
+?h array of page pointers to the streams, stream by stream, starting immediately after the last stream size
Here is a dump of the file at 37C00h:-
037C00: 15 00 00 00 48 03 00 00  59 00 00 00 98 F2 02 00 ....H...Y.......
037C10: D7 07 00 00 00 00 00 00  D0 0A 00 00 6C 03 00 00 ............l...
037C20: 18 11 00 00 AA 14 00 00  FF FF FF FF 19 00 00 00 ................
037C30: 70 00 00 00 B4 05 00 00  68 01 00 00 1C 00 00 00 p.......h.......
037C40: FF FF FF FF FF FF FF FF  FF FF FF FF FF FF FF FF ................
037C50: FF FF FF FF C8 00 00 00  D9 00 00 00 DE 00 00 00 ................
037C60: DC 00 00 00 18 00 00 00  19 00 00 00 1A 00 00 00 ................
037C70: 1B 00 00 00 1C 00 00 00  1D 00 00 00 1E 00 00 00 ................
037C80: 1F 00 00 00 20 00 00 00  21 00 00 00 22 00 00 00 .... ...!..."...
037C90: 23 00 00 00 24 00 00 00  25 00 00 00 26 00 00 00 #...$...%...&...
037CA0: 27 00 00 00 28 00 00 00  29 00 00 00 2A 00 00 00 '...(...)...*...
037CB0: 2B 00 00 00 2C 00 00 00  2D 00 00 00 2E 00 00 00 +...,...-.......
037CC0: 2F 00 00 00 30 00 00 00  31 00 00 00 32 00 00 00 /...0...1...2...
037CD0: 33 00 00 00 34 00 00 00  35 00 00 00 36 00 00 00 3...4...5...6...
037CE0: 37 00 00 00 38 00 00 00  39 00 00 00 3A 00 00 00 7...8...9...:...
The first dword contains the value 15h. This indicates that there are 21 streams of data in the file. It also means that 21 dwords follow, giving the stream sizes. The page pointers therefore start at +58h (4 + 21 * 4 = 88 bytes, or 58h), which is at 37C58h in the file.
The stream sizes indicate how many page pointers there are for each stream. This is the same system as is used to indicate how many pointers there are to the stream directory itself.

So for example, we can see that stream 1 is 348h bytes long. This can be fitted into one page, so we would expect to find only one pointer to stream 1. This pointer (at 37C58h) is 0D9h, which multiplied by the page size of 400h is 36400h.
Stream 2 is 59h bytes long and its pointer is 0DEh * 400h = 37800h.
Stream 3 is 2F298h bytes (193,176 decimal) long. It therefore covers 189 pages and has 189 pointers starting at 37C60h. Its first page is at 0DCh (37000h), its second page is at 18h (6000h), its third page is at 19h (6400h) and so on.

Some stream sizes are either zero or -1 and these can be ignored. There will be no page pointer at all for these streams.
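The rules above can be put together in a Python sketch (helper names hypothetical) which splits the stream directory into a page list for each stream and then reassembles a stream from its pages. Sizes are read as signed dwords so that FFFFFFFFh appears as -1.

```python
import struct

def parse_stream_directory(directory: bytes, page_size: int) -> list:
    """Return [(size, [page numbers]), ...] for each stream, in order.

    Streams of size 0 or -1 have no page pointers at all.
    """
    (count,) = struct.unpack_from("<I", directory, 0)
    sizes = struct.unpack_from(f"<{count}i", directory, 4)
    pos = 4 + 4 * count                  # first page pointer (+58h for 21 streams)
    streams = []
    for size in sizes:
        n = -(-size // page_size) if size > 0 else 0    # pages, rounded up
        pages = list(struct.unpack_from(f"<{n}I", directory, pos)) if n else []
        pos += 4 * n
        streams.append((size, pages))
    return streams

def read_stream(data: bytes, size: int, pages: list, page_size: int) -> bytes:
    """Concatenate a stream's pages from the whole pdb file, trimmed to size."""
    chunks = [data[p * page_size:(p + 1) * page_size] for p in pages]
    return b"".join(chunks)[:max(size, 0)]
```

Note that the returned list is numbered from zero, so what this file calls "stream 1" is at index 0, and the symbols stream (the eighth or ninth) is at index 7 or 8.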

The streams

I have not tried very hard to identify the contents of the streams, since it is reasonably easy to find the main one of interest (symbols). As in the "JG" type pdb files, the symbols stream is either the eighth or the ninth stream. Streams 1 to 4 always seem to contain the same type of information. Above stream 4 the contents of the streams tend to vary: sometimes streams are missing altogether or other streams are added. So far I have not found an index indicating what the streams contain. The ones I have identified so far are:-
  • Stream 1 - (possibly) previous stream directory.
  • Stream 2 - pdb file authenticity.
  • Stream 3 - material from the .debug$S and .debug$T sections in the object file. This can be voluminous, since it will contain a lot of unused material, for example structures and structure members from include files referred to in the source script.
  • Stream 4 - files used in the build process.
  • Stream 8 or 9 - symbols.
  • Above stream 8 you will find section data, other debug symbols, the linker's own file information and linked import information.

Stream 2 - pdb file authenticity

This stream is important because it allows a check to be made that the pdb file matches the executable concerned. Here is a dump of the file at 37800h holding the pdb file authenticity information:-
037800: 94 2E 31 01 25 55 1A 40  07 00 00 00 91 22 DB B2 ..1.%U.@....."..
037810: E8 8F 02 45 A2 05 56 A2  84 96 D4 42 11 00 00 00 ...E..V....B....
037820: 2F 4C 69 6E 6B 49 6E 66  6F 00 2F 6E 61 6D 65 73 /LinkInfo./names
037830: 00 02 00 00 00 04 00 00  00 01 00 00 00 06 00 00 ................
037840: 00 00 00 00 00 0A 00 00  00 0A 00 00 00 00 00 00 ................
037850: 00 04 00 00 00 00 00 00  00 00 00 00 00 00 00 00 ................
If you compare this with the "age" and GUID in the "CODEVIEW" debug directory which we saw in the executable, you can see that there is an exact match. Here the age is at +8h and the GUID is at +0Ch. There is also a time/date stamp at +4h, but this will not necessarily match the one in the executable.
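The comparison can be sketched in Python using the offsets above. The first dword is not described in the text, but in this dump it is 01312E94h, which is 20000404 in decimal and looks like a version number (LLVM's PDB documentation lists 20000404 as the "VC70" PDB stream version). `parse_pdb_info` and `pdb_matches` are hypothetical names, and requiring the age to match exactly simply follows the comparison made above.

```python
import struct

def parse_pdb_info(stream: bytes):
    """Read (first_dword, time_date_stamp, age, guid_bytes) from stream 2."""
    first, stamp, age = struct.unpack_from("<3I", stream, 0)   # +0h, +4h, +8h
    guid = stream[0x0C:0x1C]                                   # +0Ch, 16 bytes
    return first, stamp, age, guid

def pdb_matches(stream: bytes, exe_guid: bytes, exe_age: int) -> bool:
    """True if the GUID and age equal those in the executable's RSDS record."""
    _, _, age, guid = parse_pdb_info(stream)
    return guid == exe_guid and age == exe_age
```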

Symbol stream

In "DS" files each symbol is in the following structure, which is similar to that found in the earlier "JG" files except that the symbol type numbers have changed and the string containing the symbol name is no longer preceded by a size byte (i.e. it is no longer a Pascal string):-
+0h word - size of structure not including this word but
                  including the padding after the string
+2h word - type of symbol.  So far the following are known:-
           1108h = data type (from h or inc file)
           110Ch = symbol marked as "static" in the object file
           110Eh = global data variables, function names, imported functions,
                   local variables
           1125h = function prototype
+4h dword - reserved
+8h dword - offset value
+0Ch word - section number
+0Eh bytes - null terminated string containing symbol name
+?h bytes - padding to next dword
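A Python sketch of walking the symbol stream record by record. It assumes, as the text does, that every record follows the layout above; records too short for the fixed fields are skipped. `iter_symbols` is a hypothetical name, and since the text does not say how names are encoded, decoding them as UTF-8 is an assumption.

```python
import struct

def iter_symbols(stream: bytes):
    """Yield (symbol_type, offset, section, name) for each symbol record."""
    pos = 0
    while pos + 2 <= len(stream):
        (size,) = struct.unpack_from("<H", stream, pos)  # excludes this word
        end = pos + 2 + size
        if end > len(stream):
            break                                        # truncated record
        if size >= 0x0C:                                 # room for fixed fields
            rec = stream[pos + 2:end]                    # rec[0] is record +2h
            sym_type, _reserved, offset, section = struct.unpack_from("<HIIH", rec)
            nul = rec.find(b"\x00", 0x0C)                # name at record +0Eh
            name = rec[0x0C:nul if nul != -1 else len(rec)]
            yield sym_type, offset, section, name.decode("utf-8", "replace")
        pos = end                      # size already includes the padding
```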


Copyright © Jeremy Gordon 2004