Friday, May 23, 2014

Astrophotography - Unraveling the Nikon RAW image format

I took some pictures with a Nikon D90.

I set the camera to save the photos in the RAW image format. I thought I would log some of the details about this image format as I slowly unravel how to decode it. This is meant to be an educational exercise, as all of this has already been done.
All of it can be worked out with a simple hex editor. After that, use your favourite programming language to read in the data.
This first post is only about the TIFF structure of the file and completely ignores the rest of the image data in the Nikon format.

Preliminary Header


It turns out the image, like most RAW images, follows a TIFF file format. The format is as follows (in hex format, linux command "hexdump -C myimage.NEF | less"):


byte pos  data                                             ASCII chars
00000000  4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00  |MM.*............|

This is the first line of output that I see. Now, let me break it up into what it represents:
  1. First 2 bytes : 0x4d4d. The ASCII (ascii being a lookup table from 1 byte to a character) representation is MM. A TIFF file will actually start with only one of two possible combinations: MM or II. This refers to the endianness of the file. MM refers to big endian and II refers to little endian.

    Computers read things in byte chunks (8 bits). However, the order of the bytes within a multi-byte number is not fixed. Big endian puts the byte with the largest powers first, while little endian does the opposite. Basically, a two-byte number positioned in memory:
    43 F2
    would be read as 0x43F2 in big endian and 0xF243 in little endian, and so forth... Note that you need both a byte count and an endianness to read in a number. The following string of bytes:
    04 5A C7 B3
    will yield different values as a 2-byte big endian (0x045A), 2-byte little endian (0x5A04), 4-byte big endian (0x045AC7B3) and 4-byte little endian (0xB3C75A04).
    So Nikon RAW images tend to be written in big endian with a few small exceptions.
  2. Next two bytes: the Magic number 42. Every TIFF file will always have the 3rd and 4th byte yield the magic number 42. What is 42 in hexadecimal? Well, it's 0x2a. The data we see reads:
    00 2a
    In big endian for 2 bytes, that gives 0x002a = 0x2a = 16*2 + 10 = 42
  3. The next four bytes: the position of the first IFD. The next four bytes give the position of what is called the Image File Directory. This will be explained later but is basically a header for the image in TIFF files. We can read that our next image file directory is in position:
    0x00000008=0x8 = 8
    This turns out to actually just be the next position in memory (as it is *most* of the time).
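The byte-order examples from item 1 can be checked quickly in Python. This is a minimal sketch using the built-in int.from_bytes; the helper name read_uint is just an illustrative choice and not part of any TIFF or NEF library.

```python
def read_uint(data, byte_order):
    """Interpret all of `data` as one unsigned integer.

    byte_order is "big" (TIFF "MM") or "little" (TIFF "II").
    """
    return int.from_bytes(data, byte_order)


pair = bytes.fromhex("43 F2")
quad = bytes.fromhex("04 5A C7 B3")

# Same bytes, different byte counts and byte orders.
for label, chunk in (("2 bytes", pair), ("first 2 of 4", quad[:2]), ("all 4 bytes", quad)):
    big = read_uint(chunk, "big")
    little = read_uint(chunk, "little")
    print(f"{label:>12}: big endian 0x{big:X}, little endian 0x{little:X}")
```

The point of the loop is that the same bytes give a different number for every combination of byte count and byte order, which is why both have to be known before reading anything.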

    Okay, so from now on, for notational brevity, I will call a 1-byte number a char, a 2-byte number a short, a 4-byte number an int and an 8-byte number a long.
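In that notation, the preliminary header is two chars (byte order), a short (the magic number) and an int (position of the first IFD). Here is a rough sketch of reading it with Python's struct module. The function name read_tiff_header is my own invention for illustration, and the input is just the first 8 bytes from the hexdump line above.

```python
import struct


def read_tiff_header(first8):
    """Return (struct byte-order prefix, offset of the first IFD)."""
    order = first8[:2]
    if order == b"MM":
        endian = ">"   # big endian
    elif order == b"II":
        endian = "<"   # little endian
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # One short (magic number) followed by one int (first IFD position).
    magic, ifd_offset = struct.unpack(endian + "HI", first8[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return endian, ifd_offset


header = bytes.fromhex("4d 4d 00 2a 00 00 00 08")
endian, ifd_offset = read_tiff_header(header)
print(endian, ifd_offset)
```

For a real file you would pass the first 8 bytes read from it, for example open("myimage.NEF", "rb").read(8).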

The Image File Directory (IFD)


All right, so we understand how to tell if a file is a TIFF file. We have a description of endianness, a magic number and finally, the position of the header of the image. Now, what is this header?

Basically, the header is simple. Let's go through one:

00000000 4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00
00000010 00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00000020 00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02
00000030 00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00
00000040 00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02
00000050 00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10
00000060 00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00000070 00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00000080 00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00000090 00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
000000a0 00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00
000000b0 01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c
000000c0 00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00
000000d0 00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00
000000e0 01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a
000000f0 00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00
00000100 00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00
00000110 01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03
00000120 00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00
00000130 00 04 01 00 00 00 00 00 00 00

The IFD goes as follows:
  1. The first short represents the number of Image File Directory (IFD) entries. Here it is shown as:
    00 19
    This means there are 0x0019 = 0x19 = 16*1+9 = 25 entries
  2. The next few bytes are a list of Image File Directory (IFD) entries. Each IFD entry is always exactly 12 bytes long, and contains information about the image (height, width, location of image data etc). This means, if we skip 12*25 bytes ahead, we should reach the end of the list of IFD entries.
  3. The last int (4 bytes) is the location of the next IFD. Files with only one image will have this set to zero, but it is possible to have a sequence of images. Here it simply reads:
    00 00 00 00
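The three parts of the IFD listed above (entry count, 12-byte entries, pointer to the next IFD) can be sketched as follows. This is only an illustration: read_ifd is a made-up name, and the sample bytes are a cut-down IFD assembled from the dump above, with the count changed from 0x19 to 2 so that it stays short.

```python
import struct


def read_ifd(data, offset, endian=">"):
    """Return (list of raw 12-byte entries, offset of the next IFD)."""
    (n_entries,) = struct.unpack(endian + "H", data[offset:offset + 2])
    first = offset + 2
    end = first + 12 * n_entries          # skip 12 bytes per entry
    entries = [data[pos:pos + 12] for pos in range(first, end, 12)]
    (next_ifd,) = struct.unpack(endian + "I", data[end:end + 4])
    return entries, next_ifd


# Cut-down example: TIFF header, a count of 2, the first two entries
# from the dump, then a next-IFD pointer of zero.
sample = bytes.fromhex(
    "4d 4d 00 2a 00 00 00 08"
    "00 02"
    "00 fe 00 04 00 00 00 01 00 00 00 01"
    "01 00 00 04 00 00 00 01 00 00 00 a0"
    "00 00 00 00"
)
entries, next_ifd = read_ifd(sample, 8)
print(len(entries), next_ifd)

# For the real file: IFD at 8, 25 entries, so it ends at 8 + 2 + 12*25 + 4.
print(hex(8 + 2 + 12 * 25 + 4))
```

The last line is a sanity check against the dump, which stops exactly where the 25 entries and the 4-byte next-IFD pointer end.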
Great, so we now know where the header is. How do we read the IFD entries?
Let's pick the first IFD entry from the data posted above (convince yourself you found it):
00 fe 00 04 00 00 00 01 00 00 00 01

The IFD entry has the following format:
  1. First short (2 bytes) is the tag identifier
  2. Next short is the type.
    1. A value of 1 or 2 means it's a byte in size (the latter being interpreted as an ASCII character)
    2. A value of 3 means it's a short in size
    3. A value of 4 means it's an int in size
    4. A value of 5 means it's a fraction of two ints, with the first being the numerator.
  3. Next int is the count of occurrences of this type
  4. Next int is either the data or a pointer to the location of the data. The rule is that if the size of the data (the size of the type times the count) can fit into this space (4 bytes), then this location will contain the data. If it does not fit, then this will be a pointer to the actual data. Data smaller than 4 bytes sits in the first bytes of this space, so a single short occupies the first two bytes and the last two are padding. Note that type 5 will never fit into this location since it is 8 bytes in size.
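To make the entry format concrete, here is a small sketch that splits one 12-byte entry into its parts and applies the "data or pointer" rule. It only knows the five types listed above, and the names parse_ifd_entry, TYPE_SIZES and TYPE_CODES are hypothetical, chosen for this example.

```python
import struct

# Bytes per element for the types listed above (1 to 5).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}
# struct format characters for the numeric types that can fit inline.
TYPE_CODES = {1: "B", 3: "H", 4: "I"}


def parse_ifd_entry(entry, endian=">"):
    """Split one 12-byte IFD entry into tag, type, count and value or pointer."""
    if len(entry) != 12:
        raise ValueError("an IFD entry is exactly 12 bytes")
    tag, typ, count = struct.unpack(endian + "HHI", entry[:8])
    field = entry[8:12]
    total = TYPE_SIZES[typ] * count
    result = {"tag": tag, "type": typ, "count": count, "value": None, "pointer": None}
    if total > 4:
        # Too big to fit: the field is a file offset to the real data.
        (result["pointer"],) = struct.unpack(endian + "I", field)
    elif typ == 2:
        # ASCII: keep the raw characters.
        result["value"] = field[:count]
    else:
        # Inline numbers sit at the start of the 4-byte field.
        result["value"] = struct.unpack(endian + str(count) + TYPE_CODES[typ], field[:total])
    return result


first = parse_ifd_entry(bytes.fromhex("00 fe 00 04 00 00 00 01 00 00 00 01"))
bits = parse_ifd_entry(bytes.fromhex("01 02 00 03 00 00 00 03 00 00 01 3c"))
print(first)
print(bits)
```

The first call uses the entry picked out above (tag 254, one int, stored inline). The second is the fourth entry in the dump: three shorts need 6 bytes, so its last field is a pointer instead of data.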
Here is a table of the relevant tags:

tag id  Name                       Description
254     NewSubfileType
256     ImageWidth                 Width of image (pixels per row)
257     ImageLength                Length (height) of image (number of rows)
258     BitsPerSample              Number of bits per sample
259     Compression                Type of compression
262     PhotometricInterpretation  Type of image (grayscale or color)
271     Make
272     Model
273     StripOffsets               Location of each strip of data
274     Orientation
277     SamplesPerPixel            Number of samples per pixel
278     RowsPerStrip               Number of rows (of image) per strip
279     StripByteCounts            Number of bytes in each strip
282     XResolution                Resolution of image (not relevant here)
283     YResolution                Resolution of image (not relevant here)
284     PlanarConfiguration
296     ResolutionUnit
305     Software
306     DateTime
330     SubIFDs                    Location of further IFDs; Nikon uses it for the IFD headers of its images
532     ReferenceBlackWhite

That's that for the IFD entry. Let's look at the ones that are most relevant to us for now:
Image width (tag 0x100 = 256) and Image length (tag 0x101 = 257):
00000010 00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00000020 00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02

The image width entry (the 12 bytes starting at offset 0x16) reads:
0x0100 = 256; 0x0004: type 4 (int); 0x00000001: 1 occurrence; 0x000000a0 = 160
By the same reasoning, the image length entry (the 12 bytes starting at offset 0x22) gives
0x00000078 = 120 as the image length. It turns out that the image being described is a little 160 x 120 pixel TIFF image! In fact, Nikon saves a thumbnail version of the actual image in TIFF format. This means that if you put a NEF file through a simple image reader, it would register it as a TIFF and display this crappy-resolution image.

Anyway, let's just get the relevant data for the image for now, which is located in tags 0x111 = 273 (StripOffsets) and 0x116 = 278 (RowsPerStrip).
The first is the location of each strip of image data and the second is the number of rows in each strip. It turns out there is only one strip offset and one rows-per-strip value (if there were more, then the count part of the entry would be greater than 1), located here:
00000060 00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00000070 00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00000080 00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00000090 00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00

You should be able to tell me that the answers for StripOffsets and RowsPerStrip are simply 0x0000d4e4 = 54500 and 0x00000078 = 120.
So the image data is located at position 54500 in the file (position 0 being the beginning of the file).
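If you would rather not pick these values out by eye, the whole header posted earlier can be walked in a few lines. The sketch below embeds that dump (offset column removed) and collects every entry whose value is stored inline. The names DUMP_TEXT and read_inline_tags are made up for this example, and big endian is hard-coded because this file starts with MM.

```python
import struct

# The header dump from earlier in the post, without the offset column.
DUMP_TEXT = """
4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00
00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02
00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00
00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02
00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10
00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00
01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c
00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00
00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00
01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a
00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00
00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00
01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03
00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00
00 04 01 00 00 00 00 00 00 00
"""
DUMP = bytes.fromhex(" ".join(DUMP_TEXT.split()))

SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}   # bytes per element, types 1 to 5
CODES = {1: "B", 3: "H", 4: "I"}          # struct codes for inline numbers


def read_inline_tags(data):
    """Return {tag: first value} for numeric entries that fit in 4 bytes."""
    (ifd,) = struct.unpack(">I", data[4:8])
    (n_entries,) = struct.unpack(">H", data[ifd:ifd + 2])
    tags = {}
    for i in range(n_entries):
        start = ifd + 2 + 12 * i
        tag, typ, count = struct.unpack(">HHI", data[start:start + 8])
        if typ in CODES and SIZES[typ] * count <= 4:
            value_bytes = data[start + 8:start + 8 + SIZES[typ]]
            (tags[tag],) = struct.unpack(">" + CODES[typ], value_bytes)
    return tags


tags = read_inline_tags(DUMP)
for name, tag in (("ImageWidth", 0x100), ("ImageLength", 0x101),
                  ("StripOffsets", 0x111), ("RowsPerStrip", 0x116)):
    print(name, tags[tag])
```

Entries that hold a pointer (strings, fractions, anything longer than 4 bytes) are skipped here, since the data they point to lies beyond the part of the file shown in the dump.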

How much data?

Before we can read it, we just need to know how big each data element is in the image (there are 120*160=19200 pixels, but how many bits is each one?)
You can find this by looking for these three tags:
  1. 0x102 = 258 (BitsPerSample). The value will be 8, which means that each sample is 8 bits, or 1 byte. Note that this entry has a count of 3 (one short per sample), which does not fit in 4 bytes, so the entry in the header above holds a pointer (0x13c) to the values rather than the values themselves.
  2. 0x115 = 277 (SamplesPerPixel). The value is 3. It turns out this is because each pixel contains an R, G, B byte color element, which the next tag should confirm.
  3. 0x106 = 262 (PhotometricInterpretation). This tells us how to interpret the pixel data. A value of 0 or 1 means grayscale (one is the inverted version of the other), which means just one sample per pixel. A value of 2 means RGB: each pixel has three samples giving the amounts of the three primary colors, red, green and blue (in that order). The value for this image is 2, as we expect.

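Okay, so now let's read it. For this part, you need your favorite image reader or plotter. You should be reading in an array of 3 x 160 x 120 = 57600 bytes (one per sample) from offset 54500 in the file. This matches the StripByteCounts entry (tag 0x117) in the header, whose value is 0xe100 = 57600.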
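If you do not have a plotter handy, one option is to wrap the raw bytes in a binary PPM header, which many image viewers can open. This is a sketch rather than what I actually used: it assumes a single uncompressed strip of 8-bit RGB samples (which is what the tags above describe), the default numbers are the ones read from this particular file, and thumbnail_to_ppm and thumbnail.ppm are hypothetical names.

```python
def thumbnail_to_ppm(nef_bytes, offset=54500, width=160, length=120, samples=3):
    """Wrap the raw RGB strip in a binary PPM (P6) header."""
    n_bytes = width * length * samples        # 1 byte per sample
    pixels = nef_bytes[offset:offset + n_bytes]
    if len(pixels) != n_bytes:
        raise ValueError("file is too short to contain the thumbnail strip")
    header = f"P6\n{width} {length}\n255\n".encode("ascii")
    return header + pixels


if __name__ == "__main__":
    import sys
    # Usage: python thumb.py myimage.NEF thumbnail.ppm
    if len(sys.argv) == 3:
        with open(sys.argv[1], "rb") as src:
            ppm = thumbnail_to_ppm(src.read())
        with open(sys.argv[2], "wb") as dst:
            dst.write(ppm)
```

For another NEF file the offset and dimensions should be taken from its own IFD entries rather than from the defaults here.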

The thumbnail of the NEF
And voilà, the result. I have used Yorick for all my photo processing, but I intend on rewriting it in Python, which is a friendlier language.