Astro DIY: May 2014

I took some pictures with a Nikon D90.

I shot the photos in the RAW image format. I thought I would log some of the details about this image format as I slowly unravel how to decode it. This is meant to be an educational exercise, as all of this has already been done.
All the understanding can be done using a simple hex editor. After this, use your favourite programming language to read in the data.
This first post will simply be about TIFF files and completely ignore the rest of the image data for the Nikon format.

Preliminary Header

It turns out the image, like most RAW images, follows the TIFF file format. The start of the file looks as follows (in hex format, from the linux command "hexdump -C myimage.NEF | less"):

byte pos	data	ASCII chars
00000000	4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00	|MM.*............|

This is the first line of the dump. Now, let me break it up into what it represents:

First 2 bytes : 0x4d4d. The ASCII (ascii being a lookup table from 1 byte to a character) representation is MM. A TIFF file will actually start with only one of two possible combinations: MM or II. This refers to the endianness of the file. MM refers to big endian and II refers to little endian.

Computers read things in byte chunks (8 bits). However, the order of the bytes within a multi-byte number is not fixed. Big endian stores the most significant byte (the largest powers) first, while little endian does the opposite. Basically, a two-byte number positioned in memory:
43 F2
would be read as 0x43F2 in big endian and 0xF243 in little endian, and so forth... Note that you need to give a byte count and an endianness to read in a number. Having the following string of bytes:
04 5A C7 B3
will yield different values as a 2-byte big endian (0x045A), 2-byte little endian (0x5A04), 4-byte big endian (0x045AC7B3) and 4-byte little endian (0xB3C75A04).
So Nikon RAW images tend to be written in big endian with a few small exceptions.
Next two bytes: the Magic number 42. Every TIFF file will always have the 3rd and 4th byte yield the magic number 42. What is 42 in hexadecimal? Well, it's 0x2a. The data we see reads:
00 2a
In big endian (this file starts with MM) for 2 bytes, that gives 0x002a = 0x2a = 16*2 + 10 = 42
The next four bytes: the position of the first IFD. These give the position of what is called the Image File Directory. This will be explained later but is basically a header for the image in TIFF files. We can read that our first image file directory is in position:
0x00000008=0x8 = 8
This turns out to actually just be the next position in memory (as it is *most* of the time).
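The three fields described above can be read in a few lines. This is a minimal sketch, assuming Python and its standard struct module; the function name parse_tiff_header is my own, and the 8 bytes are copied from the dump above.

```python
import struct

# The first 8 bytes of the dump above.
header = bytes.fromhex("4d 4d 00 2a 00 00 00 08")

def parse_tiff_header(buf):
    """Return (struct byte-order prefix, offset of the first IFD)."""
    order = buf[:2]
    if order == b"MM":
        prefix = ">"   # big endian
    elif order == b"II":
        prefix = "<"   # little endian
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    # One short (the magic number) followed by one int (the IFD offset).
    magic, first_ifd = struct.unpack(prefix + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return prefix, first_ifd

prefix, first_ifd = parse_tiff_header(header)
print(prefix, first_ifd)
```

The returned prefix can be reused for every later struct call, so the rest of the reader does not need to care which byte order the file uses.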

Okay, so from now on, for notational brevity, I will call a 1-byte number a char, a 2-byte number a short, a 4-byte number an int and an 8-byte number a long.
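These four sizes line up with format codes in Python's struct module (B, H, I and Q when a byte-order prefix is given), and the 04 5A C7 B3 example from the endianness discussion can be checked the same way. A small sketch, where read_uint is a helper name I made up:

```python
import struct

# With an explicit "<" or ">" prefix, struct uses fixed sizes:
# B = char (1 byte), H = short (2), I = int (4), Q = long (8).
sizes = {code: struct.calcsize(">" + code) for code in "BHIQ"}
print(sizes)

def read_uint(buf, size, byteorder):
    """Read the first `size` bytes of buf as an unsigned integer."""
    return int.from_bytes(buf[:size], byteorder)

raw = bytes.fromhex("04 5A C7 B3")
for size in (2, 4):
    for byteorder in ("big", "little"):
        value = read_uint(raw, size, byteorder)
        print(f"{size}-byte {byteorder} endian: 0x{value:0{2 * size}X}")
```

Both int.from_bytes and struct.unpack give the same numbers; struct is handier once several fields have to be read in one go.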

The Image File Directory (IFD)

All right, so we understand how to tell if a file is a TIFF file. We have a description of endianness, a magic number and finally, the position of the header of the image. Now, what is this header?

The header is simple. Let's go through one:

00000000	4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00
00000010	00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00000020	00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02
00000030	00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00
00000040	00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02
00000050	00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10
00000060	00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00000070	00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00000080	00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00000090	00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
000000a0	00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00
000000b0	01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c
000000c0	00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00
000000d0	00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00
000000e0	01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a
000000f0	00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00
00000100	00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00
00000110	01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03
00000120	00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00
00000130	00 04 01 00 00 00 00 00 00 00

The IFD goes as follows:

The first short represents the number of Image File Directory (IFD) entries. Here it is shown as:
00 19
This means there are 0x0019 = 0x19 = 16*1+9 = 25 entries
The next few bytes are a list of Image File Directory (IFD) entries. Each IFD entry is always exactly 12 bytes long, and contains information about the image (height, width, location of image data etc). This means, if we skip 12*25 bytes ahead, we should reach the end of the list of IFD entries.
The last int (4 bytes) is the location of the next IFD. Files with only one image will have this set to zero, but it is possible to have a sequence of images. Here it simply reads:
00 00 00 00
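To check the layout just described against the dump, here is a sketch that reads the entry count, slices out the 12-byte entries and reads the next-IFD pointer. It assumes big endian (as found in the header) and the IFD position 8; read_ifd and DUMP are names I chose, and DUMP is simply the bytes of the dump above with the offsets removed.

```python
import struct

# The 314 bytes shown in the dump above (offsets 0x000 to 0x139).
DUMP = bytes.fromhex("""
4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00
00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02
00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00
00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02
00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10
00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00
01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c
00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00
00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00
01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a
00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00
00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00
01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03
00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00
00 04 01 00 00 00 00 00 00 00
""")

def read_ifd(buf, offset, prefix=">"):
    """Return (list of raw 12-byte entries, offset of the next IFD)."""
    (count,) = struct.unpack_from(prefix + "H", buf, offset)
    start = offset + 2                      # entries follow the count
    entries = [buf[start + 12 * i: start + 12 * (i + 1)] for i in range(count)]
    (next_ifd,) = struct.unpack_from(prefix + "I", buf, start + 12 * count)
    return entries, next_ifd

entries, next_ifd = read_ifd(DUMP, 8)
tags = [struct.unpack(">H", e[:2])[0] for e in entries]
print(len(entries), "entries, next IFD at", next_ifd)
print(tags)
```

The tag list comes out in increasing order, which makes it easy to spot the entry you are after when scanning a dump by eye.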

Great, so we now know where the header is. How do we read the IFD entries?
Let's pick the first IFD entry from the data posted above (convince yourself you found it):
00 fe 00 04 00 00 00 01 00 00 00 01

The IFD entry has the following format:

First short (2 bytes) is the tag identifier
Next short is the type.

A value of 1 or 2 means it's a byte in size (the latter being interpreted as an ASCII character)
A value of 3 means it's a short in size
A value of 4 means it's an int in size
A value of 5 means it's a fraction of two ints, with the first being the numerator.

Next int is the count: the number of occurrences of this type
Next int is either the data or a pointer to the location of the data. The rule is that if the size of the data (count times the size of the type) can fit into this space (4 bytes), then this location will contain the data. If it does not fit, then this will be a pointer to the actual data. Note that type 5 will never fit into this location since it is 8 bytes in size.
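The entry format can be sketched as a small decoder. This is my own illustration rather than a complete TIFF reader: decode_entry and TYPE_FMT are made-up names, only the five types listed above are handled, and big endian is assumed. One detail worth knowing from the TIFF specification is that data smaller than 4 bytes sits in the leftmost bytes of the last field, which is why a single short such as 3 shows up as 00 03 00 00.

```python
import struct

# struct code and size in bytes for one element of each type listed above.
TYPE_FMT = {1: ("B", 1), 2: ("c", 1), 3: ("H", 2), 4: ("I", 4), 5: ("II", 8)}

def decode_entry(entry, prefix=">"):
    """Return (tag, type, count, values, offset) for one 12-byte IFD entry.

    values is a tuple when the data fits in the entry, otherwise None;
    offset is the pointer to the data when it does not fit, otherwise None.
    """
    tag, typ, count = struct.unpack(prefix + "HHI", entry[:8])
    code, size = TYPE_FMT[typ]
    field = entry[8:12]
    if count * size <= 4:
        # The data is stored in the leftmost bytes of the 4-byte field.
        values = struct.unpack(prefix + code * count, field[:count * size])
        return tag, typ, count, values, None
    (offset,) = struct.unpack(prefix + "I", field)
    return tag, typ, count, None, offset

# Entries copied from the dump above.
first = bytes.fromhex("00 fe 00 04 00 00 00 01 00 00 00 01")
samples = bytes.fromhex("01 15 00 03 00 00 00 01 00 03 00 00")
bits = bytes.fromhex("01 02 00 03 00 00 00 03 00 00 01 3c")
for entry in (first, samples, bits):
    print(decode_entry(entry))
```

The third entry holds three shorts (6 bytes), so the decoder reports a pointer instead of values; the data itself has to be read from that position in the file.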

Here is a table of the relevant tags:

tag id	Name	Description
254	NewSubfileType
256	ImageWidth	Width of image (number of pixels per row)
257	ImageLength	Length of image (number of rows)
258	BitsPerSample	Number of bits per sample
259	Compression	Type of Compression
262	PhotometricInterpretation	Type of image (grayscale or color)
271	Make
272	Model
273	StripOffsets	Location of each strip of data
274	Orientation
277	SamplesPerPixel	Number of samples per pixel
278	RowsPerStrip	Number of Rows (of image) per strip
279	StripByteCounts	Total byte counts of each strip
282	XResolution	Resolution of image (not relevant here)
283	YResolution	Resolution of image (not relevant here)
284	PlanarConfiguration
296	ResolutionUnit
305	Software
306	DateTime
330	SubIFDs	Location of further IFDs (Nikon uses these for the IFD headers of its own images)
532	ReferenceBlackWhite

That's that for the IFD entry. Let's look at the ones that are most relevant to us for now:
Image width (tag 0x100 = 256) and Image length (tag 0x101 = 257):

00000010	00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00000020	00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02

The image width entry (the 12 bytes starting at offset 0x16) reads:
0x0100 = 256 (tag); 0x0004: type 4 (int); 0x00000001: 1 occurrence; 0x000000a0 = 160
By the same reasoning, the image length entry (the 12 bytes starting at offset 0x22) gives
0x00000078 = 120 as the image length. It turns out that the image being described is a little 160 x 120 pixel TIFF image! In fact, Nikon saves a thumbnail version of the actual image in TIFF format. This means that if you put a NEF file through a simple image reader, it would register it as a TIFF and display this crappy resolution image.

Anyway, let's just get the relevant data for the image for now, which is located in tags 0x111=273 (StripOffsets) and 0x116 = 278 (RowsPerStrip).
The first gives the location of the image data and the second the number of rows in each strip. It turns out there is only one strip offset (if there were more, the count part of the entry would be greater than 1). Both entries are located here:

00000060	00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00000070	00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00000080	00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00000090	00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00

You should be able to tell me that the answers for the StripOffset and RowsPerStrip are simply 0x0000d4e4 = 54500 and 0x00000078 = 120.
So the image data is located at position 54500 in the file (position 0 being the beginning of the file).
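If you would rather let the computer do the looking, the four dump lines above can be scanned directly. A sketch under a few assumptions: the excerpt starts at file offset 0x60, the entries start at 0x0a (8-byte header plus the 2-byte count) and are 12 bytes each, and only entries of type 4 (int) with a count of 1 are of interest. find_int_entry is a name I invented.

```python
import struct

BASE = 0x60  # file offset of the first byte in the excerpt
EXCERPT = bytes.fromhex("""
00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
""")

def find_int_entry(buf, base, wanted_tag, first_entry=0x0A):
    """Return the value of a type-4, count-1 entry with the given tag, or None."""
    # File offset of the first complete entry that starts inside the excerpt.
    skipped = (base - first_entry + 11) // 12
    pos = first_entry + 12 * skipped - base
    while pos + 12 <= len(buf):
        tag, typ, count, value = struct.unpack_from(">HHII", buf, pos)
        if tag == wanted_tag and typ == 4 and count == 1:
            return value
        pos += 12
    return None

strip_offset = find_int_entry(EXCERPT, BASE, 0x111)
rows_per_strip = find_int_entry(EXCERPT, BASE, 0x116)
print(strip_offset, rows_per_strip)
```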

How much data?

Before we can read it, we just need to know how big each data element is in the image (there are 120*160=19200 pixels, but how many bits is each one?)
You can find this by looking for these three tags:

0x102 = 258 (BitsPerSample). This entry has a count of 3 (one short per sample), which does not fit in 4 bytes, so it holds a pointer (0x13c) to the data rather than the data itself. Each of the three shorts stored there is 8, which means that each sample is 8 bits, or 1 byte.
0x115 = 277 (SamplesPerPixel). The value is 3. It turns out this is because each pixel contains an R, G and B byte, which the next tag should confirm.
0x106 = 262 (PhotometricInterpretation). This gives the information of how to interpret the pixel data. A value of 0 or 1 means grayscale (one is the inverted version of the other) which means just one sample per pixel. A value of 2 means RGB, or each pixel will have three samples containing information on the amount of the three primary colors, red, green and blue there are (in that order). The value for this image is 2 as we expect.

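Okay, so now let's read it. For this part, you need your favorite image reader or plotter. You should be reading in an array of 3 x 160 x 120 = 57600 bytes (one byte per sample, 19200 pixels) from offset 54500 in the file. This agrees with the StripByteCounts entry (tag 0x117) in the header, which reads 0x0000e100 = 57600.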
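For readers without a plotting tool at hand, one option is to dump the bytes into a binary PPM file, a very simple format (a short text header followed by raw 8-bit RGB triplets) that many image viewers open. The sketch below is not what I used; it assumes the data is uncompressed and interleaved as R, G, B (the Compression and PlanarConfiguration entries in the dump above both hold 1, which is what TIFF uses for those cases). read_strip and to_ppm are names I made up, and thumbnail.ppm is a hypothetical output file.

```python
import io

def read_strip(f, offset, width, height, samples=3, bits=8):
    """Read one uncompressed strip of width*height pixels from an open file."""
    nbytes = width * height * samples * bits // 8
    f.seek(offset)
    data = f.read(nbytes)
    if len(data) != nbytes:
        raise ValueError("file too short for the requested strip")
    return data

def to_ppm(rgb, width, height):
    """Wrap raw 8-bit RGB bytes in a binary PPM (P6) header."""
    return b"P6\n%d %d\n255\n" % (width, height) + rgb

# With the real file, using the numbers found above:
# with open("myimage.NEF", "rb") as f:
#     rgb = read_strip(f, 54500, 160, 120)
# with open("thumbnail.ppm", "wb") as out:
#     out.write(to_ppm(rgb, 160, 120))

# Tiny synthetic stand-in (16 padding bytes, then a 2x2 RGB image).
fake = io.BytesIO(bytes(16) + bytes(range(12)))
pixels = read_strip(fake, 16, 2, 2)
print(len(pixels), to_ppm(pixels, 2, 2)[:11])
```

Note that the Orientation entry is ignored here, so the thumbnail may come out rotated.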

The thumbnail of the NEF

And voila, the result. I have used yorick for all my photo processing but I intend to rewrite it in Python, which is a friendlier language.


Friday, May 23, 2014

Astrophotography - Unraveling the Nikon RAW image format
