I selected the photos to be in the RAW image format. I thought I would log some of the details about this image format as I slowly unravel how to decode it. This is meant to be an educational process as all of this has already been done.
All the understanding can be done using a simple hex editor. After this, use your favourite programming language to read in the data.
This first post will simply be about TIFF files and completely ignore the rest of the image data for the Nikon format.
Preliminary Header
It turns out the image, like most RAW images, follows a TIFF file format. The format is as follows (in hex format, linux command "hexdump -C myimage.NEF | less"):
00000000 | 4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00 | |MM.*............| |
This is the first line of the dump. Now, let me break it up into what it represents:
- First 2 bytes : 0x4d4d. The ASCII (ascii being a lookup table from 1 byte to a character) representation is MM. A TIFF file will actually start with only one of two possible combinations: MM or II. This refers to the endianness of the file. MM refers to big endian and II refers to little endian.
Computers read things in byte chunks (8 bits). However, the order of the bytes within a multi-byte number is not fixed. Big endian puts the byte holding the largest powers first, while little endian does the opposite. Basically, a two-byte number positioned in memory:
43 F2
would be read as 0x43F2 in big endian and 0xF243 in little endian, and so forth... Note that you need both a byte count and an endianness to read in a number. Having the following string of numbers:
04 5A C7 B3
will yield different values as a 2-byte big endian (0x045A), 2-byte little endian (0x5A04), 4-byte big endian (0x045AC7B3) and 4-byte little endian (0xB3C75A04).
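To check these readings without doing the arithmetic by hand, here is a minimal Python sketch. The helper names `read_uint` and `as_hex` are my own, chosen for illustration; the only library call is Python's built-in `int.from_bytes`.

```python
def read_uint(data, byteorder):
    """Interpret the given bytes as one unsigned integer ('big' or 'little')."""
    return int.from_bytes(data, byteorder)


def as_hex(value, nbytes):
    """Format value as hex, zero-padded to the width of nbytes bytes."""
    return f"0x{value:0{2 * nbytes}x}"


# The two byte strings used as examples in the text.
pair = bytes.fromhex("43 f2")
quad = bytes.fromhex("04 5a c7 b3")

# Read the 2-byte example, then the first 2 bytes and all 4 bytes of the
# second example, once in each byte order.
for data in (pair, quad[:2], quad):
    for byteorder in ("big", "little"):
        value = read_uint(data, byteorder)
        print(len(data), "bytes,", byteorder, "endian:", as_hex(value, len(data)))
```

The same bytes give a different number for each combination of byte count and byte order, which is why both have to be known before reading anything.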
So Nikon RAW images tend to be written in big endian with a few small exceptions.
- Next two bytes: the magic number 42. In every TIFF file, the 3rd and 4th bytes yield the magic number 42. What is 42 in hexadecimal? Well, it's 0x2a. The data we see reads:
00 2a
In big endian for 2 bytes, that gives 0x002a = 0x2a = 16*2 + 10 = 42
- The next four bytes: the position of the first IFD. These four bytes give the position of what is called the Image File Directory. This will be explained later but is basically a header for the image in TIFF files. We can read that our first image file directory is in position:
0x00000008=0x8 = 8
This turns out to be just the position right after these 8 bytes in the file (as it is *most* of the time).
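The three checks above (byte order mark, magic number, position of the first IFD) can be sketched in a few lines of Python. This is only an illustration under my own naming: `parse_tiff_header` is a hypothetical helper, and the 16 bytes are copied from the hexdump line rather than read from a file.

```python
import struct

# First 16 bytes of the NEF file, copied from the hexdump line above.
header = bytes.fromhex("4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00")


def parse_tiff_header(data):
    """Return (struct byte-order prefix, offset of the first IFD)."""
    byte_order = data[:2]
    if byte_order == b"MM":
        endian = ">"  # big endian
    elif byte_order == b"II":
        endian = "<"  # little endian
    else:
        raise ValueError("not a TIFF file: bad byte order mark")
    # One 2-byte magic number followed by one 4-byte offset.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file: magic number is not 42")
    return endian, first_ifd


endian, first_ifd = parse_tiff_header(header)
print("byte order prefix:", endian, "first IFD at:", first_ifd)
```

Returning the `struct` prefix (`>` or `<`) rather than the letters MM or II is a convenience: it can be reused directly for every later read.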
Okay, so from now on, for notational brevity, I will call a 1-byte number a char, a 2-byte number a short, a 4-byte number an int and an 8-byte number a long.
The Image File Directory (IFD)
All right, so we understand how to tell if a file is a TIFF file. We have a description of endianness, a magic number and, finally, the position of the header of the image. Now, what is this header?
Basically, the header is simple. Let's go through one:
00000000 | 4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00 |
00000010 | 00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00 |
00000020 | 00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02 |
00000030 | 00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00 |
00000040 | 00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02 |
00000050 | 00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10 |
00000060 | 00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00 |
00000070 | 00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08 |
00000080 | 00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16 |
00000090 | 00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00 |
000000a0 | 00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00 |
000000b0 | 01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c |
000000c0 | 00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00 |
000000d0 | 00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00 |
000000e0 | 01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a |
000000f0 | 00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00 |
00000100 | 00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00 |
00000110 | 01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03 |
00000120 | 00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00 |
00000130 | 00 04 01 00 00 00 00 00 00 00 |
The IFD goes as follows:
- The first short represents the number of Image File Directory (IFD) entries. Here it is shown as:
00 19
This means there are 0x0019 = 0x19 = 16*1+9 = 25 entries.
- The next few bytes are a list of Image File Directory (IFD) entries. Each IFD entry is always exactly 12 bytes long and contains information about the image (height, width, location of image data, etc.). This means that if we skip 12*25 = 300 bytes ahead, we should reach the end of the list of IFD entries.
- The last int (4 bytes) is the location of the next IFD. Files with only one image will have this set to zero, but it is possible to have a sequence of images. Here it simply reads:
00 00 00 00
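The layout just described (entry count, then 12-byte entries, then the position of the next IFD) can be walked with a short Python sketch. The bytes are copied from the hexdump above; `read_ifd` is a name I made up, and the entries are kept as raw 12-byte chunks for now.

```python
import struct

# The first 0x13a bytes of the file, copied from the hexdump above.
DUMP = bytes.fromhex("""
4d 4d 00 2a 00 00 00 08 00 19 00 fe 00 04 00 00
00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00
00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02
00 03 00 00 00 03 00 00 01 3c 01 03 00 03 00 00
00 01 00 01 00 00 01 06 00 03 00 00 00 01 00 02
00 00 01 0f 00 02 00 00 00 12 00 00 01 44 01 10
00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
00 01 00 00 e1 00 01 1a 00 05 00 00 00 01 00 00
01 64 01 1b 00 05 00 00 00 01 00 00 01 6c 01 1c
00 03 00 00 00 01 00 01 00 00 01 28 00 03 00 00
00 01 00 02 00 00 01 31 00 02 00 00 00 0a 00 00
01 74 01 32 00 02 00 00 00 14 00 00 01 80 01 4a
00 04 00 00 00 02 00 00 01 94 02 14 00 05 00 00
00 06 00 00 01 9c 87 69 00 04 00 00 00 01 00 00
01 e0 88 25 00 04 00 00 00 01 00 00 d4 d2 90 03
00 02 00 00 00 14 00 00 01 cc 92 16 00 01 00 00
00 04 01 00 00 00 00 00 00 00
""")


def read_ifd(data, offset, endian=">"):
    """Return (list of raw 12-byte entries, offset of the next IFD)."""
    # The first short is the number of entries.
    (count,) = struct.unpack_from(endian + "H", data, offset)
    start = offset + 2
    entries = [data[start + 12 * i:start + 12 * (i + 1)] for i in range(count)]
    # The int right after the last entry points at the next IFD (0 = none).
    (next_ifd,) = struct.unpack_from(endian + "I", data, start + 12 * count)
    return entries, next_ifd


entries, next_ifd = read_ifd(DUMP, 8)
print("entries:", len(entries), "next IFD:", next_ifd)
```

With the IFD at position 8, the entries occupy bytes 10 to 309 and the next-IFD pointer sits in the last four bytes of the dump.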
Let's pick the first IFD entry from the data posted above (convince yourself you found it):
00 fe 00 04 00 00 00 01 00 00 00 01
The IFD entry has the following format:
- First short (2 bytes) is the tag identifier
- Next short is the type.
  - A value of 1 or 2 means it's a byte in size (the latter being interpreted as an ASCII character)
  - A value of 3 means it's a short in size
  - A value of 4 means it's an int in size
  - A value of 5 means it's a fraction of two ints, with the first being the numerator.
- Next int is the count: the number of occurrences of this type
- Next int is either the data or a pointer to the location of the data. The rule is that if the size of the data can fit into this space (4 bytes), then this location will contain the data. If the data is shorter than 4 bytes, it sits in the first bytes of this space. If it does not fit, then this will be a pointer to the actual data. Note that type 5 will never fit into this location since it is 8 bytes in size.
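Here is a rough Python sketch of that entry format, including the "data or pointer" rule. The names `decode_entry`, `TYPE_SIZES` and `INLINE_FORMATS` are mine, only the five types listed above are handled, and ASCII data is returned as plain byte values to keep things short.

```python
import struct

# Size in bytes of one element of each type listed above.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}
# struct codes for the types that can fit in the 4-byte field.
INLINE_FORMATS = {1: "B", 2: "B", 3: "H", 4: "I"}


def decode_entry(entry, endian=">"):
    """Decode one 12-byte IFD entry into a dict (any other type raises KeyError)."""
    tag, type_id, count = struct.unpack_from(endian + "HHI", entry, 0)
    field = entry[8:12]
    nbytes = TYPE_SIZES[type_id] * count
    if nbytes <= 4:
        # The data fits: it sits in the first nbytes bytes of the field.
        fmt = endian + str(count) + INLINE_FORMATS[type_id]
        values = struct.unpack(fmt, field[:nbytes])
        pointer = None
    else:
        # The data does not fit: the field is a pointer to it.
        values = None
        (pointer,) = struct.unpack(endian + "I", field)
    return {"tag": tag, "type": type_id, "count": count,
            "values": values, "pointer": pointer}


# Three entries copied from the hexdump above.
first = bytes.fromhex("00 fe 00 04 00 00 00 01 00 00 00 01")
bits_per_sample = bytes.fromhex("01 02 00 03 00 00 00 03 00 00 01 3c")
orientation = bytes.fromhex("01 12 00 03 00 00 00 01 00 08 00 00")

for raw in (first, bits_per_sample, orientation):
    print(decode_entry(raw))
```

The third example shows why the position of short data inside the field matters: the field reads 00 08 00 00, and the single short is taken from its first two bytes.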
tag id | Name | Description |
254 | NewsubFileType | |
256 | ImageWidth | Width of image (number of columns) |
257 | ImageLength | Length of image (number of rows) |
258 | BitsPerSample | Number of bits per sample |
259 | Compression | Type of Compression |
262 | PhotometricInterpretation | Type of image (grayscale or color) |
271 | Make | |
272 | Model | |
273 | StripOffsets | Location of each strip of data |
274 | Orientation | |
277 | SamplesPerPixel | Number of samples per pixel |
278 | RowsPerStrip | Number of Rows (of image) per strip |
279 | StripByteCounts | Total byte counts of strip |
282 | XResolution | Resolution of image (not relevant here) |
283 | YResolution | Resolution of image (not relevant here) |
284 | PlanarConfiguration | |
296 | ResolutionUnit | |
305 | Software | |
306 | DateTime | |
330 | SubIFDs | Not a baseline TIFF tag: location of the IFD headers that Nikon uses for its images |
532 | ReferenceBlackWhite | |
That's that for the IFD entry. Let's look at the ones that are most relevant to us for now:
Image width (tag 0x100 = 256) and Image length (tag 0x101 = 257):
00000010 | 00 01 00 00 00 01 01 00 00 04 00 00 00 01 00 00 |
00000020 | 00 a0 01 01 00 04 00 00 00 01 00 00 00 78 01 02 |
The image width entry reads:
0x0100 = 256 (tag); 0x0004: type 4 (int); 0x00000001: 1 occurrence; 0x000000a0 = 160
By the same reasoning, one sees that the image length entry gives
0x00000078 = 120 as the image length. It turns out that the image being described is a little 160 x 120 pixel TIFF image! In fact, Nikon saves a thumbnail version of the actual image in TIFF format. This means that if you put a NEF file through a simple image reader, it would register it as a TIFF and display this crappy resolution image.
Anyway, let's just get the relevant data for the image for now, which is located in tags 0x111=273 (StripOffsets) and 0x116 = 278 (RowsPerStrip).
The first gives the location of the image data and the second gives the number of rows in each strip. It turns out there is only one strip offset (if there were more, the count part of the entry would be greater than 1). Both entries are located here:
00000060 | 00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00 |
00000070 | 00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08 |
00000080 | 00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16 |
00000090 | 00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00 |
You should be able to tell me that the answers for the StripOffset and RowsPerStrip are simply 0x0000d4e4 = 54500 and 0x00000078 = 120.
So the image data is located at position 54500 in the file (position 0 being the beginning of the file).
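If you would rather let the computer do the lookup, here is a small Python sketch that scans the four dump lines above for the two tags. It assumes what we already found: big endian byte order, and an IFD at position 8, so the StripOffsets entry (the 9th entry) starts at 8 + 2 + 8*12 = 0x6a. The variable names are my own.

```python
import struct

# The four hexdump lines above, covering file positions 0x60 to 0x9f.
chunk = bytes.fromhex("""
00 02 00 00 00 0a 00 00 01 58 01 11 00 04 00 00
00 01 00 00 d4 e4 01 12 00 03 00 00 00 01 00 08
00 00 01 15 00 03 00 00 00 01 00 03 00 00 01 16
00 04 00 00 00 01 00 00 00 78 01 17 00 04 00 00
""")
CHUNK_START = 0x60
FIRST_ENTRY = 0x6A  # 8 (IFD position) + 2 (entry count) + 8 * 12 (earlier entries)

STRIP_OFFSETS = 0x111
ROWS_PER_STRIP = 0x116

found = {}
pos = FIRST_ENTRY - CHUNK_START
while pos + 12 <= len(chunk):  # stop at the entry that is cut off by the excerpt
    tag, type_id, count, value = struct.unpack_from(">HHII", chunk, pos)
    if tag in (STRIP_OFFSETS, ROWS_PER_STRIP):
        # Both entries are type 4 (int) with count 1, so the last int is the value.
        found[tag] = value
    pos += 12

print("StripOffsets:", found[STRIP_OFFSETS])
print("RowsPerStrip:", found[ROWS_PER_STRIP])
```

Reading the last field as one int is only valid here because both entries hold a single int; shorter or longer data would need the type and count taken into account.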
How much data?
Before we can read it, we just need to know how big each data element is in the image (there are 120*160=19200 of them, but how many bits is each element?). You can find this by looking for these three tags:
- 0x102 = 258 (BitsPerSample). The value is 8, which means that each sample is 8 bits, or 1 byte. Note that this entry has a count of 3 (one value per sample), which takes 6 bytes, so the entry in the header above holds a pointer (0x13c) to the three values rather than the values themselves.
- 0x115 = 277 (SamplesPerPixel). The value is 3. It turns out this is because each pixel contains an R, G, B byte color element, which the next tag should confirm.
- 0x106 = 262 (PhotometricInterpretation). This tells us how to interpret the pixel data. A value of 0 or 1 means grayscale (one is the inverted version of the other), which means just one sample per pixel. A value of 2 means RGB: each pixel has three samples giving the amounts of the three primary colors red, green and blue (in that order). The value for this image is 2, as we expect.
Okay, so now let's read it. For this part, you need your favorite image reader or plotter. You should be reading in an array of 3 x 160 x 120 = 57600 bytes (three per pixel) from offset 54500 in the file.
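As a sketch of that read in Python: the function below pulls one strip out of a file-like object and groups the bytes into rows of (R, G, B) pixels. `read_rgb_strip` is a hypothetical helper of mine. It assumes 8-bit samples stored uncompressed and interleaved per pixel, which is what the Compression and PlanarConfiguration entries (both 1) in the header above indicate. Since I can't attach the NEF here, the demo runs on a made-up buffer of the same size.

```python
import io


def read_rgb_strip(stream, offset, width, length, samples_per_pixel=3):
    """Read one uncompressed strip of 8-bit samples as rows of pixel tuples."""
    row_bytes = width * samples_per_pixel
    nbytes = row_bytes * length
    stream.seek(offset)
    raw = stream.read(nbytes)
    if len(raw) != nbytes:
        raise ValueError("file too short for the requested strip")
    rows = []
    for r in range(length):
        row = raw[r * row_bytes:(r + 1) * row_bytes]
        # Group the row into pixels of samples_per_pixel bytes each.
        rows.append([tuple(row[i:i + samples_per_pixel])
                     for i in range(0, row_bytes, samples_per_pixel)])
    return rows


# Stand-in for the real file: 54500 filler bytes, then 57600 bytes of fake
# pixel data (the values 0..255 repeated 225 times).
fake_file = io.BytesIO(bytes(54500) + bytes(range(256)) * 225)

rows = read_rgb_strip(fake_file, offset=54500, width=160, length=120)
print(len(rows), "rows of", len(rows[0]), "pixels; first pixel:", rows[0][0])
```

On a real file, the same call would take a handle opened with `open("myimage.NEF", "rb")` in place of `fake_file`, and the resulting rows can be handed to whatever plotting tool you prefer.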
The thumbnail of the NEF
And voila, the result. I have used Yorick for all my photo processing but I intend to rewrite it in Python, which is a friendlier language.