Most of what I have learned about PDFs comes from the above reference. If you are really interested, take the time to read it. Surprisingly, it is easy and interesting to read! See pages vi to viii of PDF32000_2008 in particular. If you were to open a PDF file in a text editor like Notepad, the contents would look like junk and probably not very interesting.

But it will make a little more sense once you understand that PDF files follow a set pattern. The PDF Imaging Model enables text and graphics to be described in a device-independent and resolution-independent manner. Unlike PostScript, which is a programming language, PDF is based on a structured binary file format that is optimized for high performance in interactive viewing. PDF also includes objects, such as annotations and hypertext links, that are not part of the page content itself but are useful for interactive viewing and document interchange.

The basic building blocks of a PDF file are objects. There are eight types of objects used in PDF files. Before we look at them, we will briefly look at the character set of PDFs, starting with the white-space characters. These are Null, Horizontal tab, Line feed, Form feed, Carriage return and Space. White-space characters separate names and other objects from each other.
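To make the list of white-space characters concrete, here is a minimal Python sketch that maps each one to its byte value and uses them to split raw bytes into tokens. The names PDF_WHITESPACE and split_tokens are made up for this illustration, and the splitter is deliberately naive: it assumes the input contains no strings, streams, comments or delimiter characters, all of which a real PDF parser has to handle.

```python
from typing import List

# The six PDF white-space characters, keyed by byte value.
PDF_WHITESPACE = {
    0x00: "Null",
    0x09: "Horizontal tab",
    0x0A: "Line feed",
    0x0C: "Form feed",
    0x0D: "Carriage return",
    0x20: "Space",
}


def split_tokens(data: bytes) -> List[bytes]:
    """Split raw bytes into tokens at PDF white-space characters.

    Simplifying assumption: the input has no strings, streams,
    comments or delimiter characters, so white space is the only
    separator that matters.
    """
    tokens = []
    current = bytearray()
    for byte in data:
        if byte in PDF_WHITESPACE:
            # White space ends the current token, if there is one.
            if current:
                tokens.append(bytes(current))
                current = bytearray()
        else:
            current.append(byte)
    if current:
        tokens.append(bytes(current))
    return tokens


if __name__ == "__main__":
    # A Null, a tab, a line feed and a space all act as separators.
    print(split_tokens(b"/Type\x00/Page\t\n 42"))
```

Note that Null is a separator here even though most text editors would not show it as a blank.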

Interestingly, PDF treats all white-space characters outside a comment, string or stream as equivalent, and any run of consecutive white-space characters counts as a single one. What this means is that you may have 5 spaces in a row, but they are treated as one. Within strings, streams and comments, white-space characters keep their individual meaning. EOL markers play a very important role in showing where one line ends and the next begins. A Carriage return or a Line feed on its own is an EOL marker, and a Carriage return followed immediately by a Line feed is considered as one EOL marker. These are used in the objects we will look at later.
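The two rules above can be sketched in a few lines of Python. The helper names collapse_whitespace and split_lines are invented for this example, and both assume the bytes they receive lie outside any string, stream or comment, where the rules do not apply. The line splitter also treats a lone Carriage return or a lone Line feed as an EOL marker, which is how ISO 32000-1 defines them.

```python
import re

# A run of one or more PDF white-space bytes: NUL, HT, LF, FF, CR, SP.
_WS_RUN = re.compile(rb"[\x00\t\n\x0c\r ]+")

# An EOL marker. CR LF is listed first so the pair is consumed
# as one marker rather than as two separate ones.
_EOL = re.compile(rb"\r\n|\r|\n")


def collapse_whitespace(data: bytes) -> bytes:
    """Replace every run of white-space bytes with a single space.

    Assumes the input lies outside strings, streams and comments.
    """
    return _WS_RUN.sub(b" ", data)


def split_lines(data: bytes) -> list:
    """Split bytes into lines at EOL markers (CR LF, CR or LF)."""
    return _EOL.split(data)


if __name__ == "__main__":
    print(collapse_whitespace(b"1     0  obj"))
    print(split_lines(b"first\r\nsecond\rthird\nfourth"))
```

Putting CR LF first in the pattern is the important design choice: without it, a Windows-style line ending would be read as two EOL markers with an empty line in between.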