The unified diff between revisions [6cecc3ad..] and [11a5f681..] is displayed below. It can also be downloaded as a raw diff.

This diff has been restricted to the following files: 'doc/internals.tex'

#
#
# rename "doc/misc/internals.tex"
#     to "doc/internals.tex"
#
# patch "doc/internals.tex"
#  from [5ded8f53ba8442a6179653e9b857fc5b4bc9efcc]
#    to [5f7a178b979db398b7466a420aadafac98945909]
#
============================================================
--- doc/misc/internals.tex	5ded8f53ba8442a6179653e9b857fc5b4bc9efcc
+++ doc/internals.tex	5f7a178b979db398b7466a420aadafac98945909
@@ -12,7 +12,7 @@

 \title{Botan Internals}
 \author{Jack Lloyd (lloyd@randombit.net)}
-\date{July 30, 2002}
+\date{August 20, 2006}

 \newcommand{\filename}[1]{\texttt{#1}}
 \newcommand{\manpage}[2]{\texttt{#1}(#2)}
@@ -32,90 +32,114 @@ \section{Introduction}
 \section{Introduction}

 This document is intended to document some of the trickier and/or more
-complicated parts of Botan. This is not going to be terribly useful if you
-just want to use the library, but for people wishing to understand how it
-works, or contribute new code to it, it will hopefully prove helpful.
+complicated parts of Botan. This is not going to be terribly useful if
+you just want to use the library, but for people wishing to understand
+how it works, or contribute new code to it, it will hopefully prove
+helpful.

-I've realized that a lot of things Botan does internally are pretty hard to
-understand, and that a lot of things are only inside my head, which is a bad
-place for them to be (things tend to get lost in there, not to mention the
-possibility that I'll get hit by a truck next week).
+I've realized that a lot of things Botan does internally are pretty
+hard to understand, and that a lot of things are only inside my head,
+which is a bad place for them to be (things tend to get lost in there,
+not to mention the possibility that I'll get hit by a truck next
+week).

-This document is currently very incomplete. I'll be working on it as I have
-time.
+This document is currently very incomplete. I'll be working on it as I
+have time.

 \pagebreak

 \section{Filter}

-Need something here.
+\type{Filter} is one of the core abstractions of the library. It is
+used to represent any sort of transformation. Nearly all
+\type{Filter}s are linear; they take input from a single source and
+send their output (if any) to another single \type{Filter}. The one
+exception is \type{Fanout\_Filter}, which uses friend access to
+\type{Filter} in order to allow for multiple \type{Filter}s to attach
+to its output. This special access is used by the Chain and Fork
+filters; Chain encapsulates one or more \type{Filter}s into a single
+Filter, and Fork sends its input to a set of several \type{Filter}
+objects.

+The majority of the relations between filters is maintained by the
+\type{Pipe} object which ``owns'' the \type{Filter}s.
+
 \section{Pipe}

-Pipe is, conceptually, a tree structure of Filter objects. There is a single
-unique top, and an arbitrary number of leaves (which are SecureQueue objects).
-SecureQueue is a simple Filter that buffers it's input.
+\type{Pipe} is, conceptually, a tree structure of \type{Filter}
+objects. There is a single unique top, and an arbitrary number of
+leaves (which are \type{SecureQueue} objects). \type{SecureQueue} is a
+simple \type{Filter} that buffers its input.

-Writing into the pipe writes into the top of the tree. The filter at the top
-of the tree writes it's output into the next Filter, and so on until eventually
-data trickles down into the bottommost Filters, where the data is stored for
-later retrieval.
+Writing into the pipe writes into the top of the tree. The filter at
+the top of the tree writes its output into the next \type{Filter}, and
+so on until eventually data trickles down into the bottommost
+\type{Filter}s, where the data is stored for later retrieval.

-When a new message is started, Pipe searches through the tree of Filters and
-finds places where the \arg{next} field of the Filter is NULL. This implies
-that it was the lowest layer of the Filter tree that the user added. It then
-adds SecureQueue objects onto these Filters. These queues are also stored in a
-\type{std::vector} called \arg{messages}. This is how the Pipe knows how to
-read from them later without doing a tree traversal every time.
+When a new message is started, \type{Pipe} searches through the tree
+of \type{Filter}s and finds places where the \arg{next} field of the
+\type{Filter} is NULL. This implies that it was the lowest layer of
+the \type{Filter} tree that the user added. It then adds
+\type{SecureQueue} objects onto these \type{Filter}s. These queues are
+also stored in an deque; this is so \type{Pipe} can read from them
+later without doing a tree traversal each time.

-Pipe will, if asked, destroy the existing tree structure, in order to create a
-new one. However, the queue objects are not deleted, because Pipe might need to
-read from them later.
+\type{Pipe} will, if asked, destroy the existing tree structure, in
+order to create a new one. However, the queue objects are not deleted,
+because \type{Pipe} might be asked to read from them later (while
+\type{Pipe} could delete all the messages in this case, the principle
+of least astonishment suggested keeping them).

-An optimization in future versions will involve deleting empty queues that we
-``know'' can't be written to, and then replace their field in \arg{messages}
-with NULL. On reading, Pipe will know that this means that the queue is empty,
-and act as if such a queue was really there. This is relatively minor, because
-in recent versions an empty queue only takes up a few dozen bytes (previous to
-0.8.4 or so, an empty queue still took up 4 kilobytes of memory).
+What I wrote about \type{Pipe} keeing the queues in a deque is a
+lie. Sort of. It keeps them in an object called
+\type{Output\_Buffers}, which keeps them in a
+deque. \type{Output\_Buffers} is intended to abstract away how message
+queues are stored from \type{Pipe}. After a queue has been added to
+the output buffers object, \type{Pipe} keeps no references to it
+whatsoever; all access is mediated by the \type{Output\_Buffers}.
+This allows queues which have been read to be deleted, rather than
+leaving empty queue objects all over the place.

 \section{Library Initialization}

-A lot of messy corner cases.
+WRITEME

 \section{Lookup Mechanism}

-Most objects know their name, and they know how to create a new copy of
-themselves. We build mapping tables that map from an algorithm name into a
-single instance of that algorithm. The tables themselves can be found in
-\filename{src/lookup.cpp}.
+Most objects know their name, and they know how to create a new copy
+of themselves. We build mapping tables that map from an algorithm name
+into a single instance of that algorithm. The tables themselves can be
+found in \filename{src/lookup.cpp}.

-There are a set of functions named \function{add\_algorithm} that can be used
-to populate the tables. We get something out of the table with
-\function{retrieve\_x}, where x is the name of a type (\texttt{block\_cipher},
-\texttt{hash}, etc). This returns a const pointer to the single unique instance
-of the algorithm that the lookup tables know about. If it doesn't know about
-it, it falls back on calling a function called
-\function{try\_to\_get\_x}. These functions live in
-\filename{src/algolist.cpp}. They are mostly used to handle algorithms which
-need (or at least can have) arguments passed to them, like \type{HMAC} and
-\type{SAFER\_SK}. It will return NULL if it can't find the algorithm at all.
+There are a set of functions named \function{add\_algorithm} that can
+be used to populate the tables. We get something out of the table with
+\function{retrieve\_x}, where x is the name of a type
+(\texttt{block\_cipher}, \texttt{hash}, etc). This returns a const
+pointer to the single unique instance of the algorithm that the lookup
+tables know about. If it doesn't know about it, it falls back on
+calling a function called \function{try\_to\_get\_x}. These functions
+live in \filename{src/algolist.cpp}. They are mostly used to handle
+algorithms which need (or at least can have) arguments passed to them,
+like \type{HMAC} and \type{SAFER\_SK}. It will return NULL if it can't
+find the algorithm at all.

-When it's asked for an algorithm it doesn't know about (ie, isn't in the
-mapping tables), the retrieval functions will ask the try-to-get functions if
-\emph{they} know about it. If they do, then the object returned will be stored
-into the table for later retrieval.
+When it's asked for an algorithm it doesn't know about (ie, isn't in
+the mapping tables), the retrieval functions will ask the try-to-get
+functions if \emph{they} know about it. If they do, then the object
+returned will be stored into the table for later retrieval.

-The functions \function{get\_x} call the retrieval functions. If we get back
-NULL, an exception is thrown. Otherwise it will call the \function{clone}
-method to get a new copy of the algorithm, which it returns.
+The functions \function{get\_x} call the retrieval functions. If we
+get back NULL, an exception is thrown. Otherwise it will call the
+\function{clone} method to get a new copy of the algorithm, which it
+returns.

-The various functions like \function{output\_length\_of} call the retrieval
-function for each type of object that the parameter in question (in this case,
-\texttt{OUTPUT\_LENGTH}) might be meaningful for. If it manages to get back an
-object, it will return (in this case) the \texttt{OUTPUT\_LENGTH} field of the
-object. No allocations are required to call this function: all of it's
-operations work directly on the copies living in the lookup tables.
+The various functions like \function{output\_length\_of} call the
+retrieval function for each type of object that the parameter in
+question (in this case, \texttt{OUTPUT\_LENGTH}) might be meaningful
+for. If it manages to get back an object, it will return (in this
+case) the \texttt{OUTPUT\_LENGTH} field of the object. No allocations
+are required to call this function: all of its operations work
+directly on the copies living in the lookup tables.

 \section{Allocators}

@@ -128,26 +152,28 @@ \section{PEM/BER Identification}
 \section{PEM/BER Identification}

 We have a specific algorithm for figuring out if something is PEM or
-BER. Previous versions (everything before 1.3.0) requried that the caller
-specify which one it was, and they had to be right. Now we use a hueristic
-(aka, an algorithm that sometimes doesn't work right) to figure it out. If the
-first character is not 0x30 (equal to ASCII '0'), then it can't possibly be BER
-(because everything we care about is enclosed in an ASN.1 SEQUENCE, which for
-BER/DER is encoded as beginning with 0x30). Roughly 99.9% of PEM blocks
-\emph{won't} have a random 0 character in front of them, so we are mostly safe
-(unless someone does it on purpose, in which case, please hit them for me).
-But to be sure, if there is a 0, then we search the first \emph{N} bytes of the
-block for the string ``-----BEGIN ``, which marks the typical start of a PEM
-block. The specific \emph{N} depends on the variable ``base/pem\_search'',
-which defaults to 4 kilobytes.
+BER. Previous versions (everything before 1.3.0) requried that the
+caller specify which one it was, and they had to be right. Now we use
+a hueristic (aka, an algorithm that sometimes doesn't work right) to
+figure it out. If the first character is not 0x30 (equal to ASCII
+'0'), then it can't possibly be BER (because everything we care about
+is enclosed in an ASN.1 SEQUENCE, which for BER/DER is encoded as
+beginning with 0x30). Roughly 99.9% of PEM blocks \emph{won't} have a
+random 0 character in front of them, so we are mostly safe (unless
+someone does it on purpose, in which case, please hit them for me).
+But to be sure, if there is a 0, then we search the first \emph{N}
+bytes of the block for the string ``-----BEGIN ``, which marks the
+typical start of a PEM block. The specific \emph{N} depends on the
+variable ``base/pem\_search'', which defaults to 4 kilobytes.

-So, you can actually fool it either way: that a PEM file is really BER, or that
-a BER file is actually PEM. To fool it that a BER file is PEM, just have the
-string ``-----BEGIN `` somewhere (I can't imagine this string shows up in
-certificates or CRLs too often, so if it is there it means somebody is being a
-jerk). If a file starts with 0 and has at least ``base/pem\_search'' byte more
-junk in the way, it won't notice that it's PEM at all. In either case, of
-course, the loading will fail, and you'll get a nice exception saying that the
-decoding failed.
+So, you can actually fool it either way: that a PEM file is really
+BER, or that a BER file is actually PEM. To fool it that a BER file is
+PEM, just have the string ``-----BEGIN `` somewhere (I can't imagine
+this string shows up in certificates or CRLs too often, so if it is
+there it means somebody is being a jerk). If a file starts with 0 and
+has at least ``base/pem\_search'' byte more junk in the way, it won't
+notice that its PEM at all. In either case, of course, the loading
+will fail, and you'll get a nice exception saying that the decoding
+failed.

 \end{document}