
\documentclass{article}

\setlength{\textwidth}{6.5in}
\setlength{\textheight}{9in}

\setlength{\headheight}{0in}
\setlength{\topmargin}{0in}
\setlength{\headsep}{0in}

\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}

\title{\textbf{Botan API Reference}}
\author{}
\date{2007/03/03}

\newcommand{\filename}[1]{\texttt{#1}}
\newcommand{\manpage}[2]{\texttt{#1}(#2)}

\newcommand{\macro}[1]{\texttt{#1}}

\newcommand{\function}[1]{\textbf{#1}}
\newcommand{\keyword}[1]{\texttt{#1}}
\newcommand{\type}[1]{\texttt{#1}}
\renewcommand{\arg}[1]{\textsl{#1}}
\newcommand{\namespace}[1]{\texttt{#1}}

\newcommand{\url}[1]{\texttt{#1}}

\newcommand{\ie}[0]{\emph{i.e.}}
\newcommand{\eg}[0]{\emph{e.g.}}

\begin{document}

\maketitle

\tableofcontents

\parskip=5pt

\pagebreak
\section{Introduction}

Botan is a C++ library which attempts to provide the most common cryptographic
algorithms and operations in an easy to use and portable package. Currently it
runs on a wide variety of systems, using numerous different compilers and on
many different CPU architectures.

The base library is written in ISO C++, so it can be ported with
minimal fuss, but Botan also supports a modules system. This system
exposes system dependent code to the library through portable
interfaces, extending the set of services available to users.

\subsection{Targets}

Botan's primary targets (system-wise) are 32- and 64-bit systems with
at least a few megabytes of memory. Generally, given the choice
between optimizing for 32-bit systems and 64-bit systems, Botan is
written to prefer 64-bit, simply on the theory that where performance
is a real concern, modern 64-bit processors are the obvious
choice. And also because two of the three machines owned by the
primary developer have 64-bit CPUs. But performance on 32-bit systems
is also quite good.

Today smaller systems, such as handhelds, set-top boxes, and the
bigger smart phones and smart cards, are also capable of using
Botan. However, Botan uses a fairly large amount of code space (up to
several megabytes, depending upon the compiler and options used),
which could be prohibitive in some systems. Usage of RAM is fairly
modest, usually under 64K.

Botan's design makes it quite easy to remove unused algorithms in such a way
that applications do not need to be recompiled to work, even applications that
use the algorithms in question. They can simply ask Botan if the algorithm
exists, and if Botan says yes, ask the library to give them such an object for
that algorithm.
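
The ask-then-request pattern described above can be illustrated with a
small stand-alone model. This is only a sketch using the standard
library, not Botan's implementation: the names \type{Registry},
\function{have\_algo}, and \function{make} are hypothetical and were
chosen for this example.

\begin{verbatim}
#include <functional>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for an algorithm object; not a Botan type.
struct Algo
   {
   virtual ~Algo() {}
   virtual std::string name() const = 0;
   };

struct Toy_Hash : Algo
   {
   std::string name() const { return "ToyHash"; }
   };

// Hypothetical registry: maps algorithm names to factory functions.
class Registry
   {
   public:
      void add(const std::string& name,
               std::function<std::unique_ptr<Algo>()> maker)
         { makers[name] = maker; }

      // Ask whether an algorithm is available in this build.
      bool have_algo(const std::string& name) const
         { return makers.count(name) != 0; }

      // Request an object by name; returns null if it is unavailable.
      std::unique_ptr<Algo> make(const std::string& name) const
         {
         auto i = makers.find(name);
         if(i == makers.end())
            return std::unique_ptr<Algo>();
         return i->second();
         }
   private:
      std::map<std::string, std::function<std::unique_ptr<Algo>()>> makers;
   };
\end{verbatim}

Because the application only ever names the algorithm as a string,
removing an algorithm from the registry changes the answer to the
query without requiring the application to be recompiled.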

\subsection{Why Botan?}

Botan may be the perfect choice for your application. Or it might be a
terribly bad idea. This section is basically to make it clear what Botan is
and is not.

First, let's cover the major strengths:

\begin{list}{$\cdot$}
  \item Support is (usually) quickly available on the project mailing lists.
        Commercial support licenses are available for those that desire them.

  \item Is written in a (fairly) clean object-oriented style, and the usual
        API works in terms of reasonably high-level abstractions.

  \item Supports a huge variety of algorithms, including most of the major
        public key algorithms and standards (such as IEEE 1363, PKCS, and
        X.509v3).

  \item Supports a name-based lookup scheme, so you can get ahold of any
        algorithm on the fly.

  \item You can easily extend much of the system at application compile time or
        at run time.

  \item Works well with a wide variety of compilers, operating systems, and
        CPUs, and more all the time.

  \item Is the only open source crypto library (that I know of) that has
        support for memory allocation techniques that prevent an attacker from
        reading swap in an attempt to gain access to keys or other secrets. In
        fact several different such methods are supported, depending on the
        system (two methods for Unix, another for Windows).

  \item Has (optional) support for Zlib and Bzip2 compression/decompression
        integrated completely into the system -- it only takes a line or two of
        code to add compression to your application.
\end{list}

\noindent
And the major downsides and deficiencies are:

\begin{list}{$\cdot$}
  \item It's written in C++. If your application isn't, Botan is probably
        going to be more pain than it's worth.

  \item Botan doesn't directly support higher-level protocols and
        formats like SSL or OpenPGP. SSH support is available from a
        third-party, and there is an alpha-level SSL/TLS library
        currently available.

  \item Doesn't support elliptic curve algorithms; ECDSA support is planned at
        some point, but demand seems quite low.

  \item Doesn't currently support any very high level 'envelope' style
        processing - support for this will probably be added once support for
        CMS is available, so code using the high level interface will produce
        data readable by many other libraries.
\end{list}

\pagebreak
\section{Getting Started}

\subsection{Basic Conventions}

With a very small number of exceptions, declarations in the library
are contained within the namespace \namespace{Botan}. Botan declares
several typedef'ed types to help buffer it against changes in machine
architecture.  These types are used extensively in the interface, and
thus it would often be convenient to use them without the
\namespace{Botan} prefix. You can do so by \keyword{using} the
namespace \namespace{Botan\_types} (this way you can use the type
names without the namespace prefix, but the remainder of the library
stays out of the global namespace). The included types are \type{byte}
and \type{u32bit}, which are unsigned integer types.

The headers for Botan are usually available in the form
\filename{botan/headername.h}. For brevity in this documentation,
headers are always just called \filename{headername.h}, but they
should be used with the \filename{botan/} prefix in your actual code.

\subsection{Initializing the Library}

There are a set of core services which the library needs access to
while it is performing requests. To ensure these are set up, you must
create a \type{LibraryInitializer} object (usually called 'init' in
Botan example code; 'botan\_library' or 'botan\_init' make more sense
in real code) prior to making any calls to Botan. This object's
lifetime must exceed that of all other Botan objects your application
creates; for this reason the best place to create the
\type{LibraryInitializer} is at the start of your \function{main}
function, since this guarantees that it will be created first and
destroyed last. The initializer does things like initializing the
memory allocation system, setting up the algorithm lookup tables,
finding out if there is a high resolution timer available to use, and
similar such matters. With no arguments, the library is initialized
with various default settings. So 99\% of the time, all you need is

\texttt{Botan::LibraryInitializer init;}

at the start of your \texttt{main}. If you're not doing anything
exotic, then you can safely skip the rest of this section.

The constructor takes an instance of another object, called
\type{InitializerOptions}, which specifies the settings of various
options. Normally you can ignore this and simply pass a human readable
string, which the \type{InitializerOptions} constructor will parse. An
empty string signifies using defaults; any options not specifically
mentioned in the initialization string also assume the compiled in
default.
If more than one option is used, they should be separated by a
space. Boolean arguments (all except for the ``config'' option) can
take an argument of ``true'' (or ``yes'') or ``false'' (or ``no'') to
explicitly turn them on or off. Simply giving the name of the option
without any argument signifies that the option should be toggled on.
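
The parsing rules just described can be sketched in a few lines of
standard C++. This is an illustration only, not the actual
\type{InitializerOptions} code; the function names
\function{parse\_options} and \function{option\_is\_set} are
hypothetical, and the \texttt{name=value} syntax is assumed from the
``seed\_rng=false'' form used in this section.

\begin{verbatim}
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Parse a space-separated option string into name -> value pairs.
// A bare name is recorded as "true".
std::map<std::string, std::string> parse_options(const std::string& in)
   {
   std::map<std::string, std::string> opts;
   std::istringstream words(in);
   std::string word;
   while(words >> word)
      {
      const std::string::size_type eq = word.find('=');
      if(eq == std::string::npos)
         opts[word] = "true";
      else
         opts[word.substr(0, eq)] = word.substr(eq + 1);
      }
   return opts;
   }

// Interpret an option as a boolean, using dflt if it was not given.
bool option_is_set(const std::map<std::string, std::string>& opts,
                   const std::string& name, bool dflt)
   {
   std::map<std::string, std::string>::const_iterator i = opts.find(name);
   if(i == opts.end())
      return dflt;
   if(i->second == "true" || i->second == "yes")
      return true;
   if(i->second == "false" || i->second == "no")
      return false;
   throw std::invalid_argument("Bad boolean for option " + name);
   }
\end{verbatim}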

\newcommand{\option}[1]{\noindent \textbf{Option ``#1''}}

\option{thread\_safe}: The library should use mutexes for guarding
access to shared resources, such as the memory allocation system. If you pass
the ``thread\_safe'' option, and the initializer can't find a useful mutex
module, it will throw an exception. Botan seems to work in threaded programs,
but it hasn't been tested thoroughly, and problems may remain. Note that Botan
is not thread safe at the object level; any objects shared between threads need
explicit locking.

\option{secure\_memory}: Try to create a more secure allocator type --
one that either locks allocated memory into RAM, or that memory maps a
disk file that it erases after use. If both are available, it will
prefer the memory mapping mechanism, because locking memory requires
privileges on many systems.

On systems that don't (currently) have any specialized allocators, like
MS Windows, this option is ignored.

\option{use\_engines}: Use any available ``engine'' modules to speed
up processing. Currently Botan has support for engines based on the
AEP1000/AEP2000 crypto hardware cards, GNU MP, and OpenSSL's BN
library. Further support for crypto acceleration hardware will be
added in future releases.

\option{fips140}: This option, in theory, toggles Botan into FIPS 140
mode. Please note that Botan \emph{has not} been FIPS 140 validated at
this time, and that a number of changes will be necessary before such
a validation could occur. Do not use this option.

\option{selftest}: Run some basic self tests during startup.
Specifically this runs a set of tests for DES, TripleDES, AES,
CMAC(AES), SHA-1, HMAC(SHA-1), SHA-256, and HMAC(SHA-256). This option
is enabled by default.

\option{seed\_rng}: Attempt to seed the global PRNGs at
startup. This option is toggled on by default, and can be disabled by passing
``seed\_rng=false''. This is primarily useful when you know that the built-in
library entropy sources will not work, and you are providing your own entropy
source(s) later on.

If you do not create a \type{LibraryInitializer} object, pretty much
any Botan operation will fail, because it will be unable to do basic
things like allocate memory or get random bits. Note too, that you
should be careful to only create one such object.

It is not strictly necessary to create a \type{LibraryInitializer};
the actual code performing the initialization and shutdown are in
static member functions of \type{LibraryInitializer}, called
\function{initialize} and \function{deinitialize}. A
\type{LibraryInitializer} merely provides a convenient RAII wrapper
for the operations (and thus for the internal library state as well).
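
The relationship between the static functions and the wrapper object
can be sketched as follows. \type{Toy\_Library} and
\type{Toy\_Initializer} are hypothetical stand-ins written for this
example; in particular, the exception thrown on a second
initialization is an assumption of the sketch, not a statement about
what Botan does.

\begin{verbatim}
#include <stdexcept>

// Toy library state; stands in for the real internal state.
struct Toy_Library
   {
   static bool ready;
   static void initialize()
      {
      if(ready)
         throw std::logic_error("already initialized");
      ready = true;
      }
   static void deinitialize() { ready = false; }
   };
bool Toy_Library::ready = false;

// RAII wrapper: the constructor initializes, the destructor shuts down.
class Toy_Initializer
   {
   public:
      Toy_Initializer() { Toy_Library::initialize(); }
      ~Toy_Initializer() { Toy_Library::deinitialize(); }
   private:
      Toy_Initializer(const Toy_Initializer&);            // not copyable
      Toy_Initializer& operator=(const Toy_Initializer&);
   };
\end{verbatim}

Tying the shutdown to a destructor means it also runs when the scope
is left by an exception, which is the main convenience of the wrapper
over calling the two static functions by hand.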

\subsection{Gotchas}

There are a few things to watch out for to prevent problems when using Botan.

Never allocate any kind of Botan object globally. The problem with doing this
is that the constructor for such an object will be called before the library is
initialized. Many Botan objects will, in their constructor, make one or more
calls into the library global state object. Access to this object is checked,
so an exception should be thrown (rather than a memory access violation or
undetected uninitialized object access). A rough equivalent which will work is
to keep a global pointer to the object, initializing it after creating your
\type{LibraryInitializer}. Merely making the \type{LibraryInitializer} also
global will probably not help, because C++ does not make very strong guarantees
about the order that such objects will be created.

The same rule applies for making sure the destructors of all your Botan objects
are called before the \type{LibraryInitializer} is destroyed. This implies you
can't have static variables that are Botan objects inside functions or classes
(since in most C++ runtimes, these objects will be destroyed after main has
returned). This is inelegant, but seems to not cause many problems in practice.

Botan's memory object classes (\type{MemoryVector},
\type{SecureVector}, \type{SecureBuffer}) are extremely primitive, and
do not meet the requirements for an STL container object. After Botan
starts adopting C++0x features, they will be replaced by typedefs of
\type{std::vector} with a custom allocator.

Prefer using the factory methods to creating objects directly on the
stack. This helps insulate your code against changes in the
implementation, and using a late binding allows your code to access
faster implementations (hardware or faster software) that might be
detected as available at runtime.

Use a \function{try}/\function{catch} block inside your
\function{main} function, and catch any \type{std::exception} throws
(remember to catch by reference, as \type{std::exception}'s
\function{what} method is polymorphic). This is not strictly required,
but if you don't, and Botan throws an exception, the runtime will call
\function{std::terminate}, which usually calls \function{abort} or
something like it, leaving you (or worse, a user of your application)
wondering what went wrong.
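
A minimal sketch of this pattern, using only the standard library, is
shown here. The functions \function{do\_work} and
\function{run\_guarded} are hypothetical; in a real program the body
of the \keyword{try} block would be your application code, and the
handler would typically print the message and return a failure code
from \function{main}.

\begin{verbatim}
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical application body; throws to simulate a library failure.
void do_work(bool fail)
   {
   if(fail)
      throw std::runtime_error("lookup failed");
   }

// Returns an empty string on success, else the exception message.
std::string run_guarded(bool fail)
   {
   try
      {
      do_work(fail);
      }
   catch(std::exception& e) // by reference, so what() dispatches virtually
      {
      return e.what();
      }
   return "";
   }
\end{verbatim}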

\subsection{Information Flow: Pipes and Filters}

Many common uses of cryptography involve processing one or more
streams of data (be it from sockets, files, or a hardware device).
Botan provides services which make it easy to set up data flows through
various operations, such as compression, encryption, and base64
encoding. Each of these operations is implemented in what are called
\emph{filters} in Botan. A set of filters is created and placed into
a \emph{pipe}, and information ``flows'' through the pipe until it
reaches the end, where the output is collected for retrieval. If
you're familiar with the Unix shell environment, this design will
sound quite familiar.

Here is an example which uses a pipe to base64 encode some strings:

\begin{verbatim}
  Pipe pipe(new Base64_Encoder); // pipe owns the pointer
  pipe.start_msg();
  pipe.write("message 1");
  pipe.end_msg(); // flushes buffers, increments message number

  // process_msg(x) is start_msg(); write(x); end_msg();
  pipe.process_msg("message2");

  std::string m1 = pipe.read_all_as_string(0); // base64 of first message
  std::string m2 = pipe.read_all_as_string(1); // base64 of second message
\end{verbatim}

Bytestreams in the pipe are grouped into messages; blocks of data that
are processed in an identical fashion (\ie, with the same sequence of
\type{Filter}s). Messages are delimited by calls to
\function{start\_msg} and \function{end\_msg}. Each message in a pipe
has its own number, which increments starting from zero.

As you can see, the \type{Base64\_Encoder} was allocated using
\keyword{new}; but where was it deallocated? When a filter object is
passed to a \type{Pipe}, the pipe takes ownership of the object, and
will deallocate it when it is no longer needed.

There are two different ways to make use of messages. One is to send
several messages through a \type{Pipe} without changing the \type{Pipe}'s
configuration, so you end up with a sequence of messages; one use of this would
be to send a sequence of identically encrypted UDP packets, for example (note
that the \emph{data} need not be identical; it is just that each is encrypted,
encoded, signed, etc.\ in an identical fashion). Another is to change the filters
that are used in the \type{Pipe} between each message, by adding or removing
\type{Filter}s; functions that let you do this are documented in the Pipe API
section.

Most operations in Botan have a corresponding filter for use in Pipe.
Here's code that encrypts a string with AES-128 in CBC mode:

\begin{verbatim}
  SymmetricKey key(16); // a random 128-bit key
  InitializationVector iv(16); // a random 128-bit IV

  // Notice the algorithm we want is specified by a string
  Pipe pipe(get_cipher("AES-128/CBC", key, iv, ENCRYPTION));

  pipe.process_msg("secrets");
  pipe.process_msg("more secrets");

  MemoryVector<byte> c1 = pipe.read_all(0);

  byte c2[4096] = { 0 };
  u32bit got_out = pipe.read(c2, sizeof(c2), 1);
  // use c2[0...got_out]
\end{verbatim}

\type{Pipe} also has convenience methods for dealing with
\type{std::iostream}s. Here is an example of those, using
the \type{Bzip\_Compression} filter (included as a module;
if you have bzlib available, check \filename{building.pdf}
for how to enable it) to compress a file:

\begin{verbatim}
  std::ifstream in("data.bin", std::ios::binary);
  std::ofstream out("data.bin.bz2", std::ios::binary);

  Pipe pipe(new Bzip_Compression);

  pipe.start_msg();
  in >> pipe;
  pipe.end_msg();
  out << pipe;
\end{verbatim}

However there is a hitch to the code above; the complete contents of
the compressed data will be held in memory until the entire message
has been compressed, at which time the statement \verb|out << pipe| is
executed, and the data is freed as it is read from the pipe and
written to the file. But if the file is very large, we might not have
enough physical memory (or even enough virtual memory!) for that to be
practical. So instead of storing the compressed data in the pipe for
reading it out later, we divert it directly to the file:

\begin{verbatim}
  std::ifstream in("data.bin", std::ios::binary);
  std::ofstream out("data.bin.bz2", std::ios::binary);

  Pipe pipe(new Bzip_Compression, new DataSink_Stream(out));

  pipe.start_msg();
  in >> pipe;
  pipe.end_msg();
\end{verbatim}

This is the first code we've seen so far that uses more than one
filter in a pipe. The output of the compressor is sent to the
\type{DataSink\_Stream}. Anything written to a \type{DataSink\_Stream}
is written to a file; the filter produces no output. As soon as the
compression algorithm finishes up a block of data, it will send it along,
at which point it will immediately be written to disk; if you were to
call \verb|pipe.read_all()| after \verb|pipe.end_msg()|, you'd get an
empty vector out.

Here's an example using two computational filters:

\begin{verbatim}
   SymmetricKey key(32);
   InitializationVector iv(16); // or use: block_size_of("AES")
   Pipe encryptor(get_cipher("AES/CBC/PKCS7", key, iv, ENCRYPTION),
                  new Base64_Encoder);
   encryptor.start_msg();
   file >> encryptor;
   encryptor.end_msg(); // flush buffers, complete computations
   std::cout << encryptor;
\end{verbatim}

\subsection{Fork}

It is fairly common that you might receive some data and want to perform more
than one operation on it (\eg, encrypt it with DES and calculate the MD5 hash
of the plaintext at the same time). That's where \type{Fork} comes
in. \type{Fork} is a filter that takes input and passes it on to \emph{one or
more} \type{Filter}s which are attached to it. \type{Fork} changes the nature
of the pipe system completely. Instead of being a linked list, it becomes a
tree.

Each \type{Filter} in the fork is given its own output buffer, and
thus its own message. For example, if you had previously written two
messages into a \type{Pipe}, then you start a new one with a
\type{Fork} which has three paths of \type{Filter}s inside it, you
add three new messages to the \type{Pipe}. The data you put into the
\type{Pipe} is duplicated and sent into each set of \type{Filter}s,
and the eventual output is placed into a dedicated message slot in the
\type{Pipe}.

Messages in the \type{Pipe} are allocated in a depth-first manner. This is only
interesting if you are using more than one \type{Fork} in a single \type{Pipe}.
As an example, consider the following:

\begin{verbatim}
   Pipe pipe(new Fork(
                new Fork(
                   new Base64_Encoder,
                   new Fork(
                      NULL,
                      new Base64_Encoder
                      )
                   ),
                new Hex_Encoder
                )
      );
\end{verbatim}

In this case, message 0 will be the output of the first \type{Base64\_Encoder},
message 1 will be a copy of the input (see below for how \type{Fork} interprets
NULL pointers), message 2 will be the output of the second
\type{Base64\_Encoder}, and message 3 will be the output of the
\type{Hex\_Encoder}. As you can see, this results in message numbers being
allocated in a top to bottom fashion, when looked at on the screen. However,
note that there could be potential for bugs if this is not anticipated. For
example, if your code is passed a \type{Filter}, and you assume it is a
``normal'' one which only uses one message, your message offsets would be
wrong, leading to some confusion during output.
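
The depth-first numbering can be checked with a small stand-alone
model of the tree. This sketch does not use Botan; \type{Node},
\function{make\_leaf}, \function{make\_fork}, and
\function{number\_messages} are hypothetical names, and a leaf called
``copy'' stands in for the null-pointer branch.

\begin{verbatim}
#include <cstddef>
#include <string>
#include <vector>

// Toy tree node: a leaf is a named filter, an inner node is a fork.
struct Node
   {
   std::string name;
   std::vector<Node> branches; // empty for a leaf
   };

Node make_leaf(const std::string& name)
   {
   Node n;
   n.name = name;
   return n;
   }

Node make_fork(const std::vector<Node>& branches)
   {
   Node n;
   n.name = "fork";
   n.branches = branches;
   return n;
   }

// Depth-first walk; the position in 'out' is the message number.
void number_messages(const Node& n, std::vector<std::string>& out)
   {
   if(n.branches.empty())
      {
      out.push_back(n.name);
      return;
      }
   for(std::size_t i = 0; i != n.branches.size(); ++i)
      number_messages(n.branches[i], out);
   }
\end{verbatim}

Building the tree from the example above (a fork containing a fork
and a hex encoder, with the inner fork containing a base64 encoder
and a third fork) and walking it yields the four leaves in the order
described: first base64 encoder, copy, second base64 encoder, hex
encoder.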

If Fork's first argument is a null pointer, but a later argument is
not, then Fork will feed a copy of its input directly through. Here's
a case where that is useful:

\begin{verbatim}
   // have std::string ciphertext, auth_code, key, iv, mac_key;

   Pipe pipe(new Base64_Decoder, get_cipher("AES-128", key, iv, DECRYPTION),
             new Fork(
                0,
                new MAC_Filter("HMAC(SHA-1)", mac_key)
             )
      );

   pipe.process_msg(ciphertext);
   std::string plaintext = pipe.read_all_as_string(0);
   SecureVector<byte> mac = pipe.read_all(1);

   if(mac != auth_code)
      error();
\end{verbatim}

Here we wanted to not only decrypt the message, but send the decrypted
text through an additional computation, in order to compute the
authentication code.

Any \type{Filter}s which are attached to the \type{Pipe} after the
\type{Fork} are implicitly attached onto the first branch created by
the fork. For example, let's say you created this \type{Pipe}:

\begin{verbatim}
Pipe pipe(new Fork(new Hash_Filter("MD5"), new Hash_Filter("SHA-1")),
          new Hex_Encoder);
\end{verbatim}

And then called \function{start\_msg}, inserted some data, then
\function{end\_msg}. Then \arg{pipe} would contain two messages. The
first one (message number 0) would contain the MD5 sum of the input in
hex encoded form, and the other would contain the SHA-1 sum of the
input in raw binary. However, it's much better to use a \type{Chain}
instead.

\subsubsection{Chain}

A \type{Chain} filter creates a chain of \type{Filter}s and
encapsulates them inside a single filter (itself). This allows a
sequence of filters to become a single filter, to be passed into or
out of a function, or to a \type{Fork} constructor.

You can call \type{Chain}'s constructor with up to 4 \type{Filter*}s
(they will be added in order), or with an array of \type{Filter*}s and
a \type{u32bit} which tells \type{Chain} how many \type{Filter*}s are
in the array (again, they will be attached in order). Here's the
example from the last section, using chain instead of relying on the
obscure rule that version used.

\begin{verbatim}
  Pipe pipe(new Fork(
                new Chain(new Hash_Filter("MD5"), new Hex_Encoder),
                new Hash_Filter("SHA-1")
                )
           );
\end{verbatim}

\subsection{The Pipe API}

\subsubsection{Initializing Pipe}

By default, \type{Pipe} will do nothing at all; any input placed into
the \type{Pipe} will be read back unchanged. Obviously, this has
limited utility, and presumably you want to use one or more
\type{Filter}s to somehow process the data. First, you can choose a
set of \type{Filter}s to initialize the \type{Pipe} with via the
constructor. You can pass it either a set of up to 4 \type{Filter*}s,
or a pre-defined array and a length:

\begin{verbatim}
   Pipe pipe1(new Filter1(/*args*/), new Filter2(/*args*/),
              new Filter3(/*args*/), new Filter4(/*args*/));
   Pipe pipe2(new Filter1(/*args*/), new Filter2(/*args*/));

   Filter* filters[5] = {
     new Filter1(/*args*/), new Filter2(/*args*/), new Filter3(/*args*/),
     new Filter4(/*args*/), new Filter5(/*args*/) /* more if desired... */
   };
   Pipe pipe3(filters, 5);
\end{verbatim}

This is by far the most common way to initialize a \type{Pipe}. However,
occasionally a more flexible initialization strategy is necessary; this is
supported by 4 member functions: \function{prepend}(\type{Filter*}),
\function{append}(\type{Filter*}), \function{pop}(), and \function{reset}().
These functions may only be used while the \type{Pipe} in question is not in
use; that is, either before calling \function{start\_msg}, or after
\function{end\_msg} has been called (and no new calls to \function{start\_msg}
have been made yet).

The function \function{reset}() simply removes all the \type{Filter}s
which the \type{Pipe} is currently using~--~it is reset to an
initial, ``empty'' state.  Any data which is being retained by the
\type{Pipe} is retained after a \function{reset}(), and
\function{reset}() does not affect the message numbers (discussed
later).

Calling \function{prepend} and \function{append} will either prepend
or append the passed \type{Filter} object to the list of
transformations. For example, if you \function{prepend} a
\type{Filter} implementing encryption, and the \type{Pipe} already had
a \type{Filter} which hex encoded the input, then the next set of
input would be first encrypted, then hex encoded. Alternately, if you
called \function{append}, then the input would first be hex
encoded, and then encrypted (which is not terribly useful in this
particular example).

Finally, calling \function{pop}() will remove the first transformation
of the \type{Pipe}. Say we had called \function{prepend} to put an
encryption \type{Filter} into a \type{Pipe}; calling \function{pop}()
would remove this \type{Filter} and return the \type{Pipe} to its
state before we called \function{prepend}.
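
The ordering rules for these four functions can be seen in a toy
model in which each ``filter'' merely wraps the data in its own name.
\type{Toy\_Pipe} is a hypothetical class written for this sketch and
shares only its member function names with \type{Pipe}.

\begin{verbatim}
#include <deque>
#include <string>

// Toy pipe: each "filter" just tags the data with its name so the
// order of application is visible in the output.
class Toy_Pipe
   {
   public:
      void prepend(const std::string& f) { filters.push_front(f); }
      void append(const std::string& f) { filters.push_back(f); }
      void pop() { if(!filters.empty()) filters.pop_front(); }
      void reset() { filters.clear(); }

      // Apply the filters front to back.
      std::string process(const std::string& input) const
         {
         std::string out = input;
         for(std::deque<std::string>::const_iterator i = filters.begin();
             i != filters.end(); ++i)
            out = *i + "(" + out + ")";
         return out;
         }
   private:
      std::deque<std::string> filters;
   };
\end{verbatim}

With a ``hex'' filter already present, prepending ``encrypt'' gives
\texttt{hex(encrypt(x))}, that is, encryption is applied first;
appending it instead gives \texttt{encrypt(hex(x))}.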

\subsubsection{Giving Data to a Pipe}

Input to a \type{Pipe} is delimited into messages, which can be read from
independently (\ie, you can read 5 bytes from one message, and then all of
another message, without either read affecting any other messages). The
messages are delimited by calls to \function{start\_msg} and
\function{end\_msg}. In between these two calls, you can write data into a
\type{Pipe}, and it will be processed by the \type{Filter}(s) that it
contains. Writes at any other time are invalid, and will result in an
exception.

As to writing, you can call any of the functions called \function{write}(),
which can take any of: a \type{byte[]}/\type{u32bit} pair, a
\type{SecureVector<byte>}, a \type{std::string}, a \type{DataSource\&}, or a
single \type{byte}.

Sometimes, you may want to do only a single write per message. In this case,
you can use the \function{process\_msg} series of functions, which start a
message, write their argument into the \type{Pipe}, and then end the
message. In this case you would not make any explicit calls to
\function{start\_msg}/\function{end\_msg}. The version of \function{write}
which takes a single \type{byte} is not supported by \function{process\_msg},
but all the other variants are.
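
The message bookkeeping described in this section can be modelled
without any filters at all. The class \type{Toy\_Msg\_Pipe} below is a
hypothetical sketch; the exception type and the accessor
\function{message} are assumptions made for the example and are not
part of the \type{Pipe} interface.

\begin{verbatim}
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Toy pipe with no filters: models only the message bookkeeping.
class Toy_Msg_Pipe
   {
   public:
      Toy_Msg_Pipe() : inside_msg(false) {}

      void start_msg()
         {
         if(inside_msg)
            throw std::logic_error("message already started");
         msgs.push_back("");
         inside_msg = true;
         }

      void write(const std::string& data)
         {
         if(!inside_msg)
            throw std::logic_error("write outside of a message");
         msgs.back() += data;
         }

      void end_msg()
         {
         if(!inside_msg)
            throw std::logic_error("no message to end");
         inside_msg = false;
         }

      // Convenience wrapper: exactly one write per message.
      void process_msg(const std::string& data)
         {
         start_msg();
         write(data);
         end_msg();
         }

      std::size_t message_count() const { return msgs.size(); }
      const std::string& message(std::size_t n) const { return msgs.at(n); }
   private:
      std::vector<std::string> msgs;
      bool inside_msg;
   };
\end{verbatim}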

\type{Pipe} can also be used with the \verb|>>| operator, and will accept a
\type{std::istream} or (on Unix systems with the \verb|fd_unix| module) a
Unix file descriptor. In either case, the entire contents of the file will be
read into the \type{Pipe}.

\subsubsection{Getting Output from a Pipe}

Retrieving the processed data from a \type{Pipe} is a bit more complicated, for
various reasons. In particular, because \type{Pipe} will separate each message
into a separate buffer, you have to be able to retrieve data from each message
independently. Each of \type{Pipe}'s read functions has a final parameter which
specifies what message to read from (as a 32-bit integer). If this parameter is
set to \type{Pipe::DEFAULT\_MESSAGE}, it will read the current default message
(\type{DEFAULT\_MESSAGE} is also the default value of this parameter). The
parameter will not be mentioned in further discussion of the reading API, but
it is always there (unless otherwise noted).

Reading is done with a variety of functions. The most basic are \type{u32bit}
\function{read}(\type{byte} \arg{out}[], \type{u32bit} \arg{len}) and
\type{u32bit} \function{read}(\type{byte\&} \arg{out}). Each reads into
\arg{out} (either up to \arg{len} bytes, or a single byte for the one taking a
\type{byte\&}), and returns the total number of bytes read. There is a variant
of these functions, all named \function{peek}, which performs the same
operations, but does not remove the bytes from the message (reading is a
destructive operation with a \type{Pipe}).
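
The difference between the two families of functions is easiest to
see on a single message buffer. \type{Toy\_Message} is a hypothetical
class for this sketch; it uses \type{char} and \type{std::size\_t}
where the real interface uses \type{byte} and \type{u32bit}.

\begin{verbatim}
#include <algorithm>
#include <cstddef>
#include <string>

// Toy message buffer showing destructive read() versus peek().
class Toy_Message
   {
   public:
      explicit Toy_Message(const std::string& data) : buf(data) {}

      // Copy up to len bytes without consuming them.
      std::size_t peek(char out[], std::size_t len) const
         {
         const std::size_t n = std::min(len, buf.size());
         std::copy(buf.begin(), buf.begin() + n, out);
         return n;
         }

      // Copy up to len bytes and remove them from the message.
      std::size_t read(char out[], std::size_t len)
         {
         const std::size_t n = peek(out, len);
         buf.erase(0, n);
         return n;
         }

      std::size_t remaining() const { return buf.size(); }
   private:
      std::string buf;
   };
\end{verbatim}

Note that both functions may return fewer bytes than requested, so
callers should always use the return value rather than \arg{len}.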

There are also the functions \type{SecureVector<byte>} \function{read\_all}(),
and \type{std::string} \function{read\_all\_as\_string}(), which return the
entire contents of the message, either as a memory buffer, or a
\type{std::string} (which is generally only useful if the \type{Pipe} has
encoded the message into a text string, such as when a \type{Base64\_Encoder}
is used).

To determine how many bytes are left in a message, call \type{u32bit}
\function{remaining}() (which can also take an optional message
number). Finally, there are some functions for managing the default message
number: \type{u32bit} \function{default\_msg}() will return the current default
message, \type{u32bit} \function{message\_count}() will return the total number
of messages (0...\function{message\_count}()-1), and
\function{set\_default\_msg}(\type{u32bit} \arg{msgno}) will set a new default
message number (which must be a valid message number for that \type{Pipe}). The
ability to set the default message number is particularly important in the case
of using the file output operations (\verb|<<| with a \type{std::ostream} or
Unix file descriptor), because there is no way to specify it explicitly when
using the output operator.

\subsection{A Filter Example}

Here is some code which takes one or more filenames in \arg{argv} and
calculates the result of several hash functions for each file. The complete
program can be found as \filename{hasher.cpp} in the Botan distribution. For
brevity, most error checking has been removed.

\begin{verbatim}
   string name[3] = { "MD5", "SHA-1", "RIPEMD-160" };
   Botan::Filter* hash[3] = {
      new Botan::Chain(new Botan::Hash_Filter(name[0]),
                        new Botan::Hex_Encoder),
      new Botan::Chain(new Botan::Hash_Filter(name[1]),
                        new Botan::Hex_Encoder),
      new Botan::Chain(new Botan::Hash_Filter(name[2]),
                        new Botan::Hex_Encoder) };

   Botan::Pipe pipe(new Botan::Fork(hash, 3));

   for(u32bit j = 1; argv[j] != 0; j++)
      {
      ifstream file(argv[j]);
      pipe.start_msg();
      file >> pipe;
      pipe.end_msg();
      file.close();
      for(u32bit k = 0; k != 3; k++)
         {
         pipe.set_default_msg(3*(j-1)+k);
         cout << name[k] << "(" << argv[j] << ") = " << pipe << endl;
         }
      }
\end{verbatim}
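
The expression passed to \function{set\_default\_msg} follows from the
fact that every file adds one message per branch of the \type{Fork}.
The arithmetic can be written out as a small helper;
\function{fork\_msg\_index} is a hypothetical name used only in this
sketch.

\begin{verbatim}
#include <cstddef>

// Message number of branch k (0-based) for the j-th file (1-based),
// when every start_msg/end_msg pair adds 'branches' messages.
std::size_t fork_msg_index(std::size_t j, std::size_t k,
                           std::size_t branches)
   {
   return branches * (j - 1) + k;
   }
\end{verbatim}

With three branches, the first file occupies messages 0 to 2, the
second file messages 3 to 5, and so on.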


\subsection{Filter Catalog}

This section contains descriptions of every \type{Filter} included in
the portable sections of Botan. \type{Filter}s provided by modules
are documented elsewhere.

\subsubsection{Keyed Filters}

A few sections ago, it was mentioned that \type{Pipe} can process multiple
messages, treating each of them exactly the same. Well, that was a bit of a
lie. There are some algorithms (in particular, block ciphers not in ECB mode,
and all stream ciphers) that change their state as data is put through them.

Naturally, you might well want to reset the keys or (in the case of block
cipher modes) IVs used by such filters, so multiple messages can be processed
using completely different keys, or new IVs, or new keys and IVs, or whatever.
And in fact, even for a MAC or an ECB block cipher, you might well want to
change the key used from message to message.

Enter \type{Keyed\_Filter}, which acts as an abstract interface for
any filter that uses keys: block cipher modes, stream ciphers,
MACs, and so on. It has two functions, \function{set\_key} and
\function{set\_iv}. Calling \function{set\_key} will, naturally, set
(or reset) the key used by the algorithm. Setting the IV only makes
sense in certain algorithms -- a call to \function{set\_iv} on an
object that doesn't support IVs will be ignored. You \emph{must} call
\function{set\_key} before calling \function{set\_iv}: while not all
\type{Keyed\_Filter} objects require this, you should assume it is
required anytime you are using a \type{Keyed\_Filter}.

Here's an example:

\begin{verbatim}
   Keyed_Filter *cast, *hmac;
   Pipe pipe(new Base64_Decoder,
             // Note the assignments to the cast and hmac variables
             cast = new CBC_Decryption("CAST-128", "PKCS7", cast_key, iv),
             new Fork(
                0, // Read the section 'Fork' to understand this
                new Chain(
                   hmac = new MAC_Filter("HMAC(SHA-1)", mac_key, 12),
                   new Base64_Encoder
                   )
                )
      );
   pipe.start_msg();
   [use pipe for a while, decrypt some stuff, derive new keys and IVs]
   pipe.end_msg();

   cast->set_key(cast_key2);
   cast->set_iv(iv2);
   hmac->set_key(mac_key2);

   pipe.start_msg();
   [use pipe for some other things]
   pipe.end_msg();
\end{verbatim}

There are some requirements for using \type{Keyed\_Filter} which you must
follow. If you call \function{set\_key} or \function{set\_iv} on a filter which
is owned by a \type{Pipe}, you must do so while the \type{Pipe} is
``unlocked''. This refers to the times when no messages are being processed by
\type{Pipe} -- either before \type{Pipe}'s \function{start\_msg} is called, or
after \function{end\_msg} is called (and no new call to \function{start\_msg}
has happened yet). Doing otherwise will result in undefined behavior, probably
silently getting invalid output.

And remember: if you're resetting both values, reset the key \emph{first}.

\subsubsection{Cipher Filters}

Getting ahold of a \type{Filter} implementing a cipher is very easy. Simply
make sure you're including the header \filename{lookup.h}, and call
\function{get\_cipher}. Generally you will pass the return value directly into
a \type{Pipe}. There are actually a couple of different functions, which do pretty
much the same thing:

\function{get\_cipher}(\type{std::string} \arg{cipher\_spec},
                       \type{SymmetricKey} \arg{key},
                       \type{InitializationVector} \arg{iv},
                       \type{Cipher\_Dir} \arg{dir});

\function{get\_cipher}(\type{std::string} \arg{cipher\_spec},
                       \type{SymmetricKey} \arg{key},
                       \type{Cipher\_Dir} \arg{dir});

The version that doesn't take an IV is useful for things that don't use them,
like block ciphers in ECB mode, or most stream ciphers. If you specify a
\arg{cipher\_spec} that does want an IV, and you use the version that doesn't
take one, an exception will be thrown. The \arg{dir} argument can be either
\type{ENCRYPTION} or \type{DECRYPTION}. In a few cases, like most (but not all)
stream ciphers, these are equivalent, but even then it provides a way of
showing the ``intent'' of the operation to readers of your code.

The \arg{cipher\_spec} is a string that specifies what cipher is to be
used. The general syntax for \arg{cipher\_spec} is ``STREAM\_CIPHER'',
``BLOCK\_CIPHER/MODE'', or ``BLOCK\_CIPHER/MODE/PADDING''. In the case of
stream ciphers, no mode is necessary, so just the name is sufficient. A block
cipher requires a mode of some sort, which can be ``ECB'', ``CBC'', ``CFB(n)'',
``OFB'', ``CTR-BE'', or ``EAX(n)''. The argument to CFB mode is how many bits
of feedback should be used. If you just use ``CFB'' with no argument, it will
default to using a feedback equal to the block size of the cipher. EAX mode
also takes an optional bit argument, which tells EAX how large a tag size to
use~--~generally this is the block size of the cipher, which is the default if
you don't specify any argument.

In the case of the ECB and CBC modes, a padding method can also be
specified. If it is not supplied, ECB defaults to not padding, and CBC defaults
to using PKCS \#5/\#7 compatible padding. The padding methods currently
available are ``NoPadding'', ``PKCS7'', ``OneAndZeros'', and ``CTS''. CTS
padding is currently only available for CBC mode, but the others can also be
used in ECB mode.
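
To make the idea of a padding method concrete, the following stand-alone sketch
shows what PKCS \#7 padding does to a message before it reaches the block
cipher. It uses only the C++ standard library; the names \function{pkcs7\_pad}
and \function{pkcs7\_unpad} are hypothetical and are not part of Botan, whose
own implementation may differ in detail.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef unsigned char byte;

// Append n bytes, each of value n (1 <= n <= block_size), so that the
// result is a whole number of blocks. (Illustrative; not Botan code.)
std::vector<byte> pkcs7_pad(const std::vector<byte>& in,
                            std::size_t block_size)
   {
   assert(block_size > 0 && block_size < 256);
   const std::size_t pad = block_size - (in.size() % block_size);
   std::vector<byte> out(in);
   out.insert(out.end(), pad, static_cast<byte>(pad));
   return out;
   }

// Strip the padding again, rejecting anything malformed.
std::vector<byte> pkcs7_unpad(const std::vector<byte>& in,
                              std::size_t block_size)
   {
   assert(block_size > 0 && block_size < 256);
   if(in.empty() || in.size() % block_size != 0)
      throw std::invalid_argument("input is not a whole number of blocks");
   const std::size_t pad = in.back();
   if(pad == 0 || pad > block_size)
      throw std::invalid_argument("bad padding length");
   for(std::size_t j = in.size() - pad; j != in.size(); j++)
      if(in[j] != pad)
         throw std::invalid_argument("bad padding byte");
   return std::vector<byte>(in.begin(), in.end() - pad);
   }
\end{verbatim}

With an 8 byte block, a 5 byte message gains three bytes of value 3. A message
that is already a whole number of blocks gains an entire extra block, which is
what lets the padding be removed unambiguously.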

Some example \arg{cipher\_spec} arguments are: ``DES/CFB(32)'',
``TripleDES/OFB'', ``Blowfish/CBC/CTS'', ``SAFER-SK(10)/CBC/OneAndZeros'',
``AES/EAX'', and ``ARC4''.
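
As a rough illustration of the \arg{cipher\_spec} syntax, the sketch below
splits a specification into its cipher, mode, and padding components. It is a
stand-alone example using only the standard library; the function
\function{split\_cipher\_spec} is hypothetical, and the parser inside Botan may
well behave differently (for instance in how it reports errors).

\begin{verbatim}
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Split a cipher_spec on '/' characters, ignoring any that appear
// inside parentheses. (Illustrative; not Botan's own parser.)
std::vector<std::string> split_cipher_spec(const std::string& spec)
   {
   std::vector<std::string> parts(1);
   std::size_t depth = 0;
   for(std::size_t j = 0; j != spec.size(); j++)
      {
      const char c = spec[j];
      if(c == '(')
         depth++;
      else if(c == ')')
         {
         if(depth == 0)
            throw std::invalid_argument("unbalanced ')' in " + spec);
         depth--;
         }
      if(c == '/' && depth == 0)
         parts.push_back(""); // start the next component
      else
         parts.back() += c;
      }
   if(depth != 0)
      throw std::invalid_argument("unbalanced '(' in " + spec);
   if(parts.size() > 3)
      throw std::invalid_argument("too many components in " + spec);
   for(std::size_t j = 0; j != parts.size(); j++)
      if(parts[j].empty())
         throw std::invalid_argument("empty component in " + spec);
   return parts;
   }
\end{verbatim}

Given ``SAFER-SK(10)/CBC/OneAndZeros'' this returns the three strings
``SAFER-SK(10)'', ``CBC'', and ``OneAndZeros''; given ``ARC4'' it returns a
single component, which corresponds to the stream cipher form.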

``CTR-BE'' refers to counter mode where the counter is incremented as if it
were a big-endian encoded integer. This is compatible with most other
implementations, but it is possible some will use the incompatible little
endian convention. This version would be denoted as ``CTR-LE'' if it were
supported.
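
The difference between the two conventions comes down to which end of the
counter block the carry starts from. The following stand-alone sketch shows
both; the function names are hypothetical and the code is not taken from Botan.

\begin{verbatim}
#include <cstddef>
#include <vector>

typedef unsigned char byte;

// Big-endian increment: the LAST byte is the least significant, so the
// carry propagates from the end of the block towards the front.
void increment_be(std::vector<byte>& counter)
   {
   for(std::size_t j = counter.size(); j != 0; j--)
      if(++counter[j-1] != 0)
         return; // no carry out of this byte, so we are done
   }

// Little-endian increment: the FIRST byte is the least significant.
void increment_le(std::vector<byte>& counter)
   {
   for(std::size_t j = 0; j != counter.size(); j++)
      if(++counter[j] != 0)
         return;
   }
\end{verbatim}

Starting from the three byte counter 00 00 FF, the big-endian increment gives
00 01 00, while the little-endian increment gives 01 00 FF. Two
implementations that disagree on the convention therefore produce different
keystreams from the second block onward.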

``EAX'' is a new cipher mode designed by Wagner, Rogaway, and Bellare. It is an
authenticated cipher mode (that is, no separate authentication is needed), has
provable security, and is free from patent entanglements. It runs about half as
fast as most of the other cipher modes (like CBC, OFB, or CTR), which is not
bad considering you don't need to use an authentication code.

\subsubsection{Hashes and MACs}

Hash functions and MACs don't need anything special when it comes to
filters. Both just take their input and produce no output until
\function{end\_msg()} is called, at which time they complete the hash or MAC
and send that as output.

These \type{Filter}s take a string naming the type to be used. If for some
reason you name something that doesn't exist, an exception will be thrown.

\noindent
\function{Hash\_Filter}(\type{std::string} \arg{hash},
                        \type{u32bit} \arg{outlength}):

This type hashes its input with \arg{hash}. When \function{end\_msg} is called
on the owning \type{Pipe}, the hash is completed and the digest is sent on to
the next thing in the pipe. The argument \arg{outlength} specifies how much of
the output of the hash will be passed along to the next filter when
\function{end\_msg} is called. By default, it will pass the entire hash.

Examples of names for \function{Hash\_Filter} are ``SHA-1'' and ``Whirlpool''.

\noindent
\function{MAC\_Filter}(\type{std::string} \arg{mac},
                       \type{const SymmetricKey\&} \arg{key},
                       \type{u32bit} \arg{outlength}):

The constructor for a \type{MAC\_Filter} takes a key, used in calculating the
MAC, and a length parameter, which has semantics exactly the same as the one
passed to \type{Hash\_Filter}'s constructor.

Examples for \arg{mac} are ``HMAC(SHA-1)'', ``CMAC(AES-128)'', and the
exceptionally long, strange, and probably useless name
``CMAC(Lion(Tiger(20,3),MARK-4,1024))''.

\subsubsection{PK Filters}

There are four classes in this category, \type{PK\_Encryptor\_Filter},
\type{PK\_Decryptor\_Filter}, \type{PK\_Signer\_Filter}, and
\type{PK\_Verifier\_Filter}. Each takes a pointer to an object of the
appropriate type (\type{PK\_Encryptor}, \type{PK\_Decryptor}, etc) which is
deleted by the destructor. These classes are found in \filename{pk\_filts.h}.

Three of these, for encryption, decryption, and signing are pretty much
identical conceptually. Each of them buffers its input until the end of the
message is marked with a call to the \function{end\_msg} function. Then they
encrypt, decrypt, or sign their input and send the output (the ciphertext, the
plaintext, or the signature) into the next filter.

Signature verification works a little differently, because it needs to know
what the signature is in order to check it. You can either pass this in along
with the constructor, or call the function \function{set\_signature} -- with
this second method, you need to keep a pointer to the filter around so you can
send it this command. In either case, after \function{end\_msg} is called, it
will try to verify the signature (if the signature has not been set by either
method, an exception will be thrown here). It will then send a single byte onto
the next filter -- a 1 or a 0, which specifies whether the signature verified
or not (respectively).

For more information about PK algorithms (including creating the appropriate
objects to pass to the constructors), read the section ``Public Key
Cryptography'' in this manual.

\subsubsection{Encoders}

Often you want your data to be in some form of text (for sending over channels
which aren't 8-bit clean, printing it, etc). The filters \type{Hex\_Encoder}
and \type{Base64\_Encoder} will convert arbitrary binary data into hex or
base64 formats. Not surprisingly, you can use \type{Hex\_Decoder} and
\type{Base64\_Decoder} to convert it back into its original form.

Both of the encoders can take a few options about how the data should be
formatted (all of which have defaults). The first is a \type{bool} which simply
says if the encoder should insert line breaks. This defaults to
false. Line breaks don't matter either way to the decoder, but it makes the
output a bit more appealing to the human eye, and a few transport mechanisms
(notably some email systems) limit the maximum line length.

The second encoder option is an integer specifying how long such lines will be
(obviously this will be ignored if line-breaking isn't being used). The default
tends to be in the range of 60-80 characters, but is not specified exactly. If
you want a specific value, set it. Otherwise the default should be fine.

Lastly, \type{Hex\_Encoder} takes an argument of type \type{Case}, which can be
\type{Uppercase} or \type{Lowercase} (default is \type{Uppercase}). This
specifies what case the characters A-F should be output as. The base64 encoder
has no such option, because it uses both upper and lower case letters for its
output.
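
The effect of the three encoder options can be seen in the following
stand-alone sketch of a hex encoder. It is not Botan's \type{Hex\_Encoder}: the
function \function{hex\_encode} is hypothetical, the case option is simplified
to a \type{bool}, and the default line length of 72 is an assumption made for
this example only.

\begin{verbatim}
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned char byte;

// Hex-encode data, optionally broken into lines of at most line_length
// characters and optionally in lower case. (Illustrative; not Botan code.)
std::string hex_encode(const std::vector<byte>& in,
                       bool breaks = false,
                       std::size_t line_length = 72,
                       bool uppercase = true)
   {
   assert(line_length > 0);
   const char* digits =
      uppercase ? "0123456789ABCDEF" : "0123456789abcdef";
   std::string out;
   std::size_t on_line = 0; // characters written to the current line
   for(std::size_t j = 0; j != in.size(); j++)
      {
      const char pair[2] = { digits[in[j] >> 4], digits[in[j] & 0x0F] };
      for(std::size_t k = 0; k != 2; k++)
         {
         if(breaks && on_line == line_length)
            {
            out += '\n';
            on_line = 0;
            }
         out += pair[k];
         on_line++;
         }
      }
   if(breaks && on_line > 0)
      out += '\n'; // terminate the final line
   return out;
   }
\end{verbatim}

For the four bytes DE AD BE EF, the defaults give ``DEADBEEF'', asking for
lower case gives ``deadbeef'', and enabling line breaks with a line length of 4
gives two lines, ``DEAD'' and ``BEEF''.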

The decoders both take a single option, which tells the decoder how to
behave in the case of invalid input. The enum (called \type{Decoder\_Checking})
can take on any of three values: \type{NONE}, \type{IGNORE\_WS}, and
\type{FULL\_CHECK}. With \type{NONE} (the default, for compatibility with
previous releases), invalid input (for example, a ``z'' character in supposedly
hex input) will simply be ignored. With \type{IGNORE\_WS}, whitespace will be
ignored by the decoder, but receiving other non-valid data will raise an
exception. Finally, \type{FULL\_CHECK} will raise an exception for \emph{any}
characters not in the encoded character set, including whitespace.
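
The three checking levels can be sketched as follows. This is a stand-alone
illustration rather than Botan's decoder: the enum \type{Checking} and the
functions \function{hex\_value} and \function{hex\_decode} are hypothetical
names, and the handling of a dangling final digit (it is dropped) is an
assumption of this example.

\begin{verbatim}
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

typedef unsigned char byte;

enum Checking { NONE, IGNORE_WS, FULL_CHECK };

// Returns 0-15 for a hex digit, or -1 for anything else
int hex_value(char c)
   {
   if(c >= '0' && c <= '9') return c - '0';
   if(c >= 'A' && c <= 'F') return c - 'A' + 10;
   if(c >= 'a' && c <= 'f') return c - 'a' + 10;
   return -1;
   }

std::vector<byte> hex_decode(const std::string& in,
                             Checking checking = NONE)
   {
   std::vector<byte> out;
   int high = -1; // pending high nibble, or -1 if there is none
   for(std::size_t j = 0; j != in.size(); j++)
      {
      const int v = hex_value(in[j]);
      if(v < 0)
         {
         const bool ws =
            std::isspace(static_cast<unsigned char>(in[j])) != 0;
         if(checking == FULL_CHECK || (checking == IGNORE_WS && !ws))
            throw std::invalid_argument("invalid character in hex input");
         continue; // NONE, or whitespace under IGNORE_WS: skip it
         }
      if(high < 0)
         high = v;
      else
         {
         out.push_back(static_cast<byte>((high << 4) | v));
         high = -1;
         }
      }
   return out;
   }
\end{verbatim}

With this sketch, the input ``DEzAD'' decodes to two bytes under \type{NONE}
but throws under \type{IGNORE\_WS}, while ``DE AD'' decodes under both of those
levels and throws only under \type{FULL\_CHECK}.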

You can find the declarations for these types in \filename{hex.h} and
\filename{base64.h}.

\subsection{Rolling Your Own}

The system of filters and pipes was designed in an attempt to make it
as simple as possible to write new \type{Filter} objects. There are
essentially four functions that need to be implemented by an object
deriving from \type{Filter}:

\noindent
\type{void} \function{write}(\type{byte} \arg{input}[], \type{u32bit}
\arg{length}):

The \function{write} function is what is called when a filter receives input
for it to process. The filter is \emph{not} required to process it right away;
many filters buffer their input before producing any output. A filter will
usually have \function{write} called many times during its lifetime.

\noindent
\type{void} \function{send}(\type{byte} \arg{output}[], \type{u32bit}
\arg{length}):

Eventually, a filter will want to produce some output to send along to the next
filter in the pipeline. It does so by calling \function{send} with whatever it
wants to send along to the next filter. There is also a version of
\function{send} taking a single byte argument, as a convenience.

\noindent
\type{void} \function{start\_msg()}:

This function is optional. Implement it if your \type{Filter} would like to do
some processing or setup at the start of each message (for an example, see the
Zlib compression module).

\noindent
\type{void} \function{end\_msg()}:

Implementing the \function{end\_msg} function is optional. It is called when it
has been requested that filters finish up their computations. Note that they
must \emph{not} deallocate their resources; this should be done by their
destructor. They should simply finish up with whatever computation they have
been working on (for example, a compressing filter would flush the compressor
and \function{send} the final block), and empty any buffers in preparation for
processing a fresh new set of input. It is essentially the inverse of
\function{start\_msg}.

Additionally, if necessary, filters can define a constructor that takes any
needed arguments, and a destructor to deal with deallocating memory, closing
files, etc.
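
To show how the four functions fit together, here is a stand-alone toy model of
the interface. It deliberately does not use Botan: \type{Toy\_Filter},
\type{Xor\_Checksum}, \type{Collector}, and the \function{attach} function are
hypothetical names invented for this sketch, and in the real library it is the
\type{Pipe} that connects filters to each other.

\begin{verbatim}
#include <cstddef>
#include <vector>

typedef unsigned char byte;
typedef unsigned int u32bit;

// A stripped-down stand-in for the interface described above.
// (Illustrative only; Botan's real Filter class has more to it.)
class Toy_Filter
   {
   public:
      Toy_Filter() : next(0) {}
      virtual ~Toy_Filter() {}
      virtual void write(const byte input[], u32bit length) = 0;
      virtual void start_msg() {}
      virtual void end_msg() {}
      void attach(Toy_Filter* f) { next = f; }
   protected:
      void send(const byte output[], u32bit length)
         { if(next) next->write(output, length); }
      void send(byte b) { send(&b, 1); }
   private:
      Toy_Filter* next;
   };

// Buffers nothing but a running XOR, and sends one byte per message
class Xor_Checksum : public Toy_Filter
   {
   public:
      Xor_Checksum() : sum(0) {}
      void start_msg() { sum = 0; }
      void write(const byte input[], u32bit length)
         {
         for(u32bit j = 0; j != length; j++)
            sum ^= input[j];
         }
      void end_msg() { send(sum); sum = 0; } // finish up, then reset
   private:
      byte sum;
   };

// Terminal filter that simply records whatever it is sent
class Collector : public Toy_Filter
   {
   public:
      void write(const byte input[], u32bit length)
         { data.insert(data.end(), input, input + length); }
      std::vector<byte> data;
   };
\end{verbatim}

Like a hash filter, \type{Xor\_Checksum} produces no output from
\function{write}; it only calls \function{send} from \function{end\_msg}, and
it resets its state there so that the same object can process the next message.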

There is also a \type{BufferingFilter} class (in \filename{buf\_filt.h}) which
will take a message and split it up into an initial block which can be of any
size (including zero), a sequence of fixed sized blocks of any non-zero size,
and a final block (which may also be zero-sized). This might make a useful base class
for your filters, depending on what you have in mind.


\pagebreak
\section{Public Key Cryptography}

Let's create an RSA private key:

\begin{verbatim}
   RSA_PrivateKey priv_rsa(1024 /* bits */);
\end{verbatim}

We can easily turn this into a public key, which we can then send to
someone:

\begin{verbatim}
   RSA_PublicKey pub_rsa = priv_rsa;
\end{verbatim}




\subsection{Creating PK Algorithm Key Objects}

The library has interfaces for encryption, signatures, etc that do not require
knowing the exact algorithm in use (for example RSA and Rabin-Williams
signatures are handled by the exact same code path).

One place where we \emph{do} need to know exactly what kind of algorithm is in
use is when we are creating a key (\emph{But}: read the section ``Importing and
Exporting PK Keys'', later in this manual).

There are (currently) two kinds of public key algorithms in Botan: ones based
on integer factorization (RSA and Rabin-Williams), and ones based on the
discrete logarithm problem (DSA, Diffie-Hellman, Nyberg-Rueppel, and
ElGamal). Since discrete logarithm parameters (primes and generators) can be
shared among many keys, there is the notion of these being a combined type
(called \type{DL\_Group}).

There are two ways to create a DL private key (such as
\type{DSA\_PrivateKey}). One is to pass in just a \type{DL\_Group} object -- a
new key will automatically be generated. The other involves passing in a group
to use, along with both the public and private values (private value first).

Since in integer factorization algorithms, the modulus used isn't shared by
other keys, we don't use this notion. You can create a new key by passing in a
\type{u32bit} telling how long (in bits) the key should be, or you can copy a
pre-existing key by passing in the appropriate parameters (primes, exponents,
etc). For RSA and Rabin-Williams (the two IF schemes in Botan), the parameters
are all \type{BigInt}s: prime 1, prime 2, encryption exponent, decryption
exponent, modulus. The last two are optional, since they can easily be derived
from the first three.
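
The derivation of the last two values can be illustrated with toy numbers. The
sketch below is stand-alone and uses 64 bit integers rather than
\type{BigInt}; the names \type{Toy\_RSA\_Params},
\function{derive\_rsa\_params}, and \function{inverse\_mod} are hypothetical.
It computes the decryption exponent modulo $(p-1)(q-1)$; this is an assumption
of the example, and an implementation may instead reduce modulo the least
common multiple of $p-1$ and $q-1$, which yields a different but equally valid
exponent.

\begin{verbatim}
#include <cassert>

typedef long long s64;

// Extended Euclid: returns x with (a * x) % m == 1.
// Requires gcd(a, m) == 1.
s64 inverse_mod(s64 a, s64 m)
   {
   s64 r0 = m, r1 = a % m;
   s64 t0 = 0, t1 = 1;
   while(r1 != 0)
      {
      const s64 q = r0 / r1;
      const s64 r2 = r0 - q * r1; r0 = r1; r1 = r2;
      const s64 t2 = t0 - q * t1; t0 = t1; t1 = t2;
      }
   assert(r0 == 1); // otherwise a has no inverse modulo m
   return (t0 % m + m) % m;
   }

struct Toy_RSA_Params { s64 p, q, e, d, n; };

// Fill in the modulus and decryption exponent from p, q, and e
Toy_RSA_Params derive_rsa_params(s64 p, s64 q, s64 e)
   {
   Toy_RSA_Params k;
   k.p = p; k.q = q; k.e = e;
   k.n = p * q;
   k.d = inverse_mod(e, (p - 1) * (q - 1));
   return k;
   }
\end{verbatim}

For $p = 61$, $q = 53$, and $e = 17$ this gives $n = 3233$ and $d = 2753$, and
indeed $17 \cdot 2753 = 46801 = 15 \cdot 3120 + 1$.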

\subsubsection{Creating a DL\_Group}

There are quite a few ways to get a \type{DL\_Group} object. The best is to use
the function \function{get\_dl\_group}, which takes a string naming a group; it
will either return that group, if it knows about it, or throw an
exception. Names it knows about include ``IETF-n'' where n is 768, 1024, 1536,
2048, 3072, or 4096, and ``DSA-n'', where n is 512, 768, or 1024. The IETF
groups are the ones specified for use with IPSec, and the DSA ones are the
default DSA parameters specified by Java's JCE. For DSA and Nyberg-Rueppel, you
should only use the ``DSA-n'' groups, while Diffie-Hellman and ElGamal can use
either type (keep in mind that some applications/standards require DH/ELG to
use DSA-style primes, while others require strong prime groups).

You can also generate a new random group. This is not recommended, because it is
quite slow, especially for safe primes.

You can register a new DL group with \function{add\_dl\_group} with a string
naming the group and the \type{DL\_Group}. Future lookups on that name will
return the group. There is no reason to register the group if you do decide to
use a distinct DL group for each key.

\subsection{Key Checking}

Most public key algorithms have limitations or restrictions on their
parameters. For example RSA requires an odd exponent, and algorithms based on
the discrete logarithm problem need a generator $> 1$.

Each low-level public key type has a function named \function{check\_key} which
takes a \type{bool}. This function returns a boolean value that declares
whether or not the key is valid (from an algorithmic standpoint). For example,
it will check to make sure that the prime parameters of a DSA key are, in fact,
prime. It does not have anything to do with the validity of the key for any
particular use, nor does it have anything to do with certificates which link a
key (which, after all, is just some numbers) with a user or other entity. If
\function{check\_key}'s argument is \type{true}, then it does ``strong''
checking, which includes fairly expensive operations like primality checking.

Keys are always checked when they are loaded or generated, so typically there
is no reason to use this function directly. However, you can disable or reduce
the checks for particular cases (public keys, loaded private keys, generated
private keys) by setting the right config toggle (see the section on the
configuration subsystem for details).

\subsection{Getting a PK algorithm object}

The key types, like \type{RSA\_PrivateKey}, do not implement any kind of
padding or encoding (which is generally necessary for security). To get an
object like this, the easiest thing to do is call the functions found in
\filename{look\_pk.h}. Generally these take a key, followed by a string that
specifies what hashing and encoding method(s) to use. Examples of such strings
are ``EME1(SHA-1)'' for OAEP encryption and ``EMSA4(SHA-1)'' for PSS signatures
(where the message is hashed using SHA-1).

Here are some basic examples (using an RSA key) to give you a feel for the
possibilities. These examples assume \type{rsakey} is an
\type{RSA\_PrivateKey}, since otherwise we would not be able to create a
decryption or signature object with it (you can create encryption or signature
verification objects with public keys, naturally). Remember to delete these
objects when you're done with them.

\begin{verbatim}
   // PKCS #1 v2.0 / IEEE 1363 compatible encryption
   PK_Encryptor* rsa_enc1 = get_pk_encryptor(rsakey, "EME1(RIPEMD-160)");
   // PKCS #1 v1.5 compatible encryption
   PK_Encryptor* rsa_enc2 = get_pk_encryptor(rsakey, "PKCS1v15");

   // Raw encryption: no padding, input is directly encrypted by the key
   // Don't use this unless you know what you're doing
   PK_Encryptor* rsa_enc3 = get_pk_encryptor(rsakey, "Raw");

   // This object can decrypt things encrypted by rsa_enc1
   PK_Decryptor* rsa_dec1 = get_pk_decryptor(rsakey, "EME1(RIPEMD-160)");

   // PKCS #1 v1.5 compatible signatures
   PK_Signer* rsa_sig = get_pk_signer(rsakey, "EMSA3(MD5)");
   PK_Verifier* rsa_verify = get_pk_verifier(rsakey, "EMSA3(MD5)");

   // PKCS #1 v2.1 compatible signatures
   PK_Signer* rsa_sig2 = get_pk_signer(rsakey, "EMSA4(SHA-1)");
   PK_Verifier* rsa_verify2 = get_pk_verifier(rsakey, "EMSA4(SHA-1)");

   // Hash input with SHA-1, but don't pad the input in any way; usually
   // used with DSA/NR, not RSA
   PK_Signer* rsa_sig3 = get_pk_signer(rsakey, "EMSA1(SHA-1)");
\end{verbatim}

\subsection{Encryption}

The \type{PK\_Encryptor} and \type{PK\_Decryptor} classes are the interface for
encryption and decryption, respectively.

Calling \function{encrypt} with a \type{byte} array and a length parameter will
return the input encrypted with whatever scheme is being used. Calling the
similar \function{decrypt} will perform the inverse operation. You can also do
these operations with \type{SecureVector<byte>}s. In all cases, the output is
returned via a \type{SecureVector<byte>}.

If you attempt an operation with a larger size than the key can support (this
limit varies based on the algorithm, the key size, and the padding method used
(if any)), an exception will be thrown. Alternately, you can call
\function{maximum\_input\_size}, which will return the maximum size you can
safely encrypt. In fact, you can often encrypt an object that is one byte
longer, but only if enough of the high bits of the leading byte are set to
zero. Since this is pretty dicey, it's best to stick with the advertised
maximum.

Available public key encryption algorithms in Botan are RSA and ElGamal. The
encoding methods are EME1, denoted by ``EME1(HASHNAME)'', PKCS \#1 v1.5,
called ``PKCS1v15'' or ``EME-PKCS1-v1\_5'', and raw encoding (``Raw'').

For compatibility reasons, PKCS \#1 v1.5 is recommended for use with ElGamal
(most other implementations of ElGamal do not support any other encoding
format). RSA can also be used with PKCS \#1 v1.5 encoding, but because of various
possible attacks, EME1 is the preferred encoding. EME1 requires the use of a
hash function: unless a competent applied cryptographer tells you otherwise,
you should use SHA-1.

Don't use ``Raw'' encoding unless you need it for backward compatibility with
old protocols. There are many possible attacks against both ElGamal and RSA
when they are used this way.

\subsection{Signatures}

The signature algorithms look quite a bit like the hash functions. You can
repeatedly call \function{update}, giving more and more of a message you wish
to sign, and then call \function{signature}, which will return a signature for
that message. If you want to do it all in one shot, call
\function{sign\_message}, which will just call \function{update} with its
argument and then return whatever \function{signature} returns.

You can validate a signature by updating the verifier class, and finally
checking whether the value returned from \function{check\_signature} is true (you pass
the supposed signature to the \function{check\_signature} function as a byte
array and a length or as a \type{MemoryRegion<byte>}). There is another
function, \function{verify\_message}, which takes a pair of byte array/length
pairs (or a pair of \type{MemoryRegion<byte>} objects), the first of which is
the message, the second being the (supposed) signature. It returns true if the
signature is valid and false otherwise.

Available public key signature algorithms in Botan are RSA, DSA,
Nyberg-Rueppel, and Rabin-Williams. Signature encoding methods include EMSA1,
EMSA2, EMSA3, EMSA4, and Raw. All of them, except Raw, take a parameter naming
a message digest function to hash the message with. Raw actually signs the
input directly; if the message is too big, the signing operation will fail. Raw
is not useful except in very specialized applications.

There are various interactions which make certain encoding schemes and signing
algorithms more or less useful.

EMSA2 is the usual method for encoding Rabin-Williams signatures, so for
compatibility with other implementations you may have to use that. EMSA4 (also
called PSS), also works with Rabin-Williams. EMSA1 and EMSA3 do \emph{not} work
with Rabin-Williams.

RSA can be used with any of the available encoding methods. EMSA4 is by far the
most secure, but is not (as of now) widely implemented. EMSA3 (also called
``EMSA-PKCS1-v1\_5'') is commonly used with RSA (for example in SSL). EMSA1
signs the message digest directly, without any extra padding or encoding. This
may be useful, but is not as secure as either EMSA3 or EMSA4. EMSA2 may be used
but is not recommended.

For DSA and Nyberg-Rueppel, you should use EMSA1. None of the other encoding
methods are particularly useful for these algorithms.

\subsection{Key Agreement}

You can get ahold of a \type{PK\_Key\_Agreement\_Scheme} object by calling
\function{get\_pk\_kas} with a key that is of a type that supports key
agreement (such as a Diffie-Hellman key stored in a \type{DH\_PrivateKey}
object), and the name of a key derivation function. This can be ``Raw'',
meaning the output of the primitive itself is returned as the key, or
``KDF1(hash)'' or ``KDF2(hash)'' where ``hash'' is any string you happen to
like (hopefully you like strings like ``SHA-1'' or ``RIPEMD-160''), or
``X9.42-PRF(keywrap)'', which uses the PRF specified in ANSI X9.42. It takes
the name or OID of the key wrap algorithm which will be used to encrypt a
content encryption key.

How key agreement generally works is that you trade public values with some
other party, and then each of you runs a computation with the other's value and
your key (this should return the same result to both parties). This computation
can be called by using \function{derive\_key} with either a byte array/length
pair, or a \type{SecureVector<byte>} that holds the public value of the other
party. The last argument to either call is a number that specifies how long a
key you want.
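
The underlying computation, with the key derivation function left out (that
is, the ``Raw'' case), can be shown with toy numbers. The sketch below is
stand-alone and limited to moduli that fit in 32 bits; the functions
\function{power\_mod}, \function{dh\_public\_value}, and
\function{dh\_shared\_secret} are hypothetical names and are not part of Botan.

\begin{verbatim}
#include <cassert>

typedef unsigned long long u64;

// Computes (base ^ exp) mod m by square-and-multiply. The modulus must
// be below 2^32 so that intermediate products fit in 64 bits.
u64 power_mod(u64 base, u64 exp, u64 m)
   {
   assert(m > 0 && m < (1ULL << 32));
   u64 result = 1 % m;
   base %= m;
   while(exp != 0)
      {
      if(exp & 1)
         result = (result * base) % m;
      base = (base * base) % m;
      exp >>= 1;
      }
   return result;
   }

// The value you send to the other party: g^x mod p
u64 dh_public_value(u64 g, u64 p, u64 priv)
   {
   return power_mod(g, priv, p);
   }

// The value both parties end up with: (other's public value)^x mod p
u64 dh_shared_secret(u64 other_public, u64 p, u64 priv)
   {
   return power_mod(other_public, priv, p);
   }
\end{verbatim}

With $p = 23$ and $g = 5$, private values 6 and 15 give public values 8 and
19, and both parties compute the shared value 2. Real groups are of course
hundreds of digits long, which is why the library uses \type{BigInt} and a
key derivation function on top of this primitive.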

Depending on the key derivation function you're using, you may not
\emph{actually} get back a key of that size. In particular, ``Raw'' will return
a number about the size of the Diffie-Hellman modulus, and KDF1 can only return
a key which is the same size as the output of the hash. KDF2, on the other
hand, will always give you a key exactly as long as you request, regardless of
the underlying hash used with it. The key returned is a \type{SymmetricKey},
ready to pass to a block cipher, MAC, or other symmetric algorithm.

The public value which should be used can be obtained by calling
\function{public\_data}, which exists for any key that is associated with a
key agreement algorithm. It returns a \type{SecureVector<byte>}.

``KDF2(SHA-1)'' is by far the preferred algorithm for key derivation in new
applications. The X9.42 algorithm may be useful in some circumstances, but
unless you need X9.42 compatibility, KDF2 is easier to use.

There is a Diffie-Hellman example included in the distribution, which you may
want to examine.

\subsection{Importing and Exporting PK Keys}

[This section mentions \type{Pipe} and \type{DataSource}, which are not covered
until later in the manual. Please read those sections for more about
\type{Pipe} and \type{DataSource} and their uses.]

There are many, many different (often conflicting) standards surrounding public
key cryptography. There are, thankfully, only two major standards surrounding
the representation of a public or private key: X.509 (for public keys), and
PKCS \#8 (for private keys). Other crypto libraries, like OpenSSL and B-SAFE,
also support these formats, so you can easily exchange keys with software that
doesn't use Botan.

In addition to ``plain'' public keys, Botan also supports X.509 certificates.
These are documented in the section ``Certificate Handling'', later in this
manual.

\subsubsection{Public Keys}

The interfaces for doing either of these are quite similar. Let's look at the
X.509 stuff first:
\begin{verbatim}
namespace X509 {
   void encode(const X509_PublicKey& key, Pipe& out, X509_Encoding enc = PEM);
   std::string PEM_encode(const X509_PublicKey& key);

   X509_PublicKey* load_key(DataSource& in);
   X509_PublicKey* load_key(const std::string& file);
   X509_PublicKey* load_key(const SecureVector<byte>& buffer);
}
\end{verbatim}

Basically, \function{X509::encode} takes an \type{X509\_PublicKey}
(as of now, that's any RSA, DSA, or Diffie-Hellman key), encodes it
using \arg{enc}, which can be either \type{PEM} or
\type{RAW\_BER}, and places the encoding into \arg{out}. Using
\type{PEM} is \emph{highly} recommended for many reasons: it is
compatible with other software, it survives transmission over channels
that are not 8-bit clean, it can be identified by a human without
special tools, and it sometimes allows more sane behavior from tools
that process the data. Remember that if you have just created the
\type{Pipe} that you are passing to \function{X509::encode}, you need
to call \function{start\_msg} first. Particularly with public keys,
about 99\% of the time you just want to PEM encode the key and then
write it to a file or something. In this case, it's probably easier to
use \function{X509::PEM\_encode}. This function will simply return the
PEM encoding of the key as a \type{std::string}.

For loading a public key, the preferred method is one of the variants
of \function{load\_key}. This function will return a newly allocated
key based on the data from whatever source it is using (assuming, of
course, the source is in fact storing a representation of a public
key). The encoding used (PEM or BER) need not be specified; the format
will be detected automatically. The key is allocated with
\function{new}, and should be released with \function{delete} when you
are done with it. The first takes a generic \type{DataSource} which
you have to allocate~--~the others are simple wrapper functions that
take either a filename or a memory buffer.
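
One plausible way to tell the two encodings apart is to look at the start of
the data, as in the stand-alone sketch below. This is an assumption about how
such detection could work, not a description of what \function{load\_key}
actually does; the enum \type{Key\_Encoding} and the function
\function{guess\_encoding} are hypothetical names.

\begin{verbatim}
#include <cstddef>
#include <string>

enum Key_Encoding { ENCODING_BER, ENCODING_PEM, ENCODING_UNKNOWN };

// Guess the encoding of a serialized key from its contents.
// (Illustrative heuristic; not Botan's actual detection code.)
Key_Encoding guess_encoding(const std::string& data)
   {
   // A BER-encoded key is an ASN.1 SEQUENCE, whose tag byte is 0x30
   if(!data.empty() && static_cast<unsigned char>(data[0]) == 0x30)
      return ENCODING_BER;

   // PEM is base64 text between "-----BEGIN ...-----" and
   // "-----END ...-----" lines
   if(data.find("-----BEGIN ") != std::string::npos)
      return ENCODING_PEM;

   return ENCODING_UNKNOWN;
   }
\end{verbatim}

Note that 0x30 is also the ASCII character ``0'', so this heuristic relies on
PEM data starting with its dashes (or with text that does not begin with that
character).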

So what can you do with the return value of \function{load\_key}? On
its own, an \type{X509\_PublicKey} isn't particularly useful; you can't
encrypt messages or verify signatures, or much else. But, using
\function{dynamic\_cast}, you can figure out what kind of operations
the key supports. Then, you can cast the key to the appropriate type
and pass it to a higher-level class. For example:

\begin{verbatim}
   /* Might be RSA, might be ElGamal, might be ... */
   X509_PublicKey* key = X509::load_key("pubkey.asc");
   /* You MUST use dynamic_cast to convert, because of virtual bases */
   PK_Encrypting_Key* enc_key = dynamic_cast<PK_Encrypting_Key*>(key);
   if(!enc_key)
      throw Some_Exception();
   PK_Encryptor* enc = get_pk_encryptor(*enc_key, "EME1(SHA-1)");
   SecureVector<byte> cipher = enc->encrypt(some_message, size_of_message);
\end{verbatim}

\subsubsection{Private Keys}

There are two different options for private key import/export. The first is a
plaintext version of the private key. This is supported by the following
functions:

\begin{verbatim}
namespace PKCS8 {
   void encode(const PKCS8_PrivateKey& key, Pipe& to, X509_Encoding enc = PEM);

   std::string PEM_encode(const PKCS8_PrivateKey& key);
}
\end{verbatim}

These functions are basically the same as the X.509 functions described
previously. The only difference is that they take a \type{PKCS8\_PrivateKey}
type (which, again, can be either RSA, DSA, or Diffie-Hellman, but this time
the key must be a private key). In most situations, using these is a bad idea,
because anyone can come along and grab the private key without having to know
any passwords or other secrets. Unless you have very particular security
requirements, always use the versions that encrypt the key based on a
passphrase. For importing, the same functions can be used for encrypted and
unencrypted keys.

The other way to export a PKCS \#8 key is to first encode it in the same manner
as done above, then encrypt it (using a passphrase and the techniques of PKCS
\#5), and store the whole thing into another structure. This method is
definitely preferred, since otherwise the private key is unprotected. The
following functions support this technique:

\begin{verbatim}
namespace PKCS8 {
   void encrypt_key(const PKCS8_PrivateKey& key, Pipe& out,
                    std::string passphrase, std::string pbe = "",
                    X509_Encoding enc = PEM);

   std::string PEM_encode(const PKCS8_PrivateKey& key, std::string passphrase,
                          std::string pbe = "");
}
\end{verbatim}

To export an encrypted private key, call \function{PKCS8::encrypt\_key}. The
\arg{key}, \arg{out}, and \arg{enc} arguments are similar in usage to the ones
for \function{PKCS8::encode}. As you might notice, there are two new arguments
for \function{PKCS8::encrypt\_key}, however. The first is a passphrase (which
you presumably got from a user somehow). This will be used to encrypt the key.
The second new argument is \arg{pbe}; this specifies a particular password
based encryption (or PBE) algorithm.

The \function{PEM\_encode} version shown here is similar to the one that
doesn't take a passphrase. Essentially it encrypts the key (using the default
PBE algorithm), and then returns a C++ string with the PEM encoding of the key.

If \arg{pbe} is blank, then the default algorithm (controlled by the
``base/default\_pbe'' option) will be used. As shipped, this default is
``PBE-PKCS5v20(SHA-1,TripleDES/CBC)''. This is among the more secure options
of PKCS \#5, and is widely supported among implementations of PKCS \#5 v2.0. It
uses a 168 bit key (the effective strength of three-key TripleDES is usually
put at about 112 bits, because of meet-in-the-middle attacks), which should be
more than sufficient. If you need compatibility with systems that only support PKCS \#5
v1.5, pass ``PBE-PKCS5v15(MD5,DES/CBC)'' as \arg{pbe}. However, be warned that
this PBE algorithm only has 56 bits of security against brute force attacks. As
of 1.4.5, all three keylengths of AES are also available as options, which can
be used with by specifying a PBE algorithm of
``PBE-PKCS5v20(SHA-1,AES-256/CBC)'' (or ``AES-128'' or ``AES-192''). Support
for AES is slightly non-standard, and some applications or libraries might not
handle it. It is known that OpenSSL (0.9.7 and later) do handle AES for private
key encryption.

There may be some strange programs out there that support the v2.0 extensions
to PBES1 but not PBES2; if you need to interoperate with a program like that,
use ``PBE-PKCS5v15(MD5,RC2/CBC)''. For example, OpenSSL supports this format
(though since it also supports the v2.0 schemes, there is no reason not to just
use TripleDES or AES). This scheme uses a 64-bit key, which, while
significantly better than a 56-bit key, is a bit too small for comfort.
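
To put those two key sizes in perspective, here is a rough worked estimate of
the time needed to try every key. The search rate of $10^{9}$ keys per second
is an assumption chosen purely for illustration, not a measurement of any real
attacker:

\[
  2^{56} \approx 7.2 \times 10^{16} \mbox{ keys}, \qquad
  \frac{7.2 \times 10^{16}}{10^{9} \mbox{ keys/s}}
  \approx 7.2 \times 10^{7} \mbox{ s} \approx 2.3 \mbox{ years}
\]
\[
  2^{64} = 2^{8} \cdot 2^{56} \approx 1.8 \times 10^{19} \mbox{ keys}, \qquad
  \frac{1.8 \times 10^{19}}{10^{9} \mbox{ keys/s}}
  \approx 1.8 \times 10^{10} \mbox{ s} \approx 580 \mbox{ years}
\]

The eight extra bits multiply the work by $2^{8} = 256$. That helps, but under
the same assumption an attacker with a thousand times more hardware would still
need well under a year to search the entire 64-bit key space.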

Last but not least, there are some functions which are basically identical to
\function{X509::load\_key}, and which will load, and possibly decrypt, a PKCS
\#8 private key:

\begin{verbatim}
namespace PKCS8 {
   PKCS8_PrivateKey* load_key(DataSource& in, const User_Interface& ui);
   PKCS8_PrivateKey* load_key(DataSource& in, std::string passphrase = "");

   PKCS8_PrivateKey* load_key(const std::string& filename,
                              const User_Interface& ui);
   PKCS8_PrivateKey* load_key(const std::string& filename,
                              const std::string& passphrase = "");
}
\end{verbatim}

The versions that take a \type{std::string} \arg{passphrase} are primarily for
compatibility, but they are useful in limited circumstances. The
\type{User\_Interface} versions are how \function{load\_key} is actually
implemented, and they provide much more flexibility. Essentially, if the
passphrase given to the function is not correct, then an exception is thrown
and that is that. However, if you pass in a UI object instead, then the UI
object can keep asking the user for the passphrase until they get it right (or
until they cancel the action, through the UI interface). A
\type{User\_Interface} has very little to do with talking to users; it's just a
way to glue together Botan and whatever user interface you happen to be
using. You can think of it as a user interface interface. The default
\type{User\_Interface} is actually very dumb, and effectively acts just like
the versions taking the \type{std::string}.
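
The difference between the two calling styles can be sketched without Botan at
all. In the following minimal sketch, \type{Passphrase\_Source},
\type{Scripted\_User}, \function{try\_decrypt}, \function{load\_with\_string},
and \function{load\_with\_ui} are all hypothetical names invented for
illustration; none of them is part of Botan, and the real
\type{User\_Interface} class has its own interface. The sketch only shows the
control flow described above: one attempt and an exception for the string
style, and a loop that ends on success or cancellation for the UI style.

\begin{verbatim}
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a UI callback object (not Botan's class)
class Passphrase_Source
   {
   public:
      // Returns false if the user cancelled; otherwise fills in 'out'
      virtual bool ask(std::string& out) = 0;
      virtual ~Passphrase_Source() {}
   };

// Hypothetical stand-in for decryption: only one passphrase works
bool try_decrypt(const std::string& passphrase)
   {
   return (passphrase == "correct horse");
   }

// String style: a single attempt, and failure is an exception
void load_with_string(const std::string& passphrase)
   {
   if(!try_decrypt(passphrase))
      throw std::runtime_error("bad passphrase");
   }

// UI style: keep asking until the passphrase works or the user cancels.
// Returns the number of attempts that were needed.
unsigned int load_with_ui(Passphrase_Source& ui)
   {
   unsigned int attempts = 0;
   std::string passphrase;
   while(ui.ask(passphrase))
      {
      ++attempts;
      if(try_decrypt(passphrase))
         return attempts;
      }
   throw std::runtime_error("cancelled by user");
   }

// A scripted "user" that offers a fixed list of guesses, then cancels
class Scripted_User : public Passphrase_Source
   {
   public:
      Scripted_User(const std::vector<std::string>& g) :
         guesses(g), next(0) {}

      bool ask(std::string& out)
         {
         if(next == guesses.size())
            return false;
         out = guesses[next++];
         return true;
         }
   private:
      std::vector<std::string> guesses;
      std::vector<std::string>::size_type next;
   };
\end{verbatim}

With a scripted user who guesses wrong once and then right,
\function{load\_with\_ui} succeeds on the second attempt, whereas
\function{load\_with\_string} given the same wrong guess throws immediately.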

After loading a key, you can use \function{dynamic\_cast} to find out what
operations it supports, and use it appropriately. Remember to \function{delete}
it once you are done with it.
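
As a rough illustration of that pattern, the sketch below probes a key pointer
with \keyword{dynamic\_cast} and then deletes it. The classes are stand-ins
defined inside the sketch so that it compiles without Botan; all of the names
are hypothetical, and the actual capability classes should be looked up in the
Botan headers.

\begin{verbatim}
#include <string>

// Hypothetical stand-ins for a loaded key and its capabilities
class Loaded_Key
   {
   public:
      virtual ~Loaded_Key() {}   // polymorphic base, needed for dynamic_cast
   };

class Signing_Capable
   {
   public:
      virtual ~Signing_Capable() {}
   };

class Decrypting_Capable
   {
   public:
      virtual ~Decrypting_Capable() {}
   };

// A signature-only key (in the spirit of DSA)
class Sign_Only_Key : public Loaded_Key,
                      public Signing_Capable {};

// A key usable for both operations (in the spirit of RSA)
class Dual_Use_Key : public Loaded_Key,
                     public Signing_Capable,
                     public Decrypting_Capable {};

// Ask the object which operations it supports
std::string describe(Loaded_Key* key)
   {
   std::string ops;
   if(dynamic_cast<Signing_Capable*>(key))
      ops += "sign ";
   if(dynamic_cast<Decrypting_Capable*>(key))
      ops += "decrypt ";
   return ops;
   }

// Typical lifetime: probe the key, use it, then delete it
std::string probe_and_release(Loaded_Key* key)
   {
   const std::string ops = describe(key);
   delete key;   // the caller owns the object handed back by a loader
   return ops;
   }
\end{verbatim}

Each cast yields a null pointer when the key does not support the operation, so
the same code handles every key type without knowing the concrete class.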

\subsubsection{Limitations}

As of now Nyberg-Rueppel and Rabin-Williams keys cannot be imported or
exported, because they have no official ASN.1 OID or definition. ElGamal keys
can (as of Botan 1.3.8) be imported and exported, but the only other
implementation which supports the format is Peter Gutmann's Cryptlib. If you
can help it, stick to RSA and DSA.

\emph{Note}: Currently NR and RW are given basic ASN.1 key formats (which
mirror DSA and RSA, respectively), which means that, if they are assigned an
OID, they can be imported and exported just as easily as RSA and DSA. You can
assign them an OID by putting a line in a Botan configuration file, calling
\function{OIDS::add\_oid}, or editing \filename{src/policy.cpp}. Be warned that
it is possible that a future version will use a format which is different from
the current one (\ie, a newly standardized format).

\pagebreak
\section{Certificate Handling}

A certificate is essentially a binding between some identifying information of
a person or other entity (called a \emph{subject}) and a public key. This
binding is asserted by a signature on the certificate, which is placed there by
some authority (the \emph{issuer}) which at least claims that it knows the
subject that is named in the certificate really ``owns'' the private key
corresponding to the public key in the certificate.

The major certificate format in use today is X.509v3, designed by ISO and
further hacked on by dozens (hundreds?) of other organizations.

When working with certificates, the main class to remember is
\type{X509\_Certificate}. You can read an object of this type, but you can't
create one on the fly; a CA object is necessary for actually making a new
certificate. So for the most part, you only have to worry about reading them
in, verifying the signatures, and getting the bits of data in them (most
commonly the public key, and the information about the user of that key). An
X.509v3 certificate can contain a literally infinite number of items related to
all kinds of things. Botan doesn't support a lot of them, simply because nobody
uses them and they're an impossible mess to work with. This section only
documents the most commonly used ones of the ones that are supported; for the
rest, read \filename{x509cert.h} and \filename{asn1\_obj.h} (which has the
definitions of various common ASN.1 constructs used in X.509).

\subsection{So what's in an X.509 certificate?}

Obviously, you want to be able to get the public key. This is achieved by
calling the member function \function{subject\_public\_key}, which will return
a \type{X509\_PublicKey*}. As to what to do with this, read about
\function{load\_key} in the section ``Importing and Exporting PK Keys''. In the
general case, this could be any kind of public key, though 99\% of the time it
will be an RSA key. However, Diffie-Hellman and DSA keys are also supported, so
be careful about how you treat this. It is also a wise idea to examine the
value returned by \function{constraints}, to see what uses the public key is
approved for.

The second major piece of information you'll want is the name/email/etc of the
person to whom this certificate is assigned. Here is where things get a little
nasty. X.509v3 has two (well, mostly just two $\ldots$) different places where
you can stick information about the user: the \emph{subject} field, and an
extension called \emph{subjectAlternativeName}. The \emph{subject} field is
supposed to include only the following information: country, organization
(possibly), an organizational sub-unit name (possibly), and a so-called common
name. The common name is usually the name of the person, or it could be a title
associated with a position of some sort in the organization. It may also
include fields for state/province and locality. What exactly a locality is,
nobody knows, but it's usually given as a city name.

Botan doesn't currently support any of the Unicode variants used in ASN.1
(UTF-8, UCS-2, and UCS-4), any of which could be used for the fields in the
DN. This could be problematic, particularly in Asia and other areas where
non-ASCII characters are needed for most names. The UTF-8 and UCS-2 string
types \emph{are} accepted (in fact, UTF-8 is used when encoding much of the
time), but if any of the characters included in the string are not in ISO
8859-1 (\ie, code points 0 \ldots 255), an exception will get thrown. Currently the
\type{ASN1\_String} type holds its data as ISO 8859-1 internally (regardless
of local character set); this would have to be changed to hold UCS-2 or UCS-4
in order to support Unicode (also, many interfaces in the X.509 code would have
to accept or return a \type{std::wstring} instead of a \type{std::string}).
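
The restriction can be pictured with the following sketch of a UTF-8 to ISO
8859-1 conversion. This is not Botan's code: \function{utf8\_to\_latin1} is a
hypothetical helper written for illustration, and it only demonstrates the rule
stated above, namely that code points 0 \ldots 255 are accepted and anything
else raises an exception.

\begin{verbatim}
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Botan): decode UTF-8 into ISO 8859-1.
// Code points up to U+00FF need at most two bytes in UTF-8, so any
// longer sequence is rejected.
std::string utf8_to_latin1(const std::string& utf8)
   {
   std::string out;
   for(std::string::size_type i = 0; i != utf8.size(); ++i)
      {
      const unsigned char c = static_cast<unsigned char>(utf8[i]);

      if(c < 0x80)
         out += static_cast<char>(c);   // plain ASCII
      else if(c == 0xC2 || c == 0xC3)
         {
         // Lead bytes 0xC2 and 0xC3 cover exactly U+0080 ... U+00FF
         if(i + 1 == utf8.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

         const unsigned char c2 = static_cast<unsigned char>(utf8[++i]);
         if((c2 & 0xC0) != 0x80)
            throw std::invalid_argument("malformed UTF-8 sequence");

         out += static_cast<char>(((c & 0x03) << 6) | (c2 & 0x3F));
         }
      else
         throw std::invalid_argument(
            "invalid UTF-8 or character outside ISO 8859-1");
      }
   return out;
   }
\end{verbatim}

Under this rule a name such as ``caf\'e'' converts cleanly, while a string
containing the euro sign (U+20AC) is rejected.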

Like the distinguished names, subject alternative names can contain a lot of
things that Botan will flat out ignore (most of which you would never actually
want to use). However, there are three very useful pieces of information which
this extension might hold: an email address (``person@site1.com''), a DNS name
(``somehost.site2.com''), or a URI (``http://www.site3.com'').

So, how to get the information? Simply call \function{subject\_info} with the
name of the piece of information you want, and it will return a
\type{std::string} which is either empty (signifying that the certificate
doesn't have this information), or has the information requested. There are
several names for each possible item, but the most easily readable ones are:
``Name'', ``Country'', ``Organization'', ``Organizational Unit'', ``Locality'',
``State'', ``RFC822'', ``URI'', and ``DNS''.

You can also get information about the issuer of the certificate in the same
way, using \function{issuer\_info}.
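
A minimal sketch of how these lookups might be combined follows. It assumes
only what is described above: a \function{subject\_info} member that takes the
name of an item and returns a \type{std::string}, empty when the item is
absent. The function \function{describe\_subject} and the type
\type{Fake\_Cert} are hypothetical names introduced for this example;
\type{Fake\_Cert} exists only so the sketch can be tried without Botan, and the
exact signature in \filename{x509cert.h} should be checked before relying on
it.

\begin{verbatim}
#include <string>

// Builds a one-line description from any certificate-like object that
// offers subject_info(name) returning a std::string (empty if absent)
template<typename Cert>
std::string describe_subject(const Cert& cert)
   {
   std::string line = cert.subject_info("Name");
   if(line.empty())
      line = "(no common name)";

   const std::string email = cert.subject_info("RFC822");
   if(!email.empty())
      line += " <" + email + ">";

   const std::string org = cert.subject_info("Organization");
   if(!org.empty())
      line += ", " + org;

   return line;
   }

// Hypothetical stand-in used only to exercise the sketch
struct Fake_Cert
   {
   std::string name, email, org;

   std::string subject_info(const std::string& what) const
      {
      if(what == "Name")         return name;
      if(what == "RFC822")       return email;
      if(what == "Organization") return org;
      return "";
      }
   };
\end{verbatim}

Because every missing item comes back as an empty string, the caller has to
test each result before using it; no exception signals the absence.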

\subsubsection{X.509v3 Extensions}

X.509v3 specifies a large number of possible extensions. Botan supports some,
but by no means all of them. This section lists which ones are supported, and
notes areas where there may be problems with the handling. You have to be
pretty familiar with X.509 in order to understand what this is talking about.

\begin{list}{$\cdot$}
  \item Key Usage and Extended Key Usage: No problems known.

  \item Basic Constraints: No problems known. The default for a v1/v2
        certificate is to assume it's a CA if and only if the option
        ``x509/default\_to\_ca'' is set. A v3 certificate is marked as a CA if
        (and only if) the basic constraints extension is present and set for a
        CA cert.

  \item Subject Alternative Names: Only the ``rfc822Name'', ``dNSName'', and
        ``uniformResourceIdentifier'' fields will be stored; all others are
        ignored.

  \item Issuer Alternative Names: Same restrictions as the Subject Alternative
        Names extension. New certificates generated by Botan never include the
        issuer alternative name.