Tuesday, January 15, 2008

Learning doxygen for source code documentation

Maintaining and adding new features to legacy systems developed using C/C++ is a daunting task. Fortunately, doxygen—a documentation system for the C/C++, Java™, Python, and other programming languages—can help. Discover the features of doxygen in the context of projects using C/C++ as well as how to document code using doxygen-defined tags.

The problem has several facets: understanding the existing class hierarchy and global variables, the different user-defined types, and the function call graph, to name a few. This article discusses several features of doxygen, with examples in the context of projects using C/C++. However, doxygen is flexible enough to be used for software projects developed in Python, Java, PHP, and other languages as well. The primary motivation of this article is to help extract information from C/C++ sources, but it also briefly describes how to document code using doxygen-defined tags.

Installing doxygen

You have two choices for acquiring doxygen. You can download it as a pre-compiled executable file, or you can check out sources from the SVN repository and build it. Listing 1 shows the latter process.


Listing 1. Install and build doxygen sources
 
                
bash-2.05$ svn co https://doxygen.svn.sourceforge.net/svnroot/doxygen/trunk doxygen-svn

bash-2.05$ cd doxygen-svn
bash-2.05$ ./configure --prefix=/home/user1/bin
bash-2.05$ make

bash-2.05$ make install

 

Note that the configure script is told to install the compiled binaries under /home/user1/bin (add the directory that contains the doxygen binary to the PATH variable after the build), because not every UNIX® user has permission to write to the /usr folder. You also need the svn utility to check out the sources.


 


 

Generating documentation using doxygen

To use doxygen to generate documentation of the sources, you perform three steps.

Generate the configuration file

At a shell prompt, type doxygen -g. This command generates a text-editable configuration file called Doxyfile in the current directory. You can choose to override this file name, in which case the invocation should be doxygen -g <user-specified file name>. Listing 2 shows the default invocation.


Listing 2. Generate the default configuration file
 
                
bash-2.05b$ doxygen -g
Configuration file 'Doxyfile' created.
Now edit the configuration file and enter
  doxygen Doxyfile
to generate the documentation for your project
bash-2.05b$ ls Doxyfile
Doxyfile

 

Edit the configuration file

The configuration file is structured as <TAGNAME> = <VALUE>, similar to the Makefile format. Here are the most important tags:

  • <OUTPUT_DIRECTORY>: You must provide a directory name here—for example, /home/user1/documentation—for the directory in which the generated documentation files will reside. If you provide a nonexistent directory name, doxygen creates the directory subject to proper user permissions.
  • <INPUT>: This tag takes a space-separated list of all the directories that contain the C/C++ source and header files to be documented. For example, consider the following snippet:
    INPUT = /home/user1/project/kernel /home/user1/project/memory
    

     

    In this case, doxygen reads the C/C++ sources from these two directories. If your project has a single source root directory with multiple sub-directories, specify that folder and set the <RECURSIVE> tag to Yes.

  • <FILE_PATTERNS>: By default, doxygen searches for files with typical C/C++ extensions such as .c, .cc, .cpp, .h, and .hpp. This happens when the <FILE_PATTERNS> tag has no value associated with it. If the sources use different naming conventions, update this tag accordingly. For example, if a project convention is to use .c86 as a C file extension, add this to the <FILE_PATTERNS> tag.
  • <RECURSIVE>: Set this tag to Yes if the source hierarchy is nested and you need to generate documentation for C/C++ files at all hierarchy levels. For example, consider the root-level source hierarchy /home/user1/project/kernel, which has multiple sub-directories such as /home/user1/project/kernel/vmm and /home/user1/project/kernel/asm. If this tag is set to Yes, doxygen recursively traverses the hierarchy, extracting information.
  • <EXTRACT_ALL>: This tag is an indicator to doxygen to extract documentation even when the individual classes or functions are undocumented. You must set this tag to Yes.
  • <EXTRACT_PRIVATE>: Set this tag to Yes. Otherwise, private data members of a class would not be included in the documentation.
  • <EXTRACT_STATIC>: Set this tag to Yes. Otherwise, static members of a file (both functions and variables) would not be included in the documentation.
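
To make the last three tags concrete, here is a minimal sketch of an undocumented source file. The class and function names are hypothetical and chosen only for illustration; the comments mark which tag each declaration depends on to appear in the generated documentation.

```cpp
// Hypothetical file kernel/scheduler.cpp. Nothing here carries a doxygen
// comment, so every entity depends on EXTRACT_ALL = YES to be listed.

class Scheduler {
public:
    // Public member: listed once EXTRACT_ALL = YES.
    int next() { return ++ticks_; }

private:
    // Private member: additionally needs EXTRACT_PRIVATE = YES.
    int ticks_ = 0;
};

// File-static function: additionally needs EXTRACT_STATIC = YES.
static int clamp_tick(int t) { return t < 0 ? 0 : t; }
```

With all three tags left at No, doxygen would have nothing to report for a file like this one.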

Listing 3 shows an example of a Doxyfile.


Listing 3. Sample doxyfile with user-provided tag values
 
                
OUTPUT_DIRECTORY = /home/user1/docs
EXTRACT_ALL = yes
EXTRACT_PRIVATE = yes
EXTRACT_STATIC = yes
INPUT = /home/user1/project/kernel
#Do not add anything here unless you need to. Doxygen already covers all 
#common formats like .c/.cc/.cxx/.c++/.cpp/.inl/.h/.hpp
FILE_PATTERNS = 
RECURSIVE = yes

 

Run doxygen

Run doxygen at the shell prompt as doxygen Doxyfile (or with whatever file name you've chosen for the configuration file). Doxygen issues several messages before it finally produces the documentation in Hypertext Markup Language (HTML) and LaTeX formats (the default). In the folder that the <OUTPUT_DIRECTORY> tag specifies, two sub-folders named html and latex are created as part of the documentation-generation process. Listing 4 shows a sample doxygen run log.


Listing 4. Sample log output from doxygen
 
                
Searching for include files...
Searching for example files...
Searching for images...
Searching for dot files...
Searching for files to exclude
Reading input files...
Reading and parsing tag files
Preprocessing /home/user1/project/kernel/kernel.h
…
Read 12489207 bytes
Parsing input...
Parsing file /project/user1/project/kernel/epico.cxx
…
Freeing input...
Building group list...
..
Generating docs for compound MemoryManager::ProcessSpec
…
Generating docs for namespace std
Generating group index...
Generating example index...
Generating file member index...
Generating namespace member index...
Generating page index...
Generating graph info page...
Generating search index...
Generating style sheet...


 


 

Documentation output formats

Doxygen can generate documentation in several output formats other than HTML. You can configure doxygen to produce documentation in the following formats:

  • UNIX man pages: Set the <GENERATE_MAN> tag to Yes. By default, a sub-folder named man is created within the directory provided using <OUTPUT_DIRECTORY>, and the documentation is generated inside the folder. You must add this folder to the MANPATH environment variable.
  • Rich Text Format (RTF): Set the <GENERATE_RTF> tag to Yes. Set the <RTF_OUTPUT> tag to the location where you want the .rtf files to be generated; by default, the documentation goes into a sub-folder named rtf within the OUTPUT_DIRECTORY. For browsing across documents, set the <RTF_HYPERLINKS> tag to Yes. If set, the generated .rtf files contain links for cross-browsing.
  • LaTeX: By default, doxygen generates documentation in LaTeX and HTML formats. The <GENERATE_LATEX> tag is set to Yes in the default Doxyfile. Also, the <LATEX_OUTPUT> tag is set to latex, which means that a folder named latex is generated inside OUTPUT_DIRECTORY, where the LaTeX files reside.
  • Microsoft® Compiled HTML Help (CHM) format: Set the <GENERATE_HTMLHELP> tag to Yes. Because this format is not supported on UNIX platforms, doxygen would only generate a file named index.hhp in the same folder in which it keeps the HTML files. You must feed this file to the HTML help compiler for actual generation of the .chm file.
  • Extensible Markup Language (XML) format: Set the <GENERATE_XML> tag to Yes. (Note that the XML output is still a work in progress for the doxygen team.)

Listing 5 provides an example of a Doxyfile that generates documentation in all the formats discussed.


Listing 5. Doxyfile with tags for generating documentation in several formats
 
                
#for HTML 
GENERATE_HTML = YES
HTML_FILE_EXTENSION = .htm

#for CHM files
GENERATE_HTMLHELP = YES

#for Latex output
GENERATE_LATEX = YES
LATEX_OUTPUT = latex

#for RTF
GENERATE_RTF = YES
RTF_OUTPUT = rtf 
RTF_HYPERLINKS = YES

#for MAN pages
GENERATE_MAN = YES
MAN_OUTPUT = man
#for XML
GENERATE_XML = YES


 


 

Special tags in doxygen

Doxygen contains a couple of special tags.

Preprocessing C/C++ code

First, doxygen must preprocess C/C++ code to extract information. By default, however, it does only partial preprocessing -- conditional compilation statements (#if…#endif) are evaluated, but macro expansions are not performed. Consider the code in Listing 6.


Listing 6. Sample C++ code that makes use of macros
 
                
#include <string>
#include <rope>

#define USE_ROPE

#ifdef USE_ROPE
  #define STRING std::rope
#else
  #define STRING std::string
#endif

static STRING name;

 

With the USE_ROPE macro defined in the sources, the documentation generated by doxygen looks like this:

Defines
    #define USE_ROPE
    #define STRING std::rope

Variables
    static STRING name

 

Here, you see that doxygen has performed a conditional compilation but has not done a macro expansion of STRING. The <ENABLE_PREPROCESSING> tag in the Doxyfile is set by default to Yes. To allow for macro expansions, also set the <MACRO_EXPANSION> tag to Yes. Doing so produces this output from doxygen:

Defines
    #define USE_ROPE
    #define STRING std::rope

Variables
    static std::rope name

 

If you set the <ENABLE_PREPROCESSING> tag to No, the output from doxygen for the earlier sources looks like this:

Variables
    static STRING name

 

Note that the documentation now has no definitions, and it is not possible to deduce the type of STRING. It thus makes sense always to set the <ENABLE_PREPROCESSING> tag to Yes.
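
Listing 6 relies on std::rope, a non-standard SGI extension that not every standard library provides. To try the preprocessing settings with any compiler, a portable variant might look like the sketch below. It is an illustration, not part of the original listing: the USE_WIDE switch and the choice of std::wstring are assumptions that stand in for USE_ROPE and std::rope.

```cpp
#include <string>
#include <type_traits>

#define USE_WIDE  // hypothetical switch, standing in for USE_ROPE

#ifdef USE_WIDE
  #define STRING std::wstring
#else
  #define STRING std::string
#endif

static STRING name;

// The compiler always expands STRING, so this check passes. Doxygen shows
// the expanded type only when MACRO_EXPANSION is set to YES.
static_assert(std::is_same<decltype(name), std::wstring>::value,
              "STRING expands to std::wstring when USE_WIDE is defined");
```

The static_assert documents what the compiler sees; what doxygen prints for the variable depends on the tags discussed above.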

As part of the documentation, it might be desirable to expand only specific macros. For such purposes, along with setting <ENABLE_PREPROCESSING> and <MACRO_EXPANSION> to Yes, you must set the <EXPAND_ONLY_PREDEF> tag to Yes (this tag is set to No by default) and provide the macro details as part of the <PREDEFINED> or <EXPAND_AS_DEFINED> tag. Consider the code in Listing 7, where only the macro CONTAINER would be expanded.


Listing 7. C++ source with multiple macros
 
                
#ifdef USE_ROPE
  #define STRING std::rope
#else
  #define STRING std::string
#endif

#if ALLOW_RANDOM_ACCESS == 1
  #define CONTAINER std::vector
#else
  #define CONTAINER std::list
#endif

static STRING name;
static CONTAINER gList;

 

Listing 8 shows the configuration file.


Listing 8. Doxyfile set to allow select macro expansions
 
                
ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
EXPAND_AS_DEFINED = CONTAINER
…

 

Here's the doxygen output with only CONTAINER expanded:

Defines
    #define STRING   std::string
    #define CONTAINER   std::list

Variables
    static STRING name
    static std::list gList

 

Notice that only the CONTAINER macro has been expanded. Provided that <MACRO_EXPANSION> and <EXPAND_ONLY_PREDEF> are both set to Yes, the <EXPAND_AS_DEFINED> tag selectively expands only those macros listed on the right-hand side of the equals sign.

As part of preprocessing, the final tag to note is <PREDEFINED>. In much the same way that you use the -D switch to pass preprocessor definitions to the G++ compiler, you use this tag to define macros. Consider the Doxyfile in Listing 9.


Listing 9. Doxyfile with macro expansion tags defined
 
                
ENABLE_PREPROCESSING = YES
MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = YES
EXPAND_AS_DEFINED = 
PREDEFINED = USE_ROPE= \
                             ALLOW_RANDOM_ACCESS=1

 

Here's the doxygen-generated output:

Defines
    #define USE_ROPE
    #define STRING   std::rope
    #define CONTAINER   std::vector

Variables
    static std::rope name
    static std::vector gList

 

When used with the <PREDEFINED> tag, macros should be defined as <macro name>=<value>. If no value is provided, as in the case of a simple #define, just using <macro name>= suffices. Separate multiple macro definitions with spaces; a backslash (\) at the end of a line continues the list on the next line, as in Listing 9.
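
The compiler-side counterpart of <PREDEFINED> can be sketched as follows. This is a compilable variant of the CONTAINER part of Listing 7, under two assumptions that are not in the original: the containers hold int (a bare template name cannot declare a variable), and a helper named container_kind is added purely to make the selected branch observable. Building it with g++ -DALLOW_RANDOM_ACCESS=1 selects the vector branch, just as ALLOW_RANDOM_ACCESS=1 in <PREDEFINED> does for doxygen.

```cpp
#include <list>
#include <string>
#include <vector>

// Default used when nothing is passed with -D. An undefined macro already
// evaluates to 0 inside #if; the explicit default just makes that visible.
#ifndef ALLOW_RANDOM_ACCESS
  #define ALLOW_RANDOM_ACCESS 0
#endif

#if ALLOW_RANDOM_ACCESS == 1
  #define CONTAINER std::vector<int>
#else
  #define CONTAINER std::list<int>
#endif

static CONTAINER gList;

// Hypothetical helper: reports which branch the preprocessor selected.
inline std::string container_kind() {
#if ALLOW_RANDOM_ACCESS == 1
    return "std::vector";
#else
    return "std::list";
#endif
}
```

Compiled without any -D switch, the sketch falls back to the list branch, which mirrors running doxygen with an empty <PREDEFINED> tag.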

Excluding specific files or directories from the documentation process

In the <EXCLUDE> tag in the Doxyfile, add a space-separated list of the files and directories for which documentation should not be generated. This comes in handy when the root of the source hierarchy is provided and some sub-directories must be skipped. For example, if the root of the hierarchy is src_root and you want to skip the examples/ and test/memoryleaks folders, the Doxyfile should look like Listing 10.


Listing 10. Using the EXCLUDE tag as part of the Doxyfile
 
                
INPUT = /home/user1/src_root
EXCLUDE = /home/user1/src_root/examples /home/user1/src_root/test/memoryleaks
…


 


 

Generating graphs and diagrams

By default, the Doxyfile has the <CLASS_DIAGRAMS> tag set to Yes. This tag controls the generation of class hierarchy diagrams. For better-looking diagrams, download the dot tool from the Graphviz download site. The following tags in the Doxyfile deal with generating diagrams:

  • <CLASS_DIAGRAMS>: This tag is set to Yes by default in the Doxyfile. If the tag is set to No, diagrams for the inheritance hierarchy are not generated.
  • <HAVE_DOT>: If this tag is set to Yes, doxygen uses the dot tool to generate more powerful graphs, such as collaboration diagrams that help you understand individual class members and their data structures. Note that if this tag is set to Yes, the effect of the <CLASS_DIAGRAMS> tag is nullified.
  • <CLASS_GRAPH>: If the <HAVE_DOT> tag is set to Yes along with this tag, the inheritance hierarchy diagrams are generated using the dot tool and have a richer look and feel than what you'd get by using only <CLASS_DIAGRAMS>.
  • <COLLABORATION_GRAPH>: If the <HAVE_DOT> tag is set to Yes along with this tag, doxygen generates a collaboration diagram (apart from an inheritance diagram) that shows the individual class members (that is, containment) and their inheritance hierarchy.

Listing 11 provides an example using a few data structures. Note that the <HAVE_DOT>, <CLASS_GRAPH>, and <COLLABORATION_GRAPH> tags are all set to Yes in the configuration file.


Listing 11. Interacting C++ classes and structures
 
                
struct D {
  int d;
};

class A {
  int a;
};

class B : public A {
  int b;
};

class C : public B {
  int c;
  D d;
};

 

Figure 1 shows the output from doxygen.


Figure 1. The class inheritance graph and collaboration graph generated using the dot tool
[Image not reproduced: class inheritance graph]

 


 

Code documentation style

So far, you've used doxygen to extract information from code that is otherwise undocumented. However, doxygen also defines a documentation style and syntax that help it generate more detailed documentation. This section discusses some of the more common tags that doxygen recommends for C/C++ code. For further details, see the doxygen manual.

Every code item has two kinds of descriptions: one brief and one detailed. Brief descriptions are typically single lines. Functions and class methods have a third kind of description known as the in-body description, which is a concatenation of all comment blocks found within the function body. Some of the more common doxygen tags and styles of commenting are:

  • Brief description: Use a single-line C++ comment, or use the <\brief> tag.
  • Detailed description: Use JavaDoc-style commenting /** … text … */ (note the two asterisks [*] in the beginning) or the Qt-style /*! … text … */.
  • In-body description: Place comment blocks inside the function body. Apart from these descriptions, individual C++ elements like classes, structures, unions, and namespaces have their own tags, such as <\class>, <\struct>, <\union>, and <\namespace>.
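
The three kinds of description can be seen side by side in a small sketch. HitCounter is a hypothetical class invented for this illustration; the plain // comments are notes for the reader and are ignored by doxygen.

```cpp
#include <cstddef>

// Brief description as a single-line C++ comment, followed by a
// JavaDoc-style block that holds the detailed description.
/// Counts events.
/**
 * HitCounter keeps a running total of recorded events. It exists only
 * to show where each kind of doxygen comment goes.
 */
class HitCounter {
public:
    // Qt-style block: the brief tag first, then the detailed text.
    /*! \brief Records one event.
     *
     *  Increments the running total by one.
     */
    void record() { ++hits_; }

    /// Returns the number of events recorded so far.
    std::size_t hits() const {
        /** In-body description: doxygen appends comment blocks found
         *  inside the body to the detailed description of hits(). */
        return hits_;
    }

private:
    std::size_t hits_ = 0;  ///< Running total.
};
```

Because the class comment sits directly in front of the class, no <\class> tag is needed here; that tag matters when the comment block is placed somewhere else.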

To document global functions, variables, and enum types, the corresponding file must first be documented using the <\file> tag. Listing 12 provides an example of such a file, with a function tag (<\fn>), a function argument tag (<\param>), a variable name tag (<\var>), a tag for #define (<\def>), and a tag to indicate some specific issues related to a code snippet (<\warning>).


Listing 12. Typical doxygen tags and their use
 
                
/*! \file globaldecls.h
      \brief Place to look for global variables, enums, functions
           and macro definitions
  */

#include <stdio.h>  /* declares FILE, used by check_for_io_errors */

/** \var const int fileSize
      \brief Default size of the file on disk
  */
const int fileSize = 1048576;

/** \def SHIFT(value, length)
      \brief Left shift value by length in bits
  */
#define SHIFT(value, length) ((value) << (length))

/** \fn bool check_for_io_errors(FILE* fp)
      \brief Checks if a file is corrupted or not
      \param fp Pointer to an already opened file
      \warning Not thread safe!
  */
bool check_for_io_errors(FILE* fp);

 

Here's how the generated documentation looks:

Defines
    #define SHIFT(value, length)   ((value) << (length))
        Left shift value by length in bits.

Functions
    bool check_for_io_errors (FILE *fp)
        Checks if a file is corrupted or not.

Variables
    const int fileSize = 1048576;

Function Documentation
    bool check_for_io_errors (FILE* fp)
        Checks if a file is corrupted or not.

        Parameters
            fp: Pointer to an already opened file

        Warning
            Not thread safe!
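
Listing 12 only declares check_for_io_errors. One possible definition is sketched below, mainly to show the <\return> tag alongside the tags already used. The body is an assumption, not the original implementation: it treats a null pointer or a set stream error indicator as "corrupted", which may be narrower or broader than what the original author intended.

```cpp
#include <cstdio>

/** \brief Checks if a file is corrupted or not
 *  \param fp Pointer to an already opened file
 *  \return true if fp is null or its stream error indicator is set
 *  \warning Not thread safe!
 */
bool check_for_io_errors(std::FILE* fp) {
    if (fp == nullptr) {
        return true;  // assumption: a missing stream counts as an error
    }
    return std::ferror(fp) != 0;  // non-zero once a read or write has failed
}
```

Because the comment block sits directly in front of the definition, the <\fn> tag from Listing 12 is not needed here.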


 


 

Conclusion

This article has shown how doxygen can extract a lot of relevant information from legacy C/C++ code. If the code is documented using doxygen tags, doxygen generates output in an easy-to-read format. Put to good use, doxygen deserves a place in any developer's toolkit for maintaining and managing legacy systems.
