% dpp version 0.2.0
% Dan Schmidt
%
% This code is copyright (C) 1997 Dan Schmidt.
% Redistribute it at will.
% You are free to distribute modified versions, provided you say
% how your version differs from this one.
% Please do not distribute dpp.pl without dpp.nw.
% If you add features or fix bugs, I would appreciate hearing
% about it so I can merge your changes.
%
% To make, just "noweb dpp.nw".
% Revision history:
%   0.1.0  7 Mar 1997
%     - first release
%   0.1.1  16 Mar 1997
%     - Identifiers can have digits in them.
%     - &escape works much better; comments should be typeset
%       correctly now, with -tex or not.  But the code for &escape
%       is _really_ gross.
%     - Strings on a #define line don't mess up the rest of the line
%       (but now comments on those lines are typeset in tt)
%     - The asterisks in C comments look nicer.
%     - The -cw option compresses all whitespace not at the beginning
%       of a line, and expands whitespace between code and comments to
%       four characters.  Basically, it undoes hand-formatting that
%       looks good monospaced but bad variably-spaced.
%   0.2.0  26 May 1997
%     - -N option specifies that a given root chunk and all of its
%       descendants are not to be prettyprinted.
%     - Didn't work on Unix if input came from a pipe.

\documentclass{article}
\usepackage{noweb}
\noweboptions{longxref,smallcode}

% Don't waste space
\addtolength{\topmargin}{-1.1in}
\setlength{\textheight}{9.4in}
\setlength{\textwidth}{6.9in}
\setlength{\oddsidemargin}{-.2in}
\setlength{\evensidemargin}{-.2in}
\pagestyle{noweb}

% I like roman chunk names
\def\LA{\begingroup\maybehbox\bgroup\setupmodname\Rm$\langle$}
\def\RA{$\rangle$\egroup\endgroup}

\begin{document}
@
\section{Introduction}

This is {\tt dpp}, a C/C++ pretty-printer for the literate programming
tool {\tt noweb}. {\tt noweb} does not prettyprint code by default; by
inserting {\tt dpp} into the {\tt noweb} pipeline, you can produce output
similar to that of {\tt CWEB}.
Here are some of the features of {\tt dpp}:
\begin{itemize}
\item Keywords are typeset in {\bf bold}, variables in {\it italics},
  comments in roman, strings in {\tt typewriter}.
\item Some punctuation is prettyprinted as well.
\item Keywords are definable by the user with the `{\tt @ \%keyword}'
  construct.
\item All instances of code quoting ({\tt [{}[ ]{}]}) are prettyprinted,
  including quotes inside comments and chunk names.
\item Comments are optionally fed straight to \TeX, so that you can use
  (for example) \TeX's extensive math typesetting capabilities.
\item Indentation and line-breaking are not messed with.
\item {\tt dpp} is written in Perl, which you may or may not consider an
  advantage.
\end{itemize}

To use {\tt dpp}, just provide the `{\tt -filter dpp}' option to
{\tt noweave}. There are a few options:
\begin{itemize}
\item `{\tt -tex}' will send your comments to \TeX\ rather than print them
  verbatim. It is off by default to avoid nasty surprises for people
  running their programs through {\tt dpp} for the first time.
\item `{\tt -cw}' compresses whitespace in the code. It compresses all
  whitespace not at the beginning of a line to one space, and expands
  whitespace between code and comments to four characters. Basically, it
  undoes hand-formatting that looks good when monospaced but bad in a
  proportional font.
\item `{\tt -N}{\it root\/}' turns off prettyprinting for the chunk named
  {\it root\/} and all of its descendants. This way you can include a
  makefile or test data in your {\tt .nw} file without having it printed
  like C code.
\end{itemize}

To tell {\tt dpp} about a user-defined keyword such as {\tt Elephant},
simply insert a line such as `{\tt @ \%keyword Elephant}' into the
{\tt .nw} file. Multiple keywords may be specified on a single line.

{\tt dpp} was written by Dan Schmidt ({\tt dfan@alum.mit.edu}). The
current version is 0.2.0.

To do:
\begin{itemize}
\item Put all \TeX\ stuff in {\tt dppmac.tex} or something.
\item Make sure I am interacting with \LaTeX\ $2_\varepsilon$ correctly. \item Finish this section of the document! Provide an example. \item Comments on lines with preprocessor directives. \item Multiline /* */ comments. \item Be able to typeset specific words in {\tt typewriter}. \item Identifier crossreferencing (and index) should be typeset like code (may require changes to {\tt totex}). \item {\tt finduses} seems pretty slow, and I'm doing a lot of the same work. Can I fold that functionality in? \item Single-quoted character constants use the wrong quote mark. But so does CWEB\@. If it's good enough for Knuth$\ldots$ \item Italic correction when necessary. \item Blank lines should not be a whole line tall. \item HTML version? \item Profile. \end{itemize} To not do: \begin{itemize} \item Can't nest {\tt [{}[ ]{}]}. \end{itemize} \section{Code} {\tt dpp} is written in Perl 5. To understand this code, it's probably best to first skim through the {\sl Noweb Hacker's Guide}. Note: I am a Perl hacker, not a \TeX\ hacker. Therefore, there are many places in this code where it would probably make sense to accomplish tasks through \TeX\ code, and instead I do lots of things ``by hand.'' I appreciate any suggestions for cleanup. <>= #!/usr/bin/perl # dpp version 0.2.0 # Copyright (C) 1997 Dan Schmidt, , # see dpp.nw for full notice # Don't modify this file, modify dpp.nw! <> <> <> <> <> @ Individual options will be introduced during the course of the program. Here's the shell of the option-processing section. <>= while ($option = shift) { <> } @ %def option \subsection{Setting up STDIN} We make two passes through [[STDIN]], so we have to make sure we'll be able to seek back to the beginning of it. <>= <> @ <>= $TMPFILE = "dpp.tmp"; if (! 
-f STDIN) {
  open TMPFILE, "> $TMPFILE"
    or die "Couldn't open `$TMPFILE' for writing: $!; aborting";
  print TMPFILE while <STDIN>;
  close TMPFILE;
  open STDIN, "< $TMPFILE"
    or die "Couldn't open `$TMPFILE' for reading: $!; aborting";
  unlink $TMPFILE;
}
@
\subsection{Keywords}

Let's make our list of keywords first, so we can get it out of the way,
and be done with the first pass. For quick lookup, we have a hash
[[%is_keyword]], so that [[$is_keyword{$word}]] is 1 if [[$word]] is a
keyword and undefined otherwise.

To make things easier, we build up an array [[@keywords]] of keywords,
and insert them into [[%is_keyword]] all at once.
<>=
<>
@
<>=
@keywords = qw( TRUE FALSE
                bool break case catch char class const continue default
                delete do double else enum exit extern float for friend
                goto if inline int long namespace new NULL operator private
                protected public register return short signed sizeof static
                struct switch template throw try typedef union unsigned
                vector virtual void while );
@ %def keywords
<>=
<>
<>
<>
<>
@ Keyword lines are of the form {\tt \%keyword kw1 kw2 ...} in the
original input file, but look like {\tt @text \%keyword kw1 kw2 ...} by
the time they get to us.

Note that {\tt dpp} makes no use of the Perl constructs {\tt \$`},
{\tt \$\&}, and {\tt \$'}. Jeffrey Friedl's book {\sl Mastering Regular
Expressions\/} notes that any use of these three variables slows down all
regular expressions throughout the program.
<>=
while (<>) {
  if (/^\@text %keyword (.*)/) {
    push @keywords, (split " ", $1);
    next;
  }
  <>
}
@
<>=
foreach $word (@keywords) { $is_keyword{$word} = 1; }
@ %def is_keyword
<>=
seek (STDIN, 0, 0);
@ We already know how to prettyprint a word now; if it's in the keyword
list it's bold, otherwise it's in italics.
Originally, this was most of the program, but things have gotten a bit
more complicated since then$\ldots$
<>=
sub pp_word {
  my ($this_word) = shift;
  if (defined $is_keyword{$this_word}) {
    return "{\\bf{}$this_word}";
  } else {
    return "{\\it{}$this_word}";
  }
}
@ %def pp_word

\subsection{Discovering roots}

For each chunk, we need to know its root chunk, the farthest ancestor of
this chunk in the forest of chunks. When we eventually print this chunk
during the second pass, we will not prettyprint it if its root has been
identified as not being C or C++.

While we read over the input, if we see that chunk [[$a]] invokes chunk
[[$b]], we set [[$ancestor{$b} = $a]]. A root chunk will have no
ancestor, since it is never invoked.

Of course, the chunks actually define a directed acyclic graph, not a
forest, since a chunk may be invoked from more than one place. But
pretending it has only one parent will not affect what language its root
chunk is in (I sincerely hope).
<>=
if (/^\@defn (.*)/) {
  $cur_chunk = $1;
  push @chunks, $cur_chunk;
  next;
}
$ancestor{$1} = $cur_chunk, next if (/^\@use (.*)/);
@ %def ancestor chunks
Now we go through all the chunks, finding each one's farthest ancestor.
By the end of this process, [[$ancestor{$chunk}]] will be the root chunk
of each chunk [[$chunk]].

Each chunk is marked as `settled' when its root is determined. We settle
all chunks on the path between the current chunk and the root chunk,
remembering them by putting them in [[@chunks_to_settle]]. This saves
time when processing future chunks, since as soon as we get to a settled
chunk while ascending the tree, we know that its root is our root.
<>=
foreach $chunk (@chunks) {
  next if $settled{$chunk};
  <>
  <>
}
@ %def chunk settled
We ascend the tree until we get to a settled chunk, or one that has no
ancestor, which must be a root. Along the way, we add to our list of
chunks to settle.
<>=
$c = $chunk;
@chunks_to_settle = ();
while ((!
$settled{$c}) && ($ancestor{$c})) { push @chunks_to_settle, $c; $c = $ancestor{$c}; } <> @ %def chunks_to_settle We may have left the above loop for two different reasons. If [[$c]] has an ancestor, then [[$ancestor{$c}]] is its root (since it's settled), and ours as well. Otherwise, [[$c]] is a root that we're seeing for the first time, and we settle it `by hand.' <>= if ($ancestor{$c}) { $root = $ancestor{$c}; } else { $root = $c; $ancestor{$root} = $root; $settled{$root} = 1; } @ %def root Now that [[$root]] is set, the last step is trivial. <>= foreach $c (@chunks_to_settle) { $ancestor{$c} = $root; $settled{$c} = 1; } @ \subsection{The outer loop} Our main job is to go through the input file, prettyprinting whatever lines occur in code or quoted code. The former lines are bracketed by {\tt @begin code} and {\tt @end code}, the latter by {\tt @quote} and {\tt @endquote}. We use a boolean variable [[$incode]] to keep track of which state we're in. <>= $incode = 0; @ %def incode <>= while (<>) { if ($incode) { <> } else { <> } } @ \subsubsection{Documentation lines} We'll handle documentation lines first, because they're easier. There are a few special cases that we have to handle, but basically we just do a little processing (usually unnecessary), output the line, and see if we have to go into ``code mode.'' <>= <> <> <> print $_; <> @ When entering code mode, we set the font so that roman, not italics will be the default font of the code. We don't want to go into code mode at all if this chunk is indirectly invoked by a root chunk that has been explicitly requested to be printed plain. <>= if (/^\@quote/ || (/^\@defn (.*)/ && (! $plain{$ancestor{$1}}))) { $incode = 1; print "\@literal \\Rm{}\n"; } @ Setting up the list of ``plainprinted'' chunks is easy. <>= if ($option =~ /^-N(.*)$/) { $plain{$1} = 1; } @ %def plain The {\tt \%keyword} lines are in the source just for our benefit, and we want to strip them out so later filters don't have to deal with them. 
We have to be tricky in order to keep line numbers consistent, so we use
the {\tt @index nl} trick mentioned in the {\sl Noweb Hacker's Guide}. We
replace the {\tt \%keyword} line and the newline {\tt @nl} that comes
after it by a dummy newline.
<>=
if (/^\@text %keyword/) {
  $nextline = (<>);
  die "%keyword confusion\n" if (! ($nextline =~ /^\@nl$/));
  print "\@index nl\n";
  next;
}
@ We may need to prettyprint something even if we're in documentation
mode. The two cases are 1) the name of a chunk we're about to define
contains a code quote, and 2) the index mentions a chunk whose name
contains a code quote.

The function [[pp_line]] is used to prettyprint a line of code. All this
line does is grab everything before {\tt [{}[} and after {\tt ]{}]}, and
then insert the contents of the quote between them. We execute the
substitution multiple times in order to catch multiple quotes on the same
line.

The {\tt (?!\char92])} trickiness is to deal with quoting code that ends
with a right bracket; we make sure to catch the outermost bracket pair.
<>=
1 while s/((?:\@xref chunkbegin|\@defn) .*?)\[\[(.*?)\]\](?!\])(.*)/$1 . pp_line($2) . $3/e;
@
\paragraph {The \LaTeX\ header.}
There's just one bit left to deal with regarding documentation lines. We
need to print out some special-purpose \LaTeX\ code at some point,
traditionally the end of the first documentation chunk. The variable
[[$delay]] is 1 if we are waiting to print out said code, which is stored
in [[$texdefs]].
<>=
$delay = 1;
my ($texdefs);
<>
@ %def texdefs delay
<>=
if ($delay && /^\@end docs/) {
  print $texdefs;
  $delay = 0;
}
@ Here's an example of something to go in [[$texdefs]] (there will be
more down the line). Given that quoted code uses italics for variable
names, it makes much more sense for chunk names to be in roman, as in
{\tt CWEB}. So I override {\tt noweb.sty} here.
These definitions are exactly the same as {\tt noweb.sty}'s \verb|\LA|
and \verb|\RA| except for \verb|\Rm| instead of \verb|\It|.
<>=
$texdefs .=
  "\@literal \\def\\LA{\\begingroup\\maybehbox\\bgroup\\setupmodname\\Rm\$\\langle\$}\n" .
  "\@literal \\def\\RA{\$\\rangle\$\\egroup\\endgroup}\n";
@
\subsubsection{Code lines}

The outer loop for prettyprinting the lines of code is theoretically
simple, but it's complicated by the fact that previous filters may have
broken up lines in inconvenient places such as the middle of a comment.

To compensate, as we parse between {\tt @nl}'s we accumulate two kinds of
text. Actual code is accumulated in [[$text]], and other markup lines are
accumulated in [[$extra]]. When we finally hit a {\tt @nl}, we
prettyprint [[$text]] and print it out, followed by the [[$extra]] text
we've saved up. I don't think this will mess up any crossreferencing.
<>=
if (/\@text (.*)/) {
  $text .= $1;
} elsif (/\@nl/ || /\@end/) {
  <>
  $incode = 0 if (/\@end/);
} else {
  <>
  $extra .= $_;
}
@
<>=
if (length $text) { print "\@literal " . pp_line ($text) . "\n"; }
if (length $extra) { print "$extra"; }
print $_;
$text = $extra = "";
@
<>=
$text = $extra = "";
@ %def text extra
All the actual code is in {\tt @text} lines, but there are three other
cases in which we have to prettyprint part of the line. We may be
referencing another chunk, defining a variable, or defining a chunk.
<>=
1 while s/((?:\@use|\@defn) .*?)\[\[(.*?)\]\](?!\])(.*)/$1 . pp_line($2) . $3/e;
s/(\@index defn )(\S+)/$1 . pp_line($2)/e;
@
\subsection{Prettyprinting lines}

Now we've reduced the problem to prettyprinting an individual line of
code. Luckily, we don't have to keep track of context much, since for the
most part, we're going to do the same thing with a string of letters (for
example) no matter where we see it.

We process the line from left to right, looking for tokens to beautify
and appending them to [[$preline]].
As we go through this loop, [[$line]] contains the part of the line that we have yet to process, while [[$preline]] contains the already-processed beginning of the line. In some cases, we need to save off post-processed stuff to put at the end of the line, which is stored in [[$postline]]. Thus, at any point, [[$preline . $line . $postline]] will reconstruct all of the original line (although the middle part will not be prettyprinted yet). When [[$line]] is empty, we're done. [[$seen_token]] is 0 before we do anything, and 1 after we've processed at least one token of the line. We use it if we're compressing whitespace. There are three major kinds of constructs that really do not work and play well with others: strings, comments, and preprocessor directives. We grab these early before any of the ``regular'' prettyprinting code can see them. <>= sub pp_line { my ($line) = shift; my ($preline, $postline, $token, $seen_token); <> <> while (length $line) { <> } continue { $seen_token = 1; } return $preline . $postline; } @ %def pp_line line preline postline token seen_token There's a restriction on the ordering of these chunks: if the beginning of the line fulfills the patterns for two different kinds of tokens, we want to match the longer one, so its corresponding chunk must come first. For example, we want to typeset [[0xff]] like a number, so we must let the hex-checker grab the whole thing before the number-checker takes just the leading 0, leaving us with the ``word'' [[xff]], which would be set in italics. <>= <> <> <> <> <> @ \subsubsection{The outer loop} The following chunks are presented out of order, so that we can start with the simpler ones. For these cases, we strip off the matching part, do nothing to it (so it will be typeset in roman) and add it to [[$preline]]. 
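
As a minimal sketch of why this ordering matters, the following
stand-alone fragment tries the hex pattern before the plain number
pattern. The subroutine {\tt first\_token} and its return labels are
hypothetical and are not part of {\tt dpp}; the fragment only classifies
the first token of a line rather than prettyprinting it.

\begin{verbatim}
use strict;
use warnings;

# Hypothetical helper, not part of dpp: classify the token at the
# start of $line.  The tests are ordered so that a longer match is
# always tried before a shorter one.
sub first_token {
    my ($line) = @_;
    return ("hex",    $1) if $line =~ /^(0[xX][0-9a-fA-F]+)/;
    return ("number", $1) if $line =~ /^(\d+)/;
    return ("word",   $1) if $line =~ /^([a-zA-Z_][a-zA-Z\d_]*)/;
    return ("other",  substr($line, 0, 1));
}

my ($kind, $text) = first_token("0xff;");
print "$kind $text\n";    # prints "hex 0xff"
\end{verbatim}

If the number test came first, the same input would yield the number
{\tt 0} and leave {\tt xff} behind to be treated as a word.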
<>= if ($line =~ /^(0[xX][\dabcdefABCDEF]+)(.*)/) { $preline .= $1; $line = $2; next; } @ <>= if ($line =~ /^(\d+)(.*)/) { $preline .= $1; $line = $2; next; } @ You'd think that whitespace should be the simplest case, but we muck with it a bit. The idea is that people often insert extra whitespace in order to align their comments or equals signs. This looks great with a monospaced font, but stupid with a variably-spaced font, so we take it all out. This explains why we need [[$seen_token]]; we don't want to compress any whitespace that occurs at the very beginning of the line, since that would destroy all indentation. All messing with whitespace is done only if the user has specified the `{\tt -cw}' option. <>= if ($line =~ /^(\s+)(.*)/) { if ($compress_whitespace && $seen_token) { $preline .= " "; $line = $2; next; } else { $preline .= $1; $line = $2; next; } } @ <>= $compress_whitespace = 1, next if ($option =~ /^-cw/); @ %def compress_whitespace In the case of a word, we prettyprint it with the [[pp_word]] subroutine we defined ages ago. We also have to escape all underscore characters so \TeX\ doesn't think they're subscripts. We already checked that the token doesn't begin with a digit, so we can just stick [[\d]] in our character class without worrying about it. <>= if ($line =~ /^([a-zA-Z\d_]+)(.*)/) { $token = $1; $line = $2; ($token = pp_word ($token)) =~ s|_|\\_|g; $preline .= $token; next; } @ The only possibility left is that we're handling a string of punctuation, in which case we hand it off to the [[pp_punc]] subroutine. <>= if ($line =~ /^([^\d\sa-zA-Z_]+)(.*)/) { $token = $1; $line = $2; $preline .= pp_punc ($token); next; } @ We still haven't bothered to deal with preprocessor directives. We just set the first word in bold and everything else in typewriter; perhaps for certain constructs (like {\tt \#define}), we should [[pp_line]] the rest of the line. 
[[escape]] is a general subroutine meant to take an arbitrary string of characters and massage it into a form that \TeX\ won't gack on. <>= if ($line =~ /^(\s*)#(\s*\S*)((\s*)(.*))/) { $arg = escape ($3); $preline = "{\\bf{}$1\\char35{}$2}{\\tt{}$arg}"; $line = ""; } @ \subsubsection{Processing punctuation} First, let's set up a few \TeX\ definitions for some symbols. These two are stolen from Kaelin Colclasure's pretty printer (found in {\tt contrib/kaelin} in the {\tt noweb} distribution). <>= $bm = "\\begin{math}"; $em = "\\end{math}"; $texdefs .= "\@literal \\newcommand{\\MM}{\\kern.5pt\\raisebox{.4ex}" . "{$bm\\scriptscriptstyle-\\kern-1pt-$em}\\kern.5pt}\n" . "\@literal \\newcommand{\\PP}{\\kern.5pt\\raisebox{.4ex}" . "{$bm\\scriptscriptstyle+\\kern-1pt+$em}\\kern.5pt}\n"; @ %def bm em We call [[init_punc]] to set up the punctuation table. [[reg_punc]] associates a sequence of punctuation found in the source with its typeset equivalent; matches are checked in the order that they are given to [[reg_punc]]. We need to enter all the longer matches first, so that we don't do something like typeset a {\tt <} before we find out that it's really part of {\tt <=} and we want to typeset it as $\leq$. 
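
{\tt dpp} gets the longest-match-first behaviour purely from the order of
the {\tt reg\_punc} calls. One alternative, shown here only as a sketch
and not as what {\tt dpp} does, is to sort the registered sequences by
descending length so that the order of registration no longer matters.
The hash {\tt \%typeset}, its placeholder values, and the subroutine
{\tt longest\_prefix} are assumptions made up for this illustration.

\begin{verbatim}
use strict;
use warnings;

# Hypothetical table with placeholder output strings.
my %typeset = ("!" => "NOT", "!=" => "NEQ", "=" => "ASSIGN", "==" => "EQUIV");

# Longer sequences first; ties are broken alphabetically so that the
# order is stable from run to run.
my @ordered = sort { length($b) <=> length($a) || $a cmp $b } keys %typeset;

# Return the first (and therefore longest) sequence that starts $s,
# or an empty list / undef if none does.
sub longest_prefix {
    my ($s) = @_;
    foreach my $p (@ordered) {
        return $p if substr($s, 0, length $p) eq $p;
    }
    return;
}

print join(" ", @ordered), "\n";      # prints "!= == ! ="
print longest_prefix("!=x"), "\n";    # prints "!="
\end{verbatim}

The explicit ordering used by {\tt dpp} has the advantage of being
visible in the source; the sorted variant trades that for not having to
remember the rule when adding new entries.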
<>= sub init_punc { reg_punc ("!=", "${bm}\\neq${em}"); reg_punc ("&&", "${bm}\\wedge${em}"); reg_punc ("++", "\\protect\\PP"); reg_punc ("--", "\\protect\\MM"); reg_punc ("->", "${bm}\\rightarrow${em}"); reg_punc ("<<", "${bm}\\ll${em}"); reg_punc ("<=", "${bm}\\leq${em}"); reg_punc ("==", "${bm}\\equiv${em}"); reg_punc (">=", "${bm}\\geq${em}"); reg_punc (">>", "${bm}\\gg${em}"); reg_punc ("||", "${bm}\\vee${em}"); reg_punc ("!", "${bm}\\neg${em}"); reg_punc ("*", "${bm}\\ast${em}"); reg_punc ("/", "${bm}\\div${em}"); reg_punc ("<", "${bm}<${em}"); reg_punc (">", "${bm}>${em}"); reg_punc ("^", "${bm}\\oplus${em}"); reg_punc ("|", "${bm}\\mid${em}"); reg_punc ("~", "${bm}\\sim${em}"); reg_punc ("{", "\\nwlbrace"); reg_punc ("}", "\\nwrbrace"); } @ %def init_punc <>= init_punc(); @ [[reg_punc]] puts the punctuation sequence at the end of the list [[@puncs]], and makes the hash [[%punc_map]] contain, for each punctuation sequence, its typeset equivalent. Earlier insertions appear before later ones. <>= sub reg_punc { my ($punc, $set) = @_; push @puncs, $punc; $punc_map{$punc} = $set; } @ %def reg_punc punc_map puncs I'm not too happy about the punctuation routine [[pp_punc]]; it just doesn't look that efficient, with all those {\tt substr}'s everywhere, but I couldn't find a faster way to get at those individual characters. Profiling must be done. [[$this_punc]] and [[$out]] are analogous to [[$line]] and [[$preline]] in [[pp_line]]; [[$this_punc]] contains what remains to be typeset, and [[$out]] contains the stuff that's been stripped out of [[$this_punc]] and typeset. We go through [[@puncs]] in order looking for exact matches, and if we find one, we replace the match by the \TeX\ code in [[%punc_map]]. 
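
One possible alternative to the {\tt substr} loop, offered as an untested
idea rather than a measured improvement, is to build a single regular
expression from the list. Perl tries the alternatives of an alternation
from left to right at each position, so the registration order is
preserved. The list and map below are hypothetical stand-ins with
placeholder output strings, and {\tt pp\_punc\_re} is not part of
{\tt dpp}.

\begin{verbatim}
use strict;
use warnings;

# Hypothetical stand-ins for dpp's punctuation list and map.
my @puncs    = ("!=", "==", "!");
my %punc_map = ("!=" => "NEQ", "==" => "EQUIV", "!" => "NEG");

# quotemeta protects characters that are special in a regex; the
# alternatives keep the order of @puncs, longest sequences first.
my $alt = join "|", map { quotemeta($_) } @puncs;

# Replace every registered sequence; unmatched characters are copied
# through unchanged, and replaced text is not scanned again.
sub pp_punc_re {
    my ($s) = @_;
    $s =~ s/($alt)/$punc_map{$1}/g;
    return $s;
}

print pp_punc_re("!a!=b"), "\n";    # prints "NEGaNEQb"
\end{verbatim}

Whether this is actually faster than the loop would have to be settled by
profiling.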
<>=
sub pp_punc {
  my ($this_punc) = shift;
  my ($out);

  punc_loop:
  while (length $this_punc) {
    foreach $punc (@puncs) {
      if (substr ($this_punc, 0, length $punc) eq $punc) {
        $out .= $punc_map{$punc};
        $this_punc = substr ($this_punc, length $punc);
        next punc_loop;
      }
    }
    # No match found
    $out .= substr ($this_punc, 0, 1);
    $this_punc = substr ($this_punc, 1);
  }
  return $out;
}
@ %def pp_punc

\subsubsection{Processing strings and comments}

The hairiest part has been saved for last. The following regular
expression was mostly cribbed from Jeffrey Friedl's excellent book
{\sl Mastering Regular Expressions}. It finds the earliest (in the line)
occurrence of a string or comment. We then prettyprint that element
correctly and look at the rest of the line.

Because some of these constructs may occur in the middle of the line, we
have to reinvoke [[pp_line]] recursively in order to avoid breaking our
requirement that there's only one string, [[$line]], left to process
after leaving this chunk.
<>=
while ($line =~ m{
    (.*?)                         #1
    (                             #2
      (                           #3 double-quoted string
        " (\\.|[^\\\"])* "        #4
      )
      |
      (                           #5 single-quoted string
        ' (\\.|[^\\\']) '         #6
      )
      |
      (?:                         #7 C comment
        (?: /\*) (.*?) (?: \*/)
      )
      |
      (                           #8 C++ comment
        //.*
      )
    )
    (.*)                          #9
  }x) {
  if ($3 || $5) {
    <>
  }
  if (defined $7) {
    <>
  }
  if ($8) {
    <>
  }
  die "comment/string confusion\n"; # no match, impossible
}
@ The [[$3 || $5]] trick grabs [[$3]] if it's non-empty, and otherwise
grabs [[$5]]. The C comment case is tested with [[defined]] because an
empty comment ({\tt /**/}) captures an empty string in [[$7]], which Perl
treats as false. We have to escape the string, and tell [[escape]] that
we're inside a {\tt {\char92}tt} environment.
<>=
$before = $1;
$string = $3 || $5;
$after = $9;
$string = escape($string, 1);
$preline .= pp_line ($before) . "{\\tt{}$string}";
$line = $after;
next;
@ We might as well define [[escape]] now that we know how it's used. This
code is completely disgusting. There's got to be a better way; please
tell me what it is!

We have to do everything in very careful order to avoid, for example,
creating a lot of backslashes and then trying to escape them.
Since we can't do both left and right curly braces in the same pass, we change them to a special sequence and back again. It's all too gross. <>= sub escape { my ($this_line) = shift; my ($in_tt) = shift; if ($in_tt) { $this_line =~ s|\\|\001\\char92\002|g; $this_line =~ s|{|\001\\char123\002|g; $this_line =~ s|}|\001\\char125\002|g; } else { local ($bm) = "\\begin\001math\002"; local ($em) = "\\end\001math\002"; $this_line =~ s|\\|${bm}\\backslash${em}|g; $this_line =~ s!\|!${bm}\\mid${em}!g; $this_line =~ s|<|${bm}<${em}|g ; $this_line =~ s|>|${bm}>${em}|g ; $this_line =~ s|{|\001\\nwlbrace\002|g; $this_line =~ s|}|\001\\nwrbrace\002|g; } $this_line =~ s|_|\\_|g ; $this_line =~ s|#|\001\\char35\002|g ; $this_line =~ s|\001|{|g; $this_line =~ s|\002|}|g; return $this_line; } @ %def escape You may have noticed that we grabbed the `{\tt /*}' and `{\tt */}' out of the C comment, so we can typeset those bits a little nicer. This is the other place that we potentially do whitespace adjustment; if the `{\tt -cw}' option has been specified, and thus [[$compress_whitespace]] is true, we insert some extra space between code and comments. The space is not inserted if there is no adjacent code, so a comment on a line by itself will be aligned properly. <>= $before = $1; $comment = $7; $after = $9; $preline .= pp_line ($before); if ($compress_whitespace && ($before =~ /\S/)) { $preline .= " " } $preline .= "{\\commopen}" . pp_comment ($comment) . "{\\commclose}"; if ($compress_whitespace && ($after =~ /\S/)) { $preline .= " " } $line = $after; next; @ <>= $texdefs .= "\@literal \\def\\commopen{/$bm\\ast\\,$em}\n" . "\@literal \\def\\commclose{\\,$bm\\ast$em\\kern-.5pt/}\n"; @ %def commopen commclose We know that nothing can possibly follow a C++ comment, so we don't have to worry about [[$9]]. <>= $line = $1; $postline = $8; $postline = pp_comment ($postline); if ($compress_whitespace && ($line =~ /\S/)) { $postline = " " . 
$postline } next; @ Comments are made tricky by the fact that they can reference code with the {\tt [{}[...]{}]} construct. If we see one, we prettyprint the quote and continue recursively with the rest of the comment (a different approach from the iterative technique used in [[pp_line]]). [[$begcomm]] and [[$endcomm]] get us into ``comment mode'' from ``code mode,'' and out again. What we do exactly depends on whether comments are passed directly to \TeX\ or are meant to be printed verbatim. In the former case, we have to temporarily exit code mode; in the latter, we need to make sure that \TeX\ doesn't interpret anything that it normally would. <>= sub pp_comment { my ($this_comment) = shift; my ($pre, $code, $post); if ($this_comment =~ /(.*?)\[\[(.*?)\]\](?!\])(.*)/) { $pre = $1; $code = $2; $post = $3; if (defined $tex) { return $begcomm . $pre . $endcomm . pp_line ($code) . pp_comment ($post); } else { return escape ($pre) . pp_line ($code) . pp_comment ($post); } } else { if (defined $tex) { return $begcomm . $this_comment . $endcomm; } else { return escape ($this_comment); } } } @ %def pp_comment [[$begcomm]] and [[$endcomm]] may be used, depending on a user option. If the user has specified the `{\tt -tex}' option, then comments are typeset by \TeX; otherwise they are printed verbatim. Verbatim is the default, so that if you have a big program and run it through {\tt dpp} for the first time, you won't have any nasty surprises because you used dollar signs or something. <>= $tex = 1, next if ($option =~ /^-tex/); @ <>= $texdefs .= "\@literal \\def\\begcomm{\\begingroup\\maybehbox\\bgroup\\setupmodname}\n" . "\@literal \\def\\endcomm{\\egroup\\endgroup}\n"; <>= if (defined $tex) { $begcomm = "\\begcomm{}"; $endcomm = "\\endcomm{}"; } @ %def begcomm endcomm %%%% \section{Appendices} \subsection*{Chunk list} \nowebchunks \subsection*{Index} \nowebindex \end{document}