Introduction

This manual describes all of webgen in as much detail as possible. If you still find something missing, don’t hesitate to write a mail to the mailing list!

The manual is structured in such a way that it can be read from top to bottom while creating a webgen website. This means that first the information on how to use the webgen CLI comes, then how a webgen website is structured and how content can be added and so on.

The webgen command

The executable for webgen is called… webgen ;-) It uses a command style syntax (like Subversion’s svn or RubyGems’ gem commands) through the cmdparse library. To get an overview of the possible commands run webgen help.

The main command is the render command which does the actual website generation. This command uses either the environment variable WEBGEN_WEBSITE (if it is set and non-empty) or the current working directory as the website directory if none was specified via the global -d option. All other commands that need a valid webgen website directory specified work in the same way.
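
The lookup order described above can be sketched in a few lines of Ruby. This is only an illustration and not webgen's actual code: the method name website_directory and its parameters are assumptions made for this example.

```ruby
# Hypothetical helper, not part of webgen: picks the website directory
# in the order described above.
def website_directory(dir_option, env = ENV, cwd = Dir.pwd)
  # 1. An explicitly given -d option always wins.
  return dir_option if dir_option

  # 2. WEBGEN_WEBSITE is used only if it is set and non-empty.
  from_env = env['WEBGEN_WEBSITE']
  return from_env if from_env && !from_env.empty?

  # 3. Otherwise fall back to the current working directory.
  cwd
end
```

For example, website_directory(nil, { 'WEBGEN_WEBSITE' => '' }, '/home/me/site') falls through to the working directory because the variable is empty.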

You can invoke a command by specifying its name after the executable name. The options for a command are specified directly after the command name and before the next command or any arguments. For this purpose the executable webgen itself counts as a command, which is why global options such as -d and -v come directly after webgen. For example, all the following command lines are valid:

$ webgen
$ webgen render
$ webgen -d doc render
$ webgen -v create -t project new_site
$ webgen help create

Following is a short overview of the available commands:

  • apply [-f] (BUNDLE_NAME|BUNDLE_URL)

    The argument may either be a valid bundle name or a URL (in this case you will need to have all dependencies for Source::TarArchive installed) to a bundle archive (a (gzipped) tar archive). The command applies the bundle to the given webgen website. The files that will be written to the website directory are shown and the user is asked to confirm that the shown files should indeed be written (and sometimes overwritten). If the -f option is used, the files are always written.

    You can get the full list of available bundle names by running webgen help apply!

  • create [-b BUNDLE] SITE_DIR

    Creates a basic webgen website in SITE_DIR using the optionally specified bundles. The option -b can be used more than once to specify more than one bundle - the bundles are applied in the order they are specified. As with the apply command, BUNDLE may either be a valid bundle name or a bundle URL.

    If you don’t specify the -b option, the default value is used which applies the default and the style-andreas07 bundles. The former only creates a simple src/index.page so that some output can be seen and the latter applies a nice layout.

    All available bundles are listed in the help output for the command.

  • help

    Displays usage information. Can be used to show information about a command by using the command name as argument, eg. webgen help create.

  • render

    Renders the given webgen website.

    If you run webgen in verbose mode using the global -v option, it prints out information about the written files. Note, however, that the shown paths are the internally used path names which may be different from the actual output path names! For more information on this have a look at the canonical path name section!

  • version

    Displays the version of webgen.

  • webgui

    Starts the webgen webgui, a browser based graphical user interface for managing webgen websites. First the webgui web application is started and then the webgui is opened in the default browser.

A webgen Website

webgen needs a special directory structure so that it works out of the box. This directory structure is automatically created by the webgen create command. You can alternatively use the webgui to create the website directory.

The root directory of a webgen website is called the website directory. You can have the following files and directories under this directory:

  • src: The source directory in which all the source files for the website are. If this directory should not be called src or you want to include additional source directories, you need to change the sources configuration option.

  • out: This directory is created, if it does not exist, when webgen generates the HTML files. All the output files are put into this directory. The name of this directory can be changed by setting the output configuration option.

  • ext: The extension directory (optional). You can put self-written extensions into this directory so that they are used by webgen. See the extension directory section for more information.

  • config.yaml: This file can be used to set configuration options for the website. See the Configuration File section for more information.

  • Rakefile: This file is provided for your convenience to execute tasks via rake and provides some useful tasks out of the box. See the Rakefile section for more information.

Extension Directory

The extension directory is used for modifying core behaviour of webgen or adding extensions. It is called ext and has to reside directly under the website directory. All files called init.rb in this directory or any of its subdirectories are loaded when webgen renders the website. So you need to make sure to either place all extensions in init.rb or load them from this file. The latter approach is better since you can use the lazy loading feature that webgen uses internally, ie. extensions are loaded only when they are needed.
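
To check which init.rb files would be picked up, the search described above can be approximated with Dir.glob. This is a minimal sketch under the assumption that the search is a plain recursive glob below ext; the method name init_files is made up for this example and is not a webgen API.

```ruby
# Hypothetical helper, not part of webgen: lists every init.rb below the
# ext directory of a website, including ext/init.rb itself.
def init_files(website_dir)
  # "**" also matches zero directories, so ext/init.rb is included.
  Dir.glob(File.join(website_dir, 'ext', '**', 'init.rb')).sort
end
```

Calling init_files('.') inside a website directory prints nothing by itself; it just returns the sorted list of matching files.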

Configuration File

Many users will want to change some configuration options, for example, the default language of the website since not all people will want to write websites in English. This is primarily done via the configuration file.

The configuration file is called config.yaml and has to be placed directly under the website directory. It uses YAML as file format. You can find a list of all available configuration options that can be set in the Configuration Options Reference.

Each configuration option can be set in the configuration file by specifying the configuration option name and the new value as a key/value pair. A sample configuration file looks like this:

website.lang: de
website.link_to_current_page: true
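
Since the file is plain YAML, the sample above is just a flat mapping from option names to values. The following sketch reads it with Ruby's standard YAML library; the method name load_config is an assumption for this example and says nothing about how webgen itself loads the file.

```ruby
require 'yaml'

# The sample configuration from above as a string.
CONFIG_TEXT = <<~CONFIG
  website.lang: de
  website.link_to_current_page: true
CONFIG

# Hypothetical helper: parses configuration text into a Hash of
# option name => value. An empty file yields an empty Hash.
def load_config(text)
  YAML.safe_load(text) || {}
end

config = load_config(CONFIG_TEXT)
puts config['website.lang']
```

Note that the dotted option name is a single key, not a nested structure.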

Since some configuration options have a complex structure, there exist several special configuration keys to help setting these configuration options:

  • default_meta_info: Set the default meta information for one or more source handlers.

    This key needs a Hash as value which should be of the following form:

    SOURCE_HANDLER_NAME:
      mi_key: mi_value
      other_key: other_value
      :action: replace
    

    So each entry has to specify the name of a source handler (or the special value :all which sets the default meta information for all source handlers) and the meta information items that should be set or modified. If you don’t want to update the meta information hash but want to replace it, you need to add :action: replace as meta information entry.

    If a source handler called Webgen::SourceHandler::SOURCE_HANDLER_NAME exists, the meta information is set for this source handler instead.

    For example, the following snippet of a configuration file sets the meta information item in_menu to true for Webgen::SourceHandler::Page:

    default_meta_info:
      Page:
        in_menu: true
    
  • default_processing_pipeline: Set the default processing pipeline for one or more source handlers.

    This key needs a Hash as value which should be of the following form:

    SOURCE_HANDLER_NAME: PIPELINE
    

    Each key-value pair specifies the name of a source handler and the new default processing pipeline. The value for the processing pipeline has to consist of the short names of content processors separated by commas and normally includes erb, tags, blocks and the name of a content processor for a markup language. The short name(s) of a content processor is (are) stated on its documentation page. Be aware that the content processors are executed in the order in which they are specified!

    If a source handler called Webgen::SourceHandler::SOURCE_HANDLER_NAME exists, the processing pipeline is set for this source handler instead.

    For example, the following snippet of a configuration file sets the default processing pipeline for ‘Webgen::SourceHandler::Page’:

    default_processing_pipeline:
      Page: erb,tags,textile,blocks
    
  • patterns: Set the used path patterns for one or more source handlers.

    This key needs a Hash value and provides two different ways of setting the path patterns:

    SOURCE_HANDLER_NAME: [**/*.jpg, **/*.css]
    
    SOURCE_HANDLER_NAME:
      add: [**/*.jpg]
      del: [**/*.js]
    

    The first form replaces the path patterns for the source handler with the given patterns. The second form allows adding or deleting individual patterns.

    If a source handler called Webgen::SourceHandler::SOURCE_HANDLER_NAME exists, the path patterns are set for this source handler instead.

    For example, the following snippet of a configuration file adds the pattern **/*.pdf to the patterns of Webgen::SourceHandler::Copy:

    patterns:
      Copy:
        add: [**/*.pdf]
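
The difference between updating and replacing, and between the two forms of the patterns key, can be made concrete with a small sketch. It only mirrors the semantics described above on plain Ruby hashes and arrays; the method names apply_meta_info and apply_patterns are assumptions and not webgen APIs.

```ruby
# Hypothetical helper: applies one default_meta_info entry to the current
# meta information of a source handler.
def apply_meta_info(current, entry)
  entry = entry.dup
  # The special :action entry is not meta information itself.
  replace = entry.delete(:action) == 'replace'
  replace ? entry : current.merge(entry)
end

# Hypothetical helper: applies one patterns entry to the current patterns.
def apply_patterns(current, setting)
  # First form: an Array replaces the patterns completely.
  return setting.dup if setting.is_a?(Array)

  # Second form: a Hash adds and deletes individual patterns.
  (current + setting.fetch('add', [])).uniq - setting.fetch('del', [])
end
```

With current patterns ['**/*.css', '**/*.js'], the setting { 'add' => ['**/*.pdf'], 'del' => ['**/*.js'] } leaves ['**/*.css', '**/*.pdf'].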
    

Rakefile

The Rakefile that is automatically created upon website creation provides a place to specify recurring tasks for your website, for example, for deploying the website to a server. It contains some useful tasks out of the box:

  • webgen: Renders the webgen website once.
  • auto_webgen: Automatically renders the website when a source file has changed. This is very useful when developing a website because you don’t need to change back and forth between your website code and the command line to rebuild the site.
  • clobber_webgen: Removes all webgen generated products (the output files and the cache file).

All About Paths and Sources

A source provides paths that identify files or directories. webgen can use paths from many sources. The most commonly used source is the file system source which provides paths and information on them from the file system.

Path Types

webgen can handle many different types of files through the different source handler classes.

The most important files are the page and template files as they are used to define the content and the layout of a website. Have a look at the Webgen Page Format documentation to see what these files look like and how they are structured. After that have a look at the documentation of the source handler classes SourceHandler::Page and SourceHandler::Template as they are responsible for handling these page and template files!

You can naturally use any other type of file. However, be aware that some files may not be processed by webgen when no source handler class for them exists. For example, there is currently no source handler class for .svg files, so those files would be ignored. If you just want to have files copied from a source to the output directory (like images or CSS files), the SourceHandler::Copy class is what you need! Look through the documentation of the available source handler classes to get a feeling of what files are handled by webgen.

Source Paths Naming Convention

webgen assumes that the paths provided by the sources follow a special naming convention so that meta information can be extracted correctly from the path names. There are three different cases depending on the type of path (the individual parts of a path are explained below):

  • The path specifies a directory. It must end with a slash character and must not contain any hash characters. A directory path has to follow the scheme:

    /parent_path/basename/
    
  • The path specifies a file. It must not end with a slash character and must not contain any hash characters. A file path has to follow the scheme:

    /parent_path/[sort_info.]basename[.lang][.extension]
    

    There is one special case: if the file path has only one dot and only numbers are before the dot, then the first part is considered to be the basename and the second part the extension (eg. 53.png)!

  • The path specifies a fragment (e.g. part of a file). It must contain exactly one hash character and it has to follow the scheme (where /parent_path needs to be a file path):

    /parent_path#basename
    

Following is the explanation of the parts of the path names:

  • /parent_path

    This part specifies the path of the parent of this path and is used internally to create a hierarchy of paths. Although the leading slash is explicitly written here, it is part of the parent path, i.e. each parent path begins with a slash. Also note that the parent path will end with a slash if it is a directory!

  • sort_info

    This part is optional and has to consist of the digits 0 to 9. Its value is used for the meta information sort_info. If not specified, it defaults to the value zero.

  • basename

    This part is used on the one hand to generate the title meta information (but with _ and - replaced by spaces). And on the other hand, the canonical name is derived from it. basename must not contain any dots, spaces or any character from the following list: ; ? * : ` & = + $ ,. If you do use one of them, webgen may not work correctly!

    If two file paths have the same basename and extension part, they should define the same content but for different languages. This allows webgen to automatically deliver the right language version of the content.

  • lang

    This part is optional and has to be an ISO-639-1/2 language identifier (two or three characters (a-z) long). It can only be specified if an extension is also specified. If the file path should not have an extension, then just add a trailing dot, e.g. /dir/file.de. - the trailing dot is ignored by webgen.

    If the language part is not specified, it is assumed that the path is language independent (for example, images are normally not specific for a specific language). However, this behaviour may be different for some source handler classes (for example, all paths handled by SourceHandler::Page are assigned the default language if none is set).

    If the language identifier can’t be matched to a valid language, it is assumed that this part isn’t actually a language identifier but a part of the extension. This also means that in the special case where the first part of an extension is also a valid language identifier, the first part is interpreted as language identifier and not as part of the extension!

  • extension

    The file extension can be anything and can include dots.

Following are some examples of source path names (the acn and alcn parts are explained below):

Path                      Basename      Language  title         sort_info  acn                   alcn
/directory/               directory     -none-    Directory     -none-     /directory/           /directory/
/image.png                image         -none-    Image         0          /image.png            /image.png
/image.de.png             image         de        Image         0          /image.png            /image.de.png
/01.name_of-file.eo.page  name_of-file  eo        Name of file  1          /name_of-file.page    /name_of-file.eo.page
/53.png                   53            -none-    53            0          /53.png               /53.png
/archive.tar.bz2          archive       -none-    Archive       0          /archive.tar.bz2      /archive.tar.bz2
/archive.de.tar.bz2       archive       de        Archive       0          /archive.tar.bz2      /archive.de.tar.bz2
/manual.html#sources      manual        -none-    Manual        -none-     /manual.html#sources  /manual.html#sources

Notice: The first two file path and the last two file path examples define the same content for two different languages (or more exactly: the first one is unlocalized in both cases and the second one localized to German) as they have the same canonical name.
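
The file path scheme can be tried out with the following sketch, which splits a file name into the parts explained above. It is an approximation written for this manual and not webgen's parser: the method name parse_file_name is hypothetical, and KNOWN_LANGS is a deliberately tiny stand-in for the real ISO-639 language list.

```ruby
# Hypothetical subset of language codes; webgen checks against ISO-639.
KNOWN_LANGS = %w[de en eo fr].freeze

# Hypothetical helper: splits the file name of a path into sort_info,
# basename, lang, extension and the derived title.
def parse_file_name(path)
  name = File.basename(path)

  if (m = name.match(/\A(\d+)\.([^.]+)\z/))
    # Special case: one dot and only digits before it (e.g. 53.png).
    sort_info, basename, lang, ext = 0, m[1], nil, m[2]
  else
    m = name.match(/\A(?:(\d+)\.)?([^.]+)(?:\.(.*))?\z/)
    return nil unless m
    sort_info, basename, ext = m[1].to_i, m[2], m[3]
    lang = nil
    # A language part is only recognized in front of an extension and
    # only if it is a known language identifier.
    if ext && (lm = ext.match(/\A([a-z]{2,3})\.(.*)\z/)) && KNOWN_LANGS.include?(lm[1])
      lang, ext = lm[1], lm[2]
    end
  end

  title = basename.tr('_-', ' ').capitalize
  { sort_info: sort_info, basename: basename, lang: lang, ext: ext, title: title }
end
```

For /archive.tar.bz2 the part tar looks like a language identifier but is not a known language, so it stays part of the extension.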

Canonical Name (CN) of a File

webgen provides the functionality to define the same content in more than one language, ie. to localize content. This is achieved with the canonical name of a path, short CN.

The canonical name as well as all the other variants ([absolute] [localized] canonical name, all described below) are created directly from the source paths so that one can easily derive the canonical name oneself. Here are some examples:

Source path          CN        ACN               LCN          ALCN
/images/test/        test/     /images/test/     test/        /images/test/
/images/logo.png     logo.png  /images/logo.png  logo.png     /images/logo.png
/images/logo.de.png  logo.png  /images/logo.png  logo.de.png  /images/logo.de.png
/test.de.html#frag   #frag     /test.html#frag   #frag        /test.de.html#frag

What one can immediately see is that if a source path is unlocalized, the CN is the same as the LCN and the ACN is the same as the ALCN. This is always true for directory and fragment paths since they are always unlocalized! The first example shows a directory path - note the trailing slash! And the last example shows a fragment path - note the difference in the ACN and the ALCN! Another good-to-know fact is that an LCN is unique within its hierarchy level and that an ALCN is globally unique!

Source handlers can change some parts of a source path. For example, the page source handler changes the extension from page to html. This has to be taken into account when specifying canonical names! These changes are documented on the documentation pages for the source handlers!

When multiple paths share the same canonical name, webgen assumes that they have the same content but in different languages. It is also possible to specify a language independent path (i.e. one without a language) which is used as a fallback. Therefore when a path should be resolved using a canonical name and a given language, it is first tried to get the path in the requested language. If this is not possible (ie. no such localization exists), the unlocalized path is returned if it exists.

For example, assume that we have the following src directory:

/test.jpg
/images/my.jpg
/images/my.de.jpg
/index.page
/index.fr.page
/index.de.page

Since the path images/my.jpg has no language part, it is assumed to be unlocalized. In contrast, images/my.de.jpg has the “same” picture but in a German version. So, when specifying images/my.jpg in index.de.page, the correct localized version will be returned, i.e. images/my.de.jpg. When specifying images/my.jpg in index.page or in index.fr.page, the unlocalized version will be returned since no localized version exists.
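
The fallback behaviour of this example can be reproduced with a short sketch. It works on a plain list of absolute path names and assumes single-part extensions; the names PATHS, localize and resolve are made up for this illustration and are not webgen APIs.

```ruby
# The example source paths from above, written as absolute names.
PATHS = %w[
  /test.jpg /images/my.jpg /images/my.de.jpg
  /index.page /index.fr.page /index.de.page
].freeze

# Hypothetical helper: inserts the language before the extension.
# Simplification: only single-part extensions are handled.
def localize(cn, lang)
  cn.sub(/\.([^.\/]+)\z/) { ".#{lang}.#{$1}" }
end

# Hypothetical helper: first tries the requested language, then falls
# back to the unlocalized path, and returns nil if neither exists.
def resolve(paths, cn, lang)
  candidate = localize(cn, lang)
  return candidate if paths.include?(candidate)
  paths.include?(cn) ? cn : nil
end
```

Here resolve(PATHS, '/images/my.jpg', 'de') yields the German version, while the same call with 'fr' falls back to the unlocalized picture.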

Directories and fragments are never localized, only files are!

It is also possible to use the localized canonical name of a path (short: LCN) to resolve it. The localized canonical name is the same as the canonical name but with a language code inserted before the extension. If the localized canonical name is used to resolve a path, a possibly additionally specified language is ignored as it is assumed that the user really only wants the path in the specified language!

Continuing the above example, this means that images/my.de.jpg is always returned when one specifies images/my.de.jpg in any of the index page files, e.g. even in index.fr.page.

Both canonical and localized canonical names can be absolute (ACN and ALCN) which is done by using the full path starting with a slash. This is especially useful when you don’t exactly know in which hierarchy level a certain canonical name gets evaluated.

Everything said above also means that all paths are not resolved using their real source or output names but using the (localized) canonical name! So when specifying paths for extensions, for example when using the relocatable tag, you always have to use the (absolute) (localized) canonical names. This is different from previous webgen versions! However, it allows for using a naming scheme based on the source paths to be used independently from the output paths and therefore the output paths may change but no changes in the files themselves are needed.

Output Path Name Construction

There is currently only one method, called standard, for creating output path names which is described here. However, webgen allows developers to create other output path name creation methods. More information on this can be found in the API documentation.

The output path for a given source path is constructed using the meta information output_path_style. This meta information is set to a default value and can be overwritten by setting it for a specific source handler or for a path individually. The value for this meta information key is an array which can have the following values:

  • strings (for inserting arbitrary text into output names)
  • arrays (for grouping values - only interesting for the language part)
  • symbols for inserting special values:
    • :basename: The basename of the path.
    • :parent: The parent path.
    • :lang: The language.
    • :ext: The file extension including the leading dot.
    • :year, :month, :day: The creation year, month and day, respectively, of the source path. An error is raised if the needed meta information created_at is not set on the path.

The constructed output path must always be an absolute one, ie. it has to start at the root of the output directory. Therefore, the :parent part should always be included!

Following are some examples of output path names for given source path names (assuming that en is the default language and that the path is under a directory called /img/):

  • output_path_style=[:parent, :basename, [., :lang], :ext] (the default)

    • index.jpg –> /img/index.jpg

      Since the source path is unlocalized, no language part is used and the whole sub array with the :lang symbol is dropped.

    • index.en.jpg –> /img/index.jpg

      This happens if the configuration option sourcehandler.default_lang_in_output_path is false and no unlocalized version of this path exists.

    • index.en.jpg –> /img/index.en.jpg

      Similar to the last example but this result occurs when there is an unlocalized version of the path which is naturally named index.jpg!

    • index.de.jpg –> /img/index.de.jpg

      Since de is not the default language, the language part is always used!

  • output_path_style=[:parent, :basename, :ext, ., :lang]

    • index.jpg –> /img/index.jpg.

      Be aware of the trailing dot since the :lang value is not defined in a sub array!

  • output_path_style=[:parent, :year, /, :month, /, :basename, [., :lang], :ext]

    • index.jpg –> /img/2008/09/index.jpg

      This output path style can be used to create blog archive style output names.
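
The way the style array is turned into a path can be sketched as follows. This is not the implementation of the standard method, only an illustration of the rules above: the method name build_output_path and the values hash are assumptions, and the handling of the default language is left to the caller, who passes nil for :lang when no language part should appear.

```ruby
# Hypothetical helper: builds an output path from an output_path_style
# array and a hash of values for the special symbols.
def build_output_path(style, values)
  style.map do |part|
    case part
    when String
      part                      # arbitrary text is inserted as is
    when Symbol
      values[part].to_s         # a missing value becomes an empty string
    when Array
      # A sub array is dropped completely if one of its symbols has no value.
      missing = part.any? { |p| p.is_a?(Symbol) && values[p].nil? }
      missing ? '' : build_output_path(part, values)
    end
  end.join
end

default_style = [:parent, :basename, ['.', :lang], :ext]
values = { parent: '/img/', basename: 'index', lang: nil, ext: '.jpg' }
puts build_output_path(default_style, values)
puts build_output_path(default_style, values.merge(lang: 'de'))
```

The two calls print /img/index.jpg and /img/index.de.jpg, matching the first and last example of the default style.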

Path Patterns

Each source handler specifies path patterns which are used to locate the files that the class can handle. Normally these patterns are used to match file extensions, however, they are much more powerful. For detailed information on the structure of path patterns have a look at the Dir.glob API documentation.

The path patterns that are handled by a particular source handler are stated on its documentation page. These patterns can be changed by modifying the configuration option sourcehandler.patterns although that is not recommended except in a few cases (for example, it is useful to add some patterns for SourceHandler::Copy). Knowing how these path patterns work is useful when using webgen for two reasons:

  • so that you know which files will be processed by a specific source handler class

  • so that you can specify additional path patterns for some source handlers like the SourceHandler::Copy

Here are some example path patterns:

Path Pattern   Result
*/*.html       All files with the extension html in the subdirectories of the source directory
**/*.html      All files with the extension html in all directories
**/{foo,bar}*  All files in all directories which start with foo or bar
**/???         All files in all directories whose file name is exactly three characters long
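
These patterns can be tried out without a file system by using Ruby's File.fnmatch, which understands the same glob syntax when the flags File::FNM_PATHNAME and File::FNM_EXTGLOB are given. Whether webgen matches paths in exactly this way is an assumption of this sketch; the path names and the method name matching are made up.

```ruby
# FNM_PATHNAME keeps "*" and "?" from matching "/", FNM_EXTGLOB enables {a,b}.
FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: returns the paths that match the given pattern.
def matching(pattern, paths)
  paths.select { |path| File.fnmatch(pattern, path, FLAGS) }
end

# Made-up example paths, relative to the source directory.
EXAMPLE_PATHS = %w[
  about/index.html about/team/bio.html about/foo.css img/bar_1.png img/abc
].freeze

puts matching('**/*.html', EXAMPLE_PATHS).inspect
```

The call prints both HTML files, whereas the pattern */*.html would select only about/index.html because a single * does not cross a directory boundary.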

Handling of files in the source directory

Following is the list of rules how source files are handled by webgen:

  • All path names of all sources specified in the configuration option sources are fetched. Prior listed sources have priority over later listed sources if both specify the same path.

  • Those paths which match a pattern of the configuration option sourcehandler.ignore are excluded.

  • The source handler classes are invoked according to the invocation order specified in sourcehandler.invoke and they use only those paths that match one of their path patterns specified in sourcehandler.patterns.

As you might have deduced from the processing list above, it is possible that one path is handled by multiple source handlers. This can be used, for example, to render an XML file as HTML and copy it verbatim.
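
The three rules can be put together in a small simulation. It is only meant to make the order of the steps tangible: the method name assign_paths, the shape of its arguments and the use of File.fnmatch for pattern matching are all assumptions of this sketch, not a description of webgen's internals.

```ruby
GLOB_FLAGS = File::FNM_PATHNAME | File::FNM_EXTGLOB

# Hypothetical helper: returns a Hash of handler name => handled paths.
# sources:  Array of [source name, Array of paths], in configured order
# ignore:   Array of patterns (like sourcehandler.ignore)
# invoke:   Array of handler names in invocation order
# patterns: Hash of handler name => Array of patterns
def assign_paths(sources, ignore, invoke, patterns)
  # Rule 1: collect all paths; the first source that provides a path wins.
  owner = {}
  sources.each { |name, paths| paths.each { |path| owner[path] ||= name } }

  # Rule 2: drop everything that matches an ignore pattern.
  kept = owner.keys.reject do |path|
    ignore.any? { |pat| File.fnmatch(pat, path, GLOB_FLAGS) }
  end

  # Rule 3: each handler, in invocation order, takes the paths matching
  # one of its patterns. A path may be taken by several handlers.
  invoke.each_with_object({}) do |handler, result|
    result[handler] = kept.select do |path|
      patterns.fetch(handler, []).any? { |pat| File.fnmatch(pat, path, GLOB_FLAGS) }
    end
  end
end
```

If both a hypothetical Xml handler and the Copy handler list the pattern **/*.xml, the same XML path shows up under both handler names, which is the situation described in the paragraph above.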

Path Meta Information

Each path can have meta information, i.e. information about the path itself, associated with it, for example the title of the path, if it should appear in a menu and so on. This meta information can be specified in several ways, including:

  • Source handlers can provide default meta information for their handled paths (set using the configuration option sourcehandler.default_meta_info).

  • Some file types allow meta information to be specified directly in the file, for example page files in Webgen Page Format.

  • Meta information can also be specified in special meta information backing files. These files are handled by the source handler SourceHandler::Metainfo.

There is a clearly defined order in which meta information is applied to a node for a path:

  1. The path gets read by a source and some meta information is extracted from the path name.

  2. This meta information is overwritten with meta information specified for all source handlers and then with meta information specified for the source handler that handles the path.

  3. Extensions can now overwrite meta information before the path gets handled by a source handler. For example, the SourceHandler::Metainfo assigns meta information specified for paths at this stage.

  4. Before the node for the path gets created, meta information from the content of the path itself may overwrite the current meta information (this is the case, for example, with files in Webgen Page Format).

  5. The node gets created with the provided meta information.

  6. After the node has been created, extensions may overwrite meta information again. For example, the SourceHandler::Metainfo assigns meta information specified for nodes at this stage.
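
Because every later step overwrites the values of the earlier ones, the order above behaves like a chain of hash merges. The following sketch shows this with made-up values; the method name effective_meta_info is an assumption and the layers are simplified to plain hashes.

```ruby
# Hypothetical helper: merges meta information layers in the given order,
# later layers overwrite earlier ones.
def effective_meta_info(*layers)
  layers.reduce({}) { |result, layer| result.merge(layer) }
end

from_path_name    = { 'title' => 'Index', 'sort_info' => 0 }   # step 1
all_handlers      = { 'in_menu' => false }                     # step 2
handler_specific  = { 'in_menu' => true }                      # step 2
metainfo_for_path = { 'sort_info' => 5 }                       # step 3
from_content      = { 'title' => 'Welcome' }                   # step 4
metainfo_for_node = {}                                         # step 6

meta = effective_meta_info(from_path_name, all_handlers, handler_specific,
                           metainfo_for_path, from_content, metainfo_for_node)
puts meta.inspect
```

In this made-up case the title from the file content wins over the title derived from the path name, and sort_info comes from the meta information backing file.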

webgen Extensions

webgen comes with many extensions that allow for rapid creation of static websites. The variety of the extensions allows you to use your tools of choice, for example, your preferred markup language. And if your choice is not available, you can write the extension for it yourself or make a feature request!

Extensions are only loaded when they are needed. For example, if you don’t use .feed files for automatically generating atom/RSS feeds for your website, the source handler that handles these files will never be loaded. Therefore you don’t need to explicitly enable extensions, they just sit in the background and wait till the webgen core needs them.

There are several types of extensions:

  • Content Processors: These extensions process content, for example, the content of files in Webgen Page Format. It is not specified how they have to process the content but this type of extension can basically be divided into two groups:

    • Text content processors: These processors are used to process textual content like files in Webgen Page Format. This group can further be divided:

      • Markup processors: Processors like kramdown or RedCloth belong to this group and they convert markup text that is easy to read and write to a more structured format like HTML. This allows you to write an HTML page without knowing HTML.

      • String replacers: These processors normally process special strings and substitute them with other content. For example, the erb processor replaces delimited strings interpreted as Ruby code with the interpreted value. Another example would be webgen’s tags processor which replaces strings like {relocatable: ../index.html} with a processed value.

    • Binary content processors: These processors are used to process binary data. For example, one might want to process an image to produce a thumbnailed version of it.

  • Source Handler: These extensions are used for handling source paths. They read the content of a path and produce one or more nodes (the internal representation of an output path, see source handling for more information on nodes). So a source handler can do something simple like just copying a source path to the output directory but they can also do complex things like creating a whole image gallery from just one source path.

  • Tags: This type of extension allows the easy inclusion of dynamic content in, for example, page and template files. Each tag extension is used for a specific task like creating a menu or a breadcrumb trail.

A webgen Run

When webgen is started, it first looks for a cache file and uses it if it exists. The cache file provides information on the previous run and allows webgen to render only those paths that have changed since the last time.

Each webgen run consists of one or more update/write cycles. During the update phase the internal tree structure is updated to reflect the current situation:

  • Nodes are created from newly found source paths.
  • If a source path was deleted since the last webgen run, the nodes created from it will be deleted.
  • All nodes in the tree are checked if they have changed and, if so, are marked for regeneration.

The name used for describing a directory or file once it is placed in the internal tree structure is ‘node’.

After the update cycle has finished, the internal tree representation is up-to-date and contains all necessary information to write its nodes. This is done in the write phase which writes out all changed nodes.

It is possible that the internal tree structure changes during the write phase. For example, after writing a page file, fragment nodes for it may have been generated. After the write phase it is checked if something like this has happened. If webgen finds such a condition, a new update/write cycle is initiated. This is done as long as needed.

Since it is possible that multiple update/write cycles are used in one webgen run, it is shown in the log messages when a new cycle started. This is necessary because sometimes some warning or error log messages may be generated during an earlier cycle but the error conditions are automatically solved in later cycles. By marking where the cycles start a user can compare the log messages of the different cycles and see what he needs to solve himself.
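
The repetition of update/write cycles can be illustrated with a toy model. The tree is reduced to a Hash of node names, writing a node just clears its dirty flag, and fragment nodes are the only thing that can appear during writing. All names here, including run_cycles, are assumptions made for this sketch and do not reflect webgen's internal classes.

```ruby
# Hypothetical toy model: runs update/write cycles until writing no longer
# changes the tree and returns the number of cycles that were needed.
def run_cycles(tree)
  cycles = 0
  loop do
    cycles += 1
    # Update phase: find all nodes that need to be (re)written.
    dirty = tree.select { |_name, node| node[:dirty] }.keys

    # Write phase: writing a node may reveal new fragment nodes.
    created = []
    dirty.each do |name|
      node = tree[name]
      node[:dirty] = false
      node.fetch(:fragments, []).each do |fragment|
        fragment_name = "#{name}##{fragment}"
        created << fragment_name unless tree.key?(fragment_name)
      end
    end

    # New nodes change the tree, so another cycle is needed for them.
    created.each { |fragment_name| tree[fragment_name] = { dirty: true } }
    break if created.empty?
  end
  cycles
end

tree = { '/index.html' => { dirty: true, fragments: ['intro'] } }
puts run_cycles(tree)
```

For this tree two cycles are needed: the first one writes the page and discovers the fragment, the second one handles the new fragment node and finds nothing further to do.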

Extending webgen

If you know the programming language Ruby a little bit, you can easily extend webgen and add new features that you need. All the needed developer documentation is available in the API documentation, along with sample implementations for all major types of extensions (source class, output classes, source handler class, content processors classes, tag classes, …) and detailed information about the inner workings of webgen. Have a look at the extension directory section for more information on naming files and where to put the extensions.

Creating a Website Style Bundle

webgen ships some style bundles so that one can create a basic website out of the box. However, if you want to create a website style bundle yourself, it is easy.

The following assumes that you already have a website template, i.e. an HTML file with your layout, the accompanying CSS file and the needed images. The example below assumes you have one downloaded from http://www.openwebdesign.org/.

  • First create the needed directories and the README file:

    • The bundle directory: BUNDLE
    • The source directory: BUNDLE/src
    • The README file: BUNDLE/README

    The README file contains information about your style bundle in YAML format. You should include at least the description and author keys:

    description: Website template based on the OSWD template BUNDLE
    author: Me Myself
    
  • Unzip the archive from OSWD and move the extracted files into the BUNDLE/src directory.

  • Rename the file BUNDLE/src/index.html (or whatever the main HTML file is called) to default.template. Now go through this file and augment relative URLs to style sheets, images and other files with relocatable tags.

  • Identify the location of the main menu items, remove them and use a menu tag instead. You may need to restrict the shown menu levels to a single level if the template is built for only one main menu level. Then adjust the CSS used to style the menu to include the webgen specific CSS classes for showing the currently selected link. Remember that you need to take care of styling not only <a> tags but also <span> tags. See the menu tag documentation for more information on how the menu is output by webgen and what CSS classes can be used!

  • Adjust the <title> tag in the <head> section of the default.template file – you can use the {title:} to insert the title of the processed page file.

  • Identify the location where the page content should go and put <webgen:block name="content" /> there. This tells webgen to insert the block named content of the page file that uses the template at this location.

  • If the template features a breadcrumb trail area, use the breadcrumb trail tag so that the breadcrumb trail gets created automatically.

The BUNDLE directory can now be directly used since it is a valid webgen website directory. If you want to publish your style bundle, just create a (gzipped) tar archive from the contents of the BUNDLE directory.