This is part of a series of posts documenting the code of mldblog.

An Entry Dissected

mldblog entries are UTF-8 text files that are split into a header and an article by an empty line.

Title: This is the Header
Date: 2010-08-10 00:00

Article text comes here. Blah blah...
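To make the split concrete, here is a minimal Perl sketch that cuts an entry's text at the first empty line. split_entry() is a hypothetical helper, not part of mldblog. mldblog itself reads the header line by line in read_header(), which is shown further down. The sketch assumes LF line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: split an entry's text at the first empty line
# into (header, article). Assumes "\n" line endings.
sub split_entry {
    my ($text) = @_;
    my ($header, $article) = split /\n\n/, $text, 2;   # at most two parts
    return ($header, $article // '');                  # no empty line: no article
}

my ($head, $body) = split_entry(
    "Title: This is the Header\nDate: 2010-08-10 00:00\n\nArticle text comes here. Blah blah...\n"
);
print "$head\n";   # the two header lines
print $body;       # Article text comes here. Blah blah...
```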

The article is rendered as-is, so you are not only allowed to use HTML for formatting, you are required to.
(Depending on the plug-ins you use, this may vary, of course.)

Don't lose your head(er)

The header consists of lines of key/value pairs, where key and value are separated by a colon (:).

A value can span a single line or multiple lines. For a multi-line value, each continuation line must start with whitespace.

X-HypotheticalPlugin-Key1: This is a
 multi-line meta
 data thing
 that
 spreads like
 a wild-fire
 over multiple lines
X-HypotheticalPlugin-Key2: Hello World
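The regexes mldblog uses for this ($rh_head_rex and $rh_multi_rex) are not shown in this post. The following Perl sketch parses the fragment above with two hypothetical patterns of its own: a key is everything before the first colon, and a line starting with whitespace continues the previous value. Joining continuation lines with a single space is an assumption of this sketch. mldblog's own regex may preserve the original whitespace.

```perl
use strict;
use warnings;

# Hypothetical patterns, standing in for mldblog's $rh_head_rex / $rh_multi_rex.
my $head_rex  = qr/^([^:\s]+):\s*(.*?)\s*$/;   # "Key: value" (key must not start with whitespace)
my $multi_rex = qr/^\s+(.*?)\s*$/;             # continuation line: starts with whitespace

sub parse_header {
    my ($text) = @_;
    my (%header, $last_key);
    for my $line (split /\n/, $text) {
        last if $line eq '';                   # the first empty line ends the header
        if (my ($k, $v) = $line =~ $head_rex) {
            $last_key = lc $k;                 # keys are case-insensitive
            $header{$last_key} = $v;
        }
        elsif (defined $last_key and my ($c) = $line =~ $multi_rex) {
            $header{$last_key} .= " $c";       # assumption: join with one space
        }
        else {
            last;                              # not part of a well-formed header
        }
    }
    return \%header;
}

my $hdr = parse_header("X-HypotheticalPlugin-Key1: This is a\n multi-line meta\n data thing\n"
                     . "X-HypotheticalPlugin-Key2: Hello World\n\nArticle...");
print "$hdr->{'x-hypotheticalplugin-key1'}\n";   # This is a multi-line meta data thing
print "$hdr->{'x-hypotheticalplugin-key2'}\n";   # Hello World
```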

Keys are case-insensitive and fall into two classes:

  1. Keys starting with X- are used by plug-ins.
    A hypothetical plug-in might look for the keys shown in the fragment above, while the Tags plug-in looks for an X-Tags line:
    X-Tags: test,tag,something,hello_world
  2. All other keys are reserved for the mldblog core, and Title and Date are currently the only ones it interprets.
    Both of these keys are required unless $BLOSXOM_COMPAT is TRUE.
    If $BLOSXOM_COMPAT is TRUE and mldblog detects that the first line is not a key/value pair, it interprets that line as the title and uses the file's modification time as the entry's date.
    sub read_header {
      my $e = $_[0]; # an entry's data-hash
      my $last_key = undef;
      # three-argument open with a lexical filehandle
      if (open(my $fh, '<', $e->{path})) {
        while (<$fh>) {
          last if ($_ eq "\n"); # stop at first empty line
          # line is a key/value-pair; keys are stored lower-cased
          if (my ($k, $v) = $_ =~ $rh_head_rex) {
            $last_key = lc($k);
            $e->{header}->{$last_key} = $v;
          }
          # continuation of a multi-line value (only valid after a key)
          elsif (defined $last_key && (my ($c) = $_ =~ $rh_multi_rex)) {
            $e->{header}->{$last_key} .= $c;
          }
          # nothing of the above AND first line AND compatibility mode
          elsif ($. == 1 && $BLOSXOM_COMPAT) {
            chomp(my $title = $_); # drop the trailing newline
            $e->{header}->{title} = $title;
            $e->{compat} = 1;
            last;
          }
          else { # not required for well-formed file
            last;
          }
        }
        close($fh);
      }
      else {
        warn "Cannot open $e->{path}: $!\n";
      }
      # ...
    }
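Because read_header() stores every key lower-cased, a plug-in looks up its own X- key in lower case. The following Perl sketch shows how a Tags plug-in might read the X-Tags line. entry_tags() is hypothetical, since the real Tags plug-in's code is not shown in this post. It assumes tags are comma-separated and ignores surrounding whitespace and empty items.

```perl
use strict;
use warnings;

# Hypothetical plug-in helper: return the tags of an entry as a list.
sub entry_tags {
    my ($e) = @_;
    my $raw = $e->{header}->{'x-tags'};      # read_header() lower-cases keys
    return () unless defined $raw;           # entry has no X-Tags line
    return grep { length }                   # drop empty items such as "a,,b"
           map  { s/^\s+|\s+$//gr }          # trim whitespace (needs Perl 5.14+)
           split /,/, $raw;
}

my $entry = { header => { title => 'This is the Header',
                          'x-tags' => 'test,tag,something,hello_world' } };
print join(' | ', entry_tags($entry)), "\n";   # test | tag | something | hello_world
```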
    
If Date is defined in an entry's header, it is interpreted by parse_date(), which uses the following RegEx:
/(\d{2,5})\D+(\d{1,2})\D+(\d{1,2})\s+(?:(\d{1,2}):(\d{1,2}))?\s*(GMT|Z|UT|UTC)?/
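This means that the expected format of the Date value is YYYY MM DD hh:mm, where year, month and day may be separated by any non-digit characters (e.g. 2010-08-10 00:00).
(To have the date interpreted as UTC instead of the server's local time zone, append GMT, Z, UT or UTC.)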
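The body of parse_date() is not shown in this post. A minimal Perl sketch of how the RegEx above could be turned into a Unix timestamp, assuming the core Time::Local module, is below. It uses timegm() when a zone suffix is present and timelocal() otherwise. Because the alternation lists UT before UTC, a trailing UTC is captured as UT. That does not matter here, since the sketch only checks whether a suffix is present.

```perl
use strict;
use warnings;
use Time::Local qw(timegm timelocal);

# The RegEx from the post; the code around it is a hypothetical sketch.
my $date_rex = qr/(\d{2,5})\D+(\d{1,2})\D+(\d{1,2})\s+(?:(\d{1,2}):(\d{1,2}))?\s*(GMT|Z|UT|UTC)?/;

sub parse_date {
    my ($value) = @_;
    my ($y, $mon, $d, $h, $min, $tz) = $value =~ $date_rex or return;
    $h   //= 0;                                  # hh:mm is optional in the RegEx
    $min //= 0;
    my @parts = (0, $min, $h, $d, $mon - 1, $y); # Time::Local months are 0-based
    # timegm/timelocal die on out-of-range values (e.g. month 13): return undef then
    return eval { defined $tz ? timegm(@parts) : timelocal(@parts) };
}

print parse_date('2010-08-10 00:00 UTC'), "\n";  # 1281398400
```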

Organisation of Entries

read_entries() scans $DIR_ENTRIES and its sub-directories for files that end with $ENTRY_EXTENSION. Each matching file is considered to be an entry.

The part of the entry's path between $DIR_ENTRIES and $ENTRY_EXTENSION is the entry's ID. This ID is used for permalinks, so renaming a file after it has been published is not a good idea: its permalink would break.

/^$DIR_ENTRIES(.*)$ENTRY_EXTENSION$/
Because $ENTRY_EXTENSION is used inside a RegEx, it needs to be escaped accordingly.
The same would apply to $DIR_ENTRIES, were it not also passed as a parameter to find(), so escaping $DIR_ENTRIES is not an option. However, no sane directory name should contain characters that need to be escaped in a RegEx (don't quote me on that; I made this assumption without really thinking it through).
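One alternative worth considering, sketched here in Perl and not what mldblog currently does, is to let Perl quote both values at match time with \Q...\E. Then neither $ENTRY_EXTENSION nor $DIR_ENTRIES has to be escaped by hand, and the directory-name assumption becomes unnecessary. entry_id() is a hypothetical helper. Without quoting, a pattern ending in .txt$ would also accept a file named notes_txt, because the unescaped dot matches any character.

```perl
use strict;
use warnings;

# Hypothetical helper: \Q...\E applies quotemeta to the interpolated values,
# so regex metacharacters in $dir or $ext are matched literally.
sub entry_id {
    my ($path, $dir, $ext) = @_;
    return $path =~ /^\Q$dir\E(.*)\Q$ext\E$/ ? $1 : undef;
}

print entry_id('/srv/blog/entries/2010/hello.txt', '/srv/blog/entries/', '.txt'), "\n";  # 2010/hello
```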

Basic filtering

Entries with a date in the future are automatically filtered out by mldblog unless $SHOW_FUTURE is TRUE. This prevents plug-ins from accessing entries that are not yet meant for the public.
sub read_entries {
  # ...
  read_header($e);
  if ($SHOW_FUTURE || $e->{header}->{date} <= time()) {
    $full_entries{$e->{id}} = $e;
  }
  # ...
}
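For experimenting with this rule outside mldblog, here is a self-contained Perl version of the filter. visible_entries() is hypothetical. It assumes each entry's header date has already been converted to epoch seconds, as the comparison with time() in read_entries() implies. It takes an optional fixed "now" so the result can be checked.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the filter in read_entries().
sub visible_entries {
    my ($entries, $show_future, $now) = @_;
    $now //= time();                      # default to the current time
    my %visible;
    for my $e (@$entries) {
        # keep the entry if future posts are shown or its date has been reached
        $visible{ $e->{id} } = $e if $show_future || $e->{header}{date} <= $now;
    }
    return \%visible;
}

my @entries = (
    { id => 'published', header => { date => time() - 3600 } },
    { id => 'scheduled', header => { date => time() + 3600 } },
);
print join(', ', sort keys %{ visible_entries(\@entries, 0) }), "\n";  # published
```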