This is part of a series of posts documenting the code of mldblog.

Caching of templates

Some templates (e.g. entry, comment) are used multiple times during rendering, so they are loaded from disk only once and cached in a hash. On subsequent uses those templates are served directly from memory instead of the hard drive.
Admittedly, this may not be a huge performance gain: the template files are accessed repeatedly within a short amount of time, so they are very likely to be buffered by the operating system anyway.
However, it gets rid of some context-switches and... <blink>OVERENGINEERING</blink>
my %template_cache = ();
sub render_template {
  my $template = $template_cache{$_[0]};
  if (!defined($template)) {
    $template = $template_cache{$_[0]} = load_file($DIR_TEMPLATES.'/'.$_[0]);
  }
  # snip
}
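
For readers who want to try the idea in isolation, here is a minimal, self-contained sketch of the same caching pattern. The load_file() shown here is only a hypothetical stand-in (mldblog's own version is not shown in this post), cached_template() is a name made up for this example, and the template directory is an assumption.

```perl
use strict;
use warnings;

my $DIR_TEMPLATES = 'templates';   # assumed location, adjust as needed
my %template_cache;

# Hypothetical stand-in for mldblog's load_file(): slurp a whole file.
sub load_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read everything at once
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Return the template text, reading it from disk only on first use.
sub cached_template {
    my ($name) = @_;
    $template_cache{$name} //= load_file("$DIR_TEMPLATES/$name");
    return $template_cache{$name};
}
```

The `//=` operator (Perl 5.10 and later) only assigns when the cached value is undefined, so an empty template is cached as well.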

Indexing of entries

The script looks for entries in the designated directory ($DIR_ENTRIES) on every start. In addition, it has to open each file and read all of its header information.

To minimise file accesses, the result of that scan is cached in an index file.
When such an index file exists at startup, one of two things happens:

  1. If its age (measured by its modification time) is below a certain threshold ($CACHE_GRACE), the cached data is assumed to be valid and no further file access is needed (except for loading entry content later on).
  2. If the cache file is older than $CACHE_GRACE, mldblog scans for all entry files and compares each file's modification time to the one stored in the cache. If they match, the cached data for that entry is considered valid; if they don't, the file is opened and its header information is extracted.
After a new index is built, it is written to the cache file.

Keep in mind that only the header data is stored in the index file. The article itself is still loaded from disk every time, so if you fix a typo or update the content of an entry, the change is reflected immediately.
Example: Because I add a new entry to my blog only once a week, I set the grace period to 1 day and delete the cache file manually when I do. I keep it at 1 day for the very unlikely (ahem) case that I forget to delete the cache file.

my %all_entries = (); # used by all_entries() during find()
sub read_entries {
  my @entries = ();
  my %full_entries = ();
  # "my ... if COND" has unreliable behaviour in Perl, so use a ternary instead
  my $cached_entries = (-f $PATH_CACHE) ? Storable::lock_retrieve($PATH_CACHE) : undef;
  my $cache_age = (-f $PATH_CACHE) ? (time() - (stat($PATH_CACHE))[9]) : 0;
  if ($cached_entries && $cache_age < $CACHE_GRACE) {
    @entries = values %$cached_entries;
  }
  else {
    # all_entries() stores all found entries in %all_entries
    find(\&all_entries, $DIR_ENTRIES);
    foreach my $e (values %all_entries) {
      if (exists $cached_entries->{$e->{id}}
          && $cached_entries->{$e->{id}}->{moddate} == $e->{moddate}
         ) {
        $full_entries{$e->{id}} = $cached_entries->{$e->{id}};
      }
      else {
        read_header($e);
        if ($SHOW_FUTURE || $e->{header}->{date} <= time()) {
          $full_entries{$e->{id}} = $e;
        }
      }
    }
    # snip
    Storable::lock_store(\%full_entries, $PATH_CACHE);
    @entries = values %full_entries;
  }
  # snip
  return \@entries;
}
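
To make the shape of the index more concrete, here is a small sketch of a Storable round trip. The keys id, moddate and header (with its date field) are the ones used by the code above; the sample values, the title field and the temporary cache path are made up for illustration.

```perl
use strict;
use warnings;
use Storable ();
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);
my $path_cache = "$dir/index.cache";   # stands in for $PATH_CACHE

# Made-up sample index: entry id => entry data
my %index = (
    'hello-world' => {
        id      => 'hello-world',
        moddate => 1200000000,          # mtime of the entry file
        header  => { title => 'Hello', date => 1199990000 },  # title is hypothetical
    },
);

Storable::lock_store(\%index, $path_cache);            # write under an exclusive lock
my $restored = Storable::lock_retrieve($path_cache);   # read under a shared lock

# A cached entry can be reused if its stored moddate matches the file's current mtime
my $current_mtime = 1200000000;
my $reusable = exists $restored->{'hello-world'}
    && $restored->{'hello-world'}{moddate} == $current_mtime;
print $reusable ? "reuse cached header\n" : "re-read header\n";
```

With the values above the two modification times match, so the script prints "reuse cached header".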

Configuration

Two settings are used for index caching:
  • $PATH_CACHE is the full path of the cache file.
  • $CACHE_GRACE specifies the number of seconds the cache file is considered valid without comparing the entries to their file-system counterparts. New files added during this period are not picked up either.
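
As an illustration, the two settings might look like this. Only the variable names come from mldblog; the path is a made-up example, the one-day value mirrors the grace period mentioned earlier, and how the variables are actually declared in mldblog is not shown in this post.

```perl
# Hypothetical values; only the variable names come from mldblog
my $PATH_CACHE  = '/var/cache/mldblog/entries.idx';  # full path of the cache file
my $CACHE_GRACE = 24 * 60 * 60;                      # one day, in seconds
```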

Primitive Benchmark

To test if that bunch of code brings any improvements to the table, I created 10 directories with 100 entries each.
(by hand, of course)
                 1st run    2nd run
Without cache:   195.5ms    195.5ms
With cache:       71.4ms     60.5ms
Grace-period:     16.8ms     16.7ms

The timings for indexing without the cache and for reading only the cache file (grace period) are very consistent. However, using the cache file and calling stat on the individual files is all over the map, although all the samples I took were somewhere between ~60ms and ~70ms.
Why this variation of ~10ms does not show up when accessing and opening all files, I don't know. If anyone's got a clue, let me know!

Nonetheless, it's still a ~3x speedup with the cache file (195.5ms vs. 60-71ms), which is ok.
And a little bit more than a 10x speedup when inside the grace period (195.5ms vs. ~16.8ms).
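
How the numbers above were measured is not described here. For anyone who wants to take similar timings, the following is one possible sketch using the core module Time::HiRes; time_ms() is a helper name invented for this example, and the summing loop is only a placeholder for a call to read_entries().

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Run a code reference once and return the elapsed wall-clock time in ms.
sub time_ms {
    my ($code) = @_;
    my $start = [gettimeofday];
    $code->();
    return tv_interval($start) * 1000;   # tv_interval() returns seconds
}

# Placeholder workload; in mldblog this would be a call to read_entries()
my $elapsed = time_ms(sub { my $sum = 0; $sum += $_ for 1 .. 100_000; });
printf "%.1fms\n", $elapsed;
```

Taking several samples per scenario, as done above, helps to spot variation like the 60-70ms spread.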

So much for the caching and indexing of mldblog. As one can see by the numbers above, it improves performance considerably for blogs that have lots of entries or are hit very often.
