The next thing to do to get BlogHelper9000 functional is to write a command which provides some information about the posts in the blog. I want to know:
- How many published posts there are
- How many drafts there are
- A short list of recent posts
- How long it’s been since a post was published
I also know that I want to introduce a command which will allow me to fix the metadata in the posts, which is a little messy. I’ve been inconsistently blogging since 2007, originally starting off on a self-hosted python blog I’ve forgot the name of before migrating to Wordpress, and then migrating to a short lived .net static site generator before switching over to Jekyll.
Obviously, Markdown powered blogs like Jekyll have to provide non-markdown metadata in each post, and for Jekyll (and most markdown powered blogs) that means: YAML.
Parse that YAML
There are a couple of options when it comes to parsing YAML. One would be to use YamlDotNet which is a stable library which conforms with V1.1 and v1.2 of the YAML specifications.
But where is the fun in that?
I’ve defined a POCO called YamlHeader
which I’m going to use to use as the in-memory object to represent the YAML metadata header at the top of a markdown file.
If we take a leaf from different JSON converters, we can define a YamlConvert
class like this:
public static class YamlConvert
{
public static string Serialise(YamlHeader header)
{
}
public static YamlHeader Deserialise(string filePath)
{
}
}
With this, we can easily serialise a YamlHeader
into a string, and deserialise a file into a YamlHeader
.
Deserialise
Deserialising is the slight more complicated of the two, so lets start with that.
Our first unit test looks like this:
[Fact]
public void Should_Deserialise_YamlHeader()
{
var yaml = @"---
layout: post
title: 'Dynamic port assignment in Octopus Deploy'
tags: ['build tools', 'octopus deploy']
featured_image: /assets/images/posts/2020/artem-sapegin-b18TRXc8UPQ-unsplash.jpg
featured: false
hidden: false
---
post content that's not parsed";
var yamlObject = YamlConvert.Deserialise(yaml.Split(Environment.NewLine));
yamlObject.Layout.Should().Be("post");
yamlObject.Tags.Should().NotBeEmpty();
}
This immediately requires us to add an overload for Deserialise
to the YamlConvert
class, which takes a string[]
. This means our implementation for the first Deserialise
method is simply:
public static YamlHeader Deserialise(string filePath)
{
if (!File.Exists(filePath)) throw new FileNotFoundException("Unable to find specified file", filePath);
var content = File.ReadAllLines(filePath);
return Deserialise(content);
}
Now we get into the fun part. And a big caveat: I’m not sure if this is the best way of doing this, but it works for me and that’s all I care about.
Anyway. A YAML header block is identified by a single line of only ---
followd by n
lines of YAML which is signified to have ended by another single line of only ---
. You can see this in the unit test above.
The algorithm I came up with goes like this:
For each line in lines:
if line is '---' then
if header start marker not found then
header start marker found
continue
break loop
store line
parse each line of found header
So in a nutshell, it loops through each line in the file, look for the first ---
to identify the start of the header, and then until it hits another ---
, it gathers the lines for further processing.
Translated into C#, the code looks like this:
public static YamlHeader Deserialise(string[] fileContent)
{
var headerStartMarkerFound = false;
var yamlBlock = new List<string>();
foreach (var line in fileContent)
{
if (line.Trim() == "---")
{
if (!headerStartMarkerFound)
{
headerStartMarkerFound = true;
continue;
}
break;
}
yamlBlock.Add(line);
}
return ParseYamlHeader(yamlBlock);
}
This is fairly straightforward, and isn’t where I think some of the problems with the way it works actually are - all that is hidden behind ParseYamlHeader
, and is worth a post on its own.