Using Nitride - Markdown

Yesterday, I created an over-engineered program to copy a single file from one directory to another. Now, time to make it less overkill by transforming that Markdown file into simple HTML.

Series

2025-06-07 Using Nitride - Introduction - I'm going to start a long, erratic journey to update my publisher website. Along the way, I'm going to document how to create a MfGames.Nitride website from the ground up, along with a lot of little asides about reasoning or purpose.

2025-06-08 Using Nitride - Pipelines - The first major concept of Nitride is the idea of a "pipeline" which are the largest units of working in the system. This shows a relatively complex setup of pipelines that will be used to generate the website.

2025-06-09 Using Nitride - Entities - The continued adventures of creating the Typewriter website using MfGames.Nitride, a C# static site generator. The topic of today is the Entity class and how Nitride uses an ECS (Entity-Component-System) to generate pages.

2025-06-10 Using Nitride - Markdown - Examples and explanations of converting Markdown to HTML using MfGames.Nitride and MarkDig.

2025-06-12 Using Nitride - Front Matter - How to use Nitride to take the front matter from pages and add them as a component into the Entity class.

MarkDig

I don't like reinventing the wheel. I mean, I seem to keep doing it, but I don't enjoy it. That was one reason why I tried other static site generators before working on [[MfGames.Nitride]]. However, when it comes to redoing a Markdown parser, even I'm not that foolish when there is already [[MarkDig]], an excellent library for turning Markdown into HTML and extensible enough that I could also turn Markdown into [[Gemini]] (a later post).

In this case, we need to tell Nitride how to do anything with Markdown since I didn't make it part of the core library. To do that, we need to pull in the NuGet package. While we're at it, we're also going to add the HTML processing library.

$ cd src/dotnet
$ dotnet add package MfGames.Nitride.Markdown
$ dotnet add package MfGames.Nitride.Html

Once we have the packages installed, we need to add those modules into the system. This is where [[Autofac]] comes in handy: I just have to add a module for the package and it will handle the registration of any operations, components, and systems that we need to use.

// In //src/dotnet/Program.cs
var builder = new NitrideBuilder(args)
    .UseIO(rootDirectory)
    .UseMarkdown()
    .UseHtml()
    .UseModule<WebsiteModule>();

As you can see, I'm trying to follow the generic host pattern for the setup.

Identifying Markdown

While it may seem obvious to convert any entity whose path ends in `.md` into `.html`, we break this apart into separate steps. First, we identify a file as a Markdown file. This does two things: it adds `MfGames.Nitride.Markdown.IsMarkdown` as a component, and it treats the contents as text instead of binary.

If you remember previously, we had this output:

[00:41:57 INF] <PagesPipeline> Read in 1 files from /src/pages
[00:41:57 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Contents.IBinaryContent","Zio.UPath"]
[00:41:57 INF] <SimplifiedMarkdownPipeline> Reading 1 entities
[00:41:57 INF] <StyleHtmlPipeline> Reading 1 entities
[00:41:57 INF] <OutputHtmlPipeline> Writing out 1 files
[00:41:57 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.md, Components ["MfGames.Nitride.Contents.IBinaryContent","Zio.UPath"]

Now, we're going to use a new operation, `MfGames.Nitride.Markdown.IdentifyMarkdownFromPath`. This could be put in a central place, such as `SimplifiedMarkdownPipeline`, but I found it is better to do this earlier rather than later, so I usually put the identify step in the input pipelines. In this case, `PagesPipeline`:

// In //src/dotnet/Pipelines/Inputs/PagesPipeline.cs
public PagesPipeline(
    ILogger<PagesPipeline> logger,
    ReadFiles readFiles,
    IdentifyMarkdownFromPath identifyMarkdownFromPath)
{
    _logger = logger;
    _identifyMarkdownFromPath = identifyMarkdownFromPath;

    _readFiles = readFiles
        .WithPattern("/src/pages/typewriter/**/*.md")
        .WithRemovePathPrefix("/src/pages/typewriter");
}

public override IAsyncEnumerable<Entity> RunAsync(
    IEnumerable<Entity> entities,
    CancellationToken cancellationToken = default)
{
    var list = _readFiles
        .Run(cancellationToken)
        .Run(_identifyMarkdownFromPath)
        .ToList();

    // ... rest of the method omitted
}

This operation doesn't take any parameters because it attempts to “do the right thing” with a minimal amount of effort. Running the code now produces this:

[00:46:43 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]

The big thing is that we have a new component, `IsMarkdown`, and the `IBinaryContent` changed to `ITextContent`, which gives us some additional extension methods but also indicates that the file is a text file instead of a simple binary. It still hasn't loaded the file into memory; it just switched how it is handled.

This is a separate step because sometimes I want to keep a specific file as Markdown and not convert it to HTML. Also, there are times when I construct an `Entity` directly without having a valid path; I just have to add the `IsHtml` and `ITextContent` components and it then acts like every other file without needing any special rules.
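To make the "identify first, convert later" idea concrete, here is a minimal sketch of a marker component being attached based on a path. Everything in it (`ToyEntity`, `IsMarkdownTag`, `MarkdownIdentifier`) is a hypothetical stand-in written for this example, not Nitride's actual classes, and the toy entity is mutable only to keep the code short.

```csharp
using System;
using System.Collections.Generic;

// Identify by path, or tag a hand-built entity that has no useful path.
var fromDisk = MarkdownIdentifier.Identify(new ToyEntity("/index.md"));
var image = MarkdownIdentifier.Identify(new ToyEntity("/logo.png"));
var generated = new ToyEntity("").With(new IsMarkdownTag());

Console.WriteLine(fromDisk.Has<IsMarkdownTag>());   // True
Console.WriteLine(image.Has<IsMarkdownTag>());      // False
Console.WriteLine(generated.Has<IsMarkdownTag>());  // True

// Hypothetical marker component: it carries no data, its presence is the signal.
public sealed class IsMarkdownTag { }

// Hypothetical, heavily simplified entity: a path plus a bag of
// components keyed by their type (at most one component per type).
public sealed class ToyEntity
{
    private readonly Dictionary<Type, object> _components = new();

    public ToyEntity(string path) => Path = path;

    public string Path { get; }

    public ToyEntity With<T>(T component) where T : notnull
    {
        _components[typeof(T)] = component;
        return this;
    }

    public ToyEntity Without<T>()
    {
        _components.Remove(typeof(T));
        return this;
    }

    public bool Has<T>() => _components.ContainsKey(typeof(T));
}

// Assumed behavior for the sketch: tag anything whose path ends in ".md".
public static class MarkdownIdentifier
{
    public static ToyEntity Identify(ToyEntity entity) =>
        entity.Path.EndsWith(".md", StringComparison.OrdinalIgnoreCase)
            ? entity.With(new IsMarkdownTag())
            : entity;
}
```

Later steps only need to ask whether the marker is present; they never have to look at the path again, which is why a hand-built entity behaves the same as one read from disk.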

Content

Entities with `ITextContent` have a number of useful extension methods associated with them. (Technically, you can use these methods against any `Entity` and it will try to convert binary to text if there is an `IBinaryContent` and we want an `ITextContent`.)

bool hasText = entity.HasTextContent();
string text = entity.GetTextContent();

entity.SetTextContent(stringValue);
entity.SetTextContent(stringBufferValue);

These are also stored as an `ITextContent` instead of a `string` or `StringBuilder`. This is because the default behavior is to leave the content on the disk and use [[Zio]] to retrieve it. However, as soon as `SetTextContent` is used, it keeps that value in memory for the rest of the execution. This is the point where memory pressure begins to increase.

In the future, we could easily create an `ITextContent` implementation that writes large text files to the disk to get them out of memory. However, even my largest chapter of twenty-five thousand words doesn't create too much of a problem, so I haven't bothered trying to implement that at this point (but if I did, it would go into `//.cache` in some manner).
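The difference between content left on disk and content held in memory can be sketched with a small interface and two implementations. This is only an illustration under my own assumptions: `IToyTextContent`, `FileToyTextContent`, and `StringToyTextContent` are made-up names, and the sketch uses plain `System.IO` instead of Zio.

```csharp
using System;
using System.IO;

string path = Path.Combine(Path.GetTempPath(), "nitride-sketch-index.md");
File.WriteAllText(path, "# Typewriter Press\n");

// Nothing is read when the content object is created...
IToyTextContent content = new FileToyTextContent(path);

// ...only when the text is actually requested.
Console.WriteLine(content.GetText().Trim());

// Replacing the text swaps in an implementation that holds the string in memory.
content = new StringToyTextContent("# Replaced");
Console.WriteLine(content.GetText());

File.Delete(path);

// Hypothetical abstraction over "where the text lives".
public interface IToyTextContent
{
    string GetText();
}

// Leaves the text on disk and reads it on every request.
public sealed class FileToyTextContent : IToyTextContent
{
    private readonly string _path;

    public FileToyTextContent(string path) => _path = path;

    public string GetText() => File.ReadAllText(_path);
}

// Keeps the text in memory for the lifetime of the object.
public sealed class StringToyTextContent : IToyTextContent
{
    private readonly string _text;

    public StringToyTextContent(string text) => _text = text;

    public string GetText() => _text;
}
```

A disk-spilling version would simply be a third implementation of the same interface, which is why the rest of the pipeline would not need to change.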

Converting Markdown to HTML

Just identifying a file as Markdown doesn't do anything in itself. To convert it, we use another operation, `MfGames.Nitride.Markdown.ConvertMarkdownToHtml`. This fits easily into the `StyleHtmlPipeline` to handle the conversion and styling. It allows any MarkDig extension or plugin to be called as part of the setup, allowing one to customize exactly how the output is generated.

We also want to change the extension from `.md` to `.html`. I realize I should have baked that logic into the `ConvertMarkdownToHtml` since it is a common operation, which I will do, but for now, we also need to use the `MfGames.Nitride.IO.Paths.ChangePathExtension` to do that.

public StyleHtmlPipeline(
    ILogger<StyleHtmlPipeline> logger,
    SimplifiedMarkdownPipeline simplifiedMarkdownPipeline,
    ConvertMarkdownToHtml convertMarkdownToHtml,
    ChangePathExtension changePathExtension)
{
    _logger = logger;
    _changePathExtension = changePathExtension.WithExtension(".html");

    _convertMarkdownToHtml = convertMarkdownToHtml
        .WithConfigureMarkdown(builder =>
        {
            SmartyPantOptions smartyPantOptions = new();
            SmartyPantsExtension smartyPants = new(smartyPantOptions);

            builder
                .Use<GenericAttributesExtension>()
                .Use(smartyPants);
        });

    AddDependency(simplifiedMarkdownPipeline);
}

public override IAsyncEnumerable<Entity> RunAsync(
    IEnumerable<Entity> entities,
    CancellationToken cancellationToken = default)
{
    var list = entities
        .Run(_convertMarkdownToHtml, cancellationToken)
        .Run(_changePathExtension, cancellationToken)
        .ToList();

    _logger.LogInformation("Reading {Count:N0} entities", list.Count);

    return list.ToAsyncEnumerable();
}

Running this gets us:

$ just build
[01:00:43 INF] <PagesPipeline> Read in 1 files from /src/pages
[01:00:43 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]
[01:00:43 INF] <SimplifiedMarkdownPipeline> Reading 1 entities
[01:00:44 INF] <StyleHtmlPipeline> Reading 1 entities
[01:00:44 INF] <OutputHtmlPipeline> Writing out 1 files
[01:00:44 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.html, Components ["MfGames.Nitride.Html.IsHtml","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]
$ find build -type f
build/typewriter/html/index.html
$ cat src/pages/typewriter/index.md 
# Typewriter Press
$ cat build/typewriter/html/index.html
<h1>Typewriter Press</h1>

And now our over-engineered copy method has duplicated `markdown2html` for a single file.

Components

You may notice that `IsMarkdown` has been removed and `IsHtml` was added. This is part of where I struggled with [[Statiq]] and what led me down the path of using components. I don't have to pre-define the different data types, purposes, or even formats of a file. Enums are great, but they don't allow easy extension; with an ECS, it's just a matter of adding and removing components based on the use.

I've used components for a lot of things including identifying pages that should be in the blog archives, special notices, or pages that I want to ignore because they are aliases. I also embed indexes and lists into the pages to allow things like the “next” or “previous” links. If I was doing a web comic, I could have a per-character next/previous system easily implemented via those components.
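As a rough illustration of why markers extend more easily than an enum, here is a sketch where a page's markers are just a set of types. The marker classes and `MarkerSketch` are hypothetical names for this example only; the point is that `InBlogArchiveMarker` can be defined by the website project without touching any central list.

```csharp
using System;
using System.Collections.Generic;

// A page starts out tagged as Markdown, plus a site-specific marker.
var page = new HashSet<Type>
{
    typeof(IsMarkdownMarker),
    typeof(InBlogArchiveMarker),
};

MarkerSketch.MarkConverted(page);

Console.WriteLine(page.Contains(typeof(IsMarkdownMarker)));     // False
Console.WriteLine(page.Contains(typeof(IsHtmlMarker)));         // True
Console.WriteLine(page.Contains(typeof(InBlogArchiveMarker)));  // True

// Hypothetical format markers, standing in for library-provided ones.
public sealed class IsMarkdownMarker { }
public sealed class IsHtmlMarker { }

// Hypothetical marker owned by the website; no enum needs a new member.
public sealed class InBlogArchiveMarker { }

public static class MarkerSketch
{
    // Swap the Markdown marker for the HTML one, leaving every other marker alone.
    public static void MarkConverted(ISet<Type> markers)
    {
        if (markers.Remove(typeof(IsMarkdownMarker)))
        {
            markers.Add(typeof(IsHtmlMarker));
        }
    }
}
```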

My other static site generators didn't even have the content type tagging Statiq did, which was a novel concept for me and one that I'm glad I had a chance to use. It simplified a lot of my logic and led nicely into where I am today.

Planning Ahead

Planning ahead, I'm going to make the following changes:

Rename `SimplifiedMarkdownPipeline` to `ContentPipeline`

Move the logic I just added into a new pipeline called `BareHtmlPipeline` and insert it between `ContentPipeline` and `StyleHtmlPipeline`

The reason for this is that RSS/Atom feeds use bare HTML to generate their content, so it makes sense to have that bare pipeline feed both the feeds and the styled HTML while having the `ContentPipeline` handle a lot of the linking and references that we'll need.

Directory Paths

One last thing for this post: I prefer paths that end in directory slashes instead of file names. So, if I create a contact page at `//src/pages/typewriter/contact.md`, I want the HTML to be at `https://typewriter.press/contact/`. This is, creatively enough, another operation: `MfGames.Nitride.IO.Paths.MoveToIndexPath`.

// In //src/dotnet/Pipelines/Inputs/PagesPipeline.cs
public PagesPipeline(
    ILogger<PagesPipeline> logger,
    ReadFiles readFiles,
    IdentifyMarkdownFromPath identifyMarkdownFromPath,
    MoveToIndexPath moveToIndexPath)
{
    _logger = logger;
    _identifyMarkdownFromPath = identifyMarkdownFromPath;
    _moveToIndexPath = moveToIndexPath;

    _readFiles = readFiles
        .WithPattern("/src/pages/typewriter/**/*.md")
        .WithRemovePathPrefix("/src/pages/typewriter");
}

public override IAsyncEnumerable<Entity> RunAsync(
    IEnumerable<Entity> entities,
    CancellationToken cancellationToken = default)
{
    var list = _readFiles
        .Run(cancellationToken)
        .Run(_identifyMarkdownFromPath)
        .Run(_moveToIndexPath)
        .ToList();

    // ... rest of the method omitted
}

After adding a `contact.md` page, a run gives us this:

$ just build
[01:12:26 INF] <PagesPipeline> Entity: Path /contact/index.md, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath"]
[01:12:26 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath"]
[01:12:26 INF] <ContentPipeline> Reading 2 entities
[01:12:26 INF] <BareHtmlPipeline> Reading 2 entities
[01:12:26 INF] <StyleHtmlPipeline> Reading 2 entities
[01:12:26 INF] <OutputHtmlPipeline> Writing out 2 files
[01:12:26 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/contact/index.html, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Html.IsHtml","Zio.UPath"]
[01:12:26 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.html, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Html.IsHtml","Zio.UPath"]
$ cat src/pages/typewriter/contact.md 
# Contact Us

Our emails is [contact@typewriter.press](mailto:contact@typewriter.press).
$ cat build/typewriter/html/contact/index.html 
<h1>Contact Us</h1>
<p>Our emails is <a href="mailto:contact@typewriter.press">contact@typewriter.press</a>.</p>

As you can see, the input is `//src/pages/typewriter/contact.md`, but through the pipeline, it is written out as `//build/typewriter/html/contact/index.html`.
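For the curious, the path rewrite itself is simple string handling. The following is a minimal sketch of the idea, not the actual `MoveToIndexPath` implementation: `PathSketch.ToIndexPath` is a hypothetical helper that works on plain strings, and it ignores edge cases such as dotfiles. The extension change uses the standard `Path.ChangeExtension`.

```csharp
using System;
using System.IO;

string moved = PathSketch.ToIndexPath("/contact.md");

Console.WriteLine(moved);                                  // /contact/index.md
Console.WriteLine(PathSketch.ToIndexPath("/index.md"));    // /index.md
Console.WriteLine(Path.ChangeExtension(moved, ".html"));   // /contact/index.html

public static class PathSketch
{
    // Turn "/dir/name.ext" into "/dir/name/index.ext", leaving paths
    // that are already index files untouched.
    public static string ToIndexPath(string path)
    {
        int slash = path.LastIndexOf('/');
        string directory = path.Substring(0, slash + 1);   // "/" or "/blog/"
        string fileName = path.Substring(slash + 1);       // "contact.md"

        int dot = fileName.LastIndexOf('.');
        string stem = dot < 0 ? fileName : fileName.Substring(0, dot);
        string extension = dot < 0 ? "" : fileName.Substring(dot);

        if (stem == "index")
        {
            return path;
        }

        return directory + stem + "/index" + extension;
    }
}
```

Doing the move while the file is still `.md` means every later pipeline sees the final directory layout and only the extension has to change at conversion time.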