Using Nitride - Markdown
Yesterday, I created an over-engineered program to copy a single file from one directory to another. Now, time to make it less overkill by transforming that Markdown file into simple HTML.
MarkDig
I don't like reinventing the wheel. I mean, I seem to keep doing it, but I don't enjoy it. That was one reason why I tried other static site generators before working on [[MfGames.Nitride]]. However, when it comes to redoing a Markdown parser, even I'm not that foolish when there is already [[MarkDig]], an excellent library for turning Markdown into HTML and extensible enough that I could also turn Markdown into [[Gemini]] (a later post).
In this case, we need to tell Nitride how to do anything with Markdown since I didn't make it part of the core library. To do that, we need to pull in the NuGet package. While we're at it, we're also going to add the HTML processing library.
    $ cd src/dotnet
    $ dotnet add package MfGames.Nitride.Markdown
    $ dotnet add package MfGames.Nitride.Html
Once we have the packages installed, we need to add those modules into the system. This is where [[Autofac]] comes in handy: I just have to add a module for the package, and it handles the registration of any operations, components, and systems that we need to use.
    // In //src/dotnet/Program.cs
    var builder = new NitrideBuilder(args)
        .UseIO(rootDirectory)
        .UseMarkdown()
        .UseHtml()
        .UseModule<WebsiteModule>();
As you can see, I'm trying to follow the generic host pattern for the setup.
Identifying Markdown
While it may seem obvious to convert any entity whose path ends in `.md` into `.html`, we break this apart into separate steps. First, we identify a file as a Markdown file. This does two things: it adds `MfGames.Nitride.Markdown.IsMarkdown` as a component, and it treats the contents as text instead of binary.
If you remember, we previously had this output:
    [00:41:57 INF] <PagesPipeline> Read in 1 files from /src/pages
    [00:41:57 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Contents.IBinaryContent","Zio.UPath"]
    [00:41:57 INF] <SimplifiedMarkdownPipeline> Reading 1 entities
    [00:41:57 INF] <StyleHtmlPipeline> Reading 1 entities
    [00:41:57 INF] <OutputHtmlPipeline> Writing out 1 files
    [00:41:57 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.md, Components ["MfGames.Nitride.Contents.IBinaryContent","Zio.UPath"]
Now, we're going to use a new operation, `MfGames.Nitride.Markdown.IdentifyMarkdownFromPath`. This could be put in a central place, such as `SimplifiedMarkdownPipeline`, but I have found it better to do this sooner rather than later, so I usually put the identification step in the input pipelines. In this case, `PagesPipeline`:
    // In //src/dotnet/Pipelines/Inputs/PagesPipeline.cs
    public PagesPipeline(
        ILogger<PagesPipeline> logger,
        ReadFiles readFiles,
        IdentifyMarkdownFromPath identifyMarkdownFromPath)
    {
        _logger = logger;
        _identifyMarkdownFromPath = identifyMarkdownFromPath;
        _readFiles = readFiles
            .WithPattern("/src/pages/typewriter/**/*.md")
            .WithRemovePathPrefix("/src/pages/typewriter");
    }

    public override IAsyncEnumerable<Entity> RunAsync(
        IEnumerable<Entity> entities,
        CancellationToken cancellationToken = default)
    {
        var list = _readFiles
            .Run(cancellationToken)
            .Run(_identifyMarkdownFromPath)
            .ToList();
This operation doesn't take any parameters because it attempts to “do the right thing” with a minimal amount of effort. Running the code now produces this:
[00:46:43 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]
The big thing is that we have a new component, `IsMarkdown`, and that `IBinaryContent` changed to `ITextContent`, which gives us some additional extension methods and also indicates that the file is text instead of a simple binary. It still hasn't loaded the file into memory; it just switched how the file is handled.
This is a separate step because sometimes I want to keep a specific file as Markdown and not convert it to HTML. Also, there are times when I construct an `Entity` directly without a valid path; I just have to add `IsHtml` and `ITextContent`, and it then acts like every other file without needing any special rules.
Content
Entities with `ITextContent` have a number of useful extension methods associated with them. (Technically, you can use these methods against any `Entity`, and it will try to convert binary to text if there is an `IBinaryContent` and we want an `ITextContent`.)
    bool hasText = entity.HasTextContent();
    string text = entity.GetTextContent();
    entity.SetTextContent(stringValue);
    entity.SetTextContent(stringBufferValue);
These are also stored as an `ITextContent` instead of a `string` or `StringBuffer`. This is because the default behavior is to leave the content on the disk and use [[Zio]] to retrieve it. However, as soon as `SetTextContent` is used, the value is kept in memory for the rest of the execution. This is the point where memory pressure begins to increase.
In the future, we could easily create a `ITextContent` implementation that writes large text files to the disk to get them out of memory. However, even my largest chapter of twenty-five thousand words doesn't create too much of a problem so I haven't bothered trying to implement that at this point (but if I did, it would go into `//.cache` in some manner).
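To make the disk-versus-memory distinction concrete, here is a minimal sketch of the idea using only the standard library. It is not how Nitride implements `ITextContent`; the names `ITextSource`, `FileTextSource`, and `MemoryTextSource` are hypothetical and exist only for this illustration, and plain `System.IO` stands in for Zio.

```csharp
using System;
using System.IO;

// Write a small file so the example is self-contained.
string path = Path.Combine(Path.GetTempPath(), "nitride-sketch.md");
File.WriteAllText(path, "# Typewriter Press\n");

// The content starts out backed by the file: nothing is read until asked.
ITextSource content = new FileTextSource(path);
Console.WriteLine(content.IsInMemory);        // False
Console.WriteLine(content.GetText().Trim());  // # Typewriter Press

// "Setting" text swaps in a memory-backed source, which is where
// the memory cost starts to be paid.
content = new MemoryTextSource("<h1>Typewriter Press</h1>");
Console.WriteLine(content.IsInMemory);        // True

File.Delete(path);

// Hypothetical stand-in for a text content abstraction.
interface ITextSource
{
    bool IsInMemory { get; }
    string GetText();
}

// Reads from disk on every call and keeps nothing in memory.
sealed class FileTextSource : ITextSource
{
    private readonly string _path;
    public FileTextSource(string path) => _path = path;
    public bool IsInMemory => false;
    public string GetText() => File.ReadAllText(_path);
}

// Holds the text directly for the rest of the run.
sealed class MemoryTextSource : ITextSource
{
    private readonly string _text;
    public MemoryTextSource(string text) => _text = text;
    public bool IsInMemory => true;
    public string GetText() => _text;
}
```

A disk-spilling variant like the one described above would be a third implementation of the same interface, which is why keeping the abstraction instead of a bare `string` leaves that door open.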
Converting Markdown to HTML
Just identifying a file as Markdown doesn't do anything in itself. To convert it, we use another operation, `MfGames.Nitride.Markdown.ConvertMarkdownToHtml`. This easily goes into the `StyleHtmlPipeline` to handle the conversion and styling. It allows any MarkDig extension or plugin to be called as part of the setup, allowing one to customize exactly how the output is generated.
We also want to change the extension from `.md` to `.html`. I realize I should have baked that logic into the `ConvertMarkdownToHtml` since it is a common operation, which I will do, but for now, we also need to use the `MfGames.Nitride.IO.Paths.ChangePathExtension` to do that.
    public StyleHtmlPipeline(
        ILogger<StyleHtmlPipeline> logger,
        SimplifiedMarkdownPipeline simplifiedMarkdownPipeline,
        ConvertMarkdownToHtml convertMarkdownToHtml,
        ChangePathExtension changePathExtension)
    {
        _logger = logger;
        _changePathExtension = changePathExtension.WithExtension(".html");
        _convertMarkdownToHtml = convertMarkdownToHtml
            .WithConfigureMarkdown(builder =>
            {
                SmartyPantOptions smartyPantOptions = new();
                SmartyPantsExtension smartyPants = new(smartyPantOptions);
                builder
                    .Use<GenericAttributesExtension>()
                    .Use(smartyPants);
            });
        AddDependency(simplifiedMarkdownPipeline);
    }

    public override IAsyncEnumerable<Entity> RunAsync(
        IEnumerable<Entity> entities,
        CancellationToken cancellationToken = default)
    {
        var list = entities
            .Run(_convertMarkdownToHtml, cancellationToken)
            .Run(_changePathExtension, cancellationToken)
            .ToList();
        _logger.LogInformation("Reading {Count:N0} entities", list.Count);
        return list.ToAsyncEnumerable();
    }
Running this gets us:
    $ just build
    [01:00:43 INF] <PagesPipeline> Read in 1 files from /src/pages
    [01:00:43 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]
    [01:00:43 INF] <SimplifiedMarkdownPipeline> Reading 1 entities
    [01:00:44 INF] <StyleHtmlPipeline> Reading 1 entities
    [01:00:44 INF] <OutputHtmlPipeline> Writing out 1 files
    [01:00:44 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.html, Components ["MfGames.Nitride.Html.IsHtml","Zio.UPath","MfGames.Nitride.Contents.ITextContent"]
    $ find build -type f
    build/typewriter/html/index.html
    $ cat src/pages/typewriter/index.md
    # Typewriter Press
    $ cat build/typewriter/html/index.html
    <h1>Typewriter Press</h1>
And now our over-engineered copy method has duplicated `markdown2html` for a single file.
Components
You may notice that `IsMarkdown` has been removed and `IsHtml` was added. This is part of where I struggled with [[Statiq]], and it led me down the path of using components. I don't have to pre-define the different data types, purposes, or even formats of a file. Enums are great, but they don't allow easy extension; with an ECS, it's just a matter of adding and removing components based on the use.
I've used components for a lot of things including identifying pages that should be in the blog archives, special notices, or pages that I want to ignore because they are aliases. I also embed indexes and lists into the pages to allow things like the “next” or “previous” links. If I was doing a web comic, I could have a per-character next/previous system easily implemented via those components.
My other static site generators didn't even have the content type tagging Statiq did, which was a novel concept for me and one that I'm glad I had a chance to use. It simplified a lot of my logic and led nicely into where I am today.
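The component swap during conversion can be sketched in a few lines. This is only an illustration of the add-and-remove idea, not Nitride's `Entity` class: `SketchEntity`, `IsMarkdownTag`, and `IsHtmlTag` are hypothetical names, the path is a plain string where Nitride uses Zio's `UPath`, and the actual Markdown rendering is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var page = new SketchEntity("/index.md");
page.Set(new IsMarkdownTag());   // what an identification step would add

// "Convert": drop the Markdown marker, add the HTML one, fix the extension.
if (page.Has<IsMarkdownTag>())
{
    page.Remove<IsMarkdownTag>();
    page.Set(new IsHtmlTag());
    page.Path = System.IO.Path.ChangeExtension(page.Path, ".html");
}

// Prints: /index.html: IsHtmlTag
Console.WriteLine($"{page.Path}: {string.Join(", ", page.ComponentNames)}");

// Marker components carry no data; their presence is the information.
sealed class IsMarkdownTag { }
sealed class IsHtmlTag { }

// Hypothetical entity: a path plus a bag of components keyed by type.
sealed class SketchEntity
{
    private readonly Dictionary<Type, object> _components = new();

    public SketchEntity(string path) => Path = path;

    public string Path { get; set; }

    public void Set<T>(T component) where T : notnull =>
        _components[typeof(T)] = component;

    public bool Has<T>() => _components.ContainsKey(typeof(T));

    public bool Remove<T>() => _components.Remove(typeof(T));

    public IEnumerable<string> ComponentNames =>
        _components.Keys.Select(t => t.Name).OrderBy(n => n);
}
```

Adding a new kind of page here means declaring one more marker class; nothing central, such as an enum, has to change.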
Planning Ahead
As I'm planning ahead, I'm going to make the following changes:
- Rename `SimplifiedMarkdownPipeline` to `ContentPipeline`
- Move the logic I just added into a new pipeline called `BareHtmlPipeline` and insert it between `ContentPipeline` and `StyleHtmlPipeline`
The reason for this is that RSS/Atom feeds use bare HTML to generate their content, so it makes sense to have that bare pipeline feed both of them while having the `ContentPipeline` handle a lot of the linking and references that we'll need.
Directory Paths
One last thing for this post: I prefer paths that end in directory slashes instead of files. So, if we create a contact page at `//src/pages/typewriter/contact.md`, we want the HTML to be at `https://typewriter.press/contact/`. This is, creatively enough, another operation: `MfGames.Nitride.IO.Paths.MoveToIndexPath`.
    // In //src/dotnet/Pipelines/Inputs/PagesPipeline.cs
    public PagesPipeline(
        ILogger<PagesPipeline> logger,
        ReadFiles readFiles,
        IdentifyMarkdownFromPath identifyMarkdownFromPath,
        MoveToIndexPath moveToIndexPath)
    {
        _logger = logger;
        _identifyMarkdownFromPath = identifyMarkdownFromPath;
        _moveToIndexPath = moveToIndexPath;
        _readFiles = readFiles
            .WithPattern("/src/pages/typewriter/**/*.md")
            .WithRemovePathPrefix("/src/pages/typewriter");
    }

    public override IAsyncEnumerable<Entity> RunAsync(
        IEnumerable<Entity> entities,
        CancellationToken cancellationToken = default)
    {
        var list = _readFiles
            .Run(cancellationToken)
            .Run(_identifyMarkdownFromPath)
            .Run(_moveToIndexPath)
            .ToList();
After we add a `contact.md` page, a run gives us this:
    $ just build
    [01:12:26 INF] <PagesPipeline> Entity: Path /contact/index.md, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath"]
    [01:12:26 INF] <PagesPipeline> Entity: Path /index.md, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Markdown.IsMarkdown","Zio.UPath"]
    [01:12:26 INF] <ContentPipeline> Reading 2 entities
    [01:12:26 INF] <BareHtmlPipeline> Reading 2 entities
    [01:12:26 INF] <StyleHtmlPipeline> Reading 2 entities
    [01:12:26 INF] <OutputHtmlPipeline> Writing out 2 files
    [01:12:26 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/contact/index.html, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Html.IsHtml","Zio.UPath"]
    [01:12:26 INF] <OutputHtmlPipeline> Entity: Path /build/typewriter/html/index.html, Components ["MfGames.Nitride.Contents.ITextContent","MfGames.Nitride.Html.IsHtml","Zio.UPath"]
    $ cat src/pages/typewriter/contact.md
    # Contact Us

    Our emails is [contact@typewriter.press](mailto:contact@typewriter.press).
    $ cat build/typewriter/html/contact/index.html
    <h1>Contact Us</h1>
    <p>Our emails is <a href="mailto:contact@typewriter.press">contact@typewriter.press</a>.</p>
As you can see, the input is `//src/pages/typewriter/contact.md`, but through the pipeline, it is written out as `//build/typewriter/html/contact/index.html`.
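The path rewrite itself is simple enough to sketch. The helper below is a guess at the behavior seen in the log output above (`/contact.md` becomes `/contact/index.md`, while `/index.md` is left alone); `ToIndexPath` is a hypothetical name, it works on plain strings rather than Zio's `UPath`, and it is not the code inside `MoveToIndexPath`.

```csharp
using System;

// Hypothetical helper: move "/name.ext" to "/name/index.ext".
static string ToIndexPath(string path)
{
    int slash = path.LastIndexOf('/');
    string directory = path[..(slash + 1)];   // "/" or "/blog/"
    string file = path[(slash + 1)..];        // "contact.md"

    int dot = file.LastIndexOf('.');
    string name = dot < 0 ? file : file[..dot];
    string extension = dot < 0 ? "" : file[dot..];

    // Already an index file, so there is nothing to move.
    if (name == "index")
    {
        return path;
    }

    return $"{directory}{name}/index{extension}";
}

Console.WriteLine(ToIndexPath("/contact.md"));     // /contact/index.md
Console.WriteLine(ToIndexPath("/index.md"));       // /index.md
Console.WriteLine(ToIndexPath("/blog/first.md"));  // /blog/first/index.md
```

Doing this while the file is still Markdown means the later extension change only has to turn `index.md` into `index.html`, which matches the order of the log lines above.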
What's Next
Next up, handling front matter. We need that for many reasons, but one of the biggest reasons is to provide metadata to generate styled HTML output.