Cross References in Notes

My notes are a continuous struggle. Because I render them with Hugo, a fabulous static site generator, I kept links in Hugo-friendly form: [foo]({{< ref "foo.md" >}}). Unfortunately only Hugo understands this syntax, and sometimes I want to read and navigate my notes on my phone. But fear no more: I fixed it!

I create Makefiles for almost all of my projects. They’re great: they provide an easy-to-remember facade for complex tasks and have shell completion for their targets, so choosing what to run usually boils down to a little tabbing through the list. How does that relate to having and not having Hugo-like references? What should we do to have our cake and eat it too? Add a little preprocessing to the Makefile, of course!

First of all, I needed to convert all of my Hugo-like refs back to ordinary Markdown links. I used some magic sed for it, which worked for ~95% of the original refs; the rest I fixed manually. It was a one-time thing and unfortunately I don’t remember the exact regular expression, but it was something like sed -i -E -e 's/{{< ref "(.+)" >}}/\1/g'. Unhandled cases? The relref shortcode (obviously, I just HAD to use it at least once somewhere) and several refs on a single line (.+ is a greedy match).
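The greedy-match failure is easy to reproduce. The sketch below runs on a made-up sample line and escapes the braces, which is the safer spelling under -E. The bounded class [^"]+ is a suggested alternative, not necessarily the expression that was originally used.

```shell
# Hypothetical sample line with two refs, for illustration only.
line='See [a]({{< ref "a.md" >}}) and [b]({{< ref "b.md" >}}).'

# Greedy: (.+) stretches from the first opening quote to the last closing
# quote on the line, so the two refs are swallowed by one match.
greedy=$(printf '%s\n' "$line" | sed -E 's/\{\{< ref "(.+)" >\}\}/\1/g')

# Bounded: [^"]+ cannot cross a closing quote, so each ref is matched
# and replaced separately.
bounded=$(printf '%s\n' "$line" | sed -E 's/\{\{< ref "([^"]+)" >\}\}/\1/g')

printf '%s\n' "$greedy" "$bounded"
```

Only the bounded variant yields two clean Markdown links; the greedy one leaves shortcode fragments behind, which matches the cases that had to be fixed by hand.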

Next, I wanted Hugo to understand that any link to an .md file is in fact a cross-reference. That means replacing the bare file name with a shortcode without changing, or even touching, the working copy (because who likes it when their text editor screams that a file has changed when it hasn’t?).

At first I thought I’d be able to use the fancy pipelines which Hugo uses to pre-process assets, but apparently they’re special snowflakes, specialized for minifying JavaScript and compiling Sass to CSS. Not good. So I decided to do the simplest thing ever: copy the whole content directory aside, run sed over the copy, and generate my site from it.

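PUBDIR := /var/www/docs
CONTENT_TMP := /tmp/docs_content_tmp

# Build the site from a preprocessed copy of content/; the working copy is never modified.
build:
	@rm -rf "$(PUBDIR)"
	@mkdir -p "$(PUBDIR)"
	@rm -rf "$(CONTENT_TMP)"
	cp -r content "$(CONTENT_TMP)"

# First script: links to .md files become ref shortcodes.
# Second script: links into files/ become file shortcodes.
	sed -i -E \
	       -e 's,]\(([^)]+\.md)\),]({{< ref "\1" >}}),g' \
	       -e 's,]\((files/[^)]+)\),]({{< file "\1" >}}),g' \
	       `find "$(CONTENT_TMP)" -name "*.md" -type f`

	hugo -d "$(PUBDIR)" -c "$(CONTENT_TMP)"
	@rm -rf "$(CONTENT_TMP)"
.PHONY: build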

I hate that sed, but it does the job, and it’s probably not the worst one I’ve ever written. And hey, it even works for several links on a single line, which wasn’t that trivial. See that [^)] block? It’s a hack for matching any characters, Unicode included, without running past the closing parenthesis: the non-greedy quantifiers that would normally do this are not supported by the BRE and ERE dialects used by GNU tools. Or it depends on the locale. Or on something else obscure enough that I don’t want to think about it.
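To see the negated class at work, here is a small sketch that feeds one made-up line with three links through a ref-rewriting expression. It assumes the pattern targets link destinations ending in .md; the sample line and file names are hypothetical.

```shell
# Hypothetical sample: two note links (one with non-ASCII characters)
# and one external link that must stay untouched.
line='See [first](a.md), [żółw](notatki/żółw.md) and [site](https://example.com/).'

# [^)]+ stops at the first closing parenthesis, so each link is matched
# on its own even though + itself is greedy.
out=$(printf '%s\n' "$line" | sed -E 's,]\(([^)]+\.md)\),]({{< ref "\1" >}}),g')

printf '%s\n' "$out"
```

Both .md links are wrapped separately and the external link is left alone. Note that a destination such as a.md#section would not match this pattern, since it does not end in .md.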

The second -e script additionally detects file attachments, which are stored in each repo’s files directory, and wraps them with a custom {{< file >}} shortcode which handles conversion to absolute paths. For example, the image ![foo](files/bar.jpg) gets converted to ![foo]({{< file "files/bar.jpg" >}}), which in turn is rendered as <img src="http://localhost/docs/0/files/bar.jpg" />.
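The attachment rule can be tried in isolation. In this sketch the helper name to_file_shortcode and the external URL are made up for illustration; the expression itself is the second -e script.

```shell
# Wrap link destinations that start with files/ in the file shortcode.
to_file_shortcode() {
  sed -E 's,]\((files/[^)]+)\),]({{< file "\1" >}}),g'
}

# A local attachment is rewritten...
converted=$(printf '%s\n' '![foo](files/bar.jpg)' | to_file_shortcode)

# ...while a destination that merely contains /files/ is left as it was,
# because the pattern requires files/ right after the opening parenthesis.
external=$(printf '%s\n' '![logo](https://example.com/files/logo.png)' | to_file_shortcode)

printf '%s\n' "$converted" "$external"
```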

The shortcode implementation is a one-liner:

{{- printf "%s/%s/%s" .Site.BaseURL .Page.Section (.Get 0) -}}

I like this approach a lot because the content of my notes is finally independent of any external system or renderer. And now they work rather nicely with Markor.

One thing I’ll surely improve in the future: instead of blindly copying the whole content directory, I’ll copy only the Markdown files. All other files (“attachments”) will be symlinked. That should save a little disk space and a little time spent on unnecessary copying.
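A rough sketch of that improvement could look like the function below. The name stage_content is made up, and it rests on an unverified assumption: whether Hugo follows symlinked files inside the content directory depends on the Hugo version, so that needs checking before relying on it.

```shell
# stage_content SRC DST (hypothetical helper): recreate SRC's directory
# tree in DST, copy Markdown files, and symlink every other file.
stage_content() {
  src=$(cd "$1" && pwd)   # absolute path, so the symlinks stay valid
  dst=$2
  rm -rf "$dst"
  mkdir -p "$dst"

  # Recreate the directory structure first.
  (cd "$src" && find . -type d) | while IFS= read -r d; do
    mkdir -p "$dst/$d"
  done

  # Markdown is copied because sed -i will modify it; the rest is linked.
  (cd "$src" && find . -type f) | while IFS= read -r f; do
    case "$f" in
      *.md) cp "$src/$f" "$dst/$f" ;;
      *)    ln -s "$src/$f" "$dst/$f" ;;
    esac
  done
}
```

In the Makefile, a call such as stage_content content "$(CONTENT_TMP)" would take the place of the cp -r line, with the sed step left unchanged since it only touches .md files.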