Ctags For Markdown Notes

I keep my notes in plain-text Markdown files. I used to manage them with vimwiki, but it felt bloated, and its Markdown dialect differed slightly from the more typical one used by Markor (an Android app I sometimes use to view and edit notes on my phone). After switching from vimwiki to plain Markdown, there was one thing I missed: tags.

I tag my notes, and recently I have started using two types of tags: inline tags, which resemble org-mode tagging (:tag1:tag2:tag3:), and document-wide tags, which go in the YAML Front Matter of my Markdown files. A typical note looks like this:

---
title: Some title
tags: [foo, bar]
---

- Lorem ipsum               :baz:blah:
  - dolor sit amet          :consecteur:

Vimwiki implements its own way of generating a tags file for org-mode style tags. Thanks to that, I could jump in vim directly to the place where a selected tag is found. For example, with :tjump /<tab> I could easily search through all tags in all of my notes.

Today I managed to replicate that tag generation for my plain Markdown notes, thanks to the extensibility of the ctags implementation I use: Universal Ctags.

Long story short, we have to create a file in ~/.ctags.d (Universal Ctags only loads files from that directory whose names end in .ctags) with the following content:

--langdef=notes
--languages=notes
--langmap=notes:.md
--kinddef-notes=t,tag,tags
--kinddef-notes=d,doctag,doctags

--_tabledef-notes=main
--_tabledef-notes=frontmatter
--_tabledef-notes=fmtags

--_mtable-regex-notes=main/---//{tenter=frontmatter}
--_mtable-regex-notes=main/:([a-zA-Z][a-zA-Z0-9]*):/\1/t/{mgroup=1}{_advanceTo=1end}
--_mtable-regex-notes=main/.//

--_mtable-regex-notes=frontmatter/^tags: *//{tenter=fmtags}
--_mtable-regex-notes=frontmatter/---//{tleave}
--_mtable-regex-notes=frontmatter/.//

--_mtable-regex-notes=fmtags/([a-zA-Z][a-zA-Z0-9]*)/\1/d/
--_mtable-regex-notes=fmtags/\]//{tleave}
--_mtable-regex-notes=fmtags/://{tleave}
--_mtable-regex-notes=fmtags/---//{tleave}{_advanceTo=0start}
--_mtable-regex-notes=fmtags/.//

This defines a new regex-based parser for files with the .md extension. The parser is aware of its context (i.e. whether it is inside the Front Matter or not) and works like this:

  1. Start in the main table (context). This is the generic context, which captures org-mode style tags. Thanks to the {_advanceTo=1end} flag, scanning resumes right after the captured tag name, so a single colon between tags acts as a separator (i.e. it is the first character seen by the next iteration of the ctags regex engine).
  2. While in the main context, look out for 3 dashes. When they are found, push the frontmatter context onto the stack. This context is an intermediate one: it pops itself when another 3 dashes occur, or it pushes the fmtags context when the text tags: is found.
  3. While in the fmtags context, any alphanumeric string that starts with a letter is matched as a tag. This automatically handles both forms of YAML lists: one-line inside brackets, and multiline:

    tags: [foo, bar]
    tags:
      - foo
      - bar
    
  4. The fmtags context pops when 3 dashes, a colon (some other metadata), or the closing bracket of a list is found. 3 dashes mean the end of the Front Matter. Thanks to {_advanceTo=0start} they are not consumed, so after the pop the frontmatter table finds them again, pops in turn, and brings us back to the main context.

  5. All tags must start with a letter (and not a digit) to avoid interpreting a time as a tag (e.g. 12:00:13 AM).
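
To make the table switching easier to follow, here is a rough Python sketch that mimics the rules from the config above. It is not how ctags is implemented, only a model of it under a few assumptions of mine: every pattern is tried only at the current position (which is what the . fallback rules rely on), . also matches a newline, and Python's re engine stands in for the ctags one. The names Rule, TABLES, scan and NOTE are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

WORD = r"[a-zA-Z][a-zA-Z0-9]*"  # a tag: a letter followed by letters/digits


class Rule(NamedTuple):
    pattern: re.Pattern
    kind: Optional[str] = None   # kind letter to emit (t or d), None = no tag
    enter: Optional[str] = None  # table to push, like {tenter=...}
    leave: bool = False          # pop the current table, like {tleave}
    advance: str = "0end"        # where scanning resumes, like {_advanceTo=...}


def rule(regex: str, **flags) -> Rule:
    # DOTALL is an assumption: it lets "." skip newlines as well.
    return Rule(re.compile(regex, re.DOTALL), **flags)


# Same rules, in the same order, as the three tables in the config.
TABLES = {
    "main": [
        rule(r"---", enter="frontmatter"),
        rule(rf":({WORD}):", kind="t", advance="1end"),
        rule(r"."),
    ],
    "frontmatter": [
        # No "^" here: match() below already anchors at the current position.
        rule(r"tags: *", enter="fmtags"),
        rule(r"---", leave=True),
        rule(r"."),
    ],
    "fmtags": [
        rule(rf"({WORD})", kind="d"),
        rule(r"\]", leave=True),
        rule(r":", leave=True),
        rule(r"---", leave=True, advance="0start"),
        rule(r"."),
    ],
}


def scan(text: str) -> list:
    """Return (tag, kind) pairs in the order they are found."""
    tags = []
    stack = ["main"]
    pos = 0
    while pos < len(text):
        for r in TABLES[stack[-1]]:
            m = r.pattern.match(text, pos)  # anchored at pos
            if m is None:
                continue
            if r.kind is not None:
                tags.append((m.group(1), r.kind))
            if r.enter is not None:
                stack.append(r.enter)
            elif r.leave and len(stack) > 1:
                stack.pop()
            if r.advance == "1end":
                pos = m.end(1)    # resume on the colon that closes the tag
            elif r.advance == "0start":
                pos = m.start(0)  # do not consume; the outer table sees it again
            else:
                pos = m.end(0)
            break
        else:
            pos += 1  # defensive: no rule matched, skip one character
    return tags


NOTE = """\
---
title: Some title
tags: [foo, bar]
---

- Lorem ipsum               :baz:blah:
  - dolor sit amet          :consecteur:
"""

for name, kind in scan(NOTE):
    print(kind, name)
```

Running it on the example note should report foo and bar as doctags (d) and baz, blah and consecteur as inline tags (t), while a line such as 12:00:13 AM yields nothing.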

The ctags regex parser uses the Extended Regular Expression syntax as defined by POSIX and implemented by glibc. Unfortunately, it has many limitations. For example, my tag-matching regular expression (([a-zA-Z][a-zA-Z0-9]*)) only matches ASCII letters and digits, so it fails on non-ASCII characters, which is kinda braindead in the age of Unicode. For now I have changed some of my Polish tags to their English names, but I’ll keep searching for a more elegant solution.
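
The ASCII limitation is easy to reproduce outside of ctags. The snippet below is only an illustration in Python, whose re module is a different engine from the one ctags uses, so it does not fix the config. The sample line and the names ASCII_TAG and UNICODE_TAG are made up, and the lookahead (?=:) is my stand-in for {_advanceTo=1end}.

```python
import re

# Same character classes as in the ctags config: ASCII letters and digits only.
ASCII_TAG = re.compile(r":([a-zA-Z][a-zA-Z0-9]*)(?=:)")

# Python idiom: [^\W\d_] is "any Unicode letter", [^\W_] is "letter or digit".
UNICODE_TAG = re.compile(r":([^\W\d_][^\W_]*)(?=:)")

line = "- kupić mleko :zakupy:żywność:"

print(ASCII_TAG.findall(line))    # the Polish tag is missed
print(UNICODE_TAG.findall(line))  # both tags are found
print(UNICODE_TAG.findall("12:00:13 AM"))  # times are still not tags
```

The Unicode-aware classes keep the "must start with a letter" rule, so a time is still rejected.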