A translated version of the Ruby algorithm is used in md-toc.
The original one is reported here:
I could not find the code directly responsible for the anchor link generation.
Apparently GitHub (and possibly others) filters HTML tags out of the anchor links.
This is an undocumented feature (?), so the
remove_html_tags function was
added to address this problem. Rather than designing a dedicated algorithm to detect HTML tags,
regular expressions are used. All the rules
present in https://spec.commonmark.org/0.28/#raw-html have been followed to the
letter. The regular expressions are divided by type, and each full expression is composed at the end
by concatenating its component strings. For example:
# Comment start.
COS = '<!--'
# Comment text.
COT = '((?!>|->)(?:(?!--).))+(?!-).?'
# Comment end.
COE = '-->'

CO = COS + COT + COE
HTML tags are stripped using the re.sub function, for example:
line = re.sub(CO, str(), line, flags=re.DOTALL)
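The composition and substitution steps above can be put together in a small self-contained sketch. It covers only the HTML comment rule, reusing the three strings shown earlier. The helper name strip_html_comments is hypothetical and is not md-toc's actual remove_html_tags function.

```python
import re

# Component strings, copied from the example above.
COS = '<!--'                             # Comment start.
COT = '((?!>|->)(?:(?!--).))+(?!-).?'    # Comment text.
COE = '-->'                              # Comment end.

# The full expression is the concatenation of the three parts.
CO = COS + COT + COE


def strip_html_comments(line: str) -> str:
    """Remove HTML comments from a line (hypothetical helper, for illustration only)."""
    # re.DOTALL lets '.' match newlines, so multi-line comments are removed too.
    return re.sub(CO, '', line, flags=re.DOTALL)


if __name__ == '__main__':
    print(strip_html_comments('Title<!-- note -->'))
```

Text that does not satisfy the comment rules, such as `<!--> text -->` (comment text starting with `>`), is left untouched by this expression.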
GitHub added an extension in GFM to ignore certain HTML tags, valid at least from versions 0.27.1.gfm.3 to 0.29.0.gfm.0: