⚠️ Starting from version 9, all functions are accessible only via the full module path. For example, md_toc.build_toc(...) is now md_toc.api.build_toc(...). ⚠️

Developer Interface

Functions

Important

If you are a developer and need a quick way to generate a TOC, the function you probably want is build_toc.

md_toc.api.get_atx_heading

Given a line, extract the link label and its type.

md_toc.api.get_md_header

Build a data structure with the elements needed to create a TOC line.

md_toc.api.build_toc_line

Build the TOC line.

md_toc.api.increase_index_ordered_list

Compute the current index for ordered list table of contents.

md_toc.api.anchor_link_punctuation_filter

Remove punctuation and other unwanted characters from the anchor link string.

md_toc.api.build_anchor_link

Apply the specified slug rule to build the anchor link.

md_toc.api.build_toc

Build the table of contents of a single file.

md_toc.api.build_multiple_tocs

Parse files by line and build the table of contents of each file.

md_toc.api.write_string_on_file_between_markers

Write the table of contents on a single file.

md_toc.api.write_strings_on_files_between_markers

Write the table of contents on multiple files.

md_toc.api.init_indentation_log

Create a data structure that holds list marker information.

md_toc.api.compute_toc_line_indentation_spaces

Compute the number of indentation spaces for the TOC list element.

md_toc.api.build_toc_line_without_indentation

Return a list element of the table of contents.

md_toc.api.is_valid_code_fence_indent

Determine if the given line has valid indentation for a code block fence.

md_toc.api.is_opening_code_fence

Determine if the given line is possibly the opening of a fenced code block.

md_toc.api.is_closing_code_fence

Determine if the given line is the end of a fenced code block.

md_toc.api.tocs_equal

Check if the TOC already present in a file is the same as the one passed to this function.

md_toc.api.remove_html_tags

Remove HTML tags.

md_toc.api.remove_emphasis

Remove markdown emphasis.

md_toc.api.replace_and_split_newlines

Replace all the newline characters with line feeds and separate the components.

md_toc.api.filter_indices_from_line

Given a line and a list of Python ranges, remove the characters in the ranges.

The main file.

md_toc.api.anchor_link_punctuation_filter(input_string: str, parser: str = 'github') -> str

Remove punctuation and other unwanted characters from the anchor link string.

Parameters:
  • input_string (str) – the unfiltered anchor link

  • parser (str) – decides rules on how to generate anchor links. Defaults to github.

Returns:

a string without the unwanted characters.

Return type:

str

Raises:

a built-in exception.

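md_toc.api.build_anchor_link(header_text_trimmed: str, header_duplicate_counter: HeaderDuplicateCounter, parser: str = 'github') -> str

Apply the specified slug rule to build the anchor link.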

Parameters:
  • header_text_trimmed (str) – the text that needs to be transformed in a link.

  • header_duplicate_counter (types.HeaderDuplicateCounter) – a data structure that keeps track of possible duplicate header links in order to avoid them. This is meaningful only for certain values of parser.

  • parser (str) – decides rules on how to generate anchor links. Defaults to github.

Returns:

None if the specified parser is not recognized, or the anchor link, otherwise.

Return type:

str

Raises:

a built-in exception.

Example:

>>> import md_toc
>>> md_toc.api.build_anchor_link('This is an example test       header', {}, 'gitlab')
'this-is-an-example-test-header'
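
The role of header_duplicate_counter can be illustrated with a minimal sketch that does not call md_toc. The helpers slugify and unique_anchor are hypothetical, and the slug rule used here (lowercase, drop punctuation, spaces to hyphens, a -N suffix for repeats) is only a simplified approximation of GitHub-style anchors, not the library's actual rule set.

```python
import re


def slugify(text: str) -> str:
    """Hypothetical, simplified slug rule (not md_toc's implementation)."""
    text = text.strip().lower()
    # Drop everything except word characters, spaces and hyphens.
    text = re.sub(r'[^\w\- ]', '', text)
    # Each remaining space becomes a hyphen.
    return text.replace(' ', '-')


def unique_anchor(text: str, counter: dict) -> str:
    """Return an anchor, appending -1, -2, ... to repeated slugs."""
    slug = slugify(text)
    seen = counter.get(slug, 0)
    counter[slug] = seen + 1
    return slug if seen == 0 else f'{slug}-{seen}'


counter = {}  # plays the role of header_duplicate_counter; it may start empty
anchors = [unique_anchor(h, counter) for h in ('Usage', 'Usage', 'API notes!')]
print(anchors)  # ['usage', 'usage-1', 'api-notes']
```

The same counter has to be shared across all headers of a document, otherwise repeated headers would receive identical anchors.
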
md_toc.api.build_multiple_tocs(filenames: list[str], ordered: bool = False, no_links: bool = False, no_indentation: bool = False, no_list_coherence: bool = False, keep_header_levels: int = 3, parser: str = 'github', list_marker: str = '-', skip_lines: int = 0, constant_ordered_list: bool = False, newline_string: str = '\n') -> list[str]

Parse files by line and build the table of contents of each file.

Parameters:
  • filenames (list) – the files that need to be read.

  • ordered (bool) – decides whether to build an ordered list or not. Defaults to False.

  • no_links (bool) – disables the use of links. Defaults to False.

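  • no_indentation (bool) – disables indentation in the list. Defaults to False.

  • no_list_coherence (bool) – if set to False, checks header levels for consecutiveness. If they are not consecutive, an exception is raised. Defaults to False.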

  • keep_header_levels (int) – the maximum level of headers to be considered as such when building the table of contents. Defaults to 3.

  • parser (str) – decides rules on how to generate anchor links. Defaults to github.

  • skip_lines (int) – the number of lines to be skipped from the start of file before parsing for table of contents. Defaults to 0.

  • list_marker (str) – a string that contains some of the first characters of the list element. Defaults to -.

  • constant_ordered_list (bool) – use a single integer as list marker. This sets ordered to True.

  • newline_string (str) – the newline separator. Defaults to os.linesep.

Returns:

toc_struct, the corresponding table of contents for each input file.

Return type:

list[str]

Raises:

a built-in exception.

Warning

In case of ordered TOCs you must explicitly pass one of the supported ordered list markers.

md_toc.api.build_toc(filename: str, ordered: bool = False, no_links: bool = False, no_indentation: bool = False, no_list_coherence: bool = False, keep_header_levels: int = 3, parser: str = 'github', list_marker: str = '-', skip_lines: int = 0, constant_ordered_list: bool = False, newline_string: str = '\n') -> str

Build the table of contents of a single file.

Parameters:
  • filename (str) – the file that needs to be read.

  • ordered (bool) – decides whether to build an ordered list or not. Defaults to False.

  • no_links (bool) – disables the use of links. Defaults to False.

  • no_indentation (bool) – disables indentation in the list. Defaults to False.

  • no_list_coherence (bool) – if set to False, checks header levels for consecutiveness. If they are not consecutive, an exception is raised. For example: # ONE\n### TWO\n are not consecutive header levels while # ONE\n## TWO\n are. Defaults to False.

  • keep_header_levels (int) – the maximum level of headers to be considered as such when building the table of contents. Defaults to 3.

  • parser (str) – decides rules on how to generate anchor links. Defaults to github.

  • list_marker (str) – a string that contains some of the first characters of the list element. Defaults to -.

  • skip_lines (int) – the number of lines to be skipped from the start of file before parsing for table of contents. Defaults to 0.

  • constant_ordered_list (bool) – use a single integer as list marker. This sets ordered to True.

  • newline_string (str) – the newline separator. Defaults to os.linesep.

Returns:

toc, the corresponding table of contents of the file.

Return type:

str

Raises:

a built-in exception.

Warning

In case of ordered TOCs you must explicitly pass one of the supported ordered list markers.

Example:

>>> import md_toc
>>> with open('foo.md', 'w') as f:
...     f.write('# This\n# Is an\n## Example\n')
26
>>> print(md_toc.api.build_toc('foo.md'), end='')
- [This](#this)
- [Is an](#is-an)
  - [Example](#example)
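
To make the shape of the result concrete, here is a minimal, self-contained sketch that produces the same output for this input without using md_toc. The function tiny_toc is hypothetical: it only handles simple ATX headings, ignores fenced code blocks and duplicate anchors, and uses a simplified slug rule, so it should be read as an illustration of the idea rather than as the library's algorithm.

```python
import re


def tiny_toc(markdown: str, keep_header_levels: int = 3) -> str:
    """Hypothetical, simplified TOC builder for plain ATX headings."""
    lines = []
    for line in markdown.splitlines():
        # One to six '#' characters, at least one space, then the label.
        match = re.match(r'^(#{1,6})\s+(.*\S)\s*$', line)
        if match is None:
            continue
        level = len(match.group(1))
        if level > keep_header_levels:
            continue  # deeper headings are left out of the TOC
        label = match.group(2)
        # Simplified slug: lowercase, drop punctuation, spaces to hyphens.
        anchor = re.sub(r'[^\w\- ]', '', label.lower()).replace(' ', '-')
        # Two spaces per nesting level, matching the width of '- '.
        lines.append('  ' * (level - 1) + f'- [{label}](#{anchor})')
    return '\n'.join(lines) + '\n' if lines else ''


toc = tiny_toc('# This\n# Is an\n## Example\n')
print(toc, end='')
```
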
md_toc.api.build_toc_line(toc_line_no_indent: str, no_of_indentation_spaces: int = 0) -> str

Build the TOC line.

Parameters:
  • toc_line_no_indent (str) – the TOC line without indentation.

  • no_of_indentation_spaces (int) – the number of indentation spaces. Defaults to 0.

Returns:

toc_line, a single line of the table of contents.

Return type:

str

Raises:

a built-in exception.

Example:

>>> import md_toc
>>> md_toc.api.build_toc_line('', 0)
''

Example:

>>> import md_toc
>>> md_toc.api.build_toc_line('my string', 10)
'          my string'
md_toc.api.build_toc_line_without_indentation(header: Header, ordered: bool = False, no_links: bool = False, index: int = 1, parser: str = 'github', list_marker: str = '-') -> str

Return a list element of the table of contents.

Parameters:
  • header (types.Header) – a data structure that contains the original text, the trimmed text and the type of header.

  • ordered (bool) – if set to True, numbers will be used as list ids, otherwise a dash character. Defaults to False.

  • no_links (bool) – disables the use of links. Defaults to False.

  • index (int) – a number that will be used as list id in case of an ordered table of contents. Defaults to 1.

  • parser (str) – decides rules on how to compute indentations. Defaults to github.

  • list_marker (str) – a string that contains some of the first characters of the list element. Defaults to -.

Returns:

toc_line_no_indent, a single line of the table of contents without indentation.

Return type:

str

Raises:

a built-in exception.

Warning

In case of ordered TOCs you must explicitly pass one of the supported ordered list markers.

md_toc.api.compute_toc_line_indentation_spaces(header_type_curr: int = 1, header_type_prev: int = 0, parser: str = 'github', ordered: bool = False, list_marker: str = '-', indentation_log: dict[md_toc.types.IndentationLogElement] = {1: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}, 2: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}, 3: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}, 4: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}, 5: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}, 6: {'indentation_spaces': 0, 'index': 0, 'list_marker': '-'}}, index: int = 1)

Compute the number of indentation spaces for the TOC list element.

Parameters:
  • header_type_curr (int) – the current type of header (h[1,…,Inf]). Defaults to 1.

  • header_type_prev (int) – the previous type of header (h[1,…,Inf]). Defaults to 0.

  • parser (str) – decides rules on how to compute indentations. Defaults to github.

  • ordered (bool) – if set to True, numbers will be used as list ids instead of dash characters. Defaults to False.

  • list_marker (str) – a string that contains some of the first characters of the list element. Defaults to -.

  • indentation_log (dict[types.IndentationLogElement]) – a data structure that holds list marker information for ordered lists. Defaults to init_indentation_log('github', '.').

  • index (int) – a number that will be used as list id in case of an ordered table of contents. Defaults to 1.

Returns:

None

Return type:

None

Raises:

a built-in exception.

Warning

In case of ordered TOCs you must explicitly pass one of the supported ordered list markers.
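
One way to reason about the indentation is the list nesting rule used by CommonMark-style parsers: a nested item is indented so that it lines up with the text of its parent, that is, the parent's indentation plus the width of its marker plus one space. The sketch below assumes that rule; child_indentation is a hypothetical helper, not part of md_toc, and the library's per-parser rules may differ.

```python
def child_indentation(parent_indentation: int, parent_marker: str) -> int:
    """Spaces needed so a nested item lines up with its parent's text.

    Hypothetical helper: the parent's text is assumed to start right after
    its marker and a single space.
    """
    return parent_indentation + len(parent_marker) + 1


# Unordered list: the marker '-' has the same width at every level.
level_2 = child_indentation(0, '-')        # 2
level_3 = child_indentation(level_2, '-')  # 4

# Ordered list: the marker width depends on the index ('1.' vs '10.').
after_1 = child_indentation(0, '1.')    # 3
after_10 = child_indentation(0, '10.')  # 4

print(' ' * level_2 + '- [Example](#example)')
```

This is why an ordered TOC needs to keep a log of the markers seen at each level: the indentation of a child depends on how wide its parent's marker was.
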

md_toc.api.filter_indices_from_line(line: str, ranges: list[range]) -> str

Given a line and a list of Python ranges, remove the characters in the ranges.

Parameters:
  • line (str) – a string.

  • ranges (list) – a list of Python ranges.

Returns:

the line without the specified indices.

Return type:

str

Raises:

a built-in exception.
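
The behaviour described above can be sketched in a few lines of plain Python. The helper drop_indices is a hypothetical stand-in written for illustration; it is not the library function and makes no claim about how md_toc implements it.

```python
def drop_indices(line: str, ranges: list) -> str:
    """Hypothetical stand-in: remove characters whose index is in any range."""
    excluded = set()
    for r in ranges:
        excluded.update(r)  # a range is iterable, so this adds its indices
    return ''.join(ch for i, ch in enumerate(line) if i not in excluded)


# Remove the two '**' delimiters at indices 0-1 and 6-7.
result = drop_indices('**bold** text', [range(0, 2), range(6, 8)])
print(result)  # bold text
```
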

md_toc.api.get_atx_heading(line: str, keep_header_levels: int = 3, parser: str = 'github', no_links: bool = False) -> list[md_toc.types.AtxHeadingStructElement]

Given a line, extract the link label and its type.

Parameters:
  • line (str) – the line to be examined. This string may include newline characters in between.

  • keep_header_levels (int) – the maximum level of headers to be considered as such when building the table of contents. Defaults to 3.

  • parser (str) – decides rules on how to generate the anchor text. Defaults to github.

  • no_links (bool) – disables the use of links. Defaults to False.

Returns:

struct, a list of dictionaries

Return type:

list[types.AtxHeadingStructElement]

Raises:

GithubEmptyLinkLabel or GithubOverflowCharsLinkLabel or a built-in exception.

Note

License B applies to the github part. See docs/copyright_license.rst.

md_toc.api.get_md_header(header_text_line: str, header_duplicate_counter: HeaderDuplicateCounter, keep_header_levels: int = 3, parser: str = 'github', no_links: bool = False) -> list[md_toc.types.Header]

Build a data structure with the elements needed to create a TOC line.

Parameters:
  • header_text_line (str) – a single markdown line that needs to be transformed into a TOC line. This line may include multiple newline characters in between.

  • header_duplicate_counter (types.HeaderDuplicateCounter) – a data structure that contains the number of occurrences of each header anchor link. This is used to avoid duplicate anchor links and it is meaningful only for certain values of parser.

  • keep_header_levels (int) – the maximum level of headers to be considered as such when building the table of contents. Defaults to 3.

  • parser (str) – decides rules on how to generate anchor links. Defaults to github.

  • no_links (bool) – disables the use of links. Defaults to False.

Returns:

a list whose elements are None if the input line does not correspond to one of the designated cases, or a list of data structures containing the components needed to create a table of contents.

Return type:

list

Raises:

a built-in exception.

Note

This works like a wrapper to other functions.

md_toc.api.increase_index_ordered_list(header_type_count: HeaderTypeCounter, header_type_prev: int, header_type_curr: int, parser: str = 'github')

Compute the current index for ordered list table of contents.

Parameters:
  • header_type_count (types.HeaderTypeCounter) – the count of each header type.

  • header_type_prev (int) – the previous type of header (h[1,…,Inf]).

  • header_type_curr (int) – the current type of header (h[1,…,Inf]).

  • parser (str) – decides rules on how to generate ordered list markers. Defaults to github.

Returns:

None

Return type:

None

Raises:

GithubOverflowOrderedListMarker or a built-in exception.
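
The bookkeeping behind ordered list indices can be sketched as follows. This is an illustration under stated assumptions, not the library's code: next_index is a hypothetical helper, it assumes that numbering restarts whenever a deeper level is entered, and unlike the documented function (which returns None) it also returns the new index for convenience. The overflow check that raises GithubOverflowOrderedListMarker is not modelled.

```python
def next_index(header_type_count: dict, header_type_prev: int,
               header_type_curr: int) -> int:
    """Hypothetical helper: update the per-level counter and return it."""
    # Assumption: entering a deeper level starts a new sub-list,
    # so its numbering restarts from zero before being incremented.
    if header_type_prev < header_type_curr:
        header_type_count[header_type_curr] = 0
    header_type_count[header_type_curr] = (
        header_type_count.get(header_type_curr, 0) + 1
    )
    return header_type_count[header_type_curr]


counts = {}
prev = 0  # no header seen yet
indices = []
for curr in (1, 2, 2, 1, 2):  # header levels in document order
    indices.append(next_index(counts, prev, curr))
    prev = curr
print(indices)  # [1, 1, 2, 2, 1]
```

Read against the header levels, the result corresponds to a list numbered 1., 1.1, 1.2, 2., 2.1 when rendered.
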

md_toc.api.init_indentation_log(parser: str = 'github', list_marker: str = '-') -> dict[md_toc.types.IndentationLogElement]

Create a data structure that holds list marker information.

Parameters:
  • parser (str) – decides rules on how to compute indentations. Defaults to github.

  • list_marker (str) – a string that contains some of the first characters of the list element. Defaults to -.

Returns:

indentation_log, the data structure.

Return type:

dict[types.IndentationLogElement]

Raises:

a built-in exception.

md_toc.api.is_closing_code_fence(line: str, fence: str, is_document_end: bool = False, parser: str = 'github') -> bool

Determine if the given line is the end of a fenced code block.

Parameters:
  • line (str) – a single markdown line to evaluate.

  • fence (str) – a sequence of backticks or tildes marking the start of the current code block. This is usually the return value of the is_opening_code_fence function.

  • is_document_end (bool) – tells the function that the end of the file has been reached. Defaults to False.

  • parser (str) – decides rules on how to generate the anchor text. Defaults to github.

Returns:

True if the line ends the current code block. False otherwise.

Return type:

bool

Raises:

a built-in exception.

Example:

>>> import md_toc
>>> md_toc.api.is_closing_code_fence('```', '```')
True

Example:

>>> import md_toc
>>> md_toc.api.is_closing_code_fence('```', '~~~')
False
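
The two examples follow from the CommonMark rule for closing fences: the closing fence must use the same character as the opening one, be at least as long, be indented by at most three spaces, and be followed only by spaces or tabs. The sketch below encodes just that rule. closes_fence is a hypothetical helper, it assumes a non-empty fence argument, and it does not model is_document_end or any parser-specific differences.

```python
import re


def closes_fence(line: str, fence: str) -> bool:
    """Hypothetical, simplified check based on the CommonMark fence rule.

    Assumes fence is a non-empty run of backticks or tildes.
    """
    # Up to three spaces, a run of one fence character, trailing blanks only.
    match = re.match(r'^ {0,3}(`+|~+)[ \t]*$', line.rstrip('\n'))
    if match is None:
        return False
    candidate = match.group(1)
    # Same character as the opening fence and at least as long.
    return candidate[0] == fence[0] and len(candidate) >= len(fence)


print(closes_fence('```', '```'))   # True
print(closes_fence('```', '~~~'))   # False
print(closes_fence('````', '```'))  # True
print(closes_fence('``', '```'))    # False
```
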
md_toc.api.is_opening_code_fence(line: str, parser: str = 'github') -> str | None

Determine if the given line is possibly the opening of a fenced code block.

Parameters:
  • line (str) – a single markdown line to evaluate.

  • parser (str) – decides rules on how to generate the anchor text. Defaults to github.

Returns:

None if the input line is not an opening code fence. Otherwise, returns the string which will identify the closing code fence according to the input parser's rules.

Return type:

Optional[str]

Raises:

a built-in exception.

md_toc.api.is_valid_code_fence_indent(line: str, parser: str = 'github') -> bool

Determine if the given line has valid indentation for a code block fence.

Parameters:
  • line (str) – a single markdown line to evaluate.

  • parser (str) – decides rules on how to generate the anchor text. Defaults to github.

Returns:

True if the given line has valid indentation or False otherwise.

Return type:

bool

Raises:

a built-in exception.

md_toc.api.remove_emphasis(line: str, parser: str = 'github') -> str

Remove markdown emphasis.

Parameters:
  • line (str) – a string.

  • parser (str) – decides rules on how to find delimiters. Defaults to github.

Returns:

the input line without emphasis.

Return type:

str

Raises:

a built-in exception.

Note

Backslashes are preserved.

Example:

>>> import md_toc
>>> md_toc.api.remove_emphasis('__my string__ *is this* one')
'my string is this one'
md_toc.api.remove_html_tags(line: str, parser: str = 'github') -> str

Remove HTML tags.

Parameters:
  • line (str) – a string.

  • parser (str) – decides rules on how to remove HTML tags. Defaults to github.

Returns:

the input string without HTML tags.

Return type:

str

Raises:

a built-in exception.

md_toc.api.replace_and_split_newlines(line: str) -> list[str]

Replace all the newline characters with line feeds and separate the components.

Parameters:

line (str) – a string.

Returns:

a list of newline separated strings.

Return type:

list[str]

Raises:

a built-in exception.
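
The idea of normalizing newlines before splitting can be sketched in plain Python. The helper split_on_newlines is a hypothetical stand-in that only handles CRLF, CR and LF; the library function may recognize further newline characters.

```python
def split_on_newlines(line: str) -> list:
    """Hypothetical stand-in: normalize CRLF and CR to LF, then split on LF."""
    # Replace CRLF first so that it does not turn into two line feeds.
    normalized = line.replace('\r\n', '\n').replace('\r', '\n')
    return normalized.split('\n')


parts = split_on_newlines('one\r\ntwo\rthree\nfour')
print(parts)  # ['one', 'two', 'three', 'four']
```

Note that with this approach a trailing newline produces an empty string as the last element.
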

md_toc.api.tocs_equal(current_toc: str, filename: str, marker: str) -> bool

Check if the TOC already present in a file is the same as the one passed to this function.

Parameters:
  • current_toc (str) – the new or current TOC. Do not include the <!--TOC-->\n\n and \n\n<!--TOC-->.

  • filename (str) – the file that already contains the TOC to compare against.

  • marker (str) – the TOC marker.

Returns:

True if the two TOCs are the same, False otherwise

Return type:

bool

Raises:

a built-in exception.

md_toc.api.write_string_on_file_between_markers(filename: str, string: str, marker: str, newline_string: str = '\n') -> bool

Write the table of contents on a single file.

Parameters:
  • filename (str) – the file that needs to be read or modified.

  • string (str) – the string that will be written on the file.

  • marker (str) – a marker that will identify the start and the end of the string.

  • newline_string (str) – the new line separator. Defaults to os.linesep.

Returns:

True if the new TOC is the same as the existing one, False otherwise.

Return type:

bool

Raises:

StdinIsNotAFileToBeWritten or an fpyutils exception or a built-in exception.
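
The marker mechanism can be illustrated with a small, self-contained sketch that rewrites the text between two identical markers in a temporary file. It is written under assumptions: replace_between_markers is a hypothetical helper, both markers are assumed to be present already, and the blank lines placed around the TOC are a choice made for this example. It does not reproduce the library's return value or its handling of stdin.

```python
import os
import tempfile


def replace_between_markers(text: str, new_body: str, marker: str) -> str:
    """Hypothetical helper: swap the text between the first two markers."""
    start = text.find(marker)
    if start == -1:
        raise ValueError('opening marker not found')
    body_start = start + len(marker)
    end = text.find(marker, body_start)
    if end == -1:
        raise ValueError('closing marker not found')
    # Keep both markers and surround the new body with blank lines.
    return text[:body_start] + '\n\n' + new_body.strip('\n') + '\n\n' + text[end:]


MARKER = '<!--TOC-->'
original = f'# Title\n\n{MARKER}\n\nold toc\n\n{MARKER}\n\nBody text.\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'doc.md')
    # newline='' keeps the newline characters exactly as written.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(original)
    with open(path, encoding='utf-8', newline='') as f:
        current = f.read()
    updated = replace_between_markers(current, '- [Title](#title)\n', MARKER)
    with open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(updated)
    with open(path, encoding='utf-8', newline='') as f:
        written = f.read()

print(written, end='')
```
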

md_toc.api.write_strings_on_files_between_markers(filenames: list[str], strings: list[str], marker: str, newline_string: str = '\n') -> bool

Write the table of contents on multiple files.

Parameters:
  • filenames (list) – the files that need to be read or modified.

  • strings (list) – the strings that will be written on the files. Each string is associated with one file.

  • marker (str) – a marker that will identify the start and the end of the string.

  • newline_string (str) – the new line separator. Defaults to os.linesep.

Returns:

True if all TOCs are the same as the existing ones, False otherwise.

Return type:

bool

Raises:

an fpyutils exception or a built-in exception.

Exceptions

Exceptions file.

exception md_toc.exceptions.GithubEmptyLinkLabel

The link label contains only whitespace characters or is empty.

exception md_toc.exceptions.GithubOverflowCharsLinkLabel

Cannot parse link label.

exception md_toc.exceptions.GithubOverflowOrderedListMarker

The ordered list marker number is too big.

exception md_toc.exceptions.StdinIsNotAFileToBeWritten

stdin cannot be written onto.

exception md_toc.exceptions.StringCannotContainNewlines

The specified string cannot contain newlines.

exception md_toc.exceptions.TocDoesNotRenderAsCoherentList

TOC list indentations are either wrong or not what the user intended.

Types

Complex dict type definitions.

class md_toc.types.AtxHeadingStructElement

A single element of the list returned by the get_atx_heading function.

Parameters:
  • header_type (int) – h1 to h6 (1 -> 6).

  • header_text_trimmed (str) – the link label.

  • visible (bool) – if the line has a smaller header than keep_header_levels, then visible is set to False.

Note

header_type and header_text_trimmed are set to None if the line does not contain header elements according to the rules of the selected markdown parser. visible is set to True if the line needs to be saved, False if it is only needed for duplicate counting.

class md_toc.types.Header

A header object.

Parameters:
  • header_type (int) – h1 to h6 (1 -> 6).

  • text_original (str) – Raw text.

  • text_anchor_link (str) – Transformed text so it works as an anchor link.

  • visible (bool) – if True the header needs to be visible, if False it will not.

class md_toc.types.HeaderDuplicateCounter

A header_duplicate_counter object.

Parameters:

key – a generic string corresponding to header links. Its value is the number of times key appears during the execution of md-toc.

Note

This dict can be empty.

class md_toc.types.HeaderTypeCounter

The number of headers for each type, from h1 to h6.

class md_toc.types.IndentationLogElement

An indentation_log_element object.

Parameters:
  • index (int) – values: 1 -> md_parser['github']['header']['max levels'] + 1.

  • list_marker (str) – ordered or unordered list marker.

  • indentation_spaces (int) – number of indentation spaces.
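
As an illustration of how such a dict type can be expressed, the sketch below models one indentation log element with typing.TypedDict and builds a six-level log shaped like the default value shown in the compute_toc_line_indentation_spaces signature. The names IndentationLogElementSketch and make_indentation_log are hypothetical and exist only in this example; the actual definitions live in md_toc.types and md_toc.api.

```python
from typing import TypedDict


class IndentationLogElementSketch(TypedDict):
    """Hypothetical mirror of the three documented keys."""
    index: int
    list_marker: str
    indentation_spaces: int


def make_indentation_log(list_marker: str = '-', max_levels: int = 6) -> dict:
    """Build one element per header level, keyed 1..max_levels."""
    return {
        level: IndentationLogElementSketch(
            index=0, list_marker=list_marker, indentation_spaces=0
        )
        for level in range(1, max_levels + 1)
    }


log = make_indentation_log()
print(log[1])
```

At runtime a TypedDict instance is an ordinary dict, so the structure can be read and updated with normal key access.
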