User guide¶

A Simple filter¶

Suppose we want to create a filter that sets all headers to level 1. To do so, write the following Python script:

"""
Set all headers to level 1
"""

from panflute import *

def action(elem, doc):
    if isinstance(elem, Header):
        elem.level = 1

def main(doc=None):
    return run_filter(action, doc=doc) 

if __name__ == '__main__':
    main()

Note

A more complete template is shown in the section on running filters automatically.

More complex filters¶

We might want filters that replace an element instead of just modifying it. For instance, suppose we want to replace all emphasized text with struck-out text:

"""
Replace Emph elements with Strikeout elements
"""

from panflute import *


def action(elem, doc):
    if isinstance(elem, Emph):
        return Strikeout(*elem.content)


def main(doc=None):
    return run_filter(action, doc=doc)


if __name__ == '__main__':
    main()

Or if we want to remove all tables:

"""
Remove all tables
"""

from panflute import *


def action(elem, doc):
    if isinstance(elem, Table):
        return []


def main(doc=None):
    return run_filter(action, doc=doc)


if __name__ == '__main__':
    main()

Globals and backmatter¶

Suppose we want to add a table of contents based on all headers, or move all tables to a specific location in the document. This requires tracking global variables (which can be stored as attributes of doc).

To add a table of contents at the beginning:

"""
Add table of contents at the beginning;
uses optional metadata value 'toc-depth'
"""

from panflute import *


def prepare(doc):
    doc.toc = BulletList()
    doc.depth = int(doc.get_metadata('toc-depth', default=1))


def action(elem, doc):
    if isinstance(elem, Header) and elem.level <= doc.depth:
        item = ListItem(Plain(*elem.content))
        doc.toc.content.append(item)


def finalize(doc):
    doc.content.insert(0, doc.toc)
    del doc.toc, doc.depth


def main(doc=None):
    return run_filter(action, prepare=prepare, finalize=finalize, doc=doc) 


if __name__ == '__main__':
    main()

To move all tables to the place where the string $tables is:

"""
Move tables to where the string $tables is.
"""

from panflute import *


def prepare(doc):
    doc.backmatter = []


def action(elem, doc):
    if isinstance(elem, Table):
        doc.backmatter.append(elem)
        return []


def finalize(doc):
    # Wrap the collected tables in a Div and put it where '$tables' appears
    div = Div(*doc.backmatter)
    doc.replace_keyword('$tables', div)


def main(doc=None):
    return run_filter(action, prepare=prepare, finalize=finalize, doc=doc)


if __name__ == '__main__':
    main()

Using the included batteries¶

There are several functions and methods that make your life easier, such as the replace_keyword method shown above.

Other useful functions include convert_text (to load and parse Markdown or other formatted text) and stringify (to extract the underlying text from an element and its children). For metadata, you can use the doc.get_metadata method to extract user-specified options (booleans, strings, etc.).

For instance, you can combine these functions to support include directives, so that one Markdown file can pull in and parse the contents of another:

"""
Panflute filter to allow file includes

Each include statement has its own line and has the syntax:

    $include ../somefolder/somefile

Each include statement must be in its own paragraph. That is, in its own line
and separated by blank lines.

If no extension was given, ".md" is assumed.
"""

import os
import panflute as pf


def is_include_line(elem):
    # An include line looks like: Str('$include'), Space, Str(path), ...
    if len(elem.content) < 3:
        return False
    elif not all(isinstance(x, (pf.Str, pf.Space)) for x in elem.content):
        return False
    elif not isinstance(elem.content[0], pf.Str):
        return False
    elif elem.content[0].text != '$include':
        return False
    elif not isinstance(elem.content[1], pf.Space):
        return False
    else:
        return True


def get_filename(elem):
    fn = pf.stringify(elem, newlines=False).split(maxsplit=1)[1]
    if not os.path.splitext(fn)[1]:
        fn += '.md'
    return fn


def action(elem, doc):
    if isinstance(elem, pf.Para) and is_include_line(elem):
        
        fn = get_filename(elem)
        if not os.path.isfile(fn):
            return
        
        with open(fn) as f:
            raw = f.read()

        new_elems = pf.convert_text(raw)
        
        # Alternative A:
        return new_elems
        # Alternative B:
        # div = pf.Div(*new_elems, attributes={'source': fn})
        # return div


def main(doc=None):
    return pf.run_filter(action, doc=doc) 


if __name__ == '__main__':
    main()
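
The file-name handling in get_filename can be checked in isolation with plain strings. The sketch below uses only the standard library; resolve_include is a hypothetical stand-in that takes the already stringified paragraph text rather than an element.

```python
import os


def resolve_include(line):
    """Return the file named by an '$include' line, adding '.md' if needed."""
    # Split once so that everything after '$include' is kept as the path
    fn = line.split(maxsplit=1)[1]
    if not os.path.splitext(fn)[1]:
        fn += '.md'
    return fn


print(resolve_include('$include ../somefolder/somefile'))
print(resolve_include('$include notes/chapter one.txt'))
```

Splitting with maxsplit=1 is what lets a path that contains spaces survive intact.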

YAML code blocks¶

A YAML filter is a filter that parses fenced code blocks that contain YAML metadata. For instance:

Some text

~~~ csv
title: Some Title
has-header: True
---
Col1, Col2, Col3
1, 2, 3
10, 20, 30
~~~

More text

Note that fenced code blocks use three or more tildes or backticks as separators. Within a code block, use three hyphens or three dots to separate the YAML options from the rest of the block.
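
To make the separator rule concrete, the sketch below splits the text of such a block into its options part and its data part using plain string operations. It is an illustration only, not how yaml_filter is implemented; the function name split_yaml_block and its handling of a block without a separator are assumptions of this sketch.

```python
def split_yaml_block(text):
    """Split code-block text into (options_text, data_text)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        # The first line made only of three hyphens or three dots is the separator
        if line.strip() in ('---', '...'):
            return '\n'.join(lines[:i]), '\n'.join(lines[i + 1:])
    # No separator: this sketch treats the whole block as options
    return text, ''


block = """title: Some Title
has-header: True
---
Col1, Col2, Col3
1, 2, 3
10, 20, 30"""

options_text, data_text = split_yaml_block(block)
print(options_text)
print(data_text)
```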

As an example, we will design a filter that is applied to all code blocks with the csv class, like the one shown above. To avoid boilerplate code (such as parsing the YAML part), we use the yaml_filter helper function:

"""
Panflute filter to parse CSV in fenced YAML code blocks
"""

import io
import csv
import panflute as pf


def fenced_action(options, data, element, doc):
    # We'll only run this for CodeBlock elements of class 'csv'
    title = options.get('title', 'Untitled Table')
    title = [pf.Str(title)]
    has_header = options.get('has-header', False)

    with io.StringIO(data) as f:
        reader = csv.reader(f)
        body = []
        for row in reader:
            cells = [pf.TableCell(pf.Plain(pf.Str(x))) for x in row]
            body.append(pf.TableRow(*cells))

    header = body.pop(0) if has_header else None
    table = pf.Table(*body, header=header, caption=title)
    return table


def main(doc=None):
    return pf.run_filter(pf.yaml_filter, tag='csv', function=fenced_action,
                         doc=doc)


if __name__ == '__main__':
    main()
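
The sample block above puts a space after each comma, and csv.reader keeps that space in the cell text by default. The short standard-library sketch below, independent of panflute, shows the difference the skipinitialspace option makes on the same data. Whether to use it in the filter is a design choice this guide does not prescribe.

```python
import csv
import io

data = "Col1, Col2, Col3\n1, 2, 3\n10, 20, 30\n"

# Default reader: the space after each comma stays in the cell text
plain_rows = list(csv.reader(io.StringIO(data)))

# skipinitialspace=True drops whitespace that follows a delimiter
clean_rows = list(csv.reader(io.StringIO(data), skipinitialspace=True))

print(plain_rows[0])
print(clean_rows[0])
```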

Note

A more complete template and a fully developed filter for CSVs are also available.

Note

yaml_filter now accepts a strict_yaml=True option, which allows multiple YAML blocks, with the caveat that every YAML block must start with --- and end with either --- or three dots (...).

Calling external programs¶

We might also want to embed results from other programs.

One option is to do so through Python’s internals. For instance, we can fetch data from Wikipedia and show it in the document. The following script replaces links such as [Pandoc](wiki://) with text such as “Pandoc is a free and open-source software document converter…”.

"""
Panflute filter that embeds Wikipedia text

Replaces markdown such as [Stack Overflow](wiki://) with the resulting text.
"""

import requests
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.Link) and elem.url.startswith('wiki://'):
        title = pf.stringify(elem).strip()
        baseurl = 'https://en.wikipedia.org/w/api.php'
        query = {'format': 'json', 'action': 'query', 'prop': 'extracts',
            'explaintext': '', 'titles': title}
        r = requests.get(baseurl, params=query)
        data = r.json()
        extract = list(data['query']['pages'].values())[0]['extract']
        extract = extract.split('.', maxsplit=1)[0]
        return pf.RawInline(extract)


def main(doc=None):
    return pf.run_filter(action, doc=doc) 


if __name__ == '__main__':
    main()
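
The script above indexes straight into the JSON response, so a page without an extract field would raise a KeyError. A more defensive version of that step might look like the sketch below, which uses only the standard library. The helper name first_sentence is hypothetical, and the two dictionaries are hand-written stand-ins shaped like the fields the script reads, not real Wikipedia data.

```python
def first_sentence(data):
    """Return the first sentence of the first page extract, or None."""
    pages = data.get('query', {}).get('pages', {})
    for page in pages.values():
        extract = page.get('extract')
        if extract:
            # Same rule as the filter above: cut at the first period
            return extract.split('.', maxsplit=1)[0]
    return None


# Hand-written stand-ins for API responses (not real Wikipedia data)
found = {'query': {'pages': {'1': {'extract': 'First sentence. Second one.'}}}}
missing = {'query': {'pages': {'-1': {'title': 'Nope', 'missing': ''}}}}

print(first_sentence(found))
print(first_sentence(missing))
```

Inside action, a None result could simply mean returning nothing, which leaves the original link untouched.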

Alternatively, we might want to run other programs through the shell. For this, explore the shell function.
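
As a rough picture of what running an external program involves, the sketch below sends text to a command and captures its output. It deliberately uses the standard library's subprocess.run rather than panflute's shell function, whose signature is not shown in this guide; run_program is a hypothetical helper name.

```python
import subprocess
import sys


def run_program(args, stdin_text=''):
    """Run a command, feed it text on stdin, and return its stdout as a string."""
    result = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Use the current Python interpreter as a program that is sure to exist
child_code = 'import sys; print(sys.stdin.read().upper())'
output = run_program([sys.executable, '-c', child_code], 'hello')
print(output.strip())
```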

Navigating through the document tree¶

You might wish to apply a filter that depends on the parent or sibling objects of an element. For instance, you might want to modify the first row (TableRow) of a table, or all the Str items nested within a header.

For this, every element has a .parent attribute (and the related .next, .prev, .ancestor(#), .index, and .offset(#) attributes).

For example, the code below will emphasize all text in the last row of every table:

"""
Emphasize text in the last row of every table
"""

import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.TableRow):
        # Exclude table headers (which are not in a list)
        if elem.index is None:
            return

        if elem.next is None:
            pf.debug(elem)
            elem.walk(make_emph)


def make_emph(elem, doc):
    if isinstance(elem, pf.Str):
        return pf.Emph(elem)


def main(doc=None):
    return pf.run_filter(action, doc=doc) 


if __name__ == '__main__':
    main()

Running filters automatically¶

If you run panflute as a filter (pandoc ... -F panflute), then panflute will run all filters specified in the metadata field panflute-filters. This is faster and more convenient than typing the precise list and order of filters each time the document is converted.

You can also specify the location of the filters with the panflute-path field, which will take precedence over Pandoc’s search locations (., $datadir/filters, and $PATH).

Example:

---
title: Some title
panflute-filters: [remove-tables, include]
panflute-path: 'panflute/docs/source'
...

Lorem ipsum

For this to work, the filters need to have a very specific structure, with a main() function of the following form:

"""
Pandoc filter using panflute
"""

import panflute as pf


def prepare(doc):
    pass


def action(elem, doc):
    if isinstance(elem, pf.Element) and doc.format == 'latex':
        pass
        # return None -> element unchanged
        # return [] -> delete element


def finalize(doc):
    pass


def main(doc=None):
    return pf.run_filter(action,
                         prepare=prepare,
                         finalize=finalize,
                         doc=doc) 


if __name__ == '__main__':
    main()

Note

To run filters automatically, the main function must be exactly as shown: it takes an optional argument doc, passes it to run_filter, and returns the result.

Note

You can add panflute-verbose: true to the metadata to display debugging information, including the folders searched and the filters executed.
