User guide
A simple filter
Suppose we want to create a filter that sets all headers to level 1. To do so, write the following Python script:
"""
Set all headers to level 1
"""

from panflute import *


def action(elem, doc):
    if isinstance(elem, Header):
        elem.level = 1


def main(doc=None):
    return run_filter(action, doc=doc)


if __name__ == '__main__':
    main()
Note
A more complete template is located here.
More complex filters
We might want filters that replace an element instead of just modifying it. For instance, suppose we want to replace all emphasized text with struck-out text:
"""
Replace Emph elements with Strikeout elements
"""

from panflute import *


def action(elem, doc):
    if isinstance(elem, Emph):
        return Strikeout(*elem.content)


def main(doc=None):
    return run_filter(action, doc=doc)


if __name__ == '__main__':
    main()
Or if we want to remove all tables:
"""
Remove all tables
"""

from panflute import *


def action(elem, doc):
    if isinstance(elem, Table):
        return []


def main(doc=None):
    return run_filter(action, doc=doc)


if __name__ == '__main__':
    main()
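The filters above rely on a convention for what action returns: nothing (the element is kept, possibly after being modified in place), a new element (the old one is replaced), or an empty list (the element is deleted). As a rough illustration only, the following pure-Python toy applies the same convention to a flat list of strings. The helper apply_action is hypothetical and not part of panflute, whose real traversal walks a tree of elements rather than a list.

```python
def apply_action(items, action):
    """Toy model of the return-value convention (not panflute's real walk)."""
    result = []
    for item in items:
        out = action(item)
        if out is None:
            result.append(item)   # None: keep the item as it is
        elif isinstance(out, list):
            result.extend(out)    # list: splice in zero or more items
        else:
            result.append(out)    # single value: replace the item
    return result


def action(item):
    if item == "table":
        return []                 # delete
    if item == "emph":
        return "strikeout"        # replace
    return None                   # leave unchanged


print(apply_action(["para", "emph", "table", "para"], action))
```

Returning a list rather than a single value is what lets one element be replaced by several, or by none at all.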
Globals and backmatter
Suppose we want to add a table of contents based on all headers, or move all tables to a specific location in the document. This requires tracking global variables (which can be stored as attributes of doc).
To add a table of contents at the beginning:
"""
Add table of contents at the beginning;
uses optional metadata value 'toc-depth'
"""

from panflute import *


def prepare(doc):
    doc.toc = BulletList()
    doc.depth = int(doc.get_metadata('toc-depth', default=1))


def action(elem, doc):
    if isinstance(elem, Header) and elem.level <= doc.depth:
        item = ListItem(Plain(*elem.content))
        doc.toc.content.append(item)


def finalize(doc):
    doc.content.insert(0, doc.toc)
    del doc.toc, doc.depth


def main(doc=None):
    return run_filter(action, prepare=prepare, finalize=finalize, doc=doc)


if __name__ == '__main__':
    main()
To move all tables to the place where the string $tables is:
"""
Move tables to where the string $tables is.
"""

from panflute import *


def prepare(doc):
    doc.backmatter = []


def action(elem, doc):
    if isinstance(elem, Table):
        doc.backmatter.append(elem)
        return []


def finalize(doc):
    div = Div(*doc.backmatter)
    doc = doc.replace_keyword('$tables', div)


def main(doc=None):
    return run_filter(action, prepare=prepare, finalize=finalize, doc=doc)


if __name__ == '__main__':
    main()
Using the included batteries
There are several functions and methods that make your life easier, such as the replace_keyword method shown above.
Other useful functions include convert_text (to load and parse markdown or other formatted text) and stringify (to extract the underlying text from an element and its children). For metadata, you can use the doc.get_metadata method to extract user-specified options (booleans, strings, etc.).
For instance, you can combine these functions to support include directives, so that a markdown file can pull in and parse the contents of other files:
"""
Panflute filter to allow file includes

Each include statement has its own line and has the syntax:

    $include ../somefolder/somefile

Each include statement must be in its own paragraph. That is, on its own line
and separated by blank lines.

If no extension was given, ".md" is assumed.
"""

import os
import panflute as pf


def is_include_line(elem):
    if len(elem.content) < 3:
        return False
    elif not all(isinstance(x, (pf.Str, pf.Space)) for x in elem.content):
        return False
    elif elem.content[0].text != '$include':
        return False
    elif not isinstance(elem.content[1], pf.Space):
        return False
    else:
        return True


def get_filename(elem):
    fn = pf.stringify(elem, newlines=False).split(maxsplit=1)[1]
    if not os.path.splitext(fn)[1]:
        fn += '.md'
    return fn


def action(elem, doc):
    if isinstance(elem, pf.Para) and is_include_line(elem):

        fn = get_filename(elem)
        if not os.path.isfile(fn):
            return  # missing file: leave the paragraph unchanged

        with open(fn) as f:
            raw = f.read()

        new_elems = pf.convert_text(raw)

        # Alternative A:
        return new_elems
        # Alternative B:
        # div = pf.Div(*new_elems, attributes={'source': fn})
        # return div


def main(doc=None):
    return pf.run_filter(action, doc=doc)


if __name__ == '__main__':
    main()
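To make the file name rule easier to see in isolation, here is a small standard-library sketch that applies the same logic to a plain string instead of a Para element. The function parse_include is a hypothetical name used only for this illustration; the filter itself works on panflute elements as shown above.

```python
import os


def parse_include(line):
    """Return the file name from a '$include <path>' line, else None."""
    parts = line.split(maxsplit=1)
    if len(parts) != 2 or parts[0] != "$include":
        return None
    fn = parts[1].strip()
    if not os.path.splitext(fn)[1]:
        fn += ".md"  # default extension when none was given
    return fn


print(parse_include("$include ../somefolder/somefile"))
print(parse_include("$include notes.txt"))
print(parse_include("Just a normal paragraph"))
```

Only the last path component is checked for an extension, so the dots in ../ do not count as one.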
YAML code blocks
A YAML filter is a filter that parses fenced code blocks that contain YAML metadata. For instance:
Some text
~~~ csv
title: Some Title
has-header: True
---
Col1, Col2, Col3
1, 2, 3
10, 20, 30
~~~
More text
Note that fenced code blocks use three or more tildes or backticks as separators. Within a code block, use three hyphens or three dots to separate the YAML options from the rest of the block.
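The separator rule can be sketched with the standard library alone. The function split_options below is a hypothetical helper, not panflute's implementation: it only splits the text of a code block at the first line made up of three hyphens or three dots and does not parse the YAML. Treating a block without a separator as options only is an assumption of this sketch.

```python
def split_options(text):
    """Split a code block's text into (options_text, data_text).

    The separator is the first line that is exactly '---' or '...'.
    Assumption: with no separator, everything is treated as options.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() in ("---", "..."):
            return "\n".join(lines[:i]), "\n".join(lines[i + 1:])
    return text, ""


block = (
    "title: Some Title\n"
    "has-header: True\n"
    "---\n"
    "Col1, Col2, Col3\n"
    "1, 2, 3\n"
    "10, 20, 30\n"
)
options_text, data_text = split_options(block)
print(options_text)
print(data_text)
```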
As an example, we will design a filter that will be applied to all code blocks with the csv class, like the one shown above. To avoid boilerplate code (such as parsing the YAML part), we use the useful yaml_filter function:
"""
Panflute filter to parse CSV in fenced YAML code blocks
"""

import io
import csv
import panflute as pf


def fenced_action(options, data, element, doc):
    # We'll only run this for CodeBlock elements of class 'csv'
    title = options.get('title', 'Untitled Table')
    title = [pf.Str(title)]
    has_header = options.get('has-header', False)

    with io.StringIO(data) as f:
        reader = csv.reader(f)
        body = []
        for row in reader:
            cells = [pf.TableCell(pf.Plain(pf.Str(x))) for x in row]
            body.append(pf.TableRow(*cells))

    header = body.pop(0) if has_header else None
    table = pf.Table(*body, header=header, caption=title)
    return table


def main(doc=None):
    return pf.run_filter(pf.yaml_filter, tag='csv', function=fenced_action,
                         doc=doc)


if __name__ == '__main__':
    main()
Note
A more complete template is here; a fully developed filter for CSVs is also available.
Note
yaml_filter now accepts a strict_yaml=True option, which allows multiple YAML blocks, with the caveat that every YAML block must start with --- and end with either --- or three dots (...).
Calling external programs
We might also want to embed results from other programs.
One option is to do so from within Python itself.
For instance, we can fetch data from Wikipedia and show it in the document.
The following script replaces links such as [Pandoc](wiki://) with text such as “Pandoc is a free and open-source software document converter…”.
"""
Panflute filter that embeds wikipedia text

Replaces markdown such as [Stack Overflow](wiki://) with the resulting text.
"""

import requests
import panflute as pf


def action(elem, doc):
    if isinstance(elem, pf.Link) and elem.url.startswith('wiki://'):
        title = pf.stringify(elem).strip()
        baseurl = 'https://en.wikipedia.org/w/api.php'
        query = {'format': 'json', 'action': 'query', 'prop': 'extracts',
                 'explaintext': '', 'titles': title}
        r = requests.get(baseurl, params=query)
        data = r.json()
        extract = list(data['query']['pages'].values())[0]['extract']
        extract = extract.split('.', maxsplit=1)[0]
        return pf.RawInline(extract)


def main(doc=None):
    return pf.run_filter(action, doc=doc)


if __name__ == '__main__':
    main()
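The two lines that pull the first sentence out of the response can be tried without a network connection. The sketch below uses a hand-written dictionary with the same nesting the filter expects; the page id, title and extract text are made up for illustration, and first_sentence is a hypothetical helper name.

```python
def first_sentence(data):
    """Return the text before the first period of the first page extract."""
    pages = data["query"]["pages"]
    extract = next(iter(pages.values()))["extract"]
    return extract.split(".", maxsplit=1)[0]


# Hypothetical payload; a real response carries more fields
sample = {
    "query": {
        "pages": {
            "12345": {
                "title": "Pandoc",
                "extract": "Pandoc is a converter. It is free.",
            }
        }
    }
}
print(first_sentence(sample))
```

Splitting on the first period is a simple heuristic: it drops the period itself and will cut early on abbreviations such as "e.g.".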
Alternatively, we might want to run other programs through the shell. For this, explore the shell function.
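For comparison, this is a minimal sketch of running an external program with only Python's standard library. It does not use panflute's shell function, whose signature is not shown in this guide; run_program and its parameters are hypothetical names, and the current Python interpreter stands in for the external program so that the example runs anywhere.

```python
import subprocess
import sys


def run_program(args, stdin_text=""):
    """Run a command, feed stdin_text to its stdin, and return its stdout."""
    result = subprocess.run(
        args,
        input=stdin_text,
        capture_output=True,
        text=True,    # exchange str instead of bytes
        check=True,   # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


# Stand-in external program: upper-case whatever arrives on stdin
command = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
print(run_program(command, "hello").strip())
```

Inside a filter, the returned text could then be wrapped in an element such as Str or CodeBlock before being returned from action.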
Running filters automatically
If you run panflute as a filter (pandoc ... -F panflute), then panflute will run all filters specified in the metadata field panflute-filters. This is faster and more convenient than typing the precise list and order of filters every time the document is converted.
You can also specify the location of the filters with the panflute-path field, which will take precedence over Pandoc’s search locations (., $datadir/filters, and $path).
Example:
---
title: Some title
panflute-filters: [remove-tables, include]
panflute-path: 'panflute/docs/source'
...
Lorem ipsum
For this to work, the filters need to have a very specific structure, with a main() function of the following form:
"""
Pandoc filter using panflute
"""

import panflute as pf


def prepare(doc):
    pass


def action(elem, doc):
    if isinstance(elem, pf.Element) and doc.format == 'latex':
        pass
        # return None -> element unchanged
        # return [] -> delete element


def finalize(doc):
    pass


def main(doc=None):
    return pf.run_filter(action,
                         prepare=prepare,
                         finalize=finalize,
                         doc=doc)


if __name__ == '__main__':
    main()
Note
To be able to run filters automatically, the main function needs to be exactly as shown: it takes an optional argument doc, passes it to run_filter, and returns the result.
Note
You can add panflute-verbose: true to the metadata to display debugging information, including the folders searched and the filters executed.