md4r

Lifecycle: experimental R-CMD-check

Provides an R wrapper for the MD4C (Markdown for C) library. Functions exist for markdown parsing (CommonMark compliant) along with support for other common markdown extensions (e.g. GitHub Flavored Markdown and LaTeX equations). The package also provides a number of high-level functions for exploring and manipulating markdown ASTs as well as translating and displaying the documents.

Installation

Install md4r from CRAN:

install.packages("md4r")

or install the latest development version from GitHub:

remotes::install_github("rundel/md4r")

Example

We will start with a simple example of parsing a markdown file using the basic CommonMark dialect.

md_file = system.file("examples/commonmark.md", package = "md4r")
readLines(md_file) |> cat(sep='\n')
#> ## Try CommonMark
#> 
#> You can try CommonMark here.  This dingus is powered by
#> [commonmark.js](https://github.com/commonmark/commonmark.js), the
#> JavaScript reference implementation.
#> 
#> 1. item one
#> 2. item two
#>    - sublist
#>    - sublist

This file (or markdown text) can be processed using the parse_md() function, which creates an abstract syntax tree (AST) representation of the document (as a list of lists of lists … with custom S3 classes).

library(md4r)
(md = parse_md(md_file))
#> md_block_doc [flags: "MD_DIALECT_COMMONMARK"]
#> ├── md_block_h [level: 2]
#> │   └── md_text_normal - "Try CommonMark"
#> ├── md_block_p
#> │   ├── md_text_normal - "You can try CommonMark here.  This dingus is powered by"
#> │   ├── md_text_softbreak
#> │   ├── md_span_a [title: "", href: "https://github.com/commonmark/commonmark.js"]
#> │   │   └── md_text_normal - "commonmark.js"
#> │   ├── md_text_normal - ", the"
#> │   ├── md_text_softbreak
#> │   └── md_text_normal - "JavaScript reference implementation."
#> └── md_block_ol [start: 1, tight: 1, mark_delimiter: "."]
#>     ├── md_block_li
#>     │   └── md_text_normal - "item one"
#>     └── md_block_li
#>         ├── md_text_normal - "item two"
#>         └── md_block_ul [tight: 1, mark: "-"]
#>             ├── md_block_li
#>             │   └── md_text_normal - "sublist"
#>             └── md_block_li
#>                 └── md_text_normal - "sublist"
str(md)
#> List of 3
#>  $ :List of 1
#>   ..$ : 'md_text_normal' chr "Try CommonMark"
#>   ..- attr(*, "level")= num 2
#>   ..- attr(*, "class")= chr [1:3] "md_block_h" "md_block" "md_node"
#>  $ :List of 6
#>   ..$ : 'md_text_normal' chr "You can try CommonMark here.  This dingus is powered by"
#>   ..$ : list()
#>   .. ..- attr(*, "class")= chr [1:3] "md_text_softbreak" "md_text" "md_node"
#>   ..$ :List of 1
#>   .. ..$ : 'md_text_normal' chr "commonmark.js"
#>   .. ..- attr(*, "title")= chr ""
#>   .. ..- attr(*, "href")= chr "https://github.com/commonmark/commonmark.js"
#>   .. ..- attr(*, "class")= chr [1:3] "md_span_a" "md_span" "md_node"
#>   ..$ : 'md_text_normal' chr ", the"
#>   ..$ : list()
#>   .. ..- attr(*, "class")= chr [1:3] "md_text_softbreak" "md_text" "md_node"
#>   ..$ : 'md_text_normal' chr "JavaScript reference implementation."
#>   ..- attr(*, "class")= chr [1:3] "md_block_p" "md_block" "md_node"
#>  $ :List of 2
...
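
To make the "list of lists with S3 classes" idea concrete, here is a minimal sketch that builds a single heading node by hand in base R, without calling md4r. The attribute and class names are copied from the str() output above. The full class vector on the text leaf is an assumption inferred from the pattern on the other nodes, since str() only prints the first class of a character vector.

```r
# Assumed class vector for a text leaf (str() above shows only the first class).
text_node <- structure(
  "Try CommonMark",
  class = c("md_text_normal", "md_text", "md_node")
)

# A heading block: a list of children plus a "level" attribute and S3 classes,
# mirroring the first element of the str() output.
heading <- structure(
  list(text_node),
  level = 2,
  class = c("md_block_h", "md_block", "md_node")
)

is.list(heading)               # still an ordinary list underneath
inherits(heading, "md_block")  # class checks work at any level of the hierarchy
attr(heading, "level")         # node metadata is stored as attributes
as.character(heading[[1]])     # the text leaf holds the actual content
```

Because nodes are plain lists with attributes, the usual base R tools (`[[`, attr(), inherits()) should apply to a parsed document as well.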

As the AST is just a collection of R lists, we can use subsetting to extract specific elements of the document:

parse_md(md_file)[[1]]
#> md_block_h [level: 2]
#> └── md_text_normal - "Try CommonMark"
parse_md(md_file)[[2]]
#> md_block_p
#> ├── md_text_normal - "You can try CommonMark here.  This dingus is powered by"
#> ├── md_text_softbreak
#> ├── md_span_a [title: "", href: "https://github.com/commonmark/commonmark.js"]
#> │   └── md_text_normal - "commonmark.js"
#> ├── md_text_normal - ", the"
#> ├── md_text_softbreak
#> └── md_text_normal - "JavaScript reference implementation."
parse_md(md_file)[[3]]
#> md_block_ol [start: 1, tight: 1, mark_delimiter: "."]
#> ├── md_block_li
#> │   └── md_text_normal - "item one"
#> └── md_block_li
#>     ├── md_text_normal - "item two"
#>     └── md_block_ul [tight: 1, mark: "-"]
#>         ├── md_block_li
#>         │   └── md_text_normal - "sublist"
#>         └── md_block_li
#>             └── md_text_normal - "sublist"
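
Subsetting can also be chained to reach nested nodes, and attr() reads the metadata shown in square brackets in the tree output. The sketch below rebuilds the ordered list by hand so that it runs without md4r. The helpers text_node() and block_node() are hypothetical and not part of the package, and the class vectors are assumed from the str() output earlier.

```r
# Hypothetical helpers (not part of md4r) that mimic the node structure.
text_node <- function(x) {
  structure(x, class = c("md_text_normal", "md_text", "md_node"))
}
block_node <- function(children, cls, ...) {
  structure(children, ..., class = c(cls, "md_block", "md_node"))
}

# Hand-built copy of the ordered list from the example document.
ol <- block_node(
  list(
    block_node(list(text_node("item one")), "md_block_li"),
    block_node(
      list(
        text_node("item two"),
        block_node(
          list(
            block_node(list(text_node("sublist")), "md_block_li"),
            block_node(list(text_node("sublist")), "md_block_li")
          ),
          "md_block_ul", tight = 1, mark = "-"
        )
      ),
      "md_block_li"
    )
  ),
  "md_block_ol", start = 1, tight = 1, mark_delimiter = "."
)

nested <- ol[[2]][[2]]            # second item, then its nested unordered list
class(nested)[1]                  # node type
attr(nested, "mark")              # bullet character used in the source
as.character(nested[[1]][[1]])    # text of the first nested item
```

With a document parsed by parse_md(), the same path would presumably be md[[3]][[2]][[2]].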

More advanced tools like rapply() can also be used to extract text content:

rapply(md, as.character, "md_text")
#> [1] "Try CommonMark"                                         
#> [2] "You can try CommonMark here.  This dingus is powered by"
#> [3] "commonmark.js"                                          
#> [4] ", the"                                                  
#> [5] "JavaScript reference implementation."                   
#> [6] "item one"                                               
#> [7] "item two"                                               
#> [8] "sublist"                                                
#> [9] "sublist"
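
rapply() only applies its function to non-list leaves, so attributes stored on list nodes (such as the href of a link span) are not reachable that way. One option is a small recursive walker. The sketch below is base R only and runs on a hand-built paragraph. collect_attr() and text_node() are hypothetical helpers, not md4r functions, and the class vectors are assumed from the str() output earlier.

```r
# Hypothetical helper: text leaf with the assumed class vector.
text_node <- function(x) {
  structure(x, class = c("md_text_normal", "md_text", "md_node"))
}

# Hypothetical helper: walk a nested list and collect attribute `name`
# from every node that inherits from class `cls`.
collect_attr <- function(node, cls, name) {
  found <- if (inherits(node, cls)) list(attr(node, name)) else list()
  if (is.list(node)) {
    for (child in unclass(node)) {
      found <- c(found, collect_attr(child, cls, name))
    }
  }
  found
}

# Hand-built paragraph containing one link, mirroring the example document.
link <- structure(
  list(text_node("commonmark.js")),
  title = "",
  href = "https://github.com/commonmark/commonmark.js",
  class = c("md_span_a", "md_span", "md_node")
)
para <- structure(
  list(text_node("This dingus is powered by"), link, text_node(", the")),
  class = c("md_block_p", "md_block", "md_node")
)
doc <- list(para)

hrefs <- unlist(collect_attr(doc, "md_span_a", "href"))
hrefs
```

The same function could be pointed at other node types, for example collecting "level" from "md_block_h" nodes to outline a document.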

Additionally, the AST and any of its components can be converted back into markdown

to_md(md) |> cat(sep='\n')
#> ## Try CommonMark
#> You can try CommonMark here.  This dingus is powered by
#> [commonmark.js](<https://github.com/commonmark/commonmark.js>), the
#> JavaScript reference implementation.
#> 
#>  1. item one
#>  2. item two
#>      - sublist
#>      - sublist

or into HTML

to_html(md) |> cat(sep='\n')

Try CommonMark

You can try CommonMark here. This dingus is powered by commonmark.js, the JavaScript reference implementation.

  1. item one
  2. item two
    • sublist
    • sublist
