Markdown To Html Python



In the previous article we looked at what static sites are, and how they work.

Now we will look at how to convert a single markdown file into an HTML file.



The conversion process


This diagram from the previous article shows the basic process for converting a set of markdown files into the required HTML files for a complete website:

This time we will look in more detail at what is involved in converting a single page of markdown into the corresponding HTML file:


Here is an example markdown file, test.md:

This actually isn't a pure markdown file. The top part of the file is meta-data for the page, in a format called yaml. Many static site generators use a similar system. The yaml is contained between the two '---' markers. The rest of the file (after the second '---') is the markdown content of the file. But for brevity we will call the entire file a markdown file.

Converting this page to HTML actually involves 4 separate tasks:

  • Split the file into yaml and markdown parts
  • Extract the meta-data from the YAML.
  • Convert the markdown to an HTML fragment (the page content).
  • Combine the meta-data and page content with the HTML template to create a complete HTML file.

Fortunately, if we use the right Python libraries, each of these steps is very easy.

Splitting the file

This part is fairly standard Python. We read the markdown file in, line by line, and create two strings: ym, which contains the yaml text, and md, which contains the markdown text.

Python allows us to treat a text file as a sequence of lines of text, which we can loop through using a for loop.
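A minimal sketch of the two-loop split, assuming the '---' markers sit on lines of their own. The function name split_front_matter is hypothetical, and an in-memory file stands in for test.md:

```python
import io


def split_front_matter(lines):
    """Split an iterable of lines into (yaml_text, markdown_text).

    Hypothetical helper: assumes the yaml block is delimited by two
    lines that contain only '---'.
    """
    it = iter(lines)

    # First loop: discard lines until the opening '---' marker.
    for line in it:
        if line.strip() == '---':
            break

    # Second loop: collect the yaml lines until the closing '---' marker.
    yaml_lines = []
    for line in it:
        if line.strip() == '---':
            break
        yaml_lines.append(line)

    # Whatever is left on the iterator is the markdown content.
    ym = ''.join(yaml_lines)
    md = ''.join(it)
    return ym, md


# Usage with an in-memory file; a file from open('test.md') works the same way.
sample = io.StringIO("---\ntitle: Test page\n---\nSome *markdown* text.\n")
ym, md = split_front_matter(sample)
```

Because both loops share one iterator, the second loop resumes exactly where the first one stopped, and ''.join(it) consumes whatever lines remain.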


The first loop discards lines until we find the first '---'. The second loop reads all the lines until the next '---'. Those are the yaml_lines. Finally, all the remaining lines after the second '---' are the markdown data.

We join all the yaml_lines to form a string ym. We join all the lines of markdown data to form the string md.

Parsing the yaml data

We will use the Python yaml library to parse the yaml data.
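A minimal sketch, assuming the PyYAML package (imported as yaml) and its safe_load function. The header values are hypothetical; a date field is left out here because yaml converts an unquoted date such as 2020-01-01 into a datetime.date object rather than a string:

```python
import yaml  # PyYAML (pip install pyyaml)

# Hypothetical yaml header text, as produced by the splitting step
ym = """title: My test page
author: A. Writer
tags:
  - python
  - markdown
"""

# safe_load builds only plain Python objects (dicts, lists, strings, ...)
info = yaml.safe_load(ym)
print(info)
# {'title': 'My test page', 'author': 'A. Writer', 'tags': ['python', 'markdown']}
```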

This parses a block of yaml text and creates a dictionary with the result. It holds the same data as the header of the test.md file, but now in the form of a Python dictionary.

Notice that the tags element has a list of values. That is because the yaml header uses a syntax for tags that allows for multiple values.

Converting the markdown data


Here we convert the second part of the file, the markdown data, into an html fragment.
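A minimal sketch, assuming the Python-Markdown package (imported as markdown) and a short, hypothetical markdown string in place of the test.md content:

```python
import markdown  # Python-Markdown (pip install markdown)

# Hypothetical markdown content, as produced by the splitting step
md = "Some **bold** and *italic* text with a [link](https://example.com).\n"

# Convert to an html fragment: no <html> or <body>, just the content tags
content = markdown.markdown(md)
print(content)
# <p>Some <strong>bold</strong> and <em>italic</em> text with a <a href="https://example.com">link</a>.</p>
```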

We are using the markdown library to do the conversion. It takes a markdown-format string and returns an html string.

Applied to the markdown in test.md, it correctly marks up the bold and italic text, the hyperlink, and the image. The markdown function also accepts extensions, for example to provide syntax highlighting, but we aren't using any here.

The output is an html fragment. It places each paragraph inside its own <p> tags, but it doesn't provide higher-level tags such as a body tag. It is assumed that the html fragment will be placed within a full html document (which we will do next).

Creating the full html

We create our final html using a template.

The template used here is just a basic html page. For a real website, you would probably want to use something more sophisticated, maybe a responsive template and some CSS styling.

But the basic method is the same. You use a full html page template, but with placeholders for variable content such as the title of the page, the author's name, and the main content itself.

The placeholders are enclosed in double curly brackets, for example {{title}}. We use the pystache module to substitute real values for the placeholders to create the final html.

The render function accepts the html template, plus a dictionary that maps the placeholder names to their values.

Notice that the info dictionary we are using comes straight from the yaml parser. It already contains entries for the title, author and date. The trick here is to make sure that each placeholder in the html template exactly matches the name of the equivalent field in the yaml header. That way, pystache will look for exactly the keys that the yaml parser stored.

Well, that isn't quite true. The info dictionary doesn't yet have an entry for content, because the content comes from the markdown. So we add an extra element to the dictionary, called 'content', containing the processed markdown content.

The other thing to notice is that we use triple curly brackets for the content placeholder, {{{content}}}, because the content is raw html data:


  • For {{value}}, pystache renders the value assuming it is text that you want to display. If it contains html characters such as <, it escapes them (for example, < becomes &lt;) so that the symbol is displayed as a < in the browser. That is what you would want in the page title, for instance.
  • For {{{value}}}, pystache renders the text unaltered, so if the text contains <p>, it will cause a paragraph break. This is what you want for the page content, which does include paragraph breaks.

Putting it all together

This has taken a bit of explaining, but if you actually look at the code to convert the yaml plus markdown into a final html page, it is remarkably simple.

In the next article we will look at how to build a complete site.

In this little tutorial, I want to show you in 5 simple steps how easy it is to add code syntax highlighting to your blog articles.

There are more sophisticated approaches using static site generators, e.g., Nikola, but the focus here is to give you a brief introduction to how it generally works.

All the files I will be using as examples in this tutorial can be downloaded from the GitHub repository /rasbt/python_reference/tutorials/markdown_syntax_highlighting.

Update:

The people at Webucator (a provider of Python training classes) created a nice video tutorial from this blog post that explains all the essential steps in less than 4 minutes! Check it out on YouTube:

1 - Installing packages

The two packages that we will use are Python-Markdown and Pygments.

Just as the name suggests, Python-Markdown is the Python package that we will use for the Markdown to HTML conversion. The second library, Pygments, will be used to add the syntax highlighting to the code blocks.
Conveniently, both libraries can be installed via pip: pip install markdown and pip install pygments.

(For alternative ways to install the Python-Markdown package, please see the documentation.)

2 - Writing a Markdown document

Now, let us compose a simple Markdown document including some Python code blocks in our favorite Markdown editor.

Note that the syntax highlighting works not only for Python but also for other programming languages.

So in the case of C++, for example:

Since the CodeHilite extension in Python-Markdown uses Pygments, every programming language in Pygments' list of supported languages currently has support for syntax highlighting.
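To check whether a particular language is covered, Pygments can also be queried from Python. A small sketch using pygments.lexers.get_lexer_by_name and get_all_lexers:

```python
from pygments.lexers import get_all_lexers, get_lexer_by_name

# Look up the lexer Pygments uses for a given language alias
lexer = get_lexer_by_name('cpp')
print(lexer.name)  # C++

# Collect the names of every language Pygments can highlight
# (the exact count depends on the installed Pygments version)
names = sorted(name for name, aliases, filenames, mimetypes in get_all_lexers())
print(len(names))
```

get_lexer_by_name raises pygments.util.ClassNotFound for an alias it does not know, which makes it a quick way to test support for a language.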

3 - Converting the Markdown document to HTML

After we have created our Markdown document, we are going to use Python-Markdown directly from the command line to convert it into an HTML document.

Note that we can also import Python-Markdown as a module in our Python scripts, and it comes with a rich repertoire of functions, which are listed in the library reference.

The basic command line usage to convert a Markdown document into HTML would be:

However, since we want to have syntax highlighting for our Python code, we will use Python-Markdown’s CodeHilite extension by providing an additional -x codehilite argument on the command line:

This will create the HTML body with our Markdown code converted to HTML, with the Python code blocks annotated for syntax highlighting.
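The same conversion can be done from a Python script. A minimal sketch, assuming the extension is referred to by its short name 'codehilite' and that the code block starts with CodeHilite's ':::python' first-line marker to select the language (the sample document is hypothetical):

```python
import markdown  # Python-Markdown; CodeHilite also needs Pygments for colors

# Hypothetical document: an indented code block whose first line,
# ':::python', tells CodeHilite which lexer to use (the line is removed).
md_text = """A small example:

    :::python
    def hello_world():
        print('hello world')
"""

# Roughly what passing -x codehilite does on the command line
body = markdown.markdown(md_text, extensions=['codehilite'])

with open('body.html', 'w', encoding='utf-8') as f:
    f.write(body)
```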

4 - Generating the CSS

If we open the body.html file now, which we created in the previous section, we will notice that it doesn’t have the Python code colored yet.

What is missing is the CSS code for adding the colors to our annotated Python code block. But we can simply create such a CSS file via Pygments from the command line.

Note that we usually only need to create the codehilite.css file once and insert a link to it in all the HTML files that we create via Python-Markdown to get the syntax coloring.
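The stylesheet can also be produced from Python. A minimal sketch using Pygments' HtmlFormatter.get_style_defs, assuming the default style and the .codehilite class that CodeHilite puts on its output by default:

```python
from pygments.formatters import HtmlFormatter

# CSS rules for the 'default' Pygments style, each scoped to the
# '.codehilite' wrapper so they only affect highlighted code blocks
css = HtmlFormatter(style='default').get_style_defs('.codehilite')

with open('codehilite.css', 'w', encoding='utf-8') as f:
    f.write(css)
```

Other styles can be used by changing the style argument; the selector passed to get_style_defs just has to match the class on the generated HTML.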

5 - Insert into your HTML body

In order to include a link to the codehilite.css file for syntax coloring in our converted HTML file, we have to add the following line to the header section.

<link rel='stylesheet' type='text/css' href='./codehilite.css'>

Now, we can insert the HTML body (body.html), which was created from our Markdown document, directly into our final HTML file (e.g., our blog article template).
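A minimal sketch of that last step in Python. The build_page helper and the bare-bones page layout are hypothetical stand-ins for a real blog template:

```python
# Hypothetical bare-bones page: the stylesheet <link> sits in <head>,
# and the converted Markdown body is dropped into <body>.
PAGE = """<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" type="text/css" href="{css_href}">
</head>
<body>
{body}
</body>
</html>
"""


def build_page(body_html, css_href='./codehilite.css'):
    """Wrap an html fragment in a complete page that links the Pygments CSS."""
    return PAGE.format(css_href=css_href, body=body_html)


# Example with a stand-in fragment; normally this would be read from body.html
final = build_page('<div class="codehilite"><pre>print(1)</pre></div>')

with open('final.html', 'w', encoding='utf-8') as f:
    f.write(final)
```

In practice, body_html would be the contents of body.html.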


If we open our final.html file in our web browser now, we can see the pretty Python syntax highlighting.

Useful links:

  • languages supported by Pygments