A Static Site Generator in PHP

source link: https://devm.io/php/php-static-site

Faster than Hugo or Zola

Can PHP compete with static site generators written in Go or Rust, and even outperform them? If so, which elements are essential for performance? Can these elements be transferred to PHP projects that have nothing to do with a CMS? How can you identify the performance-critical parts of a PHP program, and what can you do with the results?

In this article, we'll introduce techniques such as PHP-FFI, PHP-PECL, and profiling with XHProf. With these techniques, a static site generator written in PHP becomes faster than two well-known generators, Hugo and Zola, which are written in Go and Rust respectively.

What is a Static Site Generator?

Well-known database-backed content management systems (CMS) like WordPress or Joomla use PHP to generate HTML from data in a database during each web request and then deliver it to the user. This means that every request triggers database operations and PHP code execution, not even counting any additional JavaScript code. Flat-file CMSs replace the database with plain file operations.

Static site generators take a fundamentally different approach. Instead of assembling the HTML page dynamically with every user request, all HTML pages are generated in advance. When a user request comes in, the HTML page is already complete and just needs to be delivered by the web server. On the server side, this means that only a classic web server like NGINX or Apache is needed, without dynamic components like PHP, JSP, ASP, or similar. This makes static site generators very attractive for performance and security reasons.

No light without shadow

The disadvantage of a static site generator is that it needs a separate step to deploy the generated pages to the server. Generating all static HTML pages up front also takes time.

A static site generator needs the actual content and one or more templates which the static HTML pages are generated from (Fig. 1).

Fig. 1: Templates and Markdown content become static HTML

The template usually defines the website's header, footer, and navigation bars. Content is often written in Markdown or reStructuredText, but you can also use a markup language of your own.
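To make the idea of Fig. 1 concrete, the following C sketch wraps an already converted HTML body in a fixed header, navigation bar, and footer. The function name `render_page` and the markup are hypothetical; Saaze itself uses Blade (and Simplified Saaze native PHP) as its template engine rather than a hard-coded format string.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical page template: header and navigation, the page's own
 * content, and a footer. The two %s slots receive the page title and
 * the HTML body produced from the Markdown content. */
#define PAGE_TEMPLATE                                   \
    "<!DOCTYPE html>\n"                                 \
    "<html><head><title>%s</title></head>\n"            \
    "<body>\n"                                          \
    "<nav><a href=\"/\">Home</a></nav>\n"               \
    "<main>\n%s</main>\n"                               \
    "<footer>Generated once, served as a static file</footer>\n" \
    "</body></html>\n"

/* Returns a malloc'd HTML page, or NULL on failure; the caller frees it.
 * The title is inserted verbatim, so it must already be HTML-safe. */
char *render_page(const char *title, const char *body_html)
{
    /* First call only measures the required length. */
    int len = snprintf(NULL, 0, PAGE_TEMPLATE, title, body_html);
    if (len < 0)
        return NULL;

    char *page = malloc((size_t)len + 1);
    if (page == NULL)
        return NULL;

    snprintf(page, (size_t)len + 1, PAGE_TEMPLATE, title, body_html);
    return page;
}
```

A generator would call such a function once per content file at build time and write each result to an `.html` file, so that the web server later only has to deliver finished files.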

Over the last few years, a whole range of static site generators has emerged. Popular languages for writing them include Go, Rust, Ruby, JavaScript, Python, and PHP. Table 1 lists a selection of static site generators and their programming languages. Jamstack.org lists over three hundred static site generators, and even that list is not exhaustive.

Generator   Programming Language
---------   --------------------
Hugo        Go
Zola        Rust
Eleventy    JavaScript
GitBook     JavaScript
Pelican     Python
Jekyll      Ruby
Grav        PHP
Saaze       PHP
Nift        C++

Table 1: Selection of static site generators and their programming languages

You might assume that a static site generator written in a compiled language like Go or Rust needs less time to generate all those static pages. We'll see below that a clever combination of PHP-FFI and PHP-PECL can stand up to generators written in well-known compiled languages and even beat them.

Replace WordPress with a static site generator?

For almost ten years, I've run a small blog about computers on WordPress. There are thousands of computer blogs like this, just as there are travel blogs, cooking blogs, SEO blogs, and much more. If you've collected about 300 articles over ten years, you want them to stay intact rather than suddenly break because of changes in WordPress, forcing you to re-enter the content of one or more articles.

But with the introduction of the Gutenberg editor in WordPress, this is exactly what happened: whenever existing content was touched, it was completely ruined. Inserting mathematical formulas in WordPress is also fairly difficult. Furthermore, WordPress can only run a single blog, not several in parallel under a single URL. Therefore, a replacement was necessary. I had often worked with Hugo before and had even written a converter in Go that converts WordPress data into Hugo files. However, while creating the converter, I noticed that Hugo's developers reacted with reserve, or even dismissively, to change requests.

I was looking for a static site generator that was easy to use, because I wanted to quickly switch from WordPress. Ultimately, I wanted to blog and not deal with static site generators per se. I looked at Stati, Pico, and finally, Saaze.

Saaze

Saaze was created by the Scottish designer and developer Gilbert Pellegrom, who previously implemented several other CMSs such as PicoCMS, Baun, Handle, and Circulate. Unfortunately, he no longer develops any of these CMSs himself. I'll come back to the subject of abandonware later. Saaze stands out with the following features:

  • Easy to use and install
  • Easy to host, although hosting static pages is generally not a big problem anyway
  • Saaze can be used both as a static site generator and as a dynamic content generator, so it can also work like a classic flat-file CMS
  • Easy extensibility

Saaze's easy extensibility is important because I wanted to be able to easily embed mathematical formulas, YouTube videos, or tweets. Saaze didn't offer this out of the box, but adding the functionality was straightforward. So the journey with Saaze began.

Saaze content is written in Markdown. Each Markdown file starts with a so-called front matter block in YAML format. It contains information such as the title, the date, the draft status, and, if applicable, the libraries used. A Markdown file with front matter looks like Listing 1.

Listing 1

---
date: "2021-05-18 13:00:00"
title: "Moved Blog To eklausmeier.goip.de"
draft: false
categories: ["WordPress"]
tags: ["Saaze", "PHP", "Go"]
author: "Elmar Klausmeier"
---
 
The blog [eklausmeier.wordpress.com](https://eklausmeier.wordpress.com) is no longer maintained. I moved to [eklausmeier.goip.de](https://eklausmeier.goip.de), i.e., this one. During migration I corrected a couple of minor typos and dead links.
...
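Before the Markdown body can be converted, the front matter has to be separated from it. The following C sketch, with hypothetical names, only locates the opening and closing `---` lines of a file shaped like Listing 1 and returns views into the original text; parsing the YAML itself is left to a real YAML library. It assumes Unix line endings and a front matter block at the very start of the file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: views into the original text, nothing is copied. */
struct split_doc {
    const char *yaml;   /* start of the front matter (after the opening "---\n") */
    size_t yaml_len;    /* length of the front matter, including its last '\n' */
    const char *body;   /* start of the Markdown body (after the closing "---\n") */
};

/* Splits "---\n<yaml>---\n<markdown>" into its two parts.
 * Returns 0 on success, -1 if the text has no complete front matter block. */
int split_front_matter(const char *text, struct split_doc *out)
{
    if (strncmp(text, "---\n", 4) != 0)
        return -1;                      /* no opening "---" line */

    /* Search from the newline that ends the opening delimiter, so that an
     * empty front matter ("---\n---\n") is recognized as well. */
    const char *end = strstr(text + 3, "\n---\n");
    if (end == NULL)
        return -1;                      /* no closing "---" line */

    out->yaml = text + 4;
    out->yaml_len = (size_t)(end + 1 - out->yaml);
    out->body = end + 5;                /* skip "\n---\n" */
    return 0;
}
```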

Saaze uses Blade as its template engine. As mentioned above, static site generators have two disadvantages: deployment and generation time. In my case, deployment is a short shell script that essentially just renames directories. For a small private blog, this is not a big deal. However, generation time becomes an issue when you need to regenerate everything quickly because you've decided to change something after all. It doesn't matter whether the change is made to the template or to the content: the site must be regenerated.
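The deployment script itself is not shown in the article. As a hypothetical C rendering of the same rename idea, the sketch below parks the currently served site under another name and moves the freshly generated one into its place with ISO C `rename()`. Between the two calls the live path briefly does not exist, so this is not a fully atomic switch; on Linux, a symlink swap or `renameat2()` with `RENAME_EXCHANGE` would close that gap.

```c
#include <stdio.h>

/* Hypothetical deployment step: "generate into a new directory, then rename".
 * All three paths must be on the same file system.
 *   live  : directory the web server serves
 *   fresh : directory the generator has just filled
 *   old   : name under which the previous site is kept; should not exist yet
 * Returns 0 on success, -1 on failure. */
int deploy_swap(const char *live, const char *fresh, const char *old)
{
    if (rename(live, old) != 0) {       /* park the currently served site */
        perror("rename live -> old");
        return -1;
    }
    if (rename(fresh, live) != 0) {     /* move the new site into place */
        perror("rename fresh -> live");
        if (rename(old, live) != 0)     /* try to restore the previous site */
            perror("restore old -> live");
        return -1;
    }
    return 0;
}
```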

Making Saaze faster

While installing Saaze with Composer, I noticed that Saaze has a number of annoying dependencies on other PHP libraries. This was bothersome because installing Saaze with PHP 8 didn't work without some acrobatics: Composer had to be run with PHP 7 so that all dependencies could be satisfied. At first, I attributed this to PHP 8 still being new. Once all dependencies are installed, you can run Saaze with PHP 8. But if you look at how Saaze's source code uses these dependencies, you'll see that they only perform trivial tasks. I made a note to tidy this up when I had time. Later, it turned out that these dependencies are one of the reasons for the comparatively long generation times.

But now, the path is mapped out and the roadmap looks like Figure 2:

  • Using FFI to speed up conversion from Markdown to HTML
  • Profiling the application to identify shortcomings
  • Reducing dependencies

Fig. 2: Performance improvement roadmap

You can see that FFI was used before the actual profiling. With a Markdown-based static site generator, it’s obvious that the conversion from Markdown to HTML has a significant impact on runtime. If you don’t know any obvious performance-critical points in your application in advance, then of course, you need to start with profiling.

Simplified Saaze

I call the accelerated version of Saaze Simplified Saaze. The name reflects the fact that Simplified Saaze's code is smaller than Saaze's. Despite the smaller code size, Simplified Saaze offers the following features that Saaze doesn't:

  • Mathematical formulas with MathJax
  • Embedding Twitter, YouTube, Vimeo, and WordPress videos
  • Embedding Mermaid
  • Embedding CodePen
  • Draft mode
  • Option for “ugly URLs”
  • Single generation of Markdown files
  • Native PHP as a template engine

Since PHP 7.4, PHP makes it very easy to call C routines; the emphasis here is on simplicity. FFI stands for Foreign Function Interface. It was programmed by Dmitry Stogov, a developer at Zend, and is based on the LuaJIT FFI by Michael Pall.

PHP is written in C, and extending PHP with C functions was never really difficult, but it was tedious: you had to download and compile the entire PHP source code first and then implement the extension for your specific PHP version. FFI avoids this. It's enough to use FFI::cdef() to specify the signature of the C routine, meaning its name plus its return and argument types, together with the shared library that contains the routine. Once you've done this, you can call the C routine. If it returns a char *, for example, you convert the result into a PHP string with FFI::string(). If the return type is only int or double, the result can be used directly:

$ffj0 = FFI::cdef("double j0(double);", "libm.so.6");
printf("j0(2) = %f<br>\n", $ffj0->j0(2));

Calling C routines couldn’t be any easier. In order to use FFI in PHP, the following must be set in php.ini:

extension=ffi
ffi.enable=true

This setting is best checked with phpinfo(); the output should look like Figure 3.

Fig. 3: phpinfo() output with activated FFI

Because C is so easy to call, C++, Julia, and Go are also at your disposal, as long as their routines are exposed through a C interface. The PHP documentation says about FFI: "Currently, accessing FFI data structures is significantly (about 2 times) slower than accessing native PHP arrays and objects. Therefore, it makes no sense to use the FFI extension for speed; however, it may make sense to use it to reduce memory consumption." I think this is misleading, since one of the main reasons for calling C routines is, of course, speed. In our case, we'll see that calling MD4C cuts Saaze's runtime in half. Note, however, that the speed doesn't quite match that of a native PHP extension, since converting PHP data for processing in C takes some time.

You can compile your C routine as follows:

cc -fPIC -Wall -O2 -shared ...

The options -fPIC and -shared are needed because the C routine has to be in a shared library. It should not go unmentioned that FFI currently has one disadvantage compared to routines implemented in PHP: it isn't directly integrated into Composer yet. If your software embeds C routines via FFI, those routines are, in a sense, foreign objects when installing with Composer and require separate installation steps. As shown above, these steps are neither difficult nor complicated, but Composer doesn't handle them for you.
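To make the FFI workflow concrete, here is a hypothetical shared library whose routine returns a char *, so it exercises the FFI::string() case described above. The file, library, and function names are assumptions for illustration and are not part of Saaze or Simplified Saaze; the header comment shows the build command and the PHP calls that could load it. Because FFI::string() copies the C string into a PHP string, the library also exports a free function so that the buffer allocated in C can be released from PHP.

```c
/* escape.c: a small shared library intended to be loaded with PHP FFI.
 * Build:
 *   cc -fPIC -Wall -O2 -shared -o libescape.so escape.c
 * Hypothetical PHP side:
 *   $ffi = FFI::cdef("char *html_escape(const char *s);
 *                     void html_escape_free(char *p);",
 *                    __DIR__ . "/libescape.so");
 *   $p = $ffi->html_escape("<a & b>");
 *   $html = FFI::string($p);      // copy the C string into a PHP string
 *   $ffi->html_escape_free($p);   // release the memory malloc'd in C
 */
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of s with &, <, >, and " replaced by HTML
 * entities, or NULL if memory runs out. Release it with html_escape_free(). */
char *html_escape(const char *s)
{
    size_t len = 0;
    for (const char *p = s; *p; p++) {      /* first pass: result size */
        switch (*p) {
        case '&':           len += 5; break; /* &amp;  */
        case '<': case '>': len += 4; break; /* &lt; &gt; */
        case '"':           len += 6; break; /* &quot; */
        default:            len += 1; break;
        }
    }

    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    char *q = out;
    for (const char *p = s; *p; p++) {      /* second pass: copy and escape */
        const char *rep = NULL;
        switch (*p) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        }
        if (rep != NULL) {
            size_t n = strlen(rep);
            memcpy(q, rep, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}

/* Frees memory returned by html_escape(); exported so PHP can call it. */
void html_escape_free(char *p)
{
    free(p);
}
```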

FFI compared to PHP Extensions

One of the main advantages of FFI is how simply C routines can be called from PHP. For comparison, here is a brief description of how to create a PHP extension:

  • Download the PHP source code
  • Install the dependencies you need; in my case, on Arch Linux, this was pacman -S tidy freetds
  • Reconstruct the configure parameters with php -i | grep "Configure Command"; the output is a multi-line configure command
  • Run configure with these parameters, then run make

Then, a simple extension looks like Listing 2.

Listing 2

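/* Needs php.h from the PHP source tree and libcob.h from GnuCOBOL */
#include "php.h"
#include <libcob.h>

/* {{{ void test1() */
PHP_FUNCTION(test1) {
  ZEND_PARSE_PARAMETERS_NONE();  /* test1() accepts no arguments */

  php_printf("test1(): The extension %s is loaded + working!\r\n", "callcob");
  cob_init(0, NULL);             /* initialize the COBOL runtime */
}
/* }}} */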

You can see that this is much more cumbersome than the FFI call. Another drawback is that the procedure must be repeated for every new PHP version, for example when changing from PHP 8.0 to 8.1. Good introductions to programming PHP extensions are available from Sara Golemon and from Zend.

Calling MD4C via FFI

Back to Saaze and reducing generation time. One of the essential tasks of a static site generator is converting Markdown to HTML. Some static site generators also accept other input formats, such as reStructuredText; Saaze requires Markdown. Saaze does the conversion with Parsedown Extra, which was programmed primarily by Emanuil Rusev. Replacing Parsedown Extra with MD4C reduces the time for converting Markdown to HTML by a factor of eight, and the total runtime by almost half.
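The two figures are consistent with each other. As a back-of-the-envelope check in the style of Amdahl's law, treat the original total runtime as 1, let f be the share spent on Markdown conversion, and assume (a simplification) that all other work takes as long as before. An eightfold faster conversion then gives:

```latex
T_{\text{new}} = (1 - f) + \frac{f}{8} = 1 - \frac{7}{8} f,
\qquad
T_{\text{new}} = \frac{1}{2} \;\Longrightarrow\; f = \frac{4}{7} \approx 0.57
```

Under these assumptions, Markdown conversion accounted for a little over half of Saaze's original runtime. Since the reduction was "almost" half, the actual share is somewhat below 4/7.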

MD4C is a C library for converting Markdown to HTML, written by Martin Mitas. To get a rough idea of how much faster MD4C is than other implementations, take a look at Tables 2 and 3. The table below shows results from “Why is MD4C so fast?”.

...

Test name                 Input                     MD4C (s)   cmark (s)
-----------------------   -----------------------   --------   ---------
cmark-benchinput.md       (benchmark from cmark)    0.37       0.71
long-block-multiline.md   "foo\n" * 1000000         0.04       0.23
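MD4C's HTML renderer, md_html() from md4c-html.h, doesn't return a string; it passes the generated HTML to a caller-supplied callback in chunks. A C wrapper meant to be called from PHP via FFI therefore has to collect these chunks into one buffer. The sketch below shows only that collecting part. The buffer type and callback name are hypothetical, this is not the author's actual wrapper, and the md_html() call appears only in a comment so that the code compiles without MD4C installed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer for the HTML produced by MD4C. */
struct html_buf {
    char *data;
    size_t len;
    size_t cap;
    int failed;         /* set if an allocation failed */
};

/* Callback with the shape md_html() expects for process_output:
 *   void (*)(const MD_CHAR *text, MD_SIZE size, void *userdata)
 * (MD_CHAR is char and MD_SIZE is unsigned in md4c.h by default).
 * Appends one chunk and keeps the buffer NUL-terminated. */
void append_output(const char *text, unsigned size, void *userdata)
{
    struct html_buf *b = userdata;
    if (b->failed)
        return;

    if (b->len + size + 1 > b->cap) {           /* +1 for the trailing '\0' */
        size_t cap = b->cap ? b->cap : 4096;
        while (b->len + size + 1 > cap)
            cap *= 2;
        char *p = realloc(b->data, cap);
        if (p == NULL) {
            b->failed = 1;
            return;
        }
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, text, size);
    b->len += size;
    b->data[b->len] = '\0';
}

/* With MD4C available, a conversion could look like this:
 *
 *   #include "md4c-html.h"
 *   struct html_buf b = {0};
 *   int rc = md_html(markdown, (MD_SIZE) strlen(markdown),
 *                    append_output, &b, MD_DIALECT_GITHUB, 0);
 *   if (rc != 0 || b.failed) { free(b.data); ... handle the error ... }
 *   b.data now holds the NUL-terminated HTML; free() it when done.
 */
```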
