Engineering

27 April, 2018

Puppeteer vs WkHtmlToPdf and why I created a new module

Elixir PDF Generation - In one of our recent projects at Coletiv the user had to generate a PDF invoice with the list of orders.

David Magalhães

Software Engineer

Elixir PDF Generation: Puppeteer vs WkHtmlToPdf

In one of our recent projects, users needed to generate a PDF invoice listing the orders they had placed on a platform we built with Elixir. At the time, I didn’t know what the best solution was, or whether a native implementation existed.

With this article I want to share the steps I followed and the decisions I made on the way to the final solution, and explain why I ended up creating our own module. Hopefully it will help you solve a similar problem, and perhaps encourage you to contribute to the package we created.

I started by researching existing solutions and quickly concluded that the most practical way to generate a PDF file was to design it in HTML and then convert it to PDF. Although I’m a backend developer at heart, I have some HTML and CSS skills that I could use for this task. pdf_generator appeared to be the top choice for Elixir developers who need to generate PDF files from code. It is a wrapper around wkhtmltopdf, a C++ tool that creates PDF files from HTML files.

I also found an alternative, a native PDF generator named gutenex, but quickly realized that producing a polished design with it would be harder than writing HTML (at least if you know HTML and CSS). The module also hadn’t seen any activity in over two years, which was another reason we decided to go with pdf_generator instead.

WkHtmlToPdf

This software is widely used as one of the main tools for converting HTML to PDF. At first I was very satisfied with the output. The only problem was that the HTML table containing the orders was split badly across pages. I ended up calculating how many table rows fit on each page, but as the design evolved I had to redo the calculations over and over. This was very error-prone, especially for the other developers on the project, because we had to remember to recheck the calculations after every change.
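To make the manual pagination concrete, here is a minimal sketch of the kind of calculation involved. The helper name `paginateRows` and the figure of 20 rows per page are hypothetical, not taken from the project. In practice the rows-per-page value has to be measured against the template, which is exactly why it broke whenever the design changed.

```typescript
// Hypothetical helper: split the invoice rows into fixed-size pages so that
// each page can be rendered as its own table.
function paginateRows<T>(rows: T[], rowsPerPage: number): T[][] {
  if (!Number.isInteger(rowsPerPage) || rowsPerPage < 1) {
    throw new RangeError("rowsPerPage must be a positive integer");
  }
  const pages: T[][] = [];
  for (let start = 0; start < rows.length; start += rowsPerPage) {
    pages.push(rows.slice(start, start + rowsPerPage));
  }
  return pages;
}

// 45 orders at an assumed 20 rows per page give pages of 20, 20 and 5 rows.
const sampleOrders: string[] = Array.from({ length: 45 }, (_, i) => `Order #${i + 1}`);
const orderPages = paginateRows(sampleOrders, 20);
console.log(orderPages.map((page) => page.length)); // [ 20, 20, 5 ]
```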

[Image: HTML table render behaviour on Puppeteer]

[Image: HTML table render behaviour on WkHtmlToPdf]

The second issue I encountered was with Unicode characters, such as Chinese characters. The fix was to convert each such character into an HTML entity first so that it would appear in the PDF.
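The conversion itself is simple. The sketch below shows one way to do it, assuming that only non-ASCII characters need replacing and that numeric character references are acceptable. The function name `toHtmlEntities` is hypothetical and this is not the code from the project.

```typescript
// Replace every non-ASCII character with a decimal numeric character
// reference (for example "元" becomes "&#20803;"). ASCII is left untouched,
// so existing HTML markup in the input is preserved.
function toHtmlEntities(text: string): string {
  let out = "";
  // for...of walks the string by Unicode code point, so characters outside
  // the Basic Multilingual Plane become one entity, not two surrogate halves.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint === undefined) {
      continue;
    }
    out += codePoint > 127 ? `&#${codePoint};` : ch;
  }
  return out;
}

console.log(toHtmlEntities("<td>Total: 100元</td>")); // <td>Total: 100&#20803;</td>
```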

The third and most annoying issue was that wkhtmltopdf uses the machine’s display to generate the PDF file. For instance, on a MacBook with a Retina display (13", 2560x1600px) I had to exaggerate the sizes in the CSS (font-size, padding and so on) to get a properly sized PDF rather than a tiny one. For example, I needed font-size: 42px instead of the font-size: 12px I would normally use.

This wasn’t a very difficult problem to work around locally. However, when I deployed to a development server, which isn’t connected to a screen, the default screen size was much smaller (1366x768), and the PDF, whose layout was designed for my 2560x1600px display, came out far too big in comparison. Since I already had to do calculations beforehand to show the content properly, this was unbearable to maintain.

I tried to tweak the display size with the --viewport-size option, but without success. You can find some issues related to this in the wkhtmltopdf repository, which has over 1000 issues still open.

Other possible solutions were to use the --dpi option, together with the --zoom option, to reduce the overall size of the PDF generated on the server, but in the end I couldn’t reproduce the design I had tested on my computer.
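For reference, this is roughly how those three options fit into a wkhtmltopdf command line. The helper `buildWkhtmltopdfArgs`, the file names and the values are placeholders for illustration, not the settings I actually tried. The sketch only assembles the argument list and does not run wkhtmltopdf.

```typescript
// Hypothetical option bag covering the three flags discussed above.
interface WkSizingOptions {
  dpi?: number;
  zoom?: number;
  viewportSize?: string; // e.g. "1366x768"
}

// Build the argument list for a wkhtmltopdf call: options first, then the
// input HTML file, then the output PDF file.
function buildWkhtmltopdfArgs(
  input: string,
  output: string,
  options: WkSizingOptions = {}
): string[] {
  const args: string[] = [];
  if (options.dpi !== undefined) {
    args.push("--dpi", String(options.dpi));
  }
  if (options.zoom !== undefined) {
    args.push("--zoom", String(options.zoom));
  }
  if (options.viewportSize !== undefined) {
    args.push("--viewport-size", options.viewportSize);
  }
  args.push(input, output);
  return args;
}

const wkArgs = buildWkhtmltopdfArgs("invoice.html", "invoice.pdf", {
  dpi: 96,
  zoom: 0.75,
  viewportSize: "1366x768",
});
console.log(["wkhtmltopdf"].concat(wkArgs).join(" "));
// wkhtmltopdf --dpi 96 --zoom 0.75 --viewport-size 1366x768 invoice.html invoice.pdf
```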

At this point I decided to take a step back and rethink the solution. That’s when puppeteer came into play.

Puppeteer

Following a suggestion from Daniel Ruf about the display size issues, I ended up exploring puppeteer, a Node API that lets you take screenshots of web pages and generate PDF files using a version of the Google Chrome browser in headless mode.

With this software I could finally get the same PDF design on both my laptop and the server. Overall, the rendered PDF was better than the one from wkHtmlToPdf. The only issue so far is the file size, which is 10 times larger than that of the file generated by wkHtmlToPdf.

This also allowed me to implement the header and footer in HTML and, after some tweaking of the margin parameters, I could finally generate a PDF file without doing the earlier calculations on the template items. This feature is also available in wkHtmlToPdf, but I only noticed that after exploring the puppeteer options.
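As an illustration of the header, footer and margin setup, the sketch below builds an options object of the shape accepted by puppeteer’s page.pdf(). The function `invoicePdfOptions`, the template markup and the margin values are assumptions made for this example, not the settings used in the project. It builds a plain object, so it runs without puppeteer installed.

```typescript
// Mirrors a subset of the options accepted by puppeteer's page.pdf().
interface InvoicePdfOptions {
  path: string;
  format: "A4" | "Letter";
  printBackground: boolean;
  displayHeaderFooter: boolean;
  headerTemplate: string;
  footerTemplate: string;
  margin: { top: string; bottom: string; left: string; right: string };
}

// headerText is inserted into the template as-is, so it is assumed to be
// trusted, already-escaped text.
function invoicePdfOptions(path: string, headerText: string): InvoicePdfOptions {
  return {
    path,
    format: "A4",
    printBackground: true,
    displayHeaderFooter: true,
    // The templates set an explicit font size; the values are illustrative.
    headerTemplate:
      `<div style="font-size:10px; width:100%; text-align:center;">${headerText}</div>`,
    // Chrome fills in elements with the classes "pageNumber" and "totalPages".
    footerTemplate:
      '<div style="font-size:10px; width:100%; text-align:center;">' +
      'Page <span class="pageNumber"></span> of <span class="totalPages"></span>' +
      "</div>",
    // The top and bottom margins must leave room for the header and footer.
    margin: { top: "25mm", bottom: "20mm", left: "10mm", right: "10mm" },
  };
}

const invoiceOptions = invoicePdfOptions("invoice.pdf", "Invoice");
console.log(invoiceOptions.margin);
// With puppeteer installed and a page already loaded, the object would be
// passed along as: await page.pdf(invoiceOptions);
```

Because Chrome handles the page breaks and repeats the header and footer on every page, the table no longer has to be split by hand.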

The obvious next step was to create an Elixir wrapper (similar to the pdf_generator wrapper) so that other people could use puppeteer the same way.

The new module, puppeteer_pdf, is available on hex.pm and in our GitHub repository.

Puppeteer vs WkHtmlToPdf

To help you pick between the two, let’s highlight some reasons to choose one over the other. In the end, it all comes down to your specific needs.

Advantages of using puppeteer over wkHtmlToPdf

  • Better PDF rendering

  • Easier to use

  • Relies on well-maintained software (puppeteer)

  • Uses fewer Elixir dependencies

Disadvantages of using puppeteer over wkHtmlToPdf

  • Needs NodeJS

  • Larger footprint (needs NodeJS plus Google Chrome image ~90MB)

  • Generated files are much larger than those generated by wkHtmlToPdf

Final thoughts

If you are currently using the pdf_generator wrapper and are happy with the results, there’s no need to move away from it. If you’re searching for a PDF generator module for your Elixir project, take some time to give puppeteer_pdf a try.

