Rendering and processing complex documents online is a bit of a personal pet peeve of mine. When dealing with certain layouts, it can quickly become a very tough nut to crack. In the next two articles I will describe a technique for creating complex documents online using the open source DTP package Scribus, Python, and a simple demo service written in Flask.

DocSon

Through the years I have tried different things to make it easier to render and process documents online. One of my solutions for rendering documents was something I called DocSon. DocSon is a web service that uses a JSON dialect to generate new documents based on templates you can create in LibreOffice or Word.

It leverages the fact that you can run LibreOffice/OpenOffice in headless mode and connect to it through something called a UNO bridge. This works well. As a matter of fact, one of my previous employers uses it to generate their invoices. But because it depends on search and replace actions, it also has its limitations.
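To make the search-and-replace idea concrete, here is a minimal sketch in plain Python. It does not use DocSon or the UNO bridge: the `{{key}}` placeholder syntax, the JSON payload, and the `fill_template` function are all assumptions made up for illustration, and the real DocSon dialect may look quite different.

```python
import json
import re

# Hypothetical placeholder syntax: {{key}}. Not DocSon's actual format.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template, payload_json):
    """Replace every {{key}} in the template with the value from a JSON object."""
    values = json.loads(payload_json)

    def substitute(match):
        key = match.group(1)
        # Leave unknown placeholders untouched so they are easy to spot.
        return str(values.get(key, match.group(0)))

    return PLACEHOLDER.sub(substitute, template)


template = "Invoice {{number}} for {{customer}}: total {{total}} EUR"
payload = '{"number": 42, "customer": "ACME", "total": "99.50"}'
print(fill_template(template, payload))
```

This kind of substitution is fine for filling values into an invoice, but it only swaps text for text. It knows nothing about frames, columns, or what happens when the inserted text no longer fits.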

The other day I was confronted with a project where we needed to generate some kind of magazine layout. This is a perfect example of a case where a search and replace won’t suffice.

In the end we managed to do the complex layout in code, but the difficulty level made me think of coming up with another and easier solution.

An example of a complex layout

It may be difficult to grasp what I mean by a complex layout. So let us start with an example of a layout where we have a page with 3 columns, and the text of column 1 needs to continue in columns 2 and 3.

complex-layout.jpg

This is certainly doable in code, but it involves keeping tabs on every element that has been inserted, the height of elements, coordinates, margins, paddings, calculating the remaining height, and so on. That is really a lot if you think about it.
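As a rough illustration of that bookkeeping, the sketch below flows already-broken lines of text over three columns by tracking how many lines still fit in each one. The function name `flow_lines` and the numbers are assumptions for the example. It deliberately ignores everything that makes the real problem hard: line breaking, font metrics, margins, paddings, and images.

```python
def flow_lines(lines, column_heights, line_height):
    """Distribute lines over columns in order, tracking capacity by hand.

    Returns (columns, leftover): one list of lines per column, plus the
    lines that did not fit in any column.
    """
    columns = []
    position = 0
    for height in column_heights:
        capacity = int(height // line_height)  # whole lines that fit
        columns.append(lines[position:position + capacity])
        position += capacity
    return columns, lines[position:]


lines = ["line %d" % n for n in range(1, 11)]
columns, leftover = flow_lines(lines, column_heights=[40, 40, 40], line_height=12)

for index, column in enumerate(columns, start=1):
    print("column %d: %s" % (index, column))
print("did not fit:", leftover)
```

With three columns of 40 units and lines of 12 units, each column holds three lines, so nine lines are placed and the tenth is left over. Even this toy version needs explicit state for the position and the remaining capacity.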

When using a good DTP program you can just put 3 text columns on a page and link them. The moment your text is bigger than the first column, it will overflow automatically into the linked columns and you are basically done. That is possible in less time than it would take just to set up your code project.

A DTP engine on a server

I wanted to continue with the main idea behind DocSon: create your template in an application that even a non-developer can use, and modify it on a server through a web service. It’s also a more flexible system than one written in pure code. Most coders aren’t designers and vice versa.

That said, the idea of having a DTP layout engine on a server that we can communicate with through an (online) app to modify documents is nothing new. Adobe has a product called InDesign Server which does just that. The only drawback can be its price or the fact that you can’t run it on a Linux server. Scribus runs on Windows, Linux and OSX.

So the natural choice was to find an alternative, preferably something open source. I found a good candidate in Scribus, as it also has a Python API which makes it possible to interact with Scribus documents.

Is Scribus as capable as InDesign? As I’m not a DTP’er this is difficult for me to say. It does have some good real world use cases and seems to support a lot of things, including CMYK. Although UI-wise it’s not that pleasing, I found it very easy to recreate a magazine layout, including one that deals with the column overflow problem.

Your mileage will vary depending on the document problem you need to tackle. In my case it should be adequate.

Communicating with Scribus through Python

As already briefly mentioned, Scribus lets you write Python scripts that manipulate Scribus documents. Starting from v1.5.1 it is also possible to start a Scribus instance from the terminal without its GUI (the -g flag), run a Python script that interacts with Scribus (the -py flag), and have the instance close again when the script is done.

It is just a matter of writing your Python script and running the whole process by executing the following command:

scribus -g -py myscript.py
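If you want to start that command from another Python program, for instance from a web service, a small wrapper around `subprocess` is one way to do it. This is only a sketch: the helper names `build_scribus_command` and `run_scribus_script`, the default timeout, and the assumption that the `scribus` executable is on the PATH are mine, not something Scribus prescribes.

```python
import subprocess


def build_scribus_command(script_path, executable="scribus"):
    """Return the argument list for running one script in Scribus without GUI."""
    # -g and -py are the flags used in the command shown above.
    return [executable, "-g", "-py", script_path]


def run_scribus_script(script_path, executable="scribus", timeout=120):
    """Run the script in Scribus and return the exit code of the process."""
    command = build_scribus_command(script_path, executable)
    # Passing a list (no shell) avoids quoting problems with file names.
    completed = subprocess.run(command, timeout=timeout)
    return completed.returncode


if __name__ == "__main__":
    # Only prints the command; call run_scribus_script() to really execute it.
    print(build_scribus_command("myscript.py"))
```

Keeping the command construction in its own function makes it easy to check without having Scribus installed.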

Modifying a scribus document from Python

So what does such a Python script look like? Let us start with a simple example of a Python script which just inserts some text and saves the result as a PDF document.

The first step is that we naturally need to have a document we can work with. Start Scribus and select single page to begin with.

demo1_step1.jpg

Insert some text frames by selecting the text frame icon and dragging the frames onto your document.

demo1_step2.jpg

Link them up by selecting the first text frame, clicking the link text frame icon, and then selecting the bottom left text frame followed by the right one.

demo1_step3.jpg

Right click on the first text frame, select properties, and give it the name helloworld.

demo1_step4.jpg

Save the document as helloworld.sla

Next up is a simple Python script in which we will set the content of our text frames and save the result as a new PDF file.

One thing we will need is some long text. You can use Lorem Ipsum or, as I did, the content of a public domain book. Save your text in a file called demo.txt and then create a new Python script called demo.py with the following content.

#!/usr/bin/python
import scribus

# read the text we will insert; wrapped lines are joined with a space so
# the last word of one line is not glued to the first word of the next
with open('demo.txt', 'r') as demoFile:
    text = demoFile.read().replace('\n', ' ')

# open our template
scribus.openDoc('helloworld.sla')

# set the text of our text frame "helloworld"; it flows into the linked frames
scribus.setText(text, 'helloworld')

# export the document as a new PDF file
pdf = scribus.PDFfile()
pdf.file = 'newfile.pdf'
pdf.save()

The script isn’t that complex. We begin by reading the txt file which contains the text we want to insert. Then we open the template file we created in Scribus. We set the content of the text frame to the content of the file we have read, and we save the document as a new PDF.

When you execute the script by running the following command…
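Text files taken from public domain books are usually hard-wrapped, so how you treat newlines before handing the text to `setText` matters. The sketch below is one possible approach, not part of the Scribus API: the helper `unwrap_paragraphs` is a name I made up, and it assumes paragraphs in the source file are separated by blank lines. How Scribus renders the separator you choose is worth checking in your own version.

```python
import re


def unwrap_paragraphs(raw_text, paragraph_separator="\n"):
    """Join hard-wrapped lines with spaces, keeping blank-line paragraph breaks."""
    # A blank line (possibly containing whitespace) ends a paragraph.
    paragraphs = re.split(r"\n\s*\n", raw_text.strip())
    # Collapse all whitespace inside a paragraph to single spaces.
    cleaned = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return paragraph_separator.join(p for p in cleaned if p)


sample = "First line\nof paragraph one.\n\nSecond\nparagraph.\n"
print(repr(unwrap_paragraphs(sample)))
```

The result of `unwrap_paragraphs` can be passed to `scribus.setText` in place of the raw file content. Passing `paragraph_separator=" "` gives one continuous block of text.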

scribus -g -py demo.py

you will afterwards find a file called newfile.pdf, which will look like this

demo1_step5.jpg

As we can see, the text starts in the upper frame and, when there is no room left, automatically fills the left frame until that also has no room, and then continues in the right frame.

Now imagine for a moment how much longer this would take if you needed to do this in pure code. This took 10 minutes at most.

More than only text

Of course this example deals only with text. There are multiple things you may want to do: change font properties, create styles, set an image, change the layout depending on certain logic, and so on. These things and more are perfectly possible using the API and some Python scripting.
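As a hedged sketch of what that could look like, the function below applies text, font, and image changes to named frames. The calls `objectExists`, `setText`, `setFont`, `setFontSize`, `loadImage`, `setScaleImageToFrame` and `haveDoc` are Scripter functions as I understand them, so check their signatures in the help of your Scribus version. The function `apply_updates`, the shape of the `updates` dictionary, and the frame name `cover_image` are hypothetical.

```python
def apply_updates(api, updates):
    """Apply text, font, and image changes to named frames.

    `api` is expected to be the `scribus` module when run inside Scribus.
    Passing it in keeps the function testable with a stand-in object.
    """
    for frame_name, change in updates.items():
        if not api.objectExists(frame_name):
            continue  # skip names the template does not define
        if "text" in change:
            api.setText(change["text"], frame_name)
        if "font" in change:
            # the font name must be one that is installed for Scribus
            api.setFont(change["font"], frame_name)
        if "font_size" in change:
            api.setFontSize(change["font_size"], frame_name)
        if "image" in change:
            api.loadImage(change["image"], frame_name)
            # scale the image to its frame and keep the proportions
            api.setScaleImageToFrame(
                scaletoframe=True, proportional=True, name=frame_name
            )


# Hypothetical example data: frame names must match your own template.
EXAMPLE_UPDATES = {
    "helloworld": {"text": "Hello from Python", "font_size": 11},
    "cover_image": {"image": "cover.jpg"},
}

try:
    import scribus  # only importable when the script runs inside Scribus
except ImportError:
    scribus = None

if scribus is not None and scribus.haveDoc():
    apply_updates(scribus, EXAMPLE_UPDATES)
```

Driving the changes from a plain dictionary keeps the logic separate from the template, which fits the idea of letting a designer own the .sla file while the code only fills it in.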

The complete documentation for the API seems to be available only through the program’s help function, yet there are some example scripts available online which show a myriad of possibilities.
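Because the `scribus` module is an ordinary Python module, standard introspection is another way to get an overview of what it offers. The sketch below lists the public callables of a module together with the first line of their docstring. The helper `list_api` is my own, and outside Scribus the example falls back to the `json` module purely as a stand-in so that it runs anywhere.

```python
import inspect


def list_api(module):
    """Return (name, first docstring line) pairs for public callables."""
    entries = []
    for name, member in inspect.getmembers(module, callable):
        if name.startswith("_"):
            continue  # skip private helpers
        doc = inspect.getdoc(member) or ""
        first_line = doc.splitlines()[0] if doc else ""
        entries.append((name, first_line))
    return entries


try:
    import scribus as target  # only available inside Scribus
except ImportError:
    import json as target  # stand-in so the sketch runs anywhere

for name, summary in list_api(target):
    print("%-24s %s" % (name, summary))
```

Run inside Scribus, this prints one line per function, which can serve as a quick index before looking up the details in the built-in help.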

In the next article I will talk about using this knowledge to build a simple demo Flask service that we can use to generate documents online.