In the second part we will take what we have learned in part 1, but this time build a web service around it. In essence we will be creating something that could be - depending on your use case - a possible replacement for an InDesign Server, using only open source software.

Xvfb

I assume that you are running some kind of Linux/BSD flavor on your server. Unfortunately, at this point we already run into a small problem: starting Scribus from a terminal on a system that doesn't have a display server available will result in an error.

Sadly it doesn't come with a headless mode, although there have been some talks about it in the past. You could install a desktop environment (GNOME, KDE, MATE, …) but that is not always desirable on a server.

A personal favorite of mine in those cases is Xvfb. Xvfb is an X11 display server that performs graphical operations in memory instead of displaying them on a screen. This helps us to easily work around the GUI problem.

On a Debian based system (for example Ubuntu) it is just a matter of installing it by issuing the following command:

```bash
apt-get install xvfb
```

We can now start Scribus by executing the following command:

```bash
xvfb-run --server-args='-screen 0 1024x768x16' scribus-trunk -g -py myscript.py
```

The resolution we pass here is not that important; as long as Scribus has a display server where it can output its GUI, it will be perfectly happy.
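
If you prefer not to use the xvfb-run wrapper, you can also start Xvfb manually and point the DISPLAY environment variable at it:

```bash
# start a virtual display on :99 (the display number is an arbitrary choice)
Xvfb :99 -screen 0 1024x768x16 &

# tell X11 clients - like Scribus - to use that display
export DISPLAY=:99
scribus-trunk -g -py myscript.py
```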

Our stack

While I make a living writing mostly PHP and JavaScript, I do love Python for numerous reasons. It is my go-to language when I need to develop machine learning solutions, process data or quickly prototype my ideas.

And as we write our Scribus scripts in Python, it seems only logical to use a stack here that mainly consists of Python modules.

So what does our complete stack look like?

  • Python 2.7
  • Flask: a microframework which is extremely easy to use
  • Flask-RESTful: an extension for Flask that makes it possible to quickly build REST APIs
  • python-rq: a simple library for queueing jobs (using Redis) and processing them in the background
  • Redis: the rq in python-rq stands for Redis Queue, so we also need a running Redis instance. Redis is a fast data structure store which is mostly used as a database cache or (like in our case) as a message broker.
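
On a Debian based system, installing this stack could look something like this (package names may differ between distributions):

```bash
# redis server and pip (Python 2.7 era package names)
apt-get install redis-server python-pip

# the Python modules of our stack
pip install flask flask-restful rq
```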

Let’s work together

Our web service will not communicate with Scribus directly, but will function as a gateway to our job queue and the PDF documents that have been generated.

When a job - which also includes data we need to fill in our document - has been posted to our jobs endpoint, the service will create a new job request on our Redis queue.

A (Python) worker script constantly monitors our queue, and when it notices that a new job has been posted it will act accordingly: it launches a Scribus instance, passing it a specific Python script that contains the logic to generate a new document based on the values that were passed to our web service.

When the document has been generated and a PDF has been produced, the web service receives a notice and can serve the document to the end user.
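
From a client's point of view the whole exchange could look something like this (the routes, port and template name here are illustrative; the form fields match the endpoint we will build below):

```bash
# submit a new job (multipart form data, including an image upload)
curl -X POST http://localhost:5000/jobs \
     -F 'template=flyer' \
     -F 'title=Hello world' \
     -F 'text=Some body text for our document' \
     -F 'image=@photo.jpg'
# => {"jobQueueID": "...", "jobsInfoID": "..."}

# once the worker has done its job, download the generated PDF
curl -o result.pdf http://localhost:5000/pdf/<jobsInfoID>
```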

There are a lot of steps involved, which may give the impression that this whole process is slow, but it takes less than 2 seconds to generate a document - and that on a five year old MacBook Pro (stock i5-2415M, 8GB RAM) where the server operations run in a virtual machine.

![Architecture overview](architecture.jpg)

Some bits and parts

Explaining every nook and cranny would result in a very long and frankly tedious article. As a result we will be covering only the important bits.

We start off with our jobs endpoint, which we use to create new jobs:

```python
import os, json, uuid

from flask import request, jsonify
from flask_restful import Resource

class ScribusJob(Resource):
    """ REST: ScribusJob resource """
    def writeJobsInfoJson(self, jobsInfoID, template, title, text, image):
        """ Writes json jobs info file """
        jobsInfo = {}
        jobsInfo['template'] = template
        jobsInfo['title']    = title
        jobsInfo['text']     = text
        jobsInfo['image']    = image

        with open(os.path.join('jobs', str(jobsInfoID) + '.json'), 'w') as outfile:
            outfile.write(json.dumps(jobsInfo))

    def post(self):
        """ handle post method: submitting new jobs """
        # generate job info id (not python-rq id!)
        jobsInfoID = uuid.uuid1()

        # save uploaded image
        file = request.files['image']
        fileName = ""
        if file:
            fileName = file.filename
            file.save(os.path.join('jobs', fileName))

        # store job information in a json file
        self.writeJobsInfoJson(
            jobsInfoID,
            request.form['template'],
            request.form['title'],
            request.form['text'],
            fileName
        )

        # add job to our queue (q is our python-rq Queue,
        # buildPDF is imported from the worker module)
        job = q.enqueue_call(
            func=buildPDF,
            args=(jobsInfoID,),
            result_ttl=86400
        )

        # return our job ids
        return jsonify(
            jobQueueID = job.get_id(),
            jobsInfoID = str(jobsInfoID)
        )
```

The jobs endpoint doesn't do a lot: it writes a JSON file containing the posted data we will be using and schedules a new job on our Redis queue.
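
For a submitted job, the resulting file in the jobs directory would look something like this (with example values, of course):

```json
{
    "template": "flyer",
    "title": "Hello world",
    "text": "Some body text for our document",
    "image": "photo.jpg"
}
```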

Why are we writing a JSON file instead of passing the data directly as parameters to our CLI script? To mitigate possible security problems: this way no user supplied data ends up in the shell command we execute later on.

Also notice that we are passing our buildPDF function, but this function is never executed by our REST service. It is the worker - our next snippet - which will execute that function.

```python
from redis import Redis
from rq import Worker, Queue, Connection

# the queues we listen to
listen = ['default']

redis_conn = Redis()

if __name__ == '__main__':
    with Connection(redis_conn):
        worker = Worker(list(map(Queue, listen)))
        worker.work()
```

Our worker script ironically doesn't do a lot of work itself. It just listens to our queue and executes the buildPDF function when requested.
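
To get job processing going, Redis must be running and the worker script must be started; assuming the snippet above is saved as worker.py, that boils down to:

```bash
# make sure a Redis instance is running
redis-server &

# start our worker; it will block and wait for new jobs
python worker.py
```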

```python
import subprocess

def buildPDF(jobsInfoID):
    """ builds our PDF """
    # define the command that we will execute
    command = ("xvfb-run --server-args='-screen 0 1024x768x16' "
               "scribus-trunk -g -py scribus_templates/template.py "
               "--python-arg {}".format(jobsInfoID))

    # run the scribus command and wait for it to finish
    process = subprocess.Popen(command,
        shell=True,
        stdout=subprocess.PIPE
    )
    process.wait()
```

Our buildPDF function just builds our CLI command, passing only our jobsInfoID, and executes it.
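
Because the command is executed with shell=True, it is important that no user controlled data ends up in that command string - which is exactly why we only pass the server generated jobsInfoID. If you want to avoid the shell altogether, here is a sketch using shlex (check_call will also raise an error when Scribus exits with a non-zero status):

```python
import shlex, subprocess

def buildPDF(jobsInfoID):
    """ builds our PDF without invoking a shell """
    command = ("xvfb-run --server-args='-screen 0 1024x768x16' "
               "scribus-trunk -g -py scribus_templates/template.py "
               "--python-arg {}".format(jobsInfoID))

    # shlex.split turns the command string into an argument list,
    # so we can run it without shell=True
    subprocess.check_call(shlex.split(command))
```

Next up is the Python script that Scribus itself executes: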

```python
import sys, os, json

# the scribus module is only available when this script
# is run from within Scribus
import scribus

# get job info id which is passed by the --python-arg flag
jobInfoID = sys.argv[2]

# open job json that contains information we use to fill in our template
jsonObj = None

with open(os.path.join('jobs', jobInfoID + '.json')) as json_file:
    jsonObj = json.load(json_file)

# open doc
scribus.openDoc(os.path.join('scribus_templates', jsonObj['template'] + '.sla'))

# change title (alignment 1 = centered)
scribus.setText(jsonObj['title'], "title")
scribus.setTextAlignment(1, "title")
scribus.setFontSize(24, "title")

# change image
scribus.loadImage(os.path.join('jobs', jsonObj['image']), 'image_1')
scribus.setScaleImageToFrame(scaletoframe=1, proportional=0, name='image_1')

# change text
scribus.setText(jsonObj['text'], "text")

# save as pdf
pdf = scribus.PDFfile()
pdf.file = os.path.join('pdf', jobInfoID + '.pdf')
pdf.save()

# save as img
img = scribus.ImageExport()
img.type = 'jpg'
img.scale = 50
img.quality = 80
img.name = os.path.join('pdf', jobInfoID + '.jpg')
img.save()

# delete jobs info json and images
os.remove(os.path.join('jobs', jobInfoID + '.json'))
os.remove(os.path.join('jobs', jsonObj['image']))
```

Our template related Python script is in this case a bit more complex than the one we saw in part 1.

It opens and extracts information from the JSON file we have written, uses it to fill in our text blocks and to set an image. It then saves the document as an image and a PDF file. This PDF file can then be retrieved through our web service.
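
The retrieval part is not shown in the snippets above, but a minimal download endpoint could look something like this (a sketch; the resource name and route are hypothetical and the real code in the repository may differ):

```python
from flask import send_from_directory
from flask_restful import Resource

class ScribusPDF(Resource):
    """ REST: serve a generated PDF by its jobs info id """
    def get(self, jobsInfoID):
        # send_from_directory guards against path traversal
        return send_from_directory('pdf', jobsInfoID + '.pdf')

# registered on our api object, for example as:
# api.add_resource(ScribusPDF, '/pdf/<string:jobsInfoID>')
```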

Use the source, Luke

The complete source code (including that of a simple web demo) can be found on my GitHub account and is released under the AGPL license.

This is of course by no means a production ready web service, but it can certainly function as a start!