In our first blog post on testing TYPO3's core we focused on the infrastructure of automatic testing which is used for reviewing patches before adding them to TYPO3’s core. Today we’ll take a closer look at the hardware and software stack this work requires.
Splitting a test plan into so many single tasks requires that at least the same number of Bamboo remote agents are online. That way, all of the tasks can run simultaneously and single build plans don’t end up in a queue.
Quite a bit of hardware is needed for this to run smoothly. The TYPO3 GmbH provides two dedicated Hetzner EX41S-SSD machines (i7-6700, 64 GB RAM, 2*250 GB SSD disks) with 20 agents on them to deal with that. But that’s nowhere near enough.
To be able to meet our requirements, the TYPO3 GmbH established a deal with the Leibniz Supercomputing Centre, who now provide us with a significant amount of CPU hours on their OpenNebula-based Compute Cloud. The cloud allows us to shut down and ramp up agents pretty quickly, depending on the current testing load. Currently, this adds 6 virtual machines with a total of 48 CPUs, 384 GB RAM and 60 agents to our Bamboo agent pool.
This collaboration is a great win-win-win situation for the TYPO3 GmbH, the TYPO3 core and the Leibniz Supercomputing Centre. Many thanks, you’re actively helping the project to evolve!
At the time of writing, there are 80 Bamboo agents online, using 56 CPUs and 512 GB RAM altogether, and all of them are up and ready to do our TYPO3 core testing.
The big question is, how does one ensure that this multitude of resources is not only there, but actively used with a maximum of efficiency and throughout all workflows?
As PHP is single-threaded, one job typically utilizes only one CPU of a multi-core system. Besides that, we need to have agents with different PHP versions online. Bamboo supports this with a so-called “capability” system: it knows which specific software a registered agent is capable of handling, so single jobs can be restricted to run only on agents that provide, for instance, PHP 7.1.
We were able to solve this by encapsulating the whole software stack (including the Bamboo agent) in docker containers and by running multiple containers on one machine at the same time.
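Since every container registers as an agent of its own, the capability system decides which container picks up which job. The matching idea can be sketched in a few lines of Python. This is only an illustration of the concept, not Bamboo's implementation or API: the `Agent` and `Job` classes, the capability strings and the agent names are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A registered agent and the software it claims to provide (hypothetical model)."""
    name: str
    capabilities: set = field(default_factory=set)


@dataclass
class Job:
    """A job and the capabilities it requires (hypothetical model)."""
    name: str
    requirements: set = field(default_factory=set)


def eligible_agents(job, agents):
    """Return the agents whose capabilities cover every requirement of the job."""
    return [a for a in agents if job.requirements <= a.capabilities]


if __name__ == "__main__":
    # Agent names and capability strings are invented for illustration.
    agents = [
        Agent("container-1", {"php7.0", "mysql"}),
        Agent("container-2", {"php7.1", "mysql"}),
        Agent("container-3", {"php7.1", "mysql", "chrome"}),
    ]
    job = Job("functional tests on PHP 7.1", {"php7.1", "mysql"})
    print([a.name for a in eligible_agents(job, agents)])
```

A job that requires nothing at all would match every agent, which is why restricting jobs to a PHP version matters once several versions share one pool.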
Test agents are “throw away software” and don’t need any local persisted data whatsoever. Basically, the underlying host just needs a decent docker engine to be ready for handling the task. Setting up the runtime environment on a stock ubuntu 16.04 is done in a wink. The only local information which is necessary is one single Bamboo agent identifier file per agent. Currently we have 8 machines online and our container deployment looks like this:
In total, this means that we have 10 agents / docker containers per machine of which the vast majority goes to the most used PHP versions. With our 8 systems, this means we’re at 80 online agents with 32 PHP 7.0 and 32 PHP 7.1 agents / containers. This is a good balance between hardware usage, the queue system of Bamboo, the execution time if the cluster is under load and RAM usage.
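The numbers above can be cross-checked with a bit of arithmetic. The following is a small sketch: the 4 + 4 split per machine is derived from 32 agents per PHP 7 version spread over 8 machines, and the remaining 2 containers per machine are lumped together as "other", since their exact split across the older PHP versions is not spelled out here.

```python
MACHINES = 8

# Containers per machine. "other" is a placeholder for the older PHP versions;
# how those two slots are divided is an assumption left open on purpose.
PER_MACHINE = {"php7.0": 4, "php7.1": 4, "other": 2}


def cluster_totals(machines, per_machine):
    """Multiply the per-machine container counts by the number of machines."""
    return {kind: count * machines for kind, count in per_machine.items()}


if __name__ == "__main__":
    totals = cluster_totals(MACHINES, PER_MACHINE)
    print(totals)
    print("agents online:", sum(totals.values()))
```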
The docker architecture allows running identical containers multiple times without wasting too much space, and the Bamboo plan configuration has been optimized to clean up after each job, leaving things nice and tidy and ready for the next task. This allows us to run the entire stack in a huge ramdisk on each machine, so no job touches a slow hard disk at all. This hugely speeds up the database-driven functional tests, which create thousands of tables, a pretty slow procedure on any disk-backed DBMS. And as said before, being blocked by testing is a total bore!
The ramdisk is also a big improvement over spinning disks and network-backed disks when there is no fancy (expensive) storage hardware at this level. Much the same applies to the browser-based acceptance tests, which need to load the big browser binary all the time. As a drawback, we do need quite a bit of RAM and have to keep an eye on memory consumption.
Because the environment is RAM-driven, a machine is basically naked after a reboot. At the moment, a manually triggered script configures the base system, pulls the docker images, copies the agent configurations and starts the containers. This takes a few minutes per machine. The fresh agents then register themselves as “ready” with the master server and all is well. The rollout of updated container images is also easy: half of the computing cluster is rebooted and initialized anew, then the process is repeated with the second half.
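The two-halves rollout can be sketched as follows. This is a simplified model under assumptions, not the actual script: the step names merely paraphrase the description above, and `reboot` and `run_step` are hypothetical callbacks that a real setup would back with SSH calls or similar.

```python
# Bootstrap steps in the order described above (wording is ours).
BOOTSTRAP_STEPS = (
    "configure base system",
    "pull docker images",
    "copy agent configuration",
    "start containers",
)


def bootstrap(host, run_step):
    """Run every bootstrap step on a freshly rebooted host, in order."""
    for step in BOOTSTRAP_STEPS:
        run_step(host, step)


def rolling_update(hosts, reboot, run_step):
    """Reinitialize the first half of the hosts, then the second half,
    so that part of the cluster stays online during the rollout."""
    middle = len(hosts) // 2
    for batch in (hosts[:middle], hosts[middle:]):
        for host in batch:
            reboot(host)
            bootstrap(host, run_step)


if __name__ == "__main__":
    # Host names are invented; the callbacks only print what would happen.
    hosts = [f"agent-host-{n}" for n in range(1, 9)]
    rolling_update(
        hosts,
        reboot=lambda host: print(f"{host}: reboot"),
        run_step=lambda host, step: print(f"{host}: {step}"),
    )
```

A real rollout would also wait for the first half to report “ready” before touching the second half; that check is left out here.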
First of all, obviously PHP is required for developing TYPO3. With regard to the actively maintained TYPO3 core branches 6.2, v7, v8 and master, we currently need to support altogether six different PHP versions: PHP 5.3, 5.4, 5.5, 5.6, 7.0 and 7.1. The first two versions are currently being phased out with the end-of-life of TYPO3 CMS 6.2 LTS and are only used for the ELTS program of the TYPO3 GmbH. PHP 5.5 and 5.6 are only relevant for TYPO3 v7. As this version only receives critical bug fixes, it does not put too much load on the testing infrastructure.
An image in docker can be based on another image and add further layers on top. In programming terms, this is similar to class inheritance. Our image stack is based on a stock Ubuntu 16.04. On top of that we added a “baseimage”, based on a fork of “passenger-docker”, which prepares Ubuntu with some more docker-friendly stuff. On top of that we have the six different PHP versions as single images, and a last layer adds the Bamboo Java agent.
This adds up to 13 different images: The baseimage plus 6 images for the PHP versions, plus 6 images with the bamboo agents on top. The build chain is available on our bitbucket. A makefile then helps with compiling and uploading the data to the docker-hub. Latest versions of the compiled images can always be pulled from hub.docker.com.
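To make the inheritance picture concrete, here is a small sketch that models the stack as a parent map and counts the images. The image names (`baseimage`, `php70`, `php70-bamboo` and so on) are placeholders chosen for this example and are not necessarily the names used on docker-hub.

```python
PHP_VERSIONS = ("5.3", "5.4", "5.5", "5.6", "7.0", "7.1")


def image_stack(php_versions):
    """Map each image we build to the image it is based on.

    Image names are hypothetical; only the layering follows the text.
    """
    parents = {"baseimage": "ubuntu:16.04"}
    for version in php_versions:
        php_image = "php" + version.replace(".", "")
        parents[php_image] = "baseimage"
        parents[php_image + "-bamboo"] = php_image
    return parents


def lineage(image, parents):
    """Follow the 'based on' chain from an image down to the stock image."""
    chain = [image]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


if __name__ == "__main__":
    stack = image_stack(PHP_VERSIONS)
    print(len(stack), "images to build")
    print(" -> ".join(lineage("php70-bamboo", stack)))
```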
There is a design decision we made with this setup: Each final container is standalone and delivers the entire testing stack.
Starting a PHP 7.0 container also starts a MySQL daemon, PostgreSQL, Redis, memcached and MS SQL. A Chrome browser is packaged within the images as well. All of this lives in a single container, which is slightly different to what some docker advocates recommend. For our testing case it seems to be a viable choice, though.
The cool thing about this infrastructure: It allows any TYPO3 core contributor to run the environment locally without much hassle.
For instance, any contributor can just pull the PHP 7.0 docker image (skipping the on-top bamboo-agent altogether) and ends up with the exact same environment the final test suite is executed in. Together with the volume mapping capabilities of docker, this gives developers a perfectly configured testing environment with a minimum of work. The prepared image can be downloaded and that's it! Nothing more needed. Everything is there and the technology is in place. There is merely one thing still lacking here and that’s proper documentation. Taking care of this issue is on our current agenda and the documentation will be improved within the next few weeks.
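Until that documentation exists, here is a rough idea of what running such an image locally could look like. The sketch only builds a `docker run` command line and prints it. The image name `example/php70`, the mount target `/srv/core` and the local checkout path are placeholders, not the real names, and the services inside the container may need extra startup steps that are not covered here.

```python
import shlex


def docker_run_command(image, core_checkout, command):
    """Build a 'docker run' invocation that maps a local core checkout
    into the container and runs a command inside it."""
    return [
        "docker", "run", "--rm",
        "-v", f"{core_checkout}:/srv/core",  # map the local checkout (target path is hypothetical)
        "-w", "/srv/core",                   # run the command from inside the checkout
        image,
        *command,
    ]


if __name__ == "__main__":
    # Placeholder image name and path; replace with the real ones.
    argv = docker_run_command("example/php70", "/home/me/typo3-core", ["php", "--version"])
    print(" ".join(shlex.quote(part) for part in argv))
```

Swapping `php --version` for the test runner of your checkout would then execute the tests against the same PHP version and services as the agents.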
With all of this fancy setup, hardware, docker foo and parallelization of jobs, the question arises as to how this setup communicates with the other parts of the system. Bamboo integrates very well with the other products by Atlassian and can be configured to auto-trigger plans if branches are pushed or merged within bitbucket and so on. It can also trigger deploy plans based on testing results and other pretty cool stuff.
However, the core patch review system itself is powered by Gerrit - and maintained very well by the TYPO3 server team. There is one hitch though, namely that Bamboo does not integrate with Gerrit out of the box. But luckily Bamboo speaks REST.
Thanks to Susi, we were able to introduce a small middleware to do the integration between review.typo3.org and Bamboo. A hook in Gerrit now triggers a script of the TYPO3 GmbH infrastructure called intercept whenever a core patch is pushed. Intercept translates the patch number and patch set into a Bamboo REST call. On completion of a build, Bamboo in turn sends a notification to intercept, which translates the test result into a comment on the corresponding Gerrit patch together with a +1 or -1 testing vote.
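The two translation steps can be sketched roughly like this. It is not the intercept code: the Bamboo host, the plan key, the variable names and the `Verified` label are placeholders, and the endpoint shapes (Bamboo's build queue resource, Gerrit's set-review resource) are assumptions based on the public REST documentation of both tools. The functions only build the requests, they don't send anything.

```python
import json
from urllib.parse import urlencode

BAMBOO_BASE = "https://bamboo.example.org"  # hypothetical host
GERRIT_BASE = "https://review.typo3.org"
PLAN_KEY = "CORE-EXAMPLE"                   # hypothetical plan key


def bamboo_trigger_request(change_number, patch_set):
    """Translate a pushed patch set into a request that queues a build.

    The variable names are invented; the plan would read them to fetch the patch.
    """
    query = urlencode({
        "bamboo.variable.changeNumber": change_number,
        "bamboo.variable.patchSet": patch_set,
    })
    return "POST", f"{BAMBOO_BASE}/rest/api/latest/queue/{PLAN_KEY}?{query}"


def gerrit_review_request(change_number, patch_set, build_successful, build_url):
    """Translate a finished build into a comment plus a +1 / -1 vote."""
    body = {
        "message": f"Build {'succeeded' if build_successful else 'failed'}: {build_url}",
        "labels": {"Verified": 1 if build_successful else -1},  # label name is an assumption
    }
    url = f"{GERRIT_BASE}/a/changes/{change_number}/revisions/{patch_set}/review"
    return "POST", url, json.dumps(body)


if __name__ == "__main__":
    # Change number, patch set and build URL are made-up sample values.
    print(bamboo_trigger_request(12345, 3))
    print(gerrit_review_request(12345, 3, False, f"{BAMBOO_BASE}/browse/{PLAN_KEY}-1"))
```

Keeping both directions as pure functions like this makes the middleware easy to test without a running Bamboo or Gerrit instance.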
Intercept does a couple of other interesting things too, but hey, this post is already packed with information, so we’ll look into the intercept details in another post.
This post pinpoints the challenges of structuring and maintaining a modern testing infrastructure for active projects and some options on how they can be solved. Up until now, the main core master testing branch has executed more than 11,000 builds. This multiplies to over half a million executed jobs.
The presented solution scales pretty well for us at the moment, but comes with a price tag attached: A lot of time and effort is needed to get things right and the infrastructure needs quite a bit of maintenance and work too. However, the core team members have the strong feeling that it's worth this time and effort and also, that this is one of the keys to a successful and strong software development platform.
Thanks for reading! We look forward to welcoming you back to our next post (in this series of eight!).