Coder Insights

Things I run across in my work and spare time that could be beneficial to the coder community

PHP 7 improved performance

I was participating in a webinar where Zeev explained the new implementation of PHP 7 and the improvements in memory management that give better performance. He pointed us to some blog posts by Nikita Popov (@nikita_ppv) explaining improvements to the underlying structure of values, arrays, objects and so on.

In this blog post I want to summarize some of the changes and explain the differences in implementation.

In PHP 5 the values were built from three main data structures (_zval_struct, _zvalue_value, _zval_gc_info), and the way the data was set up, all types took mostly the same amount of memory. All values were reference counted so garbage collection could be done, and the garbage collection information lived outside of the main value structure. All in all this meant a value took 24 bytes, and with the attached garbage collection information 32 bytes were allocated per value.

The new implementation uses separate structures for the more involved types, with reference counting and garbage collection included in those structs. The simple types, integer and double for example, use only the space for the value itself plus some type information. So an integer is stored using only 16 bytes of memory.

Another data structure that has improved significantly is the object. To store an object in PHP 5 you needed _zval_struct, _zvalue_value, _zval_gc_info, _zend_object_value, _zend_object_store_bucket and _zend_object. All these abstractions made the structure quite hard to handle. For example, to fetch one long property on a zval of type object you had to walk _zval_struct->value->obj->handle->bucket->obj->object->properties_table->value->lval.
This structure has too many abstractions, which in turn leads to code that is harder to read and probably more bugs. If we forget the strange structure and only focus on the memory footprint, a simple object uses 136 bytes of data and, for some odd reason, has multiple reference counters for garbage collection.

The new implementation in PHP 7 takes care of these issues in a rather elegant way. Reference counting for the more complex types has been moved into the value structures themselves, so the object has a reference counter struct, some handlers and the actual properties. The new lookup path is something like _zval_struct->value->obj->properties_table[0]->value->lval.
Moreover, a basic object takes only 40 bytes of memory.

Hash table
The old hash table implementation used a simple hash to find a bucket in which to store the data. On a collision you had to link the new entry to the last entry in the bucket. Each entry also had a previous link so removal could be performed efficiently. Lastly, these links were duplicated for garbage collection reasons. This structure ended up being quite heavy and memory intensive.

A quick calculation: a bucket of 16 bytes, a value of 24 bytes and four links at 32 bytes means that we use at least 72 bytes without any collisions. In practice 144 bytes are used, and this with only one value saved in the table.

PHP 7 has a completely reworked structure. We have some metadata and an ordered array of values. If the table is used as a plain array, that is all we need. If we need hashing functionality with keys, we use a separate index array.

To find a value in this hash array we first compute a hash value, then take it modulo the size of the array to get an index. If that index is already taken, collisions can be handled by linking to another entry: of the 16 bytes in the zvalue_value structure only 12 are used, but 16 still have to be allocated, so the last 4 can be used for other tasks, for instance a next pointer. The hash array entry references the ordered bucket so we can retrieve our value. This new structure uses only 36 bytes.

The examples above could in some cases be worst-case scenarios, but the fact remains that the new PHP 7 implementation is memory lean compared to PHP 5. And performance tests, in both benchmarks and WordPress tests, have seen times decrease by 60 %.

If this entry seemed interesting, please read the references. Nikita goes into a lot more detail about these changes.


Kdelibs build failure on 64 bit system

I just switched to using Linux at work. I’ve had Gentoo at home for years, but Windows has been part of the consulting branch of the work I’ve done so far.

The new machine I got was of course a 64-bit one, and I installed the right distribution. After a week of use I sadly set my USE flags a bit wrong, so an emerge --depclean removed a bunch of packages still in use, and kdelibs was one of them.

Because I had to do actual work, and the things requiring kdelibs were mainly chat and convenience features, it took me almost a week to find a solution.

The failure was hidden in the make scripts. The CMakefile failed on row 146, where it compiles support widgets using a make script called Makefile2. After I ran the script manually I found that it failed because the file in /usr/lib64/ was not a valid ELF binary.

I can understand that it didn’t work, since the file in /usr/lib64/ was a text file that links in the real library from /lib64/ by scripting.

/* GNU ld script
   Since Gentoo has critical dynamic libraries in /lib, and the static versions
   in /usr/lib, we need to have a "fake" dynamic lib in /usr/lib, otherwise we
   run into linking problems.  This "fake" dynamic lib is a linker script that
   redirects the linker to the real lib.  And yes, this works in the cross-
   compiling scenario as the sysroot-ed linker will prepend the real path.

   See bug for more info.  */
OUTPUT_FORMAT ( elf64-x86-64 )
GROUP ( /lib64/ )

The reasoning for making this file a script comes down to dynamic linking making files larger. In my case this wasn’t an issue, because kdelibs could be as large as it wanted to, as long as it compiled.

Building a local OpenStack Swift object store

I needed a solution for storing data, and I wanted to create an API so I could handle different storage back ends. One of the solutions I wanted to support was Swift, the object store in OpenStack. Creating a small local installation for testing wasn’t easy and required some research. The configuration options seem endless, so I’ll try to summarize what is required to get a minimal install up and running.

On Gentoo you need to fetch these packages and their dependencies. Your distribution probably has similar packages.

USE="account container object proxy"

emerge sys-cluster/swift
emerge sys-auth/keystone
emerge dev-python/python-keystoneclient

Keystone is used to handle authentication and user permissions. There are multiple solutions for this; I tried a bunch and found Keystone to be one of the simpler ones. To ease the use of Keystone I set an admin token that is later used to give me access to the Keystone API from the client. In the example Keystone configuration below I use MySQL as my database. The standard install uses SQLite, but I find a MySQL database a bit easier to handle if you want to look something up. Personal preference.


[DEFAULT]
admin_token = ADMIN

[auth]
token = keystone.auth.plugins.token.Token

[database]
connection = mysql://{username}:{password}@localhost/keystone

After this setup you need to start the MySQL server and create the keystone database. Give your user permissions, then you can start the keystone service.

Then we set up some environment variables so we don’t have to supply this information on every call to the client.

export OS_SERVICE_TOKEN=ADMIN
export OS_SERVICE_ENDPOINT=http://localhost:35357/v2.0

Next we need to run these commands to set up a tenant, user, role, service and endpoint.

keystone tenant-create --name {} [--description {}]
keystone user-create --name {} [--tenant {}] [--pass [{}]] [--email {}]
keystone role-create --name {}
keystone user-role-add --user {} --role {} [--tenant {}]
keystone service-create --type {} [--name {}] [--description {}]
keystone endpoint-create [--region {}] --service {} --publicurl {} [--adminurl {}] [--internalurl {}]

Below are some examples. In my case I create a test tenant, then a test user in that tenant. I create the admin role and connect the user to that role. Creating the object-store service is required as a service point for the API later; I call this service test as well. Lastly I create an endpoint with all URLs set to the same proxy service URL (http://localhost:8080/v1.0/ ending with the tenant id created earlier).

keystone tenant-create --name test
keystone user-create --name testuser --tenant ${tenant_id} --pass --email
keystone role-create --name admin
keystone user-role-add --user ${user_id} --role ${role_id} --tenant ${tenant_id}
keystone service-create --type object-store --name test
keystone endpoint-create --service test \
  --publicurl http://localhost:8080/v1.0/${tenant_id} \
  --internalurl http://localhost:8080/v1.0/${tenant_id} \
  --adminurl http://localhost:8080/v1.0/${tenant_id}

To secure our solution we need to add hash suffixes and prefixes in /etc/swift/swift.conf. This might not be strictly required, but it is good practice.

swift_hash_path_suffix = {SOME CRAZY SUFFIX}
swift_hash_path_prefix = {SOME CRAZY PREFIX}

Then we need to configure each server for the different functions. In my case I want them all to use the same device prefix, /srv/node. This is the path where we will mount the devices the servers use to store data. I also explicitly define the default ports; this is not required, but it makes it easier to keep track of the service ports.


# /etc/swift/object-server.conf
[DEFAULT]
bind_port = 6000
devices = /srv/node

# /etc/swift/container-server.conf
[DEFAULT]
bind_port = 6001
devices = /srv/node

# /etc/swift/account-server.conf
[DEFAULT]
bind_port = 6002
devices = /srv/node

Now we have all the storage servers set up, but we don’t have any data storage devices for them to read. Each server can handle multiple devices, but in my local test installation I only needed one, and since I didn’t have any spare disks lying around I created a virtual one.
First I dd a file of 1GB, partition it with one primary partition, connect a loop device to it and lastly mount that device on /srv/node/r0.

dd if=/dev/zero of=disk1.raw bs=512 count=2097152
parted disk1.raw mklabel msdos
fdisk disk1.raw                  # create one primary partition interactively
losetup -P /dev/loop0 disk1.raw
mkfs.xfs /dev/loop0p1            # the partition needs a filesystem before mounting; Swift recommends XFS
mount /dev/loop0p1 /srv/node/r0

Now we create the rings of devices that will handle data for objects, containers and accounts. The first command below creates a builder file with 2^18 partitions, 3 replicas and a setting that restricts movement of partitions to once an hour. This might be overkill, but it works for my example and doesn’t create overly large files. We then add our server, in my case with one port per server. You also partition your servers into regions and zones; I chose to put my server in region 1, zone 1. I supply the device r0 that I mounted above and lastly give the server a weight of 10. The weight is not important when you have one server, but becomes interesting if you want to favor a server in your cluster.

swift-ring-builder object.builder create 18 3 1
swift-ring-builder object.builder add --ip --port 6000 -r 1 -z 1 -d r0 -w 10
swift-ring-builder object.builder rebalance
swift-ring-builder container.builder create 18 3 1
swift-ring-builder container.builder add --ip --port 6001 -r 1 -z 1 -d r0 -w 10
swift-ring-builder container.builder rebalance
swift-ring-builder account.builder create 18 3 1
swift-ring-builder account.builder add --ip --port 6002 -r 1 -z 1 -d r0 -w 10
swift-ring-builder account.builder rebalance

When we ran the rebalance commands above we created the *.ring.gz files with the ring information the proxy server needs to handle requests, so now we configure it. We define the port from our endpoint above, 8080, and add keystoneauth to the main pipeline. To make life easy, I turn on account_autocreate so we get new accounts without any extra work on our part.


# /etc/swift/proxy-server.conf
[DEFAULT]
bind_port = 8080

[pipeline:main]
pipeline = healthcheck cache authtoken keystoneauth proxy-server

[app:proxy-server]
use = egg:swift#proxy
account_autocreate = true

Then we need to define the authtoken filter from the pipeline. I set the identity_uri to the Keystone server along with the admin token I defined earlier, so the proxy can connect to Keystone to retrieve authentication information.

[filter:authtoken]
paste.filter_factory = keystonemiddleware.auth_token:filter_factory
identity_uri = http://localhost:35357/
admin_token = ADMIN

Later in the file we add a keystoneauth filter as well, to handle incoming requests and generate the auth tokens from Keystone. In my example I allow admins to operate the cluster. The reseller prefix can be used to prefix your tenant id in the proxy endpoint, for example http://localhost:8080/v1.0/AUTH_${tenantId}.

[filter:keystoneauth]
paste.filter_factory = keystone.middleware.swift_auth:filter_factory
use = egg:swift#keystoneauth
reseller_prefix =
operator_roles = admin

A memcache configuration is also required so you don’t get unnecessary errors in the logs. Just using the sample one should be fine.

Now we should be able to start all the services: swift-account, swift-container, swift-object and swift-proxy.

To try my solution I use the php-opencloud library provided by Rackspace.
Below is some code that can be used to connect and test your setup.

$client = new OpenCloud\OpenStack('http://localhost:5000/v2.0/', array(
  'username' => '{keystone_username}',
  'password' => '{keystone_password}',
  'tenantId' => '{keystone_tenant_id}'
));

$service = $client->objectStoreService(
    '{keystone_service_name}',
    '{keystone_service_region (default "regionOne")}'
);

For more help using this API you can check the extensive php-opencloud documentation.


Debugging with breakpoint in any place

I was debugging some code the other day and ran into an annoyance. I had to set a breakpoint in my code and then figure out where a problem happened by stepping into methods, jumping multiple steps before actually reaching the code I wanted to test. In this case the code to debug was a third-party library, which made the problem even harder.

I’ve used JD-GUI for decompiling code for years now. It usually decompiles in a good manner and keeps line numbers intact, so I could follow the code in JD-GUI while I debugged in Eclipse. But reaching the code I wanted to look into often required me to follow hundreds of branches. Remembering these correctly to reach the code I wanted to test wasn’t feasible for prolonged testing.

The other day I right-clicked one of my breakpoints by mistake and saw the export/import breakpoints option. This gave me the idea to try to export one of my breakpoints, modify it and import it again.

The big surprise was that this was really easy to accomplish. Most of the data exported by the tooling isn’t actually needed to pinpoint the breakpoint; it mostly seems to be metadata for showing the breakpoint in the GUI. After testing multiple times inside Eclipse Luna I found that I could put a breakpoint into the code fetching appenders in log4j with only these lines.

<?xml version="1.0" encoding="UTF-8"?>
    <breakpoint enabled="false" persistant="true" registered="true">
        <resource path="/" type="8"/>
        <marker lineNumber="200" type="org.eclipse.jdt.debug.javaLineBreakpointMarker">
            <attrib name="org.eclipse.jdt.debug.core.typeName" value="org.apache.log4j.Category"/>
            <attrib name="message" value=""/>
        </marker>
    </breakpoint>

Above is the code for a breakpoint that starts out disabled. The parser requires a resource with a path and a type; in this case I chose type 8 because it doesn’t require any specific path.

Then we create a marker to tell the debugger where to stop execution. In my case I create a javaLineBreakpointMarker at line number 200 inside the class org.apache.log4j.Category. Lastly there is a message attribute, used to display the breakpoint in a more visually pleasing way.
The default display shows Classname[Linenumber] (/).

Platform as a service review

I thought it could be interesting to do a small review of the PaaS solutions I’ve come into contact with and my experience with them. To preface this post: I mainly use PaaS solutions for prototyping and small-scale development. None of the solutions I’ve built so far runs in production, but I want a platform that can scale. So I’ll focus on the free tiers in this review.


Appfog

First I tried Appfog; back then I hadn’t used PaaS solutions before. Appfog had what I wanted when it came to languages and was generous with resources and services. After I had used the service for about 6 months they changed their free tier to be really restrictive and to stop my application randomly. This prompted me to find a new solution.

An important feature of a PaaS solution is monitoring and logging. This was one of the things I never got to work in a reasonable way. There is no good way to read files on disk, so I tried logging via services.
First I tried the logentries service, but reading the log through that service was never satisfying during development: logs weren’t immediate, error messages were hidden and debugging was really taxing.
So I tried to install a log4j appender to route my messages to MongoDB, but I wasn’t able to read that externally in a good way. (Logging in, setting up a tunnel and reading the log through a buggy Eclipse plugin wasn’t a good solution.)

Appfog handles resources by giving you a set amount of CPU and memory, with sliders to determine how much the current installation should use. How much disk space is allocated isn’t clear.

Deployment is done by building your WAR and then uploading it using the Appfog console application.

Resources: 512MB memory
Languages: Java, PHP, Ruby, Python, Javascript (nodejs)
Databases: Mongo, Mysql, PostgreSQL


Openshift

This is the service I currently use. Their resource management is different: Openshift uses gears that have a specified amount of CPU, memory and disk depending on the size of the gear, and when you install an application you decide how many gears you want to use. In the free tier you get 3 small gears, which was a reasonable amount of resources for my project.

When it comes to logging, on Openshift you write your logs to disk and then use the rhc tools to read the files with common GNU commands directly. This is an immediate read and pretty straightforward. And if that wasn’t enough, you can log in to the application space via ssh and have full access to your home directory with your application, setting environment variables in both a non-persistent and a persistent way.

One downside to Openshift is that extra services are installed on the same gear as the application and share its resources if you use the standard setup. This didn’t work for me, because I required some data to be saved to disk and some to MongoDB, and the MongoDB installation took half of the gear resources, so I ran out of disk almost instantly.

My solution was to set up a do-it-yourself gear and add the database there. Using environment variables and ssh tunneling I then set up a connection from my application to the database gear. But the database gear was stopped when not “used”, so I had to add a keep-alive thread to my application, calling the web front of the database gear to keep it up during operation.

Of course one could fix this by buying gear power, but when you develop a prototype you don’t want a pay-per-month deal.

Lastly we have to talk about deployment. You connect a project to a git repository at Openshift, and deployment is done by pushing to that repository. The Openshift tooling also helps you deploy new applications and services directly from Eclipse.

Resources: 3 * (1GB disk, 512MB memory)
Languages: Java, PHP, Ruby, Python, Javascript (nodejs), Perl, Go
Databases: Mongo, Mysql, PostgreSQL


Bluemix

I’ve also looked into Bluemix. This is a new service from IBM that I feel is a bit in beta at the moment: the interface is sluggish and the business case is not directed at prototype development.

So, on to the functionality. Adding a new application, in my case a Liberty profile, was quite easy. After the installation the interface informed me that I could use the Cloud Foundry console tools that Appfog also uses, the Bluemix tools inside Eclipse, or a Git repository for my deployment.

I got the git repository to work without much hassle. Installing extra services like MySQL wasn’t hard either, but I noticed that MongoDB wasn’t labeled as a data management module; it was labeled as a web application service, so I missed it at first. And I couldn’t find any service for actual management of the data inside my databases.

The connection information uses the same standard as Appfog, the VCAP_SERVICES environment variable. So if you use Appfog for your application today, the migration should be quite seamless when it comes to database connections.

Another way to deploy your application is to use the Bluemix tools for Eclipse. If you have tried the Liberty profile server inside Eclipse, the Bluemix setup is similar: you install the tools via link or marketplace, then add a new server to your Servers tab.
Then you can use the server like any other server in the list: build your application and choose Run on Server.
This approach is a bit slower than the git option and works only for the Liberty profile. A plus is that you can use a local installation to test your application and then deploy it to Bluemix in the same way.

The default logging scenario is much like Appfog’s: you can use the Cloud Foundry tools to read some log files and browse your application log files.
If you use the Liberty profile, on the other hand, you have another option. The Bluemix tools for Eclipse will show your system console, Apache activity and server output in your server console when you choose to run your application on the Bluemix server. Some server exceptions are also fetched and displayed in a separate popup window.

Many things in Bluemix are still in beta (as of 2015-04-08).

Resources: Bluemix has no free tier, you pay per GB used every hour
(Cost of App = (App’s GB-hours – 375 Free GB-hours) x $0.07)
Languages: Java, PHP, Ruby, Python, Javascript (nodejs)
Databases: Cloudant, DB2, Mysql, PostgreSQL, Mongo

Add/remove to segment

Customer segments aren’t the most straightforward concept. They should be, and they work fine, but the back-end handling seems to be a set of different solutions put together to make the whole.

First off we have the customer segments, created as member groups. These groups have names and some small settings, but only a few are saved in the actual mbrgrp tables. The old functionality from the Accelerator is saved in mbrgrp tables like mbrgrpcond, where you store a configuration for the member group: a set of conditions that need to be met in order to put a member into the group. You also have the possibility to put someone explicitly into a member group, or exclude them from one. The inclusions and exclusions are stored in the mbrgrpmbr table.

The priority of these functions is: exclusions first, then inclusions, and lastly the condition configuration is evaluated.

These three functionalities work fine and have presentations inside Management Center, so you can see who is in a member group or which conditions need to be met for someone to be put into it.

Then we come to the disconnect. When you check the checkbox to add or remove customers to a segment using marketing activities, the condition configuration signals that the data for the customers inside the segment will be saved in a separate table.

select cast(conditions as varchar(20000)) from mbrgrpcond where mbrgrp_id = ?;
<?xml version="1.0" encoding="UTF-8"?>
    <variable name="marketingPopulates"/>
    <operator name="="/>
    <value data="T"/>

So when a customer is added this way, the system adds the PERSONALIZATIONID of the current session into the DMMBRGRPPZN table.

This means that in this setup the priority becomes: exclusions, then inclusions, and lastly the users added via a marketing activity.

Another thing to keep in mind is that the first tables all use the member ID as their reference for inclusion and exclusion, but DMMBRGRPPZN uses the PERSONALIZATIONID. This is because marketing activities can target users who aren’t logged in to the commerce site; a PERSONALIZATIONID is saved in the session and can therefore be targeted against guest users as well as registered customers. The PERSONALIZATIONID is saved in the USERS table for registered users, so you can find the connection to a member group’s registered members by using:

select * from USERS as u, DMMBRGRPPZN as mgp
where u.personalizationid = mgp.personalizationid
  and mgp.mbrgrp_id = ?;