
Some thoughts on performance & time synchronization.

In preparation for the 2015 ACT-R workshop, I've been working with my boss on an interesting model. What makes it interesting is that it runs full throttle for about 30 minutes of interactions, followed by a day of rest, for almost a full year. It's a model that does quite a bit of work, has massive numbers of references, and is very long-lived. My boss built the initial model; I then did an almost 1:1 conversion to jACT-R. It's provided some interesting comparisons, insights, and new optimizations.

Performance

For all the obvious reasons, I'd like jACT-R to be performant and scale flexibly. The distributed model with jACT-R and CommonReality has been very profitable in that respect. However, we've recently reached a bottleneck when it comes to running multiple models within the runtime.

Overhead

The first approach was to cut back on the overhead. jACT-R relies on many layers of abstraction, so in theory it's just a matter of choosing the right layer to split at. Specifically, I created the EmbedConnector[javadoc pending], similar to the LocalConnector, which provides access to a ThinAgent, a minimal-overhead interface to the model's perception.

When running in this configuration (supported through the IDE or API), the modeler/developer must manually create, add, update, and remove the percepts pushed to the model. This is much closer in design to what canonical ACT-R uses (i.e., the device interface).
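As a rough illustration of what manual percept management entails, here is a minimal sketch. All the names below (PerceptBuffer, its methods) are hypothetical stand-ins for illustration only, not the actual EmbedConnector/ThinAgent API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of manual percept management in the EmbedConnector
// style; these names are illustrative, not the real jACT-R API.
class PerceptBuffer {
  private final Map<String, Map<String, Object>> percepts = new HashMap<>();

  /** Push a new percept (e.g., a visual object) to the model. */
  public void add(String id, Map<String, Object> features) {
    percepts.put(id, new HashMap<>(features));
  }

  /** Update a feature of an existing percept in place. */
  public void update(String id, String feature, Object value) {
    percepts.get(id).put(feature, value);
  }

  /** Remove a percept when it leaves the model's perception. */
  public void remove(String id) {
    percepts.remove(id);
  }

  public int size() {
    return percepts.size();
  }
}
```

The burden of keeping these percepts current falls entirely on the modeler, which is the same trade-off the canonical device interface makes: less machinery between you and the model, at the cost of doing the bookkeeping yourself.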

This change, while less reusable, dramatically improves performance. That's hardly surprising, given that CommonReality proper requires participant synchronization and messaging. It literally doubled the runtime throughput: I could run two models in the EmbedConnector runtime in the same amount of time as a single model with the CommonRealityConnector. But that was it. The system could definitely do better; I still had six cores free but couldn't utilize them fully.

It's About Time

The problem with maximizing core use comes down to a standard requirement of distributed simulations: time synchronization. You must have some measure of time synchronization in a distributed simulation for fairness and consistency. It's even more important for scientific simulations where interactions are inherently temporal: if you've got two models that are supposed to be interacting, you want their clocks to be synchronized. If the models could run in lock-step (with a constant time increment), this wouldn't be a performance issue, but they don't. Differences in behavior and subsymbolic computation mean each model is ready to do work at irregular intervals, and those intervals rarely line up, so you infrequently get both models running at the same time.
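The synchronization scheme can be sketched in a few lines. This is a minimal, hypothetical illustration (the names are not CommonReality's actual API): the shared clock only advances to the minimum time requested across all participants, so a fast model always blocks waiting for the slowest one, and cores sit idle.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of conservative time synchronization. Class and method
// names are hypothetical, not the CommonReality API: the global clock only
// advances to the minimum requested time across participants, so a fast
// model always waits for the slowest one.
class SharedClock {
  private final Map<String, Long> requests = new HashMap<>();
  private long globalTimeMs = 0;

  public synchronized void register(String id) {
    requests.put(id, 0L);
  }

  /** Block until the shared clock reaches requestedMs; returns the clock. */
  public synchronized long waitFor(String id, long requestedMs)
      throws InterruptedException {
    requests.put(id, requestedMs);
    long min = requests.values().stream()
        .mapToLong(Long::longValue).min().orElse(0);
    if (min > globalTimeMs) {
      globalTimeMs = min; // the slowest participant has caught up
      notifyAll();
    }
    while (globalTimeMs < requestedMs)
      wait();
    return globalTimeMs;
  }
}

public class SyncDemo {
  public static void main(String[] args) throws Exception {
    SharedClock clock = new SharedClock();
    clock.register("modelA");
    clock.register("modelB");

    // modelA wants time at regular 50ms steps; modelB at irregular ones.
    Thread a = run(clock, "modelA", new long[] {50, 100, 150, 200});
    Thread b = run(clock, "modelB", new long[] {17, 88, 123, 200});
    a.join();
    b.join();
    System.out.println("both models reached t=200ms");
  }

  private static Thread run(SharedClock clock, String id, long[] steps) {
    Thread t = new Thread(() -> {
      try {
        for (long step : steps)
          clock.waitFor(id, step);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.start();
    return t;
  }
}
```

With irregular step patterns like the two above, each model spends most of its wall-clock time parked in `wait()` rather than computing, which is exactly why adding more models to a shared clock doesn't saturate more cores.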

The solution is to permit independent clocks for models in the same runtime. Assuming your models do not interact, and your experiment can be run safely across multiple models (some new tools are coming to make this easier), you can run in this manner. It requires the use of a hidden system property (or API calls), and is intentionally obscure to prevent people from jumping in too soon. If your experiment is not thread-safe and multi-model compatible, you're going to get deadlocks left and right (see the new DeadlockDetector instrument). How well does it perform? You're only limited by the number of cores you have. On my MBP2009, i5 quad-core (dual thread), a single model running full throttle takes 110% CPU (~1.1 cores) and runs to completion in 4 minutes (2.5 months/minute). Running seven models takes 790% CPU (out of 800%) and completes in 6.5 minutes (~10 months/minute).
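That works out to roughly a 4x throughput gain: one model simulates about 10 months in 4 minutes (2.5 months/minute), while seven independent models simulate about 70 months in 6.5 minutes (~10.8 months/minute). Conceptually, an independent clock is trivial; here's a hypothetical sketch (again, illustrative names, not the actual API). With no other participants to coordinate with, a request to advance time never blocks:

```java
// Hypothetical sketch of a per-model independent clock. With no other
// participants to wait on, advancing never blocks, so each model is
// limited only by the core it runs on.
class IndependentClock {
  private long localTimeMs = 0;

  /** Advance immediately; time never moves backwards. */
  public long waitFor(long requestedMs) {
    localTimeMs = Math.max(localTimeMs, requestedMs);
    return localTimeMs;
  }
}
```

Every `waitFor` that would have parked a thread under the shared clock now returns immediately, which is why seven models can keep seven-plus hardware threads busy.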

Ramifications

It is now possible to do ridiculous data collection/generation by combining this with iterative runs.
The EmbedConnector and independent clocks also open the door to easier embedding within other applications. If you embed using the RuntimeBuilder and EmbedTools, just supply the EmbedConnector and you're good to go.

Final Thoughts

If you look at the source code for jACT-R, it can be intimidating. There are hundreds more classes than you'd expect, and it can be hazy how the pieces come together. That's the fault of my documentation. But because of these design decisions, targeted and layered optimizations are not only possible, but fairly easy. Modelers can define how they want their model to be run without changing the model or the experiments. Let me be clear: the model did not change at all through this development. Nor did the experiment (as it was already designed to handle multiple models safely). The runtime is configured independently of the model, and how it is configured has no impact on the model's behavior or predictions, just its speed. And that's kind of awesome, if I do say so myself.

p.s. Most of the major 2.0 changes have happened. What remains is mostly updating code from older idioms to newer ones. There will be a release in advance of the ACT-R workshop.