Developer Insights

Scaling Jama

The approach that has helped us balance the need to scale with the need to innovate and remain a leader in the marketplace

During its early years, Jama found sales success with a land-and-expand strategy: we would sell to a small team in a large company, and their implementation success would lead to expansion opportunities in other divisions of that company. As a result, our earliest Jama deployments tended to be small, both in number of users and amount of data, so application scalability wasn’t our biggest concern. As a bootstrapped start-up with a small engineering staff, we could instead focus our resources on adding new features to the product.

At that time, the Jama application was built as a single artifact – a WAR file that a customer could deploy into any Java application server. Jama has always been available as both an on-premises install and as a Jama-hosted application. The same WAR file was also deployed into our hosted environment, which meant our hosted and on-premises deployments shared the exact same codebase and build process. This simplicity also allowed us to focus on feature development and improvements to the application, without dedicating a team to building and deploying it. While this helped us to innovate and evolve the application faster, we knew that this architecture would eventually reach its limits.

With this architecture, the only way to scale the Jama application was up or down, commonly known as vertical scaling. Customers could dedicate more (or faster) CPUs, faster disk drives, or more memory to their application server, and each of these changes would usually provide an incremental improvement in scale. The only constraints were those of the physical hardware Jama was deployed on. While this provided some relief for larger customers, we knew that it was a temporary measure: as customers added more users and more data, the application would eventually require more resources than a single machine could provide. Splitting a large Jama installation across multiple servers and databases was an option, albeit one with adverse effects. In addition to the overhead of maintaining multiple installations, a customer’s data would have to be separated into several databases. A key benefit of Jama is that users can collaborate on all their data across multiple projects in one place, and physically separating the data would negate that benefit.

Some time ago we dedicated a team of developers to scaling Jama horizontally. This team’s goal was to evolve the Jama architecture into one where we could scale a Jama installation by simply adding more hardware. By then we already had several hundred on-premises installations, and several hundred more clients in our hosted multi-tenant environment. Even though we were a SaaS company, we continued to support our on-premises customers, so we needed a horizontal scaling solution for both deployment scenarios. We also ruled out building a new application from the ground up, which would have forced customers to migrate to a new platform, so we needed to incrementally evolve the existing architecture. And we had to continue improving the product and adding new features, so the architecture evolution had to happen in parallel with uninterrupted feature development.

In order to evolve our monolithic application to be horizontally scalable, we knew we had to remove the shared state from inside the monolith. There were many forms of shared state in our application: search indexes, caches, message queues, the file system, and so on. Our first step was to move that shared state out of the monolith into separate services. What was left of the monolith (the ‘Jama core’) would rely on these new services, each of which could be scaled independently. Multiple Jama cores could then use the same shared services clusters, and we could add or remove Jama cores as needed. This approach enabled us to incrementally evolve our architecture: as soon as we extract a service from the Jama core, we can deploy it to our hosted environment, without requiring a ‘big bang’ deployment of all the services at the end. To learn how we deploy the same code and features to our on-premises customers, see this blog post on The Long Road To Docker.
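
As a minimal sketch of that first step, here is what one form of shared state (the cache) might look like once it sits behind an interface with both an in-process and a shared-cluster implementation. All of the names below are hypothetical, not Jama’s actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// One form of shared state (a cache) behind an interface, so the Jama core
// does not care where the entries actually live.
interface CacheStore {
    void put(String key, Object value);
    Object get(String key);
}

// The old model: state lives inside the monolith, so it cannot be shared
// between multiple Jama cores.
class LocalCacheStore implements CacheStore {
    private final Map<String, Object> entries = new ConcurrentHashMap<>();
    public void put(String key, Object value) { entries.put(key, value); }
    public Object get(String key) { return entries.get(key); }
}

// Hypothetical client for a shared cache cluster.
interface CacheClusterClient {
    void set(String key, Object value);
    Object fetch(String key);
}

// The new model: state lives in an independently scalable cache service,
// so any number of Jama cores can share it.
class RemoteCacheStore implements CacheStore {
    private final CacheClusterClient client;
    RemoteCacheStore(CacheClusterClient client) { this.client = client; }
    public void put(String key, Object value) { client.set(key, value); }
    public Object get(String key) { return client.fetch(key); }
}
```

The Jama core only ever talks to CacheStore, so swapping the in-process implementation for the clustered one becomes a configuration change rather than a rewrite; the same shape applies to search indexes, message queues, and files.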

Once we finish moving state out into these foundational services (e.g. search, cache, filesystem), we can build new Jama features as separate services instead of putting them inside the Jama core. Building and testing a new feature in its own service is simpler: the feature can be scaled independently, teams are decoupled from one another, and the feature can evolve independently of the Jama core and the other services. We can also extract existing functionality from the Jama core into separate services, pulling it out of the monolith piece by piece. As with any codebase that is several years old, the oldest parts become more brittle and difficult to change over time, so moving them into standalone services will make it easier to improve and scale those features independently.

The first foundational service we tackled was search. We were using the Lucene search library, writing search indexes to the filesystem and maintaining them with worker threads inside the Jama application. This tightly coupled our search capability to the Jama core it ran in. We took the following steps to extract search out of the monolith into a service:

  1. Create a Searcher interface in the Jama application, along with a LocalSearcher implementation of it, for every place we used Lucene.
  2. Implement a search service as a thin wrapper around Elasticsearch, a distributed search server.
  3. Create a RemoteSearcher implementation of the Searcher interface, which calls the search service over a REST API (a condensed sketch of these pieces follows this list).
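
Condensed into Java, those three steps might produce something like the sketch below. Only the Searcher, LocalSearcher, and RemoteSearcher names come from the steps above; the method shapes, the placeholder types, and the SearchServiceClient are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder types so the sketch is self-contained; the real signatures
// are internal to Jama and assumed here.
record SearchQuery(String text) {}
record SearchResult(String itemId, float score) {}

// Step 1: a single interface for every place the application searches.
interface Searcher {
    List<SearchResult> search(SearchQuery query);
}

// Step 1 (continued): the existing Lucene code, moved behind the interface.
class LocalSearcher implements Searcher {
    @Override
    public List<SearchResult> search(SearchQuery query) {
        // In the real application this queried the Lucene indexes on the
        // local filesystem; stubbed out here.
        return new ArrayList<>();
    }
}

// Hypothetical thin REST client for the search service of step 2,
// which wraps Elasticsearch.
interface SearchServiceClient {
    List<SearchResult> search(SearchQuery query);
}

// Step 3: the same interface, backed by the search service's REST API.
class RemoteSearcher implements Searcher {
    private final SearchServiceClient client;

    RemoteSearcher(SearchServiceClient client) {
        this.client = client;
    }

    @Override
    public List<SearchResult> search(SearchQuery query) {
        return client.search(query); // HTTP call to the search service
    }
}
```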

Based on configuration settings, the Searcher interface would proxy to either the local or the remote implementation. This allowed us to continue using (and deploying) the Lucene-based local search until we completed the remote implementation. During testing of the remote implementation, the local implementation was also queried automatically, and any discrepancies between the two sets of results were reported as errors. This approach told us when we had achieved parity between the two implementations. Once we completed the remote implementation and switched over to it, we were able to remove the local implementation and all Lucene dependencies from our codebase. The next service we’re getting ready to deploy is a cache service, and work on distributed messaging has also begun.
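
A proxy with that behavior could look something like the following sketch (reusing the types from the sketch above; the mode names and the configuration mechanism are assumptions):

```java
import java.util.List;

// Hypothetical configuration-driven proxy. The rest of the application
// only ever sees "Searcher"; a setting read at startup picks the mode.
class ProxySearcher implements Searcher {

    enum Mode { LOCAL, VERIFY, REMOTE }

    private final Searcher local;
    private final Searcher remote;
    private final Mode mode;

    ProxySearcher(Searcher local, Searcher remote, Mode mode) {
        this.local = local;
        this.remote = remote;
        this.mode = mode;
    }

    @Override
    public List<SearchResult> search(SearchQuery query) {
        switch (mode) {
            case LOCAL:
                return local.search(query);
            case REMOTE:
                return remote.search(query);
            default: // VERIFY: serve local results, but compare against remote
                List<SearchResult> localResults = local.search(query);
                List<SearchResult> remoteResults = remote.search(query);
                if (!localResults.equals(remoteResults)) {
                    // Any discrepancy is reported as an error, so parity
                    // gaps between the implementations surface during testing.
                    System.err.println("Search discrepancy for query: " + query);
                }
                return localResults;
        }
    }
}
```

In this sketch, moving from VERIFY to REMOTE is the ‘switch over’ described above; once no discrepancies are reported, LocalSearcher and the Lucene dependencies can be deleted.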

New feature development at this time is still happening inside the monolith, mainly because there are still foundational services to be extracted. Yet we are deliberately designing new features to be easily extractable into their own services in the future. We implement ingress and egress interfaces to control the interactions between a new feature and the Jama core. These interfaces make the coupling between the new feature and the rest of the application explicit, and that coupling is exactly what we will have to sever when we convert the feature to a service, so a design goal is to keep those interfaces as small as possible.
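
As a concrete illustration (the feature, interface, and method names below are invented, not Jama’s actual code), a new feature’s entire contact with the core might be confined to a pair of interfaces like these:

```java
import java.util.List;

// Hypothetical ingress interface: the only operations the Jama core may
// invoke on the new feature.
interface ReviewFeatureIngress {
    void startReview(String itemId, List<String> reviewerIds);
}

// Hypothetical egress interface: the only operations the new feature may
// invoke on the Jama core.
interface ReviewFeatureEgress {
    String loadItemContent(String itemId);
    void notifyUsers(List<String> userIds, String message);
}
```

When the feature is eventually extracted, these two small interfaces become the surface that turns into its remote API, so every method added to them is coupling that will one day have to cross the network.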

This work happens in parallel with our regular Jama development work – while one or two teams work on services, the other teams continue to add new features and make other improvements to Jama – and we have continued to deploy new releases at our monthly cadence. By incrementally adding services we avoided forking our codebase and diverging into two different platforms. We remain responsive to the marketplace by adding new features while moving towards horizontal scalability, and we continue to deliver value to our customers with monthly releases. This approach to scaling Jama has helped us balance the need to scale our application with the need to innovate and remain a leader in the marketplace.

  • Sean Adkinson

Great write-up Ranjeewa. I love how simple the migration seems. Just create an interface, implement it a couple of times, and pull the switch when it is ready. I’m sure the implementation had many more gotchas, however :-).

I’m curious if you were able to create and use `Searcher` without changing any client code, or did you need to change the code using the `Searcher` to match what you knew you were going to need for Elasticsearch down the road? With code migrations like this, even just extracting the interface, I find the latter is more common, as code is sometimes so tightly bound that it requires refactoring just to find the proper methods for the abstraction.

    Hope you all are doing great!

    • Hi Sean, nice to hear from you :-). Yes, getting the right level of abstraction in the common interface did take several iterations, during which client code could also change. ~Ranjeewa