Understanding and improving legacy code can be a challenge. Read by Refactoring can help you make sense of code you are trying to improve, but sometimes you need to look at things in a slightly different way to see a way forward.
The technique described here is a simple, effective way to help visualize what’s going on in a class or method you are trying to understand.
Jama’s continuous integration (CI) system flags flaky tests for us. These are tests that fail and then pass again seemingly randomly, usually infrequently, and with no apparent changes to the code they are testing. A test I wrote a few years ago started to act flaky, the system alerted me, and I looked into it.
The source of the random test failures was a third-party mocking framework for HTTP calls. Something about the interaction between it, my code, and our CI system was causing unexpected HTTP timeouts. The test was also interacting with the file system — a big problem because unit tests shouldn’t be doing this by definition, and the file access made the test unacceptably slow.
Like I said, I wrote this code a few years ago, and since have learned some things. One of those things is I’ve become more and more hesitant to reach for a mocking framework to help overcome unit testing challenges. I’ve found that the need for mocks often (not always!) is an indicator of code with a design problem. When looking at my old code with my now slightly more experienced eyes, I saw that the mock library I pulled in was simply covering for bad code. The ultimate solution for the flaky test was to refactor/rewrite the code so it was better structured and could be tested without use of a mock.
In order to eliminate the need for the mock library, I had to eliminate the dependency on the call to the HTTP client. To do this, I had to figure out a way to nudge the dependency up, out of the class so it could be injected at run time with a real network connection, or at test time with a test double. Looking at the method in question, it wasn’t obvious how that would be possible. The more I studied the code, the more I became convinced it was doing too much. A well designed class should have one reason to change. I suspected this class had several reasons to change – a clear violation of the Single Responsibility Principle.
I wanted a clearer way to see what the code was doing, and I recalled a blog post by Martin Fowler walking through the refactoring of a video service for YouTube. To illustrate what we has doing, he used different text colors in the code to show the different concerns the code was handling. I thought this colorization of the code would help me with my own refactoring challenge.
The first challenge is overcoming the font coloring imposed by your IDE. IDEs are syntax-aware, helpfully coloring your code so you can clearly see the different parts of your programming language. It’s a great way to develop, but this takes away a lot of control for coloring the code in a syntax-neutral manner like I wanted. Your IDE may be more flexible, but I had to drop my source file into EMACS so I had full control over the font. Other editors may work just as well for this purpose.
I identified four responsibilities this single method was performing. I kept the categories broad, knowing that I could further separate concerns after breaking up the large method. The HTTP client, buried in the middle, seems to be not extractable.
The color key for the responsibilities is:
- Setting up file/http params for the send
- Setting up and using HTTP client and handling the response/errors
- Recording status and history
- Encoding a customer identifier
Identifying the separate concerns of the code is usually easier than disentangling them. Examine the code line by line and ask why it’s there – what is the larger task it is performing – and then group it with code concerned with the same task. You’ll usually find colors in blocks (as in the image above), as lines of code accomplishing the same task are necessarily next to each other.
Other clues to identifying concerns can be found by tracing the method parameters through the method. Methods with several parameters can be a sign of too many responsibilities. Each parameter is passed in for a different reason, and each of these reasons may be a separate concern.
Now that I could clearly see the responsibilities, I went back to the IDE and used my refactoring tools to first extract each color to it’s own method, and then, extract those methods to their own, cohesive classes. Luckily I had a unit test (albeit flakey) to guide me during the refactorings, and I created new tests as new objects became apparent.
This isn’t a post on refactoring techniques (see Martin Fowler for that), so I won’t go into the details of my changes, but the final version reduced the method to a single responsibility and 3 lines of code:
This code is entirely concerned with sending the specified file to an http client and asking a service to update the history of the file based on the returned status. The other 37 lines of code didn’t necessarily get deleted, but what is left is grouped together into smaller objects that do one thing and do that one thing well. The target class now collaborates with the other objects to manage the sending of a file over an HTTP client. Each responsibility of this task is now separate and easily tested in isolation. The HTTP client causing the testing grief has now been isolated and is injected into the class that needs it, making it easy to inject a simple test double for unit testing. No more flakiness.
Looking at a big block of code can be intimidating and confusing; it can be hard to figure out where to start. Stepping out of your IDE and leveraging other ways of looking at code can provide insights on what it’s doing and how it can be improved. Sometimes all we need is a slight change in perspective, and a little bit of color.