CVS Analysis
As noted in an earlier post, I am currently reading "Notes on the Synthesis of Form" by Christopher Alexander. In chapters 2 and 3, Alexander explains how the alignment between form and context in a design ensemble can be expressed as a system of binary variables. Each variable represents a potential misfit of the ensemble, and its value indicates whether the misfit is occurring. Alexander illustrates such a system with an example from the book "Design for a Brain" by W. Ross Ashby, one of the founding fathers of cybernetics. The example is about a system of lightbulbs. Each lightbulb can be either on or off; if a lightbulb is on, there is a chance that it will go off within the next second. The lightbulbs are connected, such that the state of any lightbulb depends on the states of the lightbulbs it is connected to: if a lightbulb is off and it is connected to at least one lightbulb that is on, there is a chance that it will also go on within the next second.
Alexander then analyses how different configurations of the connections between lightbulbs cause the system to react to change in different ways. From the example, Alexander concludes that the internal decomposition of a system is reflected in the way the system reacts to change.
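To get a feel for the lightbulb system, here is a minimal Python sketch of it. It is my own illustration, not code from Alexander or Ashby. The probabilities p_off and p_on (both 0.5 by default), the function names and the two example wirings are assumptions made for the sake of the example. The sketch starts with all lightbulbs on and counts the seconds until every lightbulb is off, which is one simple way to compare how differently connected systems settle.

```python
import random


def step(state, neighbors, rng, p_off=0.5, p_on=0.5):
    """Advance the system by one second and return the new state.

    state[i] is True if lightbulb i is on; neighbors[i] lists the
    lightbulbs that lightbulb i is connected to. The probabilities
    p_off and p_on are assumed values for this illustration.
    """
    new_state = []
    for i, is_on in enumerate(state):
        if is_on:
            # A lit lightbulb goes off with probability p_off.
            new_state.append(rng.random() >= p_off)
        else:
            # A dark lightbulb can only light up if a connected one is lit.
            has_lit_neighbor = any(state[j] for j in neighbors[i])
            new_state.append(has_lit_neighbor and rng.random() < p_on)
    return new_state


def seconds_until_dark(neighbors, rng, max_steps=100_000):
    """Start with every lightbulb on; count seconds until all are off."""
    state = [True] * len(neighbors)
    for second in range(max_steps):
        if not any(state):
            return second
        state = step(state, neighbors, rng)
    return max_steps


def isolated(n):
    """No connections: every lightbulb is on its own."""
    return [[] for _ in range(n)]


def fully_connected(n):
    """Every lightbulb is connected to every other lightbulb."""
    return [[j for j in range(n) if j != i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(2004)
    for name, wiring in [("isolated", isolated(10)),
                         ("fully connected", fully_connected(10))]:
        print(name, seconds_until_dark(wiring, rng))
```

With no connections, a lightbulb that has gone off can never come back on, so the system should settle quickly. With every lightbulb connected to every other, a single lit lightbulb can keep relighting the rest, so I would expect it to take much longer on average. Wirings in between, such as small densely connected clusters with few links between them, can be tried by passing a different neighbors list.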
Alexander's description made me think of the object-oriented design principle concerning decomposition: "Low Coupling & High Cohesion". I knew that Alexander had something to do with this, that much seems to be established. But I started to think about ways to determine whether coupling and cohesion are appropriately designed in a software system. I mean, clearly it is something you strive for as a designer, but do you succeed? Of course, this is something you can get a feel for as a developer. An unstable code base, difficult maintenance, and bug fixes in one part of the system that make bugs appear in other parts are all indicators of bad decomposition.
But what if it were possible to actually measure coupling and cohesion? My idea was to analyze the information in a Concurrent Versions System (CVS) repository for a given software system. If one were to correlate update information with information about packages and references across package boundaries, it should be possible to get some measure of the decomposition of the system. E.g. for a system with *bad* decomposition, you would expect that an update to any one component (fixing a bug, introducing a new feature etc.) would "ripple" to components in other parts of the system, and thus result in a swarm of updates across the system.
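The idea can be sketched in a few lines of Python. This is only a rough illustration under some loud assumptions. CVS records revisions per file, so I assume the file revisions have already been grouped into change sets (lists of file paths updated together). I also assume that a file's package is simply its directory. The function names package_of, cross_package_cochanges and ripple_ratio are hypothetical, and the sketch ignores references across package boundaries entirely.

```python
from collections import Counter
from itertools import combinations
from pathlib import PurePosixPath


def package_of(path):
    """Assumption: a file's package is the directory that contains it."""
    return str(PurePosixPath(path).parent)


def cross_package_cochanges(change_sets):
    """Count how often each pair of packages is updated in the same change set."""
    pair_counts = Counter()
    for files in change_sets:
        packages = sorted({package_of(f) for f in files})
        for pair in combinations(packages, 2):
            pair_counts[pair] += 1
    return pair_counts


def ripple_ratio(change_sets):
    """Fraction of change sets that touch more than one package."""
    if not change_sets:
        return 0.0
    crossing = sum(
        1 for files in change_sets
        if len({package_of(f) for f in files}) > 1
    )
    return crossing / len(change_sets)


if __name__ == "__main__":
    # Made-up change sets, purely for illustration.
    example = [
        ["ui/Window.java", "ui/Button.java"],
        ["ui/Window.java", "db/Store.java"],
        ["db/Store.java", "db/Query.java", "ui/Window.java"],
        ["net/Socket.java"],
    ]
    print(ripple_ratio(example))
    print(cross_package_cochanges(example).most_common())
```

A high ripple ratio, or a pair of packages that keeps showing up together, would hint that updates do not stay inside package boundaries. It would only be a hint, though: a real analysis would also have to reconstruct the change sets from the CVS log and decide what counts as a component.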
I decided to do a little checking, to see if something like this has already been developed. I first came across The System and Communication Group, GSyC, at the Universidad Rey Juan Carlos in Spain. GSyC does conduct research based on analysis of CVS repositories. Their approach, however, is based on the application of social network analysis [1] and therefore seems to be primarily concerned with the relationship between developers and code structure - not the structure of the code by itself.
I then found my way to Professor Dr. Harald Gall and his research group at the Department of Informatics at the University of Zurich in Switzerland. Dr. Gall and his group appear to have developed some analysis tools that are capable of performing exactly the kind of measurements that I was thinking of: through the combination of three different analyses, their approach lays bare the logical couplings of the system [2]. If you compare figures 1 and 2 from [2] with the illustrations of force interdependencies in "Notes on…" (p. 43), the resemblance is striking. By searching for work that cites [2], I was also able to locate the LOOSE research group at Polytechnical University of Timisoara, Romania. The LOOSE group appears to conduct research of a similar kind [3].
[1] L. Lopez-Fernandez, G. Robles and J. M. Gonzalez-Barahona. Applying Social Network Analysis to the Information in CVS Repositories. In Proceedings of the Mining Software Repositories Workshop. 26th International Conference on Software Engineering, Edinburgh, Scotland, 2004.
[2] H. Gall, M. Jazayeri and J. Krajewski. CVS Release History Data for Detecting Logical Couplings. In Proceedings of the International Workshop on Principles of Software Evolution (IWPSE), Helsinki, Finland, IEEE CS Press, pp. 13-23, September 2003.
[3] T. Girba, S. Ducasse, R. Marinescu and D. Ratiu. Identifying Entities That Change Together. In Proceedings of 9th IEEE Workshop on Empirical Studies of Software Maintenance (WESS 2004), Chicago, USA, 2004.