On mutable shared state

Problems with mutable shared state are often discussed in the context of concurrent programming. However, those lucky enough to avoid concurrent programming can still get into trouble because of mutable shared state. The first pitfall is global variables. Everyone knows global variables are bad. Although there is no absolute truth or principle in software design, this statement is almost always true. But why are they bad? The full list of reasons can be looked up here, but in a few words: they hinder code understandability and can cause nasty bugs. Understanding a function that uses a global variable is never easy, because you cannot reason about it from its signature alone. You can never fully predict its behavior, because you can never be sure what state the global variable will be in during the function’s execution. For the same reason, it is easy to make a mistake while coding.
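
To make this concrete, here is a minimal C# sketch. The names GlobalSettings and PriceCalculator and the tax example are hypothetical, my own illustration rather than code from a real project. GrossPrice looks like a pure function of its argument, but its result silently depends on a public static field that any code in the program can change:

```csharp
// Hypothetical global mutable state: a public static field reachable from anywhere.
public static class GlobalSettings
{
    public static decimal TaxRate = 0.20m;
}

public static class PriceCalculator
{
    // The signature says "net price in, gross price out",
    // but the result also depends on GlobalSettings.TaxRate,
    // which the signature never mentions.
    public static decimal GrossPrice(decimal net) => net * (1 + GlobalSettings.TaxRate);
}
```

Two calls with the same argument can return different results if some unrelated code changes GlobalSettings.TaxRate in between, and nothing in the signature warns the caller about it.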

So, if you don’t use global variables and don’t do concurrent programming, are you safe then? Not really. First, sometimes you can’t avoid global variables, for example, if you program in JavaScript (at least in classic scripts, without ES2015 modules). The problem with JavaScript is that it requires global variables: it does not have a linker, and all compilation units are loaded into a common global object. Second, even if you are doing ordinary object-oriented programming, you can get into trouble with your objects’ state. Take a close look at this design:

using System;
using System.Collections.Generic;
using System.Linq;

public class Student
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public class AcademicGroup
{
    private readonly List<Student> _students;

    public AcademicGroup(List<Student> students)
    {
        if (HaveUniqueNames(students))
            _students = students;
        else
            throw new ArgumentException(
                "Two or more students with the same names are not allowed!");
    }

    public bool IsValid()
    {
        return HaveUniqueNames(_students);
    }

    private bool HaveUniqueNames(List<Student> students)
    {
        return students
            .Select(s => s.Name)
            .Distinct()
            .Count() == students.Count;
    }
}

AcademicGroup is a perfectly encapsulated little class that holds a list of students; for the class to be valid, all those students must have unique names. To ensure this, we have a validation check in the constructor. The internal data is private (and even read-only!), so it seems that nobody can break a successfully constructed object of this class. However, look at this usage of the class:

class Program
{
    static void Main()
    {
        var students = new List<Student>
            {
                new Student { Name = "Arnold Schwarzenegger", Age = 18},
                new Student { Name = "Britney Spears", Age = 20},
                new Student { Name = "Barack Obama", Age = 19}
            };

        var academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        var anotherBritney = new Student { Name = "Britney Spears", Age = 30 };
        students.Add(anotherBritney);
        Console.WriteLine(academicGroup.IsValid()); //False

        students.Remove(anotherBritney);
        academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        students.FirstOrDefault(s => s.Name == "Britney Spears").Name = "Barack Obama";
        Console.WriteLine(academicGroup.IsValid()); //False
    }
}

Nice, isn’t it? It is an especially nice example for quizzing students about encapsulation. The internal state of the academic group has been broken after its construction. The constructor stored a reference to the caller’s list, and readonly only prevents the field from being reassigned; it does not stop anyone who holds the same list (or the same Student objects) from changing them. The line students.Add(anotherBritney) could have been executed anywhere in the system, and you could spend many hours searching for the reason your object ended up in an invalid state, perhaps even on the customer’s side. The same issue arises when we use the singleton pattern to simulate a global variable.

The problem can be formulated as a combination of 3 conditions: 1) there is state shared between two modules (Program and AcademicGroup share the list of students); 2) one module changes the shared state without notifying the other modules that hold references to it (after creating an AcademicGroup object, Program changes the list of students and does not notify the newly created object); 3) at least one of the modules sharing the state makes assumptions about it, i.e. its correctness depends on the shared state (AcademicGroup assumes that all students in the list have unique names).

To avoid this problem, you must do one of the following: 1) make the shared state immutable; 2) notify all other modules about changes to the shared state they depend on; 3) make sure no module makes assumptions about the shared state.

In my example, I can’t avoid making assumptions about the data passed into AcademicGroup. Therefore, I can try the notification approach by adding another method to AcademicGroup:

public class AcademicGroup
{
    ...
    // Program must call this after every change it makes to the shared list
    public void NotifyAboutStudentsChange()
    {
        if (!IsValid())
            throw new InvalidOperationException(
                "Two or more students with the same names are not allowed!");
    }
    ...
}

And then notify every AcademicGroup object after each change, like this:

class Program
{
    static void Main()
    {
        var students = new List<Student>
            {
                new Student { Name = "Arnold Schwarzenegger", Age = 18},
                new Student { Name = "Britney Spears", Age = 20},
                new Student { Name = "Barack Obama", Age = 19}
            };

        var academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        var anotherBritney = new Student { Name = "Britney Spears", Age = 30 };
        students.Add(anotherBritney);
        academicGroup.NotifyAboutStudentsChange();  //Throws
        Console.WriteLine(academicGroup.IsValid()); //This line never executes

        students.Remove(anotherBritney);
        academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        students.FirstOrDefault(s => s.Name == "Britney Spears").Name = "Barack Obama";
        academicGroup.NotifyAboutStudentsChange(); //Throws
        Console.WriteLine(academicGroup.IsValid()); //This line never executes
    }
}

Nasty mess, isn’t it? But without it, we have a design flaw that makes our code hard to understand and prone to bugs. Don’t get me wrong: I am not suggesting you build things like this into your designs, although it is a solution when you can neither avoid assumptions about shared state nor make the shared state immutable. To my mind, the best solution here is to go the functional way, even though in some cases it may be less efficient. Here is what well-designed code would look like (thank heaven we now have immutable collections for .NET 4.5, in the System.Collections.Immutable namespace):

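using System;
using System.Collections.Generic;
using System.Collections.Immutable; // a NuGet package on .NET Framework
using System.Linq;

class Program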
{
    static void Main()
    {
        var students = new List<Student>
            {
                new Student(name: "Arnold Schwarzenegger", age: 18),
                new Student(name: "Britney Spears", age: 20),
                new Student(name: "Barack Obama", age: 19)
            };

        var academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        var anotherBritney = new Student(name: "Britney Spears", age: 30);
        students.Add(anotherBritney);
        Console.WriteLine(academicGroup.IsValid()); //True: the group keeps its own immutable copy

        students.Remove(anotherBritney);
        academicGroup = new AcademicGroup(students);
        Console.WriteLine(academicGroup.IsValid()); //True
        
        //This does not even compile anymore: 
        //students.FirstOrDefault(s => s.Name == "Britney Spears").Name = "Barack Obama";

        // Instead, replace the element with a new Student object
        var index = students.FindIndex(s => s.Name == "Britney Spears");
        students[index] = new Student(name: "Barack Obama", age: students[index].Age);
        Console.WriteLine(academicGroup.IsValid()); //True
    }
}
public class Student
{
    // Getter-only auto-properties require C# 6 or later;
    // in C# 5 you have to declare readonly backing fields instead
    public string Name { get; }
    public int Age { get; }

    public Student(string name, int age)
    {
        Name = name;
        Age = age;
    }
}
public class AcademicGroup
{
    private readonly ImmutableList<Student> _students;

    public AcademicGroup(IEnumerable<Student> students)
    {
        _students = students.ToImmutableList();
        if (!HaveUniqueNames(_students))
            throw new ArgumentException(
                "Two or more students with the same names are not allowed!");
    }

    public bool IsValid()
    {
        return HaveUniqueNames(_students);
    }

    private bool HaveUniqueNames(IReadOnlyCollection<Student> students)
    {
        return students
            .Select(s => s.Name)
            .Distinct()
            .Count() == students.Count;
    }
}
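
One subtlety behind this solution is that read-only is not the same as immutable. The following sketch wraps the same list in two ways and then mutates the original; SnapshotDemo is a hypothetical name, while ReadOnlyCollection<T> and ToImmutableList() are standard .NET APIs. A ReadOnlyCollection<T> is only a read-only view that still reflects changes made through the underlying list, whereas ToImmutableList() takes an independent snapshot:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Collections.ObjectModel;

public static class SnapshotDemo
{
    // Wraps the same list in two different ways, then mutates the original
    // and reports how many items each wrapper sees afterwards.
    public static (int ViewCount, int SnapshotCount) MutateSourceAfterWrapping()
    {
        var source = new List<string> { "Arnold Schwarzenegger", "Britney Spears" };

        // Read-only view: nobody can modify the list through it,
        // but it still reflects every change made via 'source'.
        var view = new ReadOnlyCollection<string>(source);

        // Immutable snapshot: an independent copy taken at this moment.
        ImmutableList<string> snapshot = source.ToImmutableList();

        source.Add("Britney Spears");

        return (view.Count, snapshot.Count);
    }
}
```

The view ends up with three items while the snapshot keeps two, which is why AcademicGroup copies its input with ToImmutableList() instead of merely wrapping it. The snapshot still shares the element objects, though, so Student had to become immutable as well.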

To conclude, I find myself thinking more and more often that I am no longer a fan of classical object-oriented programming. Some functional elements, like immutable state, seem to be really powerful tools for simplifying designs and making us less likely to write bugs.


Procedural abstraction vs code container

I am still fascinated when I think about the invention of the subroutine. It marked the transition from a huge list of commands (the ancient program) to a set of subprograms glued together to achieve some computational purpose. The subroutine mechanism was invented to support developers in designing procedural abstractions. Back then, just like today, developers were concerned with 1) reducing huge software complexity and 2) reducing software cost by maximizing reuse. In classical engineering disciplines, like electrical, radio, and mechanical engineering, abstraction was widely and successfully used for these purposes. Abstracting essentially means creating a component whose internal structure and functioning can be ignored during the construction of a larger system; only some external description, i.e. its interface, needs to be taken into account. Hence, the subroutine is fundamentally a tool, present in all modern programming languages, for defining procedural abstractions. It is called procedural because it hides some procedure/algorithm inside. A subroutine can also be viewed as an ideal component in a system. Ideal because it is pure: it is itself an abstraction over the machine’s instruction set. It can’t degrade over time, and its characteristics do not depend on environmental conditions. The interface to this component is the subroutine’s signature.

Unfortunately, it is only a tool, and therefore it does not guarantee that it will be used to create an abstraction. Like any other tool, it can be misused, and you have to be careful to avoid this. The problem is that, without hard thought, it is very easy to start using a subroutine not as a procedural abstraction but as a code container. Here is what happens. You have to write code, right? You need to place the code somewhere. Oh, dear! Here is a subroutine called “DoEverything”. It accepts 30 parameters, and you need the 15th, the 23rd, and one more. You add a 31st parameter to this subroutine (note that no older code in this subroutine needs it!), add a few lines of code, and you are done! Great! But in reality you have just reverted to ancient programming. You have made your code more complex and more prone to bugs. You will not be able to reuse these few lines of code when needed, because they are buried in the mess named “DoEverything”. This distinction between procedural abstraction and code container may seem obvious, but pretty much all developers (including myself, of course) make the mistake of abusing subroutines. So how do you recognize that you have made this design error? Well, I would suggest the following tests:

1) Try to imagine yourself as your code’s maintainer and look at the signature of your subroutine. Can you tell what your subroutine does from its signature alone? If you can’t, you have created a code container. For example, look at this signature:

Book GetBookById(Repository r, int id);

Or this one:

Money TotalCost(List<Book> books);

You will hardly need to dive into the code of these subroutines to learn what they do. They are nicely designed procedural abstractions. Now look at this signature:

void ExecutePostChange(bool totalRecalc);

What does this one do? Well, everything that has to happen when something changes. But what exactly? And what the heck does totalRecalc mean? This subroutine is very likely a code container.

2) Try explaining to someone (maybe to your rubber duck) what your subroutine does in one or two short sentences. Does it take much more time or space? Then you have created a code container; otherwise, you have designed a nice abstraction. Make sure the code you write stays as close to those sentences as possible.

3) Try to imagine your subroutine reused in some other context. This should be easy for a nice abstraction with a single responsibility. You will hardly reuse the ExecutePostChange subroutine I mentioned previously, because it does too many unrelated things. The TotalCost subroutine seems reusable enough. But if TotalCost violates the single responsibility principle and also saves the result of the calculation to a file or whatever, it will not be easy to reuse. In this case, despite its nice signature and internals that are easy to explain, its design is closer to a code container.
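
A minimal C# sketch of the TotalCost case, assuming a hypothetical Book class and using decimal as a stand-in for the Money type (neither is defined in the post). TotalCost is a procedural abstraction; TotalCostAndSaveToFile keeps an equally clean signature but also writes to a hypothetical file, which is exactly what makes it hard to reuse:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical Book type; decimal stands in for the post's Money type.
public sealed class Book
{
    public string Title { get; }
    public decimal Price { get; }

    public Book(string title, decimal price)
    {
        Title = title;
        Price = price;
    }
}

public static class BookPricing
{
    // Hypothetical output file used by the code-container variant.
    private const string TotalFile = "total.txt";

    // Procedural abstraction: one responsibility, no side effects;
    // the signature tells the reader everything they need.
    public static decimal TotalCost(List<Book> books) => books.Sum(b => b.Price);

    // Code-container variant: the signature looks just as clean, but the
    // method also writes a file, so every reuse drags that side effect along.
    public static decimal TotalCostAndSaveToFile(List<Book> books)
    {
        var total = books.Sum(b => b.Price);
        File.WriteAllText(TotalFile, total.ToString(CultureInfo.InvariantCulture));
        return total;
    }

    // Persistence as its own small abstraction, composable with TotalCost.
    public static void SaveTotal(decimal total, string path) =>
        File.WriteAllText(path, total.ToString(CultureInfo.InvariantCulture));
}
```

Keeping SaveTotal separate lets a caller combine the calculation and the persistence only when it really needs both.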

It is also worth noting that there is another kind of abstraction widely used in software: data abstraction. Here one can ignore the internal organization/structure of the data and rely only on the set of operations available for the given data structure. Objects in OOP are good examples of data abstractions, and it is also very easy to misuse this tool and create just a container of subroutines. In such a case, it is simply pre-OOP procedural programming, as in C. This is a somewhat different topic, however; I will discuss it next time.
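
Although the topic is left for next time, a small C# sketch may help to picture the difference. All names here (AccountRecord, AccountProcedures, Account) are hypothetical. The first pair is a subroutines container: public data plus functions that operate on it, C style. The Account class is a data abstraction: its representation is private, callers can only use its operations, and it enforces its own invariant:

```csharp
using System;

// Subroutines container: a bag of public data plus a static class of functions.
// Any code can set Balance directly, so nothing guards the invariant.
public class AccountRecord
{
    public decimal Balance;
}

public static class AccountProcedures
{
    public static void Withdraw(AccountRecord account, decimal amount) =>
        account.Balance -= amount;
}

// Data abstraction: the representation is private, callers rely only on
// the operations, and the object keeps its balance from going negative.
public sealed class Account
{
    private decimal _balance;

    public decimal Balance => _balance;

    public void Deposit(decimal amount)
    {
        if (amount <= 0)
            throw new ArgumentOutOfRangeException(nameof(amount), "Deposit must be positive.");
        _balance += amount;
    }

    public bool TryWithdraw(decimal amount)
    {
        if (amount <= 0 || amount > _balance)
            return false;
        _balance -= amount;
        return true;
    }
}
```

With the record, a withdrawal larger than the balance silently produces a negative balance; the Account object simply refuses it.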


Why are clean code and automated testing so important for software engineering?

In this post I would like to dig a little into a philosophical topic, using some analogies between classical engineering disciplines and software development to explain why clean code and automated testing have become so important and so widely used in the software industry.

It is very common today to consider software development a craft rather than engineering. While I would agree with this to some degree, an argument like “there is very little math in programming, hence it can’t be systematic and reliable enough, hence it can’t be considered an engineering discipline” does not seem valid. An important insight was given in this amazing article, which simply states that the code we write is actually a software design. We used to think that software design is done by very smart people who draw impressive UML (or whatever) diagrams, and that monkeys in cubicles then sit down and perform the actual construction of the code without applying much intelligence to the process. The final product, the machine code produced by compilation, is then shipped to the customer. The agile boom destroyed this idea and showed that code in a high-level programming language is actually a formal design specification: formal enough to be transformed, without any additional creative thought, into the real product that brings value to the customer.

Now, classical engineering disciplines have a process similar to agile. The design phase proceeds in iterations and produces a formal description of the future product. This description is then used in factories and plants, sometimes 100% automated, to build the final products. To me, this looks like a very close match to software development. The only point that I think people often miss is what the final product in software development actually is. I believe it is not the compiled code, because compiled code is conceptually the same thing as source code, only translated into the language of the worker, so to say. And for interpreted languages there is no compilation step at all. So what is the final product, then? It is what brings value to the customer: the process that unfolds according to our design (the program). And the computing machine is just a 100% automated factory.

OK, but why should you care? Because your code is a DESIGN SPECIFICATION. In classical engineering disciplines, such specifications are written in a very well-defined way, with a strong focus on understandability. These specs are the medium in which design is done: they are used to capture design decisions and to simulate and test future products. They are also the communication medium for engineering teams. And by the way, craftsmen never produced formal specifications of their designs.

Software engineering is not an exception. The only difference is that we tend to make a mess of our designs, which makes it harder to reason about a system’s properties and to test them. This is why clean code is essential for establishing software development as an engineering discipline, and why modern professional developers care a lot about the clarity of their designs.

OK, but what about math? Should we also start using math intensively in order to become engineers? Well, I don’t think so. I was trained as a classical radio engineer at university, and I remember very well sitting at the computer with my professor, implementing in BASIC the differential equations that described an oscillating circuit. The professor asked if I knew why we were doing it. I just stared at him in surprise. The answer was simple: it was much cheaper to analyze the characteristics of the circuit we were designing using its mathematical model rather than the real circuit. I was testing my electric circuit without spending money and time buying all the components and soldering them together (not that building a real circuit wasn’t fun; I still remember the incomparable feeling of the act of creation). It is simply much cheaper to get feedback on what we have designed before we build the actual product. Still, mathematical models can be wrong, and testing must ultimately be done to ensure that the product does what it is expected to do. Have you ever watched the Discovery shows where people test new airplanes? That testing costs a colossal amount of money, but they have to prove by testing that their airplanes can actually take off, fly and land. No mathematical model can eliminate this design step.

Now, how much does it cost to run the software, i.e. to build our products and see if they actually work? It is almost free! Why, then, should the industry spend money on adopting complex formal methods? I can’t see any reason to, and the industry seems to support my view. Analytical proof of software correctness seems redundant, so only empirical methods are left, i.e. testing. That said, nice code analyzers are always welcome as long as they don’t require any additional work on the developer’s part. But they will hardly be able to prove, for example, that a user can actually save a document in Microsoft Word. If one day it becomes possible to describe this kind of requirement formally, there will no doubt be tools that transform such requirements into executable programs.

So the conclusion here is that testing is another ingredient required for our profession to become engineering.

However, there is a small problem with testing in software, and it lies in the “soft” part. Software changes very often; it evolves like an organism to meet changing environmental requirements. Without the ability to check relatively cheaply that the system is still OK after every change, we can’t talk about a systematic approach to software development. Hence, simple testing is not enough. Tests should also be formal and repeatable, i.e. automated.

Conclusion. Developing software ultimately means designing processes intended to run on computing machines. Processes are not material, and therefore the products created by software developers are ideal. Math is not so important because of this idealism and because building the final product from the design spec is so cheap. This makes a huge difference, because all other engineering disciplines build material products. It does not mean, however, that software developers can’t engineer their products, i.e. develop them systematically and rigorously, with repeatable procedures. For this to happen, we have to embrace clean code as well as automated testing, the only practical way to check software correctness these days. The more systematically we do this, the closer we will get to software engineering.
