The downside of type inference

 

In one of his vibrant pleas for Literate Programming, Donald Knuth observed that code is only written once, but read many times (not entirely true, since code under maintenance is constantly refactored, continuously written and rewritten, but still generally valid: code is read far more than it is written). Focusing on making code easier to write is therefore futile: what matters is making code easier to read.

Knuth was mainly concerned with the narrative: having a program’s artifacts listed in a sequence constrained by the language syntax and semantics can be counterproductive. Tightly related functions, variables, statements even, that should ideally be presented together may have to be scattered across the source file, making it harder to understand and to maintain. One of Literate Programming’s main contributions is to allow a program’s narrative, the sequence of its artifacts, to be tweaked in favor of the reader, to maximize understandability, rather than presented in an unnatural order for the sake of complying with the language at hand.

 

 

The infamous var

This post is not about sequence, but it starts with the same ill-advised pre-eminence of writing over reading code. Type inference is the mechanism by which the compiler determines a variable, function or expression’s type, rather than having it specified explicitly by the coder.

In C#, it translates typically into variable declarations such as:

var p = coll[idx].multipleMatch(a, b);

where p’s type is determined automatically by looking at the expression on the right side of the assignment. Type inference may become a cascading concern: the type of p in the example above may depend on the types of a and b, which may themselves have to be inferred, depend on other variables, and so on.
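A minimal sketch of this cascade (in Java, whose local `var`, introduced in Java 10, behaves much like C#’s; all names here are hypothetical):

```java
import java.util.List;
import java.util.Map;

public class Cascade {
    static int resolve() {
        var words = List.of("a", "b", "a");        // inferred: List<String>
        var first = words.get(0);                  // inferred: String
        var counts = Map.of(first, words.size());  // inferred: Map<String, Integer>
        // Reading this line requires mentally resolving the whole chain above.
        return counts.get("a");
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```

Each inferred type feeds the next inference: to know what `counts` is, the reader has to replay every declaration that precedes it.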

 

The upside of redundancy

 

At first sight, what is there not to like about type inference? It makes the developer’s life easier, it makes the code more concise and less redundant, and in today’s world, that should be the end of it. Redundancy is bad, right?

Well, not always.

Quality in software is achieved by coupling redundancy and consistency, expressing the same thing multiple times and ensuring that these multiple occurrences agree with each other. Testing is merely a matter of duplicating the intelligence of the code with sample input data and ensuring that the corresponding output matches one’s expectations.

 


Explicit variable declarations are also a form of generally accepted redundancy. The types in many variable declarations could be inferred, had language designers decided (as some did, and some still do) that this was a desirable property for a programming language.

While many types can be inferred by the compiler, explicit user-specified typing allows the compiler to – redundantly, but that’s the whole point – make sure that the types it computes match the ones specified by the coder.

A program written in this minimal style without explicit types is considered valid when all the inferable types have exactly one solution according to the language inference rules (which can be quite arcane). There is nothing to ensure that this solution is what the developer had in mind. It is a guarantee of consistency, but without redundancy, it is not a guarantee of correctness.
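The gap between consistency and correctness can be made concrete with a short sketch (again in Java rather than C#; the names are hypothetical):

```java
import java.util.List;

public class Redundancy {
    static Object transform() {
        var names = List.of("ada", "bob");
        // The developer may well believe 'upper' is a List<String>...
        var upper = names.stream().map(String::toUpperCase);
        // ...but inference consistently resolved it to Stream<String>,
        // and no diagnostic is ever produced.
        return upper;
    }

    // An explicit declaration restores the redundant check; this version
    // is rejected at compile time:
    //   List<String> upper = names.stream().map(String::toUpperCase);

    public static void main(String[] args) {
        System.out.println(transform().getClass());
    }
}
```

The inferred program is perfectly valid; it just does not denote what its author had in mind, and only the explicit declaration would have exposed the mismatch.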

This does not mean that all type inferences are equally bad. There are cases where redundancy does not contribute anything of substance, and where type inference allows for a more elegant and even more readable notation, as in:

var l = new Dictionary<Multiset<String>, Bag<Polygon>>();

which would otherwise be written as:

Dictionary<Multiset<String>, Bag<Polygon>>
         l = new Dictionary<Multiset<String>, Bag<Polygon>>();

where repeating a complex type twice in a single declaration obviously does not help much (to the contrary: one could argue that it takes some serious deciphering to verify that these two types are indeed identical).

What about the reader?

 

 

Even if the inferred type matches what the developer had in mind, how about the reader? How is he (or she) supposed to access the developer’s unwritten wisdom, casually deferred to the compiler’s inference capability for the sake of brevity?

A human reader has neither the processing power of a compiler nor access to the same amount of information. Expecting the former to reproduce the process of the latter is insane.

Modern development environments – such as Microsoft’s Visual Studio, for instance – do an excellent job at synthesizing this type information on the fly so that it can be displayed for the reader. Still, having it accessible somewhere is not the same as having it written explicitly.

 

 

A good, old idea, falling into oblivion

It is not as if making types more explicit were a new idea by any means. Hungarian notation (https://en.wikipedia.org/wiki/Hungarian_notation), made popular by Charles Simonyi, went even further, and used naming conventions to include type information in variable names.

This technique was first introduced for BCPL, which did not support explicit types in variable declarations, and where Hungarian notation was used to document such types.

Ironically enough, now that languages offer the ability to type variables explicitly, but inference mechanisms can guess types without user specification, it is as if documenting them were no longer valuable.

Go figure.

A damning correlation

When inspecting code, I almost systematically replace inferred types by their explicit counterparts, assisted in this tedious but useful effort by Visual Studio. It makes the code less prone to interpretation. It makes me more efficient as a reader. It avoids misunderstandings that would arise from me making faulty assumptions regarding types of variables, and consequently, expressions and functions.

Having been through this process for a while, I have observed a damning correlation: type inference often resolves to trivial types, such as strings or lists of strings, rather than more elaborate types that would make extensive use of sub-classing capabilities.

It is as if contempt for explicit type specifications correlated with contempt for the structuring effect of user-defined data types, contempt that would cause one to forgo encapsulating basic types in separate classes to ensure that they are not mixed too liberally, that a file name is not passed as a parameter to a function that takes a password, etc.
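The kind of encapsulation alluded to here can be sketched as follows (in Java; the wrapper types and the login function are hypothetical):

```java
public class StrongTypes {
    // Hypothetical wrapper types: plain strings are encapsulated so that
    // a file name can never be passed where a password is expected.
    record FileName(String value) {}
    record Password(String value) {}

    static String login(Password password) {
        return "authenticated with " + password.value();
    }

    public static void main(String[] args) {
        var config = new FileName("settings.ini");
        var secret = new Password("hunter2");
        System.out.println(login(secret));
        // login(config);  // rejected at compile time: a FileName is not a Password
    }
}
```

The wrappers cost a few lines, but they turn an entire class of silent argument mix-ups into compile-time errors.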

It is as if it were an indication of a lack of interest in typing in general.

Not a very reassuring thought if you ask me.

19-04-2018
