Thursday, January 21, 2010

Why COBOL Is Bad For Your Health

Before you think that I'm going to continue one of those eternal developer debates like Windows vs. Linux, C# vs. Java, or even OGL vs. DX: I'm not. COBOL is a useful language and will remain that way for a very long time. It has served, and keeps serving, its purpose, which is to be a language targeted at non-programmers, mostly business analysts, with little or no programming knowledge. What I'm about to lay out here are the deficiencies of COBOL: being business-oriented has its cost, and COBOL pays dearly for it. Also note that I have good knowledge of COBOL, mainframes, and batch architecture. However, I was groomed in C/C++ and specialize in distributed systems, so I have a reasonable understanding of both worlds.


Recently a fellow on my team asked me why I hated COBOL so much. To keep things short, my answer was that I did not hate COBOL at all, but that I thought there were better languages which could do COBOL's work better. I stated that COBOL syntax might be easy and simple, but COBOL programs are semantically obscure and can often lead to very bad algorithms. I'm now going to explain why I think that.
Remember that I'm not a doom-sayer. COBOL isn't dead, nor is it going to die. It has its purpose, and it serves it well enough. People will keep learning COBOL for a long time to come, and many enterprises will continue to grow their mainframe platforms.
So, without further ado, here are my reasons for believing that COBOL is bad for your health:


SECTION A - CODE SAFETY:

I. All Variables Are Global
It probably goes without saying (at least to any seasoned non-COBOL programmer) that you shouldn't use global variables in your programs. Globals are bad for your health because their values are hard to predict: every single instruction in the program might modify them.


If your program has fewer than two hundred lines, its variables follow good naming rules, and you're not using REDEFINES, maybe you can find out where a variable is accessed and predict its value. But in a world where the average COBOL program has way more than ten times that number of lines, you are in for a very hard "mind compiling" experience.


Moreover, there's the side effect of variable cramming. Whenever a programmer needs to extend or fix the source of a COBOL program, instead of reusing the variables that are already there (since he can't know where a given variable is accessed), he declares a new one. The effect is that the source ends up having more variables than it actually needs and gets even harder to read. Add to that the fact that COBOL doesn't allow variables to be declared within procedure code (like old C), and now you have this programmer hell: many variables whose declarations sit very far from the places where they are used, which means a lot of scrolling up and down the source.
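To make the problem concrete for non-COBOL readers, here is a rough C sketch of the same situation. All names are hypothetical; the file-scope variable stands in for a WORKING-STORAGE item that every paragraph of a program can touch.

```c
#include <stdio.h>

/* Hypothetical shared state: visible to, and writable by, every function
   in the file, much like a WORKING-STORAGE item in a COBOL program. */
static int ws_total = 0;

/* Looks like a harmless helper, but it silently resets the shared total. */
static void print_header(void) {
    ws_total = 0;
    printf("REPORT\n");
}

static void add_item(int amount) {
    ws_total += amount;
}

/* Returns the total the caller ends up with, which is not the one
   a reader skimming this function would expect. */
int run_report(void) {
    add_item(10);
    add_item(20);
    print_header();   /* side effect: ws_total is back to 0 */
    add_item(5);
    return ws_total;  /* 5, not 35 */
}
```

Nothing in run_report hints that print_header touches the total. In a three-function file you can spot it; in a program thousands of lines long, you have to read everything.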

II. Variables Aren't Type Safe
Type safety is a very complex subject. Many languages that are perceived as type safe actually aren't -- in C/C++ any object pointer can be cast to void*, and a void* can be cast back to anything -- but COBOL goes way beyond that when it gives programmers REDEFINES. REDEFINES allows any storage area to be seen as a different type at compile time and is, in many ways, a cast. One can argue that C/C++ offers a similar construct in unions. However, for some reason, C/C++ programs hardly ever use unions, preferring to keep a byte stream that is then copied into a new instance of a certain type.
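For readers who have never seen REDEFINES, the union comparison can be sketched in C roughly like this. The record layout and names are made up for illustration; the point is only that the same bytes are readable through two unrelated views, and nothing checks that they make sense in both.

```c
#include <string.h>

/* Hypothetical record: the same 8 bytes viewed two ways, roughly what a
   "WS-DATE-PARTS REDEFINES WS-DATE" declaration would give you. */
union date_area {
    char text[8];            /* think PIC X(8): any characters at all */
    struct {
        char year[4];        /* think PIC 9(4) */
        char month[2];       /* think PIC 9(2) */
        char day[2];         /* think PIC 9(2) */
    } parts;
};

/* Stores text through one view and reads the "month" through the other.
   Nothing verifies that the bytes are actually digits. */
int month_of(const char *eight_chars) {
    union date_area d;
    memcpy(d.text, eight_chars, sizeof d.text);
    return (d.parts.month[0] - '0') * 10 + (d.parts.month[1] - '0');
}
```

Feeding it "20100121" gives month 1, as expected. Feeding it "JANUARY!" still gives a number, just a meaningless one, and the compiler has no objection either way.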


Besides, COBOL also has "untyped" variables, called group items. Group items in COBOL are similar to C structs: a definition of a group of variables that are laid out together in memory. However, in COBOL, those group items don't have a defined type, and the compiler allows any data to be moved to or from such a group. There is no runtime boundary checking either, so nothing stops you from moving more data than the area can hold; what COBOL does for you instead is silently truncate it. The responsibility of knowing that the types fit is left entirely to the programmer.*
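A rough C imitation of what happens when data is moved into a group item may help here. The helper group_move and the struct are hypothetical; the sketch assumes the usual alphanumeric-move behavior of copying what fits, dropping the rest on the right, and padding with spaces when the source is shorter.

```c
#include <string.h>

/* Hypothetical helper imitating an alphanumeric group MOVE: copy what
   fits, drop the excess on the right, pad with spaces if too short. */
void group_move(char *dst, size_t dst_len, const char *src, size_t src_len) {
    size_t n = src_len < dst_len ? src_len : dst_len;
    memcpy(dst, src, n);
    memset(dst + n, ' ', dst_len - n);
}

/* A "group" made of a 3-byte code followed by a 2-digit quantity. */
struct item {
    char code[3];
    char qty[2];
};

/* Moves an arbitrary string into the 5-byte group and returns the
   quantity the program would then see. */
int qty_after_move(const char *src) {
    struct item it;
    group_move((char *)&it, sizeof it, src, strlen(src));
    return (it.qty[0] - '0') * 10 + (it.qty[1] - '0');
}
```

Moving "ABC1234" into the group leaves a quantity of 12: the trailing "34" simply vanishes, with no error and no warning.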

III. Variables Aren't Really Typed
This one will probably be the most controversial point here. COBOL uses a typing system that has mainly two kinds of variables, numeric and text. Numeric variables can be of COMPUTATIONAL type, which means that they hold numeric data but store it in a different way -- compacted. The first criticism here is that, for a so-called 3GL, COBOL exposes a lot of the underlying implementation to the programmer, who has to know the difference between compacted COMPUTATIONAL data and "common" data. Of course, there are historical factors that led to this implementation, namely the fact that storage was way more expensive when COBOL was conceived. But using this as an excuse only proves that COBOL is obsolete and should be dumped.
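To give an idea of what "compacted" means in practice, here is a C sketch of packed decimal, the layout commonly associated with COMP-3 on IBM mainframes. That this is the format meant above is my assumption; other COMPUTATIONAL usages are plain binary. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs a value into `len` bytes of packed decimal: two digits per byte,
   with the sign in the last half-byte (0xC positive, 0xD negative).
   Digits that do not fit are dropped from the left. */
void pack_decimal(long value, uint8_t *out, size_t len) {
    unsigned long mag = value < 0 ? 0UL - (unsigned long)value
                                  : (unsigned long)value;
    uint8_t sign = value < 0 ? 0x0D : 0x0C;

    /* Last byte: lowest digit in the high half, sign in the low half. */
    out[len - 1] = (uint8_t)(((mag % 10) << 4) | sign);
    mag /= 10;

    /* Remaining bytes, filled right to left, two digits each. */
    for (size_t i = len - 1; i-- > 0; ) {
        uint8_t low = (uint8_t)(mag % 10);
        mag /= 10;
        uint8_t high = (uint8_t)(mag % 10);
        mag /= 10;
        out[i] = (uint8_t)((high << 4) | low);
    }
}
```

Packing 12345 into three bytes yields 0x12 0x34 0x5C, whereas the same number as display text takes five bytes. A programmer who reads those three bytes as text gets garbage, which is exactly the implementation detail a high-level language should be hiding.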


SECTION B - CODE STRUCTURE:


I. Where You Write Your Code Matters
COBOL still inherits a lot from the punched-card days. In COBOL, code can only sit between columns 8 and 72, and column 7 is reserved for "indicators", which tell the compiler that the line is a comment or a continuation of the previous line. Add to that the fact that some statements need to start in what COBOL calls Area B, which begins at column 12. This means that you have only 61 characters per line for statements, which are very long by nature already (you need at least 11 characters to write an assignment, for example). And remember that variable names tend to carry lots of prefixes and suffixes, because COBOL's member qualifier OF is never used.
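The column arithmetic is easy to check with a few lines of C. This is only a sketch of the fixed-format layout, taking Area B to run from column 12 to column 72; the constant and function names are mine.

```c
#include <string.h>

/* Fixed-format columns, 1-based as on a punched card. */
enum {
    INDICATOR_COL = 7,   /* '*' here marks the line as a comment */
    AREA_A_COL    = 8,   /* Area A: columns 8 to 11 */
    AREA_B_COL    = 12,  /* Area B: columns 12 to 72 */
    LAST_CODE_COL = 72
};

/* Number of columns left for a statement that must start in Area B. */
int area_b_width(void) {
    return LAST_CODE_COL - AREA_B_COL + 1;
}

/* Returns 1 if the line carries '*' or '/' in the indicator column. */
int is_comment_line(const char *line) {
    return strlen(line) >= INDICATOR_COL &&
           (line[INDICATOR_COL - 1] == '*' || line[INDICATOR_COL - 1] == '/');
}
```

area_b_width() comes out at 61, and the shortest possible assignment, MOVE A TO B, already eats 11 of those columns before you use a single realistic variable name.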


II. Periods Are Both Scope And Statement Terminators
Here's another of COBOL's strange behaviors that will make you shiver. In COBOL, you can finish statements with a period ("."). You can, because most of the time you don't need to. Most of the time, because sometimes periods are mandatory. Already confused? Well, it gets worse. In COBOL, you also close scope with a period. So, if you begin an IF construct and stick a period right after the first statement, the scope is terminated and whatever comes afterwards is considered outside the IF.


So, if you decide to stick a period after every statement, you can't. And if you decide to abandon periods altogether and stay on the safe side, never ending a loop or scope accidentally... well, as we said, you can't do that either.
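For those who think in C, the closest analogue I can offer is an if without braces. The COBOL fragment in the comment and all the names are hypothetical; the C function is written the way the compiler would understand that fragment, not the way its indentation suggests.

```c
/* COBOL being imitated:
 *
 *     IF BALANCE < 0
 *         MOVE 'Y' TO OVERDRAWN.
 *         MOVE 'Y' TO SEND-LETTER.
 *
 * The period after OVERDRAWN ends the IF, so the second MOVE
 * runs for every customer, overdrawn or not.
 */
enum { OVERDRAWN = 1, SEND_LETTER = 2 };

int flags_for(int balance) {
    int flags = 0;
    if (balance < 0)
        flags |= OVERDRAWN;   /* the only statement the IF guards */
    flags |= SEND_LETTER;     /* already outside the IF */
    return flags;
}
```

A customer with a positive balance still comes back with SEND_LETTER set. In C at least the indentation gives the bug away; in the COBOL version a single dot is all there is to notice.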


III. Idiosyncrasies
COBOL is a champion when it comes to idiosyncrasies. For example, assignments in COBOL are written as MOVE variable TO variable. Moving is usually understood as taking something from one place and putting it somewhere else, but assignment copies the value of one variable into another and leaves the source untouched, and copying is exactly what the MOVE statement does in COBOL.


In sum...
There are a lot of reasons why COBOL should be avoided at your enterprise. Sure, you can have a person trained in COBOL in less than a week, but how long will it take to remove the bugs from his code? How many bugs will appear in the future? COBOL is a counter-productive language that encourages bad developers to write bad code.


* COBOL has evolved over the years, and so have the compilers. I wouldn't be surprised if there were a compiler directive that enabled such checking, but I must say that I have never seen anyone use it.