The Fascinating Science of Software Compilers

What is a Compiler?

A software compiler is a computer program that creates executable programs from source code. One of the most widely used compilers is GCC (the GNU Compiler Collection), which compiles C, C++, Objective-C, Fortran, Ada, Go and several other languages. (Its Java front end, GCJ, was removed in GCC 7.) GCC is free software and part of the larger GNU project.

A compiler translates a program written in a high-level programming language into machine language (binary code). For example, we write one line of C++ code such as “a = b+5”. The compiler typically translates it into one or more machine instructions: load b, add 5, store the result in a. Once the whole program has been compiled and linked, the resulting executable can be run directly by the computer, without the compiler or the source code.

Types of Compiler

There are many kinds of compilers that target different programming languages: Java compilers, C++ compilers, Objective-C compilers, Ada compilers and so on. Most compilers have a command line interface (CLI) and can be invoked in the following way: gcc myProgram.c -o executableFileName. The -o option sets the name of the output file.

What are the Popular Compilers?

Many people use and love the GCC compiler, which ships with many GNU/Linux distributions. Other compilers are widely used on specific platforms or for specific languages: Visual C++ (part of Microsoft Visual Studio, which includes several compilers for different languages), Jikes (a Java compiler), the Intel C++ Compiler, Borland C++ Builder, Metrowerks CodeWarrior, the IBM XL C/C++ compiler and many more.

The GCC compiler is free software, and so are many other compilers. Others are commercial products that must be purchased, and their licenses often restrict use to specific platforms.

The GNU Project, sponsored by the Free Software Foundation (FSF), provides a large collection of free development tools. It includes GCC, the GDB debugger and many other programs that make various kinds of open source software development possible.

What are the Advantages of a Software Compiler?

Software compilers are essential for creating executable programs. Without them, we would have to write each line of code in its machine language form. The advantages of using a compiler include:

A compiler translates high-level programming languages into machine code far faster than a human could. It produces an executable program that runs on the target system by itself, without the compiler or the programmer being present. It rejects code that is not syntactically correct, which catches a whole class of mistakes before the program ever runs and so reduces debugging time. When the compiler reports an error, the message usually says what is wrong and includes the file name and line number where the problem was detected.

What are the Disadvantages of a Software Compiler?

The main limitation of software compilers is that a program that compiles successfully is not guaranteed to be bug-free. The compiler checks syntax and types, not whether the logic does what the programmer intended. This can be particularly annoying on large projects, where it takes a long time to discover and fix a problem in a complex program.

Complex programs become harder to maintain as they get bigger and more complicated. A typical production code base may have a hundred thousand lines of code or more, and it takes a human being a lot of time to read and understand that much code accurately.

The problem can be worse when the compiler transforms the code heavily: after optimization, the generated machine code may no longer correspond line by line to the source, which makes debugging harder.

What is a Binary Code?

Binary code is the form of computer code that uses only two states, on and off, usually written as 1 and 0. Machine instructions and data are both stored as sequences of these bits. The simplest computer that can be imagined is an on-off machine.

What is Assembly Language?

Assembly language is a human-readable notation for machine language. Each assembly statement generally corresponds to one machine instruction, so even simple tasks take many statements. Programs written in assembly language can be very hard to debug.

Section 2 of this article discusses the size of the generated binary code and the speed of the compilation.

What is a Binary Compilation?

Binary compilation refers to the process in which source code in a high-level programming language, such as C++, is translated into binary machine code. (It should not be confused with binary translation, which converts machine code for one processor architecture into machine code for another.) In most cases, a compiler does not produce the executable program directly; it produces object files, which a linker then combines into the executable.

The size of the binary code is an important issue for software developers who write applications. A compiled program whose size is no problem on a large computer system may be too big for a small portable device. The binary code generated by a compiler is not automatically the smallest possible for that computer architecture; compilers offer optimization options, such as GCC's -Os, that ask for the output to be made as small as practical.

Which Tool can Produce Small Binary Code?

The flat assembler (FASM) is free, open source software, distributed under its own BSD-style license rather than the GPL. It is an assembler rather than a compiler: it turns assembly code into machine code, and it can create executable files, including exe files, directly. It is known for being fast, and because the programmer controls every instruction, the resulting binaries can be very small.

Which Compiler can Compile Source to Binary Fast?

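The Tiny C Compiler (TCC) is a small C compiler for UNIX-like operating systems and Windows. It translates C source code directly into machine code, using its own built-in assembler and linker rather than relying on external ones. It is small enough to fit on a floppy disk, and because it performs very little optimization it compiles much faster than optimizing compilers; the exact saving, sometimes quoted as up to 95% of the compile time, depends on the code and on the compiler it is compared with.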

What Affects the Speed of the Compilation?

The speed of the compilation depends on the compiler that we use, the size of the source code and how much optimization we request. The source code runs through a series of transformations to produce executable object programs. A straightforward translation of a high-level program contains many redundant, unnecessary instructions. An optimizing compiler tries to remove them to produce smaller and faster binary code for the target platform, and each of these optimization passes adds to the compile time.

What is the Size of the Binary Code?

The size of the binary code depends on the source code, the compiler and the options used. Most compilers first translate the source code into an intermediate representation; the compiler's back end translates that into machine code, and the linker combines the resulting object files and libraries into the final binary. The size of the binary also depends on the target machine architecture, operating system and hardware platform. Each programming language has its own rules and syntax, but many compilers share one intermediate representation among several languages, so the same back end can generate machine code for all of them.

Which Tool Produces Smaller Binary Code?

The answer to this question depends on what kind of file we compile, how much optimization we need, how fast the program must run and some other factors. The size of the binary code varies from compiler to compiler and from one source file to another.

How to Compile Source Code to a Smaller Binary Code?

The answer to this question depends on the source code, the compiler and the features of that compiler, in particular its size-optimization options. The table below shows the source code and the compiled binary code for several compilers.

What is Compiler Optimization?

Compiler optimization is the process of transforming a program during compilation so that the generated code is more efficient, without changing what the program does. Programmers often use the term to describe techniques that make their code faster or smaller. Assemblers, in contrast, have essentially no optimization capabilities: they translate each assembly instruction into the corresponding machine instruction. Code written by hand in assembly can be fast, but the assembler itself does nothing to improve its size or execution speed; it is only concerned with generating correct instructions.

Compiler Optimization: Static and Dynamic

When we optimize the program, we can improve execution speed. The compiler detects common patterns or idioms in the code, replaces them with cheaper equivalents, reorders instructions to suit the target processor and eliminates instructions whose results are never needed. As a result, we have fewer instructions that have to be executed.

Static optimizations are performed at compile time, when the source code is translated into an executable binary for a particular platform or CPU architecture. Dynamic optimizations, by contrast, are performed while the program runs, for example by a just-in-time compiler. The early part of compilation has about three steps. The first step is preprocessing, which performs macro substitution and inserts the contents of included header files. The second step is lexical analysis, which scans the source code and breaks it down into smaller units called tokens: keywords, identifiers, operators and literals. The third step is syntax and semantic analysis, which checks the tokens against the rules of the programming language and generates intermediate code. Optimization and machine code generation then work on this intermediate code.

Example

The C++ program consists of two files, main.cpp and ints.cpp. Here we provide a simple example of how to compile and link the program using the g++ command.

The Main.cpp

The following text is the main.cpp file:

#include <cstdio>
#include <iostream>

// Defined in ints.cpp.
bool isTooBig(int num);

int main() {
    int num = 0;
    std::cout << "Please enter a number: ";
    // Keep asking while the input is a number greater than 50.
    while (std::cin >> num && isTooBig(num)) {
        // Report the rejected value on the error stream.
        std::fprintf(stderr, "input: %d\n", num);
        std::cout << "Please enter a number: ";
    }
    std::cout << "Thank You.\n";
    return 0;
}

The Ints.cpp

The source code of ints.cpp is simple and small.

The following text is the ints.cpp file:

// Returns true when num is too big, that is, greater than 50.
bool isTooBig(int num) {
    return num > 50;
}

The following text shows the result produced by the g++ command:

$ g++ main.cpp ints.cpp -o num1
$ ./num1
Please enter a number: 10
Thank You.
$

The Input and Outputs

The C++ program uses the iostream library to read input from the console with std::cin and to write output with std::cout. Values greater than 50 are reported on stderr with the fprintf() function, which is provided by the C standard I/O library (the <cstdio> header), not by iostream. The getch() function is not part of iostream or of standard C++ either; it comes from the non-standard <conio.h> header found on some platforms, so the example does not use it.

If we pass only one of the files to g++, the result is not what we expect:

$ g++ ints.cpp

This fails at the link step, because ints.cpp contains no main() function; the linker reports an undefined reference to main (the exact wording depends on the platform). Compiling main.cpp alone fails in the same way, with an undefined reference to isTooBig.

The error does not mean that we need another compiler. We need to pass both source files to g++, or compile each file to an object file with the -c option and then link the object files together.

What is the Difference between gcc and g++?

GCC originally stood for GNU C Compiler; because it now supports many languages, the name stands for GNU Compiler Collection. One of the goals of the GNU project is to provide free compilers for C, C++ and other languages on UNIX and Linux systems. GCC is one of the most popular compilers available today and is available on many platforms, including Windows.

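The gcc and g++ commands are both command line front ends (drivers) to the same GCC compilers. The gcc command treats .c files as C and .cpp files as C++, but does not link the C++ standard library by default. The g++ command treats both .c and .cpp files as C++ and links the C++ standard library automatically, so g++ is generally used to compile and link C++ programs.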

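The name of a program we can run on an operating system does not tell us which language it was written in or which compiler produced it. On Windows, a program built with g++ has the file extension .exe, as does a program built with any other native compiler; on UNIX and Linux, executables need no extension, and g++ names its output a.out unless the -o option is given. The Java compiler javac is different: it produces .class files containing bytecode, which are run by the Java virtual machine rather than directly by the operating system.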

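We can compile different parts of a program separately, at different times and even with different compilers, and link the resulting object files together, provided the compilers produce compatible object code.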

Conclusion

A compiler converts a higher-level language program into assembly, machine code and finally a binary; compiler optimization makes that binary smaller or faster. The size of the binary code varies depending on the compiler, its options and the programming language being compiled. When asked to optimize, the compiler tries to remove redundant instructions in order to produce compact binary code that executes efficiently on the target platform.

We use a particular compiler to convert our source code into an executable binary file. There are many compilers available on the Internet and we can choose one of the available free or commercial compilers. The choice of which compiler to use depends on how fast we need to get our program ready for production and a couple of other factors, such as whether or not we have an operating system that supports a specific compiler.

Benkő Attila is a Hungarian senior software developer, independent researcher and author of many computer science related papers.
