12 October 2011

Compilers and linkers

Just one thing: do NOT skip this section.
What follows is part of a discussion between a rookie and a nerd.

Do you know what machine language is?
Yes, very simple. Machine language is the language understood and executed by the processor. It is made up of 0s and 1s. Examples of machine language strings: 0100010010, 1010010, etc.
Do you know what the processor is?
Yes, very simple. It is the main processing unit of a computer, where all the arithmetic and logical operations are done.
Do you know what assembly language is?
No, I am not sure. Are machine language and assembly language two different names for the same thing?
No, they are different from each other.
Assembly language is just one level above machine language.
In the beginning there was only machine language, in the form of 0s and 1s. Every program had to be written in 0s and 1s only, which was tedious and error-prone.
So assembly language was introduced, which is more readable and simpler to learn than machine language. But what was simpler for humans was useless to the processor. A way was needed to convert assembly language into machine language, so assemblers were introduced.
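To make the idea concrete, here is a minimal sketch of what an assembler does at its core: look up a readable mnemonic and return the bit pattern behind it. The instruction set below (the mnemonics, the opcodes, and the `assemble` function) is invented for an imaginary processor; a real assembler handles operands, labels, and much more.

```c
#include <string.h>

/* Hypothetical instruction set for an imaginary processor.
   These mnemonics and bit patterns are made up for illustration. */
struct instruction {
    const char *mnemonic;   /* human-readable name used in assembly */
    unsigned char opcode;   /* bit pattern the imaginary processor expects */
};

static const struct instruction table[] = {
    { "LOAD",  0x1 },  /* 0001 */
    { "STORE", 0x2 },  /* 0010 */
    { "ADD",   0x7 },  /* 0111 */
    { "SUB",   0x8 },  /* 1000 */
};

/* Translate one mnemonic into its opcode.
   Returns -1 if the mnemonic is not part of the instruction set. */
int assemble(const char *mnemonic)
{
    size_t count = sizeof table / sizeof table[0];
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].mnemonic, mnemonic) == 0)
            return table[i].opcode;
    }
    return -1;
}
```

Calling `assemble("ADD")` gives 0x7 (binary 0111), while an unknown mnemonic such as `"JUMP"` gives -1. The programmer writes ADD, and the processor only ever sees 0111.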
Each assembler has its own dialect of assembly language. There can be a large number of assemblers for a single processor, but an assembler for processor A may not work for processor B.
Which assembler you use for a particular processor P depends solely on your budget and requirements. Let's forget assemblers for now.
Can you name some processor architectures?
Cool…
ARM’s ARM Architecture
Atmel’s AVR architecture
Microchip’s PIC architecture
Texas Instruments’s MSP430 architecture
Intel’s 8051 architecture
Zilog’s Z80 architecture
For now, just understand that an architecture is the design of a processor. A group of processors can share the same architecture, but architectures vary from manufacturer to manufacturer, and even from one processor family to another.
So not all processors are the same.
As you told me before, machine language is the language understood by the processor, so each processor or processor family has its own machine language. Machine language is very specific to the hardware on which it is going to run.
So we cannot guarantee that a program written in the machine language of processor A will run on processor B.
It is true that all processors understand only 0s and 1s, but the meaning of each pattern of bits is specific to the processor. For example, if 0111 represents the add command on processor A, it may represent the subtract command on processor B, or nothing at all. So every processor manufacturer provides a manual listing these codes and their meanings, and the programmer has to write programs according to that manual.
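The 0111 example can be sketched in C. The two decoders below stand in for two imaginary processors; the opcode values and the names `decode_on_a` and `decode_on_b` are assumptions made up for this illustration and do not describe any real chip.

```c
/* Operations that our two imaginary processors can perform. */
enum operation { OP_NONE, OP_ADD, OP_SUB };

/* Hypothetical processor A: the bit pattern 0111 means "add". */
enum operation decode_on_a(unsigned char bits)
{
    switch (bits) {
    case 0x7: return OP_ADD;   /* 0111 */
    case 0x8: return OP_SUB;   /* 1000 */
    default:  return OP_NONE;  /* pattern not defined on A */
    }
}

/* Hypothetical processor B: the same pattern 0111 means "subtract". */
enum operation decode_on_b(unsigned char bits)
{
    switch (bits) {
    case 0x7: return OP_SUB;   /* 0111 */
    case 0x3: return OP_ADD;   /* 0011 */
    default:  return OP_NONE;  /* pattern not defined on B */
    }
}
```

The same input, 0x7, comes back as `OP_ADD` from `decode_on_a` and as `OP_SUB` from `decode_on_b`, and 0x3 means nothing at all on A. That is why machine code written for one processor cannot simply be handed to another.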
So every program written in a language other than machine language has to be translated (compiled) first. This applies equally to programs written in C: every C program has to be converted into the machine language of the processor on which it is going to run.
Compiler: A compiler is a program that converts instructions from one language into equivalent instructions in another language. It also does other tasks, such as syntax checking. However, it can’t detect logical errors.
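Here is a small illustration of a logical error, using two made-up functions. Both are perfectly valid C, so a compiler accepts them without complaint, yet only the second one does what its author intended.

```c
/* Intended to return the average of two numbers. */
int average(int a, int b)
{
    /* Logical error: division binds tighter than addition,
       so this computes a + (b / 2) instead of (a + b) / 2. */
    return a + b / 2;
}

/* What the programmer actually meant. */
int average_fixed(int a, int b)
{
    return (a + b) / 2;
}
```

For the inputs 4 and 8, `average` returns 8 while `average_fixed` returns 6. The syntax is fine in both cases; only a human (or a test) can tell that the first answer is wrong.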
Different processors have different compilers, provided either by the processor manufacturer or by third-party compiler vendors. So if you want to run a C program on an x86 processor, you first have to compile it with a compiler that targets x86. Similarly, if you want to run the same C program on an ARM processor, you have to compile it with a compiler that targets ARM.
But there are cross compilers: compilers that run on one kind of machine and generate machine code for a different architecture.
Many of the compilers in use today can work as cross compilers, for example:
1980 Introl C Compiler for the Motorola 6809 processor.
GCC cross compiler system.
See the image of the Wikipedia page showing the target processors for which GCC can compile machine code.
GCC code compatible processors
So what's the difference between a compiler and an assembler?
In my view, conceptually both do the same thing: convert one language into another.
An assembler can convert only assembly language into machine language. It is implemented as software.
The source and target languages are fixed for an assembler. Assemblers are also specific to an architecture and can only translate the assembly language supported by that architecture.
But for a compiler, both the source and the target language can vary.
Some compilers come bundled with an assembler, a linker and an editor, and the whole package is known as the compiler. These compilers generate assembly language code, which is then converted into machine language by the bundled assembler.
Some compilers, such as the Microsoft C compiler, compile C and C++ source code directly into machine code. GCC, on the other hand, compiles C and C++ into assembly language and then runs an assembler to convert that into the appropriate machine code.
Why they do this is not the point here. We will use the term compiler for anything that converts a non-machine language into machine language. The output of the compiler is an object file (.obj on Windows or .o on Unix) containing the equivalent machine language.
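If you want to see these stages for yourself with GCC, a tiny source file is enough. The file name `add.c` and the function in it are just placeholders for this sketch; `-S` and `-c` are the GCC options that stop after the compile step and after the assemble step.

```c
/* add.c (hypothetical file name)

   With GCC the stages described above can be run one at a time:

       gcc -S add.c    stops after compiling, leaves assembly in add.s
       gcc -c add.c    compiles and assembles, leaves machine code in add.o

   add.o is the object file. It is not yet a program you can run. */

int add(int a, int b)
{
    return a + b;
}
```

Opening `add.s` in a text editor shows readable assembly for your processor, while `add.o` is binary and only makes sense to tools and to the processor.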
Some compilers can produce machine code from several programming languages and for multiple processor architectures. GCC is one of them, and arguably the most important of them.
So how does a compiler depend on the operating system? And, most important, when I download a compiler I am only asked whether I want it for Windows or Linux. The processor is never mentioned. Why is this so?
There are basically two reasons.
1. Most compilers in use today have been ported to almost every architecture in use today. Vendors assume that you are using one of those architectures.
2. A compiler is itself a piece of software, and it needs many services that are generally provided by the OS. So compiler vendors ask for your OS in order to give you the correct compiler.
OK, now I have used the compiler to convert my source code into machine language and I have the file myprogram.obj. Can I run the program now, please?
Not so fast…
There is still something missing.
Suppose your program has several source files, say three.
You have converted each one into its equivalent object code with the help of the compiler.
Now how will you execute your program?
Selecting all three and double-clicking them?
And what happens if your program has thousands of .obj files, as real-life programs do?
So this is not how things work.
These files have to be combined into one big file called the executable.
This is done by the linker. Linkers come either integrated with the compiler or as a separate utility.
A linker does other things too, but I am not going to take you deeper.
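A rough sketch of why the linker is needed, using two hypothetical files, `mathutil.c` and `main.c`. The block below is the first file; the second one appears only inside the comment, together with the GCC commands that would build and link the pair.

```c
/* mathutil.c (hypothetical file name)
   This file has no main(). Compiled on its own it yields mathutil.o,
   which cannot run until it is linked with the file that has main(). */

/* Defined here, used from main.c. */
int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}

/* main.c (hypothetical) would only declare the function and call it:

       int sum_of_squares(int a, int b);   // declaration, no body

       int main(void)
       {
           return sum_of_squares(3, 4) == 25 ? 0 : 1;
       }

   Build steps with GCC:

       gcc -c mathutil.c                    -> mathutil.o
       gcc -c main.c                        -> main.o
       gcc main.o mathutil.o -o myprogram   -> one executable
*/
```

When `main.c` is compiled, the compiler only knows that `sum_of_squares` exists somewhere. It is the linker, in the last command, that finds the actual code in `mathutil.o` and joins the two object files into a single executable.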
The tool we are going to use is Bloodshed Dev-C++, an IDE that uses MinGW, which in turn uses GCC.
GCC, as shipped with MinGW, comes with an assembler and a linker.
In this article I have not explained how compilers and linkers actually work internally, only what they are and what they do.
That is sufficient for most of us to move on to the next big thing.