
Saturday, January 13, 2007

Advanced C++ part 3: Linking

This post will make the previous post clearer.
When the C++ compiler processes source files, it generates one object file (.o or .obj file) for each source file it has processed.
The object file typically contains a series of symbols along with their implementation. For now, a symbol is a variable name or a function name (including member functions; we will discuss calling convention and name decoration later). A symbol can be defined in that file, or marked as an external symbol.
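In source code, the difference between a defined symbol and an external one corresponds roughly to the difference between a definition and a declaration. Here is a minimal sketch of that idea, not taken from any real project; the function name bump_by_two is made up for illustration, and the definitions at the bottom are placed in the same file only so that the sketch builds on its own:

```cpp
// Declarations: they tell the compiler that these names exist and what
// their types are, without providing storage or a function body.
extern int count;        // an external variable, defined somewhere else
int add(int a, int b);   // functions have external linkage by default,
                         // so no 'extern' keyword is needed here

// Hypothetical helper: it compiles using only the declarations above.
// The compiler does not need to know where count and add actually live.
int bump_by_two() {
    count = add(count, 2);
    return count;
}

// Definitions: in a real project these would normally sit in another
// source file; they are here only to keep the sketch a single file.
int count = 40;
int add(int a, int b) { return a + b; }
```

If the two definitions at the bottom were moved to a different source file, this file would still compile, and its object file would list count and add as external symbols.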
The linker operates on a group of object files, knowing what each file defines and what each file needs from the other files. The linker's mission is to put all the object files into one executable file (.bin or .exe for a standalone file, .so or .dll for dynamic link libraries; we won't talk about dynamic linking here), linking external symbols from one file to their implementation in other files, hence the name.
An example is better than 1000 words, so:
say file a.cpp has the following symbols (for scientific honesty, these are not the real symbol names that would be generated):


main
count [this is an integer, not a function]
external add
external subtract


And file b.cpp has the following symbols:

subtract
external count


And file c.cpp has the following symbols:

add
external count


The linker would put the address of the add function from c.cpp into the empty slot in a.cpp, and likewise the address of subtract from b.cpp. It would put the address of count into the empty slots of b.cpp and c.cpp, and then put all 3 files together. Now a.cpp can call the functions it needs from the other files. Note that what really happens is more complicated; this is a simplified version.
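The matching step can be sketched as a toy program. This is only a model under heavy assumptions, not how a real linker is written: the ObjectFile struct and the functions resolve_symbols and example_files are hypothetical names, and a real object file holds machine code and addresses rather than plain strings.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, highly simplified model of an object file: only the
// symbols it defines and the symbols it expects from other files.
struct ObjectFile {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> external;
};

// Builds a table mapping each symbol to the file that defines it.
// Throws if a symbol is defined twice or an external symbol is never
// defined, loosely mirroring the usual linker error messages.
std::map<std::string, std::string>
resolve_symbols(const std::vector<ObjectFile>& files) {
    std::map<std::string, std::string> table;
    // Pass 1: collect every definition.
    for (const ObjectFile& f : files) {
        for (const std::string& sym : f.defined) {
            if (!table.emplace(sym, f.name).second) {
                throw std::runtime_error("multiple definition of " + sym);
            }
        }
    }
    // Pass 2: every external reference must match some definition.
    for (const ObjectFile& f : files) {
        for (const std::string& sym : f.external) {
            if (table.find(sym) == table.end()) {
                throw std::runtime_error("unresolved external symbol " +
                                         sym + " in " + f.name);
            }
        }
    }
    return table;
}

// The three files from the example, with the symbols listed for each.
std::vector<ObjectFile> example_files() {
    return {
        {"a.cpp", {"main", "count"}, {"add", "subtract"}},
        {"b.cpp", {"subtract"}, {"count"}},
        {"c.cpp", {"add"}, {"count"}},
    };
}
```

Calling resolve_symbols(example_files()) gives a table in which add belongs to c.cpp, subtract to b.cpp, and count and main to a.cpp. Leaving c.cpp out of the list makes the second pass throw, which is the toy equivalent of an unresolved external error.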

The moral of this story :D is that the compiler only has to worry about one file at a time. Note that producing an executable from source files is a 2-phase process: compilation and linking.

The next post will be about calling convention and name decoration (aka name mangling), which is related to function signatures. It will show how overloading works and why an external function is declared with its argument types, not just its name.


As Ramy has suggested:
Further reading for the last post:
* I am sorry, I tried searching Google but didn't find anything directly useful. My sources, however, were the solutions to problems I've faced before. As a matter of fact, I collected this information one statement at a time, every now and then; i.e. it is not from one direct source. The most useful source was when I was dreaming of making a C++ compiler and read a lot about compilers; there were some hints here and there about the operation of a C compiler, not even C++. (You can read more about Makefiles and GNU Make; you need those to manage compiling large applications in a custom way. Using makefiles will help you understand the compilation process deeply. It's tedious at the beginning: a trial-and-error method that takes a lot of time before you know these details.)
Further reading for this post:
* Same as above: I got it from diagnosing linking problems on MSDN, i.e. searching for one linking error after another. The other useful source was when I was working on the OS and met a lot of problems, because linking an OS is totally different from linking a normal program (the MS linker can't do it, btw). I once needed to put the address of the main function in the first 4 KB of the executable so GRUB could read it, and there was no way to enforce that in the MS linker. I spent 2 weeks trying several linkers, and linker scripts were involved. Another time I needed to get the size of the kernel at run time; I hadn't implemented any file system yet, so I had to rely on placing a variable at the end of the kernel and taking its address! That one I had to ask about on the alt.os.development Usenet group; it's something you won't find directly on a web page. (You can read about GNU LD for further reading, but that would be useful if you are searching for a certain feature, not for general reading.)
Further reading for the next post: (finally something that can be directly found)
Wikipedia: Name mangling
Google search: name decoration

2 comments:

Ahmed M. Farrag said...

That's cool... waiting for decorations that I guess will serve explaining this.... that fact there's no solid reference for linking and linkers irritates me. r u sure there's nothing out there??
Thank u very much for the article, keep it up please :)

Mohammad Alaggan said...

You can try reading what a compilers book has to say about linking. Perhaps reading the manual of GNU LD might help.
You're welcome, just ed3eely (pray for me).