Beyond being the simple output of a compiler, an executable is a blueprint for creating a process. Its internal structure, how it's loaded, and the memory environment it runs in are fundamental concepts in systems programming.
1. The Executable and Linkable Format (ELF) in Depth
On Linux and other Unix-like systems, executables use the ELF format. A key advanced concept is the distinction between its two "views" of the file:
· Section Headers: This is the linking view. It's a detailed list of sections (.text, .data, .symtab, etc.) that is useful for the linker and for debugging tools.
· Program Headers: This is the execution view. It's a much simpler list of "segments" that tells the operating system's loader which parts of the file to map directly into memory, and what permissions to give them (e.g., Read/Execute for the code, Read/Write for the data).
When you run a program, the OS loader needs only the ELF header and the program headers to create a runnable process; it does not consult the section headers.
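To make the execution view concrete, the sketch below walks the program headers of an ELF file and reports each loadable segment with its permissions. It assumes a 64-bit Linux system with `<elf.h>` available and a file in the host's byte order; the function name `count_load_segments` is made up for this illustration.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Prints every PT_LOAD segment of a 64-bit ELF file and returns how many
 * were found, or -1 on error.
 * Assumption: the file is ELF64 in the host's byte order. */
int count_load_segments(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    /* The ELF header sits at offset 0 and locates the program headers. */
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64) {
        fclose(f);
        return -1;
    }

    int loads = 0;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        /* Entry i starts at e_phoff + i * e_phentsize. */
        long pos = (long)(eh.e_phoff + (Elf64_Off)i * eh.e_phentsize);
        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(&ph, sizeof ph, 1, f) != 1) {
            fclose(f);
            return -1;
        }
        if (ph.p_type != PT_LOAD)
            continue;   /* only PT_LOAD segments are mapped into memory */

        printf("LOAD offset=0x%llx vaddr=0x%llx flags=%c%c%c\n",
               (unsigned long long)ph.p_offset,
               (unsigned long long)ph.p_vaddr,
               (ph.p_flags & PF_R) ? 'R' : '-',
               (ph.p_flags & PF_W) ? 'W' : '-',
               (ph.p_flags & PF_X) ? 'X' : '-');
        loads++;
    }

    fclose(f);
    return loads;
}
```

Calling `count_load_segments("/proc/self/exe")` from `main` inspects the running program itself. The code never reads a section header, which mirrors how little the loader needs.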
2. The Program Loading Process
When you execute a program (e.g., by typing ./my_app), the OS loader performs several steps before main() is ever called:
1. Validation: It reads the ELF header to ensure it's a valid executable for the current architecture.
2. Memory Mapping: It reads the program headers and creates a new virtual address space for the process. It maps segments from the executable file into this address space (e.g., the code segment is mapped as read-only and executable).
3. Dynamic Linker: If the program is dynamically linked, the kernel also maps the dynamic linker named in the executable's PT_INTERP program header. The dynamic linker then maps the required shared libraries (like libc.so) into the process's memory.
4. Runtime Relocation: The dynamic linker then performs any final address relocations needed for the executable and its shared libraries.
5. Control Transfer: Control passes to the program's official entry point, which is not main but a startup routine in the C runtime library.
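One way to observe the result of these steps from inside a running process is to read `/proc/self/maps`, where Linux lists every mapping in the address space. The sketch below is Linux-specific, and the helper name `show_mappings` is made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Prints every line of /proc/self/maps that contains `needle` and returns
 * the number of matching lines, or -1 if the file cannot be opened.
 * Assumption: each line fits in the buffer (paths are normally far shorter). */
int show_mappings(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[1024];
    int hits = 0;
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, needle)) {
            fputs(line, stdout);
            hits++;
        }
    }

    fclose(maps);
    return hits;
}
```

For example, `show_mappings("r-xp")` lists the executable mappings, and `show_mappings(".so")` lists the shared libraries and the dynamic linker that were mapped before `main()` ran. A statically linked program would show no `.so` entries.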
3. Process Memory Layout
The virtual address space created by the loader has a standard layout:
· Text Segment: At the lowest address, this read-only, executable segment contains the program's machine code.
· Data & BSS Segments: These contain initialized (.data) and uninitialized (.bss) global and static variables. The .bss region is zero-filled when the program is loaded.
· Heap: A region of memory that grows upwards. This is where dynamic memory allocation (via malloc(), calloc()) takes place.
· Stack: At the highest address, this region grows downwards. It's used for storing local variables, function parameters, and return addresses for function calls.
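The layout can be observed by sampling one address from each region. The sketch below is illustrative only: the struct and function names are made up, casting a function pointer to an integer relies on POSIX-style behaviour, and address-space layout randomization means the printed values change from run to run.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* placed in .data */
int uninitialized_global;      /* placed in .bss  */

/* One sample address from each region of the process. */
struct layout_sample {
    uintptr_t text, data, bss, heap, stack;
};

struct layout_sample sample_layout(void)
{
    int local = 0;                  /* lives on the stack */
    void *heap_block = malloc(16);  /* lives on the heap  */

    struct layout_sample s;
    s.text  = (uintptr_t)sample_layout;
    s.data  = (uintptr_t)&initialized_global;
    s.bss   = (uintptr_t)&uninitialized_global;
    s.heap  = (uintptr_t)heap_block;
    s.stack = (uintptr_t)&local;

    free(heap_block);
    return s;
}

void print_layout(const struct layout_sample *s)
{
    printf("text : %#llx\n", (unsigned long long)s->text);
    printf("data : %#llx\n", (unsigned long long)s->data);
    printf("bss  : %#llx\n", (unsigned long long)s->bss);
    printf("heap : %#llx\n", (unsigned long long)s->heap);
    printf("stack: %#llx\n", (unsigned long long)s->stack);
}
```

On a typical Linux system the stack address prints well above the heap address, with the text, data, and BSS addresses clustered together at the low end.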
4. Position-Independent Code (PIC)
Shared libraries (.so files) can be loaded at a different virtual address in each process that uses them. Their code therefore cannot rely on absolute memory addresses fixed at link time. Position-Independent Code (PIC) solves this problem.
When compiling with the -fPIC flag, the compiler generates machine code that uses relative addressing. Instead of referring to a variable by its absolute address, the code refers to it by its offset from the current instruction pointer. For symbols whose final location is known only at load time, it goes through a table of addresses (the Global Offset Table) that the dynamic linker fills in. This allows the same block of code to run correctly regardless of where it's loaded in memory, which is essential for shared libraries.
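As a small illustration, here is a hypothetical one-function library that could be built as position-independent code. The file name `counter.c`, the library name, and the identifiers are assumptions for this sketch. The comments describe what GCC typically emits on x86-64, which may differ on other targets.

```c
/* counter.c: a hypothetical shared library with a single function.
 * Possible build command (GCC or Clang on Linux):
 *     gcc -fPIC -shared -o libcounter.so counter.c
 */

/* File-local variable: typically addressed relative to the
 * instruction pointer, with no absolute address in the code. */
static int private_count;

/* Exported variable: typically reached through the Global Offset Table,
 * because its final address is only known at load time. */
int shared_count;

/* Returns 1 on the first call, 2 on the second, and so on. */
int next_id(void)
{
    private_count++;
    shared_count = private_count;
    return shared_count;
}
```

Disassembling the resulting library (for instance with `objdump -d libcounter.so`) should show no hard-coded data addresses in `next_id`, which is why the same code pages can be shared by processes that map the library at different addresses.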
5. The True Entry Point: _start vs. main
A common misconception is that main() is the first function to run. The actual entry point of the program, as specified in the ELF header, is a C runtime startup function, typically named _start.
The _start routine is responsible for:
1. Performing any necessary setup for the C standard library.
2. Retrieving command-line arguments and environment variables from the stack.
3. Calling main() with the familiar argc and argv parameters.
4. When main() returns, passing its return value to exit(), which runs cleanup handlers and then makes the system call that terminates the process.
Your main() function is therefore called by the C runtime environment; it is not the program's absolute beginning.
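On Linux, the kernel passes the executable's entry address to the process in the auxiliary vector, so the difference between the entry point and main() can be checked at run time. The sketch below assumes a libc that provides `getauxval` in `<sys/auxv.h>` (glibc and musl both do). The helper name `starts_at` is made up for this illustration.

```c
#include <elf.h>        /* AT_ENTRY */
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval */

/* Returns 1 if `candidate` is the address the kernel recorded as the
 * program's entry point (AT_ENTRY), 0 otherwise. */
int starts_at(uintptr_t candidate)
{
    uintptr_t entry = (uintptr_t)getauxval(AT_ENTRY);

    printf("entry point: %#llx, candidate: %#llx\n",
           (unsigned long long)entry,
           (unsigned long long)candidate);

    return entry != 0 && entry == candidate;
}
```

Calling `starts_at((uintptr_t)main)` from inside `main` is expected to return 0 for a program linked against the usual C runtime, since the recorded entry address belongs to the startup routine and not to `main`.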
Storage classes in C determine a variable's or a function's scope (visibility), lifetime (how long it exists in memory), and storage location (e.g., stack, CPU register).