The Basics #2: Understanding the Compiler

_Rocket_ · August 27, 2019

Introduction

The compiler is an essential part of why programming languages exist. Without the compiler, most of what we do as programmers wouldn't be possible.

The language the computer speaks is something that no human can efficiently read. With time you could learn to read very small portions of code made entirely of 1s and 0s, but there is no way you could read it as fast as you would a normal programming language. This is where a compiler comes in, because the lack of understanding between human and computer is a two-way street. A computer cannot understand the programming languages we use. An if statement means absolutely nothing to a CPU, much like how an incredibly long line of bits means nothing to us. So in order for programs to exist, there had to be a way to translate human-written code into something the computer could understand.

What are Compilers?

A compiler is responsible for turning your handwritten code into something the computer can understand. It acts like a translator. Whatever you write in your code, the compiler translates it into machine code, the bits your CPU actually uses to run your program.

This is why compilers can be very strict about what you do and how you do it. During compile time, the compiler goes through every single piece of your code and translates it into a language your PC can use. This is a very specific and unforgiving process. Every single line has to make sense to the compiler, at least to the point of it being translatable. If even one thing doesn't make sense, the compiler will report an error and refuse to produce a program. This prevents anything extremely weird from happening. If the compiler just guessed at what you meant and kept going, the resulting program would very likely misbehave or crash. There are a lot of moving parts in your PC, and everything has to flow without error. This is why even a single mistake in your code will upset your compiler. It cannot translate the code, so it stops instead of building a broken application. Luckily, over the many years programming has been around, compilers have become smart enough to tell you what the error is and where it is located. Be glad you live in a time when that kind of technology exists, because back in the day one error could completely mess up a program, and you would often get little more than a vague hint as to why it was failing.

Why are there so many compilers?

Compilers are very complicated. You would think compilers could all be exactly the same... But no. Compilers differ from one another for many reasons. Your operating system, your hardware, and especially your programming language are all factors that have to be considered when designing a compiler. A compiler has to produce machine code for one specific kind of CPU, packaged in a way one specific operating system knows how to load, so output built for one target will usually not run on another. Remember, there always has to be a smooth flowing operation going on in your PC, and the way a PC is designed can heavily change how that "flow" operates. A Mac is very different from a Windows PC. On the surface they appear very similar, but they can use different CPU architectures, and the operating systems use different executable formats and different ways of talking to the system. Compilers have to factor in these differences. This is part of why a single programming language can have multiple compilers to choose from.

This is also one reason software or games can have different release dates for different operating systems. Some code HAS to be written differently for different operating systems (depending on the language), because each operating system offers its own functions for things like windows, files, and networking. Compilers and platforms always have a set of rules for the programmer to follow to ensure everything works properly during compile time, and these rules can change depending on which operating system the program is being built for. This can force programmers to finish writing code, compile for one operating system, and then edit the platform-specific parts until the code also builds for a different one.

How Pointers Break Compilers

If you do not know what a pointer is, this section may not concern you. However, if you do have experience dealing with pointers, you probably already know what I am going to say.

For the most part, the details of where things live in memory are handled by the compiler itself, along with the instructions the CPU is given. The compiler is designed to automate these procedures to ensure the program works properly. So in everyday code, ending up with a bad memory access that results in a crash is unlikely. However... some languages have a nifty little thing called a pointer.

Pointers are used to point to addresses in memory, as you should already know. The problem is, compilers have very little ability to regulate how you use pointers. Usually languages try to prevent programmers from doing really stupid stuff by forcing you to at least give a pointer a datatype or an object type, like ObjectEntity* e; And in many cases, this works. But a compiler cannot monitor much besides that. Pointers are one of the few things in a language that let you do one of the compiler's jobs. This can be incredibly powerful, but also incredibly dangerous. If you have used C++, you are bound to have had a memory access error. You know the one. The one where your program exits with a very large negative number? Yeah, that lovely one.

Compilers are automated and lay out memory consistently. But when you give a human the ability to manage memory, even if it's a very limited control, it's still giving a lot of power. And since we are all human... mistakes happen often. And those mistakes can lead to very inconsistent crashes. Allow me to provide an example.

When you create an array, you are required to provide the number of elements so the compiler can reserve a memory block large enough to hold all of them. Like this:

int array[100];

//If you tried to access:

array[100];

//You would be outside the array, because the valid indexes are 0 through 99. Same with doing array[343]; or any other number larger than 99. In languages with runtime bounds checking, like Java, this throws an array out of bounds exception. In C and C++ there is no exception. The compiler may warn you when it can see that the index is a constant that is too large, but the behavior is undefined.

The compiler can recognize this kind of mistake because it knows the array's size from the declaration. However, once pointers are involved, things are a little different.

//The compiler still reserves the memory for this array of 100 pointers for you, but what each pointer points to is officially in your hands from this point on

int* array[100];

//The compiler does not track where these pointers point, or whether that memory is valid. That part is up to you. And as soon as you reach the array through a plain pointer (for example after passing it to a function), the size is gone too. All the compiler understands at that point is that there is a pointer. It cannot recognize where the block begins or ends.

//Due to this, you can be really stupid and go out of the array's boundaries. Remember, once you are working through a pointer, all the compiler CAN know is the type it points to. That's it. Everything else memory address related is up to you.

array[100];

//C and C++ will not stop this at run time, same with array[13341]; At best you get a compiler warning. This is incredibly dangerous.

If you access an element that is outside of an array's boundaries through a pointer, a couple things can happen.

1. Everything will seem fine. In this situation, you have landed on memory that your program is allowed to touch, but that is not the element you meant. You can read whatever value happens to be stored there, and you can even write to it. But that write may silently overwrite some other variable in your own program, and the damage might not show up until much later.

2. You can instantly crash. Modern operating systems keep each program's memory separate, so if the address is not one your program is allowed to use, the operating system steps in and kills the program. On Linux or macOS this shows up as a segmentation fault, and on Windows as an access violation.

So messing with pointers in this way is dangerous because the result is unpredictable. The funny thing is, you could very well do this and appear to get away with it. Because in the end, you only sometimes crash. But when a crash is inconsistent, you aren't a very good programmer if you don't try to fix it.

Assembly Code and its Relation to Compilers

Assembly code is the closest thing we have to actual machine language. Learning assembly code is a really good way to better understand how compilers work. You are essentially telling the computer how to do every single step with assembly. To be frank, it's awesome. Explaining the details of assembly is something that would take multiple tutorials, because you have to factor in things like how the stack and the heap work, different operating systems, different hardware, etc. It's crazy.

If you would like to learn a little on assembly, here's a nifty video for you to watch.

Assembly is incredibly interesting because you are writing code that the computer can almost understand.

Making a compiler is possible. If you are truly interested in how a compiler works, look for some tutorials on designing a compiler for a language. "Low level languages" like C are generally considered easier to write compilers for, because the language maps fairly directly onto what the machine does. (Assembly, which is one of the lowest levels, is translated by a simpler tool called an assembler.) Making a compiler for higher level languages like Java and Python is more complicated, since they also need things like a runtime and automatic memory management. So maybe start with C, or a small subset of it.

**Joshy** · August 29, 2019

I'm not sure if introducing pointers alongside compilers is helpful for a reader who is trying to understand the basics, but I do think they're fun, and they're a common interview question for any programming-related position (at least while they're recruiting at the university).

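#include <stdio.h>
#include <string.h>

int main(void)
{
	char name[] = "Joshy!";
	char *pointer = &name[0];	/* same address as plain `name` */

	/* strlen stops before the terminating '\0', so it is not printed */
	for (size_t i = 0; i < strlen(name); i++)
	{
		printf("%c", *(pointer + i));	/* same as pointer[i] */
	}

	return 0;
}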

Edited August 29, 2019 by Joshy
