
_Rocket_

The Basics #1: DataTypes and Memory


Posted  Edited by _Rocket_ - Edit Reason: Errors and such

Introduction

Hello, and welcome. This is likely the first in a series of posts about the basics of programming. Even though this is named "The Basics", this post covers datatypes at a fairly in-depth level. If you are just starting to learn programming, the contents of this post will likely overwhelm you. I suggest you first study a little more in a language like Java or any C language, because these languages use datatypes often. This post is geared towards those who have a little experience with programming and know what a datatype is, but don't fully understand how a datatype works and why different datatypes are essential to a computer.

 

This is going to go step by step in explaining what datatypes are and how they operate. Every bit of information is crucial to understanding what these things are, so read carefully. You may even have to read over this a couple of times, because it is easy to miss a couple of things on your first read.

 

The Hardware

 

The CPU

Understanding how software communicates with your computer requires you to understand the hardware the computer operates with. The Central Processing Unit is the most essential part of your computer when it comes to performing calculations and instructions.

 

The RAM

Random Access Memory is constantly communicating with your CPU. This piece of hardware is responsible for holding everything the CPU works with while a program is running. RAM's job consists of storing variables, instructions, and other data that tell the CPU what to do, how to do it, and where to find the tools responsible for performing what it is told.

 

The Relationship of RAM and CPU

One thing that some people may not know about RAM is how instructions are stored. When you run a .exe file on your system, all the code that is compiled into that .exe file is transferred into RAM as instructions. Uh huh, that's right. The instructions in your code are stored inside RAM, as well as variables and other things. This is one reason why really large programs with millions of lines of code may take a second to actually launch when you run them (like GTA V, which tends to have a few seconds of nothing before it actually does anything). In that moment after you click the .exe application, those instructions are being loaded into RAM. Once they are loaded, the CPU begins to access the memory addresses that hold these instructions and performs them.

 

If you have any experience, or have at least looked into assembly code, you will know how the RAM stores these instructions. If this at all confuses you, feel free to consult the following video by Tom Scott.

 

 

What is a Datatype?

A datatype is both a way to identify the size of the data that RAM needs to store and a way for the compiler to know how the variable is meant to be used. Datatypes are essential to the functionality of your computer for quite a few reasons that I will get to in a moment. Chances are, if you are only experienced with Python or another language that does not make the programmer name datatypes, you may be wondering why you never have to use them.

 

Python and other languages that don't make you write datatypes are typically dynamically typed languages. The values still have types behind the scenes, but a lot of the memory, scopes, etc. are handled by the language itself and the interpreter or runtime that comes with it. This allows programmers who use these languages to mostly forget about datatypes and just deal with the code. However, with languages like C++ where efficiency is one of the most important parts of the code, being able to choose the datatype for each variable is essential.

 

Why Your Compiler Needs Datatypes

When a variable is stored into a memory address in RAM, it becomes nothing but a binary number. There are no characters involved, just binary. A base 2 number system. This can be a huge problem for a number of reasons. Since everything in RAM is represented in binary, the bits alone give no way of telling the difference between a 1 byte bool variable and a 1 byte char variable. When the compiler has no way to identify what datatype a variable is, it can't properly compile the instructions in a way that allows the CPU to handle the information accordingly.

 

Why Your RAM Needs Datatypes

When a program is launched, all the instructions and variables MUST be stored in RAM. And as the program is running, variables are constantly being altered and created, so there is non-stop communication between your CPU and your RAM. When a variable is declared, it is crucial to find a memory block that can hold the size of the variable. This is where datatypes come in. Without datatypes, there is no consistent way of telling the size needed to store a variable. If you took a moment to think "How could variables be stored without datatypes", you could probably come up with a few ideas... But you would see how inefficient they would be. A datatype has a distinct memory block size, and when a variable is declared, it is given a free memory block in RAM that matches the space required. Without datatypes, an integer could wind up taking up much less or much more space than it is supposed to.

 

The Datatypes

These are the datatypes you will find in code, along with their typical sizes. The exact sizes depend on the language, compiler, and platform. Some datatypes are used more than others, especially depending on what language you use.

 

bool & boolean - 1 Byte

char - 1 Byte

short - 2 Bytes

int - 4 Bytes

long - 4 Bytes (on Windows; 8 Bytes on most 64-bit Linux and macOS systems, and always 8 Bytes in Java)

float - 4 Bytes

double - 8 Bytes

long long - 8 Bytes

 

Boolean

Boolean is a true or false variable. A boolean only needs a single bit to store its value, but it takes up a full byte for one reason: memory addresses. Each memory address in RAM refers to a single byte. This means any variable MUST be at least a byte long, no matter what it stores. You will see booleans often, even in a simple if statement like this:

 

if(6 == 12) runMethod();

 

The if statement is actually taking in a boolean argument. == is a comparison operator, and it compares the constants you see in the parentheses. The result of that comparison is... a boolean value haha. And of course, you can even make your own boolean variables. The name of the boolean type differs between languages (bool in C++, boolean in Java).

 

Char

A char variable does need a full byte to store its information, unlike boolean. Because a char is stored as a number, it acts a lot like any typical number variable. You can increment chars, do math with them, etc. So in reality, the CPU doesn't need to tell the difference between a char variable and an integer variable. The compiler does. Some languages do not give the programmer a char type at all. They only allow working with strings. So doing things like this:

 

char b = 'b';

b++;

 

or

 

if(varChar < 'A') runMethod();

 

is not possible in some languages.

 

Short

You will see shorts in languages like C++, but not so often in languages like Java. Shorts are essentially integers that take up less space in RAM. Using shorts in Java is pretty uncommon because Java isn't built for that level of memory efficiency. It's a heavily object-oriented language, and saving a couple of bytes per variable wasn't the main goal when it was designed. But in languages like C and C++, you may see shorts quite often.

 

Int

Integers are one of the first datatypes you will use when you begin learning a language. An int is a medium-sized variable that doesn't take up too much space but still has enough room to hold large numbers (roughly plus or minus 2 billion when it is 4 bytes). Quite useful.

 

float and double

Floats and doubles have a relationship similar to shorts and integers. However, a float is NOT just a double with some bytes cut off. Floating point numbers are stored very differently from whole numbers: the bits are split into a sign, an exponent, and a fraction, and a float and a double divide up those bits differently (1, 8, and 23 bits in a 4-byte float; 1, 11, and 52 bits in an 8-byte double). You will see floats quite often in C++, especially in graphics and game code, and you will see doubles quite often in Java.

 

All the other datatypes are a little more self-explanatory. You can do more research on these if you need to.

 

The Difference Between a Signed and Unsigned Variable

Understanding the difference between signed and unsigned variables requires you to understand how RAM and the CPU identify negative numbers. When a variable is signed, it means it can hold either a positive or a negative number.

 

When a variable is signed, the first (leftmost) bit of the variable in RAM is used to represent whether it is positive or negative.

So, say for instance: 0000 0101 represents the number 5. The first bit is 0, so it is positive 5. When the first bit is 1, the number is negative. Modern CPUs store negative numbers using two's complement, where you get -5 by flipping every bit of 5 and adding 1: 1111 1011. This is negative 5.

The downside of signed variables is the highest number you can store. Since a single bit is used to identify if the number is positive or negative, the highest number you can have is cut in half. Let's go back to that one-byte variable we talked about.

0111 1111 is the signed representation of positive 127. You cannot go higher than this. If you tried to increment this number by one, on typical hardware the bits would become 1000 0000, which is -128, the lowest value a signed byte can hold. However, an unsigned variable is different.

In an unsigned variable, no bits are used to represent if a number is positive or negative. This means 1111 1111 actually represents the number 255. The good news is that your number can be larger now. The bad news is, you can no longer have negative numbers. If you tried to subtract 1 from 0000 0000, the number would underflow into 1111 1111 which is 255.

So, if your number will never be negative, use an unsigned variable to be safe. That way, the variable can count higher without overflowing.

 

How RAM Locates Memory...

... is a little too complicated for this tutorial. When you ask that question, you have to learn about stacks and heaps, how memory is allocated in RAM, how memory blocks are found, etc. What I CAN say is, memory is not found the way you may think it is found. It doesn't start from memory address 0 and work its way up to memory address FFFFFFF or whatever. I might cover that at a later time when I learn a little more about it myself.

 


I write programs and stuff.

 

If you need to contact me, here is my discord tag: Dustin#6688

 

I am a busy person. So responses may be delayed.



Posted  Edited by Joshy

Why are datatypes degenerate in terms of their size?  Why not just be efficient and have four different datatypes?  We'll convert the bool and char into one datatype that is 1 byte; the int, long, and float into another that is 4 bytes, and lastly double and long long into one more that is 8 bytes.

 

Another question is why is bool so wasteful?  It would be better if it were just 1 bit instead of a whole byte.

 

I enjoy your guides.  Keep up the great work.



1 hour ago, Joshy said:

Why are datatypes degenerate in terms of their size?  Why not just be efficient and have four different datatypes?  We'll convert the bool and char into one datatype that is 1 byte; the int, long, and float into another that is 4 bytes, and lastly double and long long into one more that is 8 bytes.

 

Another question is why is bool so wasteful?  It would be better if it were just 1 bit instead of a whole byte.

 

I enjoy your guides.  Keep up the great work.

If you read closely, these questions are actually answered in the text. But like I said, it can be easy to miss things upon first reading.

 

When it comes to RAM, you would be correct. The difference between a char and a boolean is absolutely nothing to RAM. They might as well be identical. However, this is where a compiler comes in.

 

When you write programs, you are essentially writing the instructions that your computer will follow when running your program. The compiler is responsible for turning your human-readable code into something that the computer can understand. Due to how compilers work, you have to be very specific when you write your code. Otherwise, the compiler would become easily confused and make really weird programs that easily crash.

 

This is why there are different datatypes. For instance, if you tried to do:

 

int var = 'c' + false;

 

How would the compiler properly compile instructions for this? Your RAM wouldn't care what is stored inside its memory. Its job is just to store whatever it is given. But for a program to properly function, these clashes of datatypes could cause very problematic inconsistencies in instructions. This is why different datatypes are essential for programs to run properly.

 

I also explain why bools are 1 byte long. As of right now, we cannot make a standalone variable with its own memory address that is smaller than 1 byte.

 

This is due to how the CPU handles instructions. When the CPU accesses RAM, it does it through memory addresses. A single memory address refers to 1 byte. If you watch the YouTube video I linked, you'll see what a CPU does. It carries out its instructions by accessing memory addresses.

 

Since a memory address is pretty much a "label" for 1 byte in RAM, any variable HAS to be at least 1 byte. That's the only way the CPU would be able to access that variable on its own. Otherwise, variables would begin to share memory addresses, which would lead to horribly inconsistent bugs and corrupted data unless every access was handled very carefully.

 




Posted  Edited by Joshy

I am mystified.

 

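#include <stdio.h>
#include <stdbool.h> /* defines true and false for C compilers before C23 */

int main()
{
	char someCharacter = 'A';

	/* i += true adds 1 each pass, because true is just the integer 1 */
	for(int i=0; i<26; i+=true)
	{
		printf("%c\n", someCharacter+i);
	}

	return 0;
}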

 


 

Even if the smallest register size is 1 byte, then why don't they just concatenate or shift the registers?  It's not uncommon in the real world to have small information slices sit between two registers.  In fact: If you look at the instruction format, then you'll see that some fields are 5 and 6 bits.  Do they waste the other 2-3 bits because the smallest register size is 1 byte?  I don't think so.


Posted  Edited by _Rocket_

That is honestly crazy. Never knew you could do it like that. Then again it makes sense, true and false are kind of like macros for 1 and 0. It's why you can do while(1) and it still works just fine.

 

I believe there are different syntaxes just for the sake of code readability. It makes code much easier to understand.

 

And to be frank, I can't really give a solid answer to the second question. That is something that requires extensive knowledge of assembly code and/or compilers. I may know a little assembly and understand what compilers are, but I'm far from an expert. I could imagine this question has been asked somewhere on Stack Overflow.

 

The one thing I do know is: the fact that it is a byte doesn't really impact performance at all. If booleans were 1 bit long, the performance would for the most part stay the same. So yeah, it is wasteful, but at the same time it doesn't impact performance in the slightest. Because in the end, the CPU is still looking for a single memory address, regardless of its actual size.


