
Nick

The Semester of Death & Nick


It's definitely way overdue for me to make a post like this, haha. I'd like to shed some light on why I'm very absent at the moment. I'm currently in the fifth semester of my bachelor's degree in Computer Science, which has been nicknamed the Semester of Death here. A combination of a compiler course, a machine learning course, and a distributed systems and security course has earned it this name. I played it smart, or so I thought, and swapped machine learning for algebra, which turned out to be probably just as hard as machine learning. Since this would otherwise be a very short post, and since I love talking about my courses, I'll shed some light on what I've been up to. 😄

 

Our compiler course is definitely the most interesting in my opinion. We are working on a compiler for a language called Tiger from this book. Our compiler is split into the following phases:

  1. Lexing
  2. Parsing
  3. Semantic Analysis
  4. LLVM--
  5. x86

A simple example of a Tiger program is:

let
	var a := 5
in 
	a + 1 + 2
end

The let...in...end expression allows you to bind variables (between let and in) before evaluating an expression (between in and end). The latter expression should be easy enough to understand, and the expected output is of course 8. The first phase, lexing, is responsible for turning a raw Tiger file into tokens that are then used for parsing the structure of the language. Lexing the above program yields the following tokens:

1:1:LET
2:2:VAR
2:6:ID a
2:8:ASSIGN
2:11:INT 5
3:1:IN
4:2:ID a
4:4:PLUS
4:6:INT 1
4:8:PLUS
4:10:INT 2
5:1:END
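For the curious, a lexer in this style can be sketched in a handful of lines of Python (this uses a hypothetical subset of Tiger's tokens and is not how our actual compiler is written):

```python
import re

# Token patterns, tried in order; keywords come before ID so "let" is
# not lexed as an identifier. The \b guard stops "in" matching inside "int".
TOKEN_SPEC = [
    ("LET", r"let\b"), ("IN", r"in\b"), ("END", r"end\b"), ("VAR", r"var\b"),
    ("ASSIGN", r":="), ("PLUS", r"\+"),
    ("INT", r"\d+"), ("ID", r"[A-Za-z][A-Za-z0-9_]*"),
]

def lex(source):
    """Return a list of 'line:col:KIND [value]' strings, one per token."""
    tokens = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        col = 0
        while col < len(line):
            if line[col].isspace():
                col += 1
                continue
            for kind, pattern in TOKEN_SPEC:
                m = re.match(pattern, line[col:])
                if m:
                    value = m.group(0)
                    suffix = f" {value}" if kind in ("ID", "INT") else ""
                    tokens.append(f"{lineno}:{col + 1}:{kind}{suffix}")
                    col += len(value)
                    break
            else:
                raise SyntaxError(f"unexpected {line[col]!r} at {lineno}:{col + 1}")
    return tokens

program = "let\n\tvar a := 5\nin\n\ta + 1 + 2\nend"
for tok in lex(program):
    print(tok)
```

Running it on the little program above prints exactly the token listing shown, including the line:column positions (the tab counts as one column).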

The tokens are then used by the parser in order to build an AST (abstract syntax tree), which provides an unambiguous representation of a program, provided the grammar of the language is unambiguous. A simple example of ambiguity is a binary operation such as plus. The calculation above, a+1+2, can be grouped in two different ways: (a+1)+2 or a+(1+2). Most often you see the former, which is called left associativity. For plus the two groupings happen to give the same result, but for an operator like minus they don't: (5-1)-2 is 2 while 5-(1-2) is 6. It is very important to rule out any ambiguity, since the program would otherwise be unpredictable. The output of the parsing phase shows how this takes form in the AST:

LetExp([
  VarDec(a,true,NONE,
  | IntExp(5))],
  SeqExp[
  | OpExp(PlusOp,
  |   OpExp(PlusOp,
  |   | VarExp(
  |   |   SimpleVar(a)),
  |   | IntExp(1)),
  |   IntExp(2))])

SeqExp is just a sequence of expressions that are run in the given order; the value of the last expression is then returned. Our plus expression shows up as the two nested OpExps (binary operator expressions). The plus expression can be visualized like so:

[image: the AST of a+1+2 drawn as a tree]

It can be seen that a+1+2 is parsed with left associativity: the left side of the root expression is a+1 while the right side is 2, resulting in (a+1)+2. The next phase, semantic analysis, annotates the tree with types and ensures that all expressions are valid (i.e. you are not adding strings and ints, and so on). The result is as follows:

(INT,
  LetExp([
  | VarDec(a,true,INT,
  |   (INT,
  |   | IntExp(5)))],
  | (INT,
  |   OpExp(PlusOp,
  |   | (INT,
  |   |   OpExp(PlusOp,
  |   |   | (INT,
  |   |   |   VarExp(
  |   |   |   | (INT,
  |   |   |   |   SimpleVar(a)))),
  |   |   | (INT,
  |   |   |   IntExp(1)))),
  |   | (INT,
  |   |   IntExp(2))))))
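A type-annotation pass along these lines can be sketched in Python (tuple-based node shapes that only loosely mirror the tree above; the names are mine, not the course's):

```python
# Minimal type-annotation sketch. AST nodes are tuples, e.g.
# ("IntExp", 5), ("VarExp", "a"), ("OpExp", "PlusOp", left, right).
def annotate(node, env):
    """Return a (type, node) pair, rejecting ill-typed expressions."""
    kind = node[0]
    if kind == "IntExp":
        return ("INT", node)
    if kind == "StringExp":
        return ("STRING", node)
    if kind == "VarExp":
        return (env[node[1]], node)   # variable types come from the environment
    if kind == "OpExp":
        _, op, left, right = node
        lpair, rpair = annotate(left, env), annotate(right, env)
        if lpair[0] != "INT" or rpair[0] != "INT":
            raise TypeError(f"{op} wants INT operands, got {lpair[0]} and {rpair[0]}")
        return ("INT", ("OpExp", op, lpair, rpair))
    raise ValueError(f"unknown node kind: {kind}")

# a + 1 + 2 with a : INT in scope, mirroring the parser output above.
env = {"a": "INT"}
expr = ("OpExp", "PlusOp",
        ("OpExp", "PlusOp", ("VarExp", "a"), ("IntExp", 1)),
        ("IntExp", 2))
print(annotate(expr, env)[0])  # INT
```

Feeding it an expression that adds a string and an int raises a TypeError, which is exactly the kind of error this phase exists to catch.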

Our little program is, of course, not very exciting type-wise since everything is an integer. :P Since this phase ensures that the program is valid and provides types for all expressions, its resulting tree is heavily used in the later stages. The next phase, LLVM--, translates the program into a subset of LLVM, an intermediate language that is used as a common target for many languages. Consider that you have n languages that you want supported on m platforms. If you write a compiler from each of the n languages to each of the m platforms, you end up with n*m compilers, whereas if you write a compiler from each language to an intermediate language such as LLVM, plus a compiler from LLVM to each platform, you only need n+m compilers. Many C compilers compile to LLVM before compiling to machine code (such as x86). The compiler clang can compile LLVM files, which we utilized in order to test this phase. The LLVM code for the above program is the following:

%$locals_tigermain = type { i8*, i64 }

define i64 @tigermain (i8* %$sl) {
 %locals_$_0 = alloca %$locals_tigermain
 %arg_$_1 = getelementptr %$locals_tigermain, %$locals_tigermain* %locals_$_0, i32 0, i32 0
 store i8* %$sl, i8** %arg_$_1
 %ptr_$_2 = getelementptr %$locals_tigermain, %$locals_tigermain* %locals_$_0, i32 0, i32 1
 store i64 5, i64* %ptr_$_2
 %var_ptr_$_3 = getelementptr %$locals_tigermain, %$locals_tigermain* %locals_$_0, i32 0, i32 1
 %varvalue_$_4 = load i64, i64* %var_ptr_$_3
 %temp_$_5 = add i64 %varvalue_$_4, 1
 %temp_$_6 = add i64 %temp_$_5, 2
 ret i64 %temp_$_6
}

Don't think too much about it, haha. It is very confusing if you are not familiar with pointers, but you should be able to see where the magic happens. This file can then be compiled with clang (it has to be bundled with a small runtime written in C, but I'll not get into that) and run:

$ clang src/compiler/llcodegen/runtime.c test.ll 
$ ./a.out 
$ echo $?
8

Here we can see the expected result being returned as the exit code of the program. This is of course a very simple example. The full Tiger language supports mutually recursive records, arrays, and type aliases, mutually recursive functions, for- and while-loops, and a small standard library for printing, getting user input, and so on. We just finished implementing the LLVM-- code generation (although we fail one niche test case, which is killing us, haha). We haven't gotten to x86 yet, but I am very excited for that! I am so excited about this course that I'll be writing my bachelor project with the PL (Programming Languages) group. I'll hopefully be working on a language called Troupe (developed at my university), which focuses on using language constructs to ensure confidentiality.

 

Our distributed systems and security course has also been very exciting. We started off by building a peer-to-peer network and implementing our own version of the RSA public-key cryptosystem. We used the RSA implementation to build a signature system, which basically allows you to sign content with your secret key; other people can then verify the signature using your public key. You might have seen something like this on GitHub, where developers can sign their code so others can't tamper with it without the signature becoming invalid. We then combined the signature system and the peer-to-peer network into a ledger system where peers in the network can transfer currency among themselves. I'll not get into the theory behind all this, because shit hits the fan very fast, haha. This all led to us creating our own blockchain with our own cryptocurrency on top of it (similar to Bitcoin). I just wrapped up the last details of the implementation a few hours ago and am enjoying a quiet moment, since both our compiler and our blockchain are in a good state.
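The sign-then-verify idea can be sketched with toy numbers in Python (tiny textbook primes and a digest reduced mod n, purely illustrative; real RSA uses large primes and proper padding schemes):

```python
import hashlib

# Toy RSA key generation with tiny primes (wildly insecure, illustration only).
p, q = 61, 53
n = p * q                  # public modulus (3233)
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: e*d ≡ 1 (mod phi), Python 3.8+

def sign(message: bytes, d: int, n: int) -> int:
    """Hash the message, then raise the digest to the secret exponent mod n."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(digest, d, n)

def verify(message: bytes, signature: int, e: int, n: int) -> bool:
    """Anyone holding the public key (e, n) can check the signature."""
    digest = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == digest

msg = b"transfer 5 coins to Nick"
sig = sign(msg, d, n)
print(verify(msg, sig, e, n))  # True
```

Changing even a single byte of the message changes its digest, so the check almost surely fails and the tampering is detected; only the holder of d can produce a signature that verifies.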

 

If you stayed for the algebra, then I'm sorry. I'm not that excited about it, haha. :P We are learning some cool things, but it is quite uneventful compared to the other courses. I hope to be able to use it during my master's degree, where I'd like to focus on cryptography, algorithms, and programming languages.

 

I got a little carried away here. I am quite happy that I am enjoying these courses so much; otherwise I'd definitely have burned out by now, which is sadly the reality for many of my friends. Thanks for coming to my TED Talk, and wish me luck! I'll hopefully have time for GFL in December/January and definitely in the next semester. See you soon!


Wanna know what I am up to? Take a look at my personal Trello board or my cards on the Development Trello board!



1 hour ago, TheSadBandit said:

I didn't read it, but it looks like something I would jump off a cliff before taking

 

You know who didn't jump off a cliff?

 

Jeffrey Epstein



Is it algebra, or linear algebra? I followed along with this video while studying machine learning. The two are almost the same: machine learning is like applied linear algebra. I think he does it so well that almost anyone could follow the lectures. It comes off as trivial at the beginning (it literally feels silly to watch an hour-long video on y=Ax), but I promise you it advances and converges, and every last second of each video matters.

 

 

Happy to have met this person in real life too.






