Sunday, January 10, 2010

Does Learning Assembly Language Matter Anymore?

When I started teaching myself to program, I was always a bit bewildered by assembly language. I often looked at example code on online forums and stared at strange syntax such as this:

mov EAX, [EBP+8]
lea EDX, [EAX]

I thought, “How can you possibly write software using this stuff?” Obviously I was being naive, since compiling a Visual Basic or C++ Builder application (my two favorite development tools at the time; others included PowerBASIC) ultimately generated exactly this “stuff”, which the CPU could make use of. As I became more proficient as a programmer, I noticed that I was looking more and more at assembly language code during my debugging sessions. Very often in the C++ Builder IDE I would watch the values of the EAX and ECX registers so that I could see the return value of a function or the current loop index. Little did I know that I was starting to appreciate the power of this language and what could be achieved with it.

Back in the early 2000s I downloaded MASM32, a package built around Microsoft's MASM assembler, and RadASM, a great IDE for writing assembler code. With those tools I was able to write very simple assembler stuff, mainly DLL files that I could use with my Visual Basic 6 applications. It was so cool to see my VB6 code call an assembler routine, passing it a string that was displayed in a message box – a message box created in pure assembly code!! (I guess you had to be there :) ) What I was learning by doing this was how different calling conventions worked and how to build a DLL file from assembly source and static library references. To pass a parameter from my VB code, I had to learn how to navigate a stack frame using the ESP and EBP registers to get to the parameter’s address and grab its contents so that MessageBox could use it. I learned that:

mov EAX, [EBP+8]

actually meant “copy the contents of the memory at address EBP+8 into the EAX register”, in other words, dereferencing a memory address. Having done a fair amount of C/C++ programming with pointers, I was getting under-the-hood experience of how pointers worked at the assembler level. This actually helped me understand pointers better.

OK, so what am I getting at here?

In this day and age of managed runtimes and virtual machines, .NET languages, PHP, Perl, PowerShell, etc., some developers might be asking themselves, “What’s the point of learning assembler? It’s not like I’m ever going to use it!” That might very well be the case, but bear this thought in mind: without good assembly language developers, there would be no operating systems. Let me explain. Take Windows and Linux, for example: the vast majority of the codebase for these OSes is written in C/C++. I would place a very large bet that a small percentage of the code is written in assembler. Assembler code must exist somewhere in the OS, since talking directly to hardware would not be possible without it. Without those hardcore assembler developers, who is going to look after this code? If there is no new talent on the horizon, OS developers will be short of the assembler programmers they rely on to churn out the low-level communications code that is vital for the OS’s existence.

Take computer games: there’s bound to be a lot of assembler code in the source for computer games, for example code to optimize the graphics algorithms so that the frame rate can be really high. A lot of games are written in C/C++, but for the hottest inner loops hand-tuned assembler is hard to beat for pure speed. Also, you sometimes can’t take advantage of CPU-specific instructions as easily from a higher-level language; coding in assembler gives you direct access to those instructions.

One of the most important areas where assembly language experience counts a lot is crash dump analysis. In my early years I would often see a BSOD and wonder, “How can you tell what’s broken just by looking at this crap?! What’s a bugcheck, and what’s this stack frame rubbish?” Now things are different. I’ve been reading blogs such as Crash Dump Analysis and the NT Debugging blog to help me understand how to analyze and interpret the information in dump files. I’ve also taught myself how to use WinDbg, which is a free debugger for Windows. I’ve spent a lot of time recently working with SharePoint 2007 and, believe it or not, having WinDbg and assembly language experience has come in very handy! How? Well, SharePoint 2007 isn’t the most, how shall we say, “consistent” platform to use. You might get an exception one day that has the following message:

“Something's happened, it’s broken the site, so… there you go”.

(OK, not what you actually get, but in some cases the exception message is totally useless).

When I see an exception message which is useless, I attach WinDbg to the w3wp.exe process running the “broken” SharePoint so I can see the real exception that caused the problem, and sometimes I might have to step into the running code at assembler level to see what’s going on. I don’t have to do this often, but having this in my arsenal has saved me a couple of times from punching the wall in frustration over why I’m getting an error.

I strongly believe that learning assembler makes you a better developer, but that’s just me. Check out the following Stack Overflow questions on the matter and see what you think:

Is learning Assembly Language worth the effort?
Why do we teach assembly language programming?
Why do you program in assembly?
Should I learn Assembly programming?
Is there a need to use assembly these days?
What are the practical advantages of learning Assembly?