The Linux hacker's intro to assembly language (Pt. 1)

Posted on March 18, 2002

by hitbsecnews

Relevance

Assembly language has its many opponents, who argue that in this day and age of ultra-efficient high level compilers, understanding and coding in assembly is a bit antiquated. While it is true that one can produce fast, efficient code without assembly language, knowledge of assembly is essential to understanding computer architecture at a deeper level. Knowledge of assembly is also vital in reverse engineering (cracking, for all you kiddies). And of course, debugging high level languages is difficult without assembly. But before you can get to all that stuff, you've got to learn the basics.

The Main Course

Before we dig in to some source, a little introduction is in order. When you program in assembly language, you have to tell your microprocessor what to do, usually one machine instruction at a time. You are also without a lot of the "helping hand" capabilities of many high level languages. Also, if you're writing executables in a non-secure operating environment (say, DOS), you can wreak havoc on your system if you make mistakes. Luckily, Linux does a decent job of shielding itself from the mistakes of beginners.

The CPU has extra fast memory spaces inside of it called registers. Registers are really the main workspaces of the processor. Instructions locate (address) a register by referring to it by name. On the 386 and later CPUs, the general registers are 32 bits in size. There are a lot of registers inside a CPU, but most of them are used for special purposes. The only ones we will be using in these exercises are EAX, EBX, ECX, and EDX. These are not tied to one special job and are thus called general purpose registers.

On the Linux platform, you don't get unrestricted access to your processor and hardware. For security reasons, your requests are relayed to the kernel, which then performs the requested operation on your behalf. C programs normally reach the kernel through the C library, whose functions wrap the kernel's system calls. To put it more simply, when programming in assembly on Linux, you set up a call much as you would in C and then call the kernel to perform the operation. To make this clearer, let's write a simple C program which calls one function and then translate it into assembly code.

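/* wrote.c--sample c prog */

#include <unistd.h>   /* declares write() */

int main(void) {
    const char *buf = "This is your string\n";

    write(1, buf, 20);   /* 20 bytes, newline included */
    return 0;
}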

All this program does is display the string "This is your string" on the screen, using the C library's write() function. Writing the equivalent in assembly takes a little more research. To execute a system call in assembly, you set up your registers correctly and tell the kernel which system call you want. Each system call has a number; you load that number into a register (EAX) and then call the kernel, which uses it to decide which call to execute. The system call numbers are listed in the file "/usr/include/asm/unistd.h". If you look in that file you will see that, on 32-bit x86, the write() call has the number 4 next to it. With this information we can begin to construct an equivalent assembly program. The CPU's general purpose registers must be loaded with the function's parameters before calling the kernel: the first parameter goes in EBX, the second in ECX, and the third in EDX. The listing in this article happens to load them in reverse order, last parameter first, but it is the register that matters and not the order of the instructions. Running "man 2 write" on the command line yields the following information about the parameters.

ssize_t write(int fd, const void *buf, size_t count);

This syscall writes count bytes from the buffer buf to the file descriptor fd. Here is the assembly code equivalent to the C program above.

/* wrote.s */

.data
msg:
    .string "This is your string\n"

.text
.globl _start
_start:
    movl $20, %edx      # move the byte count into edx
    movl $msg, %ecx     # move the pointer to your string into ecx
    movl $1, %ebx       # move the file descriptor into ebx, in this case 1 (stdout)
    movl $4, %eax       # move 4 into eax: the system call number for write()
    int $0x80           # call the kernel with this instruction

    movl $0, %ebx       # load ebx with zero, the status passed to exit()
    movl $1, %eax       # 1 is the exit system call
    int $0x80

Exact syntax differs depending on which assembler you use. In these exercises we use the GNU assembler, gas. The reasoning is simple: everyone who has gcc has this assembler. Save the above source as "wrote.s". To assemble the program, type "as -o wrote.o wrote.s" without the quotes. Then it must be linked. I'm not going to cover what that is right now; just think of it as an extra step. To do that, type "ld -o wrote wrote.o". That will leave you with a runnable program called "wrote". The code is 32-bit, so on a 64-bit x86 system you may need "as --32 -o wrote.o wrote.s" and "ld -m elf_i386 -o wrote wrote.o" instead.

To sum up the program flow: you take the function parameters, load them into the registers, call the kernel, load the registers for the "exit" system call, and call the kernel one last time.

When using gas, immediate operands (constants, including the address of a label such as msg) begin with a dollar sign. Registers begin with a percent sign. The words that begin with a "." are assembler directives; ".data" and ".text" mark sections of the program. Those will be discussed further in the next article. For now, just put 'em in. The "movl" instruction copies data from one place to another. The "l" at the end stands for "long", which means the item being moved is a 32-bit quantity. The first operand is the data to be moved and the second operand is the destination. The "int" instruction stands for "interrupt" and is generally used to interrupt program flow for a reason specified by the number next to it. In this instance, "$0x80" is the number for a kernel call. The function parameters go into EBX, ECX, and EDX from left to right, with the system call number in EAX; the example loads them right to left, but the order of the mov instructions does not matter.

And there you have a simple program which writes your specified string on the screen. This can obviously be done in C, but you've learned a little more about your system internals, and that's the important part. In my next article I will walk you through an assembly program which does something more useful and elaborate on more aspects of assembly language.

1.) The Linux Hackers Intro to assembly language (Pt. 1) - argc
2.) Intro to PGP on Windows - madirish
3.) Hacking Windows Shares from Linux with Samba - madirish
4.) DVD Ripping the Right Way - A
5.) SAM Files and NT Password Hashes - Grifter
6.) SQL Interjection Attack - Fiend
7.) Raw Socket Access in Windows XP - Tierra
8.) The Tuxtendo's Tuxkit Rootkit Analysis - Spoonfork
