April 7, 2016

Android GOT Hook

Preface

In this post, I assume readers have a basic knowledge of the ELF file format. I'm going to talk about dynamic linking and relocation. Although the material is beginner level, knowing the basics of ELF makes it easier to follow.

Global Offset Table

It's very common for an object file to access global symbols, and resolving these symbols is a big problem for the linker. There are two ways to solve it.
The first is to generate a relocation table for the object file; the table lists the entries that need to be relocated. When the object file is loaded, the .text segment is patched according to the relocation table, so the placeholders for global symbols are resolved to absolute addresses. Because this method patches the .text segment, different processes that load the same object file cannot share its .text memory image. If every object file were linked this way, there would be many copies of the same code in memory.
The global offset table (GOT for short) was introduced to improve the shareability of the .text segment. If the object file references global symbols, there will be an entry for each of them in the GOT of that object file (this description is not precise; I'll come back to it later). When global symbols need to be resolved, the GOT is patched instead of the .text segment. The GOT resides in the data segment, so the .text segment stays identical across processes. Code built this way is called position-independent code. If we compile the files with the -fPIC option, the compiler produces position-independent code.
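To make the indirection concrete, here is a toy model in plain C, not what the compiler actually emits. The names model_got, call_import and model_relocate are made up for illustration. The "code" never changes; only the table slot is written when the symbol is resolved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: one slot per imported symbol, like a GOT entry. */
typedef int (*func_t)(int);
enum { SLOT_TWICE = 0, SLOT_COUNT };
static func_t model_got[SLOT_COUNT];

/* Stands in for a function exported by some other module. */
static int twice(int x) { return 2 * x; }

/* "Shared code": never patched, reaches imports only through the table. */
static int call_import(size_t slot, int arg) {
    assert(slot < SLOT_COUNT && model_got[slot] != NULL);
    return model_got[slot](arg);
}

/* "Loader": resolves the symbol by writing the table, not the code. */
static void model_relocate(void) {
    model_got[SLOT_TWICE] = twice;
}
```

Because only model_got is written, the body of call_import could be shared read-only between processes, which is the point of the GOT.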

Procedure Linkage Table

Position-independent code improves the shareability of the .text segment and can be loaded at any address, but it is not as efficient as position-dependent code. It needs more instructions to access a global symbol, and patching the whole GOT at load time is expensive. In practice, an object file usually calls only a small portion of the global functions listed in its GOT. Therefore the Procedure Linkage Table (PLT for short) was introduced to implement lazy binding. Lazy binding is achieved by an interesting interaction between the PLT and the GOT. The implementation is processor-specific, though ARM is almost the same as x86. Due to space limitations, the detailed process is omitted here; you can refer to "Procedure Linkage Table" for further information. In short, a call to a global function jumps to the corresponding PLT entry first. On the first call, the PLT stub hands control to the dynamic linker, which resolves the symbol and patches the GOT entry; this may take some time. Subsequent calls find the GOT entry already patched, so the extra cost is negligible.
However, there is a vital difference between Linux and Android in how symbol binding is handled. Most Linux linkers (ld.so) use lazy binding, but the Android linker /system/bin/linker uses immediate binding. I don't know why /system/bin/linker is implemented this way, but it really makes our code simpler.
Note that every executable and shared library has its own PLT and GOT.
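The lazy-binding idea can be modeled in a few lines of C. This is only a simulation under my own naming (lazy_slot, resolver_stub and call_square are hypothetical), not real PLT code: the slot initially points at a resolver, and the first call replaces it with the real target.

```c
#include <assert.h>

typedef int (*func_t)(int);

static int real_square(int x) { return x * x; }
static int resolver_stub(int x);

/* Models a GOT slot that starts out pointing at the resolver. */
static func_t lazy_slot = resolver_stub;
static int resolve_count = 0;

/* Models the slow path: resolve once, patch the slot, then forward the call. */
static int resolver_stub(int x) {
    assert(lazy_slot == resolver_stub);
    resolve_count++;
    lazy_slot = real_square;
    return lazy_slot(x);
}

/* Models a PLT entry: always an indirect call through the slot. */
static int call_square(int x) {
    return lazy_slot(x);
}
```

With immediate binding, as on Android, every slot would already hold the real address before any call is made. That is why scanning the GOT for a known function address works right after the module is loaded.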

GOT Hook Overview

Based on the above facts, we can conclude that an ELF hook can be achieved by modifying GOT entries. These are the steps I followed:

1. Find the GOT section header;
2. Find the GOT base address of the target module;
3. Iterate over the GOT and find the address of the function you want to hook;
4. Substitute that address with the address of your custom function.

Then we're done with the hook. The key point in the steps above is locating the GOT base address, which I'll discuss in detail in the following section.
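The last two steps boil down to a scan-and-replace over an array of addresses. The following is a minimal sketch, assuming the GOT base and entry count are already known; got_replace is a name I made up here, not the API of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan `count` slots starting at `got`; overwrite every slot equal to
 * `target` with `replacement`. Returns the number of slots rewritten. */
static size_t got_replace(uintptr_t *got, size_t count,
                          uintptr_t target, uintptr_t replacement) {
    assert(got != NULL);
    size_t patched = 0;
    for (size_t i = 0; i < count; i++) {
        if (got[i] == target) {
            got[i] = replacement;  /* later calls through this slot reach the hook */
            patched++;
        }
    }
    return patched;
}
```

The sketch assumes the memory holding the table is writable. If it is mapped read-only in the target process, the page would have to be made writable first (for example with mprotect) before the store succeeds.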

GOT Hook in Detail

This section won't go deep into the ELF file format; I'll just list the key information related to the GOT hook.
Let's look at the structs for the ELF header and the section header:

#define EI_NIDENT 16
typedef struct {
	unsigned char e_ident[EI_NIDENT];
	Elf32_Half e_type;
	Elf32_Half e_machine;
	Elf32_Word e_version;
	Elf32_Addr e_entry;
	Elf32_Off e_phoff;
	Elf32_Off e_shoff;
	Elf32_Word e_flags;
	Elf32_Half e_ehsize;
	Elf32_Half e_phentsize;
	Elf32_Half e_phnum;
	Elf32_Half e_shentsize;
	Elf32_Half e_shnum;
	Elf32_Half e_shstrndx;
} Elf32_Ehdr;

typedef struct {
	Elf32_Word sh_name;
	Elf32_Word sh_type;
	Elf32_Word sh_flags;
	Elf32_Addr sh_addr;
	Elf32_Off sh_offset;
	Elf32_Word sh_size;
	Elf32_Word sh_link;
	Elf32_Word sh_info;
	Elf32_Word sh_addralign;
	Elf32_Word sh_entsize;
} Elf32_Shdr;

First, we need to find the GOT section header, which we can do by analyzing the ELF file statically. Suppose we already have the ELF header: elf_header->e_shoff gives the file offset of the section header table, and elf_header->e_shnum gives the number of sections. With these two values, we can iterate over the section header table. The sh_name field of a section header is an index into the .shstrtab section. We can use this index to find the human-readable section name; if the name equals ".got", we've found the GOT section header.
But how do we find the .shstrtab section itself? It had me stumped for some time. After checking the documentation, I learned that the file offset of the .shstrtab section header can be calculated with the following formula (assuming each section header entry is sizeof(Elf32_Shdr) bytes, which is what e_shentsize reports for standard 32-bit ELF files):

off_t shstrtab_header_offset = elf_header->e_shoff + elf_header->e_shstrndx * sizeof(Elf32_Shdr);

The .shstrtab section header gives us the section's file offset (sh_offset) and size (sh_size), so we can read the full content of the .shstrtab section. With it, finding the GOT section header works as described above.
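Put together, the lookup can be sketched as follows. This is a simplified illustration, not the code from my library: find_section is a hypothetical helper, it assumes the whole file has already been read or mapped into a suitably aligned buffer, and it does no bounds checking on the offsets it follows. The types come from <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the section header whose name equals `name`, or NULL if none.
 * `image` points at the start of a 32-bit ELF file held in memory. */
static const Elf32_Shdr *find_section(const unsigned char *image,
                                      const char *name) {
    assert(image != NULL && name != NULL);
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)image;
    /* Array indexing below relies on this entry size. */
    assert(eh->e_shentsize == sizeof(Elf32_Shdr));

    const Elf32_Shdr *sections = (const Elf32_Shdr *)(image + eh->e_shoff);
    /* e_shstrndx selects the header of the section-name string table. */
    const Elf32_Shdr *strtab_hdr = &sections[eh->e_shstrndx];
    const char *names = (const char *)(image + strtab_hdr->sh_offset);

    for (size_t i = 0; i < eh->e_shnum; i++) {
        /* sh_name is a byte offset into the string table. */
        if (strcmp(names + sections[i].sh_name, name) == 0) {
            return &sections[i];
        }
    }
    return NULL;
}
```

Calling find_section(image, ".got") then yields the GOT section header, from which sh_addr and sh_size can be read.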
We know that the GOT is patched at runtime, so we have to parse the GOT content after the object file has been loaded into memory. The sh_addr member of Elf32_Shdr gives the address at which the section's first byte resides once loaded; for a shared library, which is typically linked at base 0, this acts as an offset from the load base. Therefore, the GOT base address is the module base address plus got_section_header->sh_addr. On a 32-bit system, every GOT entry is an Elf32_Addr. Now we can iterate over the GOT entries without problems.
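Here is one way the address arithmetic could look. How the module base is obtained is my assumption in this sketch: it parses a line in the format of /proc/<pid>/maps and takes the start address of a mapping that names the module. The helper names and the library name in the usage note are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Parse one line in /proc/<pid>/maps format. Returns the mapping's start
 * address if the line mentions `module`, otherwise 0. */
static uintptr_t maps_line_base(const char *line, const char *module) {
    unsigned long start = 0, end = 0;
    assert(line != NULL && module != NULL);
    if (strstr(line, module) == NULL) {
        return 0;
    }
    if (sscanf(line, "%lx-%lx", &start, &end) != 2) {
        return 0;
    }
    return (uintptr_t)start;
}

/* GOT base in memory = module base + sh_addr of the .got section. */
static uintptr_t got_runtime_base(uintptr_t module_base,
                                  const Elf32_Shdr *got) {
    return module_base + got->sh_addr;
}

/* Each entry is one Elf32_Addr, so the count is size / entry size. */
static size_t got_entry_count(const Elf32_Shdr *got) {
    return got->sh_size / sizeof(Elf32_Addr);
}
```

For a line such as "b6f2a000-b6f2d000 r-xp 00000000 b3:19 1234 /system/lib/libfoo.so" and a .got with sh_addr 0x2fd0, the GOT would start at 0xb6f2cfd0. The sketch assumes the first matching line of the maps file is the module's load base.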
Before substituting the GOT entry, we have to inject the shared library that contains our custom function into the target process, so that the custom function has a valid virtual address there. This can be done with the "TinyInjector" I wrote before.
I also wrote a library that performs the GOT hook described in this post and published it on GitHub with an additional test case. It can be found at "AndroidGotHook".

Pros and Cons

Compared with an inline hook, a GOT hook has advantages and disadvantages. A GOT hook is easier to implement, and you don't need to worry much about portability. You can also restrict the hook to calls made from a specific module. However, a GOT hook can only intercept calls to global symbols, because calls to functions defined in the same module don't generate GOT entries.
An inline hook modifies the instructions directly, so it's more flexible than a GOT hook. Theoretically, it can hook any function in the target process, whether or not the symbol has a GOT entry. However, an inline hook requires good knowledge of ARM assembly and is hard to port to other architectures.
You should choose the hook method according to your use case.