LLVM DataFlow Analysis

About

I tried to learn program analysis via directly writing analysis for binary for months. But I waste a lot of time :(. The resource of binary analysis is much more limited and user-unfriendly. Thus, I choose to start from LLVM for analysis instead, which is indeed much better.

I used MacOS and LLVM 8.0. For some reasons, I cannot configure the brew version LLVM. I would also recommend you to build from source code to avoid weird bugs. Since it’s a large project , we only care about clang and opt - the C front end and LLVM optimizer.

Intro to LLVM Pass

In LLVM, a Pass is usually designed for analysis, or transformation. You can derives own analysis from base class. We have following base classes:

  • ImmutablePass: This pass will neither run nor change state. But it stores information about current compiled target.
  • ModulePass: In this pass, we use entire program as a unit.
  • FunctionPass: FunctionPass executes on each functions independently.
  • LoopPass: Inspecting loops.
  • BasicBlockPass: Similar to FunctionPass, but basic block refers to a block without jumps.

Above passes are relatively common. For detailed explanations, you can view the official document.

Most classes have to implement runXXX(). XXX usually is the name of current pass. We start our analysis here.

Two Naive Hello-World

Pass Basis

Let’s see how to write a basic LLVM pass. First, let’s do some preparations (to avoid modify environment just edit the skeleton.c from llvm pass tutorial):

#include "llvm/Pass.h"
#include "llvm/IR/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/IPO/PassManagerBuilder.h"
#include "llvm/IR/InstIterator.h"

using namespace llvm;

We need to put our customized class Hello in an anonymous namespace. It derives from FunctionPass, which is a utility class for analyzing functions. Then, a RegisterPass is used to register our analysis pass:

namespace {
  struct Hello : public FunctionPass {
      static char ID;
      Hello() : FunctionPass(ID) {}
    ...
  }
}

char Hello::ID = 0;
static RegisterPass<Hello> X("hello", "Hello World Pass",
  false /* Only looks at CFG */,
  false /* Analysis Pass */); 

Because we just want to write a naive analysis, it’s unnecessary to have a complicated constructor. Adding code to runOnFunction, which will be executed by opt, is enough for out naive algorithm.

Let’s set up a simple task: find the first and the last LLVM IR for each function. Output the first instruction in the set. Then, iterate until we find the last one:

bool runOnFunction(Function &F) override {
    errs() << "-> ";
    errs().write_escaped(F.getName()) << '\n';

    for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I)
    {
        if (I == inst_begin(F))
            errs() << "  Begin: " << *I << "\n";

        auto tmp = I;
        if ((++tmp) == inst_end(F))
            errs() << "  End: " << *I << "\n";
    }
    errs() << "\n";
    return false;
}

The Function type record information for a function. We iterate a function via inst_iterator and use inst_begin() and inst_end() to locate where to start and end.

In LLVM, we use errs() or outs() rather than std::cout to output. It’s an output utility provided as substitution to STD. The following is a testing file, and compile with /PATH_TO/clang -emit-llvm -c test.c -o test.ll:

#include<stdio.h>
int test() {
    int a = 1;
    for (int i = 1; i < 10; i++) {
        a = a * i;
    }
    return a;
}

int main() {
    printf("test");
    test();
    return 100;
}

You might need to add an addition flag for clang in macOS:

-I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/

In build directory, use cmake ../ && make to build, and run

/PATH_TO/opt -load skeleton/libSkeletonPass.so -hello < test.ll

And the output is:

// removed some extra information
-> test
  Begin:   %a = alloca i32, align 4
  End:   ret i32 %4

-> main
  Begin:   %call = call i32 (i8*, ...) @printf(...)
  End:   ret i32 100

You may notice that some instructions may produce a value. That is called virtual register. The virtual register does not exist actually (and we can not manipulate it in pass). Let’s take an example:

%add1 = add %tmp, 1
%mul1 = mul %add1, 10

Rather than producing value %add1 and %mul1, we can treat them like alias for the IR. Let’s write in Lisp way:

(mul (%add %tmp, 1), 10)

And %add1 actually alias (%add %tmp, 1) (a bit confusing, I know).

The full script is here. Above instructions are LLVM IRs, a type of representation between source code and binary code. Here, we use clang to transform source code to IR. Subsequently, opt load our pass to analysis the test.ll file. This is the life of an analysis.

Simple Transform

Another aim of LLVM is to optimize the program, we can create and insert instructions to optimize our code. Let’s try to optimize following two cases:

add foo, 0 -> none (delete it since usless)
mul bar, 2 -> shl bar, 1 (multiply to shift)

LLVM IR is a handy tool which is similar to assembler with more elegantly design for cross-platform and analyzing. It is composed by operands, which allow as to examine types and values.

We can use a BasicBlockPass here because a greater scope is unnecessary:

bool runOnBasicBlock(BasicBlock &B)
{
    // a stack to store delete instruction
    std::stack<Instruction *> del;
    // dump to show changes
    B.dump();
    bool deleteNext = false;
    Instruction *tmp = NULL;
    for (auto i = B.begin(), e = B.end(); i != e; ++i)
    {
        // it's not a strict definition, we want to delete the store instruction as well
        // store is always followed by add, so just add the next instruction to delete list
        if (deleteNext)
        {
            del.push(dyn_cast<Instruction> (i));
            deleteNext = false;
        } else if (BinaryOperator *ins = dyn_cast<BinaryOperator>(i))
        {
            // check op_code
            if (ins->getOpcode() == Instruction::Add)
            {
                auto v1 = ins->getOperand(1), v2 = ins->getOperand(0);
                // check if add constant is 0
                if (ConstantInt *intv = dyn_cast<ConstantInt>(v1))
                {
                    if (intv->equalsInt(0))
                    {
                        del.push(ins);
                        deleteNext = true;
                    }
                }
                if (ConstantInt *intv = dyn_cast<ConstantInt>(v2))
                {
                    if (intv->equalsInt(0))
                    {
                        del.push(ins);
                        deleteNext = true;
                    }
                }
            }
            // optimize multiply
            bool replace = false;
            Instruction * new_inst = NULL;
            if (ins->getOpcode() == Instruction::Mul) {
                auto v1 = ins->getOperand(1), v2 = ins->getOperand(0);
                if (ConstantInt *intv = dyn_cast<ConstantInt>(v1))
                {
                    if (intv->equalsInt(2))
                    {
                        // create shl instruction
                        // shl foo, 1
                        new_inst = BinaryOperator::Create(Instruction::Shl, 
                            ins->getOperand(0), ConstantInt::get(ins->getType(), APInt(32, StringRef("1"), 10)));
                        replace = true;
                    }
                } else if (ConstantInt *intv = dyn_cast<ConstantInt>(v2))
                {
                    if (intv->equalsInt(2))
                    {
                        new_inst = BinaryOperator::Create(Instruction::Shl, 
                            ins->getOperand(1), ConstantInt::get(ins->getType(), APInt(32, StringRef("1"), 10)));
                        replace = true;
                    }
                }
                // append instruction
                if (replace) {
                    new_inst->insertAfter(ins);
                    ins->replaceAllUsesWith(new_inst);
                    del.push(ins);
                }
            }
        }
    }
    // deleted useless instruction
    outs() << "Deleted:\n";
    while (!del.empty()) {
        tmp = del.top();
        del.pop();
        tmp->eraseFromParent();
    }
    B.dump();
    return true;
}

We use a set here to store delete instruction. Erase instructions inside the loop will break our iterations.

We have a few new concepts here. First, we use dyn_cast to transform sub-type safely. If it does not match corresponding type, the function will return NULL.

A few IR operations are also introduced. BinaryOperator and Constant are both derived from IR. The first is usually related with math operations (add, sub, and etc). The latter one is basically constant number. Other types are documented in official website.

After creating instructions use Create, we need to append it use insertAfter (or similar insert operations). And more importantly, our code should call repalceAllUsesWith to update references. Imagine following transformation:

%1 = mul nsw i32 %0, 2 -> delete later
store i32 %1, i32* %z, align 4

if we did not replaceAllUsesWith, the store will keep using old virtual register reference:

%2 = shl i32 %0, 1
store i32 %1, i32* %z, align 4

and the correct one should be following, and you can see that the virtual register reference has been updated:

%1 = shl i32 %0, 1
store i32 %1, i32* %z, align 4

Finally, we use eraseFromParent to drop the instructions. Since we modify the basic block, a return true should be added to indicate the change. Otherwise return false.

mem2reg

This opt plugin will lift original IR to SSA formed IR:

L/clang  -emit-llvm -c test.c -Xclang -disable-O0-optnone -S
$L/opt -mem2reg -S test.ll -o ssa.ll

It will produce Phi-node, a pseudo-instruction which assigns possible values to a variable. Be careful about it, because it’s not a real instruction.

Dataflow Framework

Concept

The following is only a short description of dataflow analysis. I would recommend you to watch slides from UW CSE501 or CMU 5-745 for a thoughtful understanding.

Dataflow basically tacks the flow of data (of course). We can use it to optimize the program. For example:

  • liveness analysis: a variable is used at some stages, then it’s lived.
  • reaching definition: check if a variable will be redefined (not reachable) along a path.
  • available expressions analysis: if the result of an expression has been evaluated at previous stage, then the expression is available

Sometimes, we want to analysis from the beginning of the program, it’s called forward analysis. Otherwise it’s backward analysis.

Since the entry point of a program might have other predecessors (for example, entry points forms a loop with other basic blocks). We might need to iterate until the result does not change. This is named fix point algorithm.

Design

Now, we will implement a classical dataflow framework (actually many college will use it as the first assignment of PA or compiler backend course). In a classical framework, we need:

  • meet operation: use union or intersect to calculate inner flow
  • transfer: using the function to calculate the output for a basic block
  • interior: value to initialize interior values
  • boundary: value to initialize boundary values
  • result: result for in/out for each basic blocks
  • fix point algorithm: run the analysis until the result does not change

Despite them, we need to implement set operations.

My work is largely based on this repo (and I realize I can write much better latter). Let’s use reaching definition as an example to explain implementation. The full source code is here.

First, we need to have a dataflow class:

#include "llvm/Pass.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/Analysis/CFG.h"

#include <vector>
#include <set>

namespace llvm {
	// cutomize errors for quiting
	void error(std::string);

	// ADT for storing BasicBlock and Instructions
	typedef int Index;
	typedef std::vector<BasicBlock*> BBList;
	typedef std::set<Index> VSet;
	typedef std::vector<VSet> VList;
	typedef std::vector<void*> Domain;

	// union two set
	template<typename T> std::set<T> unionSet (std::set<T> s1, std::set<T> s2) {
		std::set<T> result;
		result.insert(s1.begin(), s1.end());
		result.insert(s2.begin(), s2.end());

		return result;
	}

	// intersection of two sets
	template<typename T> std::set<T> substractSet (std::set<T> s1, std::set<T> s2) {
		std::set<T> result;
		result.insert(s1.begin(), s1.end());
		for (auto i = s2.begin(); i != s2.end(); ++i) {
			auto iter = result.find(*i);
			if (iter != result.end()) {
				result.erase(iter);
			}
		}

		return result;
	}

	// convert an expression to index ID
	Index domainIndex (Domain &D, void* ptr);

	// dataflow direction
	enum Direction {
		FORWARD,
		BACKWARD
	};

	// meet operations
	enum MeetOp {
		INTERSECT,
		UNION
	};

	// result for transfer function
	struct TransferOutput {
		VSet transfer;
		std::map<BasicBlock*, VSet> neighbor;
	};

	// in and out result for blocks
	struct BlockResult {
		VSet in, out;
		TransferOutput transferOutput;
	};

	// the final result
	struct DataFlowResult {
		std::map<BasicBlock*, BlockResult> result;
	};

	// dataflow framework
	class Dataflow {
		public:
		Dataflow (Direction direction, MeetOp meetop, Domain domain)
		: direction(direction), meetop(meetop), domain(domain)
		{

		};

		VSet applyMeet (VList input);
		DataFlowResult run (Function &F, VSet boudary, VSet interior);
		Index domainIndex (void* ptr);
		virtual TransferOutput transferFn (VSet input, BasicBlock *currentBlock) = 0;
		const Index INDEX_NOT_FOUND = -1;
		Domain domain;

		private:
		Direction direction;
		MeetOp meetop;
	};

	// convert LLVM value to corresponding std::string
	std::string getValueName (Value* v);
};

We add our Dataflow class to llvm namespace. Most operations are clear. We define several set operations, and two enum types to define the flow direction and meet operations for the analysis. transferFn is virtual functions, its exact behaviors need to be determined by specific algorithms.

In addition, we define domain. This vector allows us store the instruction we want to analysis (for example, we don’t hope save branch instructions when analyzing varaible). And more important, we can map instruction with location index.

run runs the whole analysis, it will initialize values and called fixed point algorithm to use applyMeet and transferFn to calculate in/out. The comprehensive code will be discussed later.

Dataflow Implementation

Let’s start from easiest function applyMeet:

// gmerate meet opeartion for in/out
VSet Dataflow::applyMeet (VList input) {
    VSet result;

    if (input.empty()) {
        return result;
    }

    for (auto o : input) {
        // UNION: simply insert all number to set
        if (meetop == MeetOp::UNION) {
            result.insert(o.begin(), o.end());

        // INTERSECTION: remove value occurs more than once, otherwise insert it
        } else if (meetop == MeetOp::INTERSECT) {
            if (result.empty()) {
                result.insert(o.begin(), o.end());
                continue;
            }

            VSet tmp;
            for (auto value : o) {
                auto redundant = result.find(value);
                if (redundant != result.end()) {
                    tmp.insert(value);
                }
            }
            result = tmp;
        }
    }
}

It accepts a list of input, then check the meet operations. If it is union, we simply insert all values into the result set. In intersect case, we find the value occurs more than once, then insert them in the result set.

Now, let’s take a look at the run function. At the very beginning, it will initialize blocks based on flow direction. If it’s forward, then we add entry point. On the other hand, we add all blocks with ret statement:

std::map<BasicBlock*, BlockResult> result;
Domain &domain = this->domain;
VSet base;

// initialize the first Block we need to iterate accoring to direction
BBList initList, traverseList;
switch (direction) {
    case Direction::FORWARD:
        initList.push_back(&F.front());
        break;
    case Direction::BACKWARD:
        for (auto &BB : F) {
            if(isa<ReturnInst> (BB.getTerminator())) {
                initList.push_back(&BB);
            }
        }
        break;
    default:
        error("Unknown Direction");
        break;
}

Then, for each basic block, we need to get their neighbors blocks. And it should be predecessor in forward case and successor in backward case:

// push other blocks into the list
std::map<BasicBlock*, BBList> neighbors;
switch (direction) {
    case Direction::FORWARD:
        for (auto &BB : F) {
            for (auto pred_BB = pred_begin(&BB); pred_BB != pred_end(&BB); ++pred_BB) {
                neighbors[&BB].push_back(*pred_BB);
            }
        }
        break;
    case Direction::BACKWARD:
        for (auto &BB : F) {
            for (auto succ_BB = succ_begin(&BB); succ_BB != succ_end(&BB); ++succ_BB) {
                neighbors[&BB].push_back(*succ_BB);
            }
        }
        break;
    default:
        error("Unknown Direction");
        break;
}

Then, we will set boundary value and interior value for each blocks:

// initialize boudary set value
BlockResult boundaryRes = BlockResult();
if (direction == Direction::FORWARD) {
    boundaryRes.in = boundary;
    base = boundary;
} else {
    boundaryRes.out = boundary;
}

for (auto BB : initList) {
    result.insert(std::pair<BasicBlock*, BlockResult>(BB, boundaryRes));
}

// initalize interior set value
BlockResult interiorRes = BlockResult();
if (direction == Direction::FORWARD) {
    interiorRes.out = interior;
} else {
    interiorRes.in = interior;
    base = interior;
}

for (auto &BB : F) {
    if (result.find(&BB) == result.end()) {
        result.insert(std::pair<BasicBlock*, BlockResult>(&BB, interiorRes));
    }
}

After all blocks are inserted, we need to determine the traverse order. I use BFS to derive the result. But it seems that LLVM does provide a utility for us to get blocks in order. The code is long and boring, so I won’t paste here.

The most important part, fixed point algorithm:

while (!converged) {
    converged = true;

    for (auto currBB : traverseList) {
        // we use calculate meet value first
        VList meetInput;

        // if we have to initialize with some values
        if (direction == BACKWARD && isa<ReturnInst>(currBB->getTerminator())) {
            meetInput.push_back(base);
        }

        if (direction == FORWARD && currBB == &F.front()) {
            meetInput.push_back(base);
        }
        
        for (auto n : neighbors[currBB]) {
            VSet value;
            if (direction == Direction::FORWARD) {
                value = result[n].out;
            } else {
                value = result[n].in;
            }
            meetInput.push_back(value);
        }
        
        // then is transfer value
        VSet meetResult = applyMeet(meetInput);
        if (direction == Direction::FORWARD) {
            result[currBB].in = meetResult;
        } else {
            result[currBB].out = meetResult;
        }

        VSet *blockInput = (direction == Direction::FORWARD) 
            ? &result[currBB].in : &result[currBB].out;
        TransferOutput transferRes = transferFn (*blockInput, currBB);
        VSet *blockOutput = (direction == Direction::FORWARD) 
            ? &result[currBB].out : &result[currBB].in; 

        // check if previous result and the transfer result are the same
        if (converged) {
            if (transferRes.transfer != *blockOutput ||
                result[currBB].transferOutput.neighbor.size() != transferRes.neighbor.size()) {
                converged = false;
            }
        }

        // update value
        *blockOutput = transferRes.transfer;
        result[currBB].transferOutput.neighbor = transferRes.neighbor;
    }
}

At the very beginning, we give a list of output from neighbor blocks to applyMeet to calculate input for this block. Subsequently, we called transferFn to get output. If the output matched the previous one, the work list converges. And we can quit iterating.

Reaching Definition

Now, we have completed most parts of the dataflow framework except finish transferFn method. To describe reaching definition mathematically:

  • in = U(pred(B.out))
  • out = (input - kill) U gen

First, we need to put all varaible into domain. Please notice that argument can also be varaible. So we need an additional cycle to add arguments. We also use arguments as boundary value, because they always exist when entering entry block:

outs() << "Function: " << F.getName() << "\n";
DataFlowResult result;
Domain domain;

VSet boudary, interior;
// use functions arguments to initialize
for (auto arg = F.arg_begin(); arg != F.arg_end(); ++arg) {
    domain.push_back(&*arg);
    boudary.insert(domain.size() - 1);
}

// add remaining instruction
for (auto I = inst_begin(F); I != inst_end(F); ++I) {
    if (Value* val = dyn_cast<Value> (&*I)) {
        if (getValueName(val) != "") {
            if (std::find(domain.begin(), domain.end(), val) == domain.end()) {
                domain.push_back(val);
            }
        }
    }
}

After complete, let’s just run and print result:

Analysis analysis  = Analysis(Direction::FORWARD, MeetOp::UNION, domain);
result = analysis.run(F, boudary, interior);
// output in/out
for (auto &BB : F) {
    outs () << "\n<" << BB.getName() << ">\n";
    outs () << "in: ";
    for (auto i : result.result[&BB].in) {
        outs() << getValueName((Value *)analysis.domain[i]) << " ";
    }
    outs () <<"\nout: ";
    for (auto i : result.result[&BB].out) {
        outs() << getValueName((Value *)analysis.domain[i]) << " ";
    }            
    outs () << "\n";    
}

How about transferFn? The idea is simple, we iterate each instructions. If its value name match instructions that have been used in input, we add it to kill set (the varaible has been reassigned). In addition, we will also check whether it matched gen set. If it does, the instruction from gen set will be erased because of re-assignment:

TransferOutput output;
VSet gen, kill;
for (auto inst = curr->begin(); inst != curr->end(); ++inst) {
    std::string val = "";
    if (isa<Instruction> (&*inst)) {
        val = getValueName(&*inst);
    }

    if (auto load_ins = dyn_cast<StoreInst>(&*inst)) {
        val = getValueName(load_ins->getPointerOperand());
    }

    if (val == "") {
        continue;
    }

    // if input has be re-assigned, kill it
    for (auto LHS : input) {
        if (isa<Instruction> (&*inst)) {
            if (getValueName((Value *) domain[LHS]) == val) {
                kill.insert(LHS);
            }
        }      
    }

    // some values are killed in the same Basic Block, so sad
    auto tmp = gen.end();
    for (auto i = gen.begin(); i != gen.end(); ++i) {
        if (tmp != gen.end()) {
            gen.erase(tmp);
            tmp = gen.end();
        }
        if (getValueName((Value *) domain[*i]) == val) {
            tmp = i;
        }
    }

    Index i = domainIndex(&*inst);
    if (i != INDEX_NOT_FOUND) {
        gen.insert(i);
    }
}

And finally, as the math formula suggests, just:

// (input - kill) U gen  
VSet tmp = substractSet(input, kill);
output.transfer = unionSet(tmp, gen);
return output;

Now, we have completed our first analysis.

Conclusion

We learn some fundamental of dataflow analysis and basic utility of LLVM passes. The dataflow framework can actually be much more elegant. We can split it into initBoundary, initInterior, actionToResult, and initDomain (but anyway, too lazy to change).

I have completed another two dataflow algorithm, you can check them in my github.