
<!DOCTYPE html>

<html lang="en" data-content_root="./">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.18.1: http://docutils.sourceforge.net/" />

    <title>1. HW6: Dataflow Analysis and Optimizations &#8212; CS 153 2023</title>
    <link rel="stylesheet" type="text/css" href="_static/pygments.css?v=4f649999" />
    <link rel="stylesheet" type="text/css" href="_static/alabaster.css?v=a2fbdfc9" />
    <link rel="stylesheet" type="text/css" href="_static/custom.css?v=3dba9716" />
    <link rel="stylesheet" type="text/css" href="_static/cs153-handout.css?v=bc747a33" />
    <script src="_static/documentation_options.js?v=7f41d439"></script>
    <script src="_static/doctools.js?v=888ff710"></script>
    <script src="_static/sphinx_highlight.js?v=dc90522c"></script>
    <link rel="index" title="Index" href="genindex.html" />
    <link rel="search" title="Search" href="search.html" />
    <link rel="prev" title="&lt;no title&gt;" href="index.html" />

  <link rel="stylesheet" href="_static/custom.css" type="text/css" />


  <meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9" />

  </head><body>


    <div class="document">
      <div class="documentwrapper">
        <div class="bodywrapper">


          <div class="body" role="main">

  <section id="hw6-dataflow-analysis-and-optimizations">
<span id="hw6-opt"></span><h1><span class="section-number">1. </span>HW6: Dataflow Analysis and Optimizations<a class="headerlink" href="#hw6-dataflow-analysis-and-optimizations" title="Link to this heading">¶</a></h1>
<section id="getting-started">
<h2><span class="section-number">1.1. </span>Getting Started<a class="headerlink" href="#getting-started" title="Link to this heading">¶</a></h2>
<p>Many of the files in this project are taken from the earlier projects.  Only the
new files and their uses are listed below.  Those marked with <code class="docutils literal notranslate"><span class="pre">*</span></code> are
the only ones you should need to modify while completing this assignment.</p>
<table class="docutils align-default">
<tbody>
<tr class="row-odd"><td><p>bin/datastructures.ml</p></td>
<td><p>set and map modules (enhanced with printing)</p></td>
</tr>
<tr class="row-even"><td><p>bin/cfg.ml</p></td>
<td><p>“view” of LL control-flow graphs as dataflow graphs</p></td>
</tr>
<tr class="row-odd"><td><p>bin/analysis.ml</p></td>
<td><p>helper functions for propagating dataflow facts</p></td>
</tr>
<tr class="row-even"><td><p>bin/solver.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> the general-purpose iterative dataflow analysis solver</p></td>
</tr>
<tr class="row-odd"><td><p>bin/alias.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> alias analysis</p></td>
</tr>
<tr class="row-even"><td><p>bin/dce.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> dead code elimination optimization</p></td>
</tr>
<tr class="row-odd"><td><p>bin/constprop.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> constant propagation analysis &amp; optimization</p></td>
</tr>
<tr class="row-even"><td><p>bin/liveness.ml</p></td>
<td><p>provided liveness analysis code</p></td>
</tr>
<tr class="row-odd"><td><p>bin/analysistests.ml</p></td>
<td><p>test cases (for liveness, constprop, alias)</p></td>
</tr>
<tr class="row-even"><td><p>bin/opt.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> optimizer that runs dce and constprop (and more if you want)</p></td>
</tr>
<tr class="row-odd"><td><p>bin/backend.ml</p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">*</span></code> you will implement register allocation heuristics here</p></td>
</tr>
<tr class="row-even"><td><p>bin/registers.ml</p></td>
<td><p>collects statistics about register usage</p></td>
</tr>
<tr class="row-odd"><td><p>bin/printanalysis.ml</p></td>
<td><p>a standalone program to print the results of an analysis</p></td>
</tr>
</tbody>
</table>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>You’ll need to have <a class="reference external" href="http://gallium.inria.fr/~fpottier/menhir/">menhir</a> and <a class="reference external" href="https://clang.llvm.org/">clang</a> installed on your system for this
assignment.  If you have any difficulty installing these files, please
post on <a class="reference external" href="https://edstem.org/us/courses/40936/discussion/">Ed</a> and/or contact the course staff.</p>
</div>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>As usual, running <code class="docutils literal notranslate"><span class="pre">oatc</span> <span class="pre">--test</span></code> will run the test suite.  <code class="docutils literal notranslate"><span class="pre">oatc</span></code>
also now supports several new flags having to do with optimizations.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>-O1 :  runs two iterations of (constprop followed by dce)
--liveness {trivial|dataflow} : select which liveness analysis to use for register allocation
--regalloc {none|greedy|better} : select which register allocator to use
--print-regs : print a histogram of the registers used
</pre></div>
</div>
</div>
</section>
<section id="overview">
<h2><span class="section-number">1.2. </span>Overview<a class="headerlink" href="#overview" title="Link to this heading">¶</a></h2>
<p>The Oat compiler we have developed so far produces very inefficient code,
since it performs no optimizations at any stage of the compilation
pipeline. In this project, you will implement several simple dataflow analyses
and some optimizations at the level of our LLVMlite intermediate
representation in order to improve code size and speed.</p>
<section id="provided-code">
<h3>Provided Code<a class="headerlink" href="#provided-code" title="Link to this heading">¶</a></h3>
<p>The provided code makes extensive use of modules, module signatures, and
functors. These aid in code reuse and abstraction. If you need a refresher on
OCaml functors, we recommend reading through the <a class="reference external" href="https://dev.realworldocaml.org/functors.html">Functors Chapter</a> of Real World OCaml.</p>
<p>In <code class="docutils literal notranslate"><span class="pre">datastructures.ml</span></code>, we provide you with a number of useful modules,
module signatures, and functors for the assignment, including:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">OrdPrintT</span></code>: A module signature for a type that is both comparable and
can be converted to a string for printing. This is used in conjunction with
some of our other custom modules described below. Wrapper modules <code class="docutils literal notranslate"><span class="pre">Lbl</span></code>
and <code class="docutils literal notranslate"><span class="pre">Uid</span></code> satisfying this signature are defined later in the file for the
<code class="docutils literal notranslate"><span class="pre">Ll.lbl</span></code> and <code class="docutils literal notranslate"><span class="pre">Ll.uid</span></code> types.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">SetS</span></code>: A module signature that extends OCaml’s
built-in set to include string conversion and printing capabilities.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">MakeSet</span></code>: A functor that creates an extended set (<code class="docutils literal notranslate"><span class="pre">SetS</span></code>) from a type
that satisfies the <code class="docutils literal notranslate"><span class="pre">OrdPrintT</span></code> module signature. This is applied to the
<code class="docutils literal notranslate"><span class="pre">Lbl</span></code> and <code class="docutils literal notranslate"><span class="pre">Uid</span></code> wrapper modules to create a label set module <code class="docutils literal notranslate"><span class="pre">LblS</span></code>
and a UID set module <code class="docutils literal notranslate"><span class="pre">UidS</span></code>.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">MapS</span></code>: A module signature that extends OCaml’s built-in maps to include
string conversion and printing capabilities. Three additional helper
functions are also included: <code class="docutils literal notranslate"><span class="pre">update</span></code> for updating the value associated
with a particular key, <code class="docutils literal notranslate"><span class="pre">find_or</span></code> for performing a map look-up with a
default value to be supplied when the key is not present, and <code class="docutils literal notranslate"><span class="pre">update_or</span></code>
for updating the value associated with a key if it is present, or adding an
entry with a default value if not.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">MakeMap</span></code>: A functor that creates an extended map (<code class="docutils literal notranslate"><span class="pre">MapS</span></code>) from a type
that satisfies the <code class="docutils literal notranslate"><span class="pre">OrdPrintT</span></code> module signature. This is applied to the
<code class="docutils literal notranslate"><span class="pre">Lbl</span></code> and <code class="docutils literal notranslate"><span class="pre">Uid</span></code> wrapper modules to create a label map module <code class="docutils literal notranslate"><span class="pre">LblM</span></code>
and a UID map module <code class="docutils literal notranslate"><span class="pre">UidM</span></code>. These map modules have fixed key types, but
are polymorphic in the types of their values.</p></li>
</ul>
</div></blockquote>
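<p>The functor pattern described above can be sketched with the standard library
alone. The definitions below are a simplified, hypothetical re-creation (the
real ones live in <code class="docutils literal notranslate"><span class="pre">datastructures.ml</span></code> and may differ in detail); the
string-based <code class="docutils literal notranslate"><span class="pre">Uid</span></code> wrapper and its <code class="docutils literal notranslate"><span class="pre">%</span></code> prefix are assumptions made for the
example.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>(* A comparable, printable type: the shape of signature the text describes. *)
module type OrdPrintT = sig
  type t
  val compare : t -> t -> int
  val to_string : t -> string
end

(* Extend the standard set with a printer built from the element printer. *)
module MakeSet (Ord : OrdPrintT) = struct
  include Set.Make (Ord)

  (* Render a set as "{a, b, c}", elements in increasing order. *)
  let to_string (s : t) : string =
    "{" ^ String.concat ", " (List.map Ord.to_string (elements s)) ^ "}"
end

(* Hypothetical wrapper: here a UID is just a string. *)
module Uid = struct
  type t = string
  let compare = String.compare
  let to_string u = "%" ^ u
end

module UidS = MakeSet (Uid)

(* Duplicates collapse and elements print in sorted order. *)
let example = UidS.to_string (UidS.of_list ["y"; "x"; "x"])
</pre></div>
</div>
<p>Because <code class="docutils literal notranslate"><span class="pre">MakeSet</span></code> only asks for the signature, the same functor could be
applied to any other comparable, printable type without changing the set code.</p>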
</section>
</section>
<section id="task-i-dataflow-analysis">
<h2><span class="section-number">1.3. </span>Task I: Dataflow Analysis<a class="headerlink" href="#task-i-dataflow-analysis" title="Link to this heading">¶</a></h2>
<p>Your first task is to implement a version of the worklist algorithm for
solving the dataflow equations presented in lecture.  Since we plan to
implement several analyses, we’d like to share as much code as possible
among them. In lecture, we saw that each analysis differs only in the
choice of the lattice, the flow function, the direction of the analysis,
and how to compute the meet of facts flowing into a node. We can take
advantage of this by writing a generic solver as an OCaml functor and
instantiating it with these parameters.</p>
<section id="the-algorithm">
<h3>The Algorithm<a class="headerlink" href="#the-algorithm" title="Link to this heading">¶</a></h3>
<p>Assuming only that we have a directed graph where each node is labeled with a
<em>dataflow fact</em> and a <em>flow function</em>, we can compute a fixpoint of the flow
on the graph as follows:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>let w = new set with all nodes
repeat until w is empty
  let n = w.pop()
  old_out = out[n]
  let in = combine(preds[n])
  out[n] := flow[n](in)
  if (!equal old_out out[n]),
    for all m in succs[n], w.add(m)
end
</pre></div>
</div>
<p>Here <code class="docutils literal notranslate"><span class="pre">equal</span></code>, <code class="docutils literal notranslate"><span class="pre">combine</span></code> and <code class="docutils literal notranslate"><span class="pre">flow</span></code> are abstract operations that will be
instantiated with lattice equality, the meet operation and the flow function
(e.g., defined by the gen and kill sets of the analysis),
respectively. Similarly, <code class="docutils literal notranslate"><span class="pre">preds</span></code> and <code class="docutils literal notranslate"><span class="pre">succs</span></code> are the graph predecessors
and successors in the <em>flow graph</em>, and do not correspond to the control flow
of the program. They can be instantiated appropriately to create a forwards or
backwards analysis.</p>
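<p>As a rough illustration, the pseudocode can be transcribed into OCaml over
integer node ids. This sketch does not use the <code class="docutils literal notranslate"><span class="pre">DFA_GRAPH</span></code> or <code class="docutils literal notranslate"><span class="pre">FACT</span></code>
signatures from <code class="docutils literal notranslate"><span class="pre">solver.ml</span></code>; every name in it (<code class="docutils literal notranslate"><span class="pre">solve</span></code> and its labeled
parameters, the three-node graph, the toy flow function) is an assumption
chosen for the example.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>module IntSet = Set.Make (Int)
module IntMap = Map.Make (Int)

(* Generic worklist loop. Node ids are ints; every parameter name here is
   illustrative rather than taken from solver.ml. *)
let solve ~nodes ~preds ~succs ~flow ~combine ~equal ~init =
  let out =
    ref (List.fold_left (fun m n -> IntMap.add n init m) IntMap.empty nodes)
  in
  let work = ref (IntSet.of_list nodes) in
  while not (IntSet.is_empty !work) do
    let n = IntSet.choose !work in
    work := IntSet.remove n !work;
    let old_out = IntMap.find n !out in
    let in_fact = combine (List.map (fun p -> IntMap.find p !out) (preds n)) in
    let new_out = flow n in_fact in
    out := IntMap.add n new_out !out;
    (* Only revisit successors when this node's fact actually changed. *)
    if not (equal old_out new_out) then
      List.iter (fun m -> work := IntSet.add m !work) (succs n)
  done;
  !out

(* Example graph: 0 -> 1 -> 2 with a back edge 2 -> 1. The fact at a node is
   the set of nodes that may have run by the time it finishes. *)
let preds = function 0 -> [] | 1 -> [0; 2] | _ -> [1]
let succs = function 0 -> [1] | 1 -> [2] | _ -> [1]

let result =
  solve ~nodes:[0; 1; 2] ~preds ~succs
    ~flow:(fun n fact -> IntSet.add n fact)
    ~combine:(List.fold_left IntSet.union IntSet.empty)
    ~equal:IntSet.equal ~init:IntSet.empty
</pre></div>
</div>
<p>Swapping the <code class="docutils literal notranslate"><span class="pre">preds</span></code> and <code class="docutils literal notranslate"><span class="pre">succs</span></code> arguments would run the same loop as a
backwards analysis, which is the reuse the functor-based solver is after.</p>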
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>Don’t use OCaml’s polymorphic equality operator (<code class="docutils literal notranslate"><span class="pre">=</span></code>) to compare
<code class="docutils literal notranslate"><span class="pre">old_out</span></code> and <code class="docutils literal notranslate"><span class="pre">out[n]</span></code>. It compares the underlying representation of the
two values, so facts built from sets or maps can compare as unequal even when they
hold the same contents. Use the supplied <code class="docutils literal notranslate"><span class="pre">Fact.compare</span></code> instead.</p>
</div>
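<p>The pitfall is easy to reproduce with the standard library alone. In this
small sketch (the module name <code class="docutils literal notranslate"><span class="pre">S</span></code> and the values are made up for the
example), two sets hold the same elements but were built in opposite orders;
<code class="docutils literal notranslate"><span class="pre">=</span></code> reports them as different, while the set module’s own comparison
functions do not.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>module S = Set.Make (Int)

(* Same elements, inserted in opposite orders. *)
let a = S.add 2 (S.add 1 S.empty)
let b = S.add 1 (S.add 2 S.empty)

(* The set's own functions look only at the elements. *)
let same_contents = S.equal a b
let same_by_compare = S.compare a b = 0

(* Polymorphic equality walks the balanced trees, whose shapes differ. *)
let same_representation = (a = b)
</pre></div>
</div>
<p>Here <code class="docutils literal notranslate"><span class="pre">same_contents</span></code> and <code class="docutils literal notranslate"><span class="pre">same_by_compare</span></code> are true and
<code class="docutils literal notranslate"><span class="pre">same_representation</span></code> is false. A solver that tested for change with <code class="docutils literal notranslate"><span class="pre">=</span></code>
could therefore keep re-adding nodes to the worklist even though no fact had
changed.</p>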
</section>
<section id="getting-started-and-testing">
<h3>Getting Started and Testing<a class="headerlink" href="#getting-started-and-testing" title="Link to this heading">¶</a></h3>
<p>Be sure to review the comments in the <code class="docutils literal notranslate"><span class="pre">DFA_GRAPH</span></code> (<em>data flow analysis graph</em>)
and <code class="docutils literal notranslate"><span class="pre">FACT</span></code> module signatures in <code class="docutils literal notranslate"><span class="pre">solver.ml</span></code>, which define the parameters of
the solver. Make sure you understand what each declaration in the signature does
– your solver will need to use each one (other than the printing functions)!
It will also be helpful for you to understand the way that <code class="docutils literal notranslate"><span class="pre">cfg.ml</span></code> connects
to the solver.  Read the commentary there for more information.</p>
</section>
<section id="now-implement-the-solver">
<h3>Now implement the solver<a class="headerlink" href="#now-implement-the-solver" title="Link to this heading">¶</a></h3>
<p>Your first task is to fill in the <code class="docutils literal notranslate"><span class="pre">solve</span></code> function in the <code class="docutils literal notranslate"><span class="pre">Solver.Make</span></code>
functor in <code class="docutils literal notranslate"><span class="pre">solver.ml</span></code>. The input to the function is a flow graph labeled
with the initial facts. It should compute the fixpoint and return a graph with
the corresponding labeling. You will find the set datatype from
<code class="docutils literal notranslate"><span class="pre">datastructures.ml</span></code> useful for manipulating sets of nodes.</p>
<p>To test your solver, we have provided a full implementation of a liveness
analysis in <code class="docutils literal notranslate"><span class="pre">liveness.ml</span></code>. Once you’ve completed the solver, the liveness
tests in the test suite should all be passing. These tests compare the output
of your solver on a number of programs with pre-computed solutions in
<code class="docutils literal notranslate"><span class="pre">analysistests.ml</span></code>. Each entry in this file describes the set of uids that
are <strong>live-in</strong> at a label in a program from <code class="docutils literal notranslate"><span class="pre">./llprograms</span></code>. To debug,
you can compare these with the output of the <code class="docutils literal notranslate"><span class="pre">Graph.to_string</span></code> function on
the flow graphs you will be manipulating.</p>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>The stand-alone program <code class="docutils literal notranslate"><span class="pre">printanalysis</span></code> can print out the results of a
dataflow analysis for a given .ll program.  You can build it by doing
<code class="docutils literal notranslate"><span class="pre">make</span> <span class="pre">printanalysis</span></code>.  It takes flags for each analysis (run with <code class="docutils literal notranslate"><span class="pre">--h</span></code>
for a list).</p>
</div>
</section>
</section>
<section id="task-ii-alias-analysis-and-dead-code-elimination">
<h2><span class="section-number">1.4. </span>Task II: Alias Analysis and Dead Code Elimination<a class="headerlink" href="#task-ii-alias-analysis-and-dead-code-elimination" title="Link to this heading">¶</a></h2>
<p>The goal of this task is to implement a simple dead code elimination
optimization that can also remove <code class="docutils literal notranslate"><span class="pre">store</span></code> instructions when we can prove
that they have no effect on the result of the program. Though we already have
a liveness analysis, it doesn’t give us enough information to eliminate
<code class="docutils literal notranslate"><span class="pre">store</span></code> instructions: even if we know the UID of the destination pointer is
dead after a store and is not used in a load in the rest of the program, we
cannot remove a store instruction because of <em>aliasing</em>.  The problem is that
there may be different UIDs that name the same stack slot. There are a number
of ways this can happen after a pointer is returned by <code class="docutils literal notranslate"><span class="pre">alloca</span></code>:</p>
<blockquote>
<div><ul class="simple">
<li><p>The pointer is used as an argument to a <code class="docutils literal notranslate"><span class="pre">getelementptr</span></code> or <code class="docutils literal notranslate"><span class="pre">bitcast</span></code> instruction</p></li>
<li><p>The pointer is stored into memory and then later loaded</p></li>
<li><p>The pointer is passed as an argument to a function, which can manipulate it
in arbitrary ways</p></li>
</ul>
</div></blockquote>
<p>Some pointers are never aliased. For example, the code generated by the Oat
frontend for local variables never creates aliases because the Oat language
itself doesn’t have an “address of” operator. We can find such uses of
<code class="docutils literal notranslate"><span class="pre">alloca</span></code> by applying a simple alias analysis.</p>
<section id="alias-analysis">
<h3>Alias Analysis<a class="headerlink" href="#alias-analysis" title="Link to this heading">¶</a></h3>
<p>We have provided some code to get you started in <code class="docutils literal notranslate"><span class="pre">alias.ml</span></code>. You will have
to fill in the flow function and lattice operations. The type of lattice
elements, <code class="docutils literal notranslate"><span class="pre">fact</span></code>, is a map from UIDs to <em>symbolic pointers</em> of type
<code class="docutils literal notranslate"><span class="pre">SymPtr.t</span></code>. Your analysis should compute, at every program point, the set of
UIDs of pointer type that are in scope and, additionally, whether each such pointer
is the unique name for a stack slot according to the rules above. See the
comments in <code class="docutils literal notranslate"><span class="pre">alias.ml</span></code> for details.</p>
<blockquote>
<div><ol class="arabic simple">
<li><p><code class="docutils literal notranslate"><span class="pre">Alias.insn_flow</span></code>: the flow function over instructions</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Alias.Fact.combine</span></code>: the combine function for alias facts</p></li>
</ol>
</div></blockquote>
</section>
<section id="dead-code-elimination">
<h3>Dead Code Elimination<a class="headerlink" href="#dead-code-elimination" title="Link to this heading">¶</a></h3>
<p>Now we can use our liveness and alias analyses to implement a dead code
elimination pass. We will simply compute the results of the analysis at each
program point, then iterate over the blocks of the CFG removing any
instructions that do not contribute to the output of the program.</p>
<blockquote>
<div><ul class="simple">
<li><p>For all instructions except <code class="docutils literal notranslate"><span class="pre">store</span></code> and <code class="docutils literal notranslate"><span class="pre">call</span></code>, the instruction can
be removed if the UID it defines is not live-out at the point of definition</p></li>
<li><p>A <code class="docutils literal notranslate"><span class="pre">store</span></code> instruction can be removed if we know the UID of the destination
pointer is not aliased and not live-out at the program point of the store</p></li>
<li><p>A <code class="docutils literal notranslate"><span class="pre">call</span></code> instruction can never be removed</p></li>
</ul>
</div></blockquote>
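<p>The three rules amount to one predicate per instruction. The sketch below
uses a deliberately simplified, hypothetical instruction type and plain string
sets; it is not the representation used in <code class="docutils literal notranslate"><span class="pre">dce.ml</span></code>, and the names
<code class="docutils literal notranslate"><span class="pre">removable</span></code>, <code class="docutils literal notranslate"><span class="pre">live_out</span></code> and <code class="docutils literal notranslate"><span class="pre">unique</span></code> are assumptions.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>module UidS = Set.Make (String)

(* Hypothetical instruction shape: only what the rules need to inspect. *)
type insn =
  | Store of string   (* UID of the destination pointer *)
  | Call
  | Other             (* any other instruction *)

(* [live_out] holds the UIDs live just after the instruction; [unique] holds
   the pointers that the alias analysis proved to be unaliased. [uid] is the
   UID the instruction defines. *)
let removable ~live_out ~unique (uid : string) (i : insn) : bool =
  match i with
  | Call -> false
  | Store dest ->
    if UidS.mem dest live_out then false else UidS.mem dest unique
  | Other -> not (UidS.mem uid live_out)
</pre></div>
</div>
<p>Filtering a block’s instruction list with the negation of such a predicate is
one possible way to structure the pass.</p>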
<p>Complete the dead-code elimination optimization in <code class="docutils literal notranslate"><span class="pre">dce.ml</span></code>, where you will
only need to fill out the <code class="docutils literal notranslate"><span class="pre">dce_block</span></code> function that implements these rules.</p>
</section>
</section>
<section id="task-iii-constant-propagation">
<h2><span class="section-number">1.5. </span>Task III: Constant Propagation<a class="headerlink" href="#task-iii-constant-propagation" title="Link to this heading">¶</a></h2>
<p>Programmers don’t often write dead code directly. However, dead code is often
produced as a result of other optimizations that execute parts of the original
program at compile time, for instance <em>constant propagation</em>. In this section
you’ll implement a simple constant propagation analysis and constant folding
optimization.</p>
<p>Start by reading through <code class="docutils literal notranslate"><span class="pre">constprop.ml</span></code>. Constant propagation is similar
to the alias analysis from the previous section. Dataflow facts will be maps
from UIDs to the type <code class="docutils literal notranslate"><span class="pre">SymConst.t</span></code>, which corresponds to the lattice from
the lecture slides. Your analysis will compute the set of UIDs in scope at
each program point, and the integer value of any UID that is computed as a
result of a series of <code class="docutils literal notranslate"><span class="pre">binop</span></code> and <code class="docutils literal notranslate"><span class="pre">icmp</span></code> instructions on constant
operands. More specifically:</p>
<blockquote>
<div><ul class="simple">
<li><p>The flow out of any <code class="docutils literal notranslate"><span class="pre">binop</span></code> or <code class="docutils literal notranslate"><span class="pre">icmp</span></code> whose operands have been
determined to be constants is the incoming flow with the defined UID set to
<code class="docutils literal notranslate"><span class="pre">Const</span></code> with the expected constant value</p></li>
<li><p>The flow out of any <code class="docutils literal notranslate"><span class="pre">binop</span></code> or <code class="docutils literal notranslate"><span class="pre">icmp</span></code> with a <code class="docutils literal notranslate"><span class="pre">NonConst</span></code> operand sets
the defined UID to <code class="docutils literal notranslate"><span class="pre">NonConst</span></code></p></li>
<li><p>Similarly, the flow out of any <code class="docutils literal notranslate"><span class="pre">binop</span></code> or <code class="docutils literal notranslate"><span class="pre">icmp</span></code> with an <code class="docutils literal notranslate"><span class="pre">UndefConst</span></code>
operand sets the defined UID to <code class="docutils literal notranslate"><span class="pre">UndefConst</span></code></p></li>
<li><p>A <code class="docutils literal notranslate"><span class="pre">store</span></code> or <code class="docutils literal notranslate"><span class="pre">call</span></code> of type <code class="docutils literal notranslate"><span class="pre">Void</span></code> sets the defined UID to
<code class="docutils literal notranslate"><span class="pre">UndefConst</span></code></p></li>
<li><p>All other instructions set the defined UID to <code class="docutils literal notranslate"><span class="pre">NonConst</span></code></p></li>
</ul>
</div></blockquote>
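<p>For a single operation, the first three rules can be sketched as a function
on abstract values. The type below is a hypothetical stand-in for
<code class="docutils literal notranslate"><span class="pre">SymConst.t</span></code> (the 64-bit payload is an assumption), and <code class="docutils literal notranslate"><span class="pre">flow_add</span></code> is a
made-up name. The rules do not say what happens when one operand is
<code class="docutils literal notranslate"><span class="pre">NonConst</span></code> and the other is <code class="docutils literal notranslate"><span class="pre">UndefConst</span></code>; this sketch assumes the bullets
apply in order, so <code class="docutils literal notranslate"><span class="pre">NonConst</span></code> wins.</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>(* Hypothetical stand-in for SymConst.t; the real type is in constprop.ml. *)
type sym_const =
  | NonConst
  | Const of int64
  | UndefConst

(* Abstract result of an addition. Two constants fold to a constant;
   otherwise NonConst is checked before UndefConst (an assumption for the
   mixed case). *)
let flow_add (a : sym_const) (b : sym_const) : sym_const =
  match a, b with
  | Const x, Const y -> Const (Int64.add x y)
  | NonConst, _ | _, NonConst -> NonConst
  | UndefConst, _ | _, UndefConst -> UndefConst
</pre></div>
</div>
<p>A full flow function would apply the same case analysis to every <code class="docutils literal notranslate"><span class="pre">binop</span></code> and
<code class="docutils literal notranslate"><span class="pre">icmp</span></code> operator and record the result for the defined UID in the fact map.</p>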
<p>(At this point we could also include some arithmetic identities, for instance
optimizing multiplication by 0, but we’ll keep the specification simple.)
Next, you will have to implement the constant folding optimization itself,
which just traverses the blocks of the CFG and replaces operands whose values
we have computed with the appropriate constants. The structure of the code is
very similar to that in the previous section. You will have to fill in:</p>
<blockquote>
<div><ol class="arabic simple">
<li><p><code class="docutils literal notranslate"><span class="pre">Constprop.insn_flow</span></code> with the rules defined above</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Constprop.Fact.combine</span></code> with the combine operation for the analysis</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">Constprop.cp_block</span></code> (inside the <code class="docutils literal notranslate"><span class="pre">run</span></code> function) with the code needed
to perform the constant propagation transformation</p></li>
</ol>
</div></blockquote>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>Once you have implemented constant folding and dead-code elimination, the
compiler’s <code class="docutils literal notranslate"><span class="pre">-O1</span></code> option will optimize your ll code by doing 2 iterations
of (constant prop followed by dce).  See <code class="docutils literal notranslate"><span class="pre">opt.ml</span></code>.  The <code class="docutils literal notranslate"><span class="pre">-O1</span></code>
optimizations are <em>not</em> used for testing <em>except</em> that they are <em>always</em>
performed in the register-allocation quality tests – these optimizations
improve register allocation (see below).</p>
<p>This coupling means that a faulty optimization pass might degrade the
quality of your register allocator, which in turn might make a high score
harder to get.</p>
</div>
</section>
<section id="task-iv-register-allocationn-optional">
<h2><span class="section-number">1.6. </span>Task IV: Register Allocation (Optional)<a class="headerlink" href="#task-iv-register-allocationn-optional" title="Link to this heading">¶</a></h2>
<p>The backend implementation that we have given you provides two basic register
allocation strategies:</p>
<blockquote>
<div><ul class="simple">
<li><p><strong>none</strong>: spills all uids to the stack;</p></li>
<li><p><strong>greedy</strong>: uses registers and a greedy linear-scan algorithm.</p></li>
</ul>
</div></blockquote>
<p>For this task, you will implement a <strong>better</strong> register allocation strategy
that makes use of the liveness information that you compute in Task I.  Most
of the instructions for this part of the assignment are found in
<code class="docutils literal notranslate"><span class="pre">backend.ml</span></code>, where we have modified the code generation strategy to be able
to make use of liveness information.  The task is to implement a single
function <code class="docutils literal notranslate"><span class="pre">better_layout</span></code> that beats our example “greedy” register allocation
strategy.  We recommend familiarizing yourself with the way that the simple
strategies work before attempting to write your own allocator.</p>
<p>The compiler now also supports several additional command-line switches that
can be used to select among different analysis and code generation options for
testing purposes:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>--print-regs prints the register usage statistics for x86 code
--liveness {trivial|dataflow} use the specified liveness analysis
--regalloc {none|greedy|better} use the specified register allocator
</pre></div>
</div>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>The flags above <em>do not</em> imply the <code class="docutils literal notranslate"><span class="pre">-O1</span></code> flag (despite the fact that we
always turn on optimization for testing purposes when running with
<code class="docutils literal notranslate"><span class="pre">--test</span></code>).  You should enable it explicitly.</p>
</div>
<p>For testing purposes, you can run the compiler with the <code class="docutils literal notranslate"><span class="pre">-v</span></code> verbose flag
and/or use the <code class="docutils literal notranslate"><span class="pre">--print-regs</span></code> flag to get more information about how your
algorithm is performing.  It is also useful to sprinkle your own verbose
output into the backend.</p>
<p>The goal for this part of the homework is to create a strategy such that code
generated with the <code class="docutils literal notranslate"><span class="pre">--regalloc</span> <span class="pre">better</span></code> <code class="docutils literal notranslate"><span class="pre">--liveness</span> <span class="pre">dataflow</span></code> flags is
“better” than code generated using the simple settings, which are <code class="docutils literal notranslate"><span class="pre">--regalloc</span>
<span class="pre">greedy</span></code> <code class="docutils literal notranslate"><span class="pre">--liveness</span> <span class="pre">dataflow</span></code>.  See the discussion about how we compare
register allocation strategies in <code class="docutils literal notranslate"><span class="pre">backend.ml</span></code>.  The “quality” test cases
report the results of these comparisons.</p>
<p>Of course your register allocation strategy should produce correct code, so we
still perform all of the correctness tests that we have used in previous
versions of the compiler.  Your allocation strategy should not break any of
these tests – and you cannot earn points for the “quality” tests unless all
of the correctness tests also pass.</p>
<div class="admonition-note admonition">
<p class="admonition-title">Note</p>
<p>Since this task is optional, the quality test cases in <code class="docutils literal notranslate"><span class="pre">gradedtests.ml</span></code>
are commented out. If you are doing this task, uncomment the additional
tests in that file. (Look for the text “Uncomment the following code if
you are doing the optional Task IV Register Allocation”.)</p>
</div>
</section>
<section id="task-v-experimentation-validation-only-if-task-iv-completed">
<h2><span class="section-number">1.7. </span>Task V: Experimentation / Validation (Only if Task IV completed)<a class="headerlink" href="#task-v-experimentation-validation-only-if-task-iv-completed" title="Link to this heading">¶</a></h2>
<p>Of course we want to understand how much of an impact your register allocation
strategy has on actual execution time.  For the final task, you will create a
new Oat program that highlights the difference.  There are two parts to this
task.</p>
<section id="create-a-test-case">
<h3>Create a test case<a class="headerlink" href="#create-a-test-case" title="Link to this heading">¶</a></h3>
<p>Post an Oat program to <a class="reference external" href="https://edstem.org/us/courses/40936/discussion/">Ed</a>.  This program should exhibit significantly
different performance when compiled using the “greedy” register allocation
strategy vs. using your “better” register allocation strategy with dataflow
information.  See the file <code class="docutils literal notranslate"><span class="pre">hw4programs/regalloctest.oat</span></code> and
<code class="docutils literal notranslate"><span class="pre">hw4programs/regalloctest2.oat</span></code> for uninspired examples of such a
program.  Yours should be more interesting.</p>
</section>
<section id="post-your-running-time">
<h3>Post your running time<a class="headerlink" href="#post-your-running-time" title="Link to this heading">¶</a></h3>
<p>Use the Unix <code class="docutils literal notranslate"><span class="pre">time</span></code> command to measure the performance of your
register allocation algorithm.  Post your results as a simple table of
timing information for several test cases, including the one you created and
those mentioned below.  You should test the performance in several
configurations:</p>
<blockquote>
<div><ol class="arabic simple">
<li><p>using the <code class="docutils literal notranslate"><span class="pre">--liveness</span> <span class="pre">trivial</span></code>  <code class="docutils literal notranslate"><span class="pre">--regalloc</span> <span class="pre">none</span></code> flags  (baseline)</p></li>
<li><p>using the <code class="docutils literal notranslate"><span class="pre">--liveness</span> <span class="pre">dataflow</span></code>  <code class="docutils literal notranslate"><span class="pre">--regalloc</span> <span class="pre">greedy</span></code> flags  (greedy)</p></li>
<li><p>using the <code class="docutils literal notranslate"><span class="pre">--liveness</span> <span class="pre">dataflow</span></code>  <code class="docutils literal notranslate"><span class="pre">--regalloc</span> <span class="pre">better</span></code> flags  (better)</p></li>
<li><p>using the <code class="docutils literal notranslate"><span class="pre">--clang</span></code> flag  (clang)</p></li>
</ol>
</div></blockquote>
<p>Then repeat each of the above configurations with the <code class="docutils literal notranslate"><span class="pre">-O1</span></code> flag added.</p>
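<p>The four configurations above, each with and without <code class="docutils literal notranslate"><span class="pre">-O1</span></code>, give eight compiler invocations per program. The following Python sketch enumerates them so that none is forgotten. It is only an illustration: the helper name <code class="docutils literal notranslate"><span class="pre">build_commands</span></code>, the labels, and the choice to place <code class="docutils literal notranslate"><span class="pre">-O1</span></code> before the other flags are assumptions, not part of the provided code.</p>

```python
from itertools import product

# Flag sets copied from the four configurations listed above.
CONFIGS = {
    "baseline": ["--liveness", "trivial", "--regalloc", "none"],
    "greedy": ["--liveness", "dataflow", "--regalloc", "greedy"],
    "better": ["--liveness", "dataflow", "--regalloc", "better"],
    "clang": ["--clang"],
}


def build_commands(program, compiler="./oatc"):
    """Return a (label, argv) pair for every configuration, without and with -O1.

    Hypothetical helper: the labels and the flag ordering are assumptions.
    """
    commands = []
    for (name, flags), opt in product(CONFIGS.items(), ([], ["-O1"])):
        label = name + ("+O1" if opt else "")
        commands.append((label, [compiler, *opt, *flags, program]))
    return commands


if __name__ == "__main__":
    for label, argv in build_commands("llprograms/matmul.ll"):
        print(f"{label:12} {' '.join(argv)}")
```

<p>Each printed line is a command you can paste into the shell before timing the resulting <code class="docutils literal notranslate"><span class="pre">a.out</span></code>.</p>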
<p>Test your compiler on at least these three programs:</p>
<blockquote>
<div><ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">hw4programs/regalloctest.oat</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">llprograms/matmul.ll</span></code></p></li>
<li><p>your own test case</p></li>
</ul>
</div></blockquote>
<p>Report the processor and OS version that you use to test.  For best results,
use a “lightly loaded” machine (close all other applications) and average the
timing over several trial runs.</p>
<p>The example below shows one interaction used to test the <code class="docutils literal notranslate"><span class="pre">matmul.ll</span></code> file in
several configurations from the command line:</p>
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>&gt; ./oatc --liveness trivial --regalloc none llprograms/matmul.ll
&gt; time ./a.out

real 0m1.647s
user 0m1.639s
sys  0m0.002s


&gt; ./oatc --liveness dataflow --regalloc greedy llprograms/matmul.ll
&gt; time ./a.out

real 0m1.127s
user 0m1.123s
sys  0m0.002s

&gt; ./oatc --liveness dataflow --regalloc better llprograms/matmul.ll
&gt; time ./a.out

real 0m0.500s
user 0m0.496s
sys  0m0.002s

&gt; ./oatc --clang llprograms/matmul.ll
&gt; time ./a.out

real 0m0.061s
user 0m0.053s
sys  0m0.004s
</pre></div>
</div>
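<p>To average the timing over several trial runs, you could script the measurement instead of invoking <code class="docutils literal notranslate"><span class="pre">time</span></code> by hand. The sketch below is one possible approach, not a required tool: the function name <code class="docutils literal notranslate"><span class="pre">average_runtime</span></code> and the default of five trials are assumptions. It measures wall-clock time, which corresponds roughly to the <code class="docutils literal notranslate"><span class="pre">real</span></code> row reported by <code class="docutils literal notranslate"><span class="pre">time</span></code> and includes process start-up overhead.</p>

```python
import subprocess
import sys
import time
from statistics import mean


def average_runtime(argv, trials=5):
    """Run argv `trials` times and return the mean wall-clock time in seconds."""
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        # The exit status is deliberately ignored (check=False), since a test
        # program may exit with a nonzero status even when it ran correctly.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return mean(samples)


if __name__ == "__main__":
    # Time the command given on the command line, e.g. ./a.out; with no
    # arguments, time a trivial Python process as a smoke test.
    target = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"mean wall-clock time: {average_runtime(target):.3f}s")
```

<p>Run it once per configuration, after recompiling with the corresponding flags, and record the means in your table.</p>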
<p>Don’t get too discouraged when clang beats your compiler’s performance by an
order of magnitude or more.  It uses register promotion and many other
optimizations to produce high-quality code!</p>
</section>
</section>
<section id="optional-task-leaderboard">
<h2><span class="section-number">1.8. </span>Optional Task: Leaderboard!<a class="headerlink" href="#optional-task-leaderboard" title="Link to this heading">¶</a></h2>
<p>As an optional and hopefully fun activity, we will run a leaderboard for efficient
compilation. When you submit your homework, we will use it to compile a test suite.
(You can choose what name will appear for you on the leaderboard; feel free to use
your real name or a pseudonym.) We will compare the execution time of the code your
compiler produces against that of the code produced using the Clang backend.</p>
<p>You are welcome to implement additional optimizations by editing the file <code class="docutils literal notranslate"><span class="pre">opt.ml</span></code>.
Note that your additional optimizations should run only if the <code class="docutils literal notranslate"><span class="pre">-O2</span></code> flag is passed
(which will set <code class="docutils literal notranslate"><span class="pre">Opt.opt_level</span></code> to 2).</p>
<p>All of your additional optimizations should be implemented in the <code class="docutils literal notranslate"><span class="pre">opt.ml</span></code> file. We
know this isn’t good software engineering practice, but it helps us simplify our
code submission framework. Sorry!</p>
<p>We will post a link to the leaderboard test suite on Ed, so you can always access
its latest version.</p>
<p>Info about leaderboard results: The leaderboard shows the execution time of your
compiled version compared to the Clang-compiled version. Specifically, we compile
each test case with the command
<code class="docutils literal notranslate"><span class="pre">./oatc</span> <span class="pre">-O2</span> <span class="pre">--liveness</span> <span class="pre">dataflow</span> <span class="pre">--regalloc</span> <span class="pre">better</span> <span class="pre">testfile</span> <span class="pre">runtime.c</span></code> and
measure the execution time of the resulting executable. Let this time be
<em>t_student</em>. We also compile the test case with the additional flag
<code class="docutils literal notranslate"><span class="pre">--clang</span></code> and measure the execution time of the resulting executable. Let
this time be <em>t_clang</em>. The leaderboard displays <em>t_student</em>
divided by <em>t_clang</em> for each test case, and also the geometric mean
across all the test cases. (The “version” column is the MD5 checksum of all the test cases.)</p>
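<p>As a rough illustration of how the leaderboard numbers are derived, the following Python sketch computes the per-test ratio of <em>t_student</em> to <em>t_clang</em> and the geometric mean of those ratios. The function names and the timings are made up for the example; this is not the course’s actual scoring script.</p>

```python
import math


def leaderboard_ratios(t_student, t_clang):
    """Return the per-test ratios t_student / t_clang."""
    return [s / c for s, c in zip(t_student, t_clang)]


def geometric_mean(values):
    """Return the geometric mean: the exponential of the average logarithm."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    student = [1.0, 4.0]
    clang = [0.5, 0.5]
    ratios = leaderboard_ratios(student, clang)
    print(ratios)                             # [2.0, 8.0]
    print(round(geometric_mean(ratios), 6))   # 4.0
```

<p>A ratio of 1.0 means your compiled code ran as fast as the Clang-compiled code, and smaller is better. The geometric mean is a natural choice for averaging ratios because it depends on the product of the ratios rather than their sum, so a single slow test case does not dominate the score the way it would in an arithmetic mean.</p>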
<p>Propose a test case to add to the leaderboard: If you implement an additional
optimization and have developed a test case that your optimization does well on,
you can post a description of your optimization and the test case on Ed, and we
will consider the test case for inclusion in the test suite. Your test case must
satisfy the following properties:</p>
<blockquote>
<div><ul class="simple">
<li><p>Does not require any command line arguments to run.</p></li>
<li><p>Takes on the order of 1-3 seconds to execute.</p></li>
</ul>
</div></blockquote>
</section>
<section id="grading">
<h2><span class="section-number">1.9. </span>Grading<a class="headerlink" href="#grading" title="Link to this heading">¶</a></h2>
<p><strong>Projects that do not compile will receive no credit!</strong></p>
<dl class="simple">
<dt>Your grade for this project will be based on:</dt><dd><ul class="simple">
<li><p>100 Points: the various automated tests that we provide.</p></li>
</ul>
</dd>
</dl>
<ul class="simple">
<li><p>Bonus points and unlimited bragging rights: completing
one or more of the optional tasks. Note that the register-allocator
quality tests don’t run unless your allocator passes all the correctness tests.</p></li>
</ul>
</section>
</section>


          </div>

        </div>
      </div>
      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
        <div class="sphinxsidebarwrapper"><h3>Navigation</h3>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">1. HW6: Dataflow Analysis and Optimizations</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#getting-started">1.1. Getting Started</a></li>
<li class="toctree-l2"><a class="reference internal" href="#overview">1.2. Overview</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#provided-code">Provided Code</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#task-i-dataflow-analysis">1.3. Task I: Dataflow Analysis</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#the-algorithm">The Algorithm</a></li>
<li class="toctree-l3"><a class="reference internal" href="#getting-started-and-testing">Getting Started and Testing</a></li>
<li class="toctree-l3"><a class="reference internal" href="#now-implement-the-solver">Now implement the solver</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#task-ii-alias-analysis-and-dead-code-elimination">1.4. Task II: Alias Analysis and Dead Code Elimination</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#alias-analysis">Alias Analysis</a></li>
<li class="toctree-l3"><a class="reference internal" href="#dead-code-elimination">Dead Code Elimination</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#task-iii-constant-propagation">1.5. Task III: Constant Propagation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#task-iv-register-allocationn-optional">1.6. Task IV: Register Allocation (Optional)</a></li>
<li class="toctree-l2"><a class="reference internal" href="#task-v-experimentation-validation-only-if-task-iv-completed">1.7. Task V: Experimentation / Validation (Only if Task IV completed)</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#create-a-test-case">Create a test case</a></li>
<li class="toctree-l3"><a class="reference internal" href="#post-your-running-time">Post your running time</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#optional-task-leaderboard">1.8. Optional Task: Leaderboard!</a></li>
<li class="toctree-l2"><a class="reference internal" href="#grading">1.9. Grading</a></li>
</ul>
</li>
</ul>


        </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">


    </div>


  </body>
</html>