Wednesday, July 23, 2014

Applicative vs Monadic build systems

Summary: Shake is a monadic build system, and monadic build systems are more powerful than applicative ones.

Several people have wondered if the dependencies in the Shake build system are monadic, and if Make dependencies are applicative. In this post I'll try and figure out what that means, and show that the claim is somewhat true.

Gergo recently wrote a good primer on the concepts of Applicative, Monads and Arrows (it is worth reading the first half if you are unfamiliar with monads or applicatives). Using a similar idea, we can model a simple build system as a set of rules:

rules :: [(FilePath, Action String)]
rules = [("a+b", do a <- need "a"; b <- need "b"; return (a ++ b))
        ,("a"  , return "Hello ")
        ,("b"  , return "World")
        ]

Each rule is on a separate line, containing a pair of the file the rule produces (e.g. a for the second rule) and the action that produces the file's contents (e.g. return "Hello "). I've used need to allow a rule to use the contents of another file, so the rule for a+b depends on the files a and b, then concatenates their contents. We can run these rules to produce all the files. We've written these rules assuming Action is a Monad, using do notation. However, for the above build system, we can restrict ourselves to Applicative operations:

rules = [("a+b", (++) <$> need "a" <*> need "b")
        ,("a"  , pure "Hello ")
        ,("b"  , pure "World")
        ]

If Action is applicative but not monadic then we can statically (without running any code that operates on file contents) produce a dependency graph. If Action is monadic we can't generate such a graph upfront, but we gain expressive power - there are build systems that cannot be expressed applicatively. In particular, using a monad we can write a "dereferencing" build system:

rules = [("!a", do a <- need "a"; need a)
        ,("a" , pure "b")
        ,("b" , pure "Goodbye")
        ]

To build the file !a we first require the file a (which produces the contents b), then we require the file b (which produces the contents Goodbye). Note that the first rule has turned the contents b into the file name b. In general, moving information from a file's contents into a file name requires a monad. Alternatively stated, a monad lets you choose future dependencies based on the results of previous dependencies.
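
To make the static-analysis claim concrete, here is a toy sketch (the ActionA type, its need and depsOf are illustrative inventions, not Shake's real API) of an applicative-only Action whose dependencies can be listed without running anything:

{-# LANGUAGE GADTs #-}

-- A free-applicative-style Action: need is a constructor, so the
-- dependencies are visible in the structure of the computation.
data ActionA a where
    Pure :: a -> ActionA a
    Need :: FilePath -> ActionA String
    Ap   :: ActionA (b -> a) -> ActionA b -> ActionA a

instance Functor ActionA where
    fmap f x = Ap (Pure f) x

instance Applicative ActionA where
    pure  = Pure
    (<*>) = Ap

need :: FilePath -> ActionA String
need = Need

-- List the dependencies statically, without reading any file contents.
depsOf :: ActionA a -> [FilePath]
depsOf (Pure _) = []
depsOf (Need f) = [f]
depsOf (Ap f x) = depsOf f ++ depsOf x

With this encoding, depsOf ((++) <$> need "a" <*> need "b") gives ["a","b"] - exactly the information needed to build a dependency graph upfront. The monadic !a rule has no such encoding, because its second need depends on a value that only exists once a has been built.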

One realistic example (from the original Shake paper) is building a .tar file from a list of files contained in another file. Using Shake we can write the rule (here wrapping the action in a rule for an illustrative result.tar):

"result.tar" *> \out -> do
    contents <- readFileLines "list.txt"
    need contents
    cmd "tar -cf" [out] contents

The only build systems that I'm aware of that are monadic are redo, SCons and Shake-inspired build systems (including Shake itself, Jenga in OCaml, and several Haskell alternatives).

While it is the case that Shake is monadic, and that monadic build systems are more powerful than applicative ones, it is not the case that Make is applicative. In fact, almost no build systems are purely applicative. Looking at the build shootout, every build system tested can implement the !a example (provided the file a is not a build product), despite several systems being based on applicative dependencies.

Looking at Make specifically, it's clear that the output: input1 input2 formulation of dependencies is applicative in nature. However, there are at least two aspects I'm aware of that increase the power of Make:

  • Using $(shell cat list.txt) I can splice the contents of list.txt into the Makefile, reading the contents of list.txt before the dependencies are parsed.
  • Using -include file.d I can include additional rules that are themselves produced by the build system.

It seems every "applicative" build system contains some mechanism for extending its power. I believe some are strictly less powerful than monadic systems, while others may turn out to be an encoding of monadic rules. However, I think that an explicitly monadic definition provides a clearer foundation.

Sunday, June 29, 2014

Optimisation with Continuations

Summary: Continuations are confusing. Here we solve a simple problem (that is at the heart of the Shake build system) using continuations.

Imagine we are given two IO a computations, and want to run them both to completion, returning the first a value as soon as it is produced (let's ignore exceptions). Writing that in Haskell isn't too hard:

parallel :: IO a -> IO a -> IO a
parallel t1 t2 = do
    once <- newOnce
    var <- newEmptyMVar
    forkIO $ t1 >>= once . putMVar var
    forkIO $ t2 >>= once . putMVar var
    readMVar var

We create an empty variable var with newEmptyMVar, fire off two threads with forkIO to run the computations which write their results to var, and finish by reading as soon as a value is available with readMVar. We use a utility newOnce to ensure that only one of the threads calls putMVar, defined as:

newOnce :: IO (IO () -> IO ())
newOnce = do
    run <- newMVar True
    return $ \act -> do
        b <- modifyMVar run $ \b -> return (False, b)
        when b act

Calling newOnce produces a function that given an action will either run it (the first time) or ignore it (every time after). Using newOnce we only call putMVar for the first thread to complete.
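
As a minimal usage sketch (assuming the parallel and newOnce definitions above; the delay and strings are made up):

-- Imports needed by this example and by the definitions above.
import Control.Concurrent
import Control.Monad (when)

main :: IO ()
main = do
    r <- parallel (do threadDelay 1000000; return "slow")
                  (return "fast")
    putStrLn r  -- prints "fast" almost immediately, without waiting for the slow computation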

This solution works, and Shake does something roughly equivalent (but much more complex) in its main scheduler. However, this solution has a drawback - it uses two additional threads. Can we use only one additional thread?

For the problem above, running the computations to completion without retrying, you can't avoid two additional threads. To use only one additional thread and run in parallel you must run one of the operations on the calling thread - but if whatever you run on the additional thread finishes first, there's no way to move the other computation off the calling thread and return immediately. However, we can define:

type C a = (a -> IO ()) -> IO ()

Comparing IO a to C a, instead of returning an a, we get given a function to pass the a to (known as a continuation). We still "give back" the a, but not as a return value; instead we pass it onwards to a function. We assume that the continuation is called exactly once. We can define parallel on C:

parallel :: C a -> C a -> C a
parallel t1 t2 k = do
    once <- newOnce
    forkIO $ t1 (once . k)
    t2 (once . k)

This definition takes the two computations to run (t1 and t2), plus the continuation k. We fork a separate thread to run t1, but run t2 on the calling thread, using only one additional thread. While the parallel function won't return until after t2 completes, subsequent processing using the a value will continue as soon as either finishes.
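
For intuition, here is a tiny hand-written C value and one way to run it (a made-up example, assuming the C type above):

hello :: C String
hello k = do
    putStrLn "computing..."
    k "hello"               -- "return" the value by passing it to the continuation

runHello :: IO ()
runHello = hello putStrLn   -- prints "computing..." then "hello"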

Looking at the transformers package, we see Control.Monad.Trans.Cont contains ContT, which is defined as:

newtype ContT r m a = ContT {runContT :: (a -> m r) -> m r}

If we use r for () and IO for m then we get the same type as C. We can redefine C as:

type C a = ContT () IO a

The changes to parallel just involve wrapping with ContT and unwrapping with runContT:

parallel :: C a -> C a -> C a
parallel t1 t2 = ContT $ \k -> do
    once <- newOnce
    forkIO $ runContT t1 (once . k)
    runContT t2 (once . k)

Now that we've defined our parallel function in terms of C, it is useful to be able to convert between C and IO:

toC :: IO a -> C a
toC = liftIO

fromC :: C a -> IO a
fromC c = do
    var <- newEmptyMVar
    forkIO $ runContT c $ putMVar var
    readMVar var

The toC function is already defined by ContT as liftIO. The fromC function needs to change from calling a callback on any thread, to returning a value on this thread, which we can do with a forkIO and an MVar. Given that parallel on IO takes two additional threads, and parallel on C takes only one, it's not too surprising that converting C back to IO requires an additional thread.
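
Putting the pieces together, a small sketch (the computations are made up) that races two computations through the ContT-based definitions above:

import Control.Concurrent (threadDelay)

demo :: IO String
demo = fromC $ parallel
    (toC $ do threadDelay 1000000; return "slow")
    (toC $ return "fast")
-- fromC forks one thread and parallel forks one more internally;
-- demo returns "fast" as soon as that computation calls its continuation.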

Aren't threads cheap?

Threads in Haskell are very cheap, and many people won't care about one additional thread. However, each thread comes with a stack, which takes memory. The stack starts off small (1Kb) and grows/shrinks in 32Kb chunks, but if it ever exceeds 1Kb, it never goes below 32Kb. For certain tasks (e.g. Shake build rules) some operation will often take a little over 1Kb of stack. Since each active rule (started but not finished) needs to maintain a stack, and huge build systems can have 30K active rules, that adds up to over 1Gb of stack memory. While stacks and threads are cheap, they aren't free.

The plan for Shake

Shake currently has one thread per active rule, and blocks that thread until all dependencies have rebuilt. The plan is to switch to continuations and only have one thread per rule executing in parallel. This change will not require any code changes to Shake-based build systems, and should simply reduce memory usage. Until then, huge build systems may wish to pass +RTS -kc8K, which can save several hundred Mb of memory.

Sunday, June 22, 2014

Announcing ghc-make

Summary: I've released ghc-make, which is an alternative to ghc --make.

I've just released v0.2 of ghc-make (on Hackage, on Github). This package provides an alternative to ghc --make which supports parallel compilation of modules and runs faster when nothing needs compiling. To unpack that:

  • Parallel compilation: Call ghc-make -j4 and your program will build by running up to four ghc -c programs simultaneously. You usually need a parallel factor of 2x-3x to match ghc --make on a single core, since ghc --make does a lot of caching that is unavailable to ghc-make. If you use -j1, or omit the -j flag, the compilation will be based on ghc --make and should take the same time to compile.
  • Faster when nothing needs rebuilding: If ghc --make is slow when there is nothing to rebuild, and most of your executions do no rebuilding, ghc-make will make things go faster. On Windows I have one project where ghc --make takes 23 seconds and ghc-make takes 0.2 seconds (more than 100x faster). Particularly useful for scripts that do ghc --make Main && ./Main.

See the README for full details.

How do I use it?

Install ghc-make (cabal update && cabal install ghc-make). Then replace your calls to ghc my -arguments with ghc-make my -arguments. Almost all arguments and flags supported by ghc are supported by ghc-make - it is intended as a drop-in replacement. Let me know about any bugs on the bug tracker.

To use ghc-make with Cabal, try cabal build --with-ghc=ghc-make --ghc-options=-j4. (This technique is due to the ghc-parmake project, which also does parallel ghc --make compiles.)

How is it implemented?

This program uses the Shake library for dependency tracking and ghc --make for building. The actual ghc-make project itself only contains 4 modules, and the largest of those is the test suite.

To pass options to the underlying Shake build system prefix them with --shake, for example --shake--report=- will write a profile report to stdout and --shake--help will list the available Shake options.

Tuesday, June 03, 2014

Shake file hashes/digests

Summary: Shake can now be configured to check file hashes/digests instead of modification times, which is great if you frequently switch git branches.

Build systems run actions on files, skipping the actions if the files have not changed. An important part of that process involves determining if a file has changed. The Make build system uses modification times to impose an ordering on files, but more modern build systems tend to use the modification time as a proxy for the file contents, where any change indicates the contents have changed (e.g. Shake, Ninja). The alternative approach is to compute a hash/digest of the file contents (e.g. SCons, Redo). As of version 0.13, Shake supports both methods, along with three combinations of them - in this post I'll go through the alternatives, and their advantages/disadvantages.

Modification times rely on the file-system updating a timestamp whenever the file contents are written. Modification time is cheap to query. Saving a file afresh will cause the modification time to change, even if the contents do not - as a result touch causes rebuilds. Unfortunately, working with git branches sometimes modifies a file but leaves it with the same contents, which can result in unnecessary rebuilds (see the bottom of this post for one problematic git workflow).

File digests are computed from the file contents, and accurately reflect if the file contents have changed. There is a remote risk that the file will change without its digest changing, but unless your build system users are actively hostile attackers, that is unlikely. The disadvantage of digests is that they are expensive to compute, requiring a full scan of the file. In particular, after every rule finishes it must scan the file it just built, and on startup the build system must scan all the files. Scanning all the files can cause empty rebuilds to take minutes. When using digests, Shake also records file sizes, since if a file size changes, we know the digest will not match - making most changes cheap to detect.

Modification time and file digests combines the two methods so that a file only rebuilds if both the modification time and digest have changed. The advantage is that for files that have not changed the modification time will cheaply detect that, without ever computing the file hash. If the file has changed modification time, then a digest check may save an expensive rebuild, but even if it doesn't, the cost is likely to be small compared to rerunning the rule.

Modification time and file digests on inputs takes the previous method, but only computes digests for input files. Generated files (e.g. compiled binaries) tend to be large (so digests are expensive to compute) and rarely end up byte-identical after being rebuilt, so they are poor candidates for digests. The file size check means this restriction is unlikely to make a difference when checking all files, but may have some limited impact when building.

Modification time or file digests combines the two methods so that a file rebuilds if either modification time or file digest have changed. I can't think of a sensible reason for using this setting, but maybe someone else can?

Suggestions for Shake users

All these options can be set with the shakeChange field of shakeOptions, or using command line flags such as --digest or --digest-and-input. Switching between some change modes will cause all files to rebuild, so I recommend finding a suitable mode and sticking to it.

  • If you can't trust the modification times to change, use ChangeDigest.
  • If you are using git and multiple branches, use ChangeModtimeAndDigestInput.
  • If you have generated files that rewrite themselves but do not change, I recommend using writeFileChanged when generating the file, but otherwise use ChangeModtimeAndDigest.
  • Otherwise, I currently recommend using ChangeModtime, but some users may prefer ChangeModtimeAndDigest.
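
As a sketch, picking one of these modes from a build script looks something like this (the rule and file names below are made up; the point is the shakeChange field):

import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions{shakeChange=ChangeModtimeAndDigestInput} $ do
    want ["output.txt"]
    "output.txt" *> \out -> do
        src <- readFile' "input.txt"   -- also records input.txt as a dependency
        writeFileChanged out src       -- leaves the file untouched if the contents are identical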

Appendix: The git anti-build-system workflow

Certain common git workflows change files from the current version to an old version, then back again - causing modification-time checks to run redundant rebuilds. As an example, imagine we have two branches foo and bar, based on remote branches origin/foo and origin/bar, both of which are themselves regularly synced to a common origin/master branch. The difference between origin/foo and origin/bar is likely to be small. To switch from an up-to-date bar to an up-to-date foo we can run git checkout foo && git pull. These commands switch to an out-of-date foo, then update it. As a result, any file that has changed since we last updated foo will change to an old version, then change to a new version, likely the same as it was before we started. To avoid redundant rebuilds under this workflow, the build system needs to check file digests.

Tuesday, May 27, 2014

Shake 0.13 released

Summary: Shake 0.13 is out, which contains a few API changes and several new features.

I've just released Shake 0.13. There are several new features, which I'll blog about in more detail over the next few weeks. If you're upgrading:

  • ShakeOptions has additional fields, as it almost always does. Don't pattern match on this type directly, use record updates/selectors. The new member is shakeChange which lets you pick between basing file checking on modification time (the default), file digests, or combinations thereof.
  • shakeReport is now [FilePath] instead of Maybe FilePath. You can now write multiple profiling reports, specify - to output a simplified report on stdout, or files ending with .json to generate JSON output.
  • Shake is replacing **> with |*>, ?>> with &?> and *>> with &*> - although the old operators will be around for a few versions yet. The new operators are hopefully more memorable - they are either OR rules (||), where any one of several patterns is built by the same action, or AND rules (&&), which build multiple files simultaneously, on top of the standard *> and ?> rules (see the sketch after this list).
  • defaultRule is deprecated, and should be replaced with priority 0 . rule. The new priority mechanism allows defining rules at different priorities, which *> takes advantage of, so that now fully explicit matches take precedence over file-pattern matches.
  • Development.Shake.Sys is gone and all system calls are now marked deprecated. Please use cmd or command instead.
  • File times are recorded to higher precision, so files written in a fast loop are now likely to be detected as changing.
  • The Ninja emulation now supports -t compdb, which is useful for CMake.
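
Here is a sketch of the new operators in use (all the file names and rule bodies are invented for illustration):

import Development.Shake

newRules :: Rules ()
newRules = do
    -- OR rule: the same action builds whichever of these patterns is needed
    ["*.upper", "*.lower"] |*> \out -> do
        src <- readFile' "source.txt"
        writeFile' out src
    -- AND rule: a single action produces both files at once
    ["pair.a", "pair.b"] &*> \[a, b] -> do
        writeFile' a "A"
        writeFile' b "B"
    -- priority: an explicit rule can take precedence over the pattern rules
    priority 10 $ "special.upper" *> \out ->
        writeFile' out "SPECIAL"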

I don't expect these changes to hit many users, and all should be fairly localised tweaks.

Thursday, May 15, 2014

Shake as a dependency library

Summary: You can use Shake as a library to implement other build tools.

The Shake build tool is often used to define a specific build system, as an alternative to Make. But Shake is really a library, and can be used to implement other build tools. In this post I'm going to show a rough implementation of the Sake build tool using Shake.

What is Sake?

Extracted from the Sake documentation:

Sake is a way to easily design, share, build, and visualize workflows with intricate interdependencies. Sake is a simple and self-documenting build system, targeted at scientists, data analysts and business teams.

The Sake build rules are defined in YAML, and a simple example is:

create the input:
    help: create the input file
    formula: echo test > input.txt
    output:
        - input.txt
convert to uppercase:
    help: change the input file to uppercase
    dependencies:
        - input.txt
    formula: cat input.txt | tr '[a-z]' '[A-Z]' > output.txt
    output:
        - output.txt

Sake build rules are simple, contain lots of help text, and are quite explicit. I can see why some users would prefer it to Shake or Make (especially as the Sake tool also produces nice visualisations and help information).

Sake on top of Shake

This section contains an implementation of Sake that can execute the file above, along with tests from the Sake repo. I'm going to intersperse the implementation along with some notes. First we give language extensions and imports:

{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative
import Control.Exception
import Development.Shake
import Data.Yaml
import qualified Data.HashMap.Strict as Map
import qualified Data.Vector as Vector
import qualified Data.Text as Text

The interesting imports are Shake (the build system) and Yaml (the parser for YAML files). Our main function loads the Sake YAML file, then defers to Shake:

main = do
    build <- either throw id <$> decodeFileEither "Sakefile.yaml"
    shakeArgs shakeOptions $ elaborate build

We are using shakeArgs to get Shake to provide command line handling for our tool. The interesting part is elaborate, which translates the Sake rules into Shake rules. We define elaborate as:

elaborate (Object x) | Map.member "formula" x = do
    let formula = fromString $ x Map.! "formula"
    let dependencies = map fromString . fromArray <$> Map.lookup "dependencies" x
    let output = map fromString . fromArray <$> Map.lookup "output" x
    let act = do
            maybe alwaysRerun need dependencies
            command_ [] "sh" ["-c",formula]
    case output of
        Nothing -> action act
        Just output -> do want output; output *>> \_ -> act
elaborate (Object x) = mapM_ elaborate $ Map.elems x
elaborate _ = return ()

The first case is the interesting one. We look for formula fields which indicate build rules. We extract out the fields formula, dependencies and output. We then define act which is the action Shake will run:

maybe alwaysRerun need dependencies
command_ [] "sh" ["-c",formula]

If there were no dependencies, we always rerun the rule, otherwise we require the dependencies using need. Next we run the formula command using sh. Then we define the rules:

case output of
    Nothing -> action act
    Just output -> do want output; output *>> \_ -> act

If a Sake rule has no output field, then it is always run, which Shake specifies with action. Otherwise we want the output (since all Sake outputs are always built) and define a rule producing multiple outputs (the *>> function) which runs act. Finally, we have a few helpers to extract the fields from the YAML:

fromString (String x) = Text.unpack x
fromArray (Array x) = Vector.toList x
fromArray Null = []

Note that the full Sake implementation contains additional features and error checking. However, I think it is quite nice that a reimplementation of the basics can be done in only 16 lines of Haskell. The reimplementation also supports several features that the original Sake does not, including profiling, progress reporting and staunch mode.

Conclusions

Shake is capable of implementing other build tools, and can be used as a build system in its own right, or as a library supplying dependency tracking. I believe there is plenty of scope for higher-level build specifications (Cabal is one example), and hope that these tools can delegate their dependency logic to Shake.

Monday, May 05, 2014

Build system performance: Shake vs Ninja

Summary: Ninja is a build system focused on being fast. Some limited benchmarking suggests the Shake build system might be faster.

The Shake build system aims to be both powerful and fast. The Ninja build system describes itself as small with a focus on speed. Now that Shake can run Ninja build files, I benchmarked Shake against Ninja.

The Test

I have been benchmarking Shake vs Ninja as part of the Shake continuous integration tests. My benchmark builds the Ninja source code using their Ninja build file, once from scratch with 3 threads (a full build), then builds again to ensure it is up to date (a zero build). The test is run with both Ninja and Shake, always immediately after a Ninja build (to ensure all files are cached). The average times over 71 runs are:

  • Full build: Ninja takes 6.552s, Shake takes 5.985s. Shake is 0.567s faster.
  • Zero build: Ninja takes 0.007s, Shake takes 0.012s. Ninja is 0.005s faster.

These tests are run on Linux, on a Travis machine. Both the hardware and the loading of the machine are likely to vary over time. I deliberately picked a lower level of parallelism to try and ensure the build was not limited by running too many processes (it does not seem to be). It is now a test failure if Shake is slower for the full build, or if Shake is more than 0.1s slower for the zero build.

A more interesting test would be building something more substantial than Ninja - but choosing a benchmark is hard, and I am limited by the amount of Travis time I can use. It is not clear if Shake will be consistently N seconds faster than Ninja, or N% faster than Ninja, or if this result is an aberration due to the particular choice of benchmark. Shake does not implement the Ninja feature of rebuilding when the command line changes - adding that feature would be unlikely to have any impact on the full build but may slightly slow down the Shake zero build.

Improvements to Shake

When I first started benchmarking Shake vs Ninja, I had reports that Shake was significantly slower - taking around 40% longer to build large projects. As a result I made a number of improvements to Shake:

Improvement 1: --skip-commands

I added the --skip-commands flag and shakeRunCommands option to Shake, which skip running any command operations that have no return results. Provided your build system does not delete temporary files, this lets you build normally, then rerun with --always-make --skip-commands to "run the build" without executing the commands, measuring the overhead of the rest of the build system.

Improvement 2: Makefile parsing

Using --always-make --skip-commands on LLVM via Ninja files, I found the non-command build time was 58s. Profiling showed that most of the time was spent parsing Makefiles, so I wrote optimised routines, available from Development.Shake.Util. These changes reduced the LLVM non-command time to 15s.
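
For reference, parseMakefile is a pure function, so its behaviour is easy to show (a tiny made-up example):

import Development.Shake.Util (parseMakefile)

example :: [(FilePath, [FilePath])]
example = parseMakefile "main.o: main.c main.h\nutil.o: util.c"
-- roughly [("main.o", ["main.c", "main.h"]), ("util.o", ["util.c"])]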

Improvement 3: Filepath normalisation

Further profiling showed that filepath normalisation was now a bottleneck. I responded by both optimising the filepath normalisation (writing a large test suite and correcting several bugs in the process), and removing some redundant normalisations. These changes reduced the LLVM time to 4s, most of which went on file modification time checking.

Improvement 4: Turning off idle garbage collection

By default, programs compiled with GHC run the garbage collector if the Haskell runtime is idle for 0.3s. For Shake, which regularly becomes idle when running commands, the garbage collector ends up competing with the commands it has spawned. I now recommend people turn off the idle garbage collector by compiling with -with-rtsopts=-I0, and I do so for the shake executable.

Improvement 5: --timings

In order to accurately measure where time was going, I added the --timings flag and shakeTimings option. When run with --timings Shake prints out something like:

Start                             0.006s    1%
shakeArgsWith                     0.000s    0%
Function shake                    0.002s    0%
Database read                     0.049s   10%  ===
With database                     0.002s    0%
Running rules                     0.353s   72%  =========================
Pool finished (5 threads, 2 max)  0.000s    0%
Lint checking                     0.033s    6%  ==
Profile report                    0.041s    8%  ==
Total                             0.486s  100%
Build completed in 0:01m

Here we can see which stages are taking most time. For example, reading in the database takes 0.049s at 10% of the time. The = symbols to the right serve as a crude bar plot representing the timings.

Improvement 6: Smaller database

For zero builds I found much of the time was spent reading the database. I changed some of the representation, using smaller Int types and more compact encodings. These changes reduced the database by ~25% and had a small effect on the time to read the database.

Future improvements

For the full build, I beat Ninja, despite originally only aiming for a draw. The build overhead introduced by Shake is 0.029s, of which 0.010s is running the rules. Provided that scales linearly, the cost seems negligible compared to actually performing the build.

For the zero build, I am slower than Ninja. To investigate I measured just running --version with Ninja and Shake. Ninja takes 0.003s and Shake takes 0.004s, so a large portion of the zero-build time is the cost of starting the executable, and is not project specific. Running with --timings I see:

Start                             0.000s    3%  ==                       
shakeArgsWith                     0.000s    7%  =====                    
Ninja parse                       0.001s   16%  ===========              
Function shake                    0.000s   10%  ======                   
Database read                     0.002s   36%  =========================
With database                     0.000s    3%  ==                       
Running rules                     0.001s   20%  =============            
Pool finished (2 threads, 2 max)  0.000s    2%  =                        
Total                             0.004s  100%                           

The largest bottleneck is database reading. Duncan Coutts' recent work on moving the binary library to CBOR is likely to result in a smaller and faster database. I await that work eagerly, and will look at further optimisations after it is available.

Is Shake faster than Ninja?

For building Ninja from scratch, Shake is faster than Ninja (perhaps the Ninja developers should switch to Shake for their development work :P). Another Ninja user benchmarked building the VXL project with both Shake and Ninja and discovered Ninja took 44 mins, while Shake took 41 mins (Shake 3 minutes faster) - but this benchmark was only run a handful of times. I would be interested to hear additional results.