Zero Cost Abstractions: An Addendum

On Reddit, Icarium-Lifestealer asked me how the disassembly looked if I used -C panic=abort instead of extern "C". Curious, I tried it, and got the following assembly, which is once again exactly the same as the raw pointer C example:

        push    rbx
        mov     rbx, rdi
        cmp     dword ptr [rdi], 43
        jl      .LBB3_2
        mov     rdi, rbx
        call    example::bar
        mov     dword ptr [rbx], 42
.LBB3_2:
        mov     rdi, rbx
        pop     rbx
        jmp     example::baz

Zero cost abstractions: Rust vs C++

I watched a video of a C++ talk from the always excellent Chandler Carruth yesterday, entitled “There are no Zero Cost Abstractions”. In it, he talks about the cost of different abstractions that are labelled zero cost. It’s a really interesting talk and I’d advise people interested in low-level programming to give it a look.

He made the observation that C++’s std::unique_ptr is actually not a zero cost abstraction at runtime, and showed a simple C++ example using raw pointers, then a semantically equivalent one using std::unique_ptr. He then compared the assembly outputs taken from Compiler Explorer.

Now this was really interesting, particularly because Rust boasts about having very similar zero cost abstractions. So I wondered about the cost of the equivalent code in Rust.

Fortunately, Compiler Explorer supports Rust as well, so we can compare them!

So for reference, here are the C++ examples that Chandler included in his talk:

Raw pointer version:
void bar(int* ptr);
void baz(int *ptr);

void foo(int *ptr) {
    if (*ptr > 42) {
        bar(ptr);
        *ptr = 42;
    }
    baz(ptr);
}
With std::unique_ptr
#include <memory>

using namespace std;

void bar(int* ptr) noexcept;
void baz(unique_ptr<int> &&ptr) noexcept;

void foo(unique_ptr<int> ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr.get());
        *ptr = 42;
    }
    baz(std::move(ptr));
}

And here are the x64 assembly language outputs. I’ve compiled using clang 10.0 and -O2 for optimisation.

As not everyone is familiar with the beauty of x64 assembly language, I’ve annotated the output.

Raw pointer version:
        push    rbx                      # register preservation for the calling convention
        mov     rbx, rdi                  
        cmp     dword ptr [rdi], 43      # if (*ptr > 42)
        jl      .LBB0_2                  # 
        mov     rdi, rbx                 # calling convention set up
        call    bar(int*)
        mov     dword ptr [rbx], 42      # *ptr = 42
.LBB0_2:
        mov     rdi, rbx                 # reverse out the register/stack preservation
        pop     rbx
        jmp     baz(int*)                # TAILCALL
With std::unique_ptr
        push    rbx                      # register preservation for the calling convention
        mov     rbx, rdi                
        mov     rdi, qword ptr [rdi]     # what's this? we are dereferencing the ptr in rdi!
        cmp     dword ptr [rdi], 43      # if (*ptr > 42)
        jl      .LBB0_2
        call    bar(int*)                
        mov     rax, qword ptr [rbx]     # another dereference!
        mov     dword ptr [rax], 42      # *ptr = 42
.LBB0_2:
        mov     rdi, rbx                 # reverse out the register/stack preservation
        pop     rbx
        jmp     baz(unique_ptr<int>&&)   # TAILCALL

Now we have two extra fetches from memory in the std::unique_ptr version. I’m not going to go into detail about the reasons behind them, as Chandler does this expertly in the talk with multiple revisions of the code, and I think you should all watch it – the rest of the talk is highly relevant even for non C++ programmers.

Now std::unique_ptr<T> is equivalent to Rust’s Box<T>. Rust’s move semantics and ownership model enforce the same behaviour, actually even more strongly than C++ does: a moved-from unique_ptr in C++ is still accessible (it is just null), whereas using a moved-from Box in Rust is a compile error.

So my first pass at a Rust equivalent of the unique_ptr example was as follows:

#[inline(never)]
fn bar(ptr: &i32) {
    print!("{}", ptr);
}

#[inline(never)]
fn baz(ptr: Box<i32>) {
    print!("{}", ptr);
}

pub fn foo(mut ptr: Box<i32>) {
    if *ptr > 42 {
        bar(&*ptr);
        *ptr = 42;
    }
    baz(ptr)
}

In Rust, we can’t get away with a forward reference to a function in another translation unit, so we define bar and baz trivially and mark them as non-inlinable, so that the compiler will generate calls to them instead of trying to be clever. We aren’t trying to check how the compiler inlines code, and we want this to be a fair comparison.

So what does the generated output look like? Well, it’s slightly disappointing:

        push    r14                  # Lots of calling convention set up
        push    rbx
        push    rax
        mov     rbx, rdi
        cmp     dword ptr [rdi], 43  # if *ptr > 42
        jl      .LBB5_3
        mov     rdi, rbx             # calling convention set up
        call    example::bar
        mov     dword ptr [rbx], 42
.LBB5_3:
        mov     rdi, rbx             # Lots of calling convention unwind too
        add     rsp, 8
        pop     rbx
        pop     r14
        jmp     example::baz
        mov     r14, rax             # This is panic unwinding code.
        mov     rdi, rbx             # The jmp above means this is never reached!
        call    alloc::alloc::box_free
        mov     rdi, r14
        call    _Unwind_Resume@PLT
        ud2

Rust’s calling convention means there is a lot more setup. Of particular note is the panic unwind. LLVM performs a tail call optimisation, so instead of a call, it does its cleanup and then jumps. That means that when baz returns, it returns directly to foo’s caller rather than to foo. It’s a neat optimisation that saves on stack space and instructions. However, it also means that all the code after the jmp is unreachable. This looks like a bug, albeit not a serious one, either in LLVM or in the rustc backend.

So we can make the comparison fairer by using the C calling convention in Rust. extern "C" functions need no panic unwinding (as C functions can’t panic), and the setup and tear down should be the same as for the C examples above, so this really would be a like for like comparison.

Here is the code:

#[inline(never)]
extern "C" fn bar(ptr: &i32) {
    print!("{}", ptr);
}

#[inline(never)]
extern "C" fn baz(ptr: Box<i32>) {
    print!("{}", ptr);
}

pub extern "C" fn foo(mut ptr: Box<i32>) {
    if *ptr > 42 {
        bar(&*ptr);
        *ptr = 42;
    }
    baz(ptr)
}

and the output:

        push    rbx
        mov     rbx, rdi
        cmp     dword ptr [rdi], 43
        jl      .LBB4_2
        mov     rdi, rbx
        call    example::bar
        mov     dword ptr [rbx], 42
.LBB4_2:
        mov     rdi, rbx
        pop     rbx
        jmp     example::baz

This is identical to the raw pointer C example. Rust’s compiler-enforced semantics enable it to better optimise code without you even thinking about it. This is why I’ve come to love Rust – whilst C++ is too entrenched to ever disappear, I strongly believe that Rust is becoming a much better alternative in many of the areas where the only alternative to C++ used to be pure C. There is a long way to go yet, but it’s already very impressive.

As an aside, and true to Chandler’s talk – this is not zero cost, we pay for this performance at compile time. Rust is contending with C++ to be the slowest language to compile, although there is a lot of work going on to optimise it.

Anyway check out Rust and check out Chandler Carruth’s talk!

Edit: The comment on zero cost generated a bit of controversy on Reddit – and I think, looking back, that’s because it was poorly worded. I’m not attempting to hijack, change or rubbish the use of the term “zero cost abstraction”, merely to observe that we pay for that performance elsewhere. With the power of hindsight I’d have written something like: “True to Chandler’s talk, and like all zero cost abstractions, this is only zero runtime cost – we pay for this performance at compile time.”

Colour Schemes

Just a short post about a simple web app I’ve been using a fair bit recently. When I’m building my own websites (or apps), I like them to look at least a bit pretty. Now, I’m no designer, and my ability to choose an elegant colour scheme is, at best, limited. But don’t worry! There’s an app for that! In this case, the badly spelled but very useful coolors.co. It generates a random scheme of 5 colours that work well together, every time you press space. Not only that, you can tinker with it and regenerate the bits you want to change, as well as get a range of different tints for your chosen colours. Simple, well done and really useful. They really do need to work on their spelling though.

Umm javascript is okay… I guess…

People who know me are probably aware that I have, over the years, been fairly scathing about JavaScript as a serious programming language. I’ve been mostly a classical desktop and back end developer for a large part of my career, and as such have avoided web development. A large reason for this was playing with JavaScript and web development in the early 2000s. It wasn’t pretty: a mess of functions, no types anywhere, a far cry from the elegance of Borland Delphi, which was my favourite language of the time, or even VB6, which I was using at work and which, if nothing else, had a kick-ass IDE (there really was nothing else).

For the last two years I’ve been working on the next generation version of our software at work, which, like all enterprise software these days, is entirely web based. Our front end stack is AngularJS, Node, and TypeScript, which has necessitated me learning TypeScript and, by proxy, becoming au fait with modern JavaScript.

My opinion of JavaScript in 2018? It’s actually pretty good. The prototypal inheritance system is both unusual and useful. I really like the way you can chain the Boolean operators together for conditional and lazy evaluation of expressions. Every JavaScript object is a dictionary that can be arbitrarily extended. And JavaScript is a first class functional programming language.
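To illustrate what I mean, here is a small sketch of my own (hypothetical code, written in TypeScript syntax since that’s what I use day to day):

// Illustrative sketch only – the names here are made up.
// && and || short-circuit and return an operand rather than a plain
// boolean, so they double up as conditional and lazy evaluation.
const config: any = { retries: 3 };
const retries = (config && config.retries) || 5; // 3 – config is truthy
const verbose = config.verbose || false;         // false – property missing

// Every object is a dictionary, so we can bolt on properties at runtime.
config.timeout = 30;

// Functions are first class values: pass and return them like anything else.
const twice = (f: (n: number) => number) => (n: number) => f(f(n));
console.log(retries, verbose, config.timeout, twice((n) => n + 1)(0)); // 3 false 30 2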

A big part of my new-found admiration for JavaScript is the ES2015 syntax (well, not so new now…). The class syntax tidies up some of the odder parts of the object creation syntax, and arrow functions both simplify creating functions and get around some of the quirkier behaviour of the “this” parameter.
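A quick sketch of both features together (again a hypothetical example, not from our codebase):

// Illustrative sketch only.
class Counter {
    private count = 0;

    start() {
        // The arrow function captures the lexical "this", so the callback
        // still sees this Counter instance. An old-style function
        // expression would get its own "this" and break here.
        setInterval(() => {
            this.count++;
            console.log(this.count);
        }, 1000);
    }
}

new Counter().start();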

TypeScript further embellishes JavaScript with a very decent stab at a type system, as well as allowing you to use the latest JavaScript features and transpile down to more widely supported JavaScript.
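For a flavour of the sort of thing the type system catches (a hypothetical sketch of my own), shape mismatches and even the == footguns I mention below become compile errors rather than runtime surprises:

// Illustrative sketch only – User and greet are made-up names.
interface User {
    name: string;
    age: number;
}

function greet(user: User): string {
    return `Hello ${user.name}, you are ${user.age}`;
}

greet({ name: "Ada", age: 36 }); // fine
// greet({ name: "Ada" });       // compile error: "age" is missing
// greet("Ada");                 // compile error: a string is not a User

const age: number = 36;
// if (age == "36") {}           // compile error: number and string never overlap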

It’s safe to say I’m a convert to the expressiveness of JavaScript, with or without TypeScript. Yes, there are horrible bits: the implicit type conversions on ==, automatic semicolon insertion, everything in non-strict mode. But you can train yourself to stay out of these holes with very little effort, and as a programmer, learning the ways people use this quirky language can open your eyes to a whole new way of writing code. That can only be a good thing.

Learning a new language

Something I haven’t really done for a long time – I’m learning a new programming language – D (www.dlang.org). I’m very impressed so far – it’s got a nice standard library that includes a number of things missing from the C++ standard library, and the language design is cleaner than C++ while still providing most if not all of the C++ benefits. I’ll post more when I get past hello world…

Angular JS – Makes Web Dev fun?

Now I’m a proponent of native software over web, and have been for a long time. My reasons have long been straightforward.
1. The user experience is much better.
2. The dev frameworks and experience are much better.
3. I don’t have to soil my hands with JavaScript on the desktop.

Points 1 and 2 are regularly eroded, to the point where it’s becoming interesting to me again. Point 3 is, alas, going in the wrong direction, with JavaScript now the Lingua Franca of the web. It could have been VBScript, so we should be thankful for small mercies.

I decided to rewrite an old Silverlight app in HTML5 this weekend. This was partly due to a jab from someone at work (it’s an internally facing wiki I wrote a few years ago), and partly because it was an excuse to try out Angular JS – an MVVM framework for HTML5 app dev. I’m seriously impressed. It’s a really smart piece of work, and it moves all the templating you’d normally do server side with frameworks like Razor into the browser.
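To give a flavour (a cut-down hypothetical sketch, not my actual wiki code): the markup declares the repetition and binding, and the script just supplies the data.

// In the page (shown here as a comment):
//   <div ng-app="wikiApp" ng-controller="PageListCtrl">
//     <ul><li ng-repeat="page in pages">{{page.title}}</li></ul>
//   </div>

declare const angular: any; // supplied by the AngularJS script tag

angular.module("wikiApp", [])
    .controller("PageListCtrl", function ($scope: any) {
        // The template above re-renders automatically when this changes.
        $scope.pages = [
            { title: "Home" },
            { title: "Release notes" },
        ];
    });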

The best thing about it is how little JavaScript I needed to write to do it. I rewrote my Silverlight app, front end and back, in less time and with much less code, and the result is much more responsive. The downside is that most people at work use primitive, outdated browsers like IE8, which my beautiful flexbox layouts will not tolerate. So before release I may need to wire in some feature sniffing and write some facetious comments for people using backwards browsers.

When more memory is less

Quick question: when can increasing the size of your data structures lead to less memory being allocated?

While you consider your answer to the conundrum, I’ll lay out the scenario for mine.

I have a Lexer I’ve written in C# for an expression evaluation language. A Lexer (more properly a lexical analyzer, sometimes called a tokeniser) takes a string of text and splits it into tokens which can be usefully consumed by a parser to do something cool.
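If you haven’t met one before, here is a toy illustration of the idea – in TypeScript rather than the C# of the real thing, and much simplified:

// Illustrative sketch only; the real lexer is C# and far more involved.
type TokenType = "number" | "operator" | "identifier";

interface Token {
    text: string;
    position: number;
    type: TokenType;
}

function lex(input: string): Token[] {
    const tokens: Token[] = [];
    // One regex alternative per token type, tried in order.
    const re = /(\d+)|([+\-*\/])|([A-Za-z_]\w*)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(input)) !== null) {
        const type: TokenType = m[1] ? "number" : m[2] ? "operator" : "identifier";
        tokens.push({ text: m[0], position: m.index, type });
    }
    return tokens;
}

console.log(lex("price * 2 + 10"));
// [ { text: "price", position: 0, type: "identifier" }, ... ]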

Now the Lex function on my Lexer class takes as its argument a single string, and returns an IEnumerable<Token>.
Tokens look like this:

struct Token
{
  public string Text;
  public int Position;
  public TokenType Type;
}

Text is the part of the original expression which this token represents – I take a substring of the original string.

Position is where the token occurs in the original string – used for producing useful error messages when parsing fails.

TokenType is an enum which specifies what sort of token it is. The most functionally important part of the lexer’s output, but you can ignore it for the purposes of this discussion.

Now I’m parsing 2 large text files with this lexer (around 35MB of data). Memory requirements for this and the other data structures involved are just under 500MB – so non-trivial.

When looking at performance I noticed the huge number of memory allocations being done by the lexer. Now, I’m parsing over 300,000 expressions with this lexer, so there are well over four million tokens being generated. The tokens are kept alive as long as the files are open in the app, as I use them for more than just parsing. That means I have over four million strings allocated. Thinking about it that way, I decided to change my implementation of Token to the following:

struct Token
{
   public string Expression;
   public int Position;
   public int Length;
   public TokenType Type;
   public string Text { get { return Expression.Substring(Position, Length); } }
}

I’ve renamed the Text field to Expression and added a Length field. Notice that Text is now a property with a custom getter and no backing storage, and hence doesn’t take up memory. The total net change in size is a 4 byte increase.

I refactored my code to store the entire expression in the Expression field and the length of the token in the Length field. The Text property gives the same result as before, but computes its value from the other information in the token.

So my four million tokens now take up an extra 16MB of space, right?

Actually the program posted a 90MB decrease in memory usage. That’s nearly a full 20% less. How come?

This optimisation is all about strings, and relies on two properties of strings – immutability and “referenceness”.

Strings are immutable reference objects. This means you are really just passing around pointers to memory that doesn’t change. Once it’s been allocated it doesn’t change until it gets collected by the garbage collector. So I can have lots of copies of that string without ever needing to allocate more memory. If the string is changed somewhere, a new string is allocated and used instead, preserving the original, so all other references are safe.

So I re-use my already allocated expression string instead of allocating a new one. The net result is four million fewer allocations (a 10% speed boost) and 90MB less memory allocated. Why 90MB for 35MB of text data? Well, my text data is stored in UTF-8, whereas internally a .NET string is encoded as UTF-16, so the strings are approximately twice as big in memory – about 70MB. Each string also carries an int length indicator, which is another 4 bytes per string, and across four million strings that adds another 16MB. So that’s about 86MB, which is exactly the same as 90MB when you are writing a blog post.

Hello, WordPress

My first wordpress blog.

This is the first thing I’ve used my website for in several years, and my third blog (I had a brief but ill-fated one on Blogger a few years ago, and I have started a private work blog as well).

I’m intending to comment on code, interesting developments in the world of proper coding (proper in this case being a synonym for C++), occasionally other inferior languages, and things that take my fancy from the world of geekery.