{"id":24,"date":"2020-05-24T19:54:18","date_gmt":"2020-05-24T19:54:18","guid":{"rendered":"http:\/\/www.rottedfrog.co.uk\/?p=24"},"modified":"2020-05-27T07:58:08","modified_gmt":"2020-05-27T07:58:08","slug":"zero-cost-abstractions-rust-vs-c","status":"publish","type":"post","link":"https:\/\/www.rottedfrog.co.uk\/?p=24","title":{"rendered":"Zero cost abstractions: Rust vs C++"},"content":{"rendered":"\n<p>I watched a video of a C++ talk from the always excellent Chandler Carruth yesterday, entitled &#8220;<a rel=\"noreferrer noopener\" href=\"https:\/\/www.youtube.com\/watch?v=rHIkrotSwcc&amp;list=LLkSsu_JvG53vwdMNEDCLLqQ&amp;index=10\" target=\"_blank\">There are no Zero Cost Abstractions<\/a>&#8220;. In it, he talks about the cost of different abstractions that are labelled zero cost. It&#8217;s a really interesting talk and I&#8217;d advise people interested in low-level programming to give it a look.<\/p>\n\n\n\n<p>He made an observation that C++&#8217;s <code>std::unique_ptr<\/code> is actually not a zero cost abstraction at runtime, and showed a simple C++ example using raw pointers, and then a semantically equivalent one using <code>std::unique_ptr<\/code>. Then he compared the assembly outputs taken from compiler explorer.<\/p>\n\n\n\n<p>Now this was really interesting, and particularly because Rust boasts about having very similar zero cost abstractions. So I wondered about the cost of the equivalent code in Rust.<\/p>\n\n\n\n<p>Fortunately compiler explorer supports Rust as well, so we can compare them!<\/p>\n\n\n\n<p>So for reference, here are the C++ examples that Chandler included in his talk:<\/p>\n\n\n\n<h6>Raw pointer version:<\/h6>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"cpp\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">void bar(int* ptr);\nvoid baz(int *ptr);\n\nvoid foo(int *ptr) {\n    if (*ptr > 42) {\n        bar(ptr);\n        *ptr = 42;\n    }\n    baz(ptr);\n}<\/pre>\n\n\n\n<h6>With <code>std::unique_ptr<\/code><\/h6>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"cpp\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">#include &lt;memory>\n\nusing namespace std;\n\nvoid bar(int* ptr) noexcept;\nvoid baz(unique_ptr&lt;int> &amp;&amp;ptr) noexcept;\n\nvoid foo(unique_ptr&lt;int> ptr) noexcept {\n    if (*ptr > 42) {\n        bar(ptr.get());\n        *ptr = 42;\n    }\n    baz(std::move(ptr));\n}<\/pre>\n\n\n\n<p>And here are the x64 assembly language outputs. I&#8217;ve compiled using clang 10.0 and -O2 for optimisation.<\/p>\n\n\n\n<p>As not everyone is familiar with the beauty of x64 assembly language, I&#8217;ve annotated the the output.<\/p>\n\n\n\n<h6>Raw pointer version:<\/h6>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"asm\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">        push    rbx                      # register preservation for the calling convention\n        mov     rbx, rdi                  \n        cmp     dword ptr [rdi], 43      # if (ptr > 42)\n        jl      .LBB0_2                  # \n        mov     rdi, rbx                 # calling convention set up\n        call    bar(int*)\n        mov     dword ptr [rbx], 42      # *ptr = 42\n.LBB0_2:\n        mov     rdi, rbx                 # reverse out the register\/stack preservation\n        pop     rbx\n        jmp     baz(int*)                # TAILCALL<\/pre>\n\n\n\n<h6>With <code>std::unique_ptr<\/code><\/h6>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"asm\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">        push    rbx                      # register preservation for the calling convention\n        mov     rbx, rdi                \n        mov     rdi, qword ptr [rdi]     # what's this? we are dereferencing the ptr in rdi!\n        cmp     dword ptr [rdi], 43      # if (ptr > 42)\n        jl      .LBB0_2\n        call    bar(int*)                \n        mov     rax, qword ptr [rbx]     # another dereference!\n        mov     dword ptr [rax], 42      # *ptr = 42\n.LBB0_2:\n        mov     rdi, rbx                 # reverse out the register\/stack preservation\n        pop     rbx\n        jmp     baz(unique_ptr&lt;int>&amp;&amp;)   # TAILCALL<\/pre>\n\n\n\n<p>Now we have two extra fetches from memory in the <code>std::unique_ptr<\/code> version. I&#8217;m not going to go into detail for the reason behind them, as Chandler does this expertly in the talk with multiple revisions of the code, and I think you should all watch it, as the rest of the talk is highly relevant for even non C++ programmers.<\/p>\n\n\n\n<p>Now <code>std::unique_ptr&lt;T&gt;<\/code> is equivalent to Rust&#8217;s <code>Box&lt;T&gt;<\/code>. Rust&#8217;s move semantics and ownership model enforce the same behaviour (actually even more strongly than C++ does).<\/p>\n\n\n\n<p>So my first pass at a rust equivalent of the unique_ptr example was as follows:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"rust\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">#[inline(never)]\nfn bar(ptr: &amp;i32) {\n    print!(\"{}\", ptr);\n}\n\n#[inline(never)]\nfn baz(ptr: Box&lt;i32>) {\n    print!(\"{}\", ptr);\n}\n\npub fn foo(mut ptr: Box&lt;i32>) {\n    if *ptr > 42 {\n        bar(&amp;*ptr);\n        *ptr = 42;\n    }\n    baz(ptr)\n}<\/pre>\n\n\n\n<p>In Rust, we can&#8217;t get away with a forward reference to a function in another translation unit, so we define <code>bar<\/code> and <code>baz<\/code> trivially, and mark them as non-inlinable, so that the compiler will generate calls to them instead of trying to be clever &#8211; we aren&#8217;t trying to check how the compiler inlines code and we want it to be a fair comparison.<\/p>\n\n\n\n<p>So what does the generated output look like? Well slightly disappointing:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"asm\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">        push    r14                  # Lots of calling convention set up\n        push    rbx\n        push    rax\n        mov     rbx, rdi\n        cmp     dword ptr [rdi], 43  # if *ptr > 42\n        jl      .LBB5_3\n        mov     rdi, rbx             # calling convention set up\n        call    example::bar\n        mov     dword ptr [rbx], 42\n.LBB5_3:\n        mov     rdi, rbx             # Lots of calling convention unwind too\n        add     rsp, 8\n        pop     rbx\n        pop     r14\n        jmp     example::baz\n        mov     r14, rax             # This is panic unwinding code.\n        mov     rdi, rbx             # The jmp above means this is never reached!\n        call    alloc::alloc::box_free\n        mov     rdi, r14\n        call    _Unwind_Resume@PLT\n        ud2<\/pre>\n\n\n\n<p>Rust&#8217;s calling convention means there is a lot more setup. Of particular note was the<br>\npanic unwind. LLVM does a tail call optimisation, so instead of a <code>call<\/code>, it does it&#8217;s <br>\ncleanup and a jump. That means that when <code>baz<\/code> returns, it returns directly to foo&#8217;s caller<br>\nrather than foo. It&#8217;s a neat optimisation and saves on stack space and instruction calls.<br>\nHowever, that does mean that all the code after that is unused. This looks like a bug, albeit<br>\nnot a serious one, either in LLVM or in the rustc backend.<\/p>\n\n\n\n<p>So we can make the comparison more fair, by using the C calling convention in Rust to compare like for like. <code>extern \"C\"<\/code> functions need no panic unwinding (as C functions can&#8217;t panic) and the setup and tear down should be the same as for the C examples above, so this really would be a like for like comparison.<\/p>\n\n\n\n<p>Here is the code:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"rust\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">#[inline(never)]\nextern \"C\" fn bar(ptr: &amp;i32) {\n    print!(\"{}\", ptr);\n}\n\n#[inline(never)]\nextern \"C\" fn baz(ptr: Box&lt;i32>) {\n    print!(\"{}\", ptr);\n}\n\npub extern \"C\" fn foo(mut ptr: Box&lt;i32>) {\n    if *ptr > 42 {\n        bar(&amp;*ptr);\n        *ptr = 42;\n    }\n    baz(ptr)\n}<\/pre>\n\n\n\n<p>and the output:<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"asm\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\">        push    rbx\n        mov     rbx, rdi\n        cmp     dword ptr [rdi], 43\n        jl      .LBB4_2\n        mov     rdi, rbx\n        call    example::bar\n        mov     dword ptr [rbx], 42\n.LBB4_2:\n        mov     rdi, rbx\n        pop     rbx\n        jmp     example::baz<\/pre>\n\n\n\n<p>This is identical to the C raw pointer example. Rust&#8217;s compiler enforced semantics enable it to better optimise code without you even thinking about it. This is why I&#8217;ve come to love Rust &#8211; whilst C++ is too entrenched to ever disappear, I strongly believe that Rust is becoming a much better alternative to C++ in many areas where the only alternative to C++ was pure C. There is a long way to go yet, but it&#8217;s already very impressive.<\/p>\n\n\n\n<p>As an aside, and true to Chandler&#8217;s talk &#8211; this is not zero cost, we pay for this performance at compile time. Rust is contending with C++ to be the slowest language to compile, although there is a lot of work going on to optimise it.<\/p>\n\n\n\n<p>Anyway check out Rust and check out Chandler Carruth&#8217;s talk!<\/p>\n\n\n\n<p><strong>Edit:<\/strong> The comment on zero cost generated a bit of controversy on Reddit &#8211; and I think looking back it&#8217;s because it was poorly worded. I&#8217;m not attempting to hijack, change or rubbish the use of the term &#8220;zero cost abstraction&#8221;, merely observe that we pay for that performance elsewhere. With the power of hindsight I&#8217;d have written something like: &#8220;True to Chandler&#8217;s talk, and like all zero cost abstractions, this is only zero runtime cost &#8211; we pay for this performance at compile time.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I watched a video of a C++ talk from the always excellent Chandler Carruth yesterday, entitled &#8220;There are no Zero Cost Abstractions&#8220;. In it, he talks about the cost of different abstractions that are labelled zero cost. It&#8217;s a really interesting talk and I&#8217;d advise people interested in low-level programming to give it a look. &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.rottedfrog.co.uk\/?p=24\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Zero cost abstractions: Rust vs C++&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/24"}],"collection":[{"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=24"}],"version-history":[{"count":8,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/24\/revisions"}],"predecessor-version":[{"id":50,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=\/wp\/v2\/posts\/24\/revisions\/50"}],"wp:attachment":[{"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=24"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=24"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rottedfrog.co.uk\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=24"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}