`[[trivial_abi]]` 101

Finally, a blog post on [[trivial_abi]]!

This is a brand-new feature in Clang trunk, new as of about February 2018. It is a vendor extension to the C++ language — it is not standard C++, it isn’t supported by GCC trunk, and there is no active WG21 proposal to add it to the standard C++ language, as far as I know.

Full disclosure: I am totally not involved in the implementation of this feature. I’m just watching its patches go by on the cfe-commits mailing list and applauding quietly to myself. But this is such a cool feature that I think everyone should know about it.

Okay, first of all, since this is a non-standard attribute, Clang trunk doesn’t actually support it under the standard attribute spelling [[trivial_abi]]. Instead, you must spell it in one of the following vendor-specific ways:

  • __attribute__((trivial_abi))
  • __attribute__((__trivial_abi__))
  • [[clang::trivial_abi]]

Also, being an attribute, the compiler will be super picky about where you put it — and passive-aggressively quiet if you accidentally put it in the wrong place (because unrecognized attributes are supposed to be quietly ignored). This is one of those “it’s a feature, not a bug!” situations. So the proper syntax, all in one place, is:

#define TRIVIAL_ABI __attribute__((trivial_abi))

class TRIVIAL_ABI Widget {
    // ...
};
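Because a misspelled or unsupported attribute is ignored without complaint, it can help to put the macro behind a feature test. The following is a minimal sketch, assuming a compiler that provides `__has_cpp_attribute` (recent Clang and GCC do). The names `HAS_TRIVIAL_ABI`, `bump`, and `widget_demo` are made up for illustration. Where the attribute is unavailable, the macro expands to nothing and the class simply keeps its ordinary calling convention.

```cpp
#include <cassert>

// Define TRIVIAL_ABI only when the compiler reports support for the attribute.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#    define HAS_TRIVIAL_ABI 1
#  endif
#endif
// Fallback: the macro expands to nothing when the attribute is unavailable.
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#  define HAS_TRIVIAL_ABI 0
#endif

// The attribute goes between the class-key and the class name.
class TRIVIAL_ABI Widget {
public:
    explicit Widget(int v) : value_(v) {}
    ~Widget() {}  // user-provided, hence non-trivial
    int value() const { return value_; }
private:
    int value_;
};

// Hypothetical helper: takes and returns a Widget by value.
inline Widget bump(Widget w) { return Widget(w.value() + 1); }

// Behaves the same whether or not the attribute is honored;
// only the calling convention differs.
inline void widget_demo() {
    assert(bump(Widget(41)).value() == 42);
}
```

Checking `HAS_TRIVIAL_ABI` at build time is one way to notice that the attribute has quietly disappeared on a different toolchain.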

What is the problem being solved?

Remember my blog post from 2018-04-17 where I showed two versions of a class (there called Integer):

struct Foo {
    int value;
    ~Foo() = default; // trivial
};

struct Bar {
    int value;
    ~Bar() {} // deliberately non-trivial
};

In that post’s particular code snippet, the compiler produced worse codegen for Foo than it did for Bar. This was worth blogging about because it was surprising. Programmers intuitively expect that the “trivial” code will do better than the “non-trivial” code. In most situations, this is true. Specifically, this is true when we go to do a function call or return:

template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}

incr<Foo> compiles into the following code:

leal   1(%rdi), %eax
retq

(leal, “load effective address”, is being used here as x86-speak for “add”.) We can see that our 4-byte obj will be passed in to incr<Foo> in the %edi register; and then we’ll add 1 to its value and return it in %eax. Four bytes in, four bytes out, easy peasy.

Now look at incr<Bar> (the case with the non-trivial destructor).

movl   (%rsi), %eax
addl   $1, %eax
movl   %eax, (%rsi)
movl   %eax, (%rdi)
movq   %rdi, %rax
retq

Here, obj is not being passed in a register, even though it’s the same 4 bytes with all the same semantics. Instead, obj is being passed and returned by address. So our caller has set up some space for the return value and given us a pointer to that space in %rdi; and our caller has given us a pointer to the value of obj in the next argument register %rsi. We fetch the value from (%rsi), add 1 to it, store it back into (%rsi) (so as to update the value of obj itself), and then (trivially) copy the 4 bytes of obj into the return slot pointed to by %rdi. Finally, we copy the caller’s original pointer %rdi into %rax, because the x86-64 ABI document (page 22) says we have to.

The reason Bar behaves so differently from Foo is that Bar has a non-trivial destructor, and the x86-64 ABI document (page 19) says specifically:

If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer […]).

The later Itanium C++ ABI document defines a term of art:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

[…]

A type is considered non-trivial for the purposes of calls if:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.
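The quoted rule can be roughly approximated with standard type traits. This is only a sketch: `approx_nontrivial_for_calls`, `Pinned`, and `check_examples` are hypothetical names, and the traits test whether certain expressions are trivial rather than inspecting the declared constructors, so classes with unusual overload sets or access restrictions may be classified differently from what the ABI actually does.

```cpp
#include <cassert>
#include <type_traits>

// Approximation (an assumption, not the ABI's own definition):
//  - a usable copy or move construction that is not trivial, or
//  - a non-trivial destructor, or
//  - no usable copy or move construction at all.
template<class T>
constexpr bool approx_nontrivial_for_calls() {
    return !std::is_trivially_destructible<T>::value
        || (std::is_copy_constructible<T>::value &&
            !std::is_trivially_copy_constructible<T>::value)
        || (std::is_move_constructible<T>::value &&
            !std::is_trivially_move_constructible<T>::value)
        || (!std::is_copy_constructible<T>::value &&
            !std::is_move_constructible<T>::value);
}

struct Foo {
    int value;
    ~Foo() = default;  // trivial
};

struct Bar {
    int value;
    ~Bar() {}          // deliberately non-trivial
};

// Hypothetical type whose copy and move constructors are all deleted.
struct Pinned {
    int value;
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned(Pinned&&) = delete;
};

inline void check_examples() {
    assert(!approx_nontrivial_for_calls<Foo>());    // eligible for registers
    assert(approx_nontrivial_for_calls<Bar>());     // non-trivial destructor
    assert(approx_nontrivial_for_calls<Pinned>());  // nothing left to copy with
}
```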

So that explains it: Bar gets worse codegen because it is passed by invisible reference. It is passed by invisible reference because of the unfortunate conjunction of two independent premises:

  • the ABI document says that things with non-trivial destructors are passed by invisible reference, and
  • Bar has a non-trivial destructor.

By the way, this is a classical syllogism: the first bullet point above is the major premise, and the second is the minor premise. The conclusion is “Bar is passed by invisible reference.”

Suppose someone presents us with the syllogism

  • All men are mortal.
  • Socrates is a man.
  • Therefore Socrates is mortal.

If we wish to quibble with the conclusion “Socrates is mortal”, we must rebut one of the premises: either rebut the major premise (maybe some men aren’t mortal) or rebut the minor premise (maybe Socrates isn’t a man).

To get Bar to be passed in registers (just like Foo), we must rebut one or the other of our two premises. The standard-C++ way to do it is simply to give Bar a trivial destructor, negating the minor premise. But there is another way!

How [[trivial_abi]] solves the problem

Clang’s new trivial_abi attribute negates the major premise above. Clang extends the ABI document to say essentially the following:

If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.

[…]

A type is considered non-trivial for the purposes of calls if it has not been marked [[trivial_abi]] AND:

  • it has a non-trivial copy constructor, move constructor, or destructor, or
  • all of its copy and move constructors are deleted.

That is, even a class type with a non-trivial move constructor or destructor will be considered trivial for the purposes of calls, if it has been marked by the programmer as [[trivial_abi]].

So now (using Clang trunk) we can go back and write this:

#define TRIVIAL_ABI __attribute__((trivial_abi))

struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {} // deliberately non-trivial
};

and compile incr<Baz>, and we get the same code as incr<Foo>!
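It is worth noting that the attribute changes only how the object travels across a call; the language-level notion of triviality is untouched. A small sketch of that distinction follows, with the feature-test macro and the `baz_demo` helper being my own additions: the standard trait still reports a non-trivial destructor for Baz, whether or not the compiler honors the attribute.

```cpp
#include <cassert>
#include <type_traits>

// Use the attribute where available; otherwise expand to nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

struct TRIVIAL_ABI Baz {
    int value;
    ~Baz() {}  // deliberately non-trivial
};

template<class T>
T incr(T obj) {
    obj.value += 1;
    return obj;
}

// The destructor is user-provided, so the type trait says "non-trivial"
// regardless of the calling convention chosen for Baz.
static_assert(!std::is_trivially_destructible<Baz>::value,
              "trivial_abi does not make the destructor trivial");

inline void baz_demo() {
    Baz b = incr(Baz{41});
    assert(b.value == 42);
}
```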

Caveat #1: [[trivial_abi]] is sometimes a no-op

I would hope that we could make “trivial-for-purposes-of-calls” wrappers around standard library types like this:

template<class T, class D>
struct TRIVIAL_ABI trivial_unique_ptr : std::unique_ptr<T, D> {
    using std::unique_ptr<T, D>::unique_ptr;
};

Unfortunately, this doesn’t work. If your class has any base classes or non-static data members which are themselves “non-trivial for purposes of calls”, then Clang’s extension as currently written will make your class sort of “irreversibly non-trivial” — the attribute will have no effect. (It will not be diagnosed. This means you can use [[trivial_abi]] on a class template such as optional and have it be “conditionally trivial”, which is sometimes a useful feature. The downside, of course, is that you might mark a class trivial and then find out later that the compiler was giving you the silent treatment.)

The attribute will also be silently ignored if your class has virtual bases or virtual member functions. In these cases it probably won’t even fit in a register anyway, and I don’t know what you’re doing passing it around by value, but, just so you know.

So, as far as I know, the only ways to use TRIVIAL_ABI on “standard utility types” such as optional<T>, unique_ptr<T>, and shared_ptr<T> are

  • implement them from scratch yourself and apply the attribute, or
  • break into your local libc++ and apply the attribute by hand there.

(In the open-source world, these are essentially the same thing anyway.)
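To make the first option concrete, here is a minimal sketch of a from-scratch owning pointer carrying the attribute. `scratch_unique_ptr` and `read_and_release` are hypothetical names, the class is nowhere near a full unique_ptr replacement (no deleter, no release or reset), and the feature-test macro is my own addition so that the code still builds where the attribute is missing.

```cpp
#include <cassert>
#include <utility>

// Use the attribute where available; otherwise expand to nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// Hypothetical from-scratch owning pointer; not libc++'s unique_ptr.
template<class T>
class TRIVIAL_ABI scratch_unique_ptr {
    T* ptr_;  // the only member is a raw pointer, which is trivial for calls
public:
    explicit scratch_unique_ptr(T* p = nullptr) noexcept : ptr_(p) {}
    scratch_unique_ptr(scratch_unique_ptr&& rhs) noexcept : ptr_(rhs.ptr_) {
        rhs.ptr_ = nullptr;  // the source gives up ownership
    }
    scratch_unique_ptr& operator=(scratch_unique_ptr&& rhs) noexcept {
        if (this != &rhs) {
            delete ptr_;
            ptr_ = rhs.ptr_;
            rhs.ptr_ = nullptr;
        }
        return *this;
    }
    ~scratch_unique_ptr() { delete ptr_; }

    T* get() const noexcept { return ptr_; }
    T& operator*() const { return *ptr_; }
};

// By-value sink: the parameter owns the pointee and frees it on destruction.
inline int read_and_release(scratch_unique_ptr<int> p) { return *p; }
```

Since the move constructor is declared, the copy operations are implicitly deleted, which keeps the type move-only like the real thing.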

Caveat #2: Destructor responsibility

In our Foo / Bar example, the class had a no-op destructor. Suppose we gave our class a really non-trivial destructor?

struct Up1 {
    int value;
    Up1(Up1&& u) : value(u.value) { u.value = 0; }
    ~Up1() { puts("destroyed"); }
};

This should look familiar; it’s unique_ptr<int> stripped to its bare essentials, and with puts standing in for delete.

Without TRIVIAL_ABI, incr<Up1> looks much like incr<Bar>:

movl   (%rsi), %eax
addl   $1, %eax
movl   %eax, (%rdi)
movl   $0, (%rsi)
movq   %rdi, %rax
retq

With TRIVIAL_ABI added (call that version of the class Up2), incr<Up2> looks much bigger and scarier!

pushq  %rbx
leal   1(%rdi), %ebx
movl   $.L.str, %edi
callq  puts
movl   %ebx, %eax
popq   %rbx
retq

Under the traditional calling convention, types with non-trivial destructors are always passed by invisible reference, which means that the callee (incr in our case) always receives a pointer to a parameter object that it does not own. The caller owns the parameter object. This is what makes copy elision work!

When a type with [[trivial_abi]] is passed in registers, we are essentially making a copy of the parameter object. There is only one return register on x86-64 (handwave), so the callee has no way to give that object back to us when it’s finished. The callee must take ownership of the parameter object we gave it! Which means that the callee must call the destructor of the parameter object when it’s finished with it.

In our previous Foo / Bar / Baz examples, this destructor call was happening, but it was a no-op, so we didn’t notice. Now in incr<Up2> we see the additional code that is produced by a callee-side destructor.

It is conceivable that this extra code could add up, in certain use-cases.

However, counterpoint: this destructor call is not appearing out of nowhere! It is being called in incr because it is not being called in incr’s caller. So in general the costs and benefits might be expected to balance out.
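The shift in responsibility can also be observed from inside the program, as a change in when the parameter’s destructor runs. The sketch below logs events around a by-value call; `Traced`, `callee`, `note`, `run_trace`, and the log itself are all hypothetical names. When the attribute is honored, I would expect the destructor entry to appear right after “callee body”; under the ordinary Itanium convention (or when the macro expands to nothing) I would expect it last, at the end of the caller’s full-expression.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Use the attribute where available; otherwise expand to nothing.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// Global event log shared by the demo functions below.
inline std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

struct TRIVIAL_ABI Traced {
    int value;
    explicit Traced(int v) : value(v) {}
    Traced(Traced&& rhs) noexcept : value(rhs.value) { rhs.value = 0; }
    ~Traced() { event_log().push_back("~Traced"); }
};

inline int callee(Traced t) {
    event_log().push_back("callee body");
    return t.value;
}

inline int note(const char* what) {
    event_log().push_back(what);
    return 0;
}

// One full-expression: the call, then a note sequenced after it by the
// comma operator. Where "~Traced" lands relative to the note depends on
// whether the callee or the caller destroys the parameter.
inline std::vector<std::string> run_trace() {
    event_log().clear();
    static_cast<void>(callee(Traced(7))), note("after call, same full-expression");
    return event_log();
}
```

Printing the result of `run_trace()` with and without the attribute is a cheap way to see which side of the call ended up owning the parameter.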

Relation to “trivially relocatable” / “move-relocates”

None.

As you can see, there is no requirement that a [[trivial_abi]] class type should have any particular semantics for its move constructor, its destructor, or its default constructor. Any given class type will likely be trivially relocatable, simply because most class types are trivially relocatable by accident. But we can easily design a [[trivial_abi]] offset_ptr which is super duper non-trivially relocatable:

template<class T>
class TRIVIAL_ABI trivial_offset_ptr {
    intptr_t value_;
public:
    trivial_offset_ptr(T *p) : value_((const char*)p - (const char*)this) {}
    trivial_offset_ptr(const trivial_offset_ptr& rhs) : value_((const char*)rhs.get() - (const char*)this) {}
    T *get() const { return (T *)((const char *)this + value_); }
    trivial_offset_ptr& operator=(const trivial_offset_ptr& rhs) {
        value_ = ((const char*)rhs.get() - (const char*)this);
        return *this;
    }
    trivial_offset_ptr& operator+=(int diff) {
        value_ += (diff * sizeof (T));
        return *this;
    }
};

int main() {
    int a[10];  // assumed declaration: the snippet uses `a` without showing it
    trivial_offset_ptr<int> top = &a[4];
    top = incr(top);
    assert(top.get() == &a[5]);
}

Here’s the full code.

Clang trunk passes this test at -O0 or -O1, but at -O2 (i.e., as soon as it tries to inline the calls to trivial_offset_ptr::operator+= and the copy constructor) it fails the assertion.

So there’s a caveat there too. If your type is doing something crazy with the this pointer, you probably don’t want to be passing it in registers out on the bleeding edge right now. But IMO this is just a Clang bug, and I think it’ll get fixed pretty soon in the grand scheme of things.

Filed 37319.

Further reading