Ayao "Alqualos" Kuroyuki (ayao) wrote,

Calling virtual functions from a C++ constructor, directly or indirectly

Home is definitely better. The weather is nice, and there isn't nearly as much snow as we were threatened with.

Well, anyway, hisashiburi ni an article about programming wo yarimashou. (N.B.: the ЕРЖ confirm that not a single Hebrew word was used in the previous sentence. Hurray!)

First, an example:
#include <stdio.h>

class A {
  private:
    int x;
  protected:
    virtual int generateX();
  public:
    A(int x);
    inline int getX() {return x;}
};

int A::generateX() {
  return 1;
}

A::A(int x):
  x(x) {} 

class B: public A {
  protected:
    virtual int generateX();
  public:
    B();
};

int B::generateX() {
  return 2;
}

B::B():
  A(generateX()) {
}

class C: public B {
  protected:
    virtual int generateX();
};

int C::generateX() {
  return 3;
}

int main() {
  C c;
  printf("%d\n", c.getX());
  return 0;
}

Now, I wonder, what would that print?

Depending on your experience with C++, your answer may be either 3 or 2. I can't imagine why one would think it is 1. And indeed, it is 2 (at least with GCC). But why the hell two? We are calling a virtual function, and we are creating an object of class C, so it should be 3, no? No. The reason is that the object of class C isn't created yet at that point. Indeed, we know that constructors are executed from the parent to the descendants, so while we are still inside the class B constructor, the C constructor hasn't been executed yet. And what would be the point of calling a class C virtual function on an object whose class C fields aren't initialized? It could try to access some of them, and that would be undefined behaviour. In C# or Java, the class C function would be called, though. Not that it makes any more sense in those languages; it just makes it a really bad practice to call any virtual function (for Java, that means any non-static, non-final, non-private method) in the constructor. But it's not a good idea to do that in C++ either, as we'll see very soon.

I mentioned that constructors are executed from the parent to the descendant, right? I lied. They are executed from the descendant to the parent, then back to the descendant. What the hell? Let's see:

1. We create an object of class C, so its constructor (the compiler-generated default one) is called.
2. The first thing it does is call the B default constructor.
3. The B default constructor calls the constructor of the class A... no, wait! It needs an argument, so the first thing it does is evaluate that argument!
4. Now, it calls the constructor of class A.
5. When the A constructor is done, it returns back to the B constructor.
6. The B constructor executes its initializer list and body. Both are empty in this case, though.
7. The same goes for the C constructor as it is compiler-generated.

Now guess when the vtable pointer is initialized for each class? All three classes have different vtables, and each constructor sets the object's vtable pointer to the one for its own class. Since in the end it comes down to the parent-to-child execution order, we end up with a full-fledged class C object with the correct vtable pointer. But in the process, it changes from A to B to C. Now, the question is, when exactly does that happen? It makes no sense to set the vtable pointer before the call to the parent constructor, because the parent constructor would overwrite it and we would end up with a pointer to the class A vtable. So it is done right after the call to the parent constructor returns. Only when there is no parent is it the first thing the constructor does.

But wait, what's that? In the above example, it means that the vtable pointer is first set (to the A vtable) at the very beginning of step 4. Which means that it's still uninitialized at step 3, when the argument to the A constructor is evaluated. But in order to evaluate that argument, the virtual generateX() is called! How can a virtual function be called with no vtable? Well, it turns out that the compiler knows the special rules for virtual function calls from a constructor, so it just calls it as if it were a normal (non-virtual) function. I have tested this with GCC, checking the assembly output, and it works. The C++ standard doesn't promise this, though: calling a member function of the object under construction from a ctor-initializer, before the base class initializers have completed, is undefined behaviour (see [class.base.init]). So what GCC does here is just one possible outcome.

So, can we at least count on this in practice and just treat virtual functions as regular ones during initialization? Unfortunately, no. Take this:
// ...

class B: public A {
  private:
    int doGenerateX();
  protected:
    virtual int generateX();
  public:
    B();
};

// ...

int B::doGenerateX() {
  return generateX();
}

B::B():
  A(doGenerateX()) {
}

// ...

Now, if you have carefully read all that complicated stuff I wrote above, you can guess what will happen. As the compiler has no idea who may call doGenerateX(), it can't possibly generate a regular call to generateX() inside it. This means a full-fledged virtual call will be generated, through a vtable pointer that hasn't been set yet. So the last example will most probably crash, although strictly speaking it is undefined behaviour and anything may happen.

But what if the compiler initialized the vtable pointer twice in the constructor? First before the parent constructor call, even before its arguments are evaluated, and then again after that call, to restore it from whatever the parent constructor set it to. Suppose this behaviour were required by the standard and supported by every compiler. Would we then be safe in the above example? The answer is yes, but it would make absolutely no sense. Think about it: when we call a function before the parent constructor call, basically every field in the object is uninitialized. Now, what would be the point of calling a non-static function when the fields contain only garbage?

The final question: is it safe to call a virtual function after the call to the parent constructor? From the technical point of view, yes, it is. But what about OOP philosophy? Does it make much sense to perform an action that is supposed to be polymorphic in a non-polymorphic way? Most probably not.

The conclusion: do not call virtual functions from the constructor. Especially, do not call them before the base class constructor is called. And most importantly, do not do that in an indirect way. And if you need to call a function before the call to the parent constructor, then it can't access any fields as none of them are initialized yet. And if it doesn't access any field, why not just make it static and save yourself from a lot of trouble?
Tags: it