10 February 2013

A little about virtual functions

   Virtual functions are the most common way to support polymorphism, one of the three core concepts of object-oriented programming. So what is polymorphism? Wikipedia says the following: polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. That is correct, but I would rather put it another way: polymorphism is a way to treat different objects the same way without worrying about their exact types.
    So how is polymorphism supported with virtual functions? To answer this, we should know what a virtual function is and how it differs from an ordinary function. When talking about virtual functions we consider only non-static member functions of a class. Global functions and static member functions cannot be virtual, as they are not called on an object. Suppose we have a class with two member functions:

class OrdinaryClass
{
public:
       void func1();
       void func2();
};

To declare a function virtual we add the keyword virtual to its declaration. Let's do that for 'func2'.

class PolymorphicClass
{
public:
       void func1();
       virtual void func2();
};

So what is the difference? At first sight, nothing special. But let's look for differences between 'OrdinaryClass' and 'PolymorphicClass'. If we build exactly these examples and create an object of each class, the linker will typically report an unresolved symbol for 'PolymorphicClass' even if we never call 'func2', whereas 'OrdinaryClass' builds without errors as long as its functions are not called. If we add an empty definition for 'func2' in 'PolymorphicClass' (to make the sample build) and print the result of sizeof for both classes, the sizes will differ. The C++ standard does not define the size of 'PolymorphicClass', but in most implementations it is the size of a pointer (4 bytes on a 32-bit system, 8 bytes on a 64-bit one), while the size of 'OrdinaryClass' is typically 1. So far we have discovered two differences; I will explain the reasons for them a little later.
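The size observation can be reproduced with a small sketch. The names SizeProbeOrdinary, SizeProbePolymorphic and PrintSizes are mine, chosen only for this illustration, and the functions get empty bodies so that the program links. The printed numbers depend on the compiler and platform, so no expected output is given.

```cpp
#include <iostream>

// Hypothetical stand-in for OrdinaryClass, with empty definitions.
class SizeProbeOrdinary
{
public:
       void func1() {}
       void func2() {}
};

// Hypothetical stand-in for PolymorphicClass: one function is virtual.
class SizeProbePolymorphic
{
public:
       void func1() {}
       virtual void func2() {}
};

// Prints both sizes; the values are implementation-defined.
void PrintSizes()
{
       std::cout << "ordinary:    " << sizeof(SizeProbeOrdinary) << std::endl;
       std::cout << "polymorphic: " << sizeof(SizeProbePolymorphic) << std::endl;
}
```

Calling PrintSizes() from main is enough to see the two values side by side on your own toolchain.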
    Let's define 'func2' in both classes to see if there is any difference in behaviour:

#include <iostream>

class OrdinaryClass
{
public:
       void func1();
       void func2()
       {
              std::cout << "OrdinaryClass :: func2 " << std::endl;
       }
};

class PolymorphicClass
{
public:
       void func1();
       virtual void func2()
       {
              std::cout << "PolymorphicClass :: func2 " << std::endl;
       }
};

int main()
{
       OrdinaryClass obj1;
       PolymorphicClass obj2;

       obj1.func2(); // OrdinaryClass :: func2
       obj2.func2(); // PolymorphicClass :: func2
}


If you run this piece of code you will see that both objects behave the same way. The difference appears when we inherit from a polymorphic class. Suppose we have a polymorphic class 'Base' with a single virtual function and a class 'Derived' which redefines that function:

#include <iostream>

class Base
{
public:
       virtual void Do()
       {
              std::cout << "Base::Do" << std::endl;
       }
};

class Derived : public Base
{
public:
       virtual void Do()
       {
              std::cout << "Derived::Do" << std::endl;
       }
};

int main()
{
       Base b1;
       Derived d1;

       b1.Do(); // Base::Do
       d1.Do(); // Derived::Do

       Base *pb = new Base();
       pb->Do(); // Base::Do
       Derived *pd = new Derived();
       pd->Do(); // Derived::Do

       Base *pb2 = new Derived();
       pb2->Do(); // Derived::Do
}

The output of each call of 'Do' is given in the comment on the corresponding line. As we can see, all the calls but the last one behave as one would expect from the static types. The last one executed the function defined in 'Derived', although we called it through a pointer to 'Base'. If 'Do' were not declared virtual, the last line would have printed 'Base::Do'.
    So this is how virtual member functions differ from ordinary member functions: when a virtual function is called through a pointer to an object, the version defined in the most derived type of that object is executed. The same holds for references, which are usually implemented by storing the address of the object, just like pointers. Redefining a virtual function in a derived class is called overriding. The C++11 standard introduced the specifier override to state explicitly that a function is meant to override an existing virtual function; the compiler reports an error if no such function exists. A derived class is not obliged to override the virtual functions of its direct or indirect base classes. In that case, calling the function through a base class pointer that points to an object of the most derived class executes the version from the most derived class in the hierarchy that does override it.
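Here is a minimal sketch of the last two points: the override specifier, and a class in the middle of a hierarchy that does not override anything. The classes Animal, Dog and Puppy and the helper functions are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical three-level hierarchy.
class Animal
{
public:
       virtual ~Animal() {}
       virtual std::string Sound() const { return "..."; }
};

class Dog : public Animal
{
public:
       // 'override' asks the compiler to verify that a matching
       // virtual function exists in a base class.
       std::string Sound() const override { return "Woof"; }
};

class Puppy : public Dog
{
       // Does not override Sound(); Dog::Sound is inherited.
};

// Calls Sound() through a reference to the base class.
std::string SoundOf(const Animal& animal)
{
       return animal.Sound();
}

// Usage sketch.
void CheckSounds()
{
       Animal animal;
       Dog dog;
       Puppy puppy;
       assert(SoundOf(animal) == "...");
       assert(SoundOf(dog) == "Woof");
       assert(SoundOf(puppy) == "Woof"); // nearest override, from Dog
}
```

If the signature in Dog were changed by mistake, for example by dropping const, the override specifier would turn the silent introduction of a new function into a compile error.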

Which implementation of the function to call is determined at run time, not at compile time, because the compiler cannot, in general, know what type of object the pointer points to. For example:

int num = GetObjectTypeNumber();

Base *ptr = 0;

if (num == 1)
{
       ptr = new Base();
}
else
{
       ptr = new Derived();
}

ptr->Do();

Thus, the call has to be redirected somehow at run time. So how does the program know which function definition to execute? The C++ standard places no requirements on how virtual functions are implemented, but most compilers do it this way: for each polymorphic class (one that declares or inherits a virtual function) a special data structure called the virtual function table (vtbl) is created, which stores, for every virtual function of the class, a pointer to the function that should be executed. This table is created per class, so if we have hundreds of objects of the same polymorphic class, there is still only one virtual function table. Each instance of a polymorphic class contains a pointer, known as the virtual pointer (vptr), which points to the vtbl of its class. I will explain how the virtual function table is built and what it contains in another article. For now, just keep in mind that each call of a virtual function through a pointer or a reference is translated into a call of a function looked up in the virtual function table through the virtual pointer.
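To make the mechanism more tangible, here is a hand-written imitation of it using plain function pointers. This is only a sketch of the idea, not what any particular compiler generates; the names Vtbl, ManualBase, ManualDerived, BaseDo and DerivedDo are all hypothetical.

```cpp
#include <cassert>
#include <string>

struct ManualBase; // forward declaration for the table's signature

// One table per class: a pointer for every "virtual" function.
struct Vtbl
{
       std::string (*Do)(const ManualBase*);
};

std::string BaseDo(const ManualBase*) { return "Base::Do"; }
std::string DerivedDo(const ManualBase*) { return "Derived::Do"; }

const Vtbl baseVtbl = { &BaseDo };
const Vtbl derivedVtbl = { &DerivedDo };

struct ManualBase
{
       ManualBase() : vptr(&baseVtbl) {}

       // The "virtual call": look the function up through the vptr.
       std::string Do() const { return vptr->Do(this); }

protected:
       // Lets a derived class install its own table.
       explicit ManualBase(const Vtbl* table) : vptr(table) {}

private:
       const Vtbl* vptr; // one pointer per object
};

struct ManualDerived : ManualBase
{
       ManualDerived() : ManualBase(&derivedVtbl) {}
};

// Usage sketch.
void CheckManualDispatch()
{
       ManualBase base;
       ManualDerived derived;
       const ManualBase& ref = derived;
       assert(base.Do() == "Base::Do");
       assert(ref.Do() == "Derived::Do");
}
```

Note that each object carries exactly one extra pointer, while the tables themselves exist once per class, which matches the description above.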
    Now we know the reasons for the two differences between polymorphic and non-polymorphic classes that we discovered earlier:
  • the instances differ in size because an instance of a polymorphic class contains an additional pointer (vptr),
  • a program that creates an object of a polymorphic class does not build without the definition of the virtual function, because the address of that function is needed to initialize the virtual function table.
And finally, we started with polymorphism but have not given a single example of it. I prefer the best-known example, the one with shapes. Suppose we have a base class 'Shape' and we derive 'Ellipse', 'Rectangle' and 'Triangle' from it. 'Shape' has a virtual function 'Draw()', which is overridden in the three subclasses. Now we can keep a list of pointers to Shape objects and call Draw() on each of them without worrying about the exact type of the objects, and the correct Draw() will be called for each one:

#include <algorithm>
#include <list>

class Shape
{
public:
       virtual void Draw()
       {
              // do something generic for all shapes
       }
private:
       // generic shape data
};

class Ellipse : public Shape
{
public:
       void Draw() override
       {
              // Draw ellipse using data
       }
private:
       // Ellipse-specific data
};

class Rectangle : public Shape
{
public:
       void Draw() override
       {
              // Draw rectangle using data
       }
private:
       // Rectangle-specific data
};

class Triangle : public Shape
{
public:
       void Draw() override
       {
              // Draw triangle using data
       }
private:
       // Triangle-specific data
};

int main()
{
       std::list<Shape*> listOfShapes;

       listOfShapes.push_back(new Ellipse());
       listOfShapes.push_back(new Rectangle());
       listOfShapes.push_back(new Triangle());

       std::for_each(listOfShapes.begin(), listOfShapes.end(),
                     [] (Shape *s) { s->Draw(); });
}


If we add implementations instead of the comments and run the code, we will see that the correct version of 'Draw()' is executed for each shape. This is the concept of polymorphism: we seem to be doing the same thing, but actually different things happen, while we treat different types of objects the same way.
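As a sketch of what "adding implementations" might look like, the variation below makes each Draw() return a name so that the result can be checked. It is not the code above: the NamedShape classes, DrawAll and CheckShapes are hypothetical, and I have assumed std::unique_ptr and a virtual destructor so that the shapes are released correctly.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical variation of Shape whose Draw() reports what was drawn.
class NamedShape
{
public:
       virtual ~NamedShape() {}
       virtual std::string Draw() const { return "shape"; }
};

class NamedEllipse : public NamedShape
{
public:
       std::string Draw() const override { return "ellipse"; }
};

class NamedRectangle : public NamedShape
{
public:
       std::string Draw() const override { return "rectangle"; }
};

class NamedTriangle : public NamedShape
{
public:
       std::string Draw() const override { return "triangle"; }
};

typedef std::vector<std::unique_ptr<NamedShape> > ShapeList;

// Draws every shape without knowing its exact type.
std::vector<std::string> DrawAll(const ShapeList& shapes)
{
       std::vector<std::string> drawn;
       for (ShapeList::const_iterator it = shapes.begin(); it != shapes.end(); ++it)
       {
              drawn.push_back((*it)->Draw());
       }
       return drawn;
}

// Usage sketch.
void CheckShapes()
{
       ShapeList shapes;
       shapes.push_back(std::unique_ptr<NamedShape>(new NamedEllipse()));
       shapes.push_back(std::unique_ptr<NamedShape>(new NamedRectangle()));
       shapes.push_back(std::unique_ptr<NamedShape>(new NamedTriangle()));

       std::vector<std::string> drawn = DrawAll(shapes);
       assert(drawn.size() == 3);
       assert(drawn[0] == "ellipse");
       assert(drawn[1] == "rectangle");
       assert(drawn[2] == "triangle");
}
```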

This was just a simple introduction to virtual functions in C++. In my next article, which I am going to post very soon, I will cover more advanced features such as pure virtual functions, abstract classes and interfaces, and virtual destructors; answer questions such as why virtual functions work only through pointers and references and whether we can call a virtual function in a constructor or destructor; and discuss some compiler-specific implementation details. Thanks for your time and interest.



10 October 2012

Storing a type on the example of a simple messenger


Hi there. I have been thinking for a long time about storing a type in C++, and now I want to share my insights with you. As we all know, a type defines the amount of space needed for its objects and their behavior. So what if we need to store a type in order to reuse it later? This is the main problem considered in this article.
Unlike some other languages, C++ does not treat types as objects, so they cannot be stored directly the way objects can. We may think of an indirect way, though. For example, we can get the type of an object indirectly. For this we need just one simple type and one function, which are presented below:

template <typename T>
struct TypeRepresentation
{
       typedef T type;
};

template <typename T>
TypeRepresentation<T> get_type(const T&)
{
       return TypeRepresentation<T>();
}

The problem is that, although we get an appropriate object, we cannot use its member type ‘type’, because we only have an object and not the name of its type:

int a = 5;
get_type(a).type // error

Member types (introduced with the typedef keyword) are accessible only through the name of the containing type, not through an object. So to access ‘type’ we need its qualified name:

TypeRepresentation<int>::type

But to write that name we need the containing type, which we assume we do not know. And if we do write it, we already know that ‘type’ is int, so this kind of use is pointless. Imagine that we could somehow use the type having only an object of TypeRepresentation<> type. We could then store these objects in a container, say std::vector<boost::any>. But as soon as we wanted to use the objects, i.e. the member ‘type’, we would have to convert each object back to its original type with any_cast, and … we do not know the original type; otherwise we would not need the ‘type’ member at all.

The basic idea is this: however we try to wrap up the type, we still come back to the type itself, and as it is not an object, we cannot store it as one. There are still cases when we need to know about types and need to store that information somehow. To make this clearer, let us write a simple messenger class. We will provide a mechanism to register for messages from objects of concrete types, and also functions to send messages. The most interesting part here is the registration for messages from a specified type of sender, for which we need to store, in some form, the type that the registered object wants to listen to.

Before C++11 the only type describing a type was std::type_info from the <typeinfo> header. A reference to it can easily be obtained for any object or type using the typeid operator. However, it was meant only for informational purposes: we cannot construct a type_info ourselves or copy one to store it somewhere. So to find out whether some object is of the wanted type we can compare the results of typeid applied to the object and to the wanted type, for example,

if (typeid(obj) == typeid(Sender))
{
       // Got it, now do what you wanted to do
}

But to register the receiver we still need to store the type_info object for later, and it cannot be copied. Fortunately, C++11 introduced a new type, std::type_index (header <typeindex>), which is a wrapper around type_info that is both CopyConstructible and CopyAssignable and can be used as a key in associative containers. So now we can store it, and with std::type_index we can handle any type-specific registration. The code for the Messenger class follows:
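Before the full class, here is a minimal sketch of the core trick on its own: using std::type_index as a map key. The sender types Keyboard and Mouse, the TypeLabels typedef and the helper functions are hypothetical and serve only as an illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical sender types.
struct Keyboard {};
struct Mouse {};

typedef std::map<std::type_index, std::string> TypeLabels;

// Stores a label under the type T itself; no object of T is needed.
template <class T>
void Remember(TypeLabels& labels, const std::string& label)
{
       labels[std::type_index(typeid(T))] = label;
}

// Looks the label up again from an object of the type.
template <class T>
std::string LabelOf(const TypeLabels& labels, const T& object)
{
       TypeLabels::const_iterator seeker =
           labels.find(std::type_index(typeid(object)));
       return seeker != labels.end() ? seeker->second : std::string("unknown");
}

// Usage sketch.
void CheckTypeLabels()
{
       TypeLabels labels;
       Remember<Keyboard>(labels, "keyboard");
       assert(LabelOf(labels, Keyboard()) == "keyboard");
       assert(LabelOf(labels, Mouse()) == "unknown");
}
```

What is stored here is only an identifier of the type that can be compared and ordered; the type itself still cannot be recovered from it.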

#include <algorithm>
#include <list>
#include <map>
#include <typeindex>
#include <typeinfo>

class Messenger
{
private:
       // Prohibit explicit construction and copying
       // My compiler does not support 'delete' and 'default' keywords
       Messenger() {}
       Messenger(const Messenger&);
       Messenger& operator=(const Messenger&);

       std::map<std::type_index, std::list<IReceiver *>> typeToReceivers;
       std::list<IReceiver *> allReceivers;

public:
       static Messenger& Get()
       {
              static Messenger messenger;

              return messenger;
       }

       template <class Sender>
       void RegisterForMessagesFromType(IReceiver *receiver)
       {
              std::map<std::type_index, std::list<IReceiver *>>::
              iterator seeker = this->typeToReceivers.find(
                                std::type_index(typeid(Sender)));
              if (seeker != this->typeToReceivers.end())
              {
                     seeker->second.push_back(receiver);
              }
              else
              {
                     std::list<IReceiver *> newList;
                     newList.push_back(receiver);
                     typeToReceivers.insert(std::make_pair(
                         std::type_index(typeid(Sender)), newList));
              }
       }

       void RegisterForMessages(IReceiver *receiver)
       {
              std::list<IReceiver *>::const_iterator
              seeker = std::find(this->allReceivers.begin(),
                              this->allReceivers.end(), receiver);
              if (seeker == this->allReceivers.end())
              {
                     this->allReceivers.push_back(receiver);
              }
       }

       template <class Sender>
       void UnregisterForMessagesFromType(IReceiver *receiver)
       {
              std::map<std::type_index, std::list<IReceiver *>>::
              iterator seeker = this->typeToReceivers.find(
                                std::type_index(typeid(Sender)));
              if (seeker == this->typeToReceivers.end())
              {
                     return;
              }
              std::list<IReceiver *>::iterator iter =
                  std::find(seeker->second.begin(),
                            seeker->second.end(), receiver);
              if (iter != seeker->second.end())
              {
                     seeker->second.erase(iter);
              }
       }

       template <class Sender>
       void SendMessageFrom(const IMessage& msg, const Sender& from)
       {
              std::map<std::type_index, std::list<IReceiver *>>::
              const_iterator seeker = this->typeToReceivers.find(
                                   std::type_index(typeid(from)));
              if (seeker != this->typeToReceivers.end())
              {
                     std::for_each(seeker->second.begin(),
                                   seeker->second.end(),
                                   [&msg] (IReceiver *rcvr)
                                   { rcvr->Receive(msg); });
              }
       }

       template <class Receiver>
       void SendMessagesTo(const IMessage& msg)
       {
              std::for_each(allReceivers.begin(), allReceivers.end(),
                            [&msg] (IReceiver *rcvr)
                            { if (typeid(*rcvr) == typeid(Receiver))
                                           { rcvr->Receive(msg); } });
       }
};

This class works fine according to my tests. I have only left out the IMessage and IReceiver interfaces, which in a real source file must be defined before Messenger. Here is the code for them:

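#include <string>

class IMessage
{
public:
       // Virtual destructor for safe use through a base pointer
       virtual ~IMessage() {}
       virtual std::string Get() const = 0;
       virtual void Set(const std::string&) = 0;
};

class IReceiver
{
public:
       // Virtual destructor for safe use through a base pointer
       virtual ~IReceiver() {}
       virtual void Receive(const IMessage&) = 0;
};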

Actually, instead of requiring the receiver to derive from IReceiver, we could register a member function of the receiver class with a predefined signature, but that would complicate the code. You can customize the class by adding members that filter by message type rather than by sender, or that send to receivers of specified types.
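A rough sketch of the member-function idea follows. It is not the Messenger class above: CallbackMessenger, Sensor and Logger are hypothetical names, I have assumed std::function and std::bind for the callbacks, and messages are plain std::string values instead of IMessage to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Hypothetical callback-based variant of the messenger.
class CallbackMessenger
{
public:
       typedef std::function<void(const std::string&)> Handler;

       // Registers any callable for messages sent from type Sender.
       template <class Sender>
       void Register(Handler handler)
       {
              handlers[std::type_index(typeid(Sender))].push_back(handler);
       }

       // Delivers the text to every handler registered for the sender's type.
       template <class Sender>
       void SendFrom(const std::string& text, const Sender& from)
       {
              HandlerMap::const_iterator seeker =
                  handlers.find(std::type_index(typeid(from)));
              if (seeker == handlers.end())
              {
                     return;
              }
              for (std::size_t i = 0; i < seeker->second.size(); ++i)
              {
                     seeker->second[i](text);
              }
       }

private:
       typedef std::map<std::type_index, std::vector<Handler> > HandlerMap;
       HandlerMap handlers;
};

// Hypothetical sender and receiver; Logger derives from nothing.
struct Sensor {};

struct Logger
{
       std::vector<std::string> lines;
       void OnMessage(const std::string& text) { lines.push_back(text); }
};

// Usage sketch.
void CheckCallbackMessenger()
{
       CallbackMessenger messenger;
       Logger logger;
       messenger.Register<Sensor>(
           std::bind(&Logger::OnMessage, &logger, std::placeholders::_1));

       messenger.SendFrom("hello", Sensor());
       messenger.SendFrom("ignored", Logger()); // nobody listens to Logger
       assert(logger.lines.size() == 1);
       assert(logger.lines[0] == "hello");
}
```

The price of this flexibility is that a bound handler cannot easily be found again for unregistering, which is one reason the interface-based version is simpler.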

C++11 also introduced a new keyword, decltype, which yields the type of an expression or object. It has little to do with storing types, though: it works only at compile time, so we would have to store the object whose type we want to know, and that in turn imposes limitations on the object’s type.
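For completeness, here is a small sketch of what decltype does allow: reaching the member ‘type’ when we have only an expression. TypeRepresentation and get_type are repeated from above so that the sketch is self-contained; CheckDecltype is a hypothetical helper.

```cpp
#include <cassert>
#include <type_traits>

template <typename T>
struct TypeRepresentation
{
       typedef T type;
};

template <typename T>
TypeRepresentation<T> get_type(const T&)
{
       return TypeRepresentation<T>();
}

// Usage sketch: the type is resolved entirely at compile time.
void CheckDecltype()
{
       int a = 5;

       // decltype names the type of the expression, so the member
       // typedef becomes reachable without spelling out 'int'.
       decltype(get_type(a))::type b = a * 2;

       static_assert(std::is_same<decltype(b), int>::value,
                     "b is expected to be an int");
       assert(b == 10);
}
```

This still requires the expression get_type(a) to be visible at the point of use, so nothing has been stored for later in the sense this article is after.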

I hope someone has learnt something new from this article. If you have any idea of how to really store a type, I would be happy to hear about it. Thanks for your time.

11 August 2012

RAII: Resource acquisition is initialization

Hello, dear reader. After reading my previous post about exception handling, many people advised me to write about RAII as well. As RAII is not specific to C++ and does not have much to do with exception handling, I decided to write about it in a separate post. So what is RAII?

RAII stands for Resource Acquisition Is Initialization. It is a programming technique invented by Bjarne Stroustrup, intended to make the use of resources safer with respect to their acquisition and release. In C++, when an exception is thrown and caught, the only code the standard guarantees to run on the way from the throw to the handler is the destructors of automatic objects (those allocated on the stack). If the exception is never caught, std::terminate() is called, and whether the stack is unwound at all is implementation-defined. (Destructors of objects with static storage duration are also called, but that happens in std::exit(), which also calls the functions registered with std::atexit(); these play no part in RAII.) So if we need a resource, RAII prescribes acquiring it in a class constructor and releasing it in the destructor. Thus, if anything goes wrong, we can be sure that the resource is released properly. Let's try this on a simple example.
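As a quick aside before the file example, the guarantee itself can be observed with a small sketch. LoggedResource, UseAndFail and CheckUnwindingOrder are hypothetical names; the "resource" merely records its lifetime in a log.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical resource that records its lifetime in a log.
class LoggedResource
{
public:
       explicit LoggedResource(std::vector<std::string>& log) : log_(log)
       {
              log_.push_back("acquire");
       }

       ~LoggedResource()
       {
              log_.push_back("release");
       }

private:
       // Prohibit copying
       LoggedResource(const LoggedResource&);
       LoggedResource& operator=(const LoggedResource&);

       std::vector<std::string>& log_;
};

// Acquires the resource, then fails before returning normally.
void UseAndFail(std::vector<std::string>& log)
{
       LoggedResource resource(log);
       throw std::runtime_error("something went wrong");
}

// Usage sketch: the destructor runs before the handler does.
void CheckUnwindingOrder()
{
       std::vector<std::string> log;
       try
       {
              UseAndFail(log);
       }
       catch (const std::runtime_error&)
       {
              log.push_back("caught");
       }
       assert(log.size() == 3);
       assert(log[0] == "acquire");
       assert(log[1] == "release");
       assert(log[2] == "caught");
}
```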

The most common example in the literature is a file resource, e.g. a handle to an open file stream. We can use the same example here, as it does not change the essence. Suppose we need to open an XML file, read some structured data in portions (3 high-level items at a time), process it (e.g. convert it to another format, or just deserialize an object from the XML file), and finally close the file. Here is a code snippet for a function which covers all the steps:

MyClass LoadFromXMLFile(const std::string& filePath)
{
       // Create an empty object
       MyClass obj;

       // Open a stream to read from xml file
       std::ifstream in(filePath, std::ios_base::in);

       // Read data in small chunks and fill the object
       FillObjectDataFromStream(obj, in);

       // Close the stream
       in.close();

       // Return the object
       return obj;
}

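The dangerous part of this function is the call to FillObjectDataFromStream(). If it throws an exception, the in.close() line is never reached (not to mention the case of process termination). With a raw resource, such as a FILE* returned by fopen or an operating system handle, that would mean the resource is never released even though it is no longer used, which is very bad practice. (std::ifstream does close its file in its own destructor, so here it serves only as a stand-in for such a raw resource.)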

Now if we want to take advantage of the RAII technique we need a helper class, which will allocate the resource in the constructor and release it in the destructor. Let's name this class FileStreamOpener:

class FileStreamOpener
{
private:
       std::ifstream _in;

public:
       FileStreamOpener(const std::string& filePath)
       {
              _in.open(filePath.c_str(), std::ios_base::in);
       }

       ~FileStreamOpener()
       {
              _in.close();
       }

       std::ifstream& GetStream()
       {
              return _in;
       }
};

Now we need to modify the LoadFromXMLFile() function. Here we go:

MyClass LoadFromXMLFile(const std::string& filePath)
{
       // Create an object
       MyClass obj;

       // Implicitly open a stream through FileStreamOpener
       FileStreamOpener fso(filePath);

       // Read data in small chunks and fill the object
       FillObjectDataFromStream(obj, fso.GetStream());

       // We don't even need to close the stream
       // as fso will be destructed when the flow
       // goes out of this function's body scope
       // and the stream will be closed.
       return obj;
}

The details are given in the comments. Whenever the destructor of 'fso' is called, whether during stack unwinding after an exception or when the execution flow leaves the function body normally, the file stream is closed, i.e. the resource is properly released. This is the whole idea of the RAII idiom. Thanks for your time.
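The same idea pays off most with resources that have no destructor of their own. Below is a minimal sketch that wraps a C FILE* instead of a stream; the class CFile and the function FirstByte are hypothetical, and throwing std::runtime_error on failure is my assumption, not something prescribed by the idiom.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical RAII wrapper around a C FILE*.
class CFile
{
public:
       // Acquire the resource in the constructor...
       CFile(const std::string& path, const char* mode)
              : file_(std::fopen(path.c_str(), mode))
       {
              if (!file_)
              {
                     throw std::runtime_error("cannot open " + path);
              }
       }

       // ...and release it in the destructor.
       ~CFile()
       {
              std::fclose(file_);
       }

       std::FILE* Get() const
       {
              return file_;
       }

private:
       // Prohibit copying, so the file is closed exactly once
       CFile(const CFile&);
       CFile& operator=(const CFile&);

       std::FILE* file_;
};

// Returns the first byte of the file. The file is closed on every
// path out of the function, including the throw below.
int FirstByte(const std::string& path)
{
       CFile file(path, "rb");
       int c = std::fgetc(file.Get());
       if (c == EOF)
       {
              throw std::runtime_error("empty file: " + path);
       }
       return c;
}
```

If the constructor throws, no object exists and the destructor is not run, which is exactly right here because there is nothing to close.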