C++ Core Guidelines: Rules for Unions


A union is a special class type whose members all start at the same address. A union can hold only one of its members at a time, which is how it saves memory. A tagged union is a union that keeps track of which member it currently holds.

 


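Here are the four rules for unions:

  • C.180: Use unions to save memory
  • C.181: Avoid “naked” unions
  • C.182: Use anonymous unions to implement tagged unions
  • C.183: Don’t use a union for type punning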

Let's start with the most obvious rule.

C.180: Use unions to save memory

Because a union can hold only one of its members at a time, you can save memory. The union is at least as big as its biggest member.

 

union Value {
    int i;
    double d;
};

Value v = { 123 };      // now v holds an int
cout << v.i << '\n';    // write 123
v.d = 987.654;          // now v holds a double
cout << v.d << '\n';    // write 987.654

 

Value is a "naked" union. You should not use it according to the next rule.
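To make the memory argument concrete, here is a minimal sketch that compares a union with a struct holding the same members. The names ValueUnion, ValueStruct, and bytesSaved are my own and only serve this illustration. The exact sizes depend on the platform, so the checks only state relations that I expect to hold on common implementations.

```cpp
#include <cstddef>

// Same members as Value above, once as a union and once as a struct.
// Both type names are hypothetical and exist only for this comparison.
union ValueUnion {
    int i;
    double d;
};

struct ValueStruct {
    int i;
    double d;
};

// The union has to be large enough for its biggest member ...
static_assert(sizeof(ValueUnion) >= sizeof(double));
// ... while the struct has to store both members at the same time.
static_assert(sizeof(ValueStruct) >= sizeof(int) + sizeof(double));
// Assumption for common platforms: the union is not bigger than the struct.
static_assert(sizeof(ValueUnion) <= sizeof(ValueStruct));

// Number of bytes the union saves compared to the struct.
constexpr std::size_t bytesSaved() {
    return sizeof(ValueStruct) - sizeof(ValueUnion);
}
```

On a typical 64-bit platform the struct also pays for padding between i and d, so the saving is often bigger than sizeof(int).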

 


C.181: Avoid “naked” unions

"Naked" unions are error-prone because you must keep track of the underlying type.

// nakedUnion.cpp

#include <iostream>

union Value {
    int i;
    double d;
};

int main(){
  
  std::cout << std::endl;

  Value v;
  v.d = 987.654;  // v holds a double
  std::cout << "v.d: " << v.d << std::endl;     
  std::cout << "v.i: " << v.i << std::endl;      // (1)

  std::cout << std::endl;

  v.i = 123;     // v holds an int
  std::cout << "v.i: " << v.i << std::endl;
  std::cout << "v.d: " << v.d << std::endl;      // (2)
  
  std::cout << std::endl;

}

 

The union holds a double in the first block of main and an int in the second block. If you read a double as an int (1) or an int as a double (2), you get undefined behavior.


 To overcome this source of errors, you should use a tagged union.

C.182: Use anonymous unions to implement tagged unions

Implementing a tagged union by hand is quite involved. If you are curious, have a look at rule C.182 in the guidelines. I will take the easy route and write about what the new C++ standard offers.
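To give an idea of what the rule means, here is a minimal sketch of a hand-written tagged union for int and double. It is not the implementation from the guidelines; the class TaggedValue and all of its member names are hypothetical. The anonymous union stores the value, and the tag remembers which member is active.

```cpp
#include <stdexcept>

// Hypothetical minimal tagged union for int and double.
class TaggedValue {
public:
    enum class Tag { Int, Double };

    explicit TaggedValue(int value) : tag_{Tag::Int}, i_{value} {}
    explicit TaggedValue(double value) : tag_{Tag::Double}, d_{value} {}

    Tag tag() const { return tag_; }

    // Each getter checks the tag before it touches the union.
    int asInt() const {
        if (tag_ != Tag::Int) throw std::logic_error("TaggedValue holds no int");
        return i_;
    }

    double asDouble() const {
        if (tag_ != Tag::Double) throw std::logic_error("TaggedValue holds no double");
        return d_;
    }

    // Each setter updates the value and the tag together.
    void set(int value) { i_ = value; tag_ = Tag::Int; }
    void set(double value) { d_ = value; tag_ = Tag::Double; }

private:
    Tag tag_;
    union {          // anonymous union: i_ and d_ are used like normal members
        int i_;
        double d_;
    };
};
```

This stays simple only because int and double are trivial types. As soon as a member such as std::string is involved, you have to call constructors and destructors by hand, which is why I prefer the library solution.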

With C++17, we get a tagged union: std::variant. std::variant is a type-safe union. Here is a first impression.

 

// variant.cpp

#include <variant>
#include <string>
 
int main(){

  std::variant<int, float> v, w;       // (1)
  v = 12;                              // v contains int
  int i = std::get<int>(v);            // (2)        
                                       
  w = std::get<int>(v);                // (3)
  w = std::get<0>(v);                  // same effect as the previous line
  w = v;                               // same effect as the previous line

                                       // (4)
  //  std::get<double>(v);             // error: no double in [int, float]
  //  std::get<3>(v);                  // error: valid index values are 0 and 1
 
  try{
    std::get<float>(w);                // w contains int, not float: will throw
  }
  catch (std::bad_variant_access&) {}
 
                                       // (5)
  std::variant<std::string> x("abc");  // converting constructors work when unambiguous
  x = "def";                           // converting assignment also works when unambiguous

}

 

In (1), I define the two variants v and w. Both can hold an int or a float. Their initial value is 0, which is the default value of the first alternative. v becomes 12, and std::get<int>(v) returns the value by type (2). Line (3) and the following two lines show three ways to assign the value of v to w. But you have to keep a few rules in mind. You can ask for a variant's value by type or by index. The type must be unique among the alternatives and the index must be valid; otherwise, the code does not compile (4). If you ask for an alternative that the variant does not currently hold, you get a std::bad_variant_access exception. If the constructor or assignment call is unambiguous, a conversion occurs. This is why it's possible to construct a std::variant<std::string> with a C-string or assign a new C-string to the variant (5).
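If you don't know which alternative a variant holds, you don't have to risk an exception. The following sketch shows two standard ways to look inside: std::visit and std::get_if. The alias IntOrFloat and the helper functions describe and intOr are hypothetical names for this example.

```cpp
#include <string>
#include <type_traits>
#include <variant>

using IntOrFloat = std::variant<int, float>;

// std::visit calls the lambda with the currently active alternative.
std::string describe(const IntOrFloat& v) {
    return std::visit([](auto value) -> std::string {
        using T = decltype(value);
        if constexpr (std::is_same_v<T, int>) {
            return "int: " + std::to_string(value);
        } else {
            return "float: " + std::to_string(value);
        }
    }, v);
}

// std::get_if returns a null pointer instead of throwing
// when the variant holds a different alternative.
int intOr(const IntOrFloat& v, int fallback) {
    if (const int* p = std::get_if<int>(&v)) {
        return *p;
    }
    return fallback;
}
```

For example, describe(IntOrFloat{12}) yields "int: 12", and intOr(IntOrFloat{1.5f}, -1) falls back to -1 because the variant holds a float.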

C.183: Don’t use a union for type punning

First, what is type punning? Type punning means intentionally subverting the type system of a programming language to treat an object of one type as if it had a different type. One typical way to do type punning in C++ is to read a union member other than the one that was last written.

What is wrong with the following function bad?

union Pun {
    int x;
    unsigned char c[sizeof(int)];
};

void bad(Pun& u)
{
    u.x = 'x';
    cout << u.c[0] << '\n';       // undefined behavior (1)
}

void if_you_must_pun(int& x)
{
    auto p = reinterpret_cast<unsigned char*>(&x);   // (2)
    cout << p[0] << '\n';                            // OK; better 
// ...
}

 

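Expression (1) has two issues. First and foremost, it's undefined behavior. Second, the type punning is quite hard to find. If you have to use type punning, do it with an explicit cast such as reinterpret_cast in (2). With reinterpret_cast, you can at least spot your type punning afterwards. Keep in mind that (2) is well-defined only because unsigned char is allowed to alias any object; reading the int through a pointer to an unrelated type such as float* would again be undefined behavior.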
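An alternative that is well-defined for trivially copyable types is to copy the bytes with std::memcpy instead of reading through a union or a cast. Here is a minimal sketch; the function names bytesOf and intFrom are hypothetical. Compilers can usually optimize such a small copy away.

```cpp
#include <array>
#include <cstring>

// Copy the object representation of an int into a byte array.
std::array<unsigned char, sizeof(int)> bytesOf(int x) {
    std::array<unsigned char, sizeof(int)> bytes{};
    std::memcpy(bytes.data(), &x, sizeof x);
    return bytes;
}

// Rebuild the int from its bytes.
int intFrom(const std::array<unsigned char, sizeof(int)>& bytes) {
    int x;
    std::memcpy(&x, bytes.data(), sizeof x);
    return x;
}
```

Which element of the array holds the 'x' from the example above still depends on the endianness of the platform. The round trip intFrom(bytesOf(value)) always gives back the original value.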

What's next?

Admittedly, this final post on the rules for classes and class hierarchies was a bit short. In the next post, I will write about the next major section: enumerations.

 

 

 

 


Comments   

0 #1 Balázs Benics 2017-11-20 20:58
Hi there,

I'm not sure with the last example. I don't think that it contains any UBs.
Let's see why:

At the 'void bad(Pun& u)' function
--------------------------------------
You activated the integer member of the 'u' union by assigning the ascii value of 'x' character.
After that, we access the union's value through a different (non-active) union member, which is strictly speaking UB. But if we assume the language extension that is widely used and offered by most compilers, then we can access the value through a non-active union member.
(Keep in mind that ANY type can be accessed through a [signed/unsigned] char type, so this also holds for this line. If we accessed it through a type other than std::byte or char, then it would really be UB.)

The only problem that I can see is that we don't know whether the system is big-endian or little-endian, so two outputs are possible, but neither of them is UB.

For the other function, the standard is clear, and it seems to be right as you wrote.
But the outcome still depends on endianness, I think.

But please check it, and tell me if I'm wrong.

Thank you in advance.

ps: btw nice article
0 #2 Shafik Yaghmour 2017-12-29 15:33
I would be hesitant to show a type punning example using reinterpret_cast. Although it may be valid in your specific example, since you are using unsigned char, which is allowed to alias, it is not valid in general.

We have a well-defined alternative which is via memcpy and in type punning cases it should optimize away. See the details in my Stackoverflow answer here: https://stackoverflow.com/a/31080901/1708801

We may eventually get bit_cast which would do away with the need for memcpy: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0476r2.html also see implementation here: https://github.com/jfbastien/bit_cast

PS my first comment attempt got cut short :-(
