Compiler Explorer, PVS-Studio, and Terrible Simple Bugs


Did you know that PVS-Studio is integrated into Compiler Explorer? If not, you should definitely read this guest post from Andrey Karpov, which includes a promo code.

 

In this article, we'll talk about Compiler Explorer (godbolt.org), its integration with the PVS-Studio static analyzer, and bugs. Whatever your case, you'll surely find the information below helpful. If you haven't tried Compiler Explorer yet, this post will help you get started with this interesting and useful online service for experimenting with code compilation. And if you have, you'll get a few insights into some possible ways of using it. We'll be talking about simple yet serious bugs and ways to fight them. Make yourself some tea or coffee and let's get started. This article may inspire you to both ponder over some of the ideas I'll share and play around with the tools mentioned.


Compiler Explorer

Compiler Explorer is an interactive tool that lets you type code in one window and see the results of its compilation in another. There you can explore the warnings and errors reported by the compiler and the assembly code it generates, as well as view the output of the running program. The tool supports Ada, D, Fortran, Python, and other programming languages, but it's the support for C and C++ that we're interested in right now.

If you're interested, there's a Compiler Explorer C++-channel in Slack. You can join it by getting an automatic invite.

One important and useful feature of Compiler Explorer is the ability to create a permanent link to a "snapshot" of your current work, with all typed code and positions of opened windows preserved. For instance, this link will take you to a page containing the program text, assembly code, and the output of the running program. These links can be conveniently shared or inserted into blog posts.


Compiler Explorer & static analysis

You may already know all of that. But there's more. Compiler Explorer is evolving and gradually turning into something bigger than just a regular online compiler. Its list of additional tools ("Add tool") is growing, enabling developers not only to compile source code but also to see the results of checking it with static analyzers such as clang-tidy and PVS-Studio.

Of course, the tool should be used sensibly – do not expect the online service to be able to check a large project comprised of many files. A more natural approach in that case would be to use an installed version of the analyzer. Another thing you don't want to do is to evaluate analyzers' abilities based on how they perform on synthetic examples in Compiler Explorer because such tests are very likely to be inadequate.

In all other respects, Compiler Explorer is a very cool tool. You are free to play with analyzers as more tools become available. Check if and how a certain bug is reported by one analyzer or another. This may be promising from the perspective of both satisfying your curiosity and conducting research.

Compiler Explorer & education

And there's still more. If you are a teacher or a student, this section is for you. Compiler Explorer can be viewed as an environment where students can do simple lab assignments. Sure, this service has its limitations, but as far as one-file lab tasks are concerned, it's a perfect tool.

Just type the code, compile it, and watch the console output. Create a link to your work and share it with your teacher for assessment. You don't need to install anything on your computer to do that kind of task. You can actually complete and share your assignment from any device, even your smartphone (though it might be less convenient). I'm sure students will appreciate this service, especially during the exam period :).

Add to this the ability to immediately check the code with static analyzers. Many bugs/typos will be caught right off, while you will get an opportunity to teach yourself the good practice of using static analyzers.

Here's an example to show you how it's done in practice. Suppose you are studying arrays and loops and need to write a matrix-transpose program – here's what you write.

[Screenshot "ProgramBug": the matrix-transpose program with the bug, opened in Compiler Explorer]

As you can see, the program compiles and even starts. But it does something weird and won't print the modified matrix.

The good news is that PVS-Studio's warning about a potential typo appears in the bottom right corner. Indeed, the variable that gets incremented in the inner loop is i, not j. This typo leads to an infinite loop. Once the code is fixed, the program works as expected.

[Screenshot "ProgramBugFixed": the corrected matrix-transpose program in Compiler Explorer]

If you are a teacher, take note of this scenario of doing and assessing lab assignments – it may come in handy. This practice will also help your students get started with the static code analysis methodology.

Note. This approach won't be as convenient with more complex lab tasks or term papers, of course. In that case, you can install PVS-Studio on your computer and use a free license. See the "Free PVS-Studio for Students and Teachers" section for details.

In the next section, we'll talk about bugs and find out why we need to use static code analyzers.

C++ developers underestimate simple bugs

If you asked a hypothetical C++ programmer what bugs they thought were most common, they would most likely name null pointers, division by zero, undefined behavior, array-index-out-of-bounds, and uninitialized variables. But in the real world, things are quite different. That list is rather a collection of error patterns that we all learned about while studying programming or from books. In practice, however, quite different types of bugs take a lot of effort and time to find and fix and, for some reason, still don't get much attention in discussions. Let's take a closer look at this interesting subject.

As was already said, our hypothetical programmer would most likely mention division-by-zero and uninitialized variables as the most pestering bugs. They do occur, of course, but the frequency of their occurrence is exaggerated to say the least.

The nature of our work implies checking tons of open-source projects. And it's my experience of examining projects that enables me to argue that bugs related to divisions by zero and accessing uninitialized memory are pretty rare.

First, there aren't many division-related bugs because you don't often use the division operation in the first place :). Second, everybody knows about these bugs and is usually careful to check the divisor value. Third, compilers are getting better and better at finding such bugs.

It's about the same thing with uninitialized variables. This pattern is well known, and developers generally write code that's clean in that respect. Fortunately, most agree that the scope of a variable should be as small as possible and that initializing a variable along with its declaration is good style. Besides, today's compilers are smart and will likely warn you about uninitialized variables.

So it turns out those types of bugs aren't found all that often in real-life projects. Then why do they come up so frequently in discussions? Perhaps those were the bugs that programmers ran into as beginners. Accidental use of uninitialized variables in one's very first programs is a pretty plausible scenario. And it's the very first experience that usually leaves the most lasting mark :). An alternative explanation is that those concerns date back to old books written at a time when even less safe programming languages were in use and compilers were poor at detecting potential defects.

Okay, but what about undefined behavior, null pointers, and array-index-out-of-bounds? These bugs are very common indeed and generally hard to catch, which makes programmers' concerns justifiable. Just look how many bug patterns based on null pointers alone we have collected across open-source projects: V522, V595, V757, V769, V1004, etc. Developers expect these issues to show up and they do. Let's move on to the most interesting category: the bugs which nobody talks about but which are awfully rife in programs.

I insist that there are entire classes of bugs which are the real bane but which don't get proper attention. I'm talking about typos.

You might say it's not a big deal. You surely know about typos, don't you? What's so secret about them? No secret, but this problem is hugely underestimated. If you want proof, just look at how test sets for evaluating analyzers are composed. I already wrote about that in the article "Why I Dislike Synthetic Tests". For example, Toyota ITC provides for all kinds of null-pointer bugs and even unnatural defects:

void null_pointer_006 ()
{
  int *p;
  p = (int *)(intptr_t)rand();
  *p = 1; /*Tool should detect this line as error*/
          /*ERROR:NULL pointer dereference*/
}

You don't really believe that a bug like that will ever occur in real code, do you? Yet the Toyota developers still write tests for such anomalies. At the same time, you won't find tests for typos like this:

static void gen_prov_ack(....)
{
  ....
  if (link.tx.cb && link.tx.cb)
  ....
}

This is an absolutely real bug – we found it in the Zephyr operating system.

Developers don't talk much about typos. This is probably because the problem of typos is less clearly definable than, say, that of null pointers. A null pointer is something you know and understand. If there's a path in your program where a pointer becomes null and gets dereferenced, this can be easily explained in a book, and you know how to find such bugs with static analysis.

Now, what's a typo? Hmm... Good question. It could be a misspelled variable name, misplaced parentheses, an incorrect array index, a botched copy-paste, or even confusion between & and &&. We have thousands of such bug patterns in our database. I'm not sure exactly which articles to refer you to, but this list here contains all those endless variations of typos.

I guess it's because typos can't be clearly defined and diagnosed that they keep escaping the attention of book authors and analyzer developers. Ordinary programmers seem to ignore them too because of their simplicity. They just won't take a typo seriously. "It's just a slip-up, it won't happen again!" With this mindset, they keep coding only to make another mistake and then waste time hunting it down :). And while we are at it, here's a link to the small post "The second myth - expert developers do not make silly mistakes" :).

Carelessness and lapses of concentration produce typos, and hunting those typos down wastes an incredible amount of time. To back up that statement, I refer you to three articles of mine, each of which is dedicated to a particular class of typos:

  1. Zero, one, two, Freddy's coming for you;
  2. The Evil within the Comparison Functions;
  3. The Last Line Effect.

Do read them and your life will never be the same again. Typos are everywhere, and they are not as harmless to the development process as you might think.

What are you to do with this newly acquired knowledge? Well, I can share at least four useful tips.

First: knowledge is power!

Now that you are aware of the threat, you are going to stay alert to typos from now on, which is already a big step forward. You'll be more likely to notice them when reviewing your teammates' code. Especially in the last lines :).

Second: table-style formatting

Bad formatting prevents many typos from being noticed when coding and reviewing the code. Just rewrite your code in columns or table-style and bugs will stick out. Here's an example to illustrate that. This is a bug detected in FreeBSD Kernel with the PVS-Studio analyzer:

MPASS(reg == A_TP_RXT_MIN || reg == A_TP_RXT_MAX ||
    reg == A_TP_PERS_MIN || reg == A_TP_PERS_MAX ||
    reg == A_TP_KEEP_IDLE || A_TP_KEEP_INTVL ||
    reg == A_TP_INIT_SRTT || reg == A_TP_FINWAIT2_TIMER);

Now let's format this code as follows:

MPASS(reg == A_TP_RXT_MIN ||
      reg == A_TP_RXT_MAX ||
      reg == A_TP_PERS_MIN ||
      reg == A_TP_PERS_MAX ||
      reg == A_TP_KEEP_IDLE ||
      A_TP_KEEP_INTVL ||
      reg == A_TP_INIT_SRTT ||
      reg == A_TP_FINWAIT2_TIMER);

The incorrect condition with the A_TP_KEEP_INTVL constant stands out clearly now.

I'd arrange it even more neatly:

MPASS(reg == A_TP_RXT_MIN   ||
      reg == A_TP_RXT_MAX   ||
      reg == A_TP_PERS_MIN  ||
      reg == A_TP_PERS_MAX  ||
      reg == A_TP_KEEP_IDLE ||
      A_TP_KEEP_INTVL       ||
      reg == A_TP_INIT_SRTT ||
      reg == A_TP_FINWAIT2_TIMER);

Such formatting takes time though. So I borrowed another table style for code formatting that prescribes writing logical operators separating conditions at the beginning of a line:

MPASS(   reg == A_TP_RXT_MIN
      || reg == A_TP_RXT_MAX
      || reg == A_TP_PERS_MIN
      || reg == A_TP_PERS_MAX
      || reg == A_TP_KEEP_IDLE
      || A_TP_KEEP_INTVL
      || reg == A_TP_INIT_SRTT
      || reg == A_TP_FINWAIT2_TIMER);

This style may feel somewhat unusual at first, but you'll get used to it quickly and find it very useful. Our team actually uses it in the code of PVS-Studio. So I do recommend adopting it – you'll see bugs pop up at the coding stage already.

I elaborated on this approach to formatting in Chapter 13 [Table-style formatting] in my mini-book "The Ultimate Question of Programming, Refactoring, and Everything".

Third: table-driven methods are good

If you deal with a lot of similar code blocks, you will probably find it best to program an algorithm using table-driven methods. These are nicely explained by Steve McConnell in his book "Code Complete" (Chapter 18):

A table-driven method is a scheme that allows you to look up information in a table rather than using logic statements (if and case) to figure it out. Virtually anything you can select with logic statements, you can select with tables instead. In simple cases, logic statements are easier and more direct. As the logic chain becomes more complex, tables become increasingly attractive.

Actually, I strongly recommend reading this wonderful book in full. If you are scared off by its size, then please do read Chapter 18 at least. And then take the trouble to write a function of your own using table-driven methods. Yes, this will take a bit more time, but it's going to pay off in the end. Adding new conditions will become easier and faster, and the risk of making a mistake will be much lower.

Fourth: use code analyzers

A static code analyzer is a diligent and never-tiring assistant capable of finding typos at the earliest stages. The key here is to use it regularly. Rather than running it from time to time, integrate the analyzer into your development process. Should you meet a skeptic, refer them to this article, where we counter the typical arguments against static analysis.

Thanks for reading. I hope I've managed to get you to look at software bugs and code review from a new perspective. You're welcome to try our code analyzer PVS-Studio. It can become your biggest helper in the search for typos, allowing you to focus code review on bugs in algorithms and on high-level examination of the code rather than on hunting for a misplaced parenthesis.

Get a month free trial of the PVS-Studio analyzer using the #mcpp promo code.

 

 
