Preprocessor – The Token Pasting (##) Operator

This is probably the least understood and most poorly documented preprocessor operator.

The token pasting (##) operator joins the token on its left to the token on its right, discarding any white space between them, to form a single new token. It can only be used in a macro definition.

It may not appear at the very start or the very end of the replacement text.

If the replacement text contains multiple token pasting operators (##) and / or stringizing operators (#), the order in which they are evaluated is unspecified.

Valid tokens are:

  • identifiers (variable names, function names, etc)
  • keywords (int, while, volatile, etc)
  • literals (strings, numbers, characters, true or false)
  • operators and punctuators (+, -, *, (, etc)

If the result of using ## is not a valid token, then the behaviour is undefined.
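As a rough illustration of the token kinds listed above, the sketch below pastes pieces into a keyword, a number and an operator. The macro names (answer_type, forty_two, shift_left) and the function paste_kinds_demo are made up for this example and do not come from any library.

```c
#include <assert.h>

/* Hypothetical object-like macros: each ## joins two tokens into one valid token. */
#define answer_type in ## t   /* in ## t -> int (an identifier that is a keyword) */
#define forty_two   4 ## 2    /* 4 ## 2  -> 42  (a number)                       */
#define shift_left  < ## <    /* < ## <  -> <<  (an operator)                    */

int paste_kinds_demo(void)
{
    answer_type n = forty_two;      /* expands to: int n = 42; */
    assert(n == 42);
    assert((1 shift_left 3) == 8);  /* expands to: (1 << 3) == 8 */
    return n;
}
```

Each paste here produces exactly one valid token, so this stays clear of the undefined behaviour described above.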

Consider the following macro definition, which uses the ## operator to join the separate pieces of the replacement text:

#define macro_start i ## n ##t m         ##ain(void)

After preprocessing, the name macro_start will be replaced by:

int main(void)

because the preprocessor will have processed the ## by

  1. removing all instances of it,
  2. eliminating the white space surrounding it,
  3. joining the tokens that were on either side of it.

While the token pasting (##) operator can be used with parameterless macros, it doesn’t make any sense to do so: you can just type what you need without using the ## operator to concatenate it.

Its true power becomes evident when you use it to concatenate the parameters you pass to the macro. (As far as the preprocessor is concerned, everything it handles is just a bunch of text.)

Normally, it is used for automatically creating new identifiers. For example,

#define my_macro(x) x##_macrofied

will concatenate the parameter passed (x) and append the suffix _macrofied to it.

my_macro(identifier)

will be expanded into:

identifier_macrofied
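A minimal sketch of how such generated identifiers might be used in practice. The names counter, get and my_macro_demo are invented for illustration; only my_macro itself comes from the text.

```c
#include <assert.h>

/* The macro from the text: pastes the argument and the suffix into one identifier. */
#define my_macro(x) x##_macrofied

/* Expands to: int counter_macrofied = 10; */
int my_macro(counter) = 10;

/* Expands to: int get_macrofied(void) { ... } */
int my_macro(get)(void)
{
    return counter_macrofied;  /* the pasted name is an ordinary identifier */
}

void my_macro_demo(void)
{
    /* The macro spelling and the pasted spelling name the same things. */
    assert(my_macro(counter) == 10);
    assert(get_macrofied() == 10);
}
```

Once pasted, the new name behaves like any identifier typed by hand, so it can be written out directly elsewhere in the code.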

Compiler Differences in handling ##

GCC and Visual C++ differ in how they handle a ## whose result is not a valid token.

GCC is strict if the resulting concatenation is not a valid preprocessing token – it issues an error during compilation.

Visual C++, on the other hand, reprocesses the concatenation result and will accept constructs that are deemed invalid by GCC.

Both compilers are working correctly, since the standard does not define how an invalid token is to be handled (it just says the behaviour is undefined). Rejecting it is fine as is reprocessing the result and parsing it into valid tokens.

Example:

The following will fail with GCC but succeed with Visual C++

#define macro_start int main ## (void)

After the preprocessor has concatenated main and ( we have the token

main(

which is not a valid token.

GCC rejects it.

On the other hand, Visual C++ reprocesses it producing two tokens: 1) an identifier main and 2) the punctuator / operator (.

Both compilers will correctly process the following

#define macro_increment(x) x+ ## +

because the result consists of two valid tokens: the identifier x and the operator ++ (formed by pasting + and +).
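The portable case can be tried out with a small sketch like the one below. The function name increment_demo is made up for this example, and the non-portable definition is left inside a comment so that the file compiles with either compiler.

```c
#include <assert.h>

/* From the text: + ## + forms the single, valid token ++. */
#define macro_increment(x) x+ ## +

/* Left commented out on purpose, because pasting main and ( does not
   give one valid token and GCC rejects it:
   #define macro_start int main ## (void)
*/

int increment_demo(int n)
{
    macro_increment(n);  /* expands to: n ++; */
    return n;
}
```

Here increment_demo(1) returns 2, because the pasted ++ is applied to n as a postfix increment.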

Why Use it?

It is generally used to reduce repetitive (thus error prone) typing.

The following code sample defines a macro that creates a new scalar type in C / C++ and creates 6 custom named functions for that type (initialization, addition, subtraction, multiplication, division, and raw value access). Without the ## operator, all of this would have to be either typed in by hand (very repetitive, boring and error prone if you are defining many types) or produced by a copy / paste / search and replace operation (still repetitive, boring and error prone if you are defining multiple types).

#define new_scalar_type(name, type) \
typedef struct \
{ \
    type value; \
} name; \
static inline name name##_(type v) \
{ \
    name t; \
    t.value = v; \
    return t; \
} \
static inline name add_##name(name a, name b) \
{ \
    name t; \
    t.value = a.value + b.value; \
    return t; \
} \
static inline name sub_##name(name a, name b) \
{ \
    name t; \
    t.value = a.value - b.value; \
    return t; \
} \
static inline name mul_##name(name a, name b) \
{ \
    name t; \
    t.value = a.value * b.value; \
    return t; \
} \
static inline name div_##name(name a, name b) \
{ \
    name t; \
    t.value = a.value / b.value; \
    return t; \
} \
static inline type value_##name(name a) \
{ \
    return a.value; \
}

When you use the macro in your code:

new_scalar_type(age, int);

you get

  • a new type called age, represented by an int
  • an initializing function age age_(int v)
  • an addition function age add_age(age a, age b)
  • a subtraction function age sub_age(age a, age b)
  • a multiplication function age mul_age(age a, age b)
  • a division function age div_age(age a, age b)
  • a raw value access function int value_age(age a)

If you define another type, say weight_lbs, you get that new type along with its own set of custom named functions.

You can now define a new type and use it as follows:

new_scalar_type(age, int);
int main (void)
{
    age a = age_(42);
    age b = age_(24);
    age c = add_age(a, b);
    return value_age(c);
}
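To illustrate the weight_lbs remark, here is a trimmed-down sketch of the same idea that keeps only construction, addition and value access. The macro name new_scalar_type_min and the function total_weight_demo are hypothetical, and static inline is an assumption of this sketch so that it also links when built as C.

```c
#include <assert.h>

/* Hypothetical, shortened variant of new_scalar_type (three functions only). */
#define new_scalar_type_min(name, type) \
typedef struct \
{ \
    type value; \
} name; \
static inline name name##_(type v) \
{ \
    name t; \
    t.value = v; \
    return t; \
} \
static inline name add_##name(name a, name b) \
{ \
    name t; \
    t.value = a.value + b.value; \
    return t; \
} \
static inline type value_##name(name a) \
{ \
    return a.value; \
}

/* Two instantiations produce two independent sets of names. */
new_scalar_type_min(age, int)
new_scalar_type_min(weight_lbs, double)

double total_weight_demo(void)
{
    weight_lbs crate  = weight_lbs_(12.5);
    weight_lbs pallet = weight_lbs_(40.0);
    /* add_weight_lbs was generated by pasting add_ and weight_lbs. */
    return value_weight_lbs(add_weight_lbs(crate, pallet));
}
```

Each use of the macro pastes the type name into a fresh set of function names, so the functions for age and weight_lbs can live side by side without clashing.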

Another Example

The following macro (called convert) can be used to automatically generate a function that converts from one unit to another (for example, °F to °C or feet to inches, etc):

#define convert(from, to, conversion, from_type, to_type) \
to_type convert_##from##_to_##to(from_type from) \
{ \
   return conversion; \
}

It takes 5 parameters:

  1. from – a descriptive name of the unit we are converting from; it also becomes the name of the generated function's parameter
  2. to – a descriptive name of the unit we are converting to
  3. conversion – the conversion expression, written in terms of from (yes, macro arguments can be complex)
  4. from_type – the type we are converting from
  5. to_type – the type we are converting to

The macro would be used as follows:

convert(f, c, (f-32)*5.0/9.0, float, float);
convert(ft, in, ft * 12, int, int);

When the macros are expanded, we get two functions called convert_f_to_c and convert_ft_to_in. These can be used as follows:

int main (void)
{
    float a = 70.0;
    float b;
    int c = 3;
    int d;
    b = convert_f_to_c (a);
    d = convert_ft_to_in(c);
    return 0;
}
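A self-contained sketch of the two conversions with checkable values follows. It assumes a variant of convert in which the generated function's parameter is named after the from argument, so that each conversion expression can refer to its own unit name (f or ft). It also uses double rather than float, purely to keep the comparisons exact.

```c
#include <assert.h>

/* Assumed variant: the function parameter takes its name from `from`. */
#define convert(from, to, conversion, from_type, to_type) \
to_type convert_##from##_to_##to(from_type from) \
{ \
    return conversion; \
}

/* Generates: double convert_f_to_c(double f) { return (f - 32) * 5.0 / 9.0; } */
convert(f, c, (f - 32) * 5.0 / 9.0, double, double)

/* Generates: int convert_ft_to_in(int ft) { return ft * 12; } */
convert(ft, in, ft * 12, int, int)

void convert_demo(void)
{
    assert(convert_f_to_c(212.0) == 100.0);  /* boiling point of water */
    assert(convert_f_to_c(32.0) == 0.0);     /* freezing point of water */
    assert(convert_ft_to_in(3) == 36);
}
```

Note that the _to_ in the middle of the pasted name is a single identifier token of its own, so the macro parameter named to is not substituted into it.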

Historical note: some (very) old compilers accepted /**/ for token pasting (this was before the time of the C89 standard). If you are browsing through some old C code, you might find code like this:

#define my_macro(x, y) x/**/y

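This worked because those preprocessors deleted comments outright instead of replacing each one with a single space, as the standard now requires. With the comment gone, nothing separated the two parameters, so after parameter replacement their values ran together into a single token.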

I know of no current compilers that accept this in their default, standard-conforming mode.