Type punning with unions

Posted on 2014-05-25

TLDR: don’t use unions for type punning; always use memcpy.

Sometimes you might want to reinterpret a value of one type as a value of another type. For example, you might have an integer parameter but know that it actually contains a float value.

Let’s assume the integer is large enough to hold the float somehow, and now you want to get the value back. If you follow Wikipedia, you might do something like this:

float get_float(int i) {
    union {
        int i;
        float f;
    } x;
    x.i = i;
    return x.f;
}

Usually this works fine: as long as you access the values directly through the union members, probably no compiler will break it. (Whether this is actually guaranteed to work differs between language versions and between C and C++.)

But if you start using pointers (or references in C++), this can fail very quickly:

int return0(int *i, float *f) {
    *i = 0;
    *f = 1;
    return *i;
}

int test_union_type_punning() {
    union {
        int i;
        float f;
    } x;
    return return0(&x.i, &x.f);
}

Compiling this with gcc 4.9.1 (gcc -Wall -O2 -S type-punning.c), I get:

return0:
    .cfi_startproc
    movl    $0, (%rdi)
    xorl    %eax, %eax
    movl    $0x3f800000, (%rsi)
    ret
    .cfi_endproc

test_union_type_punning:
    .cfi_startproc
    xorl    %eax, %eax
    ret
    .cfi_endproc

Although i and f in return0 refer to the same memory location, and the write to *f comes after the write to *i, the compiler returns 0 without reading *i again: the store of 1.0 (0x3f800000) is still emitted, but the compiler assumes it cannot have changed *i. This is called “strict aliasing”: the compiler may assume that a write through a float pointer does not modify any int, so the int value written before must still be there.

Even after inlining return0 into test_union_type_punning, where both pointers obviously point into the same union, the compiler doesn’t notice, and it doesn’t print a warning either.

That is why I always prefer punning with memcpy like this:

#include <string.h>

float get_float(int i) {
    float f;
    memcpy(&f, &i, sizeof(f));
    return f;
}

With gcc -Wall -O2 -mtune=ivybridge and clang (same options), both versions of get_float (union and memcpy) compile to the same code:

get_float:
    .cfi_startproc
    movd    %edi, %xmm0
    ret
    .cfi_endproc
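
For comparison, here is a sketch of return0 rewritten with memcpy. The name return_bits and the single buffer argument are my own illustration, and it assumes int and float are both 32 bits with IEEE-754 floats. Since memcpy accesses the storage as bytes, the compiler can no longer assume that the second store leaves the int untouched:

```c
#include <assert.h>
#include <string.h>

/* Assumption: int and float have the same size (true on common x86-64 ABIs). */
_Static_assert(sizeof(int) == sizeof(float), "int and float must have the same size");

/* Hypothetical counterpart to return0: both stores hit the same storage, but
 * through memcpy, which accesses it as bytes and may alias anything. */
int return_bits(void *p) {
    int i = 0;
    float f = 1;
    assert(p != NULL);
    memcpy(p, &i, sizeof(i)); /* store an int ...                      */
    memcpy(p, &f, sizeof(f)); /* ... overwrite it with a float ...     */
    memcpy(&i, p, sizeof(i)); /* ... and read the bytes back as an int */
    return i;
}
```

Called with a buffer of sizeof(int) bytes, it should return 0x3f800000 (the bit pattern of 1.0f that also shows up in the assembly above) instead of a stale 0.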

So if you have the option, avoid unions for type punning: memcpy is safer, makes the intention clearer, and shouldn’t have any performance drawbacks.
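
If the integer side is under your control, a fixed-width type makes the size assumption explicit. A minimal sketch; the helper names float_from_bits and bits_from_float are made up here, and the static assertion assumes float is 32 bits wide:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is exactly 32 bits; compilation fails otherwise. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: reinterpret 32 raw bits as a float. */
static inline float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

/* Hypothetical helper: the opposite direction, float to raw bits. */
static inline uint32_t bits_from_float(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

With IEEE-754 floats, float_from_bits(0x3f800000) is 1.0f, and bits_from_float(-2.0f) is 0xc0000000.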


Some notes on SHA-3

Posted on 2014-05-15

What is SHA-3?

SHA-3 is a family of cryptographic hash functions (also see SHA-3 on Wikipedia). NIST selected Keccak as the algorithm for SHA-3 in October 2012 and released a draft specifying the details in April 2014. I think the algorithm details are final.

How does it work?

Keccak consists of two components:

  1. Permutation functions Keccak-f for specific state sizes (1600, 800, 400, 200, 100, 50 and 25 bits); for SHA-3 only Keccak-f[1600] is actually used.
  2. A “sponge” construction: a sponge requires a transformation function (like Keccak-f[1600]) and a capacity parameter.

    The sponge maintains a state (as large as the transformation function handles, initialized to zero). The state is divided into a “rate” part and a “capacity” part; the rate size is the state size minus the capacity.

    Each input and output block is of rate size. When a block comes in, it is bitwise XORed into the “rate” part of the state, and then the transformation is run on the whole state (this is called “absorbing” a block). To output a block, the “rate” part is simply copied into the output block (“squeezing”), followed by a transformation run on the state.

    As the input data is usually not aligned to the block size, it is always padded: first a 1 bit is appended, then as many 0 bits as needed to leave exactly one bit free in the last block, and finally another 1 bit. This leads to a minimum padding of two bits ("11") and a maximum of (rate + 1) bits; this padding is named pad10*1.

    To generate output of a certain length one just squeezes blocks until the length is reached and truncates if necessary.
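
The pad10*1 length rule can be checked with a little arithmetic. A sketch of my own (the name pad10star1_bits is hypothetical; lengths are in bits, and a rate of 1088 bits is what a 512-bit capacity leaves of the 1600-bit Keccak-f[1600] state):

```c
#include <assert.h>
#include <stddef.h>

/* Number of bits pad10*1 appends to a message of msg_bits bits for a sponge
 * with the given rate (in bits): a 1, then j zero bits, then a final 1, with
 * j the smallest value making msg_bits + j + 2 a multiple of rate. */
size_t pad10star1_bits(size_t msg_bits, size_t rate) {
    assert(rate >= 2);
    size_t j = (rate - (msg_bits + 2) % rate) % rate;
    return j + 2;
}
```

A 1086-bit message needs only 2 padding bits, a 1087-bit message needs 1089 bits (rate + 1), and an empty message gets a full block of 1088 padding bits.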

This results in the basic Keccak primitive for a capacity of c bits and a certain output length:

Keccak[c](msg, output length) := Sponge[Keccak-f[1600], c](pad10*1(msg), output length)

The output length could be “infinite”, resulting in an endless bit stream (for use in a stream cipher or as PRNG).

Security

An algorithm is said to provide n bits of security (for a specific property) if an attack requires 2^n steps (i.e. requires bruteforcing n bits).

To provide n bits of security (preimage, second preimage and collision) with a sponge construction (assuming the transformation function is secure) the capacity has to be 2*n bits, and the output has to be of length 2*n bits. If the output is only n bits, preimage and second preimage are still n bit secure, but collision security drops to n/2 bits (“birthday paradox”).
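
The rule can be summarized as two min() formulas. This sketch is my own reading of the generic sponge bounds described above (the function names are hypothetical), and it ignores any weakness of the transformation function itself:

```c
#include <assert.h>

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Generic security (in bits) of a sponge with capacity c and output length d:
 * the capacity bounds everything at c/2, the output length bounds
 * (second) preimage resistance at d and collision resistance at d/2. */
unsigned preimage_bits(unsigned c, unsigned d)  { return min_u(c / 2, d); }
unsigned collision_bits(unsigned c, unsigned d) { return min_u(c / 2, d / 2); }
```

For a capacity of 512 bits and 256 bits of output this gives 256-bit preimage but only 128-bit collision resistance; with 512 bits of output both are 256 bits.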

# || denotes concatenation of bit strings
SHA3-224(msg) := Keccak[448](msg || 01, 224)
SHA3-256(msg) := Keccak[512](msg || 01, 256)
SHA3-384(msg) := Keccak[768](msg || 01, 384)
SHA3-512(msg) := Keccak[1024](msg || 01, 512)
SHAKE128(msg, output length) := Keccak[256](msg || 1111, output length)
SHAKE256(msg, output length) := Keccak[512](msg || 1111, output length)

All functions use a capacity providing n bits of security, where n is the number in the function name. However, the output length of the SHA3-n functions is only n bits, restricting collision resistance to n/2 bits. There is no way around this, as the SHA3-n functions are intended to be used as replacements for the SHA-n functions from the SHA-2 family.

The different suffixes appended to the message before padding distinguish different uses of the same Keccak[c] function (domain separation): the suffix "01" represents the domain of the SHA3-n functions, which take a simple message. The SHAKE suffix "1111" consists of "11", which represents the SHAKE functions, and another "11" because SHAKE takes a Sakura padded message; in Sakura the suffix "11" represents a single message (Sakura also supports tree hashing).
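
To make the sponge and the suffixes concrete, here is a compact, unoptimized sketch of Keccak[c] in C for byte-aligned messages. All function names are mine; the round constants, rotation offsets and lane order are the standard Keccak-f[1600] tables. At byte granularity, with bits taken least significant first, the SHA3 suffix "01" plus the first padding bit becomes the byte 0x06, and the SHAKE suffix "1111" plus that bit becomes 0x1F; the final padding bit is 0x80 in the last byte of the block. It is meant for illustration only, not as a vetted or constant-time implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROTL64(x, n) (((x) << (n)) | ((x) >> (64 - (n))))

/* Standard Keccak-f[1600] round constants, rho offsets and pi lane order. */
static const uint64_t keccak_rc[24] = {
    0x0000000000000001, 0x0000000000008082, 0x800000000000808a,
    0x8000000080008000, 0x000000000000808b, 0x0000000080000001,
    0x8000000080008081, 0x8000000000008009, 0x000000000000008a,
    0x0000000000000088, 0x0000000080008009, 0x000000008000000a,
    0x000000008000808b, 0x800000000000008b, 0x8000000000008089,
    0x8000000000008003, 0x8000000000008002, 0x8000000000000080,
    0x000000000000800a, 0x800000008000000a, 0x8000000080008081,
    0x8000000000008080, 0x0000000080000001, 0x8000000080008008};
static const int keccak_rotc[24] = {1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
                                    27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44};
static const int keccak_piln[24] = {10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
                                    15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1};

/* Keccak-f[1600]: 24 rounds of theta, rho+pi, chi and iota on 25 lanes. */
static void keccak_f1600(uint64_t st[25]) {
    uint64_t bc[5], t;
    for (int r = 0; r < 24; r++) {
        for (int i = 0; i < 5; i++) /* theta: column parities */
            bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] ^ st[i + 20];
        for (int i = 0; i < 5; i++) {
            t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1);
            for (int j = 0; j < 25; j += 5) st[j + i] ^= t;
        }
        t = st[1]; /* rho and pi: rotate each lane and move it */
        for (int i = 0; i < 24; i++) {
            int j = keccak_piln[i];
            bc[0] = st[j];
            st[j] = ROTL64(t, keccak_rotc[i]);
            t = bc[0];
        }
        for (int j = 0; j < 25; j += 5) { /* chi: nonlinear row mixing */
            for (int i = 0; i < 5; i++) bc[i] = st[j + i];
            for (int i = 0; i < 5; i++)
                st[j + i] ^= (~bc[(i + 1) % 5]) & bc[(i + 2) % 5];
        }
        st[0] ^= keccak_rc[r]; /* iota */
    }
}

/* XOR byte b into byte position pos of the state (lanes are little-endian). */
static void xor_byte(uint64_t st[25], size_t pos, uint8_t b) {
    st[pos / 8] ^= (uint64_t)b << (8 * (pos % 8));
}

/* Keccak[c](msg || suffix, out_len) for byte-aligned messages; `suffix` holds
 * the suffix bits plus the first pad10*1 bit (0x06 for SHA3, 0x1F for SHAKE). */
void keccak_sponge(size_t capacity_bits, uint8_t suffix, const uint8_t *msg,
                   size_t len, uint8_t *out, size_t out_len) {
    assert(capacity_bits % 64 == 0 && capacity_bits < 1600);
    uint64_t st[25] = {0};
    size_t rate = (1600 - capacity_bits) / 8, pos = 0; /* rate in bytes */
    for (size_t i = 0; i < len; i++) { /* absorb */
        xor_byte(st, pos++, msg[i]);
        if (pos == rate) { keccak_f1600(st); pos = 0; }
    }
    xor_byte(st, pos, suffix);    /* suffix + first padding bit */
    xor_byte(st, rate - 1, 0x80); /* final padding bit */
    keccak_f1600(st);
    for (size_t i = 0, p = 0; i < out_len; i++, p++) { /* squeeze */
        if (p == rate) { keccak_f1600(st); p = 0; }
        out[i] = (uint8_t)(st[p / 8] >> (8 * (p % 8)));
    }
}
```

With this sketch, keccak_sponge(512, 0x06, msg, len, out, 32) should compute SHA3-256, and keccak_sponge(256, 0x1F, msg, len, out, n) SHAKE128 with n bytes of output.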

Possible NIST manipulations / History

The original Keccak submission used the same capacity for the SHA3-n functions as are now in the draft. For the variable output length function it used a capacity of 576 bits, leading to a rate of 1024 bits. Also the suffixes were not part of the proposal.

In 2013 NIST proposed to use capacity 256 for SHA3-224, SHA3-256 and SHAKE256, and capacity 512 for SHA3-384, SHA3-512 and SHAKE512 (the SHAKE functions were then named after their capacity).

While the parameter selection itself would have been more consistent (the proposed SHA3-512(msg) := Keccak[512](msg, 512) would have been 256-bit secure against everything), this would have reduced preimage and second preimage resistance below what the corresponding functions in the SHA-2 family provided.

Due to negative feedback from the crypto community, NIST announced in November 2013 that it would use the capacities from the original submission for the SHA3-n functions, and renamed the SHAKE functions to reflect their bits of security instead of their capacity.

So two changes compared to the final submission remain:

  • The suffixes for domain separation (this was proposed by the Keccak team after the submission, not by NIST itself).
  • SHAKE256 (512-bit capacity) is a little less secure than the originally proposed variable-output Keccak[] with a capacity of 576 bits; on the other hand, most people agree that aiming for more than 256-bit security is pointless.

    The Keccak authors wrote:

    In the Keccak design philosophy, safety margin comes from the number of rounds in Keccak-f, whereas the security level comes from the selected capacity.

As far as I can see SHA-3 is as trustworthy as the original Keccak submission.

Which SHA-3 hash function to select

If you don’t need to be compatible with the SHA-2 functions, I’d recommend using SHAKE256 with 512 bits of output (for 256-bit security) instead of SHA3-512: it is nearly twice as fast (a rate of 1088 bits vs 576), and more than 256-bit security just wastes resources.

If you are only interested in 128-bit collision resistance (because you don’t want to “waste” space storing a longer output), SHA3-256 is a good choice: SHAKE128 with 256 bits of output is not much faster (a rate of 1344 bits vs 1088), and the 256-bit preimage security of SHA3-256 could be worth the cost.
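
The speed comparison boils down to how many Keccak-f[1600] calls a message needs, since the rate is 1600 minus the capacity. A rough sketch (the function name is mine; it counts absorb calls only and ignores extra squeeze calls for long outputs and any implementation overhead):

```c
#include <assert.h>
#include <stddef.h>

/* Number of Keccak-f[1600] calls needed to absorb a message of msg_bits bits
 * with a domain suffix of suffix_bits bits (2 for SHA3, 4 for SHAKE). */
size_t absorb_calls(size_t msg_bits, size_t suffix_bits, size_t capacity_bits) {
    assert(capacity_bits < 1600);
    size_t rate = 1600 - capacity_bits;
    size_t min_len = msg_bits + suffix_bits + 2; /* message + suffix + minimal pad10*1 */
    return (min_len + rate - 1) / rate;          /* round up to whole blocks */
}
```

For a 1 MiB message (8388608 bits) this gives 14564 calls for SHA3-512 (capacity 1024) and 7711 for SHAKE256 (capacity 512), a ratio of about 1.89.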



Fix ssh command quoting

Posted on 2013-01-30

SSH only accepts a single string as the command to send to the remote end (footnote 1). In other words, ssh has to concatenate all arguments with a space as separator.

Example:

$ ssh stbuehler.de echo Hello World
Hello World

In this case, my local ssh program gets the command ['echo', 'Hello', 'World'] from the system, builds the command string 'echo Hello World' from it and sends it to my server. The SSH server there hands this command string to the shell, which splits it into the 3 separate parts again.
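
What the local side does can be modelled as a plain join. This is an illustration of the behaviour described above, not OpenSSH’s actual code; join_args is a made-up name:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins argv-style arguments with single spaces, without any quoting.
 * Returns a malloc'ed string the caller must free, or NULL on failure. */
char *join_args(int argc, char *const argv[]) {
    assert(argc >= 0);
    size_t len = 1;                      /* terminating NUL */
    for (int i = 0; i < argc; i++)
        len += strlen(argv[i]) + 1;      /* argument + separator */
    char *cmd = malloc(len);
    if (cmd == NULL)
        return NULL;
    cmd[0] = '\0';
    for (int i = 0; i < argc; i++) {
        if (i > 0)
            strcat(cmd, " ");
        strcat(cmd, argv[i]);
    }
    return cmd;
}
```

Joining {"echo", "Hello", "World"} yields "echo Hello World", and the single argument {"echo Hello World"} yields exactly the same string, so the server cannot tell the two invocations apart.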

You can see the command string if you pass ssh the -v option and look for the line starting with debug1: Sending command:.

If I run the “same” command on my local system, I get the same output:

$ echo Hello World
Hello World

Now let’s try something else:

$ ssh stbuehler.de 'echo Hello World'
Hello World

$ 'echo Hello World'
bash: echo Hello World: command not found

What happened here?
With ssh, my local ssh program got ['echo Hello World'], but the command string it sent was the same as before, so the server printed the same line.

Locally, however, the shell sees the quotes and doesn’t split the string (that is what quotes are for), so it looks for a command literally named 'echo Hello World'.

This behaviour allows tricks like this one:

$ ssh stbuehler.de 'echo *'
[list of the (visible) files in the home directory on the server]

Locally, '*' isn’t expanded because it is quoted, but the remote end no longer sees the quotes, so the remote shell expands it.

So what is the problem?

$ printf '%-10s %s\n' Hello World
Hello      World

printf is a nice tool to output stuff in a formatted way. Now let’s try that with ssh:

$ ssh stbuehler.de printf '%-10s %s\n' Hello World; echo X
%sn       Hello     World     X

I added the echo X so you can see that my server didn’t even print a newline.

This is not what I wanted to see though – how did this happen?

$ ssh -v stbuehler.de printf '%-10s %s\n' Hello World
[...]
debug1: Sending command: printf %-10s %s\\n Hello World
[...]
$ printf %-10s %s\\n Hello World; echo X
%s\n      Hello     World     X

As ssh loses the quotes around my arguments (which contained spaces), the remote shell splits my first argument (the format string) into pieces, which leads to a completely different result.

I think this is a bug: you would expect a command run through ssh to work the same way as it does locally. For that, the command should have been designed as a list of strings in the SSH protocol. Nobody will fix this now, of course, so we have to work around it.

On Stack Overflow someone had the same problem, and the answers show how to work around it. But I wanted a replacement “ssh” program that fixes this for me. I named it sshsystem, and it works:

$ sshsystem stbuehler.de printf '%-10s %s\n' Hello World
Hello      World

Yes!

Other fun you can have:

$ ssh stbuehler.de echo Foo ';' echo Bar
Foo
Bar
$ ssh stbuehler.de echo Foo $'\n' echo Bar
Foo
Bar
$ sshsystem stbuehler.de echo Foo $'\n' echo Bar
Foo 
 echo Bar

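(The shell on the other side can execute more than one command, separated with ; or \n. These must be quoted, of course, otherwise your local shell will interpret them. With sshsystem they stay quoted for the remote shell too, so they just become arguments to echo.)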

Source is available at gist.github.com/4672115, on Stack Overflow and below.

#!/bin/bash

# quote command in ssh call to prevent remote side from expanding any arguments
# uses bash printf %q for quoting - no idea how compatible this is with other shells.
# http://stackoverflow.com/questions/6592376/prevent-ssh-from-breaking-up-shell-script-parameters

sshargs=()

while (( $# > 0 )); do
    case "$1" in
    -[1246AaCfgKkMNnqsTtVvXxYy])
        # simple argument
        sshargs+=("$1")
        shift
        ;;
    -[bcDeFIiLlmOopRSWw])
        # argument with parameter
        sshargs+=("$1")
        shift
        if (( $# == 0 )); then
            echo "missing parameter for option" >&2
            exit 99
        fi
        sshargs+=("$1")
        shift
        ;;
    -[bcDeFIiLlmOopRSWw]*)
        # argument with parameter appended without space
        sshargs+=("$1")
        shift
        ;;
    --)
        # end of arguments
        sshargs+=("$1")
        shift
        break
        ;;
    -*)
        echo "unrecognized argument: '$1'" >&2
        exit 99
        ;;
    *)
        # end of arguments
        break
        ;;
    esac
done


# user@host
if (( $# == 0 )); then
    echo "missing destination (user@host)" >&2
    exit 99
fi
sshargs+=("$1")
shift

# command - quote
if (( $# > 0 )); then
    # no need to make COMMAND an array - ssh will merge it anyway
    COMMAND=
    while (( $# > 0 )); do
        arg=$(printf "%q" "$1")
        COMMAND="${COMMAND} ${arg}"
        shift
    done
    sshargs+=("${COMMAND}")
fi

exec ssh "${sshargs[@]}"
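
The script’s comment already warns that printf %q is bash-specific; for example it emits $'\n' for a newline, which not every remote shell understands. A more portable alternative is POSIX single-quote escaping. The following C sketch swaps that technique in for %q; quote_args is a hypothetical name, and this is not how sshsystem itself works:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds a command string in which every argument is wrapped in single
 * quotes; an embedded ' is written as '\'' (close quote, escaped quote,
 * reopen quote). POSIX shells undo this exactly, so the remote shell sees
 * the original arguments. Returns a malloc'ed string, or NULL on failure. */
char *quote_args(int argc, char *const argv[]) {
    assert(argc >= 0);
    size_t len = 1;                                /* terminating NUL */
    for (int i = 0; i < argc; i++) {
        len += 3;                                  /* two quotes + separator */
        for (const char *s = argv[i]; *s; s++)
            len += (*s == '\'') ? 4 : 1;           /* ' becomes '\'' */
    }
    char *cmd = malloc(len);
    if (cmd == NULL)
        return NULL;
    char *p = cmd;
    for (int i = 0; i < argc; i++) {
        if (i > 0)
            *p++ = ' ';
        *p++ = '\'';
        for (const char *s = argv[i]; *s; s++) {
            if (*s == '\'') {
                memcpy(p, "'\\''", 4);             /* close, escaped ', reopen */
                p += 4;
            } else {
                *p++ = *s;
            }
        }
        *p++ = '\'';
    }
    *p = '\0';
    return cmd;
}
```

The resulting string would be handed to ssh as a single command argument, the same way the script passes ${COMMAND}; a POSIX shell on the other side turns 'it'\''s' back into it's.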

1 RFC 4254, The Secure Shell (SSH) Connection Protocol, section 6.5: Starting a Shell or a Command


Generated using nanoc and bootstrap - Last content change: 2013-08-16 14:47