Rationale

Pyjamask aims to provide symmetric (authenticated) encryption with fast software implementations and a high level of security against side-channel attacks. To achieve this goal, Pyjamask has been designed to be as lightweight as possible in the presence of high-order masking in software, while still allowing unmasked and/or hardware implementations with satisfactory performance.

Several works have recently shown that the best performance for high-order masked implementations is obtained through the use of bitslicing. In such implementations, the non-linear layer is computed through a sequence of bitwise operations. The performance is then highly correlated with the number of bitwise AND operations: in standard Boolean masking schemes, the cost of a masked AND grows quadratically with the number of shares, whereas the cost of a masked XOR grows only linearly.
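
To make the cost asymmetry concrete, the following is a minimal sketch of Boolean masking on 32-bit words. It assumes an ISW-style masked AND, which is one standard construction and is not necessarily the scheme used in any particular Pyjamask implementation; the function names (`share`, `unshare`, `masked_xor`, `masked_and`) are ours and purely illustrative.

```python
import secrets
from functools import reduce

def share(x, n):
    """Split a 32-bit word x into n Boolean shares whose XOR equals x."""
    shares = [secrets.randbits(32) for _ in range(n - 1)]
    shares.append(reduce(lambda a, b: a ^ b, shares, x))
    return shares

def unshare(shares):
    """Recombine shares by XORing them together."""
    return reduce(lambda a, b: a ^ b, shares, 0)

def masked_xor(a, b):
    """Masked XOR: share-wise, n word XORs and no fresh randomness."""
    return [ai ^ bi for ai, bi in zip(a, b)]

def masked_and(a, b):
    """ISW-style masked AND (assumed scheme, for illustration only).

    Uses n*n word ANDs and n*(n-1)/2 fresh random words for n shares.
    """
    n = len(a)
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbits(32)  # fresh mask for the pair (i, j)
            c[i] ^= r
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c
```

With n shares, `masked_xor` performs n word operations while `masked_and` performs n² word ANDs plus the associated XORs and random draws, which is why the AND count of the nonlinear layer dominates the cost at high masking orders.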

Pyjamask has been designed to enjoy such fast bitslice implementations in the presence of high-order masking. Specifically, we have favored:

  • a minimal number of 32-AND operations (bitwise ANDs on 32-bit words) for efficient implementation on 32-bit platforms,
  • a degree of parallelism suited to 64-bit platforms and/or processors with vector instructions,
  • a design with reasonable performance for unmasked and/or hardware implementations,
  • a design that relies on the well-studied SPN architecture (Sbox layer, linear diffusion layer, and bitwise key addition).

To fulfill these criteria, we have opted for a design based on the following choices:

  • The nonlinear layer is composed of 32 parallel applications of a small Sbox, either a 3-bit or a 4-bit Sbox, which yields two instances of the cipher with either a 96-bit state or a 128-bit state. For each instance, the Sbox has the minimal cost in terms of AND gates, i.e., 3 and 4 respectively. This gives a nonlinear layer that can be evaluated with 3 or 4 bitwise AND operations in total.
  • The 4-bit Sbox allows the AND gates to be parallelized: it can be evaluated with two pairs of parallel AND gates. As a result, the nonlinear layer of Pyjamask-128 can be evaluated with two 64-AND operations in total, which makes it particularly well suited for 64-bit architectures (or processors with vector instructions).
  • Since linear parts are virtually free in the masking world, the linear layer of the Pyjamask block cipher has been conceived to provide high diffusion by means of 32×32 binary matrices. Different matrices are used for the different 32-bit slices in order to avoid too much regularity. On the other hand, we chose to use circulant matrices to obtain acceptable performance for unmasked and/or hardware implementations.
  • The key schedule of the cipher has been designed to only involve linear operations for optimal performance in the presence of masking.
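
The design choices above can be illustrated with a short bitsliced sketch. It is not the Pyjamask specification: the 3-bit map used here is the χ map (known from Keccak), chosen only as a stand-in permutation that also costs three ANDs, and all function names and any row constant passed to `circulant_mul` are hypothetical. The bit-numbering convention (bit 0 is the least significant bit, row i of the matrix is the first row shifted by i positions) is likewise an assumption made for this example.

```python
MASK32 = 0xFFFFFFFF

def rotr32(x, r):
    """Rotate a 32-bit word right by r positions."""
    r %= 32
    return ((x >> r) | (x << (32 - r))) & MASK32

def chi3_layer(s0, s1, s2):
    """Bitsliced 3-bit S-box layer on a 96-bit state (three 32-bit words).

    Stand-in S-box: the 3-bit chi map, NOT the Pyjamask S-box.
    Column k of the state is (bit k of s0, bit k of s1, bit k of s2);
    all 32 columns are processed at once using three word-wide ANDs.
    """
    t0 = s0 ^ ((~s1 & MASK32) & s2)
    t1 = s1 ^ ((~s2 & MASK32) & s0)
    t2 = s2 ^ ((~s0 & MASK32) & s1)
    return t0, t1, t2

def and2x32(a0, b0, a1, b1):
    """Evaluate two independent 32-bit ANDs with a single 64-bit AND."""
    c = ((a0 << 32) | a1) & ((b0 << 32) | b1)
    return c >> 32, c & MASK32

def circulant_mul(row, x):
    """Multiply a 32x32 binary circulant matrix by a 32-bit vector x.

    `row` encodes the first matrix row (bit k = entry in column k).
    The product is the XOR of rotr32(x, k) over all set bits k of `row`,
    so it needs only rotations and XORs, and no ANDs on secret data.
    """
    y = 0
    for k in range(32):
        if (row >> k) & 1:
            y ^= rotr32(x, k)
    return y
```

Because `circulant_mul` is linear over GF(2), a masked implementation can simply apply it to each share independently, which is the sense in which the linear layer and a linear key schedule are virtually free under masking. The circulant structure means the whole matrix is described by a single 32-bit row, which keeps unmasked software and hardware implementations compact.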