Saturday, July 30, 2016

ArcticCrypt

From the 17th to the 22nd of July, the northernmost crypto workshop ever organized took place in Longyearbyen, Svalbard, at latitude 78.13°N. From there it is only about 1300 kilometers to the North Pole.

Nordenskiöld glacier
Three ECRYPT-NET fellows (Marie-Sarah, Matthias and Ralph) had the opportunity to join the fascinating event.

The program was a good mixture of talks by invited speakers and talks by researchers who had submitted a paper. The topics ranged from symmetric cryptography to fully homomorphic encryption, digital signatures and side-channel attacks. The full program can be found here: http://arcticcrypt.b.uib.no/program/.

One of the most interesting talks on Monday was given by Eik List: POEx: A Beyond-Birthday-Bound-Secure On-Line Cipher. In his talk he presented POEx, which achieves beyond-birthday-bound security with one call to a tweakable blockcipher and one call to a 2n-bit universal hash function per message block. He then showed a security proof and gave possible instantiations.

At midnight between Monday and Tuesday there were two fascinating talks under the midnight sun. The first one was given by Ron Rivest on: Symmetric Encryption based on Keyrings and Error correction. The second talk was by Adi Shamir: How Can Drunk Cryptographers Locate Polar Bears.

Adi Shamir explaining how drunk cryptographers can locate polar bears.

On Tuesday, Joan Daemen gave his invited talk on: Generic security of full-state keyed duplex. In his talk he briefly explained sponge constructions and how they can be used for authenticated encryption. Afterwards, he explained how to achieve beyond-birthday-bound security using sponges. In the end, he showed a new core of sponges, the (full state) keyed duplex construction.

On Wednesday, a full day of sightseeing was planned, and we went on a boat trip to the "ghost town" Pyramiden. We started our boat trip in Longyearbyen, where we saw some minke whales. The captain of the ship told us that he had also seen a blue whale a few days earlier. After a while we approached the bird cliffs, where many seagulls and puffins were nesting. Birds are very important for the ecosystem of Svalbard, as they carry nutrients from the sea onto the land. From there we continued our journey to the Nordenskiöld glacier, a huge glacier with bluish shining ice. After a whiskey with glacier ice, we continued to our final destination. The "ghost town" Pyramiden was a Russian settlement and coal-mining community that was closed in 1998 and has been a tourist attraction since 2007.
Reindeer
Polar bear
Polar bear warning sign
On Thursday, Marie-Sarah presented her paper, Security of BLS and BGLS signatures in a multi-user setting, which is related to her master's thesis at the University of Waterloo. Marie-Sarah was inspired to revisit it after reading about how the tightness of security reductions for the Schnorr signature scheme played a role in the CFRG's selection of an elliptic-curve based signature scheme for standardization. The standard notion of existential unforgeability under chosen-message attacks is in the single-user setting, since the adversary is given one public key to target.
In the more realistic multi-user setting, the attacker gets all users' public keys and can choose which one to attack. In her paper, she first analysed the BLS (Boneh-Lynn-Shacham) signature scheme's security in a manner similar to what was done for Schnorr - is key-prefixing necessary to maintain unforgeability of signatures in a multi-user setting? Next, she analysed the multi-user security of the aggregate signature scheme BGLS (Boneh-Gentry-Lynn-Shacham). She proposed a security notion in a multi-user setting analogous to the multi-user setting for normal (non-aggregate) signatures, then analysed BGLS's security in this model.

On Friday, Gregor Leander gave a talk on Structural Attacks on Block Ciphers, in which he presented invariant subspace attacks. Furthermore, he introduced an improved technique called nonlinear invariant attacks.



Matthias, Marie-Sarah and Ralph
The artist of this sign had never seen a polar bear in his life. Therefore, they told him to paint a big white dog.
Santa Claus mailbox

Wednesday, July 13, 2016

Crypto events in Île-de-France

The sunny weather and the general holiday feeling did not stop crypto enthusiasts around Paris from meeting to discuss advances in the field. On the one hand, the Paris Crypto Day brings together people based in the Paris area who work on different aspects of cryptography. The latest such meeting was organized by ENS on 30.06.2016 and was fortunate to have Anne Canteaut (INRIA Paris), Leo Reyzin (BU), Victor Shoup (NYU) and Rafael Pass (Cornell) speaking about their research. On the other hand, on July 5-7 Paris also hosted a workshop organized within the HEAT (Homomorphic Encryption Applications and Technology) programme. It was held at Université Pierre et Marie Curie (a.k.a. Paris 6) and consisted of six invited talks given by famous researchers within the homomorphic encryption community and ten "regular" talks given by younger researchers and students.


Paris Crypto Day

The first presentation was given by Anne Canteaut on Algebraic Distinguishers against Symmetric Primitives. The talk focused on presenting a unified view of cube distinguishers and the more recently introduced division property. Both attacks are based on Knudsen's higher-order differential attacks, which exploit properties of the polynomial representation of the cipher. The presentation was much appreciated by symmetric and asymmetric cryptographers alike.

Victor Shoup gave a talk about hash proof systems [1], in which he reviewed definitions, constructions and applications. Hash proof systems can be seen as a family of keyed hash functions $H_{sk}$ associated to a language $L$ defined over a domain $D$. The secret hashing key $sk$ is used to compute a hash value for every input $x \in D$. Magically, there is a second way to compute the same hash value: it uses a projection key $pk$ (derived from $sk$) together with a witness $w$ for $x \in L$. The original definition of hash proof systems requires that the projection key does not depend on the word $x$; later definitions of smooth projective hash functions relax this requirement. Smooth projective hash functions have found applications in, among others, password-authenticated key exchange.
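
To make the "two ways to hash" idea concrete, here is a minimal Python sketch of the textbook DDH-based hash proof system, where the language consists of pairs $(g_1^w, g_2^w)$. It is not taken from the talk: the tiny parameters `P`, `Q`, `G1`, `G2` and all function names are assumptions chosen for illustration, and the group is far too small to be secure.

```python
import secrets

# Toy parameters, assumed for illustration only: P = 2*Q + 1 with P, Q prime.
P = 2039
Q = 1019
G1 = 4  # 2^2 mod P, generates the subgroup of order Q
G2 = 9  # 3^2 mod P, a second generator of the same subgroup


def keygen():
    """Sample a secret hashing key sk and derive the projection key pk."""
    sk = (secrets.randbelow(Q), secrets.randbelow(Q))
    pk = (pow(G1, sk[0], P) * pow(G2, sk[1], P)) % P
    return sk, pk


def word_in_language(w):
    """Build a word x = (G1^w, G2^w) in the language, with witness w."""
    return (pow(G1, w, P), pow(G2, w, P))


def hash_with_sk(sk, x):
    """First way: hash any pair x = (u1, u2) using the secret key."""
    u1, u2 = x
    return (pow(u1, sk[0], P) * pow(u2, sk[1], P)) % P


def hash_with_witness(pk, w):
    """Second way: hash a word of the language using only pk and its witness."""
    return pow(pk, w, P)


sk, pk = keygen()
w = 1 + secrets.randbelow(Q - 1)
x = word_in_language(w)
print(hash_with_sk(sk, x) == hash_with_witness(pk, w))
```

Both computations agree on words of the language because $(g_1^w)^{x_1} (g_2^w)^{x_2} = (g_1^{x_1} g_2^{x_2})^w$; for pairs outside the language there is no witness, so only the holder of $sk$ can compute the hash.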

Leo Reyzin from Boston University (joint work with Joel Alwen, Jeremiah Blocki, and Krzysztof Pietrzak) presented an analysis of SCrypt (originally introduced by Colin Percival in 2009 for Tarsnap), a tool whose potential applications include the realization of time-lock puzzles from memory-hard problems. The starting point for their work was the key derivation function in SCrypt. As stated by Leo during the talk, SCrypt is defined as the result of $n$ steps, where each step consists of selecting one of two previously computed values (the selection depends on the values themselves) and hashing them. It is conjectured that this function is memory-hard. The new result shows that in the Parallel Random Oracle Model, SCrypt is maximally memory-hard. One metric used is the product of time and memory used during the execution of SCrypt, for which the authors show the bound must be $\Theta(n^2)$. Interestingly, for a non-constant amount of memory used during the computation (this scenario simulates real applications), a more accurate metric, defined by the sum of memory usage over time, is again proven to be bounded by $\Theta(n^2)$, and this holds even if the adversary is allowed to make an unbounded number of parallel random oracle queries at each step.
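
The data-dependent memory access pattern can be illustrated with a simplified sketch in the spirit of SCrypt's ROMix core. This is not the real SCrypt: SHA-256 is swapped in for the Salsa20/8-based mixing function, and the function name `romix_sketch` and its parameters are assumptions made for this example.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Stand-in hash (SHA-256); real SCrypt uses a Salsa20/8-based mixing function."""
    return hashlib.sha256(data).digest()


def romix_sketch(seed: bytes, n: int) -> bytes:
    """Simplified ROMix-style function: fill a table, then do n data-dependent lookups."""
    # Phase 1: compute and store a hash chain of n values.
    table = []
    x = H(seed)
    for _ in range(n):
        table.append(x)
        x = H(x)
    # Phase 2: the current value decides which stored value is mixed in next,
    # so the access pattern cannot be predicted without doing the work.
    for _ in range(n):
        j = int.from_bytes(x, "big") % n
        x = H(bytes(a ^ b for a, b in zip(x, table[j])))
    return x


print(romix_sketch(b"password", 1024).hex())
```

An evaluator who stores the whole table uses about $n$ blocks of memory for about $2n$ hash calls. One who stores less has to recompute table entries on demand, which is the time-memory trade-off that the $\Theta(n^2)$ bounds capture.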

The last speaker was Rafael Pass, from Cornell, who gave a gripping talk about the Analysis of the Blockchain Protocol in Asynchronous Networks. During his talk, Rafael defined the notions of consistency and liveness in asynchronous networks. He then explained his result, which proves that the blockchain consensus mechanism satisfies strong forms of consistency and liveness in an asynchronous network with adversarial delays that are a-priori bounded.


HEAT Workshop

The workshop was really interesting because, besides new theoretical advances in the field, many talks were about the practical side of FHE: how to set the parameters, concrete results in cryptanalysis, libraries and real-world applications. The part about lattice reduction techniques was especially interesting.

In particular, Antoine Joux gave a talk named "The prehistory of lattice-based cryptanalysis" where he reviewed some lattice reduction algorithms (Gauss's algorithm for two dimensions and LLL for higher dimensions) and gave some cryptanalytic results, e.g. Shamir's attack against the knapsack problem and the low-density attack against Merkle-Hellman knapsack. Basically, lattice reduction aims at finding a "good" basis, made of short and almost orthogonal vectors, from a "bad" one, made of long and non-orthogonal vectors. In fact, with a good basis, problems like SVP or CVP become easy and it is possible to break cryptosystems based on these problems. There are algorithms that do this (like the famous LLL), but the conclusion was that lattice-based cryptography remains secure as long as lattices are big enough: all the lattice reduction algorithms work well only if the dimension is not too high. In higher dimensions many problems appear and lattice reduction remains hard.
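
As a small illustration of turning a "bad" basis into a "good" one, here is a sketch of the two-dimensional Lagrange-Gauss reduction for integer vectors. The function names are chosen for this example, and the code assumes the two input vectors are linearly independent.

```python
def dot(u, v):
    """Inner product of two 2-dimensional integer vectors."""
    return u[0] * v[0] + u[1] * v[1]


def nearest_int(num, den):
    """Nearest integer to num/den for den > 0 (ties round up)."""
    return (2 * num + den) // (2 * den)


def gauss_reduce(u, v):
    """Lagrange-Gauss reduction of a 2D lattice basis; returns the shorter vector first.

    Assumes u and v are linearly independent integer vectors.
    """
    if dot(u, u) > dot(v, v):
        u, v = v, u
    while True:
        # Subtract the integer multiple of u that makes v as short as possible.
        m = nearest_int(dot(u, v), dot(u, u))
        v = (v[0] - m * u[0], v[1] - m * u[1])
        if dot(v, v) >= dot(u, u):
            return u, v
        u, v = v, u


# A "bad" basis of the integer lattice Z^2 (its determinant is 1).
print(gauss_reduce((5, 8), (8, 13)))
```

Starting from the long, nearly parallel vectors $(5, 8)$ and $(8, 13)$, the reduction ends with two orthogonal vectors of length 1. In two dimensions this procedure always finds a shortest vector, which is exactly what stops being feasible as the dimension grows.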

Another interesting talk on this topic was "An overview of lattice reduction algorithms" by Damien Stehlé, who pointed out that lattice reduction has two main goals: besides the predictable one of cryptanalysing lattice-based cryptosystems (such as NTRU and all those based on SIS and LWE), it is useful for cryptanalysing other cryptosystems as well, like variants of RSA. He then presented the two main algorithms in this field, i.e. BKZ and LLL, and outlined their differences, like the global strategy used by BKZ versus the local one used by LLL. He also introduced faster-LLL [2], an improvement of the LLL algorithm which is the subject of one of his most recent works. In his conclusions, he mentioned some open problems, and finding a "quantum acceleration" is certainly one of the most interesting ones. In fact, as far as we know, lattice problems are not easier for quantum computers, and this is the reason why they are considered the most promising candidate for post-quantum cryptography.

If you are into coding, this may be interesting: Shi Bai gave a short talk about FPLLL, an implementation of floating-point LLL and BKZ reduction algorithms created by Damien Stehlé. It is a C++ library (also available in Python under the name FPyLLL) which is also used by the popular Sage. Its goal, as stated by the authors, is to provide benchmarks for lattice reduction algorithms and, more generally, lattice reduction for everyone. More details can be found at https://github.com/fplll/fplll and contributions are welcome!

Besides lattice reduction algorithms, another interesting talk was given by Florian Bourse, who presented recent work [3] on circuit privacy for FHE. The main result is that it is possible to homomorphically evaluate branching programs over GSW ciphertexts without revealing anything about the computation, i.e. the branching program, except for the result and a bound on the circuit's size, by adding just a small amount of noise at each step of the computation. This means that the "price" to pay is quite low, especially compared to other techniques based on bootstrapping. Also, this method does not rely on not-so-well-understood assumptions like circular security and only assumes the hardness of LWE with polynomial modulus-to-noise ratio.


References

1. Ronald Cramer and Victor Shoup. Universal hash proofs and a paradigm for adaptive chosen ciphertext secure public-key encryption. EUROCRYPT 2002, LNCS vol. 2332, pp. 45–64. Springer, 2002.

2. Arnold Neumaier and Damien Stehlé. Faster LLL-type reduction of lattice bases. ISSAC 2016.

3. Florian Bourse, Rafael Del Pino, Michele Minelli and Hoeteck Wee. FHE circuit privacy almost for free. CRYPTO 2016, to appear.


This blog post has been collaboratively written by Michele and Razvan.

Tuesday, July 12, 2016

The Subset-Sum Problem

The Subset-Sum problem (also known as the knapsack problem) that we want to review in this blog post is defined as follows: Given $n, S, a_1, a_2, \dots, a_n \in \mathbb{N}$, find \begin{align} I \subseteq [n] = \{1,2, \dots, n\}: \sum_{i \in I} a_i = S. \tag{1} \label{s} \end{align}
Historical Remarks

This old problem was first studied in 1897, the same year the first airborne mission aiming to reach the geographical North Pole (NP) started (and ended...), and was one of the first problems proven to be NP-complete, meaning that worst-case instances are believed to be computationally intractable. Subset-Sum was proved NP-complete by a chain of reductions: '3-SAT' was reduced to the 'Graph Coloring Problem', which was reduced to 'Exact Cover', which was in turn reduced to Knapsack and close variants thereof. These rigorous reduction proofs were carried out in the early 1970s, and the Subset-Sum problem is featured on Karp's famous list of 21 NP-complete problems, none of which is known to be efficiently solvable with current computers and algorithms, which makes them a possible basis for cryptographic primitives. In the following table one can see how the expected time/space requirements of algorithms solving (1) in hard cases evolved as the techniques were refined by modern research:

Expected time and space requirements of algorithms solving (1) in average hard instances.
The time and space requirements of current approaches are considerably lower than those of exhaustive search, a generic meet-in-the-middle attack or even a finer combinatorial split into four (or multiple) lists. The currently best known algorithm, asymptotically speaking, is a quantum algorithm. Determining a lower bound on the run-time is still an open research question. It seems more difficult than seeing a possible link between the declining polar bear population in the arctic regions and the retreating sea ice accelerated by the observable rapid climate change.

Let us review two classical techniques that led to remarkable speed-ups:

Technique 1 - Meet in the Middle

Schröppel-Shamir: Combining disjoint sub-problems of smaller weight.
Hard instances of the Subset-Sum problem are characterized by relatively large elements $(\log_2 a_i \approx n)$ and a balanced solution, i.e. $|I| \approx \frac n 2$ in Equation (1). Identifying subsets of $[n]$ with length $n$ vectors $x$ over the 'number-set' $\{0,1\}$ via $i \in I \Leftrightarrow x[i] = 1$ one constructs lists $L_1, L_2$ of pairs merged to a solution in $L_0$:
Algorithms based on the birthday paradox construct expected collisions in the second component of the sub-problems in the lists $L_1, L_2$, forcing any $x \in L_0$ to fulfill (1). The difficulty is to estimate the list size needed to observe at least one solution with high probability. It is desirable to ensure that the algorithm is more likely to terminate with a non-empty list $L_0$, i.e. $|L_0| \geq 1$ (a solution), than you are to see a polar bear towards the north-east or meet one in the middle of Svalbard, Norway.
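
For intuition, here is a sketch of the plain two-list meet-in-the-middle attack on Equation (1). This is the basic variant, not the more memory-efficient Schroeppel-Shamir algorithm, and the function names and the small example instance are assumptions made for illustration.

```python
from itertools import combinations


def subset_sums(indexed):
    """Map every achievable sum to one tuple of indices that realises it."""
    sums = {}
    for k in range(len(indexed) + 1):
        for combo in combinations(indexed, k):
            total = sum(a for _, a in combo)
            sums.setdefault(total, tuple(i for i, _ in combo))
    return sums


def meet_in_the_middle(a, target):
    """Return a sorted index set I (1-based) with sum of a_i equal to target, or None."""
    indexed = list(enumerate(a, start=1))
    half = len(indexed) // 2
    left = subset_sums(indexed[:half])   # list L_1: sums over the first half
    right = subset_sums(indexed[half:])  # list L_2: sums over the second half
    for s1, idx1 in left.items():
        # A collision s1 + s2 = target merges two partial solutions.
        idx2 = right.get(target - s1)
        if idx2 is not None:
            return sorted(idx1 + idx2)
    return None


a = [3, 34, 4, 12, 5, 2]
solution = meet_in_the_middle(a, 9)
print(solution, sum(a[i - 1] for i in solution))
```

Each list holds at most $2^{n/2}$ sums, so both time and memory are roughly $2^{n/2}$ instead of the $2^n$ steps of exhaustive search.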

Technique 2 - Enlarge Number Set

BCJ11: Adding length $n$ sub-solutions increases the number-set.
The idea in Howgrave-Graham-Joux (2010), later extended by Becker-Coron-Joux (2011), was to allow multiple representations of a solution. This comes at the price of enlarging the number-set, i.e. having $$x_0[i] = x_1[i] + x_2[i] \not \in \{0, 1\}.$$ On top of the first improvements due to constructing colliding sums, this approach constructs too many potential solutions and introduces a non-trivial filtering step that removes 'inconsistent' ones when merging the two lists $L_1$ and $L_2$ back into one, yet it still gives an overall asymptotic speed-up.
The number-set used by the authors was $\{-1,0,1\}$, indicating a summand appearing on both sides of Equation (1).
After constructing sufficiently many sub-problems and their respective partial solutions, a collision can be expected, and thus the combination forms a solution for the given instance.
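
To see why representations help, the following sketch counts in how many ways a fixed binary solution vector $x$ can be written as $x_1 + x_2$ with binary $x_1, x_2$ of half the weight each. This covers only the binary case of Howgrave-Graham-Joux; the $\{-1,0,1\}$ number-set admits even more representations. The function name and the example vector are chosen for illustration.

```python
from itertools import combinations
from math import comb


def count_representations(x):
    """Count pairs (x1, x2) of binary vectors with x1 + x2 = x and weight(x1) = weight(x) // 2."""
    n = len(x)
    weight = sum(x)
    count = 0
    for support in combinations(range(n), weight // 2):
        x1 = [1 if i in support else 0 for i in range(n)]
        x2 = [xi - x1i for xi, x1i in zip(x, x1)]
        # x2 must stay inside the number-set {0, 1} to be a valid partial solution.
        if all(entry in (0, 1) for entry in x2):
            count += 1
    return count


x = [1, 0, 1, 1, 0, 0, 1, 0]  # a balanced solution vector of weight 4
print(count_representations(x), comb(4, 2))
```

A disjoint left/right split, as in the meet-in-the-middle technique, represents each solution in one way only, whereas here a solution of weight $w$ has $\binom{w}{w/2}$ representations. The algorithm can therefore afford to search only a fraction of the candidate pairs and still expect to hit one representation.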


Applications 
The cryptanalytic methods for structurally approaching the Subset-Sum problem are valuable algorithmic meta-techniques also applicable to other NP-complete problems like lattice- or code-based problems.
Credits: http://fav.me/d3a1n08
Such problems are promising candidates for the construction of post-quantum cryptosystems, cloud security applications and encrypted Polar Bear TV broadcasts. There is a conference focusing on such topics (Polar bears and NP-complete problems) coming up - stay tuned.



PS: The bad image quality is due to Blogger not letting me include vector graphics like .pdf or .eps, nor render them directly from LaTeX code... :-(

Sunday, July 10, 2016

Workshop on PIR, distributed storage, and network coding at RHUL

On Friday, there was a one-day workshop at Royal Holloway on private information retrieval (PIR), distributed storage, and network coding. Four speakers gave talks on topics including coding for distributed storage, coding vs. replication, and multi-server PIR schemes.

Coding for distributed storage

The first speaker, P. Vijay Kumar, gave an overview of coding for distributed storage. When a storage node fails (due to hardware failure or corrupted data, for example), there are two main issues to consider: repair bandwidth (how much data it must download to recover) and repair degree (how many other nodes must communicate with it to help it recover). We learned about which types of coding schemes are used in practice. For some applications, triple replication—without any coding—is the standard solution. The RAID 6 storage architecture uses any maximum distance separable (MDS) code that tolerates 2 failures. Facebook uses HDFS-RAID with one of two erasure codes (blog post by Facebook engineers). HDFS-RAID is a modified instance of the Hadoop distributed file system (HDFS, which replicates data three times) that uses erasure codes to reduce the effective replication of data to about 2. Windows Azure Storage uses Local Reconstruction Codes (LRC) (Microsoft blog post, paper).

PIR and coding

The second speaker, Alex Vardy, presented techniques for PIR that use coding instead of replication.

Private information retrieval (PIR) is relatively new; it was proposed in a 1995 paper by Chor, Goldreich, Kushilevitz, and Sudan. The setting for PIR is as follows. A server has a public, static database of n items (bits, for example) and a user wants to retrieve a certain one without the server knowing exactly which. Ideally, a computationally-unbounded server wouldn't be able to get any information about the user's request: the distribution of queries wouldn't depend on the index of the requested data item. But the only PIR scheme for a single database that satisfies this notion of "perfect privacy" requires the server to send the entire database in response to every query. Instead, there are computational notions of privacy for PIR (where the server is computationally bounded) and for settings with more than one copy of the database, there are other information-theoretic notions.
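
The multi-copy setting can be illustrated with a well-known toy two-server scheme for a database of bits. It is a sketch under the assumption that the two servers hold identical copies and do not collude, it is not one of the coding-based schemes from the talk, and the function names are chosen for this example.

```python
import secrets


def make_queries(n, index):
    """Build two queries: a uniformly random subset and the same subset with `index` flipped."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def answer(database, query):
    """Server side: XOR of the database bits selected by the query."""
    bit = 0
    for d, q in zip(database, query):
        bit ^= d & q
    return bit


def retrieve(database, index):
    """User side: query both (non-colluding) servers and combine their answers."""
    q1, q2 = make_queries(len(database), index)
    return answer(database, q1) ^ answer(database, q2)


db = [1, 0, 1, 1, 0, 0, 1, 0]
print([retrieve(db, i) for i in range(len(db))] == db)
```

Each server sees a uniformly random bit string on its own, so it learns nothing about the index, while the two answers differ exactly by the requested bit. The queries here are as long as the database, which is why more communication-efficient schemes are of interest.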

One interesting aspect of the multi-server PIR setting that Alex presented was that the servers are assumed to be non-colluding. There was some discussion about whether this assumption makes sense, and whether any communication at all should be allowed between the servers. For example, if all but one of the servers receive a query, will they know?

The third speaker, Salim El Rouayheb, presented some coding-based PIR schemes with a different security model: some number b of the nodes are passive adversaries who can collude. We learned about Freenet, a privacy-friendly P2P distributed storage system for sharing files (paper).

Network coding

The last talk, by Tuvi Etzion, was about network coding and its connections to distributed storage and PIR. A network is represented by an acyclic directed graph that can have parallel arcs (edges) and whose arcs each have an associated capacity. We considered two types of multicast networks. In the first, there is one source node that must distribute h messages (field elements) to n receiver nodes. In the second, there are h source nodes with one message each and n receiver nodes that need to receive all of the messages. In the scalar linear setting, each node "transforms" the messages it receives according to its local coding vector. (In the vector linear setting, nodes have local coding matrices.)

An example of a simple linear multicast network. The top two nodes are the source nodes and A and B are the messages they want to send to both of the receiver nodes (the bottom two nodes).
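
Assuming the example above is the classic butterfly network (each source has a direct arc to one receiver, and both sources share a single bottleneck arc towards both receivers), the scalar linear solution over bits can be sketched as follows. The function name and the wiring are assumptions made for this illustration.

```python
def butterfly(a: int, b: int):
    """Simulate XOR coding on an assumed butterfly network with unit-capacity arcs.

    Returns what each of the two receivers decodes as (A, B).
    """
    # The shared bottleneck arc can carry only one symbol, so it carries A xor B.
    coded = a ^ b
    # Receiver 1 gets A directly and recovers B from the coded symbol.
    receiver_1 = (a, a ^ coded)
    # Receiver 2 gets B directly and recovers A from the coded symbol.
    receiver_2 = (b ^ coded, b)
    return receiver_1, receiver_2


for a in (0, 1):
    for b in (0, 1):
        print(a, b, butterfly(a, b))
```

Without coding, the bottleneck arc could forward either A or B but not both, so one receiver would miss a message. Sending the XOR lets both receivers obtain both messages in a single use of the network.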

Tuvi shared his conjecture that for a multicast network of the first type with two messages, there is no vector solution that outperforms the optimal scalar solution. (See one of Tuvi's recent papers for related results.) It was neat that properties of multicast networks can be expressed succinctly in terms of graph-theoretic properties, like the size of a minimum cut between source nodes and receiver nodes.

I enjoyed this educational yet relaxed one-day workshop. What I found most interesting was how cloud storage providers and companies like Facebook encode their data. It's clear why they care about even small reductions in effective data replication: they want to maximize availability and tolerate hardware failures while reducing storage and communication costs. As a skeptical cryptographer, I can't help but wonder how they compose these coding schemes with encryption and other cryptographic tools to protect users' privacy.