RSA
Introduction
This post will not focus on the history of RSA or why it is used. There are plenty of resources available that explain this already - see here. Instead we will focus on some of the mathematics around how RSA is implemented, why it is secure, and how it can be implemented in Python.
How it works
RSA is a public-key cryptosystem based on exponentiation modulo a composite prime such that , where and are large primes.
The protocol is as follows:
- Bob selects two large primes and - these are kept private.
- He computes the modulus - this is made public.
- Bob selects a public key such that is coprime with *. Where is computed with either the Totient function or the Carmichael function.
- Bob computes a private key such that .
We will now look at each step in more detail.
- Note: two numbers, and are coprime if .
Step 1 - Generating large primes
This first step in RSA is to generate sufficiently large primes, and . In order for to be 2,048-bits, and must both be 1,024-bits each.
A common way to generate a large prime is as follows:
- Create , a random -bit number (e.g 1,024-bits). It should be the case that is odd as no prime number, with the exception of 2, is even. Thus we select a decimal number in the range up to . We call the prime candidate.
- We can then perform a low-level primality test by checking if is divisible by any of the first prime numbers (this list of primes is precomputed). The value of should be as large as possible. If is divisible by any of these pre-computed primes, the prime candidate is composite and we return to step 1.
- Once we find a that passes this low-level primality test, we can perform a high-level primality test known as the Miller–Rabin primality test. In the case of testing for primality of extremely large numbers, such as those used in RSA, a deterministic method is not feasible in terms of compute power. Instead, the Miller–Rabin primality test uses a probabilistic approach. It tests a prime candidate () for primality many times. Each time the test passes the candidate has a 75% chance of being prime. By performing many iterations, we can increase the probability that such a candidate is in fact prime. Enough tests should be performed such that the probability of the prime candidate being composite is less than . If any of the iterations fail, we return to step 1.
With the above algorithm we can now generate and which are both 1,024-bit primes.
Step 2 - Compute the modulus
Once we have values for and , it is trivial to compute .
Step 3 - Select a public key
We must now select a public key such that is co-prime with .
can be calculated using either Euler’s Totient function or the Carmichael function.
- Using Euler’s Totient function
- Using the Carmichael function
Once is computed, a public key should be chosen such that and are co-prime. Although the value of can be any value provided , typically is set to 65,537. 65,537 is a Fermat Prime. This is useful as given it is prime, it is guaranteed to be co-prime with and it is also very easy to calculate modular exponents that are Fermat Numbers (as in the case of encryption in RSA).
Step 4 - Compute a private key
The private key is calculated such that . This is done by using the Extended Euclidean algorithm.
The Extended Euclidean algorithm calculates and such that . A simple online calculator can be found here.
We can now look at why this works. If we let and we can rewrite the above as:
We know that and are co-prime and thus . Therefore we can rewrite this as:
We then take this modulo to get:
The value of does not matter as it will be eliminated modulo . Thus we are left with:
Here is equal to our private key .
We have now computed all the values required to use RSA for encryption and decryption.
Encryption and Decryption
To encrypt data, the message is broken down into blocks such that each block is smaller than the modulus . Note if and are both 1024-bit primes, will be 2048-bits. Meaning that each block must be shorter than 2048-bits to avoid data loss.
To encrypt data:
- Bob makes his public key and the modulus public.
- Alice encrypts plaintext such that where is the resulting ciphertext.
To decrypt data:
- Bob, the holder of the private key , can now decrypt by where is the recovered plaintext.
Note: RSA is not typically used to encrypt large amounts of data. Instead it is used as part of a key exchange protocol so that Alice and Bob can agree upon a shared secret for use with a symmetric cipher such as AES. This is because RSA is not as performant as some symmetric ciphers which leverage bitwise operations rather than more complex mathematical operations seen here.
Security in RSA
Brute force approach:
Given and are public, if Eve was to know some ciphertext , she could try all plaintext options until she finds an that satisfies . However, this not feasible when is large.
Breaking the cipher:
If Eve was to learn any one of , , , or she can fully break the cipher.
If she learns or , it is trivial to calculate the other values given and is public. From here she can workout all remaining values.
If she learns , she can solve for and given she knows the function and that . This is a case of solving for two unknowns with two equations.
If she learns , the private key, she can decrypt any chipertext easily.
If we assume that a brute force approach is not feasible and that all private values remain private, then the security of RSA relies on Eve being able to factorise into and . This is known as prime factorisation.
Prime factorisation
The best attack for Eve is to factor into and . However, there is no efficient, non-quantum integer factorisation algorithm known when and are sufficiently large. Even the fastest prime factorisation algorithms on the fastest computers will take so much time to make the search impractical. However, knowing and makes computing trivial, this is the one way function used in RSA.
Some algorithms used for prime factorisation include general number field sieve, the quadratic sieve, Fermat's factorization method, and Pollard's rho algorithm. As it stands, for large RSA key sizes (in excess of 1024-bits) no efficient method for computing the factors is known.
Interestingly, a recent paper by Schnorr [6], controversially claimed that a new factoring method “destroys the RSA cryptosystem”. However, it has subsequently been shown not to be the case [7].
RSA in Python
A complete implementation of RSA in Python can be found here. The code should be self explanatory given the information above.
One interesting point that I have not discussed is how to compute efficiently if is large. One way to handle this is by using exponentiation by squaring. In the code you will notice on line 41 of main.py
I compute the ciphertext by raising the plaintext to the power of directly. This is feasible given is relatively small. However, on line 45 of main.py
, I compute the plaintext by using exponentiation by squaring instead as is too large to compute this directly. Python comes with a built in function to handle this - see here.
Besides encryption and decryption of data, RSA is also used for digital signatures. I have not gone into much detail about digital signatures as that is for a different post but I have included a full example in the code attached. You will notice that the only difference is that we use the keys in reverse order. I.e you sign a message with your private key and anyone who has a copy of your public key can verify it was you who signed it.
References
-
https://coderoasis.com/implementing-rsa-from-scratch-in-python/
-
Fast Factoring Integers by SVP Algorithms - Claus Peter Schnorr
Note: these references exclude hyperlinks included throughout the document.
Disclaimer: Do not use this to secure any information. This code is purely to be used for educational purposes only.