Pet peeve: the blog post mentions _decimal_ and _denary_ several times, but in reality, there are no decimal numbers involved in any part of the computation.
This is a common mistake made by people who haven't fully internalized the distinction between numbers (which have no intrinsic representation) and _numerals_ which are the encodings of numbers.
When you are converting a permutation to its index, you are not converting to _decimal_, you are converting it to an integer. That integer has no intrinsic base. In Python (and the linked C++ code) that integer will be internally represented in binary (or a base that is a higher power of 2) and the corresponding decimal representation is generated only when printing.
So functions like decimalToFactoradic() should really be named integerToFactoradic(), etc.
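The suggested renaming is easy to illustrate. A minimal Python sketch (the function names are mine, not from the post), storing factoradic digits least-significant first so that digit i is in base i+1:

```python
import math

def integer_to_factoradic(n):
    """Convert a non-negative integer (no intrinsic base) to its
    factoradic digits, least-significant digit first. Digit i ranges
    over 0..i, so digit 0 is always 0."""
    digits = []
    base = 1
    while n > 0:
        digits.append(n % base)
        n //= base
        base += 1
    return digits or [0]

def factoradic_to_integer(digits):
    """Inverse: digit i contributes digits[i] * i!."""
    return sum(d * math.factorial(i) for i, d in enumerate(digits))
```

For example, `integer_to_factoradic(5)` gives `[0, 1, 2]`, i.e. 0·0! + 1·1! + 2·2! = 5; at no point is a decimal representation involved.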
> people who haven't fully internalized the distinction between numbers
Or people whose first intuition is that the space of 52! permutations, which obviously depends on the suit and rank that make each card unique, can be covered with even remote adequacy by considering only the card's colour, which merely narrows a card down to one of two sets of 26.
Pet Peeve: Pointless Pedantry.
Always Adore: Amazing Alliteration.
> This is a common mistake made by people who haven't fully internalized the distinction between numbers (which have no intrinsic representation) and _numerals_ which are the encodings of numbers.
Counterpoint: it doesn’t matter.
Reminds me of the playing-card based encryption system designed by Bruce Schneier for the novel Cryptonomicon .
https://en.wikipedia.org/wiki/Solitaire_(cipher)
Hey all, I found a cool way to convert text into a specific order of a deck of playing cards. I detailed how it works in the blog post, but a brief overview: it uses Lehmer codes, which let you uniquely identify each permutation of a set, i.e. each of the many, many ways a deck of cards can be shuffled/arranged.
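A minimal sketch of the Lehmer-code idea, assuming the 52 cards have been mapped to the integers 0..n-1 (this is my illustration, not the post's actual code):

```python
import math

def permutation_to_index(perm):
    """Lehmer code: for each position, count how many later elements
    are smaller, and accumulate those counts as mixed-radix digits.
    The result is the permutation's lexicographic rank."""
    n = len(perm)
    index = 0
    for i, x in enumerate(perm):
        smaller = sum(1 for y in perm[i + 1:] if y < x)
        index = index * (n - i) + smaller
    return index

def index_to_permutation(index, n):
    """Inverse: peel off factorial-base digits and use each one to
    pick the next element from the items still remaining."""
    items = list(range(n))
    perm = []
    for i in range(n - 1, -1, -1):
        d, index = divmod(index, math.factorial(i))
        perm.append(items.pop(d))
    return perm
```

The identity ordering maps to index 0 and the fully reversed deck to n!-1, so every index in [0, 52!) names exactly one shuffle.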
TIL about Lehmer codes... and "poker encoding" ;)
(I just prefer poker to solitaire...)
Someone else mentioned that the orientation of each card (right way up or upside down), and possibly even whether it is face up or face down, would add another 2 bits per card to the available encoding space. (Of course, at that point you'd have to also encode which side of the whole deck is the "top"...)
My own thought was to add par2 to make it robust against small errors... at the cost of some transmission space!
> Of course, at that point you'd have to also encode which side of the whole deck is the "top"...
An asymmetrical joker could indicate which short edge is "right way up", while also indicating which card is the first or last of the deck.
> "...which side of the whole deck is the 'top'..."
A dark line drawn across the top of the deck would be enough. Though it would ruin the stealth factor of the cards.
Also, the pattern on the back of some playing decks isn't symmetrical, so that could be used as well.
Use a casino decommissioned deck. They typically have either a hole punched in them, or a corner cut off. Either way it won't be symmetric, but still perfectly plausible as a cheap deck of cards.
Decode it both ways and see which isn't gibberish.
Yeah, that would make an interesting addition. I was thinking about error correction, so that swapping two cards would still be okay, but I was struggling with how it would work. I think it would be quite fun to add :)
Nice! But why just hide one message if you can run an entire encryption algorithm with the deck? :)
https://en.wikipedia.org/wiki/Solitaire_(cipher)
Por que no los dos? One deck has the encrypted message, the other deck has the key.
Good stuff. You could get much better bandwidth than this by tokenizing and using something like a Huffman or arithmetic code on token frequencies. As a simple example, if you set your tokens to be all English words - let's say there are between 500k and 1 million - a flat index costs about 19-20 bits per word, and frequency-weighted coding brings the average down towards the entropy of English text, on the order of 10-13 bits per word. I am sure you could do much better than this as well.
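Rough numbers behind that estimate (the Zipf weighting below is a toy assumption, not real corpus statistics): a flat index into a 500k-1M word vocabulary costs 19-20 bits per word, while Huffman/arithmetic coding pushes the average cost toward the entropy of the word distribution, which is several bits cheaper:

```python
import math

# Cost of a flat (equiprobable) index into a large vocabulary:
flat_500k = math.log2(500_000)    # ~18.9 bits/word
flat_1m = math.log2(1_000_000)    # ~19.9 bits/word

# With frequency-aware coding, the average cost approaches the
# entropy of the distribution. Toy Zipf model over the same
# vocabulary (an assumption, not measured word frequencies):
vocab = 500_000
weights = [1.0 / r for r in range(1, vocab + 1)]
total = sum(weights)
entropy = -sum(w / total * math.log2(w / total) for w in weights)

# entropy comes out several bits below the flat cost
print(round(flat_500k, 1), round(flat_1m, 1), round(entropy, 1))
```

A real corpus is more skewed than this toy model, so the actual average would likely be lower still.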
You can get much better than that by taking a well-known LLM model and encoding a series of offsets from the most likely sequence of tokens, especially if you are OK with the message being slightly different.
https://arxiv.org/abs/2306.04050
https://bellard.org/ts_zip/
That sounds very interesting, I'll look into it thanks :)
Despite appearing to have perfectly ordinary structured static HTML content (aside from the fact that it spams things like <span style="font-weight:bold"> instead of using some basic CSS properly), without JavaScript I only see a solid-colour background.
Anyway, this doesn't offer a whole lot of storage:
$ python -c 'print(sum(__import__("math").log2(x) for x in range(1, 53)) // 8)'
28.0
28 bytes out of ~225 bits, sure. (I compute 228 bits, but precision gets funny with quantities so large. Interestingly, an emulated TI-86 and a real TI-89 Titanium returned different answers; the latter wasn't even divisible by 8!)
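The discrepancy is easy to settle in Python, where the factorial is computed exactly as an arbitrary-precision integer and only one rounding step is involved (the larger figures from the calculators are likely accumulated rounding error in a log loop):

```python
import math

perms = math.factorial(52)      # exact 52!, an arbitrary-precision int
print(math.log2(perms))         # ~225.58 bits of information
print(perms.bit_length())       # 226 bits needed to hold any single index
```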
For movie-plot '52 pick-up' that's not so bad, especially if used to encode something like a key for the "Solitaire" cryptosystem, mentioned nearby, wherein the same deck can in turn be reconfigured and manipulated to generate an arbitrary-length keystream for application to a longer message transmitted under separate cover.
https://deckcrypt.github.io/
45 characters according to the blog post and this demo
45 code points in a custom 5-bit encoding representing 32 characters; 28 bytes (with 1 to 4 bits left over) of 8-bit ASCII.
7 characters of UTF-32
There are other five-bit character sets as well, such as the 5-bit Baudot character set and the 5-bit Zork character set. You could also use variable-width characters, or other number bases.
You can also use other decks, e.g. with tarot cards you will have 78 cards rather than only 52 cards, and can make a longer message.
Other comments here have mentioned other tricks, such as face-up vs face-down cards.
I've done Pontifex from Cryptonomicon on a Commodore 64 for fun. Bruce Schneier came up with it. https://imapenguin.com/2021/05/making-and-breaking-ciphers-o...
You should be able to get another 45 bits or so by also using the orientation of the cards (everything except the non-face-card Diamonds).
The 2, 4, 8, and 10 of all suits are typically rotationally symmetrical.
Only if your deck has a rotationally symmetric back. A lot of decks are oriented with pictures or logos. Tarot decks almost always do to allow inverted readings (and you'd get a few more bits out from the major arcana).
You could also add face up or down.
You might squeeze a tag bit for the deck out of the 7◇, depending on the pack design.
> you might squeeze a tag bit for the deck out of the 7◇
awww, youuuuu!! hugz!
"The Seven of Diamonds meaning in a Tarot reading can show that you will be surrounded by love."
(i was looking it up to find what the different cards looked like and found that)
and yes "you and I might squeeze a bit" later
Good to see you're enjoying yourself, whoever you are.
is it just me or that comment unexpectedly lives up to your username
A while ago I made an interactive demonstration of how encoding with factoradic works: https://ilcavero.github.io/ Seems like I found someone else who thought it was a fun thing to demo.
How about just assigning a number to every sentence in every language known to man, and using the absurdly huge number of deck combinations to identify them?
Impractical, but possible.
225 bits
Oh! Duh, the article explicitly said that and I totally missed it as a number. I just thought, "That's only around 28 bytes... That's not a lot."
Thanks.
Encoding with common objects was (and maybe still is) used in practice by actual spies; see the CIA shoelace code.
Thanks for your post. It sent me down a 3-hour rabbit hole figuring out how to maximise bandwidth given those 104 card-tokens (52 x 2 for face-up/face-down cards) in real life. I wanted the solution to be practical for, say, two people in prison cells, so the maths should be really simple.
My best attempt so far is assigning ~20 tokens to the most popular words (the, and, you...), another ~30 to popular trigrams (ing, ion, tio...), another ~30 to digrams (th, er, on...), and the rest to single letters, with the number of tokens per class adjusted to the occurrence frequency of the corresponding words/trigrams/digrams in natural language.
If the encoder runs out of a token, she just skips it (assuming the decoder will work out that the token is already used up and guess what to add), up to the point where the words become unrecognisable to the decoder. The encoder is also free to use synonyms to avoid running out of tokens too fast.
Not quite a strict system, but my rough assessment gives it 80-100 letters per deck. Should be enough to plan a jailbreak :)
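For anyone who wants to play with the idea, a rough sketch of the greedy tokenizer described above. The token lists are tiny placeholders (a real table would be tuned to corpus frequencies, capped at 104 distinct card-tokens, and would track how many of each token remain in the deck):

```python
# Placeholder token tables: words, trigrams, digrams, then letters.
WORDS    = ["the", "and", "you"]
TRIGRAMS = ["ing", "ion", "tio"]
DIGRAMS  = ["th", "er", "on"]
LETTERS  = list("etaoinshrdlucmfwypvbgkjqxz")

TOKENS = WORDS + TRIGRAMS + DIGRAMS + LETTERS

def tokenize(text):
    """Greedily match the longest known token at each position;
    spaces are dropped (implied between words), and characters
    with no token are skipped."""
    out, i = [], 0
    text = text.lower()
    while i < len(text):
        if text[i] == " ":
            i += 1
            continue
        for tok in sorted(TOKENS, key=len, reverse=True):
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            i += 1
    return out
```

For example, `tokenize("the ring")` yields `["the", "r", "ing"]` - three card-tokens instead of seven single letters.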