Cryptography: Solution to Cipher 1

Recall that our original text is:

Cryptogram 1

In solving such a message, there are several things we start by examining. First and foremost is the frequency of each letter in the message. In total, there are 292 letters in the message. Their frequency is as follows:

F - 43, 14.7%
P - 33, 11.3%
Y - 33, 11.3%
H - 27, 9.2%
Q - 25, 8.6%
O - 25, 8.6%

M - 16, 5.5%
K - 15, 5.1%
T - 12, 4.1%
G - 11, 3.8%
C - 8, 2.7%
D - 7, 2.4%

S - 7, 2.4%
U - 7, 2.4%
A - 6, 2.1%
I - 5, 1.7%
L - 4, 1.4%
W - 4, 1.4%

E - 2, 0.7%
X - 1, 0.3%
R - 1, 0.3%
B - 0, 0.0%
Z - 0, 0.0%
N - 0, 0.0%
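
A count like this is tedious by hand but easy to automate. What follows is only a minimal Python sketch, not the tool used to build the table above; the function name frequency_table and the short sample string are assumptions made for illustration.

```python
from collections import Counter

def frequency_table(ciphertext):
    """Return (letter, count, percent) rows, most frequent letter first."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    # Sort by descending count, then alphabetically so ties come out in a fixed order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(letter, n, round(100 * n / total, 1)) for letter, n in ranked]

# Illustrative sample only (a few cipher words, not the whole message).
for letter, n, pct in frequency_table("QG PM QG"):
    print(f"{letter} - {n}, {pct}%")
```

Spaces and punctuation are skipped, so the same function works whether or not the message keeps its word divisions.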

Another point to note is which letters end words. Although 21 of the 24 letters appear in our cipher, only nine occur as the final letter of a word. The list below shows these letters, with their frequency:

G - 6
D - 3
H - 6

I - 1
M - 11
P - 6

T - 3
F - 9
Y - 12
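
The same tally can be sketched in a few lines of Python. This assumes the message keeps its word divisions as spaces; the name final_letter_counts and the sample string (the two-letter cipher words of this message, run together as if they were a sentence) are illustrative only.

```python
from collections import Counter

def final_letter_counts(ciphertext):
    """Count how often each letter is the last letter of a word."""
    words = ciphertext.upper().split()
    return Counter(word[-1] for word in words)

# Illustrative sample: not the real message, just some of its short words.
sample = "HF QG QD IP LD OT PM"
for letter, n in final_letter_counts(sample).most_common():
    print(f"{letter} - {n}")
```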

Finally, we look at short words. There is only one single-letter word in our message: D.

There are seven two-letter words:

HF
QG (x2)

QD
IP (x2)

LD
OT (x2)

PM (x3)
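
Collecting the short words is equally mechanical. The sketch below is an assumption-laden illustration rather than the procedure actually used here: short_words is a made-up name, and the sample is an arbitrary arrangement of cipher words from the lists above, not the real message.

```python
from collections import Counter, defaultdict

def short_words(ciphertext, max_len=2):
    """Group words of at most max_len letters by length, with repeat counts."""
    counts = Counter(w for w in ciphertext.upper().split() if len(w) <= max_len)
    grouped = defaultdict(dict)
    for word, n in counts.items():
        grouped[len(word)][word] = n
    return dict(grouped)

# Illustrative sample only.
for length, words in sorted(short_words("PM QG PM D IP").items()):
    print(length, words)
```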

Armed with this data, we get to work. We start with the frequency table for Biblical Greek. Based on the UBS text of Matthew, the frequency of the various letters is:

A -- 11.0%
E -- 10.1%
O -- 10.1%
I -- 9.5%
N -- 8.2%
S -- 7.6%
T -- 7.4%
U -- 6.0%
H -- 3.9%
R -- 3.3%
K -- 3.3%
W -- 3.2%
P -- 3.1%
L -- 2.8%
M -- 2.6%
D -- 2.0%
Q -- 1.7%
G -- 1.6%
B -- 0.6%
C -- 0.6%
F -- 0.6%
X -- 0.3%
Z -- 0.2%
Y -- 0.1%

There are several ways to start our attack. One is to work directly with the above table of letter frequencies. This is usually the best approach in English or German, where the letter "E" predominates so much that it will be the most common letter in almost any monoalphabetic cipher message where the sample exceeds 100 letters. But it will be obvious that this is not true in Greek. A E O I are all almost tied, with N S T close enough that one of them might be more common in a short message than one of the big four.

An easier line of attack, in this case, is the last letters. Two letters -- N S -- are overwhelmingly the most common terminal letters for Greek words. And we note that there are two letters which are overwhelmingly the most common in our terminal letters list: M Y. Thus it is highly likely that one of these represents N and the other S.

(This approach, incidentally, has had great use in Biblical linguistics, in the deciphering of Ugaritic. Hans Bauer's attack on the Ugaritic alphabet started with the assumption that it was a Semitic language, and that this regulated which letters were to be found at the beginnings and ends of words. That let him make up a short list of possible meanings for several letters, which could be tested -- whereupon many more fell into his lap. Several rounds of this broke the Ugaritic alphabet, and then it was a matter of figuring out the language -- no easy task, to be sure, but a lot easier than it was when the alphabet was unknown!)

Then we look at our two-letter words. The most common is PM. Among the most common words in Biblical Greek is EN. And P is tied for second-most-common letter in our sample. So it's a pretty good bet that PM is EN. That, incidentally, gives us another likely word: DE is also common, and we see that IP occurs twice in our two-letter-word list.

But if M stands for N, then the other letter common at the end of words, Y, must be S. So let's assume
P --> E
M --> N
I --> D
Y --> S

That makes our message appear as this (note: we will show "solved" letters in UPPER CASE, unsolved in lower):

hDEkwof ld chfDfh ufNESeE qhfS wsESfN hkkh qd ahafh NdcfhxEqE qhfS DE wsESfN qEkEfof ufNESeE EN qg Nolg uEushcqhf oqf EN EqEsoukgSSofS ahf EN rEfkESfN EqEsgN khkdSg qg khg qotqg ahf otD otqgS EfShaotSoNqhf lot kEuEf atsfoS gSqE hf ukgSShf EfS SdlEfoN EfSfN ot qofS cfSqEtotSfN hkkh qofS hcfSqofS d DE csowdqEfh ot qofS hcfSqofS hkkh qofS cfSqEtotSfN
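
The upper/lower display is simple to produce mechanically. Here is a minimal sketch; the function name apply_partial_key is an assumption, and the three cipher words fed to it were recovered by reversing the four substitutions in the line above, not copied from the original cryptogram.

```python
def apply_partial_key(ciphertext, key):
    """Show solved letters as UPPER-case plaintext, unsolved ones as lower-case cipher letters."""
    out = []
    for ch in ciphertext.upper():
        if ch in key:
            out.append(key[ch].upper())   # solved: print the plaintext letter
        else:
            out.append(ch.lower())        # unsolved letter (spaces pass through unchanged)
    return "".join(out)

# The four assignments made so far: cipher letter -> plaintext letter.
key = {"P": "E", "M": "N", "I": "D", "Y": "S"}
print(apply_partial_key("HIPKWOF LD CHFIFH", key))  # prints: hDEkwof ld chfDfh
```

Each time a new letter is guessed, it is added to the key and the message is printed again, which is exactly the cycle followed in the rest of this solution.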

At this point, things start to get tricky. But note that interesting word wsESfN which occurs twice in the first line or two. That looks very much like a verb. Can we, then, do something with that letter f? We know it has to be a vowel. It isn't e, because we've assigned that. It could be h or w. But observe that f is the most common letter in our frequency list. h and w aren't common enough. u is unlikely in such a situation. Our choice is between a, i, and o.

But note that nine words end with f, and that two of them are ahf -- a rare letter followed by two common letters. Note also that there are only seven three-letter words in the whole message, and that two of them are ahf. The obvious conclusion? ahf represents KAI -- the most amazing thing being that it shows up only twice in the message!

That gives us three more letters, and the following version of the message:

ADEkwoI ld cAIDIA uINESeE qAIS wsESIN AkkA qd KAKIA NdcIAxEqE qAIS DE wsESIN qEkEIoI uINESeE EN qg Nolg uEusAcqAI oqI EN EqEsoukgSSoIS KAI EN rEIkESIN EqEsgN kAkdSg qg kAg qotqg KAI otD otqgS EISAKotSoNqAI lot kEuEI KtsIoS gSqE AI ukgSSAI EIS SdlEIoN EISIN ot qoIS cISqEtotSIN AkkA qoIS AcISqoIS d DE csowdqEIA ot qoIS AcISqoIS AkkA qoIS cISqEtotSIN

We don't have much in the way of complete words yet, but the form of what we're seeing looks good. This looks like Greek. That's promising.

From here we have several ways we could proceed. For example, look at that word AkkA, which occurs three times. There aren't many words which fit this pattern. You could argue that it's ABBA or ANNA -- but what are the odds of those words appearing three times in a short message that's worth encrypting? A much better bet is ALLA.
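
This kind of pattern search can be sketched in Python. The word list below is a hypothetical handful of candidates in this article's transliteration, not a real lexicon, and the function names are assumptions. The sketch checks only the repetition pattern and the letters already solved; it does not weigh how plausible a word is, which is the judgment made above.

```python
def word_pattern(word):
    """Map a word to its repetition pattern, e.g. ALLA -> (0, 1, 1, 0)."""
    seen = {}
    pattern = []
    for ch in word.upper():
        if ch not in seen:
            seen[ch] = len(seen)
        pattern.append(seen[ch])
    return tuple(pattern)

def candidates(partial, wordlist):
    """Words that share the pattern and agree with the solved (upper-case) letters."""
    target = word_pattern(partial)
    matches = []
    for word in wordlist:
        if len(word) != len(partial) or word_pattern(word) != target:
            continue
        # Lower-case positions are still unsolved, so they may be anything.
        if all(p.islower() or p == w for p, w in zip(partial, word.upper())):
            matches.append(word)
    return matches

# Hypothetical candidate list, for illustration only.
WORDS = ["ALLA", "ABBA", "ANNA", "KAKA", "TOIS"]
print(candidates("AkkA", WORDS))  # prints: ['ALLA', 'ABBA', 'ANNA']
```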

And then look at all those words like qoIS and qAIS. This is a strong indication that q is t -- and hence that o in fact represents itself. (Note that having an occasional letter represent itself is not a weakness in the cipher; in fact, not allowing a letter to represent itself is a weakness, because it reduces the number of possible ciphers.) If we make those changes, we have:

ADELwOI ld cAIDIA uINESeE TAIS wsESIN ALLA Td KAKIA NdcIAxETE TAIS DE wsESIN TELEIOI uINESeE EN Tg NOlg uEusAcTAI OTI EN ETEsOuLgSSOIS KAI EN rEILESIN ETEsgN LALdSg Tg LAg TOtTg KAI OtD OtTgS EISAKOtSONTAI lOt LEuEI KtsIOS gSTE AI uLgSSAI EIS SdlEION EISIN Ot TOIS cISTEtOtSIN ALLA TOIS AcISTOIS d DE csOwdTEIA Ot TOIS AcISTOIS ALLA TOIS cISTEtOtSIN
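
How much of the message these ten assignments cover can be checked against the frequency table at the top of the page. The sketch below simply adds up the counts from that table; the names COUNTS, KEY and solved_fraction are assumptions made for illustration.

```python
# Cipher-letter counts copied from the frequency table (letters with a count of 0 omitted).
COUNTS = {"F": 43, "P": 33, "Y": 33, "H": 27, "Q": 25, "O": 25, "M": 16,
          "K": 15, "T": 12, "G": 11, "C": 8, "D": 7, "S": 7, "U": 7,
          "A": 6, "I": 5, "L": 4, "W": 4, "E": 2, "X": 1, "R": 1}

# Cipher letter -> plaintext letter, for the ten assignments made so far.
KEY = {"P": "E", "M": "N", "I": "D", "Y": "S", "A": "K",
       "H": "A", "F": "I", "K": "L", "Q": "T", "O": "O"}

def solved_fraction(counts, key):
    """Return (letters already deciphered, total letters in the message)."""
    solved = sum(n for letter, n in counts.items() if letter in key)
    return solved, sum(counts.values())

solved, total = solved_fraction(COUNTS, KEY)
print(f"{solved} of {total} letters solved ({100 * solved / total:.0f}%)")
# prints: 228 of 292 letters solved (78%)
```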

We're really almost there. We still have quite a few unidentified letters -- but more than three-quarters of the message text is cracked, and we can easily figure out most of the remaining letters from context. For example, the first word is obviously adelfoi, so w is F. Also, it's quite clear that the combination Td is th. That, by elimination, means that Tg is tw. Consider, too, the phrase OtD Ot. Clearly t stands for u.

At this point we have:

ADELFOI lH cAIDIA uINESeE TAIS FsESIN ALLA TH KAKIA NHcIAxETE TAIS DE FsESIN TELEIOI uINESeE EN TW NOlW uEusAcTAI OTI EN ETEsOuLWSSOIS KAI EN rEILESIN ETEsWN LALHSW TW LAW TOUTW KAI OUD OUTWS EISAKOUSONTAI lOU LEuEI KUsIOS WSTE AI uLWSSAI EIS SHlEION EISIN OU TOIS cISTEUOUSIN ALLA TOIS AcISTOIS H DE csOFHTEIA OU TOIS AcISTOIS ALLA TOIS cISTEUOUSIN

From the second word it would appear that l is m, and checking the remaining words seems to confirm this. Again, it seems clear that s is r and c is p. Making those changes, we get:

ADELFOI MH PAIDIA uINESeE TAIS FRESIN ALLA TH KAKIA NHPIAxETE TAIS DE FRESIN TELEIOI uINESeE EN TW NOMW uEuRAPTAI OTI EN ETEROuLWSSOIS KAI EN rEILESIN ETERWN LALHSW TW LAW TOUTW KAI OUD OUTWS EISAKOUSONTAI MOU LEuEI KURIOS WSTE AI uLWSSAI EIS SHMEION EISIN OU TOIS PISTEUOUSIN ALLA TOIS APISTOIS H DE PROFHTEIA OU TOIS APISTOIS ALLA TOIS PISTEUOUSIN

At this point I'm not even going to bother any more. You should be able to figure out the rest for yourself. The solution is from 1 Corinthians 14:20-22:

Solution to Cryptogram 1

Thus the complete key is (note: I've included the three letters not found in the above sample of text, since I didn't know I wouldn't be using them):

As a footnote: It always seems, when I read one of these examples, that the person solving the cryptogram cheats, knowing the solution in advance. This can happen; had I not known the answer, for instance, I might have tried assuming that f, the most common letter in the ciphertext, represented a, the most common letter in most passages. But this sample was short enough that the assumption would have been wrong (f turned out to be i). It's best to attack, as we did here, from all angles: counting letters, counting last letters of words, counting short words. That led me to a shorter solution without errors.

That's if you know where the words end. Since classical works were generally written without word divisions, they are usually encrypted the same way. If you want another challenge, you may try this:

Cryptogram 2

NZGOYGQEUZFOXOGTNQRTOSQEANRCNFZXCDYRNTCNZCNRTOFD
DCQWOYQOZCNRXAYNTRFNRNCRWUYUXRXOWNFQYNZXCURXYODE
GQGNTQTOBYRXNZCNWEDYUFOCUTQGTUTQRXQEGDMORNZCUXWO
XRXYODEXUGDXQCOR

(Please note: the word/line breaks here are purely arbitrary, to make this fit on your screen. They should be ignored for frequency analysis.)
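
For anyone who wants a head start on the counting, here is a minimal Python sketch that joins the four lines into one unbroken stream before tallying, so the arbitrary breaks have no effect. It does nothing toward the solution itself; the names CRYPTOGRAM_2 and continuous_counts are assumptions made for illustration.

```python
from collections import Counter

# Cryptogram 2 exactly as printed above; the line breaks carry no meaning.
CRYPTOGRAM_2 = """
NZGOYGQEUZFOXOGTNQRTOSQEANRCNFZXCDYRNTCNZCNRTOFD
DCQWOYQOZCNRXAYNTRFNRNCRWUYUXRXOWNFQYNZXCURXYODE
GQGNTQTOBYRXNZCNWEDYUFOCUTQGTUTQRXQEGDMORNZCUXWO
XRXYODEXUGDXQCOR
"""

def continuous_counts(block):
    """Drop all whitespace, then count letters over the unbroken stream."""
    stream = "".join(block.split())
    return len(stream), Counter(stream).most_common()

total, ranked = continuous_counts(CRYPTOGRAM_2)
print(total, "letters")
for letter, n in ranked:
    print(f"{letter} - {n}, {100 * n / total:.1f}%")
```

With no word divisions, the last-letter and short-word counts used on Cryptogram 1 are no longer available, so the frequency table has to carry more of the weight.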

I won't walk you through the solution to this one. But if you want to see it, it's here.

Finally, here is one that offers the full range of difficulty known to the ancients. This is a monoalphabetic substitution, but it is not a simple cipher. Rather, it is a nomenclator (from Latin nomen calator, "name caller"), which is a cipher with code elements. A proper nomenclator will eliminate certain common words by replacing them with symbols (e.g. in English, the word "the" might be replaced by % or some other token), will probably include two forms of common letters (e.g. in English, "E" might be replaced by either G or !), will eliminate letters often found together (e.g. English "SH" might become $), may include nulls (that is, characters simply to be eliminated; e.g. # might simply stand for "ignore me"), and may include modifiers (e.g. 2 might stand for "repeat previous letter" or "repeat following letter").
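
To make the moving parts concrete, here is a toy decoder built on the English examples just given: % for "the", $ for "SH", G or ! for "E", # as a null, and 2 as "repeat previous letter". It is only a sketch under those assumptions; the extra letter K for "P" is invented so the sample spells something, and none of this is the key to the cryptogram below.

```python
# Toy key, assembled from the English examples in the paragraph above.
LETTERS = {"G": "E", "!": "E", "K": "P"}   # two forms of E; K -> P is invented
CODES = {"%": "THE", "$": "SH"}            # a whole word and a letter pair
NULLS = {"#"}                              # characters to be ignored
REPEAT_PREVIOUS = "2"                      # modifier: repeat the previous letter

def decode_nomenclator(ciphertext):
    """Decode left to right, handling nulls and the repeat modifier first."""
    out = []
    for ch in ciphertext:
        if ch in NULLS:
            continue
        if ch == REPEAT_PREVIOUS:
            if out:
                out.append(out[-1][-1])    # last letter of whatever was emitted last
            continue
        if ch in CODES:
            out.append(CODES[ch])
        elif ch in LETTERS:
            out.append(LETTERS[ch])
        else:
            out.append(ch)                 # anything else (e.g. a space) passes through
    return "".join(out)

print(decode_nomenclator("% $G#!K"))  # prints: THE SHEEP
print(decode_nomenclator("% $G2K"))   # prints: THE SHEEP
```

Note that two quite different ciphertexts decode to the same plaintext. That flexibility is what flattens the frequency counts and makes a nomenclator so much harder to break than a simple substitution.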

Cryptogram 3

For this, because it's much harder, we'll give you several samples. Some are Biblical, some are not; two are from classical authors, although the Greek will be understandable to those who know only koine. You'll notice something of a theme to these quotations. If you can solve them, there is at least some chance it applies to you.

Note that, as above, the line breaks are purely arbitrary (and represent the same number of characters on each line, though this may not be evident depending on your font metrics). All quotes are reasonably grammatically complete.

S=$AUBLWAI*DLMUAYMUWXAZU~YAU%MH+UPNXAZ**MHWPGZL~NWH
WDYXAZUY~AU%M\GDMLYNX*AZML~KLML~

BLTAUMU~%E%CAU+POA*IDW\KANGZLWAUW

CAODBP*UBZU~E%S=OO**LWH+LYKA=PKDWAUW%U

^KMHYHKPKQ=DW%WBZDWF=LWAU*BH^BAFHOD~HYM%Y\B\YPNMDW

XZLMAZ%XPWMDWAKMU~MPU+U%$YNWA~UYGZLWHYAD~ATPUDWLY

AUBAMUYNEDWOAUXAM%U+UP~PUMAUMDXPZPM\BUBLWMLYCA\X%~U
W%XODY$^LWAUBUFLWMS=L~$BLC**HYAMPU%NMDPUMAUMDBAAWXU
~MAU^BAWBU%KZUWLE*AWLYLIPZBUG=%KZUWLE*AWL~ALUKAWKON
BDWUC%OPY~HYPWAEUFLEAWD$ZUXUFLEAWD

LI%ZMLUXZL*^CANYHXPWM%AXUMZLXAN\~%MPCHWMM%XZLWR=LU%X
WANS=EPAWC*AZELWDYXAZLZI%WLWNXLR%OO\~PMHGNYAUPXP~UE
AMG=ABDKAWPYDEM\OLI\EAMA~QABAAK**%YMLW\XAZHBNWPML

HBA~XGU%XL**CAWANZACHXLULYBAMLXLYA~MU**WMHYAXUYMH^~
\KLUBAWRZLMLYLBAWP*NMH~\BA^ANZACHAW%WCZDXLU~L=

Because this is so difficult, I'll give you a series of hints if you choose to take them. To help those who don't want to cheat (you don't get hints in a real cryptogram!), I've put them in white type. You can copy them and paste them into another program to read them, or simply drag across them; they should show up in inverse type. The goal, of course, is to use as few hints as possible.

HINT 1: ==All letters stand for letters; the only special symbols are the symbols =, *, %, etc.==

HINT 2: ==The only whole word to be replaced by a symbol is KAI==

HINT 3: ==There are three other combinations of letters replaced by a single symbol, but these may be replaced within words. The three combinations are OU MH SOF==

HINT 4: ==Two letters have been split in two (i.e. are represented by two different symbols): A S==

HINT 5: ==The two remaining special symbols are NULL (i.e. simply omit from the plaintext) and DELETE PREVIOUS (i.e. this character and the character before it should be omitted). To prevent confusion, the latter symbol is not allowed to be doubled.==

HINT 6: ==The only New Testament passage is James 1:5-6.==

HINT 7: ==The passage from James is the sixth message.==

HINT 8: ==There are four passages from LXX: Ecclesiastes 2:13, Proverbs 3:31, Sirach 1:4, Job 28:12-13, in that order -- but of course there are some other quotes intervening.==

HINT 9: ==The symbol used for E is A ==

HINT 10: ==The first letters of the eight actual plaintexts are K D Q M P E O H ==

The whole answer to this cryptogram is here.