Hello, Internet!

In my previous post I talked about my solutions for the first 4 problems of the Tuenti Challenge 9. Today we’ll go up to the problem 7.

You can find all the posts with solutions grouped here.

Challenge 5 - Forbidden Love

Not much to say about this one.

We know that the messages are encoded so that each letter is substituted by the one located X columns and Y rows away from its original position in the keyboard of the used typewriter. We also know that every message ends with the first letter of the name of the sender. Since we are provided with the information of who sent each message, we just need to generate X and Y substracting the position of the last letter from the position of the letter that was supposed to be there.

tw = [
    '1','2','3','4','5','6','7','8','9','0',
    'Q','W','E','R','T','Y','U','I','O','P',
    'A','S','D','F','G','H','J','K','L',';',
    'Z','X','C','V','B','N','M',',','.','-'
    ]
rows = 4
cols = 10

def find_key( key ):
    index = tw.index( key )
    row = index // cols
    column = index % cols
    
    return ( row , column )

def get_key( row, column ):
    index = ( column % cols ) + cols * ( row % rows )
    return tw[ index ]

def decode( message, off_row, off_column ):
    dec_message = ''
    for key in message:
        if key in tw:
            row, column = find_key( key )
            dec_message += get_key( row + off_row , column + off_column )
        else:
            dec_message += key
    return dec_message

G_row, G_column = find_key( 'G' )
B_row, B_column = find_key( 'B' )

sender = # Get sender from the file
message = # Get message from the file

sign = message[ -1 ]
sign_row , sign_column = find_key( sign )

if sender == 'G':
    off_row = G_row - sign_row
    off_column = G_column - sign_column
else:
    off_row = B_row - sign_row
    off_column = B_column - sign_column

dec_message = decode( message , off_row , off_column )

print( dec_message )

Challenge 6 - Alphabet from outer space

At first glance, I though this problem should be easy to model. After all you just need to order an alphabet given an ordered dictionary of that alphabet. However, this would only be trivial if that dictionary contained at least one word that starts with every element of the alphabet, which sadly won’t happen in the test cases. Therefore we find ourselves in situations such as these:

  • Given aa, bb, cc, we clearly see that a < b < c.
  • Given ab, ac, ba, we see that a < b because the first letter of the words, and also we see that, since ab, ac share the first letter, then b < c. As a result we know that a < b < c.
  • Given aa, ac, ba, we see that a < b because the first letter of the words, and also we see that, since aa, ac share the first letter, then a < c. But with this information we still don’t know if b < c or c < b, so we have an ambiguous case.

This got me thinking. So we have elements and some one-to-one connections between them. Boy, it looks like a graph to me. So we can build a graph with all the partial orderings that we find in the dictionary (like we did in challenge 2) and then look for a path in the graph that contains every element without repetition. If the dictionary is properly ordered, this path should be unique or nonexistant (ambiguous case).

def create_graph( words, index = 0, graph = None ):
    if not graph:
        chars = set( ''.join( words ) )
        graph = { key: [] for key in chars }
        
    prev_word = words[ 0 ]
    top_subindex = 0
    low_subindex = 0
    subblock = False
    
    for i, word in enumerate( words[1::], start=1 ):
        if len( prev_word ) > index and len( word ) > index:
            if prev_word[index] != word[index]:
                if subblock:
                    graph = create_graph(
                        words[top_subindex:low_subindex:], 
                        index+1, 
                        graph )
                subblock = False
                top_subindex = i
                
                if word[index] not in graph[prev_word[index]]:
                    graph[prev_word[index]] += [ word[index] ]
            else:
                subblock = True
                low_subindex = i + 1
            
            prev_word = word
                
        if len( prev_word ) <= index:
            prev_word = word
    
    if subblock:
        graph = create_graph( 
            words[top_subindex:low_subindex:], 
            index+1, 
            graph )
        
    return graph

def find_path( graph, visited = None, options = None ):
    if not options:
        options = graph.keys()
        
    if not visited:
        visited = []
            
    for node in options:
        if node not in visited:
            new_visited = visited + [ node ]
            if len( new_visited ) == len( graph ):
                # Found a path that contains every element
                return new_visited
            
            if graph[node]:
                answer = find_path( graph, new_visited, graph[node] )
                
                if answer:
                    return answer
    return None

words = # Get words from file
graph = create_graph( words )
result = find_path( graph )

if not result:
    answer = 'AMBIGUOUS'
else:
    answer = ' '.join( result )

print(answer)

Challenge 7 - Hash of cards

Warning: I freaked out while coding the answer to this exercise and skipped it, so, even though I’m pretty sure my answer is correct, it is not tested.

First thing is understanding what the hash function actually does. It takes the message, split it in rows of 16 characters and then sums (modulus 256) the numerical values of the characters of each column to generate an array of 16 signed bytes (that is why the sum of positive ASCII values overflows into negative hashes) that will be the hash.

To visualize this, let’s use the example message and their hashes (last row). This is the original message:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
S u b j e c t : B o a t ; F r
o m : C h a r l i e ; T o :
D e s m o n d ; - - - - - - N o
t P e n n y s b o a t
122 103 95 92 -123 -89 -78 14 44 -8 99 56 86 75 -50 1

This is the modified message:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
S u b j e c t : B o a t ; F r
o m : C h a r l i e ; T o :
D e s m o n d ; - - - - - - P e
n n y s b o a t : )
116 -75 -120 30 -118 89 -101 86 26 76 33 3 30 -41 -48 -9

And finally this is the modified message with the payload:

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
S u b j e c t : B o a t ; F r
o m : C h a r l i e ; T o :
D e s m o n d ; - - - 0 3 W 0 0
0 0 0 0 S 0 e 0 0 0 0 X z z w u
e 0 8 B z Q z 0 Z 0 D z z z z z
z R z z z z z e z _ z z - - - P
e n n y s b o a t : )
122 103 95 92 -123 -89 -78 14 44 -8 99 56 86 75 -50 1

As you can see, for every character added to a message all the following characters are shifted one column and as a result all the hash numbers get modified. That is why the hashes of vey similar messages are so different.

In order to ensure that the final hash is the same as the one of the original message, we need to add a sequence of characters (the payload) to our modified message that ensures that the final sum is the desired one in each column. Depending of the final length of the payload there will be a different amount of payload characters in each column (0, 1, 2…) with which values modify the hash.

To create the payload we need to follow some rules:

  • Characters of the payload have to be alphanumeric, which means a value range from 48 to 122.
  • Payload has to be as short as possible.
  • Among all the payloads of the same length we need to take the one with lowest numeric value (characters with higher value go at the end).

Before continuing let’s set some terms:

  • i: the number of a column of the hashing matrix shown before. It will range from 0 to 15.
  • p: the amount of characters in a given payload.
  • N(i,p): the number of payload characters in column i for a payload of length p. Of course, in the real problem, we need to know the initial position of the payload to calculate this.
  • Orig(i): unsigned 8-bit value of the hash of the original message in the column i.
  • Mod(i,p): unsigned 8-bit value of the hash of the modified message in the colum i when added a payload that contains p null values (0x00). I do this because later those payload spaces will be filled with the values I obtain through the algorithm.
  • Dif(i,p): result of Orig(i) - Mod(i,p). Note that, since we operate with unsigned 8-bit values, we have oveflowed results like 1 - 2 = 255.
  • Pay(i,p): unsigned 8-bit value of the sum of the N(i,p) payload characters of column i.
  • Min(i,p): 48 * N(i,p). This is the unsigned 8-bit value that the sum of the payload characters of a column i, taking into consideration that the minimum character value is 48 (‘0’).
  • Max(i,p): 122 * N(i,p). This is the unsigned 8-bit value that the sum of the payload characters of a column i, taking into consideration that the maximum character value is 122 (‘z’).
  • Over_max(i,p) or Over_min(i,p): this is the number of times that the value overflowed while calculating Max(i,p) or Min(i,p).

Now, we know that

Dif(i,p) = Orig(i) - Mod(i,p) => Orig(i) = Mod(i,p) + Dif(i,p)

and therefore we need to ensure that

Pay(i,p) == Dif(i,p)

But for this to happen, Dif(i,p) needs to be in the range of values of Pay(i,p). The ranges of Pay(i,p) can be:

  1. If Over_max(i,p) - Over_min(i,p) = 0 the range of Pay(i,p) is [ Min(i,p) , Max(i,p) ], because both limits were [ 48 , 122 ] when N(i,p) == 1 and both values have overflowed the same amount of times they remain in order.
  2. If Over_max(i,p) - Over_min(i,p) = 1. the max value has overflowed one time more than the minimum, so the valid range is the union of [ Min(i,p) , 255] and [ 0 , Max(i,p) ], so we have two scenarios:
    1. If Max(i,p) < Min(i,p), the valid range remains the union of [ Min(i,p) , 255] and [ 0 , Max(i,p) ].
    2. If Max(i,p) >= Min(i,p), the union covers the whole [0,255], therefore there will be an answer always because we can cosntruct every 8-bit number.
  3. If Over_max(i,p) - Over_min(i,p) > 1. the max value has overflowed at least two times more than the minimum so the valid range is the union of [ Min(i,p) , 255] and [ 0 , 255 ] and [ 0 , Max(i,p) ], which obviously means that the valid range is finally [ 0 , 255 ] and there will always be a solution.

Knowing this, we calculate the minimum p by iterating and looking for a case in which, for every i, Dif(i,p) falls in one of the previous valid ranges.

Once we have the value of p, we calculate Dif(i,p) + 256 * Over_max(i,p) (if we previously found the range [ 0 , Max(i,p) ]) or Dif(i,p) + 256 * Over_min(i,p) (any other case). Then we divide that value in N(i,p) valid numbers in ascending order (for example, 250 divided in 3 would be the list [ 48, 80, 122 ]).

These lists of numbers are the values that will fill each column i, so, to craft the final payload string, we can create a matrix with these lists as columns and read each row starting for the column j, being j the column where the payload will start to be writen.

import math

min_byte = ord('0')
max_byte = ord('z')

def byte_to_signed( b ):
    return b if b < 128 else ( 256 - b ) * (-1)

def signed_to_byte( b ):
    return b if b >= 0 else 256 + b

# How much need to be added to b2 to be b1
def positive_dif_for_signed( b1 , b2 ):
    byte_b1 = signed_to_byte( b1 )
    byte_b2 = signed_to_byte( b2 )
    
    return byte_b1 - byte_b2 if byte_b1 >= byte_b2 
        else byte_b1 + ( 255 - byte_b2 )

def notSoComplexHash( text ):
    def f( col ):
        b = sum( 
                [ ord( x ) 
                for i, x in enumerate( text ) 
                if i % 16 == col ] 
            ) % 256
        return byte_to_signed( b )
        
    return [ f( col ) for col in range( 16 ) ]

def cols_payload( initial_col, num_bytes ):
    return 
        [ len(
            [ b 
            for b in range( num_bytes ) 
            if ( initial_col + b ) % 16 == col ] ) 
        for col in range( 16 ) ]

def min_max( n ):
    return (
            ( min_byte * n ) % 256,
            ( min_byte * n ) // 256,
            ( max_byte * n ) % 256,
            ( max_byte * n ) // 256,
        )
        
def valid_range( value, min_i, over_min_i, max_i, over_max_i ):
    if over_max_i - over_min_i > 1:
        return ( True , over_min_i )
    if over_max_i - over_min_i == 1:
        if max_i >= min_i:
            return ( True , over_min_i )
        if value <= max_i:
            return ( True , over_max_i )
        elif value >= min_i:
            return ( True , over_min_i )
        else:
            return ( False , None )
    if value >= min_i and value <= max_i:
        return ( True , over_min_i )
    else:
        return ( False , None )
        
def distribute( value, spots ):
    # TODO
            
orig_msg = # From file
mod_msg = # From file
orig_hash = notSoComplexHash( orig_msg )

payload_position = orig_msg.find('---') + 3
payload = ''

p = 0
while True:
    p++
    
    gaped_msg = mod_msg[:payload_position:] + 
        chr( 0 ) * p + 
        mod_msg[payload_position::]
    gaped_hash = notSoComplexHash( gaped_msg )
    dif = [ positive_dif_for_signed( o, m ) 
        for o, m in zip( orig_hash, gaped_hash ) ]
    
    found = True
    n = cols_payload( payload_position, p )
    overs = []
    for dif_i, n_i in zip( dif, n ):
        valid , over = valid_range( dif_i , *min_max( n_i ) )
        overs += over
        if not valid:
            found = False
    
    if found:
        numbers = None * 16
        for i, dif_i, n_i, over_i in enumerate( zip( dif , n , overs ) ):
            numbers[i] = distribute( ( dif_i + 256 * over_i ) , n_i )
        
        for x in range(p):
            payload += chr( numbers[( payload_position + x ) % 16][x // 16] )
        
        break

print(payload)

And that is all for today! Next post I’ll cover the two problems that I enjoyed the most, both related with security.

See you!