• I love Square CTFs and the way Square does security. This was one of the highlight CTFs of last year, and it kept me excited for the whole week.

1. Misc – MATH category: Captcha

Concept

  • Solve math, Identify Characters, Work with Fonts Style!

Given

  • A web page that shows a mathematical expression; submit the correct answer to get the flag (sounds easy!)

Thoughts –> Think, Think, Think

  • But when the expression is copied, the characters come out as garbage, and the page source shows a different mapping –> this is because the page uses a custom font, which is base64 encoded within the page
  • Old school: I tried solving the challenge by typing the answer manually as fast as I could, but the page was clever enough to switch to a different mapping and captcha within a few seconds. The page is linked to a token that keeps track of the current captcha… (the answer probably has to arrive in under 4 seconds, otherwise everything resets!)
  • Obtaining the captcha page programmatically –> make sure you set a proper HTTP User-Agent, otherwise the page does not respond
  • Tried studying font styles and cmap tables
    **- IDEA1:** A proper mapping from the font glyphs to the visible characters would help reconstruct the expression (fontTools in Python was helpful, but my study of the font was not exhaustive; it involves how glyphs are drawn and the different tables that define a font)

    **- IDEA2:** Use OCR to recognize the characters and construct the expression to solve

    **- IDEA3:** Extract the base64 encoded font into a ttf file and let the operating system render it; the reverse step, rebuilding the expression from the web page, becomes tough again
    
  • Once recovered as a string, the expression can be solved easily with Python's eval

  • Moving forward with IDEA2 after a number of failures
    • Take a screenshot of the browser showing the expression (with the Python selenium library)
    • Testing with online OCR tools showed that they translated the expression with 100% accuracy
      • Approach1: Take a screenshot of the browser showing the expression –> send it to an online OCR website (API) –> fetch the expression text –> solve
        • Some websites work reliably (manual upload), while the ones that offer API key access did not work
      • Approach2: Use Python OCR (pytesseract) to obtain the text (on its own this is not accurate; some characters are not recognized properly, so it cannot be relied upon)
        • Refactoring this logic helped me solve the problem, see the steps
          • Fails: Using Pillow to increase resolution and contrast, crop, and enhance the image did not help. OCR was still inaccurate!
          • Success: Using a hybrid method (see Steps), with OCR as a separate pass that maps the font characters to text
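For IDEA3, pulling the embedded font out of the page is the easy part. Here is a minimal Python 3 sketch of that step. It assumes the font sits in a `data:` URI inside a `url(...)` declaration, which I did not record precisely, so the regular expression and the function name `extract_embedded_font` are illustrative only.

```python
import base64
import re

# Assumption: the font is embedded as url(data:<mime>;base64,<payload>).
# The real page may declare it differently; adjust the pattern if so.
FONT_URI = re.compile(r"url\(\s*['\"]?data:[^,]*;base64,([A-Za-z0-9+/=]+)")


def extract_embedded_font(page_source):
    """Return the decoded font bytes, or None when no embedded font is found."""
    match = FONT_URI.search(page_source)
    if match is None:
        return None
    return base64.b64decode(match.group(1))
```

The returned bytes could then be written to a .ttf file and opened with fontTools (for example `TTFont(path).getBestCmap()`) to look at the cmap table. That still leaves the hard part: deciding which glyph is which digit or operator.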

Steps

1. OBTAIN DATA (SCREENSHOT + HTML) - Use Python Selenium to do this

  • Go to the captcha web page
  • Take a screenshot of the captcha page and save it as a png
  • Also store the page source HTML

2. OCR (Create a mapping for fonts)

  • We already know that running pytesseract directly on the screenshot is not accurate
  • We know that the expression can contain only ‘1234567890-+xX()’
    • xX stands for multiply and should be replaced by ‘*’ once the expression is obtained
  • Replace the expression in the HTML source with a string made of the unique characters of the expression (each one maps to a character in ‘1234567890-+xX()’), separated by spaces so that OCR works (REASON: the spacing and the way the characters are displayed affect OCR a lot!)
    • This could be made more accurate by mapping fewer characters per pass and repeating (there are only a few characters to map); this was not needed for this challenge though
  • Now repeat step 1 (OBTAIN DATA) with the new HTML, and take a new screenshot that shows the known characters in the order you placed them.
  • OCR is now more accurate, and the mapping between font characters and real characters can be built easily.
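The mapping step can be sketched in Python 3 as below. This is not the exact logic of my script: it assumes OCR returns one whitespace-separated token per glyph, in the order the glyphs were placed, and `build_mapping` is a name made up for the example. Checking every token against the allowed alphabet catches a bad OCR pass before an answer is submitted.

```python
ALLOWED = set("1234567890-+xX()")


def build_mapping(glyph_chars, ocr_text):
    """Pair each obfuscated character with the character OCR read in its place."""
    recognized = ocr_text.split()
    if len(recognized) != len(glyph_chars):
        raise ValueError(
            "OCR returned %d tokens for %d glyphs" % (len(recognized), len(glyph_chars))
        )
    mapping = {}
    for glyph, seen in zip(glyph_chars, recognized):
        # Every glyph must resolve to exactly one character of the known alphabet.
        if len(seen) != 1 or seen not in ALLOWED:
            raise ValueError("unexpected OCR result %r for glyph %r" % (seen, glyph))
        mapping[glyph] = seen
    return mapping
```

When this raises, the cheap reaction is to reload the page and try again with a fresh captcha.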

3. CONSTRUCT EXPRESSION

  • Construct the expression now, replace ‘x’ with ‘*’ and execute it with eval
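A Python 3 sketch of this step follows, with two deliberate changes from my script; both are swapped-in techniques, not what I ran during the CTF. The characters are translated in a single pass, so a character produced by one replacement can never be re-mapped by a later one, and the arithmetic is evaluated by walking the `ast` instead of calling eval on text that came from a remote page. The names `decode_expression` and `evaluate` are hypothetical.

```python
import ast
import operator

BINARY_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def decode_expression(obfuscated, mapping):
    """Translate every character once, then turn x/X into '*'."""
    decoded = "".join(mapping.get(ch, ch) for ch in obfuscated)
    return decoded.replace("x", "*").replace("X", "*")


def evaluate(expression):
    """Evaluate integer +, -, * and parentheses only; reject anything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported syntax in expression")

    return walk(ast.parse(expression.strip(), mode="eval"))
```

For example, with a mapping that sends `z` to `(`, `Q` to `7`, `p` to `+`, `m` to `3`, `#` to `)`, `k` to `x` and `w` to `2`, the text `zQ p m# k w` decodes to `(7 + 3) * 2`.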

4. SUBMIT CAPTCHA RESPONSE

  • Take the token and the computed answer, and submit them in a POST request (the script sends them as form fields)

– Note that I added one extra selenium round trip for the new HTML with the known characters, to get the mapping with better OCR. Hopefully this does not cost me too much time!!

Program:

  • crack_structured.py
  • crack.py - the initial code, with different PoCs
import os, re, sys
import requests
import pytesseract
from PIL import Image  # needed for Image.open() in STEP2
from selenium import webdriver

### STEP1 - Obtain Data

# Use selenium to grab a screen shot of the webpage
driver = webdriver.Firefox()
driver.get('https://hidden-island-93990.squarectf.com/ea6c95c6d0ff24545cad')
element = driver.find_elements_by_tag_name('p')

# Html source, token and expression
htmls = driver.page_source
text = element[0].text
t = "".join(list(text))
tok = driver.find_element_by_name("token")
token = tok.get_attribute("value")
var = list(set(t))
vars = []
for ch in var:
    if ch.strip():
       vars.append(ch)
print vars
print htmls

### STEP2 - OCR - Recognize and Map

html = htmls.replace(text, " ".join(vars))
#print html
new_html = open("new.html","w")
new_html.write(str(html))
new_html.close()
alt_html = "file://"+os.path.abspath("new.html")
driver.get(alt_html)
screenshot = driver.save_screenshot('expression.png')
driver.quit()
expression = pytesseract.image_to_string(Image.open("expression.png"))
expression =  expression.split()[1]
expression = list(expression)
print vars, expression


### STEP3 - Construct expression

for k,v in zip(vars, expression):
    text = text.replace(k, v)
print text
# Replace x or X and solve
#expr = expression.split("\n")[1]
expr = text.replace("x","*")
expr = expr.replace("X","*")
print expr
ans =  eval(expr)
print ans


### STEP4 - Submit the answer

url = "https://hidden-island-93990.squarectf.com/ea6c95c6d0ff24545cad"
data = dict(token=token, answer=str(ans))
r = requests.post(url, data=data, allow_redirects=True)
print r.content

Terminal Output showing the work

Flag Obtained

RESULT

  • The first try failed, and the second try failed too with an incorrect-answer response
  • The third try was successful!

2. GDPR category: deAnonymization

Concept

  • Under GDPR, anonymization means protecting user privacy by transforming the data so that nothing can be derived about any individual.

Given

  • Five CSV datasets containing different fields such as first name, email, 4 digits of SSN, role, pay, state, street address.
  • A web portal with a login page and a reset password page
  • The task: find the details of the user Yakubovics, the captain, in order to log in to the system

Think, Think, Think

  • Looking at the portal, it is very tempting to try a SQL injection or an admin login, BUT we have the datasets and a name as a hint.
  • From the fields in the datasets and in the reset password form, it is clear that we should find the captain's data in the datasets, use it in the reset password form, and then log in

Steps

  • Start from the name we have, Yakubovics, and narrow down to the first name, SSN, street address and state
  • Try every candidate set that survives the filter in the reset password form
  • Get or change the password (in the end it was just a matter of viewing the previous password on the reset password page)
  • Login

Details:

  • Start with the name we are given, Yakubovics
  • Check dataset1 –> the last name gives us the email
  • Check dataset2 –> use the email and last name from dataset1 to obtain the state
  • Check dataset3 –> with the state, obtain the candidate SSNs and street addresses
  • Check dataset4 –> get income and postal code for the candidates in that state
  • Check dataset5 –> from the email we know the first name starts with e; use this to filter first names in the fifth dataset
  • As we progress, delete the sets that do not match
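The narrowing above is essentially a chain of joins. Below is a minimal Python 3 sketch using the `csv` module. The column names (`last_name`, `state`, `ssn`, `street`, `first_name`) and the function names are assumptions for illustration; the real files had no such headers, and my script indexes columns by position instead.

```python
import csv
import io


def rows(text):
    """Parse CSV text that starts with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def candidates(people, addresses, names, last_name, initial=None):
    # 1. Last name -> state(s) the person lives in.
    states = {
        p["state"].lower() for p in people if p["last_name"].lower() == last_name.lower()
    }
    # 2. State -> every street address in that state, with its SSN digits.
    by_street = {
        a["street"]: {"ssn": a["ssn"]} for a in addresses if a["state"].lower() in states
    }
    # 3. Street address -> first name.
    for n in names:
        if n["street"] in by_street:
            by_street[n["street"]]["name"] = n["first_name"]
    # 4. Optionally keep only first names starting with the known initial.
    if initial is not None:
        by_street = {
            street: record
            for street, record in by_street.items()
            if record.get("name", "").lower().startswith(initial.lower())
        }
    return by_street
```

Keying the candidates by street address mirrors what reader.py does: the address is the field shared by the later datasets, so it serves as the join key.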

Program

  • reader.py
    • This program prints the final set of candidates after filtering through dataset CSVs 1-5. From this set, using Elyssa gives the answer
import json
names = []
yaku = {}

# File 1: Fetch all the existence of the names of the Captain
for i in range(1,2):
    name = str(i)+".csv"
    file = open(name,"r")
    for line in file.readlines():
        if "Yakubovics" in line.strip() or "Yakubovics".upper() in line.strip() or "Yakubovics".lower() in line.strip():
           l = line.strip().split(",")
           yaku["email"] = l[0]
           yaku["role"] = l[1]
           yaku["income"] = l[2]
    file.close()


# File 2
for i in range(2,3):
    name = str(i)+".csv"
    file = open(name,"r")
    for line in file.readlines():
        if "Yakubovics" in line.strip() or "Yakubovics".upper() in line.strip() or "Yakubovics".lower() in line.strip():
           l = line.strip().split(",")
           yaku["state"] = l[1]
    file.close()

# doc has the ssn, address --> Fetch all florida addresses and ssn
name = str(3)+".csv"
file = open(name,"r")
for line in file.readlines():
    if "Florida" in line.strip() or "Florida".lower() in line.strip() or "Florida".upper() in line.strip():
       l = line.strip().split(",")
       # street address is the key
       yaku[l[2]] = {}
       # ssn
       yaku[l[2]]["ssn"] = l[0]
file.close()

# Fourth
name = str(4)+".csv"
file = open(name,"r")
for line in file.readlines():
    l = line.strip().split(",")
    if " ".join(l[2:-1]) in yaku and ("Florida" in l[1] or "Florida".upper() in l[1] or "Florida".lower() in l[1]):
       yaku[" ".join(l[2:-1])]["income"] = l[0]
       yaku[" ".join(l[2:-1])]["postal"] = l[-1]
file.close()

# Fifth
name = str(5)+".csv"
file = open(name,"r")
for line in file.readlines():
    l = line.strip().split(",")
    if " ".join(l[1:]) in yaku:
       yaku[" ".join(l[1:])]["name"] = l[0]
       # This assumption doesnt work?
       if l[0][0] != "e".upper() and l[0][0] != "e".lower():
          del yaku[" ".join(l[1:])]
file.close()

print json.dumps(yaku, sort_keys=True, indent=4)

Output Flag - Hidden password can be obtained by looking at the source for the masked/hidden field

Program output - Based on the relation boil the data down to possible sets

Result

  • At the reset password page, using the details of Elyssa reveals the answer

`
"4 Magdeline": {
    "income": "96605",
    "name": "Elyssa",
    "postal": "33421",
    "ssn": "4484"
},
`

  • The previous password appears masked with ‘*’, viewing page source gives out the password

3. Programming category: dot-n-dash

  • The first puzzle, written by Alok himself. It proved to be challenging!

Concept

  • Encode/Decode. Program by Reversing Logic.

Given

  • An encoder/decoder written in JavaScript, with the decoder code missing. Instructions are provided in encoded form (with the flag, obviously). Complete the decoder code! (Reverse the encoder code.)

Think, Think, Think

  • With a bunch of debug statements, analyze the encoder code.
  • Try to follow the encoder program step by step to write the decoder function
  • Is there a pattern that can be leveraged to reverse it?

Steps

  • This problem had me confused for a long time, digging through the JS, analyzing and reversing the code.
  • After a long slog, while I was just reading the console, a pattern struck me! I implemented it in the code below to decode.
    • Convert the dots and dashes back to their respective integers
    • Reverse the math
    • Convert back to ASCII
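To make the pattern concrete, here is a Python 3 port of the encoder together with a decoder that inverts the arithmetic directly. This is a sketch, not the JavaScript I actually used (which relies on a lookup table for 0-9, a-z and "-"). Each run of n dashes encodes the number n = 1 + j + (len - 1 - i) * 8, so (n - 1) // 8 gives the distance of the character from the end of the text and (n - 1) % 8 gives the bit that is set. It assumes 8-bit characters and a first character that is not NUL.

```python
import random


def encode(text):
    """Port of _encode: one shuffled run of dashes per set bit, each ended by a dot."""
    positions = []
    for i, ch in enumerate(text):
        code = ord(ch)
        for j in range(8):
            if (code >> j) & 1:
                positions.append(1 + j + (len(text) - 1 - i) * 8)
    random.shuffle(positions)  # order carries no information
    return "".join("-" * p + "." for p in positions)


def decode(encoded):
    """Rebuild each character code from the bit positions, ignoring their order."""
    positions = [len(run) for run in encoded.split(".") if run]
    if not positions:
        return ""
    length = (max(positions) - 1) // 8 + 1
    codes = [0] * length
    for p in positions:
        offset_from_end, bit = divmod(p - 1, 8)
        codes[length - 1 - offset_from_end] |= 1 << bit
    return "".join(chr(c) for c in codes)
```

Because the decoder sets bits instead of looking up whole characters, it is not limited to a fixed alphabet, and the shuffle in the encoder has no effect on the result.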

Code

</head>
<body>
<p>It is a known fact that space travelers love to devis unique encoding and decoding methods...</p>
  <textarea id="input" placeholder="type something here..."></textarea>
  <div>
    <button onclick="return encode();">Encode</button>
    <button onclick="return decode();">Decode</button>
  </div>
<script>
function encode() {
  var t = input.value;
  if (/^[-.]+$/.test(t)) {
    alert("Your text is already e'coded!");
  } else {
    input.value = _encode(t);
  }
  return false;
}

function decode() {
  var t = input.value;
  if (/^[-.]*$/.test(t)) {
    input.value = _decode(t);
  } else {
    alert("Your text is not e'coded!");
  }
  return false;
}

function _encode(input) {
  var a=[];
  for (var i=0; i<input.length; i++) {
    var t = input.charCodeAt(i);
    console.log(t);
    for (var j=0; j<8; j++) {
      //console.log(t >> j);
      //console.log((t >> j) & 1)
      //console.log(1 + j + (input.length - 1 - i) * 8)
      if ((t >> j) & 1) {
        console.log(t >> j);
        console.log((t >> j) & 1)
        console.log(1 + j + (input.length - 1 - i) * 8)
        a.push(1 + j + (input.length - 1 - i) * 8);
      }
    }
  }
 
  console.log(a);

  var b = [];
  while (a.length) {
    var t = (Math.random() * a.length)|0;
    b.push(a[t]);
    a = a.slice(0, t).concat(a.slice(t+1));
  }

  console.log(b);

  var r = '';
  while (b.length) {
    var t = b.pop();
    r = r + "-".repeat(t) + ".";
  }
  return r;
}
 
// Everything below this line was lost due to cosmis radiation. The engineer who knows
// where the backups are stored already left.
function _decode(input) {
  var b = [];
  
  // Reverse r logic
  dot_split = input.split(".")
  console.log(dot_split);
  for (var i=0; i<dot_split.length; i++) {
      if (dot_split[i].length) {
         b.push(dot_split[i].match(/-/g).length);
      }
  }

  input = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f","g","h","i","j","k","l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "-"];
  input = input.join('');
  var dick={};
  for (var i=0; i<input.length; i++) {
    var t = input.charCodeAt(i);
    var a=[];
    console.log(input[i],t);
    for (var j=0; j<8; j++) {
      //console.log(t >> j);
      //console.log((t >> j) & 1)
      //console.log(1 + j + (input.length - 1 - i) * 8)
      if ((t >> j) & 1) {
        //console.log(t >> j);
        //console.log((t >> j) & 1)
        //console.log(1 + j + (input.length - 1 - i) * 8)
        a.push(1 + j + (1 - 1 - 0) * 8);
      }
    }
    dick[a.join('')] = input.charAt(i);
    //console.log(a);
  }
  console.log(dick);

  b = b.sort(function(a, b){return a-b});
  console.log(b);
  var output = [];
  while (b.length) {
     var less_than_8 = [];
     var stop = 0;
     for (var p=0; p<b.length; p++) {
         if (b[p] > 8) {
             b[p] = b[p] - 8;
         } else {
             less_than_8.push(b[p]);
             stop = p;
         }
     }
     b = b.slice(stop+1);
     console.log(less_than_8);
     console.log(b);
     output.push(dick[less_than_8.sort(function(a, b){return a-b}).join("")]);
  }
  console.log(output);

  return output.reverse().join("");

}
</script>

Result