Systems View: December 2009

Sunday, December 6, 2009

Breathalyzer -- Facebook puzzle

More puzzles; in this post I provide the code for another puzzle provided on Facebook. Note that the unit test code is not included here. You can copy the code below after creating a project in an IDE such as Eclipse (which is what I use).

By the way, this puzzle was interesting. I did spend some time thinking of the algorithm to solve the problem but then started Googling for it. Found out about LevenshteinDistance which solves this problem nicely. Have to confess though that my thoughts on solving this problem were focused on using a tree structure. After finding the solution (and also the code for that) I gave up on that approach.

So given below is the problem and the code. The code contains two different functions for getting the Levenshtein Distance. Was having problems with the first function and hence included the second; now have kept both.

By the way see the example below. The word "tihs" has "ties" as the closest although in the context of the example sentence I was not seeing this until I used the program to clear my mental block. An example of computer not having to deal with the "mental block".

PROBLEM: (from Facebook page)

To safeguard against the dreaded phenomenon of wall posting while drunk, Facebook is implementing a feature that detects when post content is too garbled to have been done while sober and informs the user that they need to take an online breathalyzer test before being allowed to post.

Unfortunately, there is far too much content for a given set of persons to evaluate by hand. Fortunately, you are a programmer of some repute and can write a program that processes and evaluates wall posts. You realize that such a program would be of great use to society and intend to resolve the problem once and for all. The program you write must compute a score for a body of text, returning this score upon completion.

Your program will be given a list of accepted words and run on one wall post at a time. For each word W in the post, you must find word W' from the list of accepted words such that the number of changes from W to W' is minimized. It is possible that W is already W' and thus the number of changes necessary is zero. A change is defined as replacing a single letter with another letter, adding a letter in any position, or removing a letter from any position. The total score for the wall post is the minimum number of changes necessary to make all words in the post acceptable.

Input Specification

Your program must take a single string argument, representing the file name containing the wall post to analyze. In addition, your program must open up and read the accepted word list from the following static path location:

/var/tmp/twl06.txt

For testing purposes, you may download and examine the accepted word list here. When submitting your code, you do not need to include this file, as it is already present on the machine.

The input file consists entirely of lower case letters and space characters. You are guaranteed that the input file will start with a lower case letter, and that all words are separated by at least one space character. The file may or may not end with a new line character.

Example input file:

tihs sententcnes iss nout varrry goud

You are guaranteed that your program will run against well formed input files and that the accepted word list is identical to the one provided for testing.

Output Specification

Your program must print out the minimum number of changes necessary to turn all words in the input wall post into accepted words as defined by the word list file. Words may not be joined together, or separated into multiple words. A change in a word is defined as one of the following:

Replacing any single letter with another letter.
Adding a single letter in any position.
Removing any single letter.

This score must be printed out as an integer and followed by a single new line.

Example Output (newline after number):

------------------------------------------------------------------------------------------

package com.farooq;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

public class Word_distance {

/**
* @param args
*/
private ArrayList legal_word_list = new ArrayList();

public static void main(String[] args) {

String[] sentence_to_check = new String[args.length];
int Breath_analyzer_number; // the result we are looking for

//read the sentence which has to be tested from command line/file etc
if (args.length > 0) {
sentence_to_check = args;
}

//create an instance and determine the Levenshtein distance
Word_distance sentence_LD = new Word_distance();

Breath_analyzer_number = sentence_LD.sentenceAnalysis(sentence_to_check);
System.out.println("The number for the given sentence is " +Breath_analyzer_number);

}

public Word_distance() {
String filename = "C:\\tempfiles\\word_list_Levenshtein.txt";
String line_read;
try {
FileReader word_list_file = new FileReader(filename);
BufferedReader word_buffer = new BufferedReader(word_list_file);

while ((line_read = word_buffer.readLine()) != null) {
//so reading the list of "legal" words into the ArrayList
legal_word_list.add(line_read);
}
word_list_file.close();

} catch (Exception e) {
System.out.println("Exception is " +e);
}
}

public int sentenceAnalysis(String[] sentence) {

// get each word of sentence and determine Levenshtein number for word
//have to iterate the given word over entire legal_word_list
//and determine lowest value; if zero then break, repeat for all words
int total_distance =0;
int[] word_distance = new int[sentence.length] ; //minimum word_distance for each word in the sentence
int temp_distance;

for (int i=0;i< sentence.length; i++) {

for (int j=0; j< legal_word_list.size();j ++) {
temp_distance = getLevenshteinDistance(sentence[i], legal_word_list.get(j).toLowerCase());
// temp_distance = LD(sentence[i].trim(), legal_word_list.get(j).toLowerCase());

if (temp_distance ==0) {
word_distance[i] = temp_distance;
break;
} else if (j==0) {
word_distance[i] = temp_distance;
} else if (word_distance[i] > temp_distance) {
word_distance[i] = temp_distance;
} else {
//DO NOTHING as new word distance is larger than what we had
}
}
total_distance = total_distance + word_distance[i];
}
return total_distance;
}

// the following code copied from http://www.merriampark.com/ldjava.htm by Chas Emerick
public int getLevenshteinDistance (String s, String t) {
if (s == null || t == null) {
throw new IllegalArgumentException("Strings must not be null");
}

/*
The difference between this impl. and the previous is that, rather
than creating and retaining a matrix of size s.length()+1 by t.length()+1,
we maintain two single-dimensional arrays of length s.length()+1. The first, d,
is the 'current working' distance array that maintains the newest distance cost
counts as we iterate through the characters of String s. Each time we increment
the index of String t we are comparing, d is copied to p, the second int[]. Doing so
allows us to retain the previous cost counts as required by the algorithm (taking
the minimum of the cost count to the left, up one, and diagonally up and to the left
of the current cost count being calculated). (Note that the arrays aren't really
copied anymore, just switched...this is clearly much better than cloning an array
or doing a System.arraycopy() each time through the outer loop.)

Effectively, the difference between the two implementations is this one does not
cause an out of memory condition when calculating the LD over two very large strings.
*/

int n = s.length(); // length of s
int m = t.length(); // length of t

if (n == 0) {
return m;
} else if (m == 0) {
return n;
}

int p[] = new int[n+1]; //'previous' cost array, horizontally
int d[] = new int[n+1]; // cost array, horizontally
int _d[]; //placeholder to assist in swapping p and d

// indexes into strings s and t
int i; // iterates through s
int j; // iterates through t

char t_j; // jth character of t

int cost; // cost

for (i = 0; i<=n; i++) {
p[i] = i;
}

for (j = 1; j<=m; j++) {
t_j = t.charAt(j-1);
d[0] = j;

for (i=1; i<=n; i++) {
cost = s.charAt(i-1)==t_j ? 0 : 1;
// minimum of cell to the left+1, to the top+1, diagonally left and up +cost
d[i] = Math.min(Math.min(d[i-1]+1, p[i]+1), p[i-1]+cost);
}

// copy current distance counts to 'previous row' distance counts
_d = p;
p = d;
d = _d;
}

// our last action in the above loop was to switch d and p, so p now
// actually has the most recent cost counts
return p[n];
}

// following code also from web site; not my code
//*****************************
// Compute Levenshtein distance
//*****************************

public int LD (String s, String t) {
int d[][]; // matrix
int n; // length of s
int m; // length of t
int i; // iterates through s
int j; // iterates through t
char s_i; // ith character of s
char t_j; // jth character of t
int cost; // cost

// Step 1

n = s.length ();
m = t.length ();
if (n == 0) {
return m;
}
if (m == 0) {
return n;
}
d = new int[n+1][m+1];

// Step 2

for (i = 0; i <= n; i++) {
d[i][0] = i;
}

for (j = 0; j <= m; j++) {
d[0][j] = j;
}

// Step 3

for (i = 1; i <= n; i++) {

s_i = s.charAt (i - 1);

// Step 4

for (j = 1; j <= m; j++) {

t_j = t.charAt (j - 1);

// Step 5

if (s_i == t_j) {
cost = 0;
}
else {
cost = 1;
}

// Step 6

d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

}

}

// Step 7

return d[n][m];

}

//****************************
// Get minimum of three values
//****************************

private int Minimum (int a, int b, int c) {
int mi;

mi = a;
if (b < mi) {
mi = b;
}
if (c < mi) {
mi = c;
}
return mi;

}

}

Liar, Liar -- Facebook Puzzle

This was an interesting puzzle from Facebook. See the code below; i do not include unit test code here. You should be able to create a project in an IDE such as Eclipse (which is what I use) and create a class with the code below in the class. Similarly, you can use Eclipse to create the unit test code.

By the way, the code below is not completely robust though it works in several cases; a few more TODOs to be addressed that you see in the code. I got tired and so have put this code out without completing it.

For preciseness, the situation that the code does not address is when you lack commonality while processing each group of accuser and accused people leading to a disjoint group of liars and non-liars. The code works fine otherwise.

PUZZLE
As a newbie on a particular internet discussion board, you notice a distinct trend among its veteran members; everyone seems to be either unfailingly honest or compulsively deceptive. You decide to try to identify the members of the two groups, starting with the assumption that every senior member either never lies or never tells the truth. You compile as much data as possible, asking each person for a list of which people are liars. Since the people you are asking have been around on the board for a long time, you may assume that they have perfect knowledge of who is trustworthy and who is not. Each person will respond with a list of people that they accuse of being liars. Everyone on the board can see that you are a tremendous n00b, so they will grudgingly give you only partial lists of who the liars are. Of course these lists are not to be taken at face value because of all the lying going on.

You must write a program to determine, given all the information you've collected from the discussion board members, which members have the same attitude toward the telling the truth. It's a pretty popular discussion board, so your program will need to be able to process a large amount of data quickly and efficiently.

------------------------------------------------------------------------------------
CODE:

/**
* THIS PROGRAM SOLVES A PUZZLE THAT WAS GIVEN ON FACEBOOK; DETAILS BELOW.
*
* Your program must take a single command line argument; the name of a file. It must
* then open the file and read out the input data. The data begins with the number of
* veteran members n followed by a newline. It continues with n chunks of information,each
* defining the accusations made by a single member. Each chunk is formatted as follows:
*
* followed by m lines each containing the name of one member that the accuser says is a
* liar. accuser name and m are separated by some number of tabs and spaces. m will
* always be in [0, n]. All member names contain only alphabetic characters and are
* unique and case-sensitive.
*
*
*/
package com.farooq;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

/**
* @author fanjum
*
*/
public class LiarDetection {
// a few TO DO's remain before this can be called robust code
//The basic algorithm works and has been tested.

/**
* @param args
*/

private ArrayList realGroupOne = new ArrayList();
private ArrayList realGroupOneOpposite = new ArrayList();
private ArrayList dummyGroupOne = new ArrayList();
private ArrayList dummyGroupOneOpposite = new ArrayList();

boolean doneWithAll; // to check if all people have been categorized
int lineNumber; // to maintain info about the number of lines read from file.
static String fileName; // the file which contains the detailed info
int numberOfPeople;
FileReader fileToRead; // file reader to be kept open until done
BufferedReader filebuffer;
int dummyCounter; // to split groups in the dummyGroup
final int DUMMY_INIT = 1000;

public static void main(String[] args) {

// get the file name;
fileName = "C:\\tempfiles\\file_liar_liar.txt";

// get the object initialization done
LiarDetection LD = new LiarDetection();

//now divide the people into two arrays associated with created LD object

LD.dividePeople();

System.out.println("WE ARE DONE");
//PRINT TWO REAL ARRAY LISTS IF YOU WANT and check.
for (int i=0;i System.out.println("next in this group is " +LD.realGroupOne.get(i));
}
for (int i=0; i < LD.realGroupOneOpposite.size(); i++) {
System.out.println("next in opposite groups i " +LD.realGroupOneOpposite.get(i));
}
}

public LiarDetection() {

String first_line, next_line;
String first_part, second_part; // to read the accuser name and number accused
int space_index; //to determine where accuser name ends and number accused begins
int numAccused; //to determine number accused
dummyCounter =DUMMY_INIT; // initializing dummyCounter to 1000;
//open the file and keep it open

try {
fileToRead = new FileReader(fileName);
filebuffer = new BufferedReader(fileToRead);

//read teh file and get the number numberOfPeople;

first_line = filebuffer.readLine();
numberOfPeople = Integer.valueOf(first_line).intValue();

// read the first block with member name, number accused m and accused names
lineNumber = 1; // to signify that one line was read
next_line = filebuffer.readLine().trim();
space_index = next_line.indexOf(" ");
first_part = next_line.substring(0, space_index); //name of accuser
second_part = next_line.substring(space_index).trim(); //number of accused in String
numAccused = Integer.valueOf(second_part).intValue(); //convert to int
realGroupOne.add(first_part);

for (int i=1; i<=numAccused; i++) {
//put all this in one of the two real ArrayLists;
//associate lineNumber with number of lines read so you know where to start next
next_line = filebuffer.readLine().trim();
realGroupOneOpposite.add(next_line);
}
} catch (Exception e) {
System.out.println("have an exception" +e);
}

}

public void dividePeople() {
// we start to divide people into two camps
// we start with 2 since we have read the accusation of first member
for (int mem_number=2; mem_number<=numberOfPeople; mem_number++) {
//make a call to see if we are done;
doneWithAll = checkIfDone();

if (!doneWithAll) {
//not divided all, so go to read accusation of mem_number
//A. read name of next member and the accusation list as String and String[]
//B. put the String and String[] in proper ArrayLists real or dummy and return here
//next method does both A and B above
divideNextGroup();
} else {
//we have divided all and so we are done
//close FILEREADER
try {
if (fileToRead != null)
fileToRead.close();
} catch (Exception e) {

}
break;
}
}

}
public boolean checkIfDone() {
// check if realGroupOne and realGroupOneOpposite have all n members with no dups
//if dups then a problem
//also the dummyGroupOne and its pair should have no members.
if (realGroupOne.size() + realGroupOneOpposite.size() == numberOfPeople) {
// we are done
return true;
} else {
return false;
}

}

public void divideNextGroup(){
String next_line;
String accuser_name, number_accused; // to read the accuser name and number accused
int numAccused; //number accused as an int
int space_index; //to determine where accuser name ends and number accused begins
String [] accused = new String[numberOfPeople];
boolean dividedIntoGroup; // true when the group members are put into one of four arrayLists.

try {
//read next member into String accuser_name
//read list of accused into String[] accused; accuser_name and accused go into different groups
next_line = filebuffer.readLine().trim();
space_index = next_line.indexOf(" ");
accuser_name = next_line.substring(0, space_index); //name of accuser
number_accused = next_line.substring(space_index).trim(); //number of accused in String
numAccused = Integer.valueOf(number_accused).intValue(); //convert to int
for (int i=0;i accused[i] = filebuffer.readLine().trim();
}

//check if accuser in realGroupOne or realGroupOneOpposite and if so
// ensure that all accused are in the other group; ensure no duplicates while adding.
dividedIntoGroup = isGroupInRealList(accuser_name, accused,numAccused);

if (!dividedIntoGroup) {
// not divided into realList, so check dummyList
dividedIntoGroup = isGroupInDummyList(accuser_name, accused,numAccused);

if (!dividedIntoGroup) {
//still not divided into group, so we put this in the dummyLists separated by an ID
putGroupInDummyList(accuser_name, accused, numAccused);
}
}

}catch (Exception e){

}

// when here the group has been divided in either the real group or the dummy group; so done
}

public boolean isGroupInRealList(String accuserName, String[] accusedList, int numberAccused) {

boolean inList = false;

// 6. check if A is IN any of the two real ArrayLists;
// 7a. If A in any of two real ArrayLists, add (B,C,...N) to the paired AL GO TO 4A.
// 7b. If A in none of two real ArrayLists, GO TO 8.
// 8. check if any of (B,C,..N) is in any of two real ArrayLists.
// 9a. If (B,C,..N) in any of two real ArrayLists add others to same AL,
// add A to paired AL, GO TO 4A.

// ALSO CHECK FOR INCONSISTENCY AND THROW AN EXCEPTION OR WHATEVER:
if (realGroupOne.contains(accuserName)) {
// accuser is in GroupOne, so add all in accusedList to other without dups
inList = true;
for (int i=0;i if (realGroupOne.contains(accusedList[i])) {
//this is a problem, should not happen
}
if (realGroupOneOpposite.contains(accusedList[i])) {
//accused name is already in list and so go to next
continue;
} else {
realGroupOneOpposite.add(accusedList[i]);
}
}

} else if (realGroupOneOpposite.contains(accuserName)) {
//accuser is in GroupOneOpposite so add all in accusedList to other without dups
inList = true;
for (int i=0;i if (realGroupOneOpposite.contains(accusedList[i])) {
//this is a problem, should not happen
}
if (realGroupOne.contains(accusedList[i])) {
//accused name is already in list and so go to next

} else {
realGroupOne.add(accusedList[i]);
}
}

} else {

//so accuser is not in either of real lists; so check if accused names are in either
for (int i=0;i //check if each accused in either of real lists
if (realGroupOne.contains(accusedList[i])) {
//add all the rest of the accusedList to GroupOne without duplicates
//and add accuser to other group
inList = true;
for (int j=i+1;j //add rest of group to GroupOne
if (!(realGroupOne.contains(accusedList[j]))) {
//add to this group
realGroupOne.add(accusedList[j]);
}
}
realGroupOneOpposite.add(accuserName);

break;
}
if (realGroupOneOpposite.contains(accusedList[i])) {
//add all the rest of the accusedList to GroupOne without duplicates
//and add accuser to other group
inList = true;
for (int j=i+1;j //add rest of group to GroupOne without duplicates
if (!(realGroupOneOpposite.contains(accusedList[j]))) {
//add to this group
realGroupOneOpposite.add(accusedList[j]);
}
}
realGroupOne.add(accuserName);

break;

}

}

}
return inList;
}

public boolean isGroupInDummyList(String accuserName, String[] accusedList, int numberAccused) {
int indexOfName, indexOfCounter;
int tempindex;
boolean indummylist = false;
// 10. check if A in any of two dummy ALs.
// 10a. If A in any of two dummy ALs, add (B, C,...N) to same block in paired dummy AL.
// GO TO 4A.
// 10b. check if any of B,C, ...N in any of two dummy ALs.
// 10c. Merge all the blocks that contain any of B,C, ...N into one TEMPBLOCK with unique elements.
// also remove the blocks when you put them into TEMPBLOCK;
// 10d. for every block that contains B,C,..N, merge the corresponding block in other
// AL into one TEMPBLOCK with unique elements.
// 10e. get teh next counter, add to dummy AL and merge TEMPBLOCK back into dummy AL.
// each TEMPBLOCK should be in a different dummy AL.

if (dummyGroupOne.contains(accuserName)) {
// accusername is in this; find index and add to same subGroup in other dummy group
indexOfName = dummyGroupOne.indexOf(accuserName);
indummylist = true;
for (int j=indexOfName; j >=0; j--) {
// to go and figure out the counter index
try {
indexOfCounter = Integer.valueOf(dummyGroupOne.get(j)).intValue();
//so now indexOfCounter contains the counter number
tempindex = dummyGroupOneOpposite.indexOf(Integer.toString(indexOfCounter));

for (int i=0; i //have to check if the ith accused is in dummy list but we skip that now
//TO DO --- SO THIS IS TO BE FIXED
dummyGroupOneOpposite.add(tempindex+1,accusedList[i]);
}
} catch(NumberFormatException e) {
continue;
}
}

} else if (dummyGroupOneOpposite.contains(accuserName)) {
//accuserName is in this; find index and add to same subGroup in other dummy group
indexOfName = dummyGroupOneOpposite.indexOf(accuserName);
indummylist = true;
for (int j=indexOfName; j >=0; j--) {
// to go and figure out the counter index
try {
indexOfCounter = Integer.valueOf(dummyGroupOneOpposite.get(j)).intValue();
//so now indexOfCounter contains the counter number
tempindex = dummyGroupOne.indexOf(Integer.toString(indexOfCounter));

for (int i=0; i //have to check if the ith accused is in dummy list but we skip that now
//TO DO --- SO THIS IS TO BE FIXED
dummyGroupOne.add(tempindex+1,accusedList[i]);
}
} catch(NumberFormatException e) {
continue;
}
}
}

//now check if accused[] are in either of the groups, if so add accuser to other dummy group
//also ensure rest of accused[] are in the same subgroup

//TO DO -- check on this and complete this.

// TO DO -- ALSO MERGE REAL AND DUMMY LISTS;
return indummylist;
}

public void putGroupInDummyList(String accuserName, String[] accusedList, int numberAccused) {
// 11. If (B,C,..N) in none of four ALs add A to dummyAL_G1 and (B,C,..N)
// to dummyAL_G2 after counter

//adding accuser to dummy Group One after counter
dummyGroupOne.add(Integer.toString(dummyCounter));
dummyGroupOne.add(accuserName);

//adding accused to dummy Group OneOpposite after counter
dummyGroupOneOpposite.add(Integer.toString(dummyCounter));
for (int i=0; i< numberAccused; i++) {
dummyGroupOneOpposite.add(accusedList[i]);
}
dummyCounter++;

}
}

Puzzles in Facebook

I came across this page on Facebook's site. The page provides different puzzles with increasing level of complexity. Each puzzle requires both algorithmic as well as programming experience.

Planning to solve these to get me back in my programming groove. Over this weekend I was able to solve 2 puzzles with a "snack" level of difficulty completely. Have the algorithm worked out for a third and need to code it.

Now I plan to put the code up on the blog. I also plan to take these as examples to post on Google's app-engine and also address scalability by leveraging concepts such as MapReduce, BigTable. But all this planned for the future. Let us see how much I can get done in the "infinite" amount of free time that I have.

Systems View

Sunday, December 6, 2009

Breathalyzer -- Facebook puzzle

Input Specification

Output Specification

Liar, Liar -- Facebook Puzzle

Puzzles in Facebook

Followers

Blog Archive

About Me