Monday, February 22, 2010

social and Google -- Who I am, who do I know, what do I do

What does social mean to Google? Who I am, who do I know, what do I do. 

Check out the above link for details but the above seems a nice pithy answer.

getting ready to leave Qualcomm

Hi all, getting ready to leave Qualcomm. Planning to join a startup in the machine to machine wireless space. Will have more articles related to this space going forward.

Tuesday, January 26, 2010

communication nirvana -- email?

Email was supposed to be cool but then it has degenerated to something akin to a curse. We have examples of several people declaring email free days, email bankruptcy etc. What went wrong?  To quote from an old article in CNN

As software goes, e-mail is almost socialist: From each according to his ability, to each whether or not he needs it.
Here's where e-mail's socialism turns from strength to weakness: It doesn't matter if the message comes from a spammer hawking Viagra, your wife asking you to pick up some wine, your boss telling the company that Monday is a holiday, or a client asking for a meeting at his office at 11 a.m. In today's inboxes, all e-mail messages are equal.
That even the most tech-savvy among us are unable to cope indicates an underlying problem: E-mail has become a crutch, a way of passing the buck. Want to make an appointment? That's 10 messages back and forth. Then there are corporate updates, birthday announcements, forwarded jokes, and (if you're me) the occasional amorous ditty.

So mankind is in need of a solution to this problem. Is there a market for this? To quote the same article in CNN,
 According to the Radicati Group, a market research firm, there are about 1.2 billion e-mail users and 1.8 billion active e-mail accounts worldwide. And in much of Asia and Latin America, Internet usage is still low. When those people show up in full force, e-mail traffic is going to increase exponentially.
And to get the size of the market from another recent GigaOm article
The Radicati Group, a Palo Alto, Calif.-based research company, recently released a study showing that the number of worldwide email users will increase to almost 1.9 billion by 2013 compared with over 1.4 billion in 2009. Radicati projects that worldwide email traffic will reach 507 billion messages per day by 2013, almost double the 2009 figure of 247 billion messages per day.

What can be an approach to solve this problem? From the same article
In reality, of course, some are more equal than others. Spam, alerts, and calendar items all need to be treated separately. A smart inbox would -- all in one interface -- catch spam in junk filters, display the wine reminder in an IM, move company news to an RSS feed, and intelligently negotiate appointment requests with your calendar in the background.
Hmm, a problem that needs a solution. Why did I start thinking of this. Two words -- Google Wave. Will that solve this problem? Probably will address that in a separate article. But is this the only option out there. Not really. Some others of interest seem to be  Postbox, SenderOk (division of WebCEO), Xobni, Zimbra etc.

The feature of Postbox that caught my eye was the to-do tagging. This seems to give me an ability to add task related comments to my email. Have to check that out.

SenderOk seems to have some cool features such as the ability to sort email according to your past behavior and the behavior of others in the network. Algorithms mark received email into either VIP category that need to be addressed asap or in regular category that can be ignored for a while. They also seem to provide social networking profiles in the email header pane.  They also claim to provide real time anonymous statistics to companies on what really happens to their email and whether they can reach people another way (such as starting a conversation via the social networking portal inside the email header pane). Interesting.

Have to check the above approaches out.



Saturday, January 23, 2010

Weather data based startups

Information about the weather is useful for many industries such as utility, transportation and even retail. A utility can decide on the resource usage to generate power based on the weather conditions. A transport company can use the weather information for planning purpose while a retailer can use the weather information to stock up on appropriate material.

In fact weather data can be the basis of creation of many applications as this article from GigaOm points out.

Smart Grids and real time information

FCC is currently focusing on smart grid. To quote GigaOM
Those recommendations will include how to promote open standards and commercial networks, how to use policies to encourage utilities to provide their customers with real-time open access to energy data and potential ways to use federal spectrum bands for utilities’ smart grid deployments.
This should be of interest to startups such as OnRamp Wireless.

LTE related startups

Promising opportunity for startups who can improve the experience on a LTE network. Approximately 1.3 billion dollars to be provided to startups who can focus on problems faced by the core LTE infrastructure. More details at GigaOm.

Sunday, December 6, 2009

Breathalyzer -- Facebook puzzle

More puzzles; in this post I provide the code for another puzzle provided on Facebook. Note that the unit test code is not included here. You can copy the code below after creating a project in an IDE such as Eclipse (which is what I use).

By the way, this puzzle was interesting. I did spend some time thinking of the algorithm to solve the problem but then started Googling for it. Found out about LevenshteinDistance which solves this problem nicely. Have to confess though that my thoughts on solving this problem were focused on using a tree structure. After finding the solution (and also the code for that) I gave up on that approach.

So given below is the problem and the code. The code contains two different functions for getting the Levenshtein Distance. Was having problems with the first function and hence included the second; now have kept both.

By the way see the example below. The word "tihs" has "ties" as the closest although in the context of the example sentence I was not seeing this until I used the program to clear my mental block. An example of computer not having to deal with the "mental block".


PROBLEM: (from Facebook page)

To safeguard against the dreaded phenomenon of wall posting while drunk, Facebook is implementing a feature that detects when post content is too garbled to have been done while sober and informs the user that they need to take an online breathalyzer test before being allowed to post.

Unfortunately, there is far too much content for a given set of persons to evaluate by hand. Fortunately, you are a programmer of some repute and can write a program that processes and evaluates wall posts. You realize that such a program would be of great use to society and intend to resolve the problem once and for all. The program you write must compute a score for a body of text, returning this score upon completion.

Your program will be given a list of accepted words and run on one wall post at a time. For each word W in the post, you must find word W' from the list of accepted words such that the number of changes from W to W' is minimized. It is possible that W is already W' and thus the number of changes necessary is zero. A change is defined as replacing a single letter with another letter, adding a letter in any position, or removing a letter from any position. The total score for the wall post is the minimum number of changes necessary to make all words in the post acceptable.


Input Specification

Your program must take a single string argument, representing the file name containing the wall post to analyze. In addition, your program must open up and read the accepted word list from the following static path location:
/var/tmp/twl06.txt
For testing purposes, you may download and examine the accepted word list here. When submitting your code, you do not need to include this file, as it is already present on the machine.

The input file consists entirely of lower case letters and space characters. You are guaranteed that the input file will start with a lower case letter, and that all words are separated by at least one space character. The file may or may not end with a new line character.

Example input file:
tihs sententcnes iss nout varrry goud
You are guaranteed that your program will run against well formed input files and that the accepted word list is identical to the one provided for testing.


Output Specification

Your program must print out the minimum number of changes necessary to turn all words in the input wall post into accepted words as defined by the word list file. Words may not be joined together, or separated into multiple words. A change in a word is defined as one of the following:
  1. Replacing any single letter with another letter.
  2. Adding a single letter in any position.
  3. Removing any single letter.
This score must be printed out as an integer and followed by a single new line.

Example Output (newline after number):
8




------------------------------------------------------------------------------------------

package com.farooq;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;

public class Word_distance {

/**
* @param args
*/
private ArrayList legal_word_list = new ArrayList();

public static void main(String[] args) {

String[] sentence_to_check = new String[args.length];
int Breath_analyzer_number; // the result we are looking for

//read the sentence which has to be tested from command line/file etc
if (args.length > 0) {
sentence_to_check = args;
}

//create an instance and determine the Levenshtein distance
Word_distance sentence_LD = new Word_distance();

Breath_analyzer_number = sentence_LD.sentenceAnalysis(sentence_to_check);
System.out.println("The number for the given sentence is " +Breath_analyzer_number);

}

public Word_distance() {
String filename = "C:\\tempfiles\\word_list_Levenshtein.txt";
String line_read;
try {
FileReader word_list_file = new FileReader(filename);
BufferedReader word_buffer = new BufferedReader(word_list_file);

while ((line_read = word_buffer.readLine()) != null) {
//so reading the list of "legal" words into the ArrayList
legal_word_list.add(line_read);
}
word_list_file.close();

} catch (Exception e) {
System.out.println("Exception is " +e);
}
}

public int sentenceAnalysis(String[] sentence) {

// get each word of sentence and determine Levenshtein number for word
//have to iterate the given word over entire legal_word_list
//and determine lowest value; if zero then break, repeat for all words
int total_distance =0;
int[] word_distance = new int[sentence.length] ; //minimum word_distance for each word in the sentence
int temp_distance;


for (int i=0;i< sentence.length; i++) {

for (int j=0; j< legal_word_list.size();j ++) {
temp_distance = getLevenshteinDistance(sentence[i], legal_word_list.get(j).toLowerCase());
// temp_distance = LD(sentence[i].trim(), legal_word_list.get(j).toLowerCase());

if (temp_distance ==0) {
word_distance[i] = temp_distance;
break;
} else if (j==0) {
word_distance[i] = temp_distance;
} else if (word_distance[i] > temp_distance) {
word_distance[i] = temp_distance;
} else {
//DO NOTHING as new word distance is larger than what we had
}
}
total_distance = total_distance + word_distance[i];
}
return total_distance;
}


// the following code copied from http://www.merriampark.com/ldjava.htm by Chas Emerick
public int getLevenshteinDistance (String s, String t) {
if (s == null || t == null) {
throw new IllegalArgumentException("Strings must not be null");
}

/*
The difference between this impl. and the previous is that, rather
than creating and retaining a matrix of size s.length()+1 by t.length()+1,
we maintain two single-dimensional arrays of length s.length()+1. The first, d,
is the 'current working' distance array that maintains the newest distance cost
counts as we iterate through the characters of String s. Each time we increment
the index of String t we are comparing, d is copied to p, the second int[]. Doing so
allows us to retain the previous cost counts as required by the algorithm (taking
the minimum of the cost count to the left, up one, and diagonally up and to the left
of the current cost count being calculated). (Note that the arrays aren't really
copied anymore, just switched...this is clearly much better than cloning an array
or doing a System.arraycopy() each time through the outer loop.)

Effectively, the difference between the two implementations is this one does not
cause an out of memory condition when calculating the LD over two very large strings.
*/

int n = s.length(); // length of s
int m = t.length(); // length of t

if (n == 0) {
return m;
} else if (m == 0) {
return n;
}

int p[] = new int[n+1]; //'previous' cost array, horizontally
int d[] = new int[n+1]; // cost array, horizontally
int _d[]; //placeholder to assist in swapping p and d

// indexes into strings s and t
int i; // iterates through s
int j; // iterates through t

char t_j; // jth character of t

int cost; // cost

for (i = 0; i<=n; i++) {
p[i] = i;
}

for (j = 1; j<=m; j++) {
t_j = t.charAt(j-1);
d[0] = j;

for (i=1; i<=n; i++) {
cost = s.charAt(i-1)==t_j ? 0 : 1;
// minimum of cell to the left+1, to the top+1, diagonally left and up +cost
d[i] = Math.min(Math.min(d[i-1]+1, p[i]+1), p[i-1]+cost);
}

// copy current distance counts to 'previous row' distance counts
_d = p;
p = d;
d = _d;
}

// our last action in the above loop was to switch d and p, so p now
// actually has the most recent cost counts
return p[n];
}


// following code also from web site; not my code
//*****************************
// Compute Levenshtein distance
//*****************************

public int LD (String s, String t) {
int d[][]; // matrix
int n; // length of s
int m; // length of t
int i; // iterates through s
int j; // iterates through t
char s_i; // ith character of s
char t_j; // jth character of t
int cost; // cost

// Step 1

n = s.length ();
m = t.length ();
if (n == 0) {
return m;
}
if (m == 0) {
return n;
}
d = new int[n+1][m+1];

// Step 2

for (i = 0; i <= n; i++) {
d[i][0] = i;
}

for (j = 0; j <= m; j++) {
d[0][j] = j;
}

// Step 3

for (i = 1; i <= n; i++) {

s_i = s.charAt (i - 1);

// Step 4

for (j = 1; j <= m; j++) {

t_j = t.charAt (j - 1);

// Step 5

if (s_i == t_j) {
cost = 0;
}
else {
cost = 1;
}

// Step 6

d[i][j] = Minimum (d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1] + cost);

}

}

// Step 7

return d[n][m];

}



//****************************
// Get minimum of three values
//****************************

private int Minimum (int a, int b, int c) {
int mi;

mi = a;
if (b < mi) {
mi = b;
}
if (c < mi) {
mi = c;
}
return mi;

}




}