Dictionary Lookup Web Service

by Chris 10. July 2009 12:28

It's time to continue my series (I Want More Mobile (Web) Services, Flight Lookup Web Service, Package Lookup Web Service, and Movie Lookup Web Service) of useful Web Services for Windows Mobile with a service to look up definitions of words in a dictionary. I sometimes want the definition of a word that I've stumbled on, and since I'm often in a discussion or reading something when this happens, I want to look it up wherever I am at that moment. A good dictionary is extensive and regularly updated, so it's not very suitable to install or store on a mobile device, and a simple client to a web service is a nice solution. As with the other Web Services in this series, you can get the code on CodePlex in a project called Windows Mobile Web Services.

As in previous posts, you find the UX on the right, and this is the code behind the right menu button (somewhat simplified)...

WebServices.Service ws = new WebServices.Service();
definitionTextBox.Text = ws.DictionaryLookup(wordTextBox.Text).Replace("\n", "\r\n");

...so when a word is entered in the text box, the following code on the server is run...

[WebMethod]
public string DictionaryLookup(string word)
{
    const string host = "www.dict.org";
    const int port = 2628;
    const string database = "wn";

    // Connect to dictionary (RFC 2229) server; the using blocks close the connection on exit
    using(TcpClient tc = new TcpClient(host, port))
    using(StreamReader sr = new StreamReader(tc.GetStream()))
    {
        DictServerStatus status = new DictServerStatus(sr.ReadLine());
        if(status.Code != (int)DictServerStatusCodes.Banner)
            throw new Exception(status.ToString());

        // Send command (DEFINE)
        string command = string.Format("DEFINE \"{0}\" \"{1}\"\r\n", database, word);
        byte[] commandBytes = Encoding.ASCII.GetBytes(command);
        tc.GetStream().Write(commandBytes, 0, commandBytes.Length);

        // Get definition(s) for word
        status = new DictServerStatus(sr.ReadLine());
        if(status.Code != (int)DictServerStatusCodes.Definitions)
            throw new Exception(status.ToString());

        // Parse definitions
        int definitionCount = Convert.ToInt32(status.Message.Substring(0,
            status.Message.IndexOf(" ") + 1));
        Hashtable definitions = new Hashtable(definitionCount);
        for(int i = 0; i < definitionCount; i++)
        {
            status = new DictServerStatus(sr.ReadLine());
            if(status.Code != (int)DictServerStatusCodes.Definition)
                throw new Exception(status.ToString());

            // Read the definition text up to the terminating "." line
            // (a null line means the server closed the connection)
            string responseText = string.Empty;
            string s = sr.ReadLine();
            while(s != null && s != ".")
            {
                responseText += s + "\r\n";
                s = sr.ReadLine();
            }
            Match definitionLine = Regex.Match(status.Message, "^\\\"(?<word>[^\\\"]+)\\\"\\s+(?<database>[\\S]+)\\s+\\\"(?<dbname>[^\\\"]+)\\\"$");
            definitions.Add(definitionLine.Groups["database"].ToString(),
                responseText);
        }

        // The server ends the response with an OK status
        status = new DictServerStatus(sr.ReadLine());
        if(status.Code != (int)DictServerStatusCodes.Ok)
            throw new Exception(status.ToString());

        return definitions[database].ToString();
    }
}

...and in contrast to the previous posts in this series, this Web Service isn't using "scraping" to get the info from a Web page. Instead, it accesses the dictionary through the standard Dictionary Server Protocol (RFC 2229). It's a text-based protocol (not HTTP, so plain sockets are used) that defines a range of dictionary functionality, and the above code uses only the ability to look up a word in a dictionary (the DEFINE command).
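Every reply in an RFC 2229 conversation starts with a three-digit status code followed by text: 220 for the banner, 150 for "n definitions retrieved", 151 before each definition, 250 for OK, and 552 when there is no match. The DictServerStatus class used above is not shown in this post, so the following is only a minimal sketch of how such a line could be split. The DictStatusLine name and its members are hypothetical and are not taken from the CodePlex project.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not the DictServerStatus class from the project.
public sealed class DictStatusLine
{
    public int Code { get; private set; }
    public string Message { get; private set; }

    // Splits a line such as "150 1 definitions retrieved" into the
    // numeric code (150) and the remaining text ("1 definitions retrieved").
    public static DictStatusLine Parse(string line)
    {
        if (line == null || line.Length < 3)
            throw new FormatException("Not a DICT status line: " + line);

        int code;
        // NumberStyles.None accepts digits only (no sign, no whitespace)
        if (!int.TryParse(line.Substring(0, 3), NumberStyles.None,
                CultureInfo.InvariantCulture, out code))
            throw new FormatException("Not a DICT status line: " + line);

        DictStatusLine status = new DictStatusLine();
        status.Code = code;
        status.Message = line.Length > 3 ? line.Substring(3).TrimStart() : string.Empty;
        return status;
    }
}
```

With this sketch, DictStatusLine.Parse("150 1 definitions retrieved") gives Code 150 and the Message "1 definitions retrieved", which is the text the server code above reads the definition count from.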

The dictionary used is the WordNet database located at the DICT Development Group site.


Chris | Compact Framework | WM Web Services

Movie Lookup Web Service

by Chris 8. June 2008 14:37

I continue my series (I Want More Mobile (Web) Services, Flight Lookup Web Service, and Package Lookup Web Service) of useful Web Services for Windows Mobile with a service to check movie info. I often find myself in discussions about movies, and have often wished for a simple app where I can get some quick info about a movie. My effort is a growing project on CodePlex called Windows Mobile Web Services, where you can get the code.

As in previous posts, you find the UX on the right, and if we start on the client side with the right menu button (somewhat simplified)...

private void lookupMenuItem_Click(object sender, EventArgs e)
{
    if(lookupMenuItem.Text == "Lookup")
    {
        Movie[] movies = null;
        try
        {
            Service ws = new Service();
            //ws.Credentials = new NetworkCredential("uid", "pwd");
            movies = ws.MovieLookup(titleTextBox.Text);
        }
        catch(Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        if(movies != null && movies.Length > 0)
        {
            titleComboBox.Items.Clear();
            foreach(Movie movie in movies)
                titleComboBox.Items.Add(movie);
            titleComboBox.Visible = true;
            titleTextBox.Visible = false;
            lookupMenuItem.Text = "Back";
        }
    }
    else
    {
        titleTextBox.Text = string.Empty;
        titleTextBox.Visible = true;
        titleComboBox.Visible = false;
        lookupMenuItem.Text = "Lookup";
    }
}

...so when a title is entered in the text box, a search is made with the text box replaced with a combo box containing the search results. When an item in the combo box is selected...

private void titleComboBox_SelectedIndexChanged(object sender, EventArgs e)
{
    Movie movie = titleComboBox.SelectedItem as Movie;
    if(movie == null)
        return; // nothing selected

    resultTextBox.Text = string.Format("Title: {0}\r\nYear: {1}\r\n\r\nPlot: {2}",
        movie.Title, movie.Year, movie.Description);

    // Download the box shot, if the server found an image URL
    if(string.IsNullOrEmpty(movie.ImageUrl))
        return;
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(movie.ImageUrl);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    pictureBox.Image = new Bitmap(resp.GetResponseStream());
}

...the movie info is shown along with the box shot. On the server, the code looks like this...

[WebMethod]
public Movie[] MovieLookup(string movieTitle)
{
    // Search IMDB for the title (URL-encoded so that characters like '&' survive)
    string url = string.Format("http://www.imdb.com/find?s=all&q={0}", HttpUtility.UrlEncode(movieTitle));
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();

    // Only the "Popular Titles" section of the search results is used
    int i = responseData.IndexOf("<b>Popular Titles</b>");
    if(i < 0)
        return new Movie[] { };
    int j = responseData.IndexOf("</p>", i);

    string table = Regex.Match(responseData.Substring(i, j - i), "<table.*?>(.*?)</table>").ToString();

    // Every link in the table that doesn't wrap an image is a movie
    var q = from Match match in Regex.Matches(table,
                "<a\\shref=[\"'](?<url>.*?)[\"'].*?>(?<title>.*?)</a>")
            where !match.Groups["title"].ToString().Contains("<img")
            select new Movie { ID = Regex.Match(match.Groups["url"].ToString(),
                "(?<=\\w\\w)\\d\\d\\d\\d\\d\\d\\d").ToString(),
                Title = HttpUtility.HtmlDecode(match.Groups["title"].ToString()) };

    Movie[] movies = q.ToArray<Movie>();

    // Get the details page for each movie
    foreach(Movie movie in movies)
    {
        url = string.Format("http://www.imdb.com/title/tt{0}", movie.ID.ToString());
        request = WebRequest.Create(url) as HttpWebRequest;
        responseReader = new StreamReader(request.GetResponse().GetResponseStream());
        responseData = responseReader.ReadToEnd();
        responseReader.Close();

        // The page title has the form: Title (Year)
        string title = Regex.Match(responseData, "(?<=<(title)>).*(?=<\\/\\1>)").ToString();
        movie.Title = HttpUtility.HtmlDecode(Regex.Match(title, ".*(?=\\s\\(\\d+.*?\\))").ToString()).Replace(
            "\"", string.Empty);
        movie.Year = Regex.Match(title, "(?<=\\()\\d+(?=.*\\))").ToString();
        movie.ImageUrl = Regex.Match(Regex.Match(responseData, "(?<=\\b(name=\"poster\")).*\\b[</a>]\\b").ToString(),
            "(?<=\\b(src=)).*\\b(?=[</a>])").ToString().Replace("\"", string.Empty).Replace("/></", string.Empty);

        try
        {
            if(movie.Title.Contains("(VG)"))
            {
                i = responseData.IndexOf("<h5>Plot Summary:</h5>") > 0 ?
                    responseData.IndexOf("<h5>Plot Summary:</h5>") :
                    responseData.IndexOf("<h5>Tagline:</h5>");
                if(i > 0) j = responseData.IndexOf("</div>", i);
            }
            else
            {
                i = responseData.IndexOf("<h5>Plot:</h5>") > 0 ? responseData.IndexOf("<h5>Plot:</h5>") :
                    responseData.IndexOf("<h5>Plot Summary:</h5>");
                if(i <= 0) i = responseData.IndexOf("<h5>Plot Synopsis:</h5>");
                if(i > 0) j = responseData.IndexOf("<a class=", i);
                if(j <= 0)
                    j = responseData.IndexOf("</div>", i);
            }

            // Skip past the heading, whichever variant was found
            // (if none was found, this throws and the catch below clears the description)
            int start = responseData.IndexOf("</h5>", i) + "</h5>".Length;
            string plotOutline = responseData.Substring(start, j - start).Trim();
            plotOutline = HttpUtility.HtmlDecode(plotOutline);
            movie.Description = Regex.Replace(plotOutline.Contains("is empty") ||
                plotOutline.Contains("View full synopsis")
                ? string.Empty : plotOutline, "<a.*?href=[\"'](?<url>.*?)[\"'].*?>(?<name>.*?)</a>", string.Empty);
        }
        catch
        {
            movie.Description = string.Empty;
        }
    }
    return movies;
}

...and as you can see, I'm using the excellent IMDB to get the movie info. Of course, there's a lot more info to be retrieved about a movie, its actors, etc. If you want to build further on this example, I recommend taking a look at the Imdb Service project on CodePlex.

Note that we save a lot of coding by the extensive use of regular expressions (regex) to extract the data from the web page. First, a request is made to search for the movie title, and from the search results the matching movies in the category "popular titles" are captured as a list of movies. Then, for each movie in the list, a request is made to the movie details page, and each movie object is updated with the data about the movie (as well as the image URL). Finally, everything is returned to the client, and note that the nicely typed movie data is kept in a small helper class...
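To illustrate the regex step in isolation, here is a minimal sketch that splits a page title of the form "Title (Year)" into its two parts with one pattern and named groups. It's an alternative to the two lookaround patterns in the service code, not the code from the project; the TitleParser class and the sample titles are hypothetical, and the pattern assumes a four-digit year.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the CodePlex project.
public static class TitleParser
{
    // Title, a space, then "(year" optionally followed by a suffix such as "/I", then ")"
    private static readonly Regex TitleAndYear =
        new Regex(@"^(?<title>.*)\s\((?<year>\d{4})[^)]*\)$");

    // Returns false (and empty strings) when the text doesn't look like "Title (Year)".
    public static bool TrySplit(string pageTitle, out string title, out string year)
    {
        Match m = TitleAndYear.Match(pageTitle ?? string.Empty);
        title = m.Success ? m.Groups["title"].Value : string.Empty;
        year = m.Success ? m.Groups["year"].Value : string.Empty;
        return m.Success;
    }
}
```

For the input "The Matrix (1999)" this gives the title "The Matrix" and the year "1999". A single pattern keeps the two values in sync, at the cost of failing completely when the page title has another shape.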

public class Movie
{
    public string ID { get; set; }
    public string Title { get; set; }
    public string Year { get; set; }
    public string ImageUrl { get; set; }
    public string Description { get; set; }
}

...which is transferred to the client via the Web Service proxy. For more details, check out the project on CodePlex.


Package Lookup Web Service

by Chris 18. May 2008 14:41

I continue my series (I Want More Mobile (Web) Services and Flight Lookup Web Service) of useful Web Services for Windows Mobile with a service to check on package delivery status. If you are like me, you are impatient when you wait for a package to arrive. I'm almost always waiting for another package, and when I have a few moments to spare, I come to think of the latest one and want to check where it was last seen. My effort is a growing project on CodePlex called Windows Mobile Web Services, where you can get the code.

As in previous posts, you find the UX on the right, and the client code should be familiar if you've seen the previous posts...

WebServices.Service ws = new WebServices.Service();
resultTextBox.Text = ws.PackageLookup(packageTextBox.Text).Replace("\n", "\r\n");

...and on the server, the code looks like this...

[WebMethod]
public string PackageLookup(string packageNo)
{
    // Get DHL tracking page (the package number is URL-encoded before it goes into the query string)
    string url = string.Format("http://www.dhl.com/cgi-bin/tracking.pl?AWB={0}", HttpUtility.UrlEncode(packageNo));
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();

    Regex regex = new Regex("<a\\s*href=\\x23[0-9]*?><font(.|\\n)*?>(?<number>.*\\n?.*)</font(.|\\n)*?<a\\shref=\\\"" +
        "(.|\\n)*?\\\">(?<origin>.*\\n?.*)</a(.|\\n)*?<a\\shref=\\\"(.|\\n)*?\\\">(?<destination>." +
        "*\\n?.*)</a(.|\\n)*?face=\\\"arial\\\">(?<status>.*\\n?.*)<img*", RegexOptions.IgnoreCase);

    // Extract using regex
    string s = null;
    Match match = regex.Match(responseData);
    if(match.Success)
    {
        //match.Groups["number"];
        s = string.Format("Origin: {0}\nDest.: {1}\nStatus: {2}",
            match.Groups["origin"], match.Groups["destination"],
            match.Groups["status"].ToString().Replace("<BR>", "\n"));
    }
    else
        throw new Exception("Not found, try again!");

    return s;
}

...and as you can see, I'm looking up DHL shipments. It shouldn't be too hard to extend it to cover other couriers as well.

This time I have used a regular expression (regex) to extract the key information from the web page. As you can see in the code above, the code is simpler - if you understand the regex. The regex extracts four pieces of info: the shipping number (which is not used), the package pick-up location (origin), the package drop-off location (destination), and the current status of the shipment. Regular expressions are a very powerful concept that can be used for many things, and they're especially helpful when scraping web pages. My favorite tools for working with regex are Expresso and Regulator, and Regulator is especially useful for .NET development as it's written using .NET (yes, unfortunately there are some minor differences between different regex implementations).
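As a small illustration of named groups, here is a minimal sketch that pulls two values out of markup that spans several lines. The markup, the class attributes, and the TrackingRowParser name are all made up for the example and have nothing to do with the real DHL page. It also uses RegexOptions.Singleline, which lets "." match line breaks, as an alternative to the "(.|\n)*?" construct in the pattern above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the markup it expects is invented.
public static class TrackingRowParser
{
    private static readonly Regex Row = new Regex(
        "class=\"origin\">(?<origin>[^<]*)<.*?class=\"status\">(?<status>[^<]*)<",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns a two-line summary, or null when the markup doesn't match.
    public static string Summarize(string html)
    {
        Match match = Row.Match(html);
        if (!match.Success)
            return null;
        return string.Format("Origin: {0}\nStatus: {1}",
            match.Groups["origin"].Value.Trim(),
            match.Groups["status"].Value.Trim());
    }
}
```

Given two table cells on separate lines, one with class "origin" containing Stockholm and one with class "status" containing Delivered, Summarize returns "Origin: Stockholm" and "Status: Delivered" on two lines.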


I Want More Mobile (Web) Services

by Chris 12. January 2008 13:57

With VS2008 released this week, I should be diving into LINQ, WCF, and other cool stuff in the .NET CF 3.5. But it's already the end of the week, and it's time for the bigger picture. I want to say a few words about the apps that I'm missing, and all the great apps that I want to write. I use my WM device(s) every day, and there are so many things that I want to do, that I can't. Maybe I can, but it's still very difficult. Let me take an example. I was in my car the other day, and I wanted to look up a friend's number. I started my browser, I searched, I got the link to the "white pages" site, I entered the name, ... I finally got the number, but it took too long, and I was already somewhat frustrated.

I talked to a mobile veteran the other day, and I asked him what he misses the most, and his instant response was "a good mobile browser". I believe that he's right, more development can definitely go into browser technology. But I think that the real problem is that browsing is no good on a small screen with low bandwidth. I think (know) there's another solution. As a developer, I have the liberty to write my own apps when I need some functionality on my device, and I do it with the best development tools in the world (which were just released this week). When there is a Web Service available with reasonable pricing, I just add a few controls to a form along with a Web Reference and a few lines of glue code, and I have my personal service in place (usually in a couple of minutes). There are great directory sites to find Web Services like XMethods, RemoteMethods, APIfinder, etc, and companies like ServiceObjects, StrikeIron, CDYNE, WebserviceX, etc, are offering commercial Web Services. But even if the pricing is getting more reasonable (some services are a few cents per transaction), the number of available services is still very limited.

However, if there's no Web Service available, my only option is to get the information directly from the Web using a technique called "scraping". It's really nothing advanced, just doing from code what the browser normally does: sending the HTTP requests and picking the data out of the responses. For example, let's say I want to put an end to my earlier frustration when looking for my friend's number. I would go out and look for a nice site for looking up numbers, like whitepages, and after analyzing the requests and responses (many times the "view source" in IE is sufficient, but there are also great tools like Fiddler), I write code similar to this...

// Get search result page
string url = string.Format("http://www.whitepages.com/search/FindPerson?who={0}&where={1}", whoTextBox.Text, whereTextBox.Text);
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
string responseData = responseReader.ReadToEnd();
responseReader.Close();

// More than one?
int i = 0;
if((i = responseData.IndexOf("results_multiple_widget_matching")) > -1)
{
    i = responseData.IndexOf("<strong>", i) + 8;
    string matches = responseData.Substring(i, responseData.IndexOf("\n", i) - i);
    MessageBox.Show(string.Format("{0} matches, refine search!", matches));
}
else
{
    // Any
    if(responseData.IndexOf("class=\"fn n\">") < 0)
        MessageBox.Show("No matches, try again!");
    else
    {
        // Only one
        string s = extractValue(responseData, "fn n"); // name
        s += "\r\n" + extractValue(responseData, "street-address"); // street
        s += "\r\n" + extractValue(responseData, "locality"); // city
        s += ", " + extractValue(responseData, "region"); // state
        s += " " + extractValue(responseData, "postal-code"); // zip
        s += "\r\n" + extractValue(responseData, "tel"); // phone

        resultTextBox.Text = s;
    }
}

...and on the right you see the code in action (well, Don Box is not exactly my friend, but me and Andy sat down and talked to him after an event once, and that should count for something ;-)). The code for the private method looks like this...

private string extractValue(string s, string name)
{
    string valueDelimiter = "class=\"" + name + "\">";

    // Find the element with the given class
    int valuePosition = s.IndexOf(valueDelimiter);
    if(valuePosition < 0)
        return string.Empty;

    // The value runs from the end of the delimiter to the next tag
    int startPosition = valuePosition + valueDelimiter.Length;
    int endPosition = s.IndexOf("<", startPosition);
    if(endPosition < 0)
        return string.Empty;

    return s.Substring(startPosition, endPosition - startPosition);
}

...and even if helpers like this can be very useful, I clearly recommend the use of regex for more advanced scraping. Note that this approach still needs to download the whole page just to get to the data, and if the above code is running on the WM device, the response time will be the same as accessing the site through the browser. That is why this code should be wrapped into a Web Service on a server somewhere, and then the WM device can access the Web Service with the few lines of code that I mentioned above for an already existing Web Service. Making the Web Service responsible for the actual scraping is also better because the clients don't need to be updated if there are changes to the source site.
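As a first step towards regex, the helper above could be written along the following lines. This is only a minimal sketch under the same assumption as extractValue (the value is the text between the class="..." attribute's closing bracket and the next tag); the RegexScraper name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; a regex-based take on extractValue.
public static class RegexScraper
{
    // Returns the text that follows class="name"> up to the next tag,
    // or an empty string when no element with that class is found.
    public static string ExtractValue(string html, string name)
    {
        // Regex.Escape keeps characters in the class name from being read as regex syntax
        string pattern = "class=\"" + Regex.Escape(name) + "\">(?<value>[^<]*)<";
        Match match = Regex.Match(html, pattern);
        return match.Success ? match.Groups["value"].Value.Trim() : string.Empty;
    }
}
```

For a span with class "tel" that contains 555-0100, ExtractValue(html, "tel") returns "555-0100". The gain over IndexOf is small here, but the pattern is easy to extend, for example to tolerate extra attributes after the class.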

This way, I can now easily look up both addresses and phone numbers for my friends, and it would be easy to add more functionality like calling the found phone number, etc. I should mention that whitepages has added a great mobile version of their site that you enter automatically when you go to their site with a WM device (read more about it, and see a demo), so they are no longer the best example of a service to scrape, but still, their mobile site takes much more bandwidth than the Web Service approach I described above. I don't know about you, but I prefer the clean look in the screenshot above compared to a cluttered web page.

There are so many simple services like this that I still miss, and to name a few apart from the above mentioned phone number lookup, I would like info about flights (delays), cinema search/booking, package tracking, etc, etc, etc. I guess an instant reverse phone number lookup would be a killer for anyone that wants to know who is calling right now, and the number is not in Contacts. What services are you missing?
