Saturday, March 22, 2008

I continue my series (I Want More Mobile (Web) Services, Flight Lookup Web Service, and Package Lookup Web Service) of useful Web Services for Windows Mobile with a service to check movie info. I often find myself in discussions about movies, and have often wished for a simple app that gives me some quick info about a movie. My effort is a growing project on CodePlex called Windows Mobile Web Services, where you can get the code.

As in previous posts, you find the UX on the right, and if we start on the client side with the right menu button (somewhat simplified)...

private void lookupMenuItem_Click(object sender, EventArgs e)
{
    if(lookupMenuItem.Text == "Lookup")
    {
        Movie[] movies = null;
        try
        {
            Service ws = new Service();
            //ws.Credentials = new NetworkCredential("uid", "pwd");
            movies = ws.MovieLookup(titleTextBox.Text);
        }
        catch(Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        if(movies != null && movies.Length > 0)
        {
            titleComboBox.Items.Clear();
            foreach(Movie movie in movies)
                titleComboBox.Items.Add(movie);
            titleComboBox.Visible = true;
            titleTextBox.Visible = false;
            lookupMenuItem.Text = "Back";
        }
    }
    else
    {
        titleTextBox.Text = string.Empty;
        titleTextBox.Visible = true;
        titleComboBox.Visible = false;
        lookupMenuItem.Text = "Lookup";
    }
}

...so when a title is entered in the text box, a search is made with the text box replaced with a combo box containing the search results. When an item in the combo box is selected...

private void titleComboBox_SelectedIndexChanged(object sender, EventArgs e)
{
    // SelectedItem is null when nothing is selected, so bail out early
    Movie movie = titleComboBox.SelectedItem as Movie;
    if(movie == null)
        return;
    resultTextBox.Text = string.Format("Title: {0}\r\nYear: {1}\r\n\r\nPlot: {2}",
        movie.Title, movie.Year, movie.Description);

    // Download the box shot and show it
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(movie.ImageUrl);
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    pictureBox.Image = new Bitmap(resp.GetResponseStream());
}

...the movie info is shown along with the box shot. On the server, the code looks like this...

[WebMethod]
public Movie[] MovieLookup(string movieTitle)
{
    // Search IMDB for the title (URL-encoded so spaces and symbols survive)
    string url = string.Format("http://www.imdb.com/find?s=all&q={0}", HttpUtility.UrlEncode(movieTitle));
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();

    // Locate the "Popular Titles" section; IndexOf returns -1 when it is missing
    int i = responseData.IndexOf("<b>Popular Titles</b>");
    if(i < 0)
        return new Movie[] { };
    int j = responseData.IndexOf("</p>", i);

    string table = Regex.Match(responseData.Substring(i, j - i), "<table.*?>(.*?)</table>").ToString();

    // One Movie per link in the table, skipping links that only wrap an image
    var q = from Match match in Regex.Matches(table,
                "<a\\shref=[\"'](?<url>.*?)[\"'].*?>(?<title>.*?)</a>")
            where !match.Groups["title"].ToString().Contains("<img")
            select new Movie { ID = Regex.Match(match.Groups["url"].ToString(),
                "(?<=\\w\\w)\\d\\d\\d\\d\\d\\d\\d").ToString(),
                Title = HttpUtility.HtmlDecode(match.Groups["title"].ToString()) };

    Movie[] movies = q.ToArray<Movie>();

   
    foreach(Movie movie in movies)
    {
        url = string.Format("http://www.imdb.com/title/tt{0}", movie.ID.ToString());
        request = WebRequest.Create(url) as HttpWebRequest;
        responseReader = new StreamReader(request.GetResponse().GetResponseStream());
        responseData = responseReader.ReadToEnd();
        responseReader.Close();

        string title = Regex.Match(responseData, "(?<=<(title)>).*(?=<\\/\\1>)").ToString();
        movie.Title = HttpUtility.HtmlDecode(Regex.Match(title, ".*(?=\\s\\(\\d+.*?\\))").ToString()).Replace(
            "\"", string.Empty);
        movie.Year = Regex.Match(title, "(?<=\\()\\d+(?=.*\\))").ToString();
        movie.ImageUrl = Regex.Match(Regex.Match(responseData, "(?<=\\b(name=\"poster\")).*\\b[</a>]\\b").ToString(),
            "(?<=\\b(src=)).*\\b(?=[</a>])").ToString().Replace("\"", string.Empty).Replace("/></", string.Empty);

        try
        {
            if(movie.Title.Contains("(VG)"))
            {
                i = responseData.IndexOf("<h5>Plot Summary:</h5>") > 0 ?
                    responseData.IndexOf("<h5>Plot Summary:</h5>") :
                    responseData.IndexOf("<h5>Tagline:</h5>");
                if(i > 0) j = responseData.IndexOf("</div>", i);
            }
            else
            {
                i = responseData.IndexOf("<h5>Plot:</h5>") > 0 ? responseData.IndexOf("<h5>Plot:</h5>") :
                    responseData.IndexOf("<h5>Plot Summary:</h5>");
                if(i <= 0) i = responseData.IndexOf("<h5>Plot Synopsis:</h5>");
                if(i > 0) j = responseData.IndexOf("<a class=", i);
                if(j <= 0)
                    j = responseData.IndexOf("</div>", i);
            }
            string plotOutline = responseData.Substring(i, j - i).Remove(0, "<h5>Plot:</h5> ".Length);
            plotOutline = HttpUtility.HtmlDecode(plotOutline);
            movie.Description = Regex.Replace(plotOutline.Contains("is empty") ||
                plotOutline.Contains("View full synopsis")
                ? string.Empty : plotOutline, "<a.*?href=[\"'](?<url>.*?)[\"'].*?>(?<name>.*?)</a>", string.Empty);
        }
        catch
        {
            movie.Description = string.Empty;
        }
    }
    return movies;
}

...and as you can see, I'm using the excellent IMDB to get the movie info. Of course, there's a lot more info to be retrieved about a movie, its actors, etc. If you want to build further on this example, I recommend taking a look at the Imdb Service project on CodePlex.

Note that we save a lot of coding through the extensive use of regular expressions (regex) to extract the data from the web page. First, a request is made to search for the movie title, and from the search results the matching movies in the category "popular titles" are captured as a list of movies. Then, for each movie in the list, a request is made to the movie details page, and each movie object is updated with the data about the movie (as well as the image URL). Finally, everything is returned to the client. Note that the nicely typed movie data is kept in a small helper class...

public class Movie
{
    public string ID { get; set; }
    public string Title { get; set; }
    public string Year { get; set; }
    public string ImageUrl { get; set; }
    public string Description { get; set; }
}

...which is transferred to the client via the Web Service proxy. For more details, check out the project on CodePlex.
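To make the title and year extraction a bit more concrete, here is a minimal sketch that splits a page title into its two parts with a single regex and named groups. It assumes the title text looks like "Name (Year)", which may not hold for every page, and the TitleParser class and TryParse method are hypothetical names for illustration, not part of the CodePlex project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParser
{
    // Splits a page title such as "Blade Runner (1982)" into name and year.
    // Returns false when the text does not end with a "(year...)" suffix.
    public static bool TryParse(string pageTitle, out string name, out string year)
    {
        Match m = Regex.Match(pageTitle, @"^(?<name>.*?)\s*\((?<year>\d{4})[^)]*\)\s*$");
        name = m.Success ? m.Groups["name"].Value : pageTitle;
        year = m.Success ? m.Groups["year"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string name, year;
        if (TryParse("Blade Runner (1982)", out name, out year))
            Console.WriteLine("{0} / {1}", name, year);
    }
}
```

Compared with separate lookbehind expressions for each field, one pattern with named groups keeps the title and the year in sync, since both come from the same match.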

posted on Saturday, March 22, 2008 10:21:11 AM UTC by Chris
 Monday, January 07, 2008

I continue my series (I Want More Mobile (Web) Services and Flight Lookup Web Service) of useful Web Services for Windows Mobile with a service to check on package delivery status. If you are like me, you are impatient when you wait for a package to arrive. I'm almost always waiting for another package, and when I have a few moments to spare, I come to think of the latest one and want to check where it was last seen. My effort is a growing project on CodePlex called Windows Mobile Web Services, where you can get the code.

As in previous posts, you find the UX on the right, and the client code should be familiar if you've seen the previous posts...

WebServices.Service ws = new WebServices.Service();
resultTextBox.Text = ws.PackageLookup(packageTextBox.Text).Replace("\n", "\r\n");

...and on the server, the code looks like this...

[WebMethod]
public string PackageLookup(string packageNo)
{
    // Get DHL tracking page
    string url = string.Format("http://www.dhl.com/cgi-bin/tracking.pl?AWB={0}", packageNo);
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();

    Regex regex = new Regex("<a\\s*href=\\x23[0-9]*?><font(.|\\n)*?>(?<number>.*\\n?.*)</font(.|\\n)*?<a\\shref=\\\"" +
        "(.|\\n)*?\\\">(?<origin>.*\\n?.*)</a(.|\\n)*?<a\\shref=\\\"(.|\\n)*?\\\">(?<destination>." +
        "*\\n?.*)</a(.|\\n)*?face=\\\"arial\\\">(?<status>.*\\n?.*)<img*", RegexOptions.IgnoreCase);

    // Extract using regex
    string s = null;
    Match match = regex.Match(responseData);
    if(match.Success)
    {
        //match.Groups["number"];
        s = string.Format("Origin: {0}\nDest.: {1}\nStatus: {2}",
            match.Groups["origin"], match.Groups["destination"],
            match.Groups["status"].ToString().Replace("<BR>", "\n"));
    }
    else
        throw new Exception("Not found, try again!");

    return s;
}

...and as you can see, I'm looking up DHL shipments. It shouldn't be too hard to extend it to cover other couriers as well.

This time I have used a regular expression (regex) to extract the key information from the web page. As you can see in the code above, the code is simpler - if you understand the regex. The regex extracts four pieces of info, each through a named group: the shipping number (which is not used), the package pick-up location (origin), the package drop-off location (destination), and the current status of the shipment. Regular expressions are a very powerful concept that can be used for many things, and they are especially helpful when scraping web pages. My favorite tools for working with regex are Expresso and Regulator, and Regulator is especially useful for .NET development as it's written using .NET (yes, unfortunately there are some minor differences between different regex implementations).
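If the production regex above is hard to read, the named-group technique is easier to see on a small example. The sketch below runs against hypothetical, heavily simplified markup (three table cells with made-up class names), not the real tracking page, and the TrackingParser class is an illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrackingParser
{
    // Hypothetical, simplified markup; the real tracking page is far messier.
    private static readonly Regex Row = new Regex(
        @"<td class=""origin"">(?<origin>[^<]*)</td>\s*" +
        @"<td class=""dest"">(?<destination>[^<]*)</td>\s*" +
        @"<td class=""status"">(?<status>[^<]*)</td>",
        RegexOptions.IgnoreCase);

    // Returns a three-line summary, or null when the pattern does not match.
    public static string Summarize(string html)
    {
        Match m = Row.Match(html);
        if (!m.Success)
            return null;
        return string.Format("Origin: {0}\nDest.: {1}\nStatus: {2}",
            m.Groups["origin"].Value.Trim(),
            m.Groups["destination"].Value.Trim(),
            m.Groups["status"].Value.Trim());
    }

    public static void Main()
    {
        string html = "<td class=\"origin\">Stockholm</td>\n" +
                      "<td class=\"dest\">Seattle</td>\n" +
                      "<td class=\"status\">In transit</td>";
        Console.WriteLine(Summarize(html) ?? "Not found");
    }
}
```

Each (?<name>...) group is read back by name through match.Groups, so the pattern can be reordered or extended without renumbering anything in the calling code.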

posted on Monday, January 07, 2008 6:18:42 AM UTC
 Tuesday, December 04, 2007

As a follow-up on my post I Want More Mobile (Web) Services, I will continue with something very common that I want to do with my WM device - look up flight status info. One of the best public services for that is FlightExplorer, and therefore I decided to scrape their site to get the info that I want. Just as I suggested in the previous post, I have now created an ASP.NET Web Service application that takes care of the actual scraping, and the mobile client simply captures the parameter, calls the web service, and presents the result. A minimum of information is transferred over the wire, and therefore this solution is very fast and cost-effective.

If you want to get right to the code, I have created my second CodePlex project called Windows Mobile Web Services. If you want to be part of the development, please let me know.

Starting with the front end, you can see the UX on the right, and the client code looks like this...

WebServices.Service ws = new WebServices.Service();
resultTextBox.Text = ws.FlightLookup(flightTextBox.Text).Replace("\n", "\r\n");

...and on the server side, the code begins like this...

[WebMethod]
public string FlightLookup(string flight)
{
    // Get session cookie and viewstate
    CookieContainer cookies = new CookieContainer();
    string url = "http://travel.flightexplorer.com/TrackFlight.aspx";
    HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
    request.UserAgent = USER_AGENT;
    request.CookieContainer = cookies;
    StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    string responseData = responseReader.ReadToEnd();
    responseReader.Close();

    // Extract the viewstate value and build POST data
    string viewState = extractValue(responseData, "__VIEWSTATE");
    string postData = String.Format("__VIEWSTATE={0}&leftcol1...flightNum={1}&FastTrack1...",
        viewState, flight);

    // Now post to the same page
    request = WebRequest.Create(url) as HttpWebRequest;
    request.UserAgent = USER_AGENT;
    request.Method = "POST";
    request.ContentType = "application/x-www-form-urlencoded";
    request.CookieContainer = cookies;
    StreamWriter requestWriter = new StreamWriter(request.GetRequestStream());
    requestWriter.Write(postData);
    requestWriter.Close();
    responseReader = new StreamReader(request.GetResponse().GetResponseStream());
    responseData = responseReader.ReadToEnd();
    responseReader.Close();

...and this is a typical approach when scraping ASP.NET web applications (note, the above code is simplified to increase readability). First the page is requested without any parameters, just to get the session state cookie and the view state, and then a POST is made to the same page with the cookie and the form data created using the view state. When the resulting page content is captured, it's parsed like this...

    // Get flight status
    int i = 0;
    string status = null;
    string result = "Flight not found";
    if(responseData.IndexOf(result) < 0)
    {
        string s = "Status:";
        if((i = responseData.IndexOf(">" + s + "<")) > -1)
        {
            i = responseData.IndexOf("class='ft1'>", i) + 12;
            status = responseData.Substring(i, responseData.IndexOf("<", i) - i);
            result = string.Format("{0} {1}", s, status);
            switch(status)
            {
                case "Planned":
                    s = "Departure in:";
                    i = responseData.IndexOf(">" + s + "<", i);
                    i = responseData.IndexOf("FTText2' colspan=3>", i) + 19;
                    result += string.Format("\n{0} {1}", s,
                        responseData.Substring(i, responseData.IndexOf("<", i) - i));
                    break;
                case "In Flight":
                    s = "Time remaining:";
                    i = responseData.IndexOf(">" + s + "<", i);
                    i = responseData.IndexOf("class='ft1'>", i) + 12;
                    result += string.Format("\n{0} {1}", s,
                        responseData.Substring(i, responseData.IndexOf("<", i) - i));
                    break;
            }
        }
    }
    return result;
}

...and here is the helper method for extracting form values (it's very generic, and can be used in most ASP.NET scraping)...

private string extractValue(string s, string nameDelimiter)
{
    string valueDelimiter = "value=\"";

    // Find the field name first; searching from a negative index would throw
    int namePosition = s.IndexOf(nameDelimiter);
    if(namePosition < 0)
        return string.Empty;
    int valuePosition = s.IndexOf(valueDelimiter, namePosition);
    if(valuePosition < 0)
        return string.Empty;
    int startPosition = valuePosition + valueDelimiter.Length;
    int endPosition = s.IndexOf("\"", startPosition);

    // URL-encode so the value can be dropped straight into the POST data
    return HttpUtility.UrlEncode(s.Substring(startPosition, endPosition - startPosition));
}
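As an alternative to the index arithmetic in the helper above, the same form-value extraction can be sketched with a regex. This is only a rough sketch under a few assumptions: the name attribute comes before value inside the same input tag, the FormFields class is a hypothetical name, and Uri.EscapeDataString is swapped in for HttpUtility.UrlEncode to keep the example free of the System.Web dependency (the two encode some characters differently, for example spaces).

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormFields
{
    // Returns the URL-encoded value of the named input, or "" when absent.
    // Assumes the name attribute precedes value inside the same tag.
    public static string Extract(string html, string fieldName)
    {
        string pattern = "<input[^>]*name=\"" + Regex.Escape(fieldName) +
                         "\"[^>]*value=\"(?<value>[^\"]*)\"";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return m.Success ? Uri.EscapeDataString(m.Groups["value"].Value) : string.Empty;
    }

    public static void Main()
    {
        // Made-up view state value, just to show the encoding
        string html = "<input type=\"hidden\" name=\"__VIEWSTATE\" value=\"dDw+abc/==\" />";
        Console.WriteLine(Extract(html, "__VIEWSTATE"));
    }
}
```

Because the pattern is anchored to a single input tag, a missing value attribute cannot accidentally pick up the value of a later field, which the plain IndexOf version can do.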

Finally, I also want to mention a detail in the client implementation. To ease entering the flight id, the input mode changes depending on the number of characters entered. For the first three, the input mode is alphanumeric, and after that it's numeric. The code looks like this...

private bool inputNumeric = false;
private void flightTextBox_TextChanged(object sender, EventArgs e)
{
    if(flightTextBox.Text.Length >= 3 && !inputNumeric)
    {
        InputModeEditor.SetInputMode(flightTextBox, InputMode.Numeric);
        inputNumeric = true;
    }
    else if(flightTextBox.Text.Length < 3 && inputNumeric)
    {
        InputModeEditor.SetInputMode(flightTextBox, InputMode.Default);
        inputNumeric = false;
    }
}

...and the use of the private variable is just to prevent unnecessary input mode changes (the InputModeEditor class is in the Microsoft.WindowsCE.Forms namespace).

I will continue my quest for more, simple, but very usable Web Services, and any suggestions are most welcome. Just point me to a site and how it could be made available in an efficient way from a WM device...

posted on Tuesday, December 04, 2007 9:20:37 PM UTC by Chris
 Monday, November 26, 2007

With VS2008 released this week, I should be diving into LINQ, WCF, and other cool stuff in the .NET CF 3.5. But it's already the end of the week, and it's time for the bigger picture. I want to say a few words about the apps that I'm missing, and all the great apps that I want to write. I use my WM device(s) every day, and there are so many things that I want to do, that I can't. Maybe I can, but it's still very difficult. Let me take a few examples. I was in my car the other day, and I wanted to look up the number to a friend. I started my browser, I searched, I got the link to the "white pages" site, I entered the name, ... I finally got the number, but it took too long, and I was already somewhat frustrated.

I talked to a mobile veteran the other day, and I asked him what he misses the most, and his instant response was "a good mobile browser". I believe that he's right; more development can definitely go into browser technology. But I think that the real problem is that browsing is no good on a small screen with low bandwidth. I think (know) there's another solution. As a developer, I have the liberty to write my own apps when I need some functionality on my device, and I do it with the best development tools in the world (which were just released this week). When there is a Web Service available with reasonable pricing, I just add a few controls to a form along with a Web Reference and a few lines of glue code, and I have my personal service in place (usually in a couple of minutes). There are great directory sites to find Web Services like XMethods, RemoteMethods, APIfinder, etc, and companies like ServiceObjects, StrikeIron, CDYNE, WebserviceX, etc, are offering commercial Web Services. But even if the pricing is getting more reasonable (some services are a few cents per transaction), the number of available services is still very limited.

However, if there's no Web Service available, my only option is to get the information directly from the Web using a technique called "scraping". It's really nothing advanced: the code simply issues the HTTP requests and reads the responses that the browser would otherwise handle. For example, let's say I want to put my earlier frustration when looking for my friend's number to an end. I would go out and look for a nice site for looking up numbers, like whitepages, and after analyzing the requests and responses (many times the "view source" in IE is sufficient, but there are also great tools like Fiddler), I write code similar to this...

// Get search result page
string url = string.Format("http://www.whitepages.com/search/FindPerson?who={0}&where={1}",
    whoTextBox.Text, whereTextBox.Text);
HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
StreamReader responseReader = new StreamReader(request.GetResponse().GetResponseStream());
string responseData = responseReader.ReadToEnd();
responseReader.Close();

// More than one?
int i = 0;
if((i = responseData.IndexOf("results_multiple_widget_matching")) > -1)
{
    i = responseData.IndexOf("<strong>", i) + 8;
    string matches = responseData.Substring(i, responseData.IndexOf("\n", i) - i);
    MessageBox.Show(string.Format("{0} matches, refine search!", matches));
}
else
{
    // Any
    if(responseData.IndexOf("class=\"fn n\">") < 0)
        MessageBox.Show("No matches, try again!");
    else
    {
        // Only one
        string s = extractValue(responseData, "fn n"); // name
        s += "\r\n" + extractValue(responseData, "street-address"); // street
        s += "\r\n" + extractValue(responseData, "locality"); // city
        s += ", " + extractValue(responseData, "region"); // state
        s += " " + extractValue(responseData, "postal-code"); // zip
        s += "\r\n" + extractValue(responseData, "tel"); // phone

        resultTextBox.Text = s;
    }
}

...and on the right you see the code in action (well, Don Box is not exactly my friend, but me and Andy sat down and talked to him after an event once, and that should count for something ;-)). The code for the private method looks like this...

private string extractValue(string s, string name)
{
    string valueDelimiter = "class=\"" + name + "\">";

    int valuePosition = s.IndexOf(valueDelimiter);
    if(valuePosition < 0)
        return string.Empty;
    int startPosition = valuePosition + valueDelimiter.Length;
    int endPosition = s.IndexOf("<", startPosition);

    return s.Substring(startPosition, endPosition - startPosition);
}

...and even if helpers like this can be very useful, I clearly recommend the use of regex for more advanced scraping. Note that this approach still needs to download the whole page just to get to the data, and if the above code is running on the WM device, the response time will be the same as accessing the site through the browser. That is why this code should be wrapped into a Web Service on a server somewhere, and then the WM device can access the Web Service with the few lines of code that I mentioned above for an already existing Web Service. Making the Web Service responsible for the actual scraping is also better because the clients don't need to be updated if there are changes to the source site.
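To give an idea of what the regex recommendation could look like for this kind of page, here is a small sketch that collects every class-tagged value in one pass instead of calling a helper once per field. It assumes the same simple class="name">value< shape that the helper relies on; the ClassScraper class and the sample markup and data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ClassScraper
{
    // Collects the text that directly follows each class="..."> marker,
    // keeping the first value seen for every class name.
    public static Dictionary<string, string> Extract(string html)
    {
        Dictionary<string, string> values = new Dictionary<string, string>();
        foreach (Match m in Regex.Matches(html, "class=\"(?<name>[^\"]+)\">(?<value>[^<]*)<"))
        {
            string name = m.Groups["name"].Value;
            if (!values.ContainsKey(name))
                values[name] = m.Groups["value"].Value.Trim();
        }
        return values;
    }

    public static void Main()
    {
        // Made-up sample markup and data
        string html = "<span class=\"fn n\">Jane Doe</span>" +
                      "<span class=\"locality\">Redmond</span>" +
                      "<span class=\"tel\">555-0100</span>";
        Dictionary<string, string> fields = Extract(html);
        Console.WriteLine("{0}, {1}, {2}", fields["fn n"], fields["locality"], fields["tel"]);
    }
}
```

A single scan like this also makes it cheap to pick up additional fields later, since the caller only has to look up another key.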

This way, I can now easily look up both addresses and phone numbers to my friends, and it would be easy to add more functionality like calling the found phone number, etc. I should mention that whitepages has added a great mobile version of their site that you enter automatically when you go to their site with a WM device (read more about it, and see a demo), so they are no longer the best example of a service to scrape, but still, their mobile site takes much more bandwidth than the Web Service approach I described above. I don't know about you, but I prefer the clean look in the screenshot above compared to a cluttered web page.

There are so many simple services like this that I still miss, and to name a few apart from the above mentioned phone number lookup, I would like info about flights (delays), cinema search/booking, package tracking, etc, etc, etc. I guess an instant reverse phone number lookup would be a killer for anyone that wants to know who is calling right now, and the number is not in Contacts. What services are you missing?

posted on Monday, November 26, 2007 12:50:23 AM UTC by Chris