Posts

Showing posts from August 11, 2013

Using simple methods for data extraction

Here is a simple problem. IMDB publishes a list of movies of all time. The list contains 250 movies.

Problem: The web page offers no sorting based on rating or votes.

Goal: I want to sort the list by year, so that I can watch the movies from the latest to the oldest. But year is not a separate field in their table. It would be good to have sorting by votes and rating as well.

How to go about it? We can use very simple methods to reach the solution instead of writing a full-fledged program.

Steps

The first step is to copy the data into Excel and separate it into columns. (Excel can split the data into columns if we copy it into a text file and use the import option.) But there are problems with importing into Excel. The two broad types of import are "fixed width" and "delimiter". The length of the movie name is not fixed, so we cannot use a fixed-width column import. We cannot use space as the delimiter either, since movie names contain spaces.

Alternate Solu
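For comparison only, the delimiter problem described above can be sketched in a few lines of Python. This is not the method the post goes on to use, which deliberately avoids writing a program. The row layout (rank, rating, title with the year in parentheses, votes) and the sample rows are assumptions made up for illustration; the real copied text may look different. The idea is to anchor on the pieces that have a predictable shape, so that the variable-length title with its spaces is whatever is left in between.

```python
import re

# Hypothetical sample rows (made up for illustration, not real IMDB data).
# Assumed layout: rank, rating, title with year in parentheses, votes.
SAMPLE = """\
1. 9.2 An Example Movie (1994) 1,000,000
2. 9.1 Another Film (1972) 750,000
3. 8.9 A Third Long Title Here (2008) 900,000
"""

# Anchor on the fixed-shape pieces; the title is matched lazily in between,
# so spaces inside the title do not matter.
ROW = re.compile(
    r"^(?P<rank>\d+)\.\s+"
    r"(?P<rating>\d+\.\d+)\s+"
    r"(?P<title>.+?)\s+"
    r"\((?P<year>\d{4})\)\s+"
    r"(?P<votes>[\d,]+)$"
)


def parse_rows(text):
    """Return one dict per line that matches the assumed layout."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        rows.append({
            "rank": int(m.group("rank")),
            "rating": float(m.group("rating")),
            "title": m.group("title"),
            "year": int(m.group("year")),
            "votes": int(m.group("votes").replace(",", "")),
        })
    return rows


movies = parse_rows(SAMPLE)

# Latest movies first, as in the stated goal.
by_year = sorted(movies, key=lambda r: r["year"], reverse=True)
for r in by_year:
    print(r["year"], r["title"], r["rating"], r["votes"])
```

Sorting by votes or rating instead would only need a different key, for example `key=lambda r: r["votes"]`.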