Vertical search across the educational horizon

Dec 22, 2010

Searching the web usually involves typing keywords or a phrase into a search engine and clicking the "search now" button. It's very effective, and several large companies have become prominent in the field by giving users searchable access to millions, if not billions, of web pages in this way. However, according to researchers at Hewlett Packard in Palo Alto, California, and the Chinese technology company Innovation Works, the results returned by general search engines, while very effective at tracking down information, are unstructured, which limits the user's ability to automate any further processing of those results.

Other researchers have attempted to find ways to support more precise searching on specific sites, so-called content verticals. Writing in the International Journal of Computational Science and Engineering, HP's Meichun Hsu and Innovation Works' Yuhong Xiong describe an alternative web search system that could be used to search across such verticals. They have demonstrated how the new system works by focusing on online courses.

The researchers point out that in the pre-web days, a relational database within a company or educational establishment was the equivalent of the modern online content vertical. Users of relational databases could embed their queries, and the results they returned, in an application program written for that database. The HP team hopes to carry this embedding process forward and extend it to the wider web. As an example of the kind of search such an approach might allow, they describe how they would like to be able to carry out the following:

SELECT product_name FROM hp.com WHERE product_type = 'PC'

Imagine how a similar query across online educational resources might be made transparent to users by clever programming, so that they could pull up specific prospectuses, curricula, timetables, and tests quickly and easily, across domains rather than on a single computer system. To solve this problem, the team has exploited "focused crawling", in which only the pages likely to be relevant are crawled and indexed. This ties in neatly with "web content classification", which adds metadata to those relevant pages in order to accelerate searching. Finally, "information extraction" pulls out the important information from that focused and classified data. The team has now applied this approach to HP's OfCourse project.
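The three stages described above can be illustrated with a toy sketch. It is not the team's implementation: the in-memory `PAGES` dictionary, the keyword list, the page labels, and the function names `relevance`, `focused_crawl`, `classify`, and `extract` are all assumptions made for illustration, and simple keyword counting stands in for whatever relevance and classification models a real system would use.

```python
import re
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "u/home": ("Welcome to the university. Courses and campus news.",
               ["u/courses", "u/sports"]),
    "u/courses": ("Course catalog for all departments.",
                  ["u/cs101", "u/math201"]),
    "u/cs101": ("Course: Intro to Programming; Instructor: Ada Lee; Credits: 4", []),
    "u/math201": ("Course: Linear Algebra; Instructor: Bo Chen; Credits: 3", []),
    "u/sports": ("Football scores and match reports.", ["u/tickets"]),
    "u/tickets": ("Buy match tickets online.", []),
}

# Assumed topic vocabulary for the "online courses" vertical.
KEYWORDS = ("course", "instructor", "credits", "catalog")


def relevance(text):
    """Count keyword hits; a stand-in for a trained relevance model."""
    lowered = text.lower()
    return sum(lowered.count(word) for word in KEYWORDS)


def focused_crawl(seed, threshold=1):
    """Breadth-first crawl that keeps and expands only relevant pages."""
    seen, queue, kept = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        if relevance(text) < threshold:
            continue  # irrelevant page: neither indexed nor expanded
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


def classify(url):
    """Attach a coarse label (metadata) to a crawled page."""
    return "course_page" if PAGES[url][0].startswith("Course:") else "other"


FIELD = re.compile(r"(Course|Instructor|Credits): ([^;]+)")


def extract(url):
    """Pull labelled fields out of a page into a structured record."""
    return {name.lower(): value.strip()
            for name, value in FIELD.findall(PAGES[url][0])}


if __name__ == "__main__":
    for url in focused_crawl("u/home"):
        if classify(url) == "course_page":
            print(url, extract(url))
```

In this sketch the sports pages are never indexed, and the ticket page is never even fetched, because the crawler stops expanding as soon as it reaches a page that scores below the threshold. That is the saving focused crawling aims for.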

"The technologies can be used to support structured queries over contents extracted and aggregated from the web," the team says. "They are also foundational to personalization, by offering more insights into the web content of interest to particular users." The new approach to search does require human intervention at certain stages so that the contents of each crawled domain can be classified more effectively, but machine learning approaches can automate this process to some degree. The research, the team says, takes us one step closer to "the convergence of database technology and information retrieval in the era of the web."
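To make the idea of structured queries over extracted content concrete, here is a minimal sketch using Python's standard `sqlite3` module. The records, the `courses` table and its columns, the `example.edu` and `example.org` site names, and the `courses_by_subject` helper are all hypothetical; they stand in for whatever an extraction step might produce and are not taken from the OfCourse project.

```python
import sqlite3

# Hypothetical records, standing in for the output of an extraction step
# that has aggregated course pages from more than one site.
records = [
    ("example.edu", "Intro to Programming", "Computer Science", 4),
    ("example.edu", "Linear Algebra", "Mathematics", 3),
    ("example.org", "Data Structures", "Computer Science", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE courses (site TEXT, title TEXT, subject TEXT, credits INTEGER)"
)
conn.executemany("INSERT INTO courses VALUES (?, ?, ?, ?)", records)


def courses_by_subject(subject):
    """Return (title, site) pairs for one subject, across all sites."""
    rows = conn.execute(
        "SELECT title, site FROM courses WHERE subject = ? ORDER BY title",
        (subject,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    for title, site in courses_by_subject("Computer Science"):
        print(f"{title} ({site})")
```

Once the extracted data sits in a table, a program can filter, join, or sort it directly, which is the kind of automated post-processing that a ranked list of links does not allow.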

More information: "Scalable information extraction for web queries" in International Journal of Computational Science and Engineering, 2010, 5, 176-184
