New metasearch engine leaves Google, Yahoo crawling

Mar 25, 2009
Weiyi Meng, a professor of computer science at Binghamton University, State University of New York. Credit: Jonathan Cohen

One day in the not-too-distant future, you'll be able to type a query into an online search engine and have it deliver not Web pages that may contain an answer, but just the answer itself, says Weiyi Meng, a professor of computer science at Binghamton University, State University of New York.

For instance, imagine typing in "Who starred in the film Casablanca?" The search engine would respond with "Humphrey Bogart and Ingrid Bergman."

Not impressed?

Try asking a more nuanced question, such as "What do Americans think of universal health care?" A search engine will create a report indicating trends in opinion based on what has been posted to the Web.

Search engines may eventually be used to conduct polling and even help sort fact from fiction, said Meng, who is helping to make such possibilities a reality, both through his research and as president of a company called Webscalers.

The way Meng sees it, big search engines such as Google and Yahoo are fundamentally flawed. The Web has two parts: the surface Web and the deep Web. The surface Web is made up of perhaps 60 billion pages. The deep Web, at some 900 billion pages, is about 15 times larger.

Google, which relies on a "Web crawler" to examine pages and catalog them for future searches, can search about 20 billion pages. Web crawlers follow links to reach pages and often miss content that isn't linked to any other page or is in some way "hidden."
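To see why link-following misses pages, here is a minimal sketch of a crawler as a breadth-first traversal over a link graph. The page names and the toy graph are hypothetical and chosen only for illustration; real crawlers are far more elaborate, and nothing here describes how Google's crawler actually works.

```python
from collections import deque


def crawl(link_graph, seeds):
    """Return every page reachable from the seed pages by following links."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in link_graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical toy Web: each page maps to the pages it links to.
link_graph = {
    "home": ["news", "about"],
    "news": ["story-1"],
    "about": [],
    "story-1": ["home"],
    # Assumed "deep Web" page: it exists, but no other page links to it
    # (for example, a record served only in response to a search form).
    "db-record-42": [],
}

reached = crawl(link_graph, ["home"])
missed = set(link_graph) - reached

print(sorted(reached))  # pages the crawler can catalog
print(sorted(missed))   # pages that link-following never finds
```

In this toy graph the crawler never discovers "db-record-42", because no link leads to it. A query sent to the search engine of the site that hosts the page could still return it.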

Meng, along with researchers at the University of Illinois at Chicago and the University of Louisiana at Lafayette, has helped pioneer large-scale metasearch-engine technology that harnesses the power of small search engines to come up with results that are more accurate and more complete.

"Most of the pages on the deep Web aren't directly 'crawlable.' We want to connect to small search engines and reach the deep Web," he said. "That's the idea. Many people have the misconception that Google can search everything, and if it's not there it doesn't exist. But we should be able to retrieve many times more than what Google can search."

Not only can a metasearch engine probe deeper, it can also offer the latest information.

"In principle," Meng said, "small guys are much better able to maintain the freshness of their data. Google has a program to 'crawl' all over the world. Depending on when the crawler has last visited your server, there's a delay of days or weeks before a new page will show up in that search. We can get fresher results."

The concept is not new. In fact, the first metasearch engine was built in 1994.

"The big difference between our technology and the ones pursued by other people is that most of the other technologies do the metasearching on top of a small number of general-purpose search engines, such as Yahoo, Google or MSN," Meng explained. "We have a completely different perspective. We want to build large-scale metasearch engines on top of many small search engines."

The Web has millions of search engines at businesses, universities, newspapers and other organizations. Since 1997, and with continued funding from the National Science Foundation, Meng and his collaborators have found ways to run queries across multiple search engines and sort through the results.

Webscalers is based in the Start-Up Suite at Binghamton University's Innovative Technologies Complex, which is home to several young companies that have their roots in faculty inventions.

"If the Web keeps on growing, a company like Google may run out of resources to crawl all of those pages," said Vijay V. Raghavan, vice president of Webscalers and a faculty member at the University of Louisiana at Lafayette. "We won't have that problem. We will scale much better."

Webscalers' technology could be useful for large organizations with many divisions. For example, Webscalers has developed a prototype that would allow a search of all 64 campuses in the State University of New York system as well as SUNY's central administration.

"People can use it to find collaborators," Meng said. "It could also help prospective students find programs they're interested in."

The technology could be adapted to large companies or even the government, Meng said.

Challenges for large-scale metasearch engines include determining which search engines are best for a given query, automating the interaction with those search engines, and organizing the search results.
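Two of these challenges, choosing engines and organizing results, can be sketched in a few lines. The sketch below is not Webscalers' method. It assumes each small search engine is described by a set of topic words, picks engines by word overlap with the query, rescales each engine's scores so they are comparable, and merges duplicates. The engine names, URLs, scores and the functions select_engines and merge_results are all hypothetical.

```python
def select_engines(query, engines, top_k=2):
    """Pick up to top_k engines whose topic words overlap the query most."""
    terms = set(query.lower().split())
    scored = []
    for name, info in engines.items():
        overlap = len(terms & info["topics"])
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def merge_results(result_lists):
    """Merge (url, score) lists from several engines into one ranking."""
    best = {}
    for results in result_lists:
        if not results:
            continue
        top = max(score for _, score in results)
        for url, score in results:
            # Rescale to 0..1 so engines with different score ranges compare.
            norm = score / top if top > 0 else 0.0
            # Keep the best normalized score for a URL returned twice.
            best[url] = max(best.get(url, 0.0), norm)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical engines; each "search" function stands in for a real query.
engines = {
    "campus-news": {
        "topics": {"campus", "news", "research"},
        "search": lambda q: [("news.example/a", 8.0), ("news.example/b", 4.0)],
    },
    "library": {
        "topics": {"research", "papers"},
        "search": lambda q: [("lib.example/x", 2.0), ("news.example/a", 1.0)],
    },
    "athletics": {
        "topics": {"sports", "scores"},
        "search": lambda q: [("sports.example/z", 1.0)],
    },
}

query = "campus research news"
selected = select_engines(query, engines)
merged = merge_results([engines[name]["search"](query) for name in selected])

print(selected)
for url, score in merged:
    print(url, score)
```

Word overlap and max-score rescaling are deliberately simple stand-ins. The third challenge, automating the interaction with each engine, is hidden here behind the placeholder "search" functions.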

Meng hopes to build a grand metasearch engine one day that would integrate all of the 1 million small search engines into a single system. "There are still a lot of significant challenges in creating a system of such magnitude," he said, "but I am optimistic that such a metasearch engine can be built."

Try out the concept online

Webscalers has already launched several metasearch products:

The first is a news metasearch engine called AllinOneNews. Available at www.allinonenews.com, it connects to 1,800 news sources in 200 countries. That's the largest metasearch engine in the world.

Webscalers also offers MySearchView, a system that allows any user to create his or her own metasearch engine just by checking off a few options at www.mysearchview.com.

Source: Binghamton University

User comments : 6

earls
4 / 5 (1) Mar 25, 2009
I guess every month from here on out we'll hear about the next big "Google Killer."

This "metasearch," however, has little to do with a typical "Google Search." It seems to be more related to the "post-processing" of the results than actually finding them.

Airkin also made me aware in another article that it "seems to be limited to its own database (think Wiki) not like Google's large ones of the internet."

This is evidenced by "Meng hopes to build a grand metasearch engine one day that would integrate all of the 1 million small search engines into a single system."

It seems like a step back in my mind... Or at least, too infantile to be of any use (yet).

Another issue that should be considered is "Should you trust the result?"

The converse of "one simple answer" is being painted as a negative: Google search returns many (too many?) results that have to be pored over to distill an answer... Though this is not really the case, as (generally) you'll get the answer you're looking for in the top 10 results.

However, with one absolute answer, "just because the computer said," how do you know it's the correct answer? Many results give you the ability to compare and contrast and decide for yourself what's true.

I suppose (and understand) this is what Wolfram and Meng are attempting to accomplish... An authoritative response that falls within the human margin of error... But it just seems to me "humans are computers, and computers aren't human." Is there simply a natural disconnect between the two different "mediums" or will a singularity be reached in the future?

I wonder what the metasearch would have to say about that question. ;) "42."
vlam67
not rated yet Mar 25, 2009
yeah, sure, great Meng. Type in "Tibet" and the answer is " China's territory". Enough said.
ealex
not rated yet Mar 26, 2009
Wasn't there recently another one of these? What's up with that? Is there a grudge against Google on the physorg team? This is basically the exact same stuff, only a different search engine.

Let it go already, we get it.
Choice
not rated yet Mar 29, 2009
The program should return several answers and let the asker choose the one he or she likes.
pcunix
not rated yet Mar 29, 2009
"Depending on when the crawler has last visited your server, there's a delay of days or weeks before a new page will show up in that search. We can get fresher results."

Really? Gosh, I've seen pages I post show up literally minutes later.

For this kind of stuff, I'll believe it when I see it, and I think seeing it is a long, long way off.
denijane
not rated yet Apr 07, 2009
As much as I like it, there is one thing that we must admit: search the way it is provides us with more information. For example, wanting to know something more about a date will give you pages and pages of related content that you have to skip through in order to find what you're looking for. And during this process, you learn a lot more, sometimes even things that are quite useful for you but that you wouldn't have known otherwise. Whereas if you got the answer in a line or three, you would limit your knowledge.

Yes, I know this isn't really a flaw. I just wanted to point out that all the search engines have their good and their bad sides and can develop simultaneously.
