Friday, June 26, 2015

How to Query 11 Different Search Engines

You can query a search engine with URLFetch. For the URL you need a String that includes a standard prefix and a suffix that begins with something like "/?q=querykeywords". I have removed some unnecessary junk from most of the query Strings to make them as short and simple as possible.

There are several ways to functionally compose a query for a search engine. This one uses StringJoin. I use Which to handle the cases of 1) a single keyword, 2) multiple keywords, or 3) any other data type, which is an error. Using True as the third test in conditional statements is a standard way to handle 'all else' cases.

queryTemplateDuckDuckGo="https://duckduckgo.com/?q=";
keyword1="longevity";
keywords2={"machine","learning","algorithm"};

Clear@searchEngineQuery;
searchEngineQuery[searchEnginePrefix_String, keywords_] := 
 Module[{url},
  Which[
   Head@keywords === String, 
   url = StringJoin[searchEnginePrefix, keywords],
   Head@keywords === List, 
   url = StringJoin[searchEnginePrefix, 
     Table[i <> "+", {i, Most@keywords}], Last@keywords],
   True, Print@"Error: wrong data type."];
  Print@url;
  URLFetch@url
  ]

We test it on a single keyword (I omit the full output, which you can try at home).

searchEngineQuery@keyword1

https://duckduckgo.com/?q=longevity

We try it on a List of keywords.

searchEngineQuery@keywords2

https://duckduckgo.com/?q=machine+learning+algorithm

We try it on a wrong data type.

searchEngineQuery[queryTemplateDuckDuckGo, 5]

Error: wrong data type.

And here are sample fetches for the search engines. Again, I omit the full output but you can try them yourself. I precede each sample with a rating the quality of 'cleanliness' of results by StringLength, the idea being that the shorter length results Strings have less junk in them and more of the actual results. By that measure DuckDuckGo, Blekko and Alhea give the 'cleanest' output for further processing. But don't misconstrue that measure as a quality rating of the results themselves.

StringLength/@{%365,%366,%368,%370,%372,%374,%376,%378,%380,%382,%384}

{42192,11496,80722,94714,164072,117642,19161,42104,14660,104627,21239}


11 Search Engines Queried



Google: 42,192


URLFetch["https://www.google.com/search?q=atherosclerosis"]

DuckDuckGo: 11,496


URLFetch["https://duckduckgo.com/?q=atherosclerosis"]

Bing: 80,722


URLFetch["http://www.bing.com/search?q=atherosclerosis"]

Ask.com: 94,714


URLFetch["http://www.ask.com/web?qsrc=1&o=2545&l=dir&q=\
atherosclerosis"]

AOL: 164,072


URLFetch["http://search.aol.com/aol/search?s_it=searchbox.webhome&v_t=\
na&q=atherosclerosis"]

Wow: 117,642


URLFetch["http://www.wow.com/search?s_it=search-thp&v_t=na&q=\
atherosclerosis&s_qt=ac"]

InfoSpace: 19,161


URLFetch["http://search.infospace.com/search/web?q=atherosclerosis&\
searchbtn=Search"]

Info: 42,104


URLFetch["http://www.info.com/search?qcat=web&r_cop=xxx&qkw=\
atherosclerosis"]

DogPile: 104,627


URLFetch["http://www.dogpile.com/info.dogpl/search/web?fcoid=417&fcop=\
topnav&fpid=27&q=atherosclerosis&ql="]

Alhea: 21,239

URLFetch["http://www.alhea.com/search/web?fcoid=417&fcop=topnav&fpid=\
27&q=atherosclerosis&ql="]


No comments:

Post a Comment