You can query a search engine with URLFetch. The URL is a String made of a standard prefix for the engine, ending in something like "/?q=", followed by the query keywords. I have removed unnecessary junk from most of the query Strings to make them as short and simple as possible.
There are several ways to functionally compose a query for a search engine. This one uses StringJoin. I use Which to handle three cases: 1) a single keyword, 2) a List of keywords, or 3) any other data type, which is an error. Using True as the final test in Which is a standard way to handle 'all else' cases.
queryTemplateDuckDuckGo="https://duckduckgo.com/?q=";
keyword1="longevity";
keywords2={"machine","learning","algorithm"};
Clear@searchEngineQuery;
searchEngineQuery[searchEnginePrefix_String, keywords_] :=
Module[{url},
(* Build the URL, or $Failed for an unsupported data type *)
url = Which[
Head@keywords === String,
StringJoin[searchEnginePrefix, keywords],
Head@keywords === List,
StringJoin[searchEnginePrefix,
Table[i <> "+", {i, Most@keywords}], Last@keywords],
True, (Print@"Error: wrong data type."; $Failed)];
(* Fetch only if a URL was actually built *)
If[url === $Failed, $Failed, Print@url; URLFetch@url]
]
We test it on a single keyword (I omit the full output, which you can try at home).
searchEngineQuery[queryTemplateDuckDuckGo, keyword1]
https://duckduckgo.com/?q=longevity
We try it on a List of keywords.
searchEngineQuery[queryTemplateDuckDuckGo, keywords2]
https://duckduckgo.com/?q=machine+learning+algorithm
We try it on a wrong data type.
searchEngineQuery[queryTemplateDuckDuckGo, 5]
Error: wrong data type.
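Another way to compose the URL, as a minimal sketch: let pattern matching on the argument do the dispatch instead of Which, and build the keyword String with StringRiffle. The name buildQueryURL is my own invention for illustration, and it only builds the String; it does not fetch anything.

```mathematica
(* Hypothetical helper: builds the query URL without fetching it *)
Clear@buildQueryURL;

(* Case 1: a single keyword *)
buildQueryURL[prefix_String, keyword_String] := prefix <> keyword;

(* Case 2: a non-empty List of keywords, joined with "+" *)
buildQueryURL[prefix_String, keywords : {__String}] :=
  prefix <> StringRiffle[keywords, "+"];

(* Case 3: anything else is an error *)
buildQueryURL[_String, _] := $Failed;

buildQueryURL["https://duckduckgo.com/?q=", {"machine", "learning", "algorithm"}]
(* "https://duckduckgo.com/?q=machine+learning+algorithm" *)
```

Keywords that contain spaces or characters such as "&" would need escaping first, for instance with URLEncode. I leave that out here to keep the sketch short.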
And here are sample fetches for the search engines. Again, I omit the full output but you can try them yourself. I precede each sample with the StringLength of its result as a rough rating of 'cleanliness', the idea being that shorter result Strings have less junk in them and more of the actual results. By that measure DuckDuckGo, Blekko and Alhea give the 'cleanest' output for further processing. But don't misconstrue that measure as a quality rating of the results themselves.
StringLength/@{%365,%366,%368,%370,%372,%374,%376,%378,%380,%382,%384}
{42192,11496,80722,94714,164072,117642,19161,42104,14660,104627,21239}
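The % references above point at output lines in my own session, so they won't work in yours. A reproducible version might look like the following sketch. The names engineURLs and pageLengths are hypothetical, only three of the engines are listed to keep it short, and it assumes URLFetch returns a String on success (anything else is dropped).

```mathematica
(* Hypothetical table of engine names and query URLs (subset of the samples) *)
engineURLs = <|
   "Google" -> "https://www.google.com/search?q=atherosclerosis",
   "DuckDuckGo" -> "https://duckduckgo.com/?q=atherosclerosis",
   "Bing" -> "http://www.bing.com/search?q=atherosclerosis"|>;

(* Hypothetical helper: keep only String results, measure them, sort ascending *)
Clear@pageLengths;
pageLengths[pages_Association] :=
  Sort[StringLength /@ Select[pages, StringQ]];

(* Fetch every page, then rank the engines by result length *)
pageLengths[URLFetch /@ engineURLs]
```

The result is an Association from engine name to StringLength, shortest first. The numbers you get will differ from mine, since the pages change over time.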
11 Search Engines Queried
Google: 42,192
URLFetch["https://www.google.com/search?q=atherosclerosis"]
DuckDuckGo: 11,496
URLFetch["https://duckduckgo.com/?q=atherosclerosis"]
Bing: 80,722
URLFetch["http://www.bing.com/search?q=atherosclerosis"]
Ask.com: 94,714
URLFetch["http://www.ask.com/web?qsrc=1&o=2545&l=dir&q=\
atherosclerosis"]
AOL: 164,072
URLFetch["http://search.aol.com/aol/search?s_it=searchbox.webhome&v_t=\
na&q=atherosclerosis"]
Wow: 117,642
URLFetch["http://www.wow.com/search?s_it=search-thp&v_t=na&q=\
atherosclerosis&s_qt=ac"]
InfoSpace: 19,161
URLFetch["http://search.infospace.com/search/web?q=atherosclerosis&\
searchbtn=Search"]
Info: 42,104
URLFetch["http://www.info.com/search?qcat=web&r_cop=xxx&qkw=\
atherosclerosis"]
DogPile: 104,627
URLFetch["http://www.dogpile.com/info.dogpl/search/web?fcoid=417&fcop=\
topnav&fpid=27&q=atherosclerosis&ql="]
Alhea: 21,239
URLFetch["http://www.alhea.com/search/web?fcoid=417&fcop=topnav&fpid=27&q=atherosclerosis&ql="]