~ Search engine operators ~
("lego bricks" for webbits writers)
Version 0.03, updated February 2008
Explaining the whys and the hows
-inurl:htm -inurl:html -inurl:jsp -inurl:php -inurl:pdf -inurl:asp -inurl:txt -inurl:shtml -inurl:phtml -inurl:cgi -intitle:free -intitle:download -intitle:archive +intitle:index+of/ +parent-directory +name +"last modified" +size +description (oasis OR shakira) (mp3 OR wma OR m4a) -download
You see the inurl: and intitle: parameters? You understand what they do?
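The long query above is easier to read, and to adapt, once it is taken apart into its bricks. Here is a minimal sketch in Python; the function name `open_directory_query` and its two parameters are invented for illustration, while the operator strings are copied from the query above.

```python
def open_directory_query(artists, formats):
    """Assemble an open-directory query like the one above from its bricks."""
    # Extensions of ordinary pages that we do not want to see in the URL.
    page_exts = ["htm", "html", "jsp", "php", "pdf", "asp",
                 "txt", "shtml", "phtml", "cgi"]
    # Title words typical of spam and download portals.
    bad_titles = ["free", "download", "archive"]
    # Strings that an automatically generated directory listing contains.
    listing_marks = ["+intitle:index+of/", "+parent-directory", "+name",
                     '+"last modified"', "+size", "+description"]

    parts = [f"-inurl:{ext}" for ext in page_exts]
    parts += [f"-intitle:{word}" for word in bad_titles]
    parts += listing_marks
    parts.append("(" + " OR ".join(artists) + ")")
    parts.append("(" + " OR ".join(formats) + ")")
    parts.append("-download")
    return " ".join(parts)


query = open_directory_query(["oasis", "shakira"], ["mp3", "wma", "m4a"])
print(query)
```

Swapping the two lists is enough to retarget the same query at other names or other file types.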
(Biggest index, small brain)
(Biggest brain, lotta spam)
(Have NEAR, will travel)
(Powerful macros but small database)
(Working a lot you can do wonders)
Multiple search terms are processed as an AND operation.
You can use the + and - signs to include or exclude a term, respectively. To exclude terms in an effective way, read my search engines anti-optimization essay.
Inktomi offers full Boolean searching. Its operators are OR and NOT (as in Google, a plain space stands for AND), - can be used instead of NOT, and searches can be nested using parentheses (). Operators must be in upper case. You are well advised not to use the OR operator for keyword variants, because your query will attract irrelevant search results (Inktomi gives a higher rank to documents containing all ORed keywords); in those cases you should use stemming whenever you can. Example, compare:
Inktomi lets you search for phrases by enclosing them with quotes ("). You can also use underscores (_) to build phrases (partially discovered by fagan), compare:
The standard way of searching for phrases inside fields like title:, inurl:, etc. does not work (example: title:"index of"). Nevertheless, for every such field you have two ways of searching for phrases. Example for title: (the others are similar):
Phrase searches are often used to search for documents generated by some kind of software (which therefore contain some fixed strings of text). The "index of", or more precisely title:index_of, is a classical example, where you search for open directories, in this case those generated by the Apache server.
Phrase searches are also a valuable tool when you arrive at pages showing a glimpse of some document and trying to sell the whole document... More often than not, that very same document is available somewhere else for free! Let's take these bastards, who have stolen the previous version of this very same document and are trying to sell access to their database for $9.99 a month. There you find the following snippet:
Inktomi is one of the best search engines out there. Unfortunately its search syntax is not well documented, which is a pity, because Inktomi offers one of the richest search syntaxes, with lots of unique features and a ranking algo which works often quite well. The purpose of this essay consists precisely in documenting Inktmi's search syntax and providing examples showing its usefulness. For that purpose old HotBot's search FAQs and others Inktomi's web partners' search FAQs were read. The core syntax present in them was expanded using search engines and the WayBack Machine. Finally, from the source code of old HotBot's advanced search pages, additional search syntax was guessed: feature:homepage, originurlextension: and stem:. Inktomi unveiled Inktomi doesn't provide a public search engine in a way that search engines like AltaVista or Google do. This paper is the property of learnessays.com Copyright © 2003-2005
My dear reader, when you find something like this all you have to do is take a phrase and put it into a search engine, example: "Inktomi is one of the best search engines out there". Moral: phrases provide very powerful spells to summon the document you want!
The asterisk * can be used within a phrase search to match any word in that position. Thanks to the * you can do proximity searches on Yahoo! This is a very handy feature for searching images, for instance, because most people follow the "content - name relation" when naming files. For example, if you are searching for a Caravaggio picture, you can do the following search on Yahoo: "caravaggio * jpg". That way you'll get pages linking to or containing images named "caravaggio_2.jpg", "caravaggio 07.jpg", etc. Do not expect as many search results as in Google, because Yahoo does not index the alt attribute of images (Google does), nor their src attribute, nor the href attribute of <a...> tags.
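To see why those file names match, here is a small sketch of a one-word wildcard inside a phrase. It rests on an assumption that is mine and not documented anywhere: that the engine treats every run of letters and digits as a word and everything else (space, underscore, dot) as a separator. The function name is hypothetical.

```python
import re


def matches_wildcard_phrase(phrase, text):
    """Return True if `text` contains `phrase`, where * stands for one word."""
    # Assumption: words are runs of letters and digits; any other
    # character (space, underscore, dot) separates words.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    pattern = phrase.lower().split()
    # Slide the pattern over the token list and compare word by word.
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(p == "*" or p == t for p, t in zip(pattern, window)):
            return True
    return False


for name in ["caravaggio_2.jpg", "caravaggio 07.jpg", "caravaggio.jpg"]:
    print(name, matches_wildcard_phrase("caravaggio * jpg", name))
```

Under this assumption the third name does not match, because the * must stand for exactly one word.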
Inktomi has no case sensitive searching. Using either lower or upper case results in the same hits.
No truncation (*, ?) is currently available, but you can use word stemming (stem:).
All words are searched. There are no known stop words.
Inktomi was one of the first search engines allowing you to change its ranking algorithm. This is done by giving each keyword a weight. Weight factors can vary between 0.0 and 9.9 and the syntax is weight*keyword; by default each keyword has weight 1.0, as you can see by comparing these two queries: 1.0*fravia and fravia. The simplest way of using this feature is the 80 - 20 rule, i.e. multiply bad keywords (the highly spammed ones) by 0.2*, multiply context keywords by 0.0* (so as not to disturb the ranking algo: they must be there, but don't rank) and multiply good keywords (those less likely to be spammed) by 1.0*. Example:
As a rule of thumb this Pareto rule is not too shabby...
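The 80 - 20 rule can be turned into a tiny query builder. This is only a sketch: the function name and the sample keywords in the usage line are made up, and only the weight*keyword syntax and the three weights come from the text above.

```python
def weighted_query(good, bad, context):
    """Build a weighted query following the 80 - 20 rule described above."""
    parts = [f"1.0*{word}" for word in good]      # keywords unlikely to be spammed
    parts += [f"0.2*{word}" for word in bad]      # heavily spammed keywords
    parts += [f"0.0*{word}" for word in context]  # must be present, do not rank
    return " ".join(parts)


# Hypothetical keywords, chosen only to show the shape of the result.
print(weighted_query(good=["fravia"], bad=["mp3"], context=["music"]))
```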
Denotes how deep in a site's directory structure webpages will be searched. The number (0, 1, 2, 3, 4) specifies the maximum number of subdirectories, relative to the host's root directory, which may appear in the URL. As a general rule (not universal! duh:) a webpage's content increases with directory depth and, besides, spammers think that webpages in the home directory get a ranking boost and are more likely to be indexed, so they often put their doorway pages there. This useful feature offers a handy way of getting rid of those annoyances... excluding the root directory's pages!
Example: title:german hear feature:audio -depth:0
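If you want to check what depth a given URL would have, the count can be sketched as follows. This is my reading of the description above (number of subdirectories between the host's root and the file), not Inktomi's documented algorithm, and `url_depth` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def url_depth(url):
    """Count the subdirectories between the host's root and the file."""
    path = urlsplit(url).path
    # Everything before the last "/" is directory structure; what follows
    # it is the file name (possibly empty).
    directories = path.rsplit("/", 1)[0]
    return len([d for d in directories.split("/") if d])


for url in ["http://www.searchlores.org/main.htm",
            "http://example.com/a/b/page.htm"]:
    print(url_depth(url), url)
```

With this count, -depth:0 would drop the first URL and keep the second.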
Restricts a search to the selected domain. Domains can be specified up to three levels deep. Once you have found a promising site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat. Example of use -constructing a local search engine to searchlores site- : domain:searchlores.org .
Searches for pages linking to PDF files, although there are some who do escape. Compare the queries:
As quality documents, like papers, are often written in PDF format, this filter provides a way of getting high quality pages: those linking to those very same files. Example: "link structure" feature:acrobat. As PDF files may not have been indexed for some reason (examples: robots.txt file or robots meta tags), this feature may provide, in an indirect way, some interesting results.
Detects pages containing embedded content, be it sounds, movies, flash, java, pdf files, powerpoint presentations, etc... almost everything can be embedded in a webpage. The detection is made by verifying the presence of an <object... > tag, as you can see comparing the results of following queries with the original pages:
Content embedded with the <embed...> tag is not matched by feature:activex, as the following example shows:
As the canonical way of embedding content for M$IE is by using activex, and as almost every luser uses M$IE, page creators are compelled to also embed content using the <object...> tag which, nowadays, is also the official HTML 4.01 standard. That said, this feature provides a handy way of getting (or excluding:) pages containing precisely that very same content. Example -searching for pages containing fravia's workshops embedded as movie or sound: fravia stem:workshop feature:activex.
Detects <applet ...> tags in page's source code, compare:
The tags <object ...> (for Internet Explorer) and <embed ...> (for Netscape) can also be used to embed applets, but Inktomi doesn't detect applets embedded this way. Compare:
Documents containing links to .class or .java files aren't taken into account either, compare:
Example of use -searching for pages where you can play chess interactively: feature:applet title:play title:chess.
Detects if a page contains a link to an audio file. Audio files could be among others: wav, mp3, m3u, mid, midi, au, snd, ... The link could be in a:
feature:audio doesn't match embedded audio files:
If you want to search for embedded audio files you must resort to the rather coarse feature:activex. Example of use -searching for audio files of fravia's workshops- : fravia feature:audio
Contrary to what we could expect, Inktomi detects neither the <embed ...> tags nor the <object ...> tags. For Inktomi feature:flash simply means webpages linking to files with the extensions fla, spl or swf, compare:
If you want to search for embedded flash you must resort to the rather coarse feature:activex.
Inktomi's crown jewel. Detects the <form> tag in a page's source code. Inktomi may not index the hidden web, but it offers you a way of knowing where the front doors are! For instance you can use Inktomi to find Laws' Databases, translation services: dutch english translate url feature:form, etc.
Detects pages containing frames.
Restrict your search to personal pages (identifier ~). Very useful, because it's still the convention for personal pages on educational sites. Example: web search feature:acrobat feature:homepage.
Detects <img...> tag in HTML or a link to an image.
Interested in finding images of birds of paradise? Try the following query on Yahoo!:
("bird of paradise" OR "birds of paradise") (papua OR "new guinea") feature:image -stem:travel -stem:hotel
Images are widely used for aesthetic reasons. If an HTML webpage doesn't contain images you may wonder if there's a hidden agenda... probably it's a cloaked/spammed page by a spammer putting in only keywords 'n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!
Restricts your search results to the host's top page. Very useful for finding sites about a given theme! The host's homepage is the site's most valuable real estate: there the site's owner should put a summary of what his site is all about and provide links to his most important pages. Example, searching for FTP search engines: ftp search feature:index feature:form. Inktomi indexes approximately 1,520,000,000 webhosts, cf.: feature:index. 1.5 billion webhosts is quite an odd figure, because Inktomi has 19.2 billion documents in its database, so, on average, Inktomi indexes 13 documents per webserver. Given that some domains spam a lot, for most servers Inktomi indexes only the entry page... maybe there's nothing more to index... Nevertheless it is quite odd. Although, as ritz points out, this is probably anecdotal evidence that the number of sites containing n pages follows a Pareto distribution.
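The average quoted above is easy to check. The two figures are the ones given in the text; nothing else is assumed.

```python
documents = 19.2e9  # documents in Inktomi's database, figure from the text
hosts = 1.52e9      # webhosts matched by feature:index, figure from the text

docs_per_host = documents / hosts
print(round(docs_per_host, 1))  # about 12.6, i.e. roughly 13 per host
```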
Detects <meta ...> tags in webpage's source code.
Detects pages containing links to files with extension dcr, dir, fla,
If you want to search for embedded shockwave you must resort to the rather coarse feature:activex.
Searches for pages containing the <table ...> tag. Tables are widely used to control a page's layout and of course to build tables! If an HTML webpage doesn't contain tables you might wonder if there's a hidden agenda... probably it's a cloaked/spammed page by a SEO fearing that some search engines may not fully support tables, or by a spammer putting in only keywords 'n' links and not taking the trouble to build a real webpage. You can often trash those annoyances using this useful feature!
Detects pages containing the <title> tag. As almost all webpages contain a title, this feature gives a good estimate of how many HTML documents are in Inktomi's database. Cf.: feature:title.
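Several of the features above boil down to "is this tag present in the page's source?". The sketch below mimics that test with Python's standard `html.parser`. The tag-to-feature table is my reading of this essay, not Inktomi's (unpublished) indexing code, and the names `TAG_FEATURES`, `FeatureScanner` and `tag_features` are invented for the example.

```python
from html.parser import HTMLParser

# Tag -> feature, as described in the text. An assumption drawn from the
# essay, not from any Inktomi documentation.
TAG_FEATURES = {
    "object": "activex",
    "applet": "applet",
    "form": "form",
    "meta": "meta",
    "table": "table",
    "title": "title",
}


class FeatureScanner(HTMLParser):
    """Collect the feature: flags that a page's tags would trigger."""

    def __init__(self):
        super().__init__()
        self.features = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in TAG_FEATURES:
            self.features.add(TAG_FEATURES[tag])


def tag_features(html):
    """Return the set of tag-based features found in an HTML string."""
    scanner = FeatureScanner()
    scanner.feed(html)
    return scanner.features


page = '<html><head><title>t</title></head><body><embed src="a.swf"><form></form></body></html>'
print(sorted(tag_features(page)))
```

Note that the <embed> tag in the sample page triggers nothing, which is the behaviour described for feature:activex.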
Search for pages linking to video files (file extensions: avi, mpg, mpeg, mov, etc.). Videos embedded with <img...> tags with the legacy attribute dynsrc are not matched, compare:
Neither are pages with <embed...> or <object...> tags, compare:
If you want to search for embedded video files you must resort to the rather coarse feature:activex. Interested in finding videos of fravia's workshops? Try the following query on Yahoo!: fravia feature:video.
Search for pages containing a link to a vrml file (wrl, wrz, vrml). Compare:
Inktomi is unable to see embedded vrml files. Compare:
Example: web links graph feature:vrml
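The link-based features (audio, flash, video, vrml) look only at the extension of the files a page links to. A rough sketch of that test follows. The extension lists are the ones quoted in this essay and are certainly incomplete (the text itself says "among others" and "etc."); the function and table names are hypothetical.

```python
import os
from urllib.parse import urlsplit

# Extensions as listed in the text above; incomplete by the essay's own admission.
EXTENSION_FEATURES = {
    "audio": {"wav", "mp3", "m3u", "mid", "midi", "au", "snd"},
    "flash": {"fla", "spl", "swf"},
    "video": {"avi", "mpg", "mpeg", "mov"},
    "vrml": {"wrl", "wrz", "vrml"},
}


def link_features(link_urls):
    """Return the link-based features triggered by a page's outgoing links."""
    features = set()
    for url in link_urls:
        # Take the extension from the URL path only, ignoring any query string.
        ext = os.path.splitext(urlsplit(url).path)[1].lstrip(".").lower()
        for feature, extensions in EXTENSION_FEATURES.items():
            if ext in extensions:
                features.add(feature)
    return features


print(sorted(link_features(["http://example.com/a/talk.MP3", "world.wrl"])))
```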
Allows one to find all documents from a particular host only. It has similar uses to those already found on domain:.
Searches for words in URL, you can also search for phrases, but the syntax isn't the one we would expect: inurl:"keyword1 keyword2", instead it is "inurl:keyword1 inurl:keyword2". Once you have found a promising directory's site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat, or of getting a 'directory listing'. Example of use -constructing a local search engine to the Seeker's message board- : domain:2113.ch inurl:mb001.
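Since the phrase syntax for fields is easy to get wrong, here is a small helper that produces it. The name `field_phrase` is made up; the output shape is exactly the one described above, with the field repeated before every word and the whole thing wrapped in quotes.

```python
def field_phrase(field, phrase):
    """Turn a phrase into the quoted, per-word form that a field expects."""
    words = phrase.split()
    return '"' + " ".join(f"{field}:{word}" for word in words) + '"'


print(field_phrase("inurl", "keyword1 keyword2"))
```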
Finds pages containing hypertext links to the exact specified URL. Comes in handy when you land, using a search engine, on a webpage you like and you want more similar pages from that web site. In those cases you can try to find the 'table of contents' of that specific site. Example, using the URL of this essay to find where the searchlores 'tables of contents' are: link:http://www.searchlores.org/inktomi.html domain:searchlores.org. Among other 'tables of contents' you find the following ones (once they get reindexed:) http://www.searchlores.org/news.htm, http://www.searchlores.org/essays.htm and http://www.searchlores.org/main.htm.
Intelligent seekers search the web backwards! Once they have found a good site, they identify the most interesting pages on that site, 'table of contents' pages are good candidates, and see who's linking to them. This strategy provides a way, once we know a good site, of finding more good sites or pages linking to good sites. The rationale is good sites only link to good sites! Searching backwards, once you have found a good site, is the main way of combing the web for good sites which others have already found. Example: link:http://www.searchlores.org/news.htm, link:http://www.searchlores.org/essays.htm and link:http://www.searchlores.org/main.htm.
Inktomi doesn't have an operator which lets you search for keywords on links, so you don't have a direct way of searching for links to a given directory. The only thing you can do in such cases is to use wget to get a directory listing from that directory and feed all those URLs to Yahoo using wget and the link: operator.
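The second half of that workaround, turning a harvested list of URLs into link: queries, can be sketched like this. The sketch does not fetch anything: it assumes you already have the directory listing as a list of URLs (for instance saved from a wget run), and `link_queries` is a hypothetical name.

```python
def link_queries(urls):
    """Return one link: query per URL, skipping blanks and duplicates."""
    seen = set()
    queries = []
    for url in urls:
        url = url.strip()
        if url and url not in seen:
            seen.add(url)
            queries.append(f"link:{url}")
    return queries


listing = [
    "http://www.searchlores.org/news.htm",
    "http://www.searchlores.org/essays.htm",
    "http://www.searchlores.org/news.htm",  # duplicate, dropped
]
for q in link_queries(listing):
    print(q)
```

Each resulting line is a query you can feed to the search engine, by hand or with a script of your own.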
Searches for pages linking to any page in a given domain up to three levels deep. This operator provides a more versatile way of searching the web backwards (cf. the link: operator). Examples of use:
Searches for pages linking to files with a given extension. This operator provides a way of searching for files which are not downloaded and processed by Inktomi's spiders, such as images, audio, videos and other binary files. One possible use for this operator is searching for blogs. Most blogs, at least most blogs that are nowadays worth reading, have an RSS feed somewhere on them. This operator provides a great way of finding pages containing RSS feeds, which are usually just an RSS, XML, RDF, or ATOM document. Interested in finding blogs on web search techniques? Try: searchlores linkextension:rss, searchlores linkextension:xml, searchlores linkextension:rdf and searchlores linkextension:atom.
Restricts documents search by type, aka file extension. Document's type is a good proxy of document's quality. Some examples of high quality documents are .pdf, .doc (word), .xls (spread sheets), .ppt (power point presentations), .ps, .dvi and .rtf files. Example of use: web search originurlextension:pdf.
Searches for pages linking to documents with a given mime type. The mime type is inferred from the document's extension, as the following example shows: outgoingurltype:image/jpeg -linkextension:jpg -linkextension:jpeg -linkextension:jpe -linkextension:jfif. This operator does more or less the same as linkextension:, although it is a little more general, because it clusters file extensions by type.
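Inferring a mime type from an extension is the same trick that Python's standard `mimetypes` module plays, so it makes a convenient stand-in for experimenting. Bear in mind that Python's extension table is not Inktomi's: the sketch only shows how several extensions cluster under one type. The helper name is invented.

```python
import mimetypes


def extensions_for_type(mime_type, candidates):
    """Return the candidate extensions that Python maps to `mime_type`."""
    matching = []
    for ext in candidates:
        # guess_type works on file names, so build a dummy one.
        guessed, _encoding = mimetypes.guess_type(f"file.{ext}")
        if guessed == mime_type:
            matching.append(ext)
    return matching


exts = extensions_for_type("image/jpeg", ["jpg", "jpeg", "gif"])
print(" ".join(f"-linkextension:{ext}" for ext in exts))
```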
Searches for words in URL's path, you can also search for phrases, but the syntax isn't the one we would expect: path:"keyword1 keyword2", instead it is "path:keyword1 path:keyword2". Once you have found a promising directory's site, this operator provides you a way of building a local search engine and, in that way, of flying directly to the meat, or of getting a 'directory listing'. Example of use -constructing a local search engine to the Seeker's message board- : domain:2113.ch "path:phplab path:mbs.php3" inurl:mb001.
Restricts your search to a geographical region (africa, centralamerica, downunder aka Oceania, europe, mediterranean, mideast aka Middle East, northamerica, southamerica, southeastasia). You can find which countries are included in each region here. I think that Inktomi assigns, for each domain name, either the information obtained by a whois search when the top-level domain is .com, .org, .net, .biz, .edu, etc., as we can infer from the following two queries on Yahoo:
or assigns the country corresponding to the two letters top domain names (examples: .au, .ca, .de, .es, .fr, .uk, .us, etc). The main use for this operator is restricting your search to a given geographical region, example: stem:laws noise stem:levels region:europe. This field can also be used to get an estimation of how many documents are in Inktomi's database: