A blockfile of your very own
Previous Back to contents Next

The Blocklist/Blockfile feature is really one of the most powerful new additions to Proxomitron. Actually "block list" is a bit of a misnomer though, since they can do much more than just block. In fact, they're really an extension of the normal matching expressions.

Most often you'll see them in a filter's URL match, but they're by no means limited to URLs alone. A list can contain any matching commands - even calls to other lists!

When it comes right down to it, a block list is a simply a text file containing a list of matching items. Each line in the list is checked until a match is found - otherwise the list returns false. Every block list file has a name (in the settings dialog), and can be included at any point in a matching expression by using "$LST(listname)".

You can have up to 255 different lists and use them in any way you like. Common uses could include URLs to kill, sites to accept cookies from, pages to disable or enable JavaScript on, etc. To create a new file just follow these steps...

  1. First, use any text editor (like notepad) to make a list. Save it as a plain somename.txt file.
  2. Next, add the list under Proxomitron's Blockfile tab in the config dialog. Use "ADD" to select your file then give it a name. The name you use inside Proxomitron can be different from the filename itself. This lets you swap out actual blockfiles without having to change any filters that use them.
  3. Finally, to use the blockfile, place a $LST(BlockfileName) in your filter at the point you would like the list items to be checked. Think of it a bit like including your list at this point.

For example, if you had a matching expression like...

(Keitarou|Naru|Suu|Mitsune|Motoko|Shinobu|Mutsumi|Kame)

You could create a list like so..

#
# Sample List LoveHina.txt
#

Keitarou
Naru
Suu
Mitsune
Motoko
Shinobu
Mutsumi
Kame

Then, name the list something like "LoveHina" and place "$LST(LoveHina)" in your match in place of the old "(...|...|...)" separated parenthetical group of items. As you might guess, this can be a very convenient way to deal with a large number of items. Not only are such lists easier to maintain, but the same list can be used in different filters!

Unmatching a match

You can also add "exclude" lines By prefixing a line with the '~' character. They can be used to limit what a list will match, and are only checked if a regular match is found first in the list. The list will return as true only if none of the exclude lines match. For example a list like...

#
# Another sample list 
# an example of using `~` to exclude
#

*.gif
~*/gamera.gif

The first line would match anything ending in ".gif", however the second line checks to see if it also matches "/gamera.gif". Which will insure that Gamera is never caught in our list (think of it as a Turtle Excluder Device).

Obfuscating more clearly

Lists can also be called in the replacement text of a filter. Here they're not really used to match anything but instead are used to set a positional variable to some value. For instance, by using the $CON(#,#) along with the $SET(#=...) matching commands, replacement text variables can be rotated based on the connection number (like multiple User-Agents or browser versions for instance)....

#
# A sample value rotation list (named "MyList")
#

$CON(1,3)$SET(0=Value One)
$CON(2,3)$SET(0=Value Two)
$CON(3,3)$SET(0=Value Three)

will place the next of the three values into \0 each time it's called. You could use this in a replacement section like so...

$LST(MyList) \0

First we call the list to set \0 then we print the value of \0 by placing it in the replacement text.

Breaking up isn't so hard to do

Normally, each line in a list is treated as an independent matching expression. However long expressions can be broken up over multiple lines by indenting the lines that follow. For example...

taste (this|that|these|those|the other thing)

could also be written as....

taste (
  this|
  that|
  these|
  those|
  the other thing)

The effect is exactly the same, but you can break long lines up for easier reading - leading or trailing whitespace on each line will be ignored.

Some comments on comments

Also, as you've probably guessed from the examples, Lists can contain comments by beginning a line with '#'. Comments normally will be ignored, but the first few lines of a list are scanned for certain "keywords" which can affect how the list works. Currently there are six keywords: "NoAddURL", "junkbuster", "NoHash", "NoUrlHash", "NoPreHash", and "Logfile".

"NoAddURL" hides the list from the "Add to blockfile" menu in the sysytem tray. It's useful to keep it from becoming cluttered by lists not used for URL matches.

"JunkBuster" if found, will cause Proxomitron to attempt to read and interpret the list as a JunkBuster style blockfile. It's probably less than perfect, but seems to work fairly well with most JunkBuster lists.

Note that due to differences in methodology, designing your own list by adding URLs as you find them will likely be more efficient. In particular, JunkBuster processes hostnames in reverse (root first). Proxomitron treats a URL the same as any random text, so you're better off not using an initial wildcard. For instance, "(www.|)somehost.com" will be much faster than "*somehost.com". If you need a leading wildcard try "[^/]++somehost.com". It's a little better than '*' since it only scans up to the first "/" in the URL.

"NoHash", "NoUrlHash", and "NoPreHash" are used to disable various hashing algorithms used in lists. NoHash eliminates all hashing and can save memory for list that are seldom called or where speed isn't an issue. "NoUrlHash" and "NoPreHash" disable particular hash types (see section on hashing below). You probably shouldn't need to use these very often (if at all).

"Logfile" is a special command that tells Proxomitron not to load or process the contents of the file like a regular blocklist. Instead it's useful for using a blocklist for logging purposes along with the $ADDLST() command.

Normally Proxomitron loads and parses the contents of a blockfile into memory as matching expressions. However, for a logfile, not only would that be a waste of memory, but it might also produce parsing errors since the file's contents may not be matching expressions at all.

To set up a logfile, just create a file, add "LOGFILE" somewhere within the first few lines like so...

##
## Proxomitron log file (LOGFILE)
##

Then add the file as you would any other blocklist. To write messages to the log, just use the $ADDLST(Logname, text to add) command.

To view the log just use Proxomitron's "Edit blockfile" option. This will first close the file to flush any unwritten data, then it will launch it in the default editor for the filename's extension (like .txt or .log). To make adding new data faster, unlike other blocklists, Proxomitron keeps logfiles open between writes.

Blocklist Indexing (hashes)

Another advantage lists have over using a bunch of ORs in a match is speed. Proxomitron can do a sort of indexed hash lookup on eligible list entries. Not everything can be indexed, but for items that can, it makes finding matches in a large list much faster. Normally you don't need to worry much about how this works, but if you want to guarantee your blocklist will be as fast as possible here's some tips...

First Proxomitron checks each item in the list to see if it's "hashable" (can be indexed). If so, it's added to a hashable list - if not, it's added to a non-hashable list that must be scanned each time the list is checked. Of course, it's better to be hashable.

There's two types of indexes Proxomitron can use - a fixed prefix and a URL style index. Each item in the list is checked to see if it can be indexed using either one method or the other, if so the method that can index the most characters is used for that item. The total list may have mixture of both types.

"fixed prefix" are the simplest - they're just any expression that has a fixed number of characters without any wildcards. The longer the prefix is before any wildcard, the more indexable it becomes. Most user added URLs probably fall into this category, but it benefits many non-URL based lists too. Here's a few examples of eligible expressions...

www.somewhere.com
127.0.0.
shonen(kinfe|)
foo(bar|bat)*bear

ANDs are fine too as in "this*&*that" - however ORs outside of parens like "this|that" won't index since the match can begin with two different values. In this case it's better to place each on its own line.

URL style hashes as the name implies are designed mainly for lists of URLs. The goal is to allow some leading wildcards to be used, since often this is necessary for matching partial hostnames. It works by looking in the expression for the end of the hostname (marked by a ":" or "/") and indexes back from there. For it to work there must be no other wildcards between the hostname's end and the leading wildcard. Valid wildcards include "*", "\w", "[...]+", "[...]++", and "(...|)". This should cover the most useful ones for leading off URLs. Again here's some examples...

*somehost.com/(anything|after|here|is|fine)/\w.html
\wsomehost.com/
[^.]+.somehost.com/
[^/]++somehost.com/
(www.|)somehost.com:[0-9]+/
([^/]++.|)somehost.com/

Unfortunately things like...

([^/]++.|)somehost.*/
([^/]++.|)somehost.(com|net)/

won't be indexable. In these cases it may actually be faster to write them as two entries...

([^/]++.|)somehost.com/
([^/]++.|)somehost.net/

One change blocklist authors should take note of is, when using a leading wildcard, it's very important to use the full hostname including the "/". Previously this wasn't necessary, so an entry like...

([^/]++.|)microsoft.

would be better written as...

([^/]++.|)microsoft.com/

or perhaps multiple entries if necessary. It also means there's less of a need to write expressions like...

www.(ad(server|engine|banner)|banner(site|click|)).(com|net)

Instead, listing all the actual hosts to be matched will be faster - not to mention easier to maintain.

On the inside...

To get an "inside" look at what's going on with your blocklists - including how they're being hashed, how often each item is checked, and how often each item matches, Proxomitron now includes a special information URL...

Here you'll find a table of all loaded lists, their filenames, number of items they contain, and the number of items that have been prefix or URL hashed. Clicking on a list's name will bring up a detaied breakdown of each entry it contains. This can come in very useful when trying to make the most efficient use of your lists.

Limitations...

Blocklist do have some limitations that make them work a bit different from a list of OR seperated items. Mainly they operate in their own scope. For example, say you have a match like...

www.$LST(hosts).com
and a list like...
#
# Host list
#

adsite
adsite-2
Given the host "www.adsite-2.com" would not match even though it looks like it should. This is because the list first finds a match with "adsite" but, being in it's own scope, can't look beyond to see that it also needs ".com". To avoid this it's important to make matches unambiguous. for example, by moving the trailing "." to the list like so...
www.$LST(hosts)com
#
# Host list
#

adsite.
adsite-2.

Also be aware that a "*" at the end of a list item will match all the way to the end of the available characters. Normally you don't want this, instead place them in the calling match...

www.$LST(hosts)*.com

The end...

Well that should be about all there is to know about blocklists. If you've made it this far I officially award you a P.C.B.F.E. (Proxomitron Certified Blocklist (or Blockfile) Engineer). Now, won't that look good on your next Resume!


Return to main index