KNOW-HOW Squid

Implementing a home proxy server with Squid

SAFE HARBOR

A proxy server provides safer and more efficient surfing. Although commercial proxy solutions are available, all you really need is Linux and an old PC in the attic.

BY GEERT VAN PAMEL

I have had a home network for several years. I started with a router using Windows XP with ICS (Internet Connection Sharing) and one multi-homed Ethernet card. The main disadvantages were instability, low performance, and a total lack of security. Troubleshooting was all but impossible. Firewall configuration was at the mercy of inexperienced users, who clicked randomly at security settings as if they were playing Russian roulette.

I finally turned to Linux and set up an iptables firewall on a Pentium II computer acting as a router. The firewall system would keep the attackers off my network and log incoming and outgoing traffic. Along with the iptables firewall, I also set up a Squid proxy server to improve Internet performance, filter out unwanted popup ads, and block dangerous URLs.

A Squid proxy server filters Web traffic and caches frequently accessed files. A proxy server limits Internet bandwidth usage, speeds up Web access, and lets you filter URLs. Centrally blocking advertisements and dangerous downloads is cost effective and transparent for the end user.

Squid is a high-performance, free, open source, full-featured proxy caching server. Squid provides extensive access controls and integrates easily with an iptables firewall. In my case, the Squid proxy server and the iptables firewall worked together to protect my network from intruders and dangerous HTML.

Table 1: Recommended Hardware
Necessary Components | Specifics
Intel Pentium II CPU, or higher (Why not a spare Alpha Server?) | 350 MHz
80 - 100 MB memory minimum | more is better
1 or more IDE disks (reuse 2 old disks: 1 GB system SW + swap & 3 GB for cache + /home disk) | 4 GB minimum
2 Ethernet cards, minihub, modem, wireless router or hub | fast Ethernet, 100 Mbit/s if possible
CDROM, DVD reader | software is mostly distributed via DVD
Use only normal straight LAN cables [no need for cross cables] | modem and minihub cross themselves!

You'll find many useful discussions of firewalls in books, magazines, and Websites. (See [1] and [2], for example.) The Squid proxy server, on the other hand, is not as

48 ISSUE 60 NOVEMBER 2005 WWW.LINUX - MAGAZINE.COM Squid proxy server KNOW-HOW

well documented, especially for small home networks like mine. In this article, I will show you how to set up Squid.

Getting Started

The first step is to find the necessary hardware. Figure 1 depicts the network configuration of the Pentium II computer I used as a firewall and proxy server. This firewall system should operate with minimal human intervention, so after the system is configured, you'll want to disconnect the mouse, keyboard, and video screen. You may need to adjust the BIOS settings so that the computer will boot without a keyboard. The goal is to be able to put the whole system in the attic, where you won't hear it or trip over it. From the minihub shown in Figure 1, you can come "downstairs" to the home network using standard UTP cable or a wireless connection. Table 1 shows recommended hardware for the firewall machine.

Figure 1: Ethernet basic LAN configuration. (Diagram labels: Internet, Local Network.)

Assuming your firewall is working, the next step is to set up Squid. Squid is available from the Internet at [3] or one of its mirrors [4] as tar.gz (compile from sources). Alternatively, you can install the package from your distribution media using one of the following commands:

    rpm -i /cdrom/RedHat/RPMS/squid-2.4.STABLE7-4.i386.rpm   # Red Hat 8
    rpm -i /cdrom/Fedora/RPMS/squid-2.5.STABLE6-3.i386.rpm   # Fedora Core 3
    rpm -i /cdrom/.../squid-2.5.STABLE6-6.i586.rpm           # SuSE 9.2

At this writing, the current stable Squid version is 2.5.

Configuring Squid

Once Squid is installed, you'll need to configure it. Squid has one central configuration file. Every time this file changes, the configuration must be reloaded with the command /sbin/init.d/squid reload. You can edit the configuration file with a text editor. You'll find a detailed description of the settings inside the squid.conf file, although the discussion is sometimes very technical and difficult to understand. This section summarizes some of the important settings in the squid.conf file.

First of all, you can prevent certain metadata related to your configuration from reaching the external world when you surf the Web:

    vi /etc/squid/squid.conf
    ...
    anonymize_headers deny From Server Via User-Agent
    forwarded_for off
    strip_query_terms on

Note that you cannot anonymize Referer and WWW-Authenticate because otherwise authentication and access control mechanisms won't work.

forwarded_for off means that the IP address of the internal client is not passed on to external servers in the X-Forwarded-For header.

With strip_query_terms on, you do not log URL parameters after the ?. When this parameter is set to off, the full URL is logged in the Squid log files. Logging the full URL can help with debugging the Squid filters, but it can also violate privacy rules.

The next settings identify the Squid host, the (internal) domain where the machine is operating, and the username of whoever is responsible for the server. Note the dot in front of the domain. Further on, you find the name of the local DNS caching server, and the number of domain names to cache in the Squid server.

    visible_hostname squid
    append_domain .mshome.net
    cache_mgr sysman

    dns_nameservers 192.168.0.1
    dns_testnames router.mshome.net
    fqdncache_size 1024

    http_port 80
    icp_port 0

http_port is the port used by the proxy server. You can choose anything, as long as the configuration does not conflict with other ports on your router. A common choice is 8080 or 80. The Squid default, 3128, is difficult to remember.

We are not using ICP, so icp_port is set to 0. ICP is the protocol that neighboring proxy servers use to query each other's caches.

With log_mime_hdrs on, you can make MIME headers visible in the access.log file.

Avoid Disk Contention

Squid needs to store its cache somewhere on the hard disk. The cache is a tree of directories. With the cache_dir option in the squid.conf file, you can specify configuration settings such as the following:
• disk I/O mechanism – aufs
• location of the squid cache on the disk – /var/cache/squid
• amount of disk space that can be used by the proxy server – 2.5 GB
• number of main directories – 16
• subdirectories – 256

For instance:

    cache_dir aufs /var/cache/squid 2500 16 256
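To make the cache_dir arithmetic concrete, the following sketch splits such a line into its fields and reports how many second-level directories Squid would create and the average share of the disk budget per directory. The function name cache_dir_summary is made up for this illustration, and the even split is only a rough average, not how Squid actually allocates space.

```sh
#!/bin/sh
# Summarize a cache_dir line: access method, path, size in MB,
# number of first-level (L1) and second-level (L2) directories.
# cache_dir_summary is a hypothetical helper, not part of Squid.
cache_dir_summary() {
    # $1 is the whole directive; word splitting puts the fields in $1..$6
    set -- $1
    ctype=$2
    cpath=$3
    mbytes=$4
    l1=$5
    l2=$6
    dirs=$((l1 * l2))                       # total leaf directories
    kb_per_dir=$((mbytes * 1024 / dirs))    # integer average, in KB
    echo "$ctype $cpath: $dirs directories, about $kb_per_dir KB each"
}

cache_dir_summary "cache_dir aufs /var/cache/squid 2500 16 256"
# prints: aufs /var/cache/squid: 4096 directories, about 625 KB each
```

With the values from the example, 16 x 256 gives 4096 leaf directories, so a 2500 MB cache averages out to roughly 625 KB per directory.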

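Returning to the strip_query_terms setting described under Configuring Squid, the sketch below approximates its effect on a logged URL: everything after the first ? is dropped. The function name strip_query and the example.com URLs are invented for this illustration, and keeping the trailing ? reflects my understanding of Squid's behavior rather than anything stated in the article.

```sh
#!/bin/sh
# Approximate what reaches the log when strip_query_terms is on:
# the query string after the first "?" is cut off.
# strip_query is a hypothetical helper, not a Squid command.
strip_query() {
    case $1 in
        *\?*) printf '%s?\n' "${1%%\?*}" ;;   # keep the text up to the first "?"
        *)    printf '%s\n' "$1" ;;           # no query string, log unchanged
    esac
}

strip_query "http://www.example.com/search?q=squid&user=geert"
strip_query "http://www.example.com/index.html"
```

The first call prints http://www.example.com/search? and the second prints the URL unchanged, which is why search terms and session parameters stay out of the log files.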

The disk access method options are as follows:
• ufs – classic disk access (too much I/O can slow down the Squid server)
• aufs – asynchronous UFS with threads, less risk of disk contention
• diskd – diskd daemon, avoiding disk contention but using more memory

UFS is the classic file system I/O. We recommend using aufs to avoid I/O bottlenecks. (When you use aufs, you have fewer processes.)

    # ls -ld /var/cache/squid
    lrwxrwxrwx 1 root root 19 Nov 22 00:42 /var/cache/squid -> /volset/cache/squid

I suggest you keep the standard file location for the squid cache, /var/cache/squid, then create a symbolic link to the real cache directory. If you move the cache to another disk for performance or capacity reasons, you only have to modify the symbolic link.

The disk space is distributed among all directories. You would normally look for even distribution across all directories, but in practice, some variation in the distribution is acceptable. More complex setups using multiple disks are possible, but for home use, one directory structure is sufficient.

Cache Replacement

By default, the proxy server uses an LRU (Least Recently Used) algorithm. Detailed studies by HP Laboratories [6] have revealed that an LRU algorithm is not always an intelligent choice. The GDSF (Greedy-Dual Size Frequency) setting keeps small popular objects in cache, while removing bigger and lesser used objects, thus increasing the overall efficiency.

    cache_replacement_policy heap GDSF
    memory_replacement_policy heap GDSF

Big objects requested only once can flush out a lot of smaller objects, so it is a good idea to limit the maximum object size for the cache:

    cache_mem 20 MB
    maximum_object_size 16384 KB
    maximum_object_size_in_memory 2048 KB

Log Format Specification

You can choose between Squid log format and standard web log format using the parameter emulate_httpd_log. When the parameter is set to on, standard web log format is used; if the parameter is set to off, you get more details with the Squid format. See [7] for more on analyzing Squid log files.

Proxy Hierarchy

The Squid proxy can work in a hierarchical way. If you want to avoid the parent proxy for some destinations, you can allow a direct lookup. The browser will still use your local proxy!

    acl direct-domain dstdomain .turboline.be
    always_direct allow direct-domain

    acl direct-path urlpath_regex -i "/etc/squid/direct-path.reg"
    always_direct allow direct-path

Some ISPs allow you to use their proxy server to visit their own pages even if you are not a customer. This can help you speed up your visits to their pages. The closer the proxy to the original pages, the more likely the page is to be cached. Because your own ISP is more remote, the ISP is less likely to be caching its competitor's contents…

    cache_peer proxy.tiscali.be parent 3128 3130 no-query default
    cache_peer_domain proxy.tiscali.be .tiscali.be

no-query means that you do not use, or cannot use, ICP (the Internet Cache Protocol), see [8]. You can obtain the same functionality using regular expressions, but this gives you more freedom.

    cache_peer proxy.tiscali.be parent 3128 3130 no-query default
    acl tiscali-proxy dstdom_regex -i \.tiscali\.be$
    cache_peer_access proxy.tiscali.be allow tiscali-proxy

Table 2: ACL Guidelines
• the order of the rules is important
• first list all the deny rules
• the first matching rule is executed
• the rest of the rules are ignored
• the last rule should be an allow all

Listing 1: Blocking Unwanted Pages

    acl block-ip dst "/etc/squid/block-ip.reg"
    deny_info filter_spam block-ip
    http_access deny block-ip

    acl block-hosts dstdom_regex -i "/etc/squid/block-hosts.reg"
    deny_info filter_spam block-hosts
    http_access deny block-hosts

    acl noblock-url url_regex -i "/etc/squid/noblock-url.reg"
    http_access allow noblock-url Safe_ports

    acl block-path urlpath_regex -i "/etc/squid/block-path.reg"
    deny_info filter_spam block-path
    http_access deny block-path

    acl block-url url_regex -i "/etc/squid/block-url.reg"
    deny_info filter_spam block-url
    http_access deny block-url
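The first-match behavior summarized in the ACL guidelines can be illustrated with a toy evaluator. This is only a sketch of the principle, not Squid's implementation: the function name first_match is invented, the rules use shell glob patterns instead of Squid regular expressions, and the example.com and example.org URLs are placeholders.

```sh
#!/bin/sh
# Toy first-match evaluator: each rule is "action pattern", checked in order.
# The first rule whose pattern matches the URL decides; later rules are ignored.
# first_match is a hypothetical helper; patterns are shell globs, not regexes.
first_match() {
    url=$1
    shift
    for rule in "$@"; do
        action=${rule%% *}      # text before the first space
        pattern=${rule#* }      # text after the first space
        case $url in
            $pattern) echo "$action"; return 0 ;;
        esac
    done
    echo "no-match"
}

# Deny rules first, a catch-all allow last, as the guidelines suggest.
first_match "http://ads.example.com/banner.gif" "deny *ads.*" "deny *.exe" "allow *"
first_match "http://www.example.org/index.html" "deny *ads.*" "deny *.exe" "allow *"
```

The first call prints deny and the second prints allow. Moving "allow *" to the front of the list would let every request through, because the deny rules behind it would never be reached.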

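The symbolic link advice for the cache directory can be tried out safely in a scratch directory. The sketch below is one possible way to move a cache and repoint the link; relocate_cache and the disk2 path are invented for this illustration, and on a live system you would presumably stop Squid before moving the real /var/cache/squid target.

```sh
#!/bin/sh
# Move the real cache directory and repoint the symbolic link at it.
# Usage: relocate_cache LINK NEW_DIR   (hypothetical helper, not part of Squid)
relocate_cache() {
    link=$1
    new=$2
    old=$(readlink "$link") || return 1   # current target of the link
    mkdir -p "$(dirname "$new")"          # make sure the new parent exists
    mv "$old" "$new"                      # move the real cache directory
    rm "$link"                            # remove the stale link
    ln -s "$new" "$link"                  # recreate it with the new target
}

# Rehearse in a temporary tree that mirrors the article's layout.
root=$(mktemp -d)
mkdir -p "$root/var/cache" "$root/volset/cache/squid"
ln -s "$root/volset/cache/squid" "$root/var/cache/squid"
relocate_cache "$root/var/cache/squid" "$root/disk2/cache/squid"
ls -ld "$root/var/cache/squid"
```

Because squid.conf keeps pointing at the standard location, only the link changes and the cache_dir line stays untouched.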

Listing 2: Making a Page Invisible
01 vi /etc/squid/errors/filter_spam
02 ...
03

    .mshome.net
    always_direct allow local-domain

    acl localnet-dst dst

• day & hour
• browser type
• username

Listing 1 shows examples of commands that block unwanted pages.

tion settings. You have already ing