The following instructions will get Squid installed and working with the squid.conf config file listed above. Since entire books have been written about Squid, we cannot cover every directive in the config file here. Once you have Squid working, check the bottom of the page for links to the Squid directive definitions.
Step 1: To get started you need to install Squid. Most operating systems provide packages (rpm, deb, pkg) for Squid, and you can also build it from source (squid-cache.org).
Step 2: Once Squid is installed, download the squid.conf config file from above and place it in your Squid config directory, usually /etc/squid/ on most distributions.
Step 3: Now, we need to edit the squid.conf file and make changes reflecting your environment.
"interface, port and proxy type" : We need to set the ip and port the squid daemon is going to listen on. In our example we listen on 10.10.10.1 port 8080 as that is the interface on the internal network on our client machines.
"Access Control List" : Next edit the area called "proxy server client access" and look for the directive "acl mynetworks src". This is the access control list (acl) of networks or individual ips that can access squid. You need to put in the network ips of your LAN. For example, most internal networks are setup with the ips 192.168.0.0 to 192.168.0.254. Then you would make sure the line read "acl mynetworks src 127.0.0.0/8 192.168.0/24".
"The logs" : The log files are going to be placed in /var/log/squid/ and we need to make that directory and make it owned by the squid user. Use "mkdir -p /var/log/squid/" and chown _squid:_squid /var/log/squid/" for OpenBSD Squid from packages.
Optional: Redirector : A redirector is an external program Squid calls to do a job. You can use a redirector for many purposes, such as blocking or redirecting URLs. In this example we will have Squid pass URLs to the following Perl script to rewrite the URL "SITE_NAME.COM" to "localhost:8080". If you run Squid on the same machine as a web server, you may want to use this method.
The client browser will use the URL SITE_NAME.com, but the requests will actually go to the web server running on localhost port 8080. Notice that we have added an ACL so that only URLs with the destination domain SITE_NAME.COM use the redirector; this reduces congestion and keeps Squid fast. Squid will also not touch the "Host" header, so clients will still see "SITE_NAME.COM" in the URL field even though they are getting the data from localhost port 8080.
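Wiring the redirector into squid.conf looks roughly like this; a sketch assuming Squid 2.6+ directive names (older releases used redirect_program) and the script path from this example. The acl name "rewrite_doms" is made up for illustration:

```
# only requests for this domain are handed to the rewriter
acl rewrite_doms dstdomain SITE_NAME.COM
url_rewrite_program /etc/squid/squid_redirector.pl
url_rewrite_access allow rewrite_doms
url_rewrite_access deny all
```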
You are welcome to cut and paste the following Perl code. In our example the script is called squid_redirector.pl and is placed in /etc/squid/.
#!/usr/bin/perl -p
BEGIN { $|=1 }
s|http://SITE_NAME.COM|http://localhost:8080|;

The Accept header is sent by the client to the server to list the media types the client can handle. For privacy concerns we can replace the true header of the client with the wildcard "*/*", which accepts anything.

Example: header_replace Accept text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
moneyslow.com: header_replace Accept */*
The Accept-Encoding header is sent by the client to tell the server which compression encodings the client will accept. Compression makes the transferred data smaller at the expense of CPU time on both server and client for compressing and uncompressing it. The server can honor this request and send the data in one of the listed formats, or ignore it completely and send the data uncompressed. For privacy concerns we can replace the true header of the client with a request for "gzip" only. This option works with all clients.
Example: header_replace Accept-Encoding compress, gzip
moneyslow.com: header_replace Accept-Encoding gzip
The Accept-Language header is sent by the client to tell the server what language we would like the page in. The server can honor this request and send the data in a listed language, or ignore it completely and send whatever it wants. For privacy concerns we can replace the true header of the client with our default language of "en". This option works with all clients.
Example: header_replace Accept-Language da, en-gb;q=0.8, en;q=0.7
moneyslow.com: header_replace Accept-Language en
The User-Agent header is sent by the client to the server to describe the browser name, browser version, build type, compiler version and other information about the client. For some sites (www.digg.com) this header must be sent in the proper format, as seen in the moneyslow.com example, but it does not necessarily need to contain valid or true information. For privacy concerns we can replace the true header of the client with whatever we want, as long as it is at least in the form of the moneyslow.com example. If you want to randomize the User-Agent string, check the moneyslow.com Home Page for a Squid User-Agent randomizer script.
Example: header_replace User-Agent Mozilla/5.0 (X11; U;) Gecko/20080221 Firefox/2.0.0.9
moneyslow.com: header_replace User-Agent OurBrowser/1.0 (Some Name)
The Authorization header is sent by the client to the server with the user name and password for access. This header is also used with the pop-up user name/password box that WWW-Authentication provides. It is _NOT_ used for the user name and password of JavaScript-based login sites like Netflix, digg, and financial institutions. It _IS_ used to send credentials in the URL to the server. You will need the Authorization header if you have hosts connecting to sites with ddclient for dyndns updates, or a machine running MythTV so it can receive TV programming updates from Zap2It Labs. If you see errors from ddclient in the logs on the machine running squid, such as "authorization failed (HTTP/1.0 401 Unauthorized)" or "X-UpdateCode: A", this means authorization is not allowed through squid.
moneyslow.com: header_access Authorization allow all
The Content-Disposition header is an extension to the MIME protocol instructing a MIME user agent how it should display a downloaded file. When the browser receives the header, it raises a "file download" dialog box with the file name specified by the server. You only need this header if you use web pages that dynamically name the download file through a scripted process. For example, if the web page dynamically generates a download and specifies the filename as "calomel_file.txt", but you see the file being saved incorrectly as "file_script.pl", then blocking this header might be the problem.
moneyslow.com: header_access Content-Disposition allow all
The Content-Encoding header is sent by the server back to the client to explain what compression method or factor the server is sending the data in. Since we specified the header Accept-Encoding as "gzip" the server should be sending the client the same.
moneyslow.com: header_access Content-Encoding allow all
The Content-Length header is sent by the server back to the client to detail how much data the client should expect to receive. If the server says 1MB of data is being sent and only 0.9MB data arrived the client knows to wait longer or re-request the data.
moneyslow.com: header_access Content-Length allow all
The Content-Location header is sent by the server back to the client specifying the exact URI or relative URL of the client's request. This header is also needed so that the server can notify the client whether the page they asked for has changed. Content-Location works in conjunction with the If-Modified-Since GET request to return the page or return a "304 Not Modified" message.
moneyslow.com: header_access Content-Location allow all
The Content-Range header field allows the remote server to tell a client how much data is left during a resumed download. When the server fulfills a partial GET request it replies with the status "206 Partial Content"; blocking this header will break resumed downloads. The client's request MUST include a Range header field indicating the desired range, and MAY include an If-Range header field to make the request conditional.
moneyslow.com: header_access Content-Range allow all
The Content-Type header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET. If the server is sending text/html page to the client then "text/html" will be sent through this header.
moneyslow.com: header_access Content-Type allow all
The Cookie header field allows the client to accept the cookie file from the server. This does _NOT_ allow the client to use the cookie, only to accept the cookie object. This header is used in conjunction with the Set-Cookie header to let the client accept the cookie and use it for the server's site. One site that requires the Cookie and Set-Cookie headers is Netflix.com, which will not even let you log in without cookies enabled on the client. Other sites like digg.com and amazon.com will not recognize your client if you try to log in without this header.
moneyslow.com: header_access Cookie allow all
The Host header is sent from the client to the server specifying the host the client wants to connect to. Some sites use many virtual hosts on one server on a single ip address. If the client does not send the Host header the server does not know which virtual host the client wants to connect to. This is required for most sites.
moneyslow.com: header_access Host allow all
The If-Modified-Since header allows the client to ask the server whether the requested page is newer than the last time the client downloaded it. If it is, the client downloads it like a standard GET request. If the page is not new, the server replies with the code "304 Not Modified" and the client knows the copy it has cached is the newest available. This saves bandwidth for both server and client and makes previously cached pages load significantly faster.
moneyslow.com: header_access If-Modified-Since allow all
The Location header is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. This header is sometimes used in conjunction with the Authorization header. For example, the client may log into one server to be authorized and then is redirected with the Location header to another server to access the site or receive the data.
moneyslow.com: header_access Location allow all
The Range header specifies HTTP retrieval requests using conditional or unconditional GET methods and MAY request one or more sub-ranges of the entity, instead of the entire entity. For example, a client may request the first 10KB of a 20MB file, which includes descriptive information about an rpm package, rather than download the entire file. If you are using the "yum" package manager and you see an error similar to "Header is not complete. Trying other mirror." then you need to allow the Range header in squid.
moneyslow.com: header_access Range allow all
The Referer request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained. The header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, or security (like to stop image hijacking). It also allows obsolete or mistyped links to be traced for maintenance.
moneyslow.com: header_access Referer allow all
The Set-Cookie header works in conjunction with the Cookie header. This header allows the client to use the cookie downloaded from a site and accepted by the Cookie header.
moneyslow.com: header_access Set-Cookie allow all
The WWW-Authenticate header is what makes the client raise the pop-up window or box for entering a user name and password. It only allows the client to display the box for entering credentials, nothing else. Once the user enters the user name/password in the box and hits "accept", the Authorization header actually sends the credentials to the server. You will need the WWW-Authenticate header if you have hosts connecting to sites like Zap2it or SchedulesDirect.org with MythTV so it can receive updates for TV programming.
moneyslow.com: header_access WWW-Authenticate allow all
"All" is a special value squid uses to mean any HTTP header. This rule denies every header and is used in conjunction with the rules above: if a header is not explicitly allowed above, this rule blocks it. Think of this methodology as paranoid mode.
moneyslow.com: header_access All deny all
Where can I find a list of all of the squid directives?
A full listing of the squid configuration directives can be found on squid-cache.org.
Where can I find a list of all of the header field definitions?
The header field definitions are at W3.org in the RFC protocols section.
How do I test if the headers are being changed by squid correctly?
You can test your browser's headers at Xhaus's header test page.
How can I start and stop the squid daemon?
squid -k kill        -- To stop squid
squid -k reconfigure -- To re-read the squid.conf file without restarting squid
What is the best way to rotate the squid log files?
Make sure you have the squid.conf directive "logfile_rotate 8" set. The number "8" means we want to keep 8 copies of the logs. Then set up a cron job to actually rotate the logs. Here we keep 8 weekly log files and rotate them on Sunday at midnight.
#minute (0-59)
#|  hour (0-23)
#|  |  day of the month (1-31)
#|  |  |  month of the year (1-12 or Jan-Dec)
#|  |  |  |  day of the week (0-6 with 0=Sun or Sun-Sat)
#|  |  |  |  |  commands
#|  |  |  |  |  |
### rotate logs weekly (Sunday at midnight)
00  0  *  *  0  squid -k rotate
Can I set up environment variables so programs will know squid is available? Many command line programs will look at the proxy environment variables and use the proxy defined there. For example, in the bash shell (.bashrc or .profile) you can define the following to tell clients to use "squid.proxy.lan" port "8080" as the squid proxy.
#### .bashrc or .profile
export http_proxy="http://squid.proxy.lan:8080"
export https_proxy="http://squid.proxy.lan:8080"

Other programs also have the option of using a configuration file. Wget's config file is /etc/wgetrc. This is an example of the proxy setting:
#### /etc/wgetrc
use_proxy = on
http_proxy = http://squid.proxy.lan:8080/
How do I turn off Firefox's caching? If you dislike Firefox's caching behavior and your nameservers cache just fine, you can disable this caching in the browser; Firefox's cache is just another layer of expirations to go through. Here is the cross-platform method, if you should wish to do so:
In the Firefox configuration URL "about:config" add two new integer entries:
network.dnsCacheExpiration -> 0
network.dnsCacheEntries -> 0
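If you prefer setting these in a prefs file rather than through about:config, the equivalent lines would look like this sketch; the pref names are the two listed above, and the user_pref syntax is standard Firefox user.js/prefs.js format:

```
user_pref("network.dnsCacheExpiration", 0);
user_pref("network.dnsCacheEntries", 0);
```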