The following instructions will get Squid installed and working with the squid.conf config file listed above. Since entire books have been written about Squid, we cannot cover every directive in the config file here. Once you have Squid working, check the bottom of the page for links to the Squid directive definitions.
Step 1: To get started you need to install Squid. Most operating systems provide packages (rpm, deb, pkg) for Squid, and you can also build it from source (squid-cache.org).
Step 2: Once Squid is installed, download the squid.conf config file from above and place it in your Squid config directory, usually /etc/squid/ on most distributions.
Step 3: Now, we need to edit the squid.conf file and make changes reflecting your environment.
"interface, port and proxy type" : We need to set the ip and port the squid daemon is going to listen on. In our example we listen on 10.10.10.1 port 8080 as that is the interface on the internal network on our client machines.
"Access Control List" : Next edit the area called "proxy server client access" and look for the directive "acl mynetworks src". This is the access control list (acl) of networks or individual ips that can access squid. You need to put in the network ips of your LAN. For example, most internal networks are setup with the ips 192.168.0.0 to 192.168.0.254. Then you would make sure the line read "acl mynetworks src 127.0.0.0/8 192.168.0/24".
"The logs" : The log files are going to be placed in /var/log/squid/ and we need to make that directory and make it owned by the squid user. Use "mkdir -p /var/log/squid/" and chown _squid:_squid /var/log/squid/" for OpenBSD Squid from packages.
Optional: Redirector : A redirector is an external program Squid calls to do a job. You can use a redirector for many purposes, such as blocking or redirecting URLs. In this example we will have Squid pass URLs to the following Perl script to rewrite the URL "SITE_NAME.COM" to "localhost:8080". If you run Squid on the same machine as a web server, you may want to use this method.
The client browser will use the URL SITE_NAME.com, but the requests will actually go to the web server running on localhost port 8080. Notice that we have added an ACL so that only URLs with the destination domain SITE_NAME.COM use the redirector; this reduces congestion and keeps Squid fast. Squid will also not touch the "Host" header, so clients will still see "SITE_NAME.COM" in the URL field even though they are getting the data from localhost port 8080.
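Wiring the redirector into squid.conf looks roughly like this; a sketch assuming Squid 2.6+ directive names (older releases used redirect_program) and the script path from this example. The acl name "rewrite_doms" is made up for illustration:

```
# only requests for this domain are handed to the rewriter
acl rewrite_doms dstdomain SITE_NAME.COM
url_rewrite_program /etc/squid/squid_redirector.pl
url_rewrite_access allow rewrite_doms
url_rewrite_access deny all
```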
You are welcome to cut and paste the following Perl code. In our example the script is called squid_redirector.pl and is placed in /etc/squid/.
#!/usr/bin/perl -p
BEGIN { $|=1 }
s|http://SITE_NAME.COM|http://localhost:8080|;

The Accept header is sent by the client to the server to list the media types the client can handle. For privacy concerns we can replace the true header of the client with the wildcard "*/*", which accepts anything.

Example: header_replace Accept text/plain; q=0.5, text/html, text/x-dvi; q=0.8, text/x-c
moneyslow.com: header_replace Accept */*
The Accept-Encoding header is sent by the client to tell the server which compression encodings the client will accept. Compression makes the transferred data smaller at the expense of CPU time on both server and client for compressing and uncompressing it. The server can honor this request and send the data in one of the listed formats, or ignore it completely and send the data uncompressed. For privacy concerns we can replace the true header of the client with a request for "gzip" only. This option works with all clients.
Example: header_replace Accept-Encoding compress, gzip
moneyslow.com: header_replace Accept-Encoding gzip
The Accept-Language header is sent by the client to tell the server what language we would like the page in. The server can honor this request and send the data in a listed language, or ignore it completely and send whatever it wants. For privacy concerns we can replace the true header of the client with our default language of "en". This option works with all clients.
Example: header_replace Accept-Language da, en-gb;q=0.8, en;q=0.7
moneyslow.com: header_replace Accept-Language en
The User-Agent header is sent by the client to the server to describe the browser name, browser version, build type, compiler version and other information about the client. For some sites (www.digg.com) this header must be sent in the proper format, as seen in the moneyslow.com example, but it does not necessarily need to contain valid or true information. For privacy concerns we can replace the true header of the client with whatever we want, as long as it is at least in the form of the moneyslow.com example. If you want to randomize the User-Agent string, check the moneyslow.com Home Page for a Squid User-Agent randomizer script.
Example: header_replace User-Agent Mozilla/5.0 (X11; U;) Gecko/20080221 Firefox/2.0.0.9
moneyslow.com: header_replace User-Agent OurBrowser/1.0 (Some Name)
The Authorization header is sent by the client to the server with the user name and password for access. This header is also used with the pop-up user name/password box that WWW-Authentication provides. It is _NOT_ used for the user name and password of JavaScript-based login sites like Netflix, digg, and financial institutions. It _IS_ used to send credentials in the URL to the server. You will need the Authorization header if you have hosts connecting to sites with ddclient for dyndns updates, or a machine running MythTV so it can receive TV programming updates from Zap2It Labs. If you see errors from ddclient in the logs on the machine running squid, such as "authorization failed (HTTP/1.0 401 Unauthorized)" or "X-UpdateCode: A", this means authorization is not allowed through squid.
moneyslow.com: header_access Authorization allow all
The Content-Disposition header is an extension to the MIME protocol instructing a MIME user agent how it should display a downloaded file. When the browser receives the header, it raises a "file download" dialog box with the file name specified by the server. You only need this header if you use web pages that dynamically name the download file through a scripted process. For example, if the web page dynamically generates a download and specifies the filename as "calomel_file.txt", but you see the file being saved incorrectly as "file_script.pl", then blocking this header might be the problem.
moneyslow.com: header_access Content-Disposition allow all
The Content-Encoding header is sent by the server back to the client to explain what compression method or factor the server is sending the data in. Since we specified the header Accept-Encoding as "gzip" the server should be sending the client the same.
moneyslow.com: header_access Content-Encoding allow all
The Content-Length header is sent by the server back to the client to detail how much data the client should expect to receive. If the server says 1MB of data is being sent and only 0.9MB data arrived the client knows to wait longer or re-request the data.
moneyslow.com: header_access Content-Length allow all
The Content-Location header is sent by the server back to the client specifying the exact URI or relative URL of the client's request. This header is also needed so that the server can notify the client whether the page they asked for has changed. Content-Location works in conjunction with the If-Modified-Since GET request to return the page or return a "304 Not Modified" message.
moneyslow.com: header_access Content-Location allow all
The Content-Range header field allows the remote server to tell a client how much data is left during a resumed download. When the server fulfills a partial GET request it replies with the status "206 Partial Content"; blocking this header will break resumed downloads. The client's request MUST include a Range header field indicating the desired range, and MAY include an If-Range header field to make the request conditional.
moneyslow.com: header_access Content-Range allow all
The Content-Type header field indicates the media type of the entity-body sent to the recipient or, in the case of the HEAD method, the media type that would have been sent had the request been a GET. If the server is sending text/html page to the client then "text/html" will be sent through this header.
moneyslow.com: header_access Content-Type allow all
The Cookie header field allows the client to accept the cookie file from the server. This does _NOT_ allow the client to use the cookie, only to accept the cookie object. This header is used in conjunction with the Set-Cookie header to let the client accept the cookie and use it for the server's site. One site that requires the Cookie and Set-Cookie headers is Netflix.com, which will not even let you log in without cookies enabled on the client. Other sites like digg.com and amazon.com will not recognize your client if you try to log in without this header.
moneyslow.com: header_access Cookie allow all
The Host header is sent from the client to the server specifying the host the client wants to connect to. Some sites use many virtual hosts on one server on a single ip address. If the client does not send the Host header the server does not know which virtual host the client wants to connect to. This is required for most sites.
moneyslow.com: header_access Host allow all
The If-Modified-Since header allows the client to ask the server whether the requested page is newer than the last time the client downloaded it. If it is, the client downloads it like a standard GET request. If the page is not new, the server replies with the code "304 Not Modified" and the client knows the copy it has cached is the newest available. This saves bandwidth for both server and client and makes previously cached pages load significantly faster.
moneyslow.com: header_access If-Modified-Since allow all
The Location header is used to redirect the recipient to a location other than the Request-URI for completion of the request or identification of a new resource. This header is sometimes used in conjunction with the Authorization header. For example, the client may log into one server to be authorized and then is redirected with the Location header to another server to access the site or receive the data.
moneyslow.com: header_access Location allow all
The Range header specifies HTTP retrieval requests using conditional or unconditional GET methods and MAY request one or more sub-ranges of the entity, instead of the entire entity. For example, a client may request the first 10KB of a 20MB file, which includes descriptive information about an rpm package, rather than download the entire file. If you are using the "yum" package manager and you see an error similar to "Header is not complete. Trying other mirror." then you need to allow the Range header in squid.
moneyslow.com: header_access Range allow all
The Referer request-header field allows the client to specify, for the server's benefit, the address (URI) of the resource from which the Request-URI was obtained. The header allows a server to generate lists of back-links to resources for interest, logging, optimized caching, or security (like to stop image hijacking). It also allows obsolete or mistyped links to be traced for maintenance.
moneyslow.com: header_access Referer allow all
The Set-Cookie header works in conjunction with the Cookie header. This header allows the client to use the cookie downloaded from a site and accepted by the Cookie header.
moneyslow.com: header_access Set-Cookie allow all
The WWW-Authenticate header is what makes the client raise the pop-up window or box for entering a user name and password. It only allows the client to display the box for entering credentials, nothing else. Once the user enters the user name/password in the box and hits "accept", the Authorization header actually sends the credentials to the server. You will need the WWW-Authenticate header if you have hosts connecting to sites like Zap2it or SchedulesDirect.org with MythTV so it can receive updates for TV programming.
moneyslow.com: header_access WWW-Authenticate allow all
"All" is a special value squid uses to mean any HTTP header. This rule denies every header and is used in conjunction with the rules above: if a header is not explicitly allowed above, this rule blocks it. Think of this methodology as paranoid mode.
moneyslow.com: header_access All deny all
Where can I find a list of all of the squid directives?
A full listing of the squid configuration directives can be found on squid-cache.org.
Where can I find a list of all of the header field definitions?
The header field definitions are at W3.org in the RFC protocols section.
How do I test if the headers are being changed by squid correctly?
You can test your browser's headers at Xhaus's header test page.
How can I start and stop the squid daemon?
squid -k kill        -- To stop squid
squid -k reconfigure -- To re-read the squid.conf file without restarting squid
What is the best way to rotate the squid log files?
Make sure you have the squid.conf directive "logfile_rotate 8" set. The number "8" means we want to keep 8 copies of the logs. Then set up a cron job to actually rotate the logs. Here we keep 8 weekly log files and rotate them on Sunday at midnight.
#minute (0-59)
#|  hour (0-23)
#|  |  day of the month (1-31)
#|  |  |  month of the year (1-12 or Jan-Dec)
#|  |  |  |  day of the week (0-6 with 0=Sun or Sun-Sat)
#|  |  |  |  |  commands
#|  |  |  |  |  |
### rotate logs weekly (Sunday at midnight)
00  0  *  *  0  squid -k rotate
Can I set up environment variables so programs will know squid is available? Many command line programs will look at the proxy environment variables and use the proxy defined there. For example, in the bash shell (.bashrc or .profile) you can define the following to tell clients to use "squid.proxy.lan" port "8080" as the squid proxy.
#### .bashrc or .profile
export http_proxy="http://squid.proxy.lan:8080"
export https_proxy="http://squid.proxy.lan:8080"

Other programs also have the option of using a configuration file. Wget's config file is /etc/wgetrc. This is an example of the proxy setting:
#### /etc/wgetrc
use_proxy = on
http_proxy = http://squid.proxy.lan:8080/
How do I turn off Firefox's caching? If you dislike Firefox's caching behavior and your nameservers cache just fine, you can disable this caching in the browser; Firefox's cache is just another layer of expirations to go through. Here is the cross-platform method, if you should wish to do so:
In the Firefox configuration URL "about:config" add two new integer entries:
network.dnsCacheExpiration -> 0
network.dnsCacheEntries -> 0
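If you prefer setting these in a prefs file rather than through about:config, the equivalent lines would look like this sketch; the pref names are the two listed above, and the user_pref syntax is standard Firefox user.js/prefs.js format:

```
user_pref("network.dnsCacheExpiration", 0);
user_pref("network.dnsCacheEntries", 0);
```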