httpd, the Apache HTTP Server

The Apache HTTP Server is a “heavy duty” network server that Subversion can leverage. Via a custom module, httpd makes Subversion repositories available to clients via the WebDAV/DeltaV protocol, which is an extension to HTTP 1.1 (see http://www.webdav.org/ for more information). This protocol takes the ubiquitous HTTP protocol that is the core of the World Wide Web, and adds writing—specifically, versioned writing—capabilities. The result is a standardized, robust system that is conveniently packaged as part of the Apache 2.0 software, supported by numerous operating systems and third-party products, and doesn't require network administrators to open up yet another custom port.[42] While an Apache-Subversion server has more features than svnserve, it's also a bit more difficult to set up. With flexibility often comes more complexity.

Much of the following discussion includes references to Apache configuration directives. While some examples are given of the use of these directives, describing them in full is outside the scope of this chapter. The Apache team maintains excellent documentation, publicly available on their web site at http://httpd.apache.org. For example, a general reference for the configuration directives is located at http://httpd.apache.org/docs-2.0/mod/directives.html.

Also, as you make changes to your Apache setup, it is likely that somewhere along the way a mistake will be made. If you are not already familiar with Apache's logging subsystem, you should become aware of it. In your httpd.conf file are directives that specify the on-disk locations of the access and error logs generated by Apache (the CustomLog and ErrorLog directives, respectively). Subversion's mod_dav_svn uses Apache's error logging interface as well. You can always browse the contents of those files for information that might reveal the source of a problem that is not clearly noticeable otherwise.

先决条件

为了让你的版本库使用HTTP网络,你基本上需要两个包里的四个部分。你需要Apache httpd2.0和包括的mod_dav DAV模块,Subversion和与之一同分发的mod_dav_svn文件系统提供者模块,如果你有了这些组件,网络化你的版本库将非常简单,如:

  • Getting httpd 2.0 up and running with the mod_dav module

  • Installing the mod_dav_svn back end to mod_dav, which uses Subversion's libraries to access the repository

  • Configuring your httpd.conf file to export (or expose) the repository

You can accomplish the first two items either by compiling httpd and Subversion from source code or by installing prebuilt binary packages of them on your system. For the most up-to-date information on how to compile Subversion for use with the Apache HTTP Server, as well as how to compile and configure Apache itself for this purpose, see the INSTALL file in the top level of the Subversion source code tree.

基本的 Apache 配置

Once you have all the necessary components installed on your system, all that remains is the configuration of Apache via its httpd.conf file. Instruct Apache to load the mod_dav_svn module using the LoadModule directive. This directive must precede any other Subversion-related configuration items. If your Apache was installed using the default layout, your mod_dav_svn module should have been installed in the modules subdirectory of the Apache install location (often /usr/local/apache2). The LoadModule directive has a simple syntax, mapping a named module to the location of a shared library on disk:

LoadModule dav_svn_module     modules/mod_dav_svn.so

注意,如果mod_dav是作为共享对象编译(而不是静态链接到httpd程序),你需要为它使用LoadModule语句,一定确定它在mod_dav_svn之前:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so

At a later location in your configuration file, you now need to tell Apache where you keep your Subversion repository (or repositories). The Location directive has an XML-like notation, starting with an opening tag and ending with a closing tag, with various other configuration directives in the middle. The purpose of the Location directive is to instruct Apache to do something special when handling requests that are directed at a given URL or one of its children. In the case of Subversion, you want Apache to simply hand off support for URLs that point at versioned resources to the DAV layer. You can instruct Apache to delegate the handling of all URLs whose path portions (the part of the URL that follows the server's name and the optional port number) begin with /repos/ to a DAV provider whose repository is located at /var/svn/repository using the following httpd.conf syntax:

<Location /repos>
  DAV svn
  SVNPath /var/svn/repository
</Location>

If you plan to support multiple Subversion repositories that will reside in the same parent directory on your local disk, you can use an alternative directive—SVNParentPath—to indicate that common parent directory. For example, if you know you will be creating multiple Subversion repositories in a directory /var/svn that would be accessed via URLs such as http://my.server.com/svn/repos1, http://my.server.com/svn/repos2, and so on, you could use the httpd.conf configuration syntax in the following example:

<Location /svn>
  DAV svn

  # any "/svn/foo" URL will map to a repository /var/svn/foo
  SVNParentPath /var/svn
</Location>

使用上面的语法,Apache会代理所有URL路径部分为/svn/的请求到Subversion的DAV提供者,Subversion会认为SVNParentPath指定的目录下的所有项目是真实的Subversion版本库,这通常是一个便利的语法,不像是用SVNPath指示,我们在此不必为创建新的版本库而重启Apache。

Be sure that when you define your new Location, it doesn't overlap with other exported locations. For example, if your main DocumentRoot is exported to /www, do not export a Subversion repository in <Location /www/repos>. If a request comes in for the URI /www/repos/foo.c, Apache won't know whether to look for a file repos/foo.c in the DocumentRoot, or whether to delegate mod_dav_svn to return foo.c from the Subversion repository. The result is often an error from the server of the form 301 Moved Permanently.

At this stage, you should strongly consider the question of permissions. If you've been running Apache for some time now as your regular web server, you probably already have a collection of content—web pages, scripts, and such. These items have already been configured with a set of permissions that allows them to work with Apache, or more appropriately, that allows Apache to work with those files. Apache, when used as a Subversion server, will also need the correct permissions to read and write to your Subversion repository.

你会需要检验权限系统的设置满足Subversion的需求,同时不会把以前的页面和脚本搞乱。这或许意味着修改Subversion的访问许可来配合Apache服务器已经使用的工具,或者可能意味着需要使用httpd.confUserGroup指示来指定Apache作为运行的用户和Subversion版本库的组。并不是只有一条正确的方式来设置许可,每个管理员都有不同的原因来以特定的方式操作,只需要意识到许可关联的问题经常在为Apache配置Subversion版本库的过程中被疏忽。

认证选项

At this point, if you configured httpd.conf to contain something like the following:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
</Location>

then your repository is “anonymously” accessible to the world. Until you configure some authentication and authorization policies, the Subversion repositories that you make available via the Location directive will be generally accessible to everyone. In other words:

  • Anyone can use a Subversion client to check out a working copy of a repository URL (or any of its subdirectories).

  • Anyone can interactively browse the repository's latest revision simply by pointing a web browser to the repository URL.

  • Anyone can commit to the repository.

当然,你也许已经设置了pre-commit钩子来防止提交(见“实现版本库钩子”一节),但是就像你读到的,也可以使用Apache内置的方法来限制访问。

Setting up HTTP authentication

The easiest way to authenticate a client is via the HTTP Basic authentication mechanism, which simply uses a username and password to verify that a user is who she says she is. Apache provides an htpasswd utility for managing the list of acceptable usernames and passwords. Let's grant commit access to Sally and Harry. First, we need to add them to the password file:

$ ### First time: use -c to create the file
$ ### Use -m to use MD5 encryption of the password, which is more secure
$ htpasswd -cm /etc/svn-auth-file harry
New password: *****
Re-type new password: *****
Adding password for user harry
$ htpasswd -m /etc/svn-auth-file sally
New password: *******
Re-type new password: *******
Adding password for user sally
$

下一步,你需要在httpd.confLocation区里添加一些指示来告诉Apache如何来使用这些密码文件,AuthType指示指定系统使用的认证类型,这种情况下,我们需要指定Basic认证系统,AuthName是你提供给认证域一个任意名称,大多数浏览器会在向用户询问名称和密码的弹出窗口里显示这个名称,最终,使用AuthUserFile指示来指定使用htpasswd创建的密码文件的位置。

添加完这三个指示,你的<Location>区块一定像这个样子:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/svn-auth-file
</Location>

This <Location> block is not yet complete, and it will not do anything useful. It's merely telling Apache that whenever authorization is required, Apache should harvest a username and password from the Subversion client. What's missing here, however, are directives that tell Apache which sorts of client requests require authorization. Wherever authorization is required, Apache will demand authentication as well. The simplest thing to do is protect all requests. Adding Require valid-user tells Apache that all requests require an authenticated user:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /etc/svn-auth-file
  Require valid-user
</Location>

一定要阅读后面的部分(“授权选项”一节)来得到Require的细节,和授权政策的其他设置方法。

One word of warning: HTTP Basic Auth passwords pass in very nearly plain-text over the network, and thus are extremely insecure.

Another option is to not use Basic authentication but to use Digest authentication instead. Digest authentication allows the server to verify the client's identity without passing the plain-text password over the network. Assuming that the client and server both know the user's password, they can verify that the password is the same by using it to apply a hashing function to a one-time bit of information. The server sends a small random-ish string to the client; the client uses the user's password to hash the string; the server then looks to see if the hashed value is what it expected.

Configuring Apache for Digest authentication is also fairly easy, and only a small variation on our prior example. Be sure to consult Apache's documentation for full details.

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  AuthType Digest
  AuthName "Subversion repository"
  AuthDigestDomain /svn/
  AuthUserFile /etc/svn-auth-file
  Require valid-user
</Location>

If you're looking for maximum security, then public-key cryptography is the best solution. It may be best to use some sort of SSL encryption, so that clients authenticate via https:// instead of http://; at a bare minimum, you can configure Apache to use a self-signed server certificate. [43] Consult Apache's documentation (and OpenSSL documentation) about how to do that.

SSL certificate management

商业应用需要越过公司防火墙的版本库访问,防火墙需要小心的考虑非认证用户“吸取”他们的网络流量的情况,SSL让那种形式的关注更不容易导致敏感数据泄露。

如果Subversion使用OpenSSL编译,它就会具备与Subversion服务器使用https://的URL通讯的能力,Subversion客户端使用的Neon库不仅仅可以用来验证服务器证书,也可以必要时提供客户端证书,如果客户端和服务器交换了SSL证书并且成功地互相认证,所有剩下的交流都会通过一个会话关键字加密。

It's beyond the scope of this book to describe how to generate client and server certificates and how to configure Apache to use them. Many other books, including Apache's own documentation, describe this task. But what can be covered here is how to manage server and client certificates from an ordinary Subversion client.

当通过https://与Apache通讯时,一个Subversion客户端可以接收两种类型的信息:

  • A server certificate

  • A demand for a client certificate

如果客户端接收了一个服务器证书,它需要去验证它是可以相信的:这个服务器是它自称的那一个吗?OpenSSL库会去检验服务器证书的签名人或者是核证机构(CA)。如果OpenSSL不可以自动信任这个CA,或者是一些其他的问题(如证书过期或者是主机名不匹配),Subversion命令行客户端会询问你是否愿意仍然信任这个证书:

$ svn list https://host.example.com/repos/project

Error validating server certificate for 'https://host.example.com:443':
 - The certificate is not issued by a trusted authority. Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: host.example.com
 - Valid: from Jan 30 19:23:56 2004 GMT until Jan 30 19:23:56 2006 GMT
 - Issuer: CA, example.com, Sometown, California, US
 - Fingerprint: 7d:e1:a9:34:33:39:ba:6a:e9:a5:c4:22:98:7b:76:5c:92:a0:9c:7b

(R)eject, accept (t)emporarily or accept (p)ermanently?

This dialogue should look familiar; it's essentially the same question you've probably seen coming from your web browser (which is just another HTTP client like Subversion). If you choose the (p)ermanent option, the server certificate will be cached in your private runtime auth/ area in just the same way your username and password are cached (see “客户端凭证缓存”一节). If cached, Subversion will automatically trust this certificate in future negotiations.

Your runtime servers file also gives you the ability to make your Subversion client automatically trust specific CAs, either globally or on a per-host basis. Simply set the ssl-authority-files variable to a semicolon-separated list of PEM-encoded CA certificates:

[global]
ssl-authority-files = /path/to/CAcert1.pem;/path/to/CAcert2.pem

Many OpenSSL installations also have a predefined set of “default” CAs that are nearly universally trusted. To make the Subversion client automatically trust these standard authorities, set the ssl-trust-default-ca variable to true.

When talking to Apache, a Subversion client might also receive a challenge for a client certificate. Apache is asking the client to identify itself: is the client really who it says it is? If all goes correctly, the Subversion client sends back a private certificate signed by a CA that Apache trusts. A client certificate is usually stored on disk in encrypted format, protected by a local password. When Subversion receives this challenge, it will ask you for both a path to the certificate and for the password that protects it:

$ svn list https://host.example.com/repos/project

Authentication realm: https://host.example.com:443
Client certificate filename: /path/to/my/cert.p12
Passphrase for '/path/to/my/cert.p12':  ********
…

注意这个客户端证书是一个“p12”文件,为了让Subversion使用客户端证书,它必须是运输标准的PKCS#12格式,大多数浏览器可以导入和导出这种格式的证书,另一个选择是用OpenSSL命令行工具来转化存在的证书为PKCS#12格式。

再次,运行中servers文件允许你为每个主机自动响应这种要求,单个或两条信息可以用运行参数来描述:

[groups]
examplehost = host.example.com

[examplehost]
ssl-client-cert-file = /path/to/my/cert.p12
ssl-client-cert-password = somepassword

一旦你设置了ssl-client-cert-filessl-client-cert-password参数,Subversion客户端可以自动响应客户端证书请求而不会打扰你。[44]

授权选项

此刻,你已经配置了认证,但是没有配置授权,Apache可以要求用户认证并且确定身份,但是并没有说明这个身份的怎样允许和限制,这个部分描述了两种控制访问版本库的策略。

Blanket access control

The simplest form of access control is to authorize certain users for either read-only access to a repository or read/write access to a repository.

You can restrict access on all repository operations by adding the Require valid-user directive to your <Location> block. Using our previous example, this would mean that only clients that claimed to be either harry or sally and that provided the correct password for their respective username would be allowed to do anything with the Subversion repository:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file

  # only authenticated users may access the repository
  Require valid-user
</Location>

Sometimes you don't need to run such a tight ship. For example, Subversion's own source code repository at http://svn.collab.net/repos/svn allows anyone in the world to perform read-only repository tasks (such as checking out working copies and browsing the repository with a web browser), but restricts all write operations to authenticated users. To do this type of selective restriction, you can use the Limit and LimitExcept configuration directives. Like the Location directive, these blocks have starting and ending tags, and you would nest them inside your <Location> block.

LimitLimitExcept中使用的参数是可以被这个区块影响的HTTP请求类型,举个例子,如果你希望禁止所有的版本库访问,只是保留当前支持的只读操作,你可以使用LimitExcept指示,并且使用GETPROPFINDOPTIONSREPORT请求类型参数,然后前面提到过的Require valid-user指示将会在<LimitExcept>区块中而不是在<Location>区块。

<Location /svn>
  DAV svn
  SVNParentPath /var/svn

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file

  # For any operations other than these, require an authenticated user.
  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>

这里只是一些简单的例子,想看关于Apache访问控制Require指示的更深入信息,可以查看Apache文档中的教程集http://httpd.apache.org/docs-2.0/misc/tutorials.html中的Security部分。

Per-directory access control

也可以使用Apache的httpd模块mod_authz_svn更加细致的设置访问权限,这个模块收集客户端传递过来的不同的晦涩的URL信息,询问mod_dav_svn来解码,然后根据在配置文件定义的访问政策来裁决请求。

如果你从源代码创建Subversion,mod_authz_svn会自动附加到mod_dav_svn,许多二进制分发版本也会自动安装,为了验证它是安装正确,确定它是在httpd.confLoadModule指示中的mod_dav_svn后面:

LoadModule dav_module         modules/mod_dav.so
LoadModule dav_svn_module     modules/mod_dav_svn.so
LoadModule authz_svn_module   modules/mod_authz_svn.so

为了激活这个模块,你需要配置你的Location区块的AuthzSVNAccessFile指示,指定保存路径中的版本库访问政策的文件。(一会儿我们将会讨论这个文件的格式。)

Apache is flexible, so you have the option to configure your block in one of three general patterns. To begin, choose one of these basic configuration patterns. (The following examples are very simple; look at Apache's own documentation for much more detail on Apache authentication and authorization options.)

The simplest block is to allow open access to everyone. In this scenario, Apache never sends authentication challenges, so all users are treated as “anonymous.” (See 例 6.1 “匿名访问的配置实例。”.)

例 6.1. 匿名访问的配置实例。

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file
</Location>
          

On the opposite end of the paranoia scale, you can configure your block to demand authentication from everyone. All clients must supply credentials to identify themselves. Your block unconditionally requires authentication via the Require valid-user directive, and it defines a means to authenticate. (See 例 6.2 “一个认证访问的配置实例。”.)

例 6.2. 一个认证访问的配置实例。

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file

  # only authenticated users may access the repository
  Require valid-user

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
</Location>
          

A third very popular pattern is to allow a combination of authenticated and anonymous access. For example, many administrators want to allow anonymous users to read certain repository directories, but want only authenticated users to read (or write) more sensitive areas. In this setup, all users start out accessing the repository anonymously. If your access control policy demands a real username at any point, Apache will demand authentication from the client. To do this, use both the Satisfy Any and Require valid-user directives together. (See 例 6.3 “A sample configuration for mixed authenticated/anonymous access”.)

例 6.3. A sample configuration for mixed authenticated/anonymous access

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  # our access control policy
  AuthzSVNAccessFile /path/to/access/file

  # try anonymous access first, resort to real
  # authentication if necessary.
  Satisfy Any
  Require valid-user

  # how to authenticate a user
  AuthType Basic
  AuthName "Subversion repository"
  AuthUserFile /path/to/users/file
</Location>
          

Once you've settled on one of these three basic httpd.conf templates, you need to create your file containing access rules for particular paths within the repository. This is described later in this chapter in “基于路径的授权”一节.

Disabling path-based checks

The mod_dav_svn module goes through a lot of work to make sure that data you've marked “unreadable” doesn't get accidentally leaked. This means that it needs to closely monitor all of the paths and file-contents returned by commands such as svn checkout or svn update commands. If these commands encounter a path that isn't readable according to some authorization policy, then the path is typically omitted altogether. In the case of history or rename tracing—e.g., running a command such as svn cat -r OLD foo.c on a file that was renamed long ago—the rename tracking will simply halt if one of the object's former names is determined to be read-restricted.

All of this path checking can sometimes be quite expensive, especially in the case of svn log. When retrieving a list of revisions, the server looks at every changed path in each revision and checks it for readability. If an unreadable path is discovered, then it's omitted from the list of the revision's changed paths (normally seen with the --verbose option), and the whole log message is suppressed. Needless to say, this can be time-consuming on revisions that affect a large number of files. This is the cost of security: even if you haven't configured a module such as mod_authz_svn at all, the mod_dav_svn module is still asking Apache httpd to run authorization checks on every path. The mod_dav_svn module has no idea what authorization modules have been installed, so all it can do is ask Apache to invoke whatever might be present.

On the other hand, there's also an escape hatch of sorts, which allows you to trade security features for speed. If you're not enforcing any sort of per-directory authorization (i.e., not using mod_authz_svn or similar module), then you can disable all of this path checking. In your httpd.conf file, use the SVNPathAuthz directive as shown in 例 6.4 “禁用所有的路径检查”.

例 6.4. 禁用所有的路径检查

<Location /repos>
  DAV svn
  SVNParentPath /var/svn

  SVNPathAuthz off
</Location>
          

The SVNPathAuthz directive is “on” by default. When set “off,” all path-based authorization checking is disabled; mod_dav_svn stops invoking authorization checks on every path it discovers.

额外的糖果

We've covered most of the authentication and authorization options for Apache and mod_dav_svn. But there are a few other nice features that Apache provides.

Repository browsing

One of the most useful benefits of an Apache/WebDAV configuration for your Subversion repository is that the youngest revisions of your versioned files and directories are immediately available for viewing via a regular web browser. Since Subversion uses URLs to identify versioned resources, those URLs used for HTTP-based repository access can be typed directly into a web browser. Your browser will issue an HTTP GET request for that URL; based on whether that URL represents a versioned directory or file, mod_dav_svn will respond with a directory listing or with file contents.

Since the URLs do not contain any information about which version of the resource you wish to see, mod_dav_svn will always answer with the youngest version. This functionality has the wonderful side effect that you can pass around Subversion URLs to your peers as references to documents, and those URLs will always point at the latest manifestation of that document. Of course, you can even use the URLs as hyperlinks from other web sites, too.

Proper MIME type

当浏览Subversion版本库时,web浏览器通过从Apache的HTTP GET返回内容中查看Content-Type:头可以知道如何渲染文件的线索,这个值是一种MIME类型。默认情况下,Apache告诉浏览器所有的版本库文件都是缺省的MIME类型,通常是text/plain,这样有时候会让人沮丧,如果一个用户希望版本库文件能够更有意义的渲染—例如一个foo.html,在浏览时最好能够按照HTML方式渲染。

To make this happen, you need only to make sure that your files have the proper svn:mime-type set. This is discussed in more detail in “文件内容类型”一节, and you can even configure your client to automatically attach proper svn:mime-type properties to files entering the repository for the first time; see “自动设置属性”一节.

So in our example, if one were to set the svn:mime-type property to text/html on file foo.html, then Apache would properly tell your web browser to render the file as HTML. One could also attach proper image/* mime-type properties to image files and ultimately get an entire web site to be viewable directly from a repository! There's generally no problem with this, as long as the web site doesn't contain any dynamically generated content.

Customizing the look

You generally will get more use out of URLs to versioned files—after all, that's where the interesting content tends to lie. But you might have occasion to browse a Subversion directory listing, where you'll quickly note that the generated HTML used to display that listing is very basic, and certainly not intended to be aesthetically pleasing (or even interesting). To enable customization of these directory displays, Subversion provides an XML index feature. A single SVNIndexXSLT directive in your repository's Location block of httpd.conf will instruct mod_dav_svn to generate XML output when displaying a directory listing, and to reference the XSLT stylesheet of your choice:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNIndexXSLT "/svnindex.xsl"
  …
</Location>

Using the SVNIndexXSLT directive and a creative XSLT stylesheet, you can make your directory listings match the color schemes and imagery used in other parts of your web site. Or, if you'd prefer, you can use the sample stylesheets provided in the Subversion source distribution's tools/xslt/ directory. Keep in mind that the path provided to the SVNIndexXSLT directory is actually a URL path—browsers need to be able to read your stylesheets in order to make use of them!

Listing repositories

如果你通过 SVNParentPath指示从一个URL维护一组版本库,也可以让Apache在浏览器显示所有存在的版本库,只需要通过SVNListParentPath指示激活:

<Location /svn>
  DAV svn
  SVNParentPath /var/svn
  SVNListParentPath on
  …
</Location>

If a user now points her web browser to the URL http://host.example.com/svn/, she'll see a list of all Subversion repositories sitting in /var/svn. Obviously, this can be a security problem, so this feature is turned off by default.

Apache logging

因为Apache的核心是一个HTTP服务器,它包含了梦幻般灵活的日志特性。各种配置日志的方式可以超出了本书的范围,但是我们必须指出,即使是最原始的文件httpd.conf也可以让Apache产生两个日志:error_logaccess_log。这些日志会出现在不同的地方,但通常是创建在Apache安装的日志区。(在Unix下,这个目录是/usr/local/apache2/logs/。)

error_log描述了所有Apache运行中的内部错误,access_log记录了Apache接收到的所有HTTP请求,这个日志很容易查看,例如包括Subversion客户端的IP地址,哪些用户正确认证和请求成功还是失败。

Unfortunately, because HTTP is a stateless protocol, even the simplest Subversion client operation generates multiple network requests. It's very difficult to look at the access_log and deduce what the client was doing—most operations look like a series of cryptic PROPPATCH, GET, PUT, and REPORT requests. To make things worse, many client operations send nearly identical series of requests, so it's even harder to tell them apart.

mod_dav_svn, however, can come to your aid. By activating an “operational logging” feature, you can ask mod_dav_svn to create a separate log file describing what sort of high-level operations your clients are performing.

为此,你需要利用Apache的CustomLog指示(在Apache自己的文档里有详细解释)指示,请确定在Subversion的Location指示之外配置这个指示。

<Location /svn>
  DAV svn
  …
</Location>

CustomLog logs/svn_logfile "%t %u %{SVN-ACTION}e" env=SVN-ACTION

In this example, we're asking Apache to create a special logfile svn_logfile in the standard Apache logs directory. The %t and %u variables are replaced by the time and username of the request, respectively. The really important part are the two instances of SVN-ACTION. When Apache sees that variable, it substitutes the value of the SVN-ACTION environment variable, which is automatically set by mod_dav_svn whenever it detects a high-level client action.

所以我们不选择翻译下面的传统的access_log文件:

[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/vcc/default HTTP/1.1" 207 398
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc/!svn/bln/59 HTTP/1.1" 207 449
[26/Jan/2007:22:25:29 -0600] "PROPFIND /svn/calc HTTP/1.1" 207 647
[26/Jan/2007:22:25:29 -0600] "REPORT /svn/calc/!svn/vcc/default HTTP/1.1" 200 607
[26/Jan/2007:22:25:31 -0600] "OPTIONS /svn/calc HTTP/1.1" 200 188
[26/Jan/2007:22:25:31 -0600] "MKACTIVITY /svn/calc/!svn/act/e6035ef7-5df0-4ac0-b811-4be7c823f998 HTTP/1.1" 201 227
…

you can instead peruse a much more intelligible svn_logfile like this:

[26/Jan/2007:22:24:20 -0600] - get-dir /tags r1729 props
[26/Jan/2007:22:24:27 -0600] - update /trunk r1729 depth=infinity send-copyfrom-args
[26/Jan/2007:22:25:29 -0600] - status /trunk/foo r1729 depth=infinity
[26/Jan/2007:22:25:31 -0600] sally commit r1730

For an exhaustive list of all actions logged, see “High-level logging”一节.

Write-through proxying

One of the nice advantages of using Apache as a Subversion server is that it can be set up for simple replication. For example, suppose that your team is distributed across four offices around the globe. The Subversion repository can only exist in one of those offices, which means the other three offices will not enjoy accessing it—they're likely to experience significantly slower traffic and response times when updating and committing code. A powerful solution is to set up a system consisting of one master Apache server and several slave Apache servers. If you place a slave server in each office, then users can check out a working copy from whichever slave is closest to them. All read requests go to their local slave. Write requests get automatically routed to the single master server. When the commit completes, the master then automatically “pushes” the new revision to each slave server using the svnsync replication tool.

This configuration creates a huge perceptual speed increase for your users, because Subversion client traffic is typically 80–90% read requests. And if those requests are coming from a local server, it's a huge win.

In this section, we'll walk you through a standard setup of this single-master/multiple slave system. However, keep in mind that your servers must be running at least Apache 2.2.0 (with mod_proxy loaded) and Subversion 1.5 (mod_dav_svn).

Configure the servers

First, configure your master server's httpd.conf file in the usual way. Make the repository available at a certain URI location, and configure authentication and authorization however you'd like. After that's done, configure each of your “slave” servers in the exact same way, but add the special SVNMasterURI directive to the block:

<Location /svn>
  DAV svn
  SVNPath /var/svn/repos
  SVNMasterURI http://master.example.com/svn
  …
</Location>

This new directive tells a slave server to redirect all write requests to the master. (This is done automatically via Apache's mod_proxy module.) Ordinary read requests, however, are still serviced by the slaves. Be sure that your master and slave servers all have matching authentication and authorization configurations; if they fall out of sync, it can lead to big headaches.

Next, we need to deal with the problem of infinite recursion. With the current configuration, imagine what will happen when a Subversion client performs a commit to the master server. After the commit completes, the server uses svnsync to replicate the new revision to each slave. But because svnsync appears to be just another Subversion client performing a commit, the slave will immediately attempt to proxy the incoming write request back to the master! Hilarity ensues.

The solution to this problem is to have the master push revisions to a different <Location> on the slaves. This location is configured to not proxy write requests at all, but to accept normal commits from (and only from) the master's IP address:

<Location /svn-proxy-sync>
  DAV svn
  SVNPath /var/svn/repos
  Order deny,allow
  Deny from all
  # Only let the server's IP address access this Location:
  Allow from 10.20.30.40
  …
</Location>
Set up replication

Now that you've configured your Location blocks on master and slaves, you need to configure the master to replicate to the slaves. This is done the usual way— using svnsync. If you're not familiar with this tool, see “版本库复制”一节 for details.

First, make sure that each slave repository has a pre-revprop-change hook script which allows remote revision property changes. (This is standard procedure for being on the receiving end of svnsync.) Then log into the master server and configure each of the slave repository URIs to receive data from the master repository on local disk:

$ svnsync init http://slave1.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave2.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.
$ svnsync init http://slave3.example.com/svn-proxy-sync file://var/svn/repos
Copied properties for revision 0.

# Perform the initial replication

$ svnsync sync http://slave1.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave2.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

$ svnsync sync http://slave3.example.com/svn-proxy-sync
Transmitting file data ....
Committed revision 1.
Copied properties for revision 1.
Transmitting file data .......
Committed revision 2.
Copied properties for revision 2.
…

After this is done, we configure the master server's post-commit hook script to invoke svnsync on each slave server:

#!/bin/sh
# Post-commit script to replicate newly committed revision to slaves

svnsync sync http://slave1.example.com/svn-proxy-sync > /dev/null 2>&1
svnsync sync http://slave2.example.com/svn-proxy-sync > /dev/null 2>&1
svnsync sync http://slave3.example.com/svn-proxy-sync > /dev/null 2>&1

The extra bits on the end of each line aren't necessary, but they're a sneaky way to allow the sync commands to run in the background, so that the Subversion client isn't left waiting forever for the commit to finish. In addition to this post-commit hook, you'll need a post-revprop-change hook as well, so that when a user, say, modifies a log message, the slave servers get that change also:

#!/bin/sh
# Post-revprop-change script to replicate revprop-changes to slaves

REV=${2}
svnsync copy-revprops http://slave1.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1
svnsync copy-revprops http://slave2.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1
svnsync copy-revprops http://slave3.example.com/svn-proxy-sync ${REV} > /dev/null 2>&1

The only thing we've left out here is what to do about locks. Because locks are strictly enforced by the master server (the only place where commits happen), we don't technically need to do anything. Many teams don't use Subversion's locking features at all, so it may be a nonissue for you. However, if lock changes aren't replicated from master to slaves, it means that clients won't be able to query the status of locks (e.g., svn status -u will show no information about repository locks). If this bothers you, you can write post-lock and post-unlock hook scripts that run svn lock and svn unlock on each slave machine, presumably through a remote shell method such as SSH. That's left as an exercise for the reader!

Caveats

Your master/slave replication system should now be ready to use. A couple words of warning are in order, however. Remember that this replication isn't entirely robust in the face of computer or network crashes. For example, if one of the automated svnsync commands fails to complete for some reason, the slaves will begin to fall behind. For example, your remote users will see that they've committed revision 100, but then when they run svn update, their local server will tell them than revision 100 doesn't yet exist! Of course, the problem will be automatically fixed the next time another commit happens and the subsequent svnsync is successful—the sync will replicate all waiting revisions. But still, you may want to set up some sort of out-of-band monitoring to notice synchronization failures and force svnsync to run when things go wrong.

Other Apache features

Several of the features already provided by Apache in its role as a robust web server can be leveraged for increased functionality or security in Subversion as well. The Subversion client is able to use SSL (the Secure Socket Layer, discussed earlier). If your Subversion client is built to support SSL, then it can access your Apache server using https:// and enjoy a high-quality encrypted network session.

Equally useful are other features of the Apache and Subversion relationship, such as the ability to specify a custom port (instead of the default HTTP port 80) or a virtual domain name by which the Subversion repository should be accessed, or the ability to access the repository through an HTTP proxy.

Finally, because mod_dav_svn is speaking a subset of the WebDAV/DeltaV protocol, it's possible to access the repository via third-party DAV clients. Most modern operating systems (Win32, OS X, and Linux) have the built-in ability to mount a DAV server as a standard network “shared folder.” This is a complicated topic, but also wondrous when implemented. For details, read 附录 C, WebDAV和自动版本.

Note that there are number of other small tweaks one can make to mod_dav_svn that are too obscure to mention in this chapter. For a complete list of all httpd.conf directives that mod_dav_svn responds to, see “指示”一节.



[42] 他们讨厌这样做。

[43] While self-signed server certificates are still vulnerable to a “man-in-the-middle” attack, such an attack is much more difficult for a casual observer to pull off, compared to sniffing unprotected passwords.

[44] 更多有安全意识的人不会希望在运行中servers文件保存客户端证书密码。

[45] 之前叫做“ViewCVS”。