*****************************
Security aspects of Bazaar-NG
*****************************

* Good security is required.

* Usability is required for good security.

  Being too strict "because it's the secure way" just means that people
  will disable you altogether, or start doing things that they know is
  wrong, because the right way of doing this may be secure, but [..]
  also very inconvenient.

  -- Linus Torvalds

.. contents::


Requirements
============

David Wheeler gives some good requirements__:

  Problem is, the people who develop SCM tools often don't think about
  what kind of security requirements they need to support.  This
  mini-paper describes briefly the kinds of security requirements an
  SCM tool should support.

__ http://www.dwheeler.com/essays/scm-security.html

confidentiality
  Are only those who should be able to read information able to do so?

integrity
  Are only those who should be able to write/change information able to
  do so?  This includes not only limiting access rights for writing,
  but also protecting against repository corruption.

availability
  Is the system available to those who need it?  (I.e., is it resistant
  to denial-of-service attacks?)

identification/authentication
  Does the system safely authenticate its users?  If it uses tokens
  (like passwords), are they protected when stored and while being sent
  over a network, or are they exposed as cleartext?

audit
  Are actions recorded?

non-repudiation
  Can the system "prove" that a certain user/key did an action later?

self-protection
  Does the system protect itself, and can its own data (like
  timestamps) be trusted?

trusted paths
  Can the system make sure that its communication with users is
  protected?

Attacker categories
-------------------

* Unprivileged outsiders.  (Almost always read-only, but people might
  want to allow them to write in some cases, e.g. for wikis.)

* Non-malicious developers with privilege.

* Malicious developers with privilege.

* Attackers who have stolen a privileged developer's identity.

Access control
--------------

Dan Nicolaescu gives these examples of access control:

- security-related code that is still embargoed: only a select few are
  allowed to see it, and it is not desirable to release this
  information to the public because a fix is still being worked on.  It
  would be nice to be able to have this kind of code under the same
  version control system used for normal development, for ease of use
  and easy merging, yet it is crucial to restrict access to branches,
  files or directories to certain people.

- feature freeze before a release.  It would be good if the release
  manager could disable writing to the release branch, so that the last
  tests are run and no one commits stuff by mistake.

- documentation/translation writers don't need write access to the
  whole source code, just to the documentation directories.

- For proprietary companies, restricting access is even more important;
  for example, only some engineers should have access to the latest
  development version of some code, in order to keep trade secrets, etc.

In Bazaar-NG, the basic unit of access control is the branch.  If
people are not supposed to read a branch, or know of its existence, put
it somewhere where they can't see it.  If people are allowed to read
from but not write to a branch, then set those permissions.  The code
can later be merged into a public branch if desired, with no loss of
function.

We largely rely on lower-level security measures to control who can get
read or write access to a branch.

If you have a branch that should be confidential, then put it on an
appropriately-secured machine, with only people in a particular group
allowed to read it.  Not having separate repositories is probably a
feature here -- unlike Subversion, no features depend on having
branches be in the same repository.  Each repository can have different
group ownership.  (The directories should usually be setgid.)  It also
makes it easier to see just what the access control is; there is only
one object that can meaningfully have an ACL.
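On a Unix host those lower-level measures are just ordinary file
permissions and group ownership.  The following is a minimal sketch of
a check for a supposedly private branch directory; it is not part of
bzr, and the ``check_branch_privacy`` function, the group argument and
the command-line usage are illustrative assumptions::

    # Illustrative sketch only; not part of bzr.
    import grp
    import os
    import stat
    import sys

    def check_branch_privacy(branch_dir, expected_group):
        """Return a list of problems with the permissions on branch_dir."""
        st = os.stat(branch_dir)
        problems = []
        # The branch should not be readable, writable or traversable by
        # 'other' users.
        if st.st_mode & (stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH):
            problems.append("directory is accessible to users outside the group")
        # It should belong to the intended group ...
        group_name = grp.getgrgid(st.st_gid).gr_name
        if group_name != expected_group:
            problems.append("group is %r, expected %r"
                            % (group_name, expected_group))
        # ... and be setgid, so that new files inherit that group.
        if not st.st_mode & stat.S_ISGID:
            problems.append("directory is not setgid")
        return problems

    if __name__ == '__main__':
        # e.g. python check_branch_privacy.py /srv/branches/secret-fix secteam
        for problem in check_branch_privacy(sys.argv[1], sys.argv[2]):
            print("warning: %s" % problem)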
The existence of a secret branch can be fairly well hidden from the
world.  When its changes are merged in, all that is visible is the
name, date, and branch name of the commit, not anything about the
location of the source branch.

The documentation case I would handle by having a separate
documentation branch, which could perhaps be checked out into a
subdirectory when it is required.  I think this is fairly common for
larger projects even in CVS.

Confidentiality
---------------

As dwheeler points out, this can be important even for open source
projects, such as when preparing a security patch.

Mechanisms that send email should have an option to encrypt the mail.

I can't think of anywhere encrypted archives would be useful.  If you
want to store an archive on an encrypted filesystem you can.  If you
want to store encrypted files you can do that too, though that will
leak some information in the metadata and branch structure.

Security in distributed systems
-------------------------------

If I have a branch on my laptop, the software ultimately cannot prevent
me from doing anything to that branch -- physical access trumps
software controls.  We can, at most, try to prevent non-malicious
mistakes.

The purpose of the software here is to protect other people, whose
machines I do not control.  In particular, it should be hard for me to
lie to them; the software should detect any false statements.  These
should be prevented:

* Claiming to be someone else.

* Attempting to rewrite history.

Revocation
----------

Suppose Alice's code-signing key is stolen by an attacker, Charles.
Charles can sign changesets purporting to come from Alice.  Alice needs
to revoke that key; hopefully she has saved a copy of the key elsewhere
and can use that to revoke it.  Failing that, she can mail everyone and
ask them to delete it.  This can propagate through the usual GPG
mechanism, which is very nice.  Alice also needs to make a new key and
get it trusted.

This revocation does not distinguish between changesets genuinely
signed by Alice in the past and changesets fraudulently signed by
Charles.  What can Alice do now?

First of all, she needs to work out which changesets signed by her key
can still be trusted.  One good way to do this is to check against
another branch signed by Bob.  If Bob's key is safe, we know his copy
of Alice's changesets is OK, and that the full tree at various points
is OK.  Then:

* Go through her old changesets and check that they're OK -- perhaps
  restore from a trusted backup.  Re-sign those changesets with a new
  key bound to the same email address, and publish the new signatures
  instead.  (This seems to indicate it is a good idea to bind
  signatures to changesets by author name/address rather than by key
  ID.)

* Roll up all previous development into a new tree, then sign that.
  This means there is no safe access to the previous individual
  changes, but in some cases that may be OK.

If a key is revoked at a particular time then perhaps we could still
trust commits made before that time.  I don't know if GPG revocations
can support that.
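Returning to the cross-check against Bob's branch described above, a
rough sketch might look like the following.  The ``untrusted_revisions``
helper and the representation of revisions as a plain mapping from
revision id to raw text are assumptions for illustration only, not
bzr's storage API::

    # Illustrative sketch only; not bzr's actual interface.
    import hashlib

    def untrusted_revisions(local_revisions, trusted_revisions):
        """Return ids of revisions whose local text differs from the trusted copy.

        Both arguments are assumed to map revision ids to the raw
        revision text (bytes).  Anything Bob never received needs
        separate review.
        """
        suspect = []
        for revision_id, text in local_revisions.items():
            trusted_text = trusted_revisions.get(revision_id)
            if trusted_text is None:
                # Not present on the trusted branch; review it by hand.
                suspect.append(revision_id)
            elif hashlib.sha1(text).digest() != hashlib.sha1(trusted_text).digest():
                # Content does not match the trusted copy.
                suspect.append(revision_id)
        return suspect

Revisions that pass such a check could then be re-signed with Alice's
new key.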
Old keys
--------

Keys also expire, rather than being revoked.  What does this mean?
Ideally we would check that the date when a changeset claims to have
been signed is within the validity period of the key.  This requires
more GPG integration than may at the moment be possible, but in theory
we can do it.  We also need to make sure that commits are in order by
date, or at least reasonably close to being in order (to allow for some
clock skew).

One interesting case is when a version is committed for which both the
public and private keys have been lost.  This will always be untrusted,
but that should not prevent people from continuing to use the archive
if they can accept that.  This suggests that perhaps we should allow
for multiple signatures on a single revision.

Encumbrance attacks
-------------------

A special case where we need to be able to destroy history to avoid a
legal problem.  This is allowed as discussed elsewhere: either destroy
commits from the tail backwards, or equivalently branch from a previous
revision and replace the branch with that.  People who saw the original
branch can still prove it happened; people who look in the future will
not see any record.  Either way, this probably requires physical branch
access.

Multiple signature keys
-----------------------

Should we allow for several signatures on a single changeset?  What
would that mean?  How do we know which signatures are meaningful or
worthwhile?

Forensics
---------

dwheeler:

  [O]nce you find out who did a malicious act, the SCM should make it
  easy to identify all of their actions.  In short, if you make it easy
  to catch someone, you increase the attackers' risk... and that means
  the attacker is less likely to do it.

dwheeler asks that the committer's IP address be recorded.  Putting
this in the changeset seems to cause too much of a
privacy/confidentiality problem.  However, an active server might
reasonably record the IPs of all clients.

Non-repudiation
---------------

If a changeset has propagated to Bob, signed by Alice's key, then Bob
can prove that someone possessing Alice's key signed it.  Alice's only
way out is to claim her key was stolen.

Trusted review
--------------

This can be handled by importing onto another branch, with various
levels for "quickly checked", "deeply trusted", etc.

(Is it really necessary to import onto a new branch rather than add
annotations to existing branches?  Copying the whole text seems a bit
redundant.  This might be a nice place for arch-style taggings, where
we just add a reference to another branch.)

Hooks
-----

Automatically running hooks downloaded from someone else is dangerous.
In particular, the user may not have the chance to check that the hooks
are reasonable before they are run.

Conversely, users can subvert client-side hooks.  If we want to run a
check before accepting code onto a shared branch, that check must run
on the server.  Enforcing server-side checks gives a good way to run
build, formatting, suspiciousness checks, etc.  This implies that write
access to a repository is through a mediating daemon rather than by
writing to it directly.
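As a rough illustration of such a mediating check, the daemon might run
something like the sketch below before accepting a changeset.  The
``check_incoming_changeset`` function, the frozen-path rule and the
size limit are assumptions made for illustration; bzr does not define
this interface::

    # Illustrative server-side sketch only; not part of bzr.
    import re

    MAX_CHANGESET_BYTES = 10 * 1024 * 1024
    FROZEN_PREFIXES = ('release-docs/',)   # hypothetical paths locked for a release

    def check_incoming_changeset(author, changed_paths, changeset_text):
        """Return a list of reasons to reject the changeset; empty means accept."""
        reasons = []
        # Identity sanity check: authors are identified by email-style addresses.
        if not re.match(r'^[^@\s]+@[^@\s]+$', author):
            reasons.append('author %r is not an email-style identity' % (author,))
        # Suspiciousness check: reject absurdly large submissions.
        if len(changeset_text) > MAX_CHANGESET_BYTES:
            reasons.append('changeset is larger than %d bytes' % MAX_CHANGESET_BYTES)
        for path in changed_paths:
            # Changesets must not touch anything outside the branch.
            if path.startswith('/') or '..' in path.split('/'):
                reasons.append('path %r escapes the branch' % (path,))
            # Policy check: some subtrees may be frozen before a release.
            if path.startswith(FROZEN_PREFIXES):
                reasons.append('path %r is frozen for the release' % (path,))
        return reasons

Because such a check runs on the server, a client cannot bypass it
simply by deleting a local hook.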
Signing
-------

We use signing to prove that a particular person (or 'principal',
possibly a robot) committed a particular changeset.  It is the job of
external signing software to help work out whether this is true or not.
This has several parts:

* Mathematical verification that a signature on a particular changeset
  header document is correct.

* Determining that the signature corresponds to a particular public
  key.

* Determining that the public key corresponds to the person claimed to
  have authored the changeset (identified by email address).

The second two are really PKI functions, and somewhat harder than the
first.  The canonical implementation is to use GPG/OpenPGP, but
anything will do.  There are simpler RSA/DSA implementations which
assume each user manually builds a list of trusted keys.

This leaves open the question of which people should be trusted to
provide software on a particular branch, or at all.  This is not a very
easy question for software to answer.  We assume that people will know
by other means.  For public code, it may be that all changesets are
re-signed by, say, samba-team@samba.org.

I think it is fair to distinguish people by an email address, or at
least by $ID@$DOMAIN.  There is no need for this address to actually
receive email, so spam need not be a problem.

The signing design is inspired by the very usable security afforded by
OpenSSH: it automatically protects where it can, and allows higher
security to users who want to do some work (by offline verification of
signatures).  Using a signing mechanism other than GPG, when key
developers already have GPG and there is a big infrastructure to
support it, seems undesirable.  It is true that GPG is quite complex.

The purpose of signing is to protect against unauthorized modification
of archives.  Bazaar-NG can apply a GPG signature to both patches and
manifests.  This allows a later proof that the revision and the
changeset were produced by the author they claim to have been written
by.

We cannot cryptographically prove that a particular patch was merged
into a branch, because the person doing the merge might have subverted
the patch in the process of merging it.  All we can prove
cryptographically is that the merge committer asserts they took the
patch.

GPGME and PyMe seem to give a reasonable interface for doing this:
there is a function to check a signature, and the return value
indicates the signing name, with possible errors including a missing
key, etc.
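For a flavour of the verification step, here is a minimal sketch that
shells out to the ``gpg`` command line rather than going through GPGME
or PyMe; the ``verify_changeset_signature`` function and the layout of
a detached signature alongside the changeset header are assumptions
made for illustration::

    # Illustrative sketch only; not bzr's signing implementation.
    import subprocess

    def verify_changeset_signature(header_path, signature_path):
        """Check a detached GPG signature over a changeset header.

        Returns (ok, details), where details is gpg's human-readable
        report (the signer's user id, or errors such as a missing public
        key).  A more careful implementation would parse --status-fd
        output rather than relying on the exit code alone.
        """
        result = subprocess.run(
            ['gpg', '--batch', '--verify', signature_path, header_path],
            capture_output=True, text=True)
        # gpg prints the signer and any errors on stderr.
        return result.returncode == 0, result.stderr.strip()

Anything beyond "the signature is mathematically valid" -- in
particular, whether the key really belongs to the claimed author -- is
still left to the PKI layer described above.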
Sign branches, not revisions
''''''''''''''''''''''''''''

Aaron Bentley suggested the interesting idea of signing the mapping of
revisions onto branches, rather than the revisions themselves.  For
example, a branch could contain just a signed pointer to the most
recent revision.  (It probably is still useful to be able to check
signatures on previous revisions, for example when recovering from an
intrusion.)

Protocol attacks
----------------

Both client and server should be resistant to malicious changesets,
network requests, etc.  There's no easy solution.

* Defense in depth.  Check reasonableness at various points.

* Disallow changesets that try to change files outside of the branch.

Availability
------------

bzr can be configured so as to have no single point of failure to a
denial-of-service attack (or at least nearly none):

* Any number of mirrors of a branch can be kept.

* If a central server is taken out, developers can continue working
  with the state they already have (unbind their branches), and can
  collaborate by email or other means until the server is repaired or
  replaced.

* The origin branch can be on a machine whose location is secret and
  which is not directly publicly accessible.

* Branches can be moved between machines or IP addresses without
  disrupting anything else.

* Branches can be moved around out-of-band, as tarballs over
  BitTorrent, etc.

I think the only possible denial-of-service attacks are those that aim
to shut down the entire network, or to block communication with
individual developers, for example by flooding their email address.
But if those people can get connected through some other means, they
can continue.