================
Network Protocol
================

:Date: 2007-09-03


.. contents::


Overview
========

The smart protocol provides a way to send requests and receive the
corresponding responses when communicating with a remote bzr process.

Layering
========

Medium
------

At the bottom level there is either a socket, pipes, or an HTTP
request/response.  We call this layer the *medium*.  It is responsible for
carrying bytes between a client and server.  For sockets, a single
connection carries multiple requests, and the end of the stream shows up
as a read error once the other side has shut down.  For pipes, the read
pipe returns a zero-length read, which marks end-of-file.  In an HTTP
server environment there is no end-of-stream, because each request coming
into the server is independent.

So we need a wrapper around pipes and sockets to separate requests from
the underlying substrate.  This gives us a single model that is consistent
across HTTP, sockets and pipes.

Protocol
--------

On top of the medium is the *protocol*.  This is the layer that
deserialises bytes into the structured data that requests and responses
consist of.

Request/Response processing
---------------------------

On top of the protocol is the logic for processing requests (on the
server) or responses (on the client).

Server-side
-----------

Sketch::

 MEDIUM  (factory for protocol, reads bytes & pushes to protocol,
          uses protocol to detect end-of-request, sends written
          bytes to client) e.g. socket, pipe, HTTP request handler.
  ^
  | bytes.
  v

 PROTOCOL (serialization, deserialization)  accepts bytes for one
          request, decodes according to internal state, pushes
          structured data to handler.  accepts structured data from
          handler and encodes and writes to the medium.  factory for
          handler.
  ^
  | structured data
  v

 HANDLER  (domain logic) accepts structured data, operates state
          machine until the request can be satisfied,
          sends structured data to the protocol.

Request handlers are registered in the `bzrlib.smart.request` module.

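
The server-side layering can be illustrated with a toy example.  The
sketch below is only an illustration, not bzrlib code: the class names
``ListMedium``, ``LineProtocol`` and ``EchoHandler`` are hypothetical, and
the wire format (one newline-terminated line of 0x01-separated arguments
per request) is a simplifying assumption.  It shows the medium pushing
bytes to the protocol, the protocol detecting the end of a request and
handing structured data to the handler, and the reply travelling back
down::

  class EchoHandler:
      """Hypothetical handler: domain logic that works on structured data."""

      def handle(self, args):
          # Answer every request by echoing its arguments after b"ok".
          return (b"ok",) + tuple(args)


  class LineProtocol:
      """Toy protocol: decodes bytes to tuples and encodes replies.

      Assumes one request is a newline-terminated line of 0x01-separated
      arguments; this is a simplification made for the sketch.
      """

      def __init__(self, handler, write):
          self._handler = handler
          self._write = write      # callable provided by the medium
          self._buffer = b""

      def accept_bytes(self, data):
          # Buffer bytes until a complete request (one line) is available.
          self._buffer += data
          while b"\n" in self._buffer:
              line, self._buffer = self._buffer.split(b"\n", 1)
              args = tuple(line.split(b"\x01"))
              response = self._handler.handle(args)
              self._write(b"\x01".join(response) + b"\n")


  class ListMedium:
      """Toy medium: carries bytes in, and records the bytes written out."""

      def __init__(self, handler):
          self.sent = []
          self._protocol = LineProtocol(handler, self.sent.append)

      def receive(self, data):
          self._protocol.accept_bytes(data)


  medium = ListMedium(EchoHandler())
  # The request arrives in two pieces; nothing is answered until the
  # protocol has seen the terminating newline.
  medium.receive(b"hello\x01wor")
  assert medium.sent == []
  medium.receive(b"ld\n")
  print(medium.sent)

Note that it is the protocol, not the medium, that decides when a request
is complete; the medium only moves bytes.
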

Client-side
-----------

Sketch::

 CLIENT   domain logic, accepts domain requests, generates structured
          data, reads structured data from responses and turns into
          domain data.  Sends structured data to the protocol.
          Operates state machines until the request can be delivered
          (e.g. reading from a bundle generated in bzrlib to deliver a
          complete request).

          This is RemoteBzrDir, RemoteRepository, etc.
  ^
  | structured data
  v

 PROTOCOL (serialization, deserialization)  accepts structured data for
          one request, encodes and writes to the medium.  Reads bytes
          from the medium, decodes and allows the client to read
          structured data.
  ^
  | bytes.
  v

 MEDIUM   accepts bytes from the protocol & delivers to the remote server.
          Allows the protocol to read bytes e.g. socket, pipe, HTTP request.

The domain logic is in `bzrlib.remote`: `RemoteBzrDir`, `RemoteBranch`,
and so on.

There is also a plain file-level transport that calls remote methods to
manipulate files on the server in `bzrlib.transport.remote`.

Protocol description
====================

Version one
-----------

Version one of the protocol was introduced in Bazaar 0.11.

The protocol (for both requests and responses) is described by::

  REQUEST := MESSAGE_V1
  RESPONSE := MESSAGE_V1
  MESSAGE_V1 := ARGS [BODY]
  ARGS := ARG [MORE_ARGS] NEWLINE
  MORE_ARGS := SEP ARG [MORE_ARGS]
  SEP := 0x01
  BODY := LENGTH NEWLINE BODY_BYTES TRAILER
  LENGTH := decimal integer
  TRAILER := "done" NEWLINE

That is, a tuple of arguments separated by Ctrl-A and terminated with a
newline, optionally followed by a length-prefixed body with a constant
trailer.  Note that although arguments are not 8-bit safe (they cannot
include 0x01 or 0x0a bytes without breaking the protocol encoding), the
body is.

Version two
-----------

Version two was introduced in Bazaar 0.16.

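
Version two keeps the version one argument and body layout, so it helps to
see that layout concretely.  The following Python sketch encodes and
decodes a version one message according to the version one grammar.  The
function names and the ``example-method`` verb are hypothetical, and the
decoder assumes the whole message is already in memory and that the caller
knows whether a body is expected::

  def encode_v1(args, body=None):
      """Encode a tuple of byte-string args and an optional body."""
      for arg in args:
          # Arguments are not 8-bit safe: SEP and NEWLINE are reserved.
          if b"\x01" in arg or b"\n" in arg:
              raise ValueError("argument contains 0x01 or 0x0a")
      message = b"\x01".join(args) + b"\n"
      if body is not None:
          length = str(len(body)).encode("ascii")
          message += length + b"\n" + body + b"done\n"
      return message


  def decode_v1(data, has_body):
      """Decode one complete message into (args, body)."""
      args_line, rest = data.split(b"\n", 1)
      args = tuple(args_line.split(b"\x01"))
      if not has_body:
          return args, None
      length_line, rest = rest.split(b"\n", 1)
      length = int(length_line.decode("ascii"))
      body = rest[:length]
      if rest[length:] != b"done\n":
          raise ValueError("missing or misplaced trailer")
      return args, body


  # The body may contain any bytes, including 0x01 and newlines.
  message = encode_v1((b"example-method", b"some/path"), b"x\x01y\n")
  print(message)
  print(decode_v1(message, has_body=True))
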
The request protocol is::

  REQUEST_V2 := "bzr request 2" NEWLINE MESSAGE_V2

The response protocol is::

  RESPONSE_V2 := "bzr response 2" NEWLINE RESPONSE_STATUS NEWLINE MESSAGE_V2
  RESPONSE_STATUS := "success" | "failed"

Future versions should follow this structure, like version two does::

  FUTURE_MESSAGE := VERSION_STRING NEWLINE REST_OF_MESSAGE

This is so that clients and servers can read bytes up to the first newline
byte to determine what version a message is.

For compatibility with all versions (past and future) of bzr clients,
servers that receive a request in an unknown protocol version should
respond with a single-line error terminated with 0x0a (NEWLINE), rather
than a structured response prefixed with a version string.

Version two of the message protocol is::

  MESSAGE_V2 := ARGS [BODY_V2]
  BODY_V2 := BODY | STREAMED_BODY

That is, a version one length-prefixed body, or a version two streamed
body.

Version two with streamed bodies
--------------------------------

An extension to version two allows streamed bodies.  A streamed body looks
a lot like HTTP's chunked encoding::

  STREAMED_BODY := "chunked" NEWLINE CHUNKS TERMINATOR
  CHUNKS := CHUNK [CHUNKS]
  CHUNK := HEX_LENGTH CHUNK_CONTENT
  HEX_LENGTH := HEX_DIGITS NEWLINE
  CHUNK_CONTENT := bytes
  TERMINATOR := SUCCESS_TERMINATOR | ERROR_TERMINATOR
  SUCCESS_TERMINATOR := 'END' NEWLINE
  ERROR_TERMINATOR := 'ERR' NEWLINE CHUNKS SUCCESS_TERMINATOR

That is, the body consists of a series of chunks.  Each chunk starts with
a length prefix in hexadecimal digits, followed by an ASCII newline byte.
The end of the body is signalled by ``END\n``, or by ``ERR\n`` followed by
error args, one per chunk, and then ``END\n``.  Note that these args are
8-bit safe, unlike request args.

A streamed body starts with the string "chunked" so that legacy clients
and servers will not mistake the first chunk for the start of a version
one body.

The type of body (length-prefixed or chunked) in a response is always the
same for a given request method.
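
The streamed body grammar can be sketched in Python as follows.  This is
an illustration under stated assumptions rather than the bzrlib
implementation: the function names are hypothetical, the encoder writes
lowercase hexadecimal (the grammar does not say which case to use), the
error name ``SomeError`` is made up, and the decoder expects the complete
body to be in memory::

  def _chunk(content):
      # HEX_LENGTH NEWLINE followed by the raw chunk bytes.
      return b"%x\n" % len(content) + content


  def encode_streamed_body(chunks, error_args=None):
      """Encode chunks, optionally ending with an error instead of success."""
      out = [b"chunked\n"]
      out.extend(_chunk(c) for c in chunks)
      if error_args is not None:
          # ERROR_TERMINATOR: 'ERR', one chunk per error arg, then 'END'.
          out.append(b"ERR\n")
          out.extend(_chunk(a) for a in error_args)
      out.append(b"END\n")
      return b"".join(out)


  def decode_streamed_body(data):
      """Return (chunks, error_args); error_args is None on success."""
      prefix = b"chunked\n"
      if not data.startswith(prefix):
          raise ValueError("not a streamed body")
      pos = len(prefix)
      chunks, error_args = [], None
      current = chunks
      while True:
          end = data.index(b"\n", pos)
          line = data[pos:end]
          pos = end + 1
          if line == b"END":
              return chunks, error_args
          if line == b"ERR":
              # Everything up to END is now an error argument.
              error_args = []
              current = error_args
              continue
          length = int(line.decode("ascii"), 16)
          current.append(data[pos:pos + length])
          pos += length


  ok_body = encode_streamed_body([b"hello", b"\n\x01!"])
  failed_body = encode_streamed_body([b"partial"],
                                     error_args=[b"SomeError", b"detail"])
  print(ok_body)
  print(decode_streamed_body(failed_body))

Because each chunk is length-prefixed, chunk content may contain newlines
or the bytes ``END`` without confusing the decoder.
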
Only new request methods introduced in Bazaar 0.91 and later use streamed
bodies.

Version three
-------------

.. note::

  For some discussion of the requirements that led to this new protocol
  version, see `bug #83935`_.

.. _bug #83935: https://bugs.launchpad.net/bzr/+bug/83935

Version three has bencoding of most protocol structures, to make parsing
simpler.  For extra parsing convenience, these structures are length
prefixed::

  LENGTH_PREFIX := 32-bit unsigned integer in network byte order

Unlike earlier versions, clients and servers are no longer required to
know which request verbs and responses will have bodies attached.  Because
of length-prefixing and other changes, it is always possible to know when
a complete request or response has been read, even if the server
implements no verbs.

The underlying message format is::

  MESSAGE := "bzr message 3 (bzr 1.6)" NEWLINE HEADERS MESSAGE_PARTS
  HEADERS := LENGTH_PREFIX bencoded_dict
  MESSAGE_PARTS := MESSAGE_PART [MORE_MESSAGE_PARTS]
  MORE_MESSAGE_PARTS := END_MESSAGE_PARTS | MESSAGE_PARTS
  END_MESSAGE_PARTS := "e"
  MESSAGE_PART := ONE_BYTE | STRUCTURE | BYTES
  ONE_BYTE := "o" byte
  STRUCTURE := "s" LENGTH_PREFIX bencoded_structure
  BYTES := "b" LENGTH_PREFIX bytes

This format allows an arbitrary sequence of message parts to be encoded
in a single message.

Headers
~~~~~~~

Each request and response will have “headers”, a dictionary of key-value
pairs.  The keys must be strings, not any other type of value.

Currently, the only defined header is “Software version”.  Both the client
and the server should include a “Software version” header, with a value of
a free-form string such as “bzrlib 1.5”, to aid debugging and logging.
Clients and servers **should not** vary behaviour based on this string.

Conventional requests and responses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

By convention, most requests and responses have a simple “arguments plus
optional body” structure, as in earlier protocol versions.
This section describes how such messages are encoded.  All requests and
responses defined by earlier protocol versions must be encoded in this
way.

Conventional requests will send a sequence of:

* Arguments (a STRUCTURE of a tuple)

* (Optional) body

  * Single body (BYTES), or

  * Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)

    * if status is "E", followed by an Error (STRUCTURE)

Conventional responses will send a sequence of:

* Status (ONE_BYTE)

* Arguments (a STRUCTURE of a tuple)

* (Optional) body

  * Single body (BYTES), or

  * Streamed body (multiple BYTES parts), followed by a status (ONE_BYTE)

    * if status is "E", followed by an Error (STRUCTURE)

In all cases, the ONE_BYTE status is either "S" for Success or "E" for
Error.  Note that the streamed body from version two is now just multiple
BYTES parts.

For new methods, these sequences are just a convention and may be varied
if appropriate for a particular request or response.  However, each
request should at least start with a STRUCTURE encoding the arguments
tuple.  The first element of that tuple must be a string that names the
request method.  (Note that arguments in this protocol version are
bencoded.  As a result, unlike previous protocol versions, arguments in
this version are 8-bit clean.)

For errors (where the Status byte of a response or a streamed body is
"E"), the situation is analogous to requests.  The first item in the
encoded sequence must be a string of the error name.  The other arguments
supply details about the error, and their number and types will depend on
the type of error (as identified by the error name).

Paths
=====

Paths are passed across the network.  The client needs to see a namespace
that includes any repository that might need to be referenced, and the
client needs to know about a root directory beyond which it cannot ascend.

Servers run over ssh will typically want to be able to access any path the
user can access.
Public servers on the other hand (which might be over http, ssh
or tcp) will typically want to restrict access to only a particular
directory and its children, so they will want to apply a software virtual
root at that level.  In other words they'll want to rewrite incoming paths
to be under that level (and prevent escaping using ../ tricks).  The
default implementation in bzrlib does this using the
`bzrlib.transport.chroot` module.

URLs that include ~ should probably be passed across to the server
verbatim so that the server can expand them.  This will probably not be
meaningful when access is limited to a directory?  See `bug 109143`_.

.. _bug 109143: https://bugs.launchpad.net/bzr/+bug/109143


Requests
========

The first argument of a request specifies the request method.

The available request methods are registered in `bzrlib.smart.request`.

**XXX**: ideally the request methods should be documented here.
Contributions welcome!


Recognised errors
=================

The first argument of an error response specifies the error type.

One possible error name is ``UnknownMethod``, which means the server does
not recognise the verb used by the client's request.  This error was
introduced in version three.

**XXX**: ideally the error types should be documented here.
Contributions welcome!

..
   vim: ft=rst tw=74 ai
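
To illustrate how an error travels in a version three conventional
response, the Python sketch below builds a complete message carrying an
``UnknownMethod`` error.  It is a sketch under stated assumptions, not the
bzrlib implementation: ``bencode``, ``length_prefixed`` and
``encode_message`` are hypothetical helpers written for this example, the
verb ``frobnicate`` is made up, and passing the unrecognised verb as the
second element of the error tuple is an assumption that this document
does not specify::

  import struct

  def bencode(value):
      """Minimal bencoder for bytes, ints, lists/tuples and dicts."""
      if isinstance(value, bytes):
          return str(len(value)).encode("ascii") + b":" + value
      if isinstance(value, int):
          return b"i" + str(value).encode("ascii") + b"e"
      if isinstance(value, (list, tuple)):
          return b"l" + b"".join(bencode(v) for v in value) + b"e"
      if isinstance(value, dict):
          # Bencoded dictionaries store their keys in sorted order.
          items = sorted(value.items())
          return (b"d" + b"".join(bencode(k) + bencode(v) for k, v in items)
                  + b"e")
      raise TypeError("cannot bencode %r" % (value,))


  def length_prefixed(payload):
      # LENGTH_PREFIX: 32-bit unsigned integer in network byte order.
      return struct.pack("!L", len(payload)) + payload


  def encode_message(headers, parts):
      """Encode headers plus a list of (kind, value) message parts."""
      out = [b"bzr message 3 (bzr 1.6)\n", length_prefixed(bencode(headers))]
      for kind, value in parts:
          if kind == "o":        # ONE_BYTE
              out.append(b"o" + value)
          elif kind == "s":      # STRUCTURE
              out.append(b"s" + length_prefixed(bencode(value)))
          elif kind == "b":      # BYTES
              out.append(b"b" + length_prefixed(value))
          else:
              raise ValueError("unknown message part kind %r" % (kind,))
      out.append(b"e")           # END_MESSAGE_PARTS
      return b"".join(out)


  message = encode_message(
      {b"Software version": b"bzrlib 1.5"},
      [("o", b"E"),                                   # status: Error
       ("s", (b"UnknownMethod", b"frobnicate"))])     # error name, details
  print(message)

A client reading this response sees the ``E`` status byte first and can
then decode the following STRUCTURE to find the error name.
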