Terminology
To stay sane and communicate effectively, we try hard to use consistent terminology across all parts of the project. Please let us know if anything here is confusing or missing.
- blob
- an immutable sequence of 0 or more bytes, with no extra metadata
- blobref
- a reference to a blob, consisting of a cryptographic hash
function name and that hash function's digest of the blob's bytes,
in hex. Examples of valid blobrefs include:
sha1-f1d2d2f924e986ac86fdf7b36c94bcdf32beec15
md5-d3b07384d113edec49eaa6238ad5ff00
sha256-b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c
Concatenating the two together with a hyphen is the common representation, with both parts in all lower case.
- blob server
- the simplest and lowest layer of the Perkeep servers (see: architecture). A blob server, while
potentially shared between users, is logically private to a
single user and holds that user's blobs (whatever they may represent).
The protocol to speak with a blob server is simply:
- get a blob by its blobref.
- put a blob by its blobref.
- enumerate all your blobs, sorted by their blobrefs. Enumeration is only really used by your search server and by a full sync between your blob server mirrors.
(Note: no delete operation)
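The blobref format and the three blob server operations can be sketched in a few lines of Python. This is a minimal illustration, not Perkeep's implementation: the names `blobref` and `MemoryBlobServer` are hypothetical, storage is a plain in-memory dict, and rejecting a put whose bytes do not hash to the given blobref is an assumption about sensible behavior rather than something stated above.

```python
import hashlib


def blobref(data, hash_name="sha1"):
    """Return '<hash name>-<hex digest>' for the given bytes, all lower case."""
    name = hash_name.lower()
    digest = hashlib.new(name, data).hexdigest()  # hexdigest() yields lower-case hex
    return name + "-" + digest


class MemoryBlobServer:
    """Hypothetical in-memory stand-in for a blob server (not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}

    def put(self, ref, data):
        # Assumption: refuse bytes that do not hash to the claimed blobref.
        hash_name = ref.split("-", 1)[0]
        if blobref(data, hash_name) != ref:
            raise ValueError("blobref %s does not match the blob's bytes" % ref)
        self._blobs[ref] = data  # re-putting an identical blob changes nothing

    def get(self, ref):
        return self._blobs[ref]  # raises KeyError if the blob is absent

    def enumerate(self):
        # All stored blobrefs, sorted.
        return sorted(self._blobs)

    # Deliberately no delete method, mirroring the note above.
```

Because blobs are immutable and named by their own digest, putting the same blob twice is harmless, which is part of what keeps the protocol this small.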
- schema blob
- a Perkeep-recognized data structure, serialized as a JSON
object (map). A schema blob must have the top-level keys camliVersion and camliType
and start with an open brace ({, byte 0x7B). You may use any valid JSON serialization library to generate schema blobs. Whitespace and formatting don't matter, as long as the blob starts with { and is valid JSON in its entirety.
Example:
{ "aKey": "itsValue", "camliType": "foo", "camliVersion": 1, "somethingElse": [1, 2, 3] }
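The requirements above translate fairly directly into a check. The following is a rough sketch, assuming the blob is UTF-8 encoded JSON; the function name `is_schema_blob` is made up for illustration and is not part of Perkeep.

```python
import json


def is_schema_blob(data):
    """Return True if the bytes meet the schema blob requirements listed above."""
    # The very first byte must be an open brace (0x7B); leading whitespace fails.
    if not data.startswith(b"{"):
        return False
    try:
        value = json.loads(data.decode("utf-8"))  # assumption: UTF-8 encoded JSON
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON (both subclass ValueError).
        return False
    # Must be a JSON object carrying both required top-level keys.
    return (
        isinstance(value, dict)
        and "camliVersion" in value
        and "camliType" in value
    )
```

Note that a blob beginning with a space or newline is rejected even if it is otherwise valid JSON, since the first byte has to be the brace itself.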
- signed schema blob (aka "claim")
- if you sign a schema blob, it's now a "signed schema blob" or "claim". The terms are used pretty interchangeably but generally it's called a claim when the target of the schema blob is an object's permanode (see below).
- object
- something that's mutable. While a blob is a single immutable thing, an object is a collection of claims that mutate it over time. See permanode for a fuller discussion.
- permanode
- since an object is mutable and Perkeep is primarily content-addressed,
the question arises how you could have a stable reference to something that's
changing. Perkeep solves this with the concept of a permanode.
Like a permalink on the web, a permanode is a stable link to a Perkeep object.
A permanode is simply a signed schema blob with no data inside that would be interesting to mutate. See the permanode spec.
A permanent reference to a mutable object then is simply the blobref of the permanode.
The signer of a permanode is its owner. The search server and indexer will take this into account. While multiple users may collaborate on mutating an object (by all creating new, signed mutation schema blobs), the owner ultimately decides the policies on how the mutations are respected.
Example permanode blob (as generated with pk put --permanode):
{"camliVersion": 1, "camliSigner": "sha1-c4da9d771661563a27704b91b67989e7ea1e50b8", "camliType": "permanode", "random": "HJ#/s#S+Q$rh:lHJ${)v" ,"camliSig":"iQEcBAABAgAGBQJNQzByAAoJEGjzeDN/6vt85G4IAI9HdygAD8bgz1BnRak6fI+L1dT56MxNsHyAoJaNjYJYKvWR4mrzZonF6l/I7SlvwV4mojofHS21urL8HIGhcMN9dP7Lr9BkCB428kvBtDdazdfN/XVfALVWJOuZEmg165uwTreMOUs377IZom1gjrhnC1bd1VDG7XZ1bP3PPxTxqppM0RuuFWx3/SwixSeWnI+zj9/Qon/wG6M/KDx+cCzuiBwwnpHf8rBmBLNbCs8SVNF3mGwPK0IQq/l4SS6VERVYDPlbBy1hNNdg40MqlJ5jr+Zln3cwF9WzQDznasTs5vK/ylxoXCvVFdOfwBaHkW1NHc3RRpwR0wq2Q8DN3mQ==gR7A"}
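To make the relationship between objects, claims, and permanodes more concrete, here is a small sketch of how an object's current state might be derived by replaying claims that target its permanode. Everything about the claim layout is an assumption for illustration: the field names (`permaNode`, `claimDate`, `claimType`, `attribute`, `value`) and the three claim types are not defined on this page, signature verification is skipped entirely, and the owner's policy on whose claims to respect is ignored.

```python
def current_attributes(permanode_ref, claims):
    """Replay claims against one permanode, oldest first, and return its attributes."""
    attrs = {}
    # Only claims that target this permanode affect this object.
    relevant = [c for c in claims if c["permaNode"] == permanode_ref]
    # Assumption: claimDate values are timestamps that sort chronologically as strings.
    for claim in sorted(relevant, key=lambda c: c["claimDate"]):
        name = claim["attribute"]
        kind = claim["claimType"]
        if kind == "set-attribute":
            attrs[name] = [claim["value"]]  # replace any earlier values
        elif kind == "add-attribute":
            attrs.setdefault(name, []).append(claim["value"])
        elif kind == "del-attribute":
            attrs.pop(name, None)  # drop the attribute if present
    return attrs
```

The permanode itself never changes; only the set of claims pointing at it grows, which is why its blobref can serve as the stable reference.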
- frontend
- the public-facing server that handles sharing and unified access to both your blob server and search server. (see architecture diagram)
- full sync
- synchronizing all your blobs between two or more of your blob servers
(e.g. mirroring between your house, App Engine, and Amazon).
Generally a full sync will be done with the blob server's enumerate support and no knowledge of the schema. It's a dumb copy of all blobs that the other party doesn't already have.
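A one-directional full sync along these lines could look roughly like the sketch below. It is an illustration under simplifying assumptions: each blob server is modeled as a plain dict mapping blobref to bytes, `sorted(src)` and `set(dst)` stand in for the servers' enumerate support, and the function name `full_sync` is hypothetical.

```python
def full_sync(src, dst):
    """Copy every blob that src has and dst lacks; return the copied blobrefs."""
    have = set(dst)  # stands in for enumerating the destination
    copied = []
    for ref in sorted(src):  # stands in for the source's sorted enumeration
        if ref not in have:
            dst[ref] = src[ref]  # blobs are opaque here; no schema knowledge needed
            copied.append(ref)
    return copied
```

Running it once in each direction would bring two mirrors to the same set of blobs, and running it again afterwards copies nothing.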
- graph sync
- as opposed to a full sync, a graph sync is synchronizing a sub-graph of your blobs between blob servers. This level of sync will operate with knowledge of the schema.
- search server
- indexer
- TODO: finish documenting