The question of what UUIDs are and what they’re good for has come up a few times recently. In this quick answers post I explain what they are and provide reference links to further material, as well as a Hasura Short video on implementing and using them in Postgres & Hasura.
A Hasura Bit
UUID stands for universally unique identifier. Another common term for a UUID is GUID, or globally unique identifier, which is the name Microsoft uses for its UUIDs. Just know that a GUID is a UUID; it’s simply one company’s naming convention versus the standard industry naming convention.
UUIDs were standardized by the Open Software Foundation (OSF) as part of the Distributed Computing Environment (DCE). They are also covered by an ISO/IEC spec, and if you’d like to know more about the standards they’re based on, check out the Wikipedia page @ https://en.wikipedia.org/wiki/Universally_unique_identifier
In the canonical textual representation, the 16 octets of a UUID are represented as 32 hexadecimal (base-16) digits, displayed in five groups separated by hyphens in 8-4-4-4-12 format. The 32 hexadecimal digits plus the 4 hyphens make a total of 36 characters for display. If you store a UUID as a string, for example, the string needs to be able to hold 36 characters.
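To make those numbers concrete, here is a minimal sketch using Python’s standard `uuid` module (picked purely for illustration; any language’s UUID library should show the same sizes). It generates a random UUID and checks the lengths described above.

```python
import uuid

# Generate a random (version 4) UUID; any version has the same layout.
u = uuid.uuid4()
text = str(u)

raw_octets = len(u.bytes)      # 16 raw octets
hex_digits = len(u.hex)        # 32 hexadecimal digits, no hyphens
display_chars = len(text)      # 32 digits + 4 hyphens
group_sizes = [len(group) for group in text.split("-")]

print(text)
print(raw_octets, hex_digits, display_chars, group_sizes)
```

If a column or field stores the hyphenated form, it needs room for `display_chars` characters; storing the raw octets only needs 16 bytes.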
Uses for a UUID
The first key use case for a UUID is generating a completely unique value to identify a set of related data. UUIDs are perfect for primary keys in a database, or simply any type of key that needs to ensure uniqueness across a system.
UUIDs can also be generated from many different origin points without any significant concern for collision (i.e. duplicate UUIDs). For example, the database itself has functions that can generate a UUID as a default value at the time a row is inserted, which means a client inserting data doesn’t need to generate the UUID. Alternatively, this responsibility can be flipped over to the client side, and the client side development stack (e.g. Go UUID generation) can generate the UUID. The client can then create a parent entity with a UUID as its primary key, use that same UUID as the foreign key for related items, and so on down the chain of a relationship. Once all of these are created on the client side they can be inserted in a batch and, if ordered appropriately, made transactional to ensure the integrity of the data.
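The client-side flow described above can be sketched roughly as follows. This is only an illustration under some loud assumptions: it uses Python with an in-memory SQLite database as a stand-in for Postgres, and the `authors` and `posts` tables and their columns are hypothetical names invented for the example.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE authors (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    "id TEXT PRIMARY KEY, "
    "author_id TEXT NOT NULL REFERENCES authors(id), "
    "title TEXT NOT NULL)"
)

# The client generates every key up front, before touching the database.
author_id = str(uuid.uuid4())
posts = [
    (str(uuid.uuid4()), author_id, title)
    for title in ("First post", "Second post")
]

# Parent first, then children, all inside one transaction:
# the `with` block commits on success and rolls back on any error.
with conn:
    conn.execute(
        "INSERT INTO authors (id, name) VALUES (?, ?)", (author_id, "Ada")
    )
    conn.executemany(
        "INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)", posts
    )

post_count = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE author_id = ?", (author_id,)
).fetchone()[0]
print(post_count)
```

Because the client already knows `author_id` before any insert happens, it never needs a round trip to the database to learn the parent’s key.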
I wanted to ensure the functions for the other versions worked too, so I added some code to create and print out each of them. At the same time, I’ve added a description of each version as I worked through creating them.
A version 1 UUID concatenates the 48-bit MAC address of the machine creating the UUID with a 60-bit timestamp. If the clock does not advance fast enough, there is a 14-bit clock sequence that extends the timestamp to ensure uniqueness. Based on these creation parameters (60 + 14 = 74 bits, or 2^74 values) there is a maximum of roughly 18 sextillion version 1 UUIDs that can be generated per node. So ya know, don’t get carried away or anything. 😛
It’s also important to note, albeit obvious, that this UUID can be traced back to the MAC address that was used to create it.
The code for this UUID creation is shown above in the first example.
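As a side illustration of the traceability and capacity points, here is a small sketch in Python’s standard `uuid` module. The node value is a made-up MAC address passed in explicitly so the example is reproducible; left to its defaults, `uuid.uuid1()` tries to use the machine’s real hardware address.

```python
import uuid

# Hypothetical MAC address 00:1a:2b:3c:4d:5e, supplied explicitly.
node = 0x001A2B3C4D5E
u = uuid.uuid1(node=node, clock_seq=0x1234)

# The node field comes straight back out of the UUID, which is why a
# version 1 UUID can be traced to the machine that created it.
mac = ":".join(f"{(u.node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))

# 60 timestamp bits + 14 clock sequence bits per node.
per_node_capacity = 2 ** (60 + 14)

print(u, "version", u.version)
print("node recovered from UUID:", mac)
print("timestamp (100 ns ticks since 1582-10-15):", u.time)
print("max version 1 UUIDs per node:", per_node_capacity)
```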
Version 2 is reserved for DCE Security UUIDs. The RFC (4122) is a bit light on details, but the DCE 1.1 Authentication and Security Services specification clarifies a bit more. Overall this UUID is generally similar to a version 1 UUID, except that the least significant 8 bits of the clock sequence (clock_seq_low) are replaced by a local domain number, and the least significant 32 bits of the timestamp are replaced by an integer identifier that is meaningful within that local domain.
Updated code with a working example of the specific domains used to create a v2 UUID.
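For readers who just want to see the bit layout, here is a rough Python sketch. Python’s standard `uuid` module has no version 2 generator, so this builds one by hand from a version 1 UUID, following the field replacement described above. The helper name `dce_security_uuid`, the node value, and the identifier `1000` are all hypothetical, and the domain constant assumes the DCE convention that 0 means “person”.

```python
import uuid

DOMAIN_PERSON = 0  # assumed DCE local domain number for a person/user ID


def dce_security_uuid(domain: int, local_id: int, node: int) -> uuid.UUID:
    """Hypothetical helper: derive a version 2 style UUID from a version 1 UUID."""
    v1 = uuid.uuid1(node=node)
    fields = (
        local_id & 0xFFFFFFFF,     # time_low replaced by the 32-bit identifier
        v1.time_mid,               # remaining timestamp bits are kept
        v1.time_hi_version,        # version bits are overwritten below
        v1.clock_seq_hi_variant,   # high clock sequence bits are kept
        domain & 0xFF,             # clock_seq_low replaced by the domain
        v1.node,
    )
    # version=2 makes the constructor set the version and variant bits.
    return uuid.UUID(fields=fields, version=2)


v2 = dce_security_uuid(DOMAIN_PERSON, local_id=1000, node=0x001A2B3C4D5E)
print(v2, "version", v2.version)
```

Since the identifier overwrites the low 32 bits of the timestamp, far fewer distinct version 2 UUIDs can be produced per node, domain, and identifier than version 1 UUIDs.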
Versions 3 and 5 are similar: both are generated by hashing a namespace identifier and a name. Version 5 uses SHA-1 and version 3 uses MD5 as the hashing algorithm. The namespace identifier is itself a UUID, and predefined ones exist to represent the namespaces for URLs, fully qualified domain names (FQDNs), object identifiers, and X.500 distinguished names. Other UUIDs could be used as namespace designators, but the aforementioned are the ones usually used.
To note, RFC 4122 prefers version 5 (SHA-1) over version 3 (MD5), and recommends against using UUIDs of either version as security credentials.
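A short sketch of name-based UUIDs, again using Python’s standard `uuid` module for illustration. The name `example.com` is just a placeholder. The last part recomputes the version 5 value by hand with `hashlib` to show the hashing step; it mirrors what the library does rather than being something you would normally write yourself.

```python
import hashlib
import uuid

name = "example.com"  # placeholder name inside the DNS (FQDN) namespace

v3 = uuid.uuid3(uuid.NAMESPACE_DNS, name)  # MD5 based
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, name)  # SHA-1 based

# The same namespace and name always produce the same UUID.
v5_again = uuid.uuid5(uuid.NAMESPACE_DNS, name)

# By hand: hash the namespace's 16 octets followed by the name, keep the
# first 16 octets of the digest, then set the version and variant bits.
digest = hashlib.sha1(uuid.NAMESPACE_DNS.bytes + name.encode("utf-8")).digest()
manual_v5 = uuid.UUID(bytes=digest[:16], version=5)

print("v3:", v3)
print("v5:", v5)
print("manual v5 matches:", manual_v5 == v5)
```

Because the output is deterministic, anyone who knows the namespace and name can recompute the UUID, which is one reason these are unsuitable as security credentials.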