On Saturday, April 29th, Tom Manville gave one of the most interesting presentations that the IET NorCal committee has had the pleasure of hosting to date. About 20 people listened attentively to his hour-and-a-half discussion of the architecture and technologies behind Dropbox.

 
His talk started by highlighting the size of the data problem that the engineers at Dropbox work with. For example, Dropbox has about 500 million users by the latest count, and those users had access to about 500 petabytes of storage in 2016, a significant increase from 40 petabytes in 2012. In bytes, 500 petabytes is a 5 followed by 17 zeros. To put that into perspective, that is enough storage to hold 14,000 times the amount of text held in all of the books in the Library of Congress. (It’s a lot!)
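As a quick sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This assumes decimal petabytes (10^15 bytes each) and uses only the numbers quoted above; the per-user figure is a simple average, not a statement about how much space any individual account actually has.

```python
# Assumption: 1 petabyte = 10**15 bytes (decimal, not binary, units).
BYTES_PER_PETABYTE = 10**15

storage_2012_pb = 40           # petabytes in 2012, as quoted in the talk
storage_2016_pb = 500          # petabytes in 2016, as quoted in the talk
users = 500_000_000            # about 500 million users

total_bytes = storage_2016_pb * BYTES_PER_PETABYTE
growth_factor = storage_2016_pb / storage_2012_pb
average_bytes_per_user = total_bytes // users

print(f"Total storage: {total_bytes:,} bytes")
print(f"Growth from 2012 to 2016: {growth_factor}x")
print(f"Average per user: {average_bytes_per_user:,} bytes")
```

On these numbers the storage grew 12.5-fold in four years, and the average works out to about one gigabyte per user.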

 
Tom then described the basic building blocks that Dropbox relies on to serve all this data to its users: physical storage, data security and the databases that power the services they provide.

 
In terms of physical storage, Tom touched on the shingled magnetic recording technology that Dropbox uses to increase data storage densities. They also use lossless compression to minimize the amount of space that a user’s files require. At Dropbox, the storage system is known as Magic Pocket. It is an immutable block storage system that stores files as encrypted chunks of up to 4 megabytes each.
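To make the idea of chunking and lossless compression concrete, here is a minimal sketch in Python. It is not how Magic Pocket is implemented: `zlib` is only a stand-in for whatever compression Dropbox uses, encryption is left out entirely, the function names are hypothetical, and "4 megabytes" is assumed to mean 4 MiB.

```python
import zlib

# Assumption: "4 megabytes" is read as 4 MiB (4 * 1024 * 1024 bytes).
BLOCK_SIZE = 4 * 1024 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a file's bytes into consecutive blocks of at most block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(blocks: list[bytes]) -> list[bytes]:
    """Losslessly compress each block independently (zlib as a stand-in)."""
    return [zlib.compress(block) for block in blocks]


def restore(compressed_blocks: list[bytes]) -> bytes:
    """Decompress every block and join them back into the original bytes."""
    return b"".join(zlib.decompress(block) for block in compressed_blocks)


# Usage: an 11,000,000-byte payload needs three blocks of up to 4 MiB.
payload = b"IET NorCal " * 1_000_000
blocks = split_into_blocks(payload)
stored = compress_blocks(blocks)

assert restore(stored) == payload  # lossless: every byte comes back
print(len(blocks), "blocks;", sum(len(b) for b in stored), "bytes after compression")
```

Because the blocks are never modified after they are written, a changed file in a scheme like this would be stored as new blocks rather than edited in place, which is what "immutable" means here.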

 

Tom Manville discussing Dropbox’s physical storage.

When it comes to data security, it is no surprise that the Dropbox architecture has many layers of it, and one key technology underpins many of them: SHA-256. This algorithm is a cryptographic hash function whose output acts like a signature for a piece of text or a data file. It is how Dropbox knows what data is truly yours within your namespace. You can think of a namespace as a collection of files and folders in the internal representation of the Dropbox file system. Each user’s private Dropbox folder maps to a root namespace.
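The "signature" idea can be illustrated with Python's standard `hashlib` module. The sketch below is an assumption-laden toy: the `namespace` dictionary, the path in it and the `belongs_to` helper are hypothetical, and the talk did not describe how Dropbox actually ties hashes to namespaces.

```python
import hashlib


def block_digest(block: bytes) -> str:
    """Return the SHA-256 digest of a block as 64 hexadecimal characters."""
    return hashlib.sha256(block).hexdigest()


# Hypothetical namespace: each path maps to the digests of its blocks.
original = b"Notes from the IET NorCal talk on Dropbox."
namespace = {"/notes/talk.txt": [block_digest(original)]}


def belongs_to(ns: dict[str, list[str]], path: str, block: bytes) -> bool:
    """Check whether a block's digest is recorded under the given path."""
    return block_digest(block) in ns.get(path, [])


print(belongs_to(namespace, "/notes/talk.txt", original))         # same bytes
print(belongs_to(namespace, "/notes/talk.txt", original + b"!"))  # altered bytes
```

The same bytes always produce the same digest, while even a one-character change produces a completely different one, which is why a hash can stand in for the content it was computed from.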

 
With the hardware in place and the data secure, the next key component, both of Dropbox and of the talk, is the data store. At Dropbox, this is called Edgestore. Every one of those many users, along with each of their photos, thumbnails and so on, is stored in Edgestore. This MySQL-based database has several trillion entries and services millions of queries per second with five nines (99.999%) of availability. It is quite the undertaking! Managing and manipulating all of this data takes complex data analytics using the nodes and edges of graph theory.
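One way to picture nodes and edges living in a relational database is the sketch below. It is only an illustration: Python's built-in `sqlite3` stands in for MySQL, and the `entity` and `edge` tables, their columns and the sample rows are all hypothetical rather than Edgestore's real schema.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity (
    id   INTEGER PRIMARY KEY,
    kind TEXT NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE edge (
    src      INTEGER NOT NULL REFERENCES entity(id),
    dst      INTEGER NOT NULL REFERENCES entity(id),
    relation TEXT NOT NULL
);
""")

# Nodes: a user, one of their photos, and that photo's thumbnail.
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?)",
    [(1, "user", "allan"), (2, "photo", "talk.jpg"), (3, "thumbnail", "talk_small.jpg")],
)
# Edges: the relationships that connect those nodes.
conn.executemany(
    "INSERT INTO edge VALUES (?, ?, ?)",
    [(1, 2, "owns"), (2, 3, "has_thumbnail")],
)


def neighbours(db: sqlite3.Connection, src_id: int, relation: str) -> list[str]:
    """Follow every edge of one relation type out of a node; return target names."""
    rows = db.execute(
        "SELECT e.name FROM edge JOIN entity AS e ON e.id = edge.dst "
        "WHERE edge.src = ? AND edge.relation = ? ORDER BY e.id",
        (src_id, relation),
    ).fetchall()
    return [name for (name,) in rows]


print(neighbours(conn, 1, "owns"))
print(neighbours(conn, 2, "has_thumbnail"))
```

Storing relationships as rows means a question such as "which thumbnails belong to this user's photos" becomes a short chain of edge lookups.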

 
Next, Tom covered the challenges that Dropbox has faced as technology has changed. Two of the examples covered were the data migration to Dropbox’s own internal cloud and the issues around data security, particularly in Europe, regarding where data is stored and who has access to it (the Data Protection Directive and the General Data Protection Regulation, which is soon to take effect).

 
Finally, the talk touched on features that people may not be familiar with or that are recent additions to the Dropbox toolkit. First, there is versioning, which lets free users see all of the changes made to their files over the previous 30 days. There is also computer vision, a document scanner and, most recently, Paper for collaborative working.

 


Some members of the rapt and engaged audience!

This short summary does not do justice to the topics Tom covered, and on behalf of the NorCal local network of the IET, I want to thank him for taking the time to prepare and present such a fascinating talk.

 
If you are interested in learning more about Dropbox and the technologies it employs, you can find more information on their blogs.

 
For the next talk, we are hoping to try something new: a WebEx for those who can’t make the journey to the Bay Area. More details will follow as we confirm the date and details with the speaker.

 
Thanks to all those who attended, and thank you for reading this far!

 
Allan.