This is not a rhetorical question: How do you move 2 petabytes (PB) of data to a cloud that can analyze it? It’s not practical to move it over the internet: even with an obscenely fast 10 Gbps connection, it would take 19 days. Over more common 45 Mbps T3 connections, it’s effectively impossible, because it would take 12 years, according to Microsoft’s calculations.
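The arithmetic behind those figures is easy to check. The following is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes) and a link that runs at its full rated speed the whole time; the function name `transfer_days` and the `efficiency` parameter are illustrative, not anything Microsoft published.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # decimal petabyte, in bytes (assumption)


def transfer_days(data_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over a link.

    efficiency is the assumed fraction of the rated speed actually
    achieved (1.0 means an ideal, fully saturated link).
    """
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# 2 PB over a 10 Gbps link and over a 45 Mbps T3 line
days_10gbps = transfer_days(2 * PETABYTE, 10e9)
years_t3 = transfer_days(2 * PETABYTE, 45e6) / 365

print(f"10 Gbps: {days_10gbps:.1f} days")
print(f"45 Mbps T3: {years_t3:.1f} years")
```

Under these idealized assumptions the results come out to roughly 18.5 days and a little over 11 years, in the same range as the 19 days and 12 years Microsoft cites. The article does not say how Microsoft arrived at its numbers, so the small gap may simply reflect real-world overhead.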
This is the problem Microsoft faced when it was developing some of its intelligent edge offerings. Its solution is the Azure Data Box family, a series of appliances that offer high storage capacities. Most of the Data Boxes require customers to physically ship them to Microsoft so the data can be moved to Azure, which in a sense flips the paradigm of edge computing.
Part of the idea of edge computing is, as Microsoft head of Azure IoT Sam George told VentureBeat in an interview at Build 2019, “It’s often easier to move compute to where data is than to move data where compute is.” With most of the Data Boxes, though, it’s just not feasible to do so — and thus this model is moving data to compute, not the other way around.
The amount of data a company generates varies significantly. In a conversation with VentureBeat at Microsoft Build, Kundana Palagiri, principal product manager for Microsoft Azure, said that over the course of 15 days, a company working with autonomous technologies could potentially generate up to 2PB of data. But the sweet spot for most operations is probably close to 10TB, 35TB, or 100TB over that same time period.
Some companies need to have a Data Box onsite at all times and thus would need a rotation of Data Boxes constantly trucking between Microsoft and the company premises. In other cases, a company will ship off its Data Box to Microsoft and be without one for a time. “Sometimes [a company] will be without it,” Palagiri said, in the case of things like “archiving, testing — various scenarios.”
In other cases, the computing and transmitting of data is actually done on the edge. There are Data Boxes for both.
From the Build 2019 stage, Microsoft showcased how grocery chain Kroger is piloting the use of Microsoft Azure services and Data Boxes in two of its stores. First announced in January, the program relies in part on Azure-powered video analytics and AI to track inventory based on what’s physically on store shelves. Kroger points video camera arrays at its shelves and uses machine learning to track in real time how many units of a given product are left. It’s a tremendous amount of data, with intelligence layered on top.
The Data Box family
There are four members of the Azure Data Box family, each with unique traits designed to suit different business needs.
The most endearing, perhaps, is the Data Box Heavy. Its name is precisely descriptive: It’s a box the size of a small refrigerator, and although it’s unclear how much it weighs, it is on wheels, which tells you all you need to know about that. Inside is a bunch of hard disks totaling 1PB of capacity. For context, that’s 1,000 terabytes (TB), or 1 million gigabytes (GB). It has 4 x 40 GbE QSFP+ networking capabilities, which is extraordinarily speedy.
Above: Azure Data Box Heavy. Image Credit: Seth Colaner, VentureBeat
The Azure Data Box is physically smaller than its big brother Heavy, and although it offers a tenth of the storage capacity, that works out to a still quite robust 100TB. The Data Box is also designed such that the customer fills up the storage capacity and physically sends the box back to Microsoft for uploading and return. Though not as extreme as the Data Box Heavy, the networking prowess of the Data Box is significant, at 2 x 10 GbE SFP+.
Above: Azure Data Box. Image Credit: Seth Colaner, VentureBeat
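To put the two shippable boxes side by side, here is a rough sketch that estimates how many devices a given data volume would need and how long it would take to fill one over its network ports. It assumes the nominal capacities quoted above are fully usable, that all ports can be driven together at their rated speed, and that units are decimal; real-world figures would likely be lower. The dictionary and function names are made up for illustration.

```python
import math

# Nominal figures quoted in this article; treating all of the capacity
# as usable is a simplifying assumption.
DEVICE_CAPACITY_TB = {"Data Box": 100, "Data Box Heavy": 1000}
# Aggregate port speed in Gbps: 2 x 10 GbE and 4 x 40 GbE
DEVICE_LINK_GBPS = {"Data Box": 2 * 10, "Data Box Heavy": 4 * 40}


def devices_needed(data_tb: float, device: str) -> int:
    """Number of devices required to hold data_tb terabytes."""
    return math.ceil(data_tb / DEVICE_CAPACITY_TB[device])


def fill_hours(device: str) -> float:
    """Hours to fill one device with every port at its rated speed."""
    bits = DEVICE_CAPACITY_TB[device] * 1e12 * 8
    return bits / (DEVICE_LINK_GBPS[device] * 1e9) / 3600


# The 15-day volumes Palagiri mentioned: 10TB, 35TB, 100TB, and 2PB
for volume_tb in (10, 35, 100, 2000):
    print(volume_tb, "TB:",
          devices_needed(volume_tb, "Data Box"), "Data Box or",
          devices_needed(volume_tb, "Data Box Heavy"), "Data Box Heavy")

for name in DEVICE_CAPACITY_TB:
    print(f"{name}: about {fill_hours(name):.1f} hours to fill")
```

By this estimate, the 2PB autonomous-technology case would take two Data Box Heavy units or 20 standard Data Boxes, and either device could in theory be filled in well under a day, which is the gap that makes shipping hardware faster than uploading.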