The database has always revolved around rock-solid reliability. Data goes in and then comes out in exactly the same way. Occasionally, the bits will be cleaned up and normalized so all of the dates are in the same format and the text is in the same character set, but other than that, nothing should be different.
That consistency is what makes the database essential for any enterprise — allowing it to conduct things like ecommerce transactions. It’s also why the database remains distinct from the data warehouse, another technology that is expanding its mission for slower-twitch things like analysis. The database acts as the undeniable record of the enterprise, the single source of truth.
Now databases are changing. Their focus is shifting and they’re accepting more responsibilities and offering smarter answers. In short, they’re expanding and taking over more and more of the stack.
Many of us might not notice because we’ve been running the same database for years without a change. Why mess with something that works? But as new options and features come along, it makes sense to rethink the architectures of data flows and take advantage of them. Yes, the data will still be returned exactly as expected, but it will be kept safer and presented in a way that’s easier to use.
Many drivers of the change are startups built around a revolutionary new product, like multi-cloud scaling or blockchain assurance. For each new approach to storing information, there are usually several well-funded startups competing to dominate the space and often several others still in stealth mode.
The major companies are often not far behind. While it can take more time to add features to existing products, the big companies are finding ways to expand, sometimes by revising old offerings or by creating new ones in their own skunkworks. Amazon, for instance, is the master at rolling out new ways to store data. Its cloud has at least 11 different products called databases, and that doesn’t include the flat file options.
The other major cloud providers aren’t far behind. Microsoft has migrated its steadfast SQL Server to Azure and found ways to offer a half-dozen open source competitors, like MySQL. Google delivers both managed versions of relational databases and large, distributed, replicated NoSQL key/value stores.
The old standards are also adding new features that often deliver much of the same promise as the startups while continuing support of older versions. Oracle, for instance, has been offering cloud versions of its database while adding new query formats (JSON) and better performance to handle the endless flood of incoming data.
IBM is also moving Db2 to the cloud while adding new features like integration with artificial intelligence algorithms that analyze the data. It’s also supporting the major open source relational databases while building out a hybrid version that merges Oracle compatibility with the PostgreSQL engine.
Among the myriad changes to old database standards and new emerging players, here (in no particular order) are nine key ways databases are being reborn.
1. Better query language
SQL may continue to do the heavy lifting around the world. But newer options for querying — like GraphQL — are making it easier for front-end developers to find the data they need to present to the user and receive it in a format that can be dropped right into the user interface.
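To see why this appeals to front-end developers, consider a minimal Python sketch of the core idea: the client states the shape it wants and gets back data in exactly that shape. This is a toy field selector, not a GraphQL implementation, and the `select` helper, the `USER` record, and its field names are all hypothetical.

```python
from typing import Any

# Hypothetical back-end record; a real server would load this from storage.
USER = {
    "id": 7,
    "name": "Ada",
    "email": "ada@example.com",
    "orders": [
        {"id": 101, "total": 25.0, "status": "shipped"},
        {"id": 102, "total": 9.5, "status": "pending"},
    ],
}

def select(data: Any, shape: dict) -> Any:
    """Return only the fields named in `shape`, preserving the nesting."""
    if isinstance(data, list):
        return [select(item, shape) for item in data]
    result = {}
    for field, subshape in shape.items():
        value = data[field]
        # A nested shape means "descend"; None means "take the value as-is".
        result[field] = select(value, subshape) if subshape else value
    return result

# Roughly analogous to the GraphQL query: { name orders { total } }
wanted = {"name": None, "orders": {"total": None}}
view = select(USER, wanted)
```

The result carries the name and the order totals and nothing else, so it can be handed to a UI component without further trimming.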
2. Streaming databases follow vast flows
The model for a standard database is a big ledger, much like the ones clerks would maintain in fat bound books. Streaming databases like ksqlDB are built to watch an endless stream of data events and answer questions about them. Instead of imagining that the data is a permanent table, the streaming database embraces the endlessly changing possibilities as data flows through it.
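The difference from a ledger can be sketched in a few lines of Python. Instead of running a query once against a finished table, the code below keeps an answer up to date as each event arrives. It illustrates the general idea only and is not how ksqlDB is implemented; the `running_counts` function and the click events are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator

def running_counts(events: Iterable[dict]) -> Iterator[dict]:
    """Yield an updated per-page view count after every incoming event."""
    counts: Counter = Counter()
    for event in events:
        counts[event["page"]] += 1
        yield dict(counts)  # snapshot of the continuously updated answer

# Hypothetical click events; in practice this iterable would never end.
clicks = [{"page": "/home"}, {"page": "/cart"}, {"page": "/home"}]
snapshots = list(running_counts(clicks))
```

Each snapshot is the current answer to a standing question (“how many views per page so far?”) rather than the result of a one-off query.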
3. Time-series database
Most databases have special column formats for tracking date stamps. Time-series databases like InfluxDB or Prometheus do more than just store the time. They track and index the data for fast queries, like how many times a user logged in between January 15 and March 12. These are often special cases of streaming databases where the data in the streams is being tracked and indexed for changes over time.
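A rough sketch of why indexing by time helps: if timestamps are kept in sorted order, a range count like the login example becomes two binary searches instead of a full scan. The `LoginSeries` class and the sample dates (including the year) are hypothetical, and real time-series engines use far more elaborate storage.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime
from typing import List

class LoginSeries:
    """Keeps login timestamps sorted so range counts use binary search."""

    def __init__(self) -> None:
        self._times: List[datetime] = []

    def record(self, when: datetime) -> None:
        # Insert at the sorted position so late arrivals stay in order.
        self._times.insert(bisect_right(self._times, when), when)

    def count_between(self, start: datetime, end: datetime) -> int:
        """Count logins with start <= timestamp <= end."""
        return bisect_right(self._times, end) - bisect_left(self._times, start)

series = LoginSeries()
for when in (datetime(2021, 4, 2), datetime(2021, 1, 10),
             datetime(2021, 3, 12), datetime(2021, 2, 1)):
    series.record(when)

in_range = series.count_between(datetime(2021, 1, 15), datetime(2021, 3, 12))
```

Here only the February 1 and March 12 logins fall inside the window, and the count is found without touching the other entries.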
4. Homomorphic encryption
Cryptographers were once happy to lock up data in a safe. Now some are developing a technique called homomorphic encryption to make decisions and answer queries on encrypted data without actually decrypting it, a feature that vastly simplifies cloud security and data sharing. This allows computers and data analysts to work with data without knowing what’s in it. The methods are far from comprehensive, but companies like IBM are already delivering toolkits that can answer some useful database queries.
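For a feel of how computing on encrypted data can work, here is a toy Python sketch of the Paillier cryptosystem, a well-known scheme that is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. It is only a partial form of homomorphic encryption, the primes are far too small to be secure, and it is not the approach taken by any particular vendor toolkit. The function names and sample values are assumptions for illustration; `pow(x, -1, n)` needs Python 3.8 or later.

```python
import math
import random

def keygen(p=293, q=433):
    """Build a toy key pair from two small primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, message):
    """Encrypt an integer 0 <= message < n under the public modulus n."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # the random blinding factor must be a unit mod n
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return (c1 * c2) % (n * n)

n, private_key = keygen()
cipher_a = encrypt(n, 120)
cipher_b = encrypt(n, 45)
encrypted_total = add_encrypted(n, cipher_a, cipher_b)  # no private key needed
total = decrypt(n, private_key, encrypted_total)
```

The party that calls `add_encrypted` never sees 120 or 45; only the holder of the private key can recover the sum.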
5. In-memory database
The original goal of a database was to organize data so it could be available in the future, even when electricity is removed. The trouble is that sometimes even storing the data to persistent disks takes too much time, and it may not be worth the effort. Some applications can survive the occasional loss of data (would the world end if some social media snark disappeared?), and fast performance is more important than disaster recovery. So in-memory databases like Amazon’s ElastiCache are designed for applications that are willing to trade permanence for lightning-fast response times.
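The trade-off can be shown with a small Python sketch: writes go only to memory, and durability is an optional, occasional snapshot. The `VolatileStore` class is hypothetical and says nothing about how any particular product is built.

```python
import json
import os
import tempfile

class VolatileStore:
    """Dictionary-backed store: fast, but contents vanish unless snapshotted."""

    def __init__(self, data=None):
        self._data = dict(data or {})

    def set(self, key, value):
        self._data[key] = value  # memory only; no disk write on the hot path

    def get(self, key, default=None):
        return self._data.get(key, default)

    def snapshot(self, path):
        """Optionally persist the current state, trading speed for durability."""
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh)

    @classmethod
    def restore(cls, path):
        with open(path, encoding="utf-8") as fh:
            return cls(json.load(fh))

# Demo: a write made after the last snapshot is lost on "restart".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snapshot.json")
    store = VolatileStore()
    store.set("post:1", "kept")
    store.snapshot(path)
    store.set("post:2", "lost after a crash")
    recovered = VolatileStore.restore(path)
```

After the simulated restart, `recovered` still has `post:1` but not `post:2`, which is exactly the loss such applications accept in exchange for speed.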
6. Microservice engines
Developers have traditionally built their code as a separate layer that lives outside the database itself, and this code treats the database as a black box. But some are noticing that the databases are so feature-rich they can act as microservice engines on their own. PostgreSQL, for instance, now allows embedded procedures to commit full transactions and initiate new ones before spitting out answers in JSON. Developers are recognizing that the embedded code that has been part of databases like Oracle for years may be just enough to build many of the microservices imagined by today’s architects.
Jupyter notebooks started out as a way for data scientists to bundle their answers with the Python code that produced it. Then data scientists started integrating the data access with the notebooks, which meant going where the information was stored: the database. Today, SQL is easy to integrate, and users are becoming comfortable using the notebooks to access the database and generate smart reports that integrate with data science (Julia or R) and machine learning tools. The newer Jupyter Lab interface is turning the classic notebook into a full-service IDE, complete with extensions that pull data directly from SQL databases.
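A notebook cell that pulls a small report straight from a database might look something like the sketch below. It uses Python’s built-in `sqlite3` module with an in-memory database as a stand-in for a real server, and the `sales` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database as a stand-in for a real database connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)

# The query result is an ordinary Python list, ready for charts or models.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

From here the rows can be passed to whatever plotting or machine learning library the notebook already uses.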
7. Graph databases
The network of connections between people or things is one of the dominant data types on the internet, so it’s no surprise that databases are evolving to make it easier to store and analyze these relationships.
Neo4j now offers a visualization tool (Bloom) and a collection of data science functions for developing complex reports about the network. GraphDB is focusing on developing “semantic graphs” that use natural language to capture linguistic structures for big analytic projects. TerminusDB is aimed at creating knowledge graphs with a versioning system much like Git. All of them bring efficiency to storing a complex set of relationships that don’t fit neatly into standard tables.
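To see why relationships fit awkwardly into tables, consider a question like “who is within two hops of this user?” In a relational schema that usually means self-joins; on a graph structure it is a short traversal. The Python sketch below uses a plain adjacency map and breadth-first search. The `within_hops` function and the `FOLLOWS` data are hypothetical, and none of the products named above are claimed to work this way internally.

```python
from collections import deque
from typing import Dict, Set

def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Return every node reachable from `start` in at most `max_hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, dist + 1))
    seen.discard(start)
    return seen

# Hypothetical "follows" relationships.
FOLLOWS = {"ana": {"bo"}, "bo": {"cy"}, "cy": {"di"}, "di": set()}
reach = within_hops(FOLLOWS, "ana", 2)
```

Starting from `ana`, the traversal finds `bo` and `cy` but stops before `di`, which sits three hops away.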
8. Merging data storage with transport
Databases were once hidden repositories to keep data safe in the back office. Delivering this information to the user was the job of other code. Now, databases like Firebase treat the user’s phone or laptop as just another location for replicating data.
Databases like FaunaDB are baking replication into the stack, thus saving the DBA from moving the bits. Now, developers don’t need to think about getting information to the user. They can just read and write from the local data store and assume the database will handle the grubby details of marshaling the bytes across the network while keeping them consistent.
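The developer’s view of this model can be sketched in Python: application code reads and writes a local replica, and a separate sync step moves the bytes. This toy version simply lets the most recently synced write win and ignores conflicts, which real systems must handle with much more care. The `Replica` class and `sync` function are hypothetical and do not reflect the Firebase or FaunaDB APIs.

```python
class Replica:
    """Local copy of the data; the application only ever talks to this."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes not yet pushed to the server

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def read(self, key):
        return self.data.get(key)

def sync(local: Replica, server: dict) -> None:
    """Push queued local writes to the server, then pull its state back."""
    for key, value in local.pending:
        server[key] = value  # naive: the last write to arrive wins
    local.pending.clear()
    local.data.update(server)

server_state = {}
phone, laptop = Replica(), Replica()

phone.write("cart", ["book"])  # works even while offline
sync(phone, server_state)      # later, the library moves the bytes
sync(laptop, server_state)     # another device catches up
cart_on_laptop = laptop.read("cart")
```

The application code never opens a network connection itself; it only calls `write` and `read` on its local replica.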
9. Data everywhere
A few years ago, all the major browsers began supporting the Local Storage and IndexedDB APIs, making it easier for web applications to store significant amounts of data on the client’s machine. The early implementations limited the data to 5MB, but some have bumped the limits to 10MB. Reading local data is much faster than making a round trip to a server, and the application can keep working even when the internet connection is down. The database is not just running on one box in your datacenter, but in every client machine running your code.