Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to this talk: The MariaDB evolution, is it just a fork of MySQL? Well, spoiler alert: it is not.
It's a bit more than that. My name is Alejandro Duarte and I work in
developer relations for MariaDB plc. I'm a
software engineer. I have been writing code for almost 30 years,
I believe, and I published these three books about web
development with Java and a framework called Vaadin,
which is very, very interesting. But I'm working on a new book right now
called MariaDB for Developers. So if you're interested,
take a screenshot of this and you'll get a notification
when the book becomes available. But today we're going to talk about the MariaDB
ecosystem. So we're going to see
the historical context in which both MySQL and MariaDB were born.
We're also going to talk about storage engines,
right? So a bit more technical stuff. We're going to talk about MariaDB
Enterprise, because that's what you want to use when you move to production,
especially if you want to automate things such as failovers.
We are also going to briefly touch on the present and future of MariaDB,
on migration, and on who uses MariaDB. Okay, so let's
start with the history of relational databases, and it's going to
be very brief. So it all
started in the 60s with General Electric and the Integrated
Data Store (IDS). That's the very first database we know of.
It is not a relational database, it's another kind of database. But that was
the very first one. And that led to the development of something
called the CODASYL database model, which basically were extensions
to the COBOL programming language so that developers
could query the databases using nested loops
and pointers. So they needed to think about data
structures, algorithms, all this kind of stuff. That means they
had to rewrite this CODASYL code on every schema
change. So Edgar Codd realizes this
and proposes the relational model, which is,
oversimplifying, like tables. So you have columns
and then you have rows. That's what modern databases
use. And he was a mathematician, so he formalized
this through something called relational calculus and relational algebra,
which is what databases actually use.
Modern databases, though, are not purely based on relational
algebra; they have some more concepts there.
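To make relational algebra a bit more concrete, here is a toy sketch in Python, not how any real database evaluates queries: tables are lists of dicts, and three of the classic operators (selection, projection, and natural join) are plain functions. The employees/departments data is made up for illustration.

```python
# Toy relational algebra over "tables" represented as lists of dicts.
# Illustration only; real engines use indexes, optimizers, and on-disk structures.

def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def project(rows, columns):
    """Projection (pi): keep only the given columns, dropping duplicate rows."""
    result = []
    for row in rows:
        reduced = {col: row[col] for col in columns}
        if reduced not in result:
            result.append(reduced)
    return result

def natural_join(left, right):
    """Natural join: combine every pair of rows that agree on all shared columns."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = set(left_row) & set(right_row)
            if all(left_row[c] == right_row[c] for c in shared):
                result.append({**left_row, **right_row})
    return result

employees = [
    {"emp_id": 1, "name": "Ada", "dept_id": 10},
    {"emp_id": 2, "name": "Edgar", "dept_id": 20},
    {"emp_id": 3, "name": "Grace", "dept_id": 10},
]
departments = [
    {"dept_id": 10, "dept": "Research"},
    {"dept_id": 20, "dept": "Sales"},
]

# Roughly: SELECT DISTINCT name FROM employees NATURAL JOIN departments WHERE dept = 'Research'
research_names = project(
    select(natural_join(employees, departments), lambda row: row["dept"] == "Research"),
    ["name"],
)
print(research_names)  # [{'name': 'Ada'}, {'name': 'Grace'}]
```

Because each operator takes tables and returns a table, they compose freely, which is part of what makes it possible for an optimizer to reorder them.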
But this is the basis. All right? And these
kinds of theories allow you to demonstrate that it is possible to
build query optimizers. And yes, they also built
these query optimizers that all relational databases have.
And a database contains tons
of algorithms and data structures, right? Like trees and hash tables and so
forth. And it knows your data, so it
can make very, very good decisions about the
plan to access that data on disk,
much better than what a programmer would be able to do.
Now, all this is theory until the first implementations start
to appear. So, more or less in the mid-70s: at IBM, for example,
System R, which more than a product is a project,
a research project investigating
databases. So they started to implement these ideas,
to experiment with them. Ingres, at the University of California,
the precursor of PostgreSQL.
Oracle, a very famous database. Mimer, another academic project
in Sweden, at Uppsala University, I believe.
And the predominant query language was
called QUEL. So let's try to remember this
word, QUEL, that is, querying using an
English-like language, all right? That was the main language there.
Now later, by the end of the 70s,
maybe at the beginning of the 80s
(don't pay too much attention to the exact location of
this vertical line in the timeline),
the scientists, researchers, and
programmers at IBM started to think
about what would be the best way to query
relational databases. So what's the best way
to specify queries in a
relational environment? That's what SQUARE stands
for. And more than a language, it was kind of a game
they had. Like I said before, they were trying to figure out: hey, I found
this way, maybe I come up with this idea of how we can
combine these. And yeah, maybe it was also a language,
but they were using this scientific notation, right? So subscripts
and superscripts. This is hard to type on a
computer keyboard. So they redefined it and created something
called SEQUEL, which is the sequel of QUEL, right? So they are playing
with the words; this is like an improved version of QUEL.
Maybe that's why they named it like that. Now this
can be implemented and used in computers. However,
SEQUEL was a trademark of some company in
the UK; I think it was some aircraft-related company.
So they could not use the name, but they removed the vowels from this
word and, well, SQL is
born. So even though it is spelled S-Q-L,
you can still pronounce it "sequel". We still pronounce it "sequel",
some of us; some pronounce it "S-Q-L". It doesn't really
matter. It is here today. It's the
best. It's not perfect, but so far nobody has come
up with a better language than SQL.
Now IBM DB2,
Oracle, and the other main databases in the market started to
adopt this language, SQL, and it became
a standard, I believe in 1986 or 1987,
around those two years: ANSI and ISO.
Now, to give you a bit of perspective on what's going on
in the industry: by the late 80s,
open source is pretty well established with,
for example, the GNU project. They created
something called the General Public License,
which means that if you release software under the GPL,
you also have to provide the source code,
and people can modify it, but if they modify and distribute it,
they also have to publish that source code. So it's
like the source code is always going to be available.
That's the GPL. Now Linux is being developed here in Finland
by Linus Torvalds. It was published at some
point under the GPL.
Postgres, at the University of California,
Berkeley, the academic project trying
to build this relational database, is under development.
Unfortunately, it doesn't use
the GPL. It is still open source, with a very permissive
license. However, there are no free SQL
databases, because Postgres wasn't designed to support SQL.
This changes with the very first free SQL database, which
was called mSQL, or Mini SQL.
It offered better performance than Postgres, and it offered
SQL. It is still in use in embedded devices.
In fact, the latest version was published on
that date. So it's not very actively developed, but it's still in use.
However, there's still no open source SQL database,
because with this one, yeah, you can use SQL,
but you cannot see the source code, so you don't have that option.
This changes with MySQL and its
creator, Michael Widenius. So he was working
with his company and his colleagues, and he just wanted to provide good
services and good products to his customers.
And he created something called Unireg to manage databases.
And on top of that, he started to develop his own SQL
layer, so to speak. And later they called it
MySQL and published the source code,
opened the source code. It's a very fast database. He wanted
a very fast, performant database that is easy to use:
two things that you can still see today
in MySQL and MariaDB. Yeah, back in
the 90s it had its limitations, but it turned out to
be a great fit for websites. So we can say that it helped shape
the Internet as we know it today. And it was released at some
point under the GPL license. That means it cannot be closed
again. So MySQL gains popularity very quickly in
the next decade. A company called Innobase
develops this module for MySQL (let's call it
a module), InnoDB, that solves
those limitations, and a company is created
to offer services and that kind of stuff. But then Oracle
buys Innobase, which, like I said, employs the
developers who are writing the code for
InnoDB. Oracle bought it in
2005. Then later Sun
Microsystems buys MySQL,
the company, MySQL AB, in 2008.
And then I guess some of you remember what happened
next, or can guess where this is going.
Oracle buys Sun Microsystems.
That was announced in 2009 and became effective in
2010, I believe in January or something like that.
And now Oracle at this point owns not only the Oracle database,
which is the most successful commercial database, but also
the most popular open source database, MySQL.
So the community, and especially Michael Widenius,
realizes that this is a risk for the project, for MySQL,
and at the very least there could be some conflict of interest,
right? That's just natural. And this
could maybe even hurt the project, or maybe stop
innovation or reduce it. I'm not going to be the judge of that, but I'm
going to show you this conversation from the official MySQL
community Slack or whatever: "I didn't see
much innovation in the 8.1 innovation release notes; there
are plenty of deprecations. Is that the new definition of innovation
at Oracle?" Well, of course they are joking, and I'll let you decide
whether there is truth in these jokes.
Props to Oracle, because the project is still alive and they are still
innovating. I'm not sure how much, though,
but the project is alive.
However, Michael Widenius forks this code. That is,
he takes the code, copies it, publishes it somewhere
else in another repository, and creates a new project.
And many of the developers of MySQL,
the original developers of MySQL, then moved to this new
project, to MariaDB. Right, so that's how
MariaDB was born: as a fork of MySQL. Indeed, it was a
fork of MySQL, and it was supposed to be a drop-in replacement for MySQL,
and it was for some time.
Nowadays, let's say they are highly compatible.
There are no two other databases that are as compatible as MySQL and MariaDB.
However, the projects have diverged. Right. And
the way I see it, at least, is that
these are the people who built MySQL, and they are now working
on MariaDB. So it's more like a change in the name, and
then the other company kept the name and obviously some of
the developers and stuff, and both projects benefit
from each other, I would say, at the development level.
Anyway, the first release was MariaDB 5.1.38.
We are on 11-something, so it's
been a long ride since then. It has to honor the GPL
license, obviously. And so that
means it's going to continue to be protected just like MySQL,
at least in terms of availability of the source code.
About development, we don't know, right? I mean,
you saw the conversation on Slack. Now, in the
case of MariaDB, it gained popularity
very quickly and it became the default
database in many Linux distributions.
And you can see it here, for example, in the Debian Popularity
Contest (popcon): you install this package on your
Linux machine and then it sends data on what packages you have installed.
So you see the MariaDB server package gaining on
and overtaking MySQL in number of
installations. And it's not just on Linux; also on Windows
you see more and more installations. This tells
me, in my opinion, that more developers are using
MariaDB as well. So not only in production,
but developers are choosing MariaDB.
Now, the MariaDB Foundation was created
to protect the source code from
being controlled by one large entity, so that innovation continues
to happen. The MariaDB Corporation was also
founded; now it's called MariaDB plc. And they offer
services but also products, most of them open source. For example,
faster connectors, connectors meaning drivers or APIs for
Java, Node.js, C, or
Python to connect to MariaDB. And they are faster
than those in MySQL.
Now, they also created additional storage engines.
So what is that, a storage engine? Let's talk a little bit about storage
engines. And there are many, many storage engines.
Here you see InnoDB again, right? So in fact
I said it's a module, and yeah, that's true, it's a module for MariaDB.
So it's something you plug into MariaDB. MariaDB comes with several of
these (not all of them, but some) already when you install it, and you
can add more or remove them if you want.
InnoDB comes there. That's the one that you are going to use most
of the time. Ironically, you have, horizontally,
ColumnStore right in the middle, for analytical workloads.
So for something like the average of
the numbers in a column, it's going to be much faster than other storage engines.
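A rough intuition for why a columnar engine helps with an aggregate like an average: a row-oriented layout touches every field of every row, while a columnar layout reads only the column being aggregated. The toy Python sketch below counts how many values each layout would read; the orders table is hypothetical, and real engines add compression and parallelism that this ignores.

```python
# Toy comparison of row-oriented vs column-oriented layouts for an AVG query.
# Illustrative only: it does not model ColumnStore's real storage format.

rows = [
    {"order_id": 1, "customer": "a", "country": "FI", "amount": 120.0},
    {"order_id": 2, "customer": "b", "country": "SE", "amount": 80.0},
    {"order_id": 3, "customer": "c", "country": "FI", "amount": 100.0},
]

def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    return {col: [row[col] for row in rows] for col in rows[0]}

def avg_row_store(rows, column):
    # A row store reads whole rows, even though only one column is needed.
    values_read = sum(len(row) for row in rows)
    total = sum(row[column] for row in rows)
    return total / len(rows), values_read

def avg_column_store(columns, column):
    # A column store reads only the values of the requested column.
    values = columns[column]
    return sum(values) / len(values), len(values)

columns = to_columns(rows)
print(avg_row_store(rows, "amount"))        # (100.0, 12)
print(avg_column_store(columns, "amount"))  # (100.0, 3)
```

Both layouts give the same answer; the difference is how much data has to be read, and that gap grows with the number of columns in the table.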
So for reporting and analytics you can do this with
MariaDB as well. You have MyRocks,
initially created by Facebook for workloads
that are write-heavy, where you have tons of writes,
and maybe the opposite, Aria, which is like Maria without the M:
many reads but very few writes.
You can store data in memory, as you can see there, or in CSV;
you have Spider for database sharding, that is,
dividing the data across multiple nodes so that your database
can grow even more. You can optimize for
the cloud with S3, and many others.
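The sharding idea can be pictured as a toy router that assigns each row to a node from a stable hash of its key. This is only an illustration with hypothetical node names (node1 to node3); Spider itself is set up in SQL on the MariaDB side, not with application code like this.

```python
# Toy hash-based sharding: route each row to one of several nodes by its key.
import zlib

NODES = ["node1", "node2", "node3"]  # hypothetical backend names

def shard_for(key, nodes=NODES):
    """Pick a node deterministically from a stable hash of the key."""
    # zlib.crc32 gives the same value on every run, unlike hash() on strings,
    # which is randomized per process.
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]

def distribute(rows, key_column, nodes=NODES):
    """Group rows by the node that would store them."""
    placement = {node: [] for node in nodes}
    for row in rows:
        placement[shard_for(row[key_column], nodes)].append(row)
    return placement

users = [{"user_id": i, "name": f"user{i}"} for i in range(10)]
placement = distribute(users, "user_id")
for node, stored in placement.items():
    print(node, [row["user_id"] for row in stored])
```

One design caveat: with plain modulo hashing, changing the number of nodes moves most keys to a different node, which is why real systems rely on partitioning schemes or consistent hashing instead.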
Okay, so let me show you this. Let's say we have an application
in which people can make tons of comments,
right? And we expect quite a lot
of those. So we create a table, comments, some columns
there, and then we set the engine to MyRocks. Done.
This is optimized for writes.
We're probably going to save money on storage, too.
Now, in the same database, we have categories,
some columns there, but categories never change. Maybe they
change every ten years or whatever. So we can set the engine to
Aria. To be honest, you would probably use InnoDB here, but you
can use any of the others, even MEMORY, and then just load the data when
the database starts or, I don't know, use any kind of
strategy. You can do this with MariaDB:
you can have, in the same database, these two kinds of tables with different
storage engines. And since they are on the same server, in the same database,
you can run a query like SELECT some columns,
let's say all the columns, FROM comments JOIN categories,
mixing this data, and add a condition to filter the
data. And as you can see, in the same SQL query,
we have two storage engines. That's pretty cool.
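Written out as SQL, the example might look roughly like this. The Python below only assembles and prints the statements (it does not connect to a server); the table and column names are hypothetical, and it assumes the MyRocks plugin is installed, since in MariaDB DDL the MyRocks engine is referred to as ROCKSDB.

```python
# Assembles (but does not execute) the SQL for the comments/categories example.
# Hypothetical table and column names; MyRocks must be installed on the server.

def create_table(name, columns, engine):
    """Return a CREATE TABLE statement pinned to the given storage engine."""
    body = ",\n  ".join(columns)
    return f"CREATE TABLE {name} (\n  {body}\n) ENGINE={engine};"

# Write-heavy table: MyRocks (named ROCKSDB in MariaDB DDL).
comments_ddl = create_table(
    "comments",
    [
        "id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY",
        "category_id INT NOT NULL",
        "body TEXT NOT NULL",
        "created_at DATETIME NOT NULL",
    ],
    "ROCKSDB",
)

# Rarely changing lookup table: Aria (InnoDB would also be a sensible default).
categories_ddl = create_table(
    "categories",
    ["id INT NOT NULL PRIMARY KEY", "name VARCHAR(100) NOT NULL"],
    "Aria",
)

# One query that reads from both engines; the ? placeholder style depends on the client driver.
join_query = (
    "SELECT c.*, cat.name AS category "
    "FROM comments c JOIN categories cat ON cat.id = c.category_id "
    "WHERE cat.name = ?"
)

for statement in (comments_ddl, categories_ddl, join_query):
    print(statement, end="\n\n")
```

The join itself looks like any other SQL join; the engine choice lives entirely in the table definitions.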
Okay, if you want to learn a little bit more about the different kinds of
workloads that MariaDB supports and what makes MariaDB unique,
this is a good video. It's a very short video where I
quickly mention some of these things. Anyway, let's talk about production,
because production is very important, right? MariaDB
Enterprise is made for that, and it's built on top of open source
software. Now, it's
called the Enterprise subscription, and it includes something called
MariaDB Enterprise Server, which is based on the Community Server,
which is free. And it offers a
longer maintenance window,
up to eight years. I believe for the Community Server it's like one year or so;
don't take my word for it, check
the policies online, but it's got to be something like that.
It's a big difference in the maintenance window. It
also offers the possibility to run non-blocking backups,
so that operations continue even while you are taking a backup,
instead of having to stop operations. Enterprise Audit, if you have
to comply with some certifications or
this kind of stuff. Same with security,
any kind of, what's the word, compliance?
You have to comply with some policies or standards.
MariaDB Enterprise offers more options for this
now. It also offers something called MaxScale,
which is a database proxy. By the way, in MySQL,
the name My comes from Michael Widenius's
daughter. So he has a daughter called My. I don't know,
maybe in Swedish it would be pronounced like "me"; if you speak Swedish, let me know
how you would pronounce that.
And he also has another daughter called
Maria. So you have MariaDB. And he also has a son called
Max. And so you have MaxScale. So that's an interesting fact
right there for you. Let's talk about MaxScale
then. It's a database proxy. That means that it's
something that sits between a client, in this case a web server
with an application, a web application, and the database.
The web server, or the application, is
physically talking directly to the proxy, but it thinks it's talking to the
database server, right?
And the database server thinks it's replying to the client. That's what
a proxy, generally speaking, is. And I
call it intelligent because it understands SQL. So it
can make decisions on, for example, where to send a query if it's a cluster
of multiple database servers, or what to reply
if it needs to modify the
results somehow. This is all configurable.
That's the idea of a database proxy. It also understands
NoSQL. So maybe you have a web application in Java or
Node.js, and it uses the MongoDB driver.
So it has to use MQL, the MongoDB
Query Language. So instead of using MongoDB, you can send those
queries to MaxScale, and MaxScale translates them to SQL
and stores the data in MariaDB. That's pretty cool.
So you have all the data in a relational database. The advantage
being that if you have other applications that use a
relational database, you have all the data in one single database.
So you can use one single query to join the data from multiple
applications of a different nature, like
SQL and NoSQL. That's pretty cool.
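As a rough illustration of what "translating MQL to SQL" means, here is a toy translator for simple find() filters. It is not MaxScale's implementation: it assumes, hypothetically, that each collection maps to a table with a JSON column named doc, and it only handles equality plus a few comparison operators. JSON_VALUE is a real MariaDB function; the ? placeholder style depends on the client driver.

```python
# Toy translation of a MongoDB-style filter document into a parameterized SQL query.
# Hypothetical mapping: each collection is a table whose documents live in a JSON column `doc`.

OPERATORS = {"$eq": "=", "$ne": "<>", "$gt": ">", "$gte": ">=", "$lt": "<", "$lte": "<="}

def translate_find(collection, filter_doc):
    """Return (sql, params) for a find() using equality and comparison operators only."""
    clauses, params = [], []
    for field, condition in filter_doc.items():
        path = f"'$.{field}'"  # field names are not escaped: toy code only
        if isinstance(condition, dict):
            # {"age": {"$gt": 30}} becomes one clause per operator
            for op, value in condition.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                clauses.append(f"JSON_VALUE(doc, {path}) {OPERATORS[op]} ?")
                params.append(value)
        else:
            # {"name": "Ada"} is shorthand for equality
            clauses.append(f"JSON_VALUE(doc, {path}) = ?")
            params.append(condition)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    return f"SELECT doc FROM {collection}{where}", params

sql, params = translate_find("people", {"name": "Ada", "age": {"$gt": 30}})
print(sql)
# SELECT doc FROM people WHERE JSON_VALUE(doc, '$.name') = ? AND JSON_VALUE(doc, '$.age') > ?
print(params)  # ['Ada', 30]
```

Values travel as parameters rather than being pasted into the SQL text, which keeps the translated query safe from injection through the filter values.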
If you want to experiment with this, I have this video where you get
access to a Docker Compose file, and
it spins up all the services, MaxScale included, and you
don't have to do much. You can just run a query using the MongoDB
Query Language and then another one using SQL, but you don't really have MongoDB.
And then you can combine the data, both SQL
and NoSQL data, so to speak, in a single SQL query. It's pretty
cool. Did I mention that MariaDB MaxScale
is intelligent? Well, it also understands Kafka.
Now here I put the database server on the other side.
And basically what this
enables is something like change data capture, that is, sending
database change events, events like changes to the
schema or to the data, through Kafka
to any other kind of system, including MariaDB. You can
send them to another MariaDB database if you want. For example, one
that uses ColumnStore, while this one uses InnoDB. That one would be for
analytics. I don't know, there are many possibilities.
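The consuming side of change data capture can be pictured as replaying an ordered stream of row-level events onto another copy of the data. The sketch below skips Kafka entirely and uses a hypothetical event shape (type, key, row); it is not the format MaxScale actually publishes.

```python
# Toy change data capture (CDC) consumer: replays change events onto a target copy.
# The event shape is hypothetical and not MaxScale's actual Kafka message format.

def apply_change(target, event):
    """Apply one row-level change event to a dict keyed by primary key."""
    kind, key = event["type"], event["key"]
    if kind == "insert":
        target[key] = dict(event["row"])
    elif kind == "update":
        target[key].update(event["row"])  # only the changed columns are sent
    elif kind == "delete":
        target.pop(key, None)
    else:
        raise ValueError(f"unknown event type: {kind}")

events = [  # what a consumer might read, in order, from a topic
    {"type": "insert", "key": 1, "row": {"id": 1, "amount": 50}},
    {"type": "insert", "key": 2, "row": {"id": 2, "amount": 75}},
    {"type": "update", "key": 1, "row": {"amount": 60}},
    {"type": "delete", "key": 2},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)  # {1: {'id': 1, 'amount': 60}}
```

Order matters here: applying the same events out of order (for example, the update before the insert) would fail, which is why CDC pipelines care about preserving event order per key.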
So that's CDC, and you can also do the opposite: data ingestion,
that is, storing data that comes through Kafka in
MariaDB. Pretty cool. Now, this is a very interesting use case:
read/write splitting. So let's say you have these two database servers
right here, and then you configure MariaDB replication, which you
can learn about with this video; this QR code takes you to the
channel, so you can just subscribe. There are plenty of interesting videos,
especially by my colleagues. I have to say they're top-notch
experts in database technology.
Anyway, so you have configured this, you put MaxScale here, and you configure
it (it's very easy, actually) so that it sends the writes to the primary and
the reads to the replica. Everything you write to the primary, because
of MariaDB replication, is going to be available on the replica.
So you can read from the replica instead of from
the primary. And then your web application, or
your applications, just connect to the
MaxScale proxy and send the SQL there. Remember, it's a proxy. So the
application thinks it's talking to a database, and it thinks it's talking to one
database. Look at the connection string: in this case it's Java as an example,
but in other programming languages the parameters would be similar.
It thinks it's just one endpoint: that's it, I'm going there,
it's one. But actually there are two nodes. In fact, we can add a new
one, and the web application... you don't need to restart it, you don't need
to stop it, nothing. It continues to work.
It can just work more
efficiently with reads now. In this case we are scaling reads horizontally,
and you can also remove the replicas later, when you don't need them, to
save money, for example in the cloud. In fact, you can change the whole
thing. You can change this to three different clusters or availability
zones, or even clouds if you want. And data is
replicated there, I don't know, with InnoDB and ColumnStore
nodes for analytics, and the application
still doesn't know. It continues to use the same connection string. It thinks it's one
logical database. In fact there are many nodes, as you can see there.
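Here is a toy model of read/write splitting, assuming statements can be classified by their first keyword: writes go to the primary, reads rotate across replicas, and adding a replica doesn't change what the caller does. It's a sketch of the idea, not how MaxScale's router works internally; a real router also has to deal with transactions, session state, replication lag, and statements such as SELECT ... FOR UPDATE. Node names are hypothetical.

```python
# Toy read/write splitter: writes go to the primary, reads rotate across replicas.
import itertools

WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP", "TRUNCATE"}

class ReadWriteSplitter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next_replica = itertools.cycle(self.replicas)

    def add_replica(self, replica):
        """Scale reads out; callers keep using the same single endpoint (this object)."""
        self.replicas.append(replica)
        self._next_replica = itertools.cycle(self.replicas)

    def route(self, sql):
        """Return the node a statement should go to, based on its first keyword."""
        first_word = sql.strip().split(None, 1)[0].upper()
        if first_word in WRITE_KEYWORDS or not self.replicas:
            return self.primary
        return next(self._next_replica)

router = ReadWriteSplitter("primary", ["replica1"])
print(router.route("INSERT INTO comments (body) VALUES ('hi')"))  # primary
print(router.route("SELECT * FROM comments"))                     # replica1
router.add_replica("replica2")
print(router.route("SELECT * FROM comments"))                     # replica1
print(router.route("SELECT * FROM comments"))                     # replica2
```

From the application's point of view only `route` exists, which mirrors the point about the connection string: the topology behind it can change without the caller noticing.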
So this is topology isolation: the topology is isolated from the application, so it can evolve.
Okay, so automatic failover,
which is pretty cool. Let's say this is the primary server in this cloud,
and this one is handling all the
writes. So if it fails,
then we cannot write data anymore. That's bad. But MaxScale detects
this automatically. You don't have to do anything.
It picks another one, reconfigures it, and
promotes it as the new primary. That means that the web
application continues to write data.
Maybe there's a slight, short delay in some
of the write operations while this reconfiguration is taking place,
but it doesn't fail. Then later,
maybe the failed node recovers,
or you restart it, or it restarts automatically, whatever.
MaxScale detects this and now reconfigures it
as a new replica. So all of a sudden you have the same capacity again,
assuming the nodes have the same capacity and are identical. You can do
a switchover through a UI that MaxScale offers,
a web-based GUI,
or the command line, or your own script,
or configuration files, to kind
of restore it to where it was before, manually.
You can do that. That was automatic failover.
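The failover and rejoin sequence can be modeled as a small state machine. This is a simplified sketch, not MaxScale's algorithm: it promotes the first healthy replica, whereas a real monitor would typically also compare replication positions before choosing one. Node names are hypothetical.

```python
# Toy model of automatic failover and rejoin.

class Cluster:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.down = set()

    def node_failed(self, node):
        """Record a failure; if it was the primary, promote a replica."""
        self.down.add(node)
        if node == self.primary:
            self._failover()
        elif node in self.replicas:
            self.replicas.remove(node)

    def _failover(self):
        healthy = [r for r in self.replicas if r not in self.down]
        if not healthy:
            raise RuntimeError("no healthy replica to promote")
        new_primary = healthy[0]  # simplest possible choice: first healthy replica
        self.replicas.remove(new_primary)
        self.primary = new_primary

    def node_recovered(self, node):
        """A recovered node rejoins as a replica of the current primary."""
        self.down.discard(node)
        if node != self.primary and node not in self.replicas:
            self.replicas.append(node)

cluster = Cluster("db1", ["db2", "db3"])
cluster.node_failed("db1")
print(cluster.primary, cluster.replicas)  # db2 ['db3']
cluster.node_recovered("db1")
print(cluster.primary, cluster.replicas)  # db2 ['db3', 'db1']
```

Note that the recovered node comes back as a replica rather than reclaiming the primary role; moving the primary back is the separate, manual switchover step.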
Let's talk a little bit about the present and future of MariaDB.
Today you can deploy MariaDB anywhere.
Docker, for example: I deployed MariaDB with Docker Swarm, I
believe, in this Raspberry Pi cluster
that I built. I don't have it close
to me right now, but it's pretty cool, because you can
disconnect one of these cables and the whole thing continues to operate.
MaxScale is replicated; I think I have two nodes, I think it's
the two top nodes there. I installed MaxScale
there and it replicates the configuration.
So I configure one, and the other one changes accordingly. So it's pretty cool.
You can deploy in the cloud, obviously, any cloud. Looking into the future,
the teams are working a lot on Kubernetes
deployments and orchestration, and on AI capabilities.
I'm not going to talk too much about it, so stay tuned for news on
these two fronts. Migrating to MariaDB
is actually very easy if you, for example, do it from MySQL.
And these are the main servers,
the other servers that we see people migrate from the most
to MariaDB. But this is boring documentation. What I wanted
to show you is actually a feature that MariaDB has.
You can say SET SQL_MODE=ORACLE, or
put this in a configuration file somewhere with a slightly different syntax.
Now MariaDB understands Oracle. Well, it doesn't understand the whole dialect
of Oracle; that would be crazy. But it helps with migration a
lot, because you get closer to it, so you need to change fewer
things. The same with PostgreSQL and the same with SQL
Server. So that's a pretty cool tool for migration.
Who uses all this cool stuff? Well, here you see some usage around
the world. So you see Asia, the United States,
Germany, Brazil, Mexico.
Well, there are many. These are the countries with the most downloads,
right? But MariaDB is used everywhere,
globally. And remember, it's open source and it's backed by these
companies, which are huge. So this project is not going to disappear.
I don't need to say anything about these companies.
You recognize them. Some notable users:
Wikipedia. When you read something on Wikipedia, you are reading information
stored in MariaDB. Samsung: if you have Samsung devices
and you log into their networks,
you are using MariaDB. Nokia, Red Hat,
Google, DBS. DBS is a huge bank in
Asia. They migrated from Oracle to MariaDB and they are
very happy, because they are saving a lot of money and they gained
some functionality as well. These are some notable
users, but actually 75% of the Fortune 500
companies use MariaDB. So most of them use MariaDB.
Now, it's not only big companies, because MariaDB has more than 1 billion downloads
on Docker Hub. That's quite a lot. Now, in conclusion,
we saw the evolution from the 60s, when you stored data on
tape and this kind of stuff, up to today, where
you can maybe even, with a few clicks, have the database running in
the cloud, sometimes fully managed, or on your
Raspberry Pis, I don't know. I wouldn't recommend going to
production with a Raspberry Pi, although I bet it has been done. It might
even work. I don't know, not for databases, though. But it
is fun to do for experimentation. Anyway,
we saw this. I want to leave you with this message: nobody says Ubuntu
is a fork of Debian, or Microsoft SQL Server is a
fork of Sybase, unless they are making a historical remark. In
the same way, MariaDB is much more than a fork of MySQL, and I
hope you saw why this is true and learned something about
MariaDB, or databases. Try it out.
It's a lot of fun. These are my coordinates.
Feel free to reach out. I'll be happy to hear from you. Thank you,
and enjoy the rest of the conference.