Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to this session on building applications with R2DBC, a new database connectivity specification used to connect with relational databases in a fully reactive manner.
My name is Rob Hedgpeth, and a little bit about myself: I work for MariaDB as part of the Developer Relations group, and essentially what that means is that I do anything and everything I can to help improve the developer experience of using MariaDB products. If you're not aware, MariaDB is a relational database, and that's about as much time as I'll spend talking about MariaDB specifically.
Some of the examples that I use will, of course, use it. But what I want you to take away from this is a look at the R2DBC specification as a whole, and how it can be used not only with MariaDB but with a variety of different relational database solutions. Mainly, we're going to be looking at R2DBC as it is at the end of this session. As you go throughout the rest of the conference, and even after the conference, if you happen to have any questions or some input on the session itself, please feel free to reach out to me at robh@mariadb.com, you can reach me on Twitter at @probablyrealrob, or you can go ahead and follow me on GitHub. I put up a lot of samples, not only with R2DBC but with a lot of things dealing with relational databases, so feel free to follow me on GitHub as well. Now, let's dive into
reactive programming with relational databases. The first thing we want to key into, just so we can get everybody running at the same pace, or really on the same page, is this idea of reactive programming. What does that mean? We're going to dive a little bit into that. For some of you this may just serve as a refresher, and if you aren't familiar with reactive programming in general, not a problem: we're going to get everybody on the same page. So let's go ahead and take a look at something that I think is a pretty relatable example for most of us out there: a simple application
or a simple solution design where we've got a client communicating
with a server and more specifically communicating with a server thread,
right? So some thread on the server that's actually doing the work, processing something that the client is sending in, maybe as a request or something like that.
And so in this instance, we can imagine that the client, like I said,
sends in a request and the server thread picks that up. And so when the
server thread does that, the request is in this case indicating to do
something with the database, which of course makes sense if we're going to talk about
reactive programming with relational databases. But in the normal scenario,
right, it's going to take that request, maybe it's got some
instructions that basically indicate I need to execute some query.
In this case, we're going to execute some SQL, some structured query language against
a relational database. And while that's happening, that server thread is essentially
just waiting for the process to execute, right? So whatever query or queries you're sending in, it's just waiting for those to finish, and during that time it's essentially just sitting there. Once it gets the result, then it can go ahead and handle that response. Now, you probably know where I'm going with this, but if that server thread basically
is stuck waiting for the database to go ahead and continue processing, or doing whatever it's doing on that side (which isn't really important to us), then when another request comes in from the client, one of the things we know off the bat is that we can't do anything with server thread one, because at the moment it's waiting for some work to get done by the database. And so for this situation we decide, and it's something we've been doing for a while now, to just make this multithreaded, right? Just spin up another thread. We'll handle this asynchronously, and we'll handle whatever requested information is coming in from the client from that standpoint. And again, it's going to follow that same trend: maybe it works with the database, maybe it works with the file system. It doesn't really matter, because essentially it all works the same; it's dealing with something that is making it wait. And so when the third request, or the fourth, or the hundredth request comes in, we're dealing with this with more and more thread context. Now, that's great, right?
But the problem with that is that as we add more
and more threads, we're certainly making that more complex from a development
standpoint, but we're also making that more complex from a computational
standpoint in the sense that it could take more memory to basically
manage all of this thread context, all of these threads, just general thread management. We're not going to dive into the minutiae of why that is, but essentially one of the side effects is that as you increase memory usage, you can have other side effects like decreasing throughput. All things that you really don't want, right? Increased memory, decreased throughput: these are things that you don't want in your application.
And so it causes more problems. And this is really where, to get to the meat of it, reactive programming methodologies, or thinking reactively, have stepped in to help alleviate
some of these problems. In a reactive solution, we've got a very similar setup
here where the client is then communicating with that server thread,
right, just the same. But instead of the server thread executing or communicating with the database and then waiting on that process to finish or complete before doing something else, it's really just throwing it over the fence to the database to say, hey, go ahead and do this. I'm going to continue
to process other things. Like for instance, we've got a second request in here,
or a third or fourth or fifth request. When you're done, database, and you want to give me back some results, then let me know and I will handle those, and then of course subsequently send that downstream to the client. This is a very simple explanation, but as you can see
here, rather than being blocked, which we'll get into, we're completely
unblocked to handle other requests like request two,
until eventually the database sends us something back that we need to
handle. And this is a pretty age-old solution that we're probably used to, right? You've probably heard of things like the observer
pattern or pub sub. It's nothing necessarily
new, right, or revolutionary, but this is a way
of being able to handle things in a more efficient manner. So you're not necessarily looking to improve performance, although you can, but you are looking to become much more efficient in using the resources that you have, for instance on the server. And this is a large part
of being able to think reactively. Well, as I've very broadly
and abstractly explained this, you're probably wondering how exactly this gets done. There are a variety of ways
that this can be done, and I'm going to examine one particular way, and then, in particular, how that plays into R2DBC as we talk about reactive development and reactive solutions with relational data sources. First, we need to take a look at the definition. A lot of times I like to start with the simplest spot that we can build off of, and really that is this definition of reactive programming. It's chock full of computer-sciency words, so we're going to piece it apart as I read it out: a declarative programming paradigm concerned with data streams and the propagation of change. Now, like I said, that's chock full of computer-sciency language.
But essentially what that means is that if we look at it first, we're looking
at a declarative programming paradigm. Some of you may be familiar with that, some may not, but you can think of declarative programming paradigms like this: you're not necessarily concerned about the minutiae or the step-by-step process of what's happening in, let's say, a command or execution. You're really just interested in performing something and then retrieving back a result of some sort. So it's a little bit different than something like imperative programming, where you're really controlling the workflow as a step-by-step process.
So there's that side where it's declarative. And then this idea of dealing
with data streams and ultimately the propagation of change.
Now, we're going to dive a lot more into data streams, but propagation of change you can just think of as the dissemination or spread of changes in data, and that's largely handled by data streams. So if we get nothing else out of this definition, what we're keying into is how we can use data streams combined with declarative programming to help with the propagation, the spread, of change. Now, if we were to ask
probably a lot of you out there, well, what is a data stream? I mean,
the definition that really comes first and foremost to us is probably this
idea. Now, we don't have to use a publisher and subscriber; you can just think of point A and point B, where you're quite literally streaming data, right? And in fact you could say, hey,
send it all over, just send all of the data over. But of course,
most of us know that if this happens from one point to another, or in
this case, what we're using is the publisher and subscriber, this subscriber
can become overwhelmed, right? And if that happens, and it's not able to handle the data that comes in, either the volume or the velocity at which it's receiving it, then it's going to have to put it someplace to handle later. In this case, we can think of this as a backlog, right? It's really creating this backlog of,
hey, I've got this operation to do, but I will handle this
later. But the problem is that this backlog can really start to mount up,
right? So as we're sending more and more data, maybe we're not decreasing the velocity
or we're increasing the volume. It doesn't really help the subscriber at that point.
And so it starts to build this backlog up even more. And that has us dive into: okay, well, what is a data stream really? It was pretty simple on that last slide, you're just streaming data, but we need to take a step deeper into what's available to us within the anatomy of a data stream that we can use to make that process more efficient. Because ultimately, as we talk about reactive programming, and then as we dive into R2DBC, we're really honing in on this idea of efficiency and improving it. And so
from an anatomical perspective of a data stream, of course, we know
that it has to start at some point. And logically, we hope,
right, it's not some infinite process, and it's going to complete,
right. And we hope at the end of that, that all data has been processed
in some manner. And this all happens over the course of time, right? You start
it, and then, of course, whether it's nanoseconds or it's hours,
days, weeks, whatever it is, this happens over the course of time. It's going to
start and it's going to complete. But what becomes more interesting, of course, is what
we're sending. Now, this is all relatively straightforward, but we're going to be sending what, in this case, we're calling variables. These are really that propagation, or just the changes in data, that I was speaking about before. Things get sent along this stream, data gets sent along this stream, and ultimately it can run one of two paths: it can either be processed successfully and completed, or it can fail, causing some error or some exception.
And while this anatomy is pretty straightforward and pretty easy to wrap our minds around, it's actually very powerful, because as we dive
into the next couple of slides and I start to extrapolate
from this or start to really build on top of this anatomical
setup for a data stream, we'll see that we can use all of these
parts in order to create some standards and some specifications
that really help us with reactive development. But the first thing that we want to
start on is this idea of back pressure, right? So one
of the things that the anatomy of a data stream really helps us key into
and the original problem that we had, right, is that we're taking in maybe too much information, either from a velocity perspective, a volume perspective, or both, like in this image where we've got a fire hose pointed right at the face, and ultimately we can't handle drinking out of that fire hose. And so back pressure comes in as a way to be able to control that flow. You can imagine that the back pressure is basically controlled with this guy's hand so he doesn't just get annihilated in the face with water. This really starts to introduce
us to the idea of back pressure. Now, if I bring up the diagram that
I had before, where we've got that simple relationship between a publisher
and a subscriber, what this means is that you give control
to the subscriber to communicate with the publisher to
be able to say, hey, I only want this much information. This is essentially how
much I can handle. And then at some given time, whenever that may be, it could be instant, it could be seconds, it could be hours, whenever the publisher is ready to do what the subscriber has asked it to do, it can send that information. This creates this idea of non-blocking back pressure, where the subscriber is going to say, hey, this is what I can handle, send it to me when you can, and at some undetermined amount of time the publisher is going to say, hey, I'm going to go ahead and send this. It creates a very non-blocking scenario where we can receive the information that the subscriber is set up to handle. And of course, if we put this into a scenario of operating on a couple of things, this means that if the subscriber requests, let's say, a single element or a single piece of data, then the publisher is going to send that single piece of information. If the subscriber says, hey, I can receive more, then the publisher can go ahead and send the subscriber two more pieces of information, and so on and so forth. But this is very simple, right? And that's good.
But the problem with that is that there's any number of ways that this
can be implemented. And we all know that working at different shops and
throughout the years, there's a lot of times where we may reinvent the
wheel or we'll go grab some random package and there's no
real consensus on how some of this stuff is done. And that can create a
problem, right, for the longevity of a project or the maintenance of a project, even onboarding people ("I've never seen this before, what the heck, how are you doing this?"), and having to basically spend cycles bringing everybody up to speed on exactly how you implemented non-blocking back pressure.
So back in 2013, a group of individuals
from places like Netflix and Pivotal and Lightbend got together
and they created a specification. That specification is called Reactive Streams. Essentially, it's a way to take advantage of the anatomy, the pieces of the data stream, in such a way that you can create a very efficient, very reusable, almost standard design, right, in the specification, to be used across a variety of different libraries or solutions in a reusable manner that's broadly disseminated and broadly used.
And this again contains pieces like I showed before, like the publisher and the subscriber, but by using the anatomical features available within a data stream, being able to send variables, understanding whether things errored or caused an exception, and whether things completed, we can actually piece this puzzle together, as we see in this diagram. Through the use of a subscription, we can set up a relationship between a publisher and a subscriber, and we can do certain things: we can request certain information, we can cancel the stream altogether, and then the publisher is essentially going to notify the subscriber, keying into events, or rather methods, that exist on the subscriber, to say, hey, here's your next item, this actually had an error, or I'm completely done. But again, this is very general.
So what this means to you, the developers out there, is that you can really take this and think of it very simply as a collection of interfaces. The Reactive Streams API is really a specification, and what that means is that it's not defining exactly, in a step-by-step process, how you should do this; that's up to whatever implementing libraries, which I'll get into a little bit later, decide to do instead. This collection of interfaces is basically just defining the structure and essentially the flow of how these things can work together to create reactive solutions using the Reactive Streams API.
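For reference, the entire Reactive Streams API boils down to roughly the following four interfaces. This is a sketch of the published org.reactivestreams types (each lives in its own source file in the real library, which also carries a detailed rule set about how they must behave), shown here just to make the shape of the specification concrete.

// The Reactive Streams API, essentially in full (each interface is its own file
// in the org.reactivestreams package of the real library).
public interface Publisher<T> {
    void subscribe(Subscriber<? super T> s);
}

public interface Subscriber<T> {
    void onSubscribe(Subscription s);   // receives the Subscription used to request data
    void onNext(T t);                   // "here's your next item"
    void onError(Throwable t);          // the stream ended with an error
    void onComplete();                  // the stream completed successfully
}

public interface Subscription {
    void request(long n);               // non-blocking back pressure: "send me up to n more"
    void cancel();                      // stop the stream altogether
}

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {
}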
Okay, but why is this important? I know you're thinking, Rob, you started this session talking about reactive programming and how you can use it with relational databases, and we haven't really gotten into databases at all, relational or otherwise. So why is this important? Well,
fully reactive database interactions, in order for those to happen, need to be top to bottom. To get the benefits of a reactive solution in general, the whole thing needs to be reactive, right? Otherwise, if you've got maybe
a portion of it, say the server back end part of it is all reactive,
but then your communication with the database is not, it's blocked.
And if one part of an application or a solution is blocked,
then if you think about it, it's all really blocked, right? Because it's not fully reactive in the sense that everything is communicating in this kind of publisher-subscriber, event-driven way, with asynchronous data streams through event-driven programming. And so if it's not top to bottom, then it's really not reactive.
And so for reactive database interactions,
they really need to be fundamentally non-blocking, and they need to use this concept of back pressure, which is why I've described it. The reason I put this up first is because there are a lot of libraries, like RxJava and Project Reactor, that already use Reactive Streams, right? They already have implementations of that specification, of that API, as an implemented library. They've gone in and defined the actual bits and processes that need to happen to fill out Reactive Streams as an actual library or an actual solution. And because, again, it's supposed to be the standard specification, it's something that plays well
with how you would communicate with a database. But unfortunately,
if we're talking about Java-based, or really JVM-ecosystem-based, solutions, we're very used to using something like the Java Database Connectivity specification, which is JDBC, right? You're probably very familiar with that. Now, as I described in the scenario before, you can create a fully reactive application using maybe RxJava or Project Reactor. But when you hit the JDBC API, well, it was created back in 1997, and it wasn't really made keeping in mind some of these more reactive types of interactions. In fact, those weren't really mainstream for applications in general, because we weren't having to deal with things like taking advantage of the underlying hardware as efficiently as possible; we were mostly concerned about things like performance. And that's really where a lot of the threading conversation comes in, or handling context, or handling concurrency. By design, JDBC was designed to be blocking, right? It communicates using some kind of wire protocol to the underlying database, and you have to wait for whatever it's doing to be done. Solutions otherwise are essentially just done in an asynchronous manner, where you're spinning up threads or threaded
communication in order to be able to do that. But while that may be
good for a lot of applications out there, if you're looking to maintain, or improve as much as you can, the efficiency of the hardware and the horsepower underlying your database and your applications, then you're going to need another solution.
And that's where, finally, R2DBC enters the chat, or enters the conversation. But what is R2DBC? Well, as you remember, JDBC is an acronym that I went into, and I've probably said all of the words enough that you've probably guessed at this point if you didn't already know. R2DBC stands for Reactive Relational Database Connectivity, and like JDBC, and like Reactive Streams, it is a specification. Now, the goals and design
principles really of this specification are pretty simple and straightforward.
One, we live in a day and age where everything really needs to be open.
Coming from MariaDB, and even before that, with MariaDB coming from MySQL, we're really ingrained in and have our ears to the ground with the open source community, because we understand that the open source community, as a whole, can really come together and solve a lot of problems. And so it was one of the goals for the R2DBC specification to be completely open. That's because the group of people that originally started it didn't set out thinking, oh well, we have all of the solutions for all the problems that may come up with communicating with relational data sources underneath. In fact, quite the opposite, right? They wanted to open it up so that community members (and I encourage you after this, if you want to go and contribute to R2DBC, please do that), who come from all walks and backgrounds and different types of solutions, can come up with different types of solutions for the problems that exist out there.
And then beyond that, a lot of the principles really go into, as we've talked about, reactive development in general, such as being completely non-blocking. It's important, in fact it's crucial, that the database interaction for a reactive solution be completely non-blocking, because if you're going to create a fully reactive solution, it needs to be fully reactive. And of course, I dove into Reactive Streams because that really sets a standard, this universal specification that can be used within your backend, maybe an API solution or whatever the solution may be, that communicates with the database and may already be using Reactive Streams.
We want to use that same specification within a database connectivity driver,
ultimately. And along with that, we need to have a very small
footprint, right? So a very lightweight specification that
doesn't set out to make too many assumptions or have too
many opinions. And so for R2DBC, one of the largest goals was to keep it small, to keep it simple, because while a lot of relational databases are very similar in the way that they connect and execute queries, there are a lot of differences between them, right. There are a lot of vendor differences between, say, MySQL, MariaDB, Microsoft SQL Server, Oracle, Postgres. There are a lot of things you can take advantage of in each one that really set them apart. So R2DBC as a specification wanted to keep this in mind, so that at the driver level things could be added without having to absorb or handle opinions or assumptions that were baked into the R2DBC specification itself.
And then ultimately other clients and libraries can also be created in combination
or used in tandem with the drivers, which I'll get into a little
bit later. And this really harkens back to the diagram that I showed a few slides ago, where rather than having a single green checkmark for the reactive app, the actual backend application that you may be creating that communicates with the database, now we've got two green checkmarks, because the whole application is completely reactive, or fully reactive. Diving a little bit deeper into that, we need to take a look at the specification itself, and how you can use something called the Service Provider Interface, the SPI, for R2DBC to actually create a reactive driver. That reactive driver is then the implementation that you can use within your applications. So first we're going to take a look at the R2DBC SPI, how it's constructed, what principles of design it brought together, and how it can ultimately be used within a vendor's reactive R2DBC driver to create something that you can use against a database.
Well, why this Service Provider Interface? Before I get into what it is (and when we look at what it is, we're going to see that it's pretty simple), it's really important to understand why. A little bit of this I touched on just a couple of seconds ago: it's this idea of being able to create a very unopinionated, unbiased approach, not making too many assumptions about things that need to be done, essentially keeping the specification as light as possible. And that's really done from a hindsight
perspective, where we look back at things like JDBC. Not to pick too much on JDBC, because it has stood the test of time, and obviously you can use JDBC for a multitude of solutions. But one of the things I'd point out, if you've ever dealt with JDBC, either from an API perspective or from creating a driver for it, is that it can be very opinionated, right. It has done some things that have ultimately made it very difficult, either on the API side or the driver side, where you have to add extra code to handle things like question mark binding, or you basically have to parse through URLs on the driver side, even though we all know URLs are fairly universal at this point, so there's really no need to reinvent the wheel for every driver that's constructed, and so on and so forth. These problems have sprouted up over time. But the idea was to take advantage of hindsight within R2DBC and strip down or simplify things as much as possible while still being broadly usable by implementing drivers.
And as I mentioned, URL parsing is one of the first things on that list,
because we know that URLs are pretty straightforward,
right? And for database connectivity, over the course of years, through usage of things like JDBC, a pretty set URL format has been established to do this, where you're defining things like the scheme (in this case determining that it's R2DBC), specifying a driver, whether that be MySQL or SQL Server, identifying the database through a host and a port number, maybe a default database, and then being able to tack on query parameters that add things like security or encryption to the overall profile of how you want to communicate with the targeted database. That's what the URL is for. Within R2DBC, all the URL parsing is handled for you, so the implementing driver doesn't have to worry about those things; there's a set standard.
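To make that concrete, here's a minimal sketch of discovering a connection factory from an R2DBC URL. The host, credentials, database name, and query parameter below are placeholders for illustration; only the overall URL shape comes from the specification.

import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;

public class UrlExample {
    public static void main(String[] args) {
        // Scheme, driver, credentials, host, port, database, and query parameters,
        // all parsed by R2DBC itself rather than by each individual driver.
        ConnectionFactory connectionFactory = ConnectionFactories.get(
                "r2dbc:mariadb://app_user:Password123!@127.0.0.1:3306/todo?ssl=false");

        System.out.println(connectionFactory.getMetadata().getName());
    }
}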
Beyond that, the R2DBC SPI comes with two levels of compliance. There are interfaces, if we look at the API, that have to be fully supported, right? They have to have a full implementation. And there are interfaces within the SPI that only have to be partially supported, and part of that is because it allows flexibility within the drivers to key into vendor-level or vendor-specific functionality. But a lot of this should look
very familiar as we take a look at it. We're looking at things like ConnectionFactory, which of course is basically a factory: it's producing some kind of product, and in this case the product is a connection. We're going to take a top-down approach, where we can think about a very normal, very standard sequence of events that needs to happen in order to communicate with the database, execute a query, and then parse some results. That involves taking a look at a couple of interfaces that exist within the SPI: the ConnectionFactory I mentioned and how it can create connections, then, from those connections that have been established, how statements can be executed, and then, using the Result object and the Row object, how those results can be parsed. So let's go ahead and first take
a look at the ConnectionFactory within the SPI. It's pretty simple; this is the entire interface that exists for ConnectionFactory.
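It looks roughly like this (a sketch of the io.r2dbc.spi.ConnectionFactory interface; the comments are editorial):

import org.reactivestreams.Publisher;

public interface ConnectionFactory {

    // Rather than handing back a Connection directly, create() hands back a
    // Publisher that will emit the Connection once a subscriber requests one.
    Publisher<? extends Connection> create();

    // Metadata about the database product this factory connects to.
    ConnectionFactoryMetadata getMetadata();
}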
Now, a lot of this probably seems pretty straightforward, and I actually pretty much described exactly what it does, which is that it creates connections. But one of the things to look at, and this really takes us back to the beginning of the session where I talked about how Reactive Streams is an integral part of R2DBC, is that Reactive Streams is used directly; the Reactive Streams specification is used throughout the R2DBC specification.
Now again, like Reactive Streams, R2DBC is really just a collection of mostly interfaces. There are some classes and some abstract classes in there, but it's mostly interfaces that have to be implemented by a driver. So MySQL as a vendor has its driver, and MariaDB, Postgres, Microsoft SQL Server; there are a bunch of different drivers out there. But what we want to key in on is this usage of Reactive Streams, because as we take a look, we know that we're using Reactive Streams: specifically, we're importing the Publisher. And the reason we're doing that is because, if we remember back to the relationship between a publisher and a subscriber, in that first very simple example we were talking about, hey, I want to request some information, and the publisher says, okay, whatever amount of time I need to prepare this, I'm going to send it back over to you. That's what's happening here. Rather than just getting a connection back when we use the create method, we're actually getting back a Publisher. So when we execute the create method, we're receiving a publisher that we can then subscribe to, which is going to say, hey, at a given time, when I'm ready, I'm going to give you this connection. And that could happen instantaneously, it could happen nanoseconds or milliseconds down the road. But we're tapping directly into Reactive Streams, and more importantly reactive programming, to make this more of an asynchronous process of receiving this connection, or ultimately this event that tells us that we can have that connection. And this is a theme that plays throughout all of
the interfaces, right? And when I say all of them, of course not all of them necessarily have interactions with Reactive Streams, as we get into some of the enumerations and things like that. But as we get into
interacting with the database itself, it's all rooted in, or built its foundation on, Reactive Streams and of course reactive programming. So beginning things like a transaction, being able to begin a transaction, is a process that we can subscribe to, as is being able to close a connection, or being able to commit transactions. These are all things that we're taking advantage of in a reactive manner. Even the idea of executing a structured query language statement, a SQL statement, is done using Reactive Streams: we execute whatever our statement may be, and when it has been executed, we're subscribing to a publisher to receive the result. So when that statement has been executed and the publisher sends that to the subscriber, that's when we'll receive the Result object.
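To make the rest of that flow concrete, here is an abbreviated sketch of a few more of the SPI types involved. These are trimmed down for illustration; the real io.r2dbc.spi interfaces carry more methods, and their signatures have evolved a bit across spec versions.

import java.util.function.BiFunction;
import org.reactivestreams.Publisher;

// Abbreviated sketches; the real io.r2dbc.spi interfaces have more members.
public interface Connection {
    Publisher<Void> beginTransaction();
    Publisher<Void> commitTransaction();
    Publisher<Void> close();
    Statement createStatement(String sql);
}

public interface Statement {
    Statement bind(int index, Object value);   // parameter binding
    Publisher<? extends Result> execute();     // subscribe to receive Results
}

public interface Result {
    Publisher<Integer> getRowsUpdated();       // how many rows were affected
    <T> Publisher<T> map(BiFunction<Row, RowMetadata, ? extends T> mappingFunction);
}

public interface Row {
    <T> T get(int index, Class<T> type);
    <T> T get(String name, Class<T> type);
}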
And of course, nothing really new here as we start to dive into the Result object, which you can think of as a collection of rows, a collection of pieces of data, that have come from whatever query or statement we executed. We can also take advantage of things like seeing how many rows were affected, and of parsing through that information in a reactive format, a reactive methodology, all the way down into a Row, which is quite literally a row of information. That of course doesn't have to be tied to a specific table; it just depends on your query. But ultimately we're trying to get to this row. As you can see, and if you're familiar with any connectivity driver out there, this doesn't even really have to exist within the Java or JVM ecosystem: ultimately, if you're dealing with a relational database, you're trying to take advantage of that tabular information, however you constructed your query, and get at it. The difference here, of course, is that we're doing that in a very reactive manner on top of the Reactive Streams specification. But of
course, as I've described throughout this, we really want to dive into how I can use it. How can I specifically start to take advantage of R2DBC for the relational databases that I'm using? Well, there are a variety of different R2DBC drivers, or, as some places call them, connectors. Simply think of them as implementations of the R2DBC specification, specific to whatever database or whatever vendor of relational storage happens to be communicated with. There are a variety of them out there, and even more in the works: everything from distributed databases like Cloud Spanner, to H2, MySQL, MariaDB, Microsoft SQL Server, and Postgres. The idea of the R2DBC specification is to provide a very broad, standardized approach for running or managing reactive interactions with the underlying database. I'm specifically going to take a look at how we can use the implementation from MariaDB. Now, of course, I mentioned that I am from MariaDB, but a lot of these drivers, especially at the highest level, are very similar, right. That's the entire point of the R2DBC specification: to really tie together a lot of the things that relational databases all do. And the first thing that
we can think about is this idea of being able to connect to the database.
Of course we understand that it's going to take information like
a location, right. The combination of a host address and a port number,
right. So we can specify exactly where that database is. And then at
the simplest level, then providing things like credentials. But of course there's
other things that you can add onto this, such as security, and limiting features as far as timeouts and things like that. There are all kinds of things we could add onto this, some of which may be vendor-specific. But in the simplest case, we can use this connection configuration implementation to do that. Now, as you can see here, it's prefixed with Mariadb, and that really just tells you that for the connection configuration, which exists as an interface within R2DBC, we have an implementing class that is prefixed with the name of the vendor. And that's pretty common: for MariaDB,
for Microsoft SQL Server, for MySQL these naming patterns are
going to be pretty similar. And that's even more evident as we take
a look at how we can take advantage of that configuration object which
I created in the last slide to then be able to create an instance of
the MariadbConnectionFactory object. That connection factory implementation is really just an implementation of the ConnectionFactory interface within R2DBC. We can then use that connection factory to create, or get hold of, a connection. Now, as I mentioned before, we would do this
using reactive streams and this idea of first
receiving back a publisher, which then at some undetermined amount
of time is going to publish or send a connection object to us
that we can then use. But in the case of some things, such as a connection, when you create it or request it, you're most likely going to want to use it immediately. So in some cases, very limited cases, you actually do want to block, and the Reactive Streams implementations, whatever that may be (in this case we're actually using Project Reactor), come with different mechanisms that allow you to wait, to block, so we can actually wait for that connection object to be delivered. In this case we're using block(), and now we've got this conn Connection object.
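Putting the last few slides together, the code looks roughly like this. It assumes the MariaDB R2DBC driver and Project Reactor are on the classpath; the host, credentials, and database name are placeholders, and the builder method names reflect the driver as it was around the time of this talk, so they may differ slightly in current releases.

import io.r2dbc.spi.Connection;
import org.mariadb.r2dbc.MariadbConnectionConfiguration;
import org.mariadb.r2dbc.MariadbConnectionFactory;
import reactor.core.publisher.Mono;

public class ConnectionExample {
    public static void main(String[] args) {
        // Vendor-specific configuration (values are placeholders).
        MariadbConnectionConfiguration configuration = MariadbConnectionConfiguration.builder()
                .host("127.0.0.1")
                .port(3306)
                .username("app_user")
                .password("Password123!")
                .database("todo")
                .build();

        // The MariaDB implementation of the R2DBC ConnectionFactory interface.
        MariadbConnectionFactory connectionFactory = new MariadbConnectionFactory(configuration);

        // create() returns a Publisher; here we deliberately block until the Connection arrives.
        Connection conn = Mono.from(connectionFactory.create()).block();

        System.out.println("Connected: " + (conn != null));
    }
}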
Once we have this connection, what does it allow us to do? Now, I understand I'm starting to incrementally add more code into each slide, and that's really for a purpose which I'll get into a little bit later. But don't worry, we're not going to dive into the details of all of this. Just know that, starting from the top, we're taking advantage of the Connection object, and then we're using the createStatement method to specify the SQL statement that we want to execute. What that's going to do is return a MariadbStatement implementation, right? So MariadbStatement
is a class, and it's an implementation of the Statement interface which exists in R2DBC. From there, we can use that select statement object and take advantage of the execute method, and that execute method is just returning us a publisher, like I said before. And of course, from there, ultimately we need to subscribe to that. So at the very bottom you see that, okay, we're going to subscribe to this, and it's going to give us whatever we've mapped. In the middle there, there are some things we've mapped, where we've parsed through things like the Result and Row objects that I showed before. But what's important about this is that we're using a Reactive Streams implementation, in this case the one I mentioned before, Project Reactor, which will take our publisher interface of type MariadbResult and fill that out with Flux. And Flux is an implementation of the Publisher interface.
With that, we're able to do some things like map, right? We're going to take the data that exists in the database, and we're going to do that fanciness to map it to Java data types and a Java object. In this case we have Task, and whenever that's ready, the publisher is going to send it, we've of course subscribed to it, and then we're going to do something with that task, right at the very bottom here.
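The statement portion of the slide looks roughly like this, continuing from the conn Connection in the previous sketch. On the slide the rows get mapped into Task objects; to keep this fragment simple it just pulls out the description column.

import io.r2dbc.spi.Statement;
import reactor.core.publisher.Flux;

// Create the statement from the connection we blocked for above.
Statement statement = conn.createStatement("SELECT description FROM task");

// execute() returns a Publisher of Results; Flux (Project Reactor's Publisher
// implementation) lets us compose it, map each Row, and finally subscribe.
Flux.from(statement.execute())
    .flatMap(result -> result.map((row, metadata) ->
            row.get("description", String.class)))
    .subscribe(description -> System.out.println(description));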
But I've got to say, that was a pretty simple implementation as we talked through the steps. You and I, we all know that coming up with a new data access layer, having to persist or maintain objects, and all of the steps in between, that can be a large effort.
Which is why over the course of time, clients, right,
libraries that have helped really abstract away a lot of
those details have come into existence. I hit on it a little bit earlier, but R2DBC was really designed with this in mind, right? It was kept really lightweight, not only at the vendor level, so that the vendors could keep their drivers as lightweight as they possibly can while still adhering to the specification, but also so that it could then be used at a client level. And the client level is there to create more humane, or more opinionated, APIs, and ultimately create this level of abstraction, this layer of interactions, that's going to handle things like creating the data access, handling connection factories, handling the connections, so that ultimately you can do what you need to do, which is: I want to execute a query, or several queries, and I want to put the results into an object on my application side.
I don't want to have to worry about a lot of the steps that kind
of come in between that, such as the mapping and stuff that I showed before.
There are a lot of clients that already exist, and a lot more in flight, that have been validated as official R2DBC clients. The one that we're going to look at today is Spring Data R2DBC. So right after this, I'm going to jump into a live demonstration where I'm going to show how we can take advantage of an R2DBC driver that's being used by the Spring Data R2DBC client. If you haven't really used Spring Data, you can just think of it as, again, that abstraction that's going to help us create the data access layer, and really the persistence layer and persisted objects, that we can use within our application. Ultimately, we're just going to communicate with that, within an application, all the way to the database.
So let's go ahead and get started.
For our demonstration, I am going to jump directly into an integrated development environment known as Visual Studio Code. It's essentially a free code editor, and you can use it for things like compilation, but there are a variety of editors out there, and a lot of them will work with the type of project that I'm going to show you. It's not especially important, but just so you know, I'm using Visual Studio Code. On the left-hand side here you can see my solution explorer, which shows everything within my solution, and in this case I have a Maven-based project. Maven is essentially a build management system, letting you specify the things that you want to build within what we're using as a Java application. Now, I've started by
going to start.spring.io to generate this project. The reason I've done that is because I'm using Spring, the Spring Framework, and there are a lot of dependencies that come into play. I don't want to go through the steps involved in piecing all of that together and individually bringing those dependencies in, so I used the generator at start.spring.io to create a Spring Boot project. Now, Spring Boot is something that you would traditionally use for something like an API. We're not going to do that; we're going to keep it much simpler than that. But I wanted to be able to very easily set up all the dependencies so I can take advantage of Spring Data R2DBC, which is the client, and then ultimately take a look at the MariaDB R2DBC connector.
And that really starts, in a Maven-based project, with looking at the project object model, which is this POM file. The POM file basically just indicates a bunch of things about your project, as far as versioning, different types of properties, different version numbers and things like that. You can name the project, things like that. But what we really want to key into is this dependencies property. Inside of there we're defining the dependencies, essentially the binaries, that we're pulling from something called the Maven Central repository. That's not really important other than the fact that we're taking these libraries, pulling them down, and setting up all the dependencies that we need so that we can go write code. What I want to point out is that inside of the project you obviously are going to pull down the Spring Data R2DBC binaries, and then you simply need to pull down the MariaDB R2DBC connector. Then there are a couple of other things that I have in here: I mentioned Spring Boot, which we won't really be using but which kind of helps set up this project, and then Project Lombok, which basically just means that I'm lazy and I'm having Project Lombok generate or handle some boilerplate code for me, some getters and setters and such, which we'll see in here.
Now, let's go ahead and actually dive into the code of how we can use R2DBC within a reactive application. Of course, I've already created this application to save some time, and it's going to run inside of a single file. The way we're going to do this is we're going to communicate with a database, which I've already set up using a Docker container; that's the easiest way to really get started with a lot of these databases. It just sits on my local host, so I'm going to communicate using 127.0.0.1. And I'm going to say, hey, I'm going to mess with a single table, or communicate with a single table, and I want to do this in a reactive fashion using R2DBC. Within this class I'm going to set up a couple of things to let me do that. So within about 70 lines of code, we're going to be able to set up everything that we need to show several different types of interactions with the database using R2DBC. Now, as we dive into this application,
something that I want to point out is that I have a main method here, and I've done some things so that I can take advantage of the running application itself, down at the bottom here. Normally, when you run an application, it's going to run and then it's going to be done. But because we're using asynchronous data streams, these activities between publishers and subscribers, the application may finish before we're actually done processing the information, and we might not be able to see anything in the output. So at the end of this I'm basically just preventing the application from exiting, so that within the console output we can see exactly what's happening. I think that's the easiest way to get our feelers out, to get a sense of what R2DBC is doing and how we can use it. Then, of course, we can expand well beyond that, and of course you wouldn't just keep your application open indefinitely, but this will give us a good starting point. So first
and foremost, when we come into an application, especially a Spring application, we want to dive into application.properties. This application.properties file is going to allow us to take advantage of some of the configuration that exists in Spring, specifically for R2DBC in Spring, that allows us to specify things like the URL, the username, and the password. Now, in the interest of time, I'm going to paste some of these things in here, because it's going to be a lot faster than me fat-fingering the code and then probably not having anything that can compile at the end. But I will explain everything as I go through it. What I'm going to do first is paste in the information that allows us to connect to our underlying database.
Now, as I dove into before, the first part of that is really establishing an R2DBC URL, the URL that's parsed within the R2DBC specification. Of course, that starts with your scheme, then you can specify the driver; in this case I'm using MariaDB. I'm going to indicate that I am using localhost on port 3306, and that my default database is just called todo.
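The entries pasted into application.properties look roughly like this (the host, port, and database name are the ones from this local demo; the credentials shown are placeholders, and yours will differ):

spring.r2dbc.url=r2dbc:mariadb://localhost:3306/todo
spring.r2dbc.username=app_user
spring.r2dbc.password=Password123!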
Now, inside of our database, which I'll bring up here in a console application, very simply I have one table that exists inside of this todo database. The table is just called task, and it contains, as you'd imagine, a list of tasks.
Now of course this is very simple, but it's meant to be that way so
that we can establish a very clear example here. And it's already been
preloaded with a couple of pieces of information which will help us in the demonstration
to come. But know that this table exists within a local
database on my machine. Now, to get to, or gain access to, this database, I provide the username and the password, all within this application.properties file. That's that. So we've saved, and we're going to go back to our demo application,
because now it's time for us to start writing the code that we can
use to actually integrate and communicate with the database.
So again, I'm going to paste some things in here, but I will explain these
along the way. Now, if we want to think about communicating
with a database, and I mentioned it before, we want to think about how we
can persist those things on the application side. And the first thing that
we're going to do then is create a class that mimics or
that matches the task table that exists inside
of our database. Now you don't have to do this and it can be much
more complex, but this is a very simple application, and as we saw before, I have an id field, a description, and a completed field; those all match the table that exists inside of my database. There are a couple of annotations here which are useful to know. One is this @Data annotation, which is coming from Project Lombok that I mentioned before, and which allows me to basically have it build out things like the getters and setters so I don't have to write those. I've got some argument requirements
as far as being able to require arguments within the constructor whenever I
create a new object. And then, most importantly here, we have this @Table annotation, which sets up the relationship between this Task class and the task table that exists inside of my MariaDB database, the todo database. Then I've got some annotations which basically describe a primary key and the fact that a particular property cannot be null,
things that are important so that I don't accidentally insert a null value and get an error. With that, we've appropriately mapped, or set up the persistence of, being able to take information from the todo database's task table and put it into this Task object.
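A rough sketch of that class, using the field names mentioned in the demo, follows. The exact Lombok and Spring Data annotations the speaker uses may differ slightly; this is just one way to wire it up as described.

import lombok.Data;
import lombok.NonNull;
import lombok.RequiredArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.relational.core.mapping.Table;

@Data                       // Lombok: getters, setters, equals/hashCode, toString
@RequiredArgsConstructor    // Lombok: constructor requiring the @NonNull fields
@Table("task")              // Spring Data: maps this class to the task table
public class Task {

    @Id                     // primary key
    private Integer id;

    @NonNull                // must be provided; also becomes a constructor argument
    private String description;

    private Boolean completed;
}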
Next, when we're using R2DBC, or Spring Data R2DBC, we really want to take advantage of this idea of a repository. A repository allows us to communicate in a very simple way with a repository of information underneath, which in this case is actually going to be tied directly to our task table. I'm doing this by using something that's already provided within Spring Data R2DBC called the ReactiveCrudRepository interface. This interface facilitates basic CRUD operations, right? Everything from create,
read, update, and delete, just the CRUD acronym, if you will. All I really have to do is specify that I am going to be using the Task object, which I had previously mapped through that @Table annotation, and that the primary key, that id, is an Integer.
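That repository interface, as described, is essentially just this (a sketch; Spring Data generates the implementation at runtime):

import org.springframework.data.repository.reactive.ReactiveCrudRepository;

// Task is the entity type from above; Integer is the type of its id primary key.
public interface TaskRepository extends ReactiveCrudRepository<Task, Integer> {
}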
That's all I need to do. I've now actually set up the communication directly between my application and the table that exists inside of my database, and I've done all of this through the application property settings that I just added and now through the TaskRepository, which uses the ReactiveCrudRepository and the Task class. I'll go ahead and save here. Now I want to add in a couple of bits. A couple
of things that I'm going to add in: for instance, I want to do something called autowiring, which uses inversion of control and dependency injection, and which allows me to just get an instance of my TaskRepository that I can then use to execute those CRUD, those simple, methods to do things like read and write information to the underlying task table. I'm kind of obfuscating
away some of the details there, but certainly look that up if
you're not as familiar. But ultimately we just want to be able to get information
from our database. Now, after I've autowired up that repository, next I want to add in a couple of methods that I've built that I can take advantage of inside of this demo application: the ability to save a task and the ability to get a collection of tasks. Right, I want to create a new task, and then I want to be able to retrieve the tasks and display them in the output. As you can see, I'm actually taking advantage of the TaskRepository instance that I previously established, and it's returning Mono<Task> and Flux<Task>. Now, I already mentioned Flux, which is an implementation of a publisher that exists inside of the Project Reactor library.
Flux returns zero to many objects, Mono returns zero to one. That's really all you need to know at this point, but those are essentially implementations of the Publisher from Reactive Streams.
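Inside the demo application class, the autowired repository and the two methods described here look roughly like this (a sketch of fragments from the class, not the whole file; method names are as mentioned in the talk):

import org.springframework.beans.factory.annotation.Autowired;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Fragments from inside the demo application class:

@Autowired
private TaskRepository taskRepository;   // instance supplied via dependency injection

public Mono<Task> saveTask(Task task) {
    // save() returns a Mono that emits the persisted Task (zero to one element)
    return taskRepository.save(task);
}

public Flux<Task> getTasks() {
    // findAll() returns a Flux that emits zero to many Task objects
    return taskRepository.findAll();
}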
Now, I'm going to very quickly show how I can take advantage of both of those methods. As I mentioned before, I have four previously existing tasks; it's kind of like a magic trick. Here we go, I've got four tasks inside of my table. What I'm going to do now is create a new task using the Task class that I created at the bottom of the screen, and I want to set the only non-null property that I described within the constructor, which is description.
I'm going to give it a description of "task five". I then want to use the demo application instance to save the task, and I'm going to subscribe to that, which will output the result. Then I want to output all of the tasks. So I just want to be able to read: very simple things that you'd be able to do in really any driver, any connector, not necessarily specific to R2DBC.
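What gets run here is roughly the following sketch, where demoApplication is the instance of the demo application class from above and the Task constructor is the one taking the required description:

// Create a new task; description is the required constructor argument.
Task newTask = new Task("task five");

// Save it, subscribe, and print the result the publisher sends back.
demoApplication.saveTask(newTask)
               .subscribe(saved -> System.out.println("Saved: " + saved));

// Then read and print every task in the table.
demoApplication.getTasks()
               .subscribe(task -> System.out.println(task));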
So let's go ahead and execute save
here.
Let's see.
Okay, well, let's clear this
really quick and let's just go ahead and rerun
this. I might have had a previously running implementation.
Well, I've obviously missed something, but it looks like I've got about three minutes left, so I'm just going to pull over my other application. I promise this has worked in the past, but again,
I always tend to find a way to do this. But what I wanted to do here, and what actually gets executed, as I'll see inside of the output, is that I have been able to create task five. That first insert is going to return, through the publisher right here, through the subscription to that original save publisher, the newly created task five. Then I simply want to be able to read everything that exists within that table.
After that, the plan was to jump into how you can use back pressure to actually modify the way that the subscriber communicates with the publisher as far as how it requests information. So I've kind of ruined the surprise here, but I want to show you how easily that's done. Simply using the demo application instance that I created before, and again using getTasks, the method that I previously created, I basically just want to create a custom subscriber. And within that, if we remember back to the Subscriber interface that exists inside of Reactive Streams, we had a variety of different things we could do to take advantage of the data stream's anatomical features, all overridden here: being able to listen with the onSubscribe method, being able to understand when the next element (the next task, in this case) was actually sent over from the publisher, being able to determine if an error or an exception happened, and being able to see when it's complete.
Now, primarily what we want to look at here is that in onNext we want to control, through back pressure, the amount of information that's sent. The way I've done this is very simple: I've essentially just used an integer variable inside of this class to say, okay, I want to receive two at a time, and then, using modulo division by two, I'm basically saying that I only want to request two pieces of information at a time, and only when I've counted that I've received two do I request again. So every two items I'm going to make another request, and I've added some print statements in order to be able to illustrate that.
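The custom subscriber described here is, roughly, the following (a sketch built directly on the Reactive Streams Subscriber interface; the class name and printed messages are illustrative):

import org.reactivestreams.Subscriber;
import org.reactivestreams.Subscription;

public class TaskSubscriber implements Subscriber<Task> {

    private static final int REQUEST_SIZE = 2;

    private Subscription subscription;
    private int received = 0;

    @Override
    public void onSubscribe(Subscription subscription) {
        this.subscription = subscription;
        System.out.println("Requesting " + REQUEST_SIZE + " tasks");
        subscription.request(REQUEST_SIZE);   // back pressure: only ask for two
    }

    @Override
    public void onNext(Task task) {
        System.out.println(task);
        received++;
        // Only after every REQUEST_SIZE elements do we ask the publisher for more.
        if (received % REQUEST_SIZE == 0) {
            System.out.println("Requesting " + REQUEST_SIZE + " more tasks");
            subscription.request(REQUEST_SIZE);
        }
    }

    @Override
    public void onError(Throwable throwable) {
        throwable.printStackTrace();
    }

    @Override
    public void onComplete() {
        System.out.println("Done");
    }
}

It would be wired up with something along the lines of demoApplication.getTasks().subscribe(new TaskSubscriber()).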
Now, inside of our application, if we start here, we can see that when I run this, I'm actually receiving two pieces of information, two tasks, that I can print out; only after printing those do I request two more pieces of information, two more elements, until the publisher has sent everything that it has. In this instance we have five records that exist within that task table, so at the end I've requested two, but the publisher can really only provide one, and then the process is complete.
And that concludes the demonstration on Reactive Streams. I'm going to put this last bit up here because I just want to point out that if you have an interest in contributing to the R2DBC specification, please go and visit r2dbc.io. It is now a part of the Reactive Foundation, so it's getting quite a bit of momentum, and it's expected to be completely GA, or version one, this year, 2021. If you'd like to see more implementation examples, please go and check out mariadb.com/developers. There are a bunch of open source, free examples for getting started with R2DBC using completely free instances of a database, so you can jump in there. If you'd like to check out the driver code for MariaDB R2DBC, you can check that out on GitHub as well. I am actually writing, or I've written, a book that's due to be published in April on R2DBC. It's the first book on R2DBC, called R2DBC Revealed. So if you're really interested and want to dive into how R2DBC has come together and how you can use it within your applications, please check out that book coming out in April. And again, please feel free to reach out to me at robh@mariadb.com, at @probablyrealrob on Twitter, and at rhedgpeth on GitHub. Thank you very much, and I hope you enjoy the rest of the conference. Have a great day.