Transcript
Hello, hello, my name is Joe Wingard. Today I'm going to talk to you about an application I've developed that I call Service Engine. Before I do that, I'd like to draw your attention to the link at the bottom, where you will be able to download the slides that I'm about to present.
The URL is tinyurl.com/f42serviceengine.
Okay, Service Engine, what is it? In short, Service Engine is an application that you can run that will connect to many really popular databases and auto-provision REST, GraphQL, and gRPC interfaces that support full CRUD against any of the tables, views, and materialized views inside that database. That's what it is.
The application is available in multiple ways. You can see this on the right side of this slide, where I have it listed as multiple implementations. This is a JavaScript conference, and I have built this framework and released it on NPM. So if you were to npm install service-engine, you would download my framework. Number one there, service-engine: that's the repository that holds the source for the NPM library. Number two, service-engine-docker: that holds a Node application that implements the NPM package and also serves as the source for the generic Docker image. Then there's a third implementation, service-engine-template. If you were to go there, you would see an implementation of the public generic Docker image, and this is what I use when I need to implement Service Engine myself.
The key features of Service Engine are that it auto-provisions the resources. I'm going to go into detail on how this works in some of the following slides, but the fastest way to explain it is that when the application starts up, it executes a database survey query against internal tables and uses that material to hydrate the interfaces. Validation at the source: I've used hapi's joi validators for a long time. I think joi has moved to a different organization, but I've used joi validators for a long time, and this application supports validating every call, reads and writes.
This application also supports database schema operations. Knex has terrific support for schema migrations, and I believe every feature provided by Knex is supported by the Docker image. There's some additional functionality that's fairly detailed and extensive. There's two and a half hours of documentation on YouTube, two and a half hours of video that I've put up, and there are links to those later in the deck. But an example of one such feature is a mechanism that allows you to intercept a query to the application after it's validated, but before it gets executed against the database. This was useful to me. An actual use case I had was appending a partition key to an inbound query in a way that was blind to the user, so it optimized the query and the user didn't know why. All right, moving on.
How it works. The image in the bottom left, with the green database, the blue hexagon, and the three horizontal bars: the database. This application supports all of the databases that are supported by Knex. That's Postgres, including with the PostGIS extension, MySQL, SQLite, Oracle, Redshift, and SQL Server.
The way it works is that the application connects to the database. It surveys all of the tables, views, and materialized views. It collects the names of all the fields, it determines whether or not a null value is acceptable, it determines whether each field is part of a primary or unique key, and it uses that information to build joi validators, which is annotated above on number three. After it builds validators for each database resource, it provisions REST, GraphQL, and gRPC services that allow reading and writing to the different resources in the database. I'm going to go into detail on the next slide on the REST interface. I'm not going to spend any time going over the GraphQL and gRPC implementations; just know that they're going to be very similar to the REST interfaces abstractly, with the different components that are supported. If you want to see how the GraphQL and gRPC implementations are done, you can either check out the resources and code in service-engine, or you can check out the two and a half hours of YouTube video. On the right here, in those cards marked two and three,
I have basically the minimum that's required for you to run this application via the public Docker image and give it a whirl. So if you're using a database that's supported by Knex right now, there are a couple of considerations, which I'll hit at the very end. But generally speaking, if you're running one of the supported database dialects, you should be able to build an env file as indicated in card two, then execute the command-line Docker command on line three, and the application will be running.
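Just to sketch what that looks like, something along these lines would do it. The variable names, image reference, and ports here are illustrative, not necessarily the exact keys the image expects; card two and the repository documentation have the real ones:

    # .env -- illustrative keys only
    DB_CLIENT=pg
    DB_HOST=my-db-host
    DB_USER=demo
    DB_PASSWORD=demo
    DB_DATABASE=demo

    # run the public generic image, mapping the REST/GraphQL and gRPC ports
    docker run --env-file .env -p 8080:8080 -p 50051:50051 <public service-engine image>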
I'm pretty verbose with my logging statements, so you should know right away, and it should give you some links you should be able to click on. Some of the links that are available are just very general.
There's a general health check route. So now I'm on the bottom right of this slide, marked with the number four: there's a health check route that lets you know the application is up. Then there is an OpenAPI route that auto-generates the OpenAPI 3 document, which is really useful for either rendering the OpenAPI 3 documentation in a web application, kind of like the Swagger user interface, or you can also use that endpoint to import the API documentation into API clients like Insomnia or Postman. The next endpoint is the proto file. This is the file that's needed to make gRPC calls. So you would be able to hit that endpoint, download the file as text, and use it in a client to make the gRPC calls. The final link, and this service only gets provisioned if the node environment passed to the application is not production, is a GraphQL playground.
Moving on. What did I miss? Yeah, key concepts. So earlier I said when you run the application, it starts a REST service on port 8080, a GraphQL service on port 8080, and a gRPC service on port 50051. So here I'm going to go into how the REST interface works. The other interfaces operate in similar manners, but you would have to take a look at them.
What you want to do is get an idea of the general pattern, because the pattern is what gets defined, and that gets hydrated by the results of the database survey. So at the top here, the key concept: service call to structured query language. That's the exercise, right? You want to turn an HTTP call, a REST call, into a SQL statement and then execute it. So you want to go from card one to card two. Card one: most of that URL I will explain later. The sample app name, that's a configuration variable you pass into the application. Service, that has meaning that I'll hit on the next slide. But the final part of that path, schema_table: the majority of the databases that are supported by Knex support the schema object, so you have a database, then a schema, and that schema holds tables, views, and materialized views, SQLite being the outlier there. SQLite, to my knowledge, doesn't support schemas. However, this is the general structure for how you would call a specific resource inside of your database. Then, next, three: the query string arguments that follow, occupation, state.in, and handle.like, these are fields on the object, followed by dot notation. What follows the dot is the SQL operator you want to employ. So occupation has no dot, there's no operator defined, so it's going to fall back on the default, which is equals. State.in, it's using the in operator and the value that's passed there. So it's occupation equals engineer, state in New Jersey or Pennsylvania. That's how you would read that. And then handle: handle is using the like operator, and I'm passing it sudo followed by a percentage sign.
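Concretely, the exercise of going from card one to card two looks roughly like this. The path prefix and the generated SQL are sketched from the slide, so treat the exact syntax as illustrative:

    GET /sample_app_name/service/schema_table?occupation=engineer&state.in=New Jersey,Pennsylvania&handle.like=sudo%

    SELECT *
    FROM schema.table
    WHERE occupation = 'engineer'
      AND state IN ('New Jersey', 'Pennsylvania')
      AND handle LIKE 'sudo%'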
All of the operators that are supported are marked in table five, which is on the top right. You have support for the logical operators, you have support for not, or, like, null and not null, and not in. You have support for range, and then not range. If you're using PostGIS and the database resource has fields that are of geometry types, for those specific fields you will also get access to three geo functions, geo operators, geometry operators. Right now what's supported is bounding box queries; radius, where you put in the lat and long of a center point and then the number of meters that you want to search; and also a custom polygon, where the value for custom polygon would be a WKT string, a well-known text string.
Looking at that table, number five, I have the field operator, and it maps to the SQL operator. And then there are two other columns, multiple supported arguments, and number. So multiple supported arguments is whether or not you can submit multiple values, separated by commas by default. So in the example there I have state.in with New Jersey|Pennsylvania; the pipe is being specified as the separator at the bottom of that list. I'm aware of the typo, I noticed it earlier today for the first time. But by default, if you weren't to send that separator as a pipe, the default separator would be a comma. So state.in: if you look at table five, in supports multiple arguments, that's what the third column is. Some of these operators not only support multiple arguments, they have a defined number of arguments that can be passed, the first one being range. And you can think about that: when you're building these queries, you're going to search where the range, and then you're going to pass it a lower and an upper bound. If you were to pass it more arguments than that, you would get a validation error, a 400-level error that's very verbose. I'm going to speak more about validation, probably on the next slide.
Returning to card one: if you look at those query string arguments that begin with a pipe, so I'm looking at page five, limit 30, order by, pipe fields, and separator, all of those query string arguments, they're not quite operators, but if you look to the bottom right of the slide, marked by six, the header is additional query context. This is additional material that's needed to build a query, but it's not related to any specific resource; it's supported across all of the resources. Why did I use a prefix of a pipe? I wanted to try and eliminate the possibility of collisions between naturally named resource fields and what I'm getting at here for pagination, ordering, and fields. So pagination works how you would expect: when you run the application, you can set a pagination limit, and if you don't, there's a default. If you don't send the page, it will call page one. Ordering, the ordering context: if you look down at table six, the example there is field A descending, comma, field B, comma, field C descending. Field B is ascending; you can send colon asc, but you don't have to, that's the default, it understands that. But that's how you would order fields. Speaking of fields, if you were to move up one in table six and look at the pipe fields record, the third record, these are the fields to return. Anytime you're reading a record, you don't have to get all the fields, select star from table; you can specify which fields you actually want, and that's real useful. And the last is separator. Again, I noted the typo today, and I will put in an issue to get that fixed. But what this does is, by default, the separator is a comma. If for whatever reason you didn't want to use a comma, you could dictate what you want the separator to be, but that will only apply to the actual fields on the resource. So in the example, back to card one, the last query string argument is pipe separator, and we're declaring that it be a pipe. And we're using that pipe in the second argument, state.in, we're separating New Jersey and Pennsylvania with a pipe. However, that pipe is not being considered later on with the order by or fields values; those will always be commas.
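Pulling the pieces of card one together, a search using the additional query context might look roughly like this. The exact query-string keys, including the spelling of the separator key, are whatever the slide and the documentation show, so treat these as illustrative:

    GET /sample_app_name/service/schema_table
        ?occupation=engineer
        &state.in=New Jersey|Pennsylvania
        &handle.like=sudo%
        &|page=5&|limit=30
        &|orderby=field_a:desc,field_b,field_c:desc
        &|fields=field_a,field_b,field_c
        &|separator=|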
Table three, table three: API REST endpoints. So when you're building out these REST APIs to do database operations, not only do you have to map out what the query strings and the bodies are going to look like, you also have to consider what the URL structure is going to be. What I have in table three are the patterns that are used by this application. There are five actions that I have listed there: create, read, update, delete, and read. That last read is a search; I distinguish between a single unique record read and a table search. So let's look at the first four: create, read, update, delete. So create, that's HTTP POST. A note for using this application: you can create one record or many with a single call, and that's across all the interfaces. It functions the same way for GraphQL and gRPC. You can create one record or many.
A word about validation: if you have a database table and you're trying to create a record on the table, and there's a field that's required, which is dictated by the DDL, and you don't include it, you're going to get a 400-level error with a verbose message. If there's a field that is of numeric type and you send it a string that can't be cleanly cast to an integer or numeric type, you're going to get a 400-level error. That's the way the validation works. If you call a resource that doesn't exist, you're going to get a 400-level error, and that's the validation that happens at the source. And it doesn't just happen on create, it happens for all of the operations. So, for example, with read, the next one there, this is reading a unique record. So read, update, and delete: these are operations that happen on a single unique record.
These resources don't get provisioned if the database resource doesn't have keys. So if you have a table with a primary key, that table is going to support reading a single record, and updating and deleting a single record. If you have a table without keys, or if you have a view or materialized view and it's not keyed, you can search that table, you can create to that table, but you can't update, delete, or read an individual record. If it is keyed, let's just think of the simplest case: you create a table that has a primary key. The way you would read that record, so now I'm looking at table three, and I'm looking at the second record, read, GET: the path there is service, and I'm passing in a resource parameter, and the parameter is going to be what's indicated on card one, schema_table, the table, view, or materialized view. Anyway, back to table three. You would make a GET request to /service/:resource/record, and record is literally record, that's what the string literal is going to be. And then you're going to pass the keys as query string arguments. So if it's a composite key, you'll pass all the fields and all the values that make up that composite key. If it's a single key, you'll pass in the field name and the key, and you'll get the single record back. And that's also how update and delete work. Now the last one is the most interesting one, the read, the fifth record there, because at all the places I've been, that's the one that's used the most. You're searching tables, trying to get paged result sets back to calling clients.
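Roughly, the URL patterns from table three look like this. This is sketched from the talk, and the HTTP verbs for update and delete are my assumption here, so check the documentation for the exact routes:

    POST    /service/:resource            create one record or many
    GET     /service/:resource/record     read a single unique record (keys as query string args)
    PUT     /service/:resource/record     update a single unique record
    DELETE  /service/:resource/record     delete a single unique record
    GET     /service/:resource            search, the card-one example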
And that search is the example we went over first, the example marked in card one. A couple of things I wanted to mention about functionality and features. All of those actions, create, read, update, delete, and search, they all support the additional query context for fields, so you can declare which fields you want for all of the calls. Not all of the databases support returning data on mutations; I actually mention that on a later slide. But think about Postgres. Postgres does, and if you update a record or you create a record, you can return the record back within the same request. So you can imagine if you posted form data from a user, if you posted it to this endpoint, and the table that it was writing to had a date and timestamp, or maybe an auto-incrementing key, you can actually return that materialized record back within the same request/response lifecycle.
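For instance, a sketch of that flow, assuming Postgres and using the fields context from earlier; the field names are made up for illustration:

    POST /sample_app_name/service/schema_table?|fields=id,created_at
    { "occupation": "engineer", "state": "New Jersey", "handle": "sudowing" }

The success response would contain the created record, including the auto-generated id and created_at, within the same request/response cycle.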
The last piece worth noting here for the REST interface is the headers. There are three headers used in this application. The first one always comes back; I call it the request ID. When a request comes into the application, it gets issued a UUID. I implement Koa, so that UUID gets attached to the request state on the context object, and then, for the remaining time that that request is in the application, anytime there's a log statement, that UUID gets injected into the log. That makes it very easy for troubleshooting and finding out what happened.
The other headers are optional; you can send them if you want. The first one is x-get-sql. If you choose to send that key with a truthy value, what happens is, as we went over with card one and card two, the exercise is to build a REST query, generate SQL, and then execute it. Well, whatever SQL gets generated, you can actually get back as text in a header. If you were to send x-get-sql with a truthy value, it will actually send you back the SQL string that was generated, for you to inspect.
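As a rough illustration, with the header name as I'm reading it off the slide, so double-check the exact casing in the docs:

    curl -H "x-get-sql: true" \
      "http://localhost:8080/sample_app_name/service/schema_table?occupation=engineer"

The response body is the matching records as usual, and the generated SQL comes back as text in a response header, alongside the request ID.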
The last one is get count, it's x-get-count, and again, that's optional. If you were to send that with a truthy value, you would get the unpaginated total back. So the way that would work, going back to card one there: if we're doing a search based on this record, maybe we're looking at users or candidates, maybe it's an applicant tracking system. So I'm going to search this table, I'm going to search for occupation as engineer, and the person is in New Jersey or Pennsylvania, and they have a handle like sudo. I'm getting page five back and 30 records per page. But I also need to know the unpaginated total so that I can render the pagination correctly. The way I can get that is I can just send that header at the very bottom of the slide, the count header. If I were to include that, I would get the unpaginated total count. So the way I have built this to work is that if you're going to execute these searches, when you reach out to get the first page, you would include that header, so you get the unpaginated count. But then for consecutive page lookups you wouldn't include it. That saves a database lookup. Okay, cool. Moving on.
Systems design. The pattern is to normalize a request that comes in, validate it, and build SQL. So, normalize: indicated at the top are three buckets, one red, one blue, one green. That's REST, GraphQL, and gRPC. We just went over in detail how the REST interface works. The others, as I said earlier, work in similar ways. When they come in, they look different to the application, so the first thing that happens is that those unique calls get normalized. On the right side of this slide you'll see three cards now: one, three, and six. One and six are the same as the two cards from the previous slide. One is one, that's the HTTP REST GET call, and six here is two on the other slide, that's the SQL that gets generated. What's new on this slide is the addition of card three. Card three is a standardized object; I call it a query object. A user queried the service, this is the payload, this is the information that they sent, and here it's been standardized.
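Just to give a feel for it, a normalized query object for the card-one search might look something like this; the property names and nesting are illustrative, not the library's exact schema:

    // illustrative shape only
    {
      resource: "schema_table",
      operation: "search",
      payload: {
        occupation: "engineer",
        "state.in": ["New Jersey", "Pennsylvania"],
        "handle.like": "sudo%"
      },
      context: { page: 5, limit: 30, orderBy: ["field_a:desc"], fields: ["field_a", "field_b"] }
    }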
So here's the way it works, working down the stack. A REST call comes in through the red bucket, looking much like card one. It goes through the service call normalizer, which is the layer marked with the number two. If the resource being called exists, it'll be normalized. If it doesn't exist, it will respond right back to the user with a 400-level error. If it does exist, it'll be normalized. That normalized object, documented by card three, gets validated. This is where the joi validators come in. They check all the fields that are there. They check the operators that are being used, as you can see, state.in and handle.like; they're checking to make sure that the operator is supported. Some of the operators are supported only by specific data types, so think about the GIS operators. If you use a GIS operator here, it's going to check to see whether the field is of geometry type or not. It's also going to check to see if the database is PostGIS. So that level of validation is happening at the resource level and at the field level. It's also happening within the context: when you're asking for fields back and ordering, it's going to make sure that the fields that you're ordering by exist on the object, and it's going to make sure that the fields you're asking for exist on the object. All of that validation is happening at this layer. If it is invalid, a verbose message is sent back in the body of the response and you get a 400-level error. If the request is valid, it continues down this chain to the query builder. That normalized object in card three gets passed to a function and it builds the SQL query. That's what happens. And then the query gets executed against the database. That last step is actually optional; I'll mention at the end that there is a debug mode that allows you to make a call that runs the entire length of this chain and stops just short of execution.
Documentation, feature videos, and requirements. We've talked about a lot, and we've only done so in a few minutes. I want to take another moment to mention that there's two and a half hours of YouTube video up that goes over every feature and every function. All the features in the application, most of them, are explained in REST, and then there are custom videos for GraphQL and gRPC that show you how to implement those same features in the other interfaces. If you're new to Service Engine, I would certainly recommend that you run through all of the features from the REST point of view and then take a look at the other interfaces. Running down that list, to make sure that I explain all the features: the quickstart, that's basically what we went over in the second slide, how to run the application in a Docker container. The key REST endpoints, I showed you a subset of those earlier. There are additional key REST endpoints, specifically the results of the database survey, that's a key REST endpoint. There's a video on importing an OpenAPI 3 document into Insomnia; for anyone who hasn't done that, there's a video that shows you how to do it.
The next feature videos on YouTube are CRUD operations, so the create, read, update, and delete, showing you how to call the different URLs, how the validation works, and the headers. The next one is SQL operators. I haven't looked at that in some time, but I'm thinking it's probably a table search, showing you how to use all the various operators that are there. API response metadata: that video goes over the three different headers that are supported in the application, which we went over. Debug mode: this is where, moving up a couple of slides here, if you look at card one or card three, card three, at the very end we're doing a read. It's a GET request; this is a table search to /service/:resource. If you were to change the literal service, if you were to actually change that to debug instead, so you were calling /debug/:resource, the application would function in all the ways that I described, it would just stop short of calling the database. The query would get built and then you would get a verbose response back explaining what happened.
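For example, the same card-one search in debug mode would just swap the service literal for debug, keeping everything else identical:

    GET /sample_app_name/debug/schema_table?occupation=engineer&state.in=New Jersey,Pennsylvania

The response is a verbose payload describing what was normalized and validated, and the SQL that would have been built, without ever touching the database.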
This could be useful in testing. This could be useful for a number of other things, like if you were going to use Service Engine to build sophisticated, complex queries, maybe you could use it for that but not execute them. It could be useful there too. What's next?
Permissions. The database survey pulls internal tables and lists out all of your tables, views, and materialized views. When it starts up, it provides, by default, CRUD access to all of the objects. There is a mechanism for passing in a configuration that allows you to set resource-level controls over what create, read, update, and delete functionality you want to support on any of the resources, and that video shows you how to do that. So imagine if you wanted to set all the objects in the service to read-only, but there were a couple of tables you wanted to set as create or read; you could do that using the permissions configuration.
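As a loose sketch of the idea, and only a sketch, since the actual configuration format is covered in the video and the docs, a resource-level permissions config might look something like this, with hypothetical resource names:

    // illustrative only, not the library's actual permissions format
    {
      "*": ["read"],
      "public_users": ["create", "read"],
      "public_audit_log": ["read"]
    }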
Database schema migrations. As I said earlier, I implement the database schema migrations that are implemented in Knex. I did build a very small custom module that works to push the SQL files into their own directory. When I'm looking at SQL files, I like to see them as SQL, as opposed to having SQL strings inside of JavaScript files. But if you watch that video, you'll see the custom work that I put in. It's very small, but I like it.
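To illustrate the general idea, here is a minimal Knex migration sketch, assuming you keep the DDL in sibling .sql files; the file names are hypothetical and this is not the exact module from the repo:

    // migrations/20240101_create_users.js -- hypothetical file names
    const fs = require("fs");
    const path = require("path");

    // read the raw DDL from .sql files so the SQL lives as SQL, not as strings inside JavaScript
    const up = fs.readFileSync(path.join(__dirname, "sql", "20240101_create_users.up.sql"), "utf8");
    const down = fs.readFileSync(path.join(__dirname, "sql", "20240101_create_users.down.sql"), "utf8");

    exports.up = (knex) => knex.raw(up);
    exports.down = (knex) => knex.raw(down);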
GraphQL playground and geo subqueries, and then the next one is gRPC service and geo subqueries. Those two videos cover the other interfaces, and they go over all of those features from above. Complex resources, subqueries, and aggregate queries: this application is powered by a database survey, so the table, view, or materialized view being queried has to exist. Now, a subquery or an aggregate query doesn't exist as a resource in the database, but it is built on objects that do exist, so this application does support those kinds of subqueries. Right now you have to pass in a configuration; basically you have to pre-build how the aggregation is going to work, and that video shows you how to do that
in detail. The last feature video there is middleware and redactions. Redactions I'll hit first, because it's easier. Think: maybe you want to support searching on a field, but you want to prevent it from being exposed to the user. Or maybe you just want to prevent it from being exposed to the user, and not even facilitate it for search. So think about it like this: maybe you have implemented IDs and keys, and you hold those as different fields, and maybe you want to publish the ID but withhold the key. Maybe you're thinking about changing keys, maybe it's a partition key, and you want to maintain the ability to swap those keys or to re-optimize your databases without being concerned that people who have had access to that record in the past are going to try and query off of that material.
Middleware. Middleware is what I mentioned on the first slide, which is the ability to intercept a query before it gets executed. So if we go back up one, if we're running down this stack and a call comes in and it gets normalized and validated: after validation, but before the query gets built, the middleware is a function that will receive the object defined in card three and will be able to modify it. The example that I can think of is partition keys. Imagine you have users by state, and maybe you have them segmented into, I don't know, NCAA conferences, like the SEC. So maybe that's it: maybe you're building an application for the NCAA and you have users, and you want to say, okay, this is the conference. What you would do is take the state that they're in, or maybe the location from their IP address, and you would be able to do that lookup inside of the middleware and append the conference as a key to the object defined in card three. So whoever's calling your API would only be calling with something very normal, what state am I in, and they would have no idea that you're doing a conference lookup in the back and pinning that to the query.
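A minimal sketch of that idea, assuming the middleware simply receives the normalized query object from card three and returns a modified copy; the function signature and property names here are illustrative, not the library's exact contract:

    // hypothetical helper: map a state to its conference
    const lookupConferenceByState = (state) => (state === "Alabama" ? "SEC" : "OTHER");

    // illustrative middleware: append a partition key before the query gets built
    const appendConference = (query) => {
      const conference = lookupConferenceByState(query.payload.state);
      return { ...query, payload: { ...query.payload, conference } };
    };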
Application considerations. As I said in the beginning, if you're using a database supported by Knex, generally this should work. A couple of things to consider. The first is the GraphQL schema definition language. The names of your databases, schemas, tables, views, materialized views, and your fields, the names matter. This application executes a database survey and provisions everything upon that, and one of the things it provisions is the GraphQL schema objects and the GraphQL resolvers. But the schema definition language supports only a very narrow set of characters for names. I believe, off the top of my head, it is alphanumerics and underscores. That might not be all of it, but I think that's it. So if you attempt to run the application and your database uses different names, or white spaces, or things like that, it might not work, but you should get a message right away indicating what the issue is.
The next thing is database permissions when the application runs. By default, it runs migrations when you start the application. Let's go back a couple of slides, we're looking at the startup sequence. When you run the application, by default, on startup it will attempt to run database schema migrations. So it will check a directory on the disk in the container and attempt to run the schema migrations that it finds. Now, by default it's not going to find any, but that doesn't mean it's not going to run the migrations. And part of running the migrations, for the first time or every time, is checking to see if a database schema migration table exists in the store; if it doesn't exist, it will attempt to build it. I'm only going over that in such detail because if you are running it, imagine if your database service user has access to create, read, update, and delete to and from objects, but doesn't have the ability to create new objects; you're probably going to have a problem. So what you can do is, there are methods for running the application and disabling database schema migrations at startup, and that's what I would recommend. And there's a video on that, to the left, that you can go find.
Another reason why you might want to disable database schema migrations when the application starts is that you might want to push your DDLs into version control and set up a CI/CD job so that, when merges to different branches occur, the CI/CD job pulls down the changes and executes the migration only one time, as opposed to deploying a batch of twelve or twenty different containers in a cluster and having every single one of them start up and try to execute the same migrations. I don't think that would be a problem, it's just overhead you don't have to deal with. The last is returning fields. I did mention this earlier: not all of the database dialects support returning fields on mutations. What I have listed here at the end is MySQL and SQLite; they don't. So, thinking back to our REST example, if you were to create a record or update a record using those databases, you're going to get a 201 status code and no body. But if you're using Postgres or one of the other databases that do support returning fields, you're actually going to get the fields that you asked for back. That's pretty cool.
Closing notes, last slide, starting with the most obvious: that's me. Hello, my name is Joe Wingard. There's my contact information, my GitHub, Keybase, and LinkedIn. So if you have any notes about this talk, send me a note, I'd love to hear it. Project feedback and closing thoughts, we'll come back to that. First, I want to talk about the imagery here, on the left and the right. On the left I have a stack of five different databases. I have several of the different database dialects supported by Knex. They don't have to be that, they can just be different databases, but here I have Postgres, MySQL, SQLite, SQL Server, and Oracle. And for each one of those databases there is an instance, or many instances if you're scaling, of Service Engine connecting to that database, publishing REST, GraphQL, and gRPC services. On the right side of this slide I have diagrammed a reverse proxy, or an ingress, or, yeah, a proxy that enables you to basically virtualize your data access layer. So at api.domain.com, that subdomain, you would be able to make any of the calls, using any of the interfaces, to any of the resources, across any of the database engines. And whoever's calling, the other engineers who are calling your data access layer, would not really have to be concerned with what the database is. They wouldn't even know what the database is, possibly. And this is how I would recommend deploying your data access layer. There are many reasons why, which I'm happy to get into; if you send me a note, we can have a discussion about it, but you get a lot of wins by deploying a system in this manner. Project feedback: feedback and recommendations are best received via pull requests and GitHub issues. The feedback I care about hearing the most is from people who like to roll up their sleeves and dive in. So if you're one of those people and you have some feedback, open up a pull request or a GitHub issue and let me know what we can do to make this better.
And the closing thought: I hope this project is useful to you and your team. If you find it valuable, if it helps you in your world, please send me a note. And that's it. So this is Service Engine, I just gave you an overview of it. It's an NPM package, my first NPM package, service-engine on NPM, and it's available as a Docker container, and there's plenty of documentation. I think you can use it right now, today. So that's all. Thanks a lot for your time, and cheers.