Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello. My name
is Yehonathan Sharvit and I'm really glad to be here
at Conf 42 for my talk about data oriented
programming in Java. The purpose of this talk
is to give you a couple of insights that hopefully
are going to help you to liberate yourself at least
a bit from the complexity of objects.
A couple of words about myself. I have been a
developer since 2001, first in C++,
then in Java, and also
JavaScript, Ruby and Clojure,
and I'm the author of a book named
Data-Oriented Programming, and in this talk
I'm going to share a couple of insights from the book
and how to apply the principles of data oriented
programming in Java. If you find my talk
interesting, you might want to purchase the book and I'll give you a
coupon for a discount at the end of the talk.
And you can follow me either on Twitter or on my
blog at blog.klipse.tech. So what
is data-oriented programming? Data-oriented programming is
a programming paradigm aimed at reducing
the system complexity by treating data as
a first class citizen. What do we mean by complexity?
If you look for complexity in the dictionary
or in Wikipedia, you will find first the definition
of computational complexity,
which is the amount of resources, machine resources
like CPU or memory that are required
to run a program. But there is another meaning of complexity,
which is the system complexity. And the system complexity is the
amount of human brain resources required to
understand a system. So computational complexity is the
time it takes to run a program, and system complexity is the
time it takes to understand a program.
And data-oriented programming reduces
the system complexity. In other words, when a system
is written according to data oriented programming principles,
the system is easier to understand,
to maintain, or to add new features.
So let's ask ourselves, what usually
makes a system complex? In my book, and in this talk,
we are going to take a classic example. Imagine you
need to design and implement a
library management system. You are a disciplined
object-oriented developer, so the first thing
you do is to think about the design, the classes,
the objects, and the relationships between
the classes of your system. And you
might come up with a design like the
one on the screen where the entities, the main
classes are library. And the library has a catalog
and user management. And in the catalog
we have books and authors, and a book
has book items. And on the user side
we have different kinds of users. We have librarians
that can add books to the library, and we have members
that can borrow books from the library, and members have
book lendings, and book lendings belong to book items.
And you will probably come up with a design similar to the one
that is on the screen right now. If you are an experienced
Java developer, you are probably going to use a
couple of smart design patterns that might make
the design a bit simpler,
smarter, whatever. But my point here is
that the system here is complex in the sense that
it's hard to understand. And if you take
a further look at this UML diagram, you might find
that the source of the complexity
of the system is that we have nodes
in the graph with many edges. Look at the Library class:
it is connected to one, two, three, four, five, six
classes. It's a big number, six, in terms of
relationships between nodes. Another thing that makes the system complex
is that we have many kinds of arrows
of relationships between classes. We have association
like for example between book
and author. We have composition between catalog and book, we have inheritance
between librarian and user, and also between member and user.
And we have a usage relationship between, let's say, librarian
and book item. So it's a burden on
our mind. And this is what I mean by a complex
system. It takes time and energy and
effort to understand a classic
object-oriented system. So the first thing that data-oriented programming
does is to separate code from data.
Usually in object oriented programming and in Java,
we tend to encapsulate data inside classes
and to mix together data and behavior inside
classes that provide methods that manipulate
or modify the state of the object. And look at what
happens if we simply split each class
of our system into two classes, where one class is
responsible for the code, the behavior, and the other class is responsible
for the data. For example, we take the library class that mixes
data and code together, and we split it into a library code class
and a library data class. In the same way, we take the catalog class
and we split it into catalog code and catalog data, and so
forth and so forth. And what happens in terms of system
complexity is that instead of one system
with many relationships between the nodes, we get two disjoint
systems with much simpler relationships
between the nodes or the classes of the system.
And this is really a great benefit
for our mind. It makes the system much easier
to understand, to reason about, and to maintain.
And the reason is that we have separation of concerns. We have code
on one hand and data on the other hand, and we also have constraints
on the code diagram on the left. All the methods in our
classes on the left are going to be stateless, as we're
going to see in a moment. And the relationship between code
classes is only usage relationships.
And similarly, on the data diagram we have another set
of constraints, which is that the only relationships between
data classes are either association or composition.
So putting constraints on our diagram tends
to make the overall system less complex,
easier to understand. So instead of the
big mess or the complex system that we
had on the left, where code is mixed with data, we get a
simpler system made of two disjoint systems.
And this is a huge benefit for our brain. Let's see now
practically how we do that in Java,
how we separate code from data in
Java. Actually, it's quite simple. We put data in classes
that have only members and, of course, getters and
setters. For example, AuthorData will have a first name
and a last name. That's it. No methods beyond
setters and getters. And for the code we have classes
like AuthorCode with only static methods,
no state, no data. The data that is to
be processed by the method is passed as an explicit
argument to the method. So for example, if we have
a data object representing Isaac Asimov
and we want to calculate the full name of
this author, instead of what you are probably used to, asimov.fullName(),
we call AuthorCode.fullName, which is a static
method, and we pass to it as an argument
the object with the data that we want to process, and it
returns a string like Isaac Asimov. So that's how
we separate between code and data. In Java we
have data classes with members only and code classes with
static methods only. So that's the first benefit that
we gain from data oriented programming. It makes the system
easier to understand. Now we are going to move forward and
see how we can make the code easier to understand.
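
The separation between code and data described above can be sketched roughly as follows. The class names AuthorData and AuthorCode follow the wording of the talk; the slide code itself is not part of this transcript, so the field names, constructor and demo class are assumptions made for illustration.

```java
// Data class: fields plus getters and setters, no behavior.
// (Hypothetical shape; the talk only says it holds a first and last name.)
class AuthorData {
    private String firstName;
    private String lastName;

    public AuthorData(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

// Code class: static methods only, no state.
// The data to process arrives as an explicit argument.
class AuthorCode {
    public static String fullName(AuthorData author) {
        return author.getFirstName() + " " + author.getLastName();
    }
}

class SeparationDemo {
    public static void main(String[] args) {
        AuthorData asimov = new AuthorData("Isaac", "Asimov");
        // Instead of asimov.fullName(), the data is passed to a static method.
        System.out.println(AuthorCode.fullName(asimov)); // prints "Isaac Asimov"
    }
}
```
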
And for that we are going to ask ourselves what usually makes
code hard to understand? The first thing that makes code hard
to understand in Java is that when
we pass an object as an argument
to a method, we have to ask ourselves whether
the object is passed by reference or by value.
And it's difficult to give a clear answer to
this question. And usually the answer that we get in Java
tutorials is that in Java object references
are passed by value, which is really confusing.
Object references are passed by value.
And to show you an example of
this complexity, let's take again our example
with Isaac Asimov as a data object and
see what kind of complexity we have usually
in object-oriented programming. Let's say that we have
a method in our AuthorCode class, a static method that
transforms the last name of an author
into uppercase. So here is how we call this method:
we pass it asimov, and the method returns another
data object where the last name is uppercase.
So the last name of asimov2 is ASIMOV, in uppercase.
Now the question is what happened to the first asimov? Did the method
mutate the data object or not? And by looking at
the code you cannot really know. It depends on
the implementation of this static method, toUpperLastName.
If the implementation mutates the object
that it receives, then the last
name of the first asimov is going to be uppercase. And if that's not the
case, it's going to be lowercase, as it was passed.
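
To make the ambiguity above concrete, here is a small sketch with two possible implementations of toUpperLastName. Both have the same signature, so the call site looks identical, yet only one of them leaves the caller's object untouched. The class names MutableAuthor, MutatingAuthorCode and CopyingAuthorCode are hypothetical and chosen only to keep the two variants side by side.

```java
import java.util.Locale;

// Mutable data object with a setter (hypothetical shape).
class MutableAuthor {
    private final String firstName;
    private String lastName;

    public MutableAuthor(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}

// Variant 1: mutates the object it receives and returns that same object.
class MutatingAuthorCode {
    public static MutableAuthor toUpperLastName(MutableAuthor author) {
        author.setLastName(author.getLastName().toUpperCase(Locale.ROOT));
        return author;
    }
}

// Variant 2: leaves the argument alone and returns a new object.
class CopyingAuthorCode {
    public static MutableAuthor toUpperLastName(MutableAuthor author) {
        return new MutableAuthor(
                author.getFirstName(),
                author.getLastName().toUpperCase(Locale.ROOT));
    }
}

class MutationDemo {
    public static void main(String[] args) {
        MutableAuthor first = new MutableAuthor("Isaac", "Asimov");
        MutatingAuthorCode.toUpperLastName(first);
        System.out.println(first.getLastName()); // prints "ASIMOV": the caller's object changed

        MutableAuthor second = new MutableAuthor("Isaac", "Asimov");
        CopyingAuthorCode.toUpperLastName(second);
        System.out.println(second.getLastName()); // prints "Asimov": the caller's object is intact
    }
}
```

Nothing in the method signature tells the caller which variant it is dealing with, which is why defensive copies come up.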
And the reason for this confusion is that when we pass an
object to a method, we pass a reference to the object and the method
now has access to the object. And if the method
calls the setters of the object that we passed,
then it's going to mutate our object. And the way we
usually protect ourselves, or one way to protect ourselves is to
copy the object before passing it to the method. We call it defensive
copy. And this is one thing that makes the code hard to understand or
to write. Every time we call a method we need to ask ourselves
is the method going to change my data or not? And it's
another cause of complexity. Another thing is that in
a multithreaded system, in a multithreaded Java program,
we need to be careful when we pass object
references to methods. Let's take a look at this simple
example. Let's say we have a member data and the member could be either
blocked or not blocked. And when a member is blocked,
the member shouldn't be allowed to borrow books anymore.
So a naive implementation of the borrow function in the member code could
be: let's check if the member is blocked by calling
the isBlocked method of
the member data object. And if the member is not blocked, then we are
going to allow the member to borrow the book, here by
printing to the console: The book is yours. Can you see why this code is
problematic? Can you see why this code is not
thread-safe? And the reason is that between
the line that checks if the member is blocked and the line
that does the book borrowing, there could be a context switch
and in another thread the member could become blocked.
And that's definitely a source of complexity.
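
The naive borrow function described above might look like the sketch below. The names MemberData and MemberCode mirror the talk's wording; the boolean return value is an addition made here so the outcome can be observed, and is not something the talk mentions.

```java
// Mutable data object: the blocked flag can be changed by any thread
// that holds a reference to the object.
class MemberData {
    private boolean blocked;

    public boolean isBlocked() { return blocked; }
    public void setBlocked(boolean blocked) { this.blocked = blocked; }
}

class MemberCode {
    // Check-then-act on shared mutable data: not thread-safe.
    // Returns true when the loan is granted (assumption for illustration).
    public static boolean borrow(MemberData member) {
        if (!member.isBlocked()) {
            // A context switch can happen here. If another thread calls
            // member.setBlocked(true) at this point, the check above is
            // already stale and a blocked member still gets the book.
            System.out.println("The book is yours!");
            return true;
        }
        return false;
    }
}
```

In a single thread this behaves as expected; the problem only appears when another thread can mutate the same MemberData between the check and the action.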
And how do we protect from that? By adding a locking mechanism.
And when we add lock mechanisms to our code,
it definitely makes the code hard to understand and
we might get into deadlocks. And we need to think carefully about how to
use the lock mechanism so that we make sure we don't have any deadlock.
Lock mechanisms also have a negative impact
on performance. So for that, data-oriented programming has a
very simple solution: do not mutate data. If you
treat the data as a value, it will never change. And if
data is not going to change, we won't have any problem.
When we pass data to a method, no matter if
we are in a single threaded environment or in a multi threaded
environment, we have the guarantee that the
data is not going to change. And that's a huge
benefit in terms of simplicity.
It makes the code much,
much easier to understand. You can look around and you
will find many great articles that explain what are
the benefits of immutable data in Java. And the
most important ones are that when you deal with immutable data,
you are inherently thread-safe and you have no side effects.
Now the question is how to represent
immutable data in Java. And we have at least
two options here. And as you probably have noticed,
any problem in Java could be solved with Java annotations.
And this is how Project Lombok proposes
to represent immutable data, simply by adding a
@Value annotation to a class. And when we add the
@Value annotation to a class, what we get from Project Lombok is the
auto-generation of a public constructor, immutable private fields,
getters, setters, toString, hashCode and equals.
And we are guaranteed that the member fields
are not going to change because they are marked as immutable
by the code that is auto generated.
Another option that came up recently in Java
and actually is available only from Java 14, so it might
take a couple of months or years until it's
adopted in production. But I think that's an interesting one,
is that since Java 14 you have data classes,
or data records, with a native implementation
in the JVM. And I think that's great
because you don't need to rely on third party libraries and
auto generation of code. You have a native implementation of,
again, constructor, immutable private fields, getters, setters,
toString, hashCode and equals, and the guarantee that the data cannot
change. And if you apply this
second principle of data-oriented programming, about dealing
with immutable data only, the benefits that you gain are:
no mutations, no surprises, no need for
defensive copies against possible mutation
or a possibly
invalid state of your data. And the code is inherently thread-safe:
no race conditions, you don't need locks
or any lock mechanism, and the code is definitely easier
to understand, to maintain, to reason about,
and it makes our systems simpler.
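
As a rough illustration of the record option mentioned above, the sketch below represents an author as a record and implements toUpperLastName without mutation. Two points worth knowing when trying this: records were a preview feature in Java 14 and became a standard feature in Java 16, and a record generates accessor methods such as firstName() rather than setters (Lombok's @Value likewise generates no setters), which is what makes the data immutable. The names AuthorRecord and AuthorRecordCode are hypothetical.

```java
import java.util.Locale;

// Immutable data: the compiler generates the constructor, private final
// fields, accessors, equals, hashCode and toString.
record AuthorRecord(String firstName, String lastName) {}

// Code class: static methods only. Because the record has no setters,
// the only way to "change" an author is to return a new value.
class AuthorRecordCode {
    public static AuthorRecord toUpperLastName(AuthorRecord author) {
        return new AuthorRecord(
                author.firstName(),
                author.lastName().toUpperCase(Locale.ROOT));
    }
}

class RecordDemo {
    public static void main(String[] args) {
        AuthorRecord asimov = new AuthorRecord("Isaac", "Asimov");
        AuthorRecord asimov2 = AuthorRecordCode.toUpperLastName(asimov);

        System.out.println(asimov.lastName());  // prints "Asimov": the original cannot change
        System.out.println(asimov2.lastName()); // prints "ASIMOV"
    }
}
```

With this shape the question "did the method mutate my object?" no longer needs to be asked, and no defensive copy is required before the call.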
So before we wrap up this presentation,
let me mention other topics that I'm addressing in the
book, and that make the
systems that we build in Java even simpler.
In the book you will learn how to leverage efficient
immutable collections, or sometimes we call them efficient persistent
collections, so that even when you have a huge collection of
data, you can create a new
version of it with a slight modification
without having to deeply copy all
the data. You will learn how to represent more and
more data using maps,
and how to manipulate
data with general purpose functions like map, filter,
reduce, group by, merge, et cetera,
et cetera. You will learn how to achieve polymorphism without
inheritance, without the big class hierarchy.
There are other ways to achieve polymorphism. You will learn also in the
book how to manage the application state when you represent the whole state
of the system as immutable data, and to get highly scalable concurrent
systems with optimistic locking instead of locking
mechanisms like mutexes. You will also learn how
to get flexible access to your database, to give you
a lot of freedom and flexibility when retrieving and
manipulating data that you fetch from the database
and that you want to send over the wire, let's say
using JSON serialization. So that's the book.
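
The optimistic-locking idea mentioned above can be sketched with the standard java.util.concurrent.atomic.AtomicReference class: the state is an immutable value, and an update is published only if nobody else replaced the state in the meantime. This is one common way to do it and is not necessarily the approach taken in the book; MemberState and MemberStore are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of one member's state (hypothetical shape).
record MemberState(boolean blocked, int booksBorrowed) {}

class MemberStore {
    private final AtomicReference<MemberState> state =
            new AtomicReference<>(new MemberState(false, 0));

    public MemberState current() {
        return state.get();
    }

    public void block() {
        // Atomically replaces the snapshot with a blocked version.
        state.updateAndGet(s -> new MemberState(true, s.booksBorrowed()));
    }

    // Optimistic update: read a snapshot, compute the next value, and
    // publish it only if the snapshot is still the current one.
    public boolean borrow() {
        while (true) {
            MemberState before = state.get();
            if (before.blocked()) {
                return false;
            }
            MemberState after = new MemberState(false, before.booksBorrowed() + 1);
            if (state.compareAndSet(before, after)) {
                return true;
            }
            // Another thread changed the state first; retry on the fresh snapshot.
        }
    }
}
```

Unlike the check-then-act version, a member who becomes blocked between the read and the write causes compareAndSet to fail, so the check is repeated against the new state and no mutex is held at any point.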
And let me leave you with this diagram
with this mind map that summarizes the main principles of
data-oriented programming: separate code
from data. The code is written with static methods only.
It's in green here. Never use, or
avoid as much as you can, instance methods. And
the data is represented with immutable data,
either with records that are available in Java since
Java 14, or with third-party libraries
like Project Lombok that provide smart Java
annotations like @Value that generate all the code that
is necessary to make sure that your data classes are immutable.
So I hope that I motivated you to
take a deeper look at data oriented programming and how you can
apply it in Java, and I'm quite sure that it
will make your system less complex.
And now comes the question: what are you going to do with
all the free brain cells that are going to be available when
you move from classic object-oriented programming
to data-oriented programming? Please take a look at the book. You can scan
the QR code to be redirected to the book at manning.com
and you can enjoy a 50% discount
with the coupon that appears here on
the screen. If you are listening from a podcast,
the coupon is ML Sharvit two,
Mlsharvit two,
and if you Google data-oriented programming, you will get a link to
the web page of the book. It was a real pleasure to be
here at Conf 42. Thank you for having me. Enjoy the
insights coming from data-oriented programming and apply
them with fun to your Java programs.