Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, let's talk about Unix shell.
More specifically about what we could do better now,
and even more specifically about what we could
do better since the 1970s but didn't.
I'm Ilya Sher. I'm a longtime Bash user,
and I do programming and DevOps.
But in 2013 I was fed up enough with the subpar user
experience of Bash that I started working on
my own shell. What's wrong with the shell? That would be the first question
to a person who started working on his own shell.
Well, the shell is basically two things: a
programming language and a user interface. Both of
them are not very good, I think. The
difference is that the first problem, the programming language not being
very good, is one that I think is understood.
I'm saying that judging by
other projects that are actively working on
fixing the programming language. There are several such projects;
they all agree about this problem with the
programming language, and they are all working on solving it.
So that's why I assume this problem is widely understood.
Basically the problem is the syntax: arcane syntax that comes
from too long ago. Error handling is
an afterthought and not very good, and there is a lack of structured data,
of course. So the other
big issue is the user interface.
And that's what I'm focusing on today. This talk is about user interface.
I just mentioned the programming language to leave that aside.
And the user interface is basically the same as
a telegraph. That means that in this paradigm you
send text and you receive text. That's how you communicate with
the other end. The fact that the communication
styles of the shell and the telegraph are exactly the same
is not a coincidence. It's a historical development.
So let's review how we got from the telegraph
to the shell of today. Okay, so we
had the telegraph. Then somebody figured out,
okay, that's not convenient. Let's replace this button with something
more practical. They made a keyboard and
a printer. So the device, which is called a teleprinter,
is basically a keyboard and a printer. To communicate,
you need to have two of these devices. They are cross
connected, which means whatever you type on your
end comes out of the printer
at the remote end, and whatever they type on their keyboard
comes out of the printer on your end.
Then computers came and
they were using punched cards. It was not very convenient.
Somebody figured out, okay, we have the teleprinter.
Let's connect the teleprinter to the computer.
And they did. And it worked.
Then came another incremental improvement: the video
display unit. It looks like a computer terminal,
which we will see in a moment on the next slide.
But it was an exact replacement for
paper. So if you had new text, it was added at the
bottom, and all the other lines of text were scrolled
up a bit. And I would like to
highlight that all of these devices brought
no conceptual breakthrough. They were better technologies,
of course, but these were incremental improvements.
Nobody said, okay, hold on a moment, let's rethink
the whole thing. That did not happen.
When did we have a technological breakthrough? With
this guy. The VT52,
which was released in 1974 or maybe 1975 (it's
unclear), supported cursor movement.
That means that you can go with the cursor to any
location on the screen and overwrite the text that's there or
clear it, which is a more specific use case.
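To make the capability concrete, here is a minimal sketch of cursor addressing. It does not use the VT52's own codes; it assumes a modern terminal emulator that understands the ANSI/ECMA-48 "cursor position" sequence and the DEC save/restore cursor sequences. The function names are made up for this illustration.

```python
import sys

ESC = "\x1b"


def cursor_to(row: int, col: int) -> str:
    """Build the ANSI 'cursor position' sequence; row and col are 1-based."""
    return f"{ESC}[{row};{col}H"


def overwrite_at(row: int, col: int, text: str) -> str:
    """Save the cursor, jump to (row, col), write over what is there, restore."""
    return f"{ESC}7{cursor_to(row, col)}{text}{ESC}8"


if __name__ == "__main__":
    # Overwrites the top-left corner of the screen and then returns the
    # cursor to where it was, assuming the terminal supports these sequences.
    sys.stdout.write(overwrite_at(1, 1, "edited in place"))
    sys.stdout.flush()
```

The point is only that any cell of the screen can be addressed and rewritten, which is what a full-screen editor relies on.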
And the reaction to that was as
follows. Bill Joy created a text editor
that used this capability and basically
brought text editing to computers as we know it today,
which means the text occupies the whole screen.
You go with your cursor to the point that you want to edit,
you edit the text there, and it's replaced
at that point. That is as opposed
to previous text editors, which, like the shell today,
had a command line interface. You were typing commands
such as add text, replace text, delete text. These are
all commands that you were typing, and you could not edit
the text at any point on the screen.
You just had these commands. How did the Unix shell react
to this new capability? It didn't, pretty
much until this day. So we have the
situation in the shell, to this day,
that most of the screen is not actually interactive. It's treated
like paper. The text on the screen above the command line
means nothing to the shell. The shell
doesn't know about it and cannot interact with it.
The only interaction that you have in an interactive
shell is actually on one line. Sometimes you have
completion, so it's a few lines, but basically it's one line.
I would like to fix that. I think that the screen
can be interactive, and we should catch up with this
capability from 1975 and
make this whole part interactive.
What would that look like?
Well, the screen will have textual
representations of objects, somewhat like
links on the web. The shell would
track the link between the text on the screen
and the objects. And the objects will
have a description like: okay,
we are of this type, our unique ID is that, and to be displayed
on the screen we need to look like this. So we have this example.
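As a rough illustration of such a description, an object on the screen could be modelled like this. It is a sketch, not the design of any existing shell: the class names, field names, type strings and sample IDs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScreenObject:
    """What the shell remembers about one piece of text on the screen."""
    type_name: str  # hypothetical type label, e.g. "File" or "CodePipeline"
    id: str         # unique identifier within that type
    display: str    # the text shown on the screen


@dataclass(frozen=True)
class Region:
    """A run of cells on one screen row, linked to the object it represents."""
    row: int
    col: int
    obj: ScreenObject

    def contains(self, row: int, col: int) -> bool:
        return row == self.row and self.col <= col < self.col + len(self.obj.display)


def object_at(regions: List[Region], row: int, col: int) -> Optional[ScreenObject]:
    """Return the object whose text covers the given cell, or None."""
    for region in regions:
        if region.contains(row, col):
            return region.obj
    return None


# Hypothetical screen content: one file and one pipeline.
regions = [
    Region(0, 0, ScreenObject("File", "/tmp/build.log", "build.log")),
    Region(1, 0, ScreenObject("CodePipeline", "deploy-prod", "deploy-prod  Failed")),
]
```

With this link in place, a click or cursor position maps back to a typed object instead of to anonymous text.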
We have a file on the screen and a CI/CD pipeline,
in our case AWS CodePipeline. I'm not affiliated.
What would the interaction look like?
Let's say we want to interact with the CodePipeline.
Since everything is semantic, when you start interacting
with such an object on the screen, the shell can ask all the plugins
that it has: which one of you guys
handles objects of type CodePipeline?
By the way, it can be more than one. So when we create a menu
for that object, the items in
the menu come from different plugins, or maybe from one plugin.
These plugins can also provide the default
action, that is, what would happen if you left-click on the
object, or if you navigate to it with the
cursor and press Enter. So this interaction
that we have seen on the previous slide should be recorded, because
the problem with the web interface, which is the direction this
interface moves in, is that you don't have a
record of what you did, and that's very bad.
No one wants to accept that for serious work.
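The plugin lookup described above, together with a record of what was done, could be sketched as follows. This is an assumption-laden illustration: the `Plugin` and `Shell` classes, the plugin names and the action labels are invented for the example and do not come from any existing shell.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Plugin:
    name: str
    handles: str                              # object type this plugin handles
    actions: Dict[str, Callable[[str], str]]  # menu label -> action on an object ID
    default: Optional[str] = None             # label of the default action, if any


@dataclass
class Shell:
    plugins: List[Plugin] = field(default_factory=list)
    history: List[str] = field(default_factory=list)

    def menu(self, obj_type: str) -> List[Tuple[Plugin, str]]:
        """Collect menu items from every plugin that handles this type."""
        return [(p, label) for p in self.plugins
                if p.handles == obj_type for label in p.actions]

    def default_action(self, obj_type: str) -> Optional[Tuple[Plugin, str]]:
        """First default action offered for this type (left click or Enter)."""
        for p in self.plugins:
            if p.handles == obj_type and p.default is not None:
                return (p, p.default)
        return None

    def invoke(self, plugin: Plugin, label: str, obj_type: str, obj_id: str) -> str:
        """Run an action, record it, and show the record to the user."""
        result = plugin.actions[label](obj_id)
        entry = f"{plugin.name}: {label} {obj_type} {obj_id}"
        self.history.append(entry)
        print(entry)
        return result


# Hypothetical plugins; two of them handle the same object type.
shell = Shell([
    Plugin("aws-view", "CodePipeline",
           {"show stages": lambda i: f"stages of {i}"}, default="show stages"),
    Plugin("aws-ops", "CodePipeline", {"retry": lambda i: f"retrying {i}"}),
    Plugin("files", "File", {"open": lambda i: f"opening {i}"}, default="open"),
])
```

Here the menu for a CodePipeline object is assembled from two plugins, and every invoked action leaves a visible entry in the history.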
So if you did interact with something on the screen,
this interaction should be not only recorded but also immediately
displayed to the user. And this
recording should be at the highest possible semantic level.
What do I mean by that? If you had several pipelines listed
and you started to interact with the one that failed,
the user interaction will be recorded as: you are
now interacting with a pipeline that has the status "failed".
Why is that important? Because next time, let's say tomorrow,
you come to see these pipelines and another pipeline has failed, and the
flow that you recorded was actually about looking at the failed one.
So if tomorrow another pipeline fails,
then when you replay you will be looking at the pipeline that failed, not
at the exact same pipeline that you were looking at today.
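The difference between the two kinds of recording can be shown with a small sketch. Everything here is hypothetical: the dictionaries stand in for whatever the shell knows about the listed pipelines, and choosing `status` as the meaningful property is an assumption of the example, since how the shell picks the relevant property is not specified in the talk.

```python
from typing import Dict, List


def record_literal(obj: Dict[str, str]) -> Dict[str, str]:
    """Record the interaction by the exact object that was selected."""
    return {"type": obj["type"], "id": obj["id"]}


def record_semantic(obj: Dict[str, str], keys: List[str]) -> Dict[str, str]:
    """Record the interaction by the properties that made the object interesting."""
    return {"type": obj["type"], **{k: obj[k] for k in keys}}


def resolve(selector: Dict[str, str], objects: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """On replay, find the current objects that match a recorded selector."""
    return [o for o in objects if all(o.get(k) == v for k, v in selector.items())]


# Hypothetical pipeline listings on two different days.
today = [
    {"type": "CodePipeline", "id": "deploy-api", "status": "Failed"},
    {"type": "CodePipeline", "id": "deploy-web", "status": "Succeeded"},
]
tomorrow = [
    {"type": "CodePipeline", "id": "deploy-api", "status": "Succeeded"},
    {"type": "CodePipeline", "id": "deploy-web", "status": "Failed"},
]

clicked = today[0]                              # the user picked the failed pipeline
literal = record_literal(clicked)               # remembers "deploy-api"
semantic = record_semantic(clicked, ["status"]) # remembers "a failed pipeline"
```

Replayed against tomorrow's listing, the literal record resolves to `deploy-api`, which is no longer failing, while the semantic record resolves to `deploy-web`, the pipeline that actually failed.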
Another example of why the recording should be semantic
and not literal:
let's say you have an instance with ID 123,
and you are interacting with that instance. This ID,
123, is completely meaningless to the user. You're interacting
with that instance not because it has that particular ID, but because
it has some interesting property. For example, it has
a name tag of something, or some other tag with some
particular value, or maybe it resides
in a VPC of interest, or maybe
it has a certain security group, or some other things,
or some combination of these. And you need to record
this interaction semantically. So tomorrow, when you have a
slightly different situation, you will not have
instance 123, because it will be long gone. You will
have some other ID for that instance, and you want to interact with that
instance, not the one that has ID
123. And to do all of
that semantic understanding and semantic work
with objects, what do we need to do? We need to
understand the output. The typical concern,
or let's say objection or argument, against that
is, first of all, that the shell is not supposed to get into semantics, and second, that
it's too much work. I want to refute
these two arguments immediately by
looking at what we already have. Okay,
let's look at the exit code of a process. The shell
has to understand that in order to do even basic error
handling. And the shell has understood exit
codes for a long, long time. At some
later point, somebody added command line completion.
This feature is highly valued, it's very powerful,
it's very practical, and everybody uses it.
And guess what?
It needs semantic understanding of the programs
that we are running. And it was quite a
bit of work, because we needed a kind of plugin for
each of these programs. And it's done.
We are looking at a pretty much symmetrical
feature, which will be roughly
the same amount of work, or at least on the same order
of magnitude. So that's why I think it's
possible and it should be done. I would like to summarize what's
important in the UI. What should
be in the UI, first of all, is semantic understanding. The more
your program understands the data that it works with,
the more powerful this program can be.
If we compare, for example, Notepad in Windows
and a JetBrains IDE (I'm not affiliated with JetBrains),
you can do way more with the JetBrains IDE.
You can edit programs in both of them, right? But the IDE
understands way more of the semantics
of the text that we are working with. Also, if you take
a middle ground, for example Vi, it's not a complete IDE.
Well, it could be, if configured properly. But let's say it's not
a complete IDE. It has, for example, syntax highlighting, right?
So it understands somewhat, right? And we have
language plugins; they understand more. So all the
power comes from semantic understanding of
the data that you are working with. Also,
regarding semantics: well, we have exit codes, we have command
line arguments. I think it's just a logical
continuation to get more semantics into the shell and understand
the output. The second big thing
that should be in a shell, in the UI of the shell, is
capturing. You have to capture the interactions,
you have to capture as much as you can, and you
have to capture at the highest possible level of semantic
understanding of the interaction. That's how your
record and replay facility can be powerful and applicable
to other situations. That's it. Thank you,
bye.