Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey, hello, and welcome, everyone.
Thank you for joining me and tuning in to this talk on harmonizing code and melody.
I'm actually quite thrilled, because we're going to do something quite special today: we're going to create our own new songs, new electronica songs, in a manner of speaking.
And for this, we'll be using MusicAgent, which is actually a blend of multiple technologies that will enable us to create them.
But hey, before doing that, let me first introduce myself.
I am Jan van Wassenhoven.
I am the lead architect at Sopra Steria Benelux for the business line design and development.
Also, quite a fun fact: I am the creator of the Scrum programming language.
And by Scrum programming language, I don't mean the Scrum methodology.
Of course, it's inspired by that methodology, but it's an actual programming language where you can write your own Scrum code.
And by doing so, you can actually call yourself a Scrum master programmer.
So don't hesitate to check it out and compile your own Scrum code, of course.
As we're going to talk about music, you can of course find me on SoundCloud as well as on Spotify as Mighty John.
I've got my own Instagram channel, as well as my blog, where you can find any updates and news on MusicAgent as well.
Because, hey, we're going to talk about creating some new songs, some new music, electronic music in this case.
Now, I myself was always a big fan of listening to music.
I always wanted to create my own music, but yeah, I'm lacking those capabilities.
I've got my wife playing the organ and the piano; she knows how to sing.
I've got my son playing the drums and the guitar.
But yeah, from my end, I never got that worked out.
But there's something else I'm quite good at.
I know how to code.
I know how to develop and integrate multiple technologies all together.
And that's where the idea of MusicAgent came up.
Quick introduction to MusicAgent.
It's a homegrown project where the main coding is done in Python.
It's open source and it's fully available on GitHub.
So you can check it out and you can start creating your
own songs on your own, at home.
It's mostly based on using LLMs and image generation models.
So we'll be using the OpenAI or Anthropic APIs, and we'll be using LangChain for integration with those APIs.
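Just to make that concrete, here's a minimal sketch, and I stress it's a sketch, not MusicAgent's actual code, of what calling an LLM through LangChain looks like, assuming the langchain-openai package and an OPENAI_API_KEY in your environment:

```python
# Minimal sketch (not MusicAgent's actual code): one LLM call
# through LangChain, as described in the talk.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.7)

# Ask the model for a first song concept based on a short description.
response = llm.invoke("Propose a concept for an electronica song about summer.")
print(response.content)
```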
So what does it actually do?
It's capable of publishing a complete new song.
As an end result, you'll have your Sonic Pi song code.
If you choose to, you can also have a recording.
It's capable of integrating your own audio samples within your music.
It will generate an album cover, if you want to, of course, and it will even generate a full booklet with the lyrics and some documentation on your song: how the song was composed and generated.
But first, when we talk about creating a song, we have to take some things into account, because creating a song is not just about starting to sing or combining some instruments, no.
There are a lot of different phases involved in the process of creating a song.
On the one hand, we have the songwriting.
We need to come up with an initial melody and create some new lyrics.
We have to structure the song.
We have to play some instruments.
We have to arrange the instruments throughout the full play of the song.
And in the end, we need to capture that performance.
We need to make a real recording of our new song.
There's some polishing up to do, so some mixing, some remastering, where we maybe have some silent pauses that we want to get rid of.
Maybe we want to add an intro or an outro with a fade in or a fade out.
So there's also that final polishing coming up.
And in the end we need to generate the final product.
So that also means even including an album cover and some technical
details, some technical information about the creation of the song.
So how do we do that with MusicAgent?
Quite simply, we just pray to the AI god.
No, not fully like that.
It's a bit more complex.
But still, first things first: MusicAgent is actually based on a MAS, a multi-agent system.
And what does that actually mean?
A MAS is a system combining multiple autonomous agents, where each of your agents has an individual focus and an individual mission.
So, starting from a complex problem, which in this case is the creation of a song, we'll be splitting it into different parts, where every agent has its own individual focus and its own mission as part of the overall goal, and in the end they will work and interact with each other to achieve that common goal.
In the context of MusicAgent, of course, this means we'll be splitting up the different component tasks, like setting up the melody, defining the harmony, creating a rhythm, choosing the samples within the song, where each and every agent has its different task.
But by collaborating within those tasks, they will come up with a creative and cohesive composition: a new song, a new electronica song, at the end of the process.
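To give you a feel for what that means in code, here's a stripped-down illustration of two such agents handing work to each other, again assuming LangChain and OpenAI. The roles and prompts are made up for the example; the real setup lives in MusicAgent's configuration files, which we'll see later:

```python
# Illustrative sketch of the multi-agent idea: each "agent" is an
# LLM call with its own mission, and the output of one agent
# becomes the input of the next. Roles and prompts are examples,
# not MusicAgent's actual configuration.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

def run_agent(mission: str, task: str) -> str:
    """One autonomous agent: an individual mission plus a concrete task."""
    result = llm.invoke([
        ("system", mission),
        ("human", task),
    ])
    return result.content

concept = run_agent(
    "You are a composer. You design song concepts.",
    "Design a concept for an electronica song about a day at the beach.",
)
melody = run_agent(
    "You are a melody writer. You turn concepts into melodic ideas.",
    f"Propose a melody and rhythm for this concept:\n{concept}",
)
print(melody)
```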
By doing that, we had to overcome some different challenges; we had some technical hurdles.
The song code is being generated in Sonic Pi, and Sonic Pi itself has some limitations, so we had to counter those.
It's quite limited in terms of synths and in terms of samples that can be used, so we had to find some workarounds.
Also, the usage of the LLM is quite restricted, quite limited, and we need to set it in the right context.
But it's also all about prompting for the right answers, and about how we deal with token limitation, because by prompting for multiple answers, we end up with a broad context, a large amount of prompts being sent to those agents.
And of course, we have some limitations: some token and character limitations while calling the APIs, which we had to deal with.
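As an illustration of one way to deal with that, and I'm simplifying here, this isn't the exact strategy in the repository, you can trim the conversation to a rough budget before each call:

```python
# Illustrative sketch: keep only the most recent messages under a
# rough character budget before calling the API. The budget and
# the helper below are assumptions, not MusicAgent's real strategy.
MAX_CHARS = 12_000  # rough proxy for a model's token limit

def trim_context(messages: list[str], budget: int = MAX_CHARS) -> list[str]:
    """Drop the oldest messages until the remaining text fits the budget."""
    kept: list[str] = []
    total = 0
    for message in reversed(messages):
        if total + len(message) > budget:
            break
        kept.append(message)
        total += len(message)
    return list(reversed(kept))
```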
Then again, while talking about multiple phases, we need to chain the agents.
So how can we set the correct conversation context?
How do we make sure that the correct outcome is passed on to the next agent, so that at the end we come up with a full song?
And then finally, there's also the choice of AI model for the different agents, because depending on what they are doing, depending on the need, there's quite some difference between them.
We encountered that while starting with GPT-3.5 at the beginning, and now using GPT-4o mini or Claude 3.5 Sonnet, for instance.
There's a lot of difference in the outcome, where one is better at image generation and another one is better at coding, for instance.
So there was a lot of testing involved, but it's also one of the challenges that we needed to overcome.
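In code, that per-phase model choice can be as simple as a lookup table. The mapping below is just an example, not the shipped configuration, and it assumes the langchain-openai and langchain-anthropic packages:

```python
# Sketch of per-phase model selection: different models suit
# different tasks. The mapping is illustrative.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

MODELS = {
    "concept": ChatOpenAI(model="gpt-4o-mini"),  # cheap, creative
    "sonicpi_coding": ChatAnthropic(
        model="claude-3-5-sonnet-20240620"       # stronger at code
    ),
}

def model_for(phase: str):
    """Pick the model for a phase, falling back to the concept model."""
    return MODELS.get(phase, MODELS["concept"])
```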
Let's take a look at the flow of MusicAgent, how it actually works.
From a user perspective, we interact with MusicAgent via a CLI or via a GUI, a web interface.
So you get the choice: you can either use the CLI or you can do it via the GUI.
That's your choice.
When doing so, we provide some initial input on how we want the song to look.
It could be a sentence; it could be a full description of your song.
But that's where the first phase, the first agents, come into play.
So we start with a design phase.
The design phase is all about coming up with a concept, describing the lyrics, setting the arrangements, deciding on the number of verses and choruses, whether or not we use a bridge, do we implement a solo, do we use an outro, but also which instruments will play the verse and which instruments will be used during the chorus.
And, also interesting: since we'll be creating electronic music, we want to include the possibility, the availability, of samples.
So we can have our own mashup of samples that can be used, but depending on the initial input we gave, the concept we came up with, we're going to choose, within the listing of samples, which one is the most suitable to use while creating that new song.
Wrapping that all up, that's our design phase.
Once we have this phase accomplished, we can actually start coding the song.
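To give you an idea, a design phase like that could boil down to one structured prompt. The JSON fields here are hypothetical, just to show the shape of the outcome, and I'm using OpenAI's JSON mode via LangChain to keep the output parseable:

```python
# Illustrative design-phase sketch: turn the user's description into
# a structured concept. The field names and sample list are made up.
import json
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini").bind(
    response_format={"type": "json_object"}  # force valid JSON output
)

prompt = (
    "Design an electronica song from this description: "
    "'a couple of bananas at the beach'. "
    "Answer as JSON with keys: concept, structure (list of sections "
    "such as verse, chorus, bridge, outro), instruments_per_section, "
    "and samples (chosen from: drum_loop_90s, ocean_waves, vinyl_crackle)."
)
design = json.loads(llm.invoke(prompt).content)
print(design["structure"])
```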
So let's go.
We're getting to the creation phase.
Within the creation phase, one of our agents will come up with a first proposal of a song in Sonic Pi code, and then the process starts iterating over this code.
The code will be reviewed by other agents, and based on that review input, the code will be adapted by our initial agent, our Sonic Pi coder, again.
After some iterations, we even have the possibility to include human interaction from us ourselves as song creators.
So we can actually listen to the song, and by doing that, we can provide additional input in order to improve the song in the end.
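In its simplest form, that creation loop looks something like the sketch below; the prompts and the fixed two iterations are illustrative, not the actual MusicAgent code:

```python
# Minimal sketch of the creation-phase loop: a coder agent proposes
# Sonic Pi code, a reviewer critiques it, and the coder revises.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

song_code = llm.invoke(
    "Write Sonic Pi code for a short electronica track."
).content

for _ in range(2):  # e.g. two review cycles, as in the default setup
    review = llm.invoke(
        f"Review this Sonic Pi code for errors and musicality:\n{song_code}"
    ).content
    song_code = llm.invoke(
        f"Revise the Sonic Pi code based on this review.\n"
        f"Code:\n{song_code}\nReview:\n{review}\n"
        f"Return only the revised code."
    ).content

print(song_code)
```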
Then, finally, we come to a mastering phase, where we actually start to check: is everything sounding quite proper, are any fade-ins or fade-outs needed, do we have some silent pauses that need to be excluded?
So we're actually starting the polishing of the song.
In the end we come to the publishing phase, and the publishing phase means actually creating the recording, the album cover, as well as the full booklet of the song, meaning the full technical details with the lyrics, the initial input that was used, but also the concept that we, that our agents, came up with in the end.
This will all be stored on your local drive, together with the booklet, the album cover, as well as the actual song, the WAV recording file.
On a side note, you might have noticed something.
While discussing this chain of agents, it might have started you thinking about something else.
And that something else could be the software development life cycle.
Because in the end, we're essentially also developing new software.
We're also gathering requirements, like the conceptualization of a song.
We're also doing an analysis and coming up with a proper design of our application software.
In the second phase, we start to create the song in terms of MusicAgent, but in the software development cycle, this would mean implementing and coding it, reviewing it, testing it.
As we did for our song, it's actually quite the same in software development, where our application is being tested, being reviewed, and could be sent back to the developer to be adapted.
In the end we're going to deploy it; we're actually going to produce the software, and it will be deployed and set up in production.
It's the same as creating a full song, your WAV recording file as well as a full booklet, and it can still be adapted afterwards.
But hey, let's get back to MusicAgent and a quick glimpse at the architecture overview.
So as mentioned, we can interact via CLI or via the GUI.
There we'll be calling a bunch of Python scripts.
These scripts will actually launch our different AI agents, which will start doing the work, depending on the phase we're in, depending on which component of the song creation we're in.
And by doing so, they will work based on a couple of configuration files that tell the agents in what order they need to perform their tasks, and how they should conclude or continue based on the input, based on the different tasks they were given.
The agents themselves will interact with providers like OpenAI or Anthropic.
So they will be using those APIs in order to create a new song, or create new images, and so on.
Also, while exchanging between agents, at one point during the creation phase human interaction is possible, but before doing so, you need to listen to the song.
So in order to be able to listen to the song, we'll also be interacting with Sonic Pi, and we can do this by using OSC, Open Sound Control.
This enables you to have some playback of the song already, while your agents are working on it.
You'll get to hear the initial song proposition.
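For those curious, here's roughly what the Python side of that OSC interaction looks like, assuming the python-osc package and Sonic Pi's default incoming OSC cue port, 4560 on localhost. The cue name is my own choice for the example:

```python
# Sketch of triggering playback in Sonic Pi over OSC.
from pythonosc.udp_client import SimpleUDPClient

client = SimpleUDPClient("127.0.0.1", 4560)  # Sonic Pi's default cue port
client.send_message("/play_song", ["the_banana_song"])

# On the Sonic Pi side, a live_loop would sync on "/osc*/play_song"
# and start playing the generated song code when the cue arrives.
```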
And by the end, of course, we'll have the production of our song.
We have the actual booklet, the actual WAV file, as well as the album cover, and so on.
And as mentioned, throughout the process, because we can hear the feedback, there's always a possibility to return to the CLI, as well as the GUI, to interact with our agents and to add some additional information, or to provide some remarks on the song to our agents, to improve the final song that will be created in the end.
So in practice, what does it look like?
I mentioned there are a couple of config files that can be used to set up MusicAgent.
First of all, we have the configuration of our artist, in which we define how the album will look, what our styling is, but also the different types of assistants that will be cooperating together throughout the song creation: from an artist who will come up with a concept, to a coder that will actually write the code in Sonic Pi.
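To picture it, an artist configuration boils down to something like this. The exact keys and values here are illustrative, check the repository for the real file format:

```python
# Hypothetical sketch of an artist configuration: the artist, the
# cover art style, and the assistants that cooperate on the song.
artist_config = {
    "artist": "Mighty John",
    "style": "colorful 90s electronica album art",  # used for the cover
    "assistants": [
        {"name": "artist",        "mission": "come up with the song concept"},
        {"name": "composer",      "mission": "advise on melody and structure"},
        {"name": "songwriter",    "mission": "write the lyrics"},
        {"name": "sonicpi_coder", "mission": "write the Sonic Pi code"},
        {"name": "reviewer",      "mission": "review and correct the code"},
    ],
}
```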
Secondly, once we've got our assistants defined, once we have our artist configured, we need to define the different phases throughout the process.
For instance, we have a songwriting phase, where the song needs to be written, where we have our songwriter agent that will actually come up with the lyrics of the song.
It will be given some advice, some remarks, via the composer, another agent, which already collects information, as well as from us, during the composition, the conceptualization phase.
As you can see within the configuration as well, we will already be providing some initial input, like a theme, a melody, a rhythm, and as an outcome we'll have some lyrics as well as a structure for the song.
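And a phase definition, again with hypothetical keys just to show the idea, comes down to something like:

```python
# Hypothetical sketch of a single phase definition, mirroring the
# songwriting example: which agents talk, what goes in, what comes out.
songwriting_phase = {
    "name": "songwriting",
    "agents": ["composer", "songwriter"],
    "prompt": "Write lyrics matching the given theme, melody and rhythm.",
    "inputs": ["theme", "melody", "rhythm"],
    "outputs": ["lyrics", "song_structure"],
}
```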
That being said, we've got the assistants and we've got the description of the different phases, like for instance songwriting, but it might as well be the recording, the album generation, or the Sonic Pi coding of your song.
We need to bring them all together.
So finally, a third configuration is needed, and that's the actual sequence order of our different phases.
They can be defined in a sequential order, but sometimes we will also need to iterate multiple times over the same phase again.
One good example of this, of course, is the writing of the song: while we're writing Sonic Pi code, this code will be reviewed and will have to be modified afterwards.
So we can define ourselves the number of iterations that will be needed to go through the song and to correct it.
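That third file is essentially an ordered list with iteration counts, something like this illustrative sketch (again, the schema is an assumption):

```python
# Hypothetical sketch of the creation chain: the ordered phases,
# with an iteration count where a phase must repeat.
creation_chain = [
    {"phase": "conceptualization"},
    {"phase": "songwriting"},
    {"phase": "initial_song_coding"},
    {"phase": "song_code_review", "iterations": 2},  # review cycle
    {"phase": "publishing"},
]
```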
Here's just an example of MusicAgent in practice, using the basic flow.
A preconfigured file defines the role of the artist, the chaining, and the different phases.
There are multiple configurations possible; I will show more in the IDE afterwards.
But there's a basic setup available within MusicAgent.
So we've got the chaining.
We start from a user query prompt or from the GUI, and we go through the different phases, from coming up with a concept to actually writing the song and creating the final song.
For instance, within the concept phase, we've got our agents talking to one another: the artist talking to a composer, providing some initial input to come up with a final idea for the song, a concept of the song, and so on.
But for instance, when we go to the song code review, there are multiple agents connected to one another, where one is doing the code review of the song and another agent will start modifying and correcting the code based on the input of this code review.
And we can even go a bit further.
By adapting those different configuration files, we can extend them with additional possibilities.
For instance, we can add a human code review phase, where we define an agent to send out OSC commands, to make sure that we can have a playback of the song that the human can review while listening.
They can provide some review, and type in their remarks via the console or via the GUI, after which code modification can take place, and we can re-listen to the song until we completely agree with the final composition of the song.
Also, we can extend it with song recording.
Song recording can be done via Sonic Pi.
Again, we'll be using Open Sound Control to interact with Sonic Pi, where we can have a playback and generate an actual WAV recording, which is your actual song at the end.
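So the extension is really just adding entries to that chain, conceptually something like this (again, illustrative phase names, not the shipped file):

```python
# Hypothetical sketch of the extended chain with human review and
# recording phases added, as just described.
creation_chain = [
    {"phase": "conceptualization"},
    {"phase": "songwriting"},
    {"phase": "initial_song_coding"},
    {"phase": "song_code_review", "iterations": 2},
    {"phase": "human_code_review", "playback": True},  # listen, then comment
    {"phase": "song_recording"},   # WAV capture via Sonic Pi / OSC
    {"phase": "publishing"},
]
```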
That being said, it might be time for a little demo and an introduction to the code itself.
First of all, let's go to the project itself.
I've got the project open in IntelliJ.
It comes with a bunch of readme files; it's quite extensively explained how you can set it up locally and how you can use it on your own.
As a code base, it's quite limited.
There's a bunch of Python scripts and some configuration that can be set up, but it already comes with a default stack of configuration, so by simply installing all the libraries, you can be quickly up and running and start launching MusicAgent.
First I want to bring your attention to one particular folder, which is the agent configuration.
As I mentioned, we have multiple configuration files available that enable us to define the creation of the song and how we will chain up our agents.
So we start with the default one, the one I showed earlier on, which is Mighty John, but of course we've got the art, evaluation, and full setups that can be used as well.
Basically, it comes down to setting up the different assistants within your artist configuration file, where you can even define your style, which will be used for the cover art creation.
Then there is the music creation phases file, where we define the different phases of the creation of a song, like, for instance, songwriting, and the different segmentations, where we define which arrangement we have: one verse, two verses, a chorus.
There are the arrangements that are going to be defined, meaning which instruments will be used, the sampling phase, the initial song coding.
And you can also notice the different prompts being used throughout the process for those different agents, as well as the inputs and outputs of those prompts.
So these are the phases.
There's a third file, which is the creation chain, where we actually define the different phases to be used throughout the process.
So where's the difference between the different agent configurations?
Mighty John, the default configuration, consists of the simple song creation with some iterations on code review.
Whereas, for example, the art configuration only consists of creating cover art.
So you just bring in a concept; it will not create a song, but it will come up with cover art creation, a cover album creation.
The full configuration, for instance, involves human interaction and cover creation, but also the actual WAV recording in the end.
Next to that, in terms of setup, there's also the samples folder, where you can easily integrate your own samples that can be used throughout the process.
You can simply drag and drop your samples in, and they will be loaded throughout the process.
They will be used depending on your concept, depending on the direction you're going in; they can be introduced and used throughout the process.
Besides that, we have some setup folders.
More importantly, there are two ways of launching MusicAgent.
As I mentioned, we can do this via the CLI, simply by running python run.py.
And it's slowly starting up now.
It'll provide you with some choices: OK, which API provider do you want to use?
In this case, I will be choosing OpenAI.
It asks me the type of model and the type of agent configuration I'd like to use.
In this case, I'll use Mighty John.
And then we can go on to enter the name of the song.
But besides doing this via the CLI, you can also use the web application.
As I showed you, you can use the CLI for interacting with MusicAgent, but we can also use the web browser for interacting.
So there's a web application available with MusicAgent.
We can do completely the same as we did within the CLI.
It also comes with a song configuration again, where you can choose the API provider and the model to be used.
And of course, the agent type can be selected.
And while interacting with it, you can even still make some modifications to the configuration of your agents, as well as the assistants and genres that are included.
Let's start creating our own song then.
Let me come up with a title.
Let's call it "The Banana Song"; I'm almost in summer vibes.
For The Banana Song, I will choose electro, but again, the genres also depend on your configuration, your agent configuration.
I'll choose electro in this case.
Let me make it a song about a couple of bananas at the beach, maybe, as we're in a summer vibe mood, at the beach, drinking some cocktails.
And let's say the song is inspired by late 90s music, because hey, I'm a 90s guy and I tend to appreciate that type of music.
So that being said, we've got everything set up.
We can start generating some new music.
So there we go.
And it's done.
And you'll notice underneath, quite some setup has been done already.
You can have different panels activated.
So first of all, we can see the input parameters being passed through our MusicAgent.
And you'll notice along the way, it's getting filled in with everything provided by our agents.
On the right, we see our timeline.
This is based on the setup for the default agent type, which contains a timeline with the different phases included, like conceptualization, songwriting, segmentation, and so on.
But we also notice the iterations when it comes to song code review.
In this run, the default setup, there's no human interaction, but that's also because we want to speed things up a bit, because the iterations can take quite long.
So for the sake of the demo, we do the basic setup, the default stack, with a couple of iterations; as you notice, two cycles for code reviewing.
Proceeding down, you can also notice in MusicAgent that you can follow the agent conversations in between, how they interact with one another, where you can also see the amount of information being sent and outputted along the way.
And here we notice this cover being created, of a couple of bananas drinking a cocktail at the beach, of course.
Accordingly, you can follow the logs.
Also quite interesting: there's a view on the Sonic Pi code which is being created, where you also have the ability to play it back or send it to Sonic Pi.
If we return to IntelliJ, let's get back to our IDE.
You'll notice that in the meantime, a new folder has been created, which includes the banana song.
There we can also find our Sonic Pi song creation, the complete file with the code of the song.
It includes the album cover, as well as the readme, which is the full booklet of your song.
Let's have a quick look, if I can.
I'll take all the code and put it in Sonic Pi.
So it gives us something like this.
Let's give it a quick run.
Which actually sounds quite nice, doesn't it?
Now, just as a small remark: it doesn't always end up with a good result, because depending on the amount of iterations, depending on how your agent was set up, it sometimes comes up with something completely eclectic, or it messes up some of the chords, for instance, within Sonic Pi.
So there might sometimes be some modifications to be done, but then again, as you have the code and the full logs of your song creation, you can adapt and change it accordingly.
But the more you adapt your agent, the more detailed you make it in terms of prompts, in terms of setup, and also the chaining, and the more time you give it, the better the results you get in the end.
And also if you interfere, if you add some human interaction.
So if we go back to our IDE, back again to the agent config, let me take, for instance, the full setup, where you can notice that in this case we set up multiple code reviews, but also multiple human reviews.
On one hand we have the agents reviewing some code, as well as human reviews.
And when we take a look at the phase config, we can even set up the playback, if I can get there.
So we can include code validation as well, along the way.
As well, along the way.
that was for a quick example of the code, so if we get back to the
presentation now, to come to a conclusion.
There's a small remark I wanted to make in the end.
This is a quote from Nick Cave, whom you might know, quite a famous singer as well.
At the beginning, when ChatGPT and the first AI models came out, he got a lot of messages and mails coming in from fans, saying, OK, it can write songs just as you do.
And quite importantly, because of the amount of people asking him, he answered: it's a blood and guts business here at my desk that requires something of me to initiate a new and fresh idea, and it requires my humanness.
It's quite interesting, because although we're using AI in this case to create a new song, but also in terms of coding, in terms of helping us out while developing new software, it's quite important that we as humans keep the full perspective on the context of what we're doing.
And whether we talk about song creation, whether we talk about software
development, we are in control of our AI.
And you may notice also, while using MusicAgent, while creating a new song, it's quite important that we give feedback, that we set the right context, and that we direct our agents in the right direction.
Sometimes they come up with good results, but not every time.
And then we need to redirect them.
So it's a nice quote, and quite an important one to keep in mind.
So as a wrap-up, as a conclusion: the full code repository is available on GitHub, where you can find MusicAgent on my account.
There's also more information to be found on the blog, mightyjohn.com.
So go ahead and check that one out.
If you have some more questions after this talk, please connect with me.
Just ping me via LinkedIn, or via one of the other social platforms, of course.
But hey, let's get started, shall we?
Shouldn't we start creating some music?
Yes, we should.
OK, I invite you to check out MusicAgent and start creating your own music.
Thank you for listening, thank you for watching, and see you again at the next talk.
Thank you.