Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey everybody.
My name is Dmitry Volkov and I'm here to tell you about Python
packaging for busy people.
And what I mean by that is tools that save you time and don't annoy you.
And you might be starting to suspect something at this point.
So let me tell you upfront that this talk is going to include some opinions.
This is not the only way to do things, but I hope it's one that will save you time.
For some background on where I'm coming from:
I used to do some build engineering for projects like Ton, KasperskyOS,
and VNet, and the upshot is that these are large polyglot projects
which are annoying to build.
Say, VNet is a large C codebase with Go, Rust, and C vendored submodules, which
on top of that ships a fork of mobile Firefox. Really, it's a pain to build;
developers are unhappy and it's hard to ship things.
So the intent of my work there was to improve developer experience
to let people ship faster.
And what I'm going to do today is tell you about how to do the
same for your Python projects.
If you're going to take one thing away from this talk, it's: use uv.
Now, let me get to some details.
If you've been building things in Python for a
while and try to map out the space of concepts you came in contact with,
it might look something like this.
So there is pip and Poetry, and there are eggs and wheels, and this is
all a little confusing, so let's try to structure it with a nice diagram.
This is what it looks like for the ways to specify dependencies in Python, the way
it progressed, and this is actually great.
What this tells you is that you can always use pyproject.toml and you're golden.
However, if you do the same chart for tools, it's not so great, because
there are like 10 competing tools.
And pip still works.
So that's confusing.
Let's try to break it down.
The way that we're going to break it down is to look through the use cases and see
how the tools match to the use cases.
So, say I want a library.
I pip install.
Say I want to share my project with friends.
I write the library down in requirements.txt.
Easy.
Say I want my project to work half a year from now, when NumPy updates.
I just write down the version.
Easy.
Say I also want my environment to sync with requirements automatically.
I use pipenv.
So that's good, right?
We should talk about reproducibility.
So this "write down the version" approach only works, or is only guaranteed to work, with PyPI.
The reason is that PyPI enforces that when you upload a package with
a version, you can't change it.
You can only push a new version with something different.
However, as a practitioner, you often get stuff from elsewhere.
For example, if you install Torch, you get it from their own registry.
So, if you want your project to work in a year's time the same way
you left it, you should pin a hash.
That's foolproof.
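As a rough sketch of what pinning a hash buys you: record the SHA-256 digest of the artifact you tested with, and refuse anything that does not match it later. The helper names and the byte strings below are hypothetical stand-ins for a downloaded wheel; the `sha256:<hex>` form mirrors the value pip accepts in `--hash=sha256:...` lines of a requirements file.

```python
import hashlib


def sha256_pin(data: bytes) -> str:
    """Return a pin of the form 'sha256:<hex digest>' for the given bytes."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def matches_pin(data: bytes, pin: str) -> bool:
    """Check downloaded bytes against a previously recorded pin."""
    algorithm, _, expected = pin.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return hashlib.sha256(data).hexdigest() == expected


# Stand-ins for real artifacts (illustrative only).
original = b"pretend these are the bytes of a wheel"
republished = b"same version number, different bytes"

pin = sha256_pin(original)             # recorded at lock time
ok_original = matches_pin(original, pin)
ok_republished = matches_pin(republished, pin)
print(ok_original, ok_republished)
```

A version number only names an artifact, while a hash identifies its exact contents, so a registry that lets someone replace a release under the same version is still caught.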
And there are a bunch of tools that do that.
You might have heard of some, like Poetry, or conda-lock in the Conda ecosystem.
There are some gotchas with tools, or pain points, and let
me go through them for you.
So, first off is pip and pip-tools.
And the first thing you should know about that is that old pip can
just install incompatible packages.
So, if two of your dependencies want different versions of the same thing,
you just get a broken environment.
This is fixed, though, in new versions.
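The kind of conflict described here can be illustrated with a toy check. This is a deliberately simplified sketch, not how pip's resolver works: it only understands exact pins, and the package names are hypothetical.

```python
def find_conflicts(requirements: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each shared dependency to the set of versions requested for it,
    keeping only dependencies for which more than one version is requested.

    `requirements` maps a package name to {dependency name: exact pinned version}.
    """
    wanted: dict[str, set[str]] = {}
    for deps in requirements.values():
        for dep, version in deps.items():
            wanted.setdefault(dep, set()).add(version)
    return {dep: versions for dep, versions in wanted.items() if len(versions) > 1}


# Hypothetical packages: both depend on shared-lib, at different exact versions.
requirements = {
    "package-a": {"shared-lib": "1.0", "other-lib": "3.2"},
    "package-b": {"shared-lib": "2.0", "other-lib": "3.2"},
}

conflicts = find_conflicts(requirements)
print(conflicts)
```

An installer that skips a check like this will install whichever version it processes last and leave one of the two packages broken. A real resolver also has to handle version ranges and backtracking, which this sketch ignores.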
Now, if you want to pin things like we discussed, the endorsed way is to generate
a lock file on each platform manually.
What this comes down to is, if you have a build for macOS and Windows and Linux,
you're supposed to have three build boxes, each of which generates lock files
and commits them to version control.
So, no cross-platform support.
And finally, in today's failure of technical communication, pip's
documentation literally includes a section telling you that a
secure way to use pip is to use a flag you've probably never heard of.
What the flag does is disable pip's dependency resolution logic,
because you are supposed to have pre-resolved the dependencies,
locked them, and committed them to version control.
This is because of some details of how setuptools works.
Okay, so that was pip.
The next item on the list is Poetry. And let me tell you, I
really wanted to love Poetry.
However, there were some pain points, and here are some. So the first thing is, Poetry
has no command to upgrade dependencies.
So you wrote down the version, you wrote down the hash, you want to bump it.
Well, no built-in command.
The next thing is, we discussed that pyproject.toml is the standard
right now for the way projects are described, and a standard is going
to have different implementations, pip's and Poetry's among them, and
they might differ sometimes.
This is going to bite you.
Finally, for about a year in 2022, if you Ctrl-C a poetry install, or if you run
two parallel poetry installs, like two projects you're installing dependencies
for at the same time, you get a corrupted cache and Poetry doesn't work anymore.
This was when I rage-quit Poetry.
Okay, so, on to conda. If you look at conda's website, you will learn that conda
is an ecosystem and a philosophy which works for a project of any complexity. And
if you don't run away screaming at that point, let me tell you that Google has 2
million hits for "conda slow", and this is for a reason. Now, conda-lock inherits the reason,
though you can absolutely make it better with the libmamba solver, and you should,
and it also inherits another problem.
It wants to support pip requirements, and to do that, it ships a vendored Poetry.
There are a few issues with that.
First off, the vendored Poetry can be old, like two years out of date,
which means you don't get the fixes in, say, pyproject compatibility.
And Poetry is not the same as pip, so a
project which pip installs, conda-lock will not necessarily install. And finally,
the vendored Poetry has been known to mess with the global, external Poetry sometimes.
So, I aired my grievances.
Now it's time to hear my praises, and my praise is that uv is good.
It solves some of the issues I discussed above with some design choices,
and it also has the great developer selling point of being very fast.
So fast I use it in direnv,
which I couldn't with pip and absolutely, God forbid, could not with conda.
There is still a gotcha, which is that uv has been in pre-release for a while,
because they're iterating on the format and don't
want to commit to a stable API.
So things sometimes really break, but I have found it to be absolutely worth it,
and the time I spent fixing
breaks like that to be trivial compared to the time I spent
fixing some of the issues above.
So, if you want to use pip, use uv.
If you use conda, consider using pixi.
This is something that conda-lock developers are considering
endorsing instead of conda-lock.
Okay, so, instead of pip, uv; instead of conda, maybe pixi.
Hope that saves you time, and thank you.