Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone.
Welcome to today's talk on integrating C++ and Python at Conf42 Python 2025.
I'm Hariharan, currently a software engineer at AMD (Advanced Micro Devices).
I previously worked as a lead software engineer at athenahealth, and I've
also worked at other organizations like Bain Capital and Bose Corporation.
My broad areas of interest are DevSecOps, distributed systems, and
applied AI, and the two languages that I frequently use are C++ and Python.
Feel free to connect with me on LinkedIn and to follow me on GitHub.
So for today's talk, I've split it into around 10 parts.
First we'll talk about the problem statement and then slowly move
on to the similarities and differences between Python and C++. Then we'll talk
about static binding and dynamic binding, with a brief overview of Boost Python
and PyBind11. After that, we'll look at some actual code and how the project is
set up, some minimal examples, and a short demo, and then touch upon a few
advanced topics. Finally, we'll talk about distribution and packaging and wrap
it up with some top takeaways.
So let's get started.
In terms of introduction, in this session I'll share how you can
leverage the strengths of both C++ and Python to build robust applications.
The idea is to gain insights into tools like Boost Python, PyBind11,
and other dynamic binding tools, learn how to seamlessly integrate
cross-language projects, and discover strategies to optimize the code base.
So the goal is very simple.
By the end of this talk, you will understand when and why to choose
Boost Python, PyBind11, or other tools.
And you'll walk away with a clear idea of how to set up a project and
write minimal, functional examples.
So let's get started.
So before actually diving into the code and other content,
let's talk about why Python in the first place, right?
We know that Python is currently widely used for rapid
prototyping, due to its ease of learning, relatively simple interfaces and usage,
and faster development.
But when we look at C++ and Python, although there are a lot of
differences, there are a lot of similarities as well.
Modern C++ is trying to get closer and closer to Python, I would say.
For example, if you take a look at these two snippets of
code, of course they're very simple: they just add two numbers.
The idea is that even in Python you have type hinting
in the recent versions, where you can explicitly mention that a and
b are integers, whereas in C++ it is enforced, and if you
do not mention what type it is, you essentially get a compilation error.
Similarly, on the left you have a function that takes a vector
of strings and then displays hello for every string that it takes.
On the right you have a module called typing, from which you can import List
and clearly say that the function takes a list of strings.
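The Python side of the two comparisons above can be sketched roughly as follows. The function names `add` and `say_hello` are illustrative and not taken from the slides. The sketch also shows the contrast drawn above: Python records the hints but does not enforce them at runtime, whereas C++ would reject a mismatched type at compile time.

```python
from typing import List


def add(a: int, b: int) -> int:
    """Add two values; the int hints document intent but are not enforced at runtime."""
    return a + b


def say_hello(names: List[str]) -> List[str]:
    """Print one greeting per name, much like a C++ function over std::vector<std::string>."""
    greetings = [f"Hello, {name}" for name in names]
    for line in greetings:
        print(line)
    return greetings


print(add(2, 3))
say_hello(["Alice", "Bob"])

# Unlike C++, a mismatched argument type is not rejected up front:
# the strings are simply concatenated.
print(add("2", "3"))
```

A static checker such as mypy would flag the last call, but the interpreter itself runs it without complaint.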
Over here we have a class called Box, right?
There are other types called TypeVar and Generic that allow you to
create classes and functions that work with any type, which is very
similar to the concept of C++ templates.
On the left-hand side, TypeVar and Generic are used; on the right,
you can clearly see the equivalent implemented using C++ templates.
The Box class can hold any type and its methods reflect that type, which is the
underlying concept of templates as well.
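A minimal sketch of what the Python half of that slide might look like is shown here. The method names `get` and `set` are assumptions made for illustration; only the class name Box and the use of TypeVar and Generic come from the talk.

```python
from typing import Generic, TypeVar

T = TypeVar("T")  # plays the role of `typename T` in a C++ template


class Box(Generic[T]):
    """Holds one value of any type, comparable to `template <typename T> class Box` in C++."""

    def __init__(self, item: T) -> None:
        self._item = item

    def get(self) -> T:
        """Return the stored item; its type follows the Box's type parameter."""
        return self._item

    def set(self, item: T) -> None:
        """Replace the stored item."""
        self._item = item


int_box: Box[int] = Box(42)
str_box: Box[str] = Box("hello")
print(int_box.get(), str_box.get())
```

As with plain type hints, the type parameter is checked by static tools rather than by the interpreter, while a C++ template is instantiated and checked by the compiler.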
So even though modern C++ is heading closer to Python syntactically (and that's
a very broad statement), and even though Python and C++ have a lot of
similarities, in terms of performance
C++ is much, much faster than Python.
Recent versions of Python, like 3.13, have introduced new modes,
such as a mode where you can disable the
global interpreter lock and a mode with an experimental just-in-time compiler,
and it's said to be like 15 percent faster for module imports
and 10 percent faster for function calls and other CPU-intensive tasks.
But even in codebases that predominantly use Python, for
data-intensive tasks we end up using other packages such as NumPy, scikit-learn,
PyTorch, and so on, because they have APIs and abstractions
that are much more performant.
And when you look deeper into those libraries, the core implementation
is in either C or C++. But performance is just one side of the coin.
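The point about C-backed implementations can be observed even without third-party packages. The following is a rough sketch that compares an explicit Python loop with the built-in `sum`, which in CPython is implemented in C. The helper names and the problem size are arbitrary choices for illustration, and the measured times will vary by machine and Python version.

```python
import timeit


def python_loop_sum(n: int) -> int:
    """Sum 0..n-1 with an explicit loop executed step by step by the interpreter."""
    total = 0
    for value in range(n):
        total += value
    return total


def builtin_sum(n: int) -> int:
    """Same result, but the loop runs inside the C implementation of sum() in CPython."""
    return sum(range(n))


N = 100_000
loop_seconds = timeit.timeit(lambda: python_loop_sum(N), number=20)
builtin_seconds = timeit.timeit(lambda: builtin_sum(N), number=20)
print(f"python loop: {loop_seconds:.4f}s, built-in sum: {builtin_seconds:.4f}s")
```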
In one of my previous workplaces,
we had teams affiliated with a particular code base.
For example, the software development team in an embedded space might just
be writing C++ code, whereas the testing team might be writing Python code.
So sometimes, when there has to be collaboration between the development
team and the testing team, there needs to be an overlap between C++ and
Python, or rather in the code that we write.
So there needs to be code that talks with both the C++ code
base and the Python code base.
There are other needs too where we have to make C++ and Python talk to each other.
The point I'm trying to drive home is that sometimes
we might have a Python code base and we might want to optimize part of
that code base for specific reasons.
So we might pick that batch of code from the Python code base and implement it
in C++ for it to be more performant.
On the other hand, two different teams might have to interact with
each other and they might have different objectives: one development team can
be writing C++ code, whereas the testing team can be writing Python code, but they
still have to interact with each other.
So we might have to do some binding.
Sometimes the need for binding is performance, and sometimes
the need for binding is cross-team collaboration, and
we'll learn how to do all that as we move on to the next slides.
For the purpose of this talk, if you want to follow along and actually try it out
in your local IDE or compiler, the prerequisites are
a working C++ compiler that supports at least C++11, and Python 3 installed.
If you're on Ubuntu, you can
install all of these through apt-get.
Or if you are on a macOS system, you can use Homebrew to
install all of these dependencies.
A build system like CMake is recommended.
It helps you coordinate compilation, linking, and library paths.
And since we'll be talking about Boost Python and PyBind11,
you can install them either through package managers, similarly,
or you can use CMake to fetch them.
Or, in the case of PyBind11, you can just copy the headers into your project as well.
Let's take a quick look at Boost Python and PyBind11, just to understand what they are.
Boost Python is essentially part of the larger Boost ecosystem.
It has been actively maintained for many years.
It's stable, well documented, and ships as a library within Boost.
PyBind11 is a more modern alternative,
though I would say it's slightly slower than Boost Python
in terms of performance.
But if you dislike additional library dependencies and want an easy install,
PyBind11 just might be your choice.
When I show you actual code, I'll show you implementations
in both Boost Python and PyBind11.
And I've also attached references in case you want to check them out.
Both of these libraries have extensive documentation.
Yeah, so let's take a look at actual code before going to advanced topics.
So here's my IDE.
On the left I have PyBind, and on the right I have Boost.
Here we have C++ code where we have
a simple function,
just to add.
And then we can expose it: we define a Python module called
pybind_module, add a docstring, and then expose the function and say
this is a function to add two numbers.
The corresponding CMakeLists for this also clearly says that I want
to create a Python module as a shared library and call it pybind_module,
built from this file, pybind_module.cpp,
which we see here, right?
Generally the code that exposes it to Python is
written in a separate file.
But for the sake of simplicity, here I have it in the same file.
I've already compiled it using CMake,
but when we actually compile this, it generates a shared library
called pybind_module.
And once I add it to the Python path,
I can directly import pybind_module
and then directly call the function add, which is actually
a C++ function that takes two integers, a and b, right?
The corresponding Python call is over here, where
we are directly calling it through
pybind_module, which is actually a shared library, right?
A similar example in Boost would be over here. Where there we included the
PyBind11 header, here we include the Boost header, and then
we basically say we want to create a Boost Python module and this is the
function that adds two numbers, right?
Similarly, when I compile it, I again have a shared
library, called boost_module.
And I can directly call boost_module in my
Python file as boost_module.add(3, 4).
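The Python side of both demos might look roughly like this. The module names `pybind_module` and `boost_module` and the `add` function are the ones mentioned in the talk; the `build` directory and the `load_binding` helper are assumptions added for illustration. The pure-Python stand-in is there only so that the sketch runs on a machine where the shared libraries have not been built.

```python
import importlib
import sys
import types

# Assumption: the compiled shared libraries were placed in a local "build" directory.
sys.path.insert(0, "build")


def load_binding(module_name: str):
    """Import a compiled extension module, or fall back to a stand-in if it is not built."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Stand-in only; in the demo, add() is implemented in C++.
        return types.SimpleNamespace(add=lambda a, b: a + b)


for name in ("pybind_module", "boost_module"):
    binding = load_binding(name)
    print(name, binding.add(3, 4))
```

From the caller's point of view the two bindings are interchangeable; the difference sits in the C++ binding code and the build setup.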
The syntaxes are fairly similar, but not identical.
The setup is different, though.
So this is very simple, right?
It's just a function to
add two numbers, written in C++, and we expose it to the Python world.
We can do some more advanced operations as well.
For example, we can also have a class. Here I have a class called Person,
with a constructor that takes a name and an age, getters
and setters for the age, a function to get the name, and private
variables for the name and age as well.
In PyBind I can clearly say that I am calling this shared library
pybind_class_module, that this is a class, and that I want to allow
Python to call all these functions, right?
Similarly, as I mentioned, we can
directly import it, and this one I have also pre-built.
When you run CMake, it generates pybind_class_module.
I can directly import pybind_class_module and then
directly call the constructor,
say, passing Alice and 28.
That creates person, an instance of that class, and then I
can directly call the helper functions.
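Calling the bound class from Python might look like the sketch below. The method names `get_name`, `get_age`, and `set_age` are assumptions, since the talk only says there are getters and setters, and the exact spelling of the module name is also a guess from the audio. The pure-Python class is a stand-in so the sketch runs without the compiled module.

```python
try:
    # Module name as heard in the talk; the exact spelling is an assumption.
    from pybind_class_module import Person
except ImportError:
    class Person:
        """Pure-Python stand-in mirroring the C++ class described above."""

        def __init__(self, name: str, age: int) -> None:
            self._name = name  # private in the C++ version
            self._age = age

        def get_name(self) -> str:
            return self._name

        def get_age(self) -> int:
            return self._age

        def set_age(self, age: int) -> None:
            self._age = age


person = Person("Alice", 28)  # calls the constructor with a name and an age
person.set_age(29)
print(person.get_name(), person.get_age())
```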
So this is another example, where you can expose a C++ class to Python, right?
Similarly, we could also do it in Boost.
We include the Boost header and the class is very similar,
but the exposing function has slightly different syntax,
where we use the Boost namespace.
So this is an example of how we could expose C++ functions and C++
classes to the Python world.
But we can do much more advanced stuff.
We can also map C++ exceptions to Python exceptions.
We can easily pass vectors, other containers, and iterators,
because both PyBind11 and Boost have STL container support.
PyBind11 has a NumPy header too, and
Boost Python has a NumPy submodule, so arrays can also easily be
passed between the two languages.
So the idea is, when we have heavy compute-related operations,
to pick and choose them and move that logic into C++. In terms of distribution
and packaging, as you saw, when I run CMake and build it
locally, I get two .so files over here,
and for Boost Python
I also get a boost_module .so file here.
The .so file can essentially be added to the Python path, and
then the Python scripts can be run.
But for larger projects, the idea is to create a pip-installable
package, where you use setup.py
or scikit-build, and publish it to PyPI or any
other package registry of the user's choice.
So I have a simple setup.py through which I could push the PyBind module
to a registry of my choice.
Others can then just grab that module, install
it in their virtual environment,
and run their scripts.
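A setup.py along these lines is one possible way to do that for the PyBind11 module. This is a sketch rather than the file from the demo: the version number is hypothetical, and it assumes `setuptools` and `pybind11` are installed in the build environment. The imports are deferred and the call is guarded so that the file does nothing when run without a command.

```python
import sys

PACKAGE_NAME = "pybind_module"   # module name used in the demo
SOURCES = ["pybind_module.cpp"]  # source file used in the demo
VERSION = "0.1.0"                # hypothetical version number


def run_setup() -> None:
    """Build the extension; imports are deferred so the build tools are only needed here."""
    from setuptools import setup
    from pybind11.setup_helpers import Pybind11Extension, build_ext

    setup(
        name=PACKAGE_NAME,
        version=VERSION,
        ext_modules=[Pybind11Extension(PACKAGE_NAME, SOURCES)],
        cmdclass={"build_ext": build_ext},
    )


# Only invoke setuptools when a command such as "build_ext" or "bdist_wheel" is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    run_setup()
```

When building through pip, `pybind11` would also need to be declared as a build requirement so that the import inside `run_setup` succeeds.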
But given all this,
what about the overall binding performance?
The idea is to move pieces of code and rewrite them in C++ and then make
the C++ code and the Python code talk to each other.
But in terms of binding performance, are there any overheads?
There are definitely overheads, right?
Because data needs to go back and forth between C++ objects and Python objects.
There's an implementation difference between Boost Python
and PyBind11: in PyBind11, everything is like a smart pointer,
so it relies heavily on smart pointers and vectors, which makes things slower.
So it's imperative that we choose the right parts of our code
and product that need to be optimized.
I did find some benchmarks online that said Boost Python
is slightly faster than PyBind11.
And of course, raw C++ and raw Python on their own are much faster.
But there are some countermeasures that we can take.
PyBind11 and Boost Python have special types that directly bind Python and C++ objects,
so we can directly use those objects instead of copying the
data. We can also use optimized typed bindings like array_t<double>.
And we can also use smart pointers instead of regular pointers,
because with Python being a garbage-collected language, we can
avoid the whole dangling pointer
problem.
When we use both PyBind11 and Boost Python, we actually create
shared libraries where we statically bind the C++ code to Python, right?
But there is also another concept called dynamic binding,
through another library called cppyy, which lets you write C++ code directly in Python.
So there is no need for any sort of compilation,
adding to the Python path, or putting it into a registry; we can
just write C++ directly in Python.
This is facilitated by Cling, the C++ interpreter.
In this screenshot, you can see that you can directly write this C++ code in Python
using cppyy, and then you can call it from the global
namespace and print the output.
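A minimal sketch of that workflow follows, assuming the third-party `cppyy` package is installed. `cppyy.cppdef` hands a C++ snippet to Cling, and `cppyy.gbl` exposes the C++ global namespace. The function name `add_numbers` is illustrative, and the plain-Python fallback exists only so the sketch still runs where `cppyy` is missing.

```python
try:
    import cppyy  # third-party package, assumed to be installed separately
except ImportError:
    cppyy = None

# C++ source held in a Python string; Cling compiles it at runtime.
CPP_SOURCE = """
int add_numbers(int a, int b) { return a + b; }
"""

if cppyy is not None:
    cppyy.cppdef(CPP_SOURCE)              # define the function in the C++ global namespace
    result = cppyy.gbl.add_numbers(3, 4)  # call it through the global namespace proxy
else:
    result = 3 + 4  # stand-in when cppyy is unavailable

print(result)
```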
This is much, much faster than PyBind11 and Boost Python, and it's much easier to use.
The advantages are, as I said: there's no build setup, and no installation or
compilation of a binding module. It has a lot more support for templates,
inheritance, callbacks, lambdas, and other modern C++ features.
So the industry is moving towards using cppyy in cross-language projects.
So overall, the top takeaways are: Boost Python and PyBind11 are excellent
tools to blend the power of C++ and the expressiveness of Python.
If you're already part of the Boost ecosystem, it's probably
saner to use Boost Python and its robust set of features.
If you just want a lightweight, open source project, you can
choose PyBind11, but of course it comes with performance overheads.
In terms of static binding, the idea is very simple:
choose which parts of the code you want to write in C++, use
a module-defining macro to expose them, compile it as a shared
library, and import it in Python.
You can then also expose it as a pip-installable package,
or put it in a package registry.
On the other hand, if you would like to bind dynamically, you can use
cppyy, which uses the Cling interpreter. And for reference, all the code
mentioned in the slide deck is in my GitHub repository.
So feel free to check it out in case
you want to run the code locally and get a feel for it.
Thank you.