C++ and Python: Building Robust Applications by Offloading Compute-Heavy Workloads

Abstract

Are you ready to take your development skills to the next level by combining the best of Python and C++? This strategy ensures you’re using the right tool for the right job, resulting in a robust and scalable architecture. It’s not just about theory; we’ll cover the practical aspects too. Don’t miss it!

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hello, everyone. Welcome to today's talk on integrating C++ and Python at Conf42 Python 2025. I'm Hariharan, currently a software engineer at AMD (Advanced Micro Devices). I previously worked as a lead software engineer at athenahealth, and I've also worked at other organizations like Bain Capital and Bose Corporation. My broad areas of interest are DevSecOps, distributed systems, and applied AI, and the two languages that I frequently use are C++ and Python. Feel free to connect with me on LinkedIn and to follow me on GitHub.

For today's talk, I've split the material into around 10 parts. First we'll talk about the problem statement, and then slowly move on to the similarities and differences between Python and C++. Then we'll talk about static binding and dynamic binding, followed by a brief overview of Boost.Python and pybind11. After that, we'll look at some actual code and how the project is set up, with some minimal examples and a short demo, and then touch upon a few advanced topics. Finally, we'll talk about distribution and packaging and wrap up with some top takeaways.

By way of introduction: in this session, I'll share how you can leverage the strengths of both C++ and Python to build robust applications. The idea is to gain insights into tools like Boost.Python, pybind11, and other dynamic binding tools, learn how to seamlessly integrate cross-language projects, and discover strategies to optimize the codebase. So the goal is very simple. By the end of this talk, you will understand when and why to choose Boost.Python, pybind11, or other tools, and you'll walk away with a clear idea of how to set up a project and write minimal, functional examples.

Before diving into code and other content, let's talk about why Python in the first place.
We know that Python is widely used for rapid prototyping because it is easy to learn, has relatively simple interfaces, and allows faster development. But when we look at C++ and Python, although there are a lot of differences, there are a lot of similarities as well. Modern C++ is trying to get closer and closer to Python, I would say. For example, take a look at these two snippets of code. They're very simple: each adds two numbers. The idea is that even Python has type hinting in recent versions, so you can explicitly state that a and b are integers. In C++ the types are enforced, and if you do not state what type something is, you get a compilation error.

Similarly, on the left you have a C++ function that takes a vector of strings and displays hello for every string it receives. On the right you have the typing library, from which you can import List and clearly say that the function takes a list of strings.

Over here we have a class called Box. The typing module also provides TypeVar and Generic, which allow you to create classes and functions that work with any type, and that is very similar to the concept of C++ templates. On the left-hand side, TypeVar and Generic are used; on the right, you can see the equivalent implemented using C++ templates. The Box class can hold any type and its methods reflect that type, which is the underlying concept of templates as well.

So even though Python and C++ have a lot of similarities syntactically (and that's a very broad statement), in terms of performance C++ is much, much faster than Python. The recent Python 3.13 release has introduced new modes, such as one where you can disable the global interpreter lock and one that adds a just-in-time compiler, and it is said to be about 15 percent faster for module imports and 10 percent faster for function calls and other CPU-intensive tasks. But even in codebases that predominantly use Python, for data-intensive tasks we end up using packages such as NumPy, scikit-learn, PyTorch, and so on, because they have APIs and abstractions that are much more performant. And when you look deeper into those libraries, the core implementation is in either C or C++.

Performance is just one side of the coin. In one of my previous workplaces, we had teams affiliated with a particular codebase. For example, the software development team in an embedded space might just be writing C++ code, whereas the testing team might be writing Python code. So when there has to be collaboration between the development and testing teams, there needs to be an overlap: code that talks to both the C++ codebase and the Python codebase.

The point I'm trying to drive home is this. Sometimes we have a Python codebase and want to optimize part of it for specific reasons, so we pick that piece of code out of the Python codebase and implement it in C++ to make it more performant. On the other hand, two different teams with different objectives may have to interact with each other: the development team writes C++ code, the testing team writes Python code, and they still have to work together. In both cases, we might have to do some binding.
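To make the Python side of the slide comparisons concrete, here is a minimal sketch of the three typed snippets described earlier: adding two integers, greeting a list of strings, and a generic Box class. The slides themselves are not reproduced in this transcript, so the function name `greet`, the method name `get`, and the sample values are assumptions for illustration only.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # placeholder type, playing the role of a C++ template parameter


def add(a: int, b: int) -> int:
    """Add two integers; the hints document the types but are not enforced at runtime."""
    return a + b


def greet(names: List[str]) -> None:
    """Print a greeting for every name in the list."""
    for name in names:
        print(f"Hello, {name}")


class Box(Generic[T]):
    """A container that holds a value of any single type T."""

    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value


print(add(2, 3))
greet(["Ada", "Linus"])
int_box: Box[int] = Box(42)
str_box: Box[str] = Box("hi")
print(int_box.get(), str_box.get())
```

Unlike C++, Python does not reject a call such as `add("a", "b")` at compile time; a static checker such as mypy is what flags the mismatch.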
Sometimes the need for binding is performance, and sometimes it is cross-team collaboration. We'll learn how to do all that as we move on to the next slides.

For the purpose of this talk, if you want to follow along and try it out in your local IDE or compiler, there are some prerequisites. You need a working C++ compiler that supports at least C++11, and you need Python 3 installed. If you're on Ubuntu, you can install these through apt-get; if you're on a macOS system, you can use Homebrew. A build system like CMake is recommended; it helps you coordinate compilation, linking, and library paths. And since we'll be talking about Boost.Python and pybind11, you can install them through similar package managers, or you can use CMake to fetch them. In the case of pybind11, you can also just copy the headers into your project.

Let's take a quick look at Boost.Python and pybind11 just to understand what they are. Boost.Python is part of the larger Boost ecosystem. It has been actively maintained for many years, it's stable and well documented, and it ships as the Python module of the Boost libraries. pybind11 is a more modern alternative, although I would say it's slightly slower than Boost.Python in terms of performance. But if you dislike additional library dependencies and want an easy install, pybind11 just might be your choice. When I show you actual code, I'll show implementations in both Boost.Python and pybind11. I've also attached references in case you want to check them out; both libraries have extensive documentation.

So let's take a look at actual code before going to advanced topics. Here's my IDE. On the left I have pybind11, and on the right I have Boost.
Here we have C++ code with a simple function that just adds two numbers. We can then expose it: we define a Python module called pybind_module, add a docstring, and expose the function, saying that this is a function to add two numbers. The corresponding CMakeLists.txt also clearly says that I want to create a Python module as a shared library and call it pybind_module, built from this file, pybind_module.cpp, which we see here. Generally, the code that exposes things to Python is written in a separate file, but for the sake of simplicity I have it in the same file here.

I've already compiled it using CMake. When we compile this, it generates a shared library called pybind_module. Once I add it to PYTHONPATH, I can directly import pybind_module and call the function add, which is actually a C++ function that takes two integers, a and b. The corresponding Python call is over here, where we call add directly through pybind_module, which is actually a shared library.

A similar example in Boost is over here. Where before we included the pybind11 header, here we include the Boost header, and then we say that we want to create a Boost.Python module and that this is the function that adds two numbers. Similarly, when I compile it, I again get a shared library, called boost_module, and I can call it directly in my Python file as boost_module.add(3, 4).
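The Python side of this demo might look roughly like the sketch below. It assumes the two shared libraries from the talk have been built and are on the Python path under the names `pybind_module` and `boost_module`. The helper `load_add` and its pure-Python fallback are not part of the talk; they are hypothetical additions so that the snippet still runs on a machine where the modules have not been compiled.

```python
import importlib


def load_add(module_name: str):
    """Return the `add` function from a compiled module, or a stand-in if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Hypothetical fallback: the compiled extension is not on the Python path,
        # so use a pure-Python function with the same signature.
        def add(a: int, b: int) -> int:
            return a + b

        return add
    return module.add


for name in ("pybind_module", "boost_module"):
    add = load_add(name)
    print(name, add(3, 4))  # prints the module name followed by 7
```

From the caller's point of view the two binding libraries are interchangeable here: the import and the call look the same regardless of which one produced the shared library.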
The syntax is fairly similar, though not identical. The setup is different, though. This example is very simple: just a function to add two numbers, written in C++ and exposed to the Python world. We can do some more advanced operations as well.

For example, we can also have a class. Here I have a class called Person, with a constructor that takes a name and an age, a getter and a setter for the age, a function to get the name, and private variables for the name and age. In pybind11, I say that I am calling this shared library pybindclass_module, that this is a class, and that I want to allow Python to call all of these functions. I have pre-built this one as well: when you run CMake, it generates pybindclass_module. I can directly import pybindclass_module and call the constructor, say, passing Alice and 28. That creates person, an instance of the class, and then I can directly call the helper functions. So this is an example of how you can expose a C++ class to Python.

Similarly, we could also do it in Boost. We include the Boost header, and the class is very similar, but the exposing code has slightly different syntax, where we use the Boost namespace.

So this is how we can expose C++ functions and C++ classes to the Python world. But we can do much more advanced stuff. We can map C++ exceptions to Python exceptions. We can easily pass vectors, other containers, and iterators, because both pybind11 and Boost.Python have STL container support.
pybind11 has a NumPy header too, and Boost.Python has a NumPy submodule, so NumPy arrays can also be passed easily between the two languages. So the idea is, when we have heavy compute-related operations, to pick and choose them and move that logic into C++.

In terms of distribution and packaging, as you saw, when I run CMake and build locally, I get .so files: one for pybind11 over here, and a boost_module .so file for Boost.Python here. The .so file can be added to PYTHONPATH, and then the Python scripts can be run. But for larger projects, the idea is to create a pip-installable package, using setup.py or scikit-build, and publish it to PyPI or any other registry of the user's choice. I have a simple setup through which I can push the pybind11 module to a registry of my choice. Others can then grab that module, install it in their virtual environment, and run their scripts.

Given all this, what about the overall binding performance? The idea is to move pieces of code and rewrite them in C++, and then make the C++ code and the Python code talk to each other. But are there any overheads? There definitely are, because data needs to go back and forth between C++ objects and Python objects. There's also an implementation difference between Boost.Python and pybind11: pybind11 relies heavily on smart pointers and vectors, which makes things slower. So it's imperative that we choose the right parts of our code and product to optimize. I did find some benchmarks online that said Boost.Python is slightly faster than pybind11.
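To get a rough feel for why the choice of what to offload matters when every call across the language boundary has a cost, here is a small sketch. It does not use pybind11 or Boost.Python; as a stand-in it uses only CPython built-ins that are implemented in C (`operator.add` and `sum`), and the function names and the list size are assumptions chosen for illustration. One version crosses into C once per element, the other crosses once for the whole list.

```python
import operator
import timeit

values = list(range(100_000))


def many_small_calls(data):
    """Cross into C once per element: the loop itself stays in Python."""
    total = 0
    for x in data:
        total = operator.add(total, x)
    return total


def one_big_call(data):
    """Cross into C once: the whole loop runs inside the built-in."""
    return sum(data)


# Both approaches compute the same result; only the call granularity differs.
assert many_small_calls(values) == one_big_call(values)

print("per-element calls:", timeit.timeit(lambda: many_small_calls(values), number=20))
print("single call:      ", timeit.timeit(lambda: one_big_call(values), number=20))
```

The timings vary by machine, but the single coarse-grained call is typically the faster of the two, which is the same reasoning behind moving a whole compute-heavy loop into C++ rather than binding a tiny function and calling it many times from Python.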
Of course, plain C++ and plain Python on their own, with no binding layer in between, are much faster. But there are some countermeasures we can take. pybind11 and Boost.Python provide special types that directly bind Python and C++, so we can use those objects directly instead of passing the data. We can also use optimized type bindings like array_t<double>. And we can use smart pointers instead of regular pointers: because Python is a garbage-collected language, that helps us avoid the whole dangling pointer problem.

When we use pybind11 or Boost.Python, we create shared libraries in which we statically bind the C++ code to Python. But there is also another concept called dynamic binding, available through another library called cppyy, which lets you write C++ code directly in Python. There is no need for any compilation step, adding things to PYTHONPATH, or pushing to a registry; we can just write C++ directly in Python. This is made possible by Cling, the C++ interpreter. In this screenshot, you can see that you write the C++ code directly in Python using cppyy, and then you can call it through the global namespace and print the output. This is much, much faster than pybind11 or Boost.Python, and it's much easier to use. The advantages are, as I said, that there's no setup and no compile-and-install step for the bindings. It also has a lot more support for templates, inheritance, callbacks, lambdas, and other modern C++ features. So the industry is moving towards using cppyy in cross-language projects.

Overall, the top takeaways are these. Boost.Python and pybind11 are excellent tools to blend the power of C++ and the expressiveness of Python. If you're already part of the Boost ecosystem, it's probably saner to use Boost.Python and its robust set of features. If you just want a lightweight, open-source project, you can choose pybind11, but of course it comes with performance overheads.
In terms of static binding, the idea is very simple. Choose which parts of the code you want to write in C++, use a module-defining macro to expose them, compile the result as a shared library, and import it in Python. You can then also ship it as a pip-installable package or put it in a registry. On the other hand, if you would like to bind dynamically, you can use cppyy, which uses the Cling interpreter.

For reference, all the code mentioned in the slide deck is in my GitHub repository. Feel free to check it out in case you want to run the code locally and get a feel for it. Thank you.
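As a rough sketch of the dynamic binding approach, the snippet below defines a C++ function from inside Python with `cppyy.cppdef` and calls it through the `cppyy.gbl` global namespace. The screenshot from the talk is not reproduced in this transcript, so the C++ function name `demo_add` and the wrapper `jit_add` are hypothetical. The snippet assumes the cppyy package is installed; if it is not, the wrapper simply returns None instead of failing.

```python
try:
    import cppyy  # third-party package; assumed to be installed for the real demo
except ImportError:
    cppyy = None

# C++ source handed to Cling at runtime; `demo_add` is a hypothetical name.
CPP_SOURCE = """
int demo_add(int a, int b) { return a + b; }
"""

_defined = False


def jit_add(a: int, b: int):
    """Call the C++ function through cppyy, or return None when cppyy is unavailable."""
    global _defined
    if cppyy is None:
        return None
    if not _defined:
        cppyy.cppdef(CPP_SOURCE)  # compile the C++ snippet once, in process
        _defined = True
    return cppyy.gbl.demo_add(a, b)


print(jit_add(3, 4))
```

There is no separate shared library or CMake step here: the C++ code is compiled in process the first time `jit_add` runs.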

Conf42 Python 2025 - Online

February 06 2025 - premiere 5PM GMT

Hariharan Ragothaman

Lead Software Engineer @ athenahealth
