Have you ever wondered how databases are built and what's going on behind the scenes of everything you see? We asked Memgraph's Core team to give us a glimpse into their everyday processes and experiences. Our distributed team of four great engineers works on Memgraph's distributed database engine. Antonio Andelic, Benjamin Antal, Jure Bajic and Konstantinos Kyrimis spill the beans about their work, challenges and plans for the future.
Let's start with an easy one: what do you do as part of the Core team?
Benjo: I work on the Memgraph product itself, which is a graph database. I'm responsible for features from planning all the way to release, including implementing them, writing tests and guiding others in trying them out. Apart from the pure software engineering work on the development side, the whole team plans releases together. Everybody comes up with their own ideas that might become part of the product in the next release.
What languages and tools do you use daily in the Core team, and which new ones would you like to try?
Jure: For infrastructure, I currently use Ansible, Vagrant and Docker, along with, to a lesser extent, some monitoring services such as the ELK stack and Grafana, and of course Bash and Linux. I use CMake as my build tool and VSCode as my text editor (or IDE if you have enough extensions :D ). I also use gdb for debugging. I'd definitely like to see more of Go or Rust, since I think their ecosystems are very well defined and they come with out-of-the-box solutions to problems you have to struggle with in other languages.
Antonio: I use Vim for everything. We believe everyone knows which tools make them most productive, so we have no restrictions. I code primarily in C++, which is our main language. We've recently started looking more and more into Rust, and I believe it has a lot of potential for future use cases.
What has been the most challenging part of your role?
Benjo: In my opinion, the most challenging part of working on Memgraph is finding the tiny details of use cases and making them work, because many of the application's functionalities are interconnected. Finding those details requires a good understanding of Memgraph and critical thinking, while making them work is usually easier.
Antonio: Learning all the details of a complex system like Memgraph. Databases are really complex pieces of software, and you need good theoretical and practical knowledge to work on them efficiently. I've also learned a lot about what it means to build a team and take care of your teammates every day. Everyone is different and requires a different approach, but in the end, what matters most is the will to improve and to be open to new ideas and feedback.
What are some specific problems you have to solve as a part of a Core team?
Antonio: We have a lot of interesting problems to solve. On the performance side, we need to think about how to speed up certain parts of Memgraph, like loading data from CSV files. Also, since memory management is an important aspect of our database, we are pretty sensitive to any memory-related issues. These mainly involve minimising the amount of memory used for a given workload. Memgraph is complex software, and adding any new feature requires thinking beforehand about many different parts of the system and how the feature might affect them. And, to make things a bit harder and even more interesting, we need to solve those issues while preserving the underlying graph structure. We don't spend all of our time writing graph theory algorithms, but the graph structure introduces a lot of constraints and ideas that have to be taken into account in any solution you design. Recently, we started defining the work for distributed workloads. That's a difficult topic by itself, but throwing in graphs takes it to a whole other level!
What was the feature that took the longest to develop?
Jure: The most recent feature I implemented was Bolt notifications. To implement them, I had to get to know the Bolt protocol specification and figure out how to pass notifications to clients. After that, I had to go through our query engine, find the places where we create, delete and update objects, and generate notifications at those points. That was a cool task since I got to know the codebase a lot better.
Antonio: I would say implementing replication was what I spent the most time on. It involved a lot of different and interesting problems, including finding a way to test something like that. To solve that, I learned a fascinating language called Clojure and implemented some tests using Jepsen.
Benjo: I am currently working on distributing Memgraph across multiple machines. This has required a lot of team effort on the research side and a lot of planning on my side. I haven't started developing it yet, but it has already raised many tasks to solve, quite a few of which are really challenging and interesting.
What’s the most interesting part of the Memgraph database engine?
Kostas: The MVCC (multi-version concurrency control) that powers the storage engine is definitely an interesting piece of software.
Benjo: My favourite part is the query language's built-in graph algorithms, such as filtering variable-length paths, breadth-first search and weighted shortest path. I think they are really nice. Apart from that, the flexibility in terms of schema is also valuable.
Antonio: I think Memgraph's storage engine is what's responsible for its great performance. It's a nice implementation of a great research paper, and it uses many interesting ideas to achieve high performance and keep memory consumption as low as possible, all while providing ACID guarantees.
Jure: It is CS theory manifested :D. Databases solve many common problems you will encounter in real-life solutions, like concurrency, durability, virtualization and reliability. The field also extends into distributed systems and more. There is really a lot to learn and do in this field.
What’s the most exciting project you’ve been working on?
Benjo: I think the most fun part for me was exploring the existing RPC (remote procedure call) libraries and deciding which one we should use. I learned a lot of surprising things and had fun implementing and measuring the experiments we came up with.
Kostas: Temporal types were very fun to develop, but I think WASM (WebAssembly) is a very exciting and promising technology.
Antonio: Right now, I'm working on introducing alternative ways to connect to Memgraph. As Memgraph is a streaming platform, we need a way to stream data from the server to our clients, and we think WebSocket fits nicely with that idea. For that, we had to learn two great C++ libraries, Boost Beast and Boost Asio. It's still a work in progress, but it's really fun to play with these things because you learn a lot of interesting new concepts while seeing the impact on the other tools built around Memgraph.
What’s the next big thing for the Core in 2022?
Kostas: Distributed, Streams and WASM!
Benjo: First and foremost, we're jumping on WebAssembly and hopefully shipping some nice integrations with it. Distributed architecture will be a very big thing in the second half of 2022.
Antonio: We need to improve Memgraph's performance and memory consumption even further. Also, our integration with other streaming platforms like Kafka needs to become even better and more stable. But by far the biggest thing we would like to introduce is support for sharding data across multiple nodes and running queries in a distributed manner.
What’s the best thing about your team?
Kostas: I like our ability to discuss ideas openly and that we are not afraid to be rigorous with code reviews. We respect each other, and this kinda paves the way for very fruitful and interesting discussions.
Antonio: The best thing about the team is its relaxed environment. Everyone is really open to jokes, small talk, or discussions about more advanced topics. This makes all our meetings a lot easier and much more efficient. Everyone respects each other's opinions and is not afraid to admit when they are wrong. All our daily meetings start on a really positive note, which really helps us connect with our teammates even though everyone is in a different country. Recently, we even started having virtual lunches where we play online games together and, more importantly, talk about everything except work.
And what do you like most about working at Memgraph?
Kostas: The people in the company and the database :D
Benjo: Debates! It sounds silly, but at Memgraph, if I encounter an opinion that doesn't match mine, we can usually discuss our thoughts and share our knowledge in a very efficient and pleasant manner. Every debate ends either with one person learning a critical piece of information or with a new idea that is better than the ones we started with. These constructive discussions help us move forward and push us toward better things.
Jure: Challenging work, learning new technologies, getting the chance to advance my knowledge of database systems, and building a tool for others to use.
Antonio: I like the opportunities you get here. No idea is discarded immediately. If you think something will help Memgraph as a product, you have support from everyone in the team and the company. There is no task that is above your level. If you think you can handle it, do it.
Could you share some tips about helpful materials for further education and development?
Kostas: I am mostly a self-learner. I like to read books and hack on things on my own. Some of the books that influenced me are Elements of Programming by A. Stepanov and P. McJones, Introduction to Algorithms by T.H. Cormen et al., Computer Architecture: A Quantitative Approach by D.A. Patterson and J. L. Hennessy, and Computer Systems: A Programmer's Perspective by R. Bryant and D. O'Hallaron. Also, there is so much valuable knowledge from the C++ conference CppCon. I am such a big fan, and you can check out the video of the talk I gave there as a speaker in 2019. :)
Jure: There are definitely noteworthy YouTube channels like mCoding and C++ Weekly With Jason Turner. I am currently reading one of the must-reads, Operating Systems: Three Easy Pieces by Andrea and Remzi Arpaci-Dusseau. I'm also following a databases course from CMU.
Antonio: I read many technical books, and I think that's the best way to understand certain topics completely. Apart from that, I like to watch videos from CppCon and C++ Weekly and listen to CppCast. I would also like to mention two courses from CMU by Andy Pavlo: Database Systems and Advanced Database Systems. If you know nothing about database internals and would like to learn, this is the way to do it! He's an awesome guy with deep knowledge of the topic, and the courses come with lots of materials and fun recorded lectures.
And finally, what would you say to someone coming into this role?
Benjo: We are not perfect (who is?), but we are always open to new ideas. If you join us and think you know a better solution for something or a better way of doing some process, just speak up! We will be happy to improve our environment and make our work easier. Who wouldn’t like to make their work easier, right?
Jure: The Core team will give you a real opportunity to make changes both inside the company and outside of it, because you'll have an impact on Memgraph's users. You can feel a real sense of accomplishment when building such a powerful product. Most importantly, I want everyone to know they can reach out to anyone at Memgraph and ask questions about the job or the company.
Antonio: Don't be afraid! If you love working in C++ and have a solid understanding of it, you'll fit right in. As long as you have the will to improve, there is nothing you can't learn while working with us. If you're afraid of graphs and/or databases, don't worry! I knew almost nothing about them two years ago, and now I'm leading a team working on a graph database system.
Huge thanks to everyone in the Core team! If this got you excited and you want to dive deep into the world of databases, good news: we're hiring!