3 posts tagged with "C++20"

Why is DocWire built on C++20?

July 24, 2026 · 10 min read

Reeshabh Choudhary

Principal Solutions Architect - DocWire

“Hard times create strong men, strong men create good times, good times create weak men, and weak men create hard times” - G. Michael Hopf.

Well, the current timing being hard or easy for a person does depend on his/her perspective about life, but at DocWire, we were sure about one thing: if we are setting out to build something, it must be built to endure. There was never a doubt about the potential of DocWire as an infrastructure layer to be used by developers and enterprises to quickly spin up data processing pipelines. However, we did a lot of contemplation regarding the choice of technology underneath it.

There was always an option to build just another SDK acting as a bloated abstraction, but it would not have survived the test of time. With LLMs invading the market and cloud repatriation into the mix, along with concerns of data sovereignty and auditability, we could already see the trend shifting towards local compute, which many at the time could not foresee (even a majority do not acknowledge this concern currently).

The Return of On-Premises and Edge Computing

Having observed the recent outage reports of leading cloud providers such as AWS, GCP, Azure, etc., it would not be an understatement to say that the “Cloud First” approach of the 2010s was an overstated affair. The cascade of failures in the year 2025 alone is quite intimidating. In the span of 10 days in October 2025, both AWS and Microsoft Azure suffered widespread outages. The AWS outage was triggered by a DNS automation bug inside DynamoDB; the Azure outage was caused by a misconfiguration in Azure Front Door.

In both cases, a single internal failure cascaded across dependent services globally. And the DocWire team has been reading such incidents for a long time to understand the consequences, but more importantly to zero in on what is required on the infrastructure side to navigate through such scenarios.

One idea that stuck: “Not everything needs to go to the cloud; computation can be done locally, and only required results can be moved to the cloud”.

As we continued examining enterprise deployments, another pattern emerged. The motivation for local computation was no longer driven only by reliability or latency but it was increasingly driven by governance. Across industries, regulators now expect organizations to maintain meaningful control over their data, infrastructure, and operational processes rather than simply delegating responsibility to a cloud provider. Frameworks such as the EU's Digital Operational Resilience Act (DORA) strengthen requirements around oversight, auditability, and third-party ICT risk. Healthcare regulations such as HIPAA, privacy laws including GDPR, and financial-sector rules likewise emphasize accountability, data governance, and auditability. In the public sector and defense, deployment choices are determined by the applicable authorization regime—for example, FedRAMP authorizes compliant cloud services for U.S. federal workloads, while ITAR-controlled and classified environments require approved security controls and handling procedures. At the same time, legislation such as the U.S. CLOUD Act has heightened concerns about cross-border jurisdiction over data stored with foreign-owned cloud providers. Collectively, these trends have shifted the discussion from "Is the cloud secure?" to "Who ultimately controls the system, the data, and the operational risk?"

However, our concern was not just about data security, sovereignty, and compliance readiness. For a system to manage things at a larger scale in an enterprise, it has to efficiently manage workloads tied to physical processes such as manufacturing systems, trading platforms, logistics coordination, etc. And in these circumstances, latency variance creates an operational risk. While cloud computing delivers easily accessible, centralized computational power and storage, edge computing emphasizes localized, lower-latency data processing. Moreover, there is tighter control over physical resources. If another node on the network gets attacked, an isolated bare-metal environment remains stable.

Having weighed these pros and cons, we started realizing the potential around DocWire as a solution. We wanted the enterprise (explicitly the developers) to have tighter control over their process flow. A flexibility that allows them to manage easily what needs to be done locally and decide what goes to the cloud. In essence, the idea was to give power back to the development authority involved in managing such workloads. But the choice of technology to build upon remained elusive.

The Software Development Mindset

Software development has grown multifold in the last few decades, and with LLM coding agents in the picture, things are not what they used to be for a traditional developer. It is fair to say that the current state of software development is largely about wrappers around wrappers, and about architectures that care less about the machine and its performance than about output. Performance and predictability have been traded away in favor of bloated architectures, black-box frameworks, and seemingly infinite cloud computing resources. However, such a compromise breaks systems down in the long run, especially under stressors that common models are not built to predict. For mission-critical systems and edge computing in particular, reliability is not a feature but a strict mathematical requirement.

DocWire has been built with the philosophy to restore “Mechanical Sympathy” and bring extreme engineering discipline back to data processing.

We chose to solve this problem at the architectural level rather than relying on abstractions and leaving wires tangled and loose. Core Principle: Deterministic systems should be engineered to deliver predictable behavior within well-defined bounds for latency, memory consumption, and resource utilization. Achieving this requires deliberate architectural choices such as bounded memory allocation where appropriate, explicit ownership of resources, predictable execution paths, platform-aware scheduling, and continuous measurement to verify that performance targets are consistently met. System memory and CPU cycles are not to be treated as infinite resources. Efficient engineering is a conservative process, and it must be respected.

C++: DocWire’s choice

With various options at hand, the choice for a solution that was being built to give power back to the developer community could not have been corrupted by notions like ease and complexity. We, at DocWire, chose to build in C++, not because we were looking at this problem from a ‘choice of language’ perspective but from an architectural perspective. C++ is chosen because no other language delivers the combination of compile-time determinism, zero-cost abstraction, direct hardware access, and memory layout control that document processing at the production edge requires.

And if one digs deeper, one can easily find out that every major production AI inference engine uses C++ at its core. Be it NVIDIA TensorRT, Meta's PyTorch C++ API, or llama.cpp, etc., they are written in C++ not by preference but by necessity. The performance per watt, the memory layout control, and the zero-overhead abstraction model that these workloads demand have no equivalent in managed runtimes.

Contrary to the narrative of C++ being a legacy language, it has survived the test of time. Its adoption has always fared well in mission-critical industries such as finance, defense, etc. And in current times, when the above-discussed problems are more evident than ever, C++ offers what other choices don't: “Building systems that combine high-level abstraction with precise hardware control”.

And DocWire is not alone in recognizing the potential of C++: Elon Musk has recently acknowledged that the xAI architecture is being written completely in C/C++.

What C++ offers:

Predictable code & performance:

Aligned with our motive to return control to the developers, C++'s greatest strength is that it gives the developers precise control over how software executes. Unlike languages such as Java, C#, JavaScript, or Python, standard C++ applications do not depend on a garbage collector, bytecode interpreter, or just-in-time (JIT) compiler. Instead, source code is compiled directly into native machine code before deployment. And the consequences are significant:

Deterministic Execution: A C++ function performs exactly the work that the programmer wrote. There is no runtime deciding when to optimize code*, relocate objects, or reclaim memory. Applications with strict timing requirements, such as financial trading systems, medical devices, autonomous robots, spacecraft, and industrial control systems, cannot tolerate random pauses caused by garbage collection. C++ avoids these pauses entirely unless a project deliberately introduces such mechanisms. *Note: Compiler can decide about some optimizations but it is predictable. Same version of compiler, same input means same output every time, and it can be tested and guaranteed.
Explicit memory management: Modern C++ encourages Resource Acquisition Is Initialization (RAII), smart pointers (std::unique_ptr, std::shared_ptr), stack allocation, and deterministic destruction. The reason: the cost of a heap allocation is hard to predict even in C/C++, because it depends on the allocator and, ultimately, on the OS. However, it is possible to manage it explicitly: preallocate a larger block of memory, place smaller objects in it, and then release the whole block at once. This leads to predictable memory usage, predictable cleanup, and fewer memory leaks.
Zero-overhead abstractions: "What you don't use, you don't pay for. What you do use, you couldn't hand code any better." -- Bjarne Stroustrup (while explaining C++ core design principle).

Templates, inline functions, constexpr evaluation, move semantics, and compile-time polymorphism allow developers to build high-level abstractions that often compile down to code identical to handwritten C.
Cache-friendly programming: C++ allows developers to optimize data layout and cache locality. Few higher-level languages expose this degree of control. For a compute-intensive environment, every microsecond matters.
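The "preallocate, place, release the whole block" idea mentioned above can be sketched with the standard library's polymorphic memory resources (C++17). This is a minimal illustration, not DocWire's actual allocator: the function name sum_with_arena, the 4 KiB arena size, and the 256-element limit are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the integers 1..count using a vector whose
// storage comes from a fixed, stack-resident arena instead of the heap.
int sum_with_arena(int count) {
  assert(count >= 0 && count <= 256); // keeps the demo inside the arena

  // 4 KiB reserved up front; nothing below asks the OS for more memory.
  alignas(std::max_align_t) std::array<std::byte, 4096> arena;

  // Bump allocator over the arena. Using null_memory_resource() as the
  // upstream makes exhaustion throw std::bad_alloc instead of silently
  // falling back to the general-purpose heap.
  std::pmr::monotonic_buffer_resource pool(arena.data(), arena.size(),
                                           std::pmr::null_memory_resource());

  std::pmr::vector<int> values(&pool);
  values.reserve(static_cast<std::size_t>(count)); // one allocation from the arena
  for (int i = 1; i <= count; ++i)
    values.push_back(i);

  return std::accumulate(values.begin(), values.end(), 0);
  // `values` and `pool` go out of scope here: the whole block is released at once.
}
```

Calling sum_with_arena(100) returns 5050 without touching the heap. The upper bound on memory use is visible in the source, which is the property the bullet on explicit memory management is after.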

Mature Standards

C++ does not fall under the sovereign control of a single company; the language is standardized through the International Organization for Standardization (ISO). Committee members include experts from various companies, such as Microsoft, Google, and Red Hat. This is a major reason why C++ evolution reflects broad industry needs rather than a single vendor's priorities.

And one of the remarkable capabilities of C++ is to maintain backward compatibility. Large codebases written twenty years ago often compile with modern compilers after relatively small modifications.

Native Ecosystem

C++ possesses one of the richest ecosystems in software engineering, and its influence extends far beyond standalone applications. Core components of nearly every modern operating system are implemented in C or C++. Be it Windows, Linux kernel modules, Android libraries, or macOS frameworks, C++ integrates naturally. And then there is a plethora of open-source libraries offering various utilities.

Moreover, C++ frequently acts as the performance layer beneath higher-level languages. For example, Python packages such as TensorFlow, PyTorch, OpenCV, etc. often wrap C++ libraries. And C++ comes with a mature tooling system with options in compilers, build systems, and debuggers to choose from.

Proven Longevity

Created in the early 1980s, C++ remains among the world's most widely used programming languages across critical industries such as finance, aerospace, automobile, healthcare, cloud infrastructure, etc. Historically, C++ has been chosen to build infrastructure that is built to last long.

And it is the adaptability of C++ to adopt modern language features inspired by newer languages, while maintaining compatibility, which gives it an edge. It is an actively developed language rather than just a legacy.

In a nutshell: DocWire is built on C++ to deliver Uncompromising Predictability, Auditability by Design, Edge Optimization, and Robustness against chaos.

🔗References:

https://www.cloudzero.com/blog/aws-and-azure-outages/

https://fin.ai/learn/hipaa-gdpr-compliant-ai-agents

https://www.hbs.net/blog/cloud-repatriation-trends-cost-ai-and-the-push-towards-hybrid

https://www.citadelsecurities.com/careers/career-perspectives/why-c-is-growing-and-what-c26-means-for-production-systems/

https://arxiv.org/abs/2508.11269

https://x.com/elonmusk/status/2071385784154759468

Build Docwire Pipe-chain in 6 easy steps

May 7, 2026 · 8 min read

Reeshabh Choudhary

Principal Solutions Architect - DocWire

Introduction

Docwire is a powerful data extraction tool, developed on Modern C++, that converts text from nearly all known file formats into searchable and editable data. Powered by the Tesseract OCR engine, DocWire is a solution for digitizing text from many image types, MS Office files, e-mails, or e-mail attachments. DocWire outputs data to plain text that may be transmitted for further processing.

One of the interesting aspects of Docwire SDK is its ability to process documents locally (or even make an OpenAI API call) through a series of customizable steps that can be added or removed as per requirements. For example, consider the following code:

std::filesystem::path("data_processing_definition.doc") | content_type::detector{} | office_formats_parser{} | PlainTextExporter() | out_stream;

In the above pipeline processing, a document is being picked, its content type is being detected, and accordingly, a required parser is being applied, and then the text output is being exported. Now, if we wish, we can add additional steps in the pipeline, for example:

std::filesystem::path("data_processing_definition.doc") | content_type::detector{} | office_formats_parser{} | PlainTextExporter() | local_ai::model_chain_element("Translate to spanish:\n\n") | out_stream;

🔗Explore Docwire code examples in the official examples documentation.

Now, we have added a local model to translate the text in the document to Spanish and then stream the output. It seems as if the product is moving on a conveyor belt, and necessary customizations can be applied, such that output of the previous step acts as an input of the next step, exactly how a pipeline chain would work. In software terms, this emulates exactly how the Unix pipe operator | works in the terminal.

Not only is this feature cool, but the engineering behind its implementation is also praiseworthy. However, before we dive into the exact implementation, we try to build the intuition as usual:

We are trying to build a processing pipeline. The element to be processed can be of various types, as Docwire itself supports more than 100 file formats. However, for the sake of brevity, we take the simplest example and keep the focus on how the pipe chaining works.

1st Step

We start by defining the entity we want to process, and we keep it simple:

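// Standard headers used throughout this walk-through
#include <functional>
#include <iostream>
#include <memory>
#include <string>
#include <utility>

/**
 * A simple Message base type (a `struct` here; a `class` would work too)
 */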
struct Message {
  virtual ~Message() = default;
};

struct StartMessage : Message {};
struct TextMessage : Message {
  std::string text;
  TextMessage(std::string t) : text(std::move(t)) {}
};
struct EndMessage : Message {};

We have defined a base entity and various types of such entities, and based on the types, the parsing steps will decide how to act.

Note: In C++, a struct is simply a class whose members and base classes are public by default. I have used structs here to keep the implementation short and minimal.

The intended behavior is as follows:

A message entity will be passed around and at each stage of parsing/processing in the pipeline, based on the processing result, it will be decided what output to forward to the next step, or whether we need to propagate something upstream, such as errors or cancellations.

2nd Step

So, we define the respective structure first to capture such behaviors, and then we define the structure for chain elements, the base entity that will ensure the necessary behaviors are inherited by different chain elements while parsing/processing.

// Whether to continue or not
enum class Continue { Yes, No };

//Aliases
using Msg = std::shared_ptr<Message>;
using Callback = std::function<Continue(Msg)>;

// Whether to forward a message or bubble out
struct MessageCallbacks {
  Callback front;
  Callback back;
};

// Structure of a basic pipeline chain element
struct ChainElement {
  // The main processing function; each chain element provides its own implementation
  virtual Continue process(Msg msg, MessageCallbacks next) = 0;
  // If true: the element consumes a message and propagates new messages forward
  virtual bool is_generator() const { return false; }
  // If true: the element consumes messages but does not propagate them
  virtual bool is_leaf() const { return false; }
  // Virtual destructor, so elements can be destroyed through a base pointer
  virtual ~ChainElement() = default;
};

3rd Step

Now, we define different parsing chain elements:

struct SimpleParser : ChainElement {

  bool is_generator() const override { return true; }
  Continue process(Msg msg, MessageCallbacks next) override {
    if (dynamic_cast<StartMessage *>(msg.get())) {
      std::cout << "Parser reading file...\n";

      next.front(std::make_shared<TextMessage>("Hello "));
      next.front(std::make_shared<TextMessage>("DocWire "));
      next.front(std::make_shared<TextMessage>("Pipeline!"));
      next.front(std::make_shared<EndMessage>());
      return Continue::Yes;
    }
    return next.front(msg);
  }
};

struct TextFilter : ChainElement {
  bool is_generator() const override { return false; }
  Continue process(Msg msg, MessageCallbacks next) override {
    if (!dynamic_cast<TextMessage *>(msg.get()))
      return Continue::No;
    return next.front(msg);
  }
};

struct TextExporter : ChainElement {
  bool is_leaf() const override { return true; }
  Continue process(Msg msg, MessageCallbacks) override {
    if (auto t = dynamic_cast<TextMessage *>(msg.get()))
      std::cout << "Exported: " << t->text << "\n";
    return Continue::Yes;
  }
};

Note: TextExporter is a leaf node in this chain; it does not propagate the message forward. We can treat it as the last step of finishing the pipeline processing.

However, there is one piece which is missing: how do we chain the pipeline through the | operator, and more importantly, are we going to use references of elements in the processing chain, or are we going to own them?

4th Step

/**
 * A class template that either borrows a reference or owns the object
 */
template <typename T> class ref_or_owned {
  std::shared_ptr<T> owned; // set only when we own the object
  T *ref = nullptr;         // always points at the object, owned or borrowed

public:
  // borrowed: store only the address; the caller keeps the object alive
  ref_or_owned(T &t) : ref(&t) {}

  // owned: move ownership of a heap object into `owned`, and store a raw
  // pointer alias (`ref`) for fast, uniform access
  ref_or_owned(std::shared_ptr<T> t) : owned(std::move(t)), ref(owned.get()) {}

  T &get() { return *ref; }
  const T &get() const { return *ref; }
};

In C++, objects can come from different places: Example 1:

auto parser = std::make_shared<SimpleParser>();
// This means that the program is responsible for keeping it alive
// Multiple parts of the program can safely share it

Example 2:

SimpleParser parser;
// The object lives somewhere else, and we are just borrowing it.

Our pipeline should be supporting both cases; hence, we have a helper class template ref_or_owned which does not care about an object being borrowed or owned. For a borrowed object, it stores a reference, and for an owned object, it takes ownership and keeps it alive.
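To make the difference between the two cases concrete, here is a small self-contained sketch. It uses a simplified stand-in for the helper (named ref_or_owned_sketch so it does not clash with the real one); the Counter type and the two demo functions are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in for the ref_or_owned helper (illustration only).
template <typename T> class ref_or_owned_sketch {
  std::shared_ptr<T> owned; // non-null only when the holder owns the object
  T *ref = nullptr;         // always points at the object to use

public:
  ref_or_owned_sketch(T &t) : ref(&t) {}
  ref_or_owned_sketch(std::shared_ptr<T> t)
      : owned(std::move(t)), ref(owned.get()) {}

  T &get() { return *ref; }
  bool owns() const { return owned != nullptr; }
};

struct Counter {
  int hits = 0;
};

// Borrowed: the holder works directly on the caller's object.
int borrowed_demo() {
  Counter local;
  ref_or_owned_sketch<Counter> holder(local);
  assert(!holder.owns());
  holder.get().hits += 1;
  return local.hits; // the change is visible in the original object
}

// Owned: the holder keeps the object alive after the caller lets go of it.
int owned_demo() {
  auto shared = std::make_shared<Counter>();
  ref_or_owned_sketch<Counter> holder(shared); // copies the shared_ptr
  shared.reset();                              // caller drops its handle
  assert(holder.owns());
  holder.get().hits += 2; // still valid: the holder shares ownership
  return holder.get().hits;
}
```

In the borrowed case the caller must make sure the object outlives the holder; in the owned case the holder's shared_ptr takes care of that on its own.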

5th Step

Now, we define the structure for a basic parsing engine that inherits the properties of a Chain element, but its job is to couple two chain elements, which we refer to as lhs (left side element of processing chain or the element whose output will serve as input to the next) and rhs (right side element of processing chain).

For example, if we write: parser 1 | parser 2, this means that the output of parser 1 will be fed to the input of parser 2, which will then do further processing on it.

// Shared object pointer
using Element = std::shared_ptr<ChainElement>;

// Basic Parsing engine
struct ParsingChain : ChainElement {
  // should handle elements whether borrowed or owned
  ref_or_owned<ChainElement> lhs;
  ref_or_owned<ChainElement> rhs;

  // Constructors
  ParsingChain(Element a, Element b) : lhs(std::move(a)), rhs(std::move(b)) {}
  ParsingChain(ChainElement &a, ChainElement &b) : lhs(a), rhs(b) {}
  ParsingChain(ref_or_owned<ChainElement> a, ref_or_owned<ChainElement> b)
      : lhs(a), rhs(b) {}

  bool is_generator() const override { return lhs.get().is_generator(); }

  bool is_leaf() const override { return rhs.get().is_leaf(); }

  // Processes the `msg` arriving at the chain, and passes it to `lhs`.
  // When `lhs` wants to propagate the message, it redirects to `rhs`.
  Continue process(Msg msg, MessageCallbacks cb) override {
    MessageCallbacks lhs_cb{
        // front of lhs → rhs
        [&](Msg m) { return rhs.get().process(m, cb); },
        // back of lhs → back of chain
        cb.back};
    return lhs.get().process(msg, lhs_cb);
  }
};

Apart from handling Chain elements in its constructor, the above structure also facilitates the processing of elements and their chaining from one stage to another.

6th Step

Now, we need to make use of the | operator to do the chaining for our cause and execute the pipeline once it is complete.

Element operator|(Element a, Element b) {
  return std::make_shared<ParsingChain>(a, b);
}

Element operator|(ChainElement &a, ChainElement &b) {
  return std::make_shared<ParsingChain>(a, b);
}

ParsingChain operator|(ref_or_owned<ChainElement> a,
                       ref_or_owned<ChainElement> b) {
  ParsingChain chain{a, b};

  if (chain.is_generator() && chain.is_leaf()) {
    chain.process(std::make_shared<StartMessage>(),
                  MessageCallbacks{[](Msg) { return Continue::Yes; },
                                   [](Msg) { return Continue::Yes; }});
  }

  return chain;
}

The 3rd overload of the | operator basically checks if the pipeline has both a generator and the leaf nodes, and if the answer is affirmative, it automatically starts execution of the pipeline.

Conclusion

And our feature is ready, which can be tested via the following program (given here for just reference):

int main() {
  SimpleParser parser;
  TextFilter filter;
  TextExporter exporter;
  // auto chain = parser | filter | exporter;

  // chain.process(std::make_shared<StartMessage>(),
  //               [](Msg) { return Continue::Yes; });

  auto chain = std::make_shared<SimpleParser>() |
               std::make_shared<TextFilter>() |
               std::make_shared<TextExporter>();

  MessageCallbacks root{[](Msg) { return Continue::Yes; },
                        [](Msg) { return Continue::Yes; }};

  chain->process(std::make_shared<StartMessage>(), root);
}

📎 Following is the link to the actual Docwire implementation of this feature:

Reducing Compile Time Dependencies

May 6, 2026 · 21 min read

Reeshabh Choudhary

Principal Solutions Architect - DocWire

Introduction

In this blog post, we discuss Docwire's adaptation of the PIMPL idiom, which has greatly helped us reduce not only compile-time dependencies but also maintain a healthy encapsulation level over implementation details. However, to lay bare the conceptual thinking behind this adaptation, we do not want to present the actual implementation of the PIMPL idiom directly; rather, we want the reader to understand the intuition behind it. Once we have covered the necessary ground, the strange-looking code will automatically make sense. So bear with us!

Addressing Dependencies in C++

Managing dependencies well has always been part of the C++ design philosophy for ensuring solid code. One of C++’s greatest strengths is that it supports two powerful methods of abstraction, object-oriented programming and generic programming, both of which help manage dependencies and complexity (Sutter, 1999). When we discuss dependencies in code, we usually talk about run-time dependencies such as class interaction, but here our concern lies with managing compile-time dependencies.

Have a look at the minimalistic code example below:

//--------<d.h>---------------------
class D {
public:
  int num = 10;
};
//--------------<y.h>----------------
#include "d.h"
#include <memory>
class Y {
public:
  explicit Y(const D &d);
  ~Y();
  void someImpl();

private:
  D d_;
};
//-------------<y.cpp>---------------
#include "y.h"
#include <iostream>

Y::Y(const D &d) : d_(d) {}

Y::~Y() = default;

void Y::someImpl() { std::cout << d_.num << std::endl; }
//------------------------------

The file to notice here is y.h since this is the file that will be included in some main.cpp. Usually, programmers #include many more headers than necessary, which unfortunately degrades build times, especially when a popular header file includes too many other headers. Ours above is a simplistic one, yet enough to convey the message. Can we somehow remove any header from this file while still having our code compile and run successfully?

When we review the code closely, we see that a certain D appears as a private data member of our class Y as well as a parameter inside its constructor. In C++, we can easily encapsulate the private parts of a class from unauthorized access; however, it requires a bit more work to encapsulate dependencies on a class's private parts, due to the header approach borrowed from the C-Language. A genuine argument may be raised that a client code does not need to care about access to private members of a class; however, since the privates are visible in the header, the client code does have to depend upon any types they mention.

Classic PIMPL approach

To better insulate clients from a class's private implementation details, a special form of the handle/body idiom (Coplien, 1991), often called the PIMPL idiom, is used. A PIMPL is a pointer to a class that is declared but not yet defined, which will be used to hide the private members (and later implementation details) of the current class. The PIMPL idiom leverages C++'s ability to allow pointers to incomplete types and to forward declare an entity, such as a type, variable, constant, or function, for which a complete definition is yet to be provided. A forward declaration gives the compiler just enough information to validate the code that uses the name; the full definition is only needed where the object's size or members are actually used.
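The mechanism the idiom relies on, pointers to incomplete types, can be seen in isolation in the sketch below. Engine, Car, and the three free functions are hypothetical names chosen for the example; in a real project the two halves would live in a header and a source file respectively.

```cpp
#include <cassert>

// ---- what a header could expose ---------------------------------------
class Engine; // forward declaration: Engine is an incomplete type here

// Declarations may mention pointers to the incomplete type.
Engine *create_engine(int power);
int engine_power(const Engine *engine);
void destroy_engine(Engine *engine);

class Car {
public:
  explicit Car(int power) : engine_(create_engine(power)) {}
  ~Car() { destroy_engine(engine_); }
  Car(const Car &) = delete;
  Car &operator=(const Car &) = delete;

  int power() const { return engine_power(engine_); }

private:
  Engine *engine_; // a pointer to an incomplete type is allowed
};

// The size of Car is known without ever seeing Engine's definition.
static_assert(sizeof(Car) == sizeof(Engine *),
              "Car holds exactly one pointer");

// ---- what the implementation file could contain -----------------------
class Engine {
public:
  explicit Engine(int power) : power_(power) {}
  int power() const { return power_; }

private:
  int power_;
};

Engine *create_engine(int power) { return new Engine(power); }

int engine_power(const Engine *engine) {
  assert(engine != nullptr);
  return engine->power();
}

void destroy_engine(Engine *engine) { delete engine; }
```

Everything above the second separator compiles without the definition of Engine, so client code that includes only that part never depends on Engine's members.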

//--------<y.h> using PIMPL Idiom-----------------------------
#include <memory>
class D; // a forward declaration
class Y {
public:
  explicit Y(const D &d);
  ~Y();
  void someImpl();

private:
  // D d_;
  struct YImpl; // a forward declaration
  std::unique_ptr<YImpl> yPiml_; // a raw pointer can be used as well
};

See how the d.h header has been replaced with a forward declaration, since D is only mentioned as a parameter in the constructor. But, more importantly, pay attention to the forward declaration of the type YImpl, whose implementation is yet to be seen, and to the pointer yPiml_ that holds its object. Now, the private details go to our implementation file y.cpp, which is not visible to the client’s eyes.

//--------------<y.cpp>---------------------------------
#include "y.h"
#include "d.h"
#include <iostream>
#include <memory>

// YImpl implementation: it contains the private member of Y
struct Y::YImpl {
  D d_;
  YImpl(const D &d) : d_(d) {}
};

Y::Y(const D &d) : yPiml_(std::make_unique<YImpl>(d)) {}

Y::~Y() = default;

void Y::someImpl() { std::cout << yPiml_->d_.num << std::endl; }

🔍Consider the memory layout of our class Y. It depends only on its data members, whose sizes are known; in this case there is just one data member, yPiml_. Hence, on a typical 64-bit system, the size of class Y will be 8 bytes, and on a 32-bit system, its size will be 4 bytes. Notice that the implementation of the class can now be changed, i.e., private members can be freely added or removed, without recompiling client code. The binary layout of our class is now stable. We can take one step further and even add some other implementation details inside.


Now, if we modify our class and add some operation implementations as well inside our YImpl, ABI remains stable.

Cost of the PIMPL idiom

⚠️However, every design choice comes with a trade-off, and it would be naive not to consider the cost of the PIMPL idiom in our design. Now, every class Y object dynamically allocates its YImpl object on the heap, so every construction and destruction has to allocate/deallocate memory. Moreover, each access to a hidden member requires at least one extra indirection, which makes the case for a potential cache miss.

Heap allocation is an expensive operation. Not only does it involve bookkeeping such as free arena lookup and memory management, but it also affects locality. Modern CPUs do not read individual bytes from RAM; they load whole cache lines into the cache first. Stack allocations are naturally contiguous, but heap allocators scatter objects, as they serve memory on an as-available basis. Since the OS hands out memory in large chunks rather than per allocation, malloc() internally subdivides those large regions, keeps track of used/free blocks, and tries to reuse memory whenever possible, so consecutive allocations are not guaranteed to be adjacent.

In our case, since we hold a pointer to an arbitrary heap address, the Y object and its YImpl usually sit in different cache lines, and a cache miss becomes more likely. Moreover, the hardware prefetcher relies on predictable access patterns; heap addresses are effectively arbitrary, so prefetching does not help here either.

Say, our modified class now looks like:

//------------<y.h>-----------------------------------
#include <memory>
class D; // a forward declaration
class Y {
public:
  explicit Y(const D &d);
  virtual ~Y(); // virtual, since Y is now meant to be derived from
  void someImpl();
  virtual void notify();
  void run();

private:
  struct YImpl; // a forward declaration
  std::unique_ptr<YImpl> yPiml_; // a raw pointer can be used as well
};

//------------<y.cpp>------------------------------------------
#include "y.h"
#include "d.h"
#include <iostream>
#include <memory>

struct Y::YImpl {
  D d_;

  YImpl(const D &d) : d_(d) {}

  void do_run(Y *owner) {
    owner->notify(); // delegation of policy back to interface
  }
};

Y::Y(const D &d) : yPiml_(std::make_unique<YImpl>(d)) {}

Y::~Y() = default;

void Y::someImpl() { std::cout << yPiml_->d_.num << std::endl; }

void Y::run() {
  yPiml_->do_run(this); // YImpl must remain unaware of derived types
}

void Y::notify() {
  // custom implementation
}

👀Look closely, and you will find that our class Y has now been made a polymorphic base class. Earlier, the compiler was aware of its definite behavior, but now it acts as an extensible interface. Now, if there is a derived class from our class Y, the compiler does not know whether the actual object is of type Y or its derived type. The call to the virtual function will be decided at run time.

But more importantly, the implementation YImpl does not depend on derived interface types. It is delegating the policy back to the interface Y via owner->notify(). It is still loosely coupled and can handle private implementation details with ease. So, how does the memory layout look for class Y now, and what happens when run is executed?


📌Note: At compile time, when the compiler sees a class with a virtual function, it generates a vtable for that class and typically places it in the read-only data section. At runtime, there is only one vtable per class, and every object of that class holds a pointer to it.
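The per-object cost of that hidden vtable pointer can be observed with sizeof. The sketch below compares two hypothetical handle classes that differ only in having virtual functions; the names Impl, PlainHandle, PolymorphicHandle, and vptr_overhead are assumptions for the example, and the exact sizes are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Impl {
  int value = 42;
};

// Non-polymorphic handle: the only data member is the pointer to Impl.
class PlainHandle {
public:
  PlainHandle() : impl_(std::make_unique<Impl>()) {}
  int value() const { return impl_->value; }

private:
  std::unique_ptr<Impl> impl_;
};

// Polymorphic handle: same data member, plus virtual functions. The
// compiler adds a hidden pointer to the class's vtable to every object.
class PolymorphicHandle {
public:
  PolymorphicHandle() : impl_(std::make_unique<Impl>()) {}
  virtual ~PolymorphicHandle() = default;
  virtual int value() const { return impl_->value; }

private:
  std::unique_ptr<Impl> impl_;
};

// Extra bytes per object attributable to the hidden vtable pointer.
std::size_t vptr_overhead() {
  assert(sizeof(PolymorphicHandle) >= sizeof(PlainHandle));
  return sizeof(PolymorphicHandle) - sizeof(PlainHandle);
}
```

On mainstream 64-bit ABIs vptr_overhead() is typically 8, i.e. one extra pointer per object. A virtual call through such a handle first loads the vtable pointer, then the function address, and only then reaches the heap-allocated Impl.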

🔴The problem is evident: we are now chasing pointers🏃‍➡️. Rather than having contiguous memory and predictable access, we are now hopping between the stack and the heap and potentially missing cache lines.

Now we have two major problems at hand. The first, now obvious, is that we have to deal with dynamic polymorphism; the second is that our PIMPL adaptation is not generic. In a large framework with a lot of classes, imagine having to declare a pimpl pointer in every class along with a forward declaration of the implementation entity. So how do we make this interface generic, so that we don’t have to repeat the same code again and again?

Thankfully, C++ has a simple solution that helps us escape such problems: Templates (with some tweaks!).

One Step at a time!

Escaping Dynamic Polymorphism via CRTP

Polymorphism, derived from the Greek word polymorphos, is the ability to associate different specific behaviors with a single generic notation. In everyday usage, however, the term almost always means behavior selected at run time, also known as dynamic polymorphism. The reason is historical: C++ started out supporting polymorphism only through inheritance combined with virtual functions. Templates also allow us to associate different specific behaviors with a single generic notation, but this association is generally handled at compile time, which we refer to as static polymorphism (Vandevoorde et al., 2017).

The template approach does not rely on factoring common behavior into base classes. Instead, the behavior for each concrete type is bound at compile time. Since no indirection through pointers is needed a priori and nonvirtual functions can be inlined much more often, the generated code is potentially faster, although the executable may grow larger. This approach is also more type-safe, since all the bindings are checked at compile time.

In simpler terms, rather than selecting one behavior out of several possible derived behaviors at run time, the compiler generates the code for each derived class's behavior at compile time. This requires that the base class be able to access the derived class, which is made possible by adopting the Curiously Recurring Template Pattern (CRTP), an idiom whose name was coined by James Coplien.

By passing the derived class down to its base class via a template parameter, the base class can customize its own behavior to the derived class without requiring the use of virtual functions. This makes CRTP useful to factor out implementations that can only be member functions (e.g., constructors, destructors, and subscript operators) or are dependent on the derived class’s identity.
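Before applying the idiom to our PIMPL class, a stripped-down sketch of CRTP on its own may help. The names Greeter, English, and Klingon are hypothetical and unrelated to the article's classes. The base class casts itself to the derived type it received as a template argument, so the call to name() is bound at compile time:

```cpp
#include <string>
#include <type_traits>

// CRTP base: it knows its derived type at compile time, so no virtual call is needed.
template <typename Derived>
class Greeter {
public:
    std::string greet() const {
        // Downcast to the derived type; valid because Derived inherits from Greeter<Derived>.
        const auto& self = static_cast<const Derived&>(*this);
        return "hello from " + self.name();
    }
};

class English : public Greeter<English> {
public:
    std::string name() const { return "English"; }
};

class Klingon : public Greeter<Klingon> {
public:
    std::string name() const { return "Klingon"; }
};

// No virtual functions anywhere, so neither class carries a vtable pointer.
static_assert(!std::is_polymorphic_v<English>);
static_assert(!std::is_polymorphic_v<Klingon>);
```

Each derived class gets its own instantiation of greet(), which is where the potential growth in code size comes from.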

Below is one possible adaptation of the CRTP idiom that lets our solution resolve polymorphic behavior at compile time:

//--------------<y.h>---------------
#pragma once
#include <concepts>
#include <memory>
class D;

template <typename Derived>
class Y {
public:
  explicit Y(const D &d);
  ~Y();

  void someImpl();
  void run();

protected:
  void notify(); // forwarded to Derived

private:
  struct YImpl;
  std::unique_ptr<YImpl> yPimpl_;
};

//-----------<y.cpp>----------------
#include "y.h"
#include "d.h"
#include <iostream>

template <typename Derived>
struct Y<Derived>::YImpl {
  D d_;

  YImpl(const D &d) : d_(d) {}

  void do_run(Y *owner) {
    owner->notify(); // calls CRTP notify
  }
};

template <typename Derived>
Y<Derived>::Y(const D &d) : yPimpl_(std::make_unique<YImpl>(d)) {}

template <typename Derived>
Y<Derived>::~Y() = default;

template <typename Derived>
void Y<Derived>::run() {
  yPimpl_->do_run(static_cast<Derived *>(this));
}

template <typename Derived>
void Y<Derived>::someImpl() {
  std::cout << yPimpl_->d_.num << std::endl;
}

template <typename Derived>
void Y<Derived>::notify() {
  // forward to real derived class
  static_cast<Derived *>(this)->notifyImpl();
}

//---------<derived.h>------------
#pragma once
#include "y.h"
#include <iostream>

class DerivedCL: public Y<DerivedCL> {
public:
    using Y<DerivedCL>::Y;

    void notifyImpl()
    {
        std::cout << "Derived notify\n";
    }
};

🔔The intuition behind the code changes above is to drop virtual functions, and with them dynamic polymorphism, while preserving the benefits of the PIMPL idiom. However, there is one subtle but important caveat here:

“Template code is generated only when a template is instantiated, and instantiation happens where the full definition is visible. Templates are compiled on demand, not beforehand.”

C++ compiles each .cpp file independently, and linking happens at a later stage. Suppose a main.cpp makes use of our existing structure: it instantiates a class derived from Y, passes the necessary parameters, and calls the required member functions. When main.cpp is compiled, the compiler sees only what y.h declares:

template <typename Derived>
class Y {
  // member declarations only; no member definitions are visible
};

The member definitions live in y.cpp, so the compiler cannot generate code for Y<DerivedCL> while compiling main.cpp. And y.cpp never instantiates Y<DerivedCL> itself, so the linker finds no definitions either and reports undefined symbols. We then have two options: either move the definitions into the header or explicitly instantiate the template for every type we need (least favorable). This is the core reason why the templates of the C++ standard library are implemented in headers.
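For completeness, this is roughly what the explicit-instantiation option looks like. It is a sketch with a hypothetical Box template, squeezed into a single listing; the banner comments mark the files each part would live in. The catch is that every type must be listed up front, which for a CRTP base means y.cpp would have to know every derived class:

```cpp
// ---- box.h (hypothetical): declarations only ----
template <typename T>
class Box {
public:
    explicit Box(T value);
    T get() const;

private:
    T value_;
};

// ---- box.cpp (hypothetical): definitions hidden from other translation units ----
template <typename T>
Box<T>::Box(T value) : value_(value) {}

template <typename T>
T Box<T>::get() const { return value_; }

// Explicit instantiation definitions: generate code for exactly these types here,
// so that other translation units can link against them.
template class Box<int>;
template class Box<double>;

// ---- main.cpp (hypothetical) ----
// Box<int> and Box<double> would link fine. Box<char> would compile against the
// header but fail at link time, because no definition was ever generated for it.
```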

🔔 There is another problem with the code above: what if the derived class fails to implement the required function definition? How do we ensure that behavior exists to satisfy our structure?

Here is the more refined version:

//---------<y.h>-----------------------------
#pragma once

#include <memory>
#include <iostream>

class D; //forward declaration

template <typename Derived>
class Y {
public:
    explicit Y(const D& d);
    ~Y();

    void run();
    void someImpl();

protected:
    // Non-Virtual Interface (CRTP forwarding)
    void notify();

private:
    struct YImpl;
    std::unique_ptr<YImpl> yPimpl_;
};


/****Implementation*******/

#include "d.h"

template <typename Derived>
struct Y<Derived>::YImpl {

    D d_;

    explicit YImpl(const D& d)
        : d_(d) {}

    void do_run(Y* owner)
    {
        owner->notify();
    }
};

template <typename Derived>
Y<Derived>::Y(const D& d)
    : yPimpl_(std::make_unique<YImpl>(d))
{}

template <typename Derived>
Y<Derived>::~Y() = default;


template <typename Derived>
void Y<Derived>::run()
{
    yPimpl_->do_run(this);
}

template <typename Derived>
void Y<Derived>::someImpl()
{
    std::cout << yPimpl_->d_.num << std::endl;
}

template <typename Derived>
void Y<Derived>::notify()
{
    // Compile-time contract check
    static_assert(
        requires(Derived d) { d.notifyImpl(); },
        "Derived must implement: void notifyImpl();"
    );

    // Static polymorphic dispatch
    static_cast<Derived*>(this)->notifyImpl();
}
//----------<derived.h>----------
#pragma once
#include "y.h"
#include <iostream>

class DerivedCL : public Y<DerivedCL> {
public:
    using Y<DerivedCL>::Y;

    void notifyImpl()
    {
        std::cout << "Derived notify\n";
    }
};

In the code presented above, the template implementation has been moved into the header, and more importantly, the CRTP contract is enforced by a static_assert inside notify(), where Derived is already a complete type, so no circular constraint arises. Everything is resolved at compile time, without any need for a vtable pointer. However, a problem still lurks: the owner must be passed manually to every call, as in do_run(Y* owner). A good idea would be to make YImpl owner-aware.

//-------updated <y.h>--- YImpl is owner aware -----------
#pragma once

#include <iostream>
#include <memory>

class D; // forward declaration

template <typename Derived> class Y {
public:
  explicit Y(const D &d);
  ~Y();

  void run();
  void someImpl();

protected:
  // Non-Virtual Interface (CRTP forwarding)
  void notify();

private:
  struct YImpl;
  std::unique_ptr<YImpl> yPimpl_;
};

/****Implementation*******/

#include "d.h"

template <typename Derived> struct Y<Derived>::YImpl {
  Derived &owner_;
  D d_;

  YImpl(Derived &owner, const D &d) : owner_(owner), d_(d) {}

  void do_run() {
    // no owner parameter needed anymore
    owner_.notifyImpl();
  }
};

template <typename Derived>
Y<Derived>::Y(const D &d)
    : yPimpl_(std::make_unique<YImpl>(static_cast<Derived &>(*this), d)) {}

template <typename Derived>
Y<Derived>::~Y() = default;

template <typename Derived>
void Y<Derived>::run() { yPimpl_->do_run(); }

template <typename Derived>
void Y<Derived>::someImpl() {
  std::cout << yPimpl_->d_.num << std::endl;
}

template <typename Derived>
void Y<Derived>::notify() {
  // Compile-time contract check
  static_assert(
      requires(Derived d) { d.notifyImpl(); },
      "Derived must implement: void notifyImpl();");

  // Static polymorphic dispatch
  static_cast<Derived *>(this)->notifyImpl();
}

However, in making YImpl owner-aware, we have introduced a problem. The implementation now stores Derived& owner_;, which becomes a pitfall once move semantics come into the picture. What happens when a piece of code tries to move the owner itself?

DerivedCL a(d);
DerivedCL b = std::move(a);

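After the move (which requires giving Y a move constructor that transfers yPimpl_, since the user-declared destructor suppresses the implicit one), the implementation now owned by b still refers to a, the moved-from object, and that reference dangles as soon as a is destroyed. Hence, a guarantee needs to be provided that once the owner is moved, the owner's reference is updated inside the implementation.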

The constraint we currently have is that our PIMPL implementation, YImpl, remains unaware when a move happens, because the event occurs outside its scope. Hence, the owner must notify it whenever a move takes place. There is one more subtlety to be aware of: we are storing the owner’s reference as Derived& owner_;, and a C++ reference cannot be rebound after initialization. To rebind, we store a std::reference_wrapper<T> instead, which can be reassigned to refer to another object.
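The difference between a plain reference and std::reference_wrapper can be seen in isolation. In this sketch, Owner, Impl, and plain_reference_assign are hypothetical names used only for illustration:

```cpp
#include <functional>

struct Owner {
    int id;
};

// Hypothetical implementation object that remembers who owns it.
class Impl {
public:
    explicit Impl(Owner& owner) : owner_(owner) {}

    // Assigning to a reference_wrapper rebinds it; the previous Owner is left untouched.
    void set_owner(Owner& new_owner) { owner_ = std::ref(new_owner); }

    int owner_id() const { return owner_.get().id; }

private:
    std::reference_wrapper<Owner> owner_;  // rebindable, and never null
};

// With a plain reference, assignment writes through to the referent instead.
inline int plain_reference_assign() {
    int x = 1;
    int y = 2;
    int& r = x;  // r is bound to x for its whole lifetime
    r = y;       // copies the value of y into x; r still refers to x
    return x;    // x is now 2
}
```

This is the property the next listing relies on: set_owner can point the implementation at the new owner without recreating the implementation object.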

//---Updated <y.h> with owner's reference update --------
#pragma once

#include <functional> // std::reference_wrapper
#include <iostream>
#include <memory>
#include <utility> // std::move

class D;

struct impl_base {
  virtual ~impl_base() = default;

  virtual void set_owner(void *) {}
};

template <typename T> class impl_owner : public impl_base {
protected:
  impl_owner(T &owner) : owner_(owner) {}

  T &owner() { return owner_.get(); }
  const T &owner() const { return owner_.get(); }

  void set_owner(void *new_owner) override {
    owner_ = *static_cast<T *>(new_owner);
  }

private:
  std::reference_wrapper<T> owner_;
};

template <typename Derived> class Y {
public:
  explicit Y(const D &d);
  ~Y() = default;

  Y(Y &&other) noexcept;
  Y &operator=(Y &&other) noexcept;

  void run();
  void someImpl();

private:
  struct YImpl;
  std::unique_ptr<impl_base> impl_;

  void rebind_owner();
};

#include "d.h"

template <typename Derived> struct Y<Derived>::YImpl : impl_owner<Derived> {
  D d_;

  YImpl(Derived &owner, const D &d) : impl_owner<Derived>(owner), d_(d) {}

  void do_run() { this->owner().notifyImpl(); }
};

template <typename Derived> Y<Derived>::Y(const D &d) {
  impl_ = std::make_unique<YImpl>(static_cast<Derived &>(*this), d);
}

template <typename Derived>
Y<Derived>::Y(Y &&other) noexcept : impl_(std::move(other.impl_)) {
  rebind_owner();
}

template <typename Derived>
Y<Derived> &Y<Derived>::operator=(Y &&other) noexcept {
  impl_ = std::move(other.impl_);
  rebind_owner();
  return *this;
}

template <typename Derived> void Y<Derived>::rebind_owner() {
  if (impl_)
    impl_->set_owner(static_cast<Derived *>(this));
}

template <typename Derived> void Y<Derived>::run() {
  static_cast<YImpl *>(impl_.get())->do_run();
}

template <typename Derived> void Y<Derived>::someImpl() {
  auto *impl = static_cast<YImpl *>(impl_.get());

  std::cout << impl->d_.num << std::endl;
}

In the updated code above, we have introduced a communication channel to allow the owner to communicate lifecycle changes to the YImpl. The owner now no longer depends on the implementation layout. Rather than holding std::unique_ptr<YImpl>, the owner now holds std::unique_ptr<impl_base>.

An argument can be made that virtual dispatch is back in our code. This time, however, we are not virtualizing behavior; we are only passing a notification at run time, and that happens rarely, since it is needed only when the owner is moved.

🎯Our final goal should be a reusable infrastructure where any class T automatically gets a PIMPL, and optionally allows the implementation to call back into its owner safely (even after move semantics).

👉Hence, the final leap, and we present you the Docwire adaptation of the PIMPL idiom.

The Final Leap: Docwire’s PIMPL adaptation

What we have developed so far is not a feature of a specific class, but an improved capability of a class. And this capability should be enabled for all other classes in the framework. ©️Following is the actual code in the Docwire framework for the PIMPL adaptation:

#ifndef DOCWIRE_PIMPL_H
#define DOCWIRE_PIMPL_H

#include <memory>

namespace docwire
{

template <typename T>
struct pimpl_impl;

class with_pimpl_base {};

struct pimpl_impl_base
{
	virtual ~pimpl_impl_base() = default;
	virtual void set_owner(with_pimpl_base&)
	{
	}
};

template <typename T>
class with_pimpl_owner;

template <typename T>
class with_pimpl : public with_pimpl_base
{
protected:
	using impl_type = pimpl_impl<T>;

	template <typename... Args>
	impl_type* create_impl(Args&&... args)
	{
		if constexpr (std::is_base_of_v<with_pimpl_owner<T>, impl_type>)
		{
			static_assert(std::is_constructible_v<impl_type, T&, Args...>,
				"Template specialization of pimpl_impl<T> that inherits from with_pimpl_owner<T> is required to have constructor with T&, Args... arguments");
			return new impl_type(static_cast<T&>(*this), std::forward<Args>(args)...);
		}
		else
		{
			static_assert(std::is_constructible_v<impl_type, Args...>,
				"Template specialization of pimpl_impl<T> is required to have a constructor with Args... arguments");
			return new impl_type(std::forward<Args>(args)...);
		}
	}

	template <typename... Args>
	explicit with_pimpl(Args&&... args)
		: m_impl(static_cast<pimpl_impl_base*>(create_impl(std::forward<Args>(args)...)))
	{
	}

	with_pimpl(with_pimpl<T>&& other) noexcept
		: m_impl(std::move(other.m_impl))
	{
		if (m_impl)
			set_impl_owner();
	}

	with_pimpl(std::nullptr_t) {}

	with_pimpl& operator=(with_pimpl&& other) noexcept {
		if (this != &other)
		{
			m_impl = std::move(other.m_impl);
			if (m_impl)
				set_impl_owner();
		}
		return *this;
	}

	template <typename DeferInstantiation = void>
	impl_type& impl() { return *static_cast<impl_type*>(m_impl.get()); }

	template <typename DeferInstantiation = void>
	const impl_type& impl() const { return *static_cast<impl_type*>(m_impl.get()); }

private:
	std::unique_ptr<pimpl_impl_base> m_impl;

	void set_impl_owner()
	{
		m_impl->set_owner(*this);
	}
};

template <typename T>
class with_pimpl_owner : public pimpl_impl_base
{
protected:
	with_pimpl_owner(T& owner) : m_owner(owner) {}
	T& owner() { return m_owner; }
	const T& owner() const { return m_owner; }

	void set_owner(with_pimpl_base& owner) override
	{
		m_owner = static_cast<T&>(static_cast<with_pimpl<T>&>(owner));
	}

private:
	std::reference_wrapper<T> m_owner;
	friend with_pimpl<T>;
};

} // namespace docwire

#endif

We start with the intent of making PIMPL reusable and introduce two class templates:

template<class T>
class with_pimpl;
---------------------------
template<typename T>
struct pimpl_impl;

We move the PIMPL implementation outside the class: the owner type T now selects its implementation type pimpl_impl<T>, so the binding happens automatically. The class with_pimpl implements a generic, reusable PIMPL framework that aims to centralize ownership while maintaining the owner reference, enforce correct construction rules, support move semantics safely, and hide the implementation completely from headers. With this approach, the following is achievable:

class parser : public with_pimpl<parser> {};

The derived class gains a full PIMPL system. The class with_pimpl comes with a construction engine, create_impl, which dynamically creates the implementation object and supports cases where the implementation needs an owner reference to call a public API.

using impl_type = pimpl_impl<T>;

This alias resolves the concrete implementation type, which is what create_impl returns. However, when we store the implementation object inside with_pimpl, we store it via a base-class pointer, std::unique_ptr<pimpl_impl_base> m_impl;, rather than as a std::unique_ptr<pimpl_impl<T>>. In the latter case, every translation unit that includes the header would need the complete definition of pimpl_impl<T>, which defeats the purpose of the entire exercise: any change in the implementation would force all of that code to recompile. As a workaround, we define:

struct pimpl_impl_base
{
	virtual ~pimpl_impl_base() = default;
	virtual void set_owner(with_pimpl_base&)
	{
	}
};

And when we provide the explicit specialization for a concrete class T in a .cpp file elsewhere, we write:

template<>
struct pimpl_impl<T> : pimpl_impl_base
{
...
};

As a result, an IS-A relationship is created between pimpl_impl<T> and pimpl_impl_base, so a pimpl_impl<T>* converts implicitly to a pimpl_impl_base* (a standard derived-to-base upcast). This also paves the way to store different concrete types behind one uniform interface.
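To see how a client class plugs into this scheme, here is a deliberately simplified sketch. It is not the Docwire header: owner rebinding and the static_assert diagnostics are left out, and word_counter is a hypothetical client class. The banner comments mark where each part would live in a real project:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Simplified stand-ins for the framework pieces (no owner rebinding here).
struct pimpl_impl_base {
    virtual ~pimpl_impl_base() = default;
};

template <typename T>
struct pimpl_impl;  // primary template: declared only, specialized per class

template <typename T>
class with_pimpl {
protected:
    template <typename... Args>
    explicit with_pimpl(Args&&... args)
        : m_impl(std::make_unique<pimpl_impl<T>>(std::forward<Args>(args)...)) {}

    // Member template, so the cast is checked only where impl() is called.
    template <typename Defer = void>
    pimpl_impl<T>& impl() {
        assert(m_impl && "impl() called on an empty object");
        return *static_cast<pimpl_impl<T>*>(m_impl.get());
    }

private:
    std::unique_ptr<pimpl_impl_base> m_impl;  // uniform storage for every T
};

// ---- word_counter.h (hypothetical): no implementation details visible ----
class word_counter : public with_pimpl<word_counter> {
public:
    explicit word_counter(std::string text);
    int count();
};

// ---- word_counter.cpp (hypothetical): the specialization lives here ----
template <>
struct pimpl_impl<word_counter> : pimpl_impl_base {
    explicit pimpl_impl(std::string t) : text(std::move(t)) {}
    std::string text;
};

word_counter::word_counter(std::string text)
    : with_pimpl<word_counter>(std::move(text)) {}

int word_counter::count() {
    int words = 0;
    bool in_word = false;
    for (char c : impl().text) {  // counts runs of non-space characters
        if (c == ' ') {
            in_word = false;
        } else if (!in_word) {
            in_word = true;
            ++words;
        }
    }
    return words;
}
```

Changing the members of pimpl_impl<word_counter> touches only the .cpp part; code that includes just the header part does not need to be recompiled.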

🕵️There is one strange-looking piece of code, though: template <typename DeferInstantiation = void> impl_type& impl();

A C++ compiler instantiates a template wherever it needs the generated entity, and in some situations that happens earlier than we might expect, before every type the code depends on has been defined.

The definitions of entities generated by a template are not limited to a single location in the source code. The location of the template, the location where the template is used, and the locations where the template arguments are defined all play a role in the meaning of the entity. When a C++ compiler encounters the use of a template specialization, it will create that specialization by substituting the required arguments for the template parameters. This implies that the compiler often needs access to the full definition of the template and some of its members at the point of use (Vandevoorde et al., 2017).

👀Look at the code above closely, especially the following segment:

template <typename T>
class with_pimpl
{
protected:
using impl_type = pimpl_impl<T>;

//---rest of the code
};

Here, pimpl_impl is only forward declared. The real implementation lives elsewhere. When the compiler is compiling the header, it will come across the following code segment:

impl_type& impl() { return *static_cast<impl_type*>(m_impl.get()); }

At this juncture, the compiler must verify that the cast and the dereference are well-formed, and for that it needs semantic information about pimpl_impl<T>: a static_cast from pimpl_impl_base* to pimpl_impl<T>* is valid only if pimpl_impl<T> is a complete class derived from pimpl_impl_base. 🔑In the header, impl_type may still be incomplete, since the definition of pimpl_impl<T> is supplied later in a .cpp file. If the body of impl() is instantiated at such a point, the cast is ill-formed and compilation breaks.

By default, a member function of a class template is instantiated only when it is used. An explicit instantiation of the class, however, instantiates every non-template member at once, whether it is called or not, and some toolchains behave similarly when a class is exported from a shared library. This is where ‘Deferred Instantiation’ comes to help. We convert the normal member function into a member function template, and in C++, function templates are instantiated ONLY when used.
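The effect can be reproduced in a few lines. This is a sketch with hypothetical names (impl_base, impl_of, holder, widget) rather than the Docwire types. The explicit instantiation forces every non-template member of holder<widget> into existence while impl_of<widget> is still incomplete, yet the listing compiles because impl() is a member template:

```cpp
struct impl_base {
    virtual ~impl_base() = default;
};

template <typename T>
struct impl_of;  // declared only; defined later, per T

template <typename T>
class holder {
public:
    explicit holder(impl_base* p) : p_(p) {}

    // A plain member function with this body would be instantiated by the
    // explicit instantiation below, while impl_of<T> is still incomplete,
    // and the base-to-derived static_cast would be rejected.
    // As a member template, the body is instantiated only where it is called.
    template <typename Defer = void>
    impl_of<T>& impl() { return *static_cast<impl_of<T>*>(p_); }

private:
    impl_base* p_;  // non-owning, to keep the sketch short
};

struct widget;                  // incomplete type, used only as a tag
template class holder<widget>;  // instantiates every non-template member now

// Only from here on is the implementation type complete.
template <>
struct impl_of<widget> : impl_base {
    int value = 7;
};

inline int read_value() {
    impl_of<widget> storage;
    holder<widget> h(&storage);
    return h.impl().value;  // impl<void>() is instantiated here, where the cast is valid
}
```

Removing the template <typename Defer = void> line turns impl() into an ordinary member, and the explicit instantiation then fails to compile, which is the situation the default template parameter guards against.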

👉In short, the PIMPL idiom is a tradeoff between compile-time scalability and memory locality. Since Docwire is an SDK, we decided to lean towards scalability. Had it been a performance-critical application, memory locality would have been preferred. Having said that, this does not mean we do not care about performance! 😈

🔗Docwire Code Repo Link

References

Coplien, J. O. (1991). Advanced C++ Programming Styles and Idioms. Addison-Wesley.

Sutter, H. (1999). Exceptional C++. Addison-Wesley.

Vandevoorde, D., Gregor, D., & Josuttis, N. M. (2017). C++ Templates: The Complete Guide (2nd ed.). Addison-Wesley.
