On Script Upgrades & OP_CAT

Written by: jesse (Asula)

Thanks to reardencode, tyler, red, jim, rajiv and the Asula team for their thoughtful feedback and comments.

There is increasing momentum to improve Bitcoin’s native programming language, Script. Changes to Script are essential for Bitcoin to support more robust and secure applications. However, any upgrade to Script must be deployed as a soft fork that nodes across the network adopt. Soft forks require a great deal of coordination and support across the Bitcoin community, so even the most minor change to Script must survive the meandering path of a soft fork before it reaches the network.

This leads us to ask: what changes to Script warrant such excitement from the community? There are indeed a number of Script improvement proposals, but a proposal to (re)add OP_CAT, the operation for concatenation, has garnered the most attention as of late. What is it about concatenation that makes it an attractive candidate for a soft fork?

In this article, we answer each of these questions. We’ll take a fundamental look at Script, the design of Script upgrades and the proposition of OP_CAT.

Bitcoin Transactions & Script

Let’s first get an overview of the role Script plays in Bitcoin. We assume the reader has a high-level understanding of how Bitcoin transactions work (and if you do, feel free to skip this part), but let’s briefly revisit some basics:

Bitcoin transactions are built from two parts: inputs and outputs. Ownership of bitcoin derives from the ability to spend an unspent transaction output, or UTXO. UTXOs are chained together, such that an output of a previous transaction becomes the input of a later transaction.
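The chaining of outputs into inputs can be sketched with a toy data model. This is a minimal illustration, not Bitcoin's actual serialization: the class names, the string transaction ids, and the `apply_tx` helper are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutPoint:
    txid: str   # id of the transaction that created the output (toy string id)
    index: int  # position of the output within that transaction

@dataclass(frozen=True)
class TxOut:
    value: int           # amount in satoshis
    locking_script: str  # placeholder for the Script encumbering the coins

@dataclass(frozen=True)
class Tx:
    txid: str
    inputs: tuple    # OutPoints being spent
    outputs: tuple   # newly created TxOuts

def apply_tx(utxo_set: dict, tx: Tx) -> dict:
    """Return a new UTXO set with tx's inputs removed and its outputs added."""
    new_set = dict(utxo_set)
    for outpoint in tx.inputs:
        if outpoint not in new_set:
            raise ValueError(f"{outpoint} is not an unspent output")
        del new_set[outpoint]
    for i, out in enumerate(tx.outputs):
        new_set[OutPoint(tx.txid, i)] = out
    return new_set

# A coin created by "tx0" is spent by "tx1", which creates a new UTXO.
utxos = {OutPoint("tx0", 0): TxOut(50_000, "locked to Alice")}
tx1 = Tx("tx1", (OutPoint("tx0", 0),), (TxOut(49_000, "locked to Bob"),))
utxos = apply_tx(utxos, tx1)
```

Once `tx1` is applied, the output of `tx0` is no longer spendable; only the new output locked to Bob remains in the set.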

Scripts are used to define and fulfill bitcoin locking conditions. In order to spend from a previous output, the spender provides some data that first unlocks the Script encumbering the bitcoins, and then re-locks the coins to a new Script that requires new spending conditions:

alt_text

This explains why developers focus on improving Script to improve Bitcoin application support. The expressivity of a Bitcoin transaction depends on the spending criteria that can be verified by Script (or in the example above, the ability to refer to the pink key in a Script).

The Limitations of Script

We will need to go one level deeper here to grasp how Script upgrades enter the picture.

Script is a Forth-like, stack-based language composed of opcodes and elements of data. The transaction logic of locking and unlocking Scripts is expressed by reading and manipulating data with the opcodes using a stack (or stacks, for more complex Scripts).

Scripts verify the result of opcodes based on whether the operations on the stack data evaluate to true or false. We can see an overview of this system:

alt_text
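To make the stack model concrete, here is a minimal sketch of a stack evaluator. It supports only three opcodes, treats data as Python integers, and simply runs the unlocking and locking parts back to back; real Bitcoin Script operates on byte strings and evaluates the two parts separately over a shared stack, so the `run_script` function is a simplification assumed for illustration.

```python
def run_script(script):
    """Evaluate a list of tokens; ints are data pushes, strings are opcodes."""
    stack = []
    for token in script:
        if isinstance(token, int):
            stack.append(token)                 # data push
        elif token == "OP_DUP":
            stack.append(stack[-1])             # duplicate the top element
        elif token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(1 if a == b else 0)
        else:
            raise ValueError(f"unsupported opcode: {token}")
    # Success: the stack is non-empty and its top element is non-zero.
    return bool(stack) and stack[-1] != 0

unlocking = [2, 3]                        # data supplied by the spender
locking = ["OP_ADD", 5, "OP_EQUAL"]       # condition set by the previous output
result = run_script(unlocking + locking)
```

Here the locking part demands two numbers that sum to 5, and the unlocking part supplies them, so evaluation ends with a true value on top of the stack.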

Script itself is designed to prioritize security and simplicity over flexibility to ensure that Bitcoin transactions are reliable and secure. As such there are deliberate constraints to this system.

For example, in order to conserve computational resources required by nodes, a few specific limits are enforced:

the stack for a specific Script can hold at most 1,000 elements,
the total Script size is capped at 10,000 bytes (though this does not apply to Taproot transactions), and
individual input stack elements are limited to 520 bytes.
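The three limits above can be expressed as simple checks. This is a sketch only: the constant names and the `check_limits` helper are made up for this example and do not mirror any node implementation.

```python
MAX_STACK_ITEMS = 1000      # stack elements per Script execution
MAX_SCRIPT_SIZE = 10_000    # bytes; not applied to Taproot script spends
MAX_ELEMENT_SIZE = 520      # bytes per stack element

def check_limits(script: bytes, stack: list, taproot: bool = False) -> list:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if not taproot and len(script) > MAX_SCRIPT_SIZE:
        problems.append("script too large")
    if len(stack) > MAX_STACK_ITEMS:
        problems.append("too many stack elements")
    if any(len(item) > MAX_ELEMENT_SIZE for item in stack):
        problems.append("stack element too large")
    return problems
```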

There are also constraints on what data can be accessed, and how individual opcodes can access or manipulate this data. Of course, all of these are implicitly connected, but it helps to refine our overview of constraints when considering how Script might be improved:

alt_text

Note: The 10,000-byte limit doesn’t apply to Taproot. The 520-byte limit only applies to inputs, since opcodes cannot produce anything larger than 32 bytes.

Specifically, the limits on how Script can verify data for a spending transaction are illustrated by the following three opcodes:

CHECKLOCKTIMEVERIFY: A spending transaction carries a lock time field (nLocktime). The opcode makes a spending condition available only once nLocktime has reached a value predefined in the Script.
CHECKSEQUENCEVERIFY: A transaction input carries a sequence number field (nSequence) that can encode a relative lock time. The opcode checks this field against a value in the Script, which is useful for relative timelocks.
CHECKSIG (with CHECKSIGVERIFY and the variants CHECKSIGADD, CHECKMULTISIG and CHECKMULTISIGVERIFY): Verifies that the signature provided by a spend matches the public key specified in the previous transaction output. This is crucial for defining the owner of a UTXO.
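As an illustration of how narrow this kind of access is, the comparison performed by CHECKLOCKTIMEVERIFY can be sketched as a handful of checks against two fields of the spending transaction. The function and parameter names are assumptions for this example; the rules follow the opcode's specification in BIP65 as best understood here, and the sketch omits stack handling and number encoding.

```python
LOCKTIME_THRESHOLD = 500_000_000  # below: block height; at or above: Unix time
SEQUENCE_FINAL = 0xFFFFFFFF

def cltv_passes(script_locktime: int, tx_locktime: int, input_sequence: int) -> bool:
    """Return True if a CHECKLOCKTIMEVERIFY-style check would succeed."""
    if script_locktime < 0:
        return False
    # Both values must be of the same kind (block height vs. timestamp).
    if (script_locktime < LOCKTIME_THRESHOLD) != (tx_locktime < LOCKTIME_THRESHOLD):
        return False
    # The transaction's nLocktime must have reached the value in the Script.
    if script_locktime > tx_locktime:
        return False
    # A final sequence number would disable nLocktime for this input.
    if input_sequence == SEQUENCE_FINAL:
        return False
    return True
```

The Script learns only whether the check passed; it never sees the rest of the transaction.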

Scripts are primarily limited by how CHECKSIG is implemented. The CHECKSIG operation takes new transaction data, hashes it, and then verifies a signature provided in the unlocking Script against the public key from the previous locking Script. However, the Script cannot directly access or inspect the data that went into creating the hash, or the hash itself. We’ll update our basic transaction from above to show CHECKSIG:

[Figure: An oversimplified view of a pay-to-pubkey-hash (P2PKH) transaction]

CHECKSIG can verify the overall integrity of the transaction by confirming that the signature matches the public key; however, it does not expose specific transaction details to the Script. This is the crux of Bitcoin’s limited programmability: Script cannot access other transaction details through CHECKSIG.

Improving Script with Introspection & Covenants

New opcode proposals offer improvements on how and/or what data is accessed by opcodes in order to expand what is possible in Bitcoin. This quality is suggestively called “introspection”. Introspection refers to the ability of a Script to access data to verify a spending transaction. Technically Script today has a form of introspection, just with poor specificity; the CHECKSEQUENCEVERIFY, CHECKLOCKTIMEVERIFY and CHECKSIG opcodes are all ways to access data (including the nSequence and nLocktime fields) to verify a transaction spend.

Many opcode proposals offer alternative methods to CHECKSIG for accessing transaction inputs. Others introduce new input types or ways to handle inputs. As long as we have a grasp on how Scripts currently work today (which we should have from above), we don’t need to go through how these work.

Most importantly, better introspection enables a pattern for UTXOs called “Covenants”. Covenant UTXOs use introspection to specify spending criteria that a future spending transaction must meet; these criteria are set when the UTXO is created, but are only enforced when it is later spent. We’ll use a few examples to illustrate this.

A basic form of introspection could add conditions constraining a specific address to only spend coins based on the presence of specific data:

alt_text

We can imagine a more comprehensive example of such introspection that accesses an entire “template” of a future transaction spending criteria, and commits to it. A popular application can use this intermediating template to improve transfer security:

alt_text

In this example, when a withdrawal transaction is executed to a specific address (in pink), an escrow is created. The escrow allows the specific address to receive funds only after a timelock has elapsed. This requires a covenant because the template is nested in the original cold storage vault, even though it does not yet control the coins. The cold storage would pre-define this transaction upon setup. If an attacker attempts to steal funds, this added security layer gives the original owner a grace period to intervene.
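The idea of committing to a template and checking a later spend against it can be sketched as a hash comparison. This is a toy model under stated assumptions: the serialization, the address strings, and the `template_hash` and `spend_allowed` helpers are invented for illustration and do not correspond to any specific covenant proposal.

```python
import hashlib

def template_hash(outputs):
    """Hash a list of (address, amount, delay_blocks) tuples (toy serialization)."""
    h = hashlib.sha256()
    for address, amount, delay in outputs:
        h.update(f"{address}|{amount}|{delay};".encode())
    return h.digest()

# Fixed when the vault is set up: withdrawals must pass through this escrow,
# which releases funds only after a 144-block delay.
committed = template_hash([("escrow_for_hot_wallet", 100_000, 144)])

def spend_allowed(committed_hash, proposed_outputs):
    """The covenant check: the spend must match the pre-committed template."""
    return template_hash(proposed_outputs) == committed_hash

honest = [("escrow_for_hot_wallet", 100_000, 144)]
theft = [("attacker_address", 100_000, 0)]
```

With this commitment in place, `spend_allowed(committed, honest)` holds while `spend_allowed(committed, theft)` does not, which is what forces every withdrawal through the delayed escrow.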

With this view of Script, we can begin to reason about Script improvements, and why certain proposals get more recognition from Bitcoin developers than others. When it comes to considering new opcodes, proposed opcodes might be feasible to add if they are logically simple and self-contained.

Opcodes are sometimes described as “small” because the operations they perform on the stack are logically basic, and thus they are straightforward to implement. Because these have less complex logic, it’s less likely they come with bugs. It may also be more manageable to control for risks when they are combined with Bitcoin’s existing opcodes.

For a change to be self-contained, a new opcode would only include logic that exists within the bounds of Script, and does not implicate other parts of consensus. The ability to verify spending criteria from a past and future output is acceptable in this context. However, a more complex and computationally intensive opcode, for example, would increase requirements for nodes. Or in a more drastic case, opcodes like Ethereum’s SLOAD/SSTORE would require Bitcoin nodes to maintain a shared state. These are fundamentally different from the Covenant UTXOs introduced by the introspection opcodes most popular today.

Generally, there is a tradeoff for proposed improvements to Script: the requirements to add an opcode should be minimal for Bitcoin’s consensus, while its utility should be widespread or serve some high-demand activity. Much of the attention recently garnered by OP_CAT indicates that it uniquely meets this tradeoff of simplicity and utility.

The Proposition of OP_CAT

OP_CAT is the opcode for concatenation, and was recently proposed in BIP347. It is simple to implement and adds a generic tool that can enable high-demand use cases when combined with other features in Script.

Background

OP_CAT was in the original Bitcoin implementation, however it was removed in the patch for CVE 2010-537 as a response to a value overflow incident. OP_CAT could contribute to an overloading attack on nodes by excessively growing stack elements if repeatedly combined with the OP_DUP opcode. This could exhaust a node’s memory and crash it, but a swift patch for CVE 2010-537 removed CAT along with other opcodes that risked similar incidents. The network has since enforced some limitations to prevent such spam attacks (e.g. the resource limitations from before). Specifically for OP_CAT, the renewed proposal includes an upper bound on output stack elements to address this concern.
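The growth problem is easy to see with a little arithmetic: each OP_DUP followed by OP_CAT doubles the size of the top stack element. The sketch below tracks sizes only, without allocating memory. The `dup_cat_size` helper is hypothetical, and the 520-byte bound is the element limit mentioned earlier, assumed here to be the bound the renewed proposal applies.

```python
MAX_ELEMENT_SIZE = 520  # bytes; assumed upper bound on an OP_CAT result

def dup_cat_size(start_size: int, rounds: int, limit=None) -> int:
    """Size of the top element after `rounds` repetitions of OP_DUP OP_CAT."""
    size = start_size
    for _ in range(rounds):
        size *= 2  # duplicating and concatenating doubles the element
        if limit is not None and size > limit:
            raise ValueError("OP_CAT result exceeds the element size limit")
    return size

unbounded = dup_cat_size(1, 30)                 # 2**30 bytes, about 1 GiB
bounded = dup_cat_size(1, 9, MAX_ELEMENT_SIZE)  # 512 bytes, still allowed
```

Without a bound, thirty repetitions turn a single byte into roughly a gigabyte. With the bound, the tenth repetition already fails, so the Script is rejected long before memory becomes a concern.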

This history has a positive effect on community sentiment. However, we should be aware that Bitcoin today is different than it was in 2010. Bitcoin has introduced new features to the network: there are new opcodes, different address types, larger blocks, and many other changes (for example CHECKSEQUENCEVERIFY and CHECKLOCKTIMEVERIFY, P2SH and Taproot, and the SegWit upgrade, respectively. For a complete list of changes see here).

Rigorous testing will be important to understand how adding a concatenation opcode could be used with these new features. This is currently taking place on the Signet test network.

Use Cases

Concatenation serves a variety of use cases, a handful of which are specified in the bitcoin-inquisition. We will focus on Merkle Tree verification and Introspection-based use cases since these are the most relevant to practical security and scaling applications.

Merkle Path Verification

The utility of merkle trees in blockchains cannot be overstated. They allow a single 32 byte hash of a merkle tree root to represent a larger data structure, enabling efficient verification of arbitrary data on-chain. Merkle path verification is used to verify a proof that some data (e.g. ‘F’ below) was included in a merkle tree:

alt_text

OP_CAT can verify merkle paths when combined with Bitcoin’s existing hashing algorithms. To enable this, a transaction provides necessary data in the merkle path, and Script successively concatenates and hashes these parts to rebuild the merkle root. Script can then check if the rebuilt merkle root is equal to a merkle root previously placed on the stack:

[Figure: The transaction can rebuild the root using the yellow path data, proving that F was in the merkleized data.]
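The concatenate-then-hash loop described above can be sketched in a few lines. This is an illustrative model rather than Script itself: it uses single SHA-256 for every node, a four-leaf tree, and a hypothetical `root_from_path` helper, with each step standing in for an OP_CAT followed by a hash opcode.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def root_from_path(leaf: bytes, path) -> bytes:
    """Rebuild the root; path is a list of (sibling_hash, sibling_is_left)."""
    node = sha256(leaf)
    for sibling, sibling_is_left in path:
        # Concatenate (the OP_CAT step), then hash (the hash opcode step).
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node

# Build a four-leaf tree so there is a root to check against.
leaves = [sha256(x) for x in (b"E", b"F", b"G", b"H")]
left, right = sha256(leaves[0] + leaves[1]), sha256(leaves[2] + leaves[3])
root = sha256(left + right)

# Proof that F is in the tree: sibling E (on the left), then the right subtree.
proof_for_f = [(leaves[0], True), (right, False)]
f_is_included = root_from_path(b"F", proof_for_f) == root
```

The proof consists of only two hashes, one per level of the tree, and the final equality check plays the role of comparing against the root previously placed on the stack.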

Merkle trees are not new to Bitcoin. For example, in a Bitcoin block header a merkle root commitment is used to represent all transactions in a block. When a user checks if a transaction is included in the block, they use a merkle proof to assert the transaction is present in the tree without recomputing the entire tree. These were also introduced in the Taproot upgrade: in Taproot a single Pay-to-Taproot (P2TR) address represents multiple spending paths using this same property.

In contrast to Taproot, inclusion proofs with OP_CAT provide a more direct format for dealing with merkle trees. In this context, CAT may be considered a natural progression from Taproot: spending criteria can be embedded in different paths in P2TR transactions and represent logic of arbitrary complexity and size in a verifiable 32-byte hash.

However, the use cases for merkle proofs are more interesting if logic stored in merkleized data is used to power a covenant UTXO. This is where OP_CAT’s utility becomes more compelling: with its introspection trick.

Covenants with OP_CAT

In 2021, Andrew Poelstra presented a trick to enable introspection by using OP_CAT with a Schnorr signature (a similar method building on this trick was demonstrated by Robin Linus using the ECDSA signature scheme).

Without diving into the math of the trick itself, we can broadly compare the introspection enabled by the CAT+Schnorr trick to the first covenant example we presented above: specific details of a spending transaction can be restricted by an output (e.g. spending an amount of coins only if other data is present). The trick places specific transaction details onto the stack, so that Script can check a future spend against these requirements.
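For readers who do want a rough outline, the core identity can be written in a few lines. This is a simplified sketch and not the full construction: H stands for the signature scheme's hash (BIP340 uses a tagged hash over specific encodings, which is glossed over here), G is the curve generator, n is the group order, and m is the digest of the spending transaction that CHECKSIG computes internally.

```latex
\begin{align*}
\text{Signing (key } P = xG,\ \text{nonce } R = kG\text{):}\quad
  & e = H(R \,\|\, P \,\|\, m), \qquad s = k + e\,x \pmod{n} \\
\text{Verification:}\quad
  & sG = R + eP \\
\text{Trick (fix } k = x = 1\text{):}\quad
  & R = P = G, \qquad s = 1 + H(G \,\|\, G \,\|\, m) \pmod{n}
\end{align*}
```

With the key and nonce fixed this way, the signature is determined entirely by m. If a Script rebuilds m from transaction fields supplied on the stack (concatenating them with OP_CAT and hashing), derives s from it, and the resulting signature passes CHECKSIG against P = G, then the supplied fields must match the real transaction. Script has no 256-bit arithmetic, so the "+1" has to be handled with additional workarounds that are omitted here.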

UTXOs can successively use introspection to enforce logic on the same information across transactions, thus allowing data to persist across multiple UTXOs. When used with Bitcoin’s familiar methods for storing data in transactions, like Taproot transactions (with Ordinal theory) or OP_RETURN (with Runes), this can provide a sequential, stateful smart contract in Bitcoin.

Scaling

To put all of these pieces together: with covenant UTXOs from the Schnorr trick, efficiently working with data in merkle trees, as well as Bitcoin’s familiar storage techniques, OP_CAT can power a basic virtual machine (e.g. with CatVM, or first shown with MATT) and even verify FRI-based proof systems (e.g. for Circle STARKs by Starkware). These can be used for building shared UTXOs with unilateral withdrawals or more secure two way pegs into rollups or side chains. This work is being done by Taproot Wizards, Starkware, Weikeng Chen and many other contributors pushing the boundaries of how OP_CAT can enable scalable applications on Bitcoin.

A discussion on both the CatVM and the proof verification warrants an entirely separate article, so for now we will hold off. The best resources for more on these can be found in the Bitcoin Wildlife Sanctuary repo, or other useful resources that cover CatVM.

We should also note that such features from OP_CAT would bring vital improvements to existing Bitcoin scaling options. This includes enabling eltoo and timeout trees for Lightning, or improving Ark. These fall into another category of Bitcoin scaling that is beyond the scope of this piece, but nonetheless highlight the utility of OP_CAT and covenants. All Bitcoin-based applications stand to benefit when covenants enter the picture.

Concerns & Tradeoffs

There are primarily two criticisms of OP_CAT: the inefficiency of its introspection trick, and concerns about covenants, MEV & generally increased usage of Bitcoin.

Inefficiencies & Combinations with Other Opcodes

The Schnorr trick is widely understood as a suboptimal way for Bitcoin to get introspection. The result from use of this trick is a larger Script size, which ultimately takes up more resources in blocks and is more expensive.

A more refined criticism claims that the primary motivation for CAT is this introspection, and so it would be more reasonable to include opcodes that directly improve or provide introspection, like TXHASH or CheckSigFromStack (CSFS). TXHASH (and TXHASHVERIFY) generates a hash of transaction components, enabling scripts to validate and impose requirements based on specific transaction details (similar to the first introspection example above). CSFS can be combined with CAT to check signatures against arbitrary data, in a similar way to TXHASH but not constrained to the hash of a transaction (see more discussion on these from the bitcoin-dev mailing list).

So why shouldn’t an OP_CAT fork include CSFS, TXHASH or other introspection opcodes? Perhaps it should, however we should be reminded of the tradeoff that proposals make; OP_CAT has received attention because of its balance between its simplicity and generic utility. Introspection is possible by an extension of CAT, but not by design. It could include an optimized version of introspection, but adding complexity and more utility-specific design choices may be detrimental to momentum garnered by OP_CAT when taken alone.

Concerns with Covenants, MEV & Bitcoin

The proposed reintroduction of OP_CAT has also raised general concerns about covenant risks and centralizing or harmful forms of MEV. What if Bitcoin recreates Ethereum’s centralized block production and exposes users to MEV?

The bottom line for this discussion is that OP_CAT would introduce more stateful activity that settles on Bitcoin’s base layer. However, where opportunities arise for miners to extract value depends on which products attract more value, volume or attention, and on what type of access (if any) miners have to these apps & products.

A case can be made that OP_CAT may actually shield Bitcoin from MEV risks in practice (see Eric Wall’s article). From the perspective of Bitcoin L1, the L2 block production process would be a more permissioned task that is separate from mining. Also due to Bitcoin’s ~10 min block times and probabilistic finality, it’s unlikely for trading activity or other high value applications to occur on the base layer. As a result, it’s possible most activity would occur at a separate scaling layer. Bitcoin nodes would only validate L2 blocks/withdrawals via OP_CAT in Script, thus “shielding” from such MEV opportunities.

The other side of this argument holds that OP_CAT would bring more activity to Bitcoin’s base layer, leading to more opportunities for miners to extract value. Indeed, there is an argument that Bitcoin could attract apps because it has security or liquidity benefits, but the details of how such apps are designed and lead to MEV raise more questions than answers: How exactly would such smart contracts be enabled and adopted at scale as a result of OP_CAT? How do these avoid problems from Bitcoin’s 10 min block times and probabilistic finality? What asset is being traded and how is this different from what we have today?

Bitcoin is the only asset recognized on the chain, so any methods to creatively recognize an asset or protocol (e.g. with metaprotocols like Runes) would still be subject to these limits, and require separate off-chain roles. Furthermore, concerns around MEV as a result of Bitcoin usage may be discredited when considering the state of the chain today: users frequently expose themselves to extractive activity even without OP_CAT. Speculating users are often exploited, e.g. by sniping bots when trading Ordinals.

To conclude

Changes to Bitcoin do not have to be scary. As with any change, there are risks that must be controlled for. However, if we do the work to understand specific additions to Script, we can see that certain changes are not threatening to Bitcoin’s deeper mechanics. The simpler the upgrade, the more we can limit the unknown. This makes a case in favor of OP_CAT’s addition to Script, as well as potentially many other important yet minimal changes.

We hope this piece equips Bitcoiners to understand Script and reason about Script improvements. Ultimately, we should be aware that without improving the primitive Script language, Bitcoin’s application layer risks further ceding power of the individual to regulated and centralized platforms. Applications and products that are made possible from feasible changes to Script can be quite beneficial for Bitcoin’s utility, as we have discussed with the example of OP_CAT.