# Uncovering and Fixing an Inflation Bug in Aleo

- **Authors**: Suneal Gong
- **Date**: February 19, 2025
- **Tags**: security, zk, audit, aleo

In November 2024, we discovered an inflation bug in the Aleo mainnet and immediately reported it to the Aleo team. The team confirmed the bug and quickly fixed it. Fortunately, a thorough scan of the chain found no sign of exploitation.

Thanks to the Aleo team for their prompt and professional action.

In this article, I will explain the context of the bug, how it could have been exploited, and how it was fixed.

## How Does Aleo Work?

Aleo is a blockchain network that uses zero-knowledge proofs (ZKPs) to achieve privacy and programmability. Aleo uses the Varuna proof system, which is adapted from [Marlin](https://eprint.iacr.org/2019/1047). On Aleo, the user generates a proof of a transaction's execution, and the validators (the network) check the validity of the transaction by verifying that proof. This approach makes the network efficient, as validators do not need to re-execute transactions. Moreover, users can preserve their privacy by hiding the execution details.

### Transition

Aleo's transactions are composed of transitions. A transition represents the execution of a function of an Aleo contract. Aleo's contract language is [Leo](https://www.leo-lang.org/), a straightforward Rust-like DSL (domain-specific language). For more details about Leo, read [Exploring Leo: A Primer on Aleo Program Security](https://blog.zksecurity.xyz/posts/aleo-program-security/). Below is a minimal executable function (a transition) on Aleo:

```rust
transition add_private_number(public a: u32, private b: u32) -> u32 {
    let c: u32 = a + b;
    return c;
}
```

In the code above, `add_private_number` accepts one public input `a` and one private input `b`, adds the two numbers, and returns the private result `c` (on Aleo, inputs and outputs are private by default). As `b` and `c` are private, their values are encrypted in the transaction and not revealed to others. Only the transaction sender can decrypt them. When a user sends a transaction calling the function, the network ensures that `c` is the sum of `a` and `b` without learning the values of `b` and `c`.

### The Record Type

So far, our transitions only handle plain data types like `bool` and `u64`, which can be freely copied and are therefore not suitable for representing assets (like a token). For this purpose, Aleo introduces the `Record` model. A `Record` is a special data type that is essentially a leaf of a shielded pool. It generalizes the design of Zcash to allow for various private objects.

A `transition` can take a `Record` as input and output a `Record` as well. Unlike ordinary values, a `Record` can only be created by returning one from a transition. Each `Record` has an owner and can only be spent by that owner. A `Record` is consumed when it is passed as an input to a transition. When it is consumed, a Zcash-like nullifier is emitted, which ensures the `Record` cannot be used again. In other words, a `Record` represents a non-clonable asset, similar to the `Object` concept in the [Move language](https://move-language.github.io/move/), but private and invisible to others.
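To make the "consume once" rule concrete, here is a minimal sketch of a nullifier set in plain Rust. It is only a toy model: the `Record` struct, the `nullifier` function, and the `Ledger` type are hypothetical names invented for this illustration, addresses are modelled as `u64`, and the standard library's `DefaultHasher` stands in for the keyed cryptographic function a real system would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Toy record (hypothetical layout). A real record is encrypted and only
/// its commitment appears on-chain.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Record {
    pub owner: u64, // stand-in for an address
    pub amount: u64,
    pub nonce: u64, // makes otherwise identical records distinct
}

/// Hypothetical nullifier derivation. A real system derives this from the
/// owner's secret key; here we only need a deterministic value per record.
pub fn nullifier(record: &Record) -> u64 {
    let mut hasher = DefaultHasher::new();
    record.hash(&mut hasher);
    hasher.finish()
}

/// The network only remembers which nullifiers it has already seen.
#[derive(Default)]
pub struct Ledger {
    spent: HashSet<u64>,
}

impl Ledger {
    /// Consumes a record: succeeds the first time, fails on any replay.
    pub fn spend(&mut self, record: &Record) -> Result<(), String> {
        // `insert` returns false when the nullifier was already present.
        if self.spent.insert(nullifier(record)) {
            Ok(())
        } else {
            Err("record already spent".to_string())
        }
    }
}
```

Calling `spend` twice on the same record returns `Ok(())` and then an error, which is the property that makes a record behave like a non-clonable asset.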

A program can define its own `Record` types with custom fields. The fields are also private by default. Only the creator and owner are able to decrypt the private fields. Below is an example program used in [Exploring Leo: A Primer on Aleo Program Security](https://blog.zksecurity.xyz/posts/aleo-program-security/):

```rust
program example_program0.aleo {
    record Token {
        owner: address,
        amount: u128
    }

    transition private_transfer_token(private receiver: address, private input_token: Token) -> Token {
        let new_token: Token = Token {
            owner: receiver,
            amount: input_token.amount
        };
        return new_token;
    }
}
```

The code above implements private token transfers. It defines a record called `Token` with `owner` and `amount` fields, which are private by default. The `private_transfer_token` transition takes the `Token` record as input, consumes it, and outputs a new token with the same amount and a new owner. The `input_token` will be consumed because it is passed as an input of the function. The `new_token` will be created because it's returned by the function. This ensures the total supply of the token remains unchanged. As `private_transfer_token` is executed off-chain and the input/output are private, the network won't learn the sender, receiver, and amount.

### On-Chain Finalize Logic

Transitions are useful for off-chain computation. However, they can't access on-chain state. For example, for a decentralized exchange (DEX) to perform a swap between two tokens, it needs to fetch and update the current on-chain price. For this purpose, a function in Aleo can define finalize logic that is executed publicly on-chain.

The finalize logic can read and write on-chain state (e.g., the `data_map` in the example below). When a user creates a transaction, they execute the transition and generate the proof locally, and the transition can emit a `Future` that specifies the logic to be executed on-chain. Once the transaction is verified and accepted by the network, all the validators execute the `Future`. This approach also prevents multiple users from generating proofs that update the same value, which could cause the [update conflict issue in multi-user applications](https://www.cryptologie.net/article/604/the-zk-update-conflict-issue-in-multi-user-applications/). Here's an Aleo program making use of the finalize functionality:

```rust
program example_program1.aleo {

    mapping data_map: address => u64;

    // transition is executed off-chain
    async transition square_counter(public a: u64) -> Future {
        let square: u64 = a * a;
        return finalize_square_counter(self.caller, square);
    }

    // finalize is executed on-chain
    async function finalize_square_counter(caller: address, square: u64) {
        let v: u64 = data_map.get_or_use(caller, 0u64);
        data_map.set(caller, square + v);
    }
}
```

The `square_counter` transition accepts one input `a` and computes its square. It then outputs a `Future`, which contains the function (`finalize_square_counter`) and its parameters (`self.caller` and `square`). The `Future` here represents on-chain execution logic that is "deferred" at proving time, similar to an async task in Rust or JavaScript. In this example, the on-chain execution logic is the `finalize_square_counter` function defined by the contract. As the function is executed publicly by all validators, in a similar fashion to Ethereum, it can read and write the on-chain `data_map` mapping. Given the `caller` and `square`, the function retrieves the previous value in the slot of the `caller`, adds the `square`, and writes the result back to the mapping.

Running the above `square_counter` transition in the [Leo playground](https://play.leo-lang.org/) with `leo run square_counter 1u64` gives the following result:

![Future Output](/img/aleo-bug/future-output.png)

You can see the output of the transition is exactly a `Future` type. The future contains the program id, function name and the arguments of the function. If the transition is valid, then the validator will execute the finalize logic with the given arguments.
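The split between the off-chain transition and the on-chain finalize step can be modelled in a few lines of Rust. This is a simplified sketch, not how snarkVM represents futures: the `Future` struct, the `square_counter` and `finalize` functions, and the use of `u64` for addresses are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Toy model of a `Future`: which finalize function to run and with
/// which arguments. Field names follow the description above.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Future {
    pub program_id: String,
    pub function_name: String,
    pub arguments: Vec<u64>, // [caller, square]; addresses modelled as u64
}

/// Off-chain part: compute the square locally and defer the state update.
/// (Plain multiplication; large inputs would overflow in this toy model.)
pub fn square_counter(caller: u64, a: u64) -> Future {
    Future {
        program_id: "example_program1.aleo".to_string(),
        function_name: "finalize_square_counter".to_string(),
        arguments: vec![caller, a * a],
    }
}

/// On-chain part: every validator runs this against the public mapping.
pub fn finalize(data_map: &mut HashMap<u64, u64>, future: &Future) -> Result<(), String> {
    match (future.function_name.as_str(), future.arguments.as_slice()) {
        ("finalize_square_counter", [caller, square]) => {
            // Equivalent of `data_map.get_or_use(caller, 0u64)`.
            let previous = data_map.get(caller).copied().unwrap_or(0);
            let updated = previous.checked_add(*square).ok_or("overflow")?;
            data_map.insert(*caller, updated);
            Ok(())
        }
        _ => Err("unknown finalize call".to_string()),
    }
}
```

The important point is that `square_counter` never touches `data_map`. Only `finalize` does, and it runs only if a `Future` is actually present in the transition's outputs.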

Finally, we can get a high-level idea of the full lifecycle of a transaction that uses all of these features in Aleo. Here the user wants to execute a function of a smart contract that is already deployed on Aleo:

1. The user executes the transition locally, generating the transition input and output.
2. The user generates a proof of execution for that transition (using ZKPs) and includes it in a transaction along with:

   - cryptographic commitments for each input used (as well as output produced)
   - relevant plaintext data for each public input/output and encrypted data for each private input/output.

   Finally the user sends all of that in a transaction to the network.
3. Validator verifies the proof of the transaction. That is, given the function and the commitment to the input, the commitment of the output is correct. The validator also verifies that the given input/output data correctly corresponds to the commitment.
4. If the transaction proof is valid, the network accepts the transaction. If the transition outputs a `Future`, the validators then execute the corresponding finalize logic on-chain.

Now you know enough to understand the bug that we found. Let's get into it!

## The Vulnerability

As part of our continuous security process with Aleo, we managed to uncover an important bug. The bug was caused by an insecure way to commit to the input/output, allowing an attacker to bypass the finalization logic.

In Aleo, a transition is checked by the validator in two steps:

1. A validator first checks the proof of execution contained in a user's transaction to verify the associated claim: "given the executed function and commitments to its inputs, the resulting output commitment is correct".
2. Then, they verify that the given input/output data correctly corresponds to the commitment.

Below is the code snippet of the second step that checks the output data. Given the output, the `verify` function checks that the `Output` correctly corresponds to the output commitment/hash. This is off-circuit code executed by every validator for each transaction. Can you find where the vulnerability is introduced?

```rust
/// The transition output.
#[derive(Clone, PartialEq, Eq)]
pub enum Output<N: Network> {
    [...]
    /// The ciphertext hash and (optional) ciphertext.
    Private(Field<N>, Option<Ciphertext<N>>),
    /// The output commitment of the external record. Note: This is **not** the record commitment.
    ExternalRecord(Field<N>),
    /// The future hash and (optional) future.
    Future(Field<N>, Option<Future<N>>),
}

impl<N: Network> Output<N> {
    pub fn verify(&self, function_id: Field<N>, tcm: &Field<N>, index: usize) -> bool {
        // Ensure the hash of the value (if the value exists) is correct.
        let result = || match self {
            [...]
            Output::Private(hash, Some(value)) => {
                match value.to_fields() {
                    // Ensure the hash matches.
                    Ok(fields) => match N::hash_psd8(&fields) {
                        Ok(candidate_hash) => Ok(hash == &candidate_hash),
                        Err(error) => Err(error),
                    },
                    Err(error) => Err(error),
                }
            }
            Output::Future(hash, Some(output)) => {
                match output.to_fields() {
                    Ok(fields) => {
                        // Construct the (future) output index as a field element.
                        let index = Field::from_u16(index as u16);
                        // Construct the preimage as `(function ID || output || tcm || index)`.
                        let mut preimage = Vec::new();
                        preimage.push(function_id);
                        preimage.extend(fields);
                        preimage.push(*tcm);
                        preimage.push(index);
                        // Ensure the hash matches.
                        match N::hash_psd8(&preimage) {
                            Ok(candidate_hash) => Ok(hash == &candidate_hash),
                            Err(error) => Err(error),
                        }
                    }
                    Err(error) => Err(error),
                }
            }
            [...]
            Output::ExternalRecord(_) => Ok(true),
        };
        [...]
    }
}
```

The vulnerability arises from the hash/commitment scheme used for the different output types. The commitment of an `Output` covers only the data, without absorbing the variant (subtype) it belongs to. That means we can create two `Output` values of different variants that have the same commitment. For example, `Output::Future(hash, Some(output))` is checked by concatenating the data into `preimage` and hashing it, whereas `Output::Private(hash, Some(value))` is checked by directly hashing the fields of `value`. We can therefore create an `Output::Private` with exactly the same hash by simply setting the `value` in the `Output::Private` to the `preimage` of the `Output::Future`.

This means that the commitment scheme is not binding across output types. Unfortunately, Aleo relied solely on commitment checks for input/output validation, without explicit type checks elsewhere. Therefore, an attacker can replace an output with data of a different type in the transaction and still pass the check. This design flaw introduces a critical vulnerability.
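The collision can be reproduced in a toy model. The sketch below keeps only the shape of the vulnerable `verify` function: field elements are modelled as `u64`, payloads are already flattened to fields, and `toy_hash` (built on the standard library's non-cryptographic `DefaultHasher`) is a stand-in for `N::hash_psd8`. The helper names `future_preimage` and `forge_private` are hypothetical, and the sketch ignores how a real `Ciphertext` would have to be encoded to yield the desired fields.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a field element.
pub type Field = u64;

/// Stand-in for `N::hash_psd8`. NOT cryptographic, only deterministic.
pub fn toy_hash(fields: &[Field]) -> Field {
    let mut hasher = DefaultHasher::new();
    fields.hash(&mut hasher);
    hasher.finish()
}

/// Simplified outputs: (commitment, payload as field elements).
pub enum Output {
    Private(Field, Vec<Field>),
    Future(Field, Vec<Field>),
}

/// Preimage `(function ID || output || tcm || index)`, as in the snippet above.
pub fn future_preimage(function_id: Field, fields: &[Field], tcm: Field, index: u16) -> Vec<Field> {
    let mut preimage = vec![function_id];
    preimage.extend_from_slice(fields);
    preimage.push(tcm);
    preimage.push(Field::from(index));
    preimage
}

impl Output {
    /// Mirrors the vulnerable check: the variant never enters the hash.
    pub fn verify(&self, function_id: Field, tcm: Field, index: u16) -> bool {
        match self {
            Output::Private(hash, value) => *hash == toy_hash(value),
            Output::Future(hash, fields) => {
                *hash == toy_hash(&future_preimage(function_id, fields, tcm, index))
            }
        }
    }
}

/// Builds a `Private` output that reuses the commitment of a given future
/// by using the future's preimage as the private payload.
pub fn forge_private(function_id: Field, fields: &[Field], tcm: Field, index: u16) -> Output {
    let preimage = future_preimage(function_id, fields, tcm, index);
    Output::Private(toy_hash(&preimage), preimage)
}
```

Both the honest `Output::Future` and the forged `Output::Private` carry the same commitment and both pass `verify`, so a verifier that only looks at commitments cannot tell them apart.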

Going a step further, remember that the `Future` data represents the finalize logic that is to be executed on-chain. If the `Future` data is replaced by another type of data (e.g., `Output::Private`), then the finalize logic won't be executed on-chain. In this way, an attacker can bypass any finalize logic in their transaction. One of the impacts is that the attacker can issue an arbitrary amount of Aleo tokens.

Consider the following function, a simplified version of one that exists in Aleo's token contract:

```rust
program example_program2.aleo {
    record Token {
        owner: address,
        amount: u64
    }

    mapping public_balance: address => u64;

    async transition transfer_public_to_private(recipient: address, amount: u64) -> (Token, Future) {
        let new_token: Token = Token {
            owner: recipient,
            amount: amount
        };
        let f: Future = finalize_transfer(self.caller, amount);
        return (new_token, f);
    }

    async function finalize_transfer(caller: address, amount: u64) {
        let balance: u64 = public_balance.get_or_use(caller, 0u64);
        let new_balance: u64 = balance - amount; // transaction will revert if underflow happens
        public_balance.set(caller, new_balance);
    }
}
```

The `transfer_public_to_private` transition converts public tokens (stored on-chain in the `public_balance` mapping) into private tokens (stored as records). It issues the new token record and subtracts the same amount from the public balance in one transaction. If the public balance is insufficient, the transaction will revert. However, if the `finalize_transfer` logic is skipped, the transaction is only partially executed: the new token record is still created by the transition, but the on-chain balance is not reduced. In this way an attacker can issue an arbitrary amount of the token.
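The effect of skipping the finalize step can be sketched with a toy validator. Everything here is a simplification invented for illustration: the `Chain` type, the three-variant `Output` enum, and the `apply` method are hypothetical, addresses are `u64`, and the proof and commitment checks are assumed to have already passed.

```rust
use std::collections::HashMap;

/// Simplified transition outputs (hypothetical shapes).
pub enum Output {
    Record { owner: u64, amount: u64 },
    Future { caller: u64, amount: u64 },
    Private(Vec<u64>), // opaque encrypted data, ignored by finalization
}

/// Toy chain state: public balances plus the list of created records.
#[derive(Default)]
pub struct Chain {
    pub public_balance: HashMap<u64, u64>,
    pub records: Vec<(u64, u64)>, // (owner, amount)
}

impl Chain {
    /// Applies the outputs of a transition whose proof and commitments
    /// are assumed to have been verified already.
    pub fn apply(&mut self, outputs: &[Output]) -> Result<(), String> {
        // Run finalize logic first, so a failing finalize rejects everything.
        for output in outputs {
            if let Output::Future { caller, amount } = output {
                let balance = self.public_balance.get(caller).copied().unwrap_or(0);
                let new_balance = balance
                    .checked_sub(*amount)
                    .ok_or("insufficient public balance")?;
                self.public_balance.insert(*caller, new_balance);
            }
        }
        // Only then create the records emitted by the transition.
        for output in outputs {
            if let Output::Record { owner, amount } = output {
                self.records.push((*owner, *amount));
            }
        }
        Ok(())
    }
}
```

With a public balance of 10, an honest `[Record, Future]` output list for an amount of 1000 is rejected. The same list with the `Future` swapped for a `Private` output is accepted, the record is created, and the balance stays at 10.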

Here is how the attack can be performed:

1. The attacker executes the `transfer_public_to_private` transition with their own address as `recipient` and a large `amount`. The attacker then generates the proof and the transaction as normal.
2. The transaction contains two outputs of the transition: the `Token` record and an `Output::Future`. The attacker replaces the `Output::Future` with an `Output::Private` that has the same commitment value, then sends the transaction to the network.
3. The validator verifies the transaction and decides it is valid: the proof is valid (because it has not been touched) and the attached input/output data are valid (because they match the commitments).
4. The validator accepts the transaction and creates the emitted `Token` record. Since the output no longer contains a `Future` (it was replaced by an `Output::Private`), the validator does not execute the on-chain finalize logic.
5. After the transaction is executed, the new `Token` exists but the on-chain public balance has not been reduced. In this way the attacker creates a new `Token` for free.

Note that if such an attack had happened, it could be detected by checking for a mismatch between the types of the input/output data in a transaction and the types declared by the function.
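Such a scan could look roughly like the following sketch. It is not the tooling the Aleo team used: the `Variant` enum is a simplified subset (`Record` is an assumed variant name), and the `Transition` struct and `find_mismatches` function are hypothetical.

```rust
/// Output variants without their payloads (simplified, assumed subset).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Variant {
    Private,
    Record,
    Future,
}

/// What the scan needs to know about each historical transition.
pub struct Transition {
    pub id: u32,
    /// Output variants declared by the function definition.
    pub declared: Vec<Variant>,
    /// Output variants actually present in the transaction.
    pub actual: Vec<Variant>,
}

/// Returns the IDs of all transitions whose outputs do not match the
/// variants declared by the function they claim to execute.
pub fn find_mismatches(transitions: &[Transition]) -> Vec<u32> {
    transitions
        .iter()
        .filter(|t| t.declared != t.actual)
        .map(|t| t.id)
        .collect()
}
```

A transition that declares `[Record, Future]` but carries `[Record, Private]` would be flagged, which is exactly the footprint the attack described above would leave.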

### The Fix

Upon discovering the issue, we immediately reported it to the Aleo team. The Aleo team confirmed the issue and scanned all existing transactions for signs of exploitation. Fortunately, no exploitation was found. Then [the fix](https://github.com/ProvableHQ/snarkVM/pull/2582) was proposed and merged. Since the commitment scheme involves multiple parts of the protocol (including the circuit), it was challenging to modify. Therefore, the issue was fixed by adding explicit checks to ensure that the input/output data types of a transition are correct.

```rust
pub fn verify_execution(&self, execution: &Execution<N>) -> Result<()> {
    [...]
    // Ensure the input and output types are equivalent to the ones defined in the function.
    // We only need to check that the variant type matches because we already check the hashes in
    // the `Input::verify` and `Output::verify` functions.
    let transition_input_variants = transition.inputs().iter().map(Input::variant).collect::<Vec<_>>();
    let transition_output_variants = transition.outputs().iter().map(Output::variant).collect::<Vec<_>>();
    ensure!(function.input_variants() == transition_input_variants, "The input variants do not match");
    ensure!(function.output_variants() == transition_output_variants, "The output variants do not match");
    [...]
}
```

Later, the fix was deployed to all validators. [More tests](https://github.com/ProvableHQ/snarkVM/pull/2607) were added to ensure that malformed inputs and outputs are detected.

## Timeline

- **November 24, 2024** The issue was found and reported to Aleo. Aleo confirmed the issue.
- **November 25, 2024** The Aleo team scanned the entire history of the blockchain and found no evidence of exploitation.
- **December 3, 2024** [The fix](https://github.com/ProvableHQ/snarkVM/pull/2582) was proposed and merged.
- **December 4, 2024** The fix, along with other normal upgrades, was rolled out to all validators.

## Summary

We discovered a critical vulnerability in Aleo that could have allowed arbitrary token minting. We promptly reported the issue to the Aleo team, and together, we worked to fix it.

The key takeaway from this experience is the importance of following the "TLV" pattern (Type, Length, Value) when committing to and hashing data. Including the type and length in the preimage helps ensure that data of one type cannot be reinterpreted as another.
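As a minimal sketch of the TLV idea (and not the fix Aleo deployed, which added explicit variant checks instead), the commitment below absorbs a type tag and a length before the value. The `Tag` enum, its numeric values, and `tlv_commit` are hypothetical, and `toy_hash` is again a non-cryptographic stand-in built on the standard library's `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for a field element.
pub type Field = u64;

/// Stand-in for a cryptographic hash. NOT cryptographic, only deterministic.
pub fn toy_hash(fields: &[Field]) -> Field {
    let mut hasher = DefaultHasher::new();
    fields.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical type tags, one per output variant.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tag {
    Private = 1,
    Future = 2,
}

/// Commits to `value` with a Type-Length-Value preimage:
/// `(tag || number of fields || fields)`.
pub fn tlv_commit(tag: Tag, value: &[Field]) -> Field {
    let mut preimage = Vec::with_capacity(value.len() + 2);
    preimage.push(tag as Field); // Type
    preimage.push(value.len() as Field); // Length
    preimage.extend_from_slice(value); // Value
    toy_hash(&preimage)
}
```

With this layout, the same field elements committed under `Tag::Private` and `Tag::Future` produce different preimages, so the substitution described in this article no longer yields a matching commitment.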

Security is a continuous and evolving process. Zero-knowledge projects that are not transparent about their security practices are more prone to vulnerabilities. By consistently raising the bar and rigorously reviewing systems, we can identify and mitigate critical issues early. As the zk ecosystem grows, we remain committed to securing zk projects and contributing to their long-term reliability and safety.

---

This article was published on the [ZK/SEC Quarterly](https://blog.zksecurity.xyz) blog by [ZK Security](https://www.zksecurity.xyz), a leading security firm specialized in zero-knowledge proofs, MPC, FHE, and advanced cryptography. ZK Security has audited some of the most critical ZK systems in production, discovered vulnerabilities in major protocols including Aleo, Solana, and Halo2, and built open-source tools like [Clean](https://github.com/Verified-zkEVM/clean) for formally verified ZK circuits. For more articles, see the [full list of posts](https://blog.zksecurity.xyz/llms.txt).
