feat: support compress public inputs by johntaiko · Pull Request #55 · taikoxyz/zkevm-circuits

johntaiko · 2023-01-12T09:25:34Z

TODOs:

~~use 32 bytes per rlc item in compressed public inputs~~
fill the FIXED_HUFFMAN_TABLE gracefully (maybe get from witness and be shared with proposer
~~pi-circuit export compressed public inputs from compress-circuit~~

Brechtpd · 2023-01-12T17:35:01Z

+    raw_public_inputs: [Column<Advice>; 32],
+    compressed_public_inputs: [Column<Advice>; 32],


Was there a specific reason you chose to store 32 bytes/row? In any case, like I said on the call, decoding one Huffman encoded word/row makes more sense to me and should be sufficient. With 32 bytes per row in the worst case we'd have to do a lookup/bit to decode the data which would mean 32*8=256 lookups!

I think we need to split origin_rpi's row into 32 rows, u256 -> [u8;32], can I treat the Filed in oirigin_rpi as a u256?

Using 32bytes/row is easy to constraint between the origin raw public inputs and split raw public inputs, just compare the combined 32byte data and origin_rpi row

But now, I have no sense to constraint between them if we use the u8/row, so I need help.

orpi rpi

rpi[0] rpi[0][0]

rpi[1] rpi[0][1]

rpi[0][2]

......

rpi[1][31]

orpi: origin raw public inputs
rpi: split raw public inputs(into bytes stream

With 32 bytes per row in the worst case we'd have to do a lookup/bit to decode the data which would mean 32*8=256 lookups!

Now, we just encode the raw_public_inputs, and compare with compressed_public_inputs from l1 constract.
So, we lookup code by value(byte)

I would say don't worry too much about getting all the 32 bytes on a single row again for now (this will still change when the circuit switches to keccak which is also byte based). But the mechanism it the same as you'll need to collect the (un)compressed data stream to check if it matches the original data stream. Whether you see it as encoding (variable number of compressed bits/row) or decoding (variable number of processed bits/row) you'll have to somehow collect those bits/bytes over multiple rows. So you just use the rlc trick with some extra logic.

So the easy example is like this, where on each row simply a single byte is added to the rlc:

data acc

byte[0] acc = byte[0]

byte[1] acc = (acc, rot::prev()) * r + byte[1]

byte[2] acc = (acc, rot::prev()) * r + byte[2]

.... ....

.... data_rlc

At the end you get the data_rlc which it the encoding of all bytes. However, for the compression part or even if the circuit has to consume multiple bytes per row (like needing 32 bytes on some rows) the logic just gets a bit more complicated. On some rows you add 32 bytes to acc, on other row you may not add any bytes to it. For the compression/decompression circuit you may even have to do this on the bit level where you only want to add a couple of bits to the rlc (depends on the implementation because you can also think of ways to keep the rlc accumulation on the byte level). At the end though you'll have to end up with a value that is equal to data_rlc, so that's simply a check you add at the bottom of the circuit.

Now, we just encode the raw_public_inputs, and compare with compressed_public_inputs from l1 constract.
So, we lookup code by value(byte)

Ah yes that would indeed mean only up to 32 lookups. That' still a lot though (no way around this many lookups for some circuits but here seems pretty inefficient to require that).

I'm sorry, I didn't make it clear, I want to say that how can I constraint the txlist between the tx_table's format and the raw_public_inputs(byte/row)'s format

The tx field in tx_table maybe has been rlc encoded, but the raw_public_inputs that will be compressed into txlist doesn't have rlc encoded fields

The same tx field of txlist in tx_table has been put which is easy for using rotation

Brechtpd · 2023-01-26T21:13:45Z

Starting some discussion on how to do RLC calculation with words of variable number of bits. Here are some possibilities which work but don't look optimal yet but it's a start and they are pretty simple.

Do RLC over bits

Easy, but needs a lot of columns because each bit needs to be stored separately. So does not scale very well, especially if codewords can get large.

Do RLC over N-bit parts using lookups

There's quite a few different cases that aren't that easy to implement directly using custom gates (or the implementation would require extra cells as well to do equality checks). Instead we can just hard-code all possible cases in a table and then lookup the correct way to combine the different parts.

The inputs are

The number of bits modulo 8 of the current bit stream
The number of bits to process (i.e. the number bits in the coded word).

The outputs are (per part!, but in a single lookup over a lot of columns):

num_bits: the number of bits in the part
r_mult: selector for if we need to multiple by r (the randomness), needs to be done after a byte has been completed
shift_factor: the factor with which the part needs to be multiplied
decode_factor the factor with which the part needs to be multiplied when we decode it

There is another output of the number of bits modulo 8 of the resulting bitstream.

(some of the values above may be redundant and maybe could be eliminated, also for some custom gates can be used instead but may require a high degree so just reading from the lookup table is perhaps not so bad. Some data can also be skipped for the "middle parts" e.g. shift_factor/num_bits for every part except the first and last part can be made 1/8 so some room for optimization here by exploiting this).

Even if the coded words can get very large, this table is very small (in height): for a max word size of 32 we only need to store 8*32 = 256 rows in the table. The table is quite wide though because we need to store 4 values/part.

For each part we of course also have to store the actual data, so in total we need to store 5 values per part (4 of which come from the lookup above).

We can then loop over the parts like this:

let data_rlc = data_rlc.prev();
let parts: [Part; MAX_PARTS] = /*lookup_data + values*/;
for part in parts.iter() {
    // We need to make sure each part (given by the prover) isn't larger allowed
    require!((part.value, part.num_bits) => @"range_lookup");
    // Add the part to the rlc
    let mult = ifx!{ part.r_mult => { r } else { 1.expr() }};
    data_rlc = data_rlc * mult + part.data * part.shift_factor;
}
require!(coded_word => decode(parts));

Note that if r_mult == 0 and shift_factor ==0 data_rlc remains the same, which can be used when no more bits need to be added to the rlc.

This method doesn't make a lot of sense when we do the rlc over bytes because we still quite a few columns. However, when do the rlc over 16-bit values (which we can definitely do because the required lookup tables for this are still quite small, needs ~2**17) this methods starts to pay off and is pretty efficient.

…ompress_circuit

johntaiko · 2023-02-02T07:13:13Z

Here is my example:

const MAX_PARTS = 4
const WORD_BITS = 8

if:
	input = "B~"
	code: B = 1011101(7)
	code: ~ = 1111111111101(13)
then:
	output = 10111011111111111101

circuit layout and assignment

constraints

let data_rlc = data_rlc.prev();
let parts: [Part; MAX_PARTS] = /*lookup_data + values*/;
for part in parts.iter() {
   // We need to make sure each part (given by the prover) isn't larger allowed
   require!((part.value, part.num_bits) => @"range_lookup");
   // Add the part to the rlc
   let mult = ifx!{ part.r_mult => { r } else { 1.expr() }};
   data_rlc = data_rlc * mult + part.data * part.shift_factor;
}
require!(coded_word => decode(parts));

input	data	num_bits	r_mult	shift_factor	decode_factor	data_rlc
'B'	1011101	7	0	2^1	2^0	1011101
'B'		0	0	0	0	1011101
'B'		0	0	0	0	1011101
'B'		0	0	0	0	1011101
'~'	1	1	0	2^0	2^12	10111011
'~'	11111111	8	1 (enough 1byte for multiply randomness)	2^0	2^4	10111011*r+11111111
'~'	1101	4	1	2^4	2^0	(10111011r+11111111)r+11010000
'~'		0	0	0	0	(10111011r+11111111)r+11010000

explain:

because the first 'B' only have 7bits encoded data, so we need more 1bit from second encoded word to combine 1byte
the second '~' has 13bits, and moves 1bit to previous byte, so the remaining bits can be split into two parts '11111111' and '1101'
shift_factor and decode_factor will be used to calc the RLC and huffman code
the sum of all num_bits from parts equal the huffman code length

Brechtpd · 2023-02-02T23:34:54Z

Nice!

One question I have when looking at your table, are you planning to process a single part per row? I was thinking of just doing all parts on a single row because it's a bit easier, but doing a part/row also works for me. The only downside I think is that we don't decode a single byte/row anymore so the data is more spread out, but I don't immediately see an issue with that.

johntaiko · 2023-02-03T06:09:57Z

Nice!

One question I have when looking at your table, are you planning to process a single part per row? I was thinking of just doing all parts on a single row because it's a bit easier, but doing a part/row also works for me. The only downside I think is that we don't decode a single byte/row anymore so the data is more spread out, but I don't immediately see an issue with that.

I see, you mean that all 4 parts in one cell, because the max num_bits <= 32, so we can use one Cell(31bytes) represent all parts, each part in 4bytes(32bits)

Cell layout

......	4bytes	4bytes	4bytes	4bytes
......	1st part	2nd part	3rd part	4th part

I think it's more efficient, let's take this approach

Brechtpd · 2023-02-03T13:55:45Z

I don't think that is possible because we do have to check that each part is within the allowed range of bits (we cannot do a range check for 32 bit number, that would require a circuit with height >= 2**32).

I just meant doing mostly what you did, but instead of using MAX_PARTS rows to decode a single code word, we just put all that data and do all the work on a single row.

…ompress_circuit

johntaiko · 2023-02-06T12:34:25Z

I don't think that is possible because we do have to check that each part is within the allowed range of bits (we cannot do a range check for 32 bit number, that would require a circuit with height >= 2**32).

I just meant doing mostly what you did, but instead of using MAX_PARTS rows to decode a single code word, we just put all that data and do all the work on a single row.

Sorry, I misunderstood it. I found it doesn't work when use 4parts/cell(split cell and combine cell is complex

Brechtpd · 2023-02-06T17:15:11Z

+        CompressCircuitConfigArgs { randomness }: Self::ConfigArgs,
+    ) -> Self {
+        let q_enable = meta.selector();
+        let word_begin = meta.advice_column();


I think this can be fixed, because it's just set to true every MAX_PARTS rows. If it's fixed you don't need to do a boolean constraint on the value, if it's an advice column you'd have to add a boolean constraint and have additional constraints to only set it to true every MAX_PARTS rows, so a lot more to check.

Brechtpd · 2023-02-06T17:17:33Z

+            //  3. num_bits0 + num_bits1 + num_bits2 + num_bits3 = huffman_code_length
+            let input = meta.query_advice(input, Rotation::cur());
+            let code = decode::data_expr(&parts);
+            let word_begin = meta.query_advice(word_begin, Rotation::cur());


Suggested change

let word_begin = meta.query_advice(word_begin, Rotation::cur());

let q_word_begin = meta.query_fixed(word_begin, Rotation::cur());

When it's a fixed selector using q_ prefix makes that more clear.

Brechtpd · 2023-02-06T17:20:55Z

+        meta.create_gate("do RLC over parts", |meta| {
+            let mut cb = BaseConstraintBuilder::new(MAX_DEGREE);
+            cb.condition(meta.query_advice(word_begin, Rotation::cur()), |cb| {
+                let mut data_rlc_prev = meta.query_advice(data_rlc, Rotation::prev());


Suggested change

let mut data_rlc_prev = meta.query_advice(data_rlc, Rotation::prev());

let mut data_rlc_expr = meta.query_advice(data_rlc, Rotation::prev());

Or some other name, data_rlc_prev is a bit confusing because it gets updated in the loop so then it's not the previous rlc anymore.

Brechtpd · 2023-02-06T17:22:55Z

+                    //      part.r_multi(bool)
+                    //      part.shift_factor(0..2^32) 2.pow(0..32)
+                    //      part.decode_factor(0..2^32) 2.pow(0..32)
+                    cb.require_boolean("r_multi", part.r_multi.clone());


If you add the lookup as you described in the comments, r_multi will be checked by the lookup so you won't need to check it's boolean using a custom gate, the lookup table will only contain 0 or 1 for the values so the check is already done by the lookup.

Brechtpd · 2023-02-06T17:38:57Z

+                        let mut shift_factor = 0;
+                        let mut decode_factor = 0;
+
+                        if code_len != 0 {


Can reuse code from compress_and_rlc?

Brechtpd · 2023-02-06T17:44:13Z

+            //      + data3 * decode_factor3 = huffman_code
+            //  3. num_bits0 + num_bits1 + num_bits2 + num_bits3 = huffman_code_length
+            let input = meta.query_advice(input, Rotation::cur());
+            let code = decode::data_expr(&parts);


As you're doing the part/row route, you'll have to be careful to correctly construct your table to return the correct data for each row between 0..MAX_PARTS. Perhaps you'll need to add some row_id to the lookup table or something like that. For example, because you're planning to decode the parts like data0 * decode_factor0 + data1 * decode_factor1 + data3 * decode_factor3, decode_factor may be row dependent, though not sure. Also the decoding could perhaps be done differently so this isn't necessary.

johntaiko · 2023-02-06T18:01:26Z

Because of the Rotaion::prev() is not woring in row zero, so I will reversal the table just like pi_circuit do.

Brechtpd · 2023-02-06T18:45:37Z

Another way around Rotation::prev()problems (except of course the obvious: add a selector q_first that is only enabled on the first row and don't use the previous value) is to instead use Rotation::next(). That way all the data necessary to do the calculations is on the current row, and the circuit verifies the starting data on the next row is correct.

Brechtpd · 2023-02-07T15:18:16Z

+        macro_rules! range_check {
+            ($column:ident, $range:ident) => {
+                meta.lookup(concat!(stringify!($column), " range check"), |meta| {
+                    let column = meta.query_advice($column, Rotation::cur());
+                    let q_enable = meta.query_selector(q_enable);
+                    vec![
+                        (
+                            q_enable.clone() * FixedTableTag::$range.expr(),
+                            fixed_table[0],
+                        ),
+                        (q_enable * column, fixed_table[1]),
+                    ]
+                });
+            };
+        }
+
+        range_check!(input, Byte);
+        range_check!(data, Byte);
+        range_check!(num_bits, Range32);
+        range_check!(r_multi, Bool);
+        range_check!(shift_factor, Exponent8);
+        range_check!(decode_factor, Exponent32);


Hmmm so not sure I understand why you do it like this. Shouldn't this be a single lookup into a specifically created table? Because this will simply check that each value separately is in the correct range, but this will not check that all values together are the correct values to use for the current state no? Or how are you planning on checking that these values are the ones necessary (e.g. 5 bits are already used, we decode a word with 2 bits so we need to add these 2 bits to the data stream multiplied by 2. These values are all interconnected and so if you put them all in a single table and lookup all values together you know all of them are correctly set and in the expected range, using a single lookup).

Yeah, you're right. So, I need check some ranges through all parts of one word

Because I have check the data(decode_factor)/num_bits in "lookup huffman table". So I only needs to check the input/r_multi/shift_factor

I still don't really understand. For example shift_factor, how do we verify it's the correct value in the circuit? Using my example above:

5 bits are already used, we decode a word with 2 bits so we need to add these 2 bits to the data stream multiplied by 2.

How does the circuit verify that shift_factor needs to be 2 in this case? Any other value would not create the correct rlc encoding of the data stream. It seems to me the prover could currently supply e.g. 1 which should not be valid.

Brechtpd · 2023-02-07T15:19:31Z

+                        self.q_not_end.enable(&mut region, offset)?;
+                    }
+
+                    macro_rules! assign_part {


feat: support compress public inputs

b24b861

github-actions Bot added the crate.zkevm-circuits label Jan 12, 2023

johntaiko mentioned this pull request Jan 12, 2023

Circuit implementation for compression #42

Open

fix: enable the q_enable selector

537d997

johntaiko force-pushed the feat/compress_circuit branch from 499e27c to 537d997 Compare January 12, 2023 09:42

johntaiko requested review from Brechtpd, smtmfft and xiaodino January 12, 2023 15:03

johntaiko linked an issue Jan 12, 2023 that may be closed by this pull request

Circuit implementation for compression #42

Open

Brechtpd reviewed Jan 12, 2023

View reviewed changes

Comment thread zkevm-circuits/src/compress_circuit.rs Outdated

Brechtpd reviewed Jan 12, 2023

View reviewed changes

Comment thread zkevm-circuits/src/compress_circuit.rs Outdated

Brechtpd reviewed Jan 12, 2023

View reviewed changes

Comment thread zkevm-circuits/src/compress_circuit.rs Outdated

Brechtpd reviewed Jan 12, 2023

View reviewed changes

Comment thread zkevm-circuits/src/compress_circuit.rs Outdated

johntaiko self-assigned this Jan 18, 2023

johntaiko requested a review from Brechtpd January 19, 2023 07:13

refactor: circuit layout

d24e214

Merge branch 'main' of github.com:taikoxyz/zkevm-circuits into feat/c…

c4f552b

…ompress_circuit

johntaiko added 2 commits February 5, 2023 16:31

Merge branch 'main' of github.com:taikoxyz/zkevm-circuits into feat/c…

71261fd

…ompress_circuit

feat: support compress circuit

069fab9

johntaiko force-pushed the feat/compress_circuit branch from c7f1182 to 73ab9c9 Compare February 6, 2023 13:05

fix: add a flag for word begin

7b6a67a

johntaiko force-pushed the feat/compress_circuit branch from 73ab9c9 to 7b6a67a Compare February 6, 2023 17:41

Brechtpd reviewed Feb 6, 2023

View reviewed changes

johntaiko added 2 commits February 7, 2023 02:26

chore: remove unnecessary table assign

76f73d2

chore: rename constraint name

82e56de

johntaiko added 2 commits February 7, 2023 19:57

chore: reuse compress_and_rlc

e159785

feat: add range check

d56d048

johntaiko force-pushed the feat/compress_circuit branch 5 times, most recently from 1e06a66 to d9e82ac Compare February 7, 2023 13:33

improve: use macro to reduce code

1c00b99

johntaiko force-pushed the feat/compress_circuit branch from a2b2b0a to 1c00b99 Compare February 7, 2023 13:41

Brechtpd reviewed Feb 7, 2023

View reviewed changes

fix: remove unnecessary lookups

fb7771e

		raw_public_inputs: [Column<Advice>; 32],
		compressed_public_inputs: [Column<Advice>; 32],

data	acc
byte[0]	acc = byte[0]
byte[1]	acc = (acc, rot::prev()) * r + byte[1]
byte[2]	acc = (acc, rot::prev()) * r + byte[2]
....	....
....	data_rlc

	let word_begin = meta.query_advice(word_begin, Rotation::cur());
	let q_word_begin = meta.query_fixed(word_begin, Rotation::cur());

	let mut data_rlc_prev = meta.query_advice(data_rlc, Rotation::prev());
	let mut data_rlc_expr = meta.query_advice(data_rlc, Rotation::prev());

Conversation

johntaiko commented Jan 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

johntaiko Jan 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

johntaiko Jan 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Brechtpd commented Jan 26, 2023

Do RLC over bits

Do RLC over N-bit parts using lookups

Uh oh!

johntaiko commented Feb 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Brechtpd commented Feb 2, 2023

Uh oh!

johntaiko commented Feb 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Brechtpd commented Feb 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

johntaiko commented Feb 6, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

johntaiko commented Feb 6, 2023

Uh oh!

Brechtpd commented Feb 6, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

johntaiko Feb 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

johntaiko commented Jan 12, 2023 •

edited

Loading

johntaiko Jan 18, 2023 •

edited

Loading

johntaiko Jan 18, 2023 •

edited

Loading

johntaiko commented Feb 2, 2023 •

edited

Loading

johntaiko commented Feb 3, 2023 •

edited

Loading

Brechtpd commented Feb 3, 2023 •

edited

Loading

johntaiko Feb 8, 2023 •

edited

Loading