- Originally from Olaoluwa Osuntokun (laolu32@gmail.com) with some edits from me!
Last week nearly 30 (!) Lightning developers and researchers gathered in Oakland, California, for three days to discuss several matters related to the current state and evolution of the protocol. This time around, we had much better representation for all the major Lightning Node implementations than at the last LN Dev Summit (Zurich, Oct 2021).
Like the previous LN Dev Summit, attendees kept notes throughout the day that attempted, on a best-effort basis, to capture the relevant discussions, decisions, and new research or follow-up areas to revisit.
You can find the meeting notes in full here.
Simple Taproot Channels
During the last summit, Taproot was an important discussion topic. However, even though the network had deployed the soft fork, we were still watching the 🟩's stack up on the road to ultimate activation. Several months later, Taproot has been fully activated, with the ecosystem starting to progressively deploy more and more advanced systems/applications that take advantage of the new features.
One fundamental deployment model that came out of the last LN Dev summit was the concept of an iterative roadmap that progressively revamped the system to use more taproot-y features instead of a "big bang" approach that would attempt to package up as many things as possible into one more extensive update. At a high level, the iterative roadmap proposed that we unroll a pre-existing and more significant proposal into more bite-sized pieces that can be incrementally reviewed, implemented, and ultimately deployed (see my post on the LN Dev Summit 2021 for more details).
Extension BOLTs
Before we started on the first day, I wrote up a minimal proposal that attempted to tackle the first two items of the Taproot iterative deployment schedule (musig2 funding outputs and simple tapscript mapping). I called the proposal "Simple Taproot Channels" as it set out to do a mechanical mapping of the current commitment and script structure to a more taproot-y domain. Rather than edit 4 or 5 different BOLTs with a series of "if this feature bit applies" nested clauses, I instead opted to create a new standalone "extension bolt" that defines new behaviour on top of the existing BOLTs, referring to the BOLTs when necessary. The document's style was inspired by the "proposals" proposal (very meta), popularized by cdecker and adopted by t-bast with his papers on Trampoline and Blinded Paths.
If the concept catches on, extension BOLTs provide us with a new way to extend the spec: rather than insert everything in-line, we could instead create new standalone documents for more prominent features. Having a single self-contained document makes the proposal easier to review and gives the author more room to provide any background knowledge, primers, and rationale. Then, over time, as the new extensions become widespread (e.g., Taproot is the default channel type), we can fold the extensions back into the primary set of "core" BOLTs (or make new ones as relevant).
More minor changes to the spec, like deprecating an old field or tightening up some language, will likely still follow the old approach of mutating the existing BOLTs. Still, more extensive overhauls like the planned PTLC update may make the extension BOLTs a better tool.
Tapscript, Musig2, and Lightning
As mentioned above, the Simple Taproot Channels proposal does two main things:
- Move the current 2-of-2 P2WSH SegWit v0 funding output to a single key P2TR output, with the single key being an aggregated musig2 key.
- Map all our existing scripts to the tapscript domain, using the internal key (keyspend path) for things like revocations, which can potentially allow nodes to store less state for HTLCs.
Of the two components, #1 is by far the trickiest. Musig2 is a very elegant protocol (not to mention the spec, which all of you should check out). Still, as the nonces aren't derived deterministically (as they are with RFC 6979), both signers need to "protect themselves at all times" to ensure they don't ever re-use nonces, which can lead to a private key leak (!!).
Rather than create some pseudo-deterministic nonce scheme (which may work until the Blockstream Research team squints vaguely in its direction), I opted to make all nonces 100% temporary and tied to the lifetime of a connection. Musig2 defines a public nonce as a pair of individual 33-byte nonces (66 bytes in total). This value needs to be exchanged before signing can begin (but can be sent before the two sides know the aggregated key). One important thing to note is that channels today have asymmetric states. So we need a pair of public nonces: one that I'll use to sign my commitment and one I'll use to sign yours. Lightning channels with symmetric state, like eltoo, can get by with exchanging only a single set of nonces, as there's only one message per state.
A nonce exchange takes place in a few places:
- During initial funding: I send my public nonce in the open_channel message, and you send yours in the accept_channel message. After this exchange, we can both generate signatures for the refund commitment transactions.
- After the channel is "ready", we send another set of nonces so that we can sign the next state. This is similar to the existing revocation key exchange: I need your next nonce/key before I can sign a new state.
- Upon channel re-establishment, a new set of nonces is sent, as they're 100% temporary. The current draft also requires that you use the new nonces to sign again if you were retransmitting a sig, as it's possible you went to retransmit but left off an expired/trimmed HTLC (which could lead to nonce re-use and would also mean having to remember nonces).
- Each time I revoke my channel, I send you a single nonce, my "local nonce" (naming needs some work here), which lets you sign for a new state.
- Each time I send a new sig, I also send you another nonce, my "remote nonce."
- When I send a shutdown (co-op close), I send a single public nonce so we can sign the next co-op close offer.
- When I send a closing_signed, I send another nonce, so we can sign another set once you send your offer.
The final flows aren't yet 100% finalized, as we'll need some implementations drafted to make sure the nonce handling and script mapping work out properly.
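The rule underlying all of these exchanges (a secret nonce backs at most one signature and never outlives the connection) can be sketched as simple bookkeeping. This is a minimal illustration under stated assumptions, not the draft's design: ConnectionNonces, NonceReuseError, and the integer handles are hypothetical names, and the 64 random bytes only stand in for the two secret scalars behind a musig2 public nonce.

```python
import os


class NonceReuseError(Exception):
    """Raised when a nonce handle is unknown, already used, or expired."""


class ConnectionNonces:
    """Secret nonces that live only as long as one peer connection (hypothetical helper)."""

    def __init__(self):
        self._unused = {}      # handle -> secret nonce bytes
        self._next_handle = 0

    def new_nonce(self):
        """Create a fresh secret nonce and return an opaque handle for it."""
        handle = self._next_handle
        self._next_handle += 1
        # Placeholder for the two 32-byte secret scalars of a musig2 nonce.
        self._unused[handle] = os.urandom(64)
        return handle

    def take_for_signing(self, handle):
        """Hand out the secret exactly once; a second request fails loudly."""
        try:
            return self._unused.pop(handle)
        except KeyError:
            raise NonceReuseError(f"nonce {handle} is used or expired") from None

    def on_disconnect(self):
        """Drop every unused nonce: nothing is persisted across connections."""
        self._unused.clear()
```

Because nothing survives on_disconnect, a retransmission after reconnecting has no choice but to sign again with a freshly exchanged nonce, which matches the re-establishment rule described above.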
Lightning Channels & Recursive Musig2
One other excellent topic was leveraging recursive musig2 (so musig2 within musig2) to make channels even more multi-sigy. The benefit here is that Bob & Carol can each have individual keys (which might be aggregated keys themselves) and make a channel with Alice, who only knows of them as Barol and doesn't know there's another pair of keys at play. This is cool as it allows node operators, wallets, and lightning platforms to experiment with various key/signing trees that may add more security, redundancy, or flexibility. When this first came up, someone brought up that while the scheme is "known" in the initial paper, they weren't sure how to write a proof for it. During the session, someone emailed one of the musig2 authors asking for more details and whether it's safe to implement and roll out.
Thankfully they quickly replied and explained that the proof for recursive musig2 (correct me if I'm wrong) wasn't left out because it's impossible. Rather, a proof in the existing Random Oracle Model (used to derive a bound for the number of nonces needed) would lead to a blow-up in the number of nonces required. Attempting to write the proof in some other model would likely lead to better results (secure with two nonces, as in base musig2), but would be pretty complicated, so hard to read and even to review for correctness.
Assuming everything checks out, then a useful mental model explained by the musig2 BIP author is a sort of tree structure. Considering I'm a signer, and we assemble the other signer as a sibling leaf in a binary tree, I need to wait for the sibling nonce/key before aggregating that into the final value. So if there're three signers, I wait for the regular public nonce, but the other signers sum their respective nonces into a single nonce, then send that to me. A similar operation is carried out for key aggregation, with the rest of the protocol being mostly the same.
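The nonce half of that tree model can be checked with a few lines of curve arithmetic. The sketch below is illustrative only and covers just the "sum the nonces" step described above: it uses toy scalars instead of random nonces, omits the key-aggregation side entirely, and is not a musig2 implementation. The helper names (point_add, point_mul, pubnonce, aggregate) are made up for this example; the curve constants are the standard secp256k1 parameters.

```python
# secp256k1 field prime and generator point (standard parameters).
P = 2**256 - 2**32 - 977
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # modular inverse needs Python 3.8+
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def point_mul(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def pubnonce(k1, k2):
    """A public nonce is a pair of points (k1*G, k2*G)."""
    return (point_mul(k1, G), point_mul(k2, G))


def aggregate(pubnonces):
    """Sum public nonces component-wise."""
    r1 = r2 = None
    for n1, n2 in pubnonces:
        r1 = point_add(r1, n1)
        r2 = point_add(r2, n2)
    return (r1, r2)


# Toy scalars for illustration only; real nonces must be fresh and random.
alice, bob, carol = pubnonce(11, 12), pubnonce(21, 22), pubnonce(31, 32)

barol = aggregate([bob, carol])    # Bob and Carol combine their nonces first
tree = aggregate([alice, barol])   # Alice only ever sees "Barol"
flat = aggregate([alice, bob, carol])
assert tree == flat                # same aggregate nonce either way
```

Alice ends up with the same aggregate nonce whether or not she knows that her counterparty is really two signers, which is why she can wait for a single sibling value.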
Ultimately, even if wallets/nodes aren't ready to roll something like this out today, we at least want to make sure the proposed flow is compatible with Simple Taproot Channels. Ideally, we'd have a toy implementation to verify our understanding and show it's possible/sound. To that end, I volunteered to hack up a simple recursive musig2 demo, as there doesn't seem to be any code in the wild that implements it.
Lightning Gossip
Gossip V2: Now Or Later?
Another big topic related to Taproot was how we should update the gossip network: the gossip protocol today has all channels validated by nodes, which requires that the nodes understand how to reconstruct the funding output based on the set of advertised keys. Furthermore, the protocol today assumes a SegWit v0 P2WSH multi-sig is used. Therefore, even if we had everything implemented today, a node wouldn't be able to advertise its new taproot channels to the rest of the public graph, as other nodes wouldn't understand how to validate them.
This presents a new opportunity: we already need to rework gossip for Taproot, so should we go ahead and re-design the entire thing with an eye for better privacy and future extensibility?
A proposal for the "re-design the entire thing" was floated in the past by Rusty. It does away with the strict coupling of channels to channel announcements and instead moves them to the node level. Each node would then advertise the set of "outputs" they have control of, which would then be mapped to the total capacity of a node, without requiring that these outputs self identify themselves on-chain as Lightning Channels. This also opens the door to different, potentially more privacy-preserving proofs-of-channel-ownership (something zkp).
On the other hand, we could follow the path of Simple Taproot Channels and map musig2+schnorr onto the existing gossip network. These are more minor changes in total, with the main benefit being the ability to only send one sig (aggregated musig2 sig of keys) instead of 4 individual sigs. I made a very hand-wavy proposal in this direction here.
Ultimately we decided to take the "just musig2 aspects" from gossip v1.5 (not the real name) and the "let's refresh all the messages with TLV goodness" from the gossip v2 proposal. This gives us a smaller package to implement and lets us potentially rejigger the messages to be more extensible and remove cruft like the node colour that almost nothing uses, but we all validate/store.
The follow-up work in this area is a more concrete proposal that updates the relevant gossip messages to be Taproot aware and TLV'd, and also updates the set of requirements regarding how to validate the channels in the first place (so given two keys, verify that applying the keyagg method of musig2 leads to what's in the funding output).
Gossip v2 will likely happen "eventually", but the relatively large design space needs to be explored to adequately analyze what privacy and extensibility properties we'll get out of it.
Applying Mini Sketch to LN Gossip
One issue we have today is that other than the initial scid query mechanism added to the protocol, there isn't a great way to ensure that your peer has all the latest updates. In addition, these days, many nodes pretty aggressively rate limit other nodes, so you might even have trouble sending out your update in the first place.
A recent paper (that I haven't read yet) analyzes the gossip network today to work out exactly how long it takes things to propagate, total bandwidth usage, etc. Minisketch (the grandchild of IBLTs ;)) is an efficient set reconciliation protocol that was designed for Bitcoin p2p mempool syncing but can be applied to other protocols.
An attendee has been working on dusting off some older work to see how we could apply it to the LN protocol to give nodes a more bandwidth-efficient way to sync channel updates and also achieve better update propagation. This supplements some existing investigative work done by Alex Meyers, with more concrete designs regarding what goes into the sketch and the various size parameters that need to be chosen.
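To make the set-reconciliation idea concrete, here is a toy sketch for the simplest possible case, where the two peers' sets differ by exactly one element. It XORs integer IDs together into a constant-size value. Real Minisketch generalizes this with additional finite-field syndromes so that up to a chosen capacity of differences can be recovered. The update IDs are made up, and none of this is the Minisketch API.

```python
from functools import reduce


def xor_sketch(ids):
    """Fold a set of integer IDs into one value; its size does not grow with the set."""
    return reduce(lambda acc, x: acc ^ x, ids, 0)


def missing_id(my_ids, peer_sketch):
    """Recover the single ID by which the two sets differ.

    Only valid when the symmetric difference has exactly one element.
    """
    return xor_sketch(my_ids) ^ peer_sketch


# Hypothetical short IDs for channel updates known to each peer.
mine = {0x1001, 0x1002, 0x1003}
theirs = {0x1001, 0x1002, 0x1003, 0x2A2A}

# The peer sends only its sketch, not its whole set.
print(hex(missing_id(mine, xor_sketch(theirs))))
```

The bandwidth cost depends on the number of differences you want to be able to recover rather than on the total number of updates, which is the property that makes the approach attractive for gossip sync.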
Channel Jamming
An attendee talked about the various proposed solutions to channel jamming, evaluating them on several axes, including punishment/monetary, local vs global reputation, the feasibility of mechanism design, UX implications, and implementation complexity. Unfortunately, the presenter didn't present a new concrete proposal but went through the various tradeoffs, ultimately concluding that they favor monetary penalties wherein the funds are distributed across the route, rather than being provably burnt to miners. However, they alluded to some upcoming work that attempts a more rigorous analysis of the proposed solutions, their tradeoffs, and potential ways we can parametrize solutions to be more effective (how much should they pay, etc.).
For those looking to brush up on the latest state of research/mitigations in this area, I recommend this blog post by BitMEX Research.
Onion Messages & DoS
The topic of DoS concerns around onion messages (in isolation, so not necessarily related to things like BOLT12 that take advantage of them) came up. During a whiteboarding session, some argued that DoS isn't much of an issue, as nodes can leverage "back propagation congestion control" to inform the source (who may not be the sender) that they'll start to drop or limit their packets, with each node doing this iteratively until the actual source of the spam has been clamped. Attendees threw a few hand-wavy designs around, but more work needs to be done to specify something so others can appropriately analyze it concretely.
On the other side of the spectrum, rather than attempt to rate limit at the node level (where each node has its own policy), nodes could opt instead to forward anything as long as the sender pays them enough. I proposed a hand-wavy approach that combined AMP and Onion Messages earlier this year. At a high level, I make an AMP payment, which pushes extra coins to all nodes on a route and drops off a unique identifier to them. Then, when I send an onion message, I include this identifier, with each node performing its own accounting of how much bandwidth an ID has paid for.
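The per-node accounting in that pay-for-bandwidth idea could look roughly like the sketch below. Everything here is an assumption made for illustration: the OnionMessageMeter class, the bytes_per_sat price, and the rule that a message is dropped when the remaining credit does not cover it. The original proposal does not specify any of these.

```python
class OnionMessageMeter:
    """Tracks prepaid onion-message bandwidth per identifier (hypothetical design)."""

    def __init__(self, bytes_per_sat):
        self.bytes_per_sat = bytes_per_sat
        self.credit = {}  # identifier -> bytes still allowed

    def on_payment(self, ident, sats):
        """Called when a payment drops off an identifier along with some sats."""
        self.credit[ident] = self.credit.get(ident, 0) + sats * self.bytes_per_sat

    def should_forward(self, ident, msg_len):
        """Forward only if the identifier has enough prepaid bytes left."""
        remaining = self.credit.get(ident, 0)
        if msg_len > remaining:
            return False
        self.credit[ident] = remaining - msg_len
        return True
```

Each hop keeps its own table, so no global state is needed: a sender who has not paid a given node simply has zero credit there.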
Ultimately a few implementations are pretty close to deploying their implementation of onion messages, so no matter the intended use case, it would be good to have code deployed alongside to either rate limit or price resource consumption accordingly. Otherwise, we might end up in a scenario where DoS concerns were brushed aside but became a significant issue later.
Blinded Paths, QR Codes & Invoices
Blinded paths are a newer proposal to solve the "last mile" privacy issue when receiving payments on LN. Today, invoices to unadvertised channels contain a set of hop hints anchored at public nodes in the graph and leak the scid of the unadvertised channel (which points on-chain to the channel receiving payments). A solution for the on-chain leak, SCID channel aliases, is being widely rolled out. Channel aliases instead use a random value in the invoice, allowing receiving nodes to break that on-chain link and rotate out the value periodically. With the on-chain leak addressed, it's still the case that you give away your "position" in the network since, as a sender, I know that you're connected to node N with a private channel.
Blinded paths address this node-level last-mile privacy leak by replacing hop hints with a new cryptographically blinded path. At a high level, the receiver can construct a "hop hint" of length greater than 1, gather the public keys of each node, and then blind them such that: the sender can use them for pathfinding but doesn't know exactly which nodes they are.
There're two types of blinded paths: those in onion messages and those used for actual payments. The latter variant was only formalized earlier this year, as before, people were mainly interested in using them for fetching BOLT12 invoices via onion messages. One issue that pops up when attempting to use blinded paths for regular payments is the size of the resulting invoice. As blinded paths are fragments of publicly known paths, as a receiver, you want to stuff as many of them into the invoice as possible since they MUST be taken to route towards you. Invoices are typically communicated via QR codes, which have a hard limit regarding the amount of information that can be packed in. On the other hand, for invoice fetching, all that matters is that a path exists, so you can get by with stuffing fewer of them in a QR code.
As a result, blinded paths aren't necessarily compatible with the widely deployed BOLT 11 based QR codes. Instead, a way to fetch invoices on demand is required. Both BOLT-12 and LNURL provide standardized methods for nodes to fetch invoices, though their transport/signalling medium of choice differs. Blinded routes are technically compatible with BOLT 11 invoices but may be hampered because you can only include so many routes.
Another consideration is that blinded paths require more maintenance than hop hints. Since they traverse public routes, policy changes like a fee update may invalidate an entire fixed set of routes. One proposed solution is that forwarding nodes should honor their older policy for some time (so a grace period) and also that blinded paths should have an explicit expiry (similar to the existing invoice expiry).
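A rough sketch of how a forwarding node might honor its previous policy during a grace period, alongside an expiry check for the path itself, is shown below. The FeePolicy and ForwardingNode types, the base-plus-proportional fee formula, and the single-previous-policy simplification are assumptions made for illustration and are not taken from the route blinding proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeePolicy:
    base_msat: int
    ppm: int

    def fee(self, amount_msat):
        """Base fee plus a proportional part in parts-per-million."""
        return self.base_msat + amount_msat * self.ppm // 1_000_000


class ForwardingNode:
    """Accepts the previous fee policy for grace_seconds after an update (hypothetical)."""

    def __init__(self, policy, grace_seconds):
        self.current = policy
        self.previous = None
        self.changed_at = None
        self.grace_seconds = grace_seconds

    def update_policy(self, new_policy, now):
        self.previous, self.current, self.changed_at = self.current, new_policy, now

    def accepts(self, amount_msat, offered_fee_msat, now):
        if offered_fee_msat >= self.current.fee(amount_msat):
            return True
        in_grace = (self.previous is not None
                    and now - self.changed_at <= self.grace_seconds)
        return in_grace and offered_fee_msat >= self.previous.fee(amount_msat)


def path_usable(path_expiry, now):
    """A blinded path with an explicit expiry is only usable before that time."""
    return now < path_expiry
```

With a setup like this, a receiver could pick a path expiry no longer than the grace period it expects from the nodes on the path, so that a fee bump mid-way does not silently strand outstanding invoices.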
One other implication is that the receiver's set of routes matters more: if they don't send enough or select them poorly, the sender may never be able to reach them even though a path exists in theory. More hands-on experience is needed so the spec authors can better guide implementations and wallets regarding best practices.
Friend-of-a-friend Balance Sharing & Probing
A presentation was given on friend-of-a-friend balance sharing. The high-level idea is that if we share some information within a local radius, this provides the sender with more details to choose a potentially more reliable path. The tradeoff here ofc is that nodes will be giving away more information that spies can potentially use to ascertain payment flows. To minimize the amount of information shared, the presenter proposed that just 2 bits of data be transferred. Some initial simulations showed that local information performed better than global information (?). Some were puzzled regarding how that's possible, but assuming the slides+methods are published, others can dig further into the model/parameters used to reach that conclusion.
Arguably, information like this is already available via probing, so one line of thinking is something like: "why not just share some of it", which may lead to fewer internal failures? This is related to a sort of tension between probing as a tool to increase payment reliability and probing as a tool to degrade privacy in the network. On the other hand, others argued that probes provide natural cover traffic since they are payments, though they may not be intended to succeed.
On channel probing, a makeshift protocol was devised to make probing harder in practice without sacrificing too much on the axis of payment reliability.
At a high level, it proposes that:
- nodes more diligently set both their max_htlc amount, as well as the max_htlc_value_in_flight amount
- a 50ms (or some other value) timer should be used when sending out commitment signatures, independent of HTLC arrival
- nodes leverage the max_htlc value to set a false ceiling on the max in-flight parameter
- for each HTLC sent/forwarded, select two other channels at random and reduce the "fake" in-flight ceiling for some time
Some more details still need to be worked out, but some felt that this would kick start more research into this area and make balance mapping slightly more complicated. From afar, it may be the case that achieving balance privacy while also achieving acceptable levels of payment reliability might be at odds with each other.
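The last two bullets of that makeshift protocol can be sketched as follows. This is only one possible reading of the idea: the FakeCeilings class, the fixed reduction_msat step, and the hold_seconds duration are invented parameters, since the discussion left those details open.

```python
import random


class FakeCeilings:
    """Temporarily lowers the advertised in-flight ceiling on random channels (hypothetical)."""

    def __init__(self, ceilings_msat, reduction_msat, hold_seconds, rng=None):
        self.base = dict(ceilings_msat)   # channel id -> normal "fake" ceiling
        self.reduction_msat = reduction_msat
        self.hold_seconds = hold_seconds
        self.rng = rng or random.Random()
        self.reductions = []              # (expires_at, channel id)

    def on_htlc(self, used_channel, now):
        """For each HTLC, pick up to two *other* channels and lower their ceiling."""
        others = [c for c in self.base if c != used_channel]
        for chan in self.rng.sample(others, min(2, len(others))):
            self.reductions.append((now + self.hold_seconds, chan))

    def ceiling(self, channel, now):
        """Current ceiling after applying all reductions that have not expired."""
        self.reductions = [(t, c) for t, c in self.reductions if t > now]
        active = sum(1 for _, c in self.reductions if c == channel)
        return max(0, self.base[channel] - active * self.reduction_msat)
```

The intent is that a prober who sees a ceiling drop cannot tell whether it reflects real traffic on that channel or noise triggered by an HTLC elsewhere on the node.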
Eltoo & ANYPREVOUT
One of the attendees is currently working on fully implementing eltoo and specifying the exact channel funding+update interaction if it were rolled out alongside the existing penalty-based channels in the protocol. As this version of eltoo is based on Taproot, we could compare notes to find the overlapping set of changes (nonce handling, etc.), which permits cross review of the proposals. This type of work is superb, as only by fully implementing something end to end can you work out all the edge cases and nuances.
ANYPREVOUT hasn't changed significantly as of late. However, an attendee shared plans to create a mega all-future-feasible-soft-forks fork of bitcoind that would package up various unmerged soft fork proposals (from bitcoind) into an easy to run+install binary/project attached to a signet. The hope is that giving developers an easy way to interact with proposed soft forks (vs rebasing some ancient pull requests) can facilitate broader participation in testing/implementation/review.
Trampoline Routing
There was a presentation on Trampoline routing explaining the proposal's motivation, history, and current state. The two prominent cases we've narrowed down on are:
- A mobile user doesn't necessarily want to sync the entire graph, so they can use Trampoline to maintain a subset and still be able to send payments.
- A mobile user wants to be able to initiate a payment, go offline, and return at a later time to learn about the final state of the payment.
Use case #2 seems to be the most promising when combined with other proposals for holding HTLCs at an origin node (call it an "LSP").
This would allow a mobile node to send a payment and then go offline. The LSP can retry the payment continuously, or only once it knows the receiver is online to accept it. This may dramatically improve the UX for LN on mobile, as things suddenly become a lot more asynchronous: I do something, go offline, and the LSP node can fulfil the payment in the background, then wait for me to come online to settle the final hop.
Trampoline also composes well with blinded routes (a blinded route from the last Trampoline to the receiver) and MPP (internal nodes can split the payment themselves using local information).
One added tradeoff is that since the sender doesn't know the entire route, they need to overshoot regarding fees and CLTVs. We've known this for a while, but until Trampoline is more widely rolled out, we won't have a very good feel regarding how much extra senders will need to allocate.
Node Fee Optimization & Fee Rate Cards
Over the past few years, a common thread we've seen across successful routing nodes is dynamic fee setting as a way to encourage/discourage traffic. Routing nodes can use the fees of a channel to either make it too expensive for other nodes to route through ("it's already depleted; don't try unless you'll give me 10m sats", which no one would) or very cheap, which incentivizes flows in the other direction. However, if all nodes are constantly sending out updates of this nature, it can generate a lot of gossip traffic and leak more balance information over time (which some nodes are already doing: using fees/max_htlc to communicate available balances).
One attendee proposed allowing nodes to express a sort of fee gradient via a static curve/bucket/function instead of dynamically communicating what the latest state of the fee+liquidity distribution looks like. A possible manifestation could be a series of buckets, each with its own fee rate. For example, if your payment consumes 50% of the channel balance, you pay this rate; otherwise, if it's 5%, you pay this rate, etc. This might allow nodes to capture the same dynamics as they do with more dynamic fee updates, but in a way that leaks less information and consumes less gossip bandwidth.
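A bucketed rate card of that kind could be as simple as the lookup below. The bucket boundaries, the ppm values, and the choice to measure the payment against channel capacity are all placeholders chosen for illustration, since no concrete format was proposed.

```python
# (largest share of the channel a payment may consume, in percent; fee rate in ppm)
# Hypothetical values: a static card like this would be advertised once.
RATE_CARD = [(5, 100), (25, 500), (50, 2_000), (100, 10_000)]


def fee_ppm(amount_sat, capacity_sat, card=RATE_CARD):
    """Return the fee rate of the first bucket that covers the payment's share."""
    for max_percent, ppm in card:
        # Integer comparison avoids floating point rounding at bucket edges.
        if amount_sat * 100 <= max_percent * capacity_sat:
            return ppm
    raise ValueError("amount exceeds channel capacity")


# A payment using 5% of a 1,000,000 sat channel vs. one using 50% of it.
small = fee_ppm(50_000, 1_000_000)
large = fee_ppm(500_000, 1_000_000)
```

Because the card is static, the node does not need to broadcast an update every time its liquidity shifts. The sender works out the applicable rate from the payment size alone.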
The Return of Splicing
Splicing is one of those things that was discussed a long time ago but was never really fully implemented and rolled out. A few attendees have started to look at the problem, building off of the interactive-tx scheme that the dual-funding protocol extension uses. The main intricacy discussed was if concurrent splices should be allowed or not, and if so, how we would handle the various edge cases. For example, if I propose a splice to add more funds via my input, but that turns out to already be spent, then the splicing transaction we created is invalid and can never be confirmed.
However, if we allow another splice to take place, and another one, and another one, then ideally, one of them will confirm and serve as the new anchor for the channel.
In a world of concurrent splices, the question of "what is my Lightning balance" becomes even murkier. Wallets and implementations will likely want to show the most pessimistic value while also ensuring that the user can effectively account for all their funds and what they can spend on/off-chain.
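One way to read "most pessimistic value" is to take the minimum over every outcome that can still happen. The sketch below assumes each pending splice spends the same funding output, so at most one of them can confirm, and it represents each splice as a signed change to the local balance. Both the function name and that representation are assumptions made for illustration.

```python
def pessimistic_balance(current_sat, pending_splice_deltas_sat):
    """Smallest local balance across all outcomes still possible.

    Outcomes considered: no pending splice confirms, or exactly one of them does.
    """
    outcomes = [current_sat]
    outcomes += [current_sat + delta for delta in pending_splice_deltas_sat]
    return min(outcomes)


# One splice-in of 500k sats and one splice-out of 200k sats are both pending.
shown_to_user = pessimistic_balance(1_000_000, [500_000, -200_000])
```

A wallet would still need to track the difference between this figure and the optimistic one somewhere, so the user can account for funds that are in limbo until one splice confirms.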
LNURL + BOLT12
LNURL and BOLT12 are both standardized ways that answer: how can I fetch an invoice from Bob? However, LNURL differs from BOLT12 in that it uses the existing BOLT 11 invoice format and uses an HTTP based protocol for the negotiation process. BOLT12, on the other hand, is a suite of protocol additions that includes (amongst other things) a new invoice format (yay TLV!) and a way to use onion messages to fetch an invoice via the network.
Assuming blinded paths are widely rolled out, how invoices are obtained becomes more critical as blinded paths mean that you can't fit much in the traditional QR encoding. As a result, fetching invoices on demand may become a more commonplace flow, with all its tradeoffs. There was a group discussion on how we could unify everything by allowing BOLT12 to be used over LNURL or the other way around.
One proposal was to add a new query parameter to the typical LNURL QR code contents. This would mean that when a wallet scans an LNURL QR code, if it knows of the extra param and what BOLT12 is, it can use the enclosed offer to fetch the invoice.
An alternative proposal was to extract the BOLT12 invoice format from the greater BOLT12 "Offers" proposal. Assuming blinded paths are only specified for BOLT12 invoices, this would mean an LNURL extension could be rolled out that allowed returning BOLT12 invoices rather than BOLT 11 invoices. This would enable the ecosystem to slowly transition to a shared invoice format, even if there may be fundamental disagreements regarding how wallets/nodes should fetch the invoices in the first place.
It's worth noting that we can combine both of these proposals:
- If a wallet knows how to handle BOLT12 Offers, it can take the enclosed offer and use it.
- If it doesn't know about Offers but can send with the BOLT12 invoice format, it can fetch that and complete the payment.
This might be an excellent middle ground, as it would push all wallets/implementations toward being able to decode and send with a BOLT12 invoice and leave the question of how the node/wallet should fetch it up to the application/wallet/service. In the end, if paths never quite intersect, then it's still possible to add route blinding to BOLT 11, with LNURL sticking with that invoice format to take advantage of the new privacy enhancements.
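The combined fallback logic a wallet might apply after scanning such a QR code can be sketched as below. The query parameter name "offer", the example URL, and the returned flow labels are placeholders, as no parameter name or wire format was agreed on. The input is assumed to be an already-decoded LNURL, i.e. a plain HTTPS URL.

```python
from urllib.parse import parse_qs, urlparse

OFFER_PARAM = "offer"  # hypothetical parameter name


def choose_flow(decoded_lnurl, supports_offers, supports_bolt12_invoices):
    """Pick how to fetch an invoice, preferring the most capable flow the wallet supports."""
    query = parse_qs(urlparse(decoded_lnurl).query)
    offer = query.get(OFFER_PARAM, [None])[0]
    if offer and supports_offers:
        # Wallet understands Offers: fetch the invoice over onion messages.
        return ("bolt12-offer", offer)
    if supports_bolt12_invoices:
        # Wallet only knows the invoice format: fetch it over HTTP via LNURL.
        return ("lnurl-bolt12-invoice", decoded_lnurl)
    # Legacy wallet: plain LNURL flow returning a BOLT 11 invoice.
    return ("lnurl-bolt11-invoice", decoded_lnurl)


url = "https://example.com/lnurlp/bob?offer=lno1exampleoffer"
flow, target = choose_flow(url, supports_offers=True, supports_bolt12_invoices=True)
```

Older wallets that ignore the extra parameter keep working unchanged, which is what makes the combination a gradual migration path rather than a flag day.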