You are absolutely right about what we feel intuitively: LSPs should beat the shit out of the competition. But surprisingly, they did not. Not across 10 different LSP servers, and not across 5 different levels of prompt complexity. Mind you, I painstakingly warmed up the LSP servers that needed warming up. Some liked it cold, and they fared equally unimpressively. The pattern I saw was that the LLM (sonnet w.6 with cc) was very clever at using whatever it had to get to a verifiable answer. It could do that with just bash, for sure. But as the prompt complexity grew, the cost also rose.
Treesitter is sitting in a sweet spot here. A brainy LLM can find the shortest path to a high-quality answer with treesitter and a few bash calls.
I hope someone with a large budget can reproduce these results with the latest Opus/GPT.
My gut feeling is that higher-reasoning models tend to use grep more effectively. But intuitively, LSP should still win there.