Vijay Daita

vdaita@stanford.edu

I recently graduated from UIUC, where I studied Computer Science and Economics, and will be starting my MS in Computer Science at Stanford in Autumn 2025. Currently, I'm interning at Exa AI, working on enhancing search quality for recent events and code tasks.
I'm broadly interested in agent-based assistants, software engineering, and ML systems. Over the past few years, I've worked on the following projects:

LLM Optimization

  • Optimized speculative decoding for code editing with researchers at UIUC, using HuggingFace Transformers and PyTorch, and integrated it into a custom GUI developed for VSCode (details); a minimal sketch of the technique appears after this list
  • Wrote and evaluated custom CUDA kernels for block-sparse attention and flash-decoding (vdaita/ece408-final-project)
  • Wrote an essay on a training-free approach to improving pooling methods when computing a coarse attention map for block-sparse attention; a block-selection sketch follows below
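
One concrete way to implement the speculative decoding above is HuggingFace Transformers' assisted generation, where a small draft model proposes tokens and the large target model verifies them in a single forward pass. A minimal sketch, with illustrative model names that are assumptions rather than the project's actual setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Target model: the large model whose output distribution we want.
tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-7b")
target = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-7b", torch_dtype=torch.float16, device_map="auto"
)
# Draft model: a smaller model from the same family proposes several
# tokens cheaply; the target verifies them and accepts the longest
# prefix it agrees with, so output quality matches the target alone.
draft = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-3b", torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to(target.device)
out = target.generate(
    **inputs,
    assistant_model=draft,  # enables assisted (speculative) decoding
    max_new_tokens=128,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```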
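
The pooling essay builds on a simple block-selection idea: pool each block of keys into one representative vector, score those against the query, and attend only to the top-scoring blocks. A minimal sketch, assuming mean pooling and illustrative shapes:

```python
import torch

def select_blocks(q: torch.Tensor, k: torch.Tensor,
                  block_size: int = 64, top_k: int = 8) -> torch.Tensor:
    """Pick the top-k key blocks for one query via a pooled coarse
    attention map. Mean pooling is just one choice of pooling method.
    Shapes: q is (d,), k is (seq_len, d)."""
    seq_len, d = k.shape
    n_blocks = seq_len // block_size
    # Pool each block of keys into a single representative vector.
    pooled = k[: n_blocks * block_size].view(n_blocks, block_size, d).mean(dim=1)
    # Coarse attention scores: one score per block instead of per token.
    scores = pooled @ q / d**0.5
    return scores.topk(min(top_k, n_blocks)).indices  # block ids to attend to

q = torch.randn(128)
k = torch.randn(4096, 128)
print(select_blocks(q, k))
```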

NLP for Information Retrieval and Analysis

  • Built an assistant for retrieving and summarizing papers from arXiv (vdaita/arxiv-assistant)
  • Research assistantship at the Gies College of Business
    • Automated quote extraction to provide qualitative evidence
    • Used named entity recognition and LLM-based few-shot classification to quantitatively analyze industry trends and company priorities
    • Used sentence and word embeddings to track patterns of nationalistic sentiment in company websites over time (a minimal embedding sketch follows this list)
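
A minimal sketch of the embedding approach, assuming the sentence-transformers library; the model choice, anchor phrase, and snapshots below are hypothetical illustrations, not the project's actual data:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical snapshots of a company's website text over time.
snapshots = {
    2018: "We proudly serve customers in our home country.",
    2021: "We serve a global community of customers and partners.",
}
# Hypothetical anchor phrase; similarity to it is tracked across years.
anchor = model.encode("nationalistic, home-country focused messaging")

for year, text in snapshots.items():
    sim = util.cos_sim(model.encode(text), anchor).item()
    print(f"{year}: similarity to anchor = {sim:.3f}")
```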

LLMs for Software Engineering

  • Developed a Next.js website that lets users chat with the documentation of a given library (vdaita/repohelper)
    • Created scrapers that used search results to pull content from any website (currently disabled due to cost)
  • Built a command-line utility with Typer that lets people modify their code with LLMs, expanding on Aider with multi-step reflection and exploration (vdaita/superdocs-python); a CLI skeleton appears after this list
  • Built a VSCode extension using React, and created a backend with serverless functions and Postgres to authenticate users and manage integrations with other services (vdaita/superdocs)
  • Worked on an evaluation of LLM understanding of code over long contexts (evalplus/repoqa)
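
A minimal Typer skeleton for an LLM edit command, in the spirit of the CLI bullet above; every command, option, and helper name here is a hypothetical illustration, not superdocs-python's actual interface:

```python
from pathlib import Path

import typer

app = typer.Typer(help="Edit code with an LLM from the command line.")

@app.command()
def edit(
    file: Path = typer.Argument(..., exists=True, help="File to modify"),
    instruction: str = typer.Option(..., "--instruction", "-i"),
    rounds: int = typer.Option(2, help="Reflection passes over the draft edit"),
):
    source = file.read_text()
    draft = propose_edit(source, instruction)  # hypothetical LLM call
    for _ in range(rounds):
        draft = reflect_on_edit(source, draft)  # hypothetical critique/revise step
    file.write_text(draft)
    typer.echo(f"Updated {file}")

def propose_edit(source: str, instruction: str) -> str:
    raise NotImplementedError("Wire up your LLM provider here.")

def reflect_on_edit(source: str, draft: str) -> str:
    raise NotImplementedError("Ask the model to critique and revise the draft.")

if __name__ == "__main__":
    app()
```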

Other

  • Worked as a Course Assistant for ECE408 (Applied Parallel Programming)
  • Worked as a Course Assistant for CS374 (Introduction to Algorithms and Models of Computation)
  • Worked as a full-stack developer at the Carle School of Medicine, integrating LLMs into a Next.js app to auto-generate quizzes for patients based on their medications
  • Founded LongLakeTech.com (sold in August 2024), offering frontend development and data science services

Posts

  • June 22, 2025 · Using First Token of a Block to Find Relevant KV Blocks
  • June 22, 2025 · Playing Around With Token Compression
  • February 15, 2025 · Blazing-Fast Code Editing via Multi-Layer Speculation
  • January 26, 2025 · Looking at Linearizing Large Language Models
  • January 4, 2025 · Selecting Blocks for Block-Sparse Attention
  • June 10, 2024 · RepoQA: Evaluating Long-Context Code Understanding