All Tags
Browse through all available tags to find articles on topics that interest you.
Showing 1 result for this tag.
Evaluating Long-Context Reasoning in LLM-Based WebAgents
This paper introduces a benchmark for evaluating the long-context reasoning capabilities of WebAgents through sequentially dependent subtasks that require retrieving and applying information from extended interaction histories. It reports that performance degrades sharply as context length increases and proposes an implicit RAG approach that yields modest improvements.