- MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications
  Paper • 2409.07314
- Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
  Paper • 2407.21072
- Named Clinical Entity Recognition Benchmark
  Paper • 2410.05046

MEDIC Benchmark
Explore LLM performance through benchmark evaluations.
Collections including paper arxiv:2409.07314

- MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities
  Paper • 2408.00765
- Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent
  Paper • 2407.21646
- LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection
  Paper • 2408.04284
- Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
  Paper • 2408.07852

- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364
- GLU Variants Improve Transformer
  Paper • 2002.05202
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173

- MedS^3: Towards Medical Small Language Models with Self-Evolved Slow Thinking
  Paper • 2501.12051
- Bridging Language Barriers in Healthcare: A Study on Arabic LLMs
  Paper • 2501.09825
- Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators
  Paper • 2501.09484
- BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
  Paper • 2501.07171