DeepSeek introduces FlashMLA to increase AI efficiency on Nvidia GPUs
FlashMLA uses a paged key-value cache with a block size of 64 for memory management.
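To make the idea of a paged key-value cache concrete, here is a minimal Python sketch of the bookkeeping such a cache might perform. It is an illustration only, not FlashMLA's code or API: the class PagedKVCache and its methods are hypothetical names, and the only detail taken from the text above is the block size of 64.

```python
from __future__ import annotations

BLOCK_SIZE = 64  # tokens per cache block, the figure quoted above


class PagedKVCache:
    """Toy block allocator. All names are illustrative, not FlashMLA's API."""

    def __init__(self, num_blocks: int) -> None:
        # Physical block ids that are not assigned to any sequence yet.
        self.free_blocks = list(range(num_blocks))
        # Sequence id -> ordered list of physical block ids (the "block table").
        self.block_tables: dict[int, list[int]] = {}
        # Sequence id -> number of tokens stored so far.
        self.seq_lens: dict[int, int] = {}

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve a slot for one new token; return (physical block, offset)."""
        length = self.seq_lens.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % BLOCK_SIZE == 0:
            # Either the first token or the last block is full: take a new block.
            if not self.free_blocks:
                raise MemoryError("no free cache blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // BLOCK_SIZE], length % BLOCK_SIZE

    def free(self, seq_id: int) -> None:
        """Return every block of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=4)
    for _ in range(65):
        cache.append_token(seq_id=0)
    # 65 tokens need two blocks: one full block of 64 plus one token in a second.
    print(len(cache.block_tables[0]))
```

In this sketch memory is handed out one 64-token block at a time as a sequence grows, so a sequence never reserves more than one partly filled block, and its blocks return to the pool as soon as it finishes.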