How we test performance

By Christian de Looper · Published June 19, 2026

The best phone processors can handle pretty much anything you throw at them, and for the most part, chips from Apple, Qualcomm, and MediaTek power the fastest phones. But there is more to how a phone performs than which chip it uses: phones differ in cooling capability, software optimization, background tasks, and more.

That’s why performance testing matters in the first place. If you’re looking for a phone for heavy mobile gaming or other ultra-demanding tasks, you may well want the best of the best, even if you’re choosing between two phones with the same underlying hardware.

Here’s how we test a phone’s performance.

What we measure

Performance comes down to a few things: how fast the processor (CPU) handles everyday tasks and heavy workloads, how the graphics chip (GPU) handles demanding 3D work, and how responsive the phone feels on the web. We measure each with a standard, repeatable benchmark so results are comparable from one phone to the next, and we pay as much attention to sustained behavior as to peak scores — because a phone that posts a huge number and then throttles hard is a different experience from one that holds steady.


Synthetic benchmarks are the right tool here precisely because they're repeatable. Real apps and games vary too much from run to run and version to version to compare fairly; a fixed benchmark puts every phone through an identical workload under the same conditions.

Thermal state changes everything in performance testing. A chip that's already warm from a previous task will throttle sooner and score lower, so a result is only meaningful if the starting conditions are controlled. We let each phone cool down and settle to a room-temperature baseline before testing, so the peak figures reflect what the chip can actually do from a clean start rather than how much heat it happened to be carrying.

CPU

We measure CPU performance with Geekbench 6, which reports two numbers. The single-core score reflects everyday responsiveness: opening apps, loading screens, and the countless quick tasks that rely on one core doing something fast. It's the figure most closely tied to how snappy a phone feels moment to moment. The multi-core score reflects heavy, parallel workloads: demanding apps, multitasking, and anything that can spread across all the chip's cores at once.

Browser

Because so much of what people do happens on the web, we measure web responsiveness separately using Speedometer 3.1, a benchmark that simulates the work of real web apps by building and updating page elements the way an active site does. We run it several times and average the results. It's a good proxy for how quick the phone feels when browsing, scrolling, and loading pages, which are among the most common things people do on a phone. Crucially, this is one score that can vary dramatically even between phones with the same chip.
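As a rough illustration of averaging repeated runs, here is a minimal sketch. The function name and the sample scores are hypothetical, and reporting the run-to-run spread alongside the mean is our own assumption about what is useful, not something the test procedure above specifies.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Average repeated benchmark runs and report the run-to-run spread."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to average")
    return {
        "runs": len(scores),
        "mean": mean(scores),
        "stdev": stdev(scores),  # sample standard deviation across runs
    }


# Hypothetical scores from three runs on one phone (not measured results).
runs = [20.0, 21.0, 22.0]
summary = summarize_runs(runs)
print(summary)
```

A small spread suggests the average is a fair summary of the phone; a large one is a hint that something, such as heat or background activity, changed between runs.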

GPU

Graphics is where sustained behavior matters most, so we test it under prolonged load rather than a single quick pass. The main test runs 3DMark Wild Life Extreme over 20 consecutive loops, which pushes the graphics chip hard and steadily heats the phone up. From that run we take the peak performance, then watch how far it falls as the phone warms and throttles — the gap between the two is captured as a stability figure, showing how much of its peak the phone can actually hold under continued load. We also record the temperatures it reaches and the range of frame rates across the run, since a phone that runs hot and swings between high and low frame rates feels less consistent than one that holds a steady rate. A separate test, 3DMark Solar Bay, measures ray tracing — a more advanced lighting workload that newer games are starting to lean on.
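The stability figure can be sketched in a few lines. This is a minimal illustration, assuming stability is the lowest loop score expressed as a percentage of the best loop score, which is how 3DMark's stress tests report it. The function name and the loop scores are made up for the example and are not measured results.

```python
def stability_percent(loop_scores):
    """Return the lowest loop score as a percentage of the best loop score."""
    if not loop_scores:
        raise ValueError("need at least one loop score")
    best = max(loop_scores)
    worst = min(loop_scores)
    if best <= 0:
        raise ValueError("loop scores must be positive")
    return 100.0 * worst / best


# Hypothetical 20-loop run: the score drops by 100 points each loop over
# the first ten loops, then holds at 4000 once the phone has throttled.
loops = [5000 - 100 * i for i in range(10)] + [4000] * 10

print(f"peak score: {max(loops)}")
print(f"stability: {stability_percent(loops):.1f}%")
```

In this made-up run the phone holds 80% of its peak, so a phone with a lower peak but a stability near 100% could end up delivering more performance over a long session.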


Capturing how a phone sustains its performance is a big part of our overall score. A phone that peaks high and then throttles heavily is likely to score lower than one with a lower peak that keeps its performance consistent.

On-device AI

On-device AI (the acceleration behind things like photo processing, voice transcription, and assistant features) is something we're actively weighing, but it isn't part of our performance score today. The reason is comparability: the available AI benchmarks rely on different frameworks and accelerators on each platform, so their numbers don't line up cleanly between an iPhone and an Android phone, or even between Android phones with different underlying hardware. Until on-device AI can be measured in a way that's genuinely comparable across platforms, we'd rather leave it out than publish a figure that isn't apples-to-apples, and we're still working out whether it warrants a test of its own at all.

What the performance score reflects

The performance score is a roll-up of two things: processor performance, which includes everyday responsiveness, heavy multi-core workloads, and web responsiveness; and graphics performance, measured under sustained load. Processor performance carries the most weight because it underpins most of what a phone does day to day, while graphics counts for less; it matters most to people who push demanding games. Throughout, sustained behavior is built into the result rather than treated separately, so a phone is credited for the performance it holds onto, not just the peak it can flash. A high performance score means a phone that is fast across processor and graphics work and holds that speed under load, not one that posts a big number and then fades.
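To make the idea of a weighted roll-up concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the 60/40 weights, the 0-100 sub-score scale, and the function name are hypothetical, since the text above only says that processor performance counts for more than graphics.

```python
# Hypothetical weights: the only stated constraint is that processor
# performance carries more weight than graphics.
WEIGHTS = {"processor": 0.6, "graphics": 0.4}


def performance_score(processor, graphics, weights=WEIGHTS):
    """Weighted average of two sub-scores, each assumed to be on a 0-100 scale.

    The graphics sub-score is assumed to already reflect sustained
    behavior, not just the peak result.
    """
    total_weight = weights["processor"] + weights["graphics"]
    weighted_sum = weights["processor"] * processor + weights["graphics"] * graphics
    return weighted_sum / total_weight


# Made-up sub-scores for two phones (not measured results).
phone_a = performance_score(processor=90, graphics=60)
phone_b = performance_score(processor=80, graphics=85)
print(f"phone A: {phone_a:.1f}, phone B: {phone_b:.1f}")
```

With these made-up numbers, phone B ends up ahead despite its lower processor sub-score, because it gives up less on the graphics side than phone A does.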

FAQ

Do benchmark scores actually matter for how a phone feels day to day?

For most people, no — every current flagship chip is fast enough that you won't notice a difference opening apps or scrolling. Benchmarks earn their keep if you push a phone hard with heavy gaming or long exports, where thermals and sustained performance start to separate devices that look identical on paper.

If two phones use the same chip, will they perform the same?

Not necessarily. Cooling design, software optimization, and background tasks all vary, so two phones with identical silicon can land in different places. Web responsiveness is the clearest example — it can differ a lot even on the same processor.

What's the difference between peak and sustained performance, and why do you weight the latter so heavily?

Peak is the best a phone can do from a cold start. Sustained is what it holds onto once it heats up and throttles. A phone that posts a huge peak and then drops hard can feel worse in a long session than one with a lower peak that stays steady, which is why sustained behavior is built into the score rather than reported as a separate footnote.

Why isn't on-device AI part of the performance score?

The AI benchmarks available right now rely on different frameworks and accelerators across iPhone and Android, so the numbers don't line up cleanly between platforms. Until it can be measured in a genuinely comparable way, we'd rather leave it out than publish a figure that isn't apples-to-apples.
