The following information and simple tool were written to troubleshoot and fix an HCL Notes Client issue I came across. For multiple weeks I have been hunting a performance problem on my new Thinkpad T41 with a modern CPU.
It turned out that modern CPUs with a combination of E-Cores and P-Cores can cause weird performance problems. Windows "tries" to schedule processes (and actually threads) on the cores that best fit the application's needs. But this might not always work out when an application has a mixed workload.
The HCL Notes Client Team & Product Management are aware of this very specific problem and are actively looking into a solution. This might not hit you depending on how many P-Cores you have and the work-load on your machine. But it is good to be aware that this can happen -- not only with Notes but also other applications.
The writeup also describes the background of the problem and how applications can work more smoothly with hybrid CPU architectures. My tool is currently the recommended way to troubleshoot and work around the problem in case you have issues.
Other applications like VMware workstation also had to make adjustments for running with CPUs which have a mix of E-Cores and P-Cores.
In my specific case, the rendering performance of my Notes Client dropped dramatically during Sametime calls and other somewhat more demanding operations. You could really watch the client painting windows in slow motion.
Windows 11 provides new APIs that let an application tell Windows that one of its threads is better suited for a P-Core. This functionality would need to be built into the application.
HCL Notes and other applications don't use those new API calls. To be frank, I had not heard about those APIs before I started the research. I did know about hybrid CPUs with P-Cores and E-Cores and had also heard about issues with VMware Workstation from a friend. But I never thought that wrong scheduling on the Windows side could slow down an application like this. I did some research to find a temporary work-around for my own environment after looking into all other settings and driver updates, including the GPU driver and its settings.
IMHO Microsoft should have communicated this more publicly to make developers aware -- like Apple does when they introduce new functionality. In this case it is a combination of Intel's new CPUs and how Microsoft handles them.
Setting the P-Core affinity for my Notes Basic Client really helped. And the same should work for the Standard Client. Below are the details behind the issue and the available APIs.
Modern CPUs (Intel hybrid, increasingly others) combine Performance cores (P-cores) and Efficiency cores (E-cores). Windows uses a sophisticated scheduler (with help from Intel Thread Director on supported systems) to decide where threads run.
The key point:
Applications cannot directly choose cores, but they can strongly influence scheduling behavior.
Applications should explicitly signal that their work is performance-sensitive:
THREAD_POWER_THROTTLING_STATE state = {0};
state.Version = THREAD_POWER_THROTTLING_CURRENT_VERSION;
state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
state.StateMask = 0; // disable throttling → prefer performance
SetThreadInformation(
    GetCurrentThread(),
    ThreadPowerThrottling,
    &state,
    sizeof(state)
);
- Disables EcoQoS behavior
- Signals the scheduler to prefer P-cores
- Works in conjunction with Windows scheduling and hardware feedback
Relevant APIs to explore:
- SetThreadInformation
- THREAD_POWER_THROTTLING_STATE
- ThreadPowerThrottling
If an application needs more control, it can use CPU Sets.
- Query CPU topology: GetSystemCpuSetInformation
- Select CPUs:
  - Typically the highest EfficiencyClass → P-cores
  - Typically lower EfficiencyClass values → E-cores
- Assign: SetThreadSelectedCpuSets or SetProcessDefaultCpuSets
- More flexible than affinity masks
- Forward-compatible with future CPU designs
- Can still allow controlled fallback
Relevant APIs:
- GetSystemCpuSetInformation
- SetThreadSelectedCpuSets
- SetProcessDefaultCpuSets
SetThreadPriority(...);
SetPriorityClass(...);
Important:
- Affects when a thread runs
- Does not control which core type is used
Still useful in combination with QoS hints.
Relevant APIs:
- SetThreadPriority
- SetPriorityClass
SetThreadAffinityMask(...);
- Guarantees placement
- But:
  - brittle across hardware
  - bypasses scheduler intelligence
  - can reduce overall performance
Relevant API:
SetThreadAffinityMask
For most applications:
- Disable power throttling (QoS hint)
- Use normal or slightly elevated priority
- Let Windows + Intel Thread Director optimize execution
This provides the best balance of:
- performance
- efficiency
- portability
What QoS hints and priorities cannot do:
- Force “P-core only” execution without affinity or CPU sets
- Control scheduling decisions completely
- Override system-wide power or thermal policies
In real-world environments (admins, power users, performance tuning), stronger control is often required.
Example:
$p = Get-Process myapp
$p.ProcessorAffinity = 0x00F
Or when starting the process:
cmd /c "start /affinity F myapp.exe"
The key challenge here is to find out which cores are the P-Cores and what affinity mask this results in. It's actually not that easy to find out which cores are the P-Cores. There are external tools to help. But you can also use the Windows API, as shown in the small tool below.
- Hard restriction to selected CPUs (e.g., P-cores)
- Deterministic behavior
- Requires mapping logical CPUs to P-cores
- Not portable across systems
- Disables scheduler flexibility
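The mapping from P-cores to an affinity mask can be sketched in plain C. This is only an illustration and not code from the tool: the function name `pcore_affinity_mask` and the input array are assumptions, with the array standing in for the per-CPU EfficiencyClass values that GetSystemCpuSetInformation would report on a real system. The sketch also assumes at most 64 logical CPUs in a single processor group and that the highest EfficiencyClass marks the P-cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: build an affinity mask with one bit per logical CPU
   whose efficiency class equals the highest class found.
   eff_class[i] is the EfficiencyClass of logical CPU i (illustrative input). */
uint64_t pcore_affinity_mask(const unsigned char *eff_class, size_t count)
{
    assert(count <= 64);  /* a single affinity mask covers at most 64 CPUs */

    /* First pass: find the highest efficiency class present. */
    unsigned char max_class = 0;
    for (size_t i = 0; i < count; i++) {
        if (eff_class[i] > max_class)
            max_class = eff_class[i];
    }

    /* Second pass: set bit i for every CPU in that class. */
    uint64_t mask = 0;
    for (size_t i = 0; i < count; i++) {
        if (eff_class[i] == max_class)
            mask |= (uint64_t)1 << i;
    }
    return mask;
}
```

For a 12-CPU layout where CPUs 0 to 3 report class 1 and CPUs 4 to 11 report class 0, the function returns 0xF, the same value as the 0x00F mask in the PowerShell example above. On a non-hybrid system where all classes are equal, the mask simply covers every CPU.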
$p.PriorityClass = "AboveNormal"
Important:
- Improves scheduling responsiveness
- Does not prevent E-core usage
Affinity (P-cores) plus Above Normal priority
This effectively:
- Forces execution on P-cores
- Ensures good scheduling responsiveness
This is a practical and widely used workaround, especially when applications are not QoS-aware.
In theory:
- Enumerate threads
- Call SetThreadInformation on each
In practice:
- Threads are short-lived
- Requires continuous monitoring
- Race conditions and access issues
- Not reliable or scalable
This approach is generally not recommended.
For application developers:
- Use Thread Power Throttling (QoS) first
- Use CPU Sets if deterministic behavior is required
- Avoid hard affinity unless necessary
For operators and power users:
- Use affinity plus priority for strong control
- Accept trade-offs in flexibility and portability
Windows scheduling on hybrid CPUs is:
- policy-driven, not strictly controlled
- optimized using runtime feedback and hardware signals
Trying to override the scheduler usually leads to fragile solutions and inconsistent results.
The best results come from working with the scheduler (QoS). The most deterministic results come from overriding it (affinity or CPU sets).
The following simple tool, nshcpuset, demonstrates a practical external approach to steer selected processes toward P-cores using modern Windows APIs. The better approach would be for the application itself to tell Windows which threads need the faster P-Cores -- for example, threads handling UI operations.
This utility:
- Detects CPU topology using CPU Sets
- Identifies P-cores based on EfficiencyClass
- Scans running processes and optionally filters them
- Applies:
  - P-core CPU Set restriction
  - Above Normal process priority
This provides deterministic behavior without modifying the target application.
The tool queries Windows for CPU set information:
- Uses GetSystemCpuSetInformation
- Builds an internal list of CPUs:
  - Logical processor index
  - CPU Set ID
  - Efficiency class
It then:
- Determines the maximum EfficiencyClass
- Treats the CPUs with that value as P-cores
This avoids hardcoding CPU layouts and works across different hybrid architectures.
Each CPU is classified as:
- P-core → highest EfficiencyClass
- E-core → lower EfficiencyClass
This is the same mechanism Windows uses internally for scheduling decisions.
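The selection step can be sketched in portable C as follows. This is a minimal sketch under stated assumptions, not the tool's actual source: the `cpu_entry` struct and the `select_pcore_ids` function are hypothetical names, and the struct only mirrors the three values the tool is described as collecting. On Windows, the entries would be filled from GetSystemCpuSetInformation and the ID type would be ULONG.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified record holding the three values collected
   for each CPU (illustrative only). */
typedef struct {
    unsigned int  logical_index;    /* logical processor index */
    unsigned long cpu_set_id;       /* CPU Set ID reported by Windows */
    unsigned char efficiency_class; /* higher value = faster core */
} cpu_entry;

/* Copy the CPU Set IDs of all entries with the highest efficiency class
   into ids[] and return how many were written (never more than max_ids). */
size_t select_pcore_ids(const cpu_entry *cpus, size_t count,
                        unsigned long *ids, size_t max_ids)
{
    assert(cpus != NULL || count == 0);

    /* Find the highest efficiency class instead of hardcoding a layout. */
    unsigned char max_class = 0;
    for (size_t i = 0; i < count; i++) {
        if (cpus[i].efficiency_class > max_class)
            max_class = cpus[i].efficiency_class;
    }

    /* Collect the IDs of every CPU in that class. */
    size_t n = 0;
    for (size_t i = 0; i < count && n < max_ids; i++) {
        if (cpus[i].efficiency_class == max_class)
            ids[n++] = cpus[i].cpu_set_id;
    }
    return n;
}
```

The resulting array and count have the shape that a call such as SetProcessDefaultCpuSets expects: a list of CPU Set IDs plus the number of entries. Because only the maximum is compared, a system with a single efficiency class ends up with all CPUs selected.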
The tool supports two modes:
- Enumerates all running processes (CreateToolhelp32Snapshot)
- Retrieves executable path (QueryFullProcessImageName)
- Extracts:
  - Company name (via version info)
  - Code signing subject (via certificate APIs)
This allows identifying processes from specific vendors (e.g., HCL in our case).
- Matches processes by name or substring
- Applies P-core mapping to matching processes
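Matching by name or substring could look roughly like the sketch below. The function names are hypothetical. Two behaviors are assumptions rather than documented features of the tool: comparing case-insensitively, and treating the filter as a comma-separated list, which is inferred from the `-set notes,java` usage shown later.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does name contain the first len bytes of pat? */
static bool contains_nocase(const char *name, const char *pat, size_t len)
{
    size_t name_len = strlen(name);
    if (len == 0 || len > name_len)
        return false;
    for (size_t i = 0; i + len <= name_len; i++) {
        size_t j = 0;
        while (j < len &&
               tolower((unsigned char)name[i + j]) ==
               tolower((unsigned char)pat[j]))
            j++;
        if (j == len)
            return true;
    }
    return false;
}

/* Hypothetical matcher: filter is a comma-separated list such as
   "notes,java"; returns true if any entry occurs in exe_name. */
bool process_matches(const char *exe_name, const char *filter)
{
    assert(exe_name != NULL && filter != NULL);
    const char *p = filter;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (contains_nocase(exe_name, p, len))
            return true;
        if (comma == NULL)
            break;
        p = comma + 1;  /* continue with the next list entry */
    }
    return false;
}
```

With the filter "notes,java", both nlnotes.exe and javaw.exe match, while explorer.exe does not. Empty list entries are ignored rather than matching everything.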
For each matching process:
OpenProcess(PROCESS_SET_INFORMATION | PROCESS_QUERY_LIMITED_INFORMATION, ...)
SetPriorityClass(hProc, ABOVE_NORMAL_PRIORITY_CLASS);
This improves scheduling responsiveness but does not control core selection by itself.
SetProcessDefaultCpuSets(hProc, ids, count);
Where:
- ids[] contains only CPU Set IDs for P-cores
This effectively constrains the process to P-cores while still using the modern CPU Sets mechanism (instead of legacy affinity masks).
The tool intentionally uses CPU Sets instead of SetProcessAffinityMask because:
- CPU Sets are topology-aware
- They are forward-compatible with future CPU designs
- They integrate better with the Windows scheduler
This is the recommended modern alternative to affinity masks.
- The process is restricted to P-cores at the scheduler level
- New threads automatically inherit the process CPU Set restriction
- No need to track or modify individual threads
However:
- CPU classification depends on EfficiencyClass
- Behavior may vary across vendors and future CPU designs
- Scheduler decisions (e.g., load balancing) still apply within the selected CPU set
This approach is useful when:
- Applications are not QoS-aware
- You need deterministic performance behavior
- You cannot modify the application itself
Typical usage:
nshcpuset.exe -set domino
nshcpuset.exe -set notes,java
Example Output:
nshcpuset.exe -set nlnotes.exe
CPU layout:
----------------------------------------
CPU 0 -> EfficiencyClass=1 (P-core)
CPU 1 -> EfficiencyClass=1 (P-core)
CPU 2 -> EfficiencyClass=1 (P-core)
CPU 3 -> EfficiencyClass=1 (P-core)
CPU 4 -> EfficiencyClass=0 (E-core)
CPU 5 -> EfficiencyClass=0 (E-core)
CPU 6 -> EfficiencyClass=0 (E-core)
CPU 7 -> EfficiencyClass=0 (E-core)
CPU 8 -> EfficiencyClass=0 (E-core)
CPU 9 -> EfficiencyClass=0 (E-core)
CPU 10 -> EfficiencyClass=0 (E-core)
CPU 11 -> EfficiencyClass=0 (E-core)
P-Cores : 4
E-Cores : 8
Applying to nlnotes.exe PID=24404
Adjust priority for PID=24404
Applied P-core preference to PID=24404 (4 cores)
For readers who want to explore further:
- GetSystemCpuSetInformation
- SetProcessDefaultCpuSets
- SetPriorityClass
- CreateToolhelp32Snapshot
- Process32First / Process32Next
- QueryFullProcessImageName
- GetFileVersionInfo / VerQueryValue
- CryptQueryObject / CertFindCertificateInStore
This simple tool represents a practical “Plan B”:
- Instead of relying on applications to provide QoS hints
- It enforces P-core usage externally using CPU Sets
It complements the recommended in-application approach and provides a reliable fallback when applications are not optimized for hybrid CPU scheduling.