pfSense: AES-NI Hardware Crypto Acceleration in KVM

Note: This article is more than a year old. Its content may no longer be up to date!

As I mentioned before, I run pfSense as a firewall and router in a KVM guest. I wanted to connect the place where I live with my grandparents' place over a site-to-site VPN using OpenVPN. For this purpose I bought a PcEngines APU.1D4. A test over my local gigabit LAN was too slow for HD streaming, but the APU.1D4 was not the bottleneck.

No delegation of hardware features

I found the bottleneck on my KVM host system. The CPU configured for the pfSense machine was the generic QEMU CPU. I use this setting for most machines to reduce the overhead of emulating a "real" CPU. But the QEMU CPU is not capable of passing hardware features such as AES-NI and newer SSE extensions through to the guest.

I changed the settings to emulate a "Sandy Bridge" processor, which the guest afterwards recognizes as an E3 processor.
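If the guest is managed through libvirt (an assumption, the article does not say which management layer was used), the CPU model is set in the `<cpu>` element of the domain XML. The following Python sketch shows what that change could look like on a made-up, heavily shortened domain definition. The helper name `set_cpu_model` and the sample XML are purely illustrative.

```python
import xml.etree.ElementTree as ET


def set_cpu_model(domain_xml: str, model: str) -> str:
    """Return the domain XML with a libvirt-style custom CPU model set."""
    root = ET.fromstring(domain_xml)
    # Drop any existing <cpu> element so the result contains exactly one.
    for old in root.findall("cpu"):
        root.remove(old)
    cpu = ET.SubElement(root, "cpu", {"mode": "custom", "match": "exact"})
    model_el = ET.SubElement(cpu, "model", {"fallback": "allow"})
    model_el.text = model
    return ET.tostring(root, encoding="unicode")


# Hypothetical, shortened domain definition used only for illustration.
SAMPLE_DOMAIN = "<domain type='kvm'><name>pfsense</name></domain>"

new_xml = set_cpu_model(SAMPLE_DOMAIN, "SandyBridge")
print(new_xml)
```

When QEMU is started by hand instead, the corresponding option should be `-cpu SandyBridge`.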

After a reboot, _dmesg_ reported the following:

CPU: Intel Xeon E312xx (Sandy Bridge) (3100.06-MHz K8-class CPU)
 Origin = "GenuineIntel" Id = 0x206a1 Family = 0x6 Model = 0x2a Stepping = 1
 Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
 Features2=0x9fb82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
 AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
 AMD Features2=0x1<LAHF>
 TSC: P-state invariant
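To check such output for a specific flag without reading it by eye, the feature lists can be parsed. This is a minimal Python sketch that works on a copy of the lines above. The function name `cpu_features` is made up for this example, and the script is meant to run on any machine with Python, not necessarily on the firewall itself.

```python
import re

# Feature lines copied from the dmesg output above.
DMESG_EXCERPT = """\
 Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
 Features2=0x9fb82203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
 AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
 AMD Features2=0x1<LAHF>
"""


def cpu_features(dmesg_text):
    """Collect every flag listed between <...> on the 'Features' lines."""
    flags = set()
    for match in re.finditer(r"Features\d*=0x[0-9a-fA-F]+<([^>]*)>", dmesg_text):
        flags.update(match.group(1).split(","))
    return flags


flags = cpu_features(DMESG_EXCERPT)
for wanted in ("AESNI", "PCLMULQDQ", "SSE4.2"):
    print(wanted, "present" if wanted in flags else "missing")
```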

After the switch, nearly all CPU features of the host system are available in the FreeBSD-based guest, including the required AES-NI feature. You can check with OpenSSL whether the AES-NI feature can be used:

[2.2.6-RELEASE][admin@firewall.veloc1ty.lan]/root: openssl engine -t -c
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
 [ available ]
(rsax) RSAX engine support
 [RSA]
 [ available ]
(dynamic) Dynamic engine loading support
 [ unavailable ]
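The listing shows that the cryptodev engine is available and offers AES-128-CBC, the cipher used for the VPN later on. As a sketch of how this check could be automated, the following Python snippet parses a copy of the output above. The function name `parse_engines` and the result layout are assumptions made for this example.

```python
import re

# Output of "openssl engine -t -c" as shown above.
ENGINE_OUTPUT = """\
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, AES-128-CBC, AES-192-CBC, AES-256-CBC]
 [ available ]
(rsax) RSAX engine support
 [RSA]
 [ available ]
(dynamic) Dynamic engine loading support
 [ unavailable ]
"""


def parse_engines(output):
    """Map each engine id to its algorithm list and availability flag."""
    engines = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"\((\w+)\)", line)
        bracket = re.match(r"\[(.*)\]$", line)
        if header:
            current = header.group(1)
            engines[current] = {"algorithms": [], "available": False}
        elif bracket and current is not None:
            content = bracket.group(1).strip()
            if content in ("available", "unavailable"):
                engines[current]["available"] = content == "available"
            else:
                engines[current]["algorithms"] = [a.strip() for a in content.split(",")]
    return engines


engines = parse_engines(ENGINE_OUTPUT)
cryptodev = engines["cryptodev"]
print("cryptodev usable for AES-128-CBC:",
      cryptodev["available"] and "AES-128-CBC" in cryptodev["algorithms"])
```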

The kernel module cryptodev (at least it's called a kernel module in the BSD environment) can be used by OpenSSL. But before we can run another speed test, we have to tell pfSense to use the hardware features.

pfSense settings

You may have to tell pfSense explicitly to use hardware acceleration. At least in my setup the change was not recognised automatically. In the web GUI, go to System -> Advanced -> Miscellaneous, select the option "AES-NI CPU-based Acceleration (aesni)" in the category "Cryptographic Hardware Acceleration", and save the changes.

[Screenshot: pfSense AES-NI hardware acceleration setting]

After a reboot, every service that uses cryptographic functions should use AES-NI.

Speed test

I ran the subsequent speed test with AES-128-CBC, because that's what I wanted to use for my VPN. The results are pretty awesome:

Without EVP API:

[2.2.6-RELEASE][admin@firewall.veloc1ty.lan]/root: openssl speed aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 24173237 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 6624025 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1691361 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 902516 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 115047 aes-128 cbc's in 3.01s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128 cbc         129260.55k    141312.53k    144706.31k    308863.13k    313339.02k
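The figures in the summary table follow from the "Doing ..." lines: block size times number of operations, divided by the measured time, in units of 1000 bytes per second. The following Python sketch recomputes them from a copy of the output above. The function name `throughput_k` is made up for this example, and small deviations from the table are expected because the printed times are rounded to two decimals.

```python
import re

# "Doing ..." lines copied from the run without the EVP API.
SPEED_LINES = """\
Doing aes-128 cbc for 3s on 16 size blocks: 24173237 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 64 size blocks: 6624025 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1691361 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 1024 size blocks: 902516 aes-128 cbc's in 2.99s
Doing aes-128 cbc for 3s on 8192 size blocks: 115047 aes-128 cbc's in 3.01s
"""

LINE_RE = re.compile(r"on (\d+) size blocks: (\d+) .* in ([\d.]+)s")


def throughput_k(text):
    """Return {block size: throughput in 1000s of bytes per second}."""
    result = {}
    for match in LINE_RE.finditer(text):
        size = int(match.group(1))
        count = int(match.group(2))
        seconds = float(match.group(3))
        result[size] = size * count / seconds / 1000
    return result


rates = throughput_k(SPEED_LINES)
for size, rate in sorted(rates.items()):
    print(f"{size:>5} bytes: {rate:.2f}k")
```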

With EVP API:

[2.2.6-RELEASE][admin@firewall.veloc1ty.lan]/root: openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 1850387 aes-128-cbc's in 0.27s
Doing aes-128-cbc for 3s on 64 size blocks: 1769459 aes-128-cbc's in 0.38s
Doing aes-128-cbc for 3s on 256 size blocks: 1523368 aes-128-cbc's in 0.32s
Doing aes-128-cbc for 3s on 1024 size blocks: 975349 aes-128-cbc's in 0.20s
Doing aes-128-cbc for 3s on 8192 size blocks: 225272 aes-128-cbc's in 0.06s
OpenSSL 1.0.1l-freebsd 15 Jan 2015
built on: date not available
options:bn(64,64) rc4(16x,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx)
compiler: clang
The 'numbers' are in 1000s of bytes per second processed.
type                  16 bytes      64 bytes     256 bytes    1024 bytes    8192 bytes
aes-128-cbc         111458.61k    295824.66k   1217505.43k   5113637.77k  29526851.58k
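The EVP run reports times between 0.06 s and 0.38 s, although every test is announced as running "for 3s". A likely explanation (an assumption, not verified on this system) is that `openssl speed` counts only the user CPU time of its own process by default, while work handed to the kernel through cryptodev is not charged to it. The sketch below therefore compares the throughput based on the printed time with a more conservative estimate based on the nominal 3 seconds. The names `EVP_RUNS` and `rate_k` are made up for this example.

```python
# (block size in bytes, operations completed, reported seconds) from the EVP run above.
EVP_RUNS = [
    (16, 1850387, 0.27),
    (64, 1769459, 0.38),
    (256, 1523368, 0.32),
    (1024, 975349, 0.20),
    (8192, 225272, 0.06),
]

NOMINAL_SECONDS = 3.0  # each test is announced as "for 3s"


def rate_k(size, count, seconds):
    """Throughput in 1000s of bytes per second."""
    return size * count / seconds / 1000


for size, count, reported in EVP_RUNS:
    as_printed = rate_k(size, count, reported)
    wall_clock = rate_k(size, count, NOMINAL_SECONDS)
    print(f"{size:>5} bytes: {as_printed:>14.2f}k by reported time, "
          f"{wall_clock:>12.2f}k over 3 s")
```

Under that assumption the 8192-byte result is still roughly twice the value measured without the EVP API, while the 16-byte result falls well below it. Running `openssl speed -elapsed -evp aes-128-cbc` should measure wall-clock time directly.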

With that, the bottleneck was gone and speeds of up to 120 Mbit/s were possible.


Update: 2015-06-09

The dashboard of the web GUI now shows the available hardware crypto features:

[Screenshot: hardware crypto features shown on the pfSense dashboard]

