Wednesday, 21 November 2012

A vectorised bitwise-OR function for kdb+

A long time ago I started writing some shared-library functions for kdb+ which would offer bitwise comparison of integer vector types using SSE vector registers. I came up with something which worked well enough but wanted to know if it could be made to go any faster. I wondered in particular about Intel's prefetch instructions and needed a way of verifying whether they made any difference.
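The excerpt doesn't show the function itself, so here is a minimal sketch of the kind of inner loop involved. It is written in plain C over raw `int32_t` arrays rather than against the kdb+ K-object interface. It ORs two arrays four lanes at a time with the SSE2 intrinsic `_mm_or_si128` and issues `_mm_prefetch` hints a fixed distance ahead. The name `or_i32`, the prefetch distance and the 64-byte cache line size are all assumptions to be tuned, which is exactly the sort of question the performance counters can answer.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 integer intrinsics; also provides _mm_prefetch */
#endif

/* Hypothetical tuning knob: how far ahead to prefetch, in int32 elements
   (64 elements = 256 bytes = 4 cache lines, assuming 64-byte lines). */
#define PREFETCH_AHEAD 64

/* Hypothetical helper: dst[i] = a[i] | b[i] for i in [0, n). */
void or_i32(int32_t *dst, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 4 <= n; i += 4) {
        /* One prefetch per 64-byte line (16 int32s), only while still in bounds. */
        if (i % 16 == 0 && i + PREFETCH_AHEAD < n) {
            _mm_prefetch((const char *)(a + i + PREFETCH_AHEAD), _MM_HINT_T0);
            _mm_prefetch((const char *)(b + i + PREFETCH_AHEAD), _MM_HINT_T0);
        }
        /* Unaligned loads/stores, so the caller needn't guarantee 16-byte alignment. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_or_si128(va, vb));
    }
#endif
    /* Scalar tail (or the whole array if SSE2 isn't available). */
    for (; i < n; i++)
        dst[i] = a[i] | b[i];
}
```

Calling `or_i32(r, a, b, 7)` with `a = {1, 2, 4, 8, 16, 32, 64}` and `b` all ones fills `r` with `{1, 3, 5, 9, 17, 33, 65}`: the first four go through the SSE path and the last three through the scalar tail. Comparing counter readings with and without the prefetch lines is one way to see whether the hints earn their keep.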

Monday, 19 November 2012

Intel Performance Monitoring: Loose Ends

This post is part of the series on performance monitoring with Intel MSRs on Linux:
- A Linux Module For Reading/Writing MSRs
- Intel MSR Performance Monitoring Basics
- Fun with MSRs: Counting Performance Events On Intel
- Scripting MSR Performance Tests With kdb+
- Scripting MSR Performance Tests With kdb+: Part 2
- Intel Performance Monitoring: Loose Ends (this post)

If you haven't already, you'll need to download the q 3.0 trial version for Linux from Kx Systems. Although it's the 32-bit version, it is fully functional apart from being time-limited to around an hour's use before you need to restart it. I was labouring under the misapprehension that it could be run on a 64-bit system, but since q is a dynamically-linked application, you'll need to do something with chroot and 32-bit libraries if you want to try that.

Wednesday, 14 November 2012

Scripting MSR Performance Tests With kdb+: Part 2

This post continues the series on performance monitoring with Intel MSRs on Linux using the batch-oriented kernel module to read and write values from and to the MSRs. The previous posts can be found here:
- A Linux Module For Reading/Writing MSRs
- Intel MSR Performance Monitoring Basics
- Fun with MSRs: Counting Performance Events On Intel
- Scripting MSR Performance Tests With kdb+
- Scripting MSR Performance Tests With kdb+: Part 2 (this post ;)
- Intel Performance Monitoring: Loose Ends

This time I'm going to build the shared library used by kdb+ to launch and control the test run. It's fairly simple, since the fiddly work of calculating the values to be written to the IA32_PERFEVTSELx, IA32_FIXED_CTR_CTRL and IA32_PERF_GLOBAL_CTRL MSRs has already been done. The library will own the process of stopping, clearing and starting the counters, as well as running a baseline to measure the fixed cost of the interaction with the MSR kernel module.
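The baseline idea can be sketched as follows: run the start and read sequence several times with no test code in between, take the per-counter minimum as the fixed cost of talking to the module, and subtract it from the real measurement. This is an illustration under assumptions, not the library's actual code. `NCOUNTERS`, `baseline_min` and `subtract_baseline` are hypothetical names, and using the minimum (rather than a mean or median) is a design choice that favours the least-disturbed run.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: counters read per run, e.g. 3 fixed + 4 general-purpose. */
#define NCOUNTERS 7

/* Per-counter minimum over nruns empty (no-payload) runs. */
void baseline_min(uint64_t out[NCOUNTERS], uint64_t runs[][NCOUNTERS], size_t nruns)
{
    for (size_t c = 0; c < NCOUNTERS; c++) {
        uint64_t m = UINT64_MAX;
        for (size_t r = 0; r < nruns; r++)
            if (runs[r][c] < m)
                m = runs[r][c];
        out[c] = m;
    }
}

/* Net counts = measured - baseline, clamped at zero so that noise
   can't wrap an unsigned result around to a huge value. */
void subtract_baseline(uint64_t out[NCOUNTERS],
                       const uint64_t measured[NCOUNTERS],
                       const uint64_t base[NCOUNTERS])
{
    for (size_t c = 0; c < NCOUNTERS; c++)
        out[c] = measured[c] > base[c] ? measured[c] - base[c] : 0;
}
```

The clamp matters because a test payload that does almost nothing can occasionally read slightly lower than the baseline.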

Tuesday, 13 November 2012

Scripting MSR Performance Tests with kdb+

This post is part of the series on performance monitoring with Intel MSRs on Linux:
- A Linux Module For Reading/Writing MSRs
- Intel MSR Performance Monitoring Basics
- Fun with MSRs: Counting Performance Events On Intel
- Scripting MSR Performance Tests With kdb+ (this post)
- Scripting MSR Performance Tests With kdb+: Part 2
- Intel Performance Monitoring: Loose Ends

One of the issues with writing performance-monitoring code is managing the PMC/FFC configuration scripts. As you can see from my previous posts (1, 2, 3), using the scripts with the MSR kernel driver is easy, but getting the right data into the script in the first place is trickier. You could certainly provide helper functions to twiddle the various bits in the IA32_PERFEVTSELx registers. However, to make it usable I think it should be possible to look up the different performance monitoring events by name.
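As a sketch of what "look up by name, then twiddle the bits" might look like, the C below keeps a small table of Intel's architectural performance events (event select and unit mask as listed in the Intel SDM) and packs one into an IA32_PERFEVTSELx value. The table, `struct pmc_event`, `find_event` and `evtsel_value` are hypothetical illustrations, not the post's kdb+ implementation. A real table would be generated from Intel's per-microarchitecture event lists.

```c
#include <stdint.h>
#include <string.h>

/* IA32_PERFEVTSELx fields: bits 0-7 event select, bits 8-15 unit mask. */
#define EVTSEL_USR (1ull << 16)  /* count while CPL > 0 (user mode) */
#define EVTSEL_OS  (1ull << 17)  /* count while CPL = 0 (kernel mode) */
#define EVTSEL_EN  (1ull << 22)  /* enable the counter */

struct pmc_event {
    const char *name;
    uint8_t event;  /* event select */
    uint8_t umask;  /* unit mask */
};

/* A few architectural events; names used here as lookup keys. */
static const struct pmc_event events[] = {
    {"UnHalted Core Cycles",       0x3C, 0x00},
    {"Instruction Retired",        0xC0, 0x00},
    {"UnHalted Reference Cycles",  0x3C, 0x01},
    {"LLC Reference",              0x2E, 0x4F},
    {"LLC Misses",                 0x2E, 0x41},
    {"Branch Instruction Retired", 0xC4, 0x00},
    {"Branch Misses Retired",      0xC5, 0x00},
};

/* Linear search by name; returns NULL for an unknown event. */
const struct pmc_event *find_event(const char *name)
{
    for (size_t i = 0; i < sizeof events / sizeof events[0]; i++)
        if (strcmp(events[i].name, name) == 0)
            return &events[i];
    return NULL;
}

/* Pack an enabled IA32_PERFEVTSELx value counting in the chosen rings. */
uint64_t evtsel_value(const struct pmc_event *e, int user, int kernel)
{
    uint64_t v = (uint64_t)e->event | ((uint64_t)e->umask << 8) | EVTSEL_EN;
    if (user)
        v |= EVTSEL_USR;
    if (kernel)
        v |= EVTSEL_OS;
    return v;
}
```

For example, `evtsel_value(find_event("LLC Misses"), 1, 0)` gives `0x41412E`: event `0x2E`, umask `0x41`, plus the USR and EN bits.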

Sunday, 11 November 2012

Fun With MSRs: Counting Performance Events On Intel

This post is part of the series on performance monitoring with Intel MSRs on Linux:
- A Linux Module For Reading/Writing MSRs
- Intel MSR Performance Monitoring Basics
- Fun with MSRs: Counting Performance Events On Intel (this post)
- Scripting MSR Performance Tests With kdb+
- Scripting MSR Performance Tests With kdb+: Part 2
- Intel Performance Monitoring: Loose Ends

Hi, the last two posts have laid some groundwork for this post, in which I hope to show how you can measure various performance-related events using Intel's MSRs. This post assumes you have at least installed the MSR kernel module discussed in this earlier post. All we're going to do this time is record two MSR configuration scripts to memory and execute some arbitrary code to measure some performance metrics. One script will configure the MSRs and reset the counter values to zero, while the other will read the accumulated values after the test code has executed.
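To make the "two scripts" idea concrete, here is a hypothetical representation of a batch of MSR operations and the two scripts described above. The MSR addresses are the architectural ones from the Intel SDM. `struct msr_op`, `build_setup` and `build_read` are illustrative names, not the kernel module's real interface. Stopping the counters at the start of the read script is an assumption, made so that the reads themselves aren't counted.

```c
#include <stddef.h>
#include <stdint.h>

#define IA32_PMC0             0x0C1u
#define IA32_PERFEVTSEL0      0x186u
#define IA32_FIXED_CTR0       0x309u  /* fixed counters are 0x309..0x30B */
#define IA32_FIXED_CTR_CTRL   0x38Du
#define IA32_PERF_GLOBAL_CTRL 0x38Fu

enum msr_op_kind { MSR_WRITE, MSR_READ };

/* Hypothetical batch entry (not the module's actual ABI). */
struct msr_op {
    enum msr_op_kind kind;
    uint32_t msr;
    uint64_t value;  /* value to write; for reads, filled in on execution */
};

/* Setup script: stop, program PMC0 and fixed counters, zero them, start.
   Caller supplies space for at least 8 entries. */
size_t build_setup(struct msr_op *ops, uint64_t evtsel0)
{
    size_t n = 0;
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_PERF_GLOBAL_CTRL, 0};    /* stop all */
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_PERFEVTSEL0, evtsel0};  /* program PMC0 */
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_FIXED_CTR_CTRL, 0x222}; /* fixed 0-2, user mode */
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_PMC0, 0};               /* zero PMC0 */
    for (uint32_t i = 0; i < 3; i++)
        ops[n++] = (struct msr_op){MSR_WRITE, IA32_FIXED_CTR0 + i, 0}; /* zero fixed ctrs */
    /* Start: bit 0 enables PMC0, bits 32-34 enable fixed counters 0-2. */
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_PERF_GLOBAL_CTRL, (7ull << 32) | 1};
    return n;
}

/* Read script: stop counting, then read PMC0 and the fixed counters.
   Caller supplies space for at least 5 entries. */
size_t build_read(struct msr_op *ops)
{
    size_t n = 0;
    ops[n++] = (struct msr_op){MSR_WRITE, IA32_PERF_GLOBAL_CTRL, 0};
    ops[n++] = (struct msr_op){MSR_READ, IA32_PMC0, 0};
    for (uint32_t i = 0; i < 3; i++)
        ops[n++] = (struct msr_op){MSR_READ, IA32_FIXED_CTR0 + i, 0};
    return n;
}
```

Both scripts are built once, before the test, so the only work between "start" and "stop" is handing a prepared batch to the kernel module.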

Friday, 9 November 2012

Intel MSR Performance Monitoring Basics

This post is part of the series on performance monitoring with Intel MSRs on Linux:
- A Linux Module For Reading/Writing MSRs
- Intel MSR Performance Monitoring Basics (this post)
- Fun with MSRs: Counting Performance Events On Intel
- Scripting MSR Performance Tests With kdb+
- Scripting MSR Performance Tests With kdb+: Part 2
- Intel Performance Monitoring: Loose Ends

In the previous post I published code to create, build and install a Linux kernel module which would permit a user to execute a batch of commands to read from or write to Intel MSRs. This post will provide some background on using MSRs and controlling their behaviour.
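Two of the control MSRs mentioned in this series have simple, regular layouts, so small helpers make their behaviour easy to see. IA32_FIXED_CTR_CTRL (0x38D) gives each fixed counter a 4-bit field (ring-0 enable, ring-3 enable, AnyThread, PMI on overflow). IA32_PERF_GLOBAL_CTRL (0x38F) has one enable bit per general-purpose counter starting at bit 0 and one per fixed counter starting at bit 32. The function names below are hypothetical. The bit layouts follow the Intel SDM's architectural performance monitoring chapter.

```c
#include <stdint.h>

/* Bits within each fixed counter's 4-bit field in IA32_FIXED_CTR_CTRL. */
#define FIXED_EN_OS  0x1u  /* count at ring 0 */
#define FIXED_EN_USR 0x2u  /* count at rings 1-3 */
#define FIXED_ANYTHR 0x4u  /* count for any thread on the core (perfmon v3+) */
#define FIXED_PMI    0x8u  /* raise a PMI on overflow */

/* Return reg with fixed counter idx's field replaced by field. */
uint64_t fixed_ctr_ctrl(uint64_t reg, unsigned idx, unsigned field)
{
    unsigned shift = 4 * idx;
    reg &= ~(0xFull << shift);                /* clear the old field */
    reg |= (uint64_t)(field & 0xFu) << shift; /* install the new one */
    return reg;
}

/* Enable mask for IA32_PERF_GLOBAL_CTRL: PMC0..PMC(n_pmc-1) in the low
   bits, FIXED_CTR0..FIXED_CTR(n_fixed-1) from bit 32 up.
   Assumes n_pmc < 32 and n_fixed < 32. */
uint64_t global_ctrl(unsigned n_pmc, unsigned n_fixed)
{
    uint64_t pmc_bits = (1ull << n_pmc) - 1;
    uint64_t fixed_bits = ((1ull << n_fixed) - 1) << 32;
    return pmc_bits | fixed_bits;
}
```

Setting all three fixed counters to user-mode-only gives `0x222`. Enabling four general-purpose and three fixed counters gives `0x70000000F`.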

A Linux Kernel Module For Reading/Writing MSRs

This post is part of the series on performance monitoring with Intel MSRs on Linux:
- A Linux Module For Reading/Writing MSRs (this post)
- Intel MSR Performance Monitoring Basics
- Fun with MSRs: Counting Performance Events On Intel
- Scripting MSR Performance Tests With kdb+
- Scripting MSR Performance Tests With kdb+: Part 2
- Intel Performance Monitoring: Loose Ends

It's been a while since the last post, mostly because I've been trying to get my head around the way the Intel performance monitoring instructions work. Rolling your own test harness to measure how many clock ticks, µops or L1 cache misses have taken place in a given stretch of code is quite involved, but don't let that put you off: it's pretty cool once you've got it all working. Of course, you don't have to roll your own, but doing so is in the best British traditions of pottering around in the garden shed, taking things to bits just to see how they work.