The Impact of Memory Contiguity on Performance: A Benchmark

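Over the weekend I wrote a benchmark at home to see how much memory contiguity affects performance, mainly to find out how large the CPU cache effect really is.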

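I set up two cases.

  1. Leave memory management to the default allocator: new each object individually and store the object addresses in an array of pointers.
  2. Preallocate one contiguous block and construct each object inside it.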

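The benchmark itself is simple. It measures the following three things.

  1. Allocation speed: roughly how long it takes to new an object.
  2. Random access speed: read a field of a randomly chosen object on each iteration.
  3. Sequential access speed: traverse all objects in order.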
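
All three measurements follow the same start/stop pattern, so they could share one timing helper. Below is a minimal sketch, assuming a hypothetical helper named timeMs (it is not part of the benchmark code in this post) and std::chrono::steady_clock as the clock. sumAll is only a stand-in workload.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper (not from the original benchmark): runs `work` once
// and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double timeMs(Work&& work)
{
    // steady_clock is monotonic, so it cannot jump backwards mid-measurement.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(end - start).count();
    assert(ms >= 0.0);
    return ms;
}

// Stand-in workload: sums a vector so the loop produces a usable result.
long long sumAll(const std::vector<int>& values)
{
    long long total = 0;
    for (int v : values)
    {
        total += v;
    }
    return total;
}
```

A call would look like `double ms = timeMs([&] { total = sumAll(values); });`, which keeps the clock handling in one place instead of repeating it for every measurement.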

I expected the preallocated contiguous memory to be somewhat faster than memory allocated one object at a time, but I was not sure by how much.
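
The intuition behind that expectation can be put into rough numbers. This is a sketch, assuming 64-byte cache lines (common on x86-64, but check your own CPU) and a small struct holding a single int, like the one the benchmark uses. kAssumedCacheLineBytes, objectsPerCacheLine and contiguousStrideBytes are illustrative names, not part of the benchmark.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for the benchmark's object type.
struct A
{
    int id;
};

// Assumption: 64-byte cache lines. This is typical, not guaranteed.
constexpr std::size_t kAssumedCacheLineBytes = 64;

// Number of whole objects of `objectBytes` bytes that fit in one cache line.
constexpr std::size_t objectsPerCacheLine(std::size_t objectBytes)
{
    return kAssumedCacheLineBytes / objectBytes;
}

// Byte distance between two neighbouring elements of a contiguous array.
// `arr` must point to at least two elements.
std::size_t contiguousStrideBytes(const A* arr)
{
    assert(arr != nullptr);
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(arr + 1) - reinterpret_cast<const char*>(arr));
}
```

With a 4-byte int, 16 objects share one assumed 64-byte line in the contiguous layout, so a single cache miss also brings in the next 15 objects. Individually allocated objects are usually spaced further apart because the allocator adds its own bookkeeping and alignment, although the exact spacing depends on the allocator.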

Here is the code. discontinuous is the version where the allocator manages the memory, and continuous is the preallocated version.

#include <iostream>
#include <chrono>
#include <cstdlib>  // std::rand, RAND_MAX
#include <new>      // placement new
#include <vector>

using namespace std::chrono_literals;

const int INSTANCE_CNT = static_cast<int>(1e7);
const int QUERY_CNT = static_cast<int>(1e7);
const int SEQUENTIAL_ACCESS_CNT = 100;

struct A
{
	int id;
	A() { id = 0; }
	A(int id_) { id = id_; }
};

// Copied from the internet. Returns a value in the inclusive range [min, max].
int getRandomNumber(int min, int max)
{
    static const double fraction = 1.0 / (RAND_MAX + 1.0);
    return min + static_cast<int>((max - min + 1) * (std::rand() * fraction));
}

void discontinuous()
{
	A** aVec = new A*[INSTANCE_CNT];
	auto start = std::chrono::high_resolution_clock::now();
	for (int i = 0; i != INSTANCE_CNT; ++i)
	{
		A* ins = new A{i};
		aVec[i] = ins;
	}
	auto end = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double, std::milli> elapsed = end - start;
	std::cout << "New Operator Allocating Cost: " << elapsed.count() << " ms\n";

	int ind = 0;
	int val = 0;

	start = std::chrono::high_resolution_clock::now();
	for (int i = 0; i != QUERY_CNT; ++i)
	{
		ind = getRandomNumber(0, INSTANCE_CNT - 1); // max is inclusive, so INSTANCE_CNT itself would be out of bounds
		A* ins = aVec[ind];
		val = ins->id;
	}
	end = std::chrono::high_resolution_clock::now();
	elapsed = end - start;
	std::cout << "New Operator Random Access Cost: " << elapsed.count() << " ms\n";

	start = std::chrono::high_resolution_clock::now();
	for (int r = 0; r != SEQUENTIAL_ACCESS_CNT; ++r)
	{
		for (int i = 0; i != INSTANCE_CNT; ++i)
		{
			A* ins = aVec[i];
			val = ins->id;
		}
	}
	end = std::chrono::high_resolution_clock::now();
	elapsed = end - start;
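	std::cout << "New Operator Sequential Access Cost: " << elapsed.count() << " ms\n";

	// Free every object and the pointer array so nothing leaks into the next benchmark.
	for (int i = 0; i != INSTANCE_CNT; ++i)
	{
		delete aVec[i];
	}
	delete[] aVec;
}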

void continuous()
{
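	// Note: new A[] already default-constructs every element (and touches every page),
	// so the timed loop below only measures re-constructing the objects in place.
	A* arr = new A[INSTANCE_CNT];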
	auto start = std::chrono::high_resolution_clock::now();
	for (int i = 0; i != INSTANCE_CNT; ++i)
	{
		A* ins = new (arr + i) A{i};
	}
	auto end = std::chrono::high_resolution_clock::now();
	std::chrono::duration<double, std::milli> elapsed = end - start;
	std::cout << "Preallocating Allocating Cost: " << elapsed.count() << " ms\n";

	int ind = 0;
	int val = 0;

	start = std::chrono::high_resolution_clock::now();
	for (int i = 0; i != QUERY_CNT; ++i)
	{
		ind = getRandomNumber(0, INSTANCE_CNT - 1); // max is inclusive, so INSTANCE_CNT itself would be out of bounds
		A ins = arr[ind];
		val = ins.id;
	}
	end = std::chrono::high_resolution_clock::now();
	elapsed = end - start;
	std::cout << "Preallocating Random Access Cost: " << elapsed.count() << " ms\n";

	start = std::chrono::high_resolution_clock::now();
	for (int r = 0; r != SEQUENTIAL_ACCESS_CNT; ++r)
	{
		for (int i = 0; i != INSTANCE_CNT; ++i)
		{
			A ins = arr[i];
			val = ins.id;
		}
	}
	end = std::chrono::high_resolution_clock::now();
	elapsed = end - start;
	std::cout << "Preallocating Sequential Access Cost: " << elapsed.count() << " ms\n";

	// Release the preallocated block.
	delete[] arr;
}



int main()
{
	discontinuous();
	continuous();
	return 0;
}
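
One caveat about the getRandomNumber helper in the code above: std::rand only guarantees that RAND_MAX is at least 32767, so on some platforms it can produce only a small subset of the 1e7 possible indices. Below is a minimal sketch of an alternative built on the standard <random> header. randomIndex is a hypothetical name and is not used by the benchmark above.

```cpp
#include <cassert>
#include <random>

// Hypothetical replacement for getRandomNumber: returns a uniformly
// distributed index in the half-open range [0, count).
int randomIndex(std::mt19937& rng, int count)
{
    assert(count > 0);
    // uniform_int_distribution takes inclusive bounds, hence count - 1.
    std::uniform_int_distribution<int> dist(0, count - 1);
    return dist(rng);
}
```

In a timed loop it would probably be better to construct the distribution once outside the loop. Either way the generator cost is included in the measurement for both cases, so only the difference between the two runs says anything about memory layout.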

The results were roughly as follows. (I built in Debug mode so the compiler would not optimize the loops away.)

As you can see, the preallocated version is noticeably faster across the board.
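
On the Debug build: an alternative that should also survive an optimized build is to make each loop produce a value that is actually used, for example by summing the ids and printing the total. This is a sketch under that assumption. sumIds is a hypothetical helper, and the struct here is a simplified stand-in for the benchmark's A.

```cpp
#include <cassert>
#include <vector>

// Simplified stand-in for the benchmark's object type.
struct A
{
    int id;
};

// Contiguous case: sums every id so the traversal has an observable result.
long long sumIds(const std::vector<A>& objects)
{
    long long total = 0;
    for (const A& a : objects)
    {
        total += a.id;
    }
    return total;
}

// Pointer case: same sum, but each object is reached through its own pointer.
long long sumIds(const std::vector<A*>& objects)
{
    long long total = 0;
    for (const A* a : objects)
    {
        assert(a != nullptr);
        total += a->id;
    }
    return total;
}
```

If the returned total is printed after the timing, the optimizer has much less room to drop the memory reads, and the numbers reflect optimized code rather than Debug overhead.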