Performance Comparison of Memory Allocators

Summary

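This article compares the performance of several memory allocators. The allocators compared, as exercised by the test programs below, are the default new/delete, AutoFreeAlloc, and ScopeAlloc.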

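Comparison 1: each allocator instance allocates only a few small blocks

Test method: each allocator instance allocates a single int, and we compare the speed.

Test program (see <stdext/Memory.h>):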

template <class LogT>
class TestCompareAllocators : public TestCase
{
    WINX_TEST_SUITE(TestCompareAllocators);
        WINX_TEST(testComparison1);
    WINX_TEST_SUITE_END();
 
public:
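    enum { N = 60000 };  // iterations per measurement

    // Baseline: one new/delete pair per iteration.
    void doNewDelete1(LogT& log)
    {
        log.print("===== NewDelete =====\n");
        PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            int* p = new int;
            delete p;
        }
        counter.trace(log);
    }

    // A fresh AutoFreeAlloc per iteration, each allocating a single int.
    // There is no explicit delete: the test relies on the allocator
    // releasing its memory when it goes out of scope.
    void doAutoFreeAlloc1(LogT& log)
    {
        log.print("===== AutoFreeAlloc =====\n");
        PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            AutoFreeAlloc alloc;
            int* p = STD_NEW(alloc, int);
        }
        counter.trace(log);
    }

    // A fresh ScopeAlloc per iteration, all sharing one BlockPool.
    // The BlockPool is declared before the counter, so its construction
    // is not part of the measured loop.
    void doScopeAlloc1(LogT& log)
    {
        log.print("===== ScopeAlloc =====\n");
        BlockPool recycle;
        PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            ScopeAlloc alloc(recycle);
            int* p = STD_NEW(alloc, int);
        }
        counter.trace(log);
    }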
 
    void testComparison1(LogT& log)
    {
        for (int i = 0; i < 4; ++i)
        {
            log.newline();
            doAutoFreeAlloc1(log);
            doNewDelete1(log);
            doScopeAlloc1(log);
        }
    }
};

Test results:

===== AutoFreeAlloc =====
---> Elapse 283773 ticks (79.28 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 134936 ticks (37.70 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 8285 ticks (2.31 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 184090 ticks (51.43 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 116476 ticks (32.54 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 9831 ticks (2.75 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 179652 ticks (50.19 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 130553 ticks (36.47 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 8387 ticks (2.34 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 137255 ticks (38.34 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 103869 ticks (29.02 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 8215 ticks (2.29 ms) (0.00 min) ...

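Conclusion:

When each allocator instance allocates only a small amount of memory, AutoFreeAlloc is the slowest (38 to 79 ms across the four runs), new/delete comes next (29 to 38 ms), and ScopeAlloc is the fastest (2.3 to 2.8 ms).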

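Comparison 2: a single allocator instance allocates many small blocks

Test method: a single allocator instance allocates 60,000 ints, and we compare the speed.

Test program (see <stdext/Memory.h>):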

template <class LogT>
class TestCompareAllocators : public TestCase
{
    WINX_TEST_SUITE(TestCompareAllocators);
        WINX_TEST(testComparison2);
    WINX_TEST_SUITE_END();
 
public:
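    enum { N = 60000 };  // number of ints allocated per measurement

    // Baseline: allocate all N ints first, then free them one by one.
    void doNewDelete2(LogT& log)
    {
        int i, *p[N];  // N pointers on the stack, needed to delete later
        log.print("===== NewDelete =====\n");
        PerformanceCounter counter;
        for (i = 0; i < N; ++i)
        {
            p[i] = new int;
        }
        for (i = 0; i < N; ++i)
        {
            delete p[i];
        }
        counter.trace(log);
    }

    // One AutoFreeAlloc serves all N allocations. The inner braces end
    // the allocator's lifetime before the counter is read, so the
    // measurement includes releasing the memory.
    void doAutoFreeAlloc2(LogT& log)
    {
        log.print("===== AutoFreeAlloc =====\n");
        PerformanceCounter counter;
        {
            AutoFreeAlloc alloc;
            for (int i = 0; i < N; ++i)
            {
                int* p = STD_NEW(alloc, int);
            }
        }
        counter.trace(log);
    }

    // One ScopeAlloc serves all N allocations, backed by a BlockPool
    // that is declared before the counter.
    void doScopeAlloc2(LogT& log)
    {
        log.print("===== ScopeAlloc =====\n");
        BlockPool recycle;
        PerformanceCounter counter;
        {
            ScopeAlloc alloc(recycle);
            for (int i = 0; i < N; ++i)
            {
                int* p = STD_NEW(alloc, int);
            }
        }
        counter.trace(log);
    }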
 
    void testComparison2(LogT& log)
    {
        for (int i = 0; i < 4; ++i)
        {
            log.newline();
            doAutoFreeAlloc2(log);
            doNewDelete2(log);
            doScopeAlloc2(log);
        }
    }
};

Test results:

===== AutoFreeAlloc =====
---> Elapse 1771 ticks (0.49 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 117791 ticks (32.91 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 2657 ticks (0.74 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 1637 ticks (0.46 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 114280 ticks (31.93 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 2516 ticks (0.70 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 1668 ticks (0.47 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 118661 ticks (33.15 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 2553 ticks (0.71 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 1678 ticks (0.47 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 113231 ticks (31.63 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 3287 ticks (0.92 ms) (0.00 min) ...

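Conclusion:

When a single allocator instance allocates a large number of small blocks, AutoFreeAlloc is the fastest (0.46 to 0.49 ms), ScopeAlloc is a close second (0.70 to 0.92 ms; next to new/delete the gap between the two is not significant), and new/delete is by far the slowest (31.6 to 33.2 ms).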

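Overall conclusions

Conclusion 1:

Constructing a new AutoFreeAlloc instance is relatively expensive, so its users should plan their memory management carefully. Constructing a ScopeAlloc instance, by contrast, costs very little: you could even create one ScopeAlloc for every single object without much penalty (although we do not recommend doing so).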

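Conclusion 2:

AutoFreeAlloc has significant limitations and suits only a narrow set of situations (complex algorithms with local scope), whereas ScopeAlloc is a general-purpose allocator: in almost any situation you can use ScopeAlloc for memory management and get good performance in return.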

Comments

Test environment (posted by winxgui)

CPU: 1.2 GHz
Operating system: Windows XP
Compiler: Visual C++ 6.0
Optimization: Maximize Speed
C runtime library: Multithreaded DLL
Configuration: Release build
