GC Allocators vs. APR Pools

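Overview

This article compares the performance of several memory allocators. The allocators exercised by the test programs are AutoFreeAlloc, ScopeAlloc, APR Pools, and the C++ new/delete operators.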

Test environment

CPU: 1.66 GHz (2 CPUs)
Operating system: Windows XP
Compiler: Visual C++ 6.0
Optimization: Maximize Speed
C runtime library: Multithreaded DLL
Configuration: Release

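Comparison 1: a single allocator instance allocates only a few small blocks

Test method: each allocator instance allocates a single int. The loop creates an allocator, allocates one int from it, and destroys the allocator, repeated N = 60000 times; the elapsed time is compared.

Test program (see <stdext/memory/apr_pools.h>):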

template <class LogT>
class TestAprPools : public TestCase
{
    WINX_TEST_SUITE(TestAprPools);
        WINX_TEST(testComparison1);
    WINX_TEST_SUITE_END();
 
private:
    apr_pool_t* m_pool;
 
    void setUp()
    {
        apr_pool_initialize();
        apr_pool_create(&m_pool, NULL);
    }
 
    void tearDown()
    {
        apr_pool_destroy(m_pool);
        apr_pool_terminate();
    }
 
public:
    enum { N = 60000 };
 
    void doNewDelete1(LogT& log)
    {
        log.print("===== NewDelete =====\n");
        std::PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            int* p = new int;
            delete p;
        }
        counter.trace(log);
    }
 
    void doAprPools1(LogT& log)
    {
        log.print("===== APR Pools =====\n");
        std::PerformanceCounter counter;
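        for (int i = 0; i < N; ++i)
        {
            // Create a subpool of m_pool, allocate one int from it,
            // then destroy the subpool, which releases the int as well.
            apr_pool_t* alloc;
            apr_pool_create(&alloc, m_pool);
            int* p = (int*)apr_palloc(alloc, sizeof(int));
            apr_pool_destroy(alloc);
        }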
        counter.trace(log);
    }
 
    void doAutoFreeAlloc1(LogT& log)
    {
        log.print("===== AutoFreeAlloc =====\n");
        std::PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            std::AutoFreeAlloc alloc;
            int* p = STD_NEW(alloc, int);
        }
        counter.trace(log);
    }
 
    void doScopeAlloc1(LogT& log)
    {
        log.print("===== ScopeAlloc =====\n");
        // Shared block pool; constructed before the timed region starts.
        std::BlockPool recycle;
        std::PerformanceCounter counter;
        for (int i = 0; i < N; ++i)
        {
            std::ScopeAlloc alloc(recycle);
            int* p = STD_NEW(alloc, int);
        }
        counter.trace(log);
    }
 
    void testComparison1(LogT& log)
    {
        for (int i = 0; i < 4; ++i)
        {
            log.newline();
            doAutoFreeAlloc1(log);
            doAprPools1(log);
            doNewDelete1(log);
            doScopeAlloc1(log);
        }
    }
};
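
The test program relies on std::PerformanceCounter from the stdext library, whose implementation is not shown here. As a rough portable stand-in, the sketch below times a loop with std::chrono::steady_clock. TickTimer and timeNewDelete are hypothetical names used only for illustration, and C++11 is assumed (the original tests were built with Visual C++ 6.0, which predates it).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the PerformanceCounter used in the tests:
// records a start time on construction and reports elapsed milliseconds.
class TickTimer
{
public:
    TickTimer() : m_start(std::chrono::steady_clock::now()) {}

    double elapsedMs() const
    {
        std::chrono::duration<double, std::milli> d =
            std::chrono::steady_clock::now() - m_start;
        return d.count();
    }

private:
    std::chrono::steady_clock::time_point m_start;
};

// Times n new/delete pairs, mirroring the shape of doNewDelete1.
// The volatile pointer and the write through it make it harder for a
// modern optimizer to discard the allocation entirely.
inline double timeNewDelete(int n)
{
    TickTimer timer;
    for (int i = 0; i < n; ++i)
    {
        int* volatile p = new int;
        *p = i;
        delete p;
    }
    return timer.elapsedMs();
}
```

Calling timeNewDelete(60000) returns the elapsed time in milliseconds for the same workload size as the test above. Absolute numbers on a modern machine and compiler will not be comparable to the figures reported in this article.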

Test results:

1. With APR linked as a dynamic library (Multithreaded DLL):

===== AutoFreeAlloc =====
---> Elapse 98513 ticks (27.52 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 86419 ticks (24.14 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 65082 ticks (18.18 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30482 ticks (8.52 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 103194 ticks (28.83 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 86880 ticks (24.27 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 65423 ticks (18.28 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30303 ticks (8.47 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 101709 ticks (28.41 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 86671 ticks (24.21 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 66583 ticks (18.60 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 29864 ticks (8.34 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 103621 ticks (28.95 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 85810 ticks (23.97 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 65145 ticks (18.20 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30139 ticks (8.42 ms) (0.00 min) ...

2. With APR linked as a static library (Multithreaded):

===== AutoFreeAlloc =====
---> Elapse 99585 ticks (27.82 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 85211 ticks (23.80 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 69748 ticks (19.49 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30120 ticks (8.41 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 104932 ticks (29.31 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 85284 ticks (23.83 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 69428 ticks (19.40 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30052 ticks (8.40 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 101735 ticks (28.42 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 85380 ticks (23.85 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 71826 ticks (20.07 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30170 ticks (8.43 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 103206 ticks (28.83 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 87675 ticks (24.49 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 71195 ticks (19.89 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 30045 ticks (8.39 ms) (0.00 min) ...
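
Because each log line reports both ticks and milliseconds, the counter frequency and the average cost of one loop iteration can be derived from the figures above. The helpers below are a small sketch of that arithmetic; impliedFrequencyHz and nsPerIteration are hypothetical names, not part of the test suite.

```cpp
#include <cassert>
#include <cmath>

// Counter frequency in Hz implied by a (ticks, milliseconds) pair from the log.
inline double impliedFrequencyHz(double ticks, double ms)
{
    return ticks / (ms / 1000.0);
}

// Average cost of one loop iteration in nanoseconds.
inline double nsPerIteration(double ms, int iterations)
{
    return ms * 1e6 / iterations;
}
```

With the first AutoFreeAlloc line (98513 ticks, 27.52 ms, N = 60000), impliedFrequencyHz gives roughly 3.58 MHz and nsPerIteration roughly 459 ns. The first ScopeAlloc line (8.52 ms) works out to about 142 ns. In this comparison one iteration includes creating and destroying the allocator, so these are per-iteration costs rather than pure per-allocation costs.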

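Conclusions:

When a single allocator instance allocates only a small amount of memory, AutoFreeAlloc performs worst (about 28 ms), followed by APR Pools (about 24 ms, not significantly different from AutoFreeAlloc), then new/delete (about 18 to 20 ms). ScopeAlloc performs best (about 8.4 ms).

Also note that the APR pool here is created with a parent (analogous to ScopeAlloc having a BlockPool). By the design of APR Pools this should, in theory, improve speed, but in practice the measured improvement was not significant (comparison data for the no-parent case is not given here).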
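
One plausible reading of this result is that a shared block pool lets each short-lived allocator reuse an already-allocated block, so the loop rarely touches the system heap. The sketch below illustrates that idea only. RecyclingBlockPool, ScopedArena, and countMallocCalls are hypothetical names, not the stdext or APR classes, and the 4 KB block size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical block pool: keeps released blocks on a free list so that
// later requests can reuse them instead of calling malloc again.
class RecyclingBlockPool
{
public:
    enum { BlockSize = 4096 }; // assumed size, for illustration only

    RecyclingBlockPool() : m_mallocCalls(0) {}

    ~RecyclingBlockPool()
    {
        for (std::size_t i = 0; i < m_free.size(); ++i)
            std::free(m_free[i]);
    }

    void* acquire()
    {
        if (!m_free.empty())
        {
            void* block = m_free.back();
            m_free.pop_back();
            return block;
        }
        ++m_mallocCalls;
        void* block = std::malloc(BlockSize);
        assert(block != NULL);
        return block;
    }

    void release(void* block) { m_free.push_back(block); }

    int mallocCalls() const { return m_mallocCalls; }

private:
    std::vector<void*> m_free;
    int m_mallocCalls;
};

// Hypothetical scoped arena: takes one block from the pool, serves
// allocations by bumping an offset, and returns the block on destruction.
// This sketch handles a single block only and must not be copied.
class ScopedArena
{
public:
    explicit ScopedArena(RecyclingBlockPool& pool)
        : m_pool(pool), m_block(static_cast<char*>(pool.acquire())), m_used(0) {}

    ~ScopedArena() { m_pool.release(m_block); }

    void* allocate(std::size_t n)
    {
        const std::size_t align = sizeof(void*);
        n = (n + align - 1) & ~(align - 1); // round up to pointer alignment
        if (m_used + n > static_cast<std::size_t>(RecyclingBlockPool::BlockSize))
            return NULL; // out of space in this single-block sketch
        void* p = m_block + m_used;
        m_used += n;
        return p;
    }

private:
    RecyclingBlockPool& m_pool;
    char* m_block;
    std::size_t m_used;
};

// Mirrors the shape of doScopeAlloc1: one arena and one int per iteration.
inline int countMallocCalls(int iterations)
{
    RecyclingBlockPool pool;
    for (int i = 0; i < iterations; ++i)
    {
        ScopedArena arena(pool);
        int* p = static_cast<int*>(arena.allocate(sizeof(int)));
        *p = i;
    }
    return pool.mallocCalls();
}
```

In this sketch, countMallocCalls(60000) reaches the system heap for a block only once, because every later arena receives the block the previous one released. Whether the real ScopeAlloc and BlockPool behave exactly this way is not confirmed by the measurements here.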

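Comparison 2: a single allocator instance allocates a large number of small blocks

Test method: a single allocator instance allocates 60000 ints; the elapsed time is compared.

Test program (see <stdext/memory/apr_pools.h>):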

template <class LogT>
class TestAprPools : public TestCase
{
    WINX_TEST_SUITE(TestAprPools);
        WINX_TEST(testComparison2);
    WINX_TEST_SUITE_END();
 
private:
    apr_pool_t* m_pool;
 
    void setUp()
    {
        apr_pool_initialize();
        apr_pool_create(&m_pool, NULL);
    }
 
    void tearDown()
    {
        apr_pool_destroy(m_pool);
        apr_pool_terminate();
    }
 
public:
    enum { N = 60000 };
 
    void doNewDelete2(LogT& log)
    {
        // N pointers on the stack (about 240 KB with 4-byte pointers).
        int i, *p[N];
        log.print("===== NewDelete =====\n");
        std::PerformanceCounter counter;
        for (i = 0; i < N; ++i)
        {
            p[i] = new int;
        }
        for (i = 0; i < N; ++i)
        {
            delete p[i];
        }
        counter.trace(log);
    }
 
    void doAprPools2(LogT& log)
    {
        log.print("===== APR Pools =====\n");
        std::PerformanceCounter counter;
        {
            apr_pool_t* alloc;
            apr_pool_create(&alloc, m_pool);
            for (int i = 0; i < N; ++i)
            {
                int* p = (int*)apr_palloc(alloc, sizeof(int));
            }
            apr_pool_destroy(alloc);
        }
        counter.trace(log);
    }
 
    void doAutoFreeAlloc2(LogT& log)
    {
        log.print("===== AutoFreeAlloc =====\n");
        std::PerformanceCounter counter;
        {
            std::AutoFreeAlloc alloc;
            for (int i = 0; i < N; ++i)
            {
                int* p = STD_NEW(alloc, int);
            }
        }
        counter.trace(log);
    }
 
    void doScopeAlloc2(LogT& log)
    {
        log.print("===== ScopeAlloc =====\n");
        std::BlockPool recycle;
        std::PerformanceCounter counter;
        {
            std::ScopeAlloc alloc(recycle);
            for (int i = 0; i < N; ++i)
            {
                int* p = STD_NEW(alloc, int);
            }
        }
        counter.trace(log);
    }
 
    void testComparison2(LogT& log)
    {
        for (int i = 0; i < 4; ++i)
        {
            log.newline();
            doAutoFreeAlloc2(log);
            doAprPools2(log);
            doNewDelete2(log);
            doScopeAlloc2(log);
        }
    }
};

Test results:

1. With APR linked as a dynamic library (Multithreaded DLL):

===== AutoFreeAlloc =====
---> Elapse 581 ticks (0.16 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2589 ticks (0.72 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 72242 ticks (20.18 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1609 ticks (0.45 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 583 ticks (0.16 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2059 ticks (0.58 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 71592 ticks (20.00 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1524 ticks (0.43 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 593 ticks (0.17 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 1918 ticks (0.54 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 72295 ticks (20.20 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1543 ticks (0.43 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 559 ticks (0.16 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 1989 ticks (0.56 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 72174 ticks (20.16 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1530 ticks (0.43 ms) (0.00 min) ...

2. With APR linked as a static library (Multithreaded):

===== AutoFreeAlloc =====
---> Elapse 581 ticks (0.16 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2828 ticks (0.79 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 74279 ticks (20.75 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1419 ticks (0.40 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 645 ticks (0.18 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2015 ticks (0.56 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 71949 ticks (20.10 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1384 ticks (0.39 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 594 ticks (0.17 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2133 ticks (0.60 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 72109 ticks (20.14 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1399 ticks (0.39 ms) (0.00 min) ...

===== AutoFreeAlloc =====
---> Elapse 597 ticks (0.17 ms) (0.00 min) ...
===== APR Pools =====
---> Elapse 2096 ticks (0.59 ms) (0.00 min) ...
===== NewDelete =====
---> Elapse 72354 ticks (20.21 ms) (0.00 min) ...
===== ScopeAlloc =====
---> Elapse 1513 ticks (0.42 ms) (0.00 min) ...

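Conclusions:

When a single allocator instance allocates a large number of small blocks, AutoFreeAlloc performs best (about 0.16 to 0.18 ms), followed by ScopeAlloc (about 0.39 to 0.45 ms), then APR Pools (about 0.54 to 0.79 ms, not significantly different from ScopeAlloc). new/delete is by far the slowest (about 20 ms).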
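
This ordering is consistent with how region-style allocators generally work: each allocation is a pointer bump inside a large block and everything is released at once, whereas new/delete pays for a heap call per object. The sketch below illustrates that general technique, not the actual AutoFreeAlloc, ScopeAlloc, or APR implementation. BumpArena and blocksForInts are hypothetical names, and the 8 KB block size is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical region allocator: carves small allocations out of large
// blocks by advancing a pointer, and frees all blocks in the destructor.
class BumpArena
{
public:
    enum { BlockSize = 8192 }; // assumed size, for illustration only

    BumpArena() : m_cur(NULL), m_end(NULL) {}

    ~BumpArena()
    {
        for (std::size_t i = 0; i < m_blocks.size(); ++i)
            std::free(m_blocks[i]);
    }

    // In this sketch n must not exceed BlockSize.
    void* allocate(std::size_t n)
    {
        const std::size_t align = sizeof(void*);
        n = (n + align - 1) & ~(align - 1); // round up to pointer alignment
        assert(n <= static_cast<std::size_t>(BlockSize));
        if (static_cast<std::size_t>(m_end - m_cur) < n)
        {
            // Current block is exhausted: fetch a new one from the heap.
            char* block = static_cast<char*>(std::malloc(BlockSize));
            assert(block != NULL);
            m_blocks.push_back(block);
            m_cur = block;
            m_end = block + BlockSize;
        }
        void* p = m_cur;
        m_cur += n; // the common path is just this pointer bump
        return p;
    }

    std::size_t blockCount() const { return m_blocks.size(); }

private:
    std::vector<char*> m_blocks;
    char* m_cur;
    char* m_end;
};

// Mirrors the shape of the comparison 2 loops: many ints from one arena.
inline std::size_t blocksForInts(int count)
{
    BumpArena arena;
    for (int i = 0; i < count; ++i)
    {
        int* p = static_cast<int*>(arena.allocate(sizeof(int)));
        *p = i;
    }
    return arena.blockCount();
}
```

Under these assumptions, 60000 ints need only a few dozen heap calls (the exact count depends on pointer size) instead of 60000 separate new/delete pairs.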

