Multi-Threaded Instrumented Profiler

This page highlights the workings of my implementation of an instrumented profiler

Here’s what the profiler looks like:

Profiler Tool and Report Generation

The profiler is split into a few different classes, they are:

  • ProfileLogScope
  • Profiler
  • ProfilerReport
  • ProfilerSample

Each profiler class provides specific functionality required by the profiler. Each class breaks down the required profiler functionality by breaking down the various aspects of the profiler and handling the required logic within itself. By encapsulating the required features in their own respective classes, the profiler can be expanded to add new functionality as and when required as new class implementations that tie into the Profiler interface.

Profile Log Scope

Tier 1 of the profiler was to simply be able to observe different sections of execution logic and determine the time taken for execution. Below is the class implementation of ProfileLogScope:

std::string GetProfileTimeString(double seconds);

//------------------------------------------------------------------------------------------------------------------------------
class LogProfileScope
{
public:
	LogProfileScope(char const *name)
	{
		m_name = name;
		m_start_time = GetCurrentTimeHPC();
	}

	~LogProfileScope()
	{
		uint64_t duration = GetCurrentTimeHPC() - m_start_time;
		DebuggerPrintf("%s took %s (%llu HPC Units) \n", m_name, GetProfileTimeString((double)duration).c_str(), (uint64_t)duration);
	}

	uint64_t m_start_time;
	char const* m_name;
};

#define COMBINE1(X,Y) X##Y  // helper macro
#define COMBINE(X,Y) COMBINE1(X,Y)

#define PROFILE_LOG_SCOPE( tag )			LogProfileScope COMBINE(__scopeLog, __LINE__) ## ( tag )
#define PROFILE_LOG_SCOPE_FUNCTION()		PROFILE_LOG_SCOPE(__FUNCTION__);

The usage of Profile Log Scope is highlighted below:

void App::LoadGameBlackBoard()
{
	PROFILE_LOG_SCOPE("App::LoadGameBlackBoard");

	const char* xmlDocPath = "Data/Gameplay/GameConfig.xml";
	tinyxml2::XMLDocument gameconfig;
	gameconfig.LoadFile(xmlDocPath);
	
	if(gameconfig.ErrorID() != tinyxml2::XML_SUCCESS)
	{
		printf("\n >> Error loading XML file from %s ", xmlDocPath);
		printf("\n >> Error ID : %i ", gameconfig.ErrorID());
		printf("\n >> Error line number is : %i", gameconfig.ErrorLineNum());
		printf("\n >> Error name : %s", gameconfig.ErrorName());
		ERROR_AND_DIE(">> Error loading GameConfig XML file ")
		return;
	}
	else
	{
		XMLElement* rootElement = gameconfig.RootElement();
		g_gameConfigBlackboard.PopulateFromXmlElementAttributes(*rootElement);
	}
}

The output of the code above in the output window of visual studio can be seen as:

App::LoadGameBlackBoard took 0.000001 milliseconds (7520 HPC Units) 

Profiler

The profiler implementation serves as an interface that handles the logic to maintain profiled samples and generate reports for the desired frame from a buffer of frame history

To perform these functions, the Profiler interface is implemented as follows:

class Profiler
{
public:
	Profiler();
	~Profiler();

	//------------------------------------------------------------------------------------------------------------------------------
	//Methods
	bool			ProfilerInitialize();
	void			ProfilerShutdown();

	void			ProfilerSetMaxHistoryTime(double seconds);

	void			ProfilerPause();
	void			ProfilerResume();
	void			ProfilerTogglePause();

	void			ProfilerPush(const char* label);
	void			ProfilerPop();

	void			ProfilerUpdate();

	void			ShowProfilerTimeline();
	void			PopulateGraphData(float* floatArray, float* allocArray, int& timeArraySize, int& allocArraySize, float& maxTime, float& maxAlloc);
	void			MakeTimelineWindow();
	void			MakeAllocationsWindow();
	void			EraseOldTrees();

	void			ProfilerAllocation(size_t byteSize = 0);
	void			ProfilerFree();

	void			ProfilerBeginFrame(const char* label = "Frame");
	void			ProfilerEndFrame();

	bool			IsProfilerOpen();

	// We can only really 'view' a complete tree
	// these functions return the most recently finished tree
	// use a shared_mutex for accessing trees from the system (as it will try to destroy old ones constantly)
	// `history` is how many frames back we should try to get
	ProfilerSample_T*			ProfilerAcquirePreviousTree(std::thread::id id, uint history = 0);
	ProfilerSample_T*			ProfilerAcquirePreviousTreeForCallingThread(uint history = 0);
	void						ProfilerReleaseTree(ProfilerSample_T* node);

	//------------------------------------------------------------------------------------------------------------------------------
	// Static Methods
	static	Profiler*			CreateInstance();
	static	void				DestroyInstance();
	static	Profiler*			GetInstance();

	//------------------------------------------------------------------------------------------------------------------------------
	// Dev Console Events
	static	bool				Command_PauseProfiler(EventArgs& args);
	static	bool				Command_ResumeProfiler(EventArgs& args);
	static	bool				Command_ProfilerReport(EventArgs& args);

private:

	ProfilerSample_T*			AllocateNode();
	void						FreeTree(ProfilerSample_T* root);
	void						FreeNode(ProfilerSample_T* node);
	void						RepopulateReportData();


	double			m_maxHistoryTime = 3;
	bool			m_isPaused = false;

	bool			m_showTimeline = false;

	std::vector<ProfilerSample_T*>			m_History;
	std::shared_mutex						m_HistoryLock;

	size_t									m_AllowedSize = 1048576;	//1024 KibiBytes or 1 MebiByte

	//------------------------------------------------------------------------------------------------------------------------------
	//ImGUI values

	float*									m_timeArray;
	float*									m_allocArray;
	float*									m_timeArrayStart;
	float*									m_allocArrayStart;
	float									m_maxAlloc = 0;
	float									m_maxTime = 0;
	int										m_timeArraySize = 0;
	int										m_allocArraySize = 0;

	float									m_reportFrameNum = 0;
};
class ProfilerLogObject
{
public:
	ProfilerLogObject(char const *label)
	{
		gProfiler->ProfilerPush(label);
	}

	~ProfilerLogObject()
	{
		gProfiler->ProfilerPop();
	}
};

To use the profiler, the developer must first call the ProfilerBeginFrame function to tell the profiler the start of the profiler frame and call the ProfilerEndFrame function to indicate the end of the current profiler frame. Within each frame, there needs to be a call to ProfilerPush and ProfilerPop when profiling sections of code. Each ProfilerPush allocates new memory to store the ProfilerSample_T object that contains sample information such as the number of allocations performed, the call stack as well as the size of the allocations.

Profiler Sample

To create a sample to be used by profiler as a record of various code executions performed in the profiled frame, we incorporate a structure that contains some frame specific information. The ProfilerSample_T struct is highlighted below:

struct ProfilerSample_T
{
	//Parent Node
	ProfilerSample_T*			m_parent = nullptr;

	//Node's children
	ProfilerSample_T*			m_lastChild = nullptr;
	ProfilerSample_T*			m_prevSibling = nullptr;

	char						m_label[64];

	//Timing
	uint64_t					m_startTime = 0;
	uint64_t					m_endTime = 0;

	std::thread::id				m_threadID;
	uint						m_refCount = 0;

	// memory
	// alloc_count, byte_count
	int							m_allocCount = 0;
	size_t						m_allocationSizeInBytes = 0;

	int							m_freeCount = 0;
	size_t						m_freeSizeInBytes = 0;

	void						AddChild(ProfilerSample_T* child);
};

This sample gives enough context over memory usage within the scope of the Profile frame being recorded.

Profiler Report

To generate a report, we make use of a ProfilerReport class which stores a collection of report nodes that were generated for each profile frame. The report node performs all the sorting logic to generate required profiler views and retrieves the sample data from ProfilerSample_T. The implementation is highlighted below:

enum ProfilerReportMode
{
	TREE_VIEW,
	FLAT_VIEW
};

//------------------------------------------------------------------------------------------------------------------------------
class ProfilerReportNode
{
public:
	ProfilerReportNode(ProfilerSample_T* node, ProfilerReportNode* parent = nullptr);
	~ProfilerReportNode();

	void					GetChildrenFromSampleRoot(ProfilerSample_T* root, ProfilerReportNode* parent);

	void					SortByTotalTime();
	void					SortBySelfTime();

	void					GetSelfTime();

	ProfilerReportNode*		m_parent = nullptr;

	int						m_allocationCount = 0;
	size_t					m_allocationSize = 0;
	
	int						m_freeCount = 0;
	size_t					m_freedSize = 0;

	uint					m_numCalls = 0U;

	float					m_totalPercent = 0.0f;
	float					m_selfPercent = 0.0f;

	char					m_label[64];

	//Optional:
	uint64_t				m_totalTimeHPC = 0U;
	double					m_totalTime = 0;
	double					m_selfTime = 0;

	double					m_avgTime = 0;		//Avg time per call including children
	double					m_avgSelfTime = 0;	//Avg time per call not including children

	double					m_maxTime = 0;		//longest a certain sample took in the collection
	double					m_maxTimeSelf = 0;	//longest a certain sample took in the collection but without including it's children

	//My kids
	std::vector<ProfilerReportNode>		m_children;

	bool					operator==(const ProfilerReportNode& compare) const;				
};

bool						ComparatorTotalTimeSort(ProfilerReportNode* elementA, ProfilerReportNode* elementB);
bool	 					ComparatorSelfTimeSort(ProfilerReportNode* elementA, ProfilerReportNode* elementB);
class ProfilerReport
{
public:
	ProfilerReport();
	~ProfilerReport();

	ProfilerReportNode*		GetFrameInHistory(uint history = 1);
	ProfilerReportNode*		GetRoot();

	static ProfilerReport*	CreateInstance();
	static ProfilerReport*	GetInstance();
	static void				DestroyInstance();

	void					DrawTreeViewAsImGUIWidget(uint history);
	void					DrawFlatViewAsImGUIWidget(uint history);

	void					SetReportMode(ProfilerReportMode mode);
	ProfilerReportMode		GetMode() { return m_activeMode; }
	void					SetRoot(ProfilerReportNode* root);

	//NOTE: This is only for the UI buttons and is to be accessed only by ProfilerReportNode
	int						m_sortTotalTime = 0;
	int						m_sortSelfTime = 0;

private:
	void					InitializeReporter();

	void					GenerateTreeFromFrame(ProfilerSample_T* root);
	void					GenerateFlatFromFrame(ProfilerSample_T* root);
	void					AddToFlatViewVector(ProfilerReportNode& rootNode);

	void					PopulateTreeForImGUI(ProfilerReportNode* root);
	void					PopulateFlatForImGUI();

	static bool				Command_ProfilerToggleMode(EventArgs& args);
	static bool				Command_ProfilerReportFrame(EventArgs& args);

	ProfilerReportNode*		m_root = nullptr;

	uint					m_lastHistoryFrame = 0U;

	ProfilerReportMode		m_activeMode = TREE_VIEW;

	int						m_clicked = 0;

	std::vector<ProfilerReportNode*>	m_flatViewVector;
	void GenerateFlatViewVector(ProfilerReportNode* m_root);
};

The profiler uses Dear ImGUI to create the profiler view that is initiated from the developer console. The view displays a histogram for the time taken during execution as well as a corresponding allocations graph to highlight the memory usage within the profiled frame.

On clicking the profiler frame, we can obtain a tree view as well as a flat view of the frames profiled sorted by their execution times and allocations if needed.

Retrospective

What went well 🙂

  • Implementing Profiler logic
  • A clear breakdown of Report and Samples
  • Abstracting irrelevant data from Profiler to maintain a hierarchy of used classes

What went wrong 🙁

  • Encountered bugs when using uints especially when using bad defaults to check against
  • Encountered issues when modifying imGUI code to allow for clicks on histograms

What I would do differently 🙂

  • Implement a simpler profiler widget
  • Allow for some navigation and options on the widget to switch memory tracking modes