A technical exploration of LLM inference metrics, batching strategies, and GPU optimization with TensorRT-LLM, from latency metrics to in-flight batching