Course Outline

Introduction to Biren GPU Architecture

  • Overview of Biren and its use cases
  • Hardware layout: cores, memory, and compute clusters
  • Comparison with NVIDIA and AMD GPUs

Setting Up the Biren Programming Environment

  • Installing the Biren SDK and runtime
  • Understanding the toolchain and compiler model
  • Basic project structure and build process

GPU Programming with the Biren Stack

  • Thread and block models
  • Memory management and data transfers
  • Kernel development and launch patterns

Porting from CUDA to Biren

  • Translation techniques for CUDA code
  • Common API mappings and adaptations
  • Code conversion labs and practice

Debugging and Profiling

  • Using Biren’s debugger and profiler
  • Identifying performance bottlenecks
  • Optimizing memory access patterns

Optimization Techniques

  • Thread scheduling and instruction pipelining
  • Loop unrolling and shared memory usage
  • Advanced kernel tuning for improved throughput

Case Study and Application Examples

  • Training a model using Biren accelerators
  • Porting and profiling a vision or NLP model
  • Comparing performance against a CUDA baseline on NVIDIA GPUs

Summary and Next Steps

Requirements

  • A solid understanding of GPU architecture and parallel processing concepts
  • Practical experience with CUDA, OpenCL, or comparable GPU programming environments
  • Familiarity with deep learning frameworks such as PyTorch or TensorFlow

Audience

  • HPC developers
  • AI infrastructure engineers
  • Performance optimization specialists

Duration

  • 21 hours
