Kulin Seth

Profile Summary

Software engineer with GPU programming experience and a research background in compute and signal processing, interested in machine learning.

Professional Experience

Qualcomm Technologies Inc.

Sr. Software Engineer Feb 2011 - Present

Graphics driver development work

  • Worked on driver bring-up for Vulkan, the new low-level graphics API
  • Involved in driver rearchitecture for OpenGL ES features, focusing on shader-related tasks. Features worked on include Binary Cache, SSO, and Advanced Blending
  • Performance analysis: compute workloads (analyzed latency characteristics of global/local memory), shader optimizations (load commands, constant caching), framebuffer pre-rotation
  • Helped with the OpenGL ES 3.0 and GLSL ES 3.00 specifications and the extension for Clip/Cull distance

Compiler front-end work

  • Designed the driver/compiler interface for HW/GLSL symbol metadata
  • Implemented features such as UBOs, Mixed-precision support, Shader bypass
  • Worked on optimizations such as shader patching, YUV CSC matrix (static vs dynamic branching), constant folding, and function inlining (samplers)

Analog Devices Inc.

Coop / Industry Project January 2009 - August 2009

Worked on data cache prefetching using stream buffers for the next-generation Blackfin architecture, using compiler-guided optimizations. This was part of work presented at the Embedded Systems Conference.

  • Project involved trace-driven simulations and detailed memory modeling to study the effects on performance
  • Used profiling and instrumentation for compiler-guided optimization
  • Performed detailed architectural core simulation with the execution-driven M5 simulator by developing a Blackfin port
  • BDTI power benchmarking for Blackfin processors

Analog Devices Inc.

DSP Applications Engineer June 2007 - June 2008

  • Developed a generic Verification Test Generator for post-silicon validation on SHARC processors
  • Ported a real-time Ogg Vorbis decoder to SHARC processors

Relevant Projects

Machine Learning Experience

Projects Feb 2015 - Present

Coursework

Miscellaneous projects

  • Object classification using TensorFlow
  • TensorFaces paper implementation using Torch
  • Low-level GPU API implementation for mapping machine learning workloads to GPU clusters
  • Large margin nearest neighbor algorithm comparison, with GPU implementations of the algorithms studied

OpenCL: Framework for heterogeneous embedded platforms

Master's Thesis January 2010 - February 2011

  • Developed a platform simulator combining an ARM model and an embedded GPU for design space exploration. The ARM model was taken from the OVPSim library of CPU models, and the embedded GPU was built by modifying the GPGPUSim simulator. An AMBA AHB bus model was used for communication between the ARM model and the embedded GPU
  • Developed a compilation toolchain using LLVM, the Clang frontend, and NVIDIA's OpenCL drivers. NVIDIA's driver was used to compile OpenCL kernels to PTX, which was in turn translated to LLVM IR. The LLVM IR was used for IR-IR transformations
  • Developed a functional runtime driver for OpenCL to drive the GPU model
  • Studied the correlation between compiler optimizations and micro-architectural details for performance benefits
  • Benchmarks used were from the OpenCL SDKs and HPEC (High Performance Embedded Computing)

Programming Skills

Languages
  • General : C/C++, Python, Standard ML
  • Shading : GLSL, Compute kernels CL/CUDA
  • Assembly : ARM, MIPS, DSP (Blackfin/SHARC)
  • IR : LLVM-IR, SPIR-V, PTX
Frameworks
  • Machine learning : Scikit-learn, Torch, TensorFlow
  • General : MATLAB, NumPy, SciPy
  • GPU : OpenGL, OpenCL, CUDA
OS
Android, Linux, Windows

Education

Northeastern University

Master of Science, Computer Engg (Thesis) September 2008 - December 2010

Research Advisor: Prof. David Kaeli GPA: 3.75/4.0

National Institute of Technology, Surathkal

Bachelor of Technology, Electronics & Communications Engg. September 2003 - May 2007

Academic Advisor: Prof. Sumam David GPA: 8.53/10.0

Awards

Northeastern University 2009-2011 : Graduate Research Assistantship

Qualcomm Inc. : Various Qualstars for different contributions

Analog Devices Inc.: ADI Employee performance award

Other Projects/Experience

Indian Space Research Organization

Intern May 2006 - July 2006

Title: Implementation of range compression used in QLP/NRTP SAR processors on ADSP 101-S TigerSHARC processors

The processing steps involved in range compression are given below

  • FFT of individual pulse return
  • FFT of Reference function
  • Complex multiplication in frequency domain
  • Inverse FFT of the product
A single processor cannot meet the required timing constraint of 0.3 ms. The Analog Devices TigerSHARC processor was chosen for its raw processing power and its inherent multiprocessing capability via link ports. The main task was data distribution and output collection in the multiprocessing environment while maintaining timing synchronization. The data size is 16K complex samples at a pulse repetition frequency (PRF) of 3000 Hz.
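The four processing steps listed above can be sketched in a few lines of Python. This is an illustrative, single-processor sketch and not the TigerSHARC implementation: the function names are hypothetical, a plain recursive radix-2 FFT stands in for an optimized DSP library routine, and it assumes the reference function is a replica of the transmitted chirp, so its spectrum is conjugated before the multiplication (a matched filter).

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x):
    """Inverse FFT with the 1/N scaling applied."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def range_compress(pulse, reference):
    """Range-compress one pulse return against a reference chirp."""
    n = len(pulse)
    # Zero-pad the reference to the pulse length.
    ref = list(reference) + [0j] * (n - len(reference))
    pulse_spec = fft(pulse)                      # step 1: FFT of pulse return
    ref_spec = fft(ref)                          # step 2: FFT of reference
    # Step 3: complex multiplication in the frequency domain
    # (assumption: conjugated reference spectrum, i.e. a matched filter).
    product = [p * r.conjugate() for p, r in zip(pulse_spec, ref_spec)]
    return ifft(product)                         # step 4: inverse FFT


# Usage: a 16-sample chirp echoed at a delay of 20 samples in a 64-sample window.
chirp = [cmath.exp(1j * cmath.pi * 0.05 * k * k) for k in range(16)]
pulse = [0j] * 64
for k, c in enumerate(chirp):
    pulse[20 + k] = c
compressed = range_compress(pulse, chirp)
magnitudes = [abs(v) for v in compressed]
peak_index = magnitudes.index(max(magnitudes))
```

In this toy setup the compressed output peaks at the sample index matching the echo delay, which is the property range compression relies on.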

Indian Institute of Science

Summer Research Project May 2005 - July 2005

Title: Cloud tracking using image processing algorithm

Implemented an image processing algorithm called scale-space classification on satellite data to extract cloud features. The data was further processed using noise removal methods such as threshold classification with the cumulative-histogram technique. The technique was applied to data from Mumbai's heavy downpour in July 2005.

Compiler Project

Compilers Course project Fall 2009

Developed all compilation stages for the Compilers course project. It was later extended with more language features and the corresponding compiler changes.

Evaluation of sorting algorithms

Fundamentals of Computer Engg Course Project Fall 2008

Evaluated and compared different sorting algorithms for performance.

Implementation of lossless image compression algorithm

DSP Architectures Course Project Spring 2007

Used an integer-domain wavelet transform on the BF-535 fixed-point processor. The decompression algorithm was applied to regenerate the original image and verify the correctness of the method.
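As an illustration of why an integer-domain wavelet transform suits lossless compression, the sketch below uses the integer Haar (S-transform) lifting step on one row of pixels. The choice of wavelet is an assumption, since the project description does not name the filter used, and the function names are hypothetical. The point it shows is that the forward and inverse steps use only integer additions and shifts, so reconstruction is exact.

```python
def forward_s_transform(samples):
    """One level of the integer Haar (S) transform on an even-length row."""
    if len(samples) % 2:
        raise ValueError("row length must be even")
    approx, detail = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        d = a - b              # detail: difference of the pair
        s = b + (d >> 1)       # approximation: floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def inverse_s_transform(approx, detail):
    """Exactly invert forward_s_transform using integer arithmetic only."""
    out = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        a = d + b
        out.extend((a, b))
    return out


# Usage: round-trip one row of 8-bit pixel values.
row = [12, 10, 200, 201, 0, 255, 7, 7]
approx, detail = forward_s_transform(row)
restored = inverse_s_transform(approx, detail)
```

Regenerating the row and comparing it with the input mirrors the decompression check described above.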

Classification of singing voice and instrumental sounds using SVMs

DSP Lab Spring 2006

Mel Frequency Cepstral Coefficients with the discrete wavelet transform were used as feature vectors, and the Torch toolkit was used for Support Vector Machines (with a Radial Basis Function kernel) in the MATLAB environment.

LZW compression/decompression using ARM assembly

Microprocessor Course Project Spring 2005

A file or line of text was taken as input, and the compressed output was stored in memory. A decompression algorithm was also implemented to verify the correctness of the implementation.
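The compress-then-decompress check described above can be sketched in Python. This is a minimal illustration of textbook LZW with an unbounded dictionary and integer codes, not the ARM assembly implementation; the function names and the sample input are hypothetical.

```python
def lzw_compress(data):
    """Encode bytes as a list of LZW codes (dictionary seeded with 0..255)."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate            # keep extending the match
        else:
            codes.append(table[current])   # emit the longest known match
            table[candidate] = next_code   # learn the new sequence
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from a list of LZW codes."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # Code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)


# Usage: verify correctness by round-tripping a line of text.
text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
round_trip = lzw_decompress(codes)
```

Comparing `round_trip` with `text` is the same verification strategy as the project: decompression serves as the correctness test for compression.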