Tolga Birdal

Researcher & Entrepreneur

Welcome to my personal website.

About Me

  • Tolga Birdal
  • December 17, 1983
  • tbirdal (at) gmail (dot) com

As a co-founder, lead researcher and director of two great companies, I suppose it is appropriate to describe myself as an eager researcher and entrepreneur working in the fields of computer vision, augmented reality and pattern recognition.

I grew up in Izmir, Turkey, the most relaxed and liberal city of all, and moved to Istanbul, the most chaotic of all. Presumably, those are the reasons why I love jazz and drumming, with a delicate taste for wine on the side.

Just enjoy.


"Wisdom is not a product of schooling but of the lifelong attempt to acquire it." (Albert Einstein)


Myself on Stack Exchange: tbirdal

Employment

  • Co-Founder & CEO 2011 - Present
    Gravi Information Technologies and Consultancy Ltd.

    Gravi Information Technologies and Consultancy Ltd is a Turkish start-up founded in 2010 by a group of electronics and computer engineers (including myself) experienced in the areas of computer vision and computer graphics. Gravi is supported by KOSGEB, a government organization supporting small and medium sized enterprises in Turkey, under its R&D Support Program.

  • Co-Founder & Chief Engineer 2008 - 2011
    BeFunky Inc.

    BeFunky Photo Effects allows everyday people to easily create photographically rich and artistic results from their digital images without the need for any technical knowledge. These "one-click" photo effects produce the desired results effortlessly, and each effect comes with the option to make simple adjustments.

  • Intern 2009
    Mitsubishi Electric Research Labs

    Designed and developed an algorithm for simulating human breathing in 4D
    Implemented Random Walks for image segmentation
    Implemented true real-time bilateral filtering

  • Working Student 2008 - 2010
    Technical University of Munich

    I worked on a 3D reconstruction system and extended it to 4D analysis. Another aspect of this work was the visualization of temporal 3D volumetric data.

  • Intern 2007
    Carnegie Mellon University

    Worked on 3D optical avoidance under the supervision of Assoc. Prof. Metin Sitti, and on shape matching using segmentation maps under the supervision of Prof. Martial Hebert and Dr. Yan Ke.

  • Lead Developer 2007 - 2008
    Vistek Isra Vision

    I developed many industrial computer vision systems for OCR/OCV, barcode reading, robot control and object classification. I also designed complete systems using the Halcon framework.

Awards

  • SIU Alper Atalay Best Paper Award Ranked 3rd
  • Sait Halman Computer Science Honor Prize at Robert College
  • Motorola Best Widget Award
  • ITURO Robot Competition Award (2nd)
  • Projistor Robot Competition Award (1st)
  • Merit Scholarship, Sabanci University
  • Ranked 10th in the Aegean Chess Tournament

Education

  • Ph.D. in Mathematics & Computer Science

    2013 - Present
    Technical University of Munich

    Doctor Rerum Naturalium (PhD)

    Thesis: Measuring Deforming 3D Shapes Using Non-Overlapping Multiple View Geometry

  • M.Sc. in Computational Science & Engineering

    2008 - 2011
    Technical University of Munich

    Master's Thesis: 3D Deformable Surface Recovery Using RGBD Cameras

  • B.A. in Electronics Engineering

    2004 - 2008
    Sabanci University

    Undergraduate Thesis: VAPMed - A Medical Imaging Framework for Collaborative Research

  • Science Diploma

    1999 - 2004
    Robert College

    Robert College is proudly the best high school in Turkey.

  • Science Diploma

    1996 - 1999
    Bornova Anatolian High School

    This is where I took my first step into academic life.

Software Skills

  • C, C++, C#, Java
  • Assembly (SSE2, SSE4.1, AVX)
  • CUDA, OpenMP, OpenCL
  • MATLAB, Maple
  • OpenCV, OpenGL, PCL, Halcon, Visionscape, VTK
  • BLAS, LAPACK and other numerical libraries
  • SketchUp, Rhino3D
  • ActionScript, JavaScript, SQL
  • Windows, Linux, macOS

Hardware Knowledge

  • Xilinx & ModelSim
  • Industrial Interfaces: GigE, GenICam, Serial, TCP/IP
  • Basler, Sony, UEye, The Imaging Source and many other industrial cameras
  • Various other industrial vision hardware

Download My Resume

Teaching

  • Assistant in Computer Vision

    Topics: Introduction to Computer Vision, Human Visual System, Image Formation, Pointwise Image Operations, Image Intensity Transformations, Geometric/Coordinate Transforms, Interpolation, Image Neighborhood Operations, Spatial Filtering, Edge Detection, Feature Extraction, Principal Component Analysis and Applications, Morphological Image Processing, Basic Segmentation, Thresholding techniques, Motion/Dynamic Scenes, Color and texture, Object/Shape Modeling / Recognition.

    Objective: To teach the fundamentals of computer vision, which tries to "make computers see and interpret" using observations in the form of multiple 2D (or 3D) images.

  • Assistant in Pattern Recognition

    Statistical Pattern Recognition: Parameter Estimation and Supervised Learning, Bayesian Decision Theory, nonparametric approaches (Parzen windows, Nearest Neighbor), Linear Discriminant Functions, Feature extraction/selection; Pattern Recognition via Neural Networks; Syntactic Pattern Recognition; Nonmetric Methods, Unsupervised Learning and Clustering, Hidden Markov Models, Classifier Combination

    Objective: To give a systematic account of the major topics in pattern recognition, with emphasis on real-world applications such as automated face recognition, speech recognition and DNA sequence identification.

Courses @ TUM

  • Numerical Programming II (5 ECTS)

    Advanced course on Numerical Programming. Topics covered mainly include PDEs. The topics of Numerical Programming I are covered in more detail.

  • Parallel Programming (5 ECTS)

    This course mainly focuses on MPI and OpenMP. Throughout the semester, students implement various parallel algorithms using MPI and OpenMP. For the MPI programs, the supercomputer on the TUM Garching campus is used.

  • Scientific Computing II

    This course provides deeper knowledge in two important fields of scientific computing. Solution of large sparse systems of linear equations: Gaussian elimination, relaxation methods, multi-grid methods, steepest descent, conjugate gradient methods. Molecular dynamics simulations: the physical model, the mathematical model, approximations and discretization, aspects of implementation, examples of nano-fluidic simulations.

  • Software Engineering for Engineers

  • Augmented Reality (5 ECTS)

    Augmented Reality (AR) allows users to view computer information that is graphically embedded within the real three-dimensional world. Using a semi-transparent head-mounted display (HMD) attached to a wearable computer, a user can inspect and manipulate objects while viewing information about these objects in the HMD. This information is typically displayed as virtual objects in the real world, thus augmenting the perception of the user. The wearable computer enables users to carry out their work as they normally do, without imposing constraints on their mobility or their hands. AR applications span from minimally invasive medical surgery to manufacturing, and from machine inspection and repair to games and tourist guides. This class presents the technical foundations of Augmented Reality, as used in current international research and applications.

  • Numerical Programming I (8 ECTS)

    This course provides an overview of numerical algorithms. Topics are: Floating point arithmetic, Solving Linear systems, Interpolation, Quadrature, Eigenvalue problems, Basics of iterative methods, Basics of numerical methods for ordinary differential equations. The course will start with a short revision of mathematical foundations for numerical algorithms.

  • Scientific Computing & Scientific Computing Lab (5 ECTS)

    These courses provide an overview of scientific computing, i.e. of the different tasks to be tackled on the way towards powerful numerical simulations. The entire "pipeline" of simulation is discussed: mathematical models (derivation, analysis and classification); numerical treatment of these models (discretization of (partial) differential systems, grid generation); efficient implementation of numerical algorithms (monoprocessors vs. parallel computers: architectural features, parallel programming, load distribution, parallel numerical algorithms); interpretation of numerical results and visualization; validation; numerical methods for stationary and instationary partial differential equations; solvers for large, sparse systems of linear equations; adaptivity and adaptively refined discretization grids; applications from fluid dynamics and heat transfer.

Courses @ Sabanci University

  • EE 684 Advanced Computer Vision (3 Credits)

    Special topics in telecommunications. Materials included: Camera Calibration, 3D Reconstruction, Spline Curves, Differential Geometry, and paper implementations.

  • EE 566 Pattern Recognition (3 Credits)

    Statistical Pattern Recognition: Parameter Estimation and Supervised Learning, Bayesian Decision Theory, nonparametric approaches (Parzen windows, Nearest Neighbor), Linear Discriminant Functions, Feature extraction/selection; Pattern Recognition via Neural Networks; Syntactic Pattern Recognition; Nonmetric Methods, Unsupervised Learning and Clustering, Hidden Markov Models, Classifier Fusion...

  • EL480 Advanced Computer Interfacing (3 Credits)

    Special topics in electronics. My research topic was implementing a template matching algorithm on NVIDIA GPUs using the CUDA libraries provided by NVIDIA. GPGPUs can compute common tasks almost 50-200 times faster than a normal CPU, so parallel computing on GPUs plays a very important role in high performance computing.

  • EE 585 Medical Image Analysis (3 Credits)

    Special topics in Electronics. Level sets, Active Contours, Registration and medical imaging techniques are discussed and covered. Implementation homework is given.

  • TE 407 Computer Vision (4 Credits)

    Digital image fundamentals. Elements of image processing systems. Image model and imaging geometry. Image sampling and quantization. Digital Convolution, Point Operations (image brightness modification, contrast enhancement, thresholding), Global operations (Histogram equalization), Neighborhood operations (Image smoothing, image sharpening), Geometric operations (Display adjustment, image warping, magnification and rotation), Temporal (frame based) operations, Edge Detection and Segmentation, Contours, shape modeling, Morphology, Texture analysis, Color image processing, Image Compression.

  • CS 404 Artificial Intelligence (3 Credits)

    This course is a broad technical introduction to fundamental concepts and techniques in artificial intelligence. Topics include expert systems, rule based systems, knowledge representation, search, planning, managing uncertainty, machine learning, and neural networks. Important current application areas of artificial intelligence, such as computer vision, robotics, natural language understanding, and intelligent agents, will be discussed.

  • TE 405 Digital Audio & Speech Processing (3 Credits)

    Digital Speech and Audio Processing Linear Predictive modeling of Speech. Pitch Estimation Speech Modeling. Speech Coding using Linear Predictive methods: LPC-10, CELP, MELP algorithms. Speech recognition, and synthesis. Audio processing and coding, MPEG standard.

  • CS 401 Computer Architectures (3 Credits)

    This is an introductory course on computer architecture. Topics include: basics of the von Neumann machine, instruction set architecture, instruction formats addressing modes, machine language, instruction fetch, decode and execution cycle, data path and arithmetic logic unit design, arithmetic algorithms, hardwired and microprogrammed control organization, input and output organization, memory interface.

  • EL 308 Microcomputer Based System Design (3 Credits)

    This is an introductory course on computer architecture. Topics include: basics of the von Neumann machine, instruction set architecture, instruction formats addressing modes, machine language, instruction fetch, decode and execution cycle, data path and arithmetic logic unit design, arithmetic algorithms, hardwired and microprogrammed control organization, input and output organization, memory interface.

  • EL 310 Hardware Description Languages (HDL) (3 Credits)

    Introduction to hardware description languages; VHDL fundamentals, behavioral and structural models; syntax and basic rules; design entry; behavioral simulation; logic synthesis and synthesizable code development; design mapping to standard cells and/or field programmable gate array (FPGA).

  • CS 408 Computer Networks (3 Credits)

    This course is an introduction computer networks. Topics include network architectures, local and wide-area networks, network technologies and topologies; data link, network, and transport protocols, point-to-point and broadcast networks; routing, addressing, naming, multicasting, switching, internetworking congestion/flow/error control, quality of service, and network security.

  • CS 307 Operating Systems (3 Credits)

    This course covers fundamental aspects of operating systems: management of resources such as CPU, memory space and peripheral devices. Topics include concurrent processes, mutual exclusion, process communication, cooperation, deadlocks, semaphores, scheduling, and protection. The course will also highlight important aspects of operating systems such as UNIX, Windows, etc.

  • TE 302 Discrete-Time Signals and Systems (3 Credits)

    Review of linear discrete-time systems and sampled and discrete-time signals; Fourier analysis, discrete and fast Fourier transforms; interpolation and decimation; design of infinite-impulse response and finite impulse response filters. Introduction to real time processing using Digital Signal Processors (DSP) chips.

  • MATH 306 Statistical Modeling (3 Credits)

    Statistical inference; estimation, confidence intervals, hypothesis testing; analysis of variance; goodness of fit tests; regression and correlation analysis; Bayesian methods; introduction to design of experiments; use of statistical software.

  • TE 301 Introduction to Signal Processing and Information Systems (3 Credits)

    Discrete-time Fourier transform. Discrete-time processing of continuous-time signals. Basic communication concepts, modulation, AM, FM, pulse amplitude modulation. Laplace transform, system response. Z-transform. Systems characterized by differential and difference equations. Control systems and feedback. Uncertainty and randomness in signals and systems.

  • ENS 211 Signals (3 Credits)

    Continuous and discrete, periodic and aperiodic signals, impulse, unit step. The concept of signal space, inner product, norm, bases. Fourier series, bandwidth. Gibbs effect. Fourier transform. DFT. Filtering. Sampling of continuous signals, aliasing. Bandlimited reconstruction, interpolation.

  • EL 202 Electronic Circuits II (4 Credits)

    Concepts of lumped and distributed circuits; frequency dependence of circuit characteristics; introduction to feedback circuits and feedback amplifiers; concepts of stability, phase margin and compensation; multi-stage amplifier circuits, power amplifier circuits, oscillators. Laboratory exercises are provided to reinforce the theory of operation of these circuits.

  • ENS 203 Electronic Circuits I (4 Credits)

    Passive components, basic circuit analysis, first order circuits, transient and steady state analysis, second order RLC circuits, resonance, amplifier fundamentals, operational amplifiers, introduction to diodes and transistors. Prerequisite for EL 202.

  • CS 303 Logic and Digital System Design (4 Credits)

    Number systems and conversion, boolean algebra, the assertion level concept; minterm and maxterm expansions, Karnaugh maps, and Quine-McCluskey minimization, combinatorial logic circuit design, NAND and NOR gate based design. State machines and sequential circuits, flip-flops, minimization of state tables, state assignment. Higher level digital system design using SSI-MSI blocks such as multiplexers/decoders, adders, memory and programmable gate arrays; bus oriented systems. Asynchronous sequential circuits, flow tables, timing hazards.

  • MATH 203 Introduction to Probability and Statistics (3 Credits)

    Experiments and events. Probability axioms. Counting techniques. Conditional probability, independent events, discrete and continuous sample spaces. Random variables and distribution function. Some standard distributions. Sampling and statistics. Also part of the "core course" pools for the BIO, TE, MS degree programs, and simultaneously one of the required courses for the FASS degree program in Economics.

  • MATH 202 Differential Equations (3 Credits)

    First-order differential equations and solution methods. Direction fields, qualitative methods, numerical approximations. Higher-order linear differential equations. Linear systems. Nonlinear systems, asymptotic behaviour of solutions. Laplace transform. Also part of the "core course" pools for the BIO, MAT, ME, EL, TE, MS degree programs.

  • MATH 201 Linear Algebra (3 Credits)

    Systems of linear equations; Gaussian elimination. Vector spaces, subspaces, linear independence, dimension, change of basis. Linear transformations. Inner product, orthogonality. Eigenvalues. Diagonalization and canonical forms. Cayley-Hamilton Theorem.

  • CS 202 Data Structures (3 Credits)

    Introduction to theoretical aspects of computing: modeling algorithms and their run times, computational complexity. Linear data structures (lists, stacks, queues) trees (tries, binary search trees, AVL trees, tree traversals), hashing and hash tables, graphs and their representations, graph algorithms (depth first and breadth first search, single source shortest path algorithms), sorting algorithmic paradigms (divide and conquer, greedy, dynamic programming).

  • CS 201 Introduction To Computing (3 Credits)

    This course is intended to introduce students to the field of computing (basic computer organization, data representation, concepts, algorithmic thinking and problem solving), as well as giving them intermediate level programming abilities in an object-oriented programming language (currently C++). Also part of the "core course" pools for the CS, BIO, MAT, ME, EL, TE, MS degree programs.

Non-contact 3D Measurement for Automated 100% Inspection

As part of my external PhD work and for one of our clients, we developed a non-contact 3D measurement machine that automatically retrieves the large part to be measured and outputs a measurement report in one-to-one correspondence with old-fashioned CMMs. I managed the entire project, and the core algorithms for 3D vision and measurement were coded solely by me. The project succeeded in satisfying the sharp accuracy and precision demands of the industry.


BEFUNKY!

BeFunky is an online application that allows users to recreate images and videos as digital paintings, cartoons and comics without the need for any professional skills or having to download specific tools. The sophisticated cartoonizing algorithm lets the user create a very cartoon-like image and a detailed sketch. The user also has the option to warp the image for further caricaturization. Examples can be found on the website; membership is free. I took the step of co-founding this nice website during my bachelor studies.

befunky.com

Brake Disk Inspection for Automated Quality Control

This project was in collaboration with Gate Electronics, where for an important customer we assembled a complete system composed of 4 cameras, light units and 2 separate protection cabins. There are more than 60 classes of disks to inspect, and more than 10 algorithms were involved to inspect the different parts effectively, including image matching, laser profile measurement, intensity variation validation and code reading. Besides managing this project, I was also the lead developer, playing the most fundamental part in realizing it.


FrozenTime: A Multi-Camera Framework for the Bullet-Time Effect

FrozenTime is a novel, repeatable, compact system and architecture for capturing on-the-fly bullet-time (Matrix-like) videos. The system involves more than 50 cameras to capture a flawless sequence. The software provides:

- Synchronous video capture

- On-the-fly chroma keying

- State of the art video stabilization

- Video output with low disk footprint

This project was used in the Coca-Cola advertising tent and was one of our most interesting works. The project page, for the moment, is only in Turkish.

Istanbul-o-matik, an interactive projection mapping installation

With me as the CEO, the Gravi team engaged in this interactive real-time mapping project, showcased at the first Istanbul Design Biennial at the Istanbul Modern Museum. The idea was to create an abstract view of Istanbul emphasizing the history, culture and future of the city, as well as its current structural problems.

Surfact

Gravi SurfACT opens up an entirely new way to attract visitors’ and customers’ attention at organizations and events such as concerts, exhibitions and fairs. Gravi Interactive Floors uses the projection area on the floor as a display and the users’ body movements for interaction. Without the need for any remote control or external device, and with its high playability, Gravi SurfACT succeeds in being the center of interest in any place it is installed.

Particle beam radiotherapy on GPU

Capturing CT data for a breathing simulation is not possible due to the non-real-time techniques employed in computed tomography. However, the breathing movements of patients should always be considered when developing medical imaging algorithms. As it wasn't really possible to acquire this data, under the supervision of Dr. Fatih Porikli I developed a simulation algorithm that generates a 4D video of a breathing patient from a single 3D CT scan. This CUDA implementation respected the rigidity of the bones while applying reasonable deformations to other tissues and organs. The result was an easy-to-use dataset and testbed for many tracking algorithms.
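To make the idea more concrete, here is a hypothetical CPU-only sketch in Python (NumPy/SciPy) of how a single CT volume could be warped into one breathing frame while keeping rigid structures fixed. The displacement field, function and parameter names are illustrative assumptions, not the original CUDA implementation.

import numpy as np
from scipy.ndimage import map_coordinates

def breathe(volume, rigidity, t, amplitude=4.0, period=4.0):
    """Warp one CT volume into a synthetic 'breathing' frame at time t.

    rigidity is a per-voxel map in [0, 1] (1 for bone) that suppresses the
    displacement so rigid structures stay fixed, while soft tissue follows a
    smooth sinusoidal displacement. Purely a toy illustration.
    """
    z, y, x = np.meshgrid(*[np.arange(s) for s in volume.shape], indexing='ij')
    phase = np.sin(2.0 * np.pi * t / period)
    # displacement grows towards larger z and vanishes on rigid voxels
    disp = amplitude * phase * (z / float(volume.shape[0])) * (1.0 - rigidity)
    coords = [z, y - disp, x]          # push soft tissue along the y axis
    return map_coordinates(volume, coords, order=1, mode='nearest')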


RoboChess: Chess Playing Robot

RoboChess is a chess-playing robot that is able to play against a person. Cameras are used to locate the initial and final positions of the pieces. A gripper and a 3D XYZ Cartesian robot controlled by a PLC are used to grab the pieces and position them. The chess pieces aren't specially designed, except that they have a certain height. The chess engine was also developed by me. The robot can also connect via the internet, meaning that if you have one of these you can play against a real opponent who is playing online chess on their computer while you play through RoboChess. A Macromedia Flash interface is also available. RoboChess was demonstrated at ARIF 2006. The application was developed in Microsoft Visual C# 1.1 and a Festo PLC program, and all the core algorithms were coded in C. (Developed in 1.5 months)


Robo112: Autonomous Vision Based Helper Robot

Robo112 was designed to help a disabled person reach and grab objects, especially objects on the ground. It can follow special marks and arrive at a previously marked destination point. Robo112 recognizes the wanted object by reading text (OCR): you show it a previously taught piece of text, and it finds the matching, previously taught object. Multi-layer perceptrons were used for OCR, and template matching with image pyramids was used for object matching. The robot is also capable of detecting faces and following them in widely varying environments.

Robust Matching of 3D CAD Models to Multiple Views

Nowadays, multi-camera setups are ubiquitous because they provide much more information than a single camera does. As camera prices decrease, people benefit extensively from using large numbers of cameras. Many applications such as augmented reality, video surveillance, 3D reconstruction and industrial inspection already use multiple cameras, and recent research predicts that such applications will continue to utilize many cameras. Additionally, market research shows that such a generic measuring system has many uses, especially in the automotive, white-goods and electronics industries.

Sub-pixel Accurate Edge Detection and Linking

Precise detection and sub-pixel edge localization are of great importance in increasing the accuracy of measurement techniques. In this project, I present a very accurate sub-pixel localization and linking algorithm and form a thorough framework for sub-pixel edge analysis, treating edges as connected regions and redefining the linking operation as analogous to connected component labeling. The edges are detected using a novel third-order filter with a sub-pixel linking stage similar to hysteresis thresholding; the classical Canny approach is not applicable because the edge points are at sub-pixel positions. In the image shown on the right, the smooth sub-pixel edges are linked and painted on the image, with each connected edge piece painted in a different color. Notice that at the junction points, the edges are correctly split.

Recovering 3D Deformations Using RGBD Cameras

In this work, we study the problem of 3D deformable surface tracking with RGBD cameras, specifically Microsoft's Kinect. In order to achieve this we introduce a fully automated framework that includes several components: automatic initialization based on segmentation of the object of interest, robust range flow that guides deformations of the object of interest, and finally representation of the results using a mass-spring model. The key contribution is an extension of the range flow work of Spies and Jahne [1], which combines the Lucas-Kanade [2] and Horn-Schunck [3] approaches for RGB-D data, makes it converge faster and incorporates color information with a multichannel formulation. We also introduce a pipeline for generating synthetic data and perform error analysis and comparison to the original range flow approach. The results show that our method is accurate and precise enough to track significant deformations smoothly at near real-time run times.

Real-time Illumination, Clutter and Occlusion Invariant Shape Matching

As vision moves towards more semantic and tougher problems, low-level vision still suffers from a lack of attention. Academia has begun to take low-level problems such as template matching for granted; however, when the moment comes to choose a method that really works, most methods turn out to be unsatisfactory. At this point, it is not hard to observe that despite recent advancements in template matching techniques, the final word on rotation- and scale-invariant matching under unpredictable illumination conditions and significant occlusion has still not been said. While feature-based methods seem to provide effective tools, meeting real-time constraints requires undesirable optimization tricks. In this work, our aim is to take a well-known robust 2D shape matching framework and refactor it so well that it undoubtedly satisfies the runtime restrictions.

Real-time Detection and Tracking Framework for Augmented Reality

Even though many feature-based techniques exist for localizing and tracking planar (and even non-planar) templates, it is still an open question how to implement a proper algorithm that can really detect and track templates under perspective deformations, illumination changes and clutter, with rotation invariance. In this work we uncover this mystery and provide insights and experimentation on implementing a truly real-time, robust AR base.

A Hierarchical HMM for Reading Challenging Barcodes

In state-of-the-art manufacturing processes, barcode labeling is a ubiquitous method to track products and goods. Thus, it is of great importance to have powerful machinery for decoding them, even under severe deformations, damage, blur, occlusion and bad illumination conditions. The applications are numerous: from assisting blind people to automated industrial inspection, technology demands solid barcode reading algorithms. Yet, to the best of our knowledge, no well-established framework exists to accomplish this task. In this work, we propose an algorithm for real-time decoding of barcodes with state-of-the-art accuracy. Our method is based on a very well-studied hierarchical HMM framework, and the decoding process is posed as Viterbi dynamic programming, which allows us to use pruning strategies to search a large state space in real-time.

Efficient Random Walks in C

I wrote a soft-real-time implementation of the famous Random Walks algorithm in ANSI C, taking advantage of sparse computations. The result was used to track ultrasound images smoothly and efficiently. A nice OpenGL-based video processing GUI in Qt complemented it.

Constant Time O(1) Bilateral Filtering

At MERL, Dr. Fatih Porikli developed an algorithm for constant-time bilateral filtering of images. When implemented on the GPU using CUDA, there was a 25-fold improvement compared to a reasonably optimized OpenMP implementation. The details of the implemented algorithm are presented in this paper:

Constant Time O(1) Bilateral Filtering

An Algorithm for Efficient Chroma Keying

Project FrozenTime required a significantly robust and fast green-box chroma keying algorithm, more advanced than existing approaches. Built on the relation between the inverse covariance and Khachiyan's ellipsoid, the algorithm turned out to be very feasible.

Workflow Analysis Using 4D Reconstruction Data

This project targets the workflow analysis of an interventional room equipped with 16 cameras fixed on the ceiling. It uses real-time 3D reconstruction data and information from other available sensors to recognize objects, persons and actions. This provides complementary information to specific procedure analysis for the development of intelligent and context-aware support systems in surgical environments.

TUM Project Webpage

Spatio-Temporal Shape Matching

Under the supervision of Dr. Yan Ke and Prof. Martial Hebert, I worked on reconstructing spatio-temporal shapes to be used in conjunction with action recognition.

Robust Matching of 3D CAD Models to Multiple Views

Nowadays, multi-camera setups are ubiquitous because they provide much more information than a single camera does. As camera prices decrease, people benefit extensively from using large numbers of cameras. Many applications such as augmented reality, video surveillance, 3D reconstruction and industrial inspection already use multiple cameras, and recent research predicts that such applications will continue to utilize many cameras. Additionally, market research shows that such a generic measuring system has many uses, especially in the automotive, white-goods and electronics industries.

One of the biggest problems involved in using multi-camera setups is the robust 3D measurement of CAD parts, where environment- and process-dependent noise is significant. Such systems require projective registration of a CAD model to multi-view camera images. Until now, many studies have been carried out to fit CAD models to multiple monochrome photographs. In this work, we pose this problem as an ICP-like optimization in which the global geometric poses of the individual CAD parts are refined from an automatically chosen initial guess. We make use of accurate sub-pixel edges and robust functions in order to be resilient to outliers and corrupted observations. While being straightforward, this method greatly benefits from the fact that the methods used are well studied and proven to work well under many conditions. Our approach is invariant to the structure of the geometry and sufficiently immune to errors in the initialization. While being extendible and easy to apply, this technique inherently computes the correspondences of the CAD model to the sub-pixel edges, which might further be exploited for recalibration of the measurement system, not from a predefined grid but automatically from an erroneous measurement sample.

Eventually, we perform extensive tests on real data and demonstrate both numerically and visually that the accuracy of the system, even on a globally calibrated and inaccurate setup, is reasonable for industrial standards. Last but not least, we discuss the opportunities in this field and how current measurement systems can be improved to reach the most accurate measurements.

This work is not yet published, but a paper will be available soon.
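As an illustration of the ICP-like refinement described above, below is a minimal single-view Python sketch (SciPy, pinhole intrinsics K) that alternates nearest-neighbour association against detected sub-pixel edges with a robust least-squares pose update. The function and parameter names are hypothetical; the actual system handles multiple views and their calibration jointly.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial import cKDTree
from scipy.spatial.transform import Rotation

def refine_pose_icp(model_pts, image_edges, K, rvec0, t0, iters=10):
    """ICP-style pose refinement of CAD edge samples against image edges.

    model_pts: Nx3 points sampled on the CAD edges, image_edges: Mx2 sub-pixel
    edge points, K: 3x3 camera intrinsics, (rvec0, t0): initial pose guess.
    A robust (soft_l1) loss down-weights outlier correspondences.
    """
    tree = cKDTree(image_edges)
    rvec, t = np.asarray(rvec0, float), np.asarray(t0, float)

    def project(r, tr):
        Xc = Rotation.from_rotvec(r).apply(model_pts) + tr
        uv = (K @ Xc.T).T
        return uv[:, :2] / uv[:, 2:3]

    for _ in range(iters):
        _, nn = tree.query(project(rvec, t))   # closest image edge per model point
        target = image_edges[nn]

        def residual(x):
            return (project(x[:3], x[3:]) - target).ravel()

        x = least_squares(residual, np.hstack([rvec, t]), loss='soft_l1').x
        rvec, t = x[:3], x[3:]
    return rvec, t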

Accurate Sub-pixel Edge Detection and Linking

Precise detection and sub-pixel edge localization are of great importance in increasing the accuracy of measurement techniques. In this project, I present a very accurate sub-pixel localization and linking algorithm and form a thorough framework for sub-pixel edge analysis, treating edges as connected regions and redefining the linking operation as analogous to connected component labeling. The edges are detected using a novel third-order filter with a sub-pixel linking stage similar to hysteresis thresholding; the classical Canny approach is not applicable because the edge points are at sub-pixel positions.
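For intuition, a common baseline for sub-pixel localization is to refine each gradient-magnitude maximum with a parabolic fit along the gradient direction. The short Python sketch below (NumPy/SciPy, with assumed names and thresholds) shows only that baseline; it is not the third-order filter used in this project.

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def subpixel_edges(img, sigma=1.0, thresh=10.0):
    """Locate edges with sub-pixel accuracy via a parabolic fit.

    Samples the gradient magnitude one step along and against the gradient
    direction and shifts each candidate to the parabola's vertex.
    """
    I = gaussian_filter(img.astype(np.float64), sigma)
    gy, gx = np.gradient(I)
    mag = np.hypot(gx, gy)

    ys, xs = np.nonzero(mag > thresh)
    nx, ny = gx[ys, xs] / mag[ys, xs], gy[ys, xs] / mag[ys, xs]

    m0 = mag[ys, xs]
    mp = map_coordinates(mag, [ys + ny, xs + nx], order=1)   # +1 step along gradient
    mm = map_coordinates(mag, [ys - ny, xs - nx], order=1)   # -1 step against it

    peak = (m0 >= mp) & (m0 >= mm)                 # keep only ridge maxima
    denom = mm - 2.0 * m0 + mp
    delta = np.where(np.abs(denom) > 1e-9, 0.5 * (mm - mp) / denom, 0.0)

    x_sub = xs[peak] + delta[peak] * nx[peak]
    y_sub = ys[peak] + delta[peak] * ny[peak]
    return np.column_stack([x_sub, y_sub])         # (x, y) sub-pixel positions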

Real-time Illumination, Clutter and Occlusion Invariant Shape Matching

As vision moves towards more semantic and tougher problems, low-level vision still suffers from a lack of attention. Academia has begun to take low-level problems such as template matching for granted; however, when the moment comes to choose a method that really works, most methods turn out to be unsatisfactory. At this point, it is not hard to observe that despite recent advancements in template matching techniques, the final word on rotation- and scale-invariant matching under unpredictable illumination conditions and significant occlusion has still not been said. While feature-based methods seem to provide effective tools, meeting real-time constraints requires undesirable optimization tricks. In this work, our aim is to take a well-known robust 2D shape matching framework and refactor it so well that it undoubtedly satisfies the runtime restrictions.

To do so, the choice of matching technique plays a very important role. Hough-based approaches provide a certain robustness, yet when a rotation space search comes into play, the memory and computation requirements increase exponentially. From there on, we re-attack the problem of conventional template matching (searching over the spatial domain) and introduce novel ideas to make matching metrics surprisingly appealing.
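The flavour of such occlusion- and illumination-tolerant metrics can be conveyed with the gradient-direction score below, a toy translation-only Python sketch with assumed inputs (the actual framework also searches rotation and scale): global lighting changes rescale gradients but not their directions, and occluded model points merely lower the mean score proportionally.

import numpy as np
from scipy.ndimage import sobel

def shape_match_scores(template_pts, template_dirs, image, candidates):
    """Score candidate translations by mean gradient-direction agreement.

    template_pts: Kx2 (row, col) model edge points, template_dirs: Kx2 unit
    gradient directions of the template, candidates: iterable of (dy, dx).
    """
    img = image.astype(np.float64)
    gy, gx = sobel(img, axis=0), sobel(img, axis=1)
    scores = []
    for dy, dx in candidates:
        ys = np.clip((template_pts[:, 0] + dy).astype(int), 0, img.shape[0] - 1)
        xs = np.clip((template_pts[:, 1] + dx).astype(int), 0, img.shape[1] - 1)
        g = np.column_stack([gy[ys, xs], gx[ys, xs]])
        g /= np.linalg.norm(g, axis=1, keepdims=True) + 1e-9
        scores.append(np.mean(np.sum(g * template_dirs, axis=1)))
    return np.array(scores)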

A Hierarchical HMM for Reading Challenging Barcodes

In state-of-the-art manufacturing processes, barcode labeling is a ubiquitous method to track products and goods. Thus, it is of great importance to have powerful machinery for decoding them, even under severe deformations, damage, blur, occlusion and bad illumination conditions. The applications are numerous: from assisting blind people to automated industrial inspection, technology demands solid barcode reading algorithms. Yet, to the best of our knowledge, no well-established framework exists to accomplish this task. In this work, we propose an algorithm for real-time decoding of barcodes with state-of-the-art accuracy. Our method is based on a very well-studied hierarchical HMM framework, and the decoding process is posed as Viterbi dynamic programming, which allows us to use pruning strategies to search a large state space in real-time.
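The dynamic-programming core of such a decoder is ordinary Viterbi decoding in log-space; the generic Python sketch below shows only that core, with assumed inputs, and omits the hierarchical barcode state space and the pruning strategies mentioned above.

import numpy as np

def viterbi(log_trans, log_emit, log_prior):
    """Most likely state path under an HMM, decoded in log-space.

    log_trans: (S, S) transition log-probabilities, log_emit: (T, S) per-frame
    emission log-probabilities (e.g. scanline likelihoods of bar/space states),
    log_prior: (S,) initial log-probabilities.
    """
    T, S = log_emit.shape
    delta = log_prior + log_emit[0]
    backptr = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = delta[:, None] + log_trans            # score of prev -> next
        backptr[t] = np.argmax(cand, axis=0)
        delta = cand[backptr[t], np.arange(S)] + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):                    # trace the path backwards
        path[t - 1] = backptr[t, path[t]]
    return path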

Real-time Detection and Tracking Framework for Augmented Reality

Even though many feature-based techniques exist for localizing and tracking planar (and even non-planar) templates, it is still an open question how to implement a proper algorithm that can really detect and track templates under perspective deformations, illumination changes and clutter, with rotation invariance. In this work we uncover this mystery and provide insights and experimentation on implementing a truly real-time, robust AR base. Our AR framework, developed mainly by myself at Gravi Labs, enjoys reliable tracking. Current work focuses on fusing this AR framework with the Oculus headset for an amazing mixed reality experience. Here are some techniques which are used jointly in our framework to achieve such robustness and speed (10 ms/frame):

- Real-time camera pose estimation

- Scale Invariant Agast feature points

- Threaded environment for context switching between tracking and detection
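
As a sketch of the camera pose estimation step listed above, the snippet below uses OpenCV's solvePnPRansac on 2D-3D feature matches. The helper name and parameter values are assumptions; the production framework adds AGAST features and threaded tracking on top of this.

import numpy as np
import cv2

def estimate_pose(obj_pts, img_pts, K, dist=None):
    """Robust camera pose from matched 3D template points and 2D keypoints.

    obj_pts: Nx3 points on the (planar) target, img_pts: Nx2 matched image
    keypoints, K: 3x3 intrinsics. Returns rotation matrix, translation and
    the RANSAC inlier indices, or None if the pose could not be estimated.
    """
    dist = np.zeros(5, dtype=np.float32) if dist is None else dist
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts.astype(np.float32), img_pts.astype(np.float32),
        K.astype(np.float32), dist,
        reprojectionError=3.0, iterationsCount=100)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)      # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers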

Recovering 3D Deformations Using RGBD Cameras

Deformable surfaces are ubiquitous in the real world and are thus of great interest to computer vision researchers. They exist in various forms such as packets, flags, clothing, organs and bodies. For this reason, their application areas are extensive, ranging from sports to entertainment and from medical imaging to machine vision. While research in the area is quite new, many advanced methods are already being developed. Most of these methods rely on stereo computations or try to solve the under-constrained problem of recovering deformations from monocular scenes. Recently, there has been an increasing number of depth (RGBD) cameras available at commodity prices. These cameras can usually capture both color and depth images in real-time, with limited resolution and accuracy.
In this thesis, we study the problem of 3D deformable surface reconstruction with such RGBD cameras. Specifically, we base our implementation on Microsoft’s Kinect. Our method can handle global and significant deformations. We deliver our novel method as an easy tool for learning deformations, material-invariant tracking and, naturally, a generic algorithm for 3D deformation recovery.
The contribution of this thesis is three-fold. We start by proposing a new but straightforward algorithm for automatically segmenting a surface of interest from RGB-D data, which we use to initialize our tracker. Next, we take an existing surface flow framework called range flow, then improve and adapt it for our case of 3D deformation capture. This step is essentially a surface-flow tracker. Finally, to make this tracker more robust against noise, we propose a mass-spring-model-based post filter. The post-processing step acts as a model-based constraint which attracts the individual vertices together to provide inextensible tracking. Our post filter is chosen to be a cloth model, which is very well studied in the realm of computer graphics. Last but not least, we thoroughly discuss the results and how the system behaves. The algorithm performs in soft real-time when implemented on a CPU. We also explain the parallelization aspects while paving the way for a real-time implementation on the GPU. Overall, we present a fundamental system for 3D tracking of deformable surfaces. As well as being extendible, we show that there is also room for various improvements and advancements.

My Master Thesis →
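To illustrate the mass-spring post filter mentioned in the abstract, here is a minimal explicit integration step in Python; the function and parameter names are mine, and the thesis uses a full cloth model with further constraints.

import numpy as np

def mass_spring_step(verts, vels, edges, rest_len, dt=0.016, k=50.0, damping=0.98):
    """One explicit relaxation step over a tracked vertex grid (unit masses).

    verts: Nx3 tracked positions, vels: Nx3 velocities, edges: Ex2 vertex index
    pairs, rest_len: E rest lengths. Springs pull their endpoints towards the
    rest length, acting as an inextensibility prior on the tracked surface.
    """
    forces = np.zeros_like(verts)
    d = verts[edges[:, 1]] - verts[edges[:, 0]]
    length = np.linalg.norm(d, axis=1, keepdims=True) + 1e-12
    f = k * (length - rest_len[:, None]) * (d / length)   # Hooke's law per edge
    np.add.at(forces, edges[:, 0],  f)
    np.add.at(forces, edges[:, 1], -f)
    vels = damping * (vels + dt * forces)
    return verts + dt * vels, vels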

Constant Time O(1) Bilateral Filtering

Dr. Fatih Porikli's work on bilateral filtering presented three novel methods that enable bilateral filtering in constant time O(1) without sampling. Constant time means that the computation time of the filtering remains the same even if the filter size becomes very large. The first method takes advantage of integral histograms to avoid the redundant operations for bilateral filters with box spatial and arbitrary range kernels. For bilateral filters constructed from polynomial range and arbitrary spatial filters, the second method provides a direct formulation by using linear filters of image powers without any approximations. Lastly, it is shown that Gaussian range and arbitrary spatial bilateral filters can be expressed via a Taylor series as linear filter decompositions without any noticeable degradation of the filter response. All these methods drastically decrease the computational time by cutting it down to a constant (e.g. 0.06 seconds per 1 MB image) while achieving very high PSNRs of over 45 dB. In addition to the computational advantages, these methods are straightforward to implement. At MERL, I implemented this work on the GPU using CUDA; compared to a reasonably optimized OpenMP implementation, there was a 25-fold improvement. The details of the implemented algorithm are presented in this paper:

Constant Time O(1) Bilateral Filtering
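The Taylor-series idea can be prototyped in a few lines: expanding the Gaussian range kernel turns the bilateral filter into a ratio of spatially filtered image powers, each obtained with an ordinary linear Gaussian filter. The Python sketch below follows that expansion for intensities normalized to [0, 1]; it is an illustrative approximation with assumed parameter names, not the paper's exact formulations.

import numpy as np
from scipy.ndimage import gaussian_filter

def bilateral_taylor(img, sigma_s=3.0, sigma_r=0.1, order=4):
    """Approximate Gaussian bilateral filter via a truncated Taylor expansion.

    Each term is a spatial Gaussian filter of a per-pixel weighted image power,
    so the range kernel never has to be evaluated per pixel pair. Expects img
    roughly in [0, 1]; accuracy improves with larger `order` or `sigma_r`.
    """
    I = img.astype(np.float64)
    w = np.exp(-I**2 / (2.0 * sigma_r**2))     # per-pixel range weight factor
    num = np.zeros_like(I)
    den = np.zeros_like(I)
    coeff = np.ones_like(I)                    # I^k / (sigma_r^(2k) * k!)
    for k in range(order + 1):
        num += coeff * gaussian_filter(w * I**(k + 1), sigma_s)
        den += coeff * gaussian_filter(w * I**k, sigma_s)
        coeff = coeff * I / (sigma_r**2 * (k + 1))
    return num / np.maximum(den, 1e-12)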

Real-time Random Walks Image Segmentation

Quoting Leo Grady, "A novel method is proposed for performing multi-label, interactive image segmentation. Given a small number of pixels with user-defined (or pre-defined) labels, one can analytically and quickly determine the probability that a random walker starting at each unlabeled pixel will first reach one of the pre-labeled pixels. By assigning each pixel to the label for which the greatest probability is calculated, a high-quality image segmentation may be obtained. Theoretical properties of this algorithm are developed along with the corresponding connections to discrete potential theory and electrical circuits. This algorithm is formulated in discrete space (i.e., on a graph) using combinatorial analogues of standard operators and principles from continuous potential theory, allowing it to be applied in arbitrary dimension on arbitrary graphs."

Thanks to GPGPU programming and optimized C code, I managed to implement and run the Random Walks algorithm in real-time. The video below demonstrates the initial results of the implementation.

For more information, please check the Random Walks paper of Leo Grady.
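The random-walker probabilities come from a sparse linear system in the graph Laplacian (Grady, 2006). A compact two-label Python/SciPy sketch on a 4-connected pixel grid is given below; function and parameter names are mine, and the real-time version described here was optimized C/CUDA rather than this prototype.

import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import spsolve

def random_walker_two_labels(img, fg_seeds, bg_seeds, beta=90.0):
    """Two-label random-walker segmentation (img roughly in [0, 1]).

    fg_seeds/bg_seeds are flat pixel indices of user seeds. Solves
    L_u x = -B m for the unseeded nodes and thresholds the probabilities.
    """
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)

    # 4-connected edges with Gaussian intensity weights
    edges = np.vstack([
        np.column_stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()]),
        np.column_stack([idx[:-1, :].ravel(), idx[1:, :].ravel()]),
    ])
    diff = img.ravel()[edges[:, 0]] - img.ravel()[edges[:, 1]]
    wts = np.exp(-beta * diff**2) + 1e-6

    # graph Laplacian L = D - W
    n = h * w
    i = np.concatenate([edges[:, 0], edges[:, 1]])
    j = np.concatenate([edges[:, 1], edges[:, 0]])
    W = coo_matrix((np.concatenate([wts, wts]), (i, j)), shape=(n, n)).tocsr()
    deg = np.asarray(W.sum(axis=1)).ravel()
    L = coo_matrix((deg, (np.arange(n), np.arange(n))), shape=(n, n)).tocsr() - W

    seeds = np.concatenate([fg_seeds, bg_seeds])
    m = np.concatenate([np.ones(len(fg_seeds)), np.zeros(len(bg_seeds))])
    free = np.setdiff1d(np.arange(n), seeds)

    Lu = L[free][:, free]
    B = L[free][:, seeds]
    x = spsolve(Lu.tocsc(), -(B @ m))      # prob. of reaching a fg seed first

    prob = np.empty(n)
    prob[seeds] = m
    prob[free] = x
    return prob.reshape(h, w) > 0.5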

Chroma Keying Algorithm

Project FrozenTime required a significantly robust and fast green-box chroma keying algorithm, more advanced than existing approaches. Built on the relation between the inverse covariance and Khachiyan's ellipsoid, the algorithm turned out to be very feasible.
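A stripped-down version of the inverse-covariance idea can be sketched in a few lines of Python: fit a mean and covariance to sampled green-screen pixels and key out everything within a Mahalanobis radius. Names and the threshold are assumptions; the production keyer additionally relied on Khachiyan's minimum-volume enclosing ellipsoid and produced soft alpha edges.

import numpy as np

def chroma_key_mask(frame, key_samples, threshold=3.0):
    """Foreground mask by Mahalanobis distance to the sampled key colour.

    frame: HxWx3 image, key_samples: Nx3 pixels sampled from the green screen.
    Pixels within `threshold` 'standard deviations' of the key colour are
    treated as background; the rest are foreground (True in the mask).
    """
    samples = key_samples.reshape(-1, 3).astype(np.float64)
    mu = samples.mean(axis=0)
    cov = np.cov(samples.T) + 1e-6 * np.eye(3)     # regularize for stability
    inv_cov = np.linalg.inv(cov)

    d = frame.reshape(-1, 3).astype(np.float64) - mu
    mahal_sq = np.einsum('ij,jk,ik->i', d, inv_cov, d)
    return (mahal_sq > threshold**2).reshape(frame.shape[:2])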

Interactive Projection Floors

Gravi SurfACT opens up an entirely new way to attract visitors’ and customers’ attention at organizations and events such as concerts, exhibitions and fairs. Gravi Interactive Floors uses the projection area on the floor as a display and the users’ body movements for interaction. Without the need for any remote control or external device, and with its high playability, Gravi SurfACT succeeds in being the center of interest in any place it is installed.

The customizable infrastructure originating from Gravi’s unique technology accompanies the sensitive game controls and realistic graphics. With score-based games such as Balloon Shoot, Penalty, Exploding Bricks, Air Hockey, Football and Billiards, your visitors can enjoy the joyful atmosphere you have created for them. The difficulty levels are adjustable. With unlimited interaction capability and visual effects, you can attract all the attention, especially when the flow of visitors and transitions are high. Up to now, we have developed 14 interactive effects, and we can endlessly customize them to fully cover your marketing, publicity and promotion needs.

Istanbul-o-matik, an interactive projection mapping installation

With me as the CEO, the Gravi team engaged in this interactive real-time mapping project, showcased at the first Istanbul Design Biennial at the Istanbul Modern Museum. The idea was to create an abstract view of Istanbul emphasizing the history, culture and future of the city, as well as its current structural problems. We also wanted the user to create her own experience through some interaction. The design team worked for over two months, photographing the city and animating the images using a motion graphics approach. The design team's output was 100,000 texture fragments. Rendering these randomly accessible textures (randomly accessible because of the interaction) and composing a scene through blending and projection in real-time proved to be a challenge. The biggest task was handling I/O operations between disk and memory and between CPU and GPU. Our team harnessed the power of CPUs and GPUs jointly to achieve real-time rendering. The generated content was projected onto a 4.5x6 m 3D maquette using high-quality projectors. Yet this brings up the task of precise scene-projector calibration, which was difficult at such an immersive scale. Our proprietary scene-camera-projector calibration algorithms and interfaces, which I was at the core of developing, enabled us to solve this problem effectively, down to every bit of every pixel on the screen. In the end, the visualization was controlled by nine different interactions to create a specialized view combining the inputs of multiple participants in the room. The installation was very well received by critics and featured in national media. Please check the website for further info.
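One ingredient of such scene-projector calibration can be sketched with OpenCV: estimate a planar homography between points observed on one facade of the maquette and the corresponding projector pixels, then warp the rendered content through it. This is a hypothetical single-plane illustration, not Gravi's proprietary multi-surface pipeline.

import numpy as np
import cv2

def scene_to_projector_homography(scene_pts, proj_pts):
    """Planar mapping from scene coordinates to projector pixels (RANSAC).

    scene_pts: Nx2 points detected by the camera on one planar facade,
    proj_pts: Nx2 corresponding points in projector pixel space.
    """
    H, inlier_mask = cv2.findHomography(scene_pts.astype(np.float32),
                                        proj_pts.astype(np.float32),
                                        method=cv2.RANSAC,
                                        ransacReprojThreshold=2.0)
    return H, inlier_mask

# Content rendered in scene coordinates is then warped into projector space:
# warped = cv2.warpPerspective(content, H, (projector_width, projector_height))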

FrozenTime

FrozenTime is a novel, repeatable, compact system and architecture for capturing on-the-fly bullet-time (Matrix-like) videos. The system involves more than 50 cameras to capture a flawless sequence. The software provides:

- Synchronous video capture

- On-the-fly chroma keying

- State of the art video stabilization

- Video output with low disk footprint

For more information on how FrozenTime footage is shot, you might want to check this Wikipedia page. This project was used in the Coca-Cola advertising tent and was one of our most interesting works. The project page, for the moment, is only in Turkish.

Publications

  • Towards A Complete Framework For Deformable Surface Recovery Using RGBD Cameras

    IROS'12 Workshop on Color-Depth Fusion in Robotics

    Tolga Birdal, Diana Mateus, Slobodan Ilic

    In this paper, we study the problem of 3D deformable surface tracking with RGBD cameras, specifically Microsoft's Kinect. In order to achieve this we introduce a fully automated framework that includes several components: automatic initialization based on segmentation of the object of interest, robust range flow that guides deformations of the object of interest, and finally representation of the results using a mass-spring model. The key contribution is an extension of the range flow work of Spies and Jahne [1], which combines the Lucas-Kanade [2] and Horn-Schunck [3] approaches for RGB-D data, makes it converge faster and incorporates color information with a multichannel formulation. We also introduce a pipeline for generating synthetic data and perform error analysis and comparison to the original range flow approach. The results show that our method is accurate and precise enough to track significant deformations smoothly at near real-time performance.

    Article in PDF

  • A Novel Method For Image Vectorization

    arXiv:1403.0728

    Tolga Birdal, Emrah Bala

    Vectorization of images is a key concern uniting the computer graphics and computer vision communities. In this paper we present a novel idea for efficient, customizable vectorization of raster images, based on Catmull-Rom spline fitting. The algorithm maintains a good balance between photo-realism and photo abstraction, and hence is applicable to applications with artistic concerns or applications where less information loss is crucial. The resulting algorithm is fast, parallelizable and can satisfy general soft real-time requirements. Moreover, the smoothness of the vectorized images aesthetically outperforms the outputs of many polygon-based methods.

    Keywords: vectorization, vector conversion, image processing, segmentation

    Article in PDF

  • Flow Enhancing Line Integral Convolution Filter

    ICIP 2010

    Tolga Birdal, Emrah Bala

    Visualization of vector fields is an operation used in many fields such as science, art and image processing. Lately, line integral convolution (LIC) technique [1], which is based on locally filtering an input image along a curved stream line in a vector field, has become very popular in this area because of its local and robust characteristics. For smoothing and texture generation, used vector field deeply affects the output of LIC method. We propose a new vector field based on flow fields to use with LIC. This new hybrid technique is called flow enhancing line integral convolution filtering (FELIC) and it is highly capable of smoothing an image and generating high fidelity textures.

    Article in PDF

  • A Factorization Based Recommender System for Online Services (Çevrimiçi Servisler için Ayrısım Tabanlı Tavsiye Sistemi)

    SIU 2013, Alper Atalay Best Paper Award (Ranked 3rd)

    Umut Simsekli, Tolga Birdal, Emre Koc, A. Taylan Cemgil

    Along with the growth of the Internet, automatic recommender systems have become popular. Due to being intuitive and useful, factorization based models, including the Nonnegative Matrix Factorization (NMF) model, are one of the most common approaches for building recommender systems. In this study, we focus on how a recommender system can be built for online services and how the parameters of an NMF model should be selected in a recommender system setting. We first present a general system architecture in which any kind of factorization model can be used. Then, in order to see how accurate the NMF model fits the data, we randomly erase some parts of a real data set that is gathered from an online food ordering service, and we reconstruct the erased parts by using the NMF model. We report the mean squared errors for different parameter settings and different divergences.

    Article in PDF (Turkish)

  • Real-time automated road, lane and car detection for autonomous driving

    DSP in Cars 2007

    Tolga Birdal, Aytul Ercil

    In this paper, we discuss a vision-based system for autonomous guidance of vehicles. An autonomous intelligent vehicle has to perform a number of functionalities. Segmentation of the road, determining the boundaries to drive in, and recognizing the vehicles and obstacles around are the main tasks for vision-guided vehicle navigation. In this article we propose a set of algorithms which lead to the solution of road and vehicle segmentation using data from a color camera. The algorithms described here combine gray-value difference and texture analysis techniques to segment the road from the image; several geometric transformations and contour processing algorithms are used to segment lanes; and moving cars are extracted with the help of background modeling and estimation. The techniques developed have been tested on real road images and the results are presented.

    Article in PDF

Patents

  • METHOD AND SYSTEM FOR GENERATING ONLINE CARTOON OUTPUTS

    United States 20090219298 - 2009

    Tolga Birdal, Mehmet Ozkanoglu, Abdi Tekin Tatar

    A method and system for generating user-accessible effects. The method includes receiving a library of operators, each operator including a set of operations performable on an image. The method includes receiving an effect definition from a designer via a graphical user interface, wherein the effect definition includes a set of operators from the library to be executed on a user-provided image and parameters associated with each operator. The method includes saving the effect definition to an accessible memory. The method includes uploading the effect definition to a server wherein the effect definition is accessible to a user over a network.

    Visit Patent Webpage

  • METHOD AND SYSTEM FOR PROVIDING AN IMAGE EFFECTS INTERFACE

    United States Patent 20100223565 - 2010

    Tolga Birdal, Emrah Bala, Emre Koc, Mehmet Ozkanoglu, Abdi Tekin Tatar

    A method and system for generating user-accessible effects. The method includes receiving a library of operators, each operator including a set of operations performable on an image. The method includes receiving an effect definition from a designer via a graphical user interface, wherein the effect definition includes a set of operators from the library to be executed on a user-provided image and parameters associated with each operator. The method includes saving the effect definition to an accessible memory. The method includes uploading the effect definition to a server wherein the effect definition is accessible to a user over a network.

    Visit Patent Webpage

Thesis

  • 3D Deformable Surface Recovery Using RGBD Cameras

    Master Thesis At Technical University of Munich, 2011

    Tolga Birdal

    Deformable surfaces are ubiquitous in the real world and are thus of great interest to computer vision researchers. They exist in various forms such as packets, flags, clothing, organs and bodies. For this reason, their application areas are extensive, ranging from sports to entertainment and from medical imaging to machine vision. While research in the area is quite new, many advanced methods are already being developed. Most of these methods rely on stereo computations or try to solve the under-constrained problem of recovering deformations from monocular scenes. Recently, there has been an increasing number of depth (RGBD) cameras available at commodity prices. These cameras can usually capture both color and depth images in real-time, with limited resolution and accuracy.
    In this thesis, we study the problem of 3D deformable surface reconstruction with such RGBD cameras. Specifically, we base our implementation on Microsoft’s Kinect. Our method can handle global and significant deformations. We deliver our novel method as an easy tool for learning deformations, material-invariant tracking and, naturally, a generic algorithm for 3D deformation recovery.
    The contribution of this thesis is three-fold. We start by proposing a new but straightforward algorithm for automatically segmenting a surface of interest from RGB-D data, which we use to initialize our tracker. Next, we take an existing surface flow framework called range flow, then improve and adapt it for our case of 3D deformation capture. This step is essentially a surface-flow tracker. Finally, to make this tracker more robust against noise, we propose a mass-spring-model-based post filter. The post-processing step acts as a model-based constraint which attracts the individual vertices together to provide inextensible tracking. Our post filter is chosen to be a cloth model, which is very well studied in the realm of computer graphics. Last but not least, we thoroughly discuss the results and how the system behaves. The algorithm performs in soft real-time when implemented on a CPU. We also explain the parallelization aspects while paving the way for a real-time implementation on the GPU. Overall, we present a fundamental system for 3D tracking of deformable surfaces. As well as being extendible, we show that there is also room for various improvements and advancements.

    My Master Thesis