Diptanu Sarkar
hello [at] diptanu [dot] com

I am a computer science graduate student at the Rochester Institute of Technology. I am currently working as a software engineering intern on Wayfair's Infrastructure team.

Previously, I worked as a graduate research assistant at the Center on Access Technology under Dr. Micheal Stinson. Before that, I earned my bachelor's degree in electronics from NIT Agartala, India, in 2015. I also worked with the Infosys Engineering team in Bengaluru, India.


I am passionate about natural language processing and large-scale distributed systems. Specifically, I am interested in information retrieval and automatic speech recognition.

Automatic Speech Recognition to provide better accessibility for Deaf or Hard of Hearing
Micheal Stinson, Lisa B. Elliot, Donna Easton, Diptanu Sarkar, Prionti Nasir

A model based on word importance in utterances using machine learning for Automatic Speech Recognition (ASR) systems to provide better phone captioning and accessibility to the deaf or hard of hearing (DHH) community.

Automatic Language Identification in Text  [Live Demo]
Technology Stack: Python, NumPy, SciPy, Scikit-learn, Flask

Developed an automatic language identification model using bigram, Naive Bayes, and artificial neural network approaches to detect ten different natural languages. The model is trained on the WiLI-2018 benchmark dataset, and the highest accuracy achieved on the test dataset is 99.7% with paragraph text.
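
As a rough illustration of the bigram plus Naive Bayes idea, here is a minimal standard-library sketch. It is not the project's code: the `BigramNaiveBayes` class, the add-one smoothing, and the four toy training sentences are all assumptions made for the example, and the real model was trained on WiLI-2018 rather than on hand-written sentences.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Return the overlapping character bigrams of a lowercased string."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BigramNaiveBayes:
    """Hypothetical multinomial Naive Bayes over character bigrams."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> bigram counts
        self.doc_counts = Counter()         # label -> number of training texts
        self.vocab = set()                  # all bigrams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_bigrams(text)
            self.counts[label].update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, gram_counts in self.counts.items():
            # Log prior plus add-one smoothed log likelihood of each bigram.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(gram_counts.values()) + vocab_size
            for gram in char_bigrams(text):
                score += math.log((gram_counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (assumed for illustration only).
texts = [
    "the quick brown fox jumps over the lazy dog",
    "this is a short english sentence about the weather",
    "el rápido zorro marrón salta sobre el perro perezoso",
    "esta es una frase corta en español sobre el tiempo",
]
labels = ["en", "en", "es", "es"]

model = BigramNaiveBayes().fit(texts, labels)
english_guess = model.predict("the weather is nice this morning")
spanish_guess = model.predict("el tiempo es bueno esta mañana")
```

Character bigrams are a common choice for this task because they capture spelling patterns without needing a tokenizer for each language.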

Bot to play Doom
Technology Stack: Python, PyTorch, Gym, OpenCV

Designed an intelligent agent to play the game Doom in a 3D environment using Deep Convolutional Q-Learning.

E-commerce Webservice
Technology Stack: Java, JSP, Spring Boot, JDBC, MySQL

Developed an online bookselling web service with user, review, and order management capabilities, plus features such as advanced search, book detail views, customer 2-D authentication, a shopping cart, and checkout.

Part-of-Speech (POS) Tagger for the English Language
Technology Stack: Python, NLTK, Bag-of-Words, Hidden Markov Model, Bayes Net, Naive Bayes

Implemented a part-of-speech tagger for the English language using a Hidden Markov Model, a Bayesian network, and Naive Bayes, then compared the performance of the Forward-Backward Algorithm and the Viterbi Algorithm. The model achieved over 91.2% word accuracy and 63.6% sentence accuracy.
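
For readers unfamiliar with Viterbi decoding, the sketch below shows how it could pick the most likely tag sequence under a Hidden Markov Model. This is an illustration only, not the project's implementation: the three-tag tag set, every probability in the toy tables, and the `floor` value used for unseen events are assumptions.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely tag sequence for `words` under a toy HMM."""
    if not words:
        return []

    def lp(p):
        # Log-probability with a small floor so unseen events are not -inf.
        return math.log(p if p > 0 else floor)

    # best[i][t]: best log-probability of any tag path ending in tag t at word i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical model parameters, chosen by hand for the example.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.05, "NOUN": 0.15, "VERB": 0.8},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "cat": 0.4, "barks": 0.1},
    "VERB": {"barks": 0.7, "runs": 0.25, "dog": 0.05},
}

tagged = viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p)
```

Viterbi returns the single best sequence as a whole, whereas Forward-Backward yields per-word tag posteriors, which is one reason word accuracy and sentence accuracy can differ between the two.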

Image Classification using Deep Neural Networks
Technology Stack: Python, PyTorch, NumPy, OpenCV

Built deep learning architectures for image classification (AlexNet, VGG16, and ResNet) using transfer learning and fine-tuning in PyTorch. Final model accuracies on 10K test images: AlexNet 81.2%, VGG16 85.6%, ResNet 84.7%.

Data Structures & Algorithms: Asymptotic Analysis & Notations, The Startup

This article explains the importance of asymptotic analysis and then introduces asymptotic notations. Worst-, average-, and best-case time complexity analyses are also briefly discussed.
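
To make the best-case versus worst-case distinction concrete, here is a small sketch that counts comparisons in a linear search. The example is not taken from the article; the `linear_search` helper and the 100-element input are assumptions chosen for illustration.

```python
def linear_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for i, item in enumerate(items):
        comparisons += 1
        if item == target:
            return i, comparisons
    return -1, comparisons


data = list(range(1, 101))        # 100 sorted integers, 1 through 100
best = linear_search(data, 1)     # best case: target is the first element
worst = linear_search(data, 101)  # worst case: target is absent
```

The best case needs a single comparison regardless of input size, while the worst case needs one comparison per element, so it grows linearly with the length of the list.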

Automatic Language Identification in Short Utterances

Language Identification in Natural Language Processing is the process of identifying the spoken language in speech utterances. This blog examines three different models to recognize languages automatically - Dynamic Hidden Markov Networks model, Deep Neural Network model, and Long Short-Term Memory Recurrent Neural Network model.

Detecting Emotions in Lyrics

Music stimulates strong human emotions and feelings. Music platforms provide highly customized playlists to every user, along with playlists based on moods. Emotions are subjective, which makes emotion detection a very challenging task when applied to music. Previously, music emotion detection relied solely on acoustic features. Recent studies have observed that using lyric features along with acoustic features significantly improves classification results.

Appreciate the aesthetics? Credit: Jon Barron.
Last updated: 15 Feb 2020