Media Tech Start Ups - Made In India, For The World

Today's episode features Frammer and Neural Garage

2 May 2025 5:00 PM IST

In Episode 8 of The Media Room, media expert and author Vanita Kohli-Khandekar speaks to Frammer's co-founder and CEO, Suparna Singh. Frammer is an AI-based video tool that ingests hours of video, then edits and packages it into bite-sized videos and snippets for social media. She also talks to Mandar Natekar, co-founder and CEO of Neural Garage, which offers a visual dub technology that can lip-sync voice dubs into other languages. Tune in for insights into the work of some of the most cutting-edge media tech startups in India.

NOTE: This transcript is done by a machine. Human eyes have gone through the script but there might still be errors in some of the text, so please refer to the audio in case you need to clarify any part. If you want to get in touch regarding any feedback, you can drop us a message on [email protected].

TRANSCRIPT

Vanita Kohli-Khandekar (Host): In the first episode of my two-part series on media tech firms from India, we spoke to Baskar Subramaniam, co-founder and CEO of Amagi, an established media tech firm founded in India and working with some of the largest media companies in the world. In this second and concluding episode, I will be speaking to two firms that I found interesting. The first is Frammer AI.

It's been set up by Suparna Singh, Arijit Chatterjee, and Kavaljit Singh. Frammer offers an AI-based video tool that ingests hours of video, edits it, and creates everything from social media posts to teasers within minutes. Singh used to head NDTV for many years and was responsible for its digital push long before other news publishers had pivoted there.

In fact, NDTV founders Dr. Prannoy Roy and Radhika Roy were among those who provided seed capital for Frammer AI. Over to Suparna.

Vanita Kohli-Khandekar: Hi, Suparna.

Welcome to the media room. Wonderful to have you here. I mean, I'm seeing you in a non-NDTV avatar now.

You were the CEO of NDTV the last time I interviewed you. Suparna, tell us about Frammer and what it does precisely for my listeners, and then I'll ask you more questions based on that.

Suparna Singh: Sure. So, as a very quick introduction, Vanita, first of all, thank you for having me. It's a pleasure to be talking with you, talking shop with you.

Few people in the industry know as much about this space as you, so it's a special privilege. What Frammer does is it takes any video and makes it completely digital ready, right? By digital ready, I mean it ships the video fully ready to be published on social media platforms.

Now, in this attempt or in this service, what it does is it starts out by generating a transcript. It then creates auto captions or subtitles for the video. It then identifies the most engaging content from that video, what we call key moments.

It packages each of these key moments specially for different social media platforms. As you know, what Twitter requires is different from what YouTube needs for video, and so on. So, it does all that hard work.

And the last step is that it converts those key moments from landscape into portrait mode. So, what it's actually doing is starting out with a long piece of video, could be 30 minutes, could be 10 minutes long, and synthesising it all the way down to maybe like a one-minute YouTube short, if that is what the publisher requires. So again, Frammer's job is to take any content and make it completely digital ready, make it highly discoverable and therefore monetizable.

Vanita Kohli-Khandekar: Super, that's really very clear. The last time we talked about Frammer, we discussed the cost and time implications, and also potential users, because you mentioned entertainment use cases. I think you started with the India Today Group and Zee, if I'm not mistaken.

So, could you take us through some clients and what they're doing? And I also, sorry, I'm asking a lot of questions together. One of the things you told me was that you're not going to be targeting influencers.

You're going to target largely the entertainment and news businesses. Just take us through some of the thinking there.

Suparna Singh: Of course, let's start with the last part of your question, Vanita. As far as influencers go or B2C models go, there are actually a lot of tools that are out there which are designed for prosumers to do similar things. We don't want to get into that space because, as I said, it's already quite crowded.

And also, I feel it's highly commoditised. The only way to really find a footing there is to offer a cash discount, which is not a sustainable or the healthiest business model. Also, very importantly, my domain expertise, the founding team's expertise, is steeped in premium content, you know, in publisher-quality content.

And so, we wanted to design something that would solve a big problem for all publishers. We knew this from my 30 years at NDTV, the last 10 of which were highly focused on maxing out traffic and revenue on digital. So, I don't know if you recall, but NDTV, much ahead of the others, as early as 2009, had said that it would put its digital piece at the heart of its operations.

At that time, people thought we were bonkers for doing that because they felt that, you know, broadcast was so much more important. So, anyway, with that focus on digital, what we found early on, and what publishers apart from NDTV came to at a later stage in the revolution, is that all the eyeballs have moved from the large screen to the small screen. But content is still being produced for the large screen.

By that, I mean 16 by 9 ratio, right? And so, it takes so much time, so many people to, you know, take one video and carve it into small, high-quality clips. Now, you can kind of do a cut-and-paste job, but then what happens is that the output is not high quality.

So, either algos treat it as spam, label it as spam, so actually that hurts the publisher instead of helping them, right? Or if publishers don't go down that route, then what they're doing is limiting the amount of short video that they create. So, our aim was to fill that gap for publishers.

Again, we are a B2B SaaS offering and the reason for that is because we are trained and our platform is trained to deal with highly complex content. So, as a simple example, to, you know, process a speech or let's say an interview, which is about political news or hard news or even a reality show, right? The content can be complex, there are nuances.

As you can imagine, there would be very big repercussions if editorial integrity is lost, or if accuracy or context is lost. So, our platform, Frammer, is designed to handle all those nuances, sensitivities, and complexities. Sometimes with clients, what we do is we show them their content processed through Frammer versus a prosumer tool, the kind we spoke about just a short while ago.

The result is shockingly different, right? So, basically, our platform, again, designed for complex content, prosumer tools designed for much simpler content. Also, another important point of differentiation, Vanita, is that we are a complete end-to-end platform.

What is available in the prosumer space are tools which solve one or two problems. Like I said, we do everything and I had listed those steps, including generating transcripts, auto captioning, all that. So, we don't want our user to ever have to leave our platform.

They stay in our ecosystem, they don't need to toggle between different softwares, whether it is to generate a transcript, whether it is to identify key moments, or whether indeed it is to create thumbnails or headlines.

Vanita Kohli-Khandekar: You know, when you say you want the user to stay on your platform, could you just take me through the journey a publisher has, Suparna? If there's a live client that you can talk about, that would be nice. And also, give us some sense of where your clients are spread.

Because last time we spoke, you were also doing a lot of developmental work in the US markets. So, I don't know where you are right now in your client portfolio and is it largely entertainment and news or is it largely news only?

Suparna Singh: Sure. So, I'll start with the second part. As far as client use cases go, we started out, if you recall, by thinking this would be a good use case for media publishers, right?

That has expanded now to basically any corporate brand that wants short video as a marketing asset. I'll reiterate that because short video now is an essential way for any brand to communicate with its audience. In particular, Gen Z, Millennials, all of them, right?

The attention economy is a very real thing. So, our use case started out with, like you mentioned, the India Today group. We're also working with ABP and I'll explain how.

But interestingly, we've gone beyond that into entertainment. So, for example, we're working with a couple of Spanish networks on identifying the highlights of their telenovelas, right? Also working with their reality shows.

In India, we're working with Peak XV, which is, as you know, a venture fund; they use us to create all their highlights and content for YouTube, for example. So, the use case is significant. One of our biggest international clients is Brightcove.

You know of them. They're a US-listed company. They are one of the biggest online video platforms in the world.

We are their official AI partner. They signed us up in September and they have thousands of clients across the world, again, ranging from corporates who may use us to summarise their internal Zooms or their webinars, which they want to put out for public consumption, to, of course, media publishers, to sports channels, for example. So, those are the kind of clients we're dealing with.

Let's look at a use case. ABP, for example. They are generating hour upon hour of content, right?

Live programming. Frammer solves two use cases for them. One is that a part of our platform, what we call Sherpa, handles all their live programming.

So, basically, we work with their live stream and, on the fly, short clips are created for them in real time as the anchor or the bulletin is running, and those clips can then be published straight away to social media. To give you an example, it could take as little as two minutes to publish a clip from a live stream. The other part of our platform deals with edited video.

So, when, for example, a debate show in the evening is recorded, they toss it into Frammer, and then Frammer will get that entire debate show ready to be published on YouTube or on their site and app. And in that attempt, it will not just create the headlines and the thumbnails and all the stuff that I talked about, but it will also identify, again, the highlights. Each of those is then published on Facebook, on YouTube or YouTube Shorts, or Insta Reels.

So, that is kind of the use case that we are working with right now.

Vanita Kohli-Khandekar: Human intervention, any issues there? Because, you know, news especially is a sensitive space. So, I remember you mentioned some level of human intervention.

Suparna Singh: So, the level of human intervention is we need just one user to operate the platform. Let me just explain how it works. You basically just upload a video, you being the user, and the user checks what features they want.

For example, do they want key moments, or key moments and vertical video, right? They tick those boxes, then they sit back and go on to do other things, not having to worry about this. When the content is ready, Frammer alerts them via WhatsApp or email, whatever the client wants, and then the user quickly reviews the content.

If there are any issues that they want corrected, there are simple edit tools provided for every stage, and then they hit publish. Like you said, you know, because of my experience in news and because news is sensitive content, we always emphasise, please keep one human in the loop. It just requires one, nobody with technical skills, right?

So, for example, the entire video can be edited or changed by a user just using the transcript, i.e. the written word. That means you don't need anybody with technical skills to operate the platform. And I think this is what has been particularly attractive for some of our clients because video editors, as you know, are a precious resource in companies and an expensive resource, right?

Now, somebody on the desk, a content creator, a producer can operate this by themselves and create the output. The reason I say that we insist on a human in the loop is, first of all, if AI were perfect, then human beings would be out of jobs in this space. That is certainly not our intent.

AI does make mistakes. Occasionally, it hallucinates, but guess what? So do human beings.

I see enough content, as do you, which is published with spelling mistakes or typos and things like that. And so I think AI being integrated into our work processes is now a way of life. The only thing that is required is to keep a close watch on what it is carving out for you.

Vanita Kohli-Khandekar: Some sense of the capital here, if it has been put in: where have you raised it from? Because I remember Lumikai and the Roys in the beginning. And what sort of time frame are we looking at?

Suparna Singh: So, our pre-seed round was $700,000 and that was raised in September 2023. And that was an angel investor. Our seed round came a year later.

That was Lumikai. They put in $2 million. That was at a post-money valuation of $15 million.

We are comfortable right now with our runway, but I think we will start looking to raise our next round in about three to four months. Right now, our attempt is to sign up a couple of big US clients who we are in very advanced talks with. So naturally, we would want to go to investors after we've got those signed up.

We are right now in advanced trials with them.

Vanita Kohli-Khandekar: But tell me, when you're talking about overseas clients, is there a Brightcove overlap here? Because you're Brightcove's AI partner, and they service a whole lot of the globe.

Suparna Singh: So actually, Brightcove just helps us with our go-to-market strategies. First of all, the reason Brightcove signed us up as their AI partner is that they didn't have an AI solution for creating short video and the demand was building on their front, right? And actually, Vanita, maybe two years ago, when we spoke to companies, quite a few networks were debating whether they should try and build something like this organically within their systems.

And then I think it took about six months for networks to realise that this is actually a very specialised space and that their CTOs and their teams don't have either the expertise or the time to devote to creating something like this. Where Frammer scores is that its integration is actually very, very easy. We have APIs for everything, should clients require them.

It takes us a maximum of about three days to integrate. We are not a services model. So basically, it's an unboxed product that goes out to the client.

And there's no overlap with Brightcove because Brightcove has about 3,000 clients. They pitch us to those clients. Some of those clients actually don't want it to be routed through Brightcove to them.

They would like to work individually with us because that way they get our front end. If they go with Brightcove, it comes through Brightcove's front end. So it can be a little bit confusing for the user.

It just depends from network to network on what their preferences are. Also, some of the clients who we are working with, especially in the US and in Mexico, are actually not Brightcove clients. So we have managed to talk directly to quite a few big networks.

And like I said, we're running our trials with them as we speak.

Vanita Kohli-Khandekar: Any challenges while doing this? Because video tools are a dime a dozen, and explaining how you're different seems, to my mind, at least from the outside, like a big challenge. I'm sure there are others; I'm just curious what the challenge has been.

And also, it's very clearly B2B. You do not intend to do any B2C.

Suparna Singh: Yes, we don't intend to do B2C. We are completely focused on B2B.

As far as our product roadmap goes, what we would like to do is consolidate the position we have here, here being the news and entertainment space. And we will certainly be getting into sports over the next year, year and a half. By that, I mean being live with sports within the year.

And that is because the sports market is becoming bigger and bigger. There are new sports streamers that are being set up. And actually, in sports, the requirement is at multiple levels.

You have the streamer, you have a league, you have individual teams, you have broadcast networks, and you have players who work with their marketing agencies. So there's a whole variety of potential use cases there. The sports market is terribly underserved.

There is really only one credible solution that is well-established in that space, and maybe one or two new ones coming up. So we see a big opening there. Secondly, a lot of the clients that we currently work with, for example, Spanish clients, those networks also own sports channels, particularly soccer.

So they're interested in dealing with one vendor for multiple solutions. So that gives us a little bit of an opening to work with straight away. As far as challenges go, when we set up, I think companies were terrified of AI.

They didn't know what it was. They didn't know if it would work. And if they did understand both those things, then they were pretty confident that they would like to build this in-house, as I said earlier.

A year and a half down the line, I think people's expectations and understanding have matured. People are now hungry for a tool like this. As you know, many newsrooms have already begun integrating AI quite extensively.

And by the way, the ones that haven't, their journalists are using ChatGPT anyway, for research and things like that, or Perplexity AI or whatever. So to pretend that AI is not in newsrooms now is publishers fooling themselves, if that is what they choose to believe.

Our idea has always been to give content creators at an enterprise level the lift that is required so that they're not stuck doing grunt work. They should not be doing work that can be done by AI, work that otherwise requires a lot of manual input for no reason. So far, there weren't tools that could do this.

Now there are. For example, in today's day and age, if somebody is manually transcribing content, or if they're sitting there and physically entering the captions that are required, then that is terribly inefficient. So the challenges at that time were, A, people were worried about AI.

Then it became, can we build something like this on our own? Now I think the challenge is basically keeping up with demand. Not that I'm trying to oversell Frammer by saying we have so much business; that would be a happy problem to have. I think the challenge really is in people's expectations, because they have gone from "AI is going to be rubbish" to "but why isn't AI perfect?"

The answer to which is always, well, if it were, then you, me, and everybody else would not be required in this space anymore. So it's a little bit of managing people's expectations, that yes, the user must always review the content. And the second thing is, Vanita, publishers sometimes, especially if they are large, do tend to expect some amount of customisation, right?

So that is a little bit of negotiation that always happens. And the happiest outcome in those situations is that anything that makes sense for our product roadmap, for the wider universe of our clients, we are happy to incorporate. Something that feels too specific to just one client or network, we put on hold.

So I think right now, at an inflexion point, this is one of the challenges that we are facing.

Vanita Kohli-Khandekar: But it's an off-the-shelf thing that you licence, I'm assuming, and it goes onto someone's screen?

Suparna Singh: Yeah. So basically we have a business model where we enter into an annual contract with a publisher. They sign up for a monthly commitment of a number of hours, right? We give them tier-based pricing, so the per-hour rate obviously drops as the tier gets bigger.

So, for example, 500 hours works out cheaper per hour than 300 hours. And we therefore have complete line of sight on what our revenue is going to be at any time. The publisher gets an alert if they are about to exceed their monthly limit, for example.

So the pricing is per hour, and again, we operate via annual licences. And as far as pricing goes, it entirely depends on the volume that the creator or the publisher is signing up for.

Vanita Kohli-Khandekar: Suparna, you know, it's been what, since 2023? So we are looking at almost two years now. Any pivoting, or any learning that has made you pivot, in these two years?

Anything which stuck out? You know, you must have started out with an initial blueprint. But typically, as happens with most businesses, there's the proverbial thing that a business plan doesn't survive its first brush with business.

So I'm just curious about if any pivoting or any learning happened because...

Suparna Singh: Sure. I think what we've found and learned, and this is a wonderful part of having supportive clients on board very early on, right, particularly the India Today group, which taught us so much from, you know, processing literally hours and hours of content a day, is that the number of pure editing features that we had to include is actually very, very high, higher than what we thought it would be.

It was a challenge to build them, and it did require allocating a certain amount of resources and bandwidth to those features sooner than we had anticipated. It has served us well because, like I said, our goal is to be an end-to-end solution where the user never needs to toggle over to another tab to fix anything. So I think that is one thing we had underexplored: how many pure editing features would be required.

And I think the world is also moving increasingly towards a web-based editing system rather than the kind of standalone editing systems that have conventionally flourished, for example, Avid or Adobe, right. People want a lighter, easier to use model that can actually be operated by somebody who does not have technical skills. I cannot emphasise enough how much that has served us well.

That's really been a big selling point.

Vanita Kohli-Khandekar: What about this vertical video thing? Because I would have thought that would also be a big selling point because the ability to do this quickly must be something.

Suparna Singh: The requirement, in fact, is always for short video. The difference that we see is Indian publishers want short video in landscape and then vertical as an add-on, right. But Indian publishers tend to want vertical video to be about three minutes, for monetisation, for other reasons, right.

Internationally, publishers care much more about short video, possibly because of TikTok, everything that has stemmed from TikTok. You'll recall, Vanita, that about four or five years ago, companies that were considered, you know, the most serious publishers, for example, BBC, Washington Post, etc., were not really keen on TikTok, weren't exploring it. Now, all of those companies seem to have dedicated teams that are working only for TikTok.

So, the difference is Indian publishers are, I would say, equally focused on landscape and vertical. Internationally, the focus is much more on vertical and internationally, companies definitely want content under 90 seconds. We find Indian publishers still want content, short video, to be about three minutes long.

Vanita Kohli-Khandekar: I'm curious why that is so. Is it because of CTV consumption? I'm just curious.

Suparna Singh: I think a lot of it stems from the challenges that Indian publishers have with advertising on very short content. But I think they recognise that the advantage is that you do this for branding, to connect with audiences, and to disseminate your content to a wider audience.

Because younger people, and in fact, most of us now, right, at the end of the day, literally at the end of the day, at night, are watching content on our phones, not on TV screens. And so, I think that Indian publishers are very smart in recognising that even if there's less revenue coming in than they would like from vertical content, that is absolutely where their audience is and that is where their brand connect has to come from. So, as an example, when we create vertical video, right, the amount of development that we've had to do in terms of overlay work, right, without getting too technical, the kind of fonts that should be available, whether those fonts should be animated.

And vertical, by the way, is not just 9:16, as you know; it's also 4:5, and Facebook is a different requirement again. So, Indian publishers are very aggressive about capturing that part of it for sure.

Vanita Kohli-Khandekar: I'm assuming that the social media giants do their video processing in-house. They would have their own tools to do this; they would not use a Frammer.

But your clients will be more in the curated entertainment, sports, and news space, the professionally generated content creators. Is it correct to say that the potential for your product lies more in those areas?

Suparna Singh: Absolutely. And, you know, so as an example, so yes, YouTube, you know, and the other companies may offer basic tools, right. But there are a couple of things and the reason I want to emphasise this is because we keep coming back to our B2B standing.

So, for example, our transcription accuracy, both within India and internationally, is ranked much higher than what, for example, YouTube generates as a free transcript. You know, when you log on to any YouTube video, you can click the transcript button and it shows you the transcript. Ours is actually refined.

It has been trained. We've done special training, for example, for the pronunciation of Indian proper nouns. So, our accuracy is much higher.

So, again, big tech does stuff at a broader level. We are actually servicing very deep, vertical-specific needs. Take entertainment.

Now, Hoichoi, you know, is a massive Bengali OTT, under-recognised in terms of its scale of content, right. I was shocked when I discovered the strength of their archives, for example. They are using us to create short video summaries under 90 seconds built around their characters, around storylines, etc.

And again, they use these not just to serve their audience, but also as a marketing asset. So, all the logos, graphics, jackets that they want, templates, right, all that is done by us, automated through AI. So, this is what I mean when I say that short video is not just about content consumption or brand connection.

It is essential now as a marketing asset. We fulfil all those requirements.

Vanita Kohli-Khandekar: With all the good luck to you, I just worry about the deluge. I mean, even us recording this podcast is adding to the deluge, as I see it. But that's a different conversation.

Suparna Singh: You mean in terms of the spamming of people with too much content?

Vanita Kohli-Khandekar: No, nobody's... you're not spamming. You choose to listen to a podcast. But you know, the fact is an immense amount of content is being created. Immense. Whether it's video, whether it's audio, whether it's published content.

So, as Prasoon Joshi said recently in an interview with me, we're getting indigestion because of the amount of content we are consuming.

Suparna Singh: So, Vanita, just on that, you know, I think there's a difference between older people and younger people. Younger audiences, I think, are very quick in terms of watching content. So, they'll do 60 seconds and then they're done, right?

Then they swipe to the next video. Again, and it's an excellent point you brought up. There are many reports and I'm sure you've seen them, right?

Where now, because of this deluge of content or because of the over-creation in some ways of content, how you capture the audience within the first 10 seconds of a video, for example, the hook that you generate, that is essential to getting people to watch you. So, that, for example, Frammer specialises in, right? We want to make sure that we grab the viewer straight away.

So, it's not just getting them to click on the video. Then you have to get their attention and then hold it so that the falloff rate is very low. So, all this comes down to, are there tools that can generate short video at scale?

Probably, right? Are there good B2B, high quality platforms that can do this? Much smaller ecosystem, right?

And then within that smaller ecosystem, who can do it with a high level of accuracy designed to generate engagement? Like I said, in the first 10 seconds, can you offer up a strong hook? Another example, when you're framing the video vertically, how accurate is the framing so that people's heads are not being chopped off, right?

And again, a point of differentiation between Indian publishers and foreign publishers: again, possibly because vertical video is so much more important internationally right now, publishers there will ensure that we get their feed without Astons, supers, or tickers, right? So that when it's vertically framed, it looks terrific. Indian publishers still don't provide a clean feed.

So, they're okay to live with Astons and tickers being cut off at the bottom, right? Again, slightly technical; all it means is that with a clean feed, the quality of the vertical video improves.

So, how publishers recognise this and then what they configure internally to be able to offer the best possible vertical video and quality to their audiences, that will determine whether one, they're rewarded by algos, two, whether they're treated as spam or as a high value proposition by audiences, particularly younger audiences.

Vanita Kohli-Khandekar: Fascinating. So, what you're telling me is, while it creates a deluge, it also helps the discoverability within the deluge for the companies which are competing within it.

Suparna Singh: Yeah, Vanita. So, not to, you know, labour the point, but for example, SEO rewards whoever puts out news fastest, right? So, it has to be a combination of who was out there fastest, whose metadata was the best, were the correct keywords used, was the content good, right?

So, all of this: if you're using something like Frammer to get your content out there quickly but with high quality, you will have a higher chance of being discovered. And again, ultimately, publishers don't want discoverability just for brand connect; it has to be monetised. So, the point of Frammer is to increase volume with quality, leading to engagement, leading to revenue.

Vanita Kohli-Khandekar: So, last question, though it's too early to ask you revenue numbers and all, I just wanted to get a sense of the composition of your clients. I mean, where is the bulk of your business coming from now? If I was to say top three markets, what would they be?

Suparna Singh: This is in terms of top line or number of clients?

Vanita Kohli-Khandekar: In terms of number of clients.

Suparna Singh: Top markets for us right now are the US, Mexico, and then India. As far as top line goes, I think that, you know, the next four months will actually determine that because, like I said, right now we're in the process of converting some international clients.

Spanish channels, as we know, have so much daily fresh content, right? It's much more than even Indian news channels. And the other thing, of course, with Spanish networks, for example, Vanita, is that their archives are massive.

There are telenovelas sitting in their archives that they want to offer up to new audiences, again by recreating them in vertical video. Now, it's fascinating that some Spanish networks are shooting and offering soap operas completely in vertical video from scratch. They don't shoot them in landscape at all.

And this is something we realised over the last four or five months. So we're trying to see how to integrate with content like that. So I think there's a combination of the attention economy that has led to the proliferation of vertical video and then what AI is enabling that vertical video to do.

I don't think that in the last 15 years, Vanita, we've seen a change like this in video production or in consumption. You tell me if I'm exaggerating the point.

Vanita Kohli-Khandekar: I am so blank on video editing, it's not funny. So I'm not even going to debate this point.

Suparna Singh: At least on editing, you know, it's more about how video is consumed now. Like I think it's a dramatic shift. And partly my point is that there is a change in audience habits, but it is also the fact that AI now allows publishers to serve those audience habits at a scale and at a pace that was not possible earlier because it required human beings.

Absolutely.

Vanita Kohli-Khandekar: It's the same in sound editing, or in filming itself, the amount of technology that has made things easier. You know, earlier, I don't know if you remember, it used to take 24 hours at times to record some of the most famous songs we've seen in Hindi cinema. Now it takes a few minutes.

Suparna Singh: But you look at auto dubbing, you know, for example, I think one of the prime minister's recent speeches was auto dubbed and released in multiple Indian languages. You do auto dubbing, you do lip syncing. So suddenly the same piece of content, as you know, whether it's a Hindi movie song or, you know, a Tamil movie, everything can be converted into multiple languages with perfect lip sync in a matter of days.

Vanita Kohli-Khandekar: Thank you so much for talking to us, and loads of good luck to you. I hope it does very well and you go global with a big bang.

Suparna Singh: Thank you so much, Vanita. I really appreciate you making the time for this.

Vanita Kohli-Khandekar: And thank you for all your insights, always.

Vanita Kohli-Khandekar (Host): News, sports and entertainment have different key moments and need different kinds of short videos and highlights. Even within sports, cricket, kabaddi or rugby all have different rules that the AI needs to be trained in. And therefore, I find Frammer fascinating for the amount of drudgery it takes out of video editing and its ability to connect to the touch points that take those videos to the market.

The second firm is Mandar Natekar, Subhashish Saha and Anjan Banerjee's Neural Garage. It offers a visual dub technology, a video-based generative AI technology that creates studio-quality lip sync. It sounds very techie, but essentially it means that when Shah Rukh Khan speaks in Marathi or Malayalam, it will seem like he's speaking in that language.

His facial movements and expressions and what he says will be in sync with what is coming out of his mouth. Over to Mandar Natekar, co-founder and CEO of Neural Garage to tell you more.

Vanita Kohli-Khandekar: Hi Mandar, thank you for joining us on the media room.

Mandar, I'm very curious, because the first time I heard about Neural Garage was, of all people, from Ashish Fairwani two years back. And he told me to watch out for this company. And I said, okay, I will watch out.

And then I had a chat with you last year as well about the visual dub tool that you have. I want you to first take my listeners through what exactly Neural Garage does. What is your big product? And also remember, Mandar, the last time we spoke, you didn't have enough big examples.

I want you to bring that to my listeners this time.

Mandar Natekar: Sure, sure. So Neural Garage is a deep tech startup focused on generative AI technology. We've built our proprietary model visual dub that is meant for the global entertainment media and content industry.

So for example, you've seen, especially in the last five years after COVID, content localisation has hit a large inflexion point across multiple streaming platforms and even theatrical releases, broadcast networks, and literally localisation has become a way to acquire new markets and audiences. But what happens is localisation happens through dubbing and dubbing by itself is broken because audio and visual don't match. So for example, imagine you're watching a Korean film, but with English audio, your audio input is English, but visually the film still looks Korean because the actors are still speaking in Korean, right?

Now, as lovers of cinema and as followers of technology, we thought, what if we could solve this problem at scale? What if a Korean film with English audio could actually look like it had been filmed in English itself? How incredible would it be to consume content that is not native to you when filmed, but becomes native for you when localised? So that really was the magic pill that we took, and the idea for visual dub came from that.

So the visual dub name also represents visually dubbing content. We've been building this technology out to meet the exacting standards of the industry, and I'm very happy to say that we are now ready for global scale.

Vanita Kohli-Khandekar: Did you roll out last year or before that?

Mandar Natekar: We actually rolled out last year, and we rolled it out for films, but of a shorter duration, which is ad films. Whilst our North Star was films and the big content on streaming platforms, we felt we had to test the waters with something that offers the same complexity but a smaller duration, and that is when ad films happened. And we know, for example, that in India almost all ad films are shot in Hindi and then dubbed into multiple languages for regional markets.

So it still becomes a case of dubbed films for regional consumers, right? That is a problem we set out to solve, and some of the largest brands in the country now use our technology for their ad films. Now that we've acquired all the relevant learning, we've decided to put this technology out for films. We've currently been tested by very large studios like Prime Focus, Double Negative, Yash Raj and Excel Entertainment, and we did our first film project for Dharma Productions on a film called Kesari Chapter 2 that just released. So our journey towards big theatrical productions has just started.

Vanita Kohli-Khandekar: Can you tell me about some of the ad films? I think you did the UltraTech one with Shah Rukh speaking in various ways. And can you also share with my listeners, I know this because I've seen your demo reels, etc.

Just take my listeners through one ad or one film. The whole point is that the lip sync is in complete match with the audio. I saw the Tom Cruise clip, actually, from Mission Impossible, you know, the last one, Dead Reckoning Part One. Just share with my listeners what exactly it does, because we are on an audio medium, so this would bring it alive for them.

Mandar Natekar: I'll run you through the example of UltraTech Cement. Basically, they had shot a film with Mr. Khan. That film was shot in Hindi, and it ran on all Hindi networks, but beyond a point they wanted to extend the same film across English networks, and a Hindi film on English networks would not look so good, right? It wouldn't look congruent. So they decided to get it dubbed in Mr. Khan's voice in English, and once the dubbed version was ready, our visual dub technology was used to make it look as if it had been filmed in English itself. So from the same production shoot, you now have the capability, thanks to visual dub, of actually filming in two different languages at the same cost. That film was then run on all English networks to great success. We had a great time working with UltraTech Cement on that, and it was received quite fabulously.

Vanita Kohli-Khandekar: There are cost implications for studios or advertising companies when you do this. Any examples you can share, or any average numbers on what this means? For example, say Kalki was shot in two languages or three, I'm not sure, but when a film or ad film is shot in two or three languages, it's much more expensive than when you dub it accurately into those languages. So any numbers there on cost implications? What does it cost to use visual dub, and what does it save you in costs?

Mandar Natekar: Okay, so let's assume there's a film, say Kalki, which was shot in both Hindi and Telugu. Now imagine that the production cost in literal terms doubles, because the cost of actors, the cost of talent, character actors, the cost of production, equipment, location, permissions, everything is multiplied by two, because it's a factor of time. But I see a future where, if you're shooting a multilingual film, you don't need to shoot twice. All you need to do is shoot in the one language you are most comfortable with, and then in post-production, using visual dub, you can sync the whole film to a new language, and to many more.

We are still working out the cost right now, but I would assume that as the technology becomes more accessible and adoption is driven, it should cost no more than 50 to 60 lakh rupees per language, which is nothing when you compare it to the production cost of shooting an entire film.

Vanita Kohli-Khandekar: You're talking about 50 to 60 lakh for a three-hour film?

Mandar Natekar: Yes, that's our intention.

Vanita Kohli-Khandekar: Is this technology being deployed only in India, or are you finding clients elsewhere in the world also?

Mandar Natekar: So what has happened, Vanita, is that the world largely believes that technology built with so much finesse usually gets built in the US. For example, on the work that we executed for Dharma: Dharma actually went and met a couple of other folks in the US, and then they reached out to us because they felt those folks could not work on their scenes. So we have built our technology for a global market, and after our recent win at South by Southwest, one of the biggest festivals, held in Austin, we have started getting queries from Universal, from Paramount and from some other large studios, including a very large studio based in Singapore called Space Station.

They have all reached out to us organically because there is a massive demand for this technology. Once we start scaling up is when we should be able to serve other markets too. I mean our ambition is global.

Vanita Kohli-Khandekar: Some sense on the investment that has come in so far, the sizing of the market and when do you expect the business to sort of start breaking even? I'm assuming it will be a medium to long-term thing.

Mandar Natekar: In October 2022, we raised a round of a million and a half US dollars from investors, one of them being a very big VC out of Bangalore called Exfinity Ventures, and currently we're raising another round of 7 million to scale. If you ask me right now, based on the business that comes in and current costs, we are probably a year away from breakeven at our current scale, but our objective is to drive global scale after utilising the new round we raise. In my opinion, in the next three to four years, we should be roughly a 100 million dollar revenue company, possibly at 80% profitability margins.

Vanita Kohli-Khandekar: 80%?

Mandar Natekar: Yes, yes.

Vanita Kohli-Khandekar: Why are investors not queuing at your door with that kind of margin?

Mandar Natekar: They are, they are, but Vanita, what happens is that investors in India typically don't understand the entertainment industry. They understand D2C, they understand SaaS and products. Understanding entertainment is difficult for some people, but you know, our work speaks for itself.

So we do have a lineup of investors who are interested but we're looking for the right investor who can be a strategic investor. So we are very chilled. The kind of pipeline we have of people interested in working with us gives us a lot of comfort and confidence so that we can wait this out till we find the right partners.

Vanita Kohli-Khandekar: Super. I'm always chuffed about media tech ventures that come out of India. Just two questions more.

One is, what kind of challenges or barriers does this bring? And also, have you had to pivot at any point and say, no, this won't happen, this should be done this way? Has that happened? Any learning which has forced you to make a strategic pivot? And also, challenges?

Mandar Natekar: Okay, so I'll come to the pivot first. When we started off, our technology was focused only on the lips area, right? But when we met roughly about 100 people in the industry, including directors, producers and studios, we realised that the entire face is important, not just the lips, because what matters is preservation of resolution, no pixelation, very natural output and 100% quality retention. We realised this was very, very important if we have to meet industry standards, and hence we completely pivoted our research.

I don't know if I have mentioned that this technology is completely proprietary to us, including the research and R&D behind it. We have also filed for multiple patents in India, the US and Europe. So we had to do a complete research pivot to be able to deliver these five things for the industry.

One, complete preservation of resolution. Two, complete transformation of the face, all the way from the eyes to the neck. Three, zero pixelation on transformations. Four, EXR compatibility, so that you can plug into any post-production software. And last but not least, irrespective of screen size, there should be no visible noticeability of any restructuring done on the face.

This is something that we have been able to build very very strategically.

Vanita Kohli-Khandekar: Were there any issues with studios, or even with ad production firms, on the copyright front? Because when you take Tom Cruise's face and mess around with it, or Akshay Kumar's in Kesari Chapter 2, or anybody's, you're messing with the facial expressions, or with the acting per se. Has there ever been any pushback on that side?

I was curious about that.

Mandar Natekar: Look, we've worked with tonnes of people. I mean, for the current Dream11 films that have Aamir Khan and Rohit Sharma, we've actually deployed our voice models for both Aamir Khan and Rohit Sharma. Still, we are not a B2C platform.

We are a complete B2B platform. We deliver the tech for the best use cases for films. The brands and the directors take the relevant permissions, and the fact that our work goes on television, on air, means there is some level of comfort with and acceptance of our technology.

In fact, recently, and I will not take the name, but he is India's biggest cricketer. He is currently shooting for an ad film, and there is some generative work expected out of that campaign. When the agency reached out to us, they said, look, the celebrity's team wants your creds.

So when they shared our creds, the team immediately gave the go-ahead, saying, we know these folks, they have done credible work with other big stars, so we don't have a problem working with their technology.

So now we are getting a reputation as somebody who delivers great quality. That really helps us.

Vanita Kohli-Khandekar: Well that reminds me, how long does the process take? Let's say for a film or an ad film or you know depending on the length of the piece of programming, how long does the process take?

Mandar Natekar: So, for example, let's say the film is one minute long and it has been dubbed in five languages. We should be able to turn everything around, including feedback, in less than three hours.

Vanita Kohli-Khandekar: Oh, that's good. So for a three-hour film it would be?

Mandar Natekar: See, a three-hour film would take longer, because films have multiple characters on screen at the same time, and there are very complex scenes. In our opinion, we should be able to turn around one full feature film in one language in six weeks at the current moment.

Vanita Kohli-Khandekar: Okay, we missed that question on challenges or barriers to your operations, whether in India or globally. Anything you'd like to add there?

Mandar Natekar: I wouldn't say barriers. The challenge is being able to access other markets fast. So us being from India, you know how it is right?

Generally, India is known for its products and services; India is not really known for original technology. People generally get very surprised. And you know, the technology we're building, which is for films, requires a very high level of output, right?

So they get a little surprised at how are we able to do it. So the challenges of the India brand but I think once we are out and once we have more people using it, I think that should not be a problem because now we get recognised and especially after the South by Southwest win which was a global competition that we won against very established global players. I think it's a matter of time before we get acceptance but I don't see it as a challenge.

I just see as an outcome of time.

Vanita Kohli-Khandekar: Yeah but that's true for any business. I mean whether you're selling cars or selling a piece of technology or a product, anybody has to understand what you're doing, learn to trust you. So all of that takes time.

Last question, any big ones coming from your side this year which we can look out for and we can spot the visual dub name there?

Mandar Natekar: I think we are talking to three large studios, Vanita, and we are under tight NDAs there, but as soon as we contract up, you'll be the first one to know, and I would love to have the pleasure of telling you. This much I can share: we are in very close discussion with one of the top studios in the country for a big blockbuster movie coming on the 14th of August.

Vanita Kohli-Khandekar: All I have to do is google to find out. Thank you so much Mandar for joining me. This was such fun because I think your technology is also fun.

Mandar Natekar: We have a lot of fun. I'm the oldest member in the team. I'm 48 years old.

My co-founders are 32 and the average age in my team is in mid-20s. So we have a lot of fun at work. I mean we are one of the coolest startups in the country and we want to maintain that same culture and vibe.

Working with young people keeps me young so I'm not complaining.

Vanita Kohli-Khandekar: Very good. Excellent. Loads of good luck.

I hope you get to do all the big studio films in Hollywood, in India, in Korea, wherever.

Mandar Natekar: Thank you very much. Thank you.

Mandar Natekar: From your lips to God's ears.

Vanita Kohli-Khandekar (Host): There are many such media tech firms coming up in India.

---


Vanita Kohli-Khandekar (Host): They combine the best elements of India's tech prowess with its deep understanding of the content business. Remember we remain the world's second largest TV market. We are its largest producer of films and a huge consumer of video and incidentally we are also the second largest consumer of social media.

So here's wishing all of these media tech firms coming out of India loads of good luck.
