	00:00
	Today we have Adam Coates here for an interview. Adam, you run the AI Lab at Baidu in Silicon Valley. Could you give us a quick intro and explain what Baidu is, for people who don't know?
	Yeah. Baidu is actually the largest search engine in China. It turns out the internet ecosystem in China is an incredibly dynamic environment, and Baidu, I think, turned out to be an early technology leader and really established itself in PC search. It then remade itself in the mobile revolution, and increasingly today it is becoming an AI company, recognizing the value of AI for a whole bunch of different applications, not just search.
	Okay. And so what do you do, exactly?
	I'm the director of the Silicon Valley AI Lab, which is one of four labs within Baidu Research. Especially as Baidu is becoming an AI company, the need for a team to be on the bleeding edge and understand

	00:01
	all of the current research, to be able to do a lot of basic research ourselves, but also to figure out how we can translate that into business and product impact for the company, is increasingly critical. That's what Baidu Research is here for. The AI Lab in particular we founded recognizing how extreme this problem was about to get. Deep learning research and AI research right now are flying forward so rapidly that the need for teams who can both understand that research and quickly translate it into something that businesses and products can use is more critical than ever. So we founded the AI Lab to try to close that gap and help the company move faster.
	So how do you split your time between doing basic research around AI and actually bringing it forward into a product?
	There's no hard and fast rule to this. I think

	00:02
	one of the things we try to repeat to ourselves every day is that we're mission-oriented. The mission of the AI Lab is precisely to create AI technologies that can have a significant impact on at least 100 million people. We chose this to keep bringing ourselves back to the final goal: we want all the research we do to ultimately end up in the hands of users. Sometimes that means we spot something that needs to happen in the world to really change technology for the better, but no one knows how to solve it, and there's a basic research problem there that someone has to tackle. So we'll go back to our visionary stance, think about the long term, and invest in research. Then, as we have success there, we shift back to the other foot and take responsibility for carrying all of that to a real application and making sure we don't

	00:03
	just solve the 90% that you might put in, say, your research paper, but also solve the last mile and get to 99.9%.
	So maybe the best way to do this is to explain something that started with research here and how it's been brought into a full-on product that exists.
	I'll give you an example. We've spent a ton of time on speech recognition. Years ago, speech recognition was one of these technologies that always felt pretty good but not good enough. Traditionally, speech recognition systems have been heavily optimized for things like mobile search: if you hold your phone up close to your mouth and say a short query, almost in a non-human voice, the systems could figure it out, and they're getting quite good. The speech engine that we've built at Baidu, called Deep Speech, is actually superhuman for these short queries, because you have

	00:04
	no context, and people can have thick accents. That speech engine actually started out as a basic research project. We looked at this problem and said: gosh, what would happen if speech recognition were human-level for every product you ever used? Whether you're in your home or in your car or you pick up your phone, whether you hold your phone up close or hold it away; if I'm in the kitchen and my toddler is yelling at me, can I still use a speech interface? Could it understand us as well as a human being does?
	So how do you do that? What is the basic research that moved it forward to a place where it's useful?
	We had the hypothesis that maybe the thing holding back a lot of the progress in speech is actually just scale. Maybe if we took some of the same basic ideas we could already see in the research literature and scaled them way up, put in a lot more data, invested a lot of time in solving computational problems, and built a much larger neural network than anyone had built before for this problem, we

	00:05
	could just get better performance. And lo and behold, with a lot of effort, we ended up with this pretty amazing speech recognition model that, like I said, in Mandarin at least is actually superhuman. You can sit there and listen to a voice query that someone is trying out, and you'll have native speakers sitting around debating with each other, wondering what the heck the person is saying.
	Wow.
	And then the speech engine will give an answer and everybody goes, oh, that's what it was, because it's just such a thick accent, perhaps from someone in rural China.
	How much data do you have to give it to train it on a new language? On the site I saw English and Mandarin. If I wanted German, how much would I have to give it?
	One of the big challenges for these things is that they need a ton of data. Our English system uses something like 10,000 to 20,000 hours of audio, and the Mandarin systems are using even more, for our top-end

	00:06
	products. This certainly means the technology is at a state where, to get that superhuman performance, you've got to really care about it. For Baidu voice search, maps, things like that, which are our flagship products, we can put in the capital and the effort to do that. But one of the exciting things going forward in the basic research we think about is how to get around that: how can we develop machine learning systems that get you human performance on every product, and do it with a lot less data?
	So what I was wondering, then: did you see that Lyrebird thing that was floating around this week?
	Okay.
	They claim that they don't need all that much audio data to emulate your voice, or whatever they call it. You guys have a similar project going on, right?
	That's right, yeah, we're working on text-to-speech.
	Why can they achieve that with less data?
	I think the technical challenge behind all of this is that there are sort of two things we can do. One is to try to share data across many applications. So,

	00:07
	to take text-to-speech as one example, if I learn to mimic lots of different voices and then you give me the thousand-and-first voice, you hope that the first thousand taught you virtually everything you need to know about language, and that what's left is really some idiosyncratic change that you could learn from very little data. So that's one possibility. The other side of it, which is much more important for things like the speech recognition we were talking about, is that we want to move from supervised learning, where a human being has to give you the correct answer in order for you to train your neural network, to unsupervised learning, where I could just give you a lot of raw audio and have you learn the mechanics of speech before I ask you to learn a new language. Hopefully that can also bring down the amount of data that we need.
	So on the technical side, could you give us somewhat of an overview of how that actually works, like

	00:08
	how do you process a voice for text-to-speech? Let's do both, actually, because I'm super interested.
	Right, so let's start with speech recognition. Before we go and train a speech system, what we have to do is collect a whole bunch of audio clips. For example, if we wanted to build a new voice search engine, I would need to get lots of examples of people speaking to me, giving me little voice queries, and then I actually need human annotators, or some kind of system that can give me ground truth, to tell me the correct transcription for a given audio clip. Once you've done that, you can ask a deep learning algorithm to learn the function that predicts the correct text transcript from the audio clip. This is called supervised learning.
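A minimal sketch of how a labeled (audio clip, transcript) pair is used, under assumptions not stated in the interview: the model itself is omitted, and it is assumed to emit one character or a "blank" symbol per audio frame, as models trained with connectionist temporal classification (CTC, the objective used in the published Deep Speech papers) do. The frame outputs and the `_` blank symbol are made up for illustration. Greedy CTC decoding turns frame outputs into text, and character error rate compares that text with the human label.

```python
BLANK = "_"  # hypothetical stand-in for the CTC blank token


def ctc_greedy_decode(frame_predictions: list[str]) -> str:
    """Collapse repeated symbols, then drop blanks (greedy CTC decoding)."""
    output = []
    previous = None
    for symbol in frame_predictions:
        if symbol != previous and symbol != BLANK:
            output.append(symbol)
        previous = symbol  # a blank between two equal letters keeps both
    return "".join(output)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(
                prev_row[j] + 1,               # deletion
                row[j - 1] + 1,                # insertion
                prev_row[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev_row = row
    return prev_row[-1]


def character_error_rate(hypothesis: str, reference: str) -> float:
    """Edit distance normalized by the length of the human reference transcript."""
    return edit_distance(hypothesis, reference) / max(len(reference), 1)


# Hypothetical frame-level model output and its human-provided label.
frames = ["h", "h", "_", "e", "l", "_", "l", "o", "o"]
reference = "hello"
hypothesis = ctc_greedy_decode(frames)
print(hypothesis, character_error_rate(hypothesis, reference))  # hello 0.0
```

During training, the same labels drive the loss the network minimizes; the error-rate comparison shown here is how "human-level" or "superhuman" accuracy is usually measured.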
	It's an incredibly successful framework; we're really good at this for

	00:09
	lots of different applications. But the big challenge there is those labels: someone has to be able to sit there and give you, say, ten thousand hours' worth of labels, which can be really expensive.
	So how does it actually recognize... what is the software doing to recognize the intonation of the word?
	Well, traditionally what you would have to do is break these problems down into lots of different stages. I, as a speech recognition expert, would sit down and think a lot about the mechanics of the language. For Chinese, you would have to think about tonality and how to break up all the different sounds into some intermediate representation, and then you would need a sophisticated piece of software we call a decoder that goes through and tries to map that sequence of sounds to possible words that it might represent.
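To make the decoder step concrete, here is a toy sketch, not Baidu's decoder: a hand-written pronunciation lexicon (the hypothetical `LEXICON`, using ARPAbet-style phoneme symbols) and a search that lists every way a phoneme sequence can be split into known words. Real decoders also weigh hypotheses with acoustic and language-model scores; this one only enumerates them, which shows why that extra hand-built machinery was needed.

```python
# Hypothetical hand-written lexicon: word -> phoneme sequence.
LEXICON: dict[str, tuple[str, ...]] = {
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "I": ("AY",),
    "scream": ("S", "K", "R", "IY", "M"),
}


def decode(phonemes: tuple[str, ...]) -> list[list[str]]:
    """Return every segmentation of `phonemes` into lexicon words (depth-first search)."""
    results: list[list[str]] = []

    def search(start: int, words: list[str]) -> None:
        if start == len(phonemes):  # every phoneme consumed: one full hypothesis
            results.append(words)
            return
        for word, pron in LEXICON.items():
            if phonemes[start:start + len(pron)] == pron:
                search(start + len(pron), words + [word])

    search(0, [])
    return results


print(decode(("AY", "S", "K", "R", "IY", "M")))
# [['ice', 'cream'], ['I', 'scream']]
```

The same sounds yield two valid word sequences, so a traditional system needs yet another hand-engineered component (a language model) to choose between them.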
	So you have all these different pieces, and you'd have to engineer each

	00:10
	one, often with its own expert knowledge. But Deep Speech and all of the new deep learning systems we're seeing now try to solve this in one fell swoop. So really, the answer to your question is kind of the vacuous one: once you give me the audio clips and the characters it needs to output, a deep learning algorithm can actually just learn to predict those characters directly. In the past it always looked like there was some fundamental problem, that maybe we could never escape this need for hand-engineered representations, but it turns out that once you have enough data, all of those things go away.
	And so where did your data come from, like 10,000 hours of audio?
	We actually do a lot of clever tricks in English, where we don't have a large number of English-language products. For example, it turns out that if you go onto a crowdsourcing service, you can hire people very cheaply to just read books to you, and

	00:11
	it's not the same as the kinds of audio that we hear in real applications, but it's enough to teach a speech system all about, you know, liaisons between words; you get some speaker variation, and you hear strange vocabulary where English spelling is totally ridiculous. In the past you would hand-engineer these things; you'd say, well, I've never heard that word before, so I'm going to bake the pronunciation into my speech engine. But now it's all data-driven, so if I hear enough of these unusual words, you see these neural networks actually learn to spell on their own, even considering all the weird exceptions of English.
	Interesting. And you have the input right, because I've heard of people doing it with, like, a YouTube video, but then you need a caption as well as the audio, so it's twice as much work, if not more. Interesting. So then what about the other way around? How does that work on the technical side?
	Right. One of the really cool parts of deep learning right now is that a lot of

	00:12
	these insights about what works in one domain keep transferring to other domains. With text-to-speech you could see a lot of the same practices: a lot of systems were hand-engineered combinations of many different modules, and each module would have its own set of machine learning algorithms with its own little tricks. One of the things our team did recently, with a piece of work we're calling Deep Voice, was to just ask: what if I rewrote every single one of those modules using deep learning? Not putting them all together just yet, but simply asking whether deep learning can solve each of them adequately to get a good speech system. And the answer is yes: you can basically abandon most of this specialized knowledge in order to build all of the subsequent modules. And in more recent research, the deep learning community is seeing that, of course; everyone is now figuring out how to make these things work end to end.
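The modular structure described above can be sketched as a chain of stages that share one interface, so any single stage could in principle be swapped for a learned model. The stage names follow a conventional text-to-speech breakdown (grapheme-to-phoneme, duration, pitch, synthesis) and are an assumption, not a description of Deep Voice's code; `TOY_LEXICON` and every stage body are deliberately trivial, hypothetical rule-based stand-ins.

```python
from typing import Callable

# Every stage takes and returns a dict, so stages are interchangeable.
Stage = Callable[[dict], dict]

# Hypothetical toy lexicon standing in for a grapheme-to-phoneme model.
TOY_LEXICON = {"hi": ["HH", "AY"], "there": ["DH", "EH", "R"]}
VOWELS = {"AY", "EH"}


def grapheme_to_phoneme(state: dict) -> dict:
    """Text -> phonemes (here: dictionary lookup; unknown words are skipped)."""
    phonemes: list[str] = []
    for word in state["text"].lower().split():
        phonemes.extend(TOY_LEXICON.get(word, []))
    return {**state, "phonemes": phonemes}


def predict_durations(state: dict) -> dict:
    """Phonemes -> duration in frames (here: vowels last 3 frames, consonants 1)."""
    return {**state, "durations": [3 if p in VOWELS else 1 for p in state["phonemes"]]}


def predict_pitch(state: dict) -> dict:
    """Phonemes -> fundamental frequency in Hz (here: falls 10 Hz per phoneme)."""
    return {**state, "f0": [200.0 - 10.0 * i for i in range(len(state["phonemes"]))]}


def synthesize(state: dict) -> dict:
    """Durations + pitch -> per-frame pitch track (stand-in for waveform generation)."""
    frames: list[float] = []
    for pitch, duration in zip(state["f0"], state["durations"]):
        frames.extend([pitch] * duration)
    return {**state, "frames": frames}


def run_pipeline(text: str, stages: list[Stage]) -> dict:
    """Apply the stages in order, threading the shared state through each one."""
    state: dict = {"text": text}
    for stage in stages:
        state = stage(state)
    return state


PIPELINE: list[Stage] = [grapheme_to_phoneme, predict_durations, predict_pitch, synthesize]
result = run_pipeline("Hi there", PIPELINE)
print(result["phonemes"])     # ['HH', 'AY', 'DH', 'EH', 'R']
print(len(result["frames"]))  # 9
```

Replacing, say, `predict_durations` with a trained network only requires a function with the same signature; the end-to-end systems mentioned above go a step further and learn the whole text-to-audio mapping jointly.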

	00:13
	They're all data-driven, and that's the same story we saw for Deep Speech, so we're really excited about that.
	That's wild. And so do you have a team just dedicated to parsing research coming out of universities and then figuring out how to apply it? Are you testing everything that comes out?
	It's a bit of a mix. It's definitely our role to not only think about AI research but also to think about AI products and how to get these things to have impact. There is clearly so much AI research happening that it's impossible to look through everything, and one of the big challenges right now is not just to digest everything but to identify the things that are truly important.
	So what looks like a hundred-million-person product?
	Well, we chose speech recognition because we felt that in aggregate it had that potential. As we get the next wave of AI products, I think we're going to move

	00:14
	from these bolted-on AI features to really immersive AI products. If you look at how keyboards for your phone were designed a few years ago, you see that everybody just bolted on a microphone and hooked it up to their speech API, and that was fine for that level of technology. But as the technology gets better and better, we can now start putting speech up front; we can actually build a voice-first keyboard. That's something we've been prototyping in the AI Lab, and you can actually download it for your Android phone. It's called TalkType, in case anybody wants to try it.
	Yeah.
	It's remarkable how much it changes your habits. I use it all the time, and I never thought I would. It emphasized to me why the AI Lab is here: we can discover these changes in user habits, and we can understand how speech recognition

	00:15
	can impact people much more deeply than it could when it was just bolted onto a product. That spurs us on to start looking at the full range of speech problems we have to solve to get you away from this close-talking voice search scenario and to one where I can just talk to my phone, or talk to a device, and have it always work.
	So, as you've given this to a bunch of users, I assume, and gotten their feedback, have you been surprised by the uptake of voice? I know lots of people talk about it, and some people say it doesn't really make sense. For example, you see Apple transcribing voicemails now. Are there certain use cases where you've been surprised at how effective it is, and others where you're like, I don't know if this will ever play out?
	You know, I think the really obvious ones, like texting, seem to be the most popular. The feedback that is maybe the most fun for me is when people with thick accents post a review. They say, oh, I have this crazy accent I grew up with, and nothing

	00:16
	works for me, but I tried this new keyboard and it works amazingly well.
	I have a friend who has a thick Italian accent, and he complains all the time that nothing works.
	And all of this stuff now works for different accents. Because it's all data-driven, we don't have to think about how we're going to serve all these different users: if they're represented in the data sets and we get some transcriptions, we can actually serve them in a way that really wasn't possible when we were trying to do it all by hand.
	That's fantastic. And have you got it through the whole system? In other words, if I want to give myself an Italian-American accent, can I do that yet with Baidu?
	We can't do that yet with our TTS engine, but it is definitely on the way.
	Okay, cool. So what else is on the way? What are you researching, what products are you working on, what's coming to speech and text-to-speech?
	I think these are part of a big effort to make this next generation of AI products really fly. Once text-to-speech and speech are your primary interface to a new device, they

	00:17
	have to be amazingly good and have to work for everybody. So I think there's actually still quite a bit of room to run on those topics: not just making it work for a narrow domain, but making it work for really the full breadth of what humans can do.
	Do you see a world where you can run this stuff locally, or will it always be calling out to a server?
	Yeah, I think it's definitely going to happen. One kind of funny thing is that if you look at folks who maybe have a lot less technical knowledge and don't really have the instinct to think through how a piece of technology works on the back end, I think the response to a lot of AI applications now, because they're reaching this sort of uncanny valley, is that we often respond to them as though they're sort of human. That sets the bar really high: our expectations for how delightful a product should be are now being set by our interactions with people. And one of the things we discovered as we were translating Deep

	00:18
	Speech into a production system was that latency is a huge part of that experience. The difference between 50 or 100 milliseconds of latency and 200 milliseconds of latency is actually quite perceptible, and anything we can do to bring that down actually affects user experience quite a bit. We actually did a combination of research, production hacking, and working with product teams, thinking through how to make all of that work, and that's a big part of the translation process that we're here for.
	That's very cool. So what happens on the technical side to make it run faster?
	When we first started the basic research for Deep Speech, like in all research papers, we chose the model that gets the best benchmark score, which turns out to be horribly impractical for putting online. So after the initial research results, the team sat down with just

	00:19
	a set of what you might think of as product requirements and started thinking through what kinds of neural network models will allow us to get the same performance without requiring so much future context: models that don't have to listen to the entire audio clip before they can give you a really high-accuracy response.
	So kind of like the language prediction work the OpenAI guys were doing with the Amazon reviews, predicting what's coming next?
	Maybe not even predicting what's coming next. One thing that humans do without thinking about it is that if I misunderstand a word that you said to me, and then a couple of words later I pick up context that disambiguates it, I actually don't skip a beat; I just understand it as one long stream.
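A toy sketch of this trade-off, with everything hypothetical: the recognizer emits a best guess after each new word-sized unit of audio and revises an ambiguous earlier word once the next unit disambiguates it, while the latency helper shows why every frame of required future context adds delay. The `HOMOPHONES` table, the unit names, and the 10 ms frame stride are illustrative assumptions, not Deep Speech parameters.

```python
FRAME_STRIDE_MS = 10  # assumed: one acoustic frame every 10 ms

# Hypothetical homophone table: sound -> (default word, {next word: word}).
HOMOPHONES = {"tu": ("to", {"apples": "two", "much": "too"})}


def transcribe(sounds: list[str]) -> str:
    """Best transcript of the units heard so far, using one unit of lookahead."""
    words = []
    for i, sound in enumerate(sounds):
        if sound in HOMOPHONES:
            default, by_next_word = HOMOPHONES[sound]
            next_sound = sounds[i + 1] if i + 1 < len(sounds) else None
            words.append(by_next_word.get(next_sound, default))
        else:
            words.append(sound)
    return " ".join(words)


def streaming_hypotheses(sounds: list[str]) -> list[str]:
    """Re-decode after each new unit, so partial results appear and can be revised."""
    return [transcribe(sounds[:t]) for t in range(1, len(sounds) + 1)]


def lookahead_latency_ms(lookahead_frames: int, compute_ms: float) -> float:
    """A frame's output is final only after its future context arrives, plus compute time."""
    return lookahead_frames * FRAME_STRIDE_MS + compute_ms


print(streaming_hypotheses(["buy", "tu", "apples"]))
# ['buy', 'buy to', 'buy two apples']
print(lookahead_latency_ms(5, 40.0), lookahead_latency_ms(16, 40.0))
# 90.0 200.0
```

The user sees "buy to" immediately and watches it become "buy two apples" a moment later; a model that instead waited for the whole clip would be more accurate on the first pass but would show nothing until the speaker stopped.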
	So one of the ways our speech systems would do this is that they would listen to the entire audio clip first, process it all in one fell swoop, and then give you a final answer, and that works great for getting the highest

	00:20
	accuracy, but it doesn't work so great for a product where you need to give a response online and give people some feedback that lets them know you're listening. So you need to alter the neural network so that it tries to give you a really good answer using only what it's heard so far, but can then update it very quickly as it gets more context.
	So I've noticed over the past few years that people have gotten quite good at structuring sentences so Siri understands them. You know, they put the noun in the correct position so the data comes back correctly. I found this when I was traveling: I was using Google Translate, and after about one day I realized that I couldn't give it a sentence, but if I gave it a noun I could just show the result to someone. If I just say, you know, bread, it will translate it perfectly. Do you find that we're going to have to slightly adapt how we communicate with machines, or is your goal for us to communicate as naturally as we would with each other?
	I really want it to be human-level, and I

	00:21
	don't see a serious barrier to getting there, at least for really high-value applications. I think there's a lot more research to do, but I sincerely think there's a chance that over the next few years we're going to regard speech recognition as a solved problem.
	That's very cool. So what are the really hard problems right now? What are you not sure will work?
	I think we were talking earlier about getting all this data. For problems where we can just get gobs of labeled data, I think we've got a little more room to run, but we can certainly solve those kinds of applications. But there's a huge range of things humans are able to do, often without thinking, that current speech engines just don't handle. We can deal with crosstalk and a lot of background noise; if you talk to me from the other side of a room, even with a lot of reverberation and things going on, it usually doesn't bother anybody that much. And yet current speech systems often have a really hard time with this.
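For readers unfamiliar with the far-field problem, a minimal sketch of what reverberation and background noise do to a signal, assuming the common textbook model: the microphone hears the dry signal convolved with a room impulse response, plus additive noise at some signal-to-noise ratio. The exponentially decaying impulse response is a hypothetical toy, not a measured room, and none of this describes Baidu's own training pipeline.

```python
import math
import random


def convolve(signal: list[float], impulse_response: list[float]) -> list[float]:
    """Full discrete convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += x * h
    return out


def toy_room_impulse_response(length: int, decay: float) -> list[float]:
    """Direct path (weight 1.0) followed by exponentially decaying echoes."""
    return [math.exp(-decay * k) for k in range(length)]


def add_noise(signal: list[float], snr_db: float, rng: random.Random) -> list[float]:
    """Add Gaussian noise whose power sits `snr_db` decibels below the signal power."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


dry = [1.0, 0.0, 0.0, 0.0]                           # a single click
room = toy_room_impulse_response(length=3, decay=1.0)
wet = convolve(dry, room)                            # the click now rings on as echoes
noisy = add_noise(wet, snr_db=10.0, rng=random.Random(0))
print([round(v, 3) for v in wet])                    # [1.0, 0.368, 0.135, 0.0, 0.0, 0.0]
```

For speech, each sound smears into the sounds that follow it, which is why a model trained only on close-talking audio struggles across a room.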

	00:22
	But for the next generation of AI products, they're going to need to handle all of this. So a lot of the research we're doing now is focused on going after all of those other things. How do I handle people who are talking over each other, or multiple speakers who are having a very casual conversation? How do I transcribe things that have a very long structure, like a lecture, where over the course of the lecture I might realize I misunderstood something, or a bit of jargon gets spelled out for me and now I need to go back and transcribe it? This is one place where our ability to innovate on products is actually really useful. We recently launched a product called SwiftScribe to help transcriptionists be much more efficient, and it's targeted at all of these scenarios where the world wants long-form transcription. We have all of these conversations that are just sort of lost, and we wish

	00:23
	we had written them down, but it's just too expensive to transcribe all of it for everyday applications.
	So, in terms of emulating someone's voice, do you have any concerns about faking it? Did you see the face simulation? I forget the researcher's name, so I'll link to it, but you know what I'm talking about: essentially you can feed it both video and audio and recreate, you know, Adam talking. Do you have any thoughts on how we can prepare for that world?
	You know, I think in some sense this is a social question. Culturally, we're all going to have to exercise a lot of critical thinking. We've always had this problem in some sense: I can read an article that has someone's name on it and, notwithstanding the writing style, I don't know for sure where that article came from. So I think we have habits for how to deal with that scenario; we can be healthily skeptical, and I think we're going to

	00:24
	00:24
	have to come up with ways to adapt that to this sort of brave new world I think
	我认为必须想出办法使其适应这种勇敢的新世界
	those are big challenges coming up and I do think about them but I also think a
	这些都是即将到来的巨大挑战，我确实考虑过它们，但我也认为
	lot about just all the positives that AI is going to have you know I
	很多关于人工智能将带来的所有积极因素我你知道我
	don't talk about it too much like my mother actually has muscular dystrophy
	别说太多，就像我妈妈实际上患有肌肉萎缩症一样
	and so things like speech and language interfaces are just incredibly valuable
	因此，诸如语音和语言界面之类的东西非常有价值
	for someone who cannot type on an iPad because the keys are too far apart
	适合因按键距离太远而无法在 iPad 上打字的人
	and so these are just all these like things that you don't really think about
	所以这些都是你没有真正考虑过的事情
	that these technologies are going to address over the next few years and
	这些技术将在未来几年内解决这些问题
	on balance I know that we're going to have a lot of big challenges of like how
	总的来说，我知道我们将面临很多重大挑战，比如如何
	do we use these how do we as users adapt to all of the implications but I think
	我们是否使用这些，作为用户，我们如何适应所有的影响，但我认为
	we've done really well with this in the past and we're going to keep doing well
	我们过去在这方面做得非常好，并且我们将继续做得很好
	with it in the future so do you think AI will create new jobs for
	在未来的应用中也会继续做好。那么你认为人工智能会为
	
	00:25
	00:25
	people or will we all be like Mechanical Turking um I'm not sure I think
	人们创造新的就业机会，还是我们都会像在 Mechanical Turk 上打零工一样？嗯，我不确定，我想
	this is something where you know the job turnover in the United States
	这是你了解美国工作流动率的地方
	every quarter is incredibly high it is actually shocking that the fraction of
	每个季度都令人难以置信的高，实际上令人震惊的是
	our workforce that quits one occupation and moves to another one is really high
	我们的劳动力退出一种职业并转向另一种职业的比例非常高
	I think it is clearly getting faster like we talked about this phenomenon
	我认为它明显变得更快，就像我们谈论这个现象一样
	within the AI lab here where the deep learning research is flying ahead so
	在人工智能实验室里，深度学习研究正在飞速发展，所以
	quickly that we're often remaking ourselves too to keep up with it and to
	很快我们也经常重塑自己以跟上它并
	make sure that we can keep innovating I think that might even be a little bit of
	确保我们能够不断创新我认为这甚至可能是一点点
	a lesson for everyone that continual learning is going to become more and
	给每个人一个教训：持续学习将会变得更加
	more important going forward yes so speaking of like what are you teaching
	更重要的是，是的，所以说你在教什么
	yourself so the robots don't take your job I don't think we're at risk of
	你自己，这样机器人就不会抢走你的工作，我认为我们没有风险
	robots taking our jobs right now actually it's kind of interesting we
	机器人现在抢走了我们的工作我实际上这很有趣

	00:26
	00:26
	thought a lot about like how does this change careers one thing that has been
	我思考了很多，比如这会如何改变职业生涯
	true in the past is that if you were to create a new research lab one of the
	过去的真实情况是，如果你要创建一个新的研究实验室，其中之一是
	first things you do is fill it with AI experts where they live and breathe AI
	你要做的第一件事就是让人工智能专家在这里生活和呼吸人工智能
	technology all day long I think that's really important I think for basic
	整天都在科技我认为这对于基本的我认为非常重要
	research you need that kind of specialization but because the field is
	研究你需要那种专业化，但因为这个领域是
	moving so quickly we also need a different kind of person now
	发展如此之快，我们现在也需要不同类型的人
	we also need people who are sort of chameleons who are these highly flexible
	我们还需要像变色龙一样高度灵活的人
	people that can understand and even contribute to a research project but can
	是可以理解甚至为研究项目做出贡献但可以
	also simultaneously shift to the other foot and think about how does this
	同时换到另一只脚并思考这是如何做到的
	interact with GPU hardware and a production system and how do I think
	与 GPU 硬件和生产系统交互，我是如何看待的
	about a product team and user experience because often product teams today can't
	关于产品团队和用户体验，因为当今的产品团队通常无法
	tell you what to change in your machine learning algorithm to make the user
	告诉你在你的机器学习算法中需要改变什么来吸引用户
	experience better it's very hard to quantify where it's falling off the edge
	体验更好，很难量化它在哪里下降

	00:27
	00:27
	and so you have to be able to think that through to change the algorithms you
	所以你必须能够思考这一点来改变你的算法
	also have to be able to look at the research community to think about what's
	还必须能够看看研究界来思考什么
	possible and what's coming and so there's a sort of amazing full-stack
	可能的以及即将发生的事情，所以有一种令人惊叹的全栈
	machine learning engineer that's starting to show up are they coming from
	开始出现的机器学习工程师是来自哪里
	like if I if I want to be that person what do I do
	就像如果我想成为那个人我该怎么办
	like now Sam you know eighteen they seem to be really hard to find right now
	就像现在山姆你知道十八个他们现在似乎很难找到
	so in the AI lab we've really set ourselves to just creating them I
	所以在人工智能实验室，我们真的下定决心要自己培养出这样的人，我
	think this is sort of the way unicorns are that we have to find the first few
	我认为这就是独角兽的方式，我们必须找到前几个
	examples and see how exciting that is and then come up with a way for
	示例并看看这是多么令人兴奋，然后想出一种方法
	people to learn and become that sort of professional actually one of
	人们实际上要学习并成为那种专业人士之一
	the cultural characteristics of our team is that we look for people who are
	我们团队的文化特征是我们寻找的是
	really self-directed and hungry to learn
	真正的自我导向和渴望学习
	that things are going so quickly we just can't guess what we're going to have to
	事情进展得太快，我们无法猜测我们将要做什么

	00:28
	00:28
	do in six months and having that sort of do-anything attitude saying well I'm
	六个月内做的事，并以那种做任何事的态度说，我是
	going to do research today and think about research papers but wow once we
	今天要做研究并思考研究论文，但是一旦我们
	get some traction and the results are looking good we're going to take
	获得一些牵引力，结果看起来不错，我们将采取
	responsibility for getting this all the way to 100 million people that's a
	让这一切惠及一亿人的责任
	towering request of anyone on our team and the things that we find really help
	我们团队中任何人的高要求以及我们发现真正有帮助的事情
	everyone sort of connect to that and do really well with that is being really
	每个人都对此有所联系并且做得很好
	self-directed and able to kind of deal with ambiguity and also really willing
	自我导向，能够处理歧义，并且非常愿意
	to learn a lot of stuff that isn't just AI research but is also stepping way
	学习很多东西，不仅仅是人工智能研究，而且也是迈出的一步
	outside of comfort zones and learning about GPUs and high-performance
	走出舒适区，学习 GPU 和高性能
	computing and learning about how a product manager thinks
	计算和学习产品经理如何做事
	okay so this has been super helpful if someone wanted to learn more about
	好的，如果有人想了解更多信息，这非常有帮助
	what you guys are working on or even just things that have been influential
	你们正在做什么，甚至只是有影响力的事情
	to you like what would you recommend they check out on the internet oh my
	你喜欢你推荐什么，他们在互联网上查看哦天哪

	00:29
	00:29
	goodness so I have to think about this for a second here I think the stuff
	天哪，所以我必须在这里考虑一下，我认为这些东西
	that's actually been quite influential for me is actually like startup books I
	这实际上对我影响很大，就像我的创业书籍一样
	think especially with big companies it's easy to think of ourselves in silos of
	尤其是对于大公司来说，我们很容易认为自己处于孤岛之中
	having a single job one idea from the startup world that I think is really
	拥有一份工作，一个来自创业界的想法，我认为这确实是
	amazingly powerful is this idea that a huge fraction of what you're doing is
	这个想法非常强大，你正在做的事情的很大一部分是
	learning there's a tendency especially amongst engineers and I count
	了解到有一种趋势，尤其是在工程师中，我认为我的
	myself a member it's like we want to build something and so one of the
	算我自己是一个成员，就像我们想要建造一些东西，所以其中之一
	disciplines we all have to keep in mind is that we also have to be really
	我们都必须牢记的纪律是，我们还必须真正做到
	clear eyed and think about what do we not know right now and focus on learning
	头脑清醒，思考我们现在不知道的事情，并专注于学习
	as quickly as we can to find the most important part of AI research that's
	我们尽快找到人工智能研究中最重要的部分
	happening and find the most important pain point that people in the real world
	并找到现实世界中人们最重要的痛点
	are experiencing and then be really fast
	正在经历然后速度非常快

	00:30
	00:30
	at connecting those and I think a lot of that influence on my thinking has come
	在将这些联系起来时，我认为对我的思维产生了很大的影响
	from the startup world there you go that's a great answer okay cool thanks
	来自创业世界，这是一个很好的答案，好的，酷，谢谢
	man thanks so much you
	非常感谢你