00:00
Today we have Adam Coates here for an interview. Adam, you run the AI lab at Baidu in Silicon Valley. Could you give us a quick intro and explain what Baidu is for people who don't know?

Yeah. Baidu is actually the largest search engine in China. It turns out the internet ecosystem in China is an incredibly dynamic environment, and Baidu turned out to be an early technology leader. It really established itself in PC search, then remade itself in the mobile revolution, and today it is increasingly becoming an AI company, recognizing the value of AI for a whole bunch of different applications, not just search.

Okay. And what do you do exactly?

I'm the director of the Silicon Valley AI Lab, which is one of four labs within Baidu Research. Especially as Baidu is becoming an AI company, the need for a team to be on the bleeding edge is increasingly critical: to understand all of the current research, to do a lot of basic research ourselves, but also to figure out how we can translate that into business and product impact for the company.

00:01
That's what Baidu Research is here for, and we founded the AI Lab in particular recognizing how extreme this problem was about to get. Deep learning and AI research right now is flying forward so rapidly that the need for teams that can both understand that research and quickly translate it into something businesses and products can use is more critical than
ever, so we founded the AI Lab to try to close that gap and help the company move faster.

So how do you break up your time between doing basic research around AI and actually bringing it forward to a product?

00:02
There's no hard and fast rule. One of the things we try to repeat to ourselves every day is that we're mission oriented. The mission of the AI Lab is precisely to create AI technologies that can have a significant impact on at least 100 million people. We chose this to keep bringing ourselves back to the final goal: we want all the research we do to ultimately end up in the hands of users. Sometimes that means we spot something that needs to happen in the world to really change technology for the better and to help Baidu, but no one knows how to solve it, and there's a basic research problem that someone has to tackle. So we go back to our visionary stance, think about the long term, and invest in research. Then, as we have success there, we shift to the other foot and take responsibility for carrying all of that to a real application.

00:03
We make sure we don't just solve the 90% that you might put in your research paper, but also solve the last mile and get to the 99.9 percent.

So maybe the best way to do this is to explain something that started with research here and how it has been brought into a full-on product
that exists.

I'll give you an example. We've spent a ton of time on speech recognition. A few years ago, speech recognition was one of those technologies that always felt pretty good but not good enough. Traditionally, speech recognition systems have been heavily optimized for things like mobile search, so if you hold your phone up close to your mouth and say a short query...

In a non-human voice.

Exactly. The systems could figure it out, and they're getting quite good. The speech engine we've built at Baidu, called Deep Speech, is actually superhuman for these short queries, because you have no context and people can have thick accents.

00:04
That speech engine started out as a basic research project. We looked at this problem and said: gosh, what would happen if speech recognition were human level for every product you ever used? Whether you're in your home or in your car, whether you hold your phone up close or hold it away. If I'm in the kitchen and my toddler is yelling at me, can I still use a speech interface? Could it work as well as a human being understands us?

So then how do you do
that? What is the basic research that moved it forward to a place where it's useful?

We had the hypothesis that maybe the thing holding back a lot of the progress in speech was actually just scale. Maybe if we took some of the same basic ideas we could already see in the research literature and scaled them way up, put in a lot more data, invested a lot of time in solving computational problems, and built a much larger neural network than anyone had been building for this problem, we could just get better performance.

00:05
And lo and behold, with a lot of effort, we ended up with this pretty amazing speech recognition model that, like I said, in Mandarin at least is actually superhuman. You can sit there and listen to a voice query that someone is trying out, and you'll have native speakers sitting around debating with each other, wondering what the heck the person is saying.

Wow.

And then the speech engine will give an answer and everybody goes, "Oh, that's what it was," because it's just such a thick accent, perhaps from someone in rural China.

How much data do you have to give it to train it on a new language? I think on the site I saw it was English and Mandarin. If I wanted German, how much would I have to give it?

One of the big challenges for these things is that they need a ton of data. Our English system uses something like 10,000 to 20,000 hours of audio. The Mandarin systems are using even more for top-end products.

00:06
So this
certainly means that the technology is at a state where, to get that superhuman performance, you've got to really care about it. For Baidu voice search, maps, things like that, which are our flagship products, we can put in the capital and the effort to do that. But one of the exciting things going forward in the basic research we think about is: how do we get around that? How can we develop machine learning systems that get you human performance on every product, and do it with a lot less data?

So what I was wondering, then: did you see that Lyrebird thing that was floating around the internet this week?

Okay.

They claim that they don't need all that much audio data to emulate your voice, or simulate it, whatever they call it. You guys have a similar project going on, right?

That's right, yeah. We're working on text-to-speech.

Why can they achieve that with less data?

I
think the technical challenge behind all of this is that there are two things we can do. One is to try to share data across many applications.

00:07
Take text-to-speech as one example. If I learn to mimic lots of different voices and then you give me the thousand-and-first voice, you hope that the first thousand taught you virtually everything you need to know about language, and that what's left is really some idiosyncratic change that you could learn from very little data. So that's one possibility.

The other side of it, which is much more important for things like the speech recognition we were talking about, is that we want to move away from supervised learning, where a human being has to give you the correct answer in order for you to train your neural network, and toward unsupervised learning, where I could just give you a lot of raw audio and have you learn the mechanics of speech before I ask you to learn a new language. Hopefully that can also bring down the amount of data we need.

So then, on the technical side, could you give us somewhat of an overview of how that actually works?

00:08
How do you process a voice for text-to-speech? Let's do both, actually, because I'm super interested.

Right.

Let's start with speech recognition.

Before we go and train a speech system, what we have to do is collect a whole bunch of audio clips. For example, if we wanted to build a new voice search engine, I would need to get lots of examples of
people speaking to me, giving me little voice queries. Then I need human annotators, or some kind of system that can give me ground truth, that can tell me for a given audio clip what the correct transcription was. Once you've done that, you can ask a deep learning algorithm to learn the function that predicts the correct text transcript from the audio clip. This is called supervised learning. It's an incredibly successful framework, and we're really good with it for lots of different applications.

00:09
But the big challenge there is those labels: someone has to be able to sit there and give you, say, ten thousand hours' worth of labels, which can be really expensive.

So how does it actually recognize it? What is the software doing to recognize the intonation of a word?

Well, traditionally what you would have to do is break these problems down into lots of different stages. I, as a speech recognition expert, would sit down and think a lot about the mechanics of this language. For Chinese, you would have to think about tonality and how to break up all the different sounds into some intermediate representation. Then you would need some sophisticated piece of software we call a decoder, which goes through and tries to map that sequence of sounds to the possible words it might represent. So you have all these different pieces, and you'd have to engineer each one, often with its own expert knowledge.

00:10
But Deep Speech and all of the new deep
learning systems we're seeing now try to solve this in one fell swoop. So really the answer to your question is kind of the vacuous one: once you give me the audio clips and the characters it needs to output, a deep learning algorithm can actually just learn to predict those characters directly. In the past it always looked like there was some fundamental problem, that maybe we could never escape the need for these hand-engineered representations. But it turns out that once you have enough data, all of those things go away.

So where did your data come from? Like 10,000 hours of audio?

We actually do a lot of clever tricks in English, where we don't have a large number of English-language products. For example, it turns out that if you go onto, say, a crowdsourcing service, you can hire people very cheaply to just read books to you.

00:11
It's not the same as the kinds of audio we hear in real applications, but it's enough to teach a speech system all about liaisons between words, and you get some speaker variation, and you hear strange vocabulary where English spelling is totally ridiculous. In the past you would hand-engineer these things. You'd say, "Well, I've never heard that word before, so I'm going to bake the pronunciation into my speech engine." But now it's all data driven, so if the system hears enough of these unusual words, you see these neural networks actually learn to spell on their own, even considering all the weird exceptions of
English.

Interesting. And you have the input, right? Because I've heard of people doing it with, like, a YouTube video, but then you need a caption as well as the audio, so it's twice as much work, if not more. Interesting. So then what about the other way around? How does that work on the technical side?

Right. One of the really cool parts of deep learning right now is that a lot of these insights about what works in one domain keep transferring to other domains.

00:12
With text-to-speech, you could see a lot of the same practices. A lot of systems were hand-engineered combinations of many different modules, and each module would have its own set of machine learning algorithms with its own little tricks. So one of the things our team did recently, in a piece of work we're calling Deep Voice, was to just ask: what if I rewrote all of those modules using deep learning for every single one,
not to put them all together just yet, but just to ask: can deep learning solve all of these adequately to get a good speech system? And the answer is yes. You can basically abandon most of this specialized knowledge in order to build all of the subsequent modules. And in more recent research, the deep learning community is seeing that everyone is now figuring out how to make these things work end to end.

00:13
They're all data driven, and that's the same story we saw for Deep Speech, so we're really excited about that.

That's wild. So do you have a team just dedicated to parsing research coming out of universities and then figuring out how to apply it? Are you testing everything that comes out?

It's a bit of a mix. It's definitely our role to think not only about AI research but also about AI products and how to get these things to have impact. There is clearly so much AI research happening that it's impossible to look through everything. One of the big challenges right now is not just to digest everything but to identify the things that are truly important.

So what looks like a ninety-million-person product?

Well, we chose speech recognition because we felt that in aggregate it had that potential. As we have the next wave of AI products, I think we're going to move from these bolted-on AI features to really immersive AI products.

00:14
If you look at how keyboards were designed a few years ago for
your phone, you see that everybody just bolted on a microphone and hooked it up to their speech API. That was fine for that level of technology, but as the technology gets better and better, we can now start putting speech up front. We can actually build a voice-first keyboard. It's something we've been prototyping in the AI Lab, and you can actually download it for your Android phone. It's called TalkType, in case anybody wants to try it.

Yeah.

It's remarkable how much it changes your habits. I use it all the time, and I never thought I would do that. It emphasized to me why the AI Lab is here: we can discover these changes in user habits, and we can understand how speech recognition can impact people much more deeply than it could when it was just bolted onto a product.

00:15
That spurs us on to start looking at the full range of speech problems we have to solve to get you away from this close-talking voice search scenario and to one where I can just talk to my phone, or talk to a device, and have it always work.

So you've given this to a bunch of users, I assume, and gotten their feedback. Have you been surprised by voice as an interface? I know lots of people talk about it, and some people say it doesn't really make sense. For example, you see Apple transcribing voicemails now. Are there certain use cases where you've
been surprised at how effective it is, and others where you're like, "I don't know if this will ever play out"?

I think the really obvious ones, like texting, seem to be the most popular. The feedback that is maybe the most fun for me is when people with thick accents post a review. They say, "Oh, I have this crazy accent I grew up with, and nothing works for me, but I tried this new keyboard and it works amazingly well."

00:16
I have a friend who has a thick Italian accent, and he complains all the time that nothing works. All of this stuff now works for different accents because it's all data driven. We don't have to think about how we're going to serve all these different users. If they're represented in the data sets and we get some transcriptions, we can actually serve them in a way that really wasn't possible when we were trying to do it all by hand.

That's fantastic. And have you gotten it through the whole system? In other words, if I want to give myself an Italian-American accent, can I do that yet with Baidu?

We can't do that yet with our TTS engine, but it is definitely on the way.

Okay, cool. So what else is on the way? What are you researching? What products are
you working on? What's coming?

Speech and text-to-speech are, I think, part of a big effort to make this next generation of AI products really fly. Once text-to-speech and speech are your primary interface to a new device, they have to be amazingly good, and they have to work for everybody.

00:17
So I think there's actually still quite a bit of room to run on those topics: not just making it work for a narrow domain, but making it work for really the full breadth of what humans can do.

Do you see a world where you can run this stuff locally, or will they always be calling an API?

Yeah, I think it's definitely going to happen. One kind of funny thing is that if you look at folks who have a lot less technical knowledge and don't really have the instinct to think through how a piece of technology is working on the back end, the response to a lot of AI technologies now, because they're reaching this sort of uncanny valley, is that we often respond to them as though they're sort of human. That sets the bar really high. Our expectations for how delightful a product should be are now being set by our interactions with people.

00:18
One of the things we discovered as we were translating Deep Speech into a production system was that latency is a huge part of that experience. The difference between 50 or 100 milliseconds of latency and 200 milliseconds of latency is actually quite perceptible, and anything we can do to bring that down affects user experience quite a bit. We
actually did a combination of research, production hacking, and working with product teams, thinking through how to make all of that work. That's a big part of this translation process that we're here for.

That's very cool. So what happens on the technical side to make it run faster?

When we first started the basic research for Deep Speech, like all research papers, we chose the model that gets the best benchmark score, which turns out to be horribly impractical for putting online. So after the initial research results, the team sat down with a set of what you might think of as product requirements.

00:19
They started thinking through what kinds of neural network models would allow us to get the same performance but not require so much future context, so that they don't have to listen to the entire audio clip before they can give you a really high-accuracy response.

So kind of like the language prediction stuff the OpenAI guys were doing with the Amazon reviews, predicting what's coming next?

Maybe not even predicting what's coming next. One thing humans do without thinking about it is this: if I misunderstand a word you said to me, and then a couple of words later I pick up context that disambiguates it, I don't skip a beat. I just understand that as one long stream. One of the ways our speech systems would do this is that they would listen to the entire audio clip first, process it all in
one fell swoop, and then give you a final answer. That works great for getting the highest accuracy.

00:20
But it doesn't work so great for a product, where you need to give a response online and give people some feedback that lets them know you're listening. So you need to alter the neural network so that it tries to give you a really good answer using only what it has heard so far, but can then update it very quickly as it gets more context.

I've noticed over the past few years that people have gotten quite good at structuring sentences so Siri understands them.

Hmm.

You know, they put the noun in the correct position so it feeds back the data correctly. I found this when I was traveling and using Google Translate. After about one day I recognized that I couldn't give it a sentence, but if I gave it a noun, I could just show it to someone. If I just say "bread," it will translate it perfectly. Do you find that we're going to have to slightly adapt how we communicate with machines, or is your goal for us to communicate perfectly, as we normally would?

I really want it to be human level, and I don't see a serious barrier to getting there, at least for really high-value applications.

00:21
I think there's a lot more research to do, but I sincerely think there's a chance that over the next few years we're going to regard speech recognition as a solved problem.

That's very cool. So what are the really hard things happening right now? Like, what are
you not sure will work?

We were talking earlier about getting all this data. For problems where we can just get gobs of labeled data, I think we've got a little bit more room to run, but we can certainly solve those kinds of applications. But there's a huge range of what humans are able to do, often without thinking, that current speech engines just don't handle. We can deal with crosstalk and a lot of background noise. If you talk to me from the other side of a room, even if there's a lot of reverberation and things going on, it usually doesn't bother anybody that much. And yet current speech systems often have a really hard time with this.

00:22
For the next generation of AI products, they're going to need to handle all of this, so a lot of the research we're doing now is focused on going after all of those other things. How do I handle people who are talking over each other, or multiple speakers who are having a very casual conversation? How do I transcribe things that have very long structure to them, like a lecture, where over the course of the lecture I might realize I misunderstood something, or a little bit of jargon gets spelled out for me, and now I need to go back and transcribe it?

This is one place where our ability to innovate on products is actually really useful. We recently launched a product called SwiftScribe to help transcriptionists be much more efficient, and it's targeted at understanding all of these scenarios where the world wants
this long-form transcription. We have all of these conversations that are just sort of lost and that we wish we had written down, but it's just too expensive to transcribe all of it for everyday applications.

00:23
So in terms of emulating someone's voice, do you have any concerns about faking it? Did you see the face simulation? I forget the researcher's name, so I'll link to it, but you know what I'm talking about. Essentially you can feed it both video and audio and you can recreate, you know, Adam talking. Do you have any thoughts on how we can prepare for that world?

I think in some sense this is a social question, right? I think culturally we're all going to have to exercise a lot of critical thinking. We've always had this problem in some sense: I can read an article that has someone's name on it and, notwithstanding my understanding of writing style, I don't know for sure where that article came from. So I think we have habits for how to deal with that scenario. We can be healthily skeptical, and I think we're going to have to come up with ways to adapt that to this sort of brave new world.

00:24
I think those are big challenges coming up, and I do think about them, but I also think a lot about all the positives that AI is going to have. I don't talk about it too much, but my mother actually has muscular dystrophy, and so things like speech and language interfaces are just incredibly valuable for
someone who cannot type on an iPad because the keys are too far apart. There are all these things that you don't really think about that these technologies are going to address over the next few years. On balance, I know we're going to have a lot of big challenges: how do we use these, and how do we as users adapt to all of the implications? But I think we've done really well with this in the past, and we're going to keep doing well with it in the future.

00:25
So do you think AI will create new jobs for people, or will we all be, like, Mechanical Turking?

I'm not sure. This is something where, you know, the job turnover in the United States every quarter is incredibly high. It is actually shocking how large a fraction of our workforce quits one occupation and moves to another. I think it is clearly getting faster. We've talked about this phenomenon within the AI lab here, where deep learning research is flying ahead so quickly that we're often remaking ourselves to keep up with it and to make sure that we can keep innovating. I think that might even be a bit of a lesson for everyone: continual learning is going to become more and more important going forward.

Yes. So speaking of which, what are you teaching yourself so the robots don't take your job?

I don't think we're at risk of robots taking our jobs right now. Actually, it's kind of interesting.

00:26
We've thought a lot about how this changes careers. One thing that has
been true in the past is that if you were to create a new research lab, one of the first things you'd do is fill it with AI experts who live and breathe AI technology all day long. I think that's really important; for basic research you need that kind of specialization. But because the field is moving so quickly, we also need a different kind of person now. We also need people who are sort of chameleons, highly flexible people who can understand and even contribute to a research project, but can also shift to the other foot and think about how this interacts with GPU hardware and a production system, and how to think about a product team and user experience. Often product teams today can't tell you what to change in your machine learning algorithm to make the user experience better. It's very hard to quantify where it's falling off the edge.

00:27
So you have to be able to think that through to change the algorithms. You also have to be able to look at the research community to think about what's possible and what's coming. So there's a sort of amazing full-stack machine learning engineer that's starting to show up.

Where are they coming from? If I want to be that person, what do I do? Say I'm, you know, eighteen right now.

They seem to be really hard to find right now. So in the AI lab we've really set ourselves to just creating them. I think this is sort of the way unicorns are: we have to find the first few examples and see how exciting that is, and then
come up with a way for people to learn and become that sort of professional. Actually, one of the cultural characteristics of our team is that we look for people who are really self-directed and hungry to learn. Things are going so quickly that we just can't guess what we're going to have to do in six months.

00:28
Having that sort of do-anything attitude, saying, "Well, I'm going to do research today and think about research papers, but once we get some traction and the results are looking good, we're going to take responsibility for getting this all the way to 100 million people," is a towering request of anyone on our team. The things we find really help everyone connect to that and do well with it are being really self-directed, being able to deal with ambiguity, and being really willing to learn a lot of stuff that isn't just AI research: stepping way outside of comfort zones, learning about GPUs and high-performance computing, and learning about how a product manager thinks.

Okay, so this has been super helpful. If someone wanted to learn more about what you guys are working on, or even just things that have been influential to you, what would you recommend they check out on the internet?

00:29
Oh my goodness, I have to think about this for a second. I think the stuff that's actually been quite influential for me is startup books. Especially at big companies, it's easy to think of ourselves in silos, as having a single job. One idea from the
startup world that I think is really amazingly powerful is that a huge fraction of what you're doing is learning. There's a tendency, especially amongst engineers, of which I count myself a member, to just want to build something. So one of the disciplines we all have to keep in mind is that we also have to be really clear-eyed, think about what we don't know right now, and focus on learning as quickly as we can: find the most important part of AI research that's happening, find the most important pain point that people in the real world are experiencing, and then be really fast at connecting those.

00:30
I think a lot of that influence on my thinking has come from the startup world.

There you go, that's a great answer. Okay, cool. Thanks, man.

Thanks so much.