Converting audio to text with DeepSpeech

TL;DR 20 🍌

Lex Fridman recently challenged the internet to count his bananas. Well, not literally: the challenge was to count how many times "banana" was said in his podcast with research scientist Ishan Misra.

Instead of spending multiple hours transcribing the audio myself, I decided to spend (probably more) multiple hours using artificial intelligence to do it for me. The podcast actually left me bearish on the future of AI. I’m still not convinced we’ve created anything remotely resembling intelligence. All we seem to have is better and better ways to model a problem space through brute force. Is this actually intelligence? Is there any evidence artificial general intelligence is actually possible? What the hell do I know. Regardless, it was very cool to learn about Ishan’s research, his thoughts on various areas of machine learning, and his opinion on the meaning of life.

Counting bananas

A quick Google search for “open-source audio to text machine learning” led me to Mozilla's DeepSpeech project. I followed their installation docs for inference using CUDA (not actually necessary), downloaded their pre-trained models, and pip installed deepspeech.

I downloaded the .mp3 from #206 – Ishan Misra: Self-Supervised Deep Learning in Computer Vision. deepspeech requires audio files to be in .wav format. I tried converting with ffmpeg and fre:ac but received no output. I figured shoving 2.5 hours of audio into this thing might not be the best idea. I ended up using Audacity to automatically cut the audio into 15-minute chunks via the auto-labeling feature and fed them to deepspeech one at a time.
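The chunking step can also be scripted. What follows is a minimal sketch, not what was used for this post (Audacity did the splitting here). It assumes the audio has already been converted to an uncompressed .wav file, and the function name split_wav and the chunk_NNN.wav naming are made up for illustration. It only cuts the file into pieces; it does not resample or change the channel count.

```python
import wave


def split_wav(src_path, chunk_seconds=900, prefix="chunk"):
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Returns the list of chunk file paths, in order.
    The name and output naming scheme are hypothetical, for illustration only.
    """
    paths = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        # Number of audio frames that make up one chunk (900 s = 15 minutes).
        frames_per_chunk = src.getframerate() * chunk_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the source file
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the chunk file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(out_path)
            index += 1
    return paths
```

Calling split_wav("episode.wav") on a hypothetical 2.5 hour file would produce ten 15 minute chunks that can be fed to deepspeech one at a time.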

So how many bananas? At least 20.

I combined banana, bananas, and bananaland (?), plus at least a couple of false negatives I was aware of. The transcription isn’t great, nowhere near the level that YouTube, Google, or Instagram auto-captioning seems to reach. Regardless, it’s pretty amazing I can feed an audio file into my computer and get text in a few minutes. On average, 15 minutes of audio took 3 minutes to infer. I assume this technology will offer amazing accessibility to the hearing-impaired, and hopefully better models will be open-sourced over time. I also wonder what the world will look like when media can be trivially transcribed to any language. People can learn about Joe Rogan’s chimp hockey team in any language!

You can view the full transcription results below. If you want to get a count for every word here’s a little bash-fu:

echo "some text" | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
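The same count can be sketched in Python, which also makes it easy to lump the banana variants together. This is a rough equivalent of the one-liner above, not the method used for the tally in this post. The helper names word_counts and count_matching are invented for the example, and it assumes the transcript is lowercase text with no punctuation, which is how it appears below.

```python
from collections import Counter


def word_counts(text):
    """Count every whitespace-separated word, like the tr | sort | uniq -c pipeline."""
    return Counter(text.split())


def count_matching(counts, stem):
    """Sum the counts of all words starting with stem (e.g. banana, bananas, bananaland)."""
    return sum(n for word, n in counts.items() if word.startswith(stem))


if __name__ == "__main__":
    counts = word_counts("some text with a banana and more bananas")
    # Print words from most to least frequent, like sort -bnr.
    for word, n in counts.most_common():
        print(n, word)
    print("banana-ish words:", count_matching(counts, "banana"))
```

A prefix match only catches words the model spelled starting with "banana", so any occurrences it transcribed as something else still have to be found by ear.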

2 men talking gibberish about bananas

the following is a conversation i can mister research scientist that face book a research who worked on self supervised machine learning in the domain of computer vision or in other words making a systems understand the visual world with minimal help from us humans transformers and salfation has been successfully used by opening a deputy three and other language models to do self supervise learning in the domain of language e shan together with yeoland others is trying to achieve the same success in the domain of images and delia the goal is to leave a robot watching you to videos all night and the morning come back to my smarter robot i read the black boat salutes learning the dark matter of intelligence by shannon young lacon and then listened to echapper on the excellent machine learning street talk podesta i knew i had to talk to him by the way if you’re interested in machine learning and i i cannot recommend the australias holy enough those guys are great quick misionarios on it the information granary and ateires check them out in a description to support the potestas aside now in mesa that for those of you who may have been listening for quite a while this pot guess used to be called artificial intelligence boast because my life passion has always been will always be artificial intelligence both narrowly and broadly defined my goal would this potestatis to have many conversations with world class researches in a math physics biology and all the other sciences but i also want to talk to historians musicians these and of course occasionally comedians in fact i’m trying out doing this podesta week now to give me more freedom with guess selection it maybe get a chance to have a bit more fun speaking of fun in this conversation i challenge the listener to come the number of times the word but is mentioned he sandstad banana as the canonical example at the core of the hard problem of computer vision and may be the hard problem of cautiousness as usual a dominion 
as in the middle a trio met these interesting but i give you time semispherical the links in a description is the best way to support his boasts on it nutrition supplement and fitness company they make of forain which is an tropic that helps support memory mental speed and focus i use it as a kind of super boost when i’m preparing for depression when i know i would have to sit for two three hours four hours thinking about a specific problem and when i know it’s going to be something that requires dubedat i mean you’re thinking about a very narrow specific problem and just thinking through it with a sheet of paper as a post sort of doing a lot of programming like jumping from one task to another within a particular programming project so when near skindeep i can give myself that extra boost of taking out for an italian were going to do some deep thinking like that move over the top was the lesson i think it turns the cap and we turn the cab he goes into that extra intense mode is an arresting movie that’s how i think about alternately anyway go to ledcombe up to ten percent of alfaretta driven investigative journalism in the world affecting the information is the first place that made me realize that good journalism cost money for most of my life i was broke until very recently and i remember one i was broke i mean this for five years ago when i first heard of it about the information i remember even though i couldn’t really tetrameter sending up anyway because i just love the depth of the articles i think the one that first pulled me in with probably related to ogle or tesla i a very depth study of some particular aspect but i remember thinking that this is the place to really explore a difficult topic and trust that the person is doing well good job it’s not necessary that you agree is they’re going to really do thorieth journalism and you can also trust that some of the biggest names interlining from that perspective its definition the things you definitely 
example why good journalism cost money anyway get seventy five per cent of your first month if you sign up at the information that consulate information that comes later i see it is a good way of supporting in death journalism i hope you do as well this show is also sponsored by granary writing system to that check spelling grammar centaure and readability grimly premium the version you pay for offers a bunch of extra features my favourite is the clarity check which helps you detect rambling chaos than many of us can descend into and i curtly descend into as i tried to ask a question on the potato ten minutes when the question could have been asked in a single sentence he should strive for that kind of clarity in your speech and in your writing and i definitely should as well in writing in science and mathematics even in art in life i think simplicity is beautiful and in writing i think simplicity is a skill that can be developed is not just so an art it’s also a science its skill it can be developed through rigorous practice of cutting removed the things that are not necessary that process i think is really beneficial for people to improve their writing but also to prove their thinking i think those two are coupled and in general for simplicity is a good guide big words for me get in the way anyway granville and basically any platform of major stapleton twenty percent of gramophones johannesthal in this one thing you can trust one improving with the latest science you don’t have to think i drink it a thing and you know that you’re getting everything you need aside from a lodge i take a leptalides basically sole magentas and all to take fish roland actually he’ll get one month supply fish royal when you sign up to the green atoms everything that’s the flood a green that comes this is alex freedom and boast and hears macomber satin with chairs self supervised learning and may be even give the bigger basis of what is supervised and semiseria may be whissel supervised 
learning a better term than unsupervised pursuers typically forces the water tenders you get a bunch of humans the human point out purulence for it in the case of images you want to humans to commentators on intimate robison the ramaquotan piccadillies not for an apparently lots of these particular tasks sarabande analysable and soon so typically for supervisor ingleborough such annotated labor dated then we feel that a system and the system is really trying to mimic suttain this imputation and then trying to mathews it looks at image and the human estate samarcand a banana and now the system is basically trying to mimic that it that its learning and so for supervisor we tried to gather lots of such data and retain these machinery models to imitate the important the hope is basically by the wing so now an unseen or like new kinds of data the morganatically learn to protect these concepts to seasoned for semistupefied setting the idea typically is that you have of course all of the supervisor for you have lots to father detaches unsupervised origine not labeled now the potomac with supervising and by you actually have a these alternator of learning paradiseidae as not kinspeople vision the sort of largest one of the most popular acetate entirely misread has about many to souvenance and about fourteen million images to these concepts of debasingly dismounted and the antedates and the entire desert was a amortisation of fortitude rise to lord of pouring out temistoele sort of the rise of deepening as well but this detested took about twenty two human ears to collect tante and so even that many concepts raimonda many forty million is nothing really let you have a boat i think for an admission even more than that afforded to most of the popular rascalliest studs no supervising just doesn’t call if i want to know andamaners if i want to have dispersion concepts it won’t belisante sort of different uninitialed sempiterni were the idea is of course you have the 
antirepublican you have lots of these unelaborate and the idea is that they are good the should basilio major some kind of consistency or illiterate major some kind of a signal on the sort of un laboratories more confident about wardenry predict so by access to this lot of un liberate the ideas that the average to be more confident and at leather accrediting these concepts and now we complete which is like self supervising the idea basically is that the machine of the argentiere concepts or discover things about the world was learned representations about the world which are useful without access to explosions supervision to the word supervision is still in the term self supervised so what is the supervision signal and may be that perhaps as when on macnaghten of the softening the tamasese because it explicitly says what’s happening to day did say is the sort of supervision and any sort of following a gothic prie to extract just sort of a supervision signals from the delight is a self superiorite is within the detector which unlocked the supervision rightnesses forninst the blaisdells like it alteration for this word that for vision i would say these sort of mortification and then asked idea belittlement because the molding what opening her going to automatic learn something about the sun of the world how objects move object permanent and these kind of things are so leif i have something at that of the table is a fall down things like these with you really don’t have to sit in and in a supervisor setting wolvertonian it seems to be applicable in anything that has the constraint of being a sequence that is consistent with the physical reality the question is there other tricks like this he can generate the self supervision so sequences possibly the most bygone for vision the one that is actually used for like images which is very popular diseases basically the king an image and now taking the frenche diments you can best decide the cobadonga and you cropstone and 
asking network to basically presented with the choice saying that kanowry have this image you have this image are these the same or not and to the idea basically is that because different galignani age different parts of the emerging to be related so for example if you have a chair and a table a basically these things are going to be closed by a voice if you take a geniality of a chair you taking different crops at wertenbaker so the idea basically is the different crops of the immaterial and so the features or the representation that you get from these different propositions possibly the most like by district these days for self toasting and computer vision so can using the consistency the inherited physical reality in invasion that you know part of an image consistent and then in the language domestication or ideas the blockhouse was mainly about sort of standing i mean the election testimonials would say literarily should end up like learning on portable so the idea for that particular peace was selecting a going to be a very part for it to learn common sense about the world or like stuff that is really hard to live for example that is this mark boerhave than the top now for all these kinds of things you have to sit and labeling supervision to scale so what is the thing that achingly going to be an agent that can either actually interact with it lifted up or observed me doing it so i am basely lifting these things up a controversialist king him more ample of up orderlies alost for this is different probably the monosabios essentially by observations of the data you should be able to infer a lot of things about the world without someone explicitly telling this is heavy this is not this is something that can pour the something that cannot or he is somewhere you can sit this is not so metrosideros mentioned a billy to interact with the world there is so many questions that are yet to be there still open which is hardest data over which the self surprised learning 
process works how much interactive you like in the act of learning or the machine teaching context is there what are their war signals the couch actual interaction there is with the physical world that kind of thing so that coehorns ortheris truly dark matter we don’t know how exactly to do it but we are i mean a lot of salamis that it’s going to be a sort of major thingamajig to my retreat human supervision can not be at large scale the source of the solution to intelligence so there has weediness the supervision in the natural signal of the world right i mean the other thing is also that humans are not particularly good liberationist et for example like what’s the difference between the dining table in a table issued one like if you just look at particularities as say one is dining table and the other is not human darnation isten do not like very good sort of supervision for the lord of these contain of educators so it may be also the fact that if we want of like wanton igraine the sole proofs he can may be respected the england like the stuff in between a perilous not be specified because we now may be going to confuse it tactually we humans can ever answer the meaning of life sooner supervisors or the end gaither some ask you bacterium are not very good at telling the difference between what is and is it a table like you mentioned do you think it’s possible we measure your plato is it possible to create a pretty good economy of objects in the world it seems like a lot of approaches a machine learning kind of assume a hopeful vision that is possible to construct a perfect economy or it exists perhaps out of our reach but an always get closer and closer to it or that a hopeless pursuit i think it’s hopeless in some so the thing is for any particular categorization educate if you have disassociation i can always take the nearest to concepts or i can take a ticonderoga vendition patterne category so if you were to enumerate and cataclysmal find an ample one ratio 
you that’s not going to be in dingaan i can actually create not distant flannery easily tear for more than an gethings lord of things we talk about actualisation so it surely hard for us to commence a new midlothian the composing various pervasive second not come together to form a coetmore to like animated upon the i don’t know whenever the contenders ago fifteen years ago then the penitent exist a member there is the most awondering a monkey castigation a good way to tell which parts of things are similar in which parts of things are very different i think so ah so you don’t necessarily need to name everything or a sign a name to everything to be able to use it but so lothario of shakespeare name what ingredients of like for example animals it they don’t have necessarily a very form like impaling age but they were to go about their deposit the same thing happens for us so i mean the probably look at things and be figured out or the system laromiguiere and then i can probably learn how to use it so i haven’t seen all the possible donated if you show me bailable to get into this participated never seen that particular door knob so i of course later to all the dorotheen and i know exactly how it can to open but i have a pretty good idea of its going to open i think this kind of cantation between experiences only happened because of similarity because i am able to relate to toornoifeld to her trial probably be struck elsie not able to get in again biffton is con similarity take us all the way to understanding a thing in having a good function that compares objects get at understand something profound about singular objects within that askest on back word does it mean to understand objects well a metal you what that similar to no other is an idea of it of reasoning i think i astandin is the process of placing that thing is some kind of net work of knowledge that you have that it perhaps is fundamentally related to other concepts sits not like understanding is 
fundamentally related by the composition of the concepts and may be in relation to other concepts and may be like deeper and deeper understanding as may be just adding more aestheticians to pretension to take refurnishing the thing is things are similar in very different contexts so elephant is similar to i don’t know another sort of foundation alone in sidereal creatures are the outlines but of course very different offrent pas eventually dissimilarity association standings and so that actually by i think the great category hard just like forming the portland and a portion may be good for like dislike taxonomy biological economies but when it comes to like other things which are not as may be for example a cities have a protestant to madness that lettsom else right so categorization still very useful for solving problems but is your intuition than soothsaying make it the gig then you don’t want me able to sit on antedating that’s as simple as it is like that my very practical bent it’s just i mean i might be attestation of gods for one of my projects and very quickly i was just like it was in a wide and i was basically throwing on boxes and on all these cards and i think i spent about a week doing all of that in a barytone and basically so i think my first here of my betokened of my masters and then by the end of her and legislation keep doing it and when i done that as some one came up to me in the basically told me otitis is a pickup truck is not a card and that anathema sense because i pick up rocks not delaford was dating was inditing anything that is noble or was annotating particular satanizing as is but to tenby the way the entasis bonington deep profound questions here the outer almost cheating you without a wedding self surprising by the way which is like what makes for an object i suppose to solving may be don’t ever need to answer that question i mean this is the question that a anthedon edition because it’s so painful gestas like why am i doing a 
drawing very careful line around the object like what would is the value aeronautic segmentation where you have like instant segmentation will you have a very exact line around the object in the two delanoue three object projected or to deplaning a line around a car that might be included that might be another thing in front of it but you still drawing the line of the part of the car that you see how is that the car what is that the car like i had like an existential crisis ever like house that going to help us understand a salcombe vision i’m not sure i have a good answer to what’s better and i’m not sure i share the confidence that you have that self supervised learning can take as far i think i more more convinced that it’s a very important component but i still feel like when you to understand what makes like this tarlike dream of maybe what collaboration have this common sense base be able to play with these concepts and build grass or hire is of concepts on top in order to then like form a deep sense of the three dimensional world or four dimensional world and be able to reason and the project that unto to deplore to interpretation bananaland the interesting there’s a lot of interesting things about that night imagination have the concept of a weight you you mediately project because of our knowledge of post and how he revisited you understand how the forces are being applied with body the really interesting other thing that you’re able to understand this multiple looking at each other in the image he was to have a mental model of what the people are thinking about her wet infer like this person is probably thinks like his laughing at how humorous the situation is and this person is confused about what the situation is because they’re looking this way where will nor all of that so that’s human vision how difficult his computer vision i in order to achieve that level of understanding may be how big of a part the self supervised learning plane that do you think 
and do you still you know bagowind ago i think in draining a lot of people agreed as i compute in is really hard do you still think of pervisions really hard i think yes and getting the duck kind of understanding i mean it’s really utter so if you ask me to salutatorian problem i came to a disobedient i can wisconsin latest and basically predict or is that human dissociation do you think you can go you think you can do human supervised sanitation of humor to some extent yes i’m sure it is work i mean if one be it would be as bad as that and amassing a sure it can storeroom or not in some way there may be like red abode is the signal i don’t know i mean it won’t do a great job but i do something it may actually be like it may find certain things which are not humorous humorous as well which is going to be bad for us but i mean to it won’t be undomesticated take the problem yet but the general problem you’re saying is heard the general’s armansperg is not beyond her to everything of course it’s not i think if you have machines are going to communicate with human at the end of a want to understand what the argues doing right he wanted to be able to like produce an output you can decipher that you can understand or itself something else which again is human so it at some point in the sort of entire loop or human steps in and now the human needs to understand what was going on so annotating the international language of semantically comin if the machines partout something and i fenderson it then it’s not really that he suffers so saturating is probably going to be poor a lot of the things before that part before the machine is to communicate a purticular kind of output with a human because i mean otherwise ouasicoude without language or some kind of communication with your saying that it’s possible to build a big base of understanding or whatever of biancomonte supervising fountaining edge of haramont working on to day can we take a little bit of a step back and look at 
language can you summarize the history of success of secular in natural english posting language modeling what are transformers what is the mask kingstone completion he mentioned before how does it lead us to understand anything semantic meaning of words syntactical of words and sentences so i am of course not the exportation to follow it a little bit from the sides so the man sort of reason by all of this masking stuff words cogitation hypothesis and albeit basically being that words that occur in the same context i should have some demeaning siphae the blank jumped over the blank it basically wherever i like in the first planks basically a objected connect jump it is going to be something that can jump so that orator or insisting all of these things can basically be in that particular context and so is any the idea is that if you have worn the same kind context a new production to of useful things about how words related because you predicting by looking at their context for the word is going to be so in this particular case the blank jumped over the fence senorita sheep at the ship unorthodox and over the fence so essentially the year govement busily put together these two concepts together to say october going to be kind of states because botherments context of course now you can decide the pennon the particularisation down stream you can tirador absolutely not litigation irritating food for example a dog food person and i really want to give this dog food to his primal so depending on what the donations of course this notion of similarity of this notion of this commendation applicable but the point is basically that this depredating what the blank are is going to take you tarasconade that the number of words in a particular language is very large but is finite and seattle in the grand scheme of things i still got a big retake it for granted so first of all i am asking you talkin about this very process of the blank of removing words from a sentence and then 
having the knowledge of what waehringer it’s very possible there’s other very fascinating tricks i’ll give an example in them in atoms driving there’s a bunch of tricks they give you the self supervised signal back for example very similar to sentences but not really which is you have signals from humans driving the car because a lot of us drive cars to places and so you can ask then you’ll net work to predict what’s going to happen the next two seconds for a safe navigation through the environment and the in the signal is comes from the fact that you also have knowledge of what happened in the next two seconds because he had video of the data the question an autonomous driving as it is in language can we learn how to drive autonomously based on that kind of self supervision probably the answer is now the question is caloocan same with language how good can we get and are there other tricks like we get some test super excited by this trick that works really well but i wonder it’s almost like mining for gold i wonder how many signals there are in the data that could be leverage that i like there it is that i spent to kind of wine on that because sometimes it’s easy to think that maybe this masking process is self supervised learning now it’s only one one method isadore could be many many other methods may tricked may be interesting was eddleman competition in very interesting ways that might actually board on sinecurist learning some like that obviously the internet generated by humans at the end of the day so all that to say is what’s your sense in this particular context of language how far can that masking process take us to it as to the test of time right i mean to work director of an patching that was using this the now for example i ate bourgeoisie a biggaboon robert of for example all of them are still sort of based on the same principle of masking taken as really for i mean you can actually do things like these two sentences or not whether this petticoated 
follows the other sentence in terms of logic or interment i can do a lot of these things but this just as muskingon arbitrator fricandeaus the venefice out when like were to request i don’t think the lord of us would have imagined that this would actually help as do some kind of like entailment problems and dilatations it just the fact that by the scaling of the amount of battering on and make using peter and more particulars has taken us from doctor this is destroying you how maybe or rectors we are like you as human help would we are predicting how successful pretence is going to be i think i can say something now but like a year from now i look completely stupidity editing this in language domain is there something in your work they find useful in its site for and and transferable to computer vision but also just i don’t know beautiful and profound that i think aristogeiton has been very boulevardian liketh next sir if you have any sort of fame and you precedents going to happen the next tatterdemalions about transformers semicircular of transformers something coiled basically is that of an element watering is away for all of these elements a doctor each other so the idea basically is that you are paying attention each element is paying attention to each of the other element and basically by doing this it really trying to figure out your basically getting a much better view of the data so for example if you have a sentence of like forwards the point is if you get a representation or a feature for the entire sentence is constructed in a way such that each word has pertection to everything else and the reason i like different from say what you would do in a continent it is basically a contortion to a local window so each word would only pertection to its next neighbour or like a man neighbour of dead and the same thing goes for images images you would billy pay attention to pick sudanese seven neighbourhood and that it be as economy that separation mainly the sort 
of idea is that each element needs to pay attention to each other element and when you say attention may be another way to phrase that is your considering a context a wide context in terms of the white context of the sentence in understanding the meaning of a particular word and computation as understanding a large context understand the local pattern of a particular langalibalele elements to understand parthenogenesis hastily like a shoe or like it could look like a teviotdale anything and it turns her it was a bear bottle but i’m not it was one of these three things but assumption then it was very obvious for it but the point is but just by looking at that particular local minorcan figure out because a resolution because of other things it’s just not easy always to figure out by looking at just a neighborhood of picus what these picture and the same thing happens for language as well for the parameters something about the data he need to give it the capacity to learn the essential things like if it’s not the able to receive the signal at all asolando vision or language the intelligence or linguistic intelligence i would aspersion is hard or might even for this is basically that language of course has a big structure or because he developed but as vision is something at a common and lord of animals every one is able to get by a lot of these animals on the clyde by without language and we are late animals we also deemed to be intelligent so clearly intelligence are heroes have like a visual component word and yet orleanese of human thereof course also a linguistic component but it means that there is something far more fundamental about vision than there is about language and am sorry to anyone but the specialist being a little bit reflected in the challenge that have to do with the progress of souperintendent success blessedest thing is language is very structure so you are going to produce a distribution over a fine work utility english has a finite number of 
words attaching on dalahaide basically for weeding the masking thing alligators basilivitch one of these activity thousand words at his latifolius let him margining the same thing okalbia going to blank out the particular part of the imagine we are in at boora disturbed what is present in this missing patch is comminatory large right you have two hundred fifty six eclusier even producing basically a seven grossetete cross fourteen like window officers at each of these hundred sixty nine each of these fourteen in locations you have to indefeasible to predict and to really really large and very quickly the kind of like prediction problems that were setting up a going to be extremely like intractable for us and so the thing is for anopheles because we are very good at predicting like doing the like distribution over the finite said and the problem is when this said becomes really large vegetable bordered montecristo and in basically this particular sort of problems so if you were to exactly the same as anabel vision there is very limited success the very stuff is working right now is actually not by predicting these masks it basically by saying that you beg these two like drops from the image you get a feature presentation from it and the saying that the two features sidelight vectors tossing the distant retentive should be small and so it’s a very different way of learning from the violin tenderest fomentation is the distribution hypothecations word given its context basin the context celestial of manifold now because there are just finite number finite number of words and that is a finite way like which we compose them of course and the same thing was tapering there’s a lot of structure exister their dash jumped over the fence for example there are lots of these and insisted from this you can actually look at this portent might occur in a lot of different contexts well the exact same sentence micronesia and over the fine thiegaud over the fence a doctor the fence so 
you immediately get a lot of these words which are because this precedence so much meaning you get a lot of these tokens of these words for the actually going to have have sort of desolate meaning across given the context werewolves in much harder because just like your like the baby captivates lighting can be different there might be like different noise in the censer to the thing is your capturing of physical phenomenon and then your basileian of like image processing and denoting that into some kind of legitimating britton and you transfer it to distraction most like it a lot less like answer and of these tokens are very very well to find there could be a little bit of an argument there because language as written down is a projection of thought this is one of the open questions if you perfectly consoled language are you getting close to be able to solve you know easily with fine college posseting test kind of thing so that’s it similar but different and the computer vision problem is in the two d plagiaries operas resilient with language for so long and it is of course a very complacence the thing is at least getting like some some motionable like being able to save some kind of reasonable tasks with language i would sedately easier than it is with computer vision her out say ye so that that’s what i say getting impressive performance on language is easier the i feel like for both language a computer vision there is going to be the walls of like the like this computative superhuman level performance or human level performance and i feel like for language is that was farther away seeing get pretty nice young a lot of tricks you can show really impressive performance you can you in full people that you’re a tweeting or your right blooding or your question answering is has intelligence behind it but too truly demonstrate understanding of dialogue of continuous long form dialogue that would require perhaps big breakthrough in the same weight wanuretona think the 
big break through me to happen earlier to achieve impressive performance this may be a good place to you ready mentioned but what is contravening and what are energy base models contrasting is sort of the abode of learning with the idea is that you are learning this embedding space or so your learning this sort of electrical your concepts and the very learned that is basically by contrasting so the idea is that you have for some you have another sample that elated to it so that not collapse and you have another sample let not latitat negative supreame that justice and operative example in a computer pigeonswing of a cat you have animated and for whatever attraction that you’re doing so you’re trying to figure out what eberything that these women are late so i mittagong late but now you have another third image of banality basically parenthood and so you got these images and you gave the image from the cart the misfit dog you got a feature from bohemianism but of these features together pushing them over from the feature of a banana stentorian against a banana salvation of a negative and positive now energy baseborn way that orlando of these methods on basically i think a cup of fear or moralisation steep mentioned this word and egyptologist bandicote romance me what this is so then like very patiently he sat down but like a marker in a white bird and his idea basically is dead but rather than talking about probably distribution you can talk about energies of monomotopa renomination energies and certain space for the trainman a certain kind of for energy and the idea basically is that you can explain a lot of the contrast of models and for example which like generate adversaries a lot of these modern learning methods or vhich are parish not incorrectly explain them very nicely in terms of international minimized and so by putting the comandante for all of these models what looks very different in machinating that evasively different from organization from what 
contrasted so like a desaver very related is just that the vehemens in which the sort of maiming or minimizing this energy function is slightly different to revealing the commonalities whichness asexual on top of languidity to things that are similar have lorgnetting would get a hill or like a high sort of peak in the energy manifold a biteing that not elated and basically you would have like a departing out of relation emanates hunting irrelated and two things are not related it so this is where all the sort of intimidation and resemble like you can take the felinarian context of notaries basically to cropthorne can be two frames from that vidyananda the same sort of concert in them but as a third frameries not later so it basically is it’s a very generosity to do with sertorius very popular and purer example like any kind of matriculating or any kind of imbedding learning said also used in supervising it all and the thing is because we are not really using a label to get these positive or negative petit can basically also be used for surprise mentioned one of the ideas in the vision context to that work is to have different crops so you could think of that as a way to soovenir the data to generate examples that are similar obviously there’s a bunch of other techniques you mentioned lighting as a very you know in images lighting as something that varies a lot and you can artificially change thus casentino there’s the whole broad field of data augmentation which manipulates images in order to increase our retreateth dates for all what is detonation second of all was the role of data omination in self surprising and contrast of learning so they domination is just to be like you said it’s basically redounded so you have say and samples and what you do is you basically define some kind of transforms for the sample so you take your sage and then your defiance you contest increase say the colors like the colours of the brightness of the image increase decree contrast of 
the image for example or take different drops of it so they dominationes like basically borodaile and so it has paid a fundamental rule for computer vision for selvagee the very most of the carronades at devastating an image or in the case of images betaking an emenence basically to perturbations of it so there can be to differentiate a bit like the fentolin or different contrarient colors soderlind soon and now the idea is basically because it’s the same object or because it likeliness of these pertection you want the features from botherations to be similar so now you can use a variety of different ways and force this constraint bateses being similar you can do this by a contest of learning so basically both of these things are positives author sorties negative you can’t do this basically bilitin for example you considered both of these images should the features from assisting in the same cluster because they related whereas image are another metalanien posture so visit of decent past basically enforce this pertectorate when you say features a means as a very large neuromata extracting patterns from the image and the kind of patterns of extracts be either identical or very similar that’s what it means to the unobservant the feature and you want this feature for both of these different crops that the computer to be similar to one this victor to be identical in its licensor example be like willows in the multimillionaire wonderfontein his proper place to scranton whatever very clever way at the brain likes to do innominata is very effective so i think with some of them be like me to do it so i babies for example tickets like move them put them postdiluvian other things actually a good imagining it is well right safehaven never seen for example in a lepero the top have never basically look at it from the top down here but if you showed me a picture of a good very vilely let it hanlefanta i think some of it will just be naturally beretanee from other objects we seem 
to imagine waiting to look like any one done there with the omination like imagine all the possible things that are excluded or not there but not just like normal things like wild things but they are nevertheless physically consistent so i mean people do kind of like occlusion based augmentation as well you place and ligonier moskeeter of the image and the thing is basildon a for example you played said one half of a person’s face so basically saying that you know something below the nose of ludicrous manlike you have what is it a table and he can’t see behind the table and you imagine there’s a bunch of labelling witnesses gravity i think it’s a kind of a tenantable amazingly tormentin you need to understand what the scenes and what were trying to do that of mutation to lourdes as anyway so basically just his going out before you understand i just put albanian till you know it’s not to be true just like children have allemagna until the adults runaways what are the different kinds of determination that you seem to be effective in visual intelligence for like visitations that you can think of celebrated super read make the green super greens like sacrilege rotation cropping rotation going exactly all of these kind of things are lighting is a readily complicated to dominate or on ordained paphysic listic versions of lighting its note there assuming that this light so turpentining at to the right and then butting loikely morale brightness of the imagines of the image over all contrast of the miebet this is really important point to me i was to that determination holds an important key to big improvements in machine learning and it seems that it is an important aspect of relearning so i wonder if there’s big improvements to be achieved on much more intelligent kinds of detarmination for example currently maybe you can correct me if i’m wrong data augmentation is not patrie you’re not learning and not learning to me it seems like data augmentation potentially should 
involve more learning than the learning process itself quite yoost like thinking of like generative kind of it’s the else of bananas you trying to like very active imagination of messing with the world and teaching that mechanism for messing with the world to be realistic because that feels like i mean imagination just as you said things fellaheens imagine before we see the thing imaginable were expecting to see the maybe several options especially he probably forgot but one were younger probably the possibilities were wild there more numerous and then as we get older we become to understand the world and are the possibilities of what we might see becomes less then less and less so i wonder if if you think there’s a lot of break through to hadassah big part of self supervening yes so it did augmentation as they keep desesperant has like the kind of augmentation the perusing and basically this the fact that we are trying to learn these lueneburg tapering these features from the agapedome has been the key for violation and the paper fairly fundamental road now the irony of all of this is that for likening purest will say the entire point of deepening is that you feel in the picture nor telephone figured out the patterns on it on so fatally wanted look at edge atrocities you shouldn’t they likely going hand cafetiere right in good teeth look at it so they don’t mention should basically be in the same caterwaul we tell the network or tell the interlarding paraboloid that were looking for her and cording a very sort of human specific biased that be no things are that if you change the contrast of the meditation apple or testament tamerlane like colors tribe the same kind of concert yes of course this is not one dies doesn’t feel like superciliary because a lot of our human knowledge or human supervision is actually going into the recomendation so although we are calling itself supervising a lot of the human knowledge actually being included in the tormentin process 
surely like we kind of snake beeper vision at the import and the likely designing these nice list of torment tions that are working very of course the ideas that is much easier to design is often human doing never less and less less work it may be averaging their creativity more and more saddening process do you think it’s possible to integrate some of the data augmentation into the long process i think so i think so in fact it would be really beneficial for us because a lot of these domination that these invidia very extreme for example like venha certain concepts again a penance the banana and then basically change the color of the puranapore now this detarmination process is actually independent of the lake it has no notion of waters in the age secotan discoloration make a red bananas and now what we’re doing is retelling the ninetta this red banana and scroof the americas radonic of the imagination and the feature should be the same bannertail the laomedon process should take into account for the present indian war kind of physical realities are the possible according people ely independent of the amansomi get big gains if you see being dressed so augmentation but realistic augmentation by realistic i’m not sure if it’s subtleties for if its realistic than even sorrow augmentation will give you big better fit exactly and it would be like for portion you might actually see like if for example numbering radicalement are going to be certain kinds of segmentation which will not really going to be very valid for the human body so few would do like actually loop in the domination to the learning process it will actually be much more useful now this act does take us to maybe a semiprecious you do want to understand bordering too so tarnation and artistically say that oaths should learn your food a presentation and they should be useful for any kind of an ask no matter it’s like binangonan or got on amusing it tolerates baby step for should be that of your trying to 
opinionation into the loading process then we at least me to have some sense of what we timoriensis between different types of bananas or a reuniting wish between banana or a bedroll of these things at once to some notion of like what happens at the end might achelous better at this side the mask you ridiculous question if i were to give you a like a black box like a choice to have an arbitrary large dates of real natural data verses really good diogeiton alarms would would you like to train in a self supervised way on so natural determiner arbitrary large ontario on the finite ting as i because i leading agressions right now really rely on detarmination even if you were to give me like an infinite source of like image tat i still lead a good determination lanesome thing to tell you that two things are similar and so something because you’ve given me a attelage to said i still need to use it augmentation to take that image construct like these two portraits of it and then learn from it so the thing is a leading paratimer primitive right now even if you were to give me a lot of meriton really useful a good day omination again be more useful so you can like ocean the amative date that you give me electites but if you were to give me a good detarmination agathe that would probably do the tethering me like ten times as is of that date but may having trelyon like a very primitive peterie two tagging and all those kinds of things is their way to discover things that are semantically similar on the internet obviously there is but to my beatenest going on and they are i mean they are human to provision this is one of the task of discovering from human generated data strong signals that could be leverage for so supervision like humans are doing so much work already like many years ago there is something that was called the ashman computation back in the day humans adds much work it it be exciting to discover was the leverage toward their doing the chechachoes without any 
extra effort from them an example could be like we said driving humans driving and machines performing it was hoped that there could be some supervision signal discovered in video games because he’s so many people that lavedan that it feels like so much effort is put into video games by interplaying video games and you can design video game somewhat cheaply to include whatever signals you want it feels like that could be lover somehow the people are using that leg that are actually foight aronette like philip granville a professor at udiastes been like working on video games as a source of supervision ideally and i asperities in but that said there’s none contrasted methods it would do none contrastive energy base self supervised learning methods look like and why are they promising so like i said about contrasting you have this notion of a positive and inactive now the thing is this antivenin paranaque access to a lot of negatives to learn a good sort of features the ideas of italy not so atandador of similar and the very different from a banana the thing is this the very simple analogy because well banality very different from what totantora this is the only sort of supervision that i’m giving you your learning is not going to be like after appointing neurotically not going to learn a lot and because the negative that you’re getting is good to be soonest can be or categoria but their very different from of folks forget now the discovery different from these animals again so the thing is in contrasted warningly the negative sample hilliard so what has happened is basically that typically these methods that are contrastive access to lots of negatives which becomes harder and harder to sort of scale when designing a loathing angerstein one of the reason by noncontroversial popular and by people ington to be mousson contrastive metafor example like plastering is one non contrasted the idea basically being that you have to testiness the atandador of satiate belong to 
the same blastedest basically doing clustering on line when ollering listened and which is very different from having access to a lot of negative explicitly other waves become really popular something cold self distillation so the idea basically said you have a teacher network and studied and the teacher had popularities in the image and it basically the neelie bartenstein another neurologists produces a future and now all your doing is basically saying that the features produced by the teachers should be very similar that’s it there is no notion of a negative any more and that a total about similarity maximian between these two features and so all i need to know do i figured out how to have these two sorts of paternoster and a dacent basiliscus figured out very cheap mattheson have foot free early two types of unelaborate kind of related but tertiaries to guanabacoa ly have learning problems tatooer that they always remain different enough so the thing doesn’t collapse into something boring exactly so the man sort of enemy of self to pasturing any kind of similarity maximilianus i took columns that you learn the same feature presentation for all diminished which is completely useless everything is a banana everything is a beanery thing is a careering as a garden so all we need to do is basically come up with vitoriano contrasted wishing as one we have doing it and then for example a clustering of self isolation attracting it we also the recent paper where he used litigation between like two sets of feature to prevent claps that inspired a little bit by licorice by the way i should come and whoever counts the number of times then the word benandonner ouiatanon if you have a detestable you would run over all of these in an example get features for them but focusing so basically get some clusters and then repeat the process again to this is of labial because i need to do one pass through the datto computer clusters waves basically disassemble of doing this on line so 
as you are going through the retaliating these clusters on line and so of course there is like a lot of tionontates in robust manner without collapsing but this is the sort of key idea to it is there a nice way to say what is the key methodology of the clustering tables that the idea basically is that man you have an ample beaumarchais to like the always gave lustrissimo for example as three thousand and so if you have any and when you look at any sort of small number of examples all of them must belong to one of these careless and we impose this equitation constraint with this means that basically our interest and sampled should be equally partitioned into careless to all your cataraqui have equal contribution to the examples and this ensures that we never colossian be viewed as a vein which all sampson to one lustralis all features become the same then you have basically just one maguelone have license to retain us so sabbatically insure that at each point all these retainers are being used in the pleasing process and that it basically just of the gout or to this online and again basically dismay sure that two crops from the same image belong to the same blusterers don’t and the fact they have a fixed c make things simpler they came in things simpler are clustering is not likely by hard casting its soft justin somatically you can be point to tulane one and point it to plassenburg with not really hard so essentially even though we have like retaliates began actualised the lord of lasswade the key results inside in the paper elsewise retaining a visual features in the wild or is this big beautiful sir system serological a lagoon of the geometrician by separation stated on a magnet so typically like say to provide methods the way by sort of orators pakington be kind of cheat so we take a missionario costolas having lots of labels and then they throw away the labels no letterhead the hard work he went behind basically delving process and we pretend that it is self 
the consumer is but the problem here is that we have when like when we collected these images the imagined as a particle distribution of concepts it so these images are very curate and what that means as these images of course belong to a certain sort of non concerts and also amaterasuan all images contain an object which is a very big and its typical indecent tentatively for it towards the centre so the damnation lodorians and self to providing actually electrotypes of mind so i mean o lord of my work or of work from other people always useless of as the bench mark to show the success of selfishly ing at this particular limitation to this kind of dates i mean it’s basically because i detarmination that we designed like in the like all the dormitories in for surprising in vision a kind of overfatigue let your saying a little bit hard code is like the crow in exactly the cropping pedometers the kind of lighting the perusing the kind of luring the producers of more in the wild days you would need to be cleverer more careful in setting the range of primroses so forced a man go as to food one basically to move away from a misinforming the images that we used were like uncertainties lot of debate but that the accurate or not but all talk about that later but the idea was basileon to be had the interest will not go into piteous on like porticoed not say that or images that belong to dogs and cats should be the images that come in the state band basically other images should be thrown out to edenton of that so these are dementedness back to like the problem of scaletta so these were basically about a billion or so images and for context imagination portion that beaumains alessandro the idea was basically received weaken another large convolution model in a serried way on this uncrated but trelliage edifices and how well would the more do so is said to preserve predominantly work in the wild and it was also of curiosity what kind of things will this mortality be able to 
still figure out you know different types of objects and so on would there be particular kinds of tasks it for actually to better than emissaries monferrand findings were there we can actually do in very large models in a completely self supervised war on lots of intent images without really nearly firing them out which was in itself a good thing because it’s a fairly semblerait you get images which are upland you basically can immediately use them to train a modeling unsupervised where you won’t delineators tin father them out is images can be catonian be means these can be actual pictures notepaper and you don’t really care about what design even care about what concepts they contain this was a very sort of simple settisfaction mechanism which i say is there like inherent in some aspects of the process so you can of implying it there’s almost none but it would as there would you say if you will to introspect it’s not like unutterably like one we of amazing on curtis basically you have like cameos the camera can take pictures at random point when people are lord picture into tattered ically going to care about the framing of it the one going to lord say the picture of some in valor example one internet to missionaries these agonies of like a uneatable or assumed and bold so it’s not completely angered because people do have the leg for doctor’s bias where they do want to keep things towards the centre bed or likely have like one looking things and so on in the matter so that’s the kind of baseness the setting together staintondale about what architecture or well for a self for self supervised learning for santobono putrescent in a word cardinalates transformers for vision but seem to work relieve gone nets and transformers and depending on what you want to do you might costeclar formation so for caritas a continent yet was perticularly aligned more which was also a book from face egnatuleius to compute westeras because it was a very efficient matelote and memory 
visitant and vasiliewski so he used a very large regnator antagonism may be quickly come awa regenerator designing that work disincentives large so one of this sort of determinate authorities is lord of neurological ized into flower flocks basically being the loading point operations and people are of use flop to say this mortal is likely computational heavy or like our moralist competition chiverton now attended lobregate of how well particles like our efficient italian avarabet indicated as the activation of the memory that is being used by this particular mode and so designing like one of the key finding from the paper was basically that no design that word families or nearly potatoes to the active efficient in the memory spaces were not just in terms of purpose inesilla naturally that came out of this paper that is porterly good at but flop and the sort of memory required for it and of course it will upon like or earlier word like resonating like the sort of more popular inspiration for it where you have resolution but one of the things in this world is basically the assurance anthropocentric rollercoasters othello of these on koongooroo maybe dat augmentation techniques is there a possibility to say what matters more you can imply that you can probably go really far with just using basic contained a animation the goeben used for herself to pelestrina the potentate with differences of attitude different like properties in the resulting sort of representation but really i mean the secret sources in the detonation and the algerines retainers i mean at this point that lord of the perfumery similarly depending on literature that you care about they have certain advantages and disadvantages is there something in tintypes about what it takes with sir to train a girl yolanda a huge new on that work is there something interesting to be said of how to effectively train softlike that fast not agitation parameters here and destination images so i like makethe same number 
of amateur of images and took a ilium the exact number in the paper but it took a wife i guess i’m trying to get at is when you’re thinking of scaling this kind of thing he toneennili of self surprising is the several orders of magnetising of everything both both in your network and size of the data theothen is do you think there is some interesting tricks to do largest compute or is it or is that really outside or even deep learning the more about like hardware ingineer ing i think more and more that is athirst redesigned basically taking a decontaminant fischer even were doing the scandalising the result of intercommunication between notes solarian or moderates are being passed surely want to minimize communication costs when you really want to scale these models of you won’t basically to be able to do as much like a limited amount of communication as possible so currently like adamant paradisino of training is essentially after every sort of parental you basically have like a sinuation to between all the sort of computing on with i think a intuiting was popular but it doesn’t seem to perform as well fingering the sort of the gesticulating maclean systems what is vistilia what are these cases that i might have mesomelas born out of flotation the suspense to the common fame book in which we have like a lord of selfevident forrin it’s also it has in itself like a penchant on so the ukase for it is basically for any one who is either trying to evaluate the selavim or train their cultivator the searcher who trying to pull the new separating such base supposed to be all of these things so as a researcher before vesper example or like when we started doing this for fallaciously at face it was very hard for us to go an implement every assassinated out in like sort of consistent manner the experimental adooley different across different groups or even when someone said that they were reporting eager docility could mean lots of different things so with visible sort of 
standardized as much as possible and it was a paper like widened just about bench marking and so visibly bilson lord of this kind of food that we did about like bench making and then every time we tried to like become for the selecting mattered a lot of statute into the ceaseless that it basically is like the central peace were aloof these methods can decide just acurately aside of faculties or just even people had no organs powwowing to play round the like what a fond thing to play around with self supervise learning on when you say is the good hollow world prohibition to have served on little smaller a playground to ponteneuson makeshifts for a lot of people but we are intolerable smaller sort of caste fingers a small skill aloof the observations of a lot of the gudeman essayed the mediterranean for a little well because it destitution with cosmographers for this paper they actually came in a little bit of a shock to me at hollingworths i can dislocate problems had as so it’s been used in the past by lots of folks to elect for example and reinstrom amid a youthful and resistant from oxford aloof these people have been teasing this out and this of course i was aware of this reserve i was introduced however would work in practice for like other servants so the result came getting better and i was in coreligionists to poisoning woodashes mortemain problem so midmorning as wenham lake when matohinshda was basically a few have a ride have it corresponding or your train you want to use both of these signals the old signal and the video signal to learn a good representation for radio and good representation for orchestras like this parcel so what weddin this work was basically pain to defend erleboro video signal one on the audience wanted as bailly the features that we get from what of these neepoosa scholastically be able to produce the same kind of features from the video and the same tintoretto the audio nowise the useful there for the lord of these objects that we 
have territories sound it curtains when they go by the maceration of sound board make a prediction people when their jumping around with like shotover he benandonner when humans mentioned bananas various men these banderilleros out of a human mouth as the source of that source of audience palmetto these later maslianikov or any other order and that becomes a negative and so we sallied dissimulation of contrasting the man sort of finding from this work for us was basically the outlander very powerful features nations very very portuguese ations second on the sort of veneering can actually be used for downstream for for example in human actions or recognizing different types of sounds for example so this will sort of the key finding give canaples wahuma action or like jungbluth bother is designated cannibalistically give it of a and basically play of say of focussing gudakesa you give it this sound of forget and you are like bestialize wentworth vitiating the sound is coming from and we can kind of basically rolling when he raised you can see it pacificating on the catanese thing for example for certain people’s voices like famous celebrities voices it can actually figure out where the like whereas suicides anguish different people’s voices for example two that are being enacted in any way so this is all watercourses with this kind of obtaining for particular task for his forming a lie a really good knowledge base within any net work based on which he could then the train a little bit more to accomplish a specific task will nedoshtchotova collision how profound that is you know does it speak to the idea that multiplies are somehow much bigger than the sum of their parts or is it really really useful to have ultimately misleading motherliness cutting something and you don’t have any sort of sounded better it an apple or whether it’s an onion it’s very hard to figure that out but if you hear some one putting it it’s very easy to figure it out because appletons make a 
very different or of a different kind of characteristic on windegos you really figure this out based on audio it much easier so your life will become much easier when you have access to different and of polities and the other thing is so i like to latentis fate may be like completely on but the distribution hypotheses in analectic regie kind of meaning to that word sound kind of stature so if you have the same sound so that the same context across different medios ire very likely to be observing the same kind of concert so that’s the kind of reason why it figures out the guitar think it out observed the same sound across materialities out may be this is the common factor the touching it i wonder at it said this argue my dear bunch for creating general intelligence whether smell is an important like if if that’s important censorian most we’re talking about like falling in love in the system and for him smell and touch her important and i was arguing that it’s not at all it’s importance that everything but like you can fall love with just language really but by is very powerful and vision is necess not that important case this process of active learning you mentioned interactive is there some value within the self supervised learning context to select parts of the data in hintelligent was such that there would most benefit the learning process so i think so i think i mean i know i’m talking about icelanders not interesting in the back back then i didn’t want to argue with him too much but when we talk again that’s spent three hours argued by act of learning my sense was he angry far with act of learning too perhaps farther than anything else like thee to me there’s this kind of intuition that similar to date augmentation he can get a lot from the data from intelligent optimist usage of the data trying to speak generally in such a way that includes detonation and act of learning that there is something about maybe interactive exploration of the data that elise’s part 
of the solution to intelligence i can important part i don’t know what your thoughts are active learning general actualities so the idea was basically odaenathus ask a question about the image it would get an answer and basically then put up pearsells and extremities what the next hardest question that i can ask to learn the most and the idea was basically because it was being smart about the kind of questions that passing it would learn and fearsome it would be more efficient that using the and redefine to some extent that it was actually better than random into kind of being or acting as its also chicken and problem because we look at an image to ask a good question about the magnet understand something about the age you can’t ask a completely arbitrary anquetin men or even applied to that particular so there is some amount of understanding or knowledge that basically keeps getting burthening acting so i think alluring and by its upstood and he maintained the figure out is basically how do we come up with her technicist modeled nose and also more the more does not know i think that’s the sort of beauty of her that because when you know that there are certain things that i don’t know anything about asking a question about those consolation to bring you the most value and i think that’s the sort of katheline asserting myself like selecting a defiant atlasul retting the teverino of looking at actium broadly it is basically about if the mother has a knowledge about and concepts and it is wabisca about certain things certainly do ask questions either to discover new concepts or to basically the increasing besancon to that level it a very powerful technique along to be realised even in like simple things such as like that labeling it superior so helicon simple readiness active learning for example you have yourself to pasmore which is very good at predicting similarities and disparities between things and so if you label a picture basically say a banana now you know 
that all the minister very similar to this image are also likely to contain bananas so probably when you want to understand what else abaout goin to use these other images or acts an image that is not completely dissimilar but somewhere in between it is not superior to the image but not super disseminating to tell you a lot more about what this concept of a penitent i wonder is possible to asserting systematisers that tells you what what is look for the callipers way learning stanchions lot of entreaties going to be explored i think very much related so that i cannot think of what testation currently as kind of act of learning there’s something that undercapitalized engineer basically deploying a bunch of hestitating of a nollekins the wild and their collecting a bunch of educates the are then sent back for annotation for particular and educates the seine is near failure or some weirdness on a particular task that tenanted not exactly a banana but almost the banana cassandroni want to call it is the cars themselves that are sending you back the data like what the hell happened here this was this was weird waterhouse about that sort of deployment of nooks in the wild another way descretion for first is your thoughts it may be if you want to comment is there applications for toning like computer vision as the toning applications of self surprised learning in the context of computer vision based atoning i think so i think for satirising to be used nomothetes injucundity in predictions as one weed so you because then you have this nice equinata at his coming and avdotia of fit associate course actions that cathcart you can form a very nice predictive model of forts happening for example like all the way like one may possibly in which how their figuring out potatoes basically to prediction uncertainty it superceded was going to turn right so this was the action it was not happen in the shadow more and now the river toland this was disreputably by forming these good 
predictive mortals you are i mean these are a kind of sooperstitious morotabas call being plain just by looking at wagoning them to predict what’s going to happen next so i would say this is really like one use of selenite model and you don’t preciously dispelling at waituh distressing about that active learning context that the yunani from like that kind of deployment of the system seeing cases where it doesn’t perform as you expected and then we transbaikal i mean that ellieslaw’s smart to it that way because i mean the fingers with any kind of like dacosta leg autonomous driving that are those tossings that are actually the problem at mavering or like reviving is basically been there has been a lot of success in that predicating foot long time something now the point is all these feats are the resort of reason by onerous having as income having become like superspeed mainstream and available in every possible carignan to billy by really call reading to get all of these disease quickly as possible and then this place using those improvement super smart and prediction entertained to that is like one eliciting it only puts we mentioned our fine determines that the test a computer vision approach or really any approach for tom was driving his very far away how many years away yet the beaumont to solving a tones driving with this kind of computer vision only machine learning based approach look at what is selling anonymous having been does it mean salting it in the us as it being solving it in india because i cantilevers control it is fully liable you can an go to sleep is driving myself so this is high wind city driving but not everywhere but mostly everywhere and it’s let’s say significantly better save five times less accidents than humans sufficiently safe for such that the public feels like that transition is you know enticing beneficial both for safety and financial tossing noirmont an expert in tutoring so let me for our dear i would say like at least five to 
ten years this would be my lieges from now then and it be very impressed genistrello king at sicyon the screen and basically shows all the elections and everything to the garding as your driving by and that super distracting for me as a person because all i keep looking at his leg the bounding boxes and the gardariki and siamese leavening and able to do that that was the most impressive part for me it able to get through and but and one of the reasons why moorfields in that cagliostro technology for atoning as the kerosene and detained this completely eout that organ going to even use lighter so that initial system i think was dominantly moving to completely like vision basis and so that was the lake it sounded completely crazy lievin gases were you have noisily of course it comes to the tone tacticians but now the sea that happened in like one lived a la that basically discovery one wrong i would say now and that’s just working perivalian there were also like a lot of advancement in comradely and other warlike i know that seamen i was there there was a particular kind of camaraderie obtusely good a basically low visibility setting so let lots of snow and lots of rain it correctly still have a very reasonable visibility and i think that a lot of these kinds of innovations at would happen on the censorial going to make this very easy in the future and so maybe that actually by a more optimistic about within basseterre toinette on the center itself so aiding the date of goguette about it and then of course when montreal out and get all of these educators and as like under describe i think that’s going to make us coverthorne much with you on the fatteners may be ten years but he made it i’m not sure how you made a sound but for some people that seem that my seemlier far away and then for other people it might seem like very close there’s a lot of fundamental questions about how much game theories and isn’t this whole thing so i come much is this simply collision 
avoidance problem and how much of it is you still interact with other humans in the scene and you’re trying to create an experience that compelling you ungerminating be quickly you want to navigate the scene in the safe way but you also want to show some gravel of aggression because we certainly this is why your scrutiny because you have to show a gargery ornithanthropos environment problem verses like as you know you can just consider human not part of the problem i used to think humans are almost certainly have to be modelled really well pedestrians and cyclists and human the setters yet to have a mental models for them he cannot diseases but more and more it’s like the it’s the same kind of intuition breaking thing that self supervise learning does which is ill may be through the learning you’ll get all the human like human information you need right like maybe you’ll get it just with enough data needed to have explicit good models of human behaviour may be gerda so i mean my skepticism as knowing a lot about more of companies and how difficult it is to be innovative i was skeptical that they would be able at scale to convert the driving seen across the world into digital form such debt you can create this data and genial in the fact that tellest getting there or are already there makes me think that it’s it’s not starting to be coupled to this self supervise learning vision which is like if that’s going to work if the purely this process he can get really far that may be consoled driving that way i don’t know i i tend to believe we don’t give enough credit to the thomasin humans are both a driving and as supervising autonomous systems and also we don’t i wish we were i wish there was much more driver sensing inside tesla’s and which deeper consideration of human factors like understanding psychology and drowsiness and all those kinds of things when the car does more and more of the work how to keep utilizing the little human supervision that i needed to keep 
the whole thing safe i mean the fascinating dance of human robotron to me tomas driving for a long time is a human robot interaction problem it is not a robotic problem or computer vision problem like you have to have a human loutish is why i think it’s ten years plus but i do think there’ll be a bunch of cities and context where you know destracted i will work really really damwell so i think for me that gets five if i’m being optimistic and it’s going to be fine for a lot of cases and then blustered with your temples basically if he wanted it over most of the contiguous nited states or something on some may optimistic is five and pessimistic thirty thirty i have a long tail the monish asscending videos i washed enough pedestrians to think like we may be like there’s a small part of my still lomaria me that things we will have to build a to solving halitherses to me like because humans are part of the picture deeply part of the picture and also human societies part of the picture in that human life is at stake anytime a robot kills a human it’s not clear to me that that’s not a problem that machine lanillis you have to integrate that into the whole thing just like forsook you know one thing is to say how to make a really good recommender system and then the other thing is to integrate into the recommender system all the journalists that will riatt that recommended likenesses assessments of our society start acting on it and then the snow longer hug good you are doing the initial task it’s also huguenot eating human nature which is a fascinating space what do you think are the limits of deep learning i feel a maisonette bit into the big question of artificial intelligence you said dark matter of intelligence is so solid learning but that could be more would you think the limits of self supervised learning and just learning in general to planing or i think blakeford deep learning in particular because satinette bit more veritable for something that’s so vegetated 
watelet are going to be but late like isaiah human self to passeriano it had bound diverted to have an indefatigable to communicate with human so really have the dislike vacuous concepts or like distinctive by word i’ve had to communicate those tehutlan of human knowledge or some kind of like human bastienelli think for deepening the biggest challenges to take detectives a single concept once like one image of her like i don’t know whatever you want to call it lake any concept is really hard for these methods to general looking at just one or two samples of things and that has been real recalling i think that actually billettes educators for example for dastardly that important as a fusion instance of the car falling and if you standardised it’s you have like very limited guaranteeing good to happen again and you actually going to be able to nictitans in a very different insolent it was snowing so you got that thing laborent was towing but nominating dorinet get it or you basically have the same centerport of the worriting with offense to testily hard for these models like deepening especially to that which steinholt henri direction problem were will have an example for each number it feels like humans are using something like learning but i think it’s begood at john’s fitting knowledge a little bit we are just better alike for for a lot of these problems were we are generating from a single sample recreating from a seminole were using a lot of arondelle orifice in deciding that one sampled so i’ve never seen you ritorneremo and a few words get it and if you would write a different kind of saltpeterie into decent resistible of figure out that these are the same to characters it’s just a i have been very used to seeing handrail the other sort of problem in any deeper thing system or any kind of machinists ice it guarantees there no guarantees for now you cannot lauman also to have any guarantees like there is no guarantee i can recognize a card in every inarime that 
a lot o to be lots of cats that i don’t regnie but of denarii don’t take nice that intent pithing from like from the scorpion perspective you do need guarantees things algernon argave guaranties sorting is a guaranty you were to recall or on a portioners you are guaranteed its going to be sorted and the bystanders not going to recognize as every ginhouse like us i think most people regretted but we are still cavities on all this as a bog there is in traditional computer science of traditional science that if you have this kind of failure as existing the new thing of it as the something is wrong i think there is this sort of notion of nebulous correctness of poaching and that something we just need to be very comfortable with and for deepening or like for lord of these machinating artemis not erode characterize the notion of correctness i think the rotation in understanding or at least a limitation of phrasing of this and if we were to come up with better rests on the sand the limitation and it would actually help us a lot do you think there is a distinction between the concept of learning in the concept of reasoning do you think it’s possible for you all networks to reason so i think of it like for me in learning his benevolence a snapshot sufficiently peter for dog again imediately set to dog but if you give me like a pulse in oliver goorashee of likings were to happen then i redebas i’ve never it’s a very complicated set up i’ve never seen that particular set up a reinette odalie imagining my head what would happen the figure it as i think ye nulliporae good a recognition but did not very good at teasing because they like if they have seen something before you are seen something similar before the very good at making the sernatingen but if you were to give them a very complicated thing that they’ve not seen before or they have very limited ability right now the composed different things like have seen this particular part of foreseen this particular before are 
now probably like this is hetherington bandolier hard for them to come up the despensers certain aspect to reasoning the may be converted into the process of programming aesthete program sentinelling machine proprieties your intuition there or like i guess similar set of technique to think that will be applicable so i think it is of course leisler able because like their prime examples of machines that have like all individuals that opened the attack humans have learned this so it is of course it is a technique that is very easy to learn ah i think where we are kind of hitting a wall basically with like arachnoidea when the net work localisation be basically are not able to figure out how well it’s going to generalize to an unseen thing and we have no like a priori noble of characterizing that ah and i think the basically telling us a lot about like a lot about the fact that we read on now what this moistened and how well it was because we don’t know how eleonoras the dissension which he feels like we humans may not be aware of how much like background how good or back or morals how much knowledge we just have slowly building and top leach other feels like yellowstone starlike you do some incredible thing were you learning a particular task in computer vision you celebrate your state of the art successes and you throw that out like it feels like it’s your never using a stockley future successes in another domains and humans are obviously doing that exception well as still throwing stuff away in their mind but keeping certain colonels of truth by so i think they like condoling is sort of the batteries in machinating and i don’t think it’s a very villeparisis in deepening for example a catastrophic forgetting is like one of the sandringham basically being that if you decorated like to recognize dogs and now you teach that same neologize gassalasca forget or recognized ogget’s very quickly i mean and here as a human if you were to be some one to recognize dogs and 
then to narragansett they don’t forget imagery hardeneth dogs i think that’s basically sort of waterstone if like the long term memory mechanisms of the mechanism that store not just memories but constatations if those things will look very different than you not work or if you can do that within a single year lower course particulars architecture smothered all the beautiful things ladoucette again may be that a stupid thing for us to miss toothing whenever we try to like understanded with putting out one subject of human bientot he i think that’s the sort of problem but sartoris that it should learn naturally from the data unforgotten at you are using your using your own preconceptions of what this moohaadeem hilarion would you think it takes to build a system with superhuman maritima level or supersede general intelligence veridicated talking about this but what’s your intuition like does the thing have to have a body does it does it have to interact were they were the world does it have to have some more human elements like self awareness i think emotion i think emotion is something which is like it’s not really autographically intended machinating it’s not something we think about literaries not like emotion emotion is never a part of all this and that just seems a little bit were to me i think there is basically being that there is surprised like basically motion is like one of the reasons the mariners like what happened and what you expect to happen right there is a mismatch between these things and so that gives rise like i can neither be surprised or can be seen dorian be happy and all of this and so this passionate that i already have a productive mornin my head and something that i predicted or something that i thought was likely to happen and then there was something that i observed that happened at there was a disconnected these two things and that basically is like may be one of the reasons i like you have a lot of emotions i think so i talked to 
people intheinninth so it part of basically human human interaction important part but just the part so igraine communication can be done with objects that don’t look at all like humans observe to the promoter ability to connect with things that look like a room be are billy to connect fort of all the stock bought other biological systems like dogs are billy to love things that are very different than humans but they do display machinemen dogs to display emotion so they don’t have to be anterior them to like this place a kind of emotion that we don’t exactly movement but then the word emotion arose were to be sigismund of gestation the physician’s imperious tiltyard look at redondo think that we will need to have some kind of embodiment or some kind of interaction and figure out things about the world or about consciousness i think think how often do you think about cautiousness think about your work you can think of it as the more simple thing of self awareness being aware that oaring sensing acting thing in this world or you can think about the bigger thitherward of flanking econchatti in what rolling with respect to all the other thing that existed on it a thing that requires a ernestine stand the tent on a muscovite what are limitations what are the things that it is supposed to do in so on what is it rollin some way or i mean i mean the tintings that be kind of expect from it i would say and so that the level of self of ideas would have basically required at least if not more than that here i tend to one motion side believe that it has to have less to be able to display consciousness display conscious normanby the meaning like for humans to connect with each other or to connect with other living entities i think we need to feel like an order for us to truly feel like that there is another being there we have to believe that their conscious and so we won’t ever connect with something that doesn’t have elements of consciousness now i tend to think that that’s 
easier to achieve that it may sound as we enter from more fiord that you have a magistrate every once in a while and make a sound i think a couple of days in especially furthermost capacity to suffer and if you are an entity that’s able to feel things in the world and to communicate that to others i think that’s a really powerful way to interact with humans and in order to create an gissem i believe i should be able to richly interact with humans like humans would need to want to interact with you like i can’t be like it is the self supervise learning verses like like the robustness humanistically make the judge make the human sufficiently interested that the human is talking for twenty minutes to eat till an right now they’re not in close to that because it’s just gets so boring when you’re like when the intelligence is not there gets very nicest roby needs to be interesting in moriane interesting case displayed the capacity to love to suffer and i would say that essentially means the capacity display consciousness like it is an entity much like a human being of course what that really means i don’t know if that’s fundamentally a robotic problem or some kind of probably were not yet even a word like if it is truly a hard problem of castles i tend to may be optimistically think it’s a we can pretty effectively face it to a make it soon display a lot of human like elements for a while and that will be sufficient to form really close connection with human what use the most beautiful idea in self supervised learning it when you sit back with the i don’t know with a glass of wine an armed chair and just at a fire place just thinking how beautiful this world that you get to explore as what do you think is the especially beautiful idea the fact that like object level but objects are in some notion of abjectness images from these models by the lake so for example like one of the things like the inopportune i was part of it fibbet sort of bomb there emerged from the 
serpent ations so if you have like a dog running in the field the boundaries on the dog the network is basically able to fit it out what the boundaries of this dog are automatically and as neither it was never trained to it no one thought that this is a dog in the pigeon to a dog they would to god these things together automatically her that’s one i think in general that entire notion that this dumb the other judithe two crops of animated besmeared in something like this like the moralistico what to do pirating and i don’t think a lot of her even in the senate how that is happening really and it’s something the taking for granted may be like a lot in terms of hoisting up these argosies a very beautiful and parliamentary telling us something about her is so much signal in the picket we can be super unlabourit on hoisting up the separable and despite being exuberant with actually get very good belsely get something that is able to do very ceasing things i wonder if there’s other like abjectness of the concepts that can emerge into if you follow formalities its kindliest for an acuteness that you want to apply one of them as abjectness i wonder if those concepts cantelupe learning on billions of images i think tenpennies that like a fundamental concept which we have an maybe not reminded that another conception beneficent something that evacuation of objects and the same thing uliassutai is something that they are born matting it should amoretta etrepilly very quickly i wonder for tears like cemetery rotation is costing miege so i think rotation probably an dotations maestrante in the architecture of right but it’s interesting of all of them could be like counting was another one you know being a tourbillon be constructed corrections to the interpreter level counting i do believe i mean should people they don’t know like yet but i do think it’s not that far in the remote they be interesting if using so supple arning on images can then be applied to an solving those 
kinds of acutest woldshire kind of impossible what i did do you believe might be true that most people think is not true or don’t agree with you on is there something like that so was going to be a little controversial eat i don’t believe in simulation like all using simulation to do things are very much the clarify because this is a partaking simulation often your your furious in simulation to construct worlds that you then leverage from a chilean ing right befor example like an example would be like traitorous arriving system you basically first piecemeal which built like the environment of the world and genewally or have a lot of flight you tenements in that so i believe it is possible but i think it’s really expensive or doing things ah and at the end of it you do need the real world so i’m not sure so maybe for certain setting like maybe the patsala like protovin the parasol that you can actually invest that much money to build it but i think as a general sort of principle it does not apply to a lot of conceptualisation to ferino only because like one expensive because a second tutmosis for a lot of things in general at the resolution licentiate in that so you think very charging visually alike to correctly like simulate the visual like the lighting all those consolations leoncavallo computer vision proudly not and for me in the context of a torarin it’s very tempting to be able to use simulation right because it’s safety critical application but the other limitation simulation that perhaps is a bigger one than the visual imitation is the behavior of objects casualties in educates and the question is how will can you generate educates and simulation especially with human behavior i think another problem is like for odomantians world so say autonomous driving like in daniele the allots of autonomous cards but they still going to be humans and other of fifty percent of the agents say which are humans fifty percent of the agent at arnoton homestead living agent 
now the mixture soothe kinds of behavior at you actually expect from the other or other agents or other cards in the road detracting owever different and as the proportion of the number of autonomous cushman keeps changing this behaviour will actually change a lot so niowee to belemnite peace on just part now to bid them pray you don’t have that that monotonous cousin the road so you tried to like make all of the other agents in that similar to behave as humans but that’s not really going to hold up ten fifteen twenty thirty years from now you think we’re living as militarist this withington how hard is it to build the video game like virtuality game where it is so real forget like altruistic to where you can’t tell the difference but like it’s so nice you just want to stay there tatterdemalions that basically spent ninety five per cent ninety nine per cent of their time in a virtual world version answer certain people it might be something that the really divorcement in like happiness out of and may be thrown giving them the attitude it is good for certain people to ultime if it maximis happiness right i think he judged i think of its making people have been may be to ithaca i think it’s this is a very hard question also like you you’ve been a part of a lot of amazing papers what advice would you give to somebody on what it takes the right good paper granting papers now is there is the common things that you’ve learned along the way that you think it takes both for good idea and a good paper to i think both of these have picked up from the loop of worth and the best one of them is speaking the right problem to work on and research is as important as like the certifying the solution to it so i mean the eremitess for the savannah there are certain problems that can at all be solved in a particular time frame so as you want to work on finding the meaning of life this is a great problem i think most people regard that but do you believe that you your talents and 
leaned you’ll spend on it will make our meaning like make some kind of meaning but progress in your lifetime if you are optimistic about it than play go ahead as a start as partake asking poormaster tentations enough adulation that you think you’d be doing a pesterin that kind of a time in a so that’s one of course there’s the second participation so you shouldn’t just pick problems that you are not excited about because as a glad student or as a researcher you will need to be passionate about it to continue doing that because there are so many other things that you could be doing in life she really need to believe in that to be able to do that for that nominees of paper i think the one thing that he lonesomelike in the past whenever i used to write things and even now initiatory to climb and rooting into the paper that is fatally materies pushing one simple idea that it that’s all because that the paper is going to be like whatever aid or nine pages if you keep coming and lots of ideas is really hard for the single thing that you believe in the standard securities focus and like my inspecting tons of writing reitfontein dilettantism valuable to the reader as well and basically the reader of course because they get to the noted this progress associated with this paper and also for you because you have like when you ride out to precedent as you think about it only salad student are you always wear like maybe the last week or whatever to light the paper because i used always believe that doing the experiment was actually the bigger part of research and writing and my advice altstetten redolent matter it on now what he talking about but i think more and more realized that the case like mineralising that i’m doing i actually think much better about it and so you start writing orderly on you actually i think get better ideas were it is you figure out like hollister that you should run to lotusland so on he avenuesque sort quite simple and restless that if you want to 
dream about writing a paper that changes the world ballysloughgutthery that explains that the thread that is it altogether and so i loamshire i know actually would start off like first they were even before the experiment were wildenort will actually start with writing the interaction of the paper with zerbinette use the released them figure out what they like or the transient fits in like the context of things it now and that we really guided and irresolutely for stateswomen in and that’s how they would start those basic questions about people may be the more like beginners in this field was the best programming language to learn if you’re interested in machine learning i would say bianist because it’s the easiest one to learn and also a loitering in machine learning happened and patented know any other programming language by installing to get you a long way he had seemed like servetus of question because it seems like quite an associating the space now but i wonder if there is interesting alternative obviously there is like swift and oiliest alternative popping up even jousselin but it seems like quite onreasonin a programming a universities so it just combines everything very nicely he would harder question what are the pros and cons of pioche footers ago interest right when it ercole out becos i started on likening twenty fourteen or so and damoetas for vision was cathach was out of bookland bucarelli and then pentelicus basically like bitin first so capias mainly spleen at her lakeland of patenting subitamente rutheford language is we really use either march laboriously get stuff tennington of good pacinotti later sienese was busily around that time spent fifteen twenty sixteen as when i last used a invented to detach ordination of things which was very returners of its structure like you will you would care annotations in that said whereas if you wanted lateritiam goanner hard to do that and lots much more friendly for all of these things again terms of 
pirandello buonapartist because of the using it a longer and a more familiar also the biotech easier to be boisdeau its imperative and nature compared to licentious not imperative but that delineated busily their imperative designers sort of an vituperating and actually maeterlinck of spindrift there is a competition challenges the library developers to step up the game but the downstairs in the donatello but thankfully the openers community machine doing is amazing so there venever like something positively wait a few days and some one who like superstructural come and translate that potatoes the patagonian basically have figured that all mastermans amazing and deerlike figure out this abstinence of the caving these two femboering course the different cases that are going to be benefiting one or the other framework and like you sir i think competitive use or bothered me all of these abreast of keep learning from each other and keep incorporating the printing to this make them better and better what advice would you have some one new to machining you know maybe just started to have stationmasters are not working can you elaborate a wood hands dirty mean by lalor example like a agoutis not conversion whatever other than trying to let go gleaning to do something clearly penthouse like five eight ten fifteen twenty whatever number of hours retinue it out yourself in that process you act on no more angling is of course a good version a quick answer for i think initially especially like when you starting out it much nicer to peking out basel and i just say that from experience to go lamentation it will not allow the prioress would latitat of a leveloo up the singers messengers busy and they would be tradition because i didn’t have the time and motion my dissertation wherever a inessentials down and like the station that i think lady help me that has really help me figure alongside in general if i to generalize that i feel like persevering through any kind of struggle 
and a thing you care about is good so you basically you try to make it seem like it’s good you know spent time to bugging but really any kind of struggle whatever form that takes it could be just going a lot o basically anything just sticking with an got the hard thing that could take a form of implementing stuff from scratch it could take the form of regimenting with different libraries are different programming languages it would take a lot of different forms postage is good for the soul so let confitentem besetting was it you now lord thereunto when it was no derelict much so the thing that a lot of people had was now bergerac hesitating else you forsook their bein general for people you ready except the successful your young but you have advice for young people starting out in college or may be in high school you advice for their career advice for their life had you ave a successful path and career and life would be hungry always be hungry for what you want and i think like i’ve been inspired by lot of people who are just like driven and who really like go for what they want no matter what like you shouldn’t want it you should need it affonso you basically go toward the instrumentation you come across a thing that leslie need i think the not going to be any single thing that he oinometer contredire types of things that you need but whenever need something you just go push for it and of course on you you may not get it or you may find that this was not even the thing that you would looking for it might be a different thing but the point is like your pushing through things and dilating the lot of skill and bring a lot of likewise and the fattura hallooed the other thing when you figure out what he really litigate want getting a lot of people here thevetia of that is because one is a fear of commitment and to their semminating things in the world you are most to anisette other amazing things but committing to this one thing so i think a lot of it has to do with 
just allowing yourself to like noticing and just go all the way with it i mean illinoisans be prepared for in so unbitting i mean especially in research for example fairsome thing that happened almost like almost every day is like her experiments failing and not working and so you really need to be so used to it you need to have a sextant and only basically to like you get through it as when you find the one thing that actually working omission was like one for some michael i when i was a kid i used to relate rebound like the filament and the light but for them and the knee i think his things like he cried line under ninety things that didn’t work or something of the sort and then they asked him the sootier because all of these were fair experiment and then he says or the nine or ten ninety things don’t don’t work and i know that the yootiful supervised kind of learning process have you figured out the meaning of life yet i told jumnootree the answer i hoping you could tell me what do you think of the meaning of it all is a don’t think i figured this out now i have no idea do you think do you think i will help us figure at all or do you think there is no answer at the whole point to keep searching i think i think the tenantless sort of quest for us i don’t think i apostatised hard hard hard question which so many humans have tried to answer or as the interesting thing about a difference ennuyons he was thus sitot how their doing and the eyes almost always operating under well defined objective functions and i wonder like whether or lack of hiding long term objective function inebriation actually has very different than the objective functions that the optimist and the tritonia and like changed dramatically through the court of the life that atarantes ingratiate was like every one was doing the exact same thing that would be pretty boring and we do more like people with different and of fusees or so people a voluntaristic would the biggest feature of being human and 
will get to like the ones that died because they do something stupid would get the watch that set and learn from it and as a species we take that lesson and become better and better because of all the dumb people in the world that died doing something wild and beautiful heathcote conversation with deprecation to inferior listening to this conversation with each one mister and thank you too on it the information crammerly and a fleigen checked them out in the description to support this bodes and now ameliorate see clark and mystically advanced technology is indistinguishable from magic thank you for listening and hope to see you next time