The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

Transcript

The Problem

0:15 · Good morning. Thanks for having us. Today I want to talk with Cristina about friction a little bit. This is a social preview that was generated automatically when someone submitted an issue. It went with a forum post about a security incident that was deployed accidentally.

0:40 · It was a configuration change that caused the problem, and the social preview carried that company's marketing tagline, which said "ship without friction."

0:49 · We want to encourage you to add a little bit of friction back, and I'll tell you why. So, who are we? I've been doing software development for 20 years, most of it in the open source space. I created Flask, a Python framework, which ironically is so much in the weights that a lot of people are learning about it now because the machines are producing it.

1:15 · I left my previous company, Sentry, in April last year, which perfectly coincided with me having time and, obviously, Claude Code. So I fell deep into the hole of agentic engineering and started writing on my blog, and a lot of people reached out to me over the last year, all excited about this. Then in October I started a company with a friend, called Earendil, where we are trying to make sense of all the AI things.

1:43 · And my name is Cristina. I work with Armin at Earendil. But importantly, I am what I like to call a native AI engineer. What that basically means is that these tools have been around longer than my career has, so they have been foundational in how I became a software engineer. Not just because I use them to work, but because they are the means by which I learned to do what I do.

2:11 · Before Earendil, I was working at Bending Spoons.

2:16 · We want to share a little bit from practice, not just theory, but I will readily admit that I don't think we have all the solutions. We have been building with or on agents for a good 12 months. We've had huge leverage and great disappointment, and we keep running into two types of problems.

2:34 · Especially if you listened to some of the earlier talks at this conference, you will have heard a lot about how you should keep using your brain.

2:43 · For some reason that's really, really hard. So one problem is psychological, and the other is an engineering challenge: the agents seem to produce worse code for some people and better code for others, so what is it that actually makes that work? This is not so much a solution as our part of the journey, how we think we have managed so far. Problem number one is the psychology part: even though everybody has told you many times over that you should be using your brain and slowing down, why is it actually incredibly hard?

3:13 · It's just one more prompt, and we don't sleep that much. What is it that actually makes it so hard? Would it be that hard if the machines were writing perfect code and we didn't have to think quite as much? And is there something we can do to make this a little bit better?

3:29 · So I'll begin by introducing the first of these problems, the psychology problem. What I want to talk about first is the shift. I'm sure a lot of us here who have been playing with these tools for a while experienced this at some point. We were prompting and prompting with not-so-good results, and then suddenly it clicked and the tools were really useful. It was fun in the beginning, and they gave us a lot of extra time, because not everyone was using them. They were tools that made us more productive and made our jobs more fun.

The psychological trap: because AI tools are addictive and make coding feel effortless, engineers often do not invest the time to design, review, and truly understand the code.

3:58 · But very quickly, because they were so useful and got us so hooked, everyone was using them. That had the opposite effect: suddenly the baseline expectation was that everyone uses them and you have to use them too. So the fun and the free time translated into pressure. Now we all have to ship faster and produce more code, and it is just not sustainable to review it and actually have time to think.

4:25 · This leads us to the trap, and I think there are two parts to it. One of them a lot of engineers have spoken about: these tools are super addictive. You never know whether the next prompt is going to be the one that makes your product work and adds a new feature, or the last drop of slop that brings your product crashing down. So it's very addictive, and we keep doing what we're doing. It's not a great solution.

4:49 · But most importantly, and I don't think we realize this as much, because we produce a lot of output very fast, we are tricked into thinking that we're being more efficient and doing more work. It is quite the opposite, because now we don't have as much time to stop, think, and design what we're doing, to ask ourselves: is this the best way to implement this, or could I be doing something better?

5:12 · When you're in this flow, it's very difficult for you to stop, and it's definitely very difficult for your agent to stop, because it's running around reading files that it should never have read. So we are the ones who need to have the agency to be in control here.

5:29 · One thing that took me quite a while to realize, once you scale this from one person to an engineering team, is that it really changes the composition of the team. We used to be supply constrained by the creation of code, so the balance between writing code and reviewing code in engineering teams was usually quite decent.

5:47 · Now every engineer has a multiple of producing power compared to their reviewing power, so obviously we are piling up pull requests. But we are also slowly expanding the total number of humans in an organization who participate in the engineering process. I talked to a lot of engineers over the last year, and increasingly one of the things that came up is: now I have marketing people shipping code, and I have CEOs who used to be engineers shipping code again.

6:18 · The roles those people have in the company don't put the responsibility on them. The responsibility still rests with the engineering team. So the total number of entities, both humans and machines, participating in the code creation process outnumbers the ones that can carry responsibility.

6:41 · We're not at the point where the machine can be responsible for code changes. That has led to more and more code reviews being skipped or rubber-stamped. The goal is to get back to the small PRs we want to see, so that the reviewing process works, but at the very least this amplification is something we need to recognize.

6:59 · So when you get a pull request that looks really daunting and has 5,000 lines of code in it, that is exactly when you should be thinking, and it is exactly when it's the most overwhelming. Increasingly, we're tapping out at that point.

7:13 · On the engineering side, we are creating larger pull requests. We're creating these massive changes because it is free now, right?

The engineering challenge: agents are optimized to produce code that runs, not code that is maintainable or architecturally sound. This often yields "sloppy" code that introduces unexpected failure conditions and entropy.

7:23 · If you think about how the agents work, they're really optimized for creating code that runs. Their main objective is: write some code, run the tests, make some progress. The reinforcement learning drills this in. So the agents write the kind of code that you, as a software engineer who learned how to write code, wouldn't necessarily write. For instance, you see quite a bit of code that tries to read a config file and, if it can't read it, loads some defaults.

7:51 · As an engineer, you know that's not great, because I might not notice that I'm running on the default config. I might only discover that I have a massive problem two hours later, when I've already written database records with wrong data. These machines optimize towards making progress, shipping stuff, and unblocking themselves. As a result, they create many more failure conditions than human-written code normally would. In part that's because you, as a human, feel bad when you write code like this.
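The silent-fallback pattern described here can be sketched as follows. This is a minimal illustration and not code from the talk: the `AppConfig` shape, the `DEFAULTS` values, and both function names are assumptions, and the config is passed in as raw text so the sketch stays self-contained.

```typescript
// Hypothetical config shape, for illustration only.
type AppConfig = { databaseUrl: string; batchSize: number };

const DEFAULTS: AppConfig = { databaseUrl: "sqlite://local.db", batchSize: 100 };

// The pattern the talk warns about: every failure quietly becomes "use defaults".
function loadConfigLenient(raw: string | undefined): AppConfig {
  try {
    return JSON.parse(raw as string) as AppConfig;
  } catch {
    return DEFAULTS; // the caller cannot tell that the real config was never read
  }
}

// The alternative: fail at startup, before any data is written.
function loadConfigStrict(raw: string | undefined): AppConfig {
  if (raw === undefined) {
    throw new Error("config file missing: refusing to start with defaults");
  }
  const parsed: unknown = JSON.parse(raw); // a syntax error propagates to the caller
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("config must be a JSON object");
  }
  const candidate = parsed as { databaseUrl?: unknown; batchSize?: unknown };
  if (typeof candidate.databaseUrl !== "string" || typeof candidate.batchSize !== "number") {
    throw new Error("config is missing databaseUrl or batchSize");
  }
  return { databaseUrl: candidate.databaseUrl, batchSize: candidate.batchSize };
}
```

With the strict version, a missing or malformed file stops the process immediately instead of surfacing hours later as bad database records.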

8:22 · Something builds up emotionally in you, but the agent has no reason for this. It doesn't feel anything. If you create services that hobble along and are willing to recover from local failures, you actually create very, very brittle systems. It also means you very quickly end up with a codebase of a size and complexity that the agent itself can no longer dig itself out of. It starts to no longer read all the files that it should.

8:50 · It creates code in a new file that already exists somewhere else. Over time this entire machinery creates much more entropy in the source code than you would normally have if humans were on it. A big part of this is that humans feel bad, and agents don't really have any emotions that they communicate to you.

9:11 · But as Armin likes to say, don't worry, not all is lost. We have found some correlation between what the agents really excel at and the types of codebases we put them to work in. The main example here is libraries versus products. We found that agents tend to excel a lot more at libraries. This makes sense, because when you're building a library you intrinsically have a very clearly defined problem to solve. Most of the time you can even map the set of features you want to build onto the API surface, and it has very tight constraints.

9:43 · And because a library is something you probably want to build on top of or make accessible to other people, it's likely to have a very simple core that you can then plug into. Products, on the other hand, are much harder, and perhaps this is a bit unlucky for the rest of us, because most of us are probably building products. There are so many interacting concerns and components: your UI, your API responses, different permissions depending on feature flags, billing, and so on.

10:12 · So there's very heavy intertwining between different components. For the agent, this means it's impossible to fit all of it into its context window. It has no way to understand the entire global structure. Locally the agent tends to be very reasonable, but at the global scale it becomes a bit demented.

10:34 · So what we're proposing is that, just as you would with any type of system design in the past, you treat your codebase as infrastructure. As such, you have to design it so that it is also legible for the agent, and the agent can make the most of it.

10:51 · So what we're proposing is an agent-legible codebase. One of the main points, which I'm sure is clear to all of us, is modularization. We have different components, and this makes it easy for the agent to add one feature in one spot without corrupting everything else. But importantly, this also means modularizing your code flow itself. For example, I've been working on some refactoring. We're building something of an AI assistant, and for me it was super important to understand which steps of my code are actually the main points.

The agent-legible codebase: to get the most out of AI, the speakers propose designing the codebase like infrastructure. This includes the following.

11:20 · Say you get a user message, then you pass the message to the agent loop, and then you have to deal with the output. These points were very clearly defined for me, so the code was not as messy. But it happens that between these steps is where the agent tends to add the most fuzz. It will be parsing between different types.

11:41 · It adds things to state that shouldn't be in state. You end up with behaviors that you didn't want to support, that are unexpected, and that can be quite dangerous. Another point is trying to follow all of the known patterns, because I think we all know by now that there's no point in fighting the RL, the reinforcement learning. The more we can lean into it, the better our output is going to be, and it's also more scalable down the line.
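One way to picture a modularized code flow is to give each of the three steps named above (receive the user message, run the agent loop, handle the output) its own function and its own type, so there is little room between steps for ad-hoc conversions or extra state. The sketch below is an assumption-laden illustration: every type and function name is hypothetical, and `runAgentLoop` is a stand-in that just echoes the input.

```typescript
// All names are hypothetical; the talk only names the three stages.
type UserMessage = { userId: string; text: string };
type AgentOutput = { reply: string; toolCalls: string[] };
type RenderedReply = { userId: string; body: string };

// Stage 1: validate and normalize the incoming message once, at the boundary.
function receiveMessage(userId: string, rawText: string): UserMessage {
  const text = rawText.trim();
  if (text.length === 0) {
    throw new Error("empty message");
  }
  return { userId: userId, text: text };
}

// Stage 2: stand-in for the real agent loop.
function runAgentLoop(message: UserMessage): AgentOutput {
  return { reply: "echo: " + message.text, toolCalls: [] };
}

// Stage 3: turn the agent output into what the caller sees.
function handleOutput(message: UserMessage, output: AgentOutput): RenderedReply {
  return { userId: message.userId, body: output.reply };
}

// The whole flow is three calls; nothing is parsed or stored in between.
function handleTurn(userId: string, rawText: string): RenderedReply {
  const message = receiveMessage(userId, rawText);
  const output = runAgentLoop(message);
  return handleOutput(message, output);
}
```

Because each stage only accepts the previous stage's type, a change that smuggles extra fields or conversions between steps has to touch a type definition, which makes it visible in review.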

12:04 · Then, as mentioned with libraries, if you have a simple core and push the complexity out to other abstraction layers, it's going to be easier for both you and the agent to read your codebase. And no hidden magic.

12:18 · For example, using React Server Actions, or using an ORM instead of raw SQL, hides intent from the agent, and if the agent can't see something, it surely can't respect it. To be more precise, these are the examples of mechanical enforcement that we have been using at the company, and most of them we achieve with linting rules. The main example would be no bare catch-alls. Great.

Mechanical enforcement through strict linting (for example: no catch-all exception handling, unique function names, no hidden techniques such as dynamic imports).
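To make the "no bare catch-alls" rule concrete, here is a hedged sketch of the kind of handler such a rule would reject and one it would accept. The speakers do not name the lint rule they use, so this only illustrates the code shapes involved; `fetchQuota`, `makeRateLimitError`, and the `"RateLimitError"` name are all hypothetical.

```typescript
// Hypothetical helper that builds an error we know how to handle.
function makeRateLimitError(retryAfterMs: number): Error {
  const err = new Error("rate limited; retry after " + retryAfterMs + "ms");
  err.name = "RateLimitError";
  return err;
}

// Hypothetical upstream call that can fail in two different ways.
function fetchQuota(userId: string): number {
  if (userId === "busy") throw makeRateLimitError(500);
  if (userId === "") throw new Error("userId must not be empty");
  return 42;
}

// Rejected shape: every failure, including the programming error,
// is turned into a plausible-looking value.
function quotaSwallowing(userId: string): number {
  try {
    return fetchQuota(userId);
  } catch {
    return 0;
  }
}

// Accepted shape: handle only the failure we understand, rethrow the rest.
function quotaNarrow(userId: string): number {
  try {
    return fetchQuota(userId);
  } catch (err) {
    if (err instanceof Error && err.name === "RateLimitError") {
      return 0; // expected, documented degradation
    }
    throw err;
  }
}
```

The swallowing version reports a quota of 0 for an empty user id, hiding a bug in the caller. The narrow version surfaces it.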

12:48 · Imagine that there's an example here.

12:50 · The agent found the bare catch-all and was like, "Oh no, this is bad," and edited it. We also try to always keep our SQL in one query interface, so that the agent doesn't have to go hunting around the codebase to find all of the different places, because if it misses one you can get breaking behavior, and again that's dangerous. We try to have one primitive components library for the UI and not have any raw input boxes, for example, so that we always have one type of styling and one consistent kind of behavior.
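A single query interface might look roughly like the sketch below: one object owns every SQL string, and feature code calls named queries instead of building SQL inline. This is an assumption about the general shape, not the speakers' actual code. `SqlRunner` is a hypothetical stand-in for a real database driver, and the table and column names are invented.

```typescript
// Hypothetical minimal database handle; a real driver would replace this.
type SqlParam = string | number;
type SqlRow = { [column: string]: unknown };
type SqlRunner = (sql: string, params: SqlParam[]) => SqlRow[];

// The only place in the codebase where SQL text is allowed to appear.
const queries = {
  findUserByEmail: function (run: SqlRunner, email: string): SqlRow[] {
    return run("SELECT id, email FROM users WHERE email = ?", [email]);
  },
  countOrdersForUser: function (run: SqlRunner, userId: number): SqlRow[] {
    return run("SELECT COUNT(*) AS n FROM orders WHERE user_id = ?", [userId]);
  },
};

// Feature code calls the named query and never builds SQL itself.
function userExists(run: SqlRunner, email: string): boolean {
  return queries.findUserByEmail(run, email).length > 0;
}
```

When a column is renamed, both the agent and the reviewer have exactly one object to search, which is the property the talk is after.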

13:21 · We don't have any dynamic imports. And this may not sound as important, but we actually enforce unique function names. The reason is not just more legibility for you and the agent, but also token efficiency. If your agent is grepping for a specific feature or something in your codebase and it gets only one result, it's going to be much better at continuing with the loop. We've also recently started exploring TypeScript's erasable-syntax-only mode. With it, your code is basically JavaScript with the type annotations on top.

13:54 · This means there's no transpilation step in between, because there's one source of truth between your actual code and the compiler. So when the agent is looking for errors, it doesn't have the confusion of "oh my god, where am I looking?" It is much better at finding them.
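For readers unfamiliar with the mode: as far as I know this refers to the `erasableSyntaxOnly` compiler option added in TypeScript 5.8, which reports an error for constructs that emit runtime code, such as enums and constructor parameter properties. The sketch below shows one common erasable replacement for an enum. The `LogLevel` and `shouldLog` names are hypothetical and not taken from the talk.

```typescript
// Not allowed under erasableSyntaxOnly, because an enum emits runtime code:
//   enum LogLevel { Debug, Info, Error }

// Erasable alternative: a plain JavaScript object plus a type derived from it.
const LogLevel = { Debug: "debug", Info: "info", Error: "error" } as const;
type LogLevel = (typeof LogLevel)[keyof typeof LogLevel];

// Deleting the annotations below leaves valid JavaScript with identical behavior.
function shouldLog(level: LogLevel, minimum: LogLevel): boolean {
  const order: LogLevel[] = [LogLevel.Debug, LogLevel.Info, LogLevel.Error];
  return order.indexOf(level) >= order.indexOf(minimum);
}
```

Since the emitted JavaScript is the source with the types removed, a stack trace or a grep result points at the same lines the agent is editing.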

14:11 · So the goal really is to get into this loop somehow: get the agent to produce code that is as good as it can be. But you need to find a way to feel the pain that the agent doesn't feel, and you need to be woken up when you should be looking at something. One of the things we have done is build a Pi extension for our review needs, in which we separate out the kind of input that normally would go straight back to the agent. These are mechanical bugs, and places where it clearly violated the AGENTS.md.

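Reintroducing friction: the speakers recommend slowing down. They recommend identifying specific high-risk areas, such as database migrations or permission changes, where human judgment is essential and friction must be deliberately reintroduced into the development process.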

14:41 · But then we specifically call out the kinds of changes where the human's brain should reactivate. We don't think a database migration should ever go in without a human making a judgment call on it, because it very much depends on the locks and the size of the data in production. If there are permission changes, you had better think about them yourself rather than leave them to the agent, because they can be underdocumented.
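The split described here, between findings that can go straight back to the agent and findings that must wake up a human, can be sketched as a simple triage step. This is not the speakers' Pi extension, whose implementation is not shown in the talk. The category names and the `triage` function are assumptions chosen to mirror the examples mentioned.

```typescript
// Hypothetical finding categories, mirroring the examples named in the talk.
type FindingKind = "mechanical-bug" | "rules-violation" | "db-migration" | "permission-change";
type ReviewFinding = { kind: FindingKind; summary: string };

// Kinds that should never be fixed automatically without a human decision.
const NEEDS_HUMAN: FindingKind[] = ["db-migration", "permission-change"];

function triage(findings: ReviewFinding[]): {
  agentFixable: ReviewFinding[];
  humanCallouts: ReviewFinding[];
} {
  const agentFixable: ReviewFinding[] = [];
  const humanCallouts: ReviewFinding[] = [];
  for (const finding of findings) {
    if (NEEDS_HUMAN.indexOf(finding.kind) >= 0) {
      humanCallouts.push(finding); // stop and think: locks, data size, access
    } else {
      agentFixable.push(finding); // safe to hand back with "fix the issues"
    }
  }
  return { agentFixable: agentFixable, humanCallouts: humanCallouts };
}
```

The point of keeping the two lists separate is that the second one is deliberately not automatable: it is where the friction is put back.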

15:04 · These are just some examples where we learned that if we miss it, we regret it. And you will miss it. But these machines can help you find these spots, and then you see one and you get a little bit of a hit: oh, now I have to kick into gear and do something here. This is what it looks like in Pi.

15:22 · On the bottom you have the human call-outs. On the top you have the items that the agent would go back and act on automatically if you were to end this review and say "fix the issues," here the first two. But this is the moment where I will now go and look: is this a dependency I actually want to have in this codebase? Do I like the maintainers? Does this work for me? And we obviously like the speed.

15:51 · It is addictive, it is great, and we feel a lot of productivity. But it is so devious if you start relying on speed where you really shouldn't. So I can only encourage you to find the areas where you have the feeling that it is actually net positive. For me a lot of this is reproduction cases: when a customer reports an issue, I can have the agent reproduce it perfectly, and I have a really good starting point. Exploring different product directions is another one, as long as you commit to taking responsibility for the code it generates.

16:20 · All of this is great. On the other hand, system architecture and creating reliability in the system are things they're just not very good at, and there we really still have to go slow. There is so much mess that can appear in a codebase in so little time.

16:35 · Mario was already talking about this earlier, but we forget that we are producing months and months of technical debt in a matter of weeks, sometimes days, and it becomes so much harder to understand what's going on in the codebase. When the understanding of your own code drops, it is really, really hard, and it's also psychologically hard. I've found pieces of code that didn't work in production, and I was kind of frustrated to learn that I was the one who had committed them with the agent and just didn't really see it.

17:02 · It's a very disappointing experience when it happens and you realize that you were the one who screwed up. So it is psychologically incredibly hard to judge the state of the codebase objectively.

17:18 · The only way right now is to slow down a little bit on that front, and that means this friction. Every engineering team I've ever worked on said we need to get rid of the friction in shipping, and that is true: there's a lot of stuff that's very annoying and shouldn't be there. But if you have worked in a large enough engineering organization, SLOs are a great example of a system that is intentionally designed to put friction into the engineering process, to make you think: do I need this reliability? Do I need this criticality of the service? Am I sufficiently staffed to run it?

Conclusion

17:48 · With the agents, we have now gotten this idea that we should get rid of all of this, when in reality we need it.

17:56 · Because in many ways friction is what's necessary, on a physical level, to steer. Without friction there's no steering, and steering is really necessary. So you should attach a more positive association to this idea of friction, because this is really where your judgment is. This is where your experience is, and you should be inserting it and start feeling it.

18:19 · Thank you.

18:20 · Thank you.