Develop. Test. Observe: Software Engineering at Google by Fergus Henderson

February 14, 2019


One Google employee has put together, for anyone interested, a short summary of how development works inside the company.
You won't find any exclusive information in it; all of this was already out there on the internet.
I reread the document, and everything below is just my notes in the margins: I can see practices born inside the industry giants gradually spreading across the planet, and it will be interesting to reread this myself later.

Software development

The Source Repository

Write access to the repository is controlled: only the listed owners of each subtree of the repository can approve changes to that subtree.

I'm very curious who decides, and when, who owns a given part of the tree. And why does that person stay competent in that part of the tree over a long period of time?
How does changing an owner work?

Each subtree is required to have at least two owners, although typically there are more, especially in geographically distributed teams.
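
A minimal sketch of how such an ownership check might look. The paper does not describe the actual mechanism, so everything here is an assumption for illustration: the `OWNERS` table, the user names, and the rule that the nearest enclosing subtree with an entry decides who may approve.

```python
from pathlib import PurePosixPath

# Hypothetical ownership table: subtree path -> owner usernames.
OWNERS = {
    "search": {"alice", "bob"},
    "search/indexing": {"carol", "dave"},
    "ads": {"erin", "frank"},
}


def owners_for(path: str) -> set[str]:
    """Return the owners of the nearest enclosing subtree that has an entry.

    Assumption: the closest listed subtree wins; the paper does not say
    how nested ownership is resolved.
    """
    p = PurePosixPath(path)
    for candidate in [p, *p.parents]:
        owners = OWNERS.get(str(candidate))
        if owners:
            return owners
    return set()


def can_approve(user: str, path: str) -> bool:
    """True if `user` is a listed owner for the subtree containing `path`."""
    return user in owners_for(path)


def undersized_subtrees(min_owners: int = 2) -> list[str]:
    """List subtrees that violate the 'at least two owners' rule."""
    return sorted(s for s, o in OWNERS.items() if len(o) < min_owners)
```

With this table, `can_approve("carol", "search/indexing/crawler.py")` is true, while `alice` can only approve changes elsewhere under `search`.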

Most larger teams also have a “build cop” who is responsible for ensuring that the tests continue to pass at head, by working with the authors of the offending changes to quickly fix any problems or to roll back the offending change.

An interesting role. It brings to mind flaky tests and other unpleasant but entirely realistic things.

The Build System

The build system’s implementation uses Google’s distributed computing infrastructure.

Enforcing that all dependencies be correctly declared is a consequence of distributing the build: only the declared inputs are sent to the machine on which the build step is run.

The build system tracks dependencies on changes to the build rules themselves, and knows to rebuild targets if the action to produce them changed, even if the inputs to that action didn’t, for example when only the compiler options changed.
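
The idea in the quote above can be sketched as a digest over both the command line and the declared inputs, so that a changed compiler flag invalidates the target even when the sources are untouched. This is only an illustration of the principle, not Google's build system; `action_digest`, `needs_rebuild`, and the in-memory `cache` dict are hypothetical names.

```python
import hashlib


def action_digest(command: list[str], inputs: dict[str, bytes]) -> str:
    """Hash the action itself (command line) together with its declared inputs."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    h.update(b"\1")  # separator between the command and the inputs
    for name in sorted(inputs):
        h.update(name.encode() + b"\0")
        h.update(hashlib.sha256(inputs[name]).digest())
    return h.hexdigest()


def needs_rebuild(target: str, command: list[str],
                  inputs: dict[str, bytes], cache: dict[str, str]) -> bool:
    """Return True and record the new digest if the action or its inputs changed."""
    digest = action_digest(command, inputs)
    if cache.get(target) == digest:
        return False
    cache[target] = digest
    return True
```

Calling `needs_rebuild` twice with the same arguments reports a rebuild only the first time; switching `-O0` to `-O2` in `command` reports a rebuild again although `inputs` is identical.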

The build system stays resident in memory so that for rebuilds it can incrementally analyze just the files that have changed since the last build.

The tests can be either synchronous, i.e. run before sending the change for review and/or before committing the change to the repository (good for fast-running tests); or asynchronous, with the results emailed to the review discussion thread.

More than once, in conversations about monorepos, I've explained to people that dumping all the code into one repo also means building the infrastructure and a whole set of tools around that repo.
The asynchronously run tests are especially amusing: do they have to finish quickly enough that the reviewer doesn't approve the change before the results arrive, or is there a very real chance of letting a breakage through?

Code Review

Google has tools for automatically suggesting reviewer(s) for a given change, by looking at the ownership and authorship of the code being modified, the history of recent reviewers, and the number of pending code reviews for each potential reviewer.
I've long been thinking about assigning the authors of the original code as reviewers myself, but I have a number of questions about how to implement it properly. We set up balancing of code reviews by reviewer workload a long time ago, and it works.
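
A rough sketch of how the four signals from the quote (ownership, authorship, recent review history, pending review load) could be combined into one ranking. The weights, the caps, and the `Candidate` fields are invented for illustration; the paper does not describe how Google's tool actually scores reviewers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    is_owner: bool        # listed owner of the touched subtree
    authored_lines: int   # touched lines last written by this person
    recent_reviews: int   # recent reviews done in this area
    pending_reviews: int  # reviews currently waiting on this person


def score(c: Candidate) -> float:
    """Higher is better. All weights are arbitrary assumptions."""
    return (
        (3.0 if c.is_owner else 0.0)
        + min(c.authored_lines, 100) / 50
        + 0.5 * min(c.recent_reviews, 5)
        - 1.0 * c.pending_reviews   # load balancing: busy reviewers sink
    )


def suggest(candidates: list[Candidate], author: str, k: int = 1) -> list[str]:
    """Return up to k reviewer names, never the change author."""
    pool = [c for c in candidates if c.name != author]
    pool.sort(key=lambda c: (-score(c), c.name))
    return [c.name for c in pool[:k]]
```

The penalty for pending reviews is what provides the workload balancing: an owner who wrote most of the code but already has six reviews in the queue can rank below a less familiar but free colleague.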

In addition to the main section of the repository, there is an “experimental” section of the repository where the normal code review requirements are not enforced.
People need to be given a sandbox, otherwise they'll make one for themselves, only this time on GitHub.

One way in which keeping changes small is encouraged is that the code review tools label each code review with a description of the size of the change, with changes of 30-99 lines added/deleted/removed being labelled “medium-size” and with changes of above 300 lines being labelled with increasingly disparaging labels, e.g. “large” (300-999), “freakin huge” (1000-1999), etc.
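
The labelling described above is simple enough to sketch. Only the three ranges named in the quote are encoded; the labels for the remaining sizes are not given in the source, so the hypothetical `size_label` function returns `None` for them rather than guessing.

```python
from typing import Optional

# Ranges and labels exactly as quoted; other ranges are unknown.
SOURCE_LABELS = [
    (30, 99, "medium-size"),
    (300, 999, "large"),
    (1000, 1999, "freakin huge"),
]


def size_label(lines_changed: int) -> Optional[str]:
    """Return the label for a change of the given size, if the source names one."""
    for low, high, label in SOURCE_LABELS:
        if low <= lines_changed <= high:
            return label
    return None  # size range not described in the quote
```

For example, `size_label(450)` gives `"large"`, and `size_label(150)` gives `None` because the quote does not say how that range is labelled.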

Yes, yes, and yes again. I'd also like to make it so that large and very large changes either can't be committed at all or at least don't have to be reviewed, because reviewing them is pointless.

Programming languages

Commonality of process is a key to making development easy even with an enormous code base and a diversity of languages: there is a single set of commands to perform all the usual software engineering tasks (such as check out, edit, build, test, review, commit, file bug report, etc.) and the same commands can be used no matter what project or language. Developers don’t need to learn a new development process just because the code that they are editing happens to be part of a different project or written in a different language.

This cross-language unification is really quite undemocratic, but even at the scale of a mid-sized company it is badly needed. In a zoo like this (officially-approved programming languages at Google: C++, Java, Python, Go, or JavaScript) there have to be common rules for all the animals.

Launch approval

The launch of any user-visible change or significant design change requires approvals from a number of people outside of the core engineering team that implements the change. In particular approvals (often subject to detailed review) are required to ensure that code complies with legal requirements, privacy requirements, security requirements, reliability requirements (e.g. having appropriate automatic monitoring to detect server outages and automatically notify the appropriate engineers), business requirements, and so forth.
This is especially relevant in the context of things like GDPR, the Yarovaya laws, large-scale leaks of user databases, and so on.
And yes, there's no whiff of startup culture here anymore :).

Post-mortems

Whenever there is a significant outage of any of our production systems, or similar mishap, the people involved are required to write a post-mortem document.
The impact section tries to quantify the effect of the incident, in terms of duration of outage, number of lost queries (or failed RPCs, etc.), and revenue.
Having run retrospectives from time to time, I'm convinced that instead of a meeting you can use a templated document with fairly strict criteria for its review (maybe I'll even write about this separately).

Frequent rewrites

Most software at Google gets rewritten every few years.
This may seem incredibly costly. Indeed, it does consume a large fraction of Google’s resources. However, it also has some crucial benefits that are key to Google’s agility and long-term success. In a period of a few years, it is typical for the requirements for a product to change significantly, as the software environment and other technology around it change, and as changes in technology or in the marketplace affect user needs, desires, and expectations. Software that is a few years old was designed around an older set of requirements and is typically not designed in a way that is optimal for current requirements. Furthermore, it has typically accumulated a lot of complexity. Rewriting code cuts away all the unnecessary accumulated complexity that was addressing requirements which are no longer so important. In addition, rewriting code is a way of transferring knowledge and a sense of ownership to newer team members. This sense of ownership is crucial for productivity: engineers naturally put more effort into developing features and fixing problems in code that they feel is “theirs”. Frequent rewrites also encourage mobility of engineers between different projects which helps to encourage cross-pollination of ideas. Frequent rewrites also help to ensure that code is written using modern technology and methodology.

I've quoted this section in full because I think it is genuinely important.
For me personally, understanding this idea was a real paradigm shift.
Plenty of companies waste enormous effort trying to write software that, if not built to last for ages, will at least need minimal maintenance.
That is wrong. It will need maintenance anyway, and what the customer wants will keep changing.
You will be making changes regardless.
What's more, quite often you can't even imagine which changes you'll be making and in what context.

Frequent rewrites also help to ensure that code is written using modern technology and methodology.
It's better to keep rewriting software than to write it once in COBOL and, 30 years later, do a complete redesign combined with reverse engineering just to get rid of legacy that nobody maintains. And even for that you have to find a COBOL programmer and wake them from cryosleep, which is damn expensive!

This sense of ownership is crucial for productivity: engineers naturally put more effort into developing features and fixing problems in code that they feel is “theirs”.
The harsh truth about why almost nobody likes fixing bugs in someone else's code.

Project management

20% time

Secondly, it provides management with visibility into activity that might otherwise be hidden. In other companies that don’t have an official policy of allowing 20% time, engineers sometimes work on “skunkwork” projects without informing management. It’s much better if engineers can be open about such projects, describing their work on such projects in their regular status updates, even in cases where their management may not agree on the value of the project.
"Better we tell them than they find out on the street" (c) an old joke.
Which do you prefer: developers sitting in the office staring at YouTube, or at least a slim chance that they'll write something useful?

Objectives and Key Results (OKRs)

OKRs provide a key mechanism for communicating what each part of the company is working on, and for encouraging good performance from employees via social incentives... engineers know that their team will have a meeting where the OKRs will be scored, and have a natural drive to try to score well, even though OKRs have no direct impact on performance appraisals or compensation. Defining key results that are objective and measurable helps ensure that this human drive to perform well is channelled to doing things that have real concrete measurable impact on progress towards shared objectives.

The main thing is that this is not for deciding how much of a raise to give you or whom to fire. It's for understanding where this whole thing is heading. Once you have even 100 developers across ten or so teams, that's no longer obvious at all.

Project approval

Although there is a well-defined process for launch approvals, Google does not have a well-defined process for project approval or cancellation. Despite having been at Google for over 10 years, and now having become a manager myself, I still don’t fully understand how such decisions are made.
Whoa! Even at Google, it's unclear where projects come from. 🙂
By the way, talking to colleagues, I see the same thing in a lot of places.
Someone prototyped something, people conferred in the smoking room, cornered the CTO, and the project started. It's good if at least some artifacts from the start of the project remain in the tracker. Most often there are none, yet it is precisely at the start of a project that all the most critical decisions, the ones that become expensive later, are made.
To sum up: the origins of projects in modern organizations are rarely clear and understandable.

Corporate reorganizations

In a large, technology-driven organization, somewhat frequent reorganization may be necessary to avoid organizational inefficiencies as the technology and requirements change.
All that's left is to figure out how to do this without howls of protest from the humans.

Facilities

Employees are assigned an individual seat, but seats are re-assigned fairly frequently (e.g. every 6-12 months, often as a consequence of the organization expanding), with seating chosen by managers to facilitate and encourage communication, which is always easier between adjacent or nearly adjacent individuals.
It's just like playing homeroom teacher and arranging the seating chart for the class.

Training

In addition, each Noogler is usually appointed an official “Mentor” and a separate “Buddy” to help get them up to speed.
I've long thought that one person on a task really isn't enough. You always need a "control troll" who will troll the assignee at the right moment.

Performance appraisal and rewards

Feedback is strongly encouraged at Google. Engineers can give each other explicit positive feedback via “peer bonuses” and “kudos”.
I'd really like to try a scheme like this, especially the part with anonymously distributing a bonus, however modest, across the team. That is, give each and every person the ability to distribute the bonus pool, and then see who ends up with the most money.

Google has a very careful and detailed promotion process, which involves nomination by self or manager, self-review, peer reviews, manager appraisals; the actual decisions are then made by promotion committees based on that input, and the results can be subject to further review by promotion appeals committees. Ensuring that the right people get promoted is critical to maintaining the right incentives for employees.
No comment. I'd like more details.

Manager performance is assessed with feedback surveys; every employee is asked to fill in a survey about the performance of their manager twice a year, and the results are anonymized and aggregated and then made available to managers. This kind of upward feedback is very important for maintaining and improving the quality of management throughout the organization.

This is very much the right approach. Last year I felt firsthand the results of an anonymous survey of my own subordinates. A very good practice.

P.S. The section on testing contains nothing interesting at all.
