Why another micro-service blog?
As ever, this is a reminder for the author, linking to primary sources and acting as a crib sheet. A lot of the content comes from online talks, conferences and so on.
If you cannot deploy intraday then tell everyone that isn’t good enough. Out of hours and weekend releases are a measure of silliness.
- Scale your teams in a way you cannot with monoliths. Each team only needs to know the inputs and outputs of a small service.
- React on Kafka - do not use service discovery, do not use point-to-point calls. Use Kafka.
- Docker and cloud or nothing. If you are not using Docker then don’t bother. Unless you need to do a proof of concept - in which case mock up a tiny elastic-capacity system using scripts. But that’s it: don’t build a micro-service runtime, as Docker is the way to go.
- Secrets and config maps have to be used from day one.
- How big? Each service should be no more than 600 to 800 lines, and should be rewritable in about a week by a team of about 3.
- Don’t use schema repositories, Avro, Thrift or Protobuf. You are so old that what you say makes young people laugh. Use Json.
- Make services self-descriptive - they should describe the topics they listen to, and the topics they publish to. They should be able to describe themselves in language that a team could use to rewrite the service. Note this can be a readme.md, or it can be code, or it can even be a REST endpoint anyone can query.
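To make this concrete, here is a minimal sketch of a self-description, assuming a hypothetical service and made-up topic names - the same Json could live behind a REST endpoint or be dumped into a readme:

```python
# Hypothetical sketch: a service publishes its own description so any team
# could rewrite it. The service name and topic names are illustrative.
import json

def describe() -> str:
    """Return a machine-readable self-description of this service."""
    description = {
        "service": "price-enricher-v2",       # service name carries a version
        "consumes": ["prices-raw-v1"],        # topics this service listens to
        "publishes": ["prices-enriched-v2"],  # topics this service writes to
        "summary": "Joins raw prices with reference data and republishes.",
    }
    return json.dumps(description)

print(describe())
```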
- Each topic should contain only one message type.
- Each message should name the strongly typed class which serialises to Json, e.g. add a field called dataClass which names the class.
- Each message and topic name should have a version number.
- Each service name should have a version number.
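The three points above can be sketched in one envelope. This is an illustration with made-up class and field names: the dataClass field names the typed class, and the version lives in the class name (and would live in the topic name too):

```python
# Sketch of a versioned, self-naming Json envelope. TradeBookedV1 and the
# field names are hypothetical examples, not a prescribed schema.
import json
import dataclasses

@dataclasses.dataclass
class TradeBookedV1:
    tradeId: str
    amount: float

def to_event(payload) -> str:
    """Wrap a typed object in an envelope that names its own class."""
    return json.dumps({
        "dataClass": type(payload).__name__,   # e.g. "TradeBookedV1"
        "payload": dataclasses.asdict(payload),
    })

event = to_event(TradeBookedV1(tradeId="T-1", amount=100.0))
print(event)
```

A consumer reads dataClass first, then deserialises the payload into the matching typed class.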
- Each service should be able to consume test (canary) flow or prod flow. Test flow should exist on separate Kafka topics. The words test and prod are overloaded, as is canary, so let’s call these Beta and Live.
- Every service should listen to a command channel which breaks all the above rules. Many message types can be sent down it, and the topic doesn’t have to be versioned. The command channel can turn service versions on and off, switch from Beta to Live flow and so on. Whatever you need. Kubernetes should actually do most of the above for you, but defining a command channel at the start opens up features/abilities for devops via Kafka.
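A sketch of what a command-channel handler might look like - the command names (SwitchFlow, SetVersion) and the state fields are hypothetical, the point is that many message types share one unversioned topic:

```python
# Hypothetical command-channel dispatcher: the one topic that carries many
# message types. Command and field names are illustrative.
import json

def handle_command(raw: str, state: dict) -> dict:
    """Apply a command-channel message to in-memory service state."""
    cmd = json.loads(raw)
    if cmd["dataClass"] == "SwitchFlow":
        state["flow"] = cmd["flow"]        # e.g. switch "Beta" <-> "Live"
    elif cmd["dataClass"] == "SetVersion":
        state["active"] = cmd["enabled"]   # turn a service version on/off
    return state

state = {"flow": "Live", "active": True}
state = handle_command('{"dataClass": "SwitchFlow", "flow": "Beta"}', state)
print(state["flow"])  # Beta
```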
- DevOps tax and information radiators. Your devs need to be able to see the latencies on the system results, and deviations from normality for this time of day/week/month. They need to be able to search for services, see logs, see exceptions, see which versions are live and when they were deployed, by whom and why. Understand that your micro-services are a monolith broken into a murder mystery. Create that level of monitoring with ELK, Grafana, Prometheus or something else.
- Learn reactive streams, and argue about it. Do you want orchestration services or stream joins? Think about prod issues and which you would prefer to diagnose.
- Cloud native. If it can’t move to the cloud (eventually) then don’t bother.
- Cloud native - don’t use any vendor product that isn’t cloud native.
- Kubernetes, Mesos/Marathon, DC/OS… It’s clear that orchestration of pods is still an evolving landscape, so don’t get hooked on your chosen solution; it’s likely to change.
- Don’t write frameworks. If you write a wrapper for secrets, or a wrapper for logging, or a wrapper for threads or whatever, please leave. Let all the devs have access to the APIs. This way you can scale your teams, as new devs don’t have to learn your mental model; they can use the standards.
- Don’t use patterns. No bridges, facades, factory of factories.
- Don’t have a common lib. Or if you do keep it tiny. Let every service use the raw api and copy/paste common code from the web. Code reuse in libs for micro-services is an anti-pattern. Let the devs use the native APIs badly, rather than have to use your API which you think is perfect, but which I assure you is not. It slows down people who are joining your team.
- Each service should consist of: the service itself, and a Client project holding the typed objects which form the Json events it publishes. These should be versioned, and they should be published to a topic with the same name. Add a util project for system-testing your service - an upstream and a downstream client. Run this automatically as part of your CI.
- Sunset windows - every time you upgrade your service you must sadly preserve the old versions of published events.
- Topics per version, really? You may have instances where you just use the same topic, and send an array of the versions of the messages. This may result in a system which is easier to deploy and maintain. i.e. multiple versions of the same message in one payload may actually be fine.
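A sketch of the multi-version payload idea - the message names (QuoteV1, QuoteV2) and fields are hypothetical. Each event carries every supported version, and a consumer picks the newest one it understands:

```python
# Illustrative only: one topic, one payload, several versions of the same
# message. The versions list is ordered oldest -> newest.
import json

event = json.dumps({
    "versions": [
        {"dataClass": "QuoteV1", "price": 1.08},
        {"dataClass": "QuoteV2", "price": 1.08, "currency": "EURUSD"},
    ]
})

def pick(raw: str, known: set) -> dict:
    """Return the newest version of the message this consumer understands."""
    candidates = [v for v in json.loads(raw)["versions"]
                  if v["dataClass"] in known]
    return candidates[-1]          # last match is the newest known version

print(pick(event, {"QuoteV1"})["dataClass"])  # QuoteV1
```

An old consumer keeps working on QuoteV1 while upgraded consumers read QuoteV2, with nothing to deploy in lockstep.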
- Ensure your devops can highlight services which are consuming old event versions, so you can schedule their replacement. Ensure any end to end flow can report the services and versions which contributed.
- Do not use Kafka for big data payloads - sending large messages through lots of services is dim. Events should be small: have a large data store on hand and pass references, which are enriched through the different services prior to aggregation and streaming out to clients. Kafka is not a database. Sending large payloads via multiple hops is silly.
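This is the claim-check pattern. A minimal sketch, with an in-memory dict standing in for the real blob store (S3, Cassandra, whatever) - only the small reference would travel through Kafka:

```python
# Claim-check sketch: large payloads live in a store; events carry only a
# reference. The dict is a stand-in for a real external data store.
import uuid

blob_store = {}

def check_in(data: bytes) -> str:
    """Store a large payload; return the small reference to put in the event."""
    ref = str(uuid.uuid4())
    blob_store[ref] = data
    return ref

def check_out(ref: str) -> bytes:
    """Fetch the payload when a downstream service actually needs it."""
    return blob_store[ref]

ref = check_in(b"x" * 10_000_000)   # 10 MB stays out of the event stream
assert check_out(ref) == b"x" * 10_000_000
```

Intermediate services enrich and forward the reference; only the final aggregation step pays the cost of fetching the data.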
- Kafka was designed for thousands of millisecond-scale events processed in batches. Your use case is business-critical single events which take minutes to process, which means you must handle threads (Akka), you must ack carefully, you must recover carefully, you must have an external store for large results, and you must deal with async events arriving before the data, and so on. Use your brain, think about the red path. Don’t ever stop the poll loop, otherwise the broker will assume your consumer is dead and rebalance the group.
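One common shape for this is to keep the poll loop fast by handing slow work to a worker thread. A sketch using a plain queue as a stand-in for the real consumer machinery - the point is the loop that polls only enqueues and never blocks:

```python
# Sketch: the "poll loop" stays responsive by only enqueueing; a worker
# thread does the minutes-long processing. queue.Queue stands in for the
# hand-off between the Kafka consumer thread and your workers.
import queue
import threading

work = queue.Queue()
results = []

def worker():
    while True:
        msg = work.get()
        if msg is None:          # sentinel: shut the worker down
            break
        results.append(msg.upper())   # stand-in for slow processing
        work.task_done()

t = threading.Thread(target=worker)
t.start()
for msg in ["a", "b", "c"]:      # the poll loop: fast, never does the work
    work.put(msg)
work.join()                      # wait for all handed-off work to finish
work.put(None)
t.join()
print(results)
```

In the real thing you also have to pause/resume partitions and commit offsets only once the worker has finished - that is the careful acking the bullet above is warning about.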
- Each service does one logical thing. Yes that means lots of services. Yes.
- Each service CAN process multiple ‘one things’ in parallel - use correlationIds, and embed the reply topic in the message. You can do RPC calls to a cloud of micro-services using events. It doesn’t have to be done as point-to-point.
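A sketch of RPC-over-events with hypothetical field and topic names: the correlationId pairs the reply with its request, and the reply topic travels inside the message so any instance can answer:

```python
# Illustrative request/reply envelopes for RPC over events. Field names
# (correlationId, replyTo) and topic names are examples, not a standard.
import json
import uuid

def make_request(reply_topic: str, body: dict) -> str:
    """Build a request event carrying its own reply address and id."""
    return json.dumps({
        "correlationId": str(uuid.uuid4()),
        "replyTo": reply_topic,
        "body": body,
    })

def make_reply(request_raw: str, result: dict):
    """Build the reply; return (topic to publish to, reply event)."""
    req = json.loads(request_raw)
    reply = json.dumps({"correlationId": req["correlationId"], "body": result})
    return req["replyTo"], reply

topic, reply = make_reply(make_request("quotes-reply-v1", {"q": "EURUSD"}),
                          {"price": 1.08})
print(topic)  # quotes-reply-v1
```

The caller matches replies to outstanding requests by correlationId, so many calls can be in flight at once on the same topics.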
- Each message should have an internal audit log showing which services processed it, together with entry and exit times. If there are parent and child correlation IDs then the first correlation ID should always be included and logged, along with the current child id. This allows you to easily find the entire path for a client query with nothing fancy in your DevOps stack.
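A sketch of that in-message audit trail - service names and field names are illustrative. Each hop appends itself; the root correlationId rides along untouched:

```python
# Hypothetical audit-trail helper: every service stamps the message it
# forwards. Field and service names are examples.
import json

def stamp(raw: str, service: str, entry: float, exit_: float) -> str:
    """Append this service's hop to the message's internal audit trail."""
    msg = json.loads(raw)
    msg.setdefault("audit", []).append(
        {"service": service, "entry": entry, "exit": exit_})
    # the root correlationId is carried unchanged through every hop
    return json.dumps(msg)

msg = json.dumps({"correlationId": "root-1", "body": {}})
msg = stamp(msg, "enricher-v1", 1.0, 2.0)
msg = stamp(msg, "aggregator-v1", 2.1, 3.5)
print(json.loads(msg)["audit"])
```

Grep any downstream log for root-1 and the full path, with per-hop latencies, falls out of the message itself.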
- Stateless. Well, some folks don’t get it, tell them it means I can turn them all off, then start them all again. The system just works. Also, I can add another x instances. And it still works.
- Idempotent. Like stateless. If the same events get replayed due to restart then it doesn’t matter. No matter how many times the events arrive and fail, arrive and succeed, whatever, then the system works and the final result is correct.
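Idempotency can be as dumb as remembering what you have already processed. A sketch, with a set standing in for a durable dedupe store - replaying the same event any number of times leaves the result correct:

```python
# Idempotent-consumer sketch: processed message IDs are remembered so
# replays after a restart change nothing. The set is a stand-in for a
# durable store keyed by message id.
seen = set()
total = 0

def apply(msg_id: str, amount: int) -> None:
    """Apply an event at most once, however many times it is delivered."""
    global total
    if msg_id in seen:        # replayed event: ignore, result stays correct
        return
    seen.add(msg_id)
    total += amount

for _ in range(3):            # the same event delivered three times
    apply("m-1", 10)
apply("m-2", 5)
print(total)  # 15
```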
- It isn’t hard if you don’t want it to be. Circuit breakers and all that. Yes, great, you can, and maybe you should. But you don’t have to, so, maybe, well, just don’t. Same with versioning services. It’s hard to do: sunset windows, upgraded packets in topics and so on. So, rather than let this stop you, just don’t worry about it for now.
- I have changed a monolith to micro-services. I still believe in it. This time it is Kafka partitions which is making the difference. The murder mystery prod issue is a pain though.
- If your company cannot go cloud native for prod, then convince them to use it for build and test (CI, CD). You will learn enough so when the CTO is replaced and the new one says OK you will be ready to go.
- Data models. Ensure that none of your client-facing rest Json is used for any of your internal comms, i.e. decouple client changes from internal changes. If you have an external big data store then have only one place update it, a bit like CQRS. Only append to the data store. The biggest mistake you can make is to couple your services together with a shared data model. All data models used for comms should be separate from data models used for storage. All storage and comms data models should be separate from client-facing data models. Trust me.
- Is it a closely coupled ‘loosely coupled’ system? i.e. if you deploy one thing, do you have to deploy everything? If so, you have failed.
Now for some links:
12 Factor apps https://12factor.net/
And some buzz words for you to read about: Docker, Kubernetes, Amazon Web Services, Microsoft Azure, Google Cloud, Cassandra, Kafka, Zookeeper, ELK, Akka, Lagom