There are several approaches to designing a logging architecture.
The best option obviously depends on the specific requirements.
In this blog post I won't tell you which logging architecture is best; instead, I will give you an overview of the options used in the industry.
Summary of approaches
Native Option
The first approach is to go all-in with AWS, as Ancestry did (https://www.youtube.com/watch?v=igcnes0PI10). With this option you will end up using Kinesis for data streaming, which means you will probably also have Lambda functions (mapped to shards) that pass the results to another Kinesis stream or store them in Elasticsearch or S3.
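To make the Kinesis-to-Lambda step more concrete, here is a minimal Python sketch of a Lambda function triggered by a Kinesis stream. It assumes every record carries a UTF-8 JSON log line; Lambda delivers Kinesis payloads base64-encoded under `Records[].kinesis.data`. The `make_record` helper and the sample log fields are hypothetical, and forwarding the results to another stream, Elasticsearch or S3 is deliberately left out.

```python
import base64
import json


def handler(event, context):
    """Decode the batch of Kinesis records that Lambda passes in one invocation.

    Assumption: each record's payload is a UTF-8 encoded JSON log line.
    """
    parsed = []
    for record in event["Records"]:
        kinesis = record["kinesis"]
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = base64.b64decode(kinesis["data"]).decode("utf-8")
        log = json.loads(payload)
        # Keep the partition key so downstream consumers can group by source.
        log["partitionKey"] = kinesis["partitionKey"]
        parsed.append(log)
    # Forwarding to another stream, Elasticsearch or S3 is omitted in this sketch;
    # a real function would do it with the AWS SDK or an HTTP client.
    return parsed


def make_record(log, partition_key):
    """Hypothetical test helper: build a simplified Kinesis record as Lambda sees it."""
    data = base64.b64encode(json.dumps(log).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data, "partitionKey": partition_key}}


if __name__ == "__main__":
    event = {"Records": [make_record({"level": "ERROR", "msg": "timeout"}, "checkout")]}
    print(handler(event, None))
```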
Since this approach is AWS-based, API Gateway is needed to expose the Lambda functions as services.
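As a rough illustration, the sketch below shows a Lambda function written for API Gateway's Lambda proxy integration, serving a hypothetical `GET /logs?level=ERROR` query. The in-memory `LOGS` list is an assumption standing in for a real store such as Elasticsearch; the return value follows the proxy integration contract of `statusCode`, `headers` and a string `body`.

```python
import json

# Hypothetical in-memory log store standing in for Elasticsearch or S3.
LOGS = [
    {"service": "checkout", "level": "ERROR", "msg": "payment timeout"},
    {"service": "checkout", "level": "INFO", "msg": "order placed"},
    {"service": "search", "level": "ERROR", "msg": "index unavailable"},
]


def handler(event, context):
    """Lambda proxy integration handler for a query like GET /logs?level=ERROR."""
    # API Gateway sets queryStringParameters to None when there is no query string.
    params = event.get("queryStringParameters") or {}
    level = params.get("level")
    results = [log for log in LOGS if level is None or log["level"] == level]
    # With proxy integration the function returns statusCode/headers/body,
    # and body must be a string, so the results are serialized to JSON.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(results),
    }
```

Keeping the handler free of AWS SDK calls makes it easy to test locally by passing a hand-built event dictionary.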
Another approach is to build the pipeline on open-source components, which is what companies like Pinterest do (https://www.youtube.com/watch?v=DphnpWVYeG8). With this option, Kafka is usually used for data streaming, as depicted in the image below.
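The core Kafka idea behind this option, a topic split into append-only partitions addressed by offsets, can be modeled in a few lines. This is a toy in-memory sketch, not a Kafka client: the `Topic` class is hypothetical, and CRC32 is used as a stand-in hash where Kafka's default partitioner actually uses murmur2.

```python
import zlib


class Topic:
    """Toy model of a Kafka topic: each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash: Kafka's default partitioner uses murmur2, not CRC32.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record to its key's partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def consume(self, partition, offset):
        """Return every record in a partition from the given offset onward."""
        return self.partitions[partition][offset:]
```

Because records with the same key always land in the same partition, their relative order is preserved, which is why keying log events by service or host is a common choice.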
For data processing there are a few options, such as Spark or Storm (TODO: add sample), which read from Kafka's partitions, as is done at Airbnb (link here).
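As a simplified model of that processing step, the sketch below runs one task per Kafka partition and then merges the partial results in a reduce step. Spark's Kafka integration maps Kafka partitions 1:1 to Spark partitions by default, so this mirrors the shape of such a job, but it is plain Python: threads stand in for Spark executors, and the `PARTITIONS` data and the "level first" log format are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of a "logs" topic with three partitions.
PARTITIONS = [
    ["ERROR payment timeout", "INFO order placed"],
    ["WARN slow query", "ERROR index unavailable"],
    ["INFO login ok"],
]


def count_levels(partition):
    """Map step: count the lines of one partition by their leading level token."""
    return Counter(line.split(" ", 1)[0] for line in partition)


def process(partitions):
    """Run one task per partition, then merge the partial counts (reduce step)."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_levels, partitions))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


if __name__ == "__main__":
    print(process(PARTITIONS))
```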
The following table summarizes the approaches (no duplicate entries).
Kinesis vs. Kafka: https://medium.com/faun/apache-kafka-vs-apache-kinesis-57a3d585ef78
Sample Spark job: https://spark.apache.org/examples.html
Sample ElasticSearch with Java: https://www.baeldung.com/elasticsearch-java
Sample HBase with Java: https://www.baeldung.com/hbase
Sample streaming with Kafka and Spark: https://www.baeldung.com/kafka-spark-data-pipeline
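Related to the Elasticsearch sample above, logs are usually indexed in batches through Elasticsearch's `_bulk` API, whose body is newline-delimited JSON: an action line before each document, with a trailing newline at the end. The following sketch only builds that body; the index name `logs-2020.09.04` and the log fields are hypothetical, and sending it (a `POST` to `/_bulk` with `Content-Type: application/x-ndjson`) is left to whatever client you use.

```python
import json


def bulk_body(index, logs):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document is preceded by an "index" action line, and the whole
    body must end with a newline.
    """
    lines = []
    for log in logs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(log))
    return "\n".join(lines) + "\n"
```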
Hive is data warehouse software, and HBase is a column-oriented database.
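To illustrate what "column-oriented" means for HBase, here is a toy in-memory model of its data model, not an HBase client: a row key maps to `family:qualifier` columns, each column holds timestamped versions, and rows are kept sorted by key. The `ToyHBaseTable` class and the `service#timestamp` row-key scheme are hypothetical, chosen so that one service's logs can be read with a prefix scan.

```python
class ToyHBaseTable:
    """Toy model of HBase's data model (not an HBase client)."""

    def __init__(self):
        # row key -> column ("family:qualifier") -> {timestamp: value}
        self.rows = {}

    def put(self, row, column, value, ts):
        """Store a new version of a cell under the given timestamp."""
        self.rows.setdefault(row, {}).setdefault(column, {})[ts] = value

    def get(self, row, column):
        """Return the newest version of a cell, as HBase does by default."""
        versions = self.rows.get(row, {}).get(column, {})
        return versions[max(versions)] if versions else None

    def scan_prefix(self, prefix):
        """Return the row keys that start with prefix, in sorted order."""
        return [row for row in sorted(self.rows) if row.startswith(prefix)]
```

Putting the service name first in the row key keeps one service's log rows next to each other, so reading them is a short range scan instead of a full table scan.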
Other options to review later
Generic
https://eng.uber.com/distributed-tracing/
Streaming
Flexible Log Parsing with Real-Time Application
Originally published at http://jacace.wordpress.com on September 4, 2020.