Stock Market Real-Time Data Analysis — Kafka + AWS Pipeline

A real-time data streaming pipeline that simulates live stock market data, streams it through Apache Kafka on AWS EC2, and stores the results in Amazon S3 for downstream analysis with AWS Glue and Athena.

Architecture

Stock Data (CSV) → Kafka Producer (EC2) → Kafka Topic → Kafka Consumer (EC2) → S3 → Glue Crawler → Athena

Tech Stack

| Layer | Tool |
| --- | --- |
| Streaming | Apache Kafka 3.7 |
| Cluster Coordination | Apache ZooKeeper |
| Compute | AWS EC2 |
| Storage | Amazon S3 |
| Cataloging | AWS Glue Crawler |
| Querying | Amazon Athena |
| Language | Python |

How It Works

Producer (`KafkaProducer.ipynb`)

- Reads historical stock index data from `indexProcessed.csv`
- Randomly samples one row per second to simulate a live market feed
- Serializes each record as JSON and publishes it to a Kafka topic on EC2

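# df holds the rows of indexProcessed.csv; producer serializes each value as JSON
while True:
    # Pick one random row and convert it to a plain dict
    dict_stock = df.sample(1).to_dict(orient="records")[0]
    producer.send('demo_test', value=dict_stock)
    sleep(1)  # one record per second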
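
The loop above relies on a `df` and a `producer` defined elsewhere in the notebook. The following is a minimal sketch of how those pieces might fit together, assuming the `kafka-python` and `pandas` packages. The helper names (`serialize`, `load_rows`, `make_producer`, `stream_samples`, `run`) are illustrative and do not come from the notebook, and `random.choice` over a list of rows stands in for `df.sample(1)`.

```python
import json
import random
from time import sleep


def serialize(record):
    """Encode one record as UTF-8 JSON bytes, the form sent to Kafka."""
    return json.dumps(record).encode("utf-8")


def load_rows(csv_path):
    """Load the CSV as a list of dicts, one per row."""
    import pandas as pd  # imported lazily; only needed when reading the CSV
    return pd.read_csv(csv_path).to_dict(orient="records")


def make_producer(bootstrap_server):
    """Build a kafka-python producer that JSON-encodes every value."""
    from kafka import KafkaProducer  # requires the kafka-python package
    return KafkaProducer(bootstrap_servers=[bootstrap_server],
                         value_serializer=serialize)


def stream_samples(rows, send, count=None, delay=1.0):
    """Pass one randomly chosen row to `send` every `delay` seconds.

    `send` is any callable taking a record; `count=None` runs forever.
    Returns the number of records sent.
    """
    sent = 0
    while count is None or sent < count:
        # Independent draw each time, so rows may repeat (as with df.sample(1))
        send(random.choice(rows))
        sent += 1
        sleep(delay)
    return sent


def run(bootstrap_server, csv_path="indexProcessed.csv", topic="demo_test"):
    """Wire the pieces together; blocks until interrupted."""
    producer = make_producer(bootstrap_server)
    rows = load_rows(csv_path)
    stream_samples(rows, lambda record: producer.send(topic, value=record))
```

Calling `run("<EC2-IP>:9092")` with the broker's address would start the feed. Passing `send` as a callable keeps the sampling loop testable without a running broker.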

Consumer (`KafkaConsumer.ipynb`)

- Subscribes to the Kafka topic
- Deserializes each incoming message
- Writes each record as a JSON file to S3 using `s3fs`

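# consumer yields messages as they arrive; s3 is an s3fs filesystem object
for count, message in enumerate(consumer):
    # One JSON file per message, named by its position in this run
    with s3.open("s3://kafka-stock-market-bucket/stock_market_{}.json".format(count), 'w') as file:
        json.dump(message.value, file)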
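
One thing to be aware of in the loop above: the `enumerate` counter starts again at 0 whenever the consumer restarts, so a second run would overwrite `stock_market_0.json` and the objects after it. The sketch below shows one possible alternative, not the notebook's method, that names each object by the message's topic, partition, and offset instead. It assumes `kafka-python` and `s3fs`; the helper names are hypothetical.

```python
import json


def deserialize(raw):
    """Decode UTF-8 JSON bytes from Kafka back into a Python object."""
    return json.loads(raw.decode("utf-8"))


def object_key(bucket, topic, partition, offset):
    """Build an S3 path that is unique per Kafka message."""
    return "s3://{}/{}_{}_{}.json".format(bucket, topic, partition, offset)


def write_record(open_fn, path, record):
    """Write one record as a JSON file via `open_fn(path, 'w')`."""
    with open_fn(path, "w") as file:
        json.dump(record, file)


def run(bootstrap_server, topic="demo_test",
        bucket="kafka-stock-market-bucket"):
    """Consume `topic` and write every message to `bucket`; blocks forever."""
    from kafka import KafkaConsumer  # requires the kafka-python package
    from s3fs import S3FileSystem    # requires s3fs and AWS credentials

    consumer = KafkaConsumer(topic,
                             bootstrap_servers=[bootstrap_server],
                             value_deserializer=deserialize)
    s3 = S3FileSystem()
    for message in consumer:
        path = object_key(bucket, message.topic,
                          message.partition, message.offset)
        write_record(s3.open, path, message.value)
```

Because a topic, partition, and offset together identify exactly one message, re-reading the same message rewrites the same object rather than clobbering a different one.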

Kafka Setup (EC2)

Kafka was deployed on an AWS EC2 instance. Key setup steps:

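# Run all commands from the Kafka installation directory.
# ZooKeeper and the broker both run in the foreground,
# so start each one in its own terminal session.

# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

# Start Kafka broker (JVM heap: 128 MB initial, 256 MB maximum)
export KAFKA_HEAP_OPTS="-Xmx256M -Xms128M"
bin/kafka-server-start.sh config/server.properties

# Create topic (replace <EC2-IP> with the broker's address)
bin/kafka-topics.sh --create --bootstrap-server <EC2-IP>:9092 \
  --replication-factor 1 --partitions 1 --topic demo_test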
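
Before running the notebooks from another machine, it can help to confirm that the broker port is reachable at all. The following is a small sketch using only the Python standard library; the function name `broker_reachable` is illustrative, and the check assumes the instance's security group allows inbound traffic on port 9092.

```python
import socket


def broker_reachable(host, port=9092, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False
```

A successful result only shows that the port accepts connections; it does not confirm that the `demo_test` topic exists. If the port is open but clients still cannot produce or consume, the broker's `advertised.listeners` setting in `config/server.properties` may be worth checking.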

Dataset

indexProcessed.csv — Historical stock market index data used to simulate a real-time feed by random sampling.

Author

Gkeri Pepelasi
