pyspark-streaming

Here are 11 public repositories matching this topic...

DebanjanSarkar / pyspark-maestro

This repo contains PySpark implementations of real-world use cases: batch data processing, streaming data processing sourced from Kafka, sockets, etc., Spark optimizations, solutions for business-specific big data processing scenarios, and machine learning use cases.

json kafka spark python3 pyspark spark-streaming kafka-streams spark-sql spark-mllib kafka-python pyspark-mllib pyspark-api pyspark-streaming pyspark-machine-learning

Updated Jul 24, 2024
Jupyter Notebook

chaitanya-basava / Image-Search-Engine

End-to-end image search app.

elasticsearch kafka reactjs embeddings data-engineering clip image-search-engine image-embeddings pyspark-streaming fastapi text-embeddings reverse-search-images

Updated Aug 9, 2024
TypeScript

scott-mcnulty / simple-pyspark-streaming-example

Simple app to test out PySpark streaming from Kafka.

python docker streaming kafka pyspark-streaming kafkacat

Updated Dec 7, 2018
Python

SiyaMathe / nedbank-streaming-pipeline

Production-grade real-time ELT pipeline using PySpark Structured Streaming and Delta Lake. Replicates a high-impact architectural migration from Mercedes-Benz to achieve exactly-once upsert semantics and a 60% reduction in cloud compute overhead.

data-engineering apache-kafka event-driven-architecture realtime-analytics azure-databricks pyspark-streaming delta-lake elt-pipeline

Updated Apr 7, 2026
Python
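
The exactly-once upsert claim above typically rests on two pieces: a keyed MERGE, so that reapplying the same rows produces the same table state, and the `batch_id` that Structured Streaming passes to `foreachBatch`. The Spark Structured Streaming guide notes that `foreachBatch` is at-least-once by default and that this ID can be used to deduplicate replayed micro-batches. The following is a toy, in-memory sketch of that idea, not the repository's code; `UpsertSink`, `write_batch`, and the `(key, value)` row shape are hypothetical. In a real pipeline the table update and the recorded batch ID must commit atomically, and Delta Lake provides idempotent-write options for `foreachBatch` that serve this purpose.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical row shape: (primary key, value).
Row = Tuple[str, int]


class UpsertSink:
    """Toy stand-in for a keyed table written from foreachBatch (not a real Delta API)."""

    def __init__(self) -> None:
        self.table: Dict[str, int] = {}
        self.committed_batches: Set[int] = set()

    def write_batch(self, batch: List[Row], batch_id: int) -> bool:
        """Upsert one micro-batch; return False if batch_id was already committed."""
        # After a failure, Spark may re-run a micro-batch with the same batch_id.
        if batch_id in self.committed_batches:
            return False
        for key, value in batch:
            # MERGE semantics: update the row if the key exists, insert it otherwise.
            self.table[key] = value
        # Record the batch only after its rows are applied (must be atomic in a real sink).
        self.committed_batches.add(batch_id)
        return True


sink = UpsertSink()
sink.write_batch([("a", 1), ("b", 2)], batch_id=0)
sink.write_batch([("a", 5)], batch_id=1)
sink.write_batch([("a", 5)], batch_id=1)  # replay after a simulated failure: skipped
print(sink.table)  # {'a': 5, 'b': 2}
```

Replaying batch 1 leaves the table unchanged. This is the property that turns at-least-once micro-batch delivery into exactly-once results in the sink.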

SAAD3XK / kafka-debezium-postgresql

An integration of Debezium PostgreSQL connectors with Kafka and PySpark.

docker-compose kafka-topic postgresql-database confluent-kafka pyspark-streaming debezium-connector

Updated Mar 27, 2024
Jupyter Notebook
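
Debezium's PostgreSQL connector publishes each row change to Kafka as an event. The event payload carries an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) plus `before` and `after` row images. The following minimal sketch shows how a consumer might fold those events into a keyed table. It uses plain Python dicts instead of PySpark, and the `apply_change` helper and the `id` primary-key column are hypothetical assumptions, not part of the repository.

```python
import json
from typing import Any, Dict, Union

# A Kafka message value: raw JSON text/bytes, an already-decoded dict, or None (tombstone).
Event = Union[str, bytes, Dict[str, Any], None]


def apply_change(table: Dict[Any, Dict[str, Any]], event: Event, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory keyed table (hypothetical helper)."""
    # Kafka message values usually arrive as raw JSON; decode them first.
    if isinstance(event, (str, bytes)):
        event = json.loads(event)
    # A null value is a Kafka tombstone emitted after a delete; nothing to apply.
    if event is None:
        return
    # With the JSON converter's schemas enabled, the change sits under "payload".
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all carry the new row in "after".
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry the old row (at least its key columns) in "before".
        table.pop(payload["before"][key], None)
    else:
        raise ValueError(f"unsupported op code: {op!r}")
```

Because each event overwrites or removes the row by primary key, applying the events in order yields the current state of the source table.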

bmjprasad / DDOS_Detection_ApacheAccessLog

kafka flume pyspark-streaming

Updated Sep 30, 2019
Python

PrasetyoWidyantoro / Nifi-kafka-pysparkstream

Nifi - Kafka - Pyspark is my learning project for exploring the use of these tools in greater depth.

json csv pyspark kafka-topic kafka-consumer kafka-producer nifi nifi-processors indonesian-language pyspark-notebook pyspark-streaming pyspark-sql

Updated Sep 13, 2023
Jupyter Notebook

AimanxxAnsari / PySpark-Practice

Repository for practicing data manipulation and transformation using PySpark. Contains sample scripts for data pipelining, showcasing various techniques and best practices for handling and processing large datasets efficiently.

kafka data-transformation pyspark data-engineering data-pipeline kaggle-dataset pyspark-streaming