Realtime Chat

Problem

실시간 채팅은 방 안의 메시지 순서, 여러 서버 인스턴스 간 브로드캐스트, Consumer 실패 복구, 채팅방 목록 조회 성능을 동시에 다뤄야 했습니다.

Decision

메시지 순서는 Kafka roomId 파티션으로 고정하고, 서버 간 전달은 Redis Pub/Sub로 분리하며, 반복 조회는 Cache Aside로 낮추는 구조를 선택했습니다.

Build

Implementation

STOMP CONNECT 인증과 /topic/room.{roomId} SUBSCRIBE 멤버십 검증으로 방 권한을 먼저 막았습니다.
Kafka publish ACK/NACK, roomId partitioning, manual offset commit, 재시도와 DLT manual replay를 조합해 실패 경계를 분리했습니다.
채팅방 목록은 JPQL 프로젝션과 Redis 캐시를 적용하고 방 생성, 메시지 수신, 읽음 처리 시점에 무효화하도록 정리했습니다.

실시간 시스템은 WebSocket 연결 자체보다 순서 유지, 장애 격리, 읽기 모델을 분리해 설명할 때 설계 의도가 명확해진다는 점을 얻었습니다.

Proof

Verification

docs/PERF_RESULT.md의 REST API 부하 결과에서 Cache Aside 전후 RPS와 p95를 비교했고, WebSocket/Kafka 경로는 권한·ACK/NACK·DLT·읽음 처리 검증으로 분리했습니다.

937 → 1,598 RPS

Measured

REST API

k6 200 VU, 50s, local Docker 기준

ACK/NACK

Verified

Kafka Publish

broker 전송 성공/실패 경계를 검증

DLT replay

Verified

Failure Recovery

수동 재처리 절차와 실패 격리 경로 검증

Measured

937 → 1,598 RPS

REST 처리량

Scenario: 채팅방 목록과 읽음 처리 중심 REST API 부하
Method: k6 200 VU, 50초, local Docker 기준으로 Cache Aside 전후 비교
Result: p95는 212.85ms에서 149.22ms로 낮아졌고 에러율 0%를 확인했습니다.

성능 결과

Verified

ACK/NACK + DLT

메시징 신뢰성

Scenario: 구독 권한, Kafka 발행 결과, 실패 메시지 격리, 읽음 처리
Method: Testcontainers 기반 Kafka/Redis/DB 통합 테스트와 수동 DLT replay 절차 확인
Result: SUBSCRIBE 멤버십 검증, publish ACK/NACK, roomId ordering, read receipt, presence 경로를 검증했습니다.

성능 결과

Boundaries

Trade-offs & Limitations

Trade-offs

roomId 파티셔닝은 방 안의 순서를 지키지만, 특정 대형 방에 트래픽이 몰리면 파티션 핫스팟이 생길 수 있습니다.
Redis Cache Aside는 읽기 성능을 높이지만 메시지 수신·읽음 처리 시 무효화 지점을 늘립니다.

Limitations

WebSocket 수치는 send-to-receive latency나 전달 completeness 측정값으로 해석하지 않습니다.
mixed WebSocket traffic 성능 결과는 measured로 주장하지 않습니다.
DLT 재처리는 자동 복구 시스템이 아니라 수동 확인 후 재투입하는 절차입니다.

아키텍처

전체 아키텍처▼

architecture

전체 아키텍처

Mermaid 원본을 파싱해 텍스트는 DOM으로 렌더링합니다.

Client

사용자 진입과 실시간 상태 확인

Client (WebSocket/STOMP)

Application

API, 도메인 로직, 워커 처리

App Instance 1 :8081

App

App Instance 2 :8082

App

chat.messages (6 partitions, key=roomId)

Kafka

chat.read-receipts (3 partitions)

Kafka

chat-read-receipt 읽음 처리

Consumers

Data / Messaging

상태 저장, 캐시, 이벤트 전달

DLT (Dead Letter Topics)

Kafka

chat-persistence DB 저장 + 멱등성

Consumers

chat-broadcast Redis Pub/Sub

Consumers

Redis (Pub/Sub + Cache)

PostgreSQL

핵심 연결 흐름

1. Client (WebSocket/STOMP) → App Instance 1 :8081 · STOMP
2. Client (WebSocket/STOMP) → App Instance 2 :8082 · STOMP
3. App Instance 1 :8081 → chat.messages (6 partitions, key=roomId) · produce
4. App Instance 1 :8081 → chat.read-receipts (3 partitions) · produce
5. App Instance 2 :8082 → chat.messages (6 partitions, key=roomId) · produce
6. App Instance 2 :8082 → chat.read-receipts (3 partitions) · produce
7. chat.messages (6 partitions, key=roomId) → chat-persistence DB 저장 + 멱등성 · consume
8. chat.messages (6 partitions, key=roomId) → chat-broadcast Redis Pub/Sub · consume
9. chat.read-receipts (3 partitions) → chat-read-receipt 읽음 처리 · consume
10. chat-persistence DB 저장 + 멱등성 → PostgreSQL · 저장
11. chat-persistence DB 저장 + 멱등성 → DLT (Dead Letter Topics)
12. chat-broadcast Redis Pub/Sub → Redis (Pub/Sub + Cache) · publish
13. chat-read-receipt 읽음 처리 → PostgreSQL · 저장
14. Redis (Pub/Sub + Cache) → App Instance 1 :8081 · subscribe
15. Redis (Pub/Sub + Cache) → App Instance 2 :8082 · subscribe

원본 Mermaid 보기

graph TB
    Client["Client\n(WebSocket/STOMP)"]
    subgraph App["Spring Boot Cluster"]
      App1["App Instance 1\n:8081"]
      App2["App Instance 2\n:8082"]
    end
    subgraph Kafka["Apache Kafka (KRaft)"]
      T1["chat.messages\n(6 partitions, key=roomId)"]
      T2["chat.read-receipts\n(3 partitions)"]
      DLT["DLT (Dead Letter Topics)"]
    end
    subgraph Consumers["Consumer Groups"]
      CG1["chat-persistence\nDB 저장 + 멱등성"]
      CG2["chat-broadcast\nRedis Pub/Sub"]
      CG3["chat-read-receipt\n읽음 처리"]
    end
    Redis["Redis\n(Pub/Sub + Cache)"]
    DB["PostgreSQL"]
    Client -->|STOMP| App1
    Client -->|STOMP| App2
    App1 -->|produce| T1
    App1 -->|produce| T2
    App2 -->|produce| T1
    App2 -->|produce| T2
    T1 -->|consume| CG1
    T1 -->|consume| CG2
    T2 -->|consume| CG3
    CG1 -->|저장| DB
    CG1 -.->|재시도 실패| DLT
    CG2 -->|publish| Redis
    CG3 -->|저장| DB
    Redis -->|subscribe| App1
    Redis -->|subscribe| App2

ERD

전체 ERD▼

erd

전체 ERD

Mermaid 원본을 파싱해 텍스트는 DOM으로 렌더링합니다.

users

bigserial: id; PK
varchar: email; UK
varchar: password
varchar: nickname
varchar: status
timestamp: last_seen_at

+ 1개 필드

chat_rooms

bigserial: id; PK
varchar: name
varchar: type
bigint: created_by; FK
timestamp: created_at

chat_room_members

bigserial: id; PK
bigint: room_id; FK
bigint: user_id; FK
bigint: last_read_message_id
int: unread_count
timestamp: joined_at

messages

bigserial: id; PK
uuid: message_key; UK
bigint: room_id; FK
bigint: sender_id; FK
text: content
varchar: type

+ 3개 필드

관계 요약

#1 users ||--o{ chat_rooms · created_by
#2 users ||--o{ chat_room_members · 참여
#3 chat_rooms ||--o{ chat_room_members · 멤버
#4 users ||--o{ messages · 발신
#5 chat_rooms ||--o{ messages · 소속

원본 Mermaid 보기

erDiagram
    users ||--o{ chat_rooms : "created_by"
    users ||--o{ chat_room_members : "참여"
    chat_rooms ||--o{ chat_room_members : "멤버"
    users ||--o{ messages : "발신"
    chat_rooms ||--o{ messages : "소속"

    users {
      bigserial id PK
      varchar email UK
      varchar password
      varchar nickname
      varchar status
      timestamp last_seen_at
      timestamp created_at
    }
    chat_rooms {
      bigserial id PK
      varchar name
      varchar type
      bigint created_by FK
      timestamp created_at
    }
    chat_room_members {
      bigserial id PK
      bigint room_id FK
      bigint user_id FK
      bigint last_read_message_id
      int unread_count
      timestamp joined_at
    }
    messages {
      bigserial id PK
      uuid message_key UK
      bigint room_id FK
      bigint sender_id FK
      text content
      varchar type
      int kafka_partition
      bigint kafka_offset
      timestamp created_at
    }

시퀀스 다이어그램

메시지 전송 & 브로드캐스트▼

sequence

메시지 전송 & 브로드캐스트

Mermaid 원본을 파싱해 텍스트는 DOM으로 렌더링합니다.

Participants

ClientApp-1Kafkachat-persistencechat-broadcastPostgreSQLRedisApp-2Client (다른 서버)

1처리
Client → App-1
STOMP /app/chat.send
2이벤트
App-1 → Kafka
produce (key=roomId)
3par Consumer Group 1
control
par Consumer Group 1
조건: par Consumer Group 1
4이벤트
Kafka → chat-persistence
consume
조건: par Consumer Group 1
5저장
chat-persistence → PostgreSQL
멱등성 체크 + 메시지 저장
조건: par Consumer Group 1
6par Consumer Group 1
chat-persistence → PostgreSQL
unreadCount 증가
조건: par Consumer Group 1
7and Consumer Group 2
control
and Consumer Group 2
조건: and Consumer Group 2
8이벤트
Kafka → chat-broadcast
consume
조건: and Consumer Group 2

+ 5개 단계는 아래 상세 메시지에서 확인할 수 있습니다.

전체 메시지 상세

Step	From → To	Message	Condition
1	Client → App-1	STOMP /app/chat.send	-
2	App-1 → Kafka	produce (key=roomId)	-
3	control	par Consumer Group 1	par Consumer Group 1
4	Kafka → chat-persistence	consume	par Consumer Group 1
5	chat-persistence → PostgreSQL	멱등성 체크 + 메시지 저장	par Consumer Group 1
6	chat-persistence → PostgreSQL	unreadCount 증가	par Consumer Group 1
7	control	and Consumer Group 2	and Consumer Group 2
8	Kafka → chat-broadcast	consume	and Consumer Group 2
9	chat-broadcast → Redis	PUBLISH room:{roomId}	and Consumer Group 2
10	Redis → App-1	subscribe	-
11	Redis → App-2	subscribe	-
12	App-1 → Client	STOMP /topic/room.{roomId}	-
13	App-2 → Client (다른 서버)	STOMP /topic/room.{roomId}	-

원본 Mermaid 보기

sequenceDiagram
    participant C as Client
    participant A1 as App-1
    participant K as Kafka
    participant CG1 as chat-persistence
    participant CG2 as chat-broadcast
    participant DB as PostgreSQL
    participant R as Redis
    participant A2 as App-2
    participant C2 as Client (다른 서버)
    C->>A1: STOMP /app/chat.send
    A1->>K: produce (key=roomId)
    par Consumer Group 1
      K->>CG1: consume
      CG1->>DB: 멱등성 체크 + 메시지 저장
      CG1->>DB: unreadCount 증가
    and Consumer Group 2
      K->>CG2: consume
      CG2->>R: PUBLISH room:{roomId}
    end
    R-->>A1: subscribe
    R-->>A2: subscribe
    A1->>C: STOMP /topic/room.{roomId}
    A2->>C2: STOMP /topic/room.{roomId}

읽음 처리▼

sequence

읽음 처리

Mermaid 원본을 파싱해 텍스트는 DOM으로 렌더링합니다.

Participants

ClientAppKafkachat-read-receiptPostgreSQLRedis Cache

1요청
Client → App
POST /api/rooms/{roomId}/read
2이벤트
App → Kafka
produce (chat.read-receipts)
3이벤트
Kafka → chat-read-receipt
consume
4처리
chat-read-receipt → PostgreSQL
unreadCount 초기화
5처리
chat-read-receipt → PostgreSQL
lastReadMessageId 갱신
6처리
chat-read-receipt → Redis Cache
evict(userId)

전체 메시지 상세

Step	From → To	Message	Condition
1	Client → App	POST /api/rooms/{roomId}/read	-
2	App → Kafka	produce (chat.read-receipts)	-
3	Kafka → chat-read-receipt	consume	-
4	chat-read-receipt → PostgreSQL	unreadCount 초기화	-
5	chat-read-receipt → PostgreSQL	lastReadMessageId 갱신	-
6	chat-read-receipt → Redis Cache	evict(userId)	-

원본 Mermaid 보기

sequenceDiagram
    participant C as Client
    participant A as App
    participant K as Kafka
    participant CG3 as chat-read-receipt
    participant DB as PostgreSQL
    participant Cache as Redis Cache
    C->>A: POST /api/rooms/{roomId}/read
    A->>K: produce (chat.read-receipts)
    K->>CG3: consume
    CG3->>DB: unreadCount 초기화
    CG3->>DB: lastReadMessageId 갱신
    CG3->>Cache: evict(userId)

Consumer 장애 복구▼

sequence

Consumer 장애 복구

Mermaid 원본을 파싱해 텍스트는 DOM으로 렌더링합니다.

Participants

KafkaConsumerPostgreSQLDead Letter Topic

1이벤트
Kafka → Consumer
consume
2처리
Consumer → PostgreSQL
처리 시도
3alt 성공
control
alt 성공
조건: alt 성공
4alt 성공
Consumer → Kafka
manual commit
조건: alt 성공
5else 실패 (1~3회 재시도)
control
else 실패 (1~3회 재시도)
조건: else 실패 (1~3회 재시도)
6else 실패 (1~3회 재시도)
Consumer → PostgreSQL
재시도
조건: else 실패 (1~3회 재시도)
7노트
Note · Consumer
FixedBackOff 1초 간격, 3회
조건: else 실패 (1~3회 재시도)
8else 재시도 실패
control
else 재시도 실패
조건: else 재시도 실패

+ 2개 단계는 아래 상세 메시지에서 확인할 수 있습니다.

전체 메시지 상세

Step	From → To	Message	Condition
1	Kafka → Consumer	consume	-
2	Consumer → PostgreSQL	처리 시도	-
3	control	alt 성공	alt 성공
4	Consumer → Kafka	manual commit	alt 성공
5	control	else 실패 (1~3회 재시도)	else 실패 (1~3회 재시도)
6	Consumer → PostgreSQL	재시도	else 실패 (1~3회 재시도)
7	Note · Consumer	FixedBackOff 1초 간격, 3회	else 실패 (1~3회 재시도)
8	control	else 재시도 실패	else 재시도 실패
9	Consumer → Dead Letter Topic	DeadLetterPublishingRecoverer	else 재시도 실패
10	Note · Dead Letter Topic	수동 확인 후 재처리	else 재시도 실패

원본 Mermaid 보기

sequenceDiagram
    participant K as Kafka
    participant CG as Consumer
    participant DB as PostgreSQL
    participant DLT as Dead Letter Topic
    K->>CG: consume
    CG->>DB: 처리 시도
    alt 성공
      CG->>K: manual commit
    else 실패 (1~3회 재시도)
      CG->>DB: 재시도
      Note over CG: FixedBackOff 1초 간격, 3회
    else 재시도 실패
      CG->>DLT: DeadLetterPublishingRecoverer
      Note over DLT: 수동 확인 후 재처리
    end