Ryong
Design-loving front-end engineer
React/Concept

[ React ] WebSocket Communication + Web Audio API

2022. 6. 2. 12:10

🔴  TTS

⚪  Scenario

🔔  Enter the text you want to test with TTS, then send it to the server.

🔔  The server splits the synthesized speech into fixed-size chunks; raw audio data matching the input text arrives over the WebSocket connection.

🔔  Each chunk of audio data is stored in a buffer and then played back.
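The buffer-then-play flow above can be reduced to a simple chunk queue: each arriving chunk is enqueued, and finishing one chunk triggers the next. A minimal sketch, with a stand-in `playChunk` in place of the real Web Audio playback (all names here are illustrative, not from the actual code):

```javascript
// Minimal sketch of the chunked playback queue: incoming chunks are
// buffered, and finishing one chunk immediately starts the next.
// playChunk is a stand-in for the real Web Audio playback.
function createChunkPlayer(playChunk) {
    const buffers = [];
    let playing = false;

    function play() {
        if (buffers.length === 0 || playing) return;
        playing = true;
        const chunk = buffers.shift();
        // onEnded mirrors the "ended" event of an AudioBufferSourceNode
        playChunk(chunk, function onEnded() {
            playing = false;
            play();  // play the next buffered chunk, if any
        });
    }

    return {
        add(chunk) {      // called for each WebSocket message
            buffers.push(chunk);
            play();
        },
    };
}

// Synchronous stand-in: "plays" a chunk and ends immediately.
const played = [];
const chunkPlayer = createChunkPlayer((chunk, onEnded) => {
    played.push(chunk);
    onEnded();
});
chunkPlayer.add("chunk-1");
chunkPlayer.add("chunk-2");
chunkPlayer.add("chunk-3");
// played is now ["chunk-1", "chunk-2", "chunk-3"]
```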

 

⚪  Code

 
const TtsScreen = () => {

    const inputText = useSelector(state => state.tts.inputText);
    const isPlaying = useSelector(state => state.tts.isPlaying);
    const dispatch = useDispatch();

    // player (created once, kept across renders)
    const [player] = useState(() => new Player());

    const handleText = (text) => {
        dispatch(setInputText(text));
    }
    
    const handleBtn = async () => { 
        if (isPlaying) return;
        if (inputText) {
            if (player.audioContext) {
                player.audioContext.resume().then(() => {
                    player.connect(inputText, scenario.url,
                        () => { dispatch(setIsPlaying(false)); }
                    );
                })
            }
            dispatch(setIsPlaying(true));
        } else toastError("Please enter a sentence first.");
    }
    
    useEffect(() => {
        player.init();
    }, []);

    return (
        <>
            <textarea value={inputText} onChange={(e) => handleText(e.target.value)} />
            <button onClick={handleBtn} />
        </>
    )
}
 
const Player = function() {
    this.audioContext = null;
    this.buffers = [];
    this.source = null;
    this.playing = false;

    this.init = function() {
        const audioContextClass = (window.AudioContext ||
            window.webkitAudioContext ||
            window.mozAudioContext ||
            window.oAudioContext ||
            window.msAudioContext);
        if (audioContextClass) {
            return this.audioContext = new audioContextClass();
        } else {
            return toastError("Please check your audio permission settings.");
        }
    }
    
    this.addBuffer = function(buffer) {
        this.buffers.push(buffer);
    }
    
    this.connect = function(inputText, url, onComplete = f=>f) {
        const self = this;
        const path = `ws://112.220.79.221:${url}/ws`;
        let socket = new WsService({
            path : path,
            onOpen : function(event) {
                console.log(`[OPEN] ${path}`);
                socket.ws.send(inputText);
                console.log(`[SEND] ${inputText}`)
            },
            onMessage : function(event) {
                console.log("[MESSAGE]");
                console.log(event.data);
                if (event.data.byteLength <= 55) return;  // skip header-only frames
                self.addBuffer(new Int16Array(event.data));
                self.play();
            },
            onClose : function(event) {
                console.log("[CLOSE]");
                socket.ws = null;
                onComplete();
            }
        });
    }
    
    const wait = function() {
        this.playing = false;
        this.play();  // play the next buffered audio chunk
    }
    
    this.play = function() {
        if (this.buffers.length > 0) {
            if (this.playing) return;
            this.playing = true;
            let pcmData = this.buffers.shift();
            const channels = 1;
            const frameCount = pcmData.length;
            const myAudioBuffer = this.audioContext.createBuffer(channels, frameCount, 22050);
            // convert the Int16 PCM samples to Float32 in [-1, 1)
            for (let i = 0; i < channels; i++) {
                const nowBuffering = myAudioBuffer.getChannelData(i);
                for (let j = 0; j < frameCount; j++)
                    nowBuffering[j] = ((pcmData[j] + 32768) % 65536 - 32768) / 32768.0;
            }
            pcmData = null;
            this.source = this.audioContext.createBufferSource();
            this.source.buffer = myAudioBuffer;
            this.source.connect(this.audioContext.destination);
            this.source.start();
            this.source.addEventListener("ended", wait.bind(this));  // fires when this chunk finishes
        }
    }
}

🔴  Web Audio API

🔊  An AudioContext object is made up of multiple audio nodes.

🔊  Following the typical workflow, the graph runs in order: input nodes, effect nodes, then the destination (output) node.

🔊  You create the nodes you need and connect them to one another.

Creating an AudioContext

🔥  Everything in the Web Audio API starts with creating an AudioContext object.

 
// Create the AudioContext, covering legacy Webkit/Blink browser versions as well
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

Creating an AudioBuffer

 
// Create the audio buffer needed to play the audio
const audioBuffer = audioContext.createBuffer(numOfChannels, length, sampleRate);
// length can also be expressed as the product numSeconds × sampleRate.
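In the Player above, the created buffer's channel data is filled by converting the Int16 PCM chunk to Float32 samples in [-1, 1). That normalization can be checked in isolation (the helper name below is illustrative):

```javascript
// Convert a chunk of signed 16-bit PCM into Float32 samples in [-1, 1),
// as done when filling the AudioBuffer's channel data in Player.play.
function int16ToFloat32(pcmData) {
    const out = new Float32Array(pcmData.length);
    for (let j = 0; j < pcmData.length; j++) {
        // The (x + 32768) % 65536 - 32768 step folds any unsigned-looking
        // values back into signed Int16 range before normalizing by 32768.
        out[j] = ((pcmData[j] + 32768) % 65536 - 32768) / 32768.0;
    }
    return out;
}

const pcm = new Int16Array([0, 16384, -16384, 32767, -32768]);
const floats = int16ToFloat32(pcm);
// floats ≈ [0, 0.5, -0.5, 0.99997, -1]
```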

Creating an AudioBufferSourceNode

🔥  An AudioBufferSourceNode takes an AudioBuffer as its audio source. It is mainly used to play short clips (under about 45 seconds) exactly once; once played, it becomes eligible for garbage collection.

🔥  For audio longer than about 45 seconds, MediaElementAudioSourceNode is recommended instead.

 
// Create the AudioBufferSourceNode via the factory method.
// The constructor form may not be supported in some browsers.
const audioBufferSourceNode = audioContext.createBufferSource();
// Feed in the AudioBuffer as the audio source.
audioBufferSourceNode.buffer = audioBuffer;

Connecting the Audio Graph

 
audioBufferSourceNode.connect(audioContext.destination);

μŒμ› μž¬μƒ

 
audioBufferSourceNode.start();
 
const WsService = function({path, onOpen = f=>f, onMessage = f=>f, onClose = f=>f}) {
    this.ws = new WebSocket(path);
    this.initMsg = '{"language":"ko","intermediates":true,"cmd":"getsr"}';

    this.ws.binaryType = "arraybuffer";
    this.ws.onopen = function(event) { onOpen(event); };
    this.ws.onmessage = function(event) { onMessage(event) };
    this.ws.onclose = function(event) { onClose(event) };
}
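The `WsService` wrapper above simply forwards the socket's open/message/close events to caller-supplied callbacks. That wiring can be exercised without a server by substituting a stub WebSocket (the stub and the `makeWsService` factory below are illustrative sketches, not part of the actual code):

```javascript
// Stub WebSocket so the callback wiring can run without a server.
class StubWebSocket {
    constructor(path) { this.path = path; this.binaryType = null; }
    // test helpers: simulate server-side events
    emitOpen() { this.onopen && this.onopen({ type: "open" }); }
    emitMessage(data) { this.onmessage && this.onmessage({ data }); }
    emitClose() { this.onclose && this.onclose({ type: "close" }); }
}

// Same shape as WsService, with the WebSocket implementation injected.
function makeWsService(WebSocketImpl, { path, onOpen = f => f, onMessage = f => f, onClose = f => f }) {
    const ws = new WebSocketImpl(path);
    ws.binaryType = "arraybuffer";
    ws.onopen = onOpen;
    ws.onmessage = onMessage;
    ws.onclose = onClose;
    return { ws };
}

const log = [];
const svc = makeWsService(StubWebSocket, {
    path: "ws://example/ws",
    onOpen: () => log.push("open"),
    onMessage: (e) => log.push(`msg:${e.data}`),
    onClose: () => log.push("close"),
});
svc.ws.emitOpen();
svc.ws.emitMessage("hello");
svc.ws.emitClose();
// log is now ["open", "msg:hello", "close"]
```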

 

🟠  STT

⚪  Scenario

🔔  Grant the browser microphone permission so audio can be recorded for the STT test.

🔔  Press the record button and speak; speech recognition runs continuously.

🔔  To stop the STT test, simply click the button again.
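The start/stop button behavior described above can be reduced to a tiny toggle (the names below are illustrative, not from the actual component):

```javascript
// Toggle between recording and idle, mirroring the record button:
// pressing while idle clears the previous text and starts recording;
// pressing while recording stops it.
function createRecordToggle(recorder) {
    let recording = false;
    return {
        press() {
            if (recording) {
                recorder.stop();
            } else {
                recorder.clearText();  // reset the previous STT result
                recorder.start();
            }
            recording = !recording;
            return recording;
        },
    };
}

const calls = [];
const toggle = createRecordToggle({
    start: () => calls.push("start"),
    stop: () => calls.push("stop"),
    clearText: () => calls.push("clear"),
});
toggle.press();  // → true  (now recording)
toggle.press();  // → false (stopped)
// calls is now ["clear", "start", "stop"]
```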

 

⚪  Code

 
const SttScreen = () => {

    const inputText = useSelector(state => state.stt.inputText);
    const isPlaying = useSelector(state => state.stt.isPlaying);
    const dispatch = useDispatch();

    // recorder (created once, kept across renders)
    const [recorder] = useState(() => new Recorder());

    const handleText = (text) => {
        dispatch(setInputText(text));
    }
    
    const handleBtn = useCallback(event => {
        dispatch(setIsPlaying(!isPlaying));

        if (isPlaying) {
            recorder.stop();
        } else {
            dispatch(setInputText(""));
            recorder.start();
        }
    }, [isPlaying]);
    
    useEffect(() => {
        recorder.init({
            path : scenario.url,
            // recording started
            onStart : function(event) {
                console.log("recording started: turn the mic indicator on");
                setSequence({
                    segments : [0, 60],
                    forceFlag : false,
                })
                dispatch(setIsPlaying(true));
            },
            // recognized result received
            onResult : function(text, isRepeat) {
                console.log("result received: append it to the screen");
                dispatch(setInputText(text));
            },
            // close event received
            onClose : function(event) {
                console.log("all results delivered: turn the mic indicator off");
                setSequence({
                    segments : [0, 1],
                    forceFlag : false,
                })
                dispatch(setIsPlaying(false));
            },
            // error occurred
            onError : function(event) {
                toastError("Failed to connect to the server.");
            }
        });
    }, []);

    return (
        <>
            <textarea value={inputText} onChange={(e) => handleText(e.target.value)} />
            <button onClick={handleBtn} />
        </>
    )
}

export default SttScreen;
 
const Recorder = function() {
    this.serverUrl = null;
    this.audioContext = (window.AudioContext || window.webkitAudioContext || window.mozAudioContext || window.oAudioContext || window.msAudioContext);
    this.context = null; // instance created from this.audioContext
    this.audioInput = null;
    this.recorder = null;
    this.recording = false;
    this.stream = null;
    this.wsServiceConfig = {
        repeat : "repeat",
        drop : "nodrop",
    };
    this.callback = {
        onStart : f=>f,
        onResult : f=>f,
        onClose : f=>f,
        onError : f=>f,
    }
    this.webSocket = null;
    navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
    
    const self = this;
    this.init = function({path, onStart, onResult, onClose, onError}) {
        // initial server connection: fetch the repeat / drop settings
        self.serverUrl = `wss://ai-mediazen.com:${path}/ws`;
        const wsService = new WsService({
            path : self.serverUrl,
            onOpen : function(event) {
                if (wsService.ws.readyState === 1) {
                    wsService.ws.send(wsService.initMsg);
                }
            },
            onMessage : function(event) {
                const _arr = event.data.split(",");
                
                self.wsServiceConfig.repeat = (_arr[1] === "0")? "norepeat" : "repeat";
                self.wsServiceConfig.drop = (_arr[2] === "0")? "nodrop" : "drop";

                wsService.ws.close();
            },
        });

        // store the callbacks
        self.callback.onStart = onStart;
        self.callback.onResult = onResult;
        self.callback.onClose = onClose;
        self.callback.onError = onError;
    }

    this.setup = async function() {
        try {
            this.stream = await navigator.mediaDevices.getUserMedia({ audio : {optional: [{echoCancellation:false}]}, video : false });
        } catch (err) {
            return console.log("getUserMedia error");
        }
        this.context = new this.audioContext({
            sampleRate : 16000,
        });
        this.audioInput = this.context.createMediaStreamSource(this.stream);

        const bufferSize = 4096;
        this.recorder = this.audioInput.context.createScriptProcessor(bufferSize, 1, 1);  // mono channel
        this.recorder.onaudioprocess = function(event) {
            // process the recorded data
            if (!self.recording) return;
            self.sendChannel(event.inputBuffer.getChannelData(0));
        }
        this.recorder.connect(this.context.destination);
        this.audioInput.connect(this.recorder);
    }

    this.sendChannel = function(channel) {
        console.log("[Recorder] process channel");
        const dataview = this.encodeRAW(channel);
        const blob = new Blob([dataview], { type : "audio/x-raw" });

        // send: connect only once, on the first chunk
        if(!self.webSocket) {
            self.webSocket = new WsService({
                path : self.serverUrl,
                onOpen : function(event) { 
                    console.log("open");
                    self.webSocket.ws.send("{\"language\":\"ko\",\"intermediates\":true,\"cmd\":\"join\"}");
                },
                onMessage : function(event) {
                    console.log(event.data);
                    const receiveMessage = JSON.parse(event.data);
                    const payload = JSON.stringify(receiveMessage.payload);
                    const textMessage = JSON.parse(payload);
                    if (receiveMessage.event === "reply") {
                        console.log("start");
                        self.recording = true;
                    }
                    if (receiveMessage.event === "close") {
                        console.log("closed");
                        if(textMessage.status){
                            self.callback.onResult(event, "norepeat");
                        }
                        self.webSocket.ws.close();
                        self.webSocket = null;
                        self.stop();
                        self.callback.onClose(event);
                    }
                    // recognition result
                    else if (textMessage.text) {
                        self.callback.onResult(textMessage.text, self.wsServiceConfig.repeat);
                    }
                    else if (textMessage.stt) {
                        self.callback.onResult(textMessage.stt[0].text, self.wsServiceConfig.repeat);
                    } 
                }
            });
        }

        if(self.webSocket.ws.readyState === 1) {
            if(blob.size > 0) self.webSocket.ws.send(blob);
        }

    }
    this.encodeRAW = function(channel) {
        const buffer = new ArrayBuffer(channel.length * 2);
        const view = new DataView(buffer);
        let offset = 0;
        for (let i = 0; i < channel.length; i++, offset+=2){
            const s = Math.max(-1, Math.min(1, channel[i]));
            view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
        }
        return view;
    }

    this.start = async function() {
        console.log("[Recorder] start");
        if(!self.recorder) {
            await this.setup();  // wait for mic permission and node setup
        }
        self.recording = true;
    }

    this.stop = function() {
        console.log("[Recorder] stop");
        this.close();
        self.recording = false;
    }

    this.close = function() {
        if(this.webSocket && this.webSocket.ws.readyState === 1) {
            console.log("closed");
            this.webSocket.ws.send("{\"language\":\"ko\",\"intermediates\":true,\"cmd\":\"quit\"}");
        }
    }

}

export default Recorder;

🔴  Recording Audio with the Web Audio API

Creating an AudioContext

 
const audioContext = (
    window.AudioContext ||
    window.webkitAudioContext ||
    window.mozAudioContext ||
    window.oAudioContext ||
    window.msAudioContext
);
const context = new audioContext({ sampleRate: 16000 });

🔊  When creating the AudioContext, a sampleRate of 16000 is passed as an option.

Creating a MediaStream

 
const stream = await navigator.mediaDevices.getUserMedia({
    audio : {optional: [{echoCancellation:false}]},
    video : false
});

🔊  Requests the user's permission to use a media input device; if the user accepts, it returns a MediaStream containing tracks of the requested media types.

Creating a MediaStreamAudioSourceNode

 
const audioInput = context.createMediaStreamSource(stream);

🔊  Takes the MediaStream as a parameter and creates a MediaStreamAudioSourceNode, an object through which the audio can be played and manipulated.

Creating a ScriptProcessorNode (Deprecated)

 
const bufferSize = 4096;
const recorder = audioInput.context.createScriptProcessor(bufferSize, 1, 1);  // mono channel
recorder.onaudioprocess = function(event) {
    // process the recorded data
    sendChannel(event.inputBuffer.getChannelData(0));
}

🔊  The ScriptProcessorNode interface is an audio-processing module linked to two buffers.

🔊  One buffer holds the incoming input audio data; the other holds the processed output audio data.

🔊  An event implementing the AudioProcessingEvent interface is dispatched to the object each time the input buffer contains new data, and the event handler finishes once it has filled the output buffer with data.
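Going the other way from playback, `encodeRAW` in the Recorder clamps each Float32 sample to [-1, 1] and packs it into little-endian signed 16-bit PCM, the raw format sent to the STT server. The encoding can be checked in isolation:

```javascript
// Pack Float32 samples into little-endian signed 16-bit PCM,
// as Recorder.encodeRAW does before sending each chunk.
function encodeRAW(channel) {
    const buffer = new ArrayBuffer(channel.length * 2);  // 2 bytes per sample
    const view = new DataView(buffer);
    for (let i = 0, offset = 0; i < channel.length; i++, offset += 2) {
        const s = Math.max(-1, Math.min(1, channel[i]));  // clamp to [-1, 1]
        // Negative samples scale by 0x8000, positive by 0x7FFF,
        // so both -1.0 and +1.0 map onto the full Int16 range.
        view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
    return view;
}

const view = encodeRAW(new Float32Array([0, 0.5, -1, 2]));
// Int16 values: 0, 16383, -32768, 32767 (the 2 is clamped to 1)
```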

Connecting the Audio Graph

 
recorder.connect(context.destination);
audioInput.connect(recorder);

 
