🔴 TTS
⚪ Scenario
📌 Enter the text you want to run through TTS, then send it to the server.
📌 Raw audio data matching the input text arrives over WebSocket, split by the server into fixed-size chunks.
📌 The audio data is stored in a buffer and then played back as audio.
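Because the stream is raw PCM rather than a container format, each chunk's playback duration follows directly from its byte length. A small sketch, assuming mono 16-bit samples (two bytes per frame) at 22050 Hz, the format the Player code below decodes (the helper name is mine):

```javascript
// Estimate how long a raw PCM chunk will play for.
// Assumes mono, 16-bit samples (2 bytes per frame) at 22050 Hz,
// matching the createBuffer(1, frameCount, 22050) call in Player.
const chunkDurationSeconds = (byteLength, sampleRate = 22050) => {
  const frameCount = byteLength / 2; // Int16 -> 2 bytes per frame
  return frameCount / sampleRate;
};

console.log(chunkDurationSeconds(44100)); // 1 (one second of audio)
```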
⚪ Code

const TtsScreen = () => {
  const inputText = useSelector(state => state.tts.inputText);
  const isPlaying = useSelector(state => state.tts.isPlaying);
  const dispatch = useDispatch();
  // player
  const [player] = useState(() => new Player());
  const handleText = (text) => {
    dispatch(setInputText(text));
  };
  const handleBtn = async () => {
    if (isPlaying) return;
    if (inputText) {
      if (player.audioContext) {
        player.audioContext.resume().then(() => {
          player.connect(inputText, scenario.url,
            () => { dispatch(setIsPlaying(false)); }
          );
        });
      }
      dispatch(setIsPlaying(true));
    } else toastError("Please enter a sentence first.");
  };
  useEffect(() => {
    player.init();
  }, []);
  return (
    <>
      <textarea value={inputText} onChange={(e) => handleText(e.target.value)} />
      <button onClick={() => handleBtn()} />
    </>
  );
};

const Player = function() {
  this.audioContext = null;
  this.buffers = [];
  this.source = null;
  this.playing = false;
  this.init = function() {
    const audioContextClass = (window.AudioContext ||
      window.webkitAudioContext ||
      window.mozAudioContext ||
      window.oAudioContext ||
      window.msAudioContext);
    if (audioContextClass) {
      return this.audioContext = new audioContextClass();
    } else {
      return toastError("Please check your audio permission settings.");
    }
  };
  this.addBuffer = function(buffer) {
    this.buffers.push(buffer);
  };
  this.connect = function(inputText, url, onComplete = f => f) {
    const self = this;
    const path = `ws://112.220.79.221:${url}/ws`;
    let socket = new WsService({
      path : path,
      onOpen : function(event) {
        console.log(`[OPEN] ${path}`);
        socket.ws.send(inputText);
        console.log(`[SEND] ${inputText}`);
      },
      onMessage : function(event) {
        console.log("[MESSAGE]");
        console.log(event.data);
        if (event.data.byteLength <= 55) return;
        self.addBuffer(new Int16Array(event.data));
        self.play();
      },
      onClose : function(event) {
        console.log("[CLOSE]");
        socket.ws = null;
        onComplete();
      }
    });
  };
  const wait = function() {
    this.playing = false;
    this.play(); // play the next audio chunk
  };
  this.play = function() {
    if (this.buffers.length > 0) {
      if (this.playing) return;
      this.playing = true;
      let pcmData = this.buffers.shift();
      const channels = 1;
      const frameCount = pcmData.length;
      const myAudioBuffer = this.audioContext.createBuffer(channels, frameCount, 22050);
      // convert the 16-bit PCM samples to floats and fill the buffer
      for (let i = 0; i < channels; i++) {
        const nowBuffering = myAudioBuffer.getChannelData(i);
        for (let j = 0; j < frameCount; j++)
          nowBuffering[j] = ((pcmData[j] + 32768) % 65536 - 32768) / 32768.0;
      }
      pcmData = null;
      this.source = this.audioContext.createBufferSource();
      this.source.buffer = myAudioBuffer;
      this.source.connect(this.audioContext.destination);
      this.source.start();
      this.source.addEventListener("ended", wait.bind(this)); // fires when the chunk finishes
    }
  };
};
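The inner loop of play() maps each signed 16-bit sample onto the [-1, 1) float range that an AudioBuffer expects; the `(x + 32768) % 65536 - 32768` wrap is a no-op for proper Int16 values but also normalizes samples that were read as unsigned. A standalone sketch of that conversion (the helper name is mine):

```javascript
// Convert signed 16-bit PCM samples to Web Audio's Float32 range,
// mirroring the expression used in Player.play().
const int16ToFloat32 = (pcm) => {
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    out[i] = ((pcm[i] + 32768) % 65536 - 32768) / 32768.0;
  }
  return out;
};

const floats = int16ToFloat32(new Int16Array([-32768, 0, 32767]));
console.log(floats[0]); // -1
```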
🔴 Web Audio API
📌 An AudioContext object is composed of multiple audio nodes.
📌 Following the typical workflow, the nodes are arranged in the order Inputs, Effects, and Destination.
📌 Create nodes as needed and connect them to one another.
AudioContext creation
📥 Everything in the Web Audio API starts with creating an AudioContext object.

// Create the AudioContext, covering Webkit/Blink browser variants
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
AudioBuffer creation

// Create the audio buffer needed to play the audio
const audioBuffer = audioContext.createBuffer(numOfChannels, length, sampleRate);
// length is sometimes expressed as numSeconds x sampleRate
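The comment above can be made concrete: for a given duration, the frame count passed to createBuffer is just the duration in seconds times the sample rate. A small sketch (the helper name is mine):

```javascript
// Frame count for an AudioBuffer of a given duration:
// length = numSeconds x sampleRate, rounded to a whole number of frames.
const bufferLength = (numSeconds, sampleRate) => Math.round(numSeconds * sampleRate);

console.log(bufferLength(2, 22050)); // 44100 frames for two seconds at 22050 Hz
```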
AudioBufferSourceNode creation
📥 An AudioBufferSourceNode is an object that takes an AudioBuffer as its source; it is mainly used to play short audio (up to about 45 seconds) exactly once. Once played, it is removed by the garbage collector.
📥 For audio longer than 45 seconds, MediaElementAudioSourceNode is recommended instead.

// Create an AudioBufferSourceNode via the factory method.
// The constructor form may not be supported in some browsers.
const audioBufferSourceNode = audioContext.createBufferSource();
// Inject the AudioBuffer as the source.
audioBufferSourceNode.buffer = audioBuffer;
Audio graph connection

audioBufferSourceNode.connect(audioContext.destination);
Playing the source

audioBufferSourceNode.start();

const WsService = function({path, onOpen = f => f, onMessage = f => f, onClose = f => f}) {
  this.ws = new WebSocket(path);
  this.initMsg = '{"language":"ko","intermediates":true,"cmd":"getsr"}';
  this.ws.binaryType = "arraybuffer";
  this.ws.onopen = function(event) { onOpen(event); };
  this.ws.onmessage = function(event) { onMessage(event); };
  this.ws.onclose = function(event) { onClose(event); };
};
🔴 STT
⚪ Scenario
📌 Allow microphone access in the browser so audio can be recorded for the STT test.
📌 Press the record button and speak; speech recognition runs continuously.
📌 To stop the STT test, click the button again.
⚪ Code

const SttScreen = () => {
  const inputText = useSelector(state => state.stt.inputText);
  const isPlaying = useSelector(state => state.stt.isPlaying);
  const dispatch = useDispatch();
  // recorder
  const [recorder] = useState(() => new Recorder());
  const handleText = (text) => {
    dispatch(setInputText(text));
  };
  const handleBtn = useCallback(event => {
    dispatch(setIsPlaying(!isPlaying));
    if (isPlaying) {
      recorder.stop();
    } else {
      dispatch(setInputText(""));
      recorder.start();
    }
  }, [isPlaying]);
  useEffect(() => {
    recorder.init({
      path : scenario.url,
      // recording started
      onStart : function(event) {
        console.log("Recording started: turning the mic on");
        setSequence({
          segments : [0, 60],
          forceFlag : false,
        });
        dispatch(setIsPlaying(true));
      },
      // recognized result received
      onResult : function(text, isRepeat) {
        console.log("Result received: adding it to the screen");
        dispatch(setInputText(text));
      },
      // close notification received
      onClose : function(event) {
        console.log("Result delivery finished: turning the mic off");
        setSequence({
          segments : [0, 1],
          forceFlag : false,
        });
        dispatch(setIsPlaying(false));
      },
      // error occurred
      onError : function(event) {
        toastError("Failed to connect to the server.");
      }
    });
  }, []);
  return (
    <>
      <textarea value={inputText} onChange={(e) => handleText(e.target.value)} />
      <button onClick={() => handleBtn()} />
    </>
  );
};
export default SttScreen;

const Recorder = function() {
  this.serverUrl = null;
  this.audioContext = (window.AudioContext || window.webkitAudioContext || window.mozAudioContext || window.oAudioContext || window.msAudioContext);
  this.context = null; // new audioContext
  this.audioInput = null;
  this.recorder = null;
  this.recording = false;
  this.stream = null;
  this.wsServiceConfig = {
    repeat : "repeat",
    drop : "nodrop",
  };
  this.callback = {
    onStart : f => f,
    onResult : f => f,
    onClose : f => f,
    onError : f => f,
  };
  this.webSocket = null;
  navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;
  const self = this;
  this.init = function({path, onStart, onResult, onClose, onError}) {
    // initial server connection: fetch the repeat / drop settings
    self.serverUrl = `wss://ai-mediazen.com:${path}/ws`;
    const wsService = new WsService({
      path : self.serverUrl,
      onOpen : function(event) {
        if (wsService.ws.readyState === 1) {
          wsService.ws.send(wsService.initMsg);
        }
      },
      onMessage : function(event) {
        const _arr = event.data.split(",");
        self.wsServiceConfig.repeat = (_arr[1] === "0") ? "norepeat" : "repeat";
        self.wsServiceConfig.drop = (_arr[2] === "0") ? "nodrop" : "drop";
        wsService.ws.close();
      },
    });
    // store the callbacks
    self.callback.onStart = onStart;
    self.callback.onResult = onResult;
    self.callback.onClose = onClose;
    self.callback.onError = onError;
  };
  this.setup = async function() {
    try {
      this.stream = await navigator.mediaDevices.getUserMedia({ audio : {optional: [{echoCancellation:false}]}, video : false });
    } catch (err) {
      return console.log("getUserMedia error");
    }
    this.context = new this.audioContext({
      sampleRate : 16000,
    });
    this.audioInput = this.context.createMediaStreamSource(this.stream);
    const bufferSize = 4096;
    this.recorder = this.audioInput.context.createScriptProcessor(bufferSize, 1, 1); // mono channel
    this.recorder.onaudioprocess = function(event) {
      // process the recorded audio data
      if (!self.recording) return;
      self.sendChannel(event.inputBuffer.getChannelData(0));
    };
    this.recorder.connect(this.context.destination);
    this.audioInput.connect(this.recorder);
  };
  this.sendChannel = function(channel) {
    console.log("[Recorder] process channel");
    const dataview = this.encodeRAW(channel);
    const blob = new Blob([dataview], { type : "audio/x-raw" });
    // send: connect only once, on the first call
    if (!self.webSocket) {
      self.webSocket = new WsService({
        path : self.serverUrl,
        onOpen : function(event) {
          console.log("open");
          self.webSocket.ws.send('{"language":"ko","intermediates":true,"cmd":"join"}');
        },
        onMessage : function(event) {
          console.log(event.data);
          const receiveMessage = JSON.parse(event.data);
          const payload = JSON.stringify(receiveMessage.payload);
          const textMessage = JSON.parse(payload);
          if (receiveMessage.event === "reply") {
            console.log("started");
            self.recording = true;
          }
          if (receiveMessage.event === "close") {
            console.log("closed");
            if (textMessage.status) {
              self.callback.onResult(event, "norepeat");
            }
            self.webSocket.ws.close();
            self.webSocket = null;
            self.stop();
            self.callback.onClose(event);
          }
          // recognition result
          else if (textMessage.text) {
            self.callback.onResult(textMessage.text, self.wsServiceConfig.repeat);
          }
          else if (textMessage.stt) {
            self.callback.onResult(textMessage.stt[0].text, self.wsServiceConfig.repeat);
          }
        }
      });
    }
    if (self.webSocket.ws.readyState === 1) {
      if (blob.size > 0) self.webSocket.ws.send(blob);
    }
  };
  this.encodeRAW = function(channel) {
    const buffer = new ArrayBuffer(channel.length * 2);
    const view = new DataView(buffer);
    let offset = 0;
    for (let i = 0; i < channel.length; i++, offset += 2) {
      const s = Math.max(-1, Math.min(1, channel[i]));
      view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
    }
    return view;
  };
  this.start = function() {
    console.log("[Recorder] start");
    if (!self.recorder) {
      this.setup();
    }
    self.recording = true;
  };
  this.stop = function() {
    console.log("[Recorder] stop");
    this.close();
    self.recording = false;
  };
  this.close = function() {
    if (this.webSocket && this.webSocket.ws.readyState === 1) {
      console.log("closed");
      this.webSocket.ws.send('{"language":"ko","intermediates":true,"cmd":"quit"}');
    }
  };
};
export default Recorder;
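encodeRAW above is the inverse of the playback conversion: it clamps each float sample to [-1, 1], then scales it into a little-endian signed 16-bit integer. A standalone sketch of the same packing (the helper name is mine):

```javascript
// Pack Float32 audio samples into little-endian signed 16-bit PCM,
// mirroring Recorder.encodeRAW: clamp to [-1, 1], then scale
// negatives by 0x8000 and positives by 0x7FFF.
const encodePCM16 = (samples) => {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
  return view;
};

const view = encodePCM16([-1, 0, 1, 2 /* out of range, clamped to 1 */]);
console.log(view.getInt16(0, true)); // -32768
```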
🔴 Web Audio API audio recording
AudioContext creation

const audioContext = (
  window.AudioContext ||
  window.webkitAudioContext ||
  window.mozAudioContext ||
  window.oAudioContext ||
  window.msAudioContext
);
const context = new audioContext({ sampleRate: 16000 });
📌 Create the audioContext with the sampleRate option set to 16000.
MediaStream creation

const stream = await navigator.mediaDevices.getUserMedia({
  audio : {optional: [{echoCancellation:false}]},
  video : false
});
📌 Requests the user's permission to use a media input device; if the user grants it, a MediaStream containing tracks of the requested media types is returned.
MediaStreamAudioSourceNode creation

const audioInput = context.createMediaStreamSource(stream);
📌 Takes the MediaStream as a parameter and creates a MediaStreamAudioSourceNode, through which the stream's audio can be played and manipulated.
ScriptProcessorNode creation (Deprecated)

const bufferSize = 4096;
const recorder = audioInput.context.createScriptProcessor(bufferSize, 1, 1); // mono channel
recorder.onaudioprocess = function(event) {
  // process the recorded audio data
  self.sendChannel(event.inputBuffer.getChannelData(0));
};
📌 The ScriptProcessorNode interface is an audio-processing module connected to two buffers.
📌 One buffer contains the input audio data, and the other contains the processed output audio data.
📌 An event implementing the AudioProcessingEvent interface is dispatched to the object each time the input buffer contains new data, and the event handler finishes once the output buffer has been filled.
Audio graph connection

recorder.connect(context.destination);
audioInput.connect(recorder);