List:       sox-users
Subject:    Re: [SoX-users] sound to sox to .txt to chatscript
From:       Jan Stary <hans () stare ! cz>
Date:       2014-12-31 14:25:44
Message-ID: 20141231142544.GA27660 () www ! stare ! cz

On Dec 30 22:57:32, 4-werk@gmx.com wrote:
> Sorry Jan I should have replied quicker.
> You wrote "Does that mean you want the *.txt file to be a text transcript of
> the speech?" My answer is almost but not quite, I want the *.txt file to be a
> ............pattern transcript of the speech. I want sox to transcribe a
> twentieth of a second length of microphone input into a 32 bit binary pattern
> in a .txt file.

I still don't know what you want. What kind of 32bit pattern?
Individual 32bit samples (assuming it's a 32bit recording)?
What does it have to do with a "txt" file, whatever that means?

For example, here is a twentieth of a second of sound:
$ sox -c 1 -r 48000 -n file.wav synth 0.05 sine 440
What would the corresponding "32bit pattern" be?
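
If individual samples are what is meant, then at this rate a twentieth
of a second is 2400 of them. Below is a rough sketch in Python (not sox)
of what such samples look like when printed as 32bit patterns. The names
sine_samples_s32 and bit_pattern are made up for illustration, and the
values are not claimed to be bit-identical to what sox synth writes,
which may apply its own scaling or dither.

```python
import math

RATE = 48000      # samples per second, as in the sox command above
DURATION = 0.05   # a twentieth of a second
FREQ = 440.0      # Hz


def sine_samples_s32(rate=RATE, duration=DURATION, freq=FREQ):
    """Return the tone as a list of signed 32-bit integer samples."""
    count = round(rate * duration)
    full_scale = 2**31 - 1  # largest positive signed 32-bit value
    return [
        round(full_scale * math.sin(2 * math.pi * freq * n / rate))
        for n in range(count)
    ]


def bit_pattern(sample):
    """Two's-complement 32-bit pattern of one sample, as a string of 0s and 1s."""
    return format(sample & 0xFFFFFFFF, "032b")


samples = sine_samples_s32()
print(len(samples), "samples")  # 2400 samples
for s in samples[:4]:
    print(bit_pattern(s), s)
```

Written one pattern per line, that would be a "txt" file of 2400 lines
for this short tone.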

> By devied I mean splitting the mono input from the mic to stereo tracks
> that can be processed separately. Sorry if my switching between talking
> about tracks and files as the same thing caused any confusion.

Splitting a mono input to stereo tracks surely is confusing.

> Jan said "The signal itself carries that information. why do you need the
> echo, exactly?" I will try to explain it like this. the sound coming in from
> the mic has all of the information about the sound at that instant. If you
> were to make an image of that instant of sound, and stack it next to images
> of all of the preceding instances, you would have a picture of how the sound
> had changed over time.

Simply, the soundwave. Now what?

> A conventional speech to text system does something like that but without
> the pretty picture and uses statistical analysis to try to understand the
> sound. By adding the multiple echoes I will be adding in the information
> about how the sound has changed.

No you won't. The original soundwave already contains that information.
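
A minimal sketch of why, in Python, assuming the simplest possible echo:
one delayed copy scaled by a gain and added back. This is not how the
sox echo effect is implemented, and add_echo and remove_echo are
hypothetical names. The echoed signal is computed from the original
samples and nothing else, and the original can be recovered from it
exactly, so the echo adds no information.

```python
def add_echo(x, delay, gain):
    """y[n] = x[n] + gain * x[n - delay], with silence before the start."""
    return [
        x[n] + (gain * x[n - delay] if n >= delay else 0.0)
        for n in range(len(x))
    ]


def remove_echo(y, delay, gain):
    """Undo add_echo sample by sample: x[n] = y[n] - gain * x[n - delay]."""
    x = []
    for n, value in enumerate(y):
        x.append(value - (gain * x[n - delay] if n >= delay else 0.0))
    return x


original = [1.0, 0.5, -0.25, 0.0, 0.75, -1.0]
echoed = add_echo(original, delay=2, gain=0.5)
recovered = remove_echo(echoed, delay=2, gain=0.5)
print(recovered == original)
```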

> Imagine just 100 samples per second each 32 bits in size

With a samplerate of 100Hz, you cannot record speech:
such a recording holds nothing above 50Hz.
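
A small Python sketch of what happens to a higher tone at that rate
(sampled_sine is a name made up for this example). Sampled at 100Hz,
a 440Hz sine produces the same samples as a 40Hz sine, so the two
cannot be told apart afterwards. Telephone-quality speech is usually
taken to span roughly 300 to 3400Hz, all of which lies above the 50Hz
limit.

```python
import math

RATE = 100  # Hz, the sample rate proposed above


def sampled_sine(freq, rate=RATE, count=100):
    """One second of a sine of the given frequency, sampled at `rate`."""
    return [math.sin(2 * math.pi * freq * n / rate) for n in range(count)]


nyquist = RATE / 2          # highest frequency the recording can represent
high = sampled_sine(440)    # a tone well inside the speech range
low = sampled_sine(40)      # its alias: 440 - 4 * 100 = 40

# Largest difference between the two sample sequences; only rounding error.
max_diff = max(abs(a - b) for a, b in zip(high, low))
print(nyquist, max_diff)
```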

> and containing the information about how the sound had changed.
> Each of these 32 bit files should unequally pass the information
> on to the chatscript side of the project to print to the screen the letter
> or letters that match the sounds

This just means passing the sequence of individual samples,
i.e. the soundwave, to some speech-to-text program. I still
don't know why you want to cut it to individual samples first.

> to look at this another way. A sample contains its own information, but
> nothing about the samples that preceded it. the echoes smear the information
> across samples.

Riiight. So make 100 echoes for a second of your 100Hz recording.
That way, the last sample will contain all of the information.

> The reference to no clobber comes from the sox PDF. I read it and thought I
> was understanding it until I got to the bit that said that the effect that
> stops the train has to be first.

The only references to "clobber" I can see in the manpage
are the --clobber and --no-clobber options, meaning do (do not, resp.)
overwrite output files without asking.
