Friday, 26 October 2012

Week 5: 26/10/2012


This week we reviewed the test we had sat two weeks ago, so that we could see what the correct answers were and review what we need to learn.

After that we went to the lab and, for each of the previous waveforms (the sound files from last week), identified the largest and smallest amplitude measured. We then found the difference in decibels and calculated the ratio.

For the englishwords2 file, I found the largest amplitude to be -2 dB and the smallest to be -27 dB, a difference of 25 dB. I then used the formula:

difference (dB) = 10 × log10(ratio), rearranged to ratio = 10^(difference / 10)

to work out the ratio, which came to about 316. I then did the Sopran ascenddescend(1) file, which had a largest amplitude of -2 dB and a smallest of -21 dB (a 19 dB difference), giving a ratio of about 79.
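As a quick check of the arithmetic, here is a small Python sketch (the dB figures are just the ones I measured above):

def ratio_from_db(max_db, min_db):
    # difference in dB = 10 * log10(ratio), so ratio = 10 ** (difference / 10)
    return 10 ** ((max_db - min_db) / 10)

print(ratio_from_db(-2, -27))   # englishwords2: 25 dB difference -> ~316
print(ratio_from_db(-2, -21))   # Sopran ascenddescend(1): 19 dB difference -> ~79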

Notes to remember:

  • The spectrum is plotted on a graph as amplitude vs. frequency.
  • The ratio is also known as the dynamic range.

I hope to get a copy of the test so that I can write notes of what I need to remember in more detail.

Friday, 19 October 2012

Week 4: 19/10/2012


In today’s lecture we discussed the human ear and how it processes sound.

To start off we discussed the structure of the human ear. Below is a sectional view of the human ear.


What does the human ear do?
The ear receives the sound and converts it into something that our brain can understand, similar to how a computer converts sound into binary. The sound travels through the ear canal to the ear drum, which then vibrates. These vibrations set the ossicles in motion, and they pass the vibrations and their frequencies on to the cochlea.

The cochlea then decides if the frequency is high, medium or low using the small hair cells within it. High frequencies are picked up at the beginning of the cochlea because they die off quicker than lower ones, which are picked up further along. Below shows the cochlea structure:



Below shows the frequency response of the cochlea:



Below shows the process that happens from hearing the sound and it getting to the auditory nerve:



A few features of the auditory process are that:
  • It separates the left and right ear signals.
  • It separates low and high frequency information.
  • It separates timing information from intensity information.
  • It produces a two-channel set of time-domain signals in contiguous, non-linearly spaced frequency bands.
  • At various specialised processing centres in the hierarchy it can re-integrate and re-distribute this information.

Different animals have different audible frequency ranges. For example, bats have such a high range that they use their hearing to map out a landscape, but humans do not possess this ability as our hearing range is lower. Below is a graph showing the ranges for a few animals:



We discussed the “normal” hearing in humans and the ranges that we have. Below is a slide from the lecture showing these.



The MPEG/MP3 audio coding process uses lossy compression, where data that a human would not perceive anyway is discarded by the computer to save space and get rid of useless information. It also uses psychoacoustic models, which are models of human hearing. Below is a diagram of the process:



During the lab we used Soundbooth to edit a sound file so we could gain some knowledge on how the software works.

We then tried out some effects on the file. The following notes describe what each effect did to the file, as well as an explanation of what it does (for future reference):

Analogue Delay: This effect adds both echoes and subtle effects to the track.
Delays of 35 milliseconds or more create discrete echoes.
Delays of 15–35 milliseconds create a simple chorus or flanging effect. (The results won’t be as effective as the Chorus/Flanger effect, because the delay settings don’t change over time.)
Further reducing a delay to 10–15 milliseconds adds stereo depth to a mono sound.
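To get my head around what a single delay tap actually does, here is a rough Python sketch (just the basic idea, not how Soundbooth implements the effect; the test tone and parameter values are made up):

import numpy as np

def simple_delay(signal, sample_rate, delay_ms, mix=0.4):
    # Mix a delayed copy of the signal back in; delays of ~35 ms or more
    # are heard as a discrete echo, shorter ones just thicken the sound.
    delay_samples = int(sample_rate * delay_ms / 1000.0)
    out = signal.astype(float).copy()
    out[delay_samples:] += mix * signal[:len(signal) - delay_samples]
    return out

sr = 44100                              # assumed sample rate
t = np.arange(sr) / sr                  # one second of time values
tone = np.sin(2 * np.pi * 440 * t)      # made-up 440 Hz test tone
echoed = simple_delay(tone, sr, delay_ms=50)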

Chorus/Flanger: This is a combination of two delay-based effects.
The chorus effect simulates several voices or instruments played at once by adding multiple short delays with a small amount of feedback.
This makes the edited track sound fuller and richer (like a chorus in a song).
Use this effect to enhance vocal tracks or add stereo spaciousness to mono audio.
The Flanger effect makes psychedelic, phase‑shifted sounds by mixing a varying, short delay with the original signal.
This makes the edited track sound as if the pitch is being slid up and down, which creates the psychedelic feel.

Compressor: This effect will reduce the dynamic range, producing consistent volume levels and increasing perceived loudness.
Compression is particularly effective for voice-overs, because it helps the speaker stand out over musical soundtracks and background audio.
For example, classical music is usually not heavily compressed and has dips in volume, whereas a lot of newer music has been heavily compressed so that it sits at a consistent volume level.
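A very rough sketch of the idea behind compression, in Python (a per-sample, hard-knee version with made-up threshold and ratio values; real compressors also have attack and release times):

import numpy as np

def compress(signal, threshold_db=-20.0, ratio=4.0):
    # Anything louder than the threshold is turned down: with a 4:1 ratio,
    # every 4 dB above the threshold comes out as only 1 dB above it.
    level_db = 20 * np.log10(np.abs(signal) + 1e-12)
    over_db = np.maximum(level_db - threshold_db, 0.0)
    gain_db = -over_db * (1.0 - 1.0 / ratio)
    return signal * 10 ** (gain_db / 20)

loud_and_quiet = np.array([0.9, 0.05, -0.8, 0.02])   # made-up sample values
print(compress(loud_and_quiet))   # loud samples are reduced, quiet ones left alone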

Convolution Reverb: This effect will change the echoes in a track to make it sound like it is in a different space (closet, concert hall etc.).
Sound bounces off surfaces like the ceiling, walls and floor while it travels to your ears. These reflections reach your ears at almost the same time, so you don't hear them as separate echoes, but as a sonic ambience that creates an impression of space (a hall or a cupboard, for example).
Convolution-based reverbs use impulse files to simulate acoustic spaces. The results are incredibly realistic and life-like.

Distortion: Use the Distortion effect to simulate blown car speakers, muffled microphones, or overdriven amplifiers.

Dynamics: This effect is used as a compressor, limiter and expander.
 As a compressor and limiter, this effect reduces dynamic range, producing consistent volume levels.
As an expander, it increases dynamic range by reducing the level of low‑level signals. (With extreme expander settings, you can totally eliminate noise that falls below a specific amplitude threshold.)

EQ: Graphic: This effect boosts or cuts specific frequency bands and provides a visual representation of the resulting EQ curve.
Unlike the parametric equalizer, the graphic equalizer uses preset frequency bands for quick and easy equalization.
An example would be changing a voice to sound like someone talking to you through an old (muffled) telephone, or adjusting the sound for a voice-over.

EQ: Parametric: This effect provides maximum control over tonal equalization.
Unlike the graphic equalizer, which only gives you a fixed set of frequency bands, this one gives you full control over the frequencies.
For example, you can simultaneously reduce a small range of frequencies centered around 1000 Hz, boost a broad low-frequency shelf starting around 80 Hz, and insert a 60-Hz notch filter.
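The 60 Hz notch mentioned above could be sketched in Python with SciPy like this (the sample rate and Q value are assumptions, and this is only the notch part, not a full parametric EQ):

import numpy as np
from scipy import signal

fs = 44100                                       # assumed sample rate
b, a = signal.iirnotch(w0=60.0, Q=30.0, fs=fs)   # narrow notch centred on 60 Hz

t = np.arange(fs) / fs
noisy = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)  # tone + mains hum
cleaned = signal.filtfilt(b, a, noisy)           # the 60 Hz hum is strongly attenuated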

Mastering: This effect optimizes audio files for a particular medium, such as radio, video, CD, or the web.
Before mastering audio, consider the requirements of the destination medium. If the destination is the web, for example, the file will likely be played over computer speakers that poorly reproduce bass sounds. To compensate, you can boost bass frequencies during the equalization stage of the mastering process.

Phaser: This effect is similar to flanging: it shifts the phase of an audio signal and recombines it with the original, creating psychedelic effects.
But unlike the Flanger effect, which uses variable delays, the Phaser effect sweeps a series of phase-shifting filters to and from an upper frequency.
Phasing can dramatically alter the stereo image, creating unearthly sounds.

Vocal Enhancer: This will quickly improve the quality of voice-over recordings.
It reduces sibilance and plosives, as well as microphone handling noise (low rumbles).
It will give vocals a characteristic radio sound.
The Music mode optimizes soundtracks so they better complement a voice-over.

Saturday, 13 October 2012

Week 3: 12/10/2012


This week in our lecture we went through a PowerPoint about Digital Processing. I copied the PowerPoint onto my pen drive so that I can go through it later in my own time. We also had a multiple-choice question paper today.

To start the day off we discussed a typical digital signal processing system. We discussed the steps it takes between recording sound and getting it into an electrical format, editing it, and then getting it back out again. Below is a diagram that I created showing the process.


The steps.
1. The signal is passed in via a microphone or other recording equipment.
2. The recording is then converted from analogue to digital (into binary numbers).
3. Editing is then done to the digital copy, e.g. filtering, pitch warp, echo etc.
4. It is then changed from digital back into analogue.
5. It is then smoothed out.
6. The recording is passed back out edited.

The system cannot process analogue signals directly, which is why they must be converted first; and since we naturally don't understand binary, the result must be converted back again at the end.

Next we spoke about why we would use digital processing, and the three main reasons are: 
Precision
Robustness
Flexibility 

Precision: The precision of a digital signal processing system is, in theory, limited only by the conversion processes at input and output (analogue to digital and digital to analogue).
In practice, sampling rate (sampling frequency) and word length restrictions (number of bits) modify this.
However, the increasing operating speed and word length of modern digital logic are allowing many more areas of application.

Robustness: The robustness of a digital processing system shows clearly when it is compared with an analogue system. The digital system is less susceptible to electrical noise (pick-up) and to component tolerance variations, thanks to logic noise margins.
Adjustments for electrical drift and component ageing are essentially removed; this is important for complex systems.
Inappropriate component values can also be avoided with a digital system, e.g. the very large capacitors or inductors needed for very low frequency filtering.

Flexibility: Because a digital processing system is programmable, it can be upgraded and have its processing operations expanded easily without necessarily incurring large-scale hardware changes.
Practical systems with the desired time-varying and/or adaptive characteristics can be constructed.
All of this can only happen if a sound card is working and being used.


We also learned about sampling a signal: the system takes a sample of the signal at time nT seconds, and then takes another sample every period T seconds after that.


A signal is usually sampled at a rate of at least double the highest frequency it contains (for audio, that means double the top of the human hearing range), e.g. a 10 Hz signal would be sampled at 20 Hz or above.
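Here is a tiny Python sketch of what sampling at times nT means (the frequencies are just made-up illustration values, with the sampling rate well above double the signal frequency):

import numpy as np

f = 10.0           # signal frequency in Hz
fs = 80.0          # sampling rate in Hz (comfortably more than 2 * f)
T = 1.0 / fs       # sampling period in seconds
n = np.arange(16)  # sample indices 0, 1, 2, ...

samples = np.sin(2 * np.pi * f * n * T)   # the signal's value at each time nT
for i, s in zip(n, samples):
    print(f"n={i:2d}  t={i * T:.4f} s  value={s:+.3f}")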


Most modern sound cards support a 16-bit word length for coding quantised sample values.
This allows 2^16 (65536) different signal levels to be represented within the input voltage range of the card. Below is an example of this:


Below shows an example of quantising the signal amplitude: each value is rounded to the nearest level, as shown. The red squares show the original amplitudes and the grey squares show the quantised sample amplitudes.
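Here is a small Python sketch of that rounding step (the voltage range and sample values are made up; a real card works on the analogue signal, not a list of numbers):

import numpy as np

def quantise(samples, n_bits=16, voltage_range=2.0):
    # Round each sample to the nearest of the 2**n_bits levels
    # that span the card's input voltage range.
    levels = 2 ** n_bits           # 65536 levels for a 16-bit card
    step = voltage_range / levels  # quantisation step size
    return np.round(samples / step) * step

x = np.array([0.1234567, -0.7654321, 0.5])   # made-up sample voltages
print(quantise(x))                # rounded to the nearest 16-bit level
print(quantise(x, n_bits=3))      # with only 8 levels the rounding is much coarser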



The dynamic range is the ratio of the largest signal amplitude to the smallest. Since a 16-bit word length allows 2^16 (65536) different signal levels, the dynamic range (DR) is calculated as

DR = 20 × log10(voltage range / quantisation step size) dB
DR = 20 × log10(2^16) dB
DR ≈ 96 dB
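The same figure can be checked quickly in Python:

import math

n_bits = 16
dynamic_range_db = 20 * math.log10(2 ** n_bits)   # voltage range / step size = 2**16
print(round(dynamic_range_db, 1))                  # ~96.3 dB, quoted as 96 dB above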

As the human ear has a dynamic range of greater than 120 dB, even “CD quality” reproduction has some compromise. 

After the lecture we went to the lab where we all took part in a test to assess how much we have learned in the last two weeks. I feel that I was able to answer most of the questions but a few did have me stumped. Once I get my result back I will know which sections I need to revise more thoroughly. I will not be too disappointed with my mark, as this is a good chance to see what needs to be addressed now, before it's too late.

During this week I plan to search for links that help me understand the previous weeks' work and possibly put them in those weeks' blog posts.