Friday, 14 December 2012

Week 12: 14/12/2012


Today was the last day of the trimester and it consisted of us sitting the last assessment for the class.

I felt that I did not do too badly in the final test as I scored 11.67 out of 20. This was an improvement on the last two tests, as I scored 10 out of 20 on both of them.

I feel that this class gave me a better insight into sound and images, as I now understand how to improve the quality of both, as well as the sound and image techniques that make the respective item sound or look a certain way.
It also gave me a better understanding of file sizes and the best compression to use for both video and images so that the quality is not affected. I feel that this class was very beneficial.

Friday, 7 December 2012

Week Eleven: 07/12/2012


This week in the lecture we did some tutorial questions as there are no more lectures to do.

The questions were as follows:

Q1. A true colour image
640x512 pixels
RAW uncompressed
What is the minimum storage space required?

Ans. Total = 640 x 512 x 3 bytes = 983,040 bytes
≈ 0.9 MB

Q2. If a video player played images of the above type at 30fps
What is the bitrate?

Ans. 30 frames/s x 983,040 bytes/frame = 29,491,200 bytes/s
x 8 = 235,929,600 bits/s (the x8 converts bytes to bits)
≈ 235 Mbit/s

Q3. Video(including audio) is transmitted
1080x1024
120fps
3bytes per pixel
3D
AUDIO 24 bit sample
96kHz sample rate
7 channel surround sound
Lasts 90mins
If no compression, what is the minimum file size?

Ans. 262069862400 bytes
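
To check my working, here is a small Python sketch of how these calculations go (my own code, not from the lecture). The Q1 and Q2 figures match the answers above; for Q3 the exact total depends on assumptions such as whether the 3D (two-view) video doubles the data, so I have only noted the method in a comment.

# Rough sketch of the uncompressed image/video size calculations above.

def image_size_bytes(width, height, bytes_per_pixel=3):
    """Minimum storage for one uncompressed true-colour frame."""
    return width * height * bytes_per_pixel

# Q1: 640 x 512 true-colour image, RAW uncompressed
frame = image_size_bytes(640, 512)            # 983,040 bytes
print(frame / 1024 ** 2, "MB")                # ~0.94 MB, i.e. roughly 0.9 MB

# Q2: playing frames of that size at 30 fps
bitrate = frame * 30 * 8                      # the x8 converts bytes to bits
print(bitrate / 10 ** 6, "Mbit/s")            # ~236 Mbit/s, i.e. roughly 235 Mbit/s

# Q3 would follow the same pattern: (pixels per frame x 3 bytes x fps x seconds)
# for the video, plus (sample rate x 3 bytes x 7 channels x seconds) for the audio.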

These might be useful for revision for the test next week.

LAB
In the lab I went through all my blogs to check they were up to date and put in some links and pictures that I had previously forgotten to include.
I also uploaded the video from last week to YouTube; I had been having a problem doing this at home.
I feel that my blogs have enough information to help me pass my test next week.

AT HOME 
Between now and next week I will look at the example tests on Moodle and revise all the PowerPoint slides and my blog to ensure I pass the test next week.

Friday, 30 November 2012

Week 10: 30/11/2012


Today in the lecture we spoke about moving images and digital videos.

Warning - There is a lot of text here! BUT it is VERY useful so look at it!

Moving Images
We spoke about persistence of vision, a theory which states that the human eye retains an image on the retina for one twenty-fifth of a second, giving the brain the illusion that a rapid sequence of images is moving.
This, however, is an old idea and is now regarded as the myth of persistence of vision.
A more plausible explanation of motion perception involves two distinct perceptual illusions:



Digital Video
We also spoke about the amount of space needed for videos.
Uncompressed HD video files can be very large: for example, at 3 bytes per pixel, 1920x1080 resolution and 60 frames per second, the data rate is about 373.2 megabytes per second.
This is approximately 1 gigabyte every three seconds.
Even by today's standards this is an extreme amount of data.
This is why we have many varieties of compression algorithms and standards to dramatically reduce the amount of data used in video storage, processing, streaming and transmission.
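
As a rough sketch of my own (the 8 Mbit/s channel is an assumed figure, not from the lecture), this shows where the 373.2 MB/s number comes from and the sort of compression ratio needed to fit such video into a typical broadcast bitrate:

# Uncompressed HD video data rate, and the compression ratio needed to
# squeeze it into an assumed 8 Mbit/s broadcast channel.
width, height, fps, bytes_per_pixel = 1920, 1080, 60, 3

raw_bytes_per_sec = width * height * bytes_per_pixel * fps
print(raw_bytes_per_sec / 10 ** 6, "MB/s")            # 373.248 MB/s

raw_bits_per_sec = raw_bytes_per_sec * 8              # ~2986 Mbit/s
target_bits_per_sec = 8 * 10 ** 6                     # assumed broadcast channel
print(round(raw_bits_per_sec / target_bits_per_sec))  # compression ratio ~373:1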

Below is some VERY IMPORTANT  terminology:



We then spoke of different video file formats:

MPEG-1 - Development of the MPEG-1 standard began in 1988; it was finalised in 1992, when the first MPEG-1 decoder was made available. Compressing video to about 26:1 and audio to about 6:1, the MPEG-1 format was designed to compress VHS-quality raw digital video and CD audio with a minimum of quality loss.
Today, it is the most widely compatible lossy compression format in the world (the lossy compression can show up as very blocky artifacts). The MPEG-1 standard is part of the same standard that gives us the MP3 audio format. Fortunately, MPEG-1 video and Layer I/II audio can now be implemented in applications royalty free and without license fees, since the patents expired in 2003.



MPEG-2 - The MPEG-2 format was an improvement on the MPEG-1 format. The MPEG-1 format had less efficient audio compression, was restricted in the packet types it accepted, and did not support interlaced video.
MPEG-2 is the format of choice for digital television broadcasts.
Work on MPEG-2 began in 1990, before the first draft of MPEG-1 was even written. It was intended to extend the MPEG-1 format to provide full broadcast-quality video at high bitrates, between 3 and 5 Mbit/s.



MPEG-4 - is essentially a patented collection of methods to define compression of video and audio, designating a standard for a group of audio and video codecs (coder/decoders). MPEG-4 encompasses many of the features of MPEG-1 and MPEG-2, while adding support for 3D rendering, Digital Rights Management (DRM), and other types of interactivity.

QuickTime - Appeared in 1991 under a proprietary license from Apple, beating Microsoft's Video for Windows to the "market" by nearly a full year.
For QuickTime video playback on Linux, possibly the best programs that can handle most QuickTime files are VLC and MPlayer, both of which are in the PCLinuxOS repository.
In 1998 the ISO approved the QuickTime file format as the basis of the MPEG-4 file format. The benefit is that MOV and MP4 files (containers) are interchangeable in a QuickTime-only environment (meaning running in an "official" QuickTime player, like QuickTime on Mac OS X or QuickTime for Windows), since both use the same MPEG-4 codecs.

AVI (Audio Video Interleave) - appeared in 1992 from Microsoft as part of its Video for Windows technology. It is basically a file container that allows synchronized audio and video playback.
Since AVI files do not contain pixel aspect ratio information, and many players render AVI files with square pixels, the frame (image) may appear stretched or squeezed horizontally when played back. However, VLC and MPlayer have solved most problems related to the playback of AVI files.
Although it is "older" technology, there is a benefit to using AVI files. Because it has been around for so long, coupled with Microsoft's market penetration, AVI files can frequently be played back on the widest variety of systems and software, second only to MPEG-1. The format has gained widespread acceptance and adoption throughout the computer industry, and files can be successfully played back so long as the end user has the proper codec installed to decode the video. Additionally, the AVI format is well documented, not only by Microsoft, but also by many, many third parties.


WMV (Windows Media Video) - is made with several different proprietary codecs created by Microsoft. It has gained adoption for use with Blu-ray discs.
WMV files are often wrapped in ASF, the Advanced Systems Format. The WMV files themselves are not encrypted; rather, the ASF wrapper is often responsible for providing the support for Digital Rights Management, or DRM. Based on Windows Media 9, WMV files can also be placed inside an AVI container, in which case the WMV file takes the AVI file extension.
WMV files can be played on PCLinuxOS, using VLC, MPlayer, or most any other program that uses the FFmpeg implementation of the WMV codecs.


3GP - is actually two similar formats. The first, 3GPP, is designed as a container format for GSM phones (in the U.S., primary GSM wireless carriers are AT&T and T-Mobile). The second, 3GPP2, is designed as a container format for CDMA phones (in the U.S., primary CDMA wireless carriers are Verizon and Sprint). 3GPP files will often carry a 3GP file extension, while 3GPP2 files will often carry a 3G2 file extension.
(A little complex) 3GP and 3G2 files store video streams using MPEG-4 Part 2, H.263, or AVC/H.264 codecs. Some cell phones will use the MP4 file extension to represent 3GP video. Both formats were designed to decrease storage and bandwidth requirements to accommodate mobile phones.
Software support under PCLinuxOS is, once again, achieved with VLC and MPlayer. Additionally, 3GP files (and most 3G2 files) can be encoded and decoded with FFmpeg.

FLV - Flash Video is a file container format used primarily to deliver video over the Internet. In fact, it has become the de facto format of choice for sites such as YouTube, Google Video, Yahoo! Video, Metacafe, and many news outlets. While the FLV format is an open format, the codecs used to produce FLV files are mostly patented. The most common codecs used are Sorenson Spark (an H.263 codec variant) and On2's VP6. FLV files can also be encoded as H.264, as in the more recent releases of Adobe Flash.



The greater the compression on a video, the greater the loss of information.
Algorithms that compress video predictively still have problems with fast, unpredictable and detailed motion!
Automatic Video Quality Assessment could be the solution.

LAB

In the lab we produced an edited video using clips and music given to us. Below is a link to the finished video that I made (there was an issue with timing at the end so the music goes a bit strange, sorry):



Friday, 23 November 2012

Week 9: 23/11/2012


Today we spoke about Digital Image Processing and why we use it.

We use it for editing pictures; it provides a flexible environment for successive experimental attempts to achieve some desired effect.
It allows us to manipulate, enhance and transform photos in ways that are not available when using darkroom-based photography.

We also spoke about digital camera imaging systems and digital camera image capture. Below are the slides from the PowerPoint, as they describe these better than I would:



We spoke about pixelization; this can be seen by the human eye if the sensor array resolution is too low. If you increase the number of cells in the sensor array then the resolution of the image will also increase. Modern sensor devices have more than one million cells.
Below is a picture of pixelization:


To capture images in colour, red, green and blue filters are placed over the photocells. 
Each cell is assigned three 8-bit numbers (giving 2^8 = 256 levels) corresponding to its red, green and blue brightness values, e.g. a pixel might have:
  • red brightness level of 227
  • green level of 166
  • blue level of 97
Below are the slides for the digital camera optics and the digital image fundamentals slides (for extra info):



Pixels are individually coloured; they are only an approximation of the actual subject colour.

The dynamic range of a visual scene is effectively the number of colours or shades of grey (greyscale).
However, the range of digitized images is fixed by the number of bits (the bit-depth) the digital system uses to represent each pixel.
This determines the maximum number of colours or shades of grey in the palette.
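
As a quick sketch of my own, this shows how the bit-depth sets the size of the palette:

# Number of distinct levels available for a given bit-depth.
for bits in (1, 8, 16, 24):
    print(bits, "bits per pixel ->", 2 ** bits, "levels")

# Three 8-bit channels (R, G, B) give 2**24 = 16,777,216 possible colours.
print("24-bit true colour:", 2 ** 24, "colours")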

Below is an image of what a typical digital image processing system looks like:


We spoke about what digital image processing was:



Analysis:






Manipulation:




Enhancement:










NOTE: http://lodev.org/cgtutor/filtering.html A site that lets you look at how filtering works.

Transformation:



In the lab:

In the lab we looked at tutorial videos for Adobe Premiere Pro CS4.
The link to the tutorial site is:



Friday, 16 November 2012

Week 8: 16/11/2012


This week in the lecture we discussed light and how it moves through the air.

What is light?
Light is a form of energy detected by the human eye, unlike sound, it does not require a medium to propagate. It can travel from the Sun through the vacuum of outer space to reach Earth.

Light is a transverse wave. [Transverse wave - like ripples on a pond, where the water moves up and down while the wave travels outwards from the disturbance.]

We also discussed light waves and the range visible to humans. Below is a slide explaining this.


Below is the electromagnetic Spectrum:


We also discussed the velocity of light and how it changes when it passes through air or glass. In air, light travels about one million times faster than the speed of sound (roughly 330 m/s).



We also briefly discussed the frequency and wavelength of a lightwave but we have already discussed this in week 2.

Also we talked about the visible light spectrum and how light bends when passed through a prism; this is due to refraction, as light slows down in the denser material and different wavelengths bend by different amounts.

We also discussed its effects in the environment, such as reflection. I will be reviewing these more at a later date.

In the Lab we took an image of a church tower and edited it using Photoshop.

We applied a default filter to the image and it caused it to get lighter and have white pixel areas on it.
We then applied preset masks and they did the following:

High Pass - made the image become grayer.
Maximum - made the image go lighter and blurry.
Minimum - made the image go darker and blurry.

I then created a custom filter that had all zeros except the centre value, which was one; this caused nothing to happen to the image. This happens because it is the identity kernel: each pixel is multiplied by one and its neighbours by zero, so every value stays exactly the same.

I then created a new filter with a two-by-two matrix of ones at its centre; this made the picture so much brighter that it is nearly all white, since each output pixel becomes roughly the sum of four input pixels.

I then created a new filter with a two-by-two matrix with +ones on one diagonal and -ones on the other; this made the picture go almost entirely black, with only some faint white outlines left. This kernel sums to zero, so flat areas become black and only the edges remain.
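
Since the Photoshop Custom filter is just a convolution, here is a small Python sketch of the kernels I tried (my own code using NumPy and SciPy, not something from the lab):

import numpy as np
from scipy.ndimage import convolve

# A small random greyscale "image" standing in for the church tower photo.
image = np.random.randint(0, 256, (8, 8)).astype(float)

# Identity kernel: centre value one, everything else zero -> output equals input.
identity = np.array([[0, 0, 0],
                     [0, 1, 0],
                     [0, 0, 0]], dtype=float)

# A 2x2 block of ones: each output pixel becomes the sum of four input pixels,
# so the image gets roughly four times brighter and clips towards white.
brighten = np.array([[1, 1, 0],
                     [1, 1, 0],
                     [0, 0, 0]], dtype=float)

# +1s on one diagonal and -1s on the other: the kernel sums to zero, so flat
# areas go to black and only edges (where neighbouring values differ) remain.
edges = np.array([[ 1, -1, 0],
                  [-1,  1, 0],
                  [ 0,  0, 0]], dtype=float)

print(np.allclose(convolve(image, identity), image))   # True: nothing changes
print(convolve(image, brighten).max())                  # much larger values -> near white
print(np.abs(convolve(image, edges)).mean())            # small except where values change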



Friday, 9 November 2012

Week 7: 09/11/2012


This week in the lecture we discussed lighting and how its position will affect pictures.

What positions of lights are there and what effect do they have?

Front lighting will cover the picture subject in light.

Side-lighting will create some shadows, this is good for drawing portraits.

Back-lighting will cover the picture subject in shadows.

A more in-depth explanation of all three:

Front light
Lighting a subject directly from the front removes quite a bit of depth from the resulting image. To accomplish a front lighting effect without losing your depth, have a light on each side of the camera, about 45 degrees upward, pointing down at the subject. This setup gives a wider front light that seems less intense and can preserve the depth of the subject.

Side light
Side light is great for emphasizing the shape and texture of an object. It clarifies an object's form, bringing out roughness and bumps. A blend between front and side light is common, as it communicates shape and form, while softening the flaws that direct side lighting can reveal.

Back light
Back light is wonderful for accentuating edges, and emphasizing the depth of an image. Back light often gives a thin edge of light around objects, called rim lighting, although it's hard to see it if the light is positioned directly behind the subject. Giving a foreground object a rim light will make it stand out from the background, accentuating the division in depth.


There is also:

Top light
Direct top light alone can make for a very sad and almost spooky feeling. Although we're used to seeing subjects lit from above (sunlight and most indoor lighting), there are usually other light sources filling in the shadows. Therefore, to achieve this effect, fill lights, if used, must be dramatically reduced in intensity.


Top-lighting
Bottom light
Bottom light is the light we're least accustomed to seeing. It has an intense impact when used, making objects look completely different and often sinister.

Bottom-lighting

We also spoke about how the higher the contrast on a picture the clearer it will look.


We then went to the lab where we completed an exercise on editing a sound file to remove an unwanted sound and edit the file with effects to create the illusion of being in a room.

To do this I cut out the unwanted sound so only the speech was left. I then added a reverb to the file to create the feel that the speaker was in a church or other large space.

I also increased the volume of the speaker to give the appearance that he was angry, by making the sample louder and adding a fade-in effect.
I feel the exercise was a success as I managed to complete it without any help and the file sounded great when it was done.

Friday, 2 November 2012

Week 6: 02/11/2012


This week in the Lecture we did another test to see how much we have learned and improved since the last one.

We then went to the lab and used Soundbooth to edit the SopranAscendDescend file again.

When the Compression effect, For Voice - Moderate, was applied, the waveform's amplitude decreased in size and the sound became quieter.

The Use of Audio Compressors - What are the uses?

The first, and often the only, reason to use compressors should be for the sound. If used properly, a compressor (or more correctly a limiter) will place an absolute cap on the maximum level that can be passed.
This is invaluable for preventing a large PA system from distorting, or making certain that the ADC (Analogue to Digital Converter) does not clip (exceed the maximum conversion voltage).
Digital distortion is extremely unpleasant, and is to be avoided, as with all forms of hard clipping.
There are many other reasons to use compression, for example, many instruments don’t have the sustain that musicians desire. So by using compression, as the signal fades, the compressor increases its gain, so the note lasts longer.
For more examples and more reasons on why it’s used visit: http://sound.westhost.com/compression.htm#why_use
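
As a very simplified sketch of my own (real compressors also have attack, release and make-up gain), this is roughly what a compressor does to the level of a signal using a threshold and a ratio:

def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Output level in dB for a given input level in dB."""
    if level_db <= threshold_db:
        return level_db                       # below the threshold: untouched
    excess = level_db - threshold_db
    return threshold_db + excess / ratio      # above it: the excess is reduced 4:1

for level in (-40, -20, -10, 0):
    print(level, "dB in ->", compress_db(level), "dB out")

# Loud peaks are pulled down, so the whole track can then be turned up again,
# which is why quiet notes seem to sustain for longer.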

Spectral Frequency Display



This is the spectral frequency display of the file we are editing; it’s consistent with the waveform.



This is the spectral frequency display of the new file, englishwords2, we are editing; it’s consistent with the waveform. Also it is in spikes rather than lines as the words are spoken and not a constant sound like singing.

Reverb
After applying the convolution reverb, clean room - aggressive, I found that the file had taken on a more computerized quality. It sounds more robotic.
After applying the convolution reverb, roller disco - aggressive, I found that the file had taken on a more echo-like quality. It sounds like it was recorded in a large open room.

How reverb is created in a Room
It is created when the sound bounces off the walls, floor and ceiling and returns to the recording equipment, creating the reverb.

How reverb is created by a computer on a wav file
The computer takes the track and adds many delayed and filtered copies of it back onto the original (convolving it with an impulse response) to create the illusion of reverb.
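
Here is a minimal sketch of that idea (my own illustration, not how Soundbooth actually implements it): the dry signal is convolved with an impulse response, which is effectively a recording of how a room responds to a single click.

import numpy as np

sample_rate = 44100
t = np.arange(sample_rate) / sample_rate             # one second of audio
dry = np.sin(2 * np.pi * 440 * t)                     # a plain 440 Hz tone

# A made-up impulse response: a direct click followed by decaying echoes.
impulse = np.zeros(sample_rate // 2)
impulse[0] = 1.0
for delay_ms, gain in [(50, 0.6), (120, 0.35), (300, 0.15)]:
    impulse[int(sample_rate * delay_ms / 1000)] = gain

wet = np.convolve(dry, impulse)                       # the "convolution" in convolution reverb
print(dry.shape, wet.shape)                           # the wet signal is longer: the tail rings on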

Computer Speech Transcription
I tried out this feature on the computer and it managed to pick out a few words with little problem, but it had difficulty with some of the others. It thought some single words were in fact two words, for example splitting "freedom" into two separate words.

The spectrogram above shows that most of the speech energy comes from the middle of the spoken word. The beginning and end of the word do not have the same amount of speech energy.

Friday, 26 October 2012

Week 5: 26/10/2012


This week we reviewed the test we had sat two weeks ago, so that we could see what the correct answers were and review what we need to learn.

After that we went to the lab and, for each of the previous waveforms (the sound files from last week), identified the largest and smallest amplitudes measured. We then found the difference in decibels and calculated the ratio.

For the englishwords2 file, I found the largest amplitude to be -2 dB and the smallest to be -27 dB, a difference of 25 dB. The difference in decibels is related to the (power) ratio by

difference in dB = 10 log10(maxAmp/minAmp)

so the ratio is 10^(25/10) ≈ 316. I then did the SopranAscendDescend(1) file, which had its largest amplitude at -2 dB and its smallest at -21 dB, a difference of 19 dB and a ratio of about 79.
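
Here is the same working as a short Python check (my own sketch):

def ratio_from_db(max_db, min_db):
    """Power ratio corresponding to a difference in decibels."""
    difference_db = max_db - min_db        # e.g. -2 - (-27) = 25 dB
    return 10 ** (difference_db / 10)

print(round(ratio_from_db(-2, -27)))       # englishwords2: ~316
print(round(ratio_from_db(-2, -21)))       # SopranAscendDescend: ~79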

Notes to remember:

  • The spectrum is measured on a graph as amplitude Vs frequency.
  • The ratio is also known as the dynamic range.

I hope to get a copy of the test so that I can write notes of what I need to remember in more detail.

Friday, 19 October 2012

Week 4: 19/10/2012


In today’s lecture we discussed the human ear and how it processes sound.

To start off we discussed the structure of the human ear. Below is a sectional view of the human ear.


What does the human ear do?
The ear receives the sound and then changes it into something that our brain can understand, similar to how a computer changes it to binary. The sound goes through the ear canal to the ear drum, which then vibrates. Due to the vibrations, the ossicles start to do their bit and send the vibrations and their frequencies through to the cochlea.

The cochlea then decides if the frequency is high, medium or low using the small hair cells within it. High frequencies are picked up at the beginning of the cochlea, as they die off more quickly than lower ones, which are picked up further along. Below shows the cochlea structure:



Below shows the frequency response of the cochlea:



Below shows the process that happens from hearing the sound and it getting to the auditory nerve:



A few features of the auditory process are that:
- It separates the left and right ear signals.
- It separates low and high frequency information.
- It also separates timing from intensity information.
- It produces a two-channel set of time-domain signals in contiguous and non-linearly spaced frequency bands.
- At various specialised processing centres in the hierarchy it can re-integrate and re-distribute this information.

Different animals have different audible frequency ranges; for example, bats have such a high range that they use their hearing to map out a landscape, but humans do not possess this ability as our hearing range is lower. Below is a graph showing the ranges for a few animals:



We discussed the “normal” hearing in humans and the ranges that we have. Below is a slide from the lecture showing these.



The MPEG/MP3 audio coding process uses lossy compression. This is where data that the human listener would not perceive, even if it were kept, is discarded by the computer to save space and get rid of redundant information. It also uses psychoacoustic models, which are models of human hearing. Below is a diagram of the process:


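To get a feel for how much data the lossy coding throws away, here is a quick sketch of my own comparing raw CD-quality audio with an assumed typical 128 kbit/s MP3:

sample_rate = 44100        # samples per second
bit_depth = 16             # bits per sample
channels = 2               # stereo

cd_bitrate = sample_rate * bit_depth * channels       # 1,411,200 bit/s
mp3_bitrate = 128_000                                  # an assumed typical MP3 rate

print(cd_bitrate / 1000, "kbit/s raw")                 # ~1411 kbit/s
print(round(cd_bitrate / mp3_bitrate))                 # compression ratio of roughly 11:1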

During the lab we used Soundbooth to edit a sound file so we could gain some knowledge on how the software works.

We then tried out some effects on the file, the following are what each effect did to the file as well as an explanation of what they do (for future reference):

Analogue Delay: This effect adds both echoes and subtle effects to the track.
Delays of 35 milliseconds or more create discrete echoes.
Delays of 15–35 milliseconds create a simple chorus or flanging effect. (The results won’t be as effective as the Chorus/Flanger effect, because the delay settings don’t change over time.)
Further reducing a delay to 10–15 milliseconds adds stereo depth to a mono sound.
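
A tiny sketch of the underlying idea (my own illustration, not Soundbooth's implementation): a delay effect just mixes a delayed copy of the signal back in with the original.

import numpy as np

sample_rate = 44100
tone = np.sin(2 * np.pi * 220 * np.arange(sample_rate) / sample_rate)  # 1 s test tone

def delay(signal, delay_ms, mix=0.5, sample_rate=44100):
    """Mix a delayed copy of the signal back in with the original."""
    offset = int(sample_rate * delay_ms / 1000)
    out = np.copy(signal)
    out[offset:] += mix * signal[:-offset]
    return out

echo = delay(tone, 50)       # >= 35 ms: heard as a discrete echo
thicker = delay(tone, 20)    # 15-35 ms: a simple chorus/flanger-like thickening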

Chorus/Flanger: This is a combination of two delay-based effects.
The chorus effect simulates several voices or instruments playing at once by adding multiple short delays with a small amount of feedback.
This makes the edited track sound fuller and richer (like a chorus in a song).
Use this effect to enhance vocal tracks or add stereo spaciousness to mono audio.
The Flanger effect makes psychedelic, phase-shifted sounds by mixing a varying, short delay with the original signal.
This makes the edited track sound like the pitch is being slid up and down, which creates the psychedelic feel.

Compressor: This effect will reduce the dynamic range, producing consistent volume levels and increasing perceived loudness.
Compression is particularly effective for voice-overs, because it helps the speaker stand out over musical soundtracks and background audio.
An example would be that classical music isn't compressed and has dips in the volume, whereas newer music has been heavily compressed and has a consistent volume level.

Convolution Reverb: This effect will change the echoes in a track to make it sound like it is in a different space (closet, concert hall etc.).
Sound bounces off surfaces like the ceiling, walls and floor while travelling to your ears. These reflections reach your ears at almost the same time, so you don't hear them as separate echoes, but as a sonic ambience that creates an impression of space (hall or cupboard).
Convolution-based reverbs use impulse files to simulate acoustic spaces. The results are incredibly realistic and life-like.

Distortion: Use the Distortion effect to simulate blown car speakers, muffled microphones, or overdriven amplifiers.

Dynamics: This effect is used as a compressor, limiter and expander.
 As a compressor and limiter, this effect reduces dynamic range, producing consistent volume levels.
As an expander, it increases dynamic range by reducing the level of low‑level signals. (With extreme expander settings, you can totally eliminate noise that falls below a specific amplitude threshold.)

EQ: Graphics: This effect boosts or cuts specific frequency bands and provides a visual representation of the resulting EQ curve.
Unlike the parametric equalizer, the graphic equalizer uses preset frequency bands for quick and easy equalization.
An example would be changing it to sound like someone is talking to you through an old telephone (muffled). Or changing the sound for a voice over.

EQ: Parametric: This effect provides maximum control over tonal equalization.
Unlike the graphic equalizer, which only gives you a fixed set of frequency bands, this one gives you total control over the frequencies.
For example, you can simultaneously reduce a small range of frequencies centered around 1000 Hz, boost a broad low-frequency shelf starting around 80 Hz, and insert a 60-Hz notch filter.

Mastering: This effect optimizes audio files for a particular medium, such as radio, video, CD, or the web.
Before mastering audio, consider the requirements of the destination medium. If the destination is the web, for example, the file will likely be played over computer speakers that poorly reproduce bass sounds. To compensate, you can boost bass frequencies during the equalization stage of the mastering process.

Phaser: This effect is similar to flanging; it shifts the phase of an audio signal and recombines it with the original, creating psychedelic effects.
But unlike the Flanger effect, which uses variable delays, the Phaser effect sweeps a series of phase-shifting filters to and from an upper frequency.
Phasing can dramatically alter the stereo image, creating unearthly sounds.

Vocal Enhancer: This will quickly improve the quality of voice-over recordings.
It reduces sibilance and plosives, as well as microphone handling noise (low rumbles).
It will give vocals a characteristic radio sound.
The Music mode optimizes soundtracks so they better complement a voice-over.

Saturday, 13 October 2012

Week 3: 12/10/2012


This week in our lecture we went through a PowerPoint which was about Digital Processing. I copied the PowerPoint onto my pen drive so that I can go through it later in my own time and read it. We also had a multiple choice question paper today.

To start the day off we discussed a typical digital signal processing system. We discussed the steps it takes between recording sound and getting it into an electrical format, editing it, and then getting it back out again. Below is a diagram that I created showing the process.


The steps.
1. The signal is passed in via a microphone or other recording equipment.
2. The recording is then converted from analogue to digital (into binary numbers).
3. Editing is then done to the digital copy, e.g. filtering, pitch warp, echo etc.
4. It is then changed from digital back into analogue.
5. It is then smoothed out.
6. The recording is passed back out edited.

The system cannot understand analogue signals, so they must be converted first; and naturally we don't understand binary, so the signal must be converted back again.

Next we spoke about why we would use digital processing, and the three main reasons are: 
Precision
Robustness
Flexibility 

Precision: The precision of the digital signal processing system is, in theory, limited only by the conversion process at input and output (analogue-to-digital and digital-to-analogue).
In practice, sampling rate (sampling frequency) and word length restrictions (number of bits) modify this.
However, the increasing operating speed and word length of modern digital logic is allowing many more areas of application.

Robustness: The digital processing system's robustness is shown clearly when it is compared to an analogue system. The digital system is less susceptible to electrical noise (pick-up) and component tolerance variations, due to logic noise margins.
Adjustments for electrical drift and component ageing are essentially removed; this is important for complex systems.
Inappropriate component values can also be avoided with the digital system, e.g. very large capacitors or inductors for very low frequency filtering.

Flexibility: Due to the flexibility of the digital processing system, its programmability allows it to be upgraded and have its processing operations expanded easily, without necessarily incurring large-scale hardware changes.
Practical systems with desired time-varying and/or adaptive characteristics can be constructed.
All of this can only happen if a sound card is working and being used.


We also learned about sampling a signal: this is when the system takes a sample of the signal at a time nT seconds, and then takes another sample every period of T seconds after that.


The rate at which a signal is sampled is usually at least double the highest frequency present (for audio, double the top of the human hearing range), e.g. a signal containing frequencies up to 10 Hz would be sampled at 20 Hz.
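
As a little sketch of my own, this is the sampling-rate rule in code form:

# The sample rate must be at least twice the highest frequency present
# (the Nyquist rate).
def minimum_sample_rate(highest_frequency_hz):
    return 2 * highest_frequency_hz

print(minimum_sample_rate(10))       # the 10 Hz example above: 20 Hz
print(minimum_sample_rate(20000))    # hearing tops out around 20 kHz -> 40 kHz
# CDs use 44.1 kHz, a little above that 40 kHz minimum.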


Most modern sound cards support a 16 bit word length coding of quantised sample values. 
 This allows a representation of 2^16 (65536) different signal levels within the input voltage range of the card. Below is an example of this:


Below shows an example of quantising the signal amplitude. Each sample is rounded to the nearest value as shown, with the red squares being the original amplitude and the grey showing the quantised sample's amplitude.



The dynamic range is the ratio of the largest signal amplitude to the smallest. Since a 16-bit word length allows 2^16 (65536) different signal levels, the dynamic range (DR) is calculated as

DR = 20 log10([voltage range] / [quantisation step size]) dB
DR = 20 log10(2^16) dB
DR ≈ 96 dB

As the human ear has a dynamic range of greater than 120 dB, even “CD quality” reproduction has some compromise. 
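
A quick check of the dynamic range figure (my own sketch):

import math

def dynamic_range_db(word_length_bits):
    """Dynamic range of a linearly quantised signal, in dB."""
    levels = 2 ** word_length_bits
    return 20 * math.log10(levels)

print(round(dynamic_range_db(16), 1))   # 16-bit "CD quality": ~96.3 dB
print(round(dynamic_range_db(24), 1))   # 24-bit audio: ~144.5 dB, above the ear's ~120 dB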

After the lecture we went to the lab where we all took part in a test to assess how much we have learned in the last two weeks. I feel that I was able to answer most of the questions, but a few did have me stumped. Once I get my result back I will know which sections I need to revise more thoroughly. I will not be too disappointed with my mark, as this will be a good chance to assess what needs to be addressed now, before it's too late.

During this week I plan to search for links that help me to understand the previous weeks' work, and possibly put them in those weeks' blogs.