So I wanted a quick way to count character frequency in a text file and throw the result out as a bar chart.

As I enjoy Python, it made sense to code it in Python. The bar chart uses the pyplot module from matplotlib.

It was also important to import collections: a plain dictionary does not guarantee any key order (true of Python 2, which this script was written for), so without an OrderedDict the bar chart would not display in alphabetical order.

C:\Users\phillipsme\Desktop\python>python.exe charfreqency.py cipher.txt
OrderedDict([('a', 135), ('b', 16), ('c', 60), ('d', 45), ('e', 174), ('f', 37), ('g', 21),
('h', 32), ('i', 122), ('j',0), ('k', 15), ('l', 61), ('m', 30), ('n', 125), ('o', 110),
('p', 42), ('q', 2), ('r', 103), ('s', 116), ('t',168), ('u', 54), ('v', 21), ('w', 16),
('x', 4), ('y', 27), ('z', 1)])
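
The output above comes out alphabetical because the counts are sorted before they go into the OrderedDict. Here is a minimal sketch of just that step, using three of the counts from the run above; the variable names are made up for illustration and are not part of the script.

```python
import collections

# Three counts from the output above, inserted deliberately
# out of alphabetical order.
counts = {}
counts['t'] = 168
counts['a'] = 135
counts['e'] = 174

# sorted() on (key, value) pairs orders by key first, so the
# OrderedDict is built alphabetically and keeps that order.
ordered = collections.OrderedDict(sorted(counts.items()))

print(list(ordered.keys()))    # ['a', 'e', 't']
print(list(ordered.values()))  # [135, 174, 168]
```

Keys and values come back in matching order, which is what lets the bar heights and the tick labels line up later on.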

The script uses a small function, charanal(), which returns an ordered dictionary holding the frequency of each letter.

#!/usr/bin/env python
import sys
import collections
from matplotlib import pyplot

# Read the file named on the command line; print usage if it is missing or unreadable.
try:
    filename = sys.argv[1]
    with open(filename, 'r') as infile:
        rawstring = infile.read()
except (IndexError, IOError):
    print("Usage: %s filename.txt" % sys.argv[0])
    sys.exit(1)

filteredstring = rawstring.lower().replace('\n', '')

def charanal(string):
    """Return an OrderedDict mapping each letter a-z to its count in string."""
    # Start every letter at zero so absent letters still get a bar.
    results = {}
    for letter in 'abcdefghijklmnopqrstuvwxyz':
        results[letter] = 0
    # Count only a-z; digits, punctuation and whitespace are skipped.
    for char in string:
        if char in results:
            results[char] += 1
    # Sort by letter so the chart reads alphabetically.
    return collections.OrderedDict(sorted(results.items()))

orderedfrequency = charanal(filteredstring)
print(orderedfrequency)

# One bar per letter, labelled with the letter itself.
positions = range(len(orderedfrequency))
pyplot.bar(positions, list(orderedfrequency.values()))
pyplot.xticks(positions, list(orderedfrequency.keys()), ha='left')
pyplot.show()

[Image: the resulting bar chart]
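
For comparison, the same counting could probably be done more compactly with collections.Counter. This is only a sketch of an alternative, not what the script above does: the name charanal_counter is made up, and unlike charanal() it lowercases the text itself.

```python
import collections
import string

def charanal_counter(text):
    """Count a-z in text; a hypothetical alternative to charanal()."""
    # Counter tallies every character in a single pass.
    counts = collections.Counter(text.lower())
    # Look up only a-z, in alphabetical order. Counter returns 0 for
    # missing keys, so letters that never occur still get an entry.
    return collections.OrderedDict(
        (letter, counts[letter]) for letter in string.ascii_lowercase
    )

result = charanal_counter("Hello, World!")
print(result['l'], result['o'], result['z'])  # 3 2 0
```

Because the result is still an OrderedDict with all 26 letters, it should drop into the same pyplot.bar() and pyplot.xticks() calls unchanged.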
