SE251Ex:Differencing

From Marks Wiki
Revision as of 05:21, 3 November 2008 by Mark (talk | contribs) (7 revision(s))

A simple differencing tool

Prerequisites

Description

Write a simple differencing tool that takes in two files, A and B, and outputs all lines in A that B doesn't contain and conversely all lines in B that A doesn't contain. For example, say file A.txt contains the following lines:

Hello there.
You're reading file A.
Nice hair,
Aye Aye.
Hello there.
Good bye.

And file B.txt contains the following lines:

Hello there.
You're reading file B.
Nice hair,
Hello hello.
Good bye.

Then the output of the program will be something like:

-- Lines unique to A.txt:
You're reading file A.
Aye Aye.
-- Lines unique to B.txt:
You're reading file B.
Hello hello.

Note that lines are considered equivalent if they contain exactly the same sequence of characters. Duplicate lines within a file count only once: for example, although there are two occurrences of "Hello there." in A.txt, they are treated as a single line.

Exactly how the user specifies the file names is up to you: you can do it either by prompting through the console, or through command line arguments.
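As one possible way to support both options, the sketch below takes the file names from the command line when they are given and otherwise prompts for them. The function name get_file_names and the prompt texts are made up for illustration, and Python is used here only as an example language.

```python
import sys


def get_file_names(argv, prompt=input):
    """Return (path_a, path_b).

    Uses the first two command-line arguments if present;
    otherwise asks the user through the console.
    `prompt` is injectable so the function can be exercised without a console.
    """
    if len(argv) >= 3:
        # argv[0] is the program name, so the files are argv[1] and argv[2]
        return argv[1], argv[2]
    path_a = prompt("First file: ")
    path_b = prompt("Second file: ")
    return path_a, path_b


# Typical use inside a program's entry point (hypothetical):
#     path_a, path_b = get_file_names(sys.argv)
```

Running the program as `python diff.py A.txt B.txt` (the script name is hypothetical) would skip the prompts entirely.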

Tips

Recognise that this is effectively a set difference problem. The basic idea is to read the lines of each file into its own set and compute the difference in both directions: A minus B gives the lines unique to A, and B minus A gives the lines unique to B.
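A minimal sketch of the set difference idea, using the example files from the Description as in-memory lists. The helper names read_lines and unique_lines are assumptions made for this illustration, not part of the exercise.

```python
def read_lines(path):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def unique_lines(lines_a, lines_b):
    """Return (lines only in A, lines only in B) as two sets."""
    set_a = set(lines_a)  # building a set collapses duplicate lines
    set_b = set(lines_b)
    return set_a - set_b, set_b - set_a


# The contents of A.txt and B.txt from the example above
lines_a = ["Hello there.", "You're reading file A.", "Nice hair,",
           "Aye Aye.", "Hello there.", "Good bye."]
lines_b = ["Hello there.", "You're reading file B.", "Nice hair,",
           "Hello hello.", "Good bye."]

only_a, only_b = unique_lines(lines_a, lines_b)
```

Here only_a holds "You're reading file A." and "Aye Aye.", and only_b holds "You're reading file B." and "Hello hello.". A plain set does not promise any particular iteration order, so the lines may be printed in a different order from the files.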

Optional requirements

Optionally, ensure that the unique lines are presented in the same order in which they appear in the original file. In other words, the output below shouldn't happen, because the line "You're reading file A." comes before "Aye Aye." in A.txt; likewise, "You're reading file B." comes before "Hello hello." in B.txt:

-- Lines unique to A.txt:
Aye Aye.
You're reading file A.
-- Lines unique to B.txt:
Hello hello.
You're reading file B.

To do this you may want to use the Chain data structure from SE251Ex:Chain, which is in a sense a special kind of Set that retains the order in which elements were added.
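The Chain class itself is not shown on this page, so the sketch below substitutes a stand-in with the same property: a Python dict, which keeps keys in insertion order (guaranteed since Python 3.7) and ignores repeated keys. The function names ordered_unique_lines and format_report are hypothetical.

```python
def ordered_unique_lines(lines_a, lines_b):
    """Return (lines only in A, lines only in B) as lists in file order."""
    # dict.fromkeys acts as an insertion-ordered set: duplicates collapse,
    # and the first occurrence of each line fixes its position.
    seen_a = dict.fromkeys(lines_a)
    seen_b = dict.fromkeys(lines_b)
    only_a = [line for line in seen_a if line not in seen_b]
    only_b = [line for line in seen_b if line not in seen_a]
    return only_a, only_b


def format_report(name_a, only_a, name_b, only_b):
    """Build the report text in the layout used by the example output."""
    rows = [f"-- Lines unique to {name_a}:", *only_a,
            f"-- Lines unique to {name_b}:", *only_b]
    return "\n".join(rows)


lines_a = ["Hello there.", "You're reading file A.", "Nice hair,",
           "Aye Aye.", "Hello there.", "Good bye."]
lines_b = ["Hello there.", "You're reading file B.", "Nice hair,",
           "Hello hello.", "Good bye."]

ordered_a, ordered_b = ordered_unique_lines(lines_a, lines_b)
report = format_report("A.txt", ordered_a, "B.txt", ordered_b)
print(report)
```

Because membership tests on a dict are constant time on average, this stays roughly linear in the total number of lines, the same as the plain set version.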

Discussion

Here you can ask questions and discuss stuff