graffy76
Civil/Environmental
- Sep 25, 2008
- 2
Hello,
I'm trying to develop a cost index for the purposes of estimating costs for construction projects. The data is run through a KMeans cluster analyzer I wrote in C++ / MS-Access.
The data has four essential components: the unit price of every contractor bid on every project let in the state since 2003 (regardless of whether or not the contractor won the bid), the quantity of the payitem, the location (by county) of the project, and the date of the bid.
Quantity, location, and date all affect the unit price. The effect of quantity is pretty much fixed - unit price goes down as quantity goes up. Location and date are different: they do move prices, but not in any simple, predictable pattern. Some counties are more expensive than others, and some lettings may have higher prices because of material or other conditions at the time.
The idea is that if I can provide a reasonably accurate index for payitems by location and date, I can better estimate unit prices. For example, say I'm using a payitem that is rarely used in one particular county, but shows up quite often in another. If both counties have similar "location" indices (that is, prices tend to be equally higher or lower than the statewide average), then prices in one county may be considered equivalent to the other. The same goes for the contract lettings. If price levels for a payitem in a 2004 letting match those in a 2008 letting, then why shouldn't 2004 prices be just as valid as 2008? If nothing else, this should at least sort out the bids for lettings or counties with unusually high or low prices.
In the end, I expect a simple comparison of price and quantity to do most of the sorting. If two separate contracts each have 10,000 feet of paint striping, and contractors bid similar unit prices on each contract, then it doesn't really matter what county or date they were bid - they are equivalent bids. However, if I happen to have different unit prices for the same quantity of an item (say $5 / foot and $10 / foot each for 10,000 feet of paint striping), then there must be a reason for the difference - maybe because of material availability by area or material costs at certain times.
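The price / quantity comparison above could be sketched in C++ along these lines. Again, this is only an illustration under my own assumptions: the `BidSample` struct, the function names, and the 10% / 25% tolerances are hypothetical placeholders, not values taken from the data.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical minimal record for one bid on one payitem.
struct BidSample {
    double quantity;
    double unitPrice;
};

// True when the two quantities differ by no more than qtyTol,
// measured relative to the larger quantity.
bool similarQuantity(const BidSample& a, const BidSample& b,
                     double qtyTol = 0.10) {
    const double larger = std::max(a.quantity, b.quantity);
    if (larger <= 0.0) return false;
    return std::fabs(a.quantity - b.quantity) / larger <= qtyTol;
}

// True when the bids are comparable in quantity but their unit prices
// differ by more than priceTol (relative to the higher price), i.e. the
// difference probably needs a location or date explanation.
bool needsExplanation(const BidSample& a, const BidSample& b,
                      double qtyTol = 0.10, double priceTol = 0.25) {
    if (!similarQuantity(a, b, qtyTol)) return false;
    const double higher = std::max(a.unitPrice, b.unitPrice);
    if (higher <= 0.0) return false;
    return std::fabs(a.unitPrice - b.unitPrice) / higher > priceTol;
}
```

With these placeholder tolerances, two bids of $5 / foot and $10 / foot for 10,000 feet each would be flagged, while $5.00 and $5.20 for the same quantity would be treated as equivalent.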
I have much more to say, but I wanted to at least pose the problem for comments before I start describing how I've tried to approach the solution.